Behind the Data: What It Takes to Keep Up EC3’s EPD Database
February 13, 2026
The world’s largest open-access database of Environmental Product Declarations doesn’t maintain itself. Here’s what goes into making EPD data searchable, comparable, and trustworthy.

Whether you are searching EC3 for low-carbon concrete or comparing steel EPDs across manufacturers, you’re seeing the end result of a complex data operation. Behind every one of the 200,000+ EPDs in our database is a story of ingestion, validation, alignment, and ongoing curation—work that transforms inconsistent PDF documents into structured, comparable data.
This post pulls back the curtain on EC3’s EPD data operations: what it actually takes to build and maintain a database that over 2,000 organizations across 84 countries rely on for material selection decisions.
The impetus of this article is improving the understanding of BT’s current process and to invite everyone to engage in and propose improvements for EC3 2.0. Please share your feedback in our feedback form and during our feedback sessions.
___
The Fundamental Challenge: EPDs Weren’t Built for Databases
Many Environmental Product Declarations are designed as standalone disclosure documents, not database records. They’re usually PDFs—often 10+ pages of tables, charts, and methodology descriptions—published by over 60 different Program Operators worldwide, each with their own formatting conventions, reporting requirements, and data management practices.
The challenges compound quickly:
Inconsistent data structures. One Program Operator reports GWP-fossil separately from GWP-biogenic. Another reports only “GWP-total.” A third reports “GWP” without specifying which definition they’re using. All three are technically compliant with ISO 14025, but they’re not directly comparable without careful interpretation.
Life Cycle Stage Boundaries and Scope. EPDs frequently differ in the life cycle stages they report. Some provide only the product stage (A1-A3), while others extend to include construction (A4-A5), use (B1-B5), or end-of-life (C1-C4) stages. Furthermore, the reporting can be aggregated (A1-A3 total) or disaggregated (A1, A2, A3 separately). The treatment of replacement modules also varies, with some including them in the product stage (A) and others in the use stage (B).
Declared and Functional Units. We also see frequent differences in declared or functional units across EPDs within the same product categories, often stemming from temporal or regional variations in reporting rules.
Version management. Each Program Operator handles EPD updates, extensions, and replacements differently. Some issue new declaration numbers for minor updates; others reuse the same number across major revisions. Expiration date extensions sometimes appear as entirely new documents, sometimes as amendments to existing ones.
Variable data quality. Some EPDs arrive with complete, well-structured data. Others are missing plant locations, useful performance characteristics, have ambiguous product descriptions, and other limitations.
These differences do not necessarily mean the EPDs are incomparable, but they introduce complexity for alignment. They require substantial effort to normalize them so that meaningful, apples-to-apples comparisons can be made—a core part of the ongoing curation work that Building Transparency performs whenever possible. Building Transparency’s goal is to help transform this heterogeneous landscape into an easy to use tool for allowing users to make informed design and procurement decisions.

___
How EPDs Enter EC3
EPDs reach our database through several pathways, each with different data quality characteristics.
- Digital transfer from EPD Generators & Program Operators directly to EC3
- Sourcing data from Program Operator platforms and other EPD aggregator platforms
- PDF documents uploaded directly to EC3 by users, including architects, engineers, contractors, manufacturers, and others.
Our team does not track or search for EPDs on manufacturers’ own websites, but we encourage manufacturers with EPDs to upload their data to EC3, or opt in to sending data to EC3 from their digital EPD Generator or Program Operator.
EPDs from aggregated sources and direct uploads to EC3 are collected in their original PDF form, then machine-read, parsed by our system, and checked for quality before being searchable within the Find & Compare Materials or Plan & Compare Buildings sections.
Direct Digital Transfer from Generators & Operators
The cleanest data comes from LCA software providers who transmit structured EPD data directly upon creation. This pathway typically has the fewest quality issues because the data is born digital—no PDF parsing by BT required.
Program Operator Integrations
We maintain connections with Program Operators who make their EPD registries available online or via an API. This provides an authoritative source of data, but we still need to map their varying data structures to EC3’s uniform schema. We have to handle variability in Program Operators’ own version management approaches, split EPDs that contain multiple results sets for multiple products (split in EC3 into individual data points for user to be able to search and use), or otherwise require any normalization to make them fit specific EC3 categories and comparison requirements.
AI-Powered PDF Parsing
Many EPDs today still enter EC3 as PDF documents—uploaded by users or pulled from Program Operator registries that only publish PDFs. Our AI-powered parser extracts structured data automatically. This isn’t simple Optical Character Recognition (OCR). The parser must also understand the larger context and formatting patterns in order to:
- Identify which tables contain impact data versus methodology references
- Distinguish between GWP-fossil, GWP-biogenic, GWP-luluc, and GWP-total, GWP Including Biogenic, GWP Excluding Biogenic, and handle EPDs that just say “GWP” without clarifying, in addition to other impact indicators like acidification, eutrophication and all their variants
- Extract declared units, functional units, and life cycle stage boundaries correctly
- Handle multi-product EPDs that declare impacts for several product variants
- Map manufacturer-specific product names to standardized product categories
- Find and connect organizations and individuals that are the EPD’s owner, program operator, verifier, and developer
- Find and connect relevant plant information
- Extract any category-specific performance parameter information for the product (e.g., compressive strengths, material types, thickness, fire-rating, intended applications, etc.)
We have gone through multiple iterations of the parsing system since EC3’s launch in 2020. Since deploying our latest AI parsing pipeline last year, we’ve added thousands of new EPDs—a pace impossible with manual digitization alone. But parsing is only the first step.
___
Quality Assurance: Automated Checks and Human Review
Every EPD runs through approximately 250 automated validation checks before publication. These flag issues across several dimensions:
Completeness. Are required impact categories present? Is the declared unit specified? Are validity dates included?
Plausibility. Is this GWP value physically reasonable for this material category? A concrete mix claiming 50 kg CO₂e/m³ or 5,000 kg CO₂e/m³ both warrant investigation.
Consistency. Do life cycle stage totals add up correctly? Are module boundaries consistent with the declared scope?
Comparability. Does this EPD meet minimum requirements for comparison within its product category? Are the specificity flags (product-specific, facility-specific, manufacturer-specific) correctly set?
EPDs that pass all critical checks get “OK” or “warning” status and become searchable in the the Find & Compare Materials or Plan & Compare Buildings sections. EPDs with “errors” or “failures” get flagged for additional manual review before they can be used by users.
Manual QA: Where Human Expertise Matters
Our data team reviews flagged EPDs to determine root causes and appropriate fixes. The work falls into several categories:
Category-level QA. Reviewing entire product categories to ensure statistics are correctly represented. This often involves sorting by reported GWP to identify outliers—EPDs with unusually high or low values can indicate categorization errors or parsing problems that need to be addressed.
Single-EPD deep dives. When an EPD needs complete review, we check every field against the source PDF: product name, validity dates, reported GWP, manufacturer and plant information, data source information, PCR references, product characteristics and performance attributes. The most critical fields get priority attention.
Pattern identification. When we notice recurring issues—say, EPDs from a particular Program Operator consistently having incorrect GWP field mapping—we flag these for systematic fixes and updates to our parser.
User feedback reports. Users that find issues are encouraged to use the “Report Bugs & Feedback,” button in EC3 which notifies our team. Common issues include product names that don’t match manufacturer websites, missing plant locations, or digitized values that don’t match the source PDF. Users can also use the “suggest corrections” feature to send us suggestions for changes.
Manufacturers can also manage parts of the EPD pages in EC3 that they have the most knowledge about, including anything on the right side of the EPD page, like product description, performance specifications, and plant information. The remainder of the EPD data in EC3 can be managed by the Program Operator of record on the EPD or by our data team.
___
The Ongoing Work of Data Hygiene
A database of 200,000+ EPDs requires constant maintenance.
Version and Expiration Management
EPDs have validity periods—typically five years. But the reality is messier than that sounds:
Extensions. EPDs sometimes get six-month or twelve-month validity extensions without other changes. We need to recognize these as continuations of existing records, not new additions to the database. When we identify extensions, we either update the expiration date on the existing entry or create a linked record that maintains continuity without overlap.
Major updates. When a manufacturer republishes a new EPD for a product whose prior EPD hasn’t technically expired yet, we create a new record for the new EPD and mark the old one as superseded, preserving the audit trail while ensuring searches return only currently applicable data.
Corrections. Sometimes an EPD gets republished due to corrections or changes to graphics and formatting. The significance of these changes for making procurement decisions varies and our management approach does too. For graphical and formatting changes, we simply replace the attached PDF of record. Value corrections are updated directly in the existing EC3 EPD record and may affect users’ Project Planner calculations.
Discontinued products. Sometimes products with valid EPDs get retired before the EPD expires. Showing discontinued products in current searches would be misleading, so we manually expire these records in our system, usually when notified by the manufacturer.
Declaration numbers and version inconsistencies. Not all Program Operators maintain clean declaration number practices. The same product might appear with different declaration numbers across versions, or in rare instances, different products might share the same declaration number. We maintain deduplication logic, but edge cases require human judgment. Sometimes our team goes as far as investigating the creation dates and individuals recorded in the PDF file’s metadata.
Deduplication
As mentioned earlier, the same EPD can enter EC3 through multiple pathways—user upload, Program Operator feed, direct deposit to EC3. Our system checks for identical documents, but if there’s even a small PDF change (updated logo, revised company description), it may appear as a new record until we run additional deduplication checks. Manual duplicate checks are also part of standard quality assurance workflow.
Permanent Record
Rather than deleting data, EC3 retains all EPD records indefinitely–including expired, discontinued, and updated EPDs–and instead manages whether they are searchable. EPD data in EC3 may be referenced on building and infrastructure projects, used for carbon calculations in EC3 and in external platforms, and tied to documentation of actual material purchases. When an EPD expires or gets replaced, we retain the previous record indefinitely for auditability and to maintain any references that may have been established to it. This is one of the core differences between EC3 and many Program Operator platforms or manufacturer websites, which often remove old EPD records entirely from their systems–making auditing difficult, or an extra step at the least.

GWP Interpretation
This deserves special mention because it’s a constant source of complexity. EPDs report global warming potential in several ways:
| EPD reports | EC3 stores as | Notes |
| GWP-fossil / GWP-excluding biogenic | GWP-fossil and GWP* | Standard fossil carbon |
| GWP-biogenic | GWP-biogenic | Carbon from biological sources |
| GWP-luluc | GWP-luluc | Land use change |
| GWP-total / GWP-net / GWP including biogenic | GWP-net | Sum of all GWP components |
| Just “GWP” | Depends | Requires interpretation |

When an EPD with bio-based content reports only “GWP” without specifying which definition, we have to determine whether it includes biogenic carbon or not. For products containing wood or paper, this distinction significantly affects the reported value. We sometimes need to calculate stored carbon separately using standardized methods like EN16449:2014 to enable proper comparison, for example, taking into account percent of product that is wood, moisture content, density, and other factors helping to estimate the amount of stored biogenic carbon. EC3 provides additional biogenic carbon data outside of the environmental impacts table, stored as “[emitted] biogenic embodied carbon” and “stored biogenic carbon”.

Most recently published EPDs, aligning to newer PCRs and standards also report biogenic carbon removals and emissions (BCRP, BCEP) dramatically helping us align all GWP values, but until all of the older EPDs expire, we have to manage the transition where not all EPDs go into that level of transparency.
GWP values may also vary based on the underlying characterization factors, such as the ones defined by varying IPCC versions. The largest differences in characterization factors affect emissions related to methane, nitrous oxides, and refrigerants, which are typically mostly relevant to specific product categories. Product categories dominated by carbon dioxide emissions will see minor differences. In any case, we do store the characterization factor information on the full EPD page. When EPDs report multiple sets of results with different characterization factors, we show the end users the one that matches their preference.

___
Building Product Categories: Research, Not Just Data Entry
EC3’s product categories and tracked attributes don’t appear automatically. Each category represents significant research investment:
Expert consultation. We spend countless hours on calls with industry experts—manufacturers, engineers, sustainability consultants—understanding what attributes matter for material selection decisions in each category.
PCR analysis. Product Category Rules define what EPDs in a category must report, but PCRs vary by Program Operator and evolve over time. We analyze current PCRs to understand what data should be available and how to structure comparisons.
Existing EPD review. Before launching a new category, we review available EPDs to understand the realistic data landscape—what’s consistently reported, what’s often missing, what variation exists.
Comparability decisions. Not everything can be directly compared. When EPDs use different declared units, system boundaries, or methodological approaches, we need to either harmonize the data, limit comparisons, or clearly communicate limitations through the interface.
When users can’t find an appropriate category for a product, we document the gap and evaluate whether a new category is warranted—a process that triggers another round of research and expert consultation.
Updates to existing EPDs
Once additional categories and fields are implemented, our team also has to update all existing EPDs in the system that may be affected by the improvements. We have to selectively re-parse existing data to move the EPDs to new categories or to fill in new fields, when available.
___
Improving Quality Through Relationships & Engagement
EC3’s data quality depends on relationships with the organizations that create and publish EPDs:
Program Operators maintain EPD registries and can review/approve corrections to EPDs published under their programs. We work with them to improve data feeds and resolve systematic issues.
LCA developers create EPDs and can provide structured data directly, bypassing PDF parsing entirely. These integrations produce the cleanest data. The extra time it takes to make the automated connections and map distinct digital formats is well worth the quality and completeness benefits.
Manufacturers sometimes contact us to help them add new data, correct existing product information, update plant locations, or flag discontinued products.
Each relationship requires ongoing maintenance as partners evolve their systems, change data formats, or update their own practices.
Our team members have also been involved in various PCR committees and ACLCA PCR Open Standard developments, amongst other standardization initiatives, pushing towards digitization of EPDs, improvements in background data, and better transparency of EPDs to make the entire EPD ecosystem better.
___
The Team and Infrastructure
Who Does This Work
Data Manager and Technical Director lead data quality initiatives—reviewing flagged EPDs, making QA decisions, improving internal systems, managing category development, and coordinating with Program Operators.
Front-end and backend engineers build and maintain the parsing systems, API, and create the tools that make QA workflows efficient for the rest of the data team.
DevOps engineer monitors infrastructure, manages deployments, and ensures the system stays reliable and secure.
What Keeps It Running
The infrastructure side includes:
- Cloud infrastructure with appropriate redundancy and security controls
- Database operations supporting real-time queries and batch processing
- Monitoring and alerting to catch issues before users do
- Security and compliance—we maintain SOC 2 Type 2 certification
- Backup and disaster recovery for data protection
Infrastructure costs scale with usage. More API calls, more EPDs, more users—all translate to higher operational expenses.
___
What You Get When You Use EC3
When you search EC3, you are not just querying a database. You are seeing the product if a complex operation including:
- AI parsing that handles the messiness of real-world EPD documents
- Automated validation that catches errors before they reach you
- Manual QA that resolves edge cases requiring human judgment
- Ongoing deduplication and version management
- GWP interpretation that enables meaningful comparison
- Category structures built through expert consultation
- Infrastructure maintained to SOC 2 standards
We’ve always believed embodied carbon data should be openly accessible. That belief hasn’t changed.
The work described above represents ongoing investment supported through grants and sponsorships. When you become our supporter or partner you are joining a community that is empowering data-led environmental decision-making available to everyone, helping us respond faster to data quality issues, expand coverage to new categories and regions, and keep pace with evolving EPD standards. We greatly appreciate all your engagement and support and hope to serve this community for many years to come.
Sincerely,
The Building Transparency team



