72 Million DOIs Ready for Analysis: Our Latest Public Data File
The DataCite Metadata Store is a rich source of information and insights about research outputs, resources, and activities. Discovery systems and services worldwide rely on this metadata to build and enhance their platforms and workflows, which form critical pieces of scholarly infrastructure. At DataCite, we are committed to making DataCite metadata not only openly available, in line with the Principles of Open Scholarly Infrastructure (POSI), but also easily accessible.
One way we are making access easier is through our annual public data file. Following last year's inaugural release, today we are releasing DataCite's second annual public data file, containing metadata for 72 million DataCite DOIs in the Findable state that were registered through the end of 2024.
New Year, New Metadata
Over the past year, the DataCite community registered over 19 million new DOIs. Our members also submitted updates to over 10 million existing DOIs, including metadata improvements that increase the completeness and quality of DataCite metadata. Here are some of the highlights:
- Connection metadata: Of the DOIs registered in 2024, over 58% have at least one ROR ID and over 61% have at least one ORCID iD. These identifiers make it easy to analyze links among works, people, and organizations.
- New resource types: In 2024, we released Schema 4.5 and Schema 4.6, which introduced the new resource types Instrument, StudyRegistration, Project, and Award. So far, 20 organizations have used these new resource types.
- New organizations: Over 100 organizations registered their first DOI in 2024, including both Direct Members and Consortium Organizations.
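To show how figures like these might be explored in the data file, here is a minimal Python sketch that computes, for a batch of DOI records, the share with at least one ORCID iD, the share with at least one ROR ID, and how many records use the four new resource types. It assumes each record is a dict shaped like the `attributes` object in DataCite REST API JSON (`creators`, `contributors`, `nameIdentifiers`, `affiliation`, `fundingReferences`, `types.resourceTypeGeneral`). If each line of the file instead wraps the metadata under an `attributes` key, pass that inner dict. The function names are hypothetical, and the counting rule (creator/contributor identifiers plus ROR-identified funders) is our own illustration, not necessarily the method behind the percentages quoted in this post. To mirror the 2024 figures, you would first filter to DOIs registered in 2024.

```python
from collections import Counter
from typing import Any, Iterable

# Resource types named in this post as new in Schema 4.5 and 4.6.
NEW_RESOURCE_TYPES = {"Instrument", "StudyRegistration", "Project", "Award"}


def _people(record: dict[str, Any]) -> list[dict[str, Any]]:
    """Return creators and contributors, tolerating missing or null keys."""
    return list(record.get("creators") or []) + list(record.get("contributors") or [])


def has_orcid(record: dict[str, Any]) -> bool:
    """True if any creator or contributor carries an ORCID name identifier."""
    for person in _people(record):
        for ident in person.get("nameIdentifiers") or []:
            if str(ident.get("nameIdentifierScheme", "")).upper() == "ORCID":
                return True
    return False


def has_ror(record: dict[str, Any]) -> bool:
    """True if any affiliation or funding reference is identified by ROR.

    Assumption: only affiliations and funders are checked here; other
    places a ROR ID can appear (e.g. publisher identifiers) are ignored.
    """
    for person in _people(record):
        for aff in person.get("affiliation") or []:
            # Affiliations may be plain strings when identifiers are omitted.
            if isinstance(aff, dict) and str(aff.get("affiliationIdentifierScheme", "")).upper() == "ROR":
                return True
    for funder in record.get("fundingReferences") or []:
        if str(funder.get("funderIdentifierType", "")).upper() == "ROR":
            return True
    return False


def summarize(records: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Single pass over records: identifier coverage and new-type usage."""
    total = orcid = ror = 0
    new_types = Counter()
    for rec in records:
        total += 1
        orcid += int(has_orcid(rec))
        ror += int(has_ror(rec))
        rtype = (rec.get("types") or {}).get("resourceTypeGeneral")
        if rtype in NEW_RESOURCE_TYPES:
            new_types[rtype] += 1
    return {
        "total": total,
        "orcid_share": orcid / total if total else 0.0,
        "ror_share": ror / total if total else 0.0,
        "new_resource_types": dict(new_types),
    }
```

Because `summarize` accepts any iterable and makes a single pass, it can consume records lazily from a generator, so the full set of 72 million records never has to be held in memory at once.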
When exploring the public data file, you will find DOIs registered for over 30 resource types, underscoring the range of outputs, objects, and activities that the DataCite community chooses to represent. The figure below illustrates this diversity.

Updated Data File Structure
In response to community feedback on last year's first public data file, we have changed the data file structure to make it easier to access, parse, and interpret:
- The data file's internal directory structure is now organized by the date the DOI was last updated.
- The data file is now packaged differently to allow faster, more efficient extraction.
- We have included tabular reports covering all registered DataCite DOIs, listing each DOI's state, associated Repository Account ID, and last-updated timestamp.
- We have added a Readme file to the data file.
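As a starting point for working with the extracted file, here is a minimal Python sketch that streams metadata records and tallies DOIs by state from a tabular report. It assumes the metadata files are gzip-compressed JSON Lines (matched by the hypothetical pattern `*.jsonl.gz`) stored in per-date subdirectories, and that a report is a CSV file with a column named `state`. Both the file pattern and the column name are assumptions to check against the Readme and the support documentation; the function names are hypothetical.

```python
import csv
import gzip
import json
from collections import Counter
from pathlib import Path
from typing import Any, Iterator


def iter_records(root: Path, pattern: str = "*.jsonl.gz") -> Iterator[dict[str, Any]]:
    """Yield one JSON record per non-empty line from every matching file under root.

    Assumption: files are gzip-compressed JSON Lines. Paths are sorted, so
    date-named directories (e.g. ISO-style names) are visited in date order.
    """
    for path in sorted(root.rglob(pattern)):
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:  # skip blank lines rather than failing on them
                    yield json.loads(line)


def count_states(report: Path, state_column: str = "state") -> Counter:
    """Tally DOIs per state in a CSV report (the column name is an assumption)."""
    with report.open(newline="", encoding="utf-8") as fh:
        return Counter(row[state_column] for row in csv.DictReader(fh))
```

Because the directories are organized by the date each DOI was last updated, a harvester that already holds an earlier copy could restrict `root` to the newer date directories instead of reprocessing everything.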
Read more about the details of the data file format and structure in our support documentation.
Access the Public Data File
Interested in exploring all DataCite metadata? You can get a link to download the public data file directly through our data files portal.
Talk With Us About Metadata Harvesting
Our public data file is one of several mechanisms for metadata harvesting. As we continue to explore additional harvesting options and to improve our metadata quality, we would like to hear from you about how we can better serve your needs.
For metadata harvesters and community members interested in harvesting, we are launching a new Harvesters Interest Group. Through this group, the DataCite team will share updates and new features, offer opportunities for community discussion, and provide ways for you to share your ideas with us. We invite you to join the Google Group to participate and receive updates. You can also register for the first Harvesters Interest Group meeting on January 29, 2025, at 16:00 UTC.
If you have any ideas or suggestions for DataCite services, we welcome you to share them in DataCite Suggestions, DataCite's GitHub Discussions space. Finally, you are always welcome to contact us at support@datacite.org with any questions or feedback.
Additional details
Identifiers
- UUID: 41a6cb43-f5a9-426f-9c30-1444a0d4288b
- GUID: https://datacite.org/?p=11943
- URL: https://datacite.org/blog/72-million-dois-ready-for-analysis-our-latest-public-data-file/
Dates
- Issued: 2025-01-23T19:53:04
- Updated: 2025-01-29T04:08:50