Published June 12, 2025 | https://doi.org/10.5438/bkxd-k077

Cultivating a Thriving Metadata Ecosystem: Advancing Technical and Community-Driven Strategies for Improvements and Enrichments

DataCite Metadata: Connecting Research to Advance Knowledge

For stakeholder communities around the world, DataCite offers a trusted home for research organizations and their communities to contribute, connect, and retrieve key metadata about outputs, resources, and activities. Over nearly two decades, the DataCite metadata store has grown into a rich and valuable source of information and insights that reflects the depth and breadth of global research activities and powers discovery, reuse, and integration across the research infrastructure ecosystem.

As the DataCite metadata store has grown and evolved over the years, it has continued to support a rich array of use cases for identifying, analyzing, and retrieving research-related information and activities. We can track and measure this growth in different ways, including through DOI registration activity (currently exceeding 85 million cumulative DOIs registered), membership growth (currently 1,600+ organizations in 60+ countries), regular updates to the DataCite Metadata Schema informed by community needs and feedback, and analyses of current metadata completeness, consistency, and utility.

Driving Discovery and Reuse

As the DataCite metadata store grows, and as downstream systems and services increasingly rely on this metadata, demands rise for both discovery and reuse. Likewise, expectations grow for DataCite services and infrastructure to make high-quality metadata registration and retrieval as easy and effective as possible. 

While optimizing the richness and usability of DataCite metadata has been a long-standing goal in service of our mission, achieving this objective is a continuous, collaborative process, and success doesn't happen overnight. 

This year, we've been taking a closer look at the challenges and opportunities that come with driving metadata quality, and we wanted to provide a mid-year update to our community about what we've been working on, what lies ahead, and how you can get (more) involved.

Metadata Quality: Understanding the Challenges and Opportunities

What's Holding Us Back

Creating, managing, and maintaining high-quality metadata and making it available for retrieval and discovery is an ongoing effort that cuts across many users and services. Based on internal analyses as well as consultations with community members, we can identify some broad categories of challenges with regard to these efforts. 

Consistency and Coverage Gaps

The DataCite Metadata Schema is deliberately designed to support a wide range of resource types and is scoped to align with key use cases for citation and discovery. The flexibility of the schema and ongoing updates, such as the addition of new resource types, make it possible to serve multiple use cases. 

At the same time, we know that users face challenges in keeping up with schema updates (or may not be able to take advantage of them if they depend on a third-party platform). We also know that the schema's broad utility and depth of options can sometimes lead to discrepancies and inconsistencies in how it is interpreted and applied.

In a slightly different vein, we also know that community stakeholders have additional metadata that does not always make it into our metadata store but could be useful for expanding coverage and enabling more connections. 

Limitations on Existing Tools and Services

Maintaining and improving metadata requires close attention to detail as well as a bird's-eye view to identify gaps and patterns. We know that users struggle to analyze their metadata at scale because our existing tools and systems are not fully scoped to these needs.
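To make the idea of a bird's-eye view concrete, here is a minimal sketch of a field-level completeness report over a batch of records. It is an illustration only, not a DataCite tool: the records are invented, the property names are loosely modeled on DataCite schema properties, and the `completeness` function and `FIELDS` list are hypothetical names chosen for this example.

```python
from collections import Counter

# Hypothetical, simplified records. Property names loosely follow DataCite
# schema properties; the values are invented for illustration.
RECORDS = [
    {
        "titles": [{"title": "Dataset A"}],
        "creators": [{"name": "Doe, J."}],
        "descriptions": [],
        "subjects": [{"subject": "ecology"}],
    },
    {
        "titles": [{"title": "Dataset B"}],
        "creators": [],
        "descriptions": [{"description": "Survey data"}],
    },
    {
        "titles": [{"title": "Software C"}],
        "creators": [{"name": "Roe, R."}],
    },
]

# Fields to check (an assumption; pick whichever properties matter to you).
FIELDS = ["titles", "creators", "descriptions", "subjects"]


def completeness(records, fields):
    """Return, per field, the share of records with a non-empty value."""
    filled = Counter()
    for record in records:
        for field in fields:
            # Missing keys, None, and empty lists all count as unfilled.
            if record.get(field):
                filled[field] += 1
    total = len(records)
    return {f: (filled[f] / total if total else 0.0) for f in fields}


report = completeness(RECORDS, FIELDS)
for field, share in report.items():
    print(f"{field}: {share:.0%}")
```

A report like this only shows whether a field is present, not whether its content is accurate or consistent, so it would be a starting point for spotting gaps rather than a measure of quality.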

Stewardship Models

Metadata maintenance and improvements also require responsible stewardship. In cases where organizations experience personnel turnover, there may be gaps in continuity and responsibility. The burden of responsibility falls on individual member organizations, which can sometimes result in bottlenecks or a single point of failure. 

Siloing 

A challenge related to stewardship models is the fragmentation and siloing of information systems maintained by different actors throughout the research lifecycle, which can create gaps in interoperability, cause duplication of effort, and result in missed opportunities to make meaningful improvements and connections across research outputs. This is further complicated by closed ownership models, in which a single stakeholder is often solely responsible for managing metadata, leaving limited space for collaborative improvement beyond initial creation.

Identifying Strategies and Solutions

The above are just a few of the challenges we have identified in achieving metadata quality and precision at scale.

Just as the challenges are multifaceted, we can take a multi-pronged approach to addressing them, through a combination of tools, strategies, and support, and with active community participation and collaboration. These initiatives include:

  • Refining the metadata schema on an ongoing basis and providing guidance and examples on implementation
  • Developing and improving tools for members to help with metadata registration and maintenance
  • Exploring internal normalization strategies, such as matching affiliation strings to ROR IDs 
  • Investigating additional potential sources of metadata that could be used to enhance coverage and connections
  • Identifying areas for policy and process refinements, such as considerations with regard to metadata normalizations and enrichments
  • Participating in community conversations and collaborations around approaches to metadata enrichments, such as those recently initiated by COMET, focusing on key metadata fields and approaches to trust and provenance markers
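As one illustration of the normalization strategy mentioned above (matching affiliation strings to ROR IDs), here is a minimal sketch using fuzzy string comparison from Python's standard library. It is not a description of how DataCite performs or plans to perform this matching. The registry entries, identifiers, `normalize` and `match_affiliation` functions, and the 0.85 cutoff are all placeholders assumed for the example; a real workflow would match against the actual ROR registry and would need to handle departments, abbreviations, and translated names.

```python
import difflib
import re

# Hypothetical registry. Names and identifiers are placeholders,
# not real ROR records.
REGISTRY = {
    "example university": "https://ror.org/example-0001",
    "sample institute of technology": "https://ror.org/example-0002",
    "demo research center": "https://ror.org/example-0003",
}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def match_affiliation(affiliation, registry=REGISTRY, cutoff=0.85):
    """Return (identifier, score) for the best match at or above the cutoff.

    If no registry name is similar enough, return (None, best_score) so the
    string can be left unmatched rather than linked incorrectly.
    """
    candidate = normalize(affiliation)
    best_id, best_score = None, 0.0
    for name, identifier in registry.items():
        score = difflib.SequenceMatcher(None, candidate, name).ratio()
        if score > best_score:
            best_id, best_score = identifier, score
    if best_score >= cutoff:
        return best_id, best_score
    return None, best_score


for raw in ["Example University", "Sample Institute of Technology.", "Unrelated Company Ltd"]:
    identifier, score = match_affiliation(raw)
    print(f"{raw!r} -> {identifier} (score {score:.2f})")
```

Returning the score alongside the identifier keeps a record of how confident each match was, which connects to the questions of trust and provenance raised in the list above: a downstream user can tell an asserted identifier from an inferred one.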

The Road Ahead

As we pursue multiple pathways toward better metadata, we know we will not and cannot do this alone! We will be relying on input and involvement from community members of all types to help inform service design, infrastructure development, and policies and practices. Here are some ways you can get involved:

  • If you register metadata: let us know how we can make this easier for you
  • If you harvest/analyze metadata: tell us what you're trying to accomplish and let us know how we can help
  • If you want to contribute additional metadata or enrichments: get in touch
  • If you would like to learn more about collaborative community-based approaches to metadata enrichment: join the COMET community call on July 15

We encourage you to keep in touch and follow along, whether through feedback on GitHub, meetings and events, our newsletter and blog, or just sending us an email.

Additional details

Dates

Issued
2025-06-12T17:42:31
Updated
2025-06-12T19:20:46