DSpace perspectives on open infrastructure, open data and recognition for data
DOI: 10.60804/h6aq-cx32
Following our series of interviews with SCOSS-funded open infrastructure organizations, we spoke with leaders of DSpace, asking them about their perspectives on the importance of open infrastructure, and milestones for open data and data recognition.

Photos (left to right): Bridget Almas; Holger Lenz; Erik Moore; Kristi Park
Please tell us about your work at DSpace. How does your infrastructure support open science and how did it become involved with SCOSS?
DSpace is the most widely adopted open source repository software in the world, used by a reported 3,000+ academic, non-profit, and commercial organizations. It is used to manage research and scholarly materials across all disciplines, and cultural heritage materials of all types, with a focus on long-term storage, open access, and preservation.
DSpace's value to the Open Access and Open Science communities is demonstrated in the program's commitment to scholarly standards of open communication and collaborative relationships, including with COAR, euroCRIS, OpenAIRE, and ORCID. Since 2002, DSpace has been used by institutions supporting and promoting Open Access to curate their digital content. Increasingly, DSpace is a component of infrastructure that improves the impact of scientific and scholarly communication by making pre-prints, articles, theses and dissertations, as well as other materials such as research data, accessible and distributable. It has evolved over time to meet new Open Access and Open Science requirements. Its flexibility allows custom configuration to comply with an institution's Open Access policies.
After learning about the SCOSS funding model for Open Access infrastructures, DSpace first submitted an Expression of Interest to SCOSS, followed by a formal application in 2020. We were very excited to receive approval for inclusion in the third SCOSS funding round, which ran from 2023 to 2025. The partnership with SCOSS allowed the DSpace program to augment its staff capacity to better meet the needs of its large community of users, and to accelerate development at a crucial moment in the history of the software.
Why do you think open infrastructure is important to advance open science, and open data in particular?
Open infrastructure is the backbone of open science. It provides the platforms, standards, and tools that allow researchers to share, find, and reuse data and publications freely. For open data specifically, infrastructure like DSpace ensures that datasets are properly curated, preserved, and made FAIR (Findable, Accessible, Interoperable, Reusable). Without robust, openly available infrastructure, open data risks becoming fragmented, inaccessible, or lost, undermining the goals of transparency, reproducibility, and collaboration in research.
Open infrastructure also fosters collaboration and innovation. When platforms and tools are open, communities can contribute improvements, develop extensions, and share best practices, creating a cycle that benefits everyone. It reduces duplication of effort and allows resources to be focused on improving data quality, discoverability, and usability, rather than reinventing technical solutions.
Finally, open infrastructure underpins trust and accountability in open science. Transparent systems make it possible to track provenance, ensure proper attribution, and maintain data integrity, all essential for reproducibility and long-term research impact.
In short, open infrastructure is not just a technical foundation. It is a cultural and strategic enabler that allows open science and open data to thrive in a way that is equitable, collaborative, and sustainable.
What are the current biggest challenges for open infrastructure? How do you think they should be addressed?
Openness itself can be a challenge. The ability to fork and modify the code to suit institutional needs is a big part of the appeal of open infrastructure; at the same time, this adds to the difficulties of maintaining, updating, and garnering support for a single shared code base.
Additionally, we cannot talk about challenges to open infrastructure without discussing sustainability. Many open infrastructures depend on short-term grants, volunteer contributions, or membership fees. Long-term, predictable funding is essential to ensure lasting impact and continuity, which, in turn, is what user institutions rely on. Without continuity, open infrastructure becomes meaningless.
Duplicative development in the open infrastructure space presents another challenge. What we currently observe in this landscape is a high level of fragmentation due to the proliferation of grant-funded efforts to redevelop similar functionality over and over again. More strategic collaboration is necessary. Katherine Skinner, Director of Programs at Invest in Open Infrastructure (IOI), recently discussed challenges and opportunities for our ecosystem in the 2025 Publisherspeak Keynote address and shared hopes for a more targeted, deliberate collaboration (see "Open Infrastructure at a Crossroads").
Why do you think it's important for research institutions to invest in open infrastructure? What will it achieve in the long term?
Open infrastructure is the backbone of modern research. When institutions invest in open repositories, data platforms, and collaborative tools, they're not just funding software. They are supporting a sustainable, interoperable, and community-driven research ecosystem. The stronger this ecosystem, the greater the benefit for researchers, institutions, and society at large.
Investing in open infrastructure ensures long-term access to research outputs, strengthens institutional control over data, and fosters innovation by enabling reuse and collaboration. It's not just a technical choice: it is a commitment to open science, lasting impact, and an approach to research that works for everyone. It ensures that today's research outputs, especially data, remain accessible, trusted, and valuable for years to come.
Ultimately, long-term investment in open infrastructure promotes transparency, reproducibility, and equitable access to research globally, while reducing dependence on proprietary systems that may lock in content or raise costs.
What challenges and opportunities do you see in Generative AI for open infrastructure and open data?
Generative AI presents both opportunities and challenges. On the opportunity side, generative AI has the potential to strengthen many core aspects of open infrastructure. AI tools can significantly improve metadata creation, helping repositories and data platforms enrich records more quickly and consistently. Automated approaches to data curation may also reduce the manual workload for repository staff while improving data quality and organization.
Another major benefit lies in discoverability. By analyzing large volumes of open datasets and publications, AI systems can surface connections that might otherwise remain hidden. This could help researchers identify relevant resources more easily and generate new insights from existing data.
Generative AI may also help address a persistent challenge in open infrastructure projects: limited developer capacity. AI-assisted development can accelerate coding, debugging, and documentation, helping projects move forward despite constrained resources.
Alongside these opportunities, generative AI raises a number of complex concerns for open infrastructure communities. One major issue involves licensing and attribution. Many AI models are trained on openly available datasets, but ensuring that these models respect the licensing terms and attribution requirements of those datasets remains an unresolved challenge.
Other concerns include bias in AI-generated outputs and the need to protect sensitive or private data. Open infrastructure projects must ensure that any integration of AI technologies aligns with community values around transparency, ethics, and accountability.
The rise of AI-generated text and other materials also introduces new questions for scholarly communication. We are approaching a point where AI-generated content may be difficult, or sometimes impossible, to distinguish from human-authored work. This raises questions about how such content should be represented, described, or labeled, particularly within metadata standards. Some institutions have proposed labeling AI-generated content, but the scholarly community has not yet reached consensus on how to implement such practices consistently. Establishing shared approaches will be critical to maintaining trust in research outputs.
Another significant challenge generative AI brings to open infrastructure and open data is crawling. AI bots that crawl open sites and data can strain open infrastructure resources, overwhelm systems, and create access problems. Even when bots originate from trusted, legitimate organizations, large-scale crawling can cause failures in systems designed primarily for human access. Managing these pressures requires additional technical and staff resources, which many institutions already struggle to provide. In response, the communities that support and develop open infrastructure are well positioned to work in concert on issues that affect us all, sharing knowledge, experiences, and tools in ways that closed systems cannot.
Lastly, the infrastructure side presents both opportunities and challenges. While generative AI can make it easier for organizations to build new tools quickly and thus alleviate resource constraints, this could easily lead to increased fragmentation across the open infrastructure ecosystem. As new AI-driven solutions emerge, the risk is that communities may duplicate effort rather than collaborate on shared platforms.
What would you highlight as key milestones achieved in open data and in understanding data impact over the last few years?
One key milestone is the growing adoption of persistent identifier standards, including DOIs, ORCID iDs, ROR IDs, and, most recently, RAiDs. These have increased interoperability across open datasets, which in turn facilitates reproducibility and makes it possible to connect researchers, research activities, funding sources, and research outputs to their longer-term impacts.
We have also seen growing alignment around the FAIR data principles, which call for data to be Findable, Accessible, Interoperable, and Reusable. Many institutions, funders, and infrastructure providers now actively work toward implementing these principles in repositories, data management policies, and metadata standards.
Finally, improvements in repository infrastructure and metadata interoperability have made it easier to track data reuse and assess impact. As datasets become more integrated into scholarly communication systems, the research community is gaining better tools to understand how data contributes to new discoveries, collaborations, and broader societal outcomes.
In your view, what infrastructure, policy, or cultural advances are needed to support greater recognition for data outputs?
Despite growing awareness of the importance of research data, data outputs still do not receive the same level of recognition as traditional publications. Addressing this gap will require progress across infrastructure, policy, and research culture.
On the infrastructure side, continued adoption of persistent identifiers (PIDs) and interoperable metadata standards is essential. Identifiers such as ORCID iDs, DOIs, ROR IDs, and emerging identifiers for research activities help connect datasets to researchers, institutions, funding sources, and publications. Strengthening these connections makes it easier to track reuse, measure impact, and ensure that datasets are properly cited and credited.
Policy also plays a key role. Funders, institutions, and publishers increasingly require data management and sharing plans, but these policies need to go further by explicitly recognizing datasets as legitimate research outputs. Clear guidance on data citation practices, along with requirements for citing datasets in publications, can help normalize data as a citable and valued scholarly contribution.
Finally, cultural change within the research community is just as important. Researchers need incentives to invest the time required to properly curate, document, and share their data. This means that tenure and promotion processes, grant evaluations, and institutional reward systems should more consistently recognize high-quality data creation and stewardship as meaningful scholarly contributions.
Taken together, improvements in infrastructure, policy, and culture can help ensure that research data is not only shared more widely, but also receives the recognition it deserves as a foundational component of modern scholarship.
Additional details
Identifiers
- UUID: 556aba60-d671-4520-b6e6-e34c0c78683e
- GUID: https://makedatacount.org/?p=1702
- URL: https://makedatacount.org/read-our-blog/dspace/
Dates
- Issued: 2026-03-31T14:49:42
- Updated: 2026-03-31T14:49:42