Name:
PID Innovations and Developments in Scholarly Infrastructure Recording
Description:
PID Innovations and Developments in Scholarly Infrastructure Recording
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/f98429b7-3252-45d8-b0c3-be6ef874e3e1/videoscrubberimages/Scrubber_3.jpg
Duration:
T00H55M02S
Embed URL:
https://stream.cadmore.media/player/f98429b7-3252-45d8-b0c3-be6ef874e3e1
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/f98429b7-3252-45d8-b0c3-be6ef874e3e1/PID Innovations and Developments in Scholarly Infrastructure.mp4?sv=2019-02-02&sr=c&sig=nH7JCgpUxwtuY06jtjJVU9p6zMQTra%2FmuwR%2FNBwWzWM%3D&st=2024-10-16T01%3A49%3A13Z&se=2024-10-16T03%3A54%3A13Z&sp=r
Upload Date:
2024-03-06T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
Hello, everyone, and welcome to this session of NISO Plus 2023.
Now, what we're going to talk about in this session is connecting everything with innovations and communities. For those of you who are not aware, just a brief introduction: PID stands for persistent identifier. It's an identifier that uniquely identifies scholarly entities like a researcher, or maybe research projects. You've got grants, you've got journal publications, you've got data sets.
What a persistent identifier does is uniquely identify them across borders, as well as for the entire duration of their existence. In this session, you're going to hear people from DataCite, RAiD, and ROR, as well as Metadata Game Changers, talk about the inner workings of how PID infrastructures are built and scaled, and how the community is supported.
AMANDA FRENCH: We'll hear how we work with the community and then move ahead, because persistent identifiers are only successful if the community is involved, if the community supports them. We're going to start off with Matt, who will talk a bit about DataCite and how it's supporting and scaling persistent identifier communities. Ted will talk about instrument identifiers and all the new things that are happening in and around the instrument identifier community.
AMANDA FRENCH: Sean's going to tell us a bit more about RAiD and what's happening there. As one of the more recent persistent identifiers, there's a lot more to RAiD than just the name. And Amanda is going to talk to us about innovations within the PID community, especially relating to organizational identifiers. So without further ado, I shall pass the baton over to Matt. Matt, go ahead.
AMANDA FRENCH: Great. Thanks very much, Melroy, for the introduction. And it's great to be presenting alongside so many of our collaborators and folks that we work with closely in the community. So yeah, I guess as Melroy mentioned, what I wanted to talk a bit about is our approach at DataCite, and really this is a community approach. Melroy touched on that.
AMANDA FRENCH: It's really important that we have this collective action and work together in building out scholarly infrastructure across the ecosystem. So just to start, a bit of context. DataCite is a global community. And that's really how we like to position ourselves and talk to other stakeholders within the community: we are here working together with a common interest, and that interest is ensuring that research outputs and resources are openly available, that they are connected, and that they can be reused to advance knowledge across and between disciplines, now and in the future.
AMANDA FRENCH: And as a community, we seek to make this research more effective by connecting these things together and enabling the creation and management of these persistent identifier records, enhancing research workflows by integrating into different systems and interfaces, and facilitating the discovery and reuse of research outputs and resources.
AMANDA FRENCH: As a community, we are spread around the globe, across 50 countries, and we continue to grow in different areas, in particular across different research outputs and resources and in defining best practice, but also in our collective effort in embedding that into the workflows of researchers and key stakeholders across the ecosystem. It's all about building sustainable infrastructure.
AMANDA FRENCH: And infrastructure can be, I guess, frustrating in some ways. If we have issues with infrastructure, if we have issues with a road, if it's got potholes or the traffic lights aren't working, we're going to be talking about this a lot and complaining. But if it's all going fairly smoothly, we generally are fairly happy. We understand how we use infrastructure, and we can see clear benefits to the infrastructure, and that's very much what we're trying to do here.
AMANDA FRENCH: One important thing to note, though, is that as we scale up infrastructure services, it's really important that we have some key pieces in place. The POSI principles, the Principles of Open Scholarly Infrastructure, really outline some of these key pieces that need to be in place as we scale up infrastructure. And so I'll touch on a few. One, we want to make sure that there's clear, transparent governance, that this is community-governed, stakeholder-owned infrastructure.
AMANDA FRENCH: We want to make sure that there's a key piece around services, so that we are focusing on building out services that add value to the community and not creating, I guess, a lock-in around the data. We want to make sure that the data is openly available and can be reused. We then want to make sure that certain key pieces of insurance are in place, such as making sure that the code base is open source and that the data we do have is CC0, and that there is that insurance for the community that is investing heavily, both financially and from a resource perspective, in integrating and scaling up the infrastructure.
AMANDA FRENCH: And then finally, long-term sustainability: it's really important that there is a sustainable model in place that protects this investment, this common investment around this infrastructure. Working together, we're really trying to effect culture change in the community. And where I position a lot of the work that we're doing across the different persistent identifiers and infrastructures is making it possible, but also making it easy, making it seamless.
AMANDA FRENCH: And this is where we really need to focus on making sure that we're working within the existing workflows, not trying to create a whole new parallel workflow for researchers to do something. How do we enhance existing workflows and add value? And then how do we work with key stakeholders from a policy perspective who can incentivize it and make it rewarding, and then in turn potentially put in place some policy?
AMANDA FRENCH: And then we end up in a situation where it starts to become normative in the community, and we really start to effect change. So what we've done over the years is work with various persistent identifier communities on scaling up their persistent identifier services. We've done this across a number of different resource types. We have over 28 different resource types in the DataCite schema, and research funding organizations, universities, foundations, government, et cetera have started to adopt this and really look at connecting the disparate pieces of the research lifecycle.
AMANDA FRENCH: We've seen this increased interest in different services and in working with DataCite. We've had an extremely successful partnership to date with IGSN in supporting their persistent identifier registration services, as well as some other components. We've done some work with folks like DMPTool in establishing DMP IDs. We've worked with arXiv on registering DOIs for all of their records.
AMANDA FRENCH: And really there's a lot in common across all of these different groups, and these are just a few that I've mentioned in the different areas that we work in. And so we look to take a comprehensive approach. It's really important that we don't just focus on the technology, that we do think about all of these aspects. Yes, the key piece of this is identifier and metadata registration, the service, and the resolution.
AMANDA FRENCH: But then it's about how we make sure that these research resources and outputs are discoverable. How do we make sure that we have the right technical staffing and key architecture in place to support that service? How do we think about schema changes and evolution, and extending that metadata beyond what is stored in a persistent identifier metadata record? As an example, how do we address those edge cases, or not necessarily even edge cases in some cases, but very domain-specific considerations?
AMANDA FRENCH: How do we govern as a community and find direction collectively? How do we ensure we have that fiscal responsibility, that sustainability piece, in place, and how do we ensure that we can work together around engagement and advocacy for the work that we're doing? Our collective efforts are really there to bring value in different tiers, starting with the core of registering persistent identifiers and their metadata and improving that discoverability.
AMANDA FRENCH: And there's a number of services here. So it's registering that metadata, but then there are things like content negotiation. This is where the DOI is a really valuable piece of technology, in that if you have a DOI name, you can use content negotiation, regardless of the registration agency, to resolve it and retrieve metadata. It's then about adopting and implementing best practice, and then building on top of this by tracking the influence of the research with different tools and services.
AMANDA FRENCH: That means bringing in dashboards, analytics, harvesting services, graph APIs, et cetera. We've done a lot of that at DataCite. In scaling the different communities and working across the different outputs and resource types, we've built out the PID Graph, which connects different types of PIDs together. We've worked very closely, obviously, with Crossref, ORCID, and ROR, which are key partners of ours, in bringing this together into DataCite Commons, which sits on top of the PID Graph.
AMANDA FRENCH: This is our own interface for making this discoverable and addressing different use cases. So saying: I want to look at a particular project, or a particular grant, or a particular article; what are the data sets, et cetera, that it cited? Looking at a data set, looking at the views. In this case, we're looking at NIH and the aggregated citations across their works, the different licenses, and the different work types that are available.
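DOI content negotiation, mentioned earlier in this talk, can be sketched in a few lines of Python. This is a minimal illustration only, not DataCite's own client code: the DOI `10.1234/example-doi` is a placeholder, the function name `build_metadata_request` is made up for this example, and the request is built but deliberately not sent. It assumes the resolver at `https://doi.org/` honors an `Accept` header such as the CSL JSON media type, which several registration agencies support.

```python
from urllib.parse import quote
from urllib.request import Request

DOI_RESOLVER = "https://doi.org/"  # central DOI resolver
CSL_JSON = "application/vnd.citationstyles.csl+json"  # one commonly supported media type


def build_metadata_request(doi: str, media_type: str = CSL_JSON) -> Request:
    """Build (but do not send) a request asking the resolver for metadata.

    The same DOI URL is used whichever registration agency minted the DOI;
    only the Accept header says which metadata format is wanted.
    """
    url = DOI_RESOLVER + quote(doi, safe="/")
    return Request(url, headers={"Accept": media_type})


# Placeholder DOI for illustration; it is not a registered identifier.
req = build_metadata_request("10.1234/example-doi")
print(req.full_url)
print(req.get_header("Accept"))
# To actually fetch metadata for a real DOI, pass the request to
# urllib.request.urlopen(req) and read the response body.
```

Keeping the request construction separate from the network call makes the idea easy to inspect: the identifier stays the same and only the requested representation changes.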
AMANDA FRENCH: It's about community. It's really important that we recognize that our strength is our collective effort, and our strength is really in our active membership. So it's important that we have key values and that we work together. We really are working together in enabling the discovery cycle, and this is around registering persistent identifiers and metadata, connecting them, making sure that they're discoverable, and tracking this.
AMANDA FRENCH: Supporting this, we do a lot of work around strategic initiatives, and I like to say that it takes a village. So we're doing a lot of work around things like data metrics with Make Data Count, and leading efforts there, and around identifier registries. You'll hear from Amanda, but we run the technical infrastructure for ROR as part of our partnership within ROR with the University of California and Crossref. We also work with re3data on the work that they're doing on their registry.
AMANDA FRENCH: And yeah, it's really about working together. So finally, I would just like to say we're stronger together. Join the conversation. We would love to work collectively with you all and with any new identifier communities out there that are looking to scale up. We really welcome a conversation about working together. And with that, I'll thank you for your time.
AMANDA FRENCH: Thank you for that, Matt. And now we'll move over to Ted. Please talk to us a little bit more about instrument identifiers. Hi, I'm Ted Habermann from Metadata Game Changers. I wanted to talk to you today about the question of how hot is the ocean. This is a plot that shows the average sea surface temperature as a function of time, or rather the sea surface temperature anomaly, since 1880.
AMANDA FRENCH: And it's clear from this data that this anomaly, the sea surface temperature or the amount of heat in the ocean, has increased significantly over time. And, of course, it's important when we look at a plot like this to remember that behind every set of observations, there's an instrument. In this case, the instruments are expendable bathythermographs (XBTs).
AMANDA FRENCH: XBT data, it turns out, compose about 18% of the available sea surface temperature data. And it also turns out that they introduce a positive bias into the global temperature anomaly measurements, resulting in a larger apparent warming after XBTs were introduced in the 1960s and 1970s. So it's important to remember that behind every set of observations there is a set of instruments.
AMANDA FRENCH: And because of this problem with a bias related to these XBTs, it's important for us to understand those instruments and know the instruments that were used to make these measurements. Enter persistent identifiers for instruments, or instrument PIDs. This slide shows a short history of the recent work that's been done on instrument PIDs, or PIDINST.
AMANDA FRENCH: It started with a Research Data Alliance, or RDA, PIDINST working group around 2017, and it has now culminated in the inclusion of instruments in a version of the DataCite metadata schema. The RDA PIDINST working group did a lot of work to develop this idea and push it along this curve. And I wanted to point out three elements of what they did here: use cases, example implementations, and then the DataCite schema.
AMANDA FRENCH: And these three elements, I think, are important to keep in mind because they're really external elements. They involved interacting with groups outside of the PIDINST working group, or working with many stakeholders or potential stakeholders to get use cases. It's working on example implementations, and then working with the DataCite Metadata Working Group to integrate these ideas into the schema.
AMANDA FRENCH: And the fact that these use cases came from many different domains and many different stakeholder groups is really important in the development of PIDs, or new PIDs. Understanding who your stakeholders are and working with them to develop the metadata schema, or to figure out how these PIDs are going to work, is a super important thing. And the PIDINST Working Group did this very well.
AMANDA FRENCH: Another thing that they did that was really good is they created a PID-provider-agnostic identifier metadata schema. They thought about what they needed to describe instruments. They didn't think about what existing metadata schemas were there that they should use. They thought about the problem they were trying to solve. And when they published documentation openly on Read the Docs, they included a section on how to get PIDs:
AMANDA FRENCH: submit a data record to a PID provider that is compliant with the RDA PIDINST recommendations. So I think that's also a very important thing that they did here. They didn't try to become a PID provider. In fact, they also warned people that they had to become members of PID providers. They focused on what metadata they needed for these instrument identifiers rather than worrying about being PID providers themselves.
AMANDA FRENCH: And they worked with several PID providers, ePIC and DataCite, to do these test implementations. Again, I think this was an important part of the way they did their work. One of the important things about all identifiers is making connections. And so connections between data and papers, and connections of data and papers to instruments, are really important.
AMANDA FRENCH: So, a couple of examples of that. One is from PANGAEA: a data set collected during a dive of an autonomous underwater laboratory. The data set landing page has a link to the instrument landing page for the polar autonomous underwater laboratory. So this is an earth science example. Another example, from Helmholtz-Zentrum Berlin, which is a particle experiment facility, is a completely different kind of use case from a completely different domain.
AMANDA FRENCH: It also has a link to the instrument used to collect the data in this paper. So these are examples of what we're trying to get to. It's also important for identifiers to point to more information. In this situation, in DataCite, we have what we think of as identifier metadata: metadata for that PID.
AMANDA FRENCH: And of course, the instrument in this case, and all DataCite resources, have landing pages, which are written in HTML and provide information about the resource being identified. For instruments there is also something called instrument papers, which, like data papers, are papers that focus on describing some resource: data in the data paper case, or instruments in this case.
AMANDA FRENCH: This is from the Journal of Large Scale Research Facilities, and it's a paper that describes a particular instrument, and it's connected to the DataCite metadata using the IsDocumentedBy relation type. Of course, structured metadata is also important, and you can connect to such metadata from a DataCite record for an instrument. That metadata might be written in JSON-LD, or the instrument metadata could be in SensorML, which is from the Open Geospatial Consortium, or the instrument metadata could be embedded in a netCDF data file.
AMANDA FRENCH: In all cases, this can be pointed to using the HasMetadata relation in the DataCite metadata record, so that people who find the instrument using DataCite can also find more detailed and structured metadata, potentially in multiple formats, for the instrument that they're using. So, back to the XBTs.
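The two relation types just described, IsDocumentedBy and HasMetadata, can be pictured with a small, simplified DataCite-style record. This is a sketch under stated assumptions rather than a validated DataCite record: the DOIs, the URL, the title, and the helper `related` are all placeholders invented for illustration, and only a handful of schema properties are shown.

```python
# Hypothetical, simplified DataCite-style record for an instrument.
# All identifiers and URLs below are placeholders, not real registrations.
instrument_record = {
    "doi": "10.1234/example-instrument",
    "titles": [{"title": "Example underwater temperature probe"}],
    "types": {"resourceTypeGeneral": "Instrument"},
    "relatedIdentifiers": [
        {
            # Link to an instrument paper describing the instrument.
            "relatedIdentifier": "10.1234/example-instrument-paper",
            "relatedIdentifierType": "DOI",
            "relationType": "IsDocumentedBy",
        },
        {
            # Link to detailed structured metadata held elsewhere.
            "relatedIdentifier": "https://example.org/instruments/probe-1/sensorml.xml",
            "relatedIdentifierType": "URL",
            "relationType": "HasMetadata",
            "relatedMetadataScheme": "SensorML",
        },
    ],
}


def related(record: dict, relation_type: str) -> list:
    """Return the related identifiers that use the given relation type."""
    return [
        item["relatedIdentifier"]
        for item in record.get("relatedIdentifiers", [])
        if item["relationType"] == relation_type
    ]


print(related(instrument_record, "IsDocumentedBy"))
print(related(instrument_record, "HasMetadata"))
```

A record could carry several HasMetadata entries, one per format, which is how the same instrument can point to JSON-LD, SensorML, or netCDF-embedded descriptions.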
AMANDA FRENCH: This is a picture of a couple of XBTs. You can see that they have had very different shapes through history. They're shot into the ocean through a gun or through this little tube. And the different shapes are one of the things that cause the problem, because once these things get into the water, they sink at various rates.
AMANDA FRENCH: And those fall rates are what we need to know to use the data. So the fall rates are instrument dependent. This shows where XBTs have been deployed since the 1960s, all over the world. More importantly for this talk, the histogram on the bottom shows many different types of XBTs being deployed all over the world, and the distribution of those types changing as a function of time.
AMANDA FRENCH: It's a little bit easier to see in this plot, which shows the number of XBTs of certain types that have been deployed from 1965 to 2010. Now, we can see quite a bit of variation in the number of these things that have been deployed. And the types that have been deployed. So in order to understand the observations and understand how hot is the ocean, we need to understand the instruments that were used to make the measurements.
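Why the probe type matters can be illustrated with a small calculation. An XBT does not measure depth directly; depth is typically inferred from the time since the probe hit the water with a fall-rate equation of the form depth = a*t - b*t^2, where the coefficients depend on the probe. The sketch below is an illustration and was not shown in the talk. The two coefficient pairs are values commonly quoted in the oceanographic literature for certain probe types (an original manufacturer equation and a later revised one); treat them here as assumptions for illustration, and the dictionary keys as made-up labels.

```python
def xbt_depth(elapsed_s: float, a: float, b: float) -> float:
    """Estimate probe depth in metres from seconds since water entry.

    Uses the quadratic fall-rate form depth = a*t - b*t**2.
    """
    return a * elapsed_s - b * elapsed_s ** 2


# Coefficient pairs (a, b). Assumed, illustrative values; the right pair
# depends on the probe type, which is exactly the missing metadata problem.
FALL_RATE_COEFFS = {
    "original_manufacturer": (6.472, 0.00216),
    "revised": (6.691, 0.00225),
}

elapsed = 100.0  # seconds after the probe enters the water
for label, (a, b) in FALL_RATE_COEFFS.items():
    print(label, round(xbt_depth(elapsed, a, b), 1), "m")
```

With the same elapsed time, the two coefficient sets place the same temperature reading at depths that differ by roughly twenty metres, so applying the wrong equation because the probe type is unknown shifts the whole temperature profile.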
AMANDA FRENCH: So here's our plot again, the temperature anomaly on the rise. And of course, it's important for us to remember that behind any set of observations is some set of instruments. It could be one instrument, it could change as a function of time, or it could be networks or combinations of many instruments, like in this case. And it's important to remember that these sets of instruments are behind every observation or set of observations.
AMANDA FRENCH: So we still have problems in sea surface temperature or ocean heat studies, because prior to 1990, 68% of the XBTs in the World Ocean Database have no probe type information. And even from 1990 to the present, 35% of the XBTs have no probe type information. Without that information, you can't really understand the results well. There's a lot of work going on to recover that metadata from the data, and that work needs to be done because there are no instrument identifiers.
AMANDA FRENCH: So it's important to remember that science, good science, depends on instrument identifiers. So thank you very much. Are there any questions? Thank you very much for that presentation, Ted. And now we'll move over to Sean, who will tell us a bit more about RAiD and what's happening in that space. Sean, over to you.
AMANDA FRENCH: Hi, everybody. Sorry about that. Just a second. I'm not quite set. I apologize. I'm just having a small problem with my slides thing there. I just needed a bit of a second there. I apologize.
AMANDA FRENCH: It's set now. I'm just getting my. Screen sharing. Ready? OK. I will. Do you want me to? Sorry I missed how you did this before. Should I go straight into the.
AMANDA FRENCH: Should I just have the slides up on my screen when I start? Or do you want me to, like, introduce myself or say something first, and then go to the slides? You can say something for us about yourself and then go straight in. Fine, it's how you want to do it. There's no hard and fast rule for this. OK, cool. No, that's.
AMANDA FRENCH: I am actually ready now. Sorry. I just had a second of a glitch there with getting my slides ready to share, but I got it. Cool, not a problem. Well, over to you, then, Sean. Hi there.
AMANDA FRENCH: I'm Sean Ross, and I am the product manager for the Research Activity Identifier that's being rolled out by the Australian Research Data Commons. And I'd like to tell you a little bit about RAiD itself, why we decided to stand up a new PID, and how it fits into the broader ecosystem of persistent identifiers. So just let me share my screen here.
AMANDA FRENCH: So, RAiD. What is a RAiD? A RAiD is a persistent identifier for research projects and activities. And we're defining that pretty broadly. We realize that in different disciplines, in different parts of the world, there may be somewhat different ideas of what this is.
AMANDA FRENCH: But we think that operationally there is a thing that we can call a project, and that's the place where a lot of research takes place. So a RAiD is for these sorts of projects and activities, and they can be hierarchical or related to one another. It links organizations, people, contributors, inputs, and outputs to a project, and provides key project information that you can't find anywhere else.
AMANDA FRENCH: Also, I'm happy to say that we just heard last month that our ISO standard governing RAiD has now been published, and I can provide that link to anyone who needs it. I like to start some of these presentations with a bit of clarification that we're not trying to duplicate existing identifiers. Sometimes it's easier to define what a project isn't rather than what a project is.
AMANDA FRENCH: It's not for grants, it's not for researchers. It's not for durable organizations or organizational units like teams or centers or groups or departments. It's not for documents, papers, articles, books, recordings, other digital objects, software, data sets, instruments, samples, specimens. Not those things. We're covering what we think is sort of the nexus between a lot of these other activities and a lot of these other entities.
AMANDA FRENCH: Excuse me. So a RAiD, how does it work then? There's really two parts: the identifier itself, which uses the Handle System to create globally unique persistent identifiers, and then the RAiD metadata record. We're just finalizing the metadata schema now, and we'll have that published in March, at least a draft or early beta version of it.
AMANDA FRENCH: And what the metadata record includes is primarily links to other PIDs: to collaborators, organizations, grants, awards, infrastructures, all those other things that a RAiD is not. We link to them using the PID. We're trying to take a pretty aggressive approach about not duplicating information that's in those PIDs. We don't, say, read in a name from an ORCID; we just link to the ORCID.
AMANDA FRENCH: We may do some caching to improve performance, but we're not actually copying any of that information into our metadata record. And in addition to these linked PIDs, the RAiD metadata record contains project information that isn't duplicated anywhere else, something like a project's title, description, subject, et cetera. So this is an example of what a RAiD might look like, in the upper block of text up here.
AMANDA FRENCH: I guess I should start on the left-hand side. You'll see that there's a handle that's ultimately going to be resolved at a RAiD domain, which we've acquired, and that's how you resolve the RAiD: by that handle. Then the upper block here is just an example of a few of the metadata elements that we capture new for the RAiD, which, again, we're doing at the minimum level that we can. And then there are some examples of the kinds of other PIDs that we link out to and how we're going to do that.
AMANDA FRENCH: And to the extent possible, we're doing that for the metadata property itself, like investigator, or sorry, that will be contributor. There are different roles for contributors, both in an administrative sense, like principal investigator (we're calling that "position," what the administrative position is), and in "role," as per the CRediT schema. So even for the vocabularies within a metadata property, we're reusing identifiers, URIs, wherever we can to define those. And there are vocabularies that we have to create for RAiD, like different kinds of titles: we're going to have primary, short, acronym, et cetera.
AMANDA FRENCH: We will publish those vocabularies to make it as easy as possible to interoperate with RAiD. So that's an example of what a RAiD might look like. What we're trying to do here, essentially, with a RAiD is take the disparate elements that go into research (researchers, organizations, et cetera) and tie them together in relationships that can be described or qualified, so that the RAiD links out to those various organizations, grants, data sets, publications, et cetera.
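The link-rather-than-copy idea described above can be sketched as a small data structure. This is purely illustrative: the RAiD metadata schema was still being finalized at the time of the talk, so every class name, field name, relation label, and identifier below is a hypothetical stand-in (apart from the ORCID iD, which is ORCID's well-known fictional sample record).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LinkedPid:
    """A pointer to another PID plus a qualifier describing the relationship."""
    pid: str       # the other identifier, stored as a URL
    relation: str  # hypothetical qualifier, e.g. "contributor" or "organisation"


@dataclass
class RaidRecord:
    """Hypothetical RAiD-like record: project-only fields plus links to PIDs."""
    handle: str       # the project's own identifier (placeholder value below)
    title: str        # project information not held in any other PID
    description: str
    links: List[LinkedPid] = field(default_factory=list)

    def linked(self, relation: str) -> List[str]:
        """Return the PIDs linked with the given relationship."""
        return [link.pid for link in self.links if link.relation == relation]


record = RaidRecord(
    handle="https://example.org/10.12345/EXAMPLE",  # placeholder handle
    title="Example survey project",
    description="Illustrative project record.",
    links=[
        # No name is copied from ORCID; only the identifier is stored.
        LinkedPid("https://orcid.org/0000-0002-1825-0097", "contributor"),
        LinkedPid("https://ror.org/example", "organisation"),  # placeholder
        LinkedPid("https://doi.org/10.1234/example-dataset", "output"),
    ],
)

print(record.linked("contributor"))
```

Because the record holds only identifiers and a relationship qualifier, a contributor's name or an organization's details are always read from the authoritative PID rather than from a stale copy.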
AMANDA FRENCH: So when will it be available, and where are we now? RAiD was started a few years ago, but in a very, I guess, small-scale way, in use by only a few Australasian institutions and without a well-developed metadata record. And the ARDC in the last year has picked it up again to revive it.
AMANDA FRENCH: And so in 2022 we were doing ISO certification, we were doing business analysis, and we began redeveloping the service, the RAiD service, in a modern stack. Under the ISO certification, very quickly, the ARDC is the global registration authority, so we'll set policy, and we're also building a model service that we hope, with some work, will be redeployable elsewhere.
AMANDA FRENCH: And under the ARDC as registration authority there will be registration agencies, which are the organizations that actually make the RAiDs. The ARDC also serves as a registration agency for Australasia. And again, we're building a model RAiD service that we're hoping will be redeployed by others. And that's what you see here.
AMANDA FRENCH: We'll be prototyping and developing over the course of this year, and it's really in 2024 that we're hoping to get the software to a point where it can be redeployed relatively easily elsewhere. So why do it? Why do a project ID like this? It's a lot of work to stand up a new PID, and I wouldn't recommend it unless you have absolutely identified a clear gap in the ecosystem and have feedback from the other major PIDs that that is the case, that it is a clear gap. We work on our advisory group with Matt, Amanda, and others who are involved in existing PIDs, and when I joined the project in July, there was general consensus that this was a useful addition to the ecosystem.
AMANDA FRENCH: So what was the argument for that? It essentially broke down into these categories. Projects, again broadly defined, and acknowledging some disciplinary and regional variation here, are where research happens. In domains where there's collaborative practice, a project is what brings all those people together. But I'm originally a historian, and a lot of historians will work independently.
AMANDA FRENCH: But even there, if you start a casual conversation with someone else, it will often begin with "What project are you working on?" So at least colloquially there's an idea that research happens in the context of a project. And projects are also this sort of time-limited but identifiable, definable container for the various inputs and outputs that go into and are produced by research.
AMANDA FRENCH: So one of the first questions that came up about this is: well, isn't that the same thing as a grant? Can't we just use a Crossref grant ID for this? It may be the case in some disciplines that there's a link between projects and grants, but in a lot of disciplines there isn't. Again, I started my career as a historian. A lot of historians work their whole career without ever receiving a major grant, and a lot of research never gets a grant.
AMANDA FRENCH: I've since moved into archaeology, and a large archaeology project is going to have many grants, so it's not a one-to-one relationship. And then the other thing, and I checked this with a grant of my own, is that grants close pretty quickly after the end of their term. On the other hand, the project that was funded by that grant often goes on for many years and can have longer-term outputs, outcomes, and impacts that wouldn't be captured within the metadata associated with the grant.
AMANDA FRENCH: So we can deal with projects that are a bit more long term, and we can get that long-term view. And then just quickly, another thing is that some of the use cases that we were presented with while we were doing business analysis emphasized the fact that projects evolve. If we're looking at a data set that a project produced in 2016, we need to be able to go back and see what the state of that project was, who was involved, et cetera, in 2016, but also see how that's evolved.
AMANDA FRENCH: So we have a history of the project. We know what the project looked like when a data set was produced in 2020, et cetera. With existing PIDs that might take a snapshot of a project, it's more difficult to capture that evolution, which RAiD is designed from the ground up to do. And also, finally, projects are where research is often administered.
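The requirement described above, being able to ask what a project looked like at the time a given data set was produced, can be sketched as a lookup over a dated version history. This is one simple way to model it, not how the RAiD service itself is implemented; the structure of `history`, the contributor labels, and the function `state_as_of` are all assumptions made for illustration.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical version history of one project, sorted by the date each
# version took effect. Contributor labels are placeholders.
history = [
    (date(2016, 1, 1), {"contributors": ["person-a"]}),
    (date(2018, 6, 1), {"contributors": ["person-a", "person-b"]}),
    (date(2020, 3, 1), {"contributors": ["person-b", "person-c"]}),
]


def state_as_of(versions, when):
    """Return the project state in effect on the date `when`.

    Picks the latest version whose start date is on or before `when`,
    or None if `when` is earlier than the first recorded version.
    """
    start_dates = [start for start, _ in versions]
    index = bisect_right(start_dates, when)
    if index == 0:
        return None
    return versions[index - 1][1]


# Who was involved when a data set was produced in mid-2016, and in 2020?
print(state_as_of(history, date(2016, 7, 1)))
print(state_as_of(history, date(2020, 12, 31)))
```

Keeping every version rather than overwriting a single snapshot is what allows both questions to be answered: the state at a past date, and how the project has changed since.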
AMANDA FRENCH: It's a common concept in research information systems, it appears in other PIDs, and it appears in domain-specific metadata standards. So for those reasons, we identified that there was actually a gap that needed to be filled. And there are benefits from this, mostly that a RAiD provides a single source of truth for projects, supports reporting and impact measurement, and captures research provenance, giving us an idea of the context.
AMANDA FRENCH: And again, that's why it's so common in domain-specific metadata: they want to know about the project that produced the data sets. And especially now that we have an ISO standard, it standardizes the identification of projects. On the potential impact of this: the estimate from a cost-benefit analysis done by MoreBrains in the UK indicates that there are about 50,000 projects in the UK, maybe 21,000 in Australia.
AMANDA FRENCH: You can get an idea if you scale that globally, what it might do. And in this analysis they said that they calculated that in Australia the elimination of double entry of project metadata would save about almost 3,000 days a year, almost 3 million aud a year, and combined with pages for publication and Grant. So those numbers go way up to nearly 38,000 person days and $24 million a year.
AMANDA FRENCH: And that's just the double entry elimination, let alone other benefits of it. So I guess I'd just close by saying that a lot of my interests are around open research and open scholarship. And I think that opening up the information about a project, about the contributors, organizations, funding, all of that, and making it widely available is going to be very significant from a FAIR data and open research perspective, beyond whatever the monetary benefits might be.
AMANDA FRENCH: And that's. That's right. And now we'll hear from Amanda on innovations happening in the organizational identifier space.
AMANDA FRENCH: Thanks, all. I really am just giving you an idea of developments specifically with ROR, more than in the entire space that is covered by so many of the persistent identifiers we've heard from. So I thought I would take this opportunity to tell you a little bit about the future of ROR. I mean, I think it's interesting that we have some brand new emerging identifiers for instruments. We have some somewhat older identifiers, such as RAiD for projects. And then we have what you might call the elder statesman of persistent identifiers with DOIs, sort of aptly represented by Matt at the beginning of this program.
AMANDA FRENCH: So ROR, I think is somewhere in the middle of this. It is ... I like to think of it as the young adult of persistent identifiers. It's maybe not the oldest persistent identifier, but it is no longer one of the youngest. So I wanted to talk about this from the perspective of what I'm going to say later is sort of a college graduate type of phase in our persistent identifier life.
AMANDA FRENCH: So first, I'm going to give you just a quick overview of what ROR is. ROR is the Research Organization Registry. It is a global community led registry of open, persistent identifiers for research organizations. You can see here what a ROR record looks like on the web. We have a unique identifier and we have a great deal of fundamental basic information about research organizations.
AMANDA FRENCH: And thinking about connections between and among PIDs, that really is the key. PIDs are incredibly useful once they all work together. Matt mentioned DataCite Commons earlier, and if you haven't seen DataCite Commons, I strongly recommend that you go look at it. It's just beautifully architected to make use of PIDs and to show what they can do. It's really structured around people, works and organizations, and in fact the organizations part of DataCite Commons is run by ROR.
AMANDA FRENCH: This is a sort of an older visual representing a three-legged stool, which I think is very nice. It doesn't mean that you can't have additional PIDs such as those for instruments and those for projects. But really, I mean, I think it is pretty clear that the institutional identifier is something that's really key when you're talking about research outputs. Yes, there's the research itself.
AMANDA FRENCH: There is the person who did it, and there is the organization that supported that research, either by employing the researcher or sometimes by funding it. Really key. So ROR really emerged to fill that particular gap. And I must say, when I first learned about ROR, I was very excited. I'm still excited, the more that I work with ROR, because I had run an institutional repository at a library and I knew how messy this organizational data can get.
AMANDA FRENCH: So just here we have a little example of messy organizational data being entered. And because this particular field in an imaginary form is powered by the ROR API, it doesn't matter how you enter the organization name, it will return the same organization on the back end because it's looking at the ROR ID and pulling that information about the institution from the ROR API.
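[Editor's sketch] A form field like the one described might resolve free text to a ROR ID roughly as follows. This is a minimal sketch, assuming the public endpoint `https://api.ror.org/organizations` with its `affiliation` query parameter and a response whose `items` carry `chosen` and `organization` fields; the helper names and the sample response (including the placeholder ID) are invented for illustration, and no network call is made.

```python
from typing import Optional
from urllib.parse import urlencode

ROR_API = "https://api.ror.org/organizations"  # assumed public REST endpoint


def affiliation_query_url(free_text: str) -> str:
    """Build a URL asking the ROR API to match a free-text affiliation."""
    return f"{ROR_API}?{urlencode({'affiliation': free_text})}"


def chosen_ror_id(response: dict) -> Optional[str]:
    """Return the ROR ID of the match the API flagged as chosen, if any."""
    for item in response.get("items", []):
        if item.get("chosen"):
            return item["organization"]["id"]
    return None


# Invented sample response; the ID is a placeholder, not a real ROR ID.
SAMPLE_RESPONSE = {
    "items": [
        {
            "chosen": True,
            "score": 1.0,
            "organization": {
                "id": "https://ror.org/0example00",
                "name": "Example University",
            },
        }
    ]
}

url = affiliation_query_url("Univ. of Example")
ror_id = chosen_ror_id(SAMPLE_RESPONSE)
```

Storing the returned ID rather than the typed string is what makes differently spelled entries land on the same organization.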
AMANDA FRENCH: This makes this kind of data cleaner from within a system, but also and crucially makes it easier to exchange all of that information between systems. ROR, like DataCite, is run by certain principles that I think many of us share, such as openness. There are other organizational identifiers, but ROR is, I think, the most open one that can be really technically integrated with a lot of existing scholarly communication systems.
AMANDA FRENCH: All of the data is CC Zero. There is an open REST API that is free to use. All of the data is openly free. All of our code is free and open source and public. Moreover, ROR is really, really dependent on its community. We really don't make a major move without consulting people, among them Ted, among them Matt. You know, lots and lots of major players in the scholarly communications space have joined our community advisory board, which, as I'll show on a later slide, is now at about 150 people. And really, perhaps most importantly,
AMANDA FRENCH: all of our records, which are in the open so that you can see all of the information about them, go through a community curation process. So when people ask for simple changes to a ROR record, Hey, our website is wrong, can you correct it? That kind of thing. We just have our curation lead make that change, essentially, and queue it up for the next release of a corrected registry.
AMANDA FRENCH: But sometimes there really are these kind of thornier problems. Is this a research organization that is in scope for ROR? How should this organization be organized, as it were? What organizations is it related to? Does it have children? Does it have parents? All of that kind of thing. So we really rely on a very expert group of community curators to help us resolve those thornier problems.
AMANDA FRENCH: And then there is one of the things that I just think is almost a unique governance model, sustainability model, financial model for ROR, and that is that ROR itself is not an organization. Ironically, it is an initiative supported by the California Digital Library, DataCite, and Crossref, who have written ROR into their operational budgets because they believe it's such an important part of scholarly communication infrastructure.
AMANDA FRENCH: So we currently have a four person team that runs ROR. Two of us work full time for Crossref. Liz Krznarich, our technical lead, works for DataCite, and Maria Gould, the project lead really since the beginning of ROR, works for the California Digital Library. And this means that ROR can remain free to use without relying on grant funds, membership fees or service charges. I won't read you this entire slide, but just to give you a sense of ROR as a young adult in the PID space, people have been working to develop ROR really since 2016.
AMANDA FRENCH: So that is now seven years ago, amazingly enough. And so there was a period of really nearly three years of intense discussion around what ROR should be and how it should work. And then we launched in January of 2019. With a sort of a pilot project that was essentially just a synced clone of an existing organization identifier called GRID, which was run by Digital Science, who as you know, produces some wonderful software, things like Dimensions, and had almost as a side project made public these organizational identifiers.
AMANDA FRENCH: And then found that the demand for them was so great that they were quite happy to get out of the public organizational identifier business and hand it off to ROR, I think. So they worked with us closely, and then in late 2021 and early this year we, as I think of it, left the nest. If GRID was our parent and ROR is a young adult,
AMANDA FRENCH: ROR has sort of gotten its own apartment and has begun buying things at IKEA.
AMANDA FRENCH: So the last two or three years really have seen a lot greater adoption of ROR. Dryad was the very first ROR adopter in 2019. Ted Habermann can tell you all about that, since he was really crucial in that implementation. Again, this connection part is really important. The DataCite schema accepts ROR. The Crossref schema accepts ROR. They're still sort of modifying their schemas to make them even more ROR-ish, as it were.
AMANDA FRENCH: ORCID accepts ROR IDs for various affiliations and so on. And we've expanded the community, we've expanded the team and formalized the sustainability model that I talked about. So we have just turned four. So here we are: four years. You could say we've been in, I think, college more than high school, and graduated. So where are we going?
AMANDA FRENCH: Well, given that it is near the beginning of the year, I thought I would just show you ROR's roadmap. Now, these are sort of immediate future things. These are on the scale of the next year or two; we certainly plan for ROR to be around even longer than that. But I thought that on the theme of innovation and developments, many of which have to do with our community, we could go over some of these.
AMANDA FRENCH: So one of the biggest projects of 2023 is that we're really revamping ROR's metadata schema. We inherited the metadata schema that we have from GRID. And now that we have diverged from GRID, we think we can really simplify the metadata schema and rationalize it to a certain extent. So we've called for community feedback on that. That's been really crucial and it's going to take quite a while to implement this year because this drastic of a change will require rewriting most of our tools.
AMANDA FRENCH: And our APIs. And then we want to really make sure that the people who have already integrated ROR know about this change because it may break their integrations -- it almost certainly will. And support it for them. We will maintain the current version of the schema and API concurrently with version 2 for a while, for long enough for people to come to terms with it.
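[Editor's sketch] While two schema versions run side by side, an integration might read records defensively so that either shape works. This is a minimal sketch under assumptions: it supposes the current schema exposes a single top-level `name` and that version 2 exposes a `names` list whose entries carry a `types` list including `ror_display`. Those field names should be checked against ROR's documentation; `display_name` and both sample records are invented.

```python
from typing import Optional


def display_name(record: dict) -> Optional[str]:
    """Return an organization's display name from either record shape."""
    # Assumed version 2 shape: a list of typed names.
    for name in record.get("names", []):
        if "ror_display" in name.get("types", []):
            return name["value"]
    # Assumed current shape: one top-level name string.
    return record.get("name")


# Invented sample records with a placeholder ID.
V1_STYLE = {"id": "https://ror.org/0example00", "name": "Example University"}
V2_STYLE = {
    "id": "https://ror.org/0example00",
    "names": [
        {"value": "EU", "types": ["acronym"]},
        {"value": "Example University", "types": ["ror_display", "label"]},
    ],
}
```

Code written this way keeps working during the overlap period and can drop the fallback once the older version is retired.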
AMANDA FRENCH: But this is going to be a great deal of work. And I think, you know, our philosophy behind this is that we want to do this work now so that we don't have to make these kind of disruptive changes in the future. So, again, once you get out of your parents' house, move into your own apartment, that's really the time to kind of say, Hey, am I the kind of person who wants to do the dishes every night?
AMANDA FRENCH: Or am I the kind of person who wants to leave them be until the weekend? I will leave you to ask your own grown children about their practices in that regard. We have some more projects in mind just to do with that community curation model that I mentioned. It is the case that more and more and more and more requests are coming in every day to ROR.
AMANDA FRENCH: Again, I encourage you to go look at our curation queue. Because people can now see their record openly, I think, and because they understand that ROR is an emerging standard for this kind of thing, we get more and more requests to update existing records and to add new ones. Last year alone, which was the first year we had a full-time curation lead,
AMANDA FRENCH: we touched, we edited, we modified about 10% of the entire registry, which holds about 104,000 records. And that's only increasing. So in the coming year or two, we do want to make sure that we've got organizational records for essentially non-Western organizations. We already have quite good coverage in some ways, but we want to make sure that we have the best coverage of those kinds of organizations.
AMANDA FRENCH: We're continuing to work with Crossref on reconciling ROR with the Funder Registry. Crossref co-maintains the Funder Registry with Elsevier, and I think this work to sunset the Funder Registry in favor of ROR is at least two years out. But Crossref is committed to that work, and we have already done quite a bit to make sure that whatever is in the Funder Registry is in ROR, and we'll work with Crossref to make sure that work continues.
AMANDA FRENCH: And then one thing we've been hearing a lot is that we need better processes for requesting changes to a lot of records at once. And this is somewhat related to the fact that as people begin to integrate ROR into their systems, what they find is that many people will begin typing into a form that asks, What organization are you affiliated with? Up pops a nice list, and they'll choose something from that list.
AMANDA FRENCH: But if their organization is not in ROR, all of these systems in theory support just typing in something in text and sending that to the organization. So it turns out that a lot of our adopters have a great deal of data about organizations that are not in ROR. So we would like to think about workflows whereby they can send that information back to us and we can review them to see, Are these organizations that should be in ROR?
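[Editor's sketch] An adopter could gather the free-text affiliations that never matched a ROR ID and tally them before sending them for review. This is a minimal sketch with hypothetical names and invented data; it does not describe any existing ROR workflow.

```python
from collections import Counter


def unmatched_candidates(submissions, min_count=2):
    """Tally free-text affiliations that have no ROR ID.

    `submissions` is an iterable of (free_text, ror_id_or_None) pairs.
    Names are lowercased and whitespace-collapsed before counting, and
    only names seen at least `min_count` times are returned.
    """
    counts = Counter()
    for text, ror_id in submissions:
        if ror_id is None:
            counts[" ".join(text.lower().split())] += 1
    return [(name, n) for name, n in counts.most_common() if n >= min_count]


# Invented submissions; the ROR ID is a placeholder.
SUBMISSIONS = [
    ("Example  Institute", None),
    ("example institute", None),
    ("Known University", "https://ror.org/0example00"),
    ("Tiny Lab", None),
]
candidates = unmatched_candidates(SUBMISSIONS)
```

The threshold is a design choice: it keeps one-off typos out of the list that curators would have to review.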
AMANDA FRENCH: And then finally, as I've mentioned, we really rely heavily on our community. We consider ourselves a community-led, community-governed, community-curated project. We currently have about 150 members on our community advisory board who attend bimonthly community calls. And we are always looking for more folks to come because, of course, not all of those 150 people attend every call, but we usually get around 50 at least.
AMANDA FRENCH: So we're always looking for more perspectives, more feedback, more advice. We're going to be revamping our community practices, workflows, and strategy to make it easier to join the ROR community and to make the lots of activity that does go on a little bit more visible. And then we right now know of about 80 integrations of ROR. I've talked to lots and lots of people who say they are in the process of integrating ROR.
AMANDA FRENCH: So it's just sort of continuing to find those people who are integrating or who might integrate ROR and bringing them into the fold. And then as I mentioned, because we are seeing so many requests for record updates, record additions, we are going to look at ways to support our current community advisory board better and potentially recruiting more of them. That's all.
AMANDA FRENCH: Thank you so much. Please contact me if you'd like to get involved. Thank you very much for that, Amanda. And now I would invite all our attendees to come over to the Zoom chat so that we can talk about what we've heard and discuss any questions you have. If there's anything you'd like to discuss with our panel of experts, we can do that then.
AMANDA FRENCH: See you there. Bye bye.