Name:
One identifier to rule them all? Or not? Recording
Description:
One identifier to rule them all? Or not? Recording
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/ce172560-f1cf-46ad-8fce-3b13a32e38ee/videoscrubberimages/Scrubber_3.jpg
Duration:
T00H42M28S
Embed URL:
https://stream.cadmore.media/player/ce172560-f1cf-46ad-8fce-3b13a32e38ee
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/ce172560-f1cf-46ad-8fce-3b13a32e38ee/One identifier to rule them all-NISO Plus.mp4?sv=2019-02-02&sr=c&sig=wCNau1eT7CSM9HYet9RigR750V3kFBzQh7qySGtS9vc%3D&st=2024-12-26T21%3A31%3A14Z&se=2024-12-26T23%3A36%3A14Z&sp=r
Upload Date:
2024-03-06T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
Welcome, everyone. I'm Russell Michalak.
I direct the library at Goldey-Beacom College in Wilmington, Delaware. I'm moderating the session "One identifier to rule them all? Or not?" In this session, we will have three presenters who will share their perspectives on whether there should be one master vocabulary of persistent identifiers, or whether there should be many identifiers that specialize in specific subject areas.
For example, DOIs are pretty well established and adopted, but there are many other ways to identify things. What happens when things change? Who has the authority? Now I'm going to introduce our speakers. Our first speaker will be Gaëlle. She is a French librarian and a researcher in information science. She published a book in 2014 that explores the sociotechnical construction of national digital libraries in Austria, France, and the UK.
She's the director of the ISSN International Centre, the registration authority for ISO 3297 (ISSN) and an intergovernmental organization under the auspices of UNESCO. She is involved in standardization activities with ISO TC 46, Information and documentation, and she has published several articles about persistent identifiers. Her full profile is available on LinkedIn.
Our second speaker will be Jonathan. He's the managing agent for the DOI Foundation, which is a not-for-profit membership organization that governs the DOI (digital object identifier) and is the registration authority for the ISO standard for the DOI system, ISO 26324. It provides the technical and social infrastructure for the registration and use of persistent identifiers called DOIs.
Jonathan also works as an independent advisor on strategy and innovation. He's a guest lecturer and external examiner for the Master's in Imagineering and Master's in Strategic Events Management programs at the University of Applied Sciences. Prior to this, he worked at Elsevier for 20 years in various positions in publishing, marketing and technology.
He holds a BSc and a PhD in chemical engineering from the University of Newcastle upon Tyne. Jonathan was chair and director of the DOI Foundation from 2005 to 2010. He lives mostly in Croatia. Finally, our third speaker will be Beth. She's executive director of the Pervasive Technology Institute and McRobbie Professor of Computer Engineering, both at Indiana University in Bloomington, Indiana, in the United States.
Professor Plale served three years, 2017 to 2020, at the National Science Foundation (NSF), working on advancing open science both across the NSF community and in concert with federal funding agencies in the US. She serves on the board of directors of the DONA Foundation, located in Switzerland. The DONA Foundation credentials Multiple Primary Administrators (MPAs) that operate the public-facing services of the Global Handle Registry and manage the protocols and services of the Digital Object Architecture (DOA).
Plale's research interests are in open science, trustworthy artificial intelligence (AI), and FAIR data in cloud computing. Please welcome our speakers. Thank you.
Thank you very much, Russell. And hello, everyone. I will start sharing my screen and hopefully it works. I am happy to be here with you for this NISO Plus 2023 session.
And I will talk about diversity in PIDs and this is the outline of my presentation. I would like to focus first of all on ISO principles of identification, then talk about the ISSN as a PID and then, you know, tackle a few issues about the landscape, the PID landscape between fragmentation or specialization of PIDs.
And last but not least, I will focus on diversity as an opportunity for research and scholarly communication. So what are the principles of identification? Actually, this is a technical specification which was published last year. And actually several people from NISO have been involved in the drafting of this technical specification.
Why is it important? It's important because it stresses the main characteristics of PIDs. As you can see on the slide, there are about a dozen characteristics which are important for PIDs in general. I will say a few words on each of these characteristics. About uniqueness.
Of course, it's important that an identifier, a persistent identifier, be unique within the context of its namespace or within the context of its implementation, if you want. I mean, it could be clearer if I say one referent has one identifier. Of course, there can be several identifiers within their own namespaces, so different identifiers to identify the same type of referent.
But they have to be unique within a given system, let's say. About persistence: that's, of course, part of the acronym. Identifiers need to be independent from the registration or minting system. There should be some preservation done by the registration authority or the registration agencies. And of course, copying or preserving the data is also important. A succession plan for an organization managing identifiers is also a must.
About granularity. This is important because a PID can be adapted to the specific needs of the community that actually uses it. So the idea is to have identifiers assigned as precisely as possible within a given scope for specific resources or objects, which can be, of course, physical objects or digital objects.
About the kernel metadata. It should be stable. It should also be available. Kernel metadata is used for minimal identification of the referent. This is, for example, the name of the referent, the country where the referent is based or operates, the medium, the language. Of course, there should be bespoke metadata depending on the referent which is identified.
Access should be as open as possible, the rationale being to allow reuse by interested parties. Of course, reuse is paramount. About the scope: it's important that the scope is well described, the idea being to have a definition of the types of referent which are actually identified. Another very important characteristic or feature is about the semantics in the PID string.
There is an agreement within the standardization community that semantics should be avoided within a PID. There should be no encoding whatsoever of a language or country if that's possible. And it should be. I mean, the identifier should be as much as possible a plain string or a URI, but something rather plain and without any meaning. Resolution is also, of course, key because the identifiers should at least resolve to kernel metadata.
We've already seen that this metadata is the minimal description of the referent. Or they should resolve to the referent itself or, for example, to a tombstone if the referent no longer exists, which can happen. Maybe we'll come to that later. Another very important feature is the timing of assignment. Recommendations are to assign an identifier as soon as possible in the process of the referent's, let's say, creation, dissemination and reuse.
The best time to identify a referent or an object is really when it is created. Resilience is very linked to persistence. It is a way to keep the data curated and also to track errors, with some processes to correct the metadata and the identifier if need be, for example by correcting metadata or assigning a new identifier when this is needed.
And of course, the idea is that the identification system shares some information with the public regarding these rules. A PID system should be sustainable. So it should have a business model which can last, so that trustworthiness is created among the community.
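Several of the principles just listed (an opaque identifier that is unique within its namespace, resolution to kernel metadata, and a tombstone when the referent no longer exists) can be made concrete with a small sketch. The Python below is illustrative only: the `PidRegistry` class, its method names and the `example` namespace are hypothetical and do not describe the ISSN system or any real registry.

```python
from dataclasses import dataclass
from typing import Dict, Optional
import uuid


@dataclass
class Record:
    kernel: dict                     # minimal, stable description of the referent
    location: Optional[str] = None   # where the referent currently lives, if anywhere
    withdrawn: bool = False          # True once the referent no longer exists


class PidRegistry:
    """Toy registry for a single namespace (hypothetical, for illustration)."""

    def __init__(self, namespace: str):
        self.namespace = namespace
        self._records: Dict[str, Record] = {}

    def mint(self, kernel: dict, location: Optional[str] = None) -> str:
        # Opaque string: no language, country or title is encoded in it.
        pid = f"{self.namespace}:{uuid.uuid4().hex}"
        self._records[pid] = Record(kernel=dict(kernel), location=location)
        return pid

    def withdraw(self, pid: str) -> None:
        # Persistence: the identifier is never deleted, only marked withdrawn.
        record = self._records[pid]
        record.withdrawn = True
        record.location = None

    def resolve(self, pid: str) -> dict:
        record = self._records[pid]  # raises KeyError for unknown identifiers
        if record.withdrawn:
            # Tombstone: kernel metadata survives even though the referent is gone.
            return {"pid": pid, "status": "tombstone", "kernel": record.kernel}
        return {
            "pid": pid,
            "status": "active",
            "kernel": record.kernel,
            "location": record.location,
        }
```

In this sketch, resolving a withdrawn identifier still returns its kernel metadata with a `tombstone` status, which is one possible reading of the resolution principle rather than a prescribed behavior.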
So based on that, I would like just to focus on ISSN as a PID, just mentioning the fact that there are pluses and minuses within the system. Of course, it's unique and persistent. We have some metadata which is curated automatically and also by our colleagues or professionals. We give access to information, and kernel metadata is available on our portal.
This is a standardized identifier, and you can see on the slide the reference to the standard. This standard defines the scope. ISSNs, and that's really fortunate, don't have any semantics in them. That was really wise when it was invented: years ago, at the inception of this identifier, no semantics was included in it.
So we are lucky with that. The ISSN has been around for many, many years. And of course, we work with publishers and with libraries to assign ISSNs as soon as possible, well, at the inception of the object, which is, I should have said, continuing resources, both print and online. And we work with a network of metadata specialists in 93 countries.
What are the drawbacks, the minuses, with ISSN? It's not very granular because of its scope. The scope is limited by the standard itself. About resolution: for the time being, the resolution is limited to kernel metadata and enhanced metadata on the portal. And we would like to implement a resolution to the referent itself.
And we're working with the library community to be able to gather information, especially for digital continuing resources, to get current URLs, but also past ones and also archiving URLs. So about the landscape. I want to just tackle the issue of fragmentation or specialization of PIDs. And I'm referring here to a study that was published in early February by the Knowledge Exchange group in Europe.
It's a group of six European countries, and they have commissioned a specific report on PIDs because of course this is of high interest at the moment in Europe, especially with the European Open Science Cloud being set up by the European Union. So this study has come up with a detailed analysis of identifiers.
And I especially like the typology that they've come up with, sorting identifiers into technical identifiers and admin-oriented identifiers. So their stance is that there are some identifiers which are implemented by the researchers. They are used for the identification of instruments, for example, or facilities.
So they are more technical and usually researchers see the point in using these identifiers. And there are other types, another type of identifiers which are more admin-oriented, like, for example, ORCID or RORs or Grant IDs. And these identifiers are, you know, implemented generally in a top down fashion by national offices, by research organizations, by publishers.
So there is a tension here between these two types of identifiers. Another interesting, you know, let's say finding of this study is that there are competing technical solutions within the realm, let's say, of PIDs. There are some organization IDs that exist, which are promoted by ROR.
Some are promoted by Ringgold. And so record maintenance and the systems themselves are completely different, and also, of course, the social organization of these systems. You can also have national-level author IDs such as the DAI in the Netherlands, and you have more international author IDs such as ORCID or ISNI. So this also creates some tension between identifiers and also within the community to select the identifier which is the most adapted or adjusted to their needs.
There are also diverse communities, and this list is taken from this report. As you can see in this list, there is a variety of communities or people involved with research and the promotion of research and, of course, the FAIR principles for research. So it's a bit complicated sometimes to find, let's say, coherence or logic in the use of PIDs.
So I would say that diversity of PIDs is an opportunity because they don't all address the same needs. You can have some national or international PIDs. I've already mentioned the DAI, the digital author identifier, which was created by SURF in the Netherlands. DAI was one of the first identifiers for authors; it was created in 2005, and now it's being superseded by ORCID and ISNI.
But there is still, you know, some links between these identifiers and it's important. You see that you can build on experience gained with national identifiers to actually switch to international ones. So it's important to keep that in mind. There are other examples also in France, of the same process of national IDs being leveled up in a way to international identification.
There is also this tension between specialized and non-specialized PIDs, with very specific identifiers such as the one developed by the RePEc Author Service, which focuses on economics and researchers in economics and provides some services also to this community. And of course, all these systems depend on their metadata suppliers.
They rely on publishers, they rely on libraries, they rely on researchers, communities, to provide data and share data with the identification system. And we can see there that sometimes this metadata needs to be enhanced by metadata curators. So there is also this issue of the quality of metadata and how you can get more.
I mean, better metadata from suppliers, and also use some automated processes or human processes to curate this metadata. Well, as a conclusion, I would like to say that given the diversity of the landscape, we should all foster interoperability. The reason why we should do this is that we have specialized communities, we have specialized users, and it's important to keep this richness or this wealth of information that we have within our communities.
Thank you.
So good morning. Good afternoon. Good evening to everyone. First of all, a big thank you to NISO for organizing this great event. I'm going to continue where Gaëlle left off because I think this topic is really all about interoperability.
First of all, my name is Jonathan Clark. I am the managing agent for the DOI Foundation. And if there's one thing that I think we've learnt in the Foundation over the last 26-plus years, it is that interoperability is essentially a human endeavor. And actually it goes back right to the very beginning. Back then, in the late 90s, when scientific articles were beginning to reference-link to each other, there was a problem, of course, of broken links, the 404s.
And when we looked at it back then, we realized that it actually was a human problem, and a human problem that needed a human solution. And so what I'd like to do is just take you through some of our experiences over the last 26 years or so of trying to manage interoperability and trying to make it happen.
Back in the early days, we called this a social infrastructure. We call it the community of communities now. And the way that DOI works is we have registration agencies. This is a list of the current ones, and they actually do the registration for the DOI. And what happens when someone comes to us and says, hey, I'd actually quite like some DOIs, is that they look at this list and they choose the community that they feel the most comfortable with.
So if you have scientific data sets, you can go to Crossref. There are DOIs for data in Crossref. There are also DOIs for articles in DataCite. It's just that if you're part of a data community, you probably feel more comfortable working with DataCite. And it works on a national basis too. If you're in Japan and you'd like to register DOIs, you probably feel more comfortable going to the Japan Link Center, and if you're in Italy, perhaps you might go to mEDRA.
And so each of these agencies represents its own community, just as Gaëlle said. But together we try and manage this disparate group. And what we've discovered over the years is that we have this kind of interconnectedness, sometimes very unexpected, between these communities. And I've got some examples. For instance, I mentioned Japan. If you go to the Japan Link Center and say, well, I've got a whole lot of resources that I'd like to issue DOIs for,
they will actually be able to offer you Crossref DOIs if that's something that you want. If the services you need are some of the services that Crossref provides, then because the Japan Link Center is a member of Crossref, they can do that on your behalf, or you can simply take JaLC DOIs as well. And the same works for all of these interactions. So the RAs have discovered ways of working together that help them both.
And in fact, take the Publications Office of the European Union: their system is actually managed operationally by mEDRA, the European registration agency. And so we've discovered over the years lots of interconnectedness. And sometimes it's unexpected. So when EIDR, the Entertainment Identifier Registry, joined, that's essentially movies, DOIs for movies.
We thought that this was a completely different domain, a completely new domain. But of course, over the years, it's turned out that academic authors also really appreciate audiovisual content. And there's much more in common, much more synergy between them, than you might think. One of the newest to join is the British Standards Institution, and they're registering and tracking building supplies.
But it turns out that tracking building supplies is really similar to tracking movies. Both of these things are very complex supply chains with multiple players in them, and to track items across the supply chain, DOIs work very well, and there's a lot of interaction between those two RAs. And so, in fact, what we've found is that the longer we spend together, the more interconnectedness we discover.
And I thought it might be helpful just to run through some examples that perhaps you might have heard of. Sometimes this interconnectedness goes beyond just the community of the DOI. If you have an ORCID iD and you put it into the articles that you submit for publication or the data sets that you upload to a data repository, you'll find that, if you've signed up for it, your ORCID record gets automatically updated with those DOIs.
And of course that saves a huge amount of time. That's a wonderful service, but it came about purely because ORCID and DataCite and Crossref got together and talked and thought that, hey, this might actually be a very useful thing. They connected. Quite recently there was IGSN, which is for samples, geological samples. So all the rocks that are found have identifiers, and for a long time they had identifiers run by the IGSN.
But quite recently they decided to partner with DataCite and bring those two communities together. And of course they make sure that the legacy identifiers, and this is a point that Gaëlle mentioned, will all resolve and alias to DataCite DOIs, and going forward everything is a DOI. So those two systems now interoperate. You've heard, I'm sure, of ISBN. You might not have heard that in some areas you can also register an ISBN-A, or actionable ISBN.
This is from mEDRA in Italy, and it's a way of turning an ISBN into a DOI and making it actionable. When you click on it, something happens. In this case you come to the catalog entry. And that's an example of two identifiers working together, one providing resolution services to the other. Something we're also busy with at the moment is registering DOI as a namespace for URN. That was in fact asked for by national libraries that have most of their collection based on URN but have more and more things with DOIs.
And so we create a namespace that allows them to refer to DOIs within their URN structure. It's fairly straightforward to do, but it has enormous impact, in this particular case in the national library world. I mentioned the entertainment industry before. They have this concept of alternate IDs. In that complex supply chain that I mentioned, each of the players in the supply chain refers to that movie with their own identifier, but EIDR maps those, and they call them alternate IDs.
So sitting in the EIDR record are all the alternatives: this movie is also known as this, this and this, depending on the context. You can see, if you have good eyes, that the record from IMDb, the movie database, is included there, and so they can refer to each other and they know of each other's existence. And the final example I have is something Crossref has built over the years: the concept of relationships.
So if the metadata that is submitted to Crossref contains the identifiers that are referred to within that paper, that article or piece of data or whatever the object is, they can capture that in terms of relations and you can then feed the metadata so that you can interconnect just using the identifiers, these pieces of information. So that's just some examples. I don't have time to go into more detail with them, but please ask afterwards.
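The alternate-ID idea described for EIDR, where one record lists the identifiers that other parties use for the same object, can be sketched as a two-way lookup table. The Python below is a minimal illustration under stated assumptions: `CrosswalkIndex`, its methods, the scheme names and every identifier value are made up, and none of this reflects the actual EIDR or Crossref data models or APIs.

```python
from collections import defaultdict
from typing import List, Optional, Tuple


class CrosswalkIndex:
    """Toy 'also known as' index between identifier schemes (illustrative only)."""

    def __init__(self) -> None:
        # canonical identifier -> set of (scheme, value) pairs
        self._aliases = defaultdict(set)
        # (scheme, value) pair -> canonical identifier
        self._reverse = {}

    def add_alternate(self, canonical: str, scheme: str, value: str) -> None:
        key = (scheme, value)
        self._aliases[canonical].add(key)
        self._reverse[key] = canonical

    def alternates(self, canonical: str) -> List[Tuple[str, str]]:
        # Sorted so the result is stable regardless of insertion order.
        return sorted(self._aliases.get(canonical, set()))

    def lookup(self, scheme: str, value: str) -> Optional[str]:
        # Returns None when the alternate identifier is not known.
        return self._reverse.get((scheme, value))


# Hypothetical identifiers: one movie known under two other parties' IDs.
index = CrosswalkIndex()
index.add_alternate("movie:example-1", "studio-a", "A-001")
index.add_alternate("movie:example-1", "distributor-b", "B-77")
print(index.lookup("distributor-b", "B-77"))
print(index.alternates("movie:example-1"))
```

Keeping the reverse map alongside the alias sets is a design choice for this sketch: it lets any party in the supply chain start from its own identifier and arrive at the shared record.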
And if I think about the common theme of all of this, I think it would be human interoperability. That sounds really easy. It rolls off the tongue, human interoperability, but actually it turns out it's a little bit harder. I'm sure you've all seen this wonderful cartoon. It's been around for quite a while. It's about the standards world, but I think it applies to identifiers too.
And as I was preparing this presentation, I thought, well, why does this happen? I mean, it does all the time, and it's partly the community thing, but there's something else too. What's this drive to create something new? And I think it has again to do with us as humans. It's a very human characteristic to have something called optimism bias.
We tend to prioritize the new shiny thing. We absorb new information and we value it more highly than the prior knowledge that we have. And that tends to mean that the new thing that comes up is the thing that we go after. We also then tend to undervalue what's there and existing. And I think that natural instinct leads to the development of something new.
We have new ideas. We need something new. And of course, this isn't new. For those of you mathematicians out there, this is Thomas Bayes, and he realized this back in the 18th century. But it doesn't seem to have helped us much today, because I think we still suffer from optimism bias. So if I conclude by going back to the theme of this session, one identifier to rule them all,
I would say that has about as much chance of happening as this man, Boris Johnson, had of becoming World King, which is apparently what he dreamed of when he was a child. It's just not going to happen. But what I think we can do is to recognize the role of humans in interoperability. We can come together, we can discover what we have in common. We can work together to connect our communities and our identifiers.
And that doesn't mean that technology is not important. I put this slide up just to show that each of these four segments is necessary. You need interconnected systems, and once you've connected them, they have to be able to understand each other. The technology has to be independent. It's no good building a technology that's fixed to a location, such as URLs, for instance. You need something that is abstracted so that identifiers can exist beyond changes in technology.
But above all, you need the humans, and they need to be organized. And so, really summarizing the summary, I would say if we want data to work together, we need humans to work together. Thank you very much indeed.
All right. Thank you. Beth Plale here.
Delighted to be following my colleagues with the landscape that they laid out, so much of which I agree with. I expect the remarks I make will complement what has been said thus far. I'm at Indiana University. I'm a professor there, and I work in data management. As was mentioned in the intro, I've also spent some time at the National Science Foundation in the United States.
Some of my thoughts with respect to open science are informed by thinking about the broad ecosystem for open science, including the funding agency in the United States which funds major research not in the health space or the medicine space but in the other aspects of science, and those products that are coming from research, which have moved beyond publications to include data sets and software.
And if one looks at that ecosystem and how those products can be used, the ecosystem is complex. This is likely something that people are familiar with. It is institutions that support research. It is the publishers, the repositories and other content providers. It is the producers and the consumers of the research. It is the organizations that advance research.
And I'd also argue it's persistent ID service providers. This is what I was certainly focusing on as I was considering how a federal agency enhances the accessibility and availability of the products that it is funding through funded research. And the principles that are overall guiding open science (that many of us are familiar with and working with) are the principles of FAIR: that these digital products are findable, accessible, interoperable and reusable.
And in order for that to exist, there are aggregators, publishers, funders and so on. When one looks at the persistent IDs, and these have been spoken to from the perspective of my colleagues, there are IDs for people, there are IDs for organizations, there are IDs for publications and/or data sets. And within, well, let me just say that there are these organizations.
And when one looks particularly at IDs for data sets and publications and people and the awards that funded them, getting all of those things linked together is really important. So what one is seeing is services such as Crossref and Make Data Count that are adding value on top of the persistent ID services.
Once you've got stability in the persistent ID services, one can then start to link things together. And it's really those linkages between a publication and a data set and a person and an award that create the fabric that allows for the accessibility and the discoverability of these products of research.
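The "fabric" of linkages between a publication, a data set, a person and an award can be pictured as a small graph keyed by persistent IDs. The Python below is a hedged sketch, not a description of how Crossref or Make Data Count work: `PidGraph`, its methods and all identifier strings are hypothetical placeholders.

```python
from collections import defaultdict, deque
from typing import Set


class PidGraph:
    """Toy undirected graph of links between persistent IDs (illustrative only)."""

    def __init__(self) -> None:
        self._edges = defaultdict(set)

    def link(self, a: str, b: str) -> None:
        # Store the link in both directions so discovery works from either end.
        self._edges[a].add(b)
        self._edges[b].add(a)

    def reachable(self, start: str) -> Set[str]:
        # Breadth-first traversal over the links, excluding the starting ID.
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in self._edges.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        seen.discard(start)
        return seen


# Hypothetical identifiers for one publication and the things linked to it.
graph = PidGraph()
graph.link("publication:p1", "person:alice")
graph.link("publication:p1", "dataset:d1")
graph.link("publication:p1", "award:a1")
print(sorted(graph.reachable("dataset:d1")))
```

Starting from the data set alone, the traversal reaches the publication, the person and the award, which is the kind of discoverability the linkages are meant to provide once the underlying IDs are stable.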
And when one looks at that grounding, one sees nonprofit organizations that are doing excellent work in laying the foundation for scientific data products. However, these have business models that are lean, and, going to what Jonathan was saying, maybe because they're not the new shiny thing, they are not as valued as they could be.
So what we're seeing is the emergence of a set of organizations that are providing extremely valuable services for the promotion of goals that could be considered major country goals. But they're fairly vulnerable in their business models. The value that they provide, I think, is unquestionable. But the funding that comes with rewarding that value is maybe less frequent or less rich and flowing.
So looking at a funding agency, and this happens to be a picture of the National Science Foundation: they are funding basic and applied research. They're responding to national calls for greater access and visibility to the products of the research that they fund. They're also hugely constrained. Of the National Science Foundation budget, 95% goes out in awards.
So what they're trying to do in terms of advancing open science is done on a shoestring budget. Most recently in the United States, the Nelson memo, which came out in August 2022, directed federal agencies to update their public access policies to make publications and their supporting data resulting from federally funded research publicly accessible without an embargo on their free and public release.
So these products must be instantly available. If we put that together, what we're seeing is agencies that are being urged along in open science in important ways, and we're seeing the dependence on IDs. As we all know, persistent IDs are critical for uniquely identifying things over time. And we're seeing nonprofit organizations that are basically stepping up to provide the infrastructure that is needed.
So what I would argue here is that I love the idea of interoperability of persistent IDs. I think it's really important, and I think discipline-specific persistent IDs are valuable. But we need that interoperability pushed down, because we can't push a dozen different ID schemes for data sets up to agencies and expect them to incorporate them all, or up to nonprofits and expect them to be able to support them all.
So at some point, we need to recognize that the upper layers that are going to give us the discoverability and the access to these products need to come together around some set of standards in order to make things work, and that the heterogeneity, or the interoperability that we need, needs to be at lower layers. So I guess that's my one point. The second point I want to make, which is a lesser point, is about some activity that's going on in the community that recognizes that it's important to identify.
This is particularly around data. It's important to identify data with a unique ID, but that's just a starting point. What else can one do? Basically, one can look at data as not being trapped inside repositories, where you're dependent on the repository to know something about it. Instead, it can carry some of the information with it, and it can be responsive to direct queries that go beyond just "give me the record for the ID".
So this notion of globally linked FAIR Digital Objects is a fairly recent development, within the last four or five years, for envisioning what one of these objects could look like in a more generalizable way. The persistent ID is a key piece of that, but it is just one piece of it. And I do think there's some promise here in terms of bringing findability, accessibility, interoperability and reusability to data in a way that allows a common way of looking at data,
without having to push into the individual repositories and force the repositories themselves to make changes. It's a development that I'm involved in, and I think it holds some promise. It also depends on semantic information, which is critical to being able to make things findable. So I think we need to pay attention to the infrastructure that we have in place for assigning and resolving IDs.
The work that's being done there is excellent. I think it's underfunded and it's critical to progress in open science. I think we need to acknowledge that the further out we get from the scientific space, the more stability we need with respect to a small number of solutions rather than a large number of solutions which they can't afford. And they're not the right people to be trying to solve that.
And then from a technical point of view, interoperability of PIDs is entirely workable and an entirely pragmatic solution to moving forward. And here I would absolutely agree with comments I've heard from both of my colleagues. So thank you.
Thank you, everyone, for attending our session today.