Name:
PIDInnovationsAndDevelopmentsInScholarlyInfrastructure
Description:
PIDInnovationsAndDevelopmentsInScholarlyInfrastructure
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/53dc831c-d58d-4b7d-b1b2-fe2ba2575288/videoscrubberimages/Scrubber_1.jpg
Duration:
T00H41M28S
Embed URL:
https://stream.cadmore.media/player/53dc831c-d58d-4b7d-b1b2-fe2ba2575288
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/53dc831c-d58d-4b7d-b1b2-fe2ba2575288/PIDInnovationsAndDevelopmentsInScholarlyInfrastructure.mp4?sv=2019-02-02&sr=c&sig=%2BKk3veGx68wJskS637nUElh%2FwOjnLm%2BUB5bGRyQnbYo%3D&st=2025-01-02T17%3A27%3A49Z&se=2025-01-02T19%3A32%3A49Z&sp=r
Upload Date:
2024-03-06T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
Hello, everyone. Whenever I got his haircut. Yes, actually, considering it's over in Brisbane, I feel so much cooler now that I've got the haircut. No, we just got six inches of snow today, so I need to grow my hair out, more or less. All right.
So, everyone, welcome to this session. If you've listened to the presentations, you know that this is where we discuss what we've heard. If you have any questions, this is your chance to ask them. We have some of our speakers online, ready to answer. Unfortunately, Matt mentioned that he wouldn't be able to make it for this chat, but he has asked me to let you know that if there are any DataCite-specific questions, Kelly is here, so Kelly may be able to answer those.
Otherwise, Matt is more than happy for me to forward those questions to him, and then he'll send you his replies. He has also put in a link to the notes document that's on Google. If you've got any notes, anything to share, anything you'd like to be recorded, put it over there. Everybody can see it, and we can act on it.
So, moving on. I understand there was a question from Amanda about RAiD. Yes. I was looking carefully at the proposed metadata schema for RAiD, more closely than we had before, and I noticed that there was a start date but not an end date. So I was just wondering, you know, do the projects ever end? Unless I just missed it.
Yes. So, with the RAiD stuff, speaking with some of the RAiD folks, what they've said is that there's a fair bit of development work currently going on. At this stage they don't have an end date, because the thinking is that with any sort of research project or research activity, even after the project is done, you still have journal papers being submitted and you still have impact being tracked.
So they haven't specified an end date yet. If the community wants it, then it might be something that they look at putting in. But there is a fair bit of development work going on at this stage, and it is not something that has been decided, I guess. So it will depend: is it something that the community wants? Is there justification for it? If so, then maybe it is something that they may work on.
There's a question from Howard over here for you, Amanda. Hi, Howard. Does ROR already have parent-child relationships? So, yeah, absolutely. We already do.
And we've done (I always forget the statistics on this) a lot of reconciliation of the Funder Registry with ROR, and I would have to look up the exact number. I think it was in our annual meeting. But suffice to say, a large percentage (I've forgotten the exact percentage) of funder records have a corresponding ROR ID record. It's quite a small percentage where there is a Funder ID that doesn't have a corresponding ROR ID.
And, you know, there's basically a pool of ones where we're not quite sure if there is a corresponding ROR ID, because we're not quite sure if the match is right. And then there are some that we know for sure are out of scope. But yeah, ROR already has parent-child hierarchies. And thanks, Amanda. That would be a really good thing to publicize much more broadly than has been done, because I've been hearing even from some people within ISO that the parent-child relationships are not there.
So I think it's really important that we get that message out. I think you're right. I actually wrote a documentation page about that specific thing just the other day. It was sort of buried in our documentation. That is obviously just the bare minimum, and it may not be structured in the way that people expect it to be, because it's not like a DOI, where you have, you know, one ID and then subsidiary IDs.
Say that you have a university, which is a typical case, and a research institute at that university. Each of those will have its own ROR ID, and ROR IDs are entirely random, so those two ROR IDs themselves have no relationship to each other. But we have an entire element called relationships, which has values for parent, child, and related organizations.
So you can just graph a whole tree: this is a child of that, this other thing is a child of that. You know, organizations can have grandchildren and parents and grandparents and all that. So it's really easy. There's even a script to traverse it. So I think you're totally right. We need to get out in front of people going around saying ROR doesn't support hierarchy.
That's entirely wrong. So I'd be happy to help you and Crossref get that message out, because if we're going to make this transition, which is not going to be an easy one, from the Open Funder Registry to ROR... And by the way, as you can tell, I'm 100% behind it. I want it to happen, right? Yep. Yeah. I want it to happen.
But we need to get in front of this, right? And so we need to get the messaging out now that, hey, it's already out there. And by the way, the Open Funder Registry, you can see it in the following ways within ROR. You know, not technical jargon like you've got here, because it's the other type of people, who aren't as technical, that need to hear this. Especially the funders that I deal with, US funders and so forth.
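As a rough illustration of the relationships element described above, the sketch below walks a parent-child tree in Python. The three records, their identifiers (`ror:univ` and so on), and the `descendants` helper are hypothetical and are not the traversal script mentioned in the discussion; only the idea of a `relationships` list holding a relationship type and a related ID comes from the conversation.

```python
from collections import deque

# Hypothetical, hand-made records. Real ROR IDs are random strings and
# carry no hierarchy; the structure lives only in "relationships".
RECORDS = {
    "ror:univ": {
        "name": "Example University",
        "relationships": [{"type": "child", "id": "ror:inst"}],
    },
    "ror:inst": {
        "name": "Example Research Institute",
        "relationships": [
            {"type": "parent", "id": "ror:univ"},
            {"type": "child", "id": "ror:lab"},
        ],
    },
    "ror:lab": {
        "name": "Example Field Lab",
        "relationships": [{"type": "parent", "id": "ror:inst"}],
    },
}


def descendants(records, root_id):
    """Return every child, grandchild, etc. of root_id, breadth first."""
    found = []
    visited = {root_id}
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for rel in records.get(current, {}).get("relationships", []):
            # Follow only "child" links; compare case-insensitively.
            if rel["type"].lower() != "child":
                continue
            child_id = rel["id"]
            if child_id not in visited:
                visited.add(child_id)
                found.append(child_id)
                queue.append(child_id)
    return found


print(descendants(RECORDS, "ror:univ"))
```

Because the function tracks visited IDs, it would also cope with a record reachable by more than one path, although whether that situation can occur in practice is left open in the discussion.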
Is there a possibility that a child belongs to two parents? I think I haven't seen any examples of that, unless you mean grandparents. I don't know. You know, obviously a child has a parent and that parent has a parent, so there can be two parents in that sense. But I haven't seen an example, and I would have to check if the thing actually technically supports having two parents, if you know what I mean.
So the ultimate answer is no, one child cannot have two parents in that sense. You mentioned something, and then we'll go on to Ted. It's on. You're still on mute, Chris. All right, I thought I'd taken it off. Yeah, we're just getting into a bit of a deep conversation on ROR and the Funder Registry there.
I wonder if the topic of how to handle funding programs has come up. Just for full disclosure, I was involved in the creation of the Funder Registry to start with, and it does contain, to my knowledge, a number of funding programs which are not organizations. So I wonder if anybody is thinking about how we're going to deal with those, if the idea is to transition the Funder Registry into ROR.
Feel free to take that away and think about it. I didn't want to put you on the spot. Yeah, no, that's what I would have to do, because, yeah, I've actually noticed that myself. Obviously there are grant identifiers and there are funder identifiers, and in between those two things stands the funding program, and there absolutely are no identifiers for that.
And as far as I know, there are no plans to create identifiers or even sort of standards for that. That would be an enormous project. And then the other thing is those programs change all the time. I feel like that's not a very stable kind of data set, as it were. But yeah, I don't know that that would be the right... That's just a personal thing.
Yeah, it depends on the funders, and I'm sure we'll know more, but I know that when we put the registry together, it was important to some of the funders to include that level. Right. Yeah. In fact, the Japan Science and Technology Agency is very insistent on having those programs in there. But I do hear you, Amanda, that they are definitely more ephemeral. But if there are no plans, that's an important factor.
I'd also say we need to get together to figure this out. Yeah. So, I don't know about them. I mean, there may be plans that I'm not privy to, but... Right. I will say too, I'm sorry, let me just say one thing. This is again not even speaking for ROR. I'm speaking entirely for myself here. But I do also feel that it's important to solve the first problems first, or at least get closer to solving them.
I mean, just linking funders with research outputs is, to my mind, the key problem that we need to solve, honestly, even before linking grants to research outputs. And so, to my mind, if I were queen of the world, I would say the programs could wait. Let's do that part first, then worry about the programs. Thanks for that, Amanda. Ted, I believe you want to say something or ask a question.
I think that... I'm glad that both Amanda and Maria are here. We can get two different points of view on what ROR is doing. ROR inherited a bunch of parents and children from GRID, of course. So a lot of those relationships were grandfathered in, and saying that ROR supports those relationships is clearly true, because there are parents and children.
But at the same time, ROR is also focusing really on the sort of highest level, the institutional level. So, for instance, when saying that ROR can do children, we might want to clarify that that doesn't mean there are going to be RORs for departments at universities or other children. I mean, there could be people in the audience who are associated with child organizations within larger organizations, and they're rapidly making suggestions to ROR.
So I think there is some support in ROR for relationships between organizations. But the focus of ROR, as far as I understand, and I'm hoping you could make sure that I have this correct, is sort of at that higher level still, and, you know, you're not encouraging overpopulation of child organizations. Yeah. And again,
here's one thing we hear a lot: people would like there to be that level of granularity in ROR. There's no denying that. We hear that people, even just for the affiliation use case, want departments in ROR. We do hear that. Again, ROR is really focused on that top-level organization. And again, this being more my personal thing rather than the official ROR take: to my mind, that is a problem, but it is not solved yet.
It makes the most sense to me to focus on that top-level organization thing. The other thing I've heard from a lot of people, a lot of integrators, is that when they try to use other organizational identifiers that do have a deeper level of granularity, they find it way too noisy. The way a lot of these things work is that you'll have one identifier that is for the institution only, and then another identifier that is for the institution and the department, which is weird.
It means that if you have even just two fields in a form, you have two entirely different identifiers, both of which have redundant metadata in them. It doesn't make sense to me, from a sort of clean metadata point of view, to have that. So I hear from a lot of people that ROR is really easy to use, ROR is lightweight, ROR is fast. You know, there are no credentials, it's all open.
And it does support hierarchy. So we're really focused on what is the primary affiliation that a researcher needs to cite when they're producing their research outputs. The other thing, I think, too, is that because ROR is open (open source code, open data, CC0 data), people can, and in fact have, bolted other taxonomies onto it.
There was a sort of pilot project using VIVO taxonomies where they said, hey, could we map these to ROR? So you can do all kinds of mappings with ROR, because there is open data and an open API, that you can't do with every other identifier. Thank you for that, Amanda and Ted.
There is a question here from Sophie: "Some research organizations publish some of their research outputs themselves, not by a third party. For this type of research output, is it OK to document the organization's ROR in a metadata publisher element?" It's there in the chat window. I shall post it again.
Yeah, I'm sorry, I missed that. I think the question was, if you're trying to describe a publisher, is it possible or allowable to give a ROR to that publisher? And I think the answer is definitely yes. If you're publishing a data set and you're using a DataCite schema before 4.5, you can't actually do that.
We're adding in that identifier capability, and Kelly is leading that work, adding that identifier possibility for a publisher. So you could have a ROR for a publisher in DataCite starting sometime early this year. Right. I'm not that familiar with the Crossref publisher element. I haven't looked at it much recently, but yeah, I think that's going to depend on...
Yeah, if I'm understanding the question correctly, as I understand it, DataCite does support ROR IDs as an organizational identifier in the publisher element, well, in the next schema. So if it's a research organization that is in scope for ROR, it will have a ROR ID. So then the question of where you can put that in publisher metadata is going to depend on where you're sending that metadata to.
Right. So DataCite will support that. Crossref, if I'm not mistaken, and it's entirely possible this could get too technical: the primary place that the Crossref metadata schema supports ROR IDs is in the author affiliation. So, I am an author and I am affiliated with this organization. But that is wrapped within a metadata element called institution.
And often when an institution is publishing reports, the institution is the author of that report. So you can also have ROR IDs in that element, saying not just "I am affiliated with this" but "I am the actual organizational author of this." I think, and I'm not 100% on the Crossref metadata schema, that the Crossref publisher element does not support identifiers yet, but it might in the future.
But there may be other systems that, yeah, for sure, would take a ROR ID in that publisher element, if I'm understanding that correctly. A side case: if you have an organization which is a thing, not an affiliation, it can also have a ROR in the name identifier element instead of an affiliation identifier. So if you have a creator or a contributor that is an organization, like a field station, for instance, it could have a ROR as a name identifier associated with that name.
Yep. And Chris, I see you've put in the chat that Crossref members are publishers, so they use their internal IDs for their members. That is correct. Crossref has a member ID, which is not just an internal ID, because it's exposed in the Crossref API, so you can look that up. And interestingly enough, Bianca Kramer, whom I think some of you might know, was asking just the other day about mapping member IDs to ROR IDs, which is entirely possible and which we've sort of begun to play around with. There are a couple of very raw spreadsheets, just a first pass of: here's a Crossref member ID, here's a ROR ID.
And that was a project that we had actually talked about when I first began at Crossref last year: mapping Crossref member IDs. And Crossref, interestingly enough... You know, people think of Crossref as a publisher trade association, which it is, but increasingly research universities are members of Crossref, because they do their own publishing. So it's not just the Wileys and the Elseviers and so on who are members of Crossref.
There are plenty of research institutions who are members of Crossref and so would have a member ID that could be mapped to a ROR ID. So that is a project that we've been looking at. But I must say we have kind of put that on the back burner, because it seems less important than, for instance, the Funder Registry. And one of the things that I was going to write in the chat as an answer to you, Howard, is that that Funder Registry work is a...
You're absolutely right. It's a big project, and it's going to take a long time. And I think it is important for everybody to know that the Funder Registry is not going anywhere right away. It won't just disappear quickly. There will be a long sunsetting period. So if you are using the Funder Registry, please continue to use the Funder Registry.
You need to do nothing at this moment. It'll probably take, my guess is, a couple of years. Thanks for that, Amanda. I see we've got Sean, who has just recently joined us, so Sean will be able to directly confirm what I've just mentioned, or even contradict it. Hey, Sean, there was a question which asked if there was going to be an end date in the RAiD record.
So, for research activities, does that have an end date, and will that be there in the record? And is there any scope for adding a Funder ID to the RAiD record? Yeah, so there is an end date. We've made it optional because we know some projects... I'm laughing because I'm part of an archaeological project that's been going on since 2008, and we've still got... But so, we're requiring a start date for the project, when it started.
We have relatively few required metadata elements, and end date is optional. I would say end date can be useful for things like... we're talking about using it for when an organization is off the hook for answering questions about a RAiD. We're requiring some contact information about a project, and if a university or other research organization gets tired of handling inquiries or whatever, they can close the project.
And then we'll know, if there's an end date, that that's fine, and we don't necessarily have to have an active contact for the project. So we do think there's some utility to having an end date. We're not making it required, but it's there. And then the second question, was it about integrating with Crossref Funder IDs or grant IDs? Sorry, what was it? The Funder ID.
Yeah. Our basic strategy for RAiD is pretty much that if a PID for something exists (organizations, individuals, outputs, inputs), we capture the PID and we do not duplicate information about that. And so we are working under the assumption that when you have funding, when it's possible, understanding that not all grants have these, there'll be a grant ID there.
And if we've got the grant ID, we kind of think that you will follow the grant ID to find out who the funder was, rather than duplicating that information in the RAiD. That's provisional. I guess our opening gambit here is to be very sparse, to really push it, not to capture any piece of information that can be derived from another piece of information.
And again, we're not, for example, keeping affiliations in RAiD. If you have a contributor with an ORCID, we're not then also capturing that affiliation. So again, we're starting with a kind of extreme position here: no information that can be derived from another element. But if we need to back off on that when we start getting RAiDs minted in large numbers, we'll consider it. We know we're going to have to change our metadata schema.
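The sparse approach outlined above (a required start date, an optional end date, and funders found by following grant IDs rather than being stored directly) might be sketched as follows. Every name here (`ProjectRecord`, `GRANT_REGISTRY`, the `grant:` and `funder:` identifiers) is invented for illustration and does not reflect the actual RAiD metadata schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class ProjectRecord:
    """Hypothetical project record: only title and start date are required."""
    title: str
    start_date: date
    end_date: Optional[date] = None          # optional, as in the discussion
    grant_ids: List[str] = field(default_factory=list)

    def is_open(self, today: date) -> bool:
        # A project with no end date is treated as still running.
        return self.end_date is None or today <= self.end_date


# Hypothetical stand-in for whatever service resolves a grant ID.
GRANT_REGISTRY: Dict[str, str] = {"grant:0001": "funder:alpha"}


def funders_for(record: ProjectRecord, grant_registry: Dict[str, str]) -> List[str]:
    """Derive funders by following grant IDs instead of storing them."""
    return sorted({grant_registry[g] for g in record.grant_ids if g in grant_registry})


dig = ProjectRecord("Example dig", date(2008, 1, 1),
                    grant_ids=["grant:0001", "grant:unknown"])
print(dig.is_open(date(2024, 3, 6)), funders_for(dig, GRANT_REGISTRY))
```

The trade-off is the one raised in the discussion: the record stays small, but answering "who funded this?" depends on the grant ID being present and resolvable.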
There's no chance we've got it all right on the very first try, and we'll see how it works when more RAiDs are being used to the full extent that they can be in the wild. So that's it. Fantastic. Thanks for that, Sean. All right. Do we have any other questions? I see our chat window is going great, so that's nice.
But does anybody else want to ask any questions of our presenters? Or do you think there are other gaps that NISO or community discussions could fill, besides the things that you've raised?
Great silence. You know you've covered everything if there are no questions at all coming up. Oh, Chris. Not a question, but maybe a prompt for discussion. It occurred to me that several of the discussions we've had this evening (the one about ROR versus other identifiers, the one about the Funder Registry, also the presentation about RAiD earlier on, where it was very clear what it wasn't) pointed to the importance of scoping identifiers,
and the importance of really being use case driven. Different use cases yield different, overlapping scopes, which creates messiness in the world. But the world of data is, as I'm sure we all know, always messy. So I'll just throw that out there and see if there are any thoughts from any of the panel members on the challenges of how to scope your PID.
Well, I have thoughts about this, actually, because ROR has an admirably defined use case, which is research affiliation. The research affiliation use case is the primary use case for ROR. And what's wonderful about that is it's very focusing for us, and it leads to a very clear message for other people.
That being said, people use ROR for all kinds of other things, and it's not as though we're going to forbid that sort of thing. For instance, I've found it really notable how much what I think of as metascience, sort of scientific analytics, what you might call bibliometrics type of stuff, informetrics, those fields are thrilled that something like ROR exists, because they just need it for analysis.
Now, often what they're analyzing is which institutions these researchers are affiliated with, right? So it's sort of a follow-on, a knock-on from that affiliation use case. But it isn't the primary use case that we imagined ROR being used for. So I think that's a good thing. I mean, let a thousand flowers bloom, by all means. But it does help to have that really focused idea of what ROR is for, which rhymes.
So, one thing that we didn't talk about, but it's relevant to this in terms of what's happening right now: we talked about Crossref Funder ID sort of merging with ROR. Matt in his talk mentioned that DataCite is running the ROR infrastructure. We talked about the RDA instrument identifier sort of moving into the DataCite infrastructure.
Matt also mentioned IGSNs, which are International Generic Sample Numbers. IGSNs have existed for quite a long time, maybe 15, maybe even 20 years. There are already a lot of them in use all over the world, and they're moving into DataCite. Matt also mentioned in his talk that DataCite currently has 28 different resource types, and we know that we're adding at least one, instrument.
And there are some discussions among some people about adding project as a possible resource type in DataCite, and how that works with RAiD will be interesting to see. But in terms of scope, DataCite has, I think, the broadest scope of sort of the big three of ORCID, Crossref, and DataCite. And I think that's an interesting characteristic of it.
And Matt in his talk also expressed an interest in other PID infrastructure providers sort of thinking of merging, jumping on the DataCite bandwagon. So I think that there is quite a bit of convergence in the infrastructure layer, and I think we're sort of... ORCID. And Chris, do you know how many ORCIDs there are right now?
I don't have that number. But Jane yesterday mentioned something like 130 million DOIs in Crossref, and DataCite's in the 30 to 40 million range. ORCIDs, probably in the 50 to 60... I'd guess about 15 million. 15? Oh. I guess that's more than I thought. But, you know, there's a lot of convergence. And in my talk, I talked about the difference between identifier metadata and metadata for things.
And that's an important thing to keep in mind: DataCite is really identifier metadata. It's not really project metadata. It's not genomic metadata, it's not climate model metadata. It's metadata for those identifiers. So keeping that infrastructure simple and the metadata schema simple is something that I like. And that addresses the sort of question that Chris brought up about scope.
As long as you have a "has metadata" relationship, the identifier metadata can stay simple. Maria, you've got your hand up. Yeah, hi, everyone. Sorry I'm late. Enjoying this discussion.
I guess just to pick up a little bit on what Chris was just talking about and what Ted was just talking about, and Kelly's very helpful comment in the chat, there are kind of two themes that I think I'm hearing as this conversation comes to a close. One is really about the importance of what we're seeing emerge as these core open infrastructures that provide open PIDs with open metadata, to help create these foundational components and foundational layers to be able to understand really important things and do important things in the scholarly communication landscape.
And so DataCite is a really good example of being able to provide that kind of infrastructure, in a lot of different ways, for a lot of different constituencies. But it's really not operating alone, in the way that ROR is not meant to be in its own little silo. A ROR ID is really designed to be part of DataCite metadata.
It's designed to be part of ORCID, and it's designed to be part of Crossref metadata. And so recognizing how these pieces come together is really key, as is the importance of supporting their sustainability models for the long term, because it's not just about supporting ROR, it's about supporting all of the different use cases that depend on these infrastructures.
And we all benefit when they are successful. And the other thing is just understanding that, as I think the session has addressed, these different PIDs are fulfilling different functions and are designed to operate in different ways. A researcher individually is maybe registering their own ORCID, but that's a little different with ROR, because an organization is not maintaining its own organizational profile.
And a DOI is this kind of overarching container that may contain other PIDs in it. And so it's important to be a little nuanced in the discussion about the scope of what these individual PIDs mean, and then collectively understanding how they're meant to come together. So those are just a couple of things that stood out from the discussion.
I'm not sure if I really addressed your point specifically, Chris, but I appreciate your bringing it up. Thanks for that, Maria. Do we have any other comments or questions? I just wanted to mention, going back to some discussions about the Funder Registry, that I did look up the statistics. Our really wonderful curation lead has done some, you know, amazing reconciliation of the Funder ID with the ROR registry. And talking about things that should be publicized more,
I would love to do some sort of campaigning around this. But he had found that for works in Crossref that assert a Funder ID, 90% of those have a corresponding ROR ID already. So then there's that remaining 10%. Some of them don't have ROR IDs and some might not have any IDs. But I think that 90% is really good. It's actually 89.88%. You know, it's like 15 million works in Crossref or so that have that Funder ID, and more than 13 million of those have something that could easily be assigned a ROR ID.
So, as always, it's that last-mile problem, because you need all of it to have a correspondence, but there's a lot of correspondence already. So then it's just doing all of that really sort of tedious work to make sure that we get to 100% before we do any kind of work to sunset the Funder Registry. And you're really right, Howard, that we definitely need to keep the Funder Registry around until everything is really solid and there's a really solid transition plan for everybody who does use it, because there are a lot of people who use that registry.
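To put those figures side by side, here is a small back-of-the-envelope calculation. It assumes the rounded numbers quoted in the discussion (about 15 million works asserting a Funder ID and an 89.88% match rate); the exact counts were not given, so the results are only approximate. The function name `coverage` is just a label for this sketch.

```python
def coverage(total_works: int, match_rate_bp: int) -> tuple:
    """Split total_works into matched and unmatched counts.

    match_rate_bp is the match rate in basis points (89.88% -> 8988),
    which keeps the arithmetic in whole numbers.
    """
    matched = total_works * match_rate_bp // 10_000
    return matched, total_works - matched


# Rounded figures from the discussion, not exact registry counts.
matched, unmatched = coverage(15_000_000, 8988)
print(f"matched: {matched:,}  still to reconcile: {unmatched:,}")
```

Under these assumptions roughly 13.5 million works already have a corresponding ROR ID and about 1.5 million remain, which is the "last mile" referred to above.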
Amanda, I think you mentioned 80 integrations of ROR. Are you aware of anybody that's integrating ROR as a funder ID at this point? Because there are a lot of organizations that use the Funder ID a lot, and so they only really... Yeah, and by the way, I did want to clarify too: we know of about 80 integrations, and not all of them are listed among our public integrations.
Some of them are still in progress. Some of them we haven't really looked at. But to answer your question, I'd have to look into that. I know for sure that people have asked me, when I am referencing a funder, should I use a Funder ID or should I use a ROR ID? And my answer is, do whatever makes the most sense for you, because the Funder ID is not going anywhere any time soon. But eventually those two will merge, and there will be a one-to-one mapping.
That's the whole point. One thing that I do know, Ted, is that the really nice software ProposalCentral is using ROR IDs. When funders are soliciting funding proposals from people using ProposalCentral, those entities are identified by ROR IDs, possibly also Funder IDs. I'd have to look into that, but I know for sure they are using ROR IDs.
Now Howard's got his hand up. Howard? So, following on Ted's point: I think now, having this information from Amanda, we could then say, OK, let's look at all the places where Funder IDs are being captured, right? It's in the manuscript tracking systems, it's within the publication workflows, it's within grant systems.
There's a whole host of areas. And this is what I mean: we need to get this announced and then begin the long transition, right? It's going to take quite a while for this to happen. So I'm all about helping you make this happen. But let's make the announcement, right?
Raise the flag, and then let's start getting the wheels in motion. Yeah, no, I agree. I think the announcement that this work is happening has been filtering out. There hasn't been a big announcement. I don't think we're ready to announce. I mean, the idea that the Funder Registry is going away is not secret, but there hasn't been an announcement of it, partly because of that last 10%, right? We still need to finish a lot of analysis before I think we're ready to even start saying it's the beginning of the transition.
You know, it's still pre-transition, but Crossref is committed to doing that. We're just at the very beginning of making that happen. But yeah, I agree. Let's do some communications around it. I think that's great. We also just don't want to freak people out too much, you know, because if you're using the Funder ID, that's fine, that's great.
Please continue to do that while we continue to make sure that you can transition away from it smoothly, because that's the whole point: you want to be able to do it easily. Sorry, Sophie. Don't freak out. Don't freak out, Sophie. It's fine. Howard, Howard.
One of the places I've worked is NOAA, for a long time, and people call NOAA a lot of different things, as you know. So it might be nice, in some of the searches where people are searching for a funder and they put in an ID, to treat the RORs for NOAA and the Crossref Funder IDs like synonyms, like you would treat a synonym in a regular search system, so that if someone searched for the ROR that has an equivalent Funder ID, the search engine would also search for the Funder ID, because the discovery side of this is going to be important along with the input side. So maybe in something like CHORUS, which has a lot of funder-oriented stuff, it might be interesting to think about synonyms, or they're already thinking about it.
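The synonym idea suggested above could look something like the following minimal sketch. The identifiers (`funder:100`, `ror:abc`), the record layout, and the two helper functions are all made up for illustration; no real search system or registry API is implied.

```python
def build_synonyms(pairs):
    """Map each identifier to the set of identifiers considered equivalent."""
    index = {}
    for funder_id, ror_id in pairs:
        # Merge any groups the two identifiers already belong to.
        group = index.get(funder_id, set()) | index.get(ror_id, set()) | {funder_id, ror_id}
        for ident in group:
            index[ident] = group
    return index


def search(records, query, synonyms):
    """Return titles of records whose funder matches the query or a synonym."""
    wanted = synonyms.get(query, {query})
    return [r["title"] for r in records if r["funder"] in wanted]


# Hypothetical data: the same funder recorded under two identifier schemes.
records = [
    {"title": "Paper A", "funder": "funder:100"},
    {"title": "Paper B", "funder": "ror:abc"},
    {"title": "Paper C", "funder": "funder:200"},
]
synonyms = build_synonyms([("funder:100", "ror:abc")])

print(search(records, "ror:abc", synonyms))
```

Searching by either identifier returns both Paper A and Paper B, so discovery keeps working while records are gradually moved from one scheme to the other.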
Thank you. Of course. Great, guys, and thank you very much for coming to this session and participating in this conversation. It's been pretty good. It's gone on for a while now, and we've had some really interesting discussions. Once again, thank you all, especially presenters Matt, Ted, Amanda, and Sean.
I hope to see you all at other sessions at the conference. So now, from all of us here, it's bye from this session, but I shall see you all online at some stage during the conference. All right, guys. Bye. And thanks, Nitti. Thanks, everyone. Bye.