Name:
Data and Software Citations
Description:
Data and Software Citations
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/df7aef55-ff95-4a81-b412-330de6da2f19/videoscrubberimages/Scrubber_1.jpg
Duration:
T00H23M46S
Embed URL:
https://stream.cadmore.media/player/df7aef55-ff95-4a81-b412-330de6da2f19
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/df7aef55-ff95-4a81-b412-330de6da2f19/Data and Software Citations .mp4?sv=2019-02-02&sr=c&sig=gctiEcBANTQcoqLyQQoTD1jX%2BIS6%2FjoIUXGzPfW5rwA%3D&st=2024-12-10T07%3A44%3A25Z&se=2024-12-10T09%3A49%3A25Z&sp=r
Upload Date:
2024-03-06T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
So hi, everyone. My name is Sandy Hirsch, and I am the moderator for this session. And I see we have our two excellent speakers, Shelly Stahl and Patricia Feeney, with us. And we're so grateful to both of them for preparing such informative and thoughtful presentations. And you saw me pestering you during the session to complete the one-question survey.
And if you haven't done that already, then I will just put that one more time into the chat. And so you'll have your last chance to complete that. But I'd like to first ask Shelly and Patricia: is there anything you wanted to say as we head into this discussion, or would you like to talk about the results of the survey?
I'll turn it over to you for a second and then open it up to any questions from the group or any discussion topics that folks would like to have. Patricia, what we provided to everyone is incredibly detailed. And there's no doubt about that for anyone who actually wants to dig in, and please do, please do dig in.
I think it's fair to say that there's a number of folks that support the two of us. So I certainly didn't write the paper by myself. There's an entire team. There's an entire national task force. Patricia was part of it. And if you need support understanding it, we can either answer the questions or find you the right people.
Yeah, absolutely. Yeah, I guess I can show the results of the survey just real quickly, just to kick off a discussion. That would be great. Or I have it up too, if you'd like me to share it. OK, great.
And thank you to those of you who participated; there was a jump in the responses. So I appreciate that. Yes, thank you. All right. Yeah, so there are two questions. The first one: do you support data citation within your organization?
And that's very exciting. I think this is great. This is very scientific. So we should assume most people are supporting data citations within their organization. We've got a few people who would like to, and a few people who aren't sure. I think Shelly gave some really great tips on how to figure that out.
And then one person said no. And then we have a word cloud: the biggest barrier to data citation adoption. I don't think it's a big surprise that the largest word is education. Some of these other ones are interesting. We'll have to look at them.
Researcher education, researcher willingness, friction between systems. I think those are all very true. So yeah, no big surprises, but this is all, I think, a lot of good information to start the discussion. Excellent.
Thank you so much for sharing that. That was interesting, and I had missed the second question. So, a two-question survey anyway. So I'd love to hear your thoughts. Were any of those barriers surprising? Or are there any of those barriers (and I'm posing this to the broader group) that you'd like to talk about, or potential ways to overcome those barriers?
I'd love to hear your thoughts on that or on any other aspect of the information that they provided, which was very detailed. I see that Tracy just posted that she thinks she misread the question.
Interesting. I'm happy to share some of the behind-the-scenes that happened within AGU and our process. We're partnering with Wiley, and Howard said earlier today (and thank you, Howard, for this), Howard Ratner from CHORUS, that the Earth and space sciences are very forward-leaning when it comes to making sure that we include data citation, which reflects a lot of work that's happened within our organization and our partner organizations.
So imagine I'm doing a lot of outreach. The last few years I'm saying to student groups, I'm saying to people very quickly, if you cite your data, you're going to get credit, meaning the automated attribution will take place. And I didn't understand how it happened. I just assumed it was happening. I didn't go check to make sure it happened.
And there was a moment in time where, I think it was Rachael Lammey, Patricia's colleague, we were at a meeting and I said, Rachael, I really need to understand how we know that this is actually working. And we were sitting next to each other in a session that was active. Like, she and I couldn't talk. We had to write on a piece of paper, and she showed me how the API worked, wordlessly, right?
So silently. I gave her 10 days of our papers to look at, and I'm thinking, oh, I'm going to learn. This is going to be great. I'm going to see how it works. I'm going to be able to take screenshots of what this looks like. All 10 failed. None of them made it through the process into the downstream services that Crossref represents with the machine-readable piece intact, such that automated credit could take place.
All 10. I'm losing my mind. I'm telling people they'll get credit and it is not working. And that's not even the beginning of my horror story. It turns out that every single publication from the last five-plus years had no data citations that made it through. None. So if you're all thinking some of yours are making it, I need you to go look. Because what we uncovered is many issues all the way through the process, and the document that we've provided gives you the clues to go look.
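The kind of spot check described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used at the meeting: it assumes the public Crossref REST API at `https://api.crossref.org/works/{doi}`, whose work records carry a `reference` list when the publisher deposited references, with a `DOI` key on entries where a DOI was supplied or matched. The function names and the sample DOIs are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Public Crossref REST API endpoint for a single work (assumed to be reachable).
CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_crossref_work(doi: str) -> dict:
    """Fetch the Crossref metadata record for one article DOI (needs network)."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi, safe="/")
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["message"]


def summarize_references(work: dict) -> dict:
    """Count the deposited references and how many of them carry a DOI."""
    references = work.get("reference", [])
    with_doi = [ref for ref in references if ref.get("DOI")]
    return {
        "total": len(references),
        "with_doi": len(with_doi),
        "dois": sorted(ref["DOI"].lower() for ref in with_doi),
    }


def missing_data_citations(work: dict, expected_dataset_dois: list[str]) -> list[str]:
    """Return the dataset DOIs you expected to see that are not in the record."""
    found = set(summarize_references(work)["dois"])
    return [doi for doi in expected_dataset_dois if doi.lower() not in found]
```

To try it, pass the DOI of one of your own articles to `fetch_crossref_work`, then give the result to `missing_data_citations` along with the dataset DOIs the authors cited. Anything returned is a citation that did not survive the trip from manuscript to Crossref.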
So is your policy supporting it? Like, does it exist to start with? Do your copy editors know what to do? Is it in the right format? Are you telling the authors the right thing? And then all the way through to the third party you're using, the platforms you're using: do they get it? And because we had changes in the last year, the probability of you having at least one major issue is very, very high.
So this is an imploring. Please dig in. Get your feet dirty and find out. So we do have a question that was typed into the Q&A, and it's from Fred Atherton, and it says: Thanks for your talk, Shelly. I wonder if RRIDs are well used in the content at AGU and, if so, what protocols you've adopted for these, or any thoughts on what opportunities or problems these present in trying to ensure that data and software citations are properly captured and supported.
So my understanding of RRIDs is they're used primarily for physical samples. So Fred, if you could confirm that that's what your understanding is. And so one of the challenges with persistent identifiers is that our infrastructure favors the DOI. We know that other identifiers are being used, for example RRIDs, and that they have value, such as accession numbers.
We know all of that is true, but there isn't an answer yet for that. What we do have is investment in the DOI schema. And a lot of what I say, just brace yourself, Fred, curl your toes if needed: we're not processing DOIs correctly yet. So this paper is all about getting that right. So if you all could read it,
tell me you checked, and tell me it's working. And by the way, there are numbers we can check to make sure that you're right, because there's folks that are counting it on the other side. And if that's true, then let's go dig in and add in the other persistent identifiers. So, yeah, Fred, I'm with you. But let's get DOIs right. Great.
Thank you. Did you have something you wanted to add? Charles, I'll put the link in again. Yeah, I think we covered it, basically. OK. Excellent. Other questions or comments? We are in the process of submitting the paper for peer-reviewed publication.
So once that happens, of course, if you can all just tune in. There are several dozen coauthors from across multiple journals. If you don't see yourself there, please don't feel like we left you out. So I see that a question was just posted: what happens in Crossref to articles that use JATS tagging that is not currently supported?
Yeah, so any of the JATS: we have our own specific XML that we ingest, and JATS is converted into that XML. So anything that we don't support is just lost. That doesn't make it to us. That stays in the JATS file, unfortunately. OK, does that answer the question?
It's probably something worth confirming. I mean, just because Crossref doesn't map it doesn't mean it should be mapped. So I've watched Patricia give this talk a couple of times now, and some of that information can come out other ways. Some metadata is being captured by the folks that have registered the persistent identifier. So I don't think we should just assume that just because Crossref doesn't map it, it's actually needed.
It's probably a conversation. So Mary Beth was just indicating, what could we do? That might be a really neat conversation to have. Do we actually need it as journals? Yeah, yeah. I mean, JATS and Crossref aren't exactly the same, because we serve different uses. So we're not going to collect everything, but we'd like to collect things that people find relevant and especially things that they're able to provide.
So there's a question from Howard. Is there an automated way for publishers or Crossref to capture a dataset already registered at DataCite? How do you mean, capture? DataCite has APIs. So they do have tools, so I'm not sure what you mean. The metadata?
Howard, CHORUS does that. So no, no, no, but in advance would always be better. I mean, CHORUS gets it at the tail end of things, right? So if a researcher, through a repository, had already registered a DataCite DOI, and then when something was either submitted to Crossref or submitted to a publisher, was there a way to actually capture that metadata directly out of DataCite, so that it gets captured upstream and people like us don't have to do it at the tail end?
That's what I mean. That's a great question. Yeah, I suspect the answer is no, but I just wanted to ask. No. I mean, someone has to be making that connection, either in a Crossref metadata record or, we currently have the Event Data API that makes those connections in some way, but it's not really comprehensive. We don't do any automatic extracting or matching with the DataCite API. OK, great. Thank you.
And I see that Joe has his hand up, so please go ahead. I'm from Crossref. I just wanted to clarify the question, Howard, actually. Are you saying that in the DataCite metadata they would have put in the reverse citation?
Are you saying that DataCite would say this dataset is connected to this article, or are you not even saying it would be that much metadata that you'd be expecting to find? Great question, Joe. Not sure. I think the answer was that at the time the repository posted the dataset into DataCite, it wouldn't have known about the publication yet.
Cool. So in some ways those connections are supported, but they have to be established by someone. So there's a really neat tool that DataCite built. There's a little bit of money from one of my projects. Most of the money came from the European Commission, and it's DataCite Commons. I'll give it to you.
And it links the persistent identifiers. So if it was in one or the other, it would make the connection. They did a lot of work. And I'm sure that Crossref was part of these conversations, but they did a lot of work to get their metadata cleaned up in order to do all of this. And they include all the major persistent identifiers, so DOI, ORCID, ROR, and they also just recently pulled in the repository lists. So if the data is in a particular repository, they have that hooked in as well. It's really neat to explore that, but it's not quick. It's a heavy lift because they're pulling in so many different separate datasets. Metadata sets.
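For the question about pulling a dataset's metadata out of DataCite, and whether that record already points back at the article, a hedged Python sketch follows. It assumes the public DataCite REST API at `https://api.datacite.org/dois/{doi}`, where the record's `attributes` include a `relatedIdentifiers` list with `relationType`, `relatedIdentifier`, and `relatedIdentifierType` fields. The function names and sample values are hypothetical, and as noted in the discussion, the repository often will not have known about the publication when it registered the dataset, so an empty result is common.

```python
import json
import urllib.parse
import urllib.request

# Public DataCite REST API endpoint for a single DOI (assumed to be reachable).
DATACITE_DOIS = "https://api.datacite.org/dois/"


def fetch_datacite_attributes(doi: str) -> dict:
    """Fetch the DataCite metadata attributes for one dataset DOI (needs network)."""
    url = DATACITE_DOIS + urllib.parse.quote(doi, safe="/")
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["data"]["attributes"]


def related_dois(attributes: dict) -> list[tuple[str, str]]:
    """List (relationType, DOI) pairs that the repository recorded for this dataset."""
    pairs = []
    for item in attributes.get("relatedIdentifiers") or []:
        if item.get("relatedIdentifierType") == "DOI":
            pairs.append((item.get("relationType", ""), item["relatedIdentifier"].lower()))
    return pairs


def links_to_article(attributes: dict, article_doi: str) -> bool:
    """True if the dataset record already names the article DOI as a related identifier."""
    return any(doi == article_doi.lower() for _, doi in related_dois(attributes))
```

The comparison is a plain lowercase string match, so a related identifier stored in URL form (for example with an `https://doi.org/` prefix) would need normalizing first; that step is left out to keep the sketch short.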
Yes, go ahead, Charles. So something flew by in Patricia's talk that left me confused. It was something about URLs not being PIDs, and that we need to use URIs. No, no, I don't... Sorry if I gave that impression. Oh, OK. Because it was like,
What? No, I think I was discussing... Well, we're not exactly clear on what we do, but I'd like to support other identifiers in Crossref metadata. And one of those would be a URL that you could provide in an identifier tag.
And JATS kind of handles that differently, and they have a specific way you can tag a URL in a citation specifically. So I think I was making the point that you would map that URL to an identifier in the crosswalk, but maybe I was not clear. And in fact, for our purposes, PIDs that are URLs are convenient, in spite of issues with keeping URLs persistent forever.
Yeah, definitely. But they're easy to resolve. You just follow it. Yeah, great. Great. I see Howard has his hand raised. Go ahead, Howard. Yeah, so this question is for Patricia. I think I heard you during your presentation saying there was a new API coming.
Do you have an update on that? Yeah. It's not ready yet. And it's still... I don't know if Joe wants to add anything, but he's got more current information as of yesterday. So I'm going to say there's nothing to add, but we are interested in hearing from people who want to hear about it. So I guess just, please... Tricia? Yeah.
Tricia, your contact details are... Right, definitely make contact with us, because we're looking for people to bounce it off. OK, thank you. Great. And there was a question from Kelly, I think, that was typed into the Q&A, and it was for Patricia. And I apologize if this is what we were just talking about, because I was looking at this question. I think you mentioned that Crossref is considering moving away from structured references to prioritize the unstructured citation and DOI subproperties.
Could you speak to the motivation behind this change? Well, I feel like we've just done a lot of testing recently between unstructured and structured citations and have found that currently the matching process is working very well with unstructured citations. And we've heard from a lot of our members that marking up the citations is a very heavy lift, and a lot of them just plain don't.
But I think that's still under consideration. We don't have a clear recommendation yet. But I will say, ideally, in a perfect world, every citation would have an identifier and we wouldn't need to worry about the rest of it. Yeah, let me footstomp what Patricia just said. So AGU's experience: if we send an unstructured dataset citation to Crossref,
about 50%-ish of the time they'll figure out that it's a dataset, which for us is not good enough. And, you know, to their credit, they're trying. But I've got to tell you, when the title of the dataset has the name of the journal in it, then figuring out it's a dataset is like, no way. Like, that's not happening. So, you know, because we have to do the work to figure out whether it's a dataset or software on our side,
let's tell them, and then none of us will have a problem. So our new guidance, in the paper that you have the link for, uses the unstructured tag, which means whatever the author provided, give them the whole thing, and then give them the persistent identifier. And at the moment we just have a DOI tag. To the point that a person had made earlier,
that we need other persistent identifiers to be part of this process: we'll do that tomorrow. Today we'll fix the DOI. Tomorrow we'll do the rest. OK, Joe does have his hand up, so go ahead, Joe.
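The guidance just described (send the author's full citation text as an unstructured citation, plus the persistent identifier in a DOI tag) might look like the following when building a deposit. This is a minimal sketch, not the paper's or Crossref's reference implementation: it assumes the `citation`, `doi`, and `unstructured_citation` element names and the `key` attribute from the Crossref deposit schema, omits namespaces and the surrounding `citation_list`, and should be checked against the schema version you deposit with. The function name and sample values are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_citation(key: str, author_text: str, doi: Optional[str] = None) -> ET.Element:
    """Build one citation element: DOI tag (if known) plus the author's full text."""
    citation = ET.Element("citation", {"key": key})
    if doi:
        # The persistent identifier is what lets downstream services link reliably.
        ET.SubElement(citation, "doi").text = doi
    # Whatever the author provided, passed through whole rather than marked up field by field.
    ET.SubElement(citation, "unstructured_citation").text = author_text
    return citation


if __name__ == "__main__":
    example = build_citation(
        "ref12",
        "Doe, J. (2023). Example ocean temperature dataset [Dataset]. Example Repository.",
        "10.5555/example-dataset",
    )
    print(ET.tostring(example, encoding="unicode"))
```

When no DOI is available the sketch still emits the unstructured text on its own, which leaves the record relying on text matching, the case described above as working only about half the time for datasets.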
Yeah, it's kind of a question to follow on from the structured, unstructured thing. From the little I know about the tendencies in dataset curation, like getting fine-grained, like "this is a version of that" kind of information, datasets can be much more thinly versioned and sliced than articles. So my impression is that
even if you were trying to match to a dataset, that's maybe quite error-prone compared to matching to an article: where there may only be one article that matches, there may be many slices. So yeah, that might have turned into more of a comment. Sorry. But I imagine matching is different between literature and datasets, so that kind of underlines the PID thing.
OK. I wasn't sure if Shelly or Patricia had anything to say about that. If not, I did want to mention that one of the goals of these sessions is to identify if there is anything for NISO,
if there's a NISO project or anything that you'd like to see come out of this. And Mary Beth had posted a link to a Google Doc to capture those kinds of ideas. But if there's anything you'd like to share, or any ideas that you have related to what NISO's role might be for the project, I'd love to hear any thoughts about that as well, or any other further comments.
I believe we have until 15 after the hour for this session. Any further comments, questions, ideas? I am not seeing any. But you have access to the Google Doc, so you can always add ideas or questions that emerge later.
And one last option to share something. Otherwise, I'd like to thank you all for this great discussion. And I'd like to thank our excellent speakers, Shelly and Patricia, for all of their knowledge and knowledge sharing, really providing a good, rich discussion. So thank you to everyone.
I hope you enjoy the rest of your time at the conference. Bye bye. Thanks so much, Sandy.