Name:
Data and Software Citations. What you don’t know CAN hurt you. Recording
Description:
Data and Software Citations. What you don’t know CAN hurt you. Recording
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/30d97d9c-24fc-4f55-8af5-1debf8116e4e/videoscrubberimages/Scrubber_1.jpg
Duration:
T00H42M51S
Embed URL:
https://stream.cadmore.media/player/30d97d9c-24fc-4f55-8af5-1debf8116e4e
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/30d97d9c-24fc-4f55-8af5-1debf8116e4e/Data and Software Citations What you don%e2%80%99t know CAN hurt you.mp4?sv=2019-02-02&sr=c&sig=Fpgos3zIc%2BKo2myEsccYKstI6C3AAA0BtjoGgSJ60X0%3D&st=2025-01-22T04%3A34%3A16Z&se=2025-01-22T06%3A39%3A16Z&sp=r
Upload Date:
2024-03-06T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
Hi, and welcome to this session.
Data and software citations. What you don't know can hurt you. I'm Sandy Hirsh. I'm the associate dean for Academics in the College of Professional and Global Education at San Jose State University. And I'm also a member of the NISO Plus conference planning committee, and I'm really pleased to moderate this session today.
The focus of this session is on data and software citations and the difference in the production process with guidance to make necessary corrections. This work has been led by the Journal Task Force of the FORCE11 Software Citation Implementation Working Group. We have two excellent speakers who will address this important topic. First, we have Shelley Stall, who is the vice president for the American Geophysical Union's data leadership program, and she works with AGU members, their organizations and the broader research community to improve data and digital object practices with the ultimate goal of elevating how research data is managed and valued.
Our second speaker is Patricia Feeney, and Patricia is Crossref's head of metadata, and she coordinates all aspects of metadata at Crossref, including strategy and overall vision, review and introduction of new content types, best practices and community input. During her 15 years at Crossref, she's helped thousands of publishers understand how to record and distribute metadata for millions of scholarly items.
Welcome again to this session. So we're going to go ahead and get started so we can hear from our speakers, first with Shelley Stall. So thank you. Take it away.
Sandy, thank you so much. I am delighted to be here and what an honor it is to speak at NISO Plus again.
So the topic here is very provocative for you all. But it does demonstrate that there were a number of clarifying moments when publishers came together to figure out how to make sure data and software citations are making it through, all the way, in the process. So we want to make it easy for you. We've actually prepared a preprint for you to take a look at, and I'll have those links in this presentation towards the end.
And then Patricia will help you understand further what all of this means for your own journals, and some of those details. So I'm excited to have her with me. So citations, we consume them in two different ways: one is by authors, clearly, using their eyes, and the other is the machine readable version, which has the bits that need to be set up correctly. So this is my humorous way of showing you we need to pay attention to both things that we can see and things that are harder for us to see.
And we need tools to take a look. And very quickly walking through them: you are all experts in this. I won't take a lot of time on this, but we know that for humans we have a bunch of publishers and a number of style guides for citations. There's a lot of understanding within the research community on exactly how to navigate those and use the information there to hunt things down.
And we even have persistent identifiers that you can use to click and search. But the machine readable version makes it much easier to connect, link, hunt down, and we can use tools to help us. And this is where the benefit comes in. So, heads up. I actually have three problem pages. So, starting at the very beginning. There are different criteria for data and software citations that are different than journals, and a lot of publishers aren't aware that they actually need to pay attention to this.
And what we have known in the past has actually changed and improved. So you have an opportunity to actually do a lot of benefit for your researchers. And this presentation essentially is trying to make you aware of this. So another problem: by not having data and software citations, we're actually treading into a space of ethical challenges and transparency challenges.
So we really want you to pay attention to this, so that you're not in that space with us, or you come out of that space and join us. So this is an opportunity for you to make sure that you're doing the best you can with the type of research you have, with the type of expectations you have in your community. And remember that we do have requirements through US funders, through European Commission funders and others that are requiring that data and software that support research are, in fact, reported. And for us as journals, as publishers, this is something we need to do to support this.
And yet one more. It can be super frustrating. There's not good guidance for us on what to do, and I'm hoping that with the work that we've put in from the FORCE11 team, the journal task force, we can actually provide you with that information and the work gets easier. I'm not saying it gets easy.
I'm saying it gets easier. So the culprit: we're doing really great with human readable, yay us! If those citations make it into the paper, they very commonly look fantastic coming out in the publication. And I do say if they make it into the paper because some of us are still challenged there, and I'll walk through that a little bit, and challenged for many good reasons.
I realize the culture change is rough, right? So it's really the machine readable I want you to pay attention to as well. And we'll walk through what that is. So I said it a moment ago, and I'm saying it again here in the slides: you need both. Human readable has to be there, right? Like, people have to be able to read those references and citations, as well as the supporting availability statement, as well as the mentions within the method section, within the data section.
But it's the machine readable that gets you the automation, and we're taking advantage of that using tools that exist in order to make it really easy for researchers to get automated attribution and credit. This is what we're here for. We're here to connect existing research to new research, to past research, through all of these persistent identifiers. Machine readable, machine actionable information.
So join us. Deep dive. Note the background has just changed from lightning on the problem pages to scuba diving and snorkeling, so join me on the deep dive. I do come from the Earth and space sciences, so I can use this in humor. Wait till you see my world pictures as we look back on the Earth.
It's fun to come from AGU. We do have some really awesome pictures and awesome researchers. So we talk about policy; publishing staff, editors and reviewers; that peer review process; and then really dig in on copyediting citations, production markup, content management and publishing. And it's those last few pieces that I'm going to dig into. Many of you may not be super familiar with them in your own organization or the services you use, so we're giving you all the information you need to take back to those who are "in the know" so that they can spin up and get the support that's necessary.
So realizing that not all of us are experts in all things when it comes to publication and production. So, fun scuba diving, right? Join me on the deep dive. So policy. There have been talks previously here at NISO Plus and in other places walking through policy frameworks.
And I just want to hand you the one that I like the most: work that's been done in RDA, the Research Data Alliance, helping you to walk through your own journal policies and giving you a tiered solution so that you can start helping the culture change with your researchers. Here is just some quick information. The policy is particularly focused on data, but it really translates nicely for software.
I give you this citation right here if you want to know more. These folks are amazing and they can help you navigate and it gives you not only what you need to do, but why. So you can use that to go back to your own publication committees, to your own potentially volunteer editors. If you're a society, talk to your members about why this is important. So it's a lot of really great content. So also, I didn't put the link here, but as soon as I finish, I'll pop it into the chat.
CHORUS provides a list of data availability statement policies from journals as well as software. And these are good examples of folks that are moving towards implementing changes like those identified in this framework. So yeah, CHORUS, they're a really great partner for many of us, hopefully all of us. But if it's not all, it should be many. So let's move on.
So you've got your policy in place. And then your publishing staff, editors and reviewers. We need to make sure that it's really clear what's actually required versus what is encouraged. And so it's important to be flexible because not all things are possible for all reviewers. The infrastructure is not always there. The leading practices aren't always clear. And by the way, this is Noah, and he clearly just ate his treat that I gave him for this particular moment in time.
So we'll continue to give him treats. So the areas of support that are needed, you need to let people know. You need to provide them with training, what needs to be checked, what needs to be required. Give them examples and FAQs. What's the guidance and is it possible to actually get some sort of escalation process either through your own organization or one that exists from other folks that you're partnered with?
So I'm giving you the one AGU uses. And by the way, it's open and available for you to take and use. And there are many friends of ours, publisher friends, that are actually using this. And we welcome it. We love it. So you're welcome to go take a look.
We're open for any questions that you might have on how we implemented it. Certainly not all publishers have the same challenges that we do. But just to let you know, those in health and bio, we do have some of our journals and our researchers do have that as part of the things that they research. So we have elements in there as well, and we can hook you to folks that are doing it even more.
So moving along: the peer review process. It's really important to create a checklist. So think about your peer reviewers. If they're similar to ours, they're fed up. You've asked us to do all this stuff. We're running out of time. Please don't ask us to do more. And when you hand them data and software to work on, you're actually giving them more work to do.
Right? And what do you do? So we actually made it a lot easier for our peer reviewers, and we've given them things that they need to have access to. So some of them, depending on the research, are going to need different availability of data and software. So what we say is, no matter what, the peer reviewer needs to have access, and should they decide that it's something they need to take a look at, it in fact needs to be available.
So here we have that peer reviewers must be able to have access to the data and the software. They must be able to validate that it supports the science itself and the visualizations, and then they need to confirm that, in fact, those citations are accurate. So there's complexity that comes in here. Right? So you all know, you likely are not getting the full citation when a paper is submitted.
Many of us don't actually require that until the paper is accepted. So even at AGU, we have a workflow where some things are required right away on the intent and some things are actually required at the time of acceptance. So yes, it's complicated, but when you get to the very end, when it's time for peer review, you have to have all of that resolved.
You have to have gotten to the point where all of these things can happen. And, you know, we don't ever intend to mislead people, but we have received links to data that go nowhere or go to something that has a website that has the word data on it. So sometimes that's education for our researchers, or sometimes it's just a misunderstanding of what was being asked for.
And so having all of that clarified for the authors is really important. And then most importantly, what does a peer reviewer do about it? And how do they provide guidance to the author on making it better? So this is really, really critical. So you would think that that was the tricky part, but it isn't. Now, get a new tank of air, and we're going to go deeper in our scuba diving.
Here we go. Note that this is a picture of the world because that's fun to do. From our satellite pictures from NASA. So we have really cool, really cool pictures. So copy editing that citation. So the markup is really important. Why is it important? Well, a data citation and a software citation are different.
They have to be handled differently in the downstream services, which means you can't mark them actually as a journal article. You actually have to mark them as data or software. So how many of us are actually doing that? Take a look at yourselves. I would go with "not many". And frankly, it's really hard for you to. And we did have to make some investments on how to implement those processes and how to get that to work efficiently.
Because we all know adding x number of seconds to copy editing is costly to us. Right? So we get that. And how many things can be automated? Well, you usually have to do it manually before you can automate it, especially something new. So those are things that we did as well, and we're happy to share those stories if you're interested in a conversation. So you have to be able to get them connected.
All right. So in the paper that I mentioned earlier, here's your QR code. Noah is 10 months, by the way. So we have puppy motion going on here in big, big ways. So the QR code's in the bottom right hand side. And Noah says, make sure you get your data citations right and just let me know. That's the message here. So there are three ways, from the work that we did in FORCE11, to know it's a data or software citation.
So number one is if the authors actually described it in the availability statement. That's like the biggest, easiest clue. If they talked about it as being important, then you should have citations. Now, fun fact, especially for data sets: not all data, especially if it's data not owned by the researchers, will have a DOI or be preserved in a way that we prefer.
But ethically, you can't put somebody else's data into a preservation repository. We don't require that of our authors because it's not their data. They shouldn't do that. But we do give them the flexibility to actually link it in the best way possible and describe it fully in the availability statement.
So this is really key. You can always describe your dataset in the availability statement. You cannot always cite it in a way that will render it as a machine readable citation. So that's the difference. Clue 2: the folks that are DOI registries offer content negotiation. So in the paper we give you a lot more detail, including links to what that would be.
But it's essentially using a tool to look up the DOI and determine what type it is: is it data, is it software? Now, there's lots of challenges with this, right? Was it marked correctly? Did the repository actually realize they had to put the type in there, etc.? However, as things improve, this is great. Someone could actually tell you what kind of DOI it is.
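A minimal sketch of that lookup, assuming the doi.org resolver's content negotiation with the CSL JSON media type. The function names and the example DOI used in testing are hypothetical, and the `type` values a registry returns depend on how the repository registered the record, so the mapping below is an assumption rather than a guarantee.

```python
import json
import urllib.parse
import urllib.request

# Media type for CSL JSON, requested through DOI content negotiation.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_request(doi: str) -> urllib.request.Request:
    """Build (but do not send) a content negotiation request for a DOI."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": CSL_JSON})


def classify(csl_record: dict) -> str:
    """Map the CSL 'type' field to data / software / other.

    Assumption: datasets come back as 'dataset' and software as 'software'.
    Records registered with the wrong type will be misclassified.
    """
    csl_type = (csl_record.get("type") or "").lower()
    if csl_type == "dataset":
        return "data"
    if csl_type == "software":
        return "software"
    return "other"


def lookup(doi: str, timeout: float = 10.0) -> str:
    """Resolve the DOI over the network and classify the returned record."""
    with urllib.request.urlopen(build_request(doi), timeout=timeout) as resp:
        return classify(json.load(resp))
```

A copy editor's tool could call `lookup()` for every DOI in a reference list and flag anything that comes back as `data` or `software` for the appropriate markup.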
So that's an option as well. And then the third option, and we're working with our authors on this, is if we can actually get a bracketed description. So that's the square brackets within the citation itself indicating that this is a dataset, software, collection or computational notebook; then that's a visual cue. And we're also using it as a learning tool for our authors. So it comes in real early in author guidance, it's in our prompts for all of our data citations and software citations, and our copy editors in our own production process are using it to determine if something is a data citation.
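The bracketed-description clue lends itself to a simple automated check. The sketch below assumes the four labels named in the talk and a case-insensitive match; the function name and the sample citations in the usage note are hypothetical.

```python
import re
from typing import Optional

# The four bracketed descriptions mentioned above; matching is case-insensitive.
BRACKETED = re.compile(
    r"\[(dataset|software|collection|computational notebook)\]",
    re.IGNORECASE,
)


def bracketed_type(citation: str) -> Optional[str]:
    """Return the bracketed description found in a citation string, if any."""
    match = BRACKETED.search(citation)
    return match.group(1).lower() if match else None
```

For a made-up reference such as `Doe, J. (2021). Example ocean temperatures [Dataset]. Example Repository.` this returns `dataset`; for an ordinary journal reference it returns `None`, which tells the copy editor to fall back on the other two clues.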
So that is something that we're doing. We realize that changing how an author thinks about things is super hard, but we thought this was worthy to do. We do use APA style, so that makes it really nice. APA guidance gives you really good directions on how that comes out in the style guide. And you'll note in the paper that not all citation reference tools actually support data citations and software citations.
So please pay attention to whether what you're recommending to your authors does or not. And you can help me by reaching out to your favorite tools and asking them to make the updates that are necessary for that support. It's causing some unintended consequences. So the more of those tools that are in line, the better; that would be super. All right. Coming to the end.
So production markup. Oh, this is the best, right? Doesn't everybody like a good marked-up journal file, a published paper? Right? Because everything links beautifully and it looks gorgeous. And it all looks interconnected. Well, that takes work and you all know that. So here's where data in-text citations and software in-text citations get marked correctly.
So knowing that they're data, knowing that they're software, is really important. So where do you look? You look in the methods and data section for in-text citations to the referenced data. Yeah, that makes sense. It needs to also be cited within the availability statement. Right, that makes sense. And then remember, those citations are improving every day.
So we actually have updates from ISO on data and software citations as well as JATS4R, which we're going to talk about in a moment. And then finally, content hosting and publication. All right. So if you've been enjoying Noah so far, I want you to pay attention to this particular slide right now. This is new information that has come out from Crossref to make the connections between the data citations, the supporting citations in the paper, more reliable.
And this is rather new. So if you did all of your Crossref work a couple of years ago, you're not going to have the latest. And this is where I need to do, like, blinking lights and cute puppy pictures, which, by the way, I have accommodated you with throughout this talk. So here's what you need to check: walk through and make sure all the metadata is in there for the file that goes to Crossref.
So for those of you who don't know that part of your process: in order to get the DOI for your paper, especially if it's an English speaking journal, you are registering your paper with Crossref to get the DOI, and by doing that you are using their schema, which includes the references, which enables the automation. Haha, right, that's how it works. So in order to make sure that especially the machine readable part is working, that XML going over to Crossref has to be accurate.
Oh, that makes sense too. All right. Well, if XML is like something you can barely spell, go find your people, especially your content providers, and ask the question, am I doing the latest? Give them this paper. Say, are you doing this? And wait for the answer because the probability of "yes" is dodgy.
So make sure you can also work with the Crossref folks to actually look and see if your files are coming across correctly and double check yourself. So that's important too. And please take the time to do this. This really matters for getting people credit, automated credit, for the work, the data that they're preserving. That's part of the research.
That's its own research product, and we do give you all the use cases within the paper for you to use as examples and check your own stuff. And I just talked to you about the fact that the guidance has changed since a few years back. All of the gory details, all of the technical, nerdy stuff that needs to be understood, is all in the paper laid out for you.
And you can hand it to all the right folks. So there's your QR code. There's the old school citation for you. It is a preprint right now, and it should maybe be ready for publication by the time you're actually watching me say these things. But we'll give you the updated information as you're attending NISO Plus.
Thank you so much.
Thank you so much, Shelley Stall, and also thank you for the special appearance from Noah. We enjoyed the special appearances as well. So I'd like to now turn it over to Patricia Feeney, who will give us more information.
So Shelley gave a nice summary of the importance of making sure your markup's correct when recording data citations.
So thank you, Shelley. I'm going to go maybe into a little bit of nauseating detail about that, but I think some of it's important, and I'm going to cover some of how this process has evolved over time and where we think we'd like to go. I'll just kind of set the landscape for all of this. So I'm going to talk about the most common practices in making data citations machine readable.
This will cover marking up data citations in JATS, sending data citations to Crossref and a little bit about how to find them. All right. So I think most of you are familiar with JATS. If you're not, it's a tag specification for marking up journal articles in XML. It's a NISO standard and actively maintained and updated by a standing committee that I'm on.
It's very widely used. It's a specification, so it's fairly open and flexible. And there are a few different versions of JATS, and it's really designed to meet a wide set of needs. JATS is a specification; it's not designed to prescribe strictly how a publication should be marked up. It's there to offer options and to meet the needs of the publishers who need to mark up journal articles in a variety of ways.
So there's also another NISO organization, JATS4R, which stands for JATS for Reuse, that establishes recommendations on how to best mark up JATS for reuse. So you can do your own implementation of JATS that doesn't follow these recommendations if it works for you. But if you want to follow best practices, they will help make your XML reusable and machine readable downstream, which, I think, in this very connected, scholarly world we live in, is a good thing and becoming essential.
So JATS4R has recommendations for citations and specifically for data citations. That's what I'm going to dig into a little bit. This advice can be extended to apply to software as well. This particular recommendation is on its second version. There's a link here, but you can go to JATS4R.org and find the recommendation that way as well. There are a few options to capture the data and references in your file.
It works for JATS version 1.1 and beyond. JATS itself is currently on 1.3, and this particular recommendation was last updated in 2020. So metadata, as we all know, is the key to everything. And the XML and the JSON and all the other things we use are just formats to get the metadata where it needs to go. As Shelley mentioned, one of the weak links in this entire process is the gaps between what is collected, what ends up in Crossref, and what doesn't get entered into a manuscript and marked up.
But in particular, I think many see Crossref as kind of the endpoint for a lot of data citations. And I think we do a lot with passing data citations along, but there's a lot of potential that we all aren't meeting yet. So I'm going to dig into the recommendations. I originally was going to separately go through the JATS4R recommendations and describe what Crossref accepts. But instead I'm going to compare, which is something I do a lot in my job every day:
compare what JATS is doing to what Crossref supports. We've been making a lot of efforts in recent years to be better aligned between JATS and Crossref, so I think this is something we're already doing. It illustrates well the issues that can come in general when you try to integrate metadata that's used for different purposes. So the JATS4R recommendation details specific pieces of metadata that should be collected for data citations.
And they also give recommendations on how to mark them up in JATS specifically, as well as some examples, which are really handy. I know a lot of people prefer, when they're working with XML, to look at the examples and work backwards, which is probably something I'd do if I was doing that for production. So an important piece of metadata in the JATS4R recommendation is, big surprise, the citation type.
So you can flag a citation as a data citation. And you can flag it as a journal article, you can flag it as a book chapter, all these things. So what this means is if you flag a citation specifically as data, it will be read as a data citation. If you flag it as software, it will be read as a software citation. We don't accept publication types in citations at Crossref currently.
The reference metadata that we collect has historically been used to match DOIs registered with us to the citations registered as part of an item record, and publication type hasn't been necessary. We really just kind of match metadata and are type-agnostic to a certain degree with the matching process. That makes it very hard to identify what is a data citation because they often don't scream "I am data" unless you have an XML tag in the markup. So it's hard to find out what needs to be matched with an identifier or an item outside of our corpus at Crossref.
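To make the citation type flag concrete, here is a minimal sketch that reads the `publication-type` attribute from JATS citations. The sample reference list is invented for illustration, and the attribute values `data` and `software` reflect my understanding of the JATS4R recommendations; check them against the current recommendation text before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical JATS reference list; titles and ids are made up.
REF_LIST = """<ref-list>
  <ref id="r1"><element-citation publication-type="journal">
    <article-title>A hypothetical article</article-title></element-citation></ref>
  <ref id="r2"><element-citation publication-type="data">
    <data-title>A hypothetical dataset</data-title></element-citation></ref>
  <ref id="r3"><mixed-citation publication-type="software">A hypothetical package</mixed-citation></ref>
</ref-list>"""


def citation_types(ref_list_xml: str) -> dict:
    """Map each ref id to the publication-type flagged on its citation."""
    root = ET.fromstring(ref_list_xml)
    types = {}
    for ref in root.iter("ref"):
        for citation in ref:
            if citation.tag in ("element-citation", "mixed-citation"):
                # A citation without the attribute cannot be told apart by type.
                types[ref.get("id")] = citation.get("publication-type", "unflagged")
    return types
```

Anything that comes back as `unflagged` is exactly the case described above: a citation that does not say it is data, so downstream services cannot treat it as such.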
We do support data registration, but it's a very small amount of the total data sets in the world that are registered with us. Obviously, DataCite registers many, many data sets. Data citation is new enough that the citation practices are still being adopted and refined, so the metadata we collect isn't really tailored to that yet.
That said, all of the references registered with us are open and available to metadata users, and we've heard from many people that expanding our citation metadata to include a type would be very useful to those who use our reference metadata, particularly users looking for data citations. So we'd like to support that. It's also true that our members often use publication type anyway in their citations, so they already have that metadata in there.
So it makes sense for us to accept it in the submitted metadata deposits. We currently have some delays on our side in updating our metadata schema, but we hope to have support for this in the future. So JATS4R also accepts metadata about how a data citation is used. There are four types that are recommended: supporting, generated, analyzed, not analyzed.
This is just a screenshot from the recommendation that describes in detail what each type is. At Crossref we support data citation by two methods. You can submit them as part of a reference list or submit them as a relationship. For citations considered to be supporting, analyzed or not analyzed, you should submit those references in your reference list with all of the journal article and book citations when you submit your data to Crossref.
This really goes along with the FAIR recommendations that data should be included in reference lists and not considered as its own special category. And the "references" relationship is implicit if it's included in a reference list that you submit to Crossref. But if you want to assert that a data set is specifically supporting data, it should be submitted as a relationship, in addition to or instead of as a reference, so that the "is supplemented by" relationship is recorded and that particular data set is flagged as a supplement.
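As a rough illustration of the relationship route, the sketch below builds an "is supplemented by" assertion with Python's standard library. The namespace, element names, and attribute names follow my understanding of Crossref's relations schema and should be verified against the current Crossref documentation; the function name, DOI, and description are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by Crossref's relations schema (assumption: verify against
# the current deposit schema before use).
REL_NS = "http://www.crossref.org/relations.xsd"
ET.register_namespace("rel", REL_NS)


def supplement_relation(dataset_doi: str, description: str) -> ET.Element:
    """Build a relations block asserting the article isSupplementedBy a dataset."""
    program = ET.Element(f"{{{REL_NS}}}program")
    item = ET.SubElement(program, f"{{{REL_NS}}}related_item")
    ET.SubElement(item, f"{{{REL_NS}}}description").text = description
    relation = ET.SubElement(
        item,
        f"{{{REL_NS}}}inter_work_relation",
        {"relationship-type": "isSupplementedBy", "identifier-type": "doi"},
    )
    relation.text = dataset_doi
    return program


def to_xml(element: ET.Element) -> str:
    """Serialize an element to a string for inspection."""
    return ET.tostring(element, encoding="unicode")
```

The resulting fragment would sit inside the article's deposit record, alongside (or instead of) the same dataset appearing in the reference list.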
So this is usually used for things considered to be supplemental material. So if this seems like a lot, like it's getting super, super granular, be assured that it is optional to make the distinction. Perfect metadata is what we of course all aspire to. But I think in these nascent days of supporting data citation, just getting the data citations out there and connected with a journal article, for example, is what's important.
So even the basics help a lot. It's really important to just make those connections between data and research outputs. So data also has, in the JATS4R recommendations, a specific contributor role. The people who generate data are curators, not the authors or editors that are traditionally included in citations.
So this is where it gets a little confusing. So at Crossref, in our reference metadata, we have a single author field that can be used to supply any person or organization considered to be a contributor that might help with matching the citation to a record somewhere. In retrospect, we should have called that tag something else, like contributor.
But when the initial schema was conceived, we really only needed to collect authors and editors in reference metadata. So our outputs also flag this as author, and it's a little inconsistent and not exactly correct in how things are labeled. But we have no immediate plans to change this in our reference metadata. Of course, we may reconsider this as our reference metadata sees more use and, more importantly, as our members express a need to supply more refined reference metadata.
I don't really anticipate that will happen any time soon. I'll talk about this in a few minutes, but we're moving away from structured reference metadata into unstructured references. So it seems like a lot of work for something that will be not as relevant in the future, but we're always open to feedback. Crossref metadata, reference metadata in particular, isn't typically used for display.
If you retrieve the metadata for a DOI record, you might have the metadata for the item and use that to format a citation. But otherwise that metadata exists to make connections between objects and is targeted at finding a DOI or potentially other identifiers that can make a connection. So it's not as much a priority in a citation to label a curator
as a curator; it's more important at the top item level to make those connections. JATS uses a specific data title field to flag the title of the data set. I think both Crossref and JATS have decided that naming tags so granularly is not the best idea. So that may change in the future for Crossref.
We currently ask that that the title be supplied in the article title field, which is basically within our metadata. the most specific title you can apply to an item again, a legacy of the fact that we originally used to register mainly journal articles. So this is confusing. In the future we might clarify this a bit if we decide to expand support for the reference markup.
Because we do get questions when people are supplying reference metadata. like "I don't see where to put my data title", "I don't know where to put my software title". So that's something that's under consideration. One piece of metadata that I think is important is the name of the holding repository and JATS labels that as source. but there is no Crossref equivalent for that.
It's something we could add. But I think, again, we need to do more work within structured versus structured metadata to see if it's worthwhile to specifically support that. So are data sets published, or they posted deposited, I think they're more posted and updated and created. But it doesn't really matter. What you call it. This is the year that whatever that is happened.
So I think that's something that's very consistent between Crossref and JATS: we call a year a year. In that way we're perfectly aligned, which is nice. It's nice to match up entirely, although JATS does accept and recommend supplying an ISO 8601 machine-readable date if the date you supply is not already machine readable. But we just ask that you supply the four-digit year.
OK, so the repository ID, as JATS calls it; the JATS4R recommendation labels the specific tag as pub-id. I think this is the most important piece of metadata in this whole set of metadata. It's the identifier used by a repository for a data set. So that could be a DOI. It could be something else. JATS accepts within their metadata the type of identifier and who assigns the identifier, which is really important
if you want to, say, distinguish between a Crossref and a data set DOI. You can also provide a specific URL. That's handy for things like, in the example I have on this slide, accession numbers, which do not have a universally applied URL prefix; that varies by the repository. So at Crossref we, of course, accept DOIs in reference metadata, but we do not yet accept other identifiers. I hope to address that in the future, as I think that will help data citations immensely.
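As a rough illustration of the identifier details described here (the type of identifier, who assigns it, and a specific URL), the following is a minimal sketch of building a JATS-style pub-id element in Python. The attribute names pub-id-type, assigning-authority and xlink:href are taken from JATS as understood here; the helper name make_pub_id, the accession number, the repository name and the URL are all hypothetical placeholders, not values from the slide.

```python
import xml.etree.ElementTree as ET

# XLink namespace, needed for the xlink:href attribute.
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)


def make_pub_id(identifier, id_type, authority=None, url=None):
    """Build a <pub-id> element (hypothetical helper, for illustration only)."""
    elem = ET.Element("pub-id", {"pub-id-type": id_type})
    if authority:
        # Who assigned the identifier, e.g. the repository or DOI agency.
        elem.set("assigning-authority", authority)
    if url:
        # A specific URL, useful when the identifier has no universal prefix.
        elem.set("{%s}href" % XLINK_NS, url)
    elem.text = identifier
    return elem


# Hypothetical accession number held by a hypothetical repository.
accession = make_pub_id(
    "ABC12345",
    "accession",
    authority="example-repository",
    url="https://repository.example.org/records/ABC12345",
)
print(ET.tostring(accession, encoding="unicode"))
```

The point of the sketch is only that an accession number needs the assigning authority and URL alongside it to be resolvable, whereas a DOI already carries that information.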
If we have a DOI for data, as a data citation or a software citation or any kind of citation, then, as Shelley mentioned, you can get a lot of information from that DOI's metadata record. You can tell what the object is. You can get all the metadata for that object. You can basically get, if the record is registered correctly, everything you need to know. But for items without DOIs - and there are a number of them - I think it's a little more challenging.
And we need to consider using the existing identifiers, particularly for things like software citation, which doesn't have a culture of applying DOIs to software. So you can also provide a URL within the metadata if something doesn't have a persistent identifier. At Crossref, we don't accept that yet, but we will in the future: we will have a URI option in the identifier field when we do support that.
And of course, data sets have versions. And JATS does accept that. It's another thing that we are currently collecting that we plan to support in a future update. So basically, to summarize, we aren't entirely in alignment, but we may be in better alignment in the future. Some things are in the works, some things are already supported. But with what we have now, we are able to do a lot.
We added support for references and relationship metadata to our Event Data API, which allows us to make these connections between objects and where they've been cited or mentioned. Our metadata is sent to the Scholix endpoint as well, which is used to identify data citations. I'm not going to go into details about this because we have a very exciting new API on the horizon that's more stable and robust than Event Data.
The session is pre-recorded. But if I have more details by our actual session date, I'll drop them in the chat during our session. So I hope I'm doing that. But I may not be. But if I'm not, we'll get you some more information in the future. So I've talked a bit about how structured data aligns between JATS and Crossref, but I have to say that the citation on the screen is currently an ideal citation for us.
It's unstructured, meaning that it's just a text citation. So you can get all of the information from it that's in the citation, but it also has a DOI called out. So we can use that, and downstream users can use that, to figure out the metadata for this item and identify it as a data set and identify the repository, identify the title and all of the other metadata that's important to identify data sets.
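To make the value of a called-out DOI concrete, here is a minimal sketch, in Python, of pulling a DOI out of an unstructured text citation. The regular expression is a simplified assumption rather than Crossref's actual matching logic, the function name extract_doi is hypothetical, and both sample citations (including the DOI under the 10.5555 test prefix) are made up for illustration.

```python
import re

# Simplified DOI pattern: "10.", a 4-9 digit registrant code, a slash,
# then any run of characters up to whitespace, a quote or an angle bracket.
DOI_PATTERN = re.compile(r"\b(10\.\d{4,9}/[^\s\"<>]+)")


def extract_doi(citation):
    """Return the first DOI found in a text citation, or None if there is none."""
    match = DOI_PATTERN.search(citation)
    if match is None:
        return None
    # Drop sentence punctuation that often trails a DOI in a reference list.
    return match.group(1).rstrip(".,;)")


# Hypothetical citations: one with a DOI called out, one without.
with_doi = (
    "Doe, J. (2020). Example ocean temperature dataset [Data set]. "
    "Example Repository. https://doi.org/10.5555/example.5678."
)
without_doi = "Doe, J. (2020). Supporting data available from the author."

print(extract_doi(with_doi))
print(extract_doi(without_doi))
```

Once a DOI has been recovered this way, the rest of the metadata can come from the DOI's own record rather than from marking up the citation text; a citation with no identifier gives a machine nothing to follow.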
This is an example of something that probably wouldn't go very far as a data citation. It has the word "data" in it, but I don't know that, machine-wise, we'd be able to extract that it's a data citation. It would be very hard, and there's no URL linking to anything specific, so it's not very useful. So to summarize, what I think we've learned from this is that the alignment between JATS and Crossref isn't exact, but we're working on it.
Crossref citation metadata is intended to support matching of citations to Crossref and other metadata records. So it's not a complete representation of every little detail. But overall, I think identifiers - DOIs - do a good job, and other identifiers do a good job. They can do a lot of this work for us, so that we're not marking up these little teeny tiny pieces of metadata and replicating them from record to record all across the ecosystem.
So to sum up, I have this slide; I'm sure these slides will be distributed. I just have some links to the JATS4R data citation guidance, the JATS4R citation guidance and the Crossref guidance on providing data citations. Thank you.
Thank you so much, Patricia. That was really great.
And I think you both shared a lot of amazing information. I'm looking forward to having a conversation with our participants at the conference, so I want to thank those of you who attended the session. I want to thank you for attending, and I want to also thank, again, our two outstanding speakers, Shelley Stall and Patricia Feeney. And we'd like to invite you now to join us in the Zoom room to discuss this topic further.