Name:
Improving Research Workflows with Metadata
Description:
Improving Research Workflows with Metadata
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/c61ded6e-a943-4f91-922a-935a0168ccad/videoscrubberimages/Scrubber_63.jpg
Duration:
T01H09M19S
Embed URL:
https://stream.cadmore.media/player/c61ded6e-a943-4f91-922a-935a0168ccad
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/c61ded6e-a943-4f91-922a-935a0168ccad/session_5b__improving_research_workflows_with_metadata (720p.mp4?sv=2019-02-02&sr=c&sig=11I6el6zHRhgMQJRYQ5IQ1RmoWpiDO6LVdR2d4btDfY%3D&st=2024-11-19T22%3A36%3A24Z&se=2024-11-20T00%3A41%3A24Z&sp=r
Upload Date:
2024-02-23T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
Without slides to start with, so huge apologies: we were a little technically challenged in this session, because we have the lovely Ginny Hendricks from Crossref with us, who I can see but who, sadly, you can't see at the moment. At some point, hopefully, you'll see her face on the screen. She's calling in remotely from the UK.
I'm going to quickly introduce our other speakers, who will, if they need more introduction, add it when they speak: Rob O'Donnell from Rockefeller University Press, Anna Jester from eJournalPress, and Ana Heredia, a consultant who is joining us from Brazil. And I'm Alice Meadows from the MoreBrains Cooperative. We wanted to start this session with a few questions for all of you, so we can see who we've got in the room.
So can you put your hand up if you are from a publishing organisation? OK, probably a good half of you. What about a platform, system, or third-party service provider? OK, a fair few of those. Do we have any librarians here? Yay, good to see you.
And if you are at a publisher, are you on the editorial side, any of you? Not many editors here. What about the product or IT side of things, whether you're at a publisher or not? OK, all right, so we've got a good mix of people. It's lovely to see you all. Ginny, for your information, the room is fairly full, and I hope we're going to have time for plenty of questions. And I'm hoping that this is now going to work.
OK, so without further ado: I want to do a little bit of setting the scene. This is what we're going to be talking about this morning. I'll do a very brief, high-level overview of metadata; Ginny will give a bit more of a, not in-depth exactly, but a service-provider view; Ana is here to provide a Global South researcher view; then Anna (at the risk of confusing Ana and Anna) will give a manuscript submission system perspective; and we'll finish up with a publisher perspective from Rob. And then, as I say, we hope there is going to be time for some discussion and Q&A.
Metadata is a huge topic, and it's interesting to me, having been interested in metadata for some time now, that I think this is the first meeting where I've seen several sessions really focused on metadata, which, as a metadata nerd, makes me very happy.
But of course, metadata is a huge topic, so we decided to focus today on funder and institution metadata, in part because it's metadata that's needed but harder to capture, and yet it's vital, and becoming increasingly vital, for things like compliance with funder policies. So first, a very brief metadata 101.
What is metadata? This is a classic definition from Wikipedia that I'm sure you're all familiar with. But there are also different sorts of metadata. You'll see I've tried to give my illustrations a little bit of a Portland, Oregon theme, so this is an example of metadata about Portlandia. It includes descriptive metadata: what is it? Administrative metadata: who owns it? Legal metadata, that sort of thing. So there are different sorts of metadata, but it's all information about content; it's not the content itself, it's the information about the content.
Those of you who know me will know that I used to work at ORCID, so I am a bit of a PID person, and PIDs, persistent identifiers, are a really essential component of metadata, so I did want to say a little bit specifically about them. This is the classic definition; I know Ginny has a nice, rather different definition of metadata that you'll hear in her recording, but this is the classic one. Oops, going the wrong way. As I say, PIDs are essential to metadata, and particularly to research metadata. Some of the kinds of PIDs you'll come across, and I'm sure you're familiar with most if not all of these, are Crossref and DataCite DOIs, which are for outputs (articles, datasets, and so on) but also for grants; ORCID iDs for researchers and other contributors to research; and RAiD, which you may or may not be familiar with.
RAiD is a newer identifier. It stands for Research Activity Identifier, and it's intended to identify projects. It acts as a sort of folder: you can bung all your other identifiers into the RAiD and they all stay together all the way through the project, and actually beyond the end of it. And then there's ROR, for organizations, and somewhere in the audience, though I don't have my glasses on so I can't see her...
Yes, we have Maria Gould from ROR, who I'm really happy is here in person; we'll probably ask her to say a few words later on, and she will certainly be able to answer any questions you may have about ROR. One thing to say about all of these is that they are all open identifiers. And of course there are many other persistent identifiers, and they're all valuable.
But with my ORCID and MoreBrains hat on, this is my version of the perfect PID. It would have openly accessible metadata; an open API (not necessarily fully open, but at least a good chunk of it would be open); some sort of community governance; it would be sustainable; and it would have a parachute plan. In other words, if or when there's no longer a need for it, or the sustaining organization can no longer host it, there's a plan for what happens after that.
So why do we need metadata? These are some, by no means all, of the key challenges in research that metadata can help address, and I'm going to go through each of them in turn. First of all, discoverability: better metadata basically equals better search results. And when I say better, I mean richer, more complete, timely, accurate, et cetera. How to achieve that would be the topic of a whole week's discussion on its own.
In terms of the next slides: I personally love this example, because it's about me. If I try searching the Crossref database without my ORCID iD, the first publication of mine that appears is quite a long way down. But if I use my ORCID iD, ta-da! Everything's by me, and it keeps going for quite a bit longer, which is fantastic. So persistent identifiers in particular, but metadata in general, really help with discoverability and search.
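If you want to try that lookup yourself, here is a minimal sketch against the public Crossref REST API, which can filter works by ORCID iD; the iD below is just a placeholder.

```python
import requests

# Search Crossref for works whose metadata carries a given ORCID iD.
# The iD below is a placeholder; substitute any real ORCID iD.
ORCID_ID = "0000-0002-1825-0097"

resp = requests.get(
    "https://api.crossref.org/works",
    params={"filter": f"orcid:{ORCID_ID}", "rows": 5},
    timeout=30,
)
resp.raise_for_status()

for item in resp.json()["message"]["items"]:
    title = item.get("title") or ["(untitled)"]
    print(item["DOI"], "-", title[0])
```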
Metadata also helps with recognition and credit, which is a really important aspect, and one that has been gaining in importance over the years with the increasing recognition of how much work researchers do for free, how important it is to them to get credit for that work, and how important it is for that credit to be attributed correctly. Oops, sorry, I keep going the wrong way. The Contributor Roles Taxonomy, CRediT, is a really good example of how metadata is helping with this. A number of publishers have already implemented it, I think one institution has implemented it, I know a number of platforms have implemented it, and it is now an ANSI/NISO standard, with plans at NISO to expand it beyond the 14 contributor roles that exist at the moment.
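For the implementers in the room, JATS can carry a CRediT role on each contributor. The snippet below follows the attribute pattern in the JATS4R/NISO CRediT guidance as best understood here, so verify it against the current spec before adopting it.

```python
# Illustrative JATS tagging of a CRediT contributor role, printed from
# Python. Attribute names follow the JATS4R/NISO CRediT recommendation
# as understood here; check the current spec before implementing.
contrib_xml = """\
<contrib contrib-type="author">
  <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-1825-0097</contrib-id>
  <name><surname>Example</surname><given-names>Ada</given-names></name>
  <role vocab="credit"
        vocab-identifier="https://credit.niso.org/"
        vocab-term="Data curation"
        vocab-term-identifier="https://credit.niso.org/contributor-roles/data-curation/">Data curation</role>
</contrib>"""
print(contrib_xml)
```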
PIDs in particular also enable interoperability, so that they and their associated metadata can flow between all the different systems that researchers, research managers, and so on are using. Again using my ORCID record as an example: with ORCID, you can choose to share your record with trusted parties, which are basically ORCID member organizations; that might be a platform provider, a publisher, Crossref, DataCite, whatever. And by doing so, they will then update your record for you. You basically don't need to do anything: the information flows from their system into your ORCID record, and from there it can flow out into other systems.
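That flow can also be read back out. Here is a minimal sketch against ORCID's public API showing, for each work on a record, which trusted party supplied it; the field names follow a reading of the v3.0 public schema, so treat them as indicative rather than authoritative.

```python
import requests

# Read a public ORCID record (no authentication needed for public data).
# The iD below is a placeholder.
orcid_id = "0000-0002-1825-0097"

resp = requests.get(
    f"https://pub.orcid.org/v3.0/{orcid_id}/record",
    headers={"Accept": "application/json"},
    timeout=30,
)
resp.raise_for_status()
record = resp.json()

# Each work group carries a source: the trusted party (publisher,
# Crossref, DataCite, ...) that pushed the item into the record.
for group in record["activities-summary"]["works"]["group"]:
    summary = group["work-summary"][0]
    title = summary["title"]["title"]["value"]
    source = summary["source"]["source-name"]["value"]
    print(f"{title}  <-  {source}")
```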
Which ties in with efficiency. At MoreBrains, we've been doing some work with a number of national organizations on their national PID strategies, which are becoming a thing. Again, that's a whole separate topic, but it's quite exciting that this is now being thought about at a policy and strategy level, at the national level, in some countries. One of the pieces of work we did was a cost-benefit analysis, in which we found that UK researchers and administrators waste an estimated 55,000 person-days a year, and that's just on rekeying metadata.
That's just on information that could be entered once in one system, flow to other systems, and never need to be rekeyed again. That's time wasted, and it was a very lowball estimate that doesn't take into account any of the other benefits of metadata. And this is a lovely example; you can't read it here, but we also did some work for the Australian Research Council and the Australian Access Federation on the Australian national PID strategy, and they interviewed a researcher after the Australian Research Council did an integration with ORCID that allows researchers, when they're applying for a grant, to pull all their information in from their ORCID record.
And this researcher basically said: it's amazing, it's saving me two or three days every time I submit a grant, just being able to pull all my publications in from one source instead of having to key them in and collect them from all over the place. So that's a really nice example of how PIDs and metadata can make the whole process more streamlined and efficient.
And then tracking and analysis: things like citations, connections, and researchers' career progress have historically been quite hard to track, but metadata and PIDs make them much, much easier, and I think the PID Graph, which I expect you're all familiar with, is a nice example of that. This is a rather old version of it, but making those connections between a researcher and their outputs, their organizations, and their grants is a really helpful way of giving us views at both a high level and a very deep level, because you can click on any of those connections and see the associated metadata. You can go from the very big picture, this is where we are, down to, oh, this is where this one individual or one organization is.
And then, last but not least, openness. Robust open metadata, associated particularly with open PIDs but also available elsewhere, really supports open scholarship and science. It enables transparency, it enables sharing, it enables reuse. And I think the best example of this is that you literally cannot be FAIR without metadata: pretty much every single one of the FAIR principles includes some mention of metadata, so if you want to be FAIR, you need to be using metadata.
Very quickly, to sum up my bit: how should metadata work? Well, in an ideal world, it would be collected as early as possible in the process, entered into a system once, and then flow to all the other systems. This is a little, very high-level research cycle graphic that we did; if you want to take a look in real life, you can click through it, and it's slightly interactive. We followed it up with some deeper dives into specific workflows, including the funding workflow. It's not super-duper graphics or interactivity, but it's quite a helpful way to work through both the high level and some individual workflows.
So basically, my overall message is: good metadata is good for us all. And on that note, I'm going to attempt to put Ginny's presentation on. Thank you. Just getting some technical help.
So I can close it down. That's good, the nice one. Thank you. And now you can see my slides. So, I work at Crossref, where I'm responsible for a number of programs across community outreach, governance, and support.
I know this session is about metadata in the research workflow, and I've been asked to define my space in this and set out where Crossref fits into the metadata world. So first, I'd like to share this quote from Juliane Schneider, who was one of the earliest contributors to Metadata 2020, back in 2017. Most people answer the question 'what is metadata?' with 'metadata is data about data', which is true; but also, metadata is communication. And this struck me as really important. If we think of metadata as communication, considering we're all in the world of scholarly communication, it elevates metadata to something that should be a strategic priority. It tells the story of the research. And Juliane goes on to say: it defines relationships, and it sets the parameters for a range of actions. I'll dig into this, but only for the next 7 or 8 minutes.
First of all, what does Crossref have to do with metadata? We do love metadata. We love it. We have over 145 million open metadata records. Each one of those records describes some sort of object, from journal articles to preprints to blogs to images to video to books; probably 60% are journal articles. And we also see huge use of this metadata: over 1.1 billion resolutions every month. That means that every month, someone or something is using one of those DOIs and resolving it, linking to it, 1.1 billion times. We know that many thousands of systems use Crossref metadata; we have our own search interface and REST API. There's a question mark on that figure because it's not obligatory for users to identify themselves. So we have this corpus of metadata.
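As a concrete taste of that corpus, here is a minimal sketch pulling a single record from the Crossref REST API; the DOI shown is Crossref's well-known test DOI, and any registered DOI can be substituted.

```python
import requests

# Fetch the full metadata record for one DOI from the Crossref REST API.
# 10.5555/12345678 is Crossref's long-standing test DOI.
doi = "10.5555/12345678"

resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
resp.raise_for_status()
msg = resp.json()["message"]

print("Title:", msg.get("title"))
print("Publisher:", msg.get("publisher"))
print("Funders:", msg.get("funder", []))
print("Updates (corrections/retractions):", msg.get("update-to", []))
print("References deposited:", msg.get("reference-count"))
```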
We have lots of tools for interpreting, depositing, and retrieving information from those records. So we are definitely a technical solution for the community, but we're also a community convener: we are involved in lots of conversations about where the industry is going, we're concerned with things like research integrity and reproducibility, and we're increasingly involved with funders and policy making.
We are also a POSI-adopting organization. POSI stands for the Principles of Open Scholarly Infrastructure, and it means we have committed to 16 principles which, if we follow them, mean that we are completely open and copyable, and the community could shut us down and restart us in the same form if it wanted to. So it's a really transparent, open infrastructure, and we're trying to become even more so.
A little bit about our membership. We now have over 18,000 member organizations, and they come from approximately 150 different countries; approximately, because journal ownership changes all the time, so members change countries. The largest segment of that 18,000, around 34%, now self-identify as research institutions or universities, which is really different from how we started out, with the 12 founding traditional publishers. We have also grown, reflecting the growth of the community, to 44 staff, and we've become more global as well: we now cover seven time zones in eight countries. So that's the 'who are we' part done. Now back to metadata: what exactly are we talking about when we talk about this data?
This is probably a list of things you might have heard of. Basic metadata includes titles and dates, author names certainly, and descriptive information such as abstracts. It also includes a DOI, a digital object identifier, and a URL pointing to the location of the object that has been registered. We ask for other URLs too.
Some of those are for text mining specifically, and they might have a separate license attached; some can be specifically for similarity checking, which is what our Similarity Check service uses to compare texts one to one. We also collect metadata about when a record has been updated, and increasingly we're collecting data about when articles or other items are withdrawn, corrected, or even retracted. That can be displayed through our tool, Crossmark, but it can also just be part of the metadata, and therefore usable by machines downstream across the whole community.
Increasingly, you'll hear us talk a lot more about relationship metadata. That means we're looking to connect different objects, because a standalone article doesn't really stand alone: it's made up of, and connected to, many different versions, potentially translations; it should be connected to data, both data that it used and data that it created; and it will have references, citation information, and all sorts of other kinds of relationships. We have funders that are members registering grant DOIs, and they're putting in relationships about who funded whom for a project.
Increasingly, we're also looking at provenance information. Some of this is information Crossref holds internally and can expose more publicly: who the member is, who the publisher or society is, who the hosting platform is. We have all that kind of information, and we certainly want to know who the funder is and who the steward of the record is.
Some of this data is very subject-specific. Clinical trial information, for example, can be included in the record, but it's obviously not relevant for everybody. We have the Funder Registry, in which each funding organization in the world has an identifier, and that identifier is included by a number of members to denote who funded the work they're registering. And increasingly, we want the award numbers attached to that as well.
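To make that concrete, here is a small sketch listing works whose Crossref metadata declares a given funder, along with any award numbers deposited; 100000002 is the Funder Registry ID for the US National Institutes of Health.

```python
import requests

# List works that declare funding from a given Funder Registry ID, and
# show any award numbers the registering members deposited.
funder_id = "100000002"  # US National Institutes of Health

resp = requests.get(
    "https://api.crossref.org/works",
    params={"filter": f"funder:{funder_id}", "rows": 3},
    timeout=30,
)
resp.raise_for_status()

for work in resp.json()["message"]["items"]:
    awards = [f["award"] for f in work.get("funder", []) if f.get("award")]
    print(work["DOI"], awards)
```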
Contributor information is really important, so we ask for ORCID iDs and whether they've been validated or not. We also really want affiliation information, and ROR IDs are preferred; ROR is also an open, community-run initiative.
Digging into this picture a little more, we have a vision for what we call the Research Nexus. A lot of people might call it a knowledge graph, or a research ecosystem, or something like that. This diagram of the Research Nexus comes at the picture from the point of view of relationships and metadata. In the center you see lots of objects and entities that should ideally have a persistent identifier. Those identifiers are necessary, but they're not sufficient, because of everything around the outside, which is the workflow of research and funding: creating, publishing, and posting, through to reuse, verification, comments, citation, and other feedback.
The objects in the center should have persistent identifiers: things like ORCID iDs, DataCite DOIs, or Crossref DOIs. And they can be things like preprints, articles, blogs, and datasets; we've even put 'other objects and entities, dot dot dot', because it really is unlimited.
We are un-siloing our metadata so that pretty much anyone can register anything. And if the provenance information about who is creating and stewarding that information is completely open and transparent, then you can see the whole evidence trail, and anyone can make assessments and judgments, or choices, about what they want to trust and use. We're trying to bring together the disparate pieces of the scholarly record. This is obviously not just a Crossref job, but we're trying to start with ourselves, and we want a better view of the relationships beyond identifiers.
As I mentioned, we're trying to expose more publicly what we never would have thought of as metadata: things like how much members pay us, when it was paid and who paid it, and not just payment information but other administrative information too, such as how frequently the metadata record was updated and when it was last updated. All of this, coming back to my theme for the talk, tells the story: all of this metadata communicates the full story of the research and all the activity and actions that happen around it, at the time and beyond, for future generations.
About 60% of this is already possible: everything written here, and the metadata listed above, are already things that members can supply and any user can retrieve. The rest is aspirational, but we just like to imagine that if we could gather information from other parties about what they're able to reproduce, or even reuse, that could be incredible, certainly for things like increasing trust in the scholarly record.
Which brings me on to the why. I hope I've described a little bit about what we think metadata is and what we do with it. Here are just four areas where we think metadata that tells a story can help the whole community and all sorts of different stakeholders. First, research integrity: there are lots of different pieces of metadata that can be seen as signals of trustworthiness. The provenance information, the funding information, whether something was checked for plagiarism, whether it's stewarded by an organization that has a corrections and retractions policy: all of this is metadata, and it's all contextual information that helps us assess the integrity of the research.
Second, reproducibility: the more relationship metadata that is added, ideally connecting to data, protocols, software, and open code, the more easily downstream users (and sideways users) can potentially reproduce and verify the work, which may in turn feed back into the first area and lead to corrections or retractions.
Third, assessment. Metadata can be, and is, used by policy makers, governments, and of course publishers and everybody else to analyze the outcomes of research: who's using it, where it's going, where a record travels and through what systems. It can also be used to demonstrate compliance, with things like the policy information we collect, or funding; funders are certainly looking at Crossref metadata to link what they funded to the eventual outputs.
And fourth, the benefit most people mention when you talk about metadata: discoverability. That is simply that the more metadata there is, the richer the landscape, and the more pathways and angles different systems have for coming in to find the content.
So that was a whirlwind tour of how we think of metadata and why I think it's important for research. We have a community forum online with some information about metadata, and you can reach out to me about anything at all. Thank you very much for having me, and sorry again I couldn't be there in person; I look forward to chatting, virtually or in person, soon.
Thanks so much, Ginny. I know it seems a bit weird clapping for somebody who isn't actually here, but it was a great presentation. So now I'm going to hand over to Ana Heredia, who is going to give us a nice mix of a researcher perspective and a Latin American perspective on metadata for workflows.
Good morning, everyone. Oops. Thank you. Hello, good morning. So, I'll probably provoke more than inform, and I'll try to be brief as well. I thank Alice and Ginny for this introduction to the PID world, to the metadata world, and I'll try to put my researcher hat on to talk a little about what researchers should know, and should do, in terms of their metadata.
In fact, a lot has been discussed around who is responsible for metadata, and a lot has been said about the administrative burden that researchers already carry. This would be more of a technical burden: they are not necessarily aware of all the mechanisms, or even of the importance of giving their data more context. I'd like to frame my reflection here with a recent post published in the Scholarly Kitchen, which presents a very nice report containing a visualization of the research life cycle, highlighting the different stakeholders involved in metadata creation and sharing at the different stages of the research workflow.
Here you have a synthesis of this report, which is very interesting, and you have the different stakeholders: the researcher; the institution to which the researcher belongs; the funder who funds the research; and the publisher. At the bottom, you have the icons, which represent the bibliographic data, for example the ISSNs. Then you have the researcher data, which would be their names, their locations, their countries, their ORCID iDs. You have the institutional affiliation data: ROR, Ringgold, ISNI, GRID. Then research funding data, with funder IDs and grant IDs; then subject vocabulary data; and the last column, which represents the research data metadata: for example, supplementary material DOIs, additional taxonomies, CRediT identifiers. And then you can have a vision of the whole process.
The aim of this review, specifically of this visualization, was to identify gaps and missed opportunities for the communities that open access and open science models are designed to serve.
One specific part of this review has to do with what I would like to highlight, and it occurs at one of the stages of the research life cycle: the research and authoring stage, according to this report, when researchers conduct their literature review and possibly look for opportunities to publish their results.
The challenges here are multiple. Research inequities and barriers are much more evident at this stage of the research life cycle. For instance, valid research from underrepresented researchers is frequently invisible to the global audience because of the lack of good metadata, or because of the language of the metadata. Also, these researchers do not have equitable access to certain discovery services, and, added to this, they don't have the same opportunities to publish their results. So these global inequities highlighted in this slide, in access to information, in the visibility of a big part of the world's scientific production, and in opportunities for publication, have a huge global impact, as they hinder scientific progress as a whole.
This next slide highlights a little these asymmetries that we see. For instance, in Latin America, where open access is an established publishing practice, the transition to open science presents many challenges, even with the robust research information infrastructure present in the region. The recent UNESCO Recommendation offers a common, consensual framework to develop the open science that the Latin American region wants.
Here I highlight, for example, one asymmetry, the digital gap, and how it is being tackled in the region: by strengthening the collaborative infrastructures that already exist. In Latin America we have initiatives like SciELO, which maybe some of you have heard about, and Redalyc, which are full-text open databases and indexing databases at the same time, meaning that they provide a series of services and tools for editors to work with on a daily basis.
Another asymmetry is English as the sole language of interoperability. This is a major barrier, and the way the region sees to tackle it is by promoting and increasing multilingualism; this is something that is being very much discussed in the Latin American community. And then we have the asymmetries in opportunities to publish, which are now more linked to APCs or BPCs, which are prohibitive in a lot of countries. The way to face this particular asymmetry is also by strengthening regionally established publishing practices, not necessarily complying with what other regions of the world are doing.
So, to finish, some examples of nice initiatives in the region: two or three examples of efforts around data and metadata standards, one at the regional level, which is La Referencia, and another in Colombia. La Referencia is a federated network of institutional repositories of scientific publications. It has 12 national nodes, as they are called, in different countries of Latin America. These national nodes are linked to governmental institutions at the national level, and they harvest from the institutional repositories, all open access; then La Referencia regionally harvests from the national nodes and makes this content available through different partnerships with other initiatives globally.
Since 2015, La Referencia has been playing a key role in the region by establishing a series of interoperability guidelines, whose fulfillment must be guaranteed by the national nodes, while recommending their adoption by the institutional repositories that are part of the network. Compliance with these guidelines determines whether a record will be accepted or not by La Referencia at the harvesting stage.
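That harvesting typically runs over OAI-PMH, the standard repository protocol. Below is a minimal sketch of a single ListRecords pass in Dublin Core; the base URL is hypothetical, so substitute a real repository's OAI-PMH endpoint.

```python
import requests
import xml.etree.ElementTree as ET

# One OAI-PMH ListRecords pass over an institutional repository, of the
# kind a national node might run. The base URL below is hypothetical.
OAI_BASE = "https://repository.example.edu/oai"

resp = requests.get(
    OAI_BASE,
    params={"verb": "ListRecords", "metadataPrefix": "oai_dc"},
    timeout=60,
)
resp.raise_for_status()

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"
root = ET.fromstring(resp.content)

# Print the title and identifier of each harvested record.
for record in root.iter(f"{OAI}record"):
    title = record.find(f".//{DC}title")
    ident = record.find(f".//{DC}identifier")
    print(title.text if title is not None else "(no title)",
          "|", ident.text if ident is not None else "")
```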
Another nice example of how the region is doing with metadata is in Colombia, where the Colombian government recently implemented a national open science policy. Within this initiative, they built a series of guidelines to help organizations comply with it; the one shown here concerns precisely metadata, for the ministry's research data repositories.
Another very nice example is the PARSEC project; I don't know if some of you have heard about it. PARSEC stands for building new tools for data sharing and reuse through a transnational investigation of the socioeconomic impacts of protected areas. It is a project funded by the Belmont Forum through the NSF, involving organizations in France, Brazil, and Japan, and it has been ongoing since 2019. Its data science team, a subgroup composed of leading environmental data management professionals, makes this a subject-specific example. They involved leading environmental data management professionals, the data community, society journals, and representatives of infrastructures for data attribution, and they are developing leading practices on data citation, attribution, credit, and reuse. This team has produced the data documentation that you can see on the right side, and a citation checklist, to help researchers and project leaders in the environmental field improve their data management and sharing practices.
So these three examples, one at the regional level and completely born in Latin America, one a national policy, and one a transdisciplinary and transnational initiative, show a little of what is being raised as issues and questions in the region, and how the community is organizing to tackle the metadata issue.
I have one minute? OK. So, just to finish, I'm using this graphic to illustrate the fact that researchers are there in their lab, at their desk, doing their research, and when they publish a paper, it's the end of one stage and the start of another. They don't necessarily think about the fact that every result they produce, publish, or share is part of a system, the system illustrated here, where PIDs have a very important role to play.
I am an advocate of not putting more burden on the researcher, but at the same time, there are some things that researchers need to know, and need to do, about their data. I'm not going to take a lot of time on why metadata is important, but there are some things that only researchers can do about their own data, and I think this is what organizations should focus on asking of them, and not more than that: plan and manage their data during the research process, share their data openly, and cite absolutely all the data they use. This seems very simple, but in fact not everybody is doing it.
So, to leave you with some questions, which are also raised by the report I mentioned at the beginning: who should create and maintain metadata? Where should it originate? Who should own metadata quality and control? And how do we engage researchers in processes that require their unique input about their data? That is all. Thank you so much.
Thanks, Ana, that was great. And now we have Anna Jester from eJournalPress, who is going to talk about the manuscript submission perspective. Thank you so much to all three speakers already.
So yes, today I will absolutely be bringing the perspective of a submission and peer review system; if you know me, that will not surprise you in the least. If you do not know me: I've actually put a PID at the top of my 'about me' slide, because it seemed very fitting for this session. I am also a member of the SSP membership committee, a past president of CSE, a Science Editor editorial board member, and I was on the working group for peer review terminology.
So I have a lot of thoughts about peer review systems, but I want to talk a little bit about some specific situations. Let's talk about data hygiene. How many of you like to talk about data hygiene? Yeah, OK, we've got a couple. Awesome. How many of you have ever been in a system where your name was spelled incorrectly and you weren't allowed to update it? Yeah, that's a great feeling. The problem with that, of course, is not just that it's annoying: inaccurate data actually incurs wasteful costs. And not only that; even if your name had been spelled correctly, names change. Has anyone in here ever changed their name? Yeah. So should we have a broken publication record because we changed our name? Are we now two different researchers, or two different people who love to talk about peer review way more than other people? No, we're not. So we really should make sure that we have a way to update author and organizational names in databases and on published papers, and there should be a specific process so we can easily do that.
Is there anyone in the room who's only worked at one organization for their entire life? Oh, we have some. Awesome. Well done. I would also guess you get more email than a lot of the rest of us. OK. So, as part of all of that, there's also time: entering metadata by hand is time-consuming and error-prone. So what are some of the solutions? Clearly, we've had a lot of discussion about PIDs already, so I beg of you: please be using ROR, the Funder Registry, ORCID, DOIs, and some of the others we've talked about here today, and then make it easy to enter the correct information. I have a couple of slides on this as well. If it's easier to enter the correct information than to enter incorrect information, we're maybe halfway there, right?
And then, of course, there are cost savings that go with that: it reduces editing, rework, and the correction process. And please don't just collect the metadata; as we were saying about it being open, make sure we're actually using it in meaningful ways and sending it where it needs to go.
So here's an example of a submission form where an author can say what their institution is as part of their profile. Free-form text is maybe not the best way to go here, and that's true for spelling, but also because it's quite possible that English isn't even the author's second or third language. What if it's their fifth language that they're trying to enter this in? Add spelling on top, and it's just a bit much. Being able to search, and having that search bring back results from, say, ROR, means you actually know it's a verified research organization, and we can even show things on screen that say: yep, that's verified. So here I've searched for Oregon and got a list of all the options for Oregon. Again, let's make it easy.
And the same is true for funding data. If you come to this list and start typing NIH, you'll be amazed how many organizations there are that are not the NIH of the United States. So having a section on the submission form for authors to enter funding data can be very important as well. Here it's shown as an autocomplete: they start typing, and it gives them options from that specific list, which is the Funder Registry list. So again, it's not something the journal just came up with, saying, well, we think we know all the funders in the world, or all the funders our authors have, right? No, let's actually use a PID that's there.
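Behind a picker like that usually sits a lookup against the open APIs. Here is a minimal sketch of both halves, organizations via the ROR API and funders via the Crossref Funder Registry endpoint; response field names reflect the public APIs as currently understood, so check the documentation.

```python
import requests

# Typeahead-style lookups of the kind a submission form might run.

# Organizations, via the ROR API:
orgs = requests.get(
    "https://api.ror.org/organizations",
    params={"query": "oregon"},
    timeout=30,
).json()
for org in orgs["items"][:5]:
    print(org["id"], org["name"])

# Funders, via the Crossref Funder Registry endpoint:
funders = requests.get(
    "https://api.crossref.org/funders",
    params={"query": "NIH"},
    timeout=30,
).json()["message"]["items"]
for f in funders[:5]:
    print(f["id"], f["name"])
```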
So my goal with all of this is: please make it easy for the authors. Is there anyone who hasn't heard 'we should be making things easier for the authors' in the last, yeah, even six hours? So certainly make it easier for them, but making it easier for us as the data comes in is mutually beneficial too. And in my view, working together really builds it better, which is, I guess, the more systems-flavored way of saying 'teamwork makes the dream work', right?
OK. Alice asked us each to make sure we had one question to leave the audience with. Mine for you is: how much time, which is also sometimes pronounced 'money', would your organization save if authors easily and accurately entered funding and institution data? All right, that's it from me.
Thank you, Anna. As you can tell, we're probably going to leave you with more questions than answers at this point. But last, and definitely not least, I'm delighted to introduce Rob O'Donnell from Rockefeller University Press.
So, I'm Rob O'Donnell. To put everything I'm going to talk about in context, I want to tell you a little bit about the press first. We're a university press, a very small one, and atypical for a university press in that we don't have a books program. We publish three, well, four journals, but three journals that are our own, and they're transformative hybrid journals. We're running at about 30% open access for 2023 so far. We do have one fully open access journal, a collaboration journal with Cold Spring Harbor Laboratory Press and EMBO Press. We've always been predominantly institutional subscription based, but we've got a fast-growing read-and-publish program that covers well over 300 institutions at this point. All of our journals are on eJournalPress, which is convenient for me. The three journals are hosted on Silverchair; the collaboration journal, Life Science Alliance, is on HighWire; and we've been active in the OA Switchboard for about a year and a half.
Normally when I start talking about metadata, I describe how we're collecting it, but thank you, Anna, because Anna just showed you how, since we use eJournalPress. So I'm just going to run through a bunch of ways we're using the persistent identifiers that we collect. Hopefully it doesn't come off as too chaotic, but I'm going to buzz through.
As Alice said at the beginning, we're going to focus on institutional and funder IDs, but first I want to touch on ORCID real quick. I imagine everyone in this room is familiar with how, if an author is tagged with an ORCID iD, Crossref can update that researcher's record automatically on the ORCID side. But I'm not sure how many publishers in this room do the next part, and actually I'd like to see a show of hands: how many publishers are pushing reviewer credit to ORCID? Wow, OK, cool. So yeah, if you've got a connection to ORCID, which we do through eJournalPress, and we're ORCID members, then as long as reviewers have ORCID iDs in our system, we can push reviewer credit after they review, with their permission, which I think is very helpful for the reviewers.
I'm going to start with the first one, which is actually not institution IDs but geographical information. We provide all Research4Life authors who publish with us free gold open access publishing, and those countries also have free-to-read access to all of our journals. We collect the country information on submission, and once that's recognized in the system, it's automatically determined that the author has free publishing. The author doesn't have to do anything: they're automatically brought to a CC BY license, they don't get invoiced, and it's just very smooth for them. I think that's really important for low-income countries, as Ana was just discussing.
We're also just starting to push authors to the right licenses based on their funders. So we're collecting funder IDs, as Anna showed, but, importantly, we do that only at revision submission, because we've got a very high rejection rate and we don't want to waste authors' time collecting their funder information any earlier. When they submit their first revision, we ask for it then. And for really black-and-white funder policies, we're going to show them: your funder requires open access, and this is the license you need. We give them the messaging up front and send them to the right license when it's time to sign. It's a little tricky because some funders' policies are based on award date, which we can't handle yet, so it's a little restricted right now, but we're hoping that with time we can resolve those issues.
ROR IDs: I love ROR IDs.
As I said, we have over 300 institutions in read-and-publish deals, and the way we're automatically recognizing those is that we configure all the deals with ROR IDs. If there's an institution with a bunch of children, we put each of those ROR IDs into the deal. When authors come in, we collect the ROR IDs for all of the corresponding author's affiliations. If there's a match, then, same as I just mentioned for Research4Life countries, it's automatically recognized: they're brought to the right license, they're not invoiced, they're done.
And I want to show this as an example of why it's important to have ROR IDs to recognize these deals. In this case, the corresponding author has five different affiliations, and we're going to check each of them to see if it's eligible for the deal. If we were doing that with text strings, it would never work; it would be highly inaccurate. So you see here we've converted them all to ROR IDs, and it's pretty seamless within EJP. The other reason I wanted to show you this slide is that this paper has over 100 affiliations, and as text, historically, we never really knew that these 100 institutions were publishing in our journal.
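In outline, the deal check Rob describes reduces to set membership on ROR IDs. The sketch below is hypothetical (the IDs and the deal table are made up, and a production system will be far richer), but it shows why normalized identifiers succeed where text strings fail.

```python
# Hypothetical sketch of read-and-publish deal matching on ROR IDs.
# Each deal lists every ROR ID it covers, including child organizations.
DEALS = {
    "example-consortium-deal": {
        "https://ror.org/00example1",  # parent institution (made up)
        "https://ror.org/00example2",  # child: medical school (made up)
    },
}

def eligible_deal(author_ror_ids):
    """Return the first deal covering any of the author's affiliations."""
    for deal_name, covered in DEALS.items():
        if covered & set(author_ror_ids):
            return deal_name
    return None

# A corresponding author with several affiliations: one match is enough,
# and exact ID comparison replaces fuzzy text-string matching.
affiliations = ["https://ror.org/00example2", "https://ror.org/00other99"]
print(eligible_deal(affiliations))  # -> "example-consortium-deal"
```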
So what we did was take on a project to map all of our subscriber IDs and all of our author IDs, contributing author IDs and corresponding author IDs, to RORs. We now have a ROR for every institution that has published with us back to 2017, and a ROR for every subscriber. Here are a couple of graphs from that project.
On the left, we broke it down by corresponding author institution: how many articles has each institution published? It works out that about 75% of the institutions publishing with us publish one or fewer articles per year, which is astounding in some ways. But what it allowed us to do, with our read-and-publish deals, is realize: hey, there's hardly any revenue here. So we could approach these subscribers and say: hey, can we convert this to a read-and-publish deal? We'll give you free open access publishing for any of your authors who publish with us, and your subscription price remains the same.
Then on the right side, we compared corresponding author IDs with subscriber IDs, and you'll see there are far fewer authoring institutions than subscribing institutions, which made us realize that an APC model for going open access would be super difficult for us. So then we compared all contributing authors with subscribers, and the numbers are pretty even. That has really informed our work toward going open access, and our business model discussions. Without the ROR IDs, we really wouldn't have had accurate information here.
I should also mention that when we converted our subscribers to ROR IDs, we put those into Silverchair's site manager, which is our access control system. And because we're also putting ROR IDs into our files on the Silverchair side, the hope is that we'll eventually be able to report on demand on any institution, what their publishing output is and what their usage is, and provide that to the institution.
OK, I'm just going to grab some water. So: the OA Switchboard. How many people know about the OA Switchboard? How many people are using it? OK, we need to get those numbers up. In a nutshell, the OA Switchboard is a central metadata exchange hub for funders, institutions, and publishers.
Yvonne Campfens, the executive director of the Switchboard, gave me these images. On the left, you see what metadata exchange among funders, institutions, and publishers would look like if there were no common hub: it's chaotic, differently structured metadata going from organization to organization, and way too much traffic. On the right is what the Switchboard can accomplish: publishers send data to the hub as a structured JSON metadata message that is common across all of these institutions and organizations, and it gets passed from publisher to institution to funder and back in different ways, but it all goes through one hub.
So how is this working, and how are we using it? The first thing is eligibility querying. With the ROR IDs and funder IDs we collect, we send a query to the Switchboard, but only if the author hasn't already been recognized as part of one of our read-and-publish deals. If they're in a deal, they're done: they're getting free open access publishing. If they're not, we query all the funders and affiliations that have been tagged and ask: would you be willing to pay the APC to make this article open access?
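The message carrying such a query is structured JSON. The sketch below only illustrates the general shape of an eligibility enquiry; the field names are assumptions for illustration, not the real OA Switchboard schema, which is defined in the Switchboard's own documentation.

```python
import json

# Illustrative shape of an OA Switchboard eligibility enquiry. Field
# names here are assumptions, not the authoritative message schema.
enquiry = {
    "header": {
        "type": "eligibility-enquiry",    # assumed label
        "from": "publisher:rup-example",  # hypothetical addresses
        "to": "institution:ror:00example1",
    },
    "data": {
        "article": {"title": "Example article", "doi": None},  # pre-publication
        "author": {
            "orcid": "0000-0002-1825-0097",
            "affiliation_ror": "https://ror.org/00example1",
        },
        "funder": {"registry_id": "100000002", "award": "R01-XXXXXX"},
        "question": "Will you pay the APC to make this article open access?",
    },
}
print(json.dumps(enquiry, indent=2))
```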
What really excites me about this is that there are lots of small institutions, and lots of small publishers, that can't sign these read-and-publish deals, and this gives researchers at those institutions the opportunity to possibly get their APC paid by their funder or their institution, where they wouldn't be recognized in a deal otherwise.
OK, a late-breaking slide here. I don't know if you were at the session on Wednesday, but Silverchair unveiled Census, and I think it's an exciting project. Basically, they're trying to provide funders with usage and impact metrics for the articles that cite them as funders. The only way this is going to work is with tagged funder IDs, so it's just another example of why it's important to be tagging and including these IDs in your XML.
And then, on the horizon: with the way we're collecting metadata and the points at which we're doing it, we're trying to help the authors as much as possible, but really what we need is automated extraction of metadata, verified by the author. I know there are lots of people in this room working on these things, and we hope to implement them soon. And RAiD, which Alice mentioned at the beginning, is sort of a package of PIDs; I'm hoping that really picks up, so that eventually, when we get a submission, it comes with a RAiD and we've got all that metadata, all those PIDs, ready to go. And once you start working with all of these IDs and you've got them in your system, in your JATS, everywhere, it opens up the opportunity for all sorts of different workflows, and I hope you'll all think about what workflows you can come up with. That's all.
OK, so we're going to hopefully get Ginny up on the screen on Zoom now. Just before that, while we're working on it and before we open for questions, I want to ask Maria Gould from ROR (and I'm not totally putting her on the spot, because I briefed her beforehand) whether there's anything she'd like to add to the conversation before we have the Q&A. Come up here, Maria.
I want to be quick, because I know many of you have questions for this excellent panel. I just wanted to say that I don't think I could talk about ROR any better than the panelists already have, but ROR is here, it's really easy to use, it won't cost you anything, and it will help make all of your workflow dreams come true. If you want to learn more, talk to the panel about it, and also come talk to me during the poster session at 1:00 today. Thanks.
Thanks, Maria. And for those that don't know, Maria is also local and a wonderful source of information about all things Portland, so just a little plug for that, too. OK, do we have any questions or comments from anyone? Yes, oh yeah, and if you wouldn't mind coming up to the microphone; I know it's a bit clunky, but with the session being recorded this way, we can capture everything. Thank you.
Hi, that's working now. Beautiful. I'm Cassandra, I'm a librarian. I have a question that sort of struck me as you were all talking. Ana, you mentioned something about underrepresented researchers, and I'm relatively new to this. I think I've learned that librarians and publishers don't necessarily connect enough, and I'm realizing that in our discovery layer, when I'm telling students about things like citation justice, where you're looking for underrepresented groups and what they're publishing, I don't know that it connects with any of these things you're working on on the other side in terms of PIDs. And I'm wondering: is there something I'm missing? I know there are different tools we could probably look into, but is there discussion around that, around how people on the end-user side searching in a university library connect with this side of the metadata and PIDs? Or am I just absolutely missing something?
I don't know quite how to begin, but I can speak from my regional perspective, having also been in charge of ORCID in Latin America. I think there is, and I don't know if this word exists in English, a mismatch, like different velocities. For instance, in Brazil you have a lot of researchers who have their ORCID iDs, and in other countries as well, but you don't have institutions that are members and are pushing information to their researchers' ORCID records. So the researchers don't see the value of it; they don't understand why it is important. And this is why I put up this all-connected diagram from MoreBrains: to show that everybody really benefits from it. I think researchers are starting to understand why it is important, and I'm not sure what the best way is to engage them in that, but for sure, libraries and research offices have a huge role to play there.
This is something that was very clear to me when I was at ORCID: a lot of researchers create their ORCID iDs because the government says so, the funding agency says so, or their institution says so, but at the same time they don't yet see the benefit of it.
I think that's absolutely right. There's a whole question here; it really takes a village. Everybody's got to get involved in helping researchers understand why this is important. But to Ana's point during her presentation, we also have to make it easy for them and not overburden them. So there's still a lot of work to do on this. Thank you. Ginny, please wave your hand or shout if you want to pitch in. Yes, we have another question. Oh, this is just following up on that.
I'm so glad you mentioned multilingualism and the impact that has, especially on PIDs and engagement internationally. I was wondering if any of our panelists could share what the roadmap looks like to internationalize all of this work and help us with interoperability in non-English languages, and that crosswalk.
Ginny, you might want to take that one first. Did you hear it? Me? Yes. Do you want to have a stab at that to start with? Crossref has been doing work on multilingualism, I think, haven't you?
A little bit, yes. We actually just did a survey about metadata of our full membership, and we had close to 1,700 responses, I think, and multilingual metadata came, if not top, certainly second or third among the things our members want to start providing. We're definitely seeing an increase in translated content being connected together. And we're looking at things like, well, we can already take abstracts in multiple languages; members can add as many abstracts as they want. And if we were to follow the JATS standard, for example, you would have every element repeatable with a language tag. Things like that we're not supporting well enough at the moment, but we're trying to look into them and get better.
Thanks. Either of you want to add anything? I would just add two things, if you can hear me. First, if you're going to collect things in multiple languages, especially in a submission and peer review system, you need to make sure that your editorial board, or whoever on staff is doing the checking, is also able to verify the content from a visual standpoint; how much staff time will have to go into that is something to definitely plan for as well, so keep that in mind. And second, this has come up a lot at NISO, and I don't know what, if anything, is actively being worked on, but, as with Crossref, it keeps bubbling up as something that's really important to the community.
So hopefully there will be some movement. Please.
Hi, I'm a publisher, from Amsterdam University Press, and I hope that my question is recognized by other publishers as well. We assign DOIs to all our books and all our book chapters. We don't have our own platform, so we are dependent on third parties, aggregators, who disseminate our content and our metadata. What I've noticed, the difficulty we're encountering, is that when our DOIs are on other people's platforms, those platforms want to apply their own DOIs to our content. Because we deposit our DOIs first, I've had to try to find a solution through Crossref, which is co-access, and which is not ideal. I was just wondering if other people recognize this problem: having your own DOIs and doing the work, but then seeing it get lost along the way because of third parties applying their DOIs and prefixes to your content. I don't know if there's anybody else in the room with that problem.
I suspect there are other publishers around the world with it. But Ginny, I think that's definitely one for you to take. Please.
Yeah, sure. You've highlighted a problem, for sure. The way that book publishers are disseminating access to their content means that sometimes there are duplicate DOIs for the same book or book chapter. I think it's something we need to look into, but it's definitely not just you. You know, to take it back a little: so much of this comes down to a kind of social contract between all the parties, and obviously we want to try to reflect some of the commercial contracts that publishers have. We do have terms: you can only really register a DOI for something you have the legal rights to register it for, but that's really hard to police. So it has happened, and there are duplicate DOIs out there, especially for books. Maybe we need a standard. Another one? We don't have enough already.
OK, I don't see any more questions, and we're right on the hour, so please join me in thanking again all our speakers, here and overseas. And thank you all for coming along to this session.