Name:
Going with the publishing (work)flow - moving metadata from the point of peer review -NISO Plus
Description:
Going with the publishing (work)flow - moving metadata from the point of peer review -NISO Plus
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/742957c3-7e84-47fb-a1c3-c3a38208b9d1/thumbnails/742957c3-7e84-47fb-a1c3-c3a38208b9d1.png?sv=2019-02-02&sr=c&sig=6AxeDZLM9Kg%2B5HfGZyjnSWOKoCeXMkY%2BnJZKy6U0PJY%3D&st=2024-12-21T18%3A17%3A29Z&se=2024-12-21T22%3A22%3A29Z&sp=r
Duration:
T00H45M57S
Embed URL:
https://stream.cadmore.media/player/742957c3-7e84-47fb-a1c3-c3a38208b9d1
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/742957c3-7e84-47fb-a1c3-c3a38208b9d1/Going with the publishing (work)flow - moving metadata from .mp4?sv=2019-02-02&sr=c&sig=c2gCseccxzryBJ1WMsmbkNQr44wDSx5aQwlWvW3BcnY%3D&st=2024-12-21T18%3A17%3A29Z&se=2024-12-21T20%3A22%3A29Z&sp=r
Upload Date:
2022-08-26T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
[MUSIC PLAYING]
HELENA COUSIJN: Hello, and welcome to our NISO Plus session "Going with the Publishing Workflow-- Moving Metadata from the Point of Peer Review." I'm Helena Cousijn, and I'm the moderator of today's session. And I'm really happy to introduce today's speakers, Brian Cody from Scholastica, Rebecca Wojturska from the University of Edinburgh, Randy Townsend from PLOS, and Shelley Stall from AGU. Let me now hand over to Brian to start with the first talk. Thank you.
BRIAN CODY: Great. Thank you. Let me just share my screen. All right. So talk today. So I'm Brian Cody. I'm Scholastica's CEO and co-founder at Scholastica, which is what I'm going to talk about today. So in this session looking at metadata-- that's obviously the topic-- I'm going to look at it from four sort of points, looking at how the role of author education, how can that point of submission and peer review, how can we help metadata get all the way to production.
BRIAN CODY: I'll share some lessons learned and opportunities. For those lessons learned, to get a sense of the viewpoint I'm bringing, so for Scholastica, we build software and services to work with scholarly publishers. So we have a peer review system, so we do a lot with collecting metadata, and a production service, so preparing that. We also host open access content. So we use it there.
BRIAN CODY: But we work with lots of publishers for only one of those pieces. So we might work on collecting metadata, but we are helping them distribute it other places, sort of not just hosting it. And vice versa, we might be working on production, but using metadata they collected somewhere else for peer review. So we've seen a lot of the challenges.
BRIAN CODY: So I'll try and bring some of that. So I'll start on this sort of level. When we think about enhancing metadata or the potential of metadata, I think a lot about author education. One of the reasons-- if you're a publisher, you probably know, if you can collect the metadata up front, that's a lot easier than chasing it later.
BRIAN CODY: People are highly incentivized at the point of submission. But one thing we've heard from authors over the years-- I see people on Twitter sometimes, where they say, ugh, why do I have to fill out this long form? They experience sort of the metadata we're looking for as people who care about metadata as sort of a litany of chores, or seems unnecessary. And so one thing we've worked with publishers to ask is, do your authors understand, or are we doing any work to help them understand, why ORCID is important or why we're asking for it?
BRIAN CODY: If we're asking authors to put their data in a repository and we're trying to follow FAIR data principles, are we mentioning why? I think a lot about benefit for the authors, because sometimes the answer is, well, we want these metadata, and you have them. But also, there's a way to talk about it, which is, we want to help your article be more discoverable in the end, and there are all these new ways to discover content, and metadata helps.
BRIAN CODY: Also, there's a benefit, which is, if we don't have these now, that might lead to a delay in the production process. Your content might get out later, because we want these. And so communicating that can help authors, I think, be a lot more willing and excited to help give you the metadata you need. I also know we've seen journals start by putting these explanations in their author guidelines or their submission guidelines, which I think is a great start.
BRIAN CODY: It makes it transparent. I think if you can put that same information at the point of data entry, it's even more powerful. And I've seen examples where, where it asks for ORCID, you see a note saying, here's why we want this, or sometimes it's, you know, the little question mark that you click and then it brings up the, why do we want this? And then there's more room to share that.
BRIAN CODY: But that sort of education can help authors feel like the work they're doing is not just busy work, that they can see value for it. And that's, again, at the moment of submission, it's before the production or the publishing step. But it's a really vital moment for that metadata. One of the things I think about is, again, at moment of submission, how do we keep the metadata moving? So sort of related to what we were just talking about, one of the first things to look at is, if we want to get metadata moving across our entire production process, how are we collecting it?
BRIAN CODY: And there's different ways to do that. One we've worked with journals to do is look at their Crossref participation report. Here's an example. But if they're saying, oh, we want more ORCID IDs, or registration IDs, are we even collecting that? And sometimes you see that they're at zero, or it's very low, and it's because their process is, in production they have authors fill something out at that point or update data.
BRIAN CODY: And depending on the level of publisher you're at, this might be something you've already standardized. There are lots of publishers who are trying to level up their production program and are finding that they're not asking for some of the things that they want. So that's important. And I think in three tiers, and hopefully this is helpful to someone. There's collect the metadata, validate it, verify.
BRIAN CODY: So let me say a little more about that. Collect-- literally did someone put it in a field, and do you have it? OK. Think about-- I'll keep using ORCID. If someone put in an ORCID ID of 123, we kind of know that's not real. That's not right. And so we can check, is this valid?
BRIAN CODY: And so we often think of that as, the form would say, oh, that's not correct, so either delete it or change it to be a real one. So that's a way to help enrich metadata. That's actually what's happening in that moment. But as we think about all the way to production-- and we see this when we work with people sometimes-- the ORCID ID was wrong. Sometimes they paste in the ORCID ID for the third author when they meant the second author, and now they're connected to the wrong person.
BRIAN CODY: So there's other steps there you can verify. Something people might have seen is, at the point of submission, an author might connect-- sorry, the submitting author might connect their ORCID account to that platform or to that submission so that that way they've actually signed in and it's verified. There's other versions where, for example, you put an ORCID ID and it sort of shows you, oh, here's the person that is in ORCID, or here's a link to ORCID, can you go click that and verify this is Susan Johnson?
BRIAN CODY: And if you see someone else, maybe that's not the right ORCID ID. So that's a form of verification. So again, that is metadata enrichment, because we're getting more accuracy, and we're doing that more at the moment of submission. When we think about between submission and publishing, and that production side, I think a lot about all the checks we're doing.
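[Editor's sketch] The collect / validate / verify tiers described above can be illustrated with a small form-side check. This is a minimal sketch, not Scholastica's implementation: the function names and the status labels are hypothetical, and the check-digit routine assumes the ISO 7064 MOD 11-2 scheme that ORCID documents for its iDs. It covers only the first two tiers; confirming that an iD belongs to the right author (the verify tier) needs a sign-in or a human look at the ORCID record.

```python
import re

# Shape of an ORCID iD: four groups of four, last character may be "X".
ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def classify_orcid(value):
    """Return a hypothetical status label for a submitted ORCID field.

    "missing"          -> nothing was collected
    "invalid"          -> collected, but fails the format or checksum test
    "valid-unverified" -> well formed, but ownership has not been verified
    """
    if value is None or not value.strip():
        return "missing"
    candidate = value.strip().upper()
    if not ORCID_PATTERN.match(candidate):
        return "invalid"
    digits = candidate.replace("-", "")
    if orcid_check_digit(digits[:15]) != digits[15]:
        return "invalid"
    return "valid-unverified"
```

An entry such as "123" comes back as "invalid", so the form can ask the author to correct or remove it. A well-formed iD pasted against the wrong author still comes back as "valid-unverified", which is why the verification step matters.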
BRIAN CODY: So I think about copy editors and the proof stage, where authors and editors are looking at this. This is a place where people are trying to catch mistakes. And so one of the questions we'll ask publishers is, does your proof that you're sending people-- is that using metadata, or is that sort of separate? So basic example-- if I notice the title had a typo in it, it was spelled wrong, and I fix it, is that also fixing the metadata title?
BRIAN CODY: Are these coming from the same place? Or are we either doing double work, or is there room for sort of the metadata to stay lower quality than the visible data we're looking at? Ideally, they're the same, so you sort of get a one to one. But I think about that, too, with-- sometimes you see PDFs or web pages where it has the ORCID ID or references have DOIs and again, is that reflected in the metadata?
BRIAN CODY: Is that DOI for that reference the same? Or if an author says, oh, this is wrong, are we changing text that someone's reading, or are we changing metadata that's being translated into text? And the closer they are, the more that all that human work that's happening is going to improve the visible product, but also sort of passively be improving the metadata.
BRIAN CODY: And so that's a step I encourage people to just double check. Are you either duplicating work, or are you missing an opportunity to get that enrichment? Couple lessons learned I wanted to share. One is for people to make sure they're thinking about their metadata. Where is it going to be used? Because that can have ramifications. I think a lot of people at NISO know about this.
BRIAN CODY: If you're new, I would encourage you to get involved in some of the working groups. There's people who are very sophisticated on this. But I'll share an example. We had publishers. They were collecting article type metadata. They had been for years. When they went to get into PubMed Central, it turned out PubMed Central has a more restricted list of acceptable article types.
BRIAN CODY: And you can translate anything to be "other," so you can sort of get around it, but you have to update the data. And ultimately, they weren't getting what they wanted because having a bunch of articles labeled "other" when there may be case studies or original research-- they want those to show up that way. And so it introduced work.
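[Editor's sketch] The article-type problem described above can be pictured as a mapping from a journal's in-house labels onto a destination's restricted list, with "other" as the fallback. The vocabulary and the mapping below are illustrative assumptions only; they are not PubMed Central's actual list, and each index publishes its own rules.

```python
# Hypothetical restricted vocabulary accepted by a destination index.
TARGET_ARTICLE_TYPES = {
    "research-article",
    "case-report",
    "review-article",
    "editorial",
    "other",
}

# Hypothetical mapping from a journal's in-house labels to the target list.
LOCAL_TO_TARGET = {
    "original research": "research-article",
    "case study": "case-report",
    "review": "review-article",
}


def map_article_type(local_label: str) -> str:
    """Translate an in-house label; anything unmapped falls back to "other"."""
    key = local_label.strip().lower()
    target = LOCAL_TO_TARGET.get(key, "other")
    return target if target in TARGET_ARTICLE_TYPES else "other"


def unmapped_labels(labels):
    """List the distinct in-house labels that would end up as "other"."""
    return sorted(
        {label.strip().lower() for label in labels
         if map_article_type(label) == "other"}
    )
```

Running `unmapped_labels` over a back catalogue before applying to an index shows how many articles would be labeled "other", which is the cleanup work the publishers in this example ran into.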
BRIAN CODY: And if you know that people have goals where they're thinking, I want the metadata to be used somewhere to help discoverability, well, see if they have additional rules, because that can help standardize the metadata up front. And that kind of fits in with the second point I wanted to make. Again, for NISO, this probably won't be as much of a shock. But sometimes for people who are not so standards friendly, they have this idea of, I'm going to move towards digital-first or single-source publishing.
BRIAN CODY: I'll move towards XML-first publishing. And I'm going to get rid of print, and then it's going to be-- I'm going to have this single digital output, and it solves all my problems. And then they, again, get in different indexes, or they want to start using Atypon or Silverchair or HighWire, and they have their XML, and then they're told, OK, great, we need to update your XML, or we need to translate it, or we need to migrate it.
BRIAN CODY: And the idea is, but I have XML. I have JATS XML. We try to help people understand JATS is a standard, but different places that take that XML, that JATS XML, might have additional rules, and that JATS has flexibility. And people can make choices within that and it won't say it's not JATS, but it doesn't make it the exact thing they want.
BRIAN CODY: So I think of this as, like, flavors of ice cream. JATS is the ice cream base, but it doesn't have flavor. And when we want to make XML that we're going to put into Crossref or PMC, those are different flavors. One's chocolate, one's vanilla, one's strawberry. And then depending on what you're doing with the XML, you might need a whole new flavor. And there are services and vendors that can help you translate that.
BRIAN CODY: So again, you have a base, but they might need to translate that and produce your chocolate ice cream, because you've spent years making strawberry ice cream, which is great, but it's a different use over here. And for me, it's all about setting expectations, because sometimes people think about their metadata and how they're-- they're like, I have all this great metadata. And as they level up, again, their publishing program, I think it's helpful if they expect and know that there's still work, that you've made a lot of ice cream, and you've made a lot of ice cream base, so that's amazing-- you have great metadata-- but there is work to get it in the world to get the benefit of discoverability.
BRIAN CODY: I don't know, I just find that helpful. We've seen a lot of people where that wasn't obvious to them. So on that note, I think opportunities, making sure everyone-- authors, your editorial board, the publisher-- sees metadata not as a chore, but as something that leads to discoverability in this modern era where people-- probably use some examples of this. People are using metadata to surface and discover and curate and collect scholarship in new ways.
BRIAN CODY: Every kind of metadata you have is-- I think of it as, like, it's a different hook, a different way for your content to be grabbed up into what the discoverability services are surfacing. And so you want lots of those hooks. And for some people, it can feel more like this long list of things. There's some great programs if you're not familiar. People can go look at them.
BRIAN CODY: But the Initiative for Open Abstracts and OpenCitations, Crossref's Cited-by, and a lot of that's about some of the traditional benefit that things like Web of Science gave, where you look at how people are citing each other. For discoverability, the more you're making those metadata-- first, do you have metadata for things like your references, and are you including your abstract? But are you making them open?
BRIAN CODY: And that's really valuable for discoverability. Last thing is, again, this question of where do we start with enhancing metadata, depending on where your program is? I don't want it to feel like, oh, I could do everything. I need to do all these things, everything that's been talked about. That can be a goal, is I want world-class metadata. But really, I think what it is is you want world-class discoverability.
BRIAN CODY: And metadata is one route. And what I encourage people to do is take-- you can read more about this, but an agile approach where it's, think of a specific problem, fix it, get the benefit, keep going. So even if you have this big goal, say, rather than think of it as a five-year project that might never get enough traction because it's too big, say, this quarter we want to start requiring ORCIDs, or, this quarter we're going to verify something, or we're going to create an option to put in funder information even though we weren't doing that before.
BRIAN CODY: And we don't even know-- we're not even sending it up to Crossref or DOAJ or our indexing services, but at least we're collecting it. And so that's a small step, but if you do that enough, it iterates. And it really adds up to something important. Well, that is what I wanted to speak to. And I believe next we have Rebecca. Thank you.
REBECCA WOJTURSKA: And that should be shared. Perfect. So hi, everyone. Yes, I'm Rebecca Wojturska. And I'm the University of Edinburgh's open access publishing officer. I'm based on the scholarly communications team. So I'm here today to talk about metadata, specifically in library publishing and particularly in the workflow that we use for OJS.
REBECCA WOJTURSKA: Just to give a bit of background on the service itself, I thought this would be useful for context just so you're aware where I'm coming from. So we launched a general hosting service in 2009 using Open Journal Systems, or OJS. OJS, for those who don't know, is open source software from the Public Knowledge Project, or PKP.
REBECCA WOJTURSKA: There's many acronyms in there, so I apologize. In 2021, we launched a book hosting service to support monographs, textbooks, and basically just publishing projects across the university. And we use Open Monograph Press, or OMP, for that, which, again, is open source software from PKP. It's worth stating as well that we're not actually a publisher. We don't have responsibility for peer review or production, et cetera.
REBECCA WOJTURSKA: That is very much on the editorial team. Instead, we focus on empowering and equipping academics and students with everything they need to run successful journals and launch successful books. So we focus on hosting support and providing publishing expertise. It's also worth noting that, as I'm sure many of you already know, the University of Edinburgh already has a traditional publisher in the form of Edinburgh University Press, who are a fantastic publisher.
REBECCA WOJTURSKA: I used to work there. They're amazing. And the remit of the two services are very different. So our service at the library, it's provided free of charge to staff and students of the University of Edinburgh. We do also provide a shared service for external partners within Scotland for a small fee. And that fee just covers costs, so we don't make a profit. And any money from that is fed straight back into the service.
REBECCA WOJTURSKA: We do have a service board, and that comprises of academics, students, journal editors, librarians, and a representative from EUP. And in terms of staffing, we have one full-time member of staff, which is myself, and I also get a day a week of technical support. In terms of what the service actually offers, we offer use of the OJS hosting platform as well as ongoing technical support.
REBECCA WOJTURSKA: So that's including upgrades and also preservation of content. We offer training, documentation, advice and policies, all to ensure that journals are in line with industry standard. And we also offer the initial setup of pretty much everything with some limited customization. We coordinate ISSNs, Crossref submissions, and metadata delivery. And we provide indexing support, including help with finding and submitting the journal to all the relevant databases.
REBECCA WOJTURSKA: And there are a lot of databases, as I have found out. We also help with annual reporting to help measure journal and article success. And it's worth noting as well that everything is fully open access. We use Creative Commons licenses. And we don't ask for article processing charges. So we're keeping it open both ways. You don't pay to be published, and you don't pay to access the content.
REBECCA WOJTURSKA: And this kind of accessibility of diamond open access is a core component of inclusive publishing, although we know there are issues with digital poverty as well. But yeah, so basically we just get involved in kind of all the background publishing stuff that no one likes to think about, but which is crucial to dissemination and discoverability.
REBECCA WOJTURSKA: And I just wanted to quickly share that we had an exciting rebrand recently to bring our book and journal services together. So we're now known as Edinburgh Diamond, which kind of does what it says on the tin. It promotes diamond open access, transparency, and high quality. So we've got a new logo, and I've got the links to the main sites there. And yes, that is Will Smith making an appearance on the Twitter feed.
REBECCA WOJTURSKA: So firstly, I wanted to touch on metadata, as that's what we're here for, in the submission and peer review stages and, again, within the framework of OJS and library publishing. So as with most submission systems, metadata starts upon author submission. And as we work with digital-only formats, that is also the kind of focus of my presentation.
REBECCA WOJTURSKA: We don't deal with print as part of the service. So a journal is very reliant on authors to complete metadata as fully as possible. And the pro to this is the potential for fuller metadata from the start. And that can protect against kind of changing, evolving, different sets of metadata, or confusion about the metadata for the same article. The con is that authors aren't necessarily fully aware of what constitutes good metadata, or they don't necessarily understand why it's important.
REBECCA WOJTURSKA: OJS does allow the journal editor to update metadata as they see fit. And this could be straight away after submission. It could be after peer review, once the reviewers' suggestions have been applied. And for example, it could be maybe one or two keywords were submitted, and they're not great, so just improving the quality of those, for example. So the pro to this is there could be consistency in the quality of metadata across all the journal's articles, because if one person or one team are working within the same framework for quality metadata, then it can be applied to all the articles.
REBECCA WOJTURSKA: The con is that it requires extra time from the editor, because they have to go through and update it, and they also have to have that awareness themselves of the importance of metadata. And that's particularly important for us, just because, as I say, we're not a publisher, but we do provide support in this area and guidance. As for metadata at the production stages, if metadata has been kept clean and correct throughout the process, there should be minimal effort in the production stage on OJS, she says.
REBECCA WOJTURSKA: Of course, there is also the option at this point for the editors to make final changes to the metadata and to ensure everything is clean and good and ready to go before file generation, or before metadata exports to places such as Crossref or the DOAJ. So I've made a note of some of the places, or some of the formats, that OJS can export metadata into, as well as noting, of course, you can get Crossref and DOAJ compliant data, as well, which is very, very useful.
REBECCA WOJTURSKA: Of course, attention is required for metadata in file generation, specifically including PDF and HTML. And speaking of which, in 2019, my predecessor ran an XML publishing project to look at how the journals we support could develop an XML-based publishing workflow using existing open source tools. So although I wasn't involved in the project myself, I did want to kind of touch on it, because there were some interesting observations made.
REBECCA WOJTURSKA: So as we know, XML is used widely in academic book and journal publishing. It makes scholarly content machine readable and layout independent, more flexible, and reusable for a variety of formats. It also offers improved searchability, accessibility, and preservation, and allows text mining and content enrichment through multimedia and semantic tagging. So there's various options you can use to create XML.
REBECCA WOJTURSKA: I've just listed a couple there. And XML can be introduced at different stages of the production process. So you can have XML first, XML last, and XML middle workflows. So you can really see the importance of getting the metadata right in order to be able to produce that XML properly. And the project found that the easy way for XML creation for small open access journals was through OJS plugins, specifically docxConverter, which uses DOCX2JATS.
REBECCA WOJTURSKA: So the pro was that this is the most user-friendly option and, if done correctly, it can be utilized to create PDFs and HTML with clean and correct, quality metadata. The con is that it requires a manual cleanup of the metadata, because things don't always export perfectly, sadly. So yeah, there was a manual cleanup that is involved, all to ensure that it's fully machine readable, which is the ultimate end goal.
REBECCA WOJTURSKA: In terms of metadata dissemination, it will come as no surprise to anyone when I say that good metadata is your best marketing tool for discoverability. OJS has plugins that can export metadata to Crossref and the DOAJ in compliant formats, as I've already mentioned. And similarly, OMP has an ONIX feed.
REBECCA WOJTURSKA: And PKP are working on a Crossref plugin, I've heard, because more publishers are assigning DOIs to chapters. So both OJS and OMP keep metadata grouped quite tidily, in my opinion. And that makes export easy and compliant with major external partners. And obviously, once you start submitting the journal to various databases, some of the abstractors and indexers crawl the website to kind of get that metadata themselves because it's open access, so you don't need to provide access yourself.
REBECCA WOJTURSKA: So you do need to ensure everything is properly and fully filled out on OJS so that they can get everything they need to create entries and lead people back to the articles and back to the journal ultimately. Just to summarize the kind of challenges that I've touched on in this presentation, authors are not always best informed about high-quality metadata and its impact on dissemination.
REBECCA WOJTURSKA: And similarly, journal editors are not always best informed about high-quality metadata and its impact on dissemination. There's a significant lack of resource for XML publishing in small diamond open access journals, and that can impact on accessible data outputs and the ability to work with that metadata, as well. As previously mentioned, XML metadata using open source tools can create imperfect results which require manual attention and therefore time.
REBECCA WOJTURSKA: And lastly, some editors may struggle with the resource to export metadata for various external partners, especially when they all have different standards, and because that also takes time. And as we know, time is very precious when it comes to academic publishing. And most things, actually. So finally, I thought I'd just finish with a few recommendations.
REBECCA WOJTURSKA: So firstly, provide guidance around good metadata and the impact it has, for example, on general discoverability, search engines, and indexing. Some of my top tips include getting metadata in the URL, such as the DOI, because that can just improve discoverability, especially on a search engine. Making the DOI as informative as possible-- for example, by having the journal acronym, the year of publication, and the article reference number all in that DOI.
REBECCA WOJTURSKA: Using DOI links in references, or if you're an editor, encouraging your authors to do so. And that just means people can click through to the article without having to search for it. And also, publishers need to ensure that metadata is constantly updated if there are any changes through their provider-- in our case, Crossref, in case you hadn't guessed already. And I always say, think of keywords as key phrases.
REBECCA WOJTURSKA: So what would people search for on a search engine that would result in finding the author's work? So you want to use those phrases as your keywords. So something too broad will get lost. Something too niche might never get found. Also, create documentation about these kind of best practices around metadata and file creation, and consider sharing them with your peers, for example, me. And also, yeah, look into automated metadata quality checks, because, everything's kind of becoming more open now, and I think that is a very, very good thing.
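[Editor's sketch] The tip above about making the DOI informative (journal acronym, year of publication, and article reference number) can be sketched as a small helper. The suffix layout, the function names, and the example prefix and acronym are all assumptions for illustration; a real prefix is assigned by the registration agency, and each journal chooses its own suffix pattern.

```python
def build_doi(prefix: str, journal_acronym: str, year: int, article_number: int) -> str:
    """Assemble a DOI whose suffix carries acronym, year, and article number.

    The "acronym.year.number" layout is a hypothetical convention.
    """
    suffix = f"{journal_acronym.lower()}.{year}.{article_number}"
    return f"{prefix}/{suffix}"


def doi_link(doi: str) -> str:
    """Return the resolvable link form, suitable for use in reference lists."""
    return f"https://doi.org/{doi}"
```

For example, `build_doi("10.5555", "XYZ", 2022, 1234)` gives `10.5555/xyz.2022.1234`, and `doi_link` turns that into a clickable `https://doi.org/...` address that readers can follow straight from a reference list.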
REBECCA WOJTURSKA: So thank you for listening. And if anyone has any questions or would like to talk more, please do reach out and give us a follow on Twitter @EdinDiamond. So thank you very much. And I believe I am passing along to Randy.
RANDY TOWNSEND: Thank you. Thank you for joining us today. My name is Randy Townsend, director of publishing operations at PLOS and associate professor at George Washington University. As we think about the journey of peer-reviewed literature, we as publishers recognize that there is a thoughtful process in place that involves many professionals and experts adding value along the way. Many times, activities happen in silos, and editorial may not know the details of production, and production may not be fully versed in peer review.
RANDY TOWNSEND: While a measure of autonomy may have some benefits, hand-offs tend to flow smoother when there is mutual understanding and shared value. When we know why something is important, we care a little bit more about it. I want to start with the organizational value of metadata. These are two headlines from 2021 where lottery winners had to search through the garbage to find the tickets they trashed.
RANDY TOWNSEND: It's easy to throw away things that seem unimportant at a particular moment. In efforts to declutter, we identify things to get rid of, and we breathe easier afterwards. I know I do. Oftentimes, value is not immediately visible. Later on and after the fact, we can learn to regret those moments when we didn't take the time to consider the potential value.
RANDY TOWNSEND: Metadata can be like that. It's information that describes information. Many of us understand the value. Sometimes we assume that everybody is as passionate about metadata as those of you who have joined us today. Like a message becomes ambiguous as it's passed from one person to the next, it's easy for the significance of metadata to become less of a priority.
RANDY TOWNSEND: These eight pillars can help frame the potential of metadata for your colleagues, and why it's as important to the beginning stages of publishing as it should be for the middle and in all areas throughout the business strategy and operations of your organization. What is your vision and your mission? What are the core values of your organization? Who are your most important stakeholders, and who are your strategic partners?
RANDY TOWNSEND: Where do you see growth potential, and where do you anticipate threats and risk? And one of the most important things-- how do you build resilience to ensure a sustainable future for your company? I feel a little bit like Quentin Tarantino, one of his movies, starting from the end but connecting the end to the beginning so that all of the scenes throughout make sense.
RANDY TOWNSEND: At each stage, you have to demonstrate the power of metadata and how it contributes to the success of that segment of the workflow. In the beginning, we want to make sure metadata activities serve the primary stakeholders-- the author, the publisher, and the content. We ask who the authors are, where are their affiliations, who funded the research, and what are their ORCIDs? For the publisher, we want to make sure the submission is branded with our unique identifiers, that the integrity of information is protected, sufficient disclosures and acknowledgments are present, and key peer review dates are captured.
RANDY TOWNSEND: We want to know how diverse our editorial boards are and who our peer reviewers are in China, and what subject areas are growing in submissions the fastest. For the content, we want to distinguish critical elements, links to any preprints or related research, and proper attribution, discoverability, and indexing. Further consideration includes data and software availability, and relationships that connect research findings, methods, and techniques more broadly.
RANDY TOWNSEND: That brings us to the sequels, where derivative outputs can be generated. Think social media, press releases, blogs, things that expand the conversation and invite diverse and immersive discussions. But what else? How can we make sure this valuable content will be available in technologies we haven't yet invented, or that there are versions that can adapt to the way in which future generations will access content?
RANDY TOWNSEND: Throughout the business of the publisher, how are we capturing financial information about funding sources without sending yet another correspondence to the author, or distinguishing between an APC waiver from a submission that qualifies for Research4Life? Now, this is really my favorite part of this presentation, and where I assume that everybody here loves the things that I love as much as we all love to talk about metadata.
RANDY TOWNSEND: In 1977, Star Wars released in theaters. Here's the partial cast list from the original movie, and I've highlighted Harrison Ford, who introduced us to the swashbuckling space smuggler Han Solo. Star Wars was not his first acting opportunity, but neither was it his last gig. Harrison Ford has acted in 82 movies and TV shows throughout his career.
RANDY TOWNSEND: This information is easily searchable in Google, Yahoo and Bing. In 1997-- in 1977, excuse me, I'll just point out that none of those search engines even existed, but I presume that they had pretty good, or interesting, at least, record-keeping operations at the time. In hindsight, of course it was important to capture every piece of information you could about Harrison Ford, because clearly he's become a pretty big deal.
RANDY TOWNSEND: Star Wars itself has become a pretty big deal. It received seven Oscars and earned $461 million in US ticket sales and a gross of close to $800 million worldwide. There have been nearly 400 books written about Star Wars. 11 additional motion pictures were made and 21 TV shows. Even bigger than that, the Star Wars enterprise has created more than 100 different unique starships, more than 7 million different languages, with 68 actually used in live-action movies and TV shows.
RANDY TOWNSEND: There have been more than 100 video games, theme songs, Jedi, Sith, droids, bounty hunters. Some species even discriminated against the droids. And somehow, these storylines, for the most part, have remained connected. We know that wielders of the dark side of the Force use red lightsabers, and the Jedi have their pick of most of the other colors of the rainbow. I use this lighthearted example to demonstrate the potential of capturing this information and information about information, the value of being able to drill down to know that the TIE fighters are vastly different from X-wings in shape, size, and maneuverability.
RANDY TOWNSEND: Oddly enough, two years before Star Wars hit theaters, Sony released the Betamax videotape, which was shortly followed by the VHS tape, which eventually won the video war of the day. This amazing technology allowed people the luxury of eventually watching Star Wars or any other Harrison Ford movie from the comfort of their living room. However, as technology advanced, we learned that VHS had become outdated, and it eventually lost out to the much clearer, much more expensive Blu-ray.
RANDY TOWNSEND: And in 2011, everybody that had the trilogy on VHS could now rebuy it in the new technology, with one important twist-- the new digital technology allowed the producers to add new scenes to strengthen the story. They were able to repurpose old content and make it new and interesting and valuable all over again. So when looking at the publishing workflows-- I'm just trying to connect it right back to the point of this presentation-- it's beneficial to connect everybody throughout the process to the question of, what else?
RANDY TOWNSEND: Examine what it is that you're collecting, and consider what you're not collecting. Frequently embed conversations about how metadata can support the vision, mission, and core values of your organization in every strategy meeting. Think about the needs of your stakeholders and partners. Consider what you need to grow and expand the potential of your content while protecting it against threats and risks to protect the future potential of your business.
RANDY TOWNSEND: Thank you very much for-- hope you enjoyed a little musings with Star Wars and metadata. And I'm happy to turn this over to my friend Shelley Stall.
SHELLEY STALL: Thank you, Randy. And thanks for the opportunity to talk at this session. I'm Shelley Stall. I am the senior director for data leadership at AGU. And I am incredibly passionate about metadata. And coming out of Randy's talk and the fun applications for why it's important, I want to take you through, perhaps, incentives for yourself to help our researchers as they think about what it means to have good metadata, and pulling that through, giving them as much incentive to help you pull that through, the entire production system for publication.
SHELLEY STALL: So AGU-- we have a data position statement that affirms that our data-- so in this case, I'm mentioning earth and space science data, but gosh, that applies to so much discipline-specific research data. It is a world heritage. It needs to be shared and preserved and documented. And for the purpose of this talk, I want to highlight connecting all the way to the very end to give credit to those who actually created the data.
SHELLEY STALL: So that's the part I'm going to highlight here. So to get us started, this is just the greatest depiction of connecting data over time, especially from a publisher point of view. So Nature, back in 2019, took some time to do some analytics. And some of the resources that I'll put in the link for this talk will show you who's credited with gathering all the data and doing the analysis.
SHELLEY STALL: They did an amazing job. And what this shows you is the connection of papers, and essentially the genealogy of a paper, what came before. So when one paper is referenced by two, then you get these dots. And what I want to highlight for you is that in the yellow section, which is in the upper right hand side, we in the geosciences-- that's the geoscience branches-- we interact with all of the other disciplines.
SHELLEY STALL: So it really matters to us that our data is easy to understand and easy to find and interoperable with all kinds of other data. And frankly, that applies to software as well. So one of the other things that they analyzed-- and this is something I commonly talk about, because I think it re-orients, especially for our researchers, the importance of the kind of relationships and kind of access to information they need.
SHELLEY STALL: Here you see-- and you as publishers, the folks who are publishers that are here, will know this-- the single-author paper is disappearing. And papers from a single country-- that's blue-- are also shrinking. And papers are strongly coming from multiple countries. This is a big deal. And we need to pay attention, because it really means-- and here's my little summary.
SHELLEY STALL: And I'm happy to grab-- as soon as we can all get together at the bar, I'm happy to grab a bunch of glasses of wine or whatever and argue with you on this, because I don't really care what the updates are. The point is, we're not working as individuals. We're working as teams. We are not single country. We have to have international collaborations. We need easy ways to discover things.
SHELLEY STALL: We need good documentation. Data and software need to be interoperable. I have it on one line, but it really applies to both. Software needs to be interoperable and accessible using tools that are common to everyone. And we need good licensing for re-use. And can I just tell you, it's all the metadata. You look at any of these items, and it tracks straight to the metadata.
SHELLEY STALL: And we as publishers are responsible for a number of these to connect to each other, right? An amen on that, people. An amen. All right, here we go. So this is from Helena, who's moderating for us today. Helena, thanks for letting me steal your slide. So Helena laid out for us in a talk that she gave-- I think she actually gave it here at NISO Plus last year-- where she talked about the journal data policies working to help researchers find an appropriate repository and then coordinating that submission.
SHELLEY STALL: And this is all very difficult, right? None of this is really smooth. We're working on it, but it's not smooth. And the linking is critical here. The linking is where persistent identifiers matter. It's where metadata matters. We have publishing, collecting and distributing those data citations, and eventually we get to credit. And we all know that if this works well, that's really-- that just doesn't happen that often.
SHELLEY STALL: Like, we cannot guarantee to our researchers that this is smooth for all papers across all publishers. We know that, at the heart of our heart. OK, now time for the humor. So I can't draw. This is my drawing. But I wanted you to see it from a researcher point of view. And this is where you should be laughing, and I'm going to guide you through it in case you can't understand what this is talking about.
SHELLEY STALL: And it's a bit tongue in cheek. So researchers love data management. And you just know that's not true, right? OK? So here you have your amazing researcher who's like, oh, I wonder what data's out there? And those little pointy things in the middle-- those are the researcher's hands, in case you don't recognize them. And they're on a keyboard.
SHELLEY STALL: That's the little dots. OK, get it, keyboard. And they are able to find, within a few seconds, relevant data. And I am not kidding. That's what it should look like. And they should be able to look at that data and go, oh, this is relevant to my work. And then they spend most of their time on the actual research.
SHELLEY STALL: This is what we want-- less hunting, more research, so much better. And then it's time for them to actually-- they're interested in getting published. They want to submit their data to an appropriate repository. They go to do that, and they're really frustrated because they picked a repository that will help them, and the data manager says, I need more metadata. And they just don't know why.
SHELLEY STALL: Why do I need more metadata? And here's where we have to help. We have to help them realize that the investment of time on getting good persistent identifiers, good metadata, pays them back a millionfold. And we need to make it easier. So UNESCO, their recommendation on open science, which all the countries signed last November-- this is huge, just huge-- has within it this statement that from the very beginning of the research process the researcher both contributes to open science and takes advantage of open science practices.
SHELLEY STALL: And ba-ba, there you go. That's it. And it has to do with making data available, you know, open as possible, as closed as necessary. We all understand that. I don't mean to say that open is for everything. Some things have to be protected. But nonetheless, being able to discover is really important. So we need some incentives.
SHELLEY STALL: Within the US, we've got work happening at the National Academies, coming out of the 2018 publication Open Science by Design. And then they had this roundtable where they've now released their package to help with incentives. This is really great, and it helps us all think and gives us a framework for thinking about what it means for incentives along these open science practices.
SHELLEY STALL: Further, in case you're not in the know, we've got a pilot going on at the National Science Foundation for actually connecting links to data sets into the reporting from PIs on their grants. And I have a report of my own due next month, and I'm really excited because I'm going to link as many data sets as I can so I can see exactly how this is working and then tell others about it.
SHELLEY STALL: And it's fantastic. It's really leading the way. Small steps get us there, and I hope the pilot is successful. And then in just a year-- it's almost a year exactly-- the policy for data management and sharing from the National Institutes of Health is going to be in place. And I know the researchers associated with funding coming from NIH are both excited, maybe not always in a good way, but they are eager to understand exactly how this is going to work and for the NIH to help lead the way on exactly the steps to make sure that everything has adequate metadata and it's supporting their new policy.
SHELLEY STALL: And I'm really excited to see this. And thank you so much for Susan Gregurick leading this, and all of her peers and colleagues and the folks at the National Library of Medicine. This is just phenomenal. And even more recently-- this is super exciting-- NASA is putting out a lot of money in the next few years to educate their researchers across all of their science mission directorates to understand what it means to practice open science.
SHELLEY STALL: And I am really delighted. We've been talking to the folks at TOPS, and we are so excited for this work. And certainly, we are going to be supporting it. The year of open science for NASA is going to be 2023. And for UNESCO and the UN, it's 2025. So we're going to all be talking open science for the next few years. So we encourage authors, right?
SHELLEY STALL: So we as publishers, we the community of NISO Plus, we want to encourage our authors to use persistent identifiers for digital research objects, seek out discipline-specific scientific repositories so they can help the researchers with robust metadata and provide other guidance more broadly, and make sure your ORCID is current. Your ORCID is how you are connected to all your research and how we help our authors stay connected.
SHELLEY STALL: This is us. This is us requiring the ORCID for the publication. It's us making sure our authors know which persistent identifiers to use and that they are on every single digital object. So that's number four. And number five, we need to make sure that we're very accurate on what the funder and grant information is that's supporting the publications and any other digital object, because that's how we can inform the funders of research on what actually is coming through publication.
SHELLEY STALL: And we know how important that is. [INAUDIBLE] is an important partner there, and others. So thank you so much for this opportunity. Workflow, metadata-- these are just the best words. And it's good to be here with all of my esteemed colleagues.
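One small, concrete way a submission system can act on the "make sure your ORCID is current" advice is to catch mistyped ORCID iDs at the point of entry. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, so a typo can usually be detected without any network call. The sketch below is illustrative only: the function names are hypothetical, and it checks the format and checksum of an iD, not whether the record exists or belongs to the author.

```python
import re

# Hyphenated 16-character form, e.g. 0000-0002-1825-0097; last character may be X.
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD is well formed and its check character matches."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


if __name__ == "__main__":
    for candidate in ("0000-0002-1825-0097", "0000-0002-1825-0098"):
        status = "valid" if is_valid_orcid(candidate) else "invalid"
        print(candidate, status)
```

A check like this only guards against typing errors; confirming that an iD belongs to the submitting author would still require the author to authenticate with ORCID.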
HELENA COUSIJN: Yeah, that was really great. Thanks so much for these excellent talks, Brian, Rebecca, Randy, and Shelley. You all touched on different aspects of metadata workflows, which I look forward to discussing more with the audience. So I hope that everyone will join us on Zoom, where there will be an opportunity for Q&A and for the discussion. Thank you.
[MUSIC PLAYING]