Name:
Telling a story with metadata or Always drink upstream from the herd Recording
Description:
Telling a story with metadata or Always drink upstream from the herd Recording
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/7c8f6818-05ab-450f-8904-865ccbda5321/videoscrubberimages/Scrubber_2.jpg
Duration:
T00H36M58S
Embed URL:
https://stream.cadmore.media/player/7c8f6818-05ab-450f-8904-865ccbda5321
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/7c8f6818-05ab-450f-8904-865ccbda5321/Telling a story with metadata or Always drink upstream from .mp4?sv=2019-02-02&sr=c&sig=ToKLR51wzEFCzgcCr%2BFSzrC54T1qKTL3JJjnis8fd00%3D&st=2024-12-21T14%3A32%3A32Z&se=2024-12-21T16%3A37%3A32Z&sp=r
Upload Date:
2024-03-06T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
Hello and welcome to NISO Plus 2023 and our session, Telling a Story with Metadata, or Always Drink Upstream from the Herd.
My name is Lola Estelle. I'm a digital library project manager at SPIE, the International Society for Optics and Photonics, and I'll be the moderator for this session today as we speak with Julie Zhu from IEEE and Jenny Evans from the University of Westminster. Our topic today, as the title suggests, is telling a research story with metadata.
There are researchers, research outputs, publications, funders and organizations, but how can we exploit metadata to add context to how all of these players work together in the scholarly ecosystem? How can we leverage existing standards and initiatives to do so? How can we connect research output downstream? If we think about metadata early and often, we can tie research that follows to the original work and attribute appropriate credit to the author.
Before I turn it over to Julie and Jenny, I'd like to give you a simple example of the kind of thing we're talking about today that we've implemented on the SPIE Digital Library. I'll share my screen briefly. There we go. Now, the problem that we encountered at SPIE is that research published by SPIE has an outsized impact on real-world outcomes, as seen by its influence in the patent literature.
But to date, that wasn't surfaced on our platform. So our goal was to better link SPIE's research to real-world outcomes and allow users to trace relationships to further their research and product development. Our solution was to partner with Lens, a comprehensive resource of linked datasets of scholarly and patent works run by Cambia, which is a long-established global social enterprise. The patent data is now integrated for SPIE publications. Where you can find that, on an example article such as this quite old one from 1992, is that I can now see which citing patents correspond to this paper.
This paper led to the development of four patents, as we see here, and I can then click through to the Lens user interface. So I'm leaving the SPIE Digital Library at this point and going to Lens, and it will give me details about that patent: its funding, the researchers who worked on it, when it was filed, the patent number, and so on, in order to really link the real-world outcome to the original research.
So that's just a quick example to show you the kind of thing we may be talking about today. With that, I'd like to turn it over to Julie Zhu, who is a senior manager of Discovery Partners at IEEE. Julie cultivates and manages working relationships with discovery services, link resolvers, proxy services, and search engines to maximize IEEE's content in terms of findability, visibility, and accessibility in multiple discovery channels.
Julie serves on the KBART and ODI standing committees and also the KBART Automation group. Thank you, Julie. Thank you, Lola. So I just want to give a quick introduction to IEEE and IEEE Xplore. IEEE is the world's largest technical membership association, with over 400,000 members in 160 countries.
It has five core areas of activity, publishing, conferences, standards, membership, and e-learning, in the fields of electrical and electronics engineering and computer science. The IEEE Xplore platform publishes IEEE journals, conference proceedings, standards, and e-learning courses, and hosts third-party content, like over 10 e-book collections, in the digital library. We have over 200 subscription packages and thus over 200 KBART title lists.
So as we all know, metadata is an essential part of publications and products, and we need metadata to display the key components of articles and publications, like author-related information, publication information, article information, access information, funding information, and more. We need metadata for discoverability, linking, and access, and we need metadata to track usage.
Metadata flows through multiple systems in various pipelines, from authoring systems, publishing systems, platforms, indexers, linking and resolving systems, authentication systems, library systems, usage tracking systems, and more, and forms a huge metadata ecosystem. Metadata problems can occur at any time during any stage of the process: during metadata creation, enrichment, transfer, and configuration.
Problems in the metadata upstream will impact the functions downstream, and fixing metadata downstream may not necessarily resolve problems upstream. I published a post in the Scholarly Kitchen in 2019 titled Building Pipes and Fixing Leaks: Demystifying and Decoding Scholarly Information Discovery and Interchange. Here's a graphic to show the complexity of this piping ecosystem.
It reminds us that we, content providers, discovery service providers, and libraries, are all data plumbers. We need to continuously build data pipes and fix data leaks to ensure a better end-user experience. There are many different types of metadata; today I will only touch upon a few types of metadata from upstream.
Author-provided metadata is at the very beginning of the metadata pipeline. Almost every day I receive inquiries from anxious authors: why their articles are not showing up in Google Scholar, why their names are missing from the article results, why the articles do not show up in their author profiles, why the publishing platform or Google Scholar does not show all their published articles in their author profiles, how to make the articles more discoverable and increase usage, et cetera.
A significant portion of issues may come from the author-provided metadata. For instance, misspellings or special characters in the names may cause Google Scholar to drop the author names from article search results. In some countries, people have only one given name without a surname, but Google Scholar cannot handle that. If authors provide more accurate keywords, that will improve the chances of the articles being discovered and used.
Many authors have similar names, initials, or even affiliations, so adding author IDs like ORCID iDs can help with author disambiguation. So on the IEEE Xplore platform, we published metadata best practice tips for authors to help improve discoverability. For instance, authors should optimize article titles.
They should create meaningful titles, including important terms up front, think of search terms when adding keywords, and make the title succinct. They should avoid special characters, abbreviations, and overstatement. They can also optimize author names: checking spelling and special characters, and capitalizing first and last names.
They can add an author ID, and if they have only one name, duplicate it. They should also optimize keywords, using thesaurus terms, both broad and narrow. Use indicative terms and try to think from the perspective of a searcher. Authors can also optimize abstracts: they can write short and informative sentences, put important terms up front, and use repeated keywords and synonyms.
At a publication level, there are many industry standards that help with metadata governance during the various stages. For information creation and curation, there are JATS, the Journal Article Tag Suite, the Dublin Core Metadata Element Set, and more.
For information discovery, there is ODI, the Open Discovery Initiative, and ALI, Access and License Indicators. For content linking, there is KBART, Knowledge Bases and Related Tools, and IOTA, Improving OpenURLs Through Analytics. For authentication, there is ESPReSSO, Establishing Suggested Practices Regarding Single Sign-On, and RA21. For journal displays on platforms, there is PIE-J, the Presentation and Identification of E-Journals. For publication transfers between platforms,
there is the Transfer Code of Practice. For metrics, there are SUSHI and altmetrics, and there are quite a few more. Today, I will only focus on one of these standards: KBART. KBART files are package-based title lists, providing title-level holdings and linking information for a subscription product, following the KBART specifications.
We all know the major link resolvers and knowledge bases in the library landscape: EBSCO Full Text Finder; Clarivate's SFX, Alma, and 360 Link; OCLC's WorldCat linker; and a couple of others. The list of related tools has been expanding in the past decade. At the very beginning, link resolvers served mainly catalogs and databases.
Over 10 years ago, discovery services became another important related tool served by link resolvers. Nowadays, some of the elements in KBART, like ISSN, are also being used by search engines like Google Scholar for programs like Subscriber Links, CASA, and Universal CASA. So here's a list of the elements in the KBART files. I put them into three groups: general, serial-specific, and monograph-specific.
Since publication titles are not always reliable, due to misspellings, abbreviations, and other reasons, we need the publication title ID and title URL to help differentiate them. Link resolvers need publication IDs like ISSNs and ISBNs to store and link publications and articles; missing or incorrect ISSNs and ISBNs can cause gaps in holdings and cause OpenURL links to fail.
Google Scholar also relies on ISSNs for its Subscriber Links, CASA, and Universal CASA programs. KBART Phase II added parent_publication_title_id to link a publication to its parent publication in a series. It also added preceding_publication_title_id to link a publication to its previous title, to track publication title change history.
It also added access_type, to indicate a paid publication versus a free open access publication. The KBART standing committee is actively working on KBART Phase III to meet the emerging, more complex needs of link resolvers, for instance how to indicate hybrid journals or flipped journals.
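To make that concrete, here is a minimal sketch, in Python, of how a single serial row of a KBART title list might be assembled. The journal and its values are invented for illustration, and only a subset of the 25 KBART Phase II columns is shown.

    import csv
    import sys

    # A few of the KBART Phase II columns (the full specification defines 25,
    # spanning general, serial-specific, and monograph-specific fields).
    kbart_fields = [
        "publication_title", "print_identifier", "online_identifier",
        "date_first_issue_online", "num_first_vol_online", "num_first_issue_online",
        "title_url", "title_id", "coverage_depth", "publisher_name",
        "publication_type", "parent_publication_title_id",
        "preceding_publication_title_id", "access_type",
    ]

    # An invented example journal, used only to show the shape of a row.
    example_row = {
        "publication_title": "Journal of Example Photonics",
        "print_identifier": "1234-5678",       # print ISSN
        "online_identifier": "8765-4321",      # online ISSN
        "date_first_issue_online": "1992-01-01",
        "num_first_vol_online": "1",
        "num_first_issue_online": "1",
        "title_url": "https://www.example.org/journal/jep",
        "title_id": "jep",
        "coverage_depth": "fulltext",
        "publisher_name": "Example Publisher",
        "publication_type": "serial",
        "parent_publication_title_id": "",     # only populated for members of a series
        "preceding_publication_title_id": "",  # previous title, if the journal was renamed
        "access_type": "P",                    # P = paid, F = free
    }

    # KBART files are tab-separated text with a header row.
    writer = csv.DictWriter(sys.stdout, fieldnames=kbart_fields, delimiter="\t")
    writer.writeheader()
    writer.writerow(example_row)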
Metadata is used for three different types of linking, let's say in discovery services. The most traditional form of linking is OpenURL linking. It needs accurate metadata fields like ISSN, volume, issue, and start page for a journal article, or ISBN plus page for a book chapter. When the ISSN, ISBN, or some other metadata field is incorrect in either the publisher data or the link resolver knowledge base data, the OpenURL link will break. Due to the frequent problems in OpenURL linking, a second form of linking was introduced: DOI linking. This is often more reliable than OpenURL linking, but not all articles have DOIs. To increase linking success, some publishers work with some discovery service providers to create direct links using publisher-specific URLs built from the article ID or other metadata elements.
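As a rough illustration of why those fields matter, here is a small Python sketch that assembles a journal-article OpenURL. The resolver base URL and the article details are invented; the key names follow the OpenURL 1.0 (Z39.88) journal format.

    from urllib.parse import urlencode

    # Hypothetical link resolver base URL; each library has its own resolver address.
    resolver_base = "https://resolver.example.edu/openurl"

    # Article-level metadata expressed as OpenURL 1.0 key/value pairs.
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.jtitle": "Journal of Example Photonics",
        "rft.atitle": "An example article title",
        "rft.issn": "1234-5678",  # a wrong ISSN here means the resolver cannot match holdings
        "rft.volume": "12",
        "rft.issue": "3",
        "rft.spage": "456",       # start page
        "rft.date": "1992",
    }

    openurl = resolver_base + "?" + urlencode(params)
    print(openurl)
    # A missing or incorrect ISSN, volume, issue, or start page in either the publisher
    # data or the knowledge base is exactly the kind of upstream problem that makes
    # this link resolve to the wrong target or fail.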
Whatever the linking type, the quality of metadata is always highly important. We also need to add metadata tags for search engines and social media. For web search engines, like Google Search, we need to add unique title, description, and canonical link URL tags. We also add customized metadata tags for social media like Twitter.
Google Scholar has its own unique set of over 20 special tags starting with citation_, like journal title, article title, volume, issue, page, publisher, author, et cetera. For Google Scholar's Subscriber Links program, we need to provide metadata in XML format according to Google's specifications, like a journal's ISSNs and coverage range for each library subscriber.
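For illustration, here is a small Python sketch that renders the kind of citation_ meta tags Google Scholar looks for on an article landing page. The tag names follow Google Scholar's published inclusion guidelines; the sample article values are invented.

    from html import escape

    # Invented article metadata; in production these values come from the publishing system.
    article = {
        "citation_title": "An example article title",
        "citation_author": "Zhu, Julie",        # emit one citation_author tag per author
        "citation_journal_title": "Journal of Example Photonics",
        "citation_volume": "12",
        "citation_issue": "3",
        "citation_firstpage": "456",
        "citation_lastpage": "470",
        "citation_publication_date": "1992/06/01",
        "citation_issn": "1234-5678",
        "citation_doi": "10.1234/jep.1992.0456",
        "citation_pdf_url": "https://www.example.org/jep/12/3/456.pdf",
    }

    # Render the tags that would go into the page's <head>.
    for name, content in article.items():
        print(f'<meta name="{escape(name)}" content="{escape(content)}">')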
Nowadays, improving accessibility is high on many publishers' agendas, and we pay attention to how to use metadata to help increase accessibility. For instance, publishers need to optimize persistent identifiers, including article DOIs or URLs, ORCID iDs, et cetera, whenever possible. Publishers can also optimize PDFs, adding file names, title, author names, and keywords in the PDF file properties, and putting important terms up front.
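As one possible way to do that last step, here is a minimal sketch that sets document properties on an existing PDF using the open-source pypdf library. pypdf is not mentioned in the talk, it is just one tool that can do this, and the file names and values are invented.

    from pypdf import PdfReader, PdfWriter

    reader = PdfReader("article.pdf")
    writer = PdfWriter()

    # Copy the pages unchanged; we only want to enrich the document properties.
    for page in reader.pages:
        writer.add_page(page)

    # Title, author names, and keywords in the PDF properties help both assistive
    # technology and indexing services make sense of the file.
    writer.add_metadata({
        "/Title": "An example article title",
        "/Author": "Julie Zhu; Second Author",
        "/Keywords": "photonics; metadata; discoverability",
        "/Subject": "Journal of Example Photonics, 12(3), 456-470",
    })

    with open("article-with-metadata.pdf", "wb") as fh:
        writer.write(fh)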
We can also optimize images, entering detailed captions and other relevant metadata, and we can add metadata for software code, data, and reports. Maintaining and improving metadata quality is no easy task, for publishers, for libraries, for everybody.
We need to know that metadata is everywhere, in all kinds of systems, and internally we need to work closely across the various teams and units. For instance, we have separate editorial and publication teams for different content types: journals, conference proceedings, standards, ebooks, and e-learning courses. When one team creates metadata correctly, it does not mean the other teams also do it correctly.
So we need to monitor the different teams and encourage exchange among them. When metadata is created correctly, it does not necessarily mean that the data will be stored correctly in all the systems, in all the formats and databases, for instance the XML files, FTP files, API feeds, and subscriber link files. So we have to constantly monitor and troubleshoot, even when we handle all the metadata correctly internally.
We have to make sure that our external data partners correctly pick up, store, and index our metadata. The frequent inquiries and reports that we receive from our partners, customers, and users help us track down and troubleshoot metadata issues. So it's a never-ending process, and we will have to keep building better data pipes and fixing the leaks.
Thank you. Thank you, Julie. I love the metaphor of the plumbers and the pipes and the leaks; it's such a great way to think of it. I appreciate your presentation. We will move over now to Jenny Evans. Jenny is the research environment and scholarly communications lead at the University of Westminster.
Jenny's role includes responsibility for scholarly communications, research integrity and ethics strategy and policy, research information management systems, and leading a team of subject matter experts. Jenny, we're happy to hear from you today. Thank you very much, Lola. And thank you, Julie. I will just share my screen.
Oh, no, that's OK. OK, can you see my screen? OK, amazing. Thank you very much. So I think, interestingly, my story today is a little bit different, but I do certainly use that analogy of plumbing when I talk about metadata with the communities I work with.
So thanks, Julie, for introducing that as an idea; it's certainly something that is common across our two stories today. My story today is about a very different community of researchers and research, that of arts and humanities primarily, and the repositories landscape. So I guess my presentation today is a nice complementary, contrasting presentation to what we've just heard from Julie.
But I think what's been really useful, thank you, Julie, is your explaining of that discoverability landscape, because it really does matter to the research and the researchers and practitioners that I work with. So today's presentation is based primarily on an Arts and Humanities Research Council funded project, so thank you for funding this call: Practice Research Voices, or PRVoices.
Just to start, though, before I forget: I'm based at the University of Westminster. We're based in central London, with four campuses around the area and over 19,000 students, and we're very much a research-engaged university. The primary focus of our research disciplines is arts, humanities, and social sciences, though we do have some scientific research, which is really why we've got to where we are with this work, because it affects so much of our research community.
It's become something I'm very passionate about and want to fix and change, working with the community. And I guess that's the other thing: this is very much a "this is what we've found". We really want to hear from the community about what you think, and whether you disagree, that's absolutely fine. We're open to all ideas, but this is our overview of what we've been doing.
So I'm going to talk quickly about this idea of non-traditional outputs and some definitions, building the foundations for our project, what the project was, what we found, our recommendations, and next steps. So, this idea of non-traditional and non-text outputs: I thought it was worth starting with this. Text-based traditional outputs, those ones that Julie was certainly talking about earlier, tend to be static.
They tend to be one file, often a PDF, and they're very much grounded in a research recognition landscape that sees the products of research as the primary focus and goal, prioritizing comparability and preservation. So the other side of the story are these non-traditional research outputs. They aren't always non-text, but quite often they are; you can have text-based outputs too.
They attempt to represent performance or other live events, so actually what you're capturing is a remediation of the actual event rather than the output or event itself. The research element can often be embodied in the work, so perhaps in a video, or it might need evidencing by a narrative. Process is as significant as product in this work. It can be time-based, it can be changed.
It can change over time, or it can be added to over time, and it also often involves a range of contributors that really deserve recognition, and the community really wants those contributors recognized; they can be collaborators, they can be participants, and there's a need to recognize this research. Next, a definition we use very much in the UK at the moment, although we're trying it out with other international communities to see what they think and what they say.
So in the UK there has been quite a lot of conversation going on about these non-traditional, non-text outputs. There were two reports, authored by James Bulley and Özden Şahin at Goldsmiths, University of London, commissioned by the Practice Research Advisory Group UK and funded partly by Research England, I think, and maybe Jisc too.
And they looked at this idea of what practice research is and how it could be shared. So this definition: practice research is an umbrella term, so practice research can happen across all disciplines. It's this idea of capturing process; you may have heard of things like practice-based research or practice-as-research. It really depends on the discipline.
But practice research does happen in the sciences as well, in medicine and nursing, for example. And this idea of a narrative is really helpful as well: a narrative that might be conjoined with or embodied in the practice, and that articulates the research inquiry in practice research. It's not uncommon for practice researchers to publish in traditional text-based outputs, but the process itself is not necessarily easily defined.
It needs some kind of explaining. So, building the foundations for Practice Research Voices. Between 2017, I guess, and really last year, we worked to build a repository, a Haplo (now Cayuse) repository, that meets those FAIR principles, findable, accessible, interoperable, and reusable, but puts this practice research at the core. So we built it embedding practice research rather than retrofitting it, as many repositories have.
We worked very closely in co-design with the art, design, and architecture community, and we developed this single repository that captures publications, data, practice research outputs, and the narrative. This was very much a University of Westminster approach to the Research Excellence Framework assessment exercise that was submitted two years ago now, so it was one view on how you might capture this research.
So what does practice research look like in our repository? There's this idea of non-text outputs such as artefacts; this idea of an exhibition output where dates are really important, multiple dates; and this idea of a dynamic collection or portfolio that collects things together and changes over time. Contributor and collaborator roles are really important, and the language reflected the discipline.
So we've got specific templates that use "creator" rather than "author"; we've got "commissioning body" in addition to "publisher"; media type captured as part of the output; and "description" rather than "abstract". The collaborator lists were quite specific, so you could be clear about what the collaborator role was.
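To give a feel for what such a record might hold, here is a purely illustrative Python sketch of the kinds of fields just described. The field names and values are invented for this example; they are not Westminster's actual repository schema.

    # Illustrative only: the shape of a practice-research output record,
    # using discipline-appropriate labels rather than journal-article ones.
    exhibition_record = {
        "output_type": "exhibition",
        "title": "An example exhibition",
        "creators": ["Example Artist"],             # "creator" rather than "author"
        "commissioning_body": "Example Gallery",    # in addition to publisher
        "description": "Curatorial statement ...",  # "description" rather than "abstract"
        "media": ["video", "installation", "photograph"],
        "dates": [                                  # multiple dates matter for exhibitions
            {"type": "opened", "date": "2021-05-01"},
            {"type": "closed", "date": "2021-08-31"},
        ],
        "contributors": [                           # specific collaborator roles
            {"name": "Example Curator", "role": "curator"},
            {"name": "Example Performer", "role": "performer"},
        ],
        "part_of_portfolio": "An example portfolio",  # a dynamic collection that grows over time
    }

    print(f"{exhibition_record['output_type']}: {exhibition_record['title']}")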
We were able to do this, and it did map to a certain extent, some of it via internal crosswalks within the repository, to open standards as best it could, but we identified quite a few problems. So the Practice Research Advisory Group UK reports, this was the next part of this foundation work, identified this idea of a single practice research item type called a project, which represents this multi-component nature I've spoken about already, and some of the things it could include, which very much mirrors what we have in our repository, and this idea of the use of editions.
What was really helpful about these reports was that they gave us a framework to build on. The discoverability layer, or landscape I should say, that Julie has referenced already: Google Scholar only finds PDFs of less than five megabytes. Obviously, if you're talking about non-text files, they're going to be bigger.
So that's really problematic. The persistent identifier and metadata standards landscape has improved over the last few years, so it is getting better, but often it is still a case of this research fitting into "other", or the best fit. And for me personally, and we're now working with Jisc and colleagues at Cayuse on this work in the UK, it doesn't reflect the complexities of practice research, and actually it was very hard for me to sell if I was talking to one of our researchers.
It's very hard to sell the benefits of engaging with persistent identifiers such as ORCID, because they just couldn't see how their research fitted in. So last year, at NISO Plus 2022, a year ago, we did a presentation. I think we'd been funded, but we couldn't tell anyone at that point. And we put a call to action to the community about what we needed to do as a community to bring together some of these case studies and the changes we were focused on at that point. We had these lofty expectations that in a six-month scoping project we could cover all of these standards. We were a bit more realistic when we started, and we ended up focusing on DataCite; the Research Activity Identifier, which I should have spelled out, apologies, RAiD, which was started in Australia by the Australian Research Data Commons; and the CRediT contributor role taxonomy.
So for this community, CRediT was ratified, is that the right word, or published as a standard, I'm not sure what the right word is, two years ago now, I think, I've lost track. And actually the RAiD identifier has recently been approved as a standard as well. So I think that's really helpful; we're in a really good place to move this work forward.
So, Practice Research Voices, which we generally just refer to as PRVoices for short, because otherwise it's a bit of a mouthful. What were we doing? The Arts and Humanities Research Council was basically funding infrastructure in the arts and humanities, future data repositories. So we're scoping this idea of a national practice research repository using the work we've done at Westminster.
But really there were two key parts of this work: moving to a repositories, discoverability, and interoperability landscape in which practice research disciplines are embedded rather than afterthoughts, this idea of "other"; but also, really importantly, bringing together all of these voices to form this Practice Research Voices choir. Thank you, Adam Vials Moore, one of my project colleagues, for our brilliant project name, and for actually working with us.
So there's a real need to work with these communities, and this idea of community-empowered voices is really important. I guess I've spoken a little bit about that already. So, the repositories piece: we actually looked at what we've done, at identifying and articulating the Cayuse schema, and then we were referencing and comparing that against the British Library's shared repository service, really recognizing that practice research doesn't just happen in institutions; it happens in galleries, libraries, archives, and museums.
There can be independent researchers as well. Then the metadata from persistent identifiers: I've just mentioned the three standards that we focused on. And finally, this practice research community of practice. What is so interesting but also challenging is that there are so many different communities involved in this, from archivists and curators to librarians, repository managers, research data managers, and software developers.
There are so many people involved in this conversation, so bringing those people together was a real focus. We'd done a lot of work informally with these various communities over the years, but I felt like I was talking to each individual community rather than having them in one space. So, our findings and recommendations, with a little sparkle; I'll explain that in a minute.
So, ongoing community engagement is key to success. I would say also, as the lead for the project, that we were not as inclusive as I'd have liked us to be. We did work with an expert external advisory group who are amazing, and thank you all if any of you are watching this today, but if we'd had more time, we would have reached out further. What's really interesting is this idea of moving repositories to a place where they're not simply a retrospective archive or collection of outputs, but are actually capturing process alongside a project being carried out.
Open standards must underpin this, but existing open standards rather than trying to create new ones. But there's a real challenge here: the skills and the complexity involved in capturing this research in a repository are not as straightforward as uploading a PDF of a journal article. So our recommendations are around continuing the co-design and embedding of this practice research in standards, training and skills, and investment in capacity.
There are not enough people around working in this area. So we ran a workshop on metadata, persistent identifiers, and a taxonomy. A huge thank you to the community representatives we had from CRediT, DataCite, and RAiD who came and gave presentations; even Natasha Simons at the Australian Research Data Commons was able to record something for us. Some questions were identified as part of this, and there are some updates to DataCite and CRediT we look forward to working on.
We realized CRediT's not quite ready for us to be talking about going beyond scientific disciplines, but there's work there. And we asked the question, particularly thank you James Crowe from the British Library: is a portfolio the same as a collection? That's been a key kind of outcome. And actually, the idea of a portfolio that changes over time maps better to RAiD than it does to a DOI.
So what we've come up with, and it's in our final project report, which we hope to publish soon, is the idea of the PRVoices framework. We've identified this need to capture individual objects on an ongoing basis, bring these together into a collection, and then a portfolio or theme builds on the idea of a collection and overlays narrative and context.
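Purely as an illustration of how that layering might be expressed with existing open standards, here is a small Python sketch that uses DataCite-style related identifiers to tie individual objects into a collection. This is my own example under those assumptions, not the PRVoices framework's actual schema, and the DOIs are invented.

    # Illustrative only: DataCite-style metadata for a collection record that gathers
    # individual practice-research objects via the schema's relatedIdentifier property
    # (HasPart / IsPartOf are relationType values defined by the DataCite schema).
    collection = {
        "doi": "10.12345/example.collection",
        "resourceTypeGeneral": "Collection",
        "titles": [{"title": "An example practice-research collection"}],
        "relatedIdentifiers": [
            {"relatedIdentifier": "10.12345/example.video",
             "relatedIdentifierType": "DOI", "relationType": "HasPart"},
            {"relatedIdentifier": "10.12345/example.exhibition",
             "relatedIdentifierType": "DOI", "relationType": "HasPart"},
        ],
    }

    # Each object would point back with IsPartOf, and a portfolio or theme that changes
    # over time could sit above the collection, identified for example by a RAiD.
    for rel in collection["relatedIdentifiers"]:
        print(collection["doi"], rel["relationType"], rel["relatedIdentifier"])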
Just a quick note about the little sparkle: thank you also to the Sustaining Practice Assets for Research, Knowledge, Learning and Education project team. They were also funded on this call, another practice research call, led by the University of Leeds. They had a very different approach, and their findings really did align with most of ours, but they had some key differences.
But they did pick up this idea of using IIIF, the International Image Interoperability Framework, I suspect I've got that slightly the wrong way around, as a key standard, and also potentially the Oxford Common File Layout specification. So there are some other standards relevant to this work.
And finally, on to the next steps. This is our diagram; I just wanted to highlight this idea of the practice research community of practice. Now, what's so interesting and challenging, as I said before, is all of the different communities with a stake in this work. We've got the Australian Research Data Commons; we've got RAiD, CRediT, Crossref, DataCite, ORCID, and NISO, so the identifier and taxonomy communities.
So at the moment, the kind of richness that's being captured by the discipline communities is being lost in this interoperability and metadata landscape. There's the UK Reproducibility Network, publishers, research software developers, practitioners, repository managers and groups of repository managers such as UKCoRR in the UK, Research Libraries UK, and the research management associations in the UK.
There are just so many people who care about this work and want to make it better, and want this community to benefit from that interoperability and discoverability landscape that they don't at the moment. I think what's heartening is that there is work being done: The Metric Tide Revisited report was published recently in the UK, thinking about this idea of recognizing a diversity of research outputs. The Royal Society in the UK has published its Résumé for Researchers, which also tries to broaden out what's recognized in that scholarly landscape.
And so I think it's a good time for this work to go forward. We hope to expand this. There's quite a lot of work to be done: a lot of technical changes, a lot of community culture changes. It's a kind of generational piece of work that won't happen overnight. But really getting to a place where it is an equal landscape, where this idea of "non-traditional" research outputs doesn't exist any longer, is our eventual goal, in collaboration with the community.
And that's it for me; that's our Practice Research Voices story. Thank you. Thank you so much, Jenny. I hadn't previously considered the links between metadata and community engagement, and how standards can help to capture the actual process of research. A very interesting topic, and I look forward to discussing it more as we now begin to move from the recorded part of our session into the live discussion portion.
So we will see you all there. Thank you.