Name:
The “nested triangle” of metadata supply for OA books 2-NISO Plus
Description:
The “nested triangle” of metadata supply for OA books 2-NISO Plus
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/6acf17ad-7307-406a-b8bf-4999eb71d404/thumbnails/6acf17ad-7307-406a-b8bf-4999eb71d404.png?sv=2019-02-02&sr=c&sig=Y5b3KyoUSM1cQDThcwYy7dNWWQgMtHPjQECvIS3rZrE%3D&st=2024-11-23T15%3A35%3A23Z&se=2024-11-23T19%3A40%3A23Z&sp=r
Duration:
T00H30M12S
Embed URL:
https://stream.cadmore.media/player/6acf17ad-7307-406a-b8bf-4999eb71d404
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/6acf17ad-7307-406a-b8bf-4999eb71d404/The %e2%80%9cnested triangle%e2%80%9d of metadata supply for OA books 2-NISO.mp4?sv=2019-02-02&sr=c&sig=iAJSPF6yqBg8zMtoYu6Km39ZY6ygvAhV1cxukEByqp8%3D&st=2024-11-23T15%3A35%3A23Z&se=2024-11-23T17%3A40%3A23Z&sp=r
Upload Date:
2022-08-26T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
[MUSIC PLAYING]
BRIAN O'LEARY: Hello, and welcome to today's session on the Nested Triangle of Metadata Supply for Open Access Books. I'm Brian O'Leary of the Book Industry Study Group, and I'll be working as a moderator today. This session addresses an issue with respect to academic librarians who want to make research outputs, including Open Access books, fully discoverable for their users and communities in order to develop open and responsible research environments.
BRIAN O'LEARY: And the key to enabling that is high-quality metadata that can be ingested into a variety of content management systems and online discovery interfaces. To talk about that issue and the opportunities today, we have three participants, Ronald Snyder of the OAPEN Foundation, Emma Booth of the University of Manchester, and Diane Rasmussen Pennington from the University of Strathclyde. Welcome today, and I think we're turning first to Emma to hear from you.
EMMA BOOTH: OK, just going to share my screen. OK, there we go, right. So welcome to our presentation on the Nested Triangle of Metadata Supply for Open Access Books. In this discussion, we will be bringing together the perspectives of a metadata specialist from an academic library, an iSchool educator, academic, and researcher, and a representative from an Open Access books provider to examine the metadata supply chain for Open Access books.
EMMA BOOTH: We hope to open the discussion to representatives from content aggregators and library management system knowledge base providers as well so that we can begin to find solutions to the barriers preventing an optimal dissemination and discovery experience for Open Access books. First of all, we'd like to acknowledge the increasing importance of Open Access publishing as part of the scholarly ecosystem and the benefits it brings to academics, research communities, and the wider world.
EMMA BOOTH: Open research contributes to the development of a richer, increasingly diverse, and more accessible world of scholarly communication where the number of potential contributors to research is expanded by improving access to its outputs. Many academic institutions and universities are therefore making a priority commitment to the principles of open research and are supporting their academics in disseminating their research through Open Access publishing.
EMMA BOOTH: This includes making a commitment to facilitate the publishing of Open Access books via university presses and academic publishers, and also to making Open Access books freely available online on Open Access E-book platforms, such as the Directory of Open Access Books and OAPEN. Librarians at these academic institutions and universities want to make the outputs of scholarly research fully discoverable to their library users, including books that are published as Open Access, as making this content discoverable via established library search interfaces helps to promote the development of open and responsible research and learning environments.
EMMA BOOTH: The key to making Open Access books discoverable in a library's online catalog or discovery layer is the inclusion of descriptive bibliographic metadata for these books in the library's content management system. The incorporation of this metadata is not only fundamental for making sure that Open Access books can be found and accessed by students and researchers, it also enables course leaders and instructors to identify and link these resources to online reading lists and virtual learning environments, therefore, ensuring that Open Access resources are embedded in teaching.
EMMA BOOTH: This metadata can also be leveraged by various browser extensions, such as Lean Library and LibKey Nomad, which help to streamline the full-text access experience for library users through in-browser popups. At present, there can be a perception that all Open Access books are inherently discoverable and accessible to all because many of them are available digitally as DRM-free E-books that can be shared online.
EMMA BOOTH: However, for librarians, it's not always a straightforward process to make Open Access books fully discoverable to library users, as they tend to be poorly served by traditional book acquisitions and metadata supply chains which often prioritize paid-for and subscription content. Fundamentally, in order to ensure that any book is discoverable to library users, academic libraries need to have access to high-quality descriptive metadata for that book-- metadata that is interoperable with the library's content management system and can be effectively indexed for searching via their online discovery system.
EMMA BOOTH: The negative impacts of poor metadata provision can be seen on screen here, showing that a lack of reliable, high-quality metadata creates a poor discovery and research experience for academics and students, potentially leading to low usage of library resources. Poor metadata provision has hidden costs for libraries as it often means that they must devote extra resources to manually correcting errors or upgrading metadata from their suppliers.
EMMA BOOTH: It can also place extra demand on customer services staff who need to provide additional support to library users to help them with resource discovery. With Open Access books, the situation can be compounded, as librarians often have to prioritize their workloads to ensure that paid-for or subscription content is fully discoverable and accessible first, as this content directly affects how much perceived value for money the library receives from its content budget expenditure.
EMMA BOOTH: So librarians are looking for an efficient and reliable method for ensuring that quality metadata for Open Access books can be supplied to them. So what are the options currently? At present, there are three main routes that can be used by librarians to source descriptive metadata for Open Access books. Firstly, librarians can retrieve metadata directly from Open Access publishers or content providers, such as OAPEN or the Directory of Open Access Books.
EMMA BOOTH: This metadata is usually available for free and can be shared and reused widely under a CC0 license. But retrieving and ingesting it into a library management system is largely a manual task, which can be time consuming for librarians, particularly if the metadata needs to be transformed or enriched in some way prior to or during ingestion. The second method is for librarians to retrieve metadata from a content aggregator.
EMMA BOOTH: This metadata is usually only available as part of a paid-for or subscription service. So while some automated delivery options are available to streamline the ingestion process, there are often financial costs associated with this route. Finally, librarians can make use of metadata from their library management system provider. This is usually in the form of a proprietary knowledge base or a centralized discovery index that is only available to paying customers of that system provider.
EMMA BOOTH: This type of knowledge base is often managed externally to the library by the system provider. So whilst librarians are able to quite quickly activate metadata for discovery, they often have much less control over its quality, accuracy, or completeness compared to the metadata in their own internal systems. These supply mechanisms form a nested triangle. Source metadata provided by Open Access publishers and content providers is ingested by aggregators and system providers as well as libraries.
EMMA BOOTH: Metadata can also move between these stakeholders depending on their commercial relationships. Within the nested triangle of Open Access books metadata supply, there are unfortunately several issues or roadblocks that can affect each of the stakeholders as metadata travels between them. These roadblocks can be categorized into three main areas-- metadata quality or, more precisely, a lack of it, metadata ingestion and deduplication, such as problems with converting metadata between different formats, or issues related to matching and merging metadata from multiple sources, and, finally, metadata openness, or more accurately, metadata that is not open, as it is restricted from being shared and reused.
EMMA BOOTH: Let's look at each of these in more detail. Existing research into library requirements for book metadata by Jisc, OAPEN, the National Acquisitions Group, and SUPC, a purchasing consortium in the UK, has found that librarians are looking for completeness, accuracy, and timeliness of delivery when assessing metadata quality. This research has also shown that libraries require metadata that contains sufficient descriptive elements to enable their users to find, identify, select, obtain, and explore library resources.
EMMA BOOTH: These metadata elements are titles, names, dates, book identifiers, including publication information, ISBNs, uniform resource identifiers, or DOIs, and subject terms, such as keywords from an abstract or a contents list or, ideally, structured subject headings from a recognized structured vocabulary, such as Library of Congress Subject Headings or Book Industry Standards and Communications (BISAC) codes. Librarians also need information about the type and format of the content being described and any pertinent access rights or restrictions, such as an Open Access license or copyright statement.
EMMA BOOTH: Finally, librarians need electronic access information, such as the URI for the E-book provider platform through which the Open Access book has been made available. In terms of metadata ingestion, libraries require metadata that is standardized and interoperable with their library management systems. As most academic libraries create and manage bibliographic metadata in MARC record format, they need metadata for Open Access books to be supplied either as MARC records or in a standardized XML format that can be easily and consistently mapped across to the MARC standard.
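As a rough illustration of the elements listed above, a completeness check could look something like the following. This is a minimal sketch and was not shown in the session: the BookRecord class, its field names, and the sample values (including the placeholder ISBN) are all assumptions made for illustration, not a recognized schema.

```python
from dataclasses import dataclass, field


@dataclass
class BookRecord:
    """Hypothetical record holding the descriptive elements named in the talk."""
    title: str = ""
    names: list = field(default_factory=list)         # authors, editors
    publication_date: str = ""
    identifiers: dict = field(default_factory=dict)   # e.g. {"isbn": ..., "doi": ...}
    subjects: list = field(default_factory=list)      # keywords or subject headings
    content_type: str = ""                            # type and format of the content
    license: str = ""                                 # access rights, e.g. an OA license
    access_url: str = ""                              # URI on the E-book platform


def missing_elements(record):
    """Return the names of descriptive elements that are empty in the record."""
    return [name for name, value in vars(record).items() if not value]


# Placeholder data: an incomplete record for a made-up title.
record = BookRecord(
    title="An Example Open Access Monograph",
    names=["Author, Example"],
    publication_date="2021",
    identifiers={"isbn": "9780000000002"},
    license="CC BY 4.0",
)

print(missing_elements(record))
```

A check like this only tests whether an element is present. It says nothing about accuracy or timeliness, which the research cited above treats as separate aspects of quality.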
EMMA BOOTH: Some libraries are also able to handle metadata in Dublin Core format, but rarely have systems that are set up to ingest or process ONIX metadata. Libraries can use tab-delimited, CSV, or KBART files to perform title matching against existing metadata in an aggregator or system provider knowledge base. However, the success of this is very much determined by the quality of metadata within these knowledge bases, as both the knowledge base and the data file must contain enough matching points for successful identification and activation of relevant metadata.
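The dependence on matching points can be sketched with a small title-matching routine. This is an illustration under stated assumptions rather than how any particular system works: the column names follow KBART field naming, but the title list, the KNOWLEDGE_BASE structure, the identifiers, and the example.org URLs are all made up.

```python
import csv
import io

# Hypothetical tab-delimited title list using KBART-style column names.
TITLE_LIST = (
    "publication_title\tprint_identifier\tonline_identifier\ttitle_url\n"
    "Example Book One\t978-0-00-000000-2\t\thttps://example.org/book-one\n"
    "Example Book Two\t\t\thttps://example.org/book-two\n"
)

# Hypothetical knowledge base entries, each with a set of known ISBNs.
KNOWLEDGE_BASE = [
    {"kb_id": "kb-1", "isbns": {"9780000000002"}},
    {"kb_id": "kb-2", "isbns": {"9780000000019"}},
]


def normalize_isbn(value):
    """Strip hyphens and spaces so that formatting differences do not block a match."""
    return value.replace("-", "").replace(" ", "").upper()


def match_titles(tsv_text, knowledge_base):
    """Split rows into those matched on an identifier and those left unmatched."""
    matched, unmatched = [], []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        identifiers = {
            normalize_isbn(row[column])
            for column in ("print_identifier", "online_identifier")
            if row[column]
        }
        hit = next(
            (entry["kb_id"] for entry in knowledge_base if identifiers & entry["isbns"]),
            None,
        )
        if hit:
            matched.append((row["publication_title"], hit))
        else:
            unmatched.append(row["publication_title"])
    return matched, unmatched


matched, unmatched = match_titles(TITLE_LIST, KNOWLEDGE_BASE)
print("matched:", matched)
print("unmatched:", unmatched)
```

In this sketch the second title carries no identifiers at all, so it cannot be activated however good the knowledge base is.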
EMMA BOOTH: A lack of optimum quality metadata in either source can negatively affect title matching. And this can lead to problems with record duplication. As libraries, aggregators, and system providers ingest book metadata from multiple sources, there's potential for duplication within their systems, particularly for Open Access books that are made available by multiple providers or across a variety of online platforms.
EMMA BOOTH: It is possible to perform deduplication upon this metadata, and some libraries do this within their library management systems. Others rely on match and merge algorithms in their discovery layer to deduplicate search results for library users. Any such method of deduplication requires that there be sufficient metadata matching points between the data sets to correctly identify the duplicates.
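A match-and-merge step of this kind might be sketched as follows. This is a deliberately simplified illustration, not the algorithm used by any real discovery layer: the match_key rule (ISBN first, then normalized title plus year) and the sample records are assumptions.

```python
import re


def match_key(record):
    """Build a match key: prefer the ISBN, fall back to normalized title plus year."""
    isbn = record.get("isbn", "").replace("-", "")
    if isbn:
        return ("isbn", isbn)
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    return ("title", title, record.get("year", ""))


def deduplicate(records):
    """Merge records that share a match key, keeping the first non-empty value per field."""
    merged = {}
    for record in records:
        target = merged.setdefault(match_key(record), {"sources": []})
        for name, value in record.items():
            if name == "source":
                target["sources"].append(value)
            elif value and not target.get(name):
                target[name] = value
    return list(merged.values())


# Hypothetical records for one book arriving from three suppliers.
records = [
    {"source": "publisher", "title": "Example Book", "year": "2021",
     "isbn": "978-0-00-000000-2", "open_access": True},
    {"source": "aggregator", "title": "Example book", "year": "2021",
     "isbn": "9780000000002"},
    {"source": "knowledge base", "title": "Example Book", "year": ""},
]

for entry in deduplicate(records):
    print(entry)
```

The third record has no ISBN and no year, so it shares no matching point with the other two. It survives as a separate entry and has no Open Access flag.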
EMMA BOOTH: In the screenshot here, you can see that multiple entries for the same E-book are being displayed in the library discovery layer. The first entry is from the metadata supplied by the system provider knowledge base. And as it is incomplete and does not contain enough detail or accuracy to successfully match and merge with the other entries, the user is presented with duplicated search results.
EMMA BOOTH: The system provider record also does not include an Open Access indicator, so it would not be included in any search results that have been filtered for Open Access content only. This duplication of search results is confusing for the end user. They cannot easily determine which version of the publication they should access. So for libraries that rely on system provider knowledge bases as the only way to make resources discoverable, there will inevitably be a lack of clarity for library users if the quality of that metadata is not up to standard.
EMMA BOOTH: This means that libraries have to log support tickets with system providers about issues with knowledge base metadata. One system provider has clarified that metadata quality can vary in their knowledge base due to the way it's ingested from content providers. Even if the system provider is supplied with MARC, ONIX, or Dublin Core metadata, they can only ingest this data into their knowledge base if it is transformed into the KBART format.
EMMA BOOTH: And transforming the source metadata in this way can result in certain elements being stripped out, such as subject headings, abstracts, and contents lists. So system providers ingest basic metadata first and then aim to enrich it later. But this can take time. And often, libraries have to flag up these particular issues with the system provider before a certain collection will be prioritized for enrichment.
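Why a transformation into KBART loses descriptive detail can be sketched roughly as below. The target column names are taken from the KBART field list, but the SOURCE_TO_KBART mapping, the source record, and its keys are hypothetical and do not represent any provider's actual pipeline.

```python
# Hypothetical mapping from source keys to KBART column names.
SOURCE_TO_KBART = {
    "title": "publication_title",
    "print_isbn": "print_identifier",
    "online_isbn": "online_identifier",
    "url": "title_url",
    "first_author": "first_author",
    "publisher": "publisher_name",
}


def to_kbart_row(record):
    """Map a rich source record to a flat KBART-style row and list what gets dropped."""
    row = {kbart: record.get(source, "") for source, kbart in SOURCE_TO_KBART.items()}
    row["publication_type"] = "monograph"
    # KBART access_type uses "F" for free and "P" for paid content.
    row["access_type"] = "F" if record.get("open_access") else "P"
    dropped = sorted(set(record) - set(SOURCE_TO_KBART) - {"open_access"})
    return row, dropped


# Placeholder source record with richer description than the target can hold.
source_record = {
    "title": "Example Book",
    "online_isbn": "9780000000002",
    "url": "https://example.org/book",
    "first_author": "Author",
    "publisher": "Example University Press",
    "open_access": True,
    "subjects": ["Example subject heading"],
    "abstract": "A short abstract.",
    "table_of_contents": ["Chapter 1", "Chapter 2"],
}

row, dropped = to_kbart_row(source_record)
print(row)
print("dropped:", dropped)
```

The flat row has no column for subject headings, abstracts, or contents lists, so those elements have to be added back in a later enrichment step.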
EMMA BOOTH: Deduplication of metadata within knowledge bases and discovery layers simplifies the search experience for library users, but it can also lead to an extra roadblock in the metadata supply chain. This is metadata openness, or the ability for any stakeholder in the supply chain to freely share and, crucially, reuse metadata. Metadata openness can be inhibited if any one creator or manager of metadata places restrictions upon it.
EMMA BOOTH: If the metadata that is supplied by an Open Access publisher is open by design with a CC0 license, it can still end up under a usage restriction if it becomes merged or deduplicated within a system or knowledge base that contains metadata from another, more restrictive source. Equally, if an aggregator or a system provider places limitations on their library customers as to how they can use metadata from their proprietary knowledge base, then the metadata can end up siloed and restricted, limiting its future usefulness.
EMMA BOOTH: For example, some system providers may only permit a library to publish knowledge base metadata to discovery services and will prohibit them from sharing metadata with any system or service that might enable it to be downloaded and reused by other libraries or individuals. For libraries, this means that cooperative cataloging and metadata sharing initiatives for collection evaluation or research purposes can become limited.
EMMA BOOTH: I will now hand over to Ronald to provide some insights into the perspective of an Open Access books provider.
RONALD SNYDER: Thank you, Emma. So how does this look from the perspective of an Open Access books provider? Well, the OAPEN Foundation promotes and supports the transition to Open Access for academic books by providing open infrastructure services to stakeholders in scholarly communication. We work with publishers to build a quality-controlled collection of Open Access books, and we provide services to publishers, libraries, and research funders in the areas of hosting, deposit, quality assurance, dissemination, and digital preservation.
RONALD SNYDER: As a matter of fact, I'm also uploading a few thousand books to Portico at this moment. The OAPEN Foundation manages two Open Access books platforms, The OAPEN Library and the Directory of Open Access Books. The OAPEN Library hosts a collection of over 19,000 full text books and chapters. And hopefully, it will get to the 20,000 mark this year. And the DOAB or the Directory of Open Access Books, it's a discovery service.
RONALD SNYDER: So instead of hosting the titles, it only lists metadata, and it points the user to the place where the book can be downloaded. This might be a download from the OAPEN Library, but also, in most cases, from the publisher's website. And currently, we have over 48,000 descriptions in the DOAB. And so to enable maximum dissemination, we provide metadata in several formats all under a CC0 license.
RONALD SNYDER: In other words, our metadata is in the public domain. Our platforms are also based on repository software, so the contents can also be harvested by those who like to do it that way. Next slide, please. To give you an indication of the collection of both the OAPEN library and the DOAB, especially the growth, which has been substantial over the years-- and, actually, even in the first three weeks of this year, several hundred titles have already been added to both collections.
RONALD SNYDER: So next slide, please. Well, furthermore, many libraries already use the collection of OAPEN and DOAB. For instance, when we look at the Download Data for the OAPEN Library, we see over 1,011-- sorry-- 1,100, I should say, different library users from all over the world. Some of the Downloads originate from aggregators, as you can see.
RONALD SNYDER: Others are directly connected to the library catalog. And furthermore, it's also interesting to note that the Library of Congress is using the contents of the DOAB to collect and add Open Access versions of books to their collection. Next slide, please. OAPEN and DOAB, we're not the only providers of Open Access E-book metadata or Open Access E-books, period. In recent years, we have seen the development of Knowledge Unlatched, a commercial provider who opens the Open Research Library.
RONALD SNYDER: And of a more recent date is the ScholarLed collective, which is building the Thoth application. The titles from both providers are also listed in OAPEN and in DOAB. Next, please. Let's talk about metadata quality. When we look at the metadata aspects from our perspective, from the OAPEN perspective, we could conclude that many things go well.
RONALD SNYDER: The OAPEN Library and DOAB provide metadata in several standard formats. And all titles are described using the relevant metadata elements that were mentioned before. However, our MARC XML export does not fully comply with NAG standards for electronic books. So this is something that we are working on, and we plan to fix it this year.
RONALD SNYDER: Next slide, please. Regarding metadata ingestion and deduplication, we automate the metadata ingestion. We have made agreements with several aggregators and system providers, and we are in regular contact with them. And of course, this helps to solve problems. There is one major issue, though, that makes us sometimes a bit uneasy.
RONALD SNYDER: Both DOAB and OAPEN provide books, and they provide chapters, especially when only several chapters from a book are made available in Open Access instead of the whole book. And sadly, we see that the ingestion of chapters is not always optimal. So this is one of those things that might be improved. And on the next slide, we go to the metadata openness.
RONALD SNYDER: Well, as mentioned before, all our metadata is provided under a CC0 license. So I am not sure how we could improve beyond what we do at this moment. Now, I think I'd like to ask Diane now to go to the rest of the presentation.
DIANE RASMUSSEN PENNINGTON: Great.
PENNINGTON: Thanks, Ronald. And if you could go to the next slide, please. Thanks. So I'm an academic. I teach cataloging and metadata, and I do a large amount of my research in the area of metadata quality. If you go back to last year's NISO Plus presentation from 2021, it still should be available up there if you want to see the wide breadth of areas that my students and I do research in within the area of metadata quality and how many different things it can mean when we talk about metadata quality.
PENNINGTON: And Emma and Ronald have just given you an excellent example of one small area, which is just the Open Access books metadata, and you can see how much is involved there within the supply chain. But what I would like to share with you here is just one piece of research from my PhD student who finished in 2020, George McGregor, now Dr. George McGregor, who did his PhD in the area of resource discovery, and metadata, and digital environments.
PENNINGTON: And he worked in this area for years. He's actually one of our own institutional repository managers in the library here at the University of Strathclyde at my University. And so he was very informed by his personal practice and his day job. And he's done years of publication and empirical work in this area of metadata alignment. So I'm just going to show you this model, which I will go through and explain the different pieces because I think that, even though Emma and Ronald have given you such a great example of one area that we need to be concerned with concerning this type of metadata, I think that this illustrates even more based on empirical work and based on his personal practice how complicated this really is and how much work it is for us to get all these different pieces of data together to eventually get the user to whatever it is that they need to get to.
PENNINGTON: So I'll just go through explaining these different layers, as he's called them, so that you can have some more context. He's talked about data or metadata alignments in terms of horizontal and vertical layers to show that there are different levels of understanding between the systems and how all these different layers ultimately impact whether the users can discover whatever it is that they need.
PENNINGTON: And he put the word meta in parentheses, specifically because it's not just what we normally think of as metadata as librarians, not just descriptive metadata, structural metadata, administrative, and those kinds of things, but also different kinds of data structures, whether it's controlled vocabularies, application profiles applied to certain types of metadata, as well as just the different kinds of preferences that different discovery services introduce into systems.
PENNINGTON: And they all have slightly different ways of doing things. And as we know, even though we have metadata standards, that sometimes, the way things are approached, even though we have these just to cover different ways of describing things and so on and different ways to get users to what they need, they don't always match up the way we would like them to, even when they are supposed to be implemented as standards. So he's got the collection-based services at the very top of this upside-down pyramid, I guess you could call it.
PENNINGTON: These are things that offer things like the collaboratively-generated collection descriptions, the collection-level metadata, which is-- this is the top level, the broadest level of the process of resource discovery because it basically allows the user to figure out the entire body of what is available, what collections are available. I mean, you need to be able to access that, or if you don't understand it, to at least know what collections are available before you can go down specifically all the way down to the item level.
PENNINGTON: So the next level is what he called the distributed service layer, which are the content aggregation systems, distributed digital library systems, federated searches, digital services that clump things together in hopefully useful ways, and so forth. And these are meant to be identified as a result of the interactions with the collection-based services as possible results to routes-- or sorry-- not results-- routes to discovering content at the item level-- so going on a bit further down in terms of understanding the landscape of the information that's available.
PENNINGTON: Underneath that is the semantic interoperability layer, which includes terminology services, concept linking tools delivered via linked open data or some sort of semantically aware middle layer. So we're hoping that the semantic layer, I still think, needs a lot of improvement. And that's one of my specific areas of research. But we do need to improve on this quite a bit. It is principally concerned, however, with aligning user queries entered at the distributed service layer with the service layer below.
PENNINGTON: And so then there's the service layer that goes underneath that. And then ultimately, at the very bottom there is the discovery layer. The discovery layer is the final conclusion, the resolution of metadata alignment-- whatever you want to call that-- in the sense that all prior layers of the model need to be in place and are, hopefully, syntactically and semantically interoperable.
PENNINGTON: And of course, we know that that's not always the case, but we're working towards that. And with all of those data structures working together, that will hopefully then provide the best discovery experience possible for the end user. And there's a lot more involved in the discovery layer other than just the metadata. Obviously, there's the human-computer interaction: how well does the search system work in terms of the kinds of searches that can be used and how the results are presented back to the user?
PENNINGTON: What does that mean for their personal information management? Are they able to order results, arrange results, filter them, sort them in whatever ways are meaningful for them, et cetera? And then from there, does it-- it's not like a layer itself, but it's kind of the end result of those top four layers. And then those ultimately come down into this process of discovery. And so how are all those upper level layers structured, harnessed, processed, and put to use within the discovery system?
PENNINGTON: And how that is ultimately presented to users that will help their interactions with the resources and, obviously, the metadata which hopefully leads them to the resources while also trying to minimize the cognitive load of the user as much as possible. So if you would like to get more detail on this model and the thinking around it and the empirical work around it, I've given you there the URL to access his full dissertation, which, of course, as it should be, is available Open Access through the University of Strathclyde Repository.
PENNINGTON: So now, I think we'll close the formal part of our presentation. And we'd like to have plenty of time for discussion. So we'd really like to hear from you on your thoughts on what we've presented today, maybe what you're doing locally, if you have other practices you'd like to share with us, or any questions that you have or concerns that you have locally at whatever institution or organization you represent and whichever part of the sector you represent.
PENNINGTON: So thank you very much for your time and for listening to us today. And we look forward to a great conversation with you. So thanks very much.
BRIAN O'LEARY: Thank you, Diane. And I think it's also important to recognize the work that you've all done, Ronald, Emma, and Diane, in preparing this presentation and sharing your thoughts with us today. We do have an opportunity to take questions from the folks who are in attendance today. And why don't we start there? [MUSIC PLAYING]