Name:
Metadata and discovery
Description:
Metadata and discovery
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/bafefbab-cc8d-4ee7-b3e2-017aa246347d/videoscrubberimages/Scrubber_1.jpg?sv=2019-02-02&sr=c&sig=p%2FPmBGai21pmpd3OWR25pmNk48V7cxgpouBGU7b%2B8lY%3D&st=2024-12-21T13%3A00%3A30Z&se=2024-12-21T17%3A05%3A30Z&sp=r
Duration:
T00H55M37S
Embed URL:
https://stream.cadmore.media/player/bafefbab-cc8d-4ee7-b3e2-017aa246347d
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/bafefbab-cc8d-4ee7-b3e2-017aa246347d/8 - Metadata and discovery-HD 1080p.mov?sv=2019-02-02&sr=c&sig=V9rIjpFMBU1QUX2iMjiK57T%2BmLiC3Oz4sA%2Fq8STRPzY%3D&st=2024-12-21T13%3A00%3A31Z&se=2024-12-21T15%3A05%3A31Z&sp=r
Upload Date:
2021-08-23T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
PETER MURRAY: Good day, everyone. And welcome to this session on Metadata and Discovery. My name is Peter Murray. And I will be the moderator for this session. During this session, we have two topics. The first is the NISO Open Discovery Initiative Recommended Practice. On this topic, we have Ken Varnum and Geoff Morse. Ken is the senior program manager at the University of Michigan library.
PETER MURRAY: Geoff is the interim head of research services at Northwestern University Libraries. And Ken, we will start with you.
KEN VARNUM: Thank you, Peter, for that introduction. And I will get us started on Geoff's and my contribution to this panel-- What You Can Do to Help Promote Transparency in Discovery-- and Why. Next slide, please.
KEN VARNUM: So Geoff and I have been involved with ODI for varying lengths of time. I'm an old-timer in the group, and started out nearly at ODI's inception in 2011, while Geoff joined us a little more recently, about three years ago. And the two of us have been very pleased and proud to bring a library perspective, along with other librarians, to this group of people interested in transparency in discovery.
KEN VARNUM: Next slide, please. So in our talk, we are going to give a brief overview of the Open Discovery Initiative to make sure that we all have a common understanding of what the goals of the ODI are, and what we have been up to in the most recently revised recommended practice of the Open Discovery Initiative. We're going to spend the bulk of our time talking about metadata quality and why it is important to the effort and the goals of the ODI.
KEN VARNUM: And we'll wrap up by talking about what you-- no matter your role, as a librarian, as a content provider, as a discovery provider, or just as an interested member of the information ecosystem-- might be able to bring to the ODI and to its efforts. So I will start with a very brief summary, a review of what the ODI is in the first place. The Open Discovery Initiative was first proposed way back in 2011.
KEN VARNUM: So it's nearing a decade old now-- it was proposed at the summer conference of the American Library Association. And this was, if you cast your mind back a decade, just as these web-scale discovery indexes were appearing on the horizon. Serials Solutions at the time had just announced the Summon product, with other competitors coming from EBSCO and OCLC. This was really the first huge, international effort to pull together-- in their lofty goals-- all content that might come from publishers of journals or e-books or any other source, the indexing provided by abstracting and indexing services, and library catalogs themselves, into one integrated index, to try to solve a problem felt by many in the information-seeking world, those of us who use information: I want to find everything all at once.
KEN VARNUM: And of course, this, while a noble goal, opened up a lot of questions and a lot of concerns among librarians and others. What is in these indexes? How is it indexed? What data appears? If I license a content source in one place, am I going to find it in my discovery index, if I do license one?
KEN VARNUM: All of these things-- and so ODI was formed to pull together representatives of libraries, content providers, and discovery providers, to hash out some very basic foundational information about what is in a discovery index, what gets passed in, what gets sent out to the end user, and what is used in indexing. This group met over the course of several years and released the first version of the recommended practice in 2014.
KEN VARNUM: And when the original working group finished its work on the recommended practice, it created a standing committee to oversee and observe, and eventually, when there seemed to be a need for a new version of the recommended practice to address emerging concerns, to take that on. That happened in 2017, when enough time had passed and enough change had happened in the discovery ecosystem that there seemed to be some new concerns and new issues, as well as a few things that needed to be revisited from the original recommended practice.
KEN VARNUM: That launched a three-year process, in a very similar way, to revise the recommended practice. And a new version came out in 2020, just last summer. So what is the ODI trying to accomplish, particularly in this latest revision? Hit the next slide, please, Geoff. It aims to define ways that libraries, consumers of content on behalf of their users, can assess participation in discovery services, to affirm how content is used.
KEN VARNUM: So if you subscribe to a vendor's indexed product and you subscribe to a discovery product, where is the overlap between those two things? What do users actually see? And what is actually working under the hood to influence the relevance rankings of the search algorithm? We aimed to streamline and standardize the process through which content providers provide their metadata or their content to discovery service providers.
KEN VARNUM: And a couple of topics that were near and dear to librarians' hearts in particular are defining models for fair linking from discovery services back to publishers' content. This hearkens back to those original conversations a decade ago, when everything was new and everything was a black box, and there were concerns about what backroom deals, if any, might be cut to privilege one vendor over another, so that it was not clear what the library's role was in providing access to the content sources they wanted their users to see.
KEN VARNUM: And then finally, to try to set some guidelines and standards on usage statistics, for content providers to understand the impact of their indexing on discovery and usage, and for libraries to understand what happens when users use a discovery service: how does that drive traffic, and how does that drive users to the services providing the content, so that libraries can most effectively manage their own budgets and understand where content and indexing are coming from?
KEN VARNUM: Next slide. So I've talked a lot about the members. This is the representation on the most recent version, the 2017 version, of the Open Discovery Initiative standing committee. We have representatives from the library world, from content providers, and from the three big heavy hitters in the discovery provider space.
KEN VARNUM: And before I turn this over to Geoff, I'd just like to summarize, in the next slide, a brief what's-in-it-for-me for each of these large constituencies. First, why ODI should matter to libraries: discovery obviously is a service that many libraries have found attractive and useful for their users, because finding relevant content is simply easier for users when it's all in one place.
KEN VARNUM: And it can be addressed through a single search interface, and everything a library has can be found. And so ODI aims to make it easier to understand which resources are included in discovery services, how that input is treated, how records are then displayed, and how access to the full-text content-- or whatever the next step is: the video, the audio, the what have you-- is provided.
KEN VARNUM: For content providers, ODI feels that participating in a discovery service is a value-add to the content that you're providing and the indexing that you create to help make content more discoverable. It increases usage of those materials for those who are providing both the full text and the indexing. And the more value you bring to that ecosystem, the more likely it is that libraries are going to continue their contracts and relationships with you.
KEN VARNUM: And finally, for discovery providers, participating in the Open Discovery Initiative helps ensure transparency and improves customer satisfaction and retention for those who are purchasing access to the discovery indexes. People want to know what goes in and what comes out. This is, in a sense, showing how the sausage is made. And finally, I just want to conclude my portion of this presentation by noting that we are all in this together-- my final slide, please, Geoff.
KEN VARNUM: Discovery service providers, content providers, and libraries are all equally valued in the ODI process. We all need input; we all need to be heard. And although Geoff and I come to you from the library perspective, we are by no means the only voices that matter in the ODI conversation. Each group's inputs, both the content inputs and content outputs, have impact on the others. Nothing is happening in a vacuum.
KEN VARNUM: And each of these three legs of the stool is needed to keep the whole thing moving forward. This is really an effort to foster communication, so that a broader set of the information ecosystem is aware of this and can understand how to contribute, and the importance of the underlying piece of all of this, which is the metadata about the content being discovered. And now I'll pass it over to Geoff to go into a little more detail on what each of us can and should be doing in the context of the Open Discovery Initiative.
KEN VARNUM: Geoff?
GEOFF MORSE: OK, thank you Ken. So I'm just going to start off here with a nice illustration of basically what Ken was just referring to. We are all in this together. And for these systems to work well and our end users to get the kind of results that we want to give them, we all do have to continue to work together. So I'm going to delve into the various roles we all have, and explain some of the things we can do and how we all work together.
GEOFF MORSE: So to start off, let's talk about the content provider role. It's important. Clearly, we've listed metadata quality first here. And it's important that core metadata is made available by the content provider for indexing. The full range of metadata will improve the discovery service experience for users. Fair linking and the indication of open access are also going to be essential for providing the end user with the most effective experience.
GEOFF MORSE: So let's take a closer look at each of these areas, now. I'm going to start with the metadata quality I just referred to. So for a material to be discoverable in really the enormous universe of discovery system content, at a minimum, the following really need to be included when applicable. And I'm just going to run through a brief list here-- metadata elements such as title, author, author identifier, publisher name, volume, issue, page, dates, item identifier, component of title, component of title identifier, item URL, open-access designation, full text flag, content type, content format, language, indexing data, and abstract.
GEOFF MORSE: So I know I read those really quickly. But the recommended practice includes examples of these fields, as is illustrated above in that little illustration there. For discovery systems to work effectively, the metadata from content providers must include enough description to make the sources discoverable amongst the millions and honestly sometimes, billions, of records we now have in discovery systems.
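The core elements listed above amount to a completeness check on each record a content provider supplies. A minimal sketch of that idea, assuming illustrative field names rather than the exact element names from the recommended practice, and with an invented record:

```python
# Which fields we treat as core, "when applicable." Names are
# illustrative paraphrases of the list in the talk, not the official
# ODI element vocabulary.
REQUIRED_WHEN_APPLICABLE = {
    "title", "author", "publisher", "date", "item_identifier",
    "item_url", "content_type", "language",
}

# A hypothetical citation-level record (all values invented).
record = {
    "title": "Metadata quality in web-scale discovery",
    "author": "Doe, J.",
    "publisher": "Example Press",
    "volume": "12", "issue": "3", "pages": "101-118",
    "date": "2020-06-01",
    "item_identifier": "10.1234/example.5678",   # hypothetical DOI
    "item_url": "https://example.org/article/5678",
    "open_access": True,
    "full_text_flag": True,
    "content_type": "journal-article",
    "language": "en",
    "abstract": "...",
}

def missing_core_fields(rec: dict) -> set:
    """Return the core fields that are absent or empty in a record."""
    return {f for f in REQUIRED_WHEN_APPLICABLE if not rec.get(f)}

print(missing_core_fields(record))  # set() -> all core fields present
```

A provider could run a check like this over an export before sending it to a discovery service, reporting which records fall short of the core set.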
GEOFF MORSE: As discovery systems become larger, with a wider variety of materials in them, it becomes even more imperative that the metadata effectively assists the end user in locating what they need. So to continue on: in addition to the core fields, which are of course extremely important, the presence of data points that support direct linking, which we sometimes see now, should not ever supplant the OpenURL, as this would remove the library's ability to choose the platform to direct users to.
GEOFF MORSE: So even if direct linking is available, and it often is, we should still make sure that OpenURL is going to be an option, because libraries like to have the option to choose which platform they direct their users to when there are multiple options. So it's very important, in addition to those metadata elements just mentioned, that we provide for fair linking and OpenURL. That's extremely important to librarians in particular.
GEOFF MORSE: So with open access from the content provider: if an item is open access, it should have the free-to-read element included in the metadata if it can be viewed without authentication. The start and end dates can also be included, as you can see there. That also is clearly important to the end user, who can then identify quickly whether something is open access or not.
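The free-to-read indication mentioned here is typically carried as a small markup element alongside the item's metadata. A sketch, assuming the element and attribute names follow the NISO Access & License Indicators (ALI) pattern; the exact namespace and attribute names are an assumption to verify against the recommended practice:

```python
import xml.etree.ElementTree as ET

# Assumed ALI namespace; check against the ALI recommended practice.
ALI_NS = "http://www.niso.org/schemas/ali/1.0/"
ET.register_namespace("ali", ALI_NS)

item = ET.Element("item")
free = ET.SubElement(item, f"{{{ALI_NS}}}free_to_read")
# Optional window: the item is readable without authentication
# between these (invented) dates.
free.set("start_date", "2021-01-01")
free.set("end_date", "2021-12-31")

print(ET.tostring(item, encoding="unicode"))
```

When this element arrives with the metadata, the discovery service can surface an "open access" flag in the item record, which is the hand-off Geoff describes next.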
GEOFF MORSE: And it's important that the discovery system gets that information from the metadata from the content provider. So let's continue on and talk about the discovery provider role a little bit. The discovery provider also has an important role, of course, in the provision of records from the metadata they receive. So transparency is really important.
GEOFF MORSE: And metadata should be made available at both the collection and the title level. The collection metadata should be available in downloadable form. And open-access indication also has an important role here. So I'm going to expand on those topics right now. So first off, librarians and their end users really need to understand what is available through the discovery system's central index.
GEOFF MORSE: So at the collection level, elements that need to be made clear include things like the provider of the product, the marketed product or database name, the title in the knowledge base, the number of titles in the knowledge base, the number of unique records in the central index, the percent of full-text-searchable records, the percent of abstract-searchable records, the percent of subject-searchable records, the percent of articles that are free to read, the date of last update, and the date of report update.
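The collection-level elements just listed amount to a small transparency report. A sketch with invented numbers and paraphrased keys (not the official report labels), showing how the percentage figures derive from the record counts:

```python
# Hypothetical collection-level report for one provider's product.
# All names and counts are invented for illustration.
collection_report = {
    "provider": "Example Content Co.",
    "product_name": "Example Index",
    "titles_in_knowledge_base": 1200,
    "unique_records_in_central_index": 850_000,
    "records_full_text_searchable": 612_000,
    "records_abstract_searchable": 765_000,
    "records_subject_searchable": 680_000,
    "records_free_to_read": 102_000,
    "date_of_last_update": "2021-07-01",
}

def pct(part_key: str, report: dict) -> float:
    """Percent of central-index records with the given property."""
    total = report["unique_records_in_central_index"]
    return round(100 * report[part_key] / total, 1)

print(pct("records_full_text_searchable", collection_report))  # 72.0
print(pct("records_free_to_read", collection_report))          # 12.0
```

Publishing figures like these per collection is what lets a librarian, or a prospective customer, judge how much of a product is actually searchable at the full-text, abstract, or subject level.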
GEOFF MORSE: So this is at the collection level. This level of transparency is beneficial to the librarian who's teaching, researching, or assisting end users to get the most effective use out of the discovery service. Knowing the extent of what is included will help guide the librarian and end user to effective search strategies. This level of transparency is also beneficial to the prospective customer, who can better see how well the discovery service can meet the needs of their user community.
GEOFF MORSE: And it also makes it easy for content providers to see how they fit into the overall discovery system. The ability of users to drill down to the title level is also important to librarians, end users, and content providers. Librarians and end users need to know what is included in what they are searching. Content providers need to be able to see that their material is represented and its source acknowledged.
GEOFF MORSE: So, fair linking-- just to say briefly here that business relationships between the content provider and the discovery provider should have no impact on the results, relevance, or link order. So for instance, as we put in the bullets on the slide, there should be a statement of neutrality about the algorithms. And if there is a business relationship between the content provider and the discovery provider, that should be clearly explained.
GEOFF MORSE: And the presentation of links to content should be configurable by libraries as well. So that is another key factor in the recommended practice here. Finally, the statistics-- the COUNTER reports provide usage on one provider's content by all of its customers. So this is important. They allow content providers to see if customers are using their content and clicking on links, and, importantly, that only licensed customers are getting access to their content.
GEOFF MORSE: So it's important that discovery service providers can supply these reports. Since content providers, when they send their metadata to a discovery provider, are in a sense losing control of their metadata within the discovery system, it becomes really crucial for them to be able to see how their material is being used within the discovery system.
GEOFF MORSE: By viewing statistics from all mutual customers-- all the customers they have-- the content providers can tell if their content is being found within a discovery system, if their metadata is effective, and if licensed indexing and abstracting metadata is only being used by authorized institutions. And as we mentioned for the content provider role, it's important that their metadata include the free-to-read element.
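The licensed-access check described here can be sketched as a comparison between aggregated usage reports and the provider's own list of licensed customers; every name and number below is invented:

```python
# Customers the provider has actually licensed (hypothetical IDs).
licensed_customers = {"univ-a", "univ-b"}

# Aggregated per-customer usage of this provider's content within the
# discovery service, of the kind a COUNTER-style report would surface.
usage_events = [
    {"customer": "univ-a", "clicks": 120},
    {"customer": "univ-b", "clicks": 45},
    {"customer": "univ-c", "clicks": 7},   # not licensed -> should be flagged
]

def unauthorized(events: list, licensed: set) -> list:
    """Customers appearing in usage reports who hold no license."""
    return sorted({e["customer"] for e in events} - licensed)

print(unauthorized(usage_events, licensed_customers))  # ['univ-c']
```

A non-empty result is exactly the signal Geoff describes: licensed indexing and abstracting metadata being used by an institution that should not have access to it.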
GEOFF MORSE: Discovery services, when they have that metadata element included in the content they receive, need to present this indication of open access to the end user in the item record. This, of course, is clearly beneficial to the librarian and the end user, who can quickly see and access open-access content, public-domain materials, or otherwise available materials that are free to read.
GEOFF MORSE: So I've briefly gone over the importance of the roles of the content provider in providing metadata, fair linking, and open-access indication, and the same with the discovery provider and their roles related to the provision of metadata, statistics, and fair linking. But I think it's important to mention here the librarian's role as well, which is newly highlighted in the updated recommended practice.
GEOFF MORSE: So first off, it's an important responsibility of the librarian and the library using the discovery service to make sure that configuration guidelines have been followed. To do this, one really needs to understand how the system is supposed to work. When ODI did a survey of librarians using discovery systems, they found that significant configuration work was required when most libraries implemented their discovery system.
GEOFF MORSE: When ODI surveyed a number of libraries about how the implementation process was carried out, it became clear that in many cases, the implementation of the discovery system fell primarily to one staff member. That's understandable; if you have a very small staff, it may not be possible to expand the staff working on the discovery configuration. But if possible, assign staff to oversee specific areas of configuration.
GEOFF MORSE: In a sense, make them more expert in specific areas. Documenting all configuration decisions is also important. Many libraries reported that they lost documentation when people left, and that they didn't have a record of how things were configured. So it's important to document configuration decisions. And then, the last bullet point: confirm that the subscribed content is enabled in discovery.
GEOFF MORSE: Double-check to make sure it's available. So you can see here that in addition to what we said about content providers and discovery providers, the librarians and the libraries also have an important role in the configuration of the discovery systems. To that end, it's really important that training be established to meet different users' needs, including the library staff.
GEOFF MORSE: Certainly, a lot of libraries do training and workshops for their patron base. But it's important that the different library staff members also have their needs met in relation to the configuration and the use of the discovery system. So front-end librarians may need more training in the intricacies of the advanced search or the various searches. Technical services staff may need different training if they're using the discovery system for any aspect of their work.
GEOFF MORSE: And finally, with discovery systems now, there are quite often updates on a monthly or quarterly basis. And it's really important that these are reviewed regularly. Even if they're performed by the vendor, which they often are, it's very important and sometimes easier said than done from my personal experience, to stay on top of all of these upgrades as they come in. Because that of course, will ensure the best performance of the system for the end user.
GEOFF MORSE: So, what can we do here? The new recommended practice has a voluntary conformance statement for libraries as well. And we urge you, as a library, to complete and publish the library conformance statement, and to follow up with vendor partners, whether they're content providers or discovery providers, on their conformance, and see how close they are to being in conformance.
GEOFF MORSE: We really advocate increasing conformance for content providers and discovery service providers, and also continuing to enhance conformance for the librarian role as well. So that is really our quick overview of the areas of the recommended practice. And we have more links here for more details, including the checklists and templates.
GEOFF MORSE: The goal, of course, is transparency, not perfection. We have an implementation guide and our frequently asked questions here as well. And some resources for librarians including configuration guides, conformance checklist, some talking points, and frequently asked questions, and some general resources as well. So I want to thank you for your time.
GEOFF MORSE: And definitely follow up with this. If you have any questions, we're eager to talk about this. So thank you very much.
PETER MURRAY: Thank you, Ken and Geoff. And we'll get to that question-and-answer time here shortly. Our next topic, though, is How Better Metadata Makes a Difference. And on that topic, we have Diane Pennington and Emma Booth. Diane is a senior lecturer at the University of Strathclyde, and is also the chair of the Metadata and Discovery Group of CILIP, the Library and Information Association in the UK.
PETER MURRAY: Emma is the resources metadata specialist at the University of Manchester. She is also a member of the National Acquisitions Group executive committee, and of CILIP's Metadata and Discovery Group. We're starting with Diane.
DIANE PENNINGTON: Just to make sure everyone can hear me OK?
PETER MURRAY: Yes, if you can get a little closer, that'd be great. And we can see the screen.
DIANE PENNINGTON: OK, good. Thanks for that introduction, Peter. So I am going to start with sort of a broad overview of what metadata quality means-- and it can mean a lot of things. I will be talking specifically about my role as an educator in metadata and cataloging within libraries, and hopefully give some examples of some really good practice and research that my students and I have done over the last few years, and hopefully motivate everyone to understand why, again, this is so important.
DIANE PENNINGTON: And one of the things I want to start with, for those of you in the audience who may not be librarians-- the first question that we get sometimes is, you need a master's degree to be a librarian? Really? I thought all you had to do was check out books. Why do you need a master's degree for that? But the profession has become more and more technical, even in the years that I've been in this field.
DIANE PENNINGTON: It just keeps getting more and more complex. And a lot of that has to do with providing resources online-- making things discoverable and accessible. And quality metadata makes that possible, which is the focus of this section of the conference. The problem that we have, though, is that a lot of librarians, even if they do get an accredited master's degree from our professional bodies, don't always get a lot of metadata instruction in their training.
DIANE PENNINGTON: And that's because there is this misperception, even in our own field, that metadata isn't important, or that vendors will provide it, or that somebody else will take care of it, and we don't have time and it's not our problem-- which really isn't the case. And I can say this as the chair of our metadata and discovery group here in the UK: we're frequently contacted by people who need help.
DIANE PENNINGTON: They have these great collections. They don't know how to make them accessible. They don't know how to provide metadata to help people find what they need, in terms of local collections or anything else. So one of the things that we do as a committee is provide that training, whether it's through our conferences or through connecting them to people who can provide that training for them.
DIANE PENNINGTON: And what I'd like to show you is what I do to prepare my students for this complex field. So I've been at the University of Strathclyde in Scotland-- obviously, you can tell by my accent, I'm not Scottish-- since I was recruited there in 2015. And they had in mind a position for somebody to teach, someone to bring cataloging and metadata back into the curriculum, because it wasn't there for a long time. So I redesigned everything that we teach in this area when I came in.
DIANE PENNINGTON: So the first semester covers more general things that non-librarians might be more familiar with-- things like taxonomies and thesauri and ontologies and linked data, and other types of metadata schemas that may or may not be within just a library setting. A lot of our students also go into non-traditional settings; they go into businesses that do information management and so on.
DIANE PENNINGTON: The second semester is a bit more focused on traditional library cataloging, which is the kind of thing that Emma will be discussing next with MARC records. And also in the second semester, along with that, they have to do a digital library project for a cultural heritage organization, one of the partners that we work with. And this resource has to include good metadata to help people interact with the resource that they develop.
DIANE PENNINGTON: And finally, we have student placements. A lot of them do metadata and cataloging projects in their placements as part of their master's, especially this year. Most of them are virtual, because they can't be in person at the library, so there's a lot of cataloging projects being done remotely. And finally, a lot of them choose to do research projects, hopefully in the areas of metadata.
DIANE PENNINGTON: And that's what I'm going to show today-- just, really quickly, some examples of some of this research that students have done for their final research project as they finish. And I'm going to go through these really quickly. You can follow up with me for details if you want any of this. Or if not, you can just kind of take the concepts that I'm going to present here.
DIANE PENNINGTON: So the first one is showing how MARC, which is the standard format that we use for library cataloging records, can differ around the world. This student was from China. She wanted to show that different MARC fields are used in the British Library format and in the Chinese National Library format-- they use something called Chinese MARC.
DIANE PENNINGTON: And their fields don't match up with the MARC21 that we use in other countries, including in the US and the UK. And not only that, but how do we deal with things in different languages? So even if they were using the same fields, we have transliteration issues where they might be using the original Chinese characters. Or they also might have some sort of translation or transliteration into Roman characters that might not be the same in every setting.
DIANE PENNINGTON: A student last year looked at the need for cataloging in non-English languages, whether or not the catalogs themselves were in English. So you can see here, there was a very long list of languages-- when she surveyed librarians, they reported all these different languages that they need to be able to catalog in. So we have to have good cataloging and metadata not only in English, but also in different scripts and different languages, and to understand what those standards are.
DIANE PENNINGTON: And there needs to be consistency, in terms of people understanding how to input different languages correctly, and so on. Management is also an issue: making sure that, again, the support and training are there, giving staff appropriate training, and making sure that library leaders understand why it's important to have these things, and why it can be so important for their libraries' strategic plans-- as well as for the vendors, who want to help libraries promote their resources and to sell this as part of their product offering.
DIANE PENNINGTON: Because libraries are sometimes quite reliant on the vendors of the systems that we purchase and pay for. This student was looking at research data curation in an academic context-- when researchers have to upload their data sets to the university's repositories. And she saw that metadata, including the standards used and making sure the metadata was standardized, was an important part of managing it, to make sure that the data was quality data, as well as the metadata provided about it.
DIANE PENNINGTON: This student looked at the difference between describing cultural collections on traditional, institutionally created digital library sources, versus on social media sites like Instagram and Pinterest. And she found the need for enhanced metadata-- people were using a lot of user-generated tags and the personalization features that are available on social media sites, where you can personalize what you're looking for and what you're doing, much more so than you can on a traditional library site.
DIANE PENNINGTON: In the case of musical metadata, this was a student who made some suggestions for linking metadata together from a large world music collection. So not just the recording of the music itself, but sketches and photos and other resources and background information about a world music site. Professionally created metadata-- this is from my own university and our institutional repository.
DIANE PENNINGTON: One of our students looked at the metadata provided for the papers that we have to upload to our repository. And she found that it was really good-- the quality was very good. But that's because they have a lot of staff who take time to improve the quality of the records once we upload them to the system. And they even improve mine, which I think is amazing, because they just do such a great job.
DIANE PENNINGTON: This student looked at online fandoms, and surveyed a particular fandom that she's a member of in an online community about metadata. And while she found that the users themselves didn't know much about metadata in sort of an official, librarian kind of sense, they were really good at providing tags that were useful for themselves and for the other members. Here, we looked at PhD students in the humanities and social sciences, and came up with sort of a relationship model for helping them connect with one another, to find other people to collaborate with.
DIANE PENNINGTON: So whether it was for finding new collaborators, finding supervisors, finding people of similar interests, and so on-- because when you're a new researcher, you don't have your network built up just yet. The need for standards-- and I'll talk about standards, because this is a NISO conference, after all. Standards are really important: using standards that already exist, and using them consistently. We can't possibly have interoperability across these different kinds of systems if we're not using the structures that are already in place.
DIANE PENNINGTON: And so the importance of using these things and building on the resources that are already so big like Wikidata, DBpedia, these different ontological structures and using open-source tools are really essential to make sure that linking will actually work. And then on the library cataloging side, one of the problems that we have is that we have these cataloging rules that have been updated.
DIANE PENNINGTON: There's one, Resource Description and Access, a code that was released around 2013, that was supposed to be based on a user model and make things easier for people to find. But the problem is that MARC records themselves live in their own silos. And while this new structure with the RDA and the FRBR approach was meant to help users link things together, as you can see here, even if you have the same work, the records won't cluster together correctly.
DIANE PENNINGTON: Because each item sits alone in its own particular MARC record. So it doesn't show up the way we might want to. This is a conceptual piece that I did a few years ago, talking about person-centered metadata, essentially. So if we were to design an ontology for, say, people living with dementia, which is something I have a personal interest in-- my father passed away of dementia-- what is the relationship between that person and things that they can engage with in the community, for themselves, with other people, to help them have a richer life?
DIANE PENNINGTON: And are there ways that we can design metadata and systems and just personal connections to make that easier for them? Within Scottish public museums and galleries, this student found, last year, something similar to what we've been seeing in a lot of this research, which is that they want to link to other data sets. But the data isn't normalized. It's not very clean. There's a lot of different kinds of data.
DIANE PENNINGTON: And that makes it difficult for different museums and galleries to link together their data with one another. Here we looked at student, she did a classification system for a Christian college library. And she found that existing classification systems didn't meet up very well with the collection that only included Christian materials. So she developed her own classification system to reflect what was in that collection and how she saw the relationship of those items to one another.
DIANE PENNINGTON: This is a problem we have as well, metadata sitting around in Excel spreadsheets, sitting on people's computers. Not really helpful to the rest of the world-- so what we need to do here is to take that out of the spreadsheets and make it available. And that's what one of my students did with that. This was a catalog of glass art for a glass school that exists here in the far North of Scotland.
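A step like this one, pulling metadata out of a flat spreadsheet export and into structured records with facets, can be sketched in a few lines of Python. This is only an illustration, not the student's actual workflow; the column names (`title`, `maker`, `material`, and so on) are made up for the example, not taken from the real glass-art catalog.

```python
import csv
import io

def spreadsheet_to_records(csv_text, facet_fields=("material", "technique", "colour")):
    """Turn rows of a flat CSV spreadsheet export into simple catalog
    records, pulling selected columns out as facets for a faceted
    classification. Empty facet cells are simply skipped."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append({
            "title": row["title"],
            "maker": row["maker"],
            "facets": {f: row[f] for f in facet_fields if row.get(f)},
        })
    return records

# Hypothetical spreadsheet export with made-up column names and data.
sample = """title,maker,material,technique,colour
Winter Vessel,A. Smith,glass,blown,blue
Shore Panel,B. Jones,glass,fused,green
"""
records = spreadsheet_to_records(sample)
```

Once the rows are structured records rather than spreadsheet cells, they can be loaded into a catalog or discovery system and the facet values indexed for filtering.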
DIANE PENNINGTON: So she took that metadata out of the spreadsheet, put it into cataloging records, along with a photo of the piece of artwork, and created a faceted classification system to describe the artwork itself. I've done some work on barriers to linked data implementations. And one of the things that we found was that, again, there's the issue of training. We can't afford training.
DIANE PENNINGTON: Staff don't have time for training. And they don't have time to do the projects themselves. So they're kind of stuck in this sort of loop of not being able to do this. It also needs to be ethical as well. I have been serving the last two years, as part of my role on the I'm the MDG committee, the Cataloging Code of Ethics. So that means that not only do we have-- and this is just one of our principles.
DIANE PENNINGTON: You can read the rest on the website. That not only this metadata needs to be interoperable and consistent, in terms of standards, but we also need to recognize that standards can be biased, culturally and otherwise, and that we need to be critical and inclusive when we use them in applying our metadata records. There's been some work, some of my students have done, just doing literature reviews to come up with a sort of a theoretical framework for the different aspects and dimensions of bibliographic metadata.
DIANE PENNINGTON: And this is one of my PhD students who's just finished up. And he's come up with a very complex model of metadata alignment. So it starts with end users and all of the things that we can't control, that the end users interact with. But then there are different layers that live underneath the discovery layer of a system, that feed into the discovery layer, that then feed into what the end user does and what they see.
DIANE PENNINGTON: And that everything in the back end needs to work just as well, to feed into the front end. And a lot of my work has also focused on not just bibliographic metadata, but metadata about emotions, about how certain types of things make us feel. Like what if we want to find a song to put us in a good mood, or a video that's just really interesting?
DIANE PENNINGTON: One that will keep us awake, those sorts of things. And so this comes into the question of quality, in terms of types of metadata. What kinds of documents or objects or other things need metadata? And what is the best way to design that metadata so that it's not just a search and retrieval act, but also discovering, accessing, engaging with it, sharing it with each other?
DIANE PENNINGTON: So finally, I just want to say that, as I hope you've seen from some of these examples of projects, the training that we've done at Strathclyde in metadata, everything that I've added to our curriculum, has led to a lot of students who are very interested in the quality of metadata, who are critical about it, and who approach it carefully and thoughtfully, from what I've seen. And they have this ability to take these concepts into their own context-- all these different contexts that I've shown you really quickly-- to find a problem, and to do the research on the issue.
DIANE PENNINGTON: Because I don't give them specific problems or contexts. They find them themselves. And then to do the research and apply it to practical recommendations is something that is very important to evidence-based work in our profession. But we don't have the opportunity, or we don't do it nearly as much as we should. But we need this to have better metadata.
DIANE PENNINGTON: We need to know what the issues are and how to resolve them, and that we have evidence to back it up, to make these decisions about informed practice. And so now we get to hear from Emma, who's going to talk about one of these excellent projects that she's done locally in her library, to show how this can actually work. Thanks very much.
EMMA BOOTH: OK, thank you, Diane. The brief for this session is Better Metadata Makes a Difference. And as Diane has illustrated, this is a very rich topic for discussion. It's also an area of personal and professional interest to me, as a metadata specialist at a UK academic library, where I work with and create metadata on a daily basis.
EMMA BOOTH: As a library practitioner, I regularly see the positive impact that better, richer, or quality-assured metadata has when it's leveraged by discovery systems to bridge the gaps between information and communities. My perspective on the content of my presentation is naturally going to be fairly library-centric. But hopefully what I have to say will be relevant to information professionals in other sectors that have an interest in metadata quality.
EMMA BOOTH: So first of all, I want to articulate that while many librarians do not work directly in metadata creation or metadata management, they do interact with and use metadata on a daily basis. In fact, metadata has always been at the heart of library services, because it is essential for accurately describing the resources that we want our end users to discover and use. Metadata is an incredibly powerful tool for connecting library users with the information that they need for teaching, learning, and research.
EMMA BOOTH: And without it, a library would cease to be a fully-functioning service that supports its user community. But in order to be fully useful to libraries, bibliographic metadata must be standardized, accurate, and as complete as possible. Otherwise the resources it describes can be rendered essentially invisible to the library user. Metadata quality makes a huge difference to libraries, because the better the metadata, the more utility it has in driving resource discovery, and supporting library users with finding and selecting the content that best fits their needs.
EMMA BOOTH: Quality metadata also supports several of the library functions and processes-- from resource acquisition and activation, to collection analysis, and the development of collection management and preservation policies. Metadata can be extracted and analyzed alongside usage data and statistics to inform evidence-based collection development strategies. It can also be used to support resource sharing and collaborative research, which extends its impact outside the library and immediate university community, to other stakeholders involved in the dissemination of scholarly research.
EMMA BOOTH: The quality and interoperability of metadata has a material impact upon how readily it can be transformed into different schemas and formats to support this variety of purposes and activities. So the flip side of all the useful and wonderful things that quality metadata can do for libraries and their users is the negative impact that poor quality metadata can have. And this is, unfortunately, something I encounter regularly in my day-to-day work as an e-resources metadata specialist, particularly when dealing with externally created metadata.
EMMA BOOTH: In 2019, I began working with a UK-based organization called the National Acquisitions Group, to explore in more depth the negative impact that poor quality metadata has on libraries and their users. My work with NAG involved designing a survey to gather data about academic library experiences with shelf-ready metadata. The scope of this project was fairly narrow, focusing on MARC records supplied to higher education libraries for print and e-books, purchased as part of a specific procurement agreement.
EMMA BOOTH: However this project is an interesting case study, with key findings that are relevant to the wider interests of the information and metadata standards community. In particular, the recommendations published last spring in NAG Quality of Shelf-Ready Metadata Report, relate directly to the work of the NISO e-book metadata working group. To assess the negative impact for quality metadata on libraries, the NAG Quality of Shelf-Ready Metadata survey asked respondents about how frequently they're having to undertake quality control and error correction activities on vendor-supplied metadata.
EMMA BOOTH: 88% of the 50 respondent libraries indicated that they carry out quality control or spot checking on shelf-ready records, with 60% checking individual records. 96% of libraries indicated that they carry out in-house error correction or enrichment work on vendor-supplied metadata, with 62% saying that they always perform these tasks on records sent from their book supplies.
EMMA BOOTH: These figures show that there's a huge amount of effort being invested by libraries in checking, correcting, and enriching vendor-supplied metadata, effort that is essentially being duplicated across the academic library sector. This is quite disappointing for libraries, as the big selling point of adopting a shelf-ready or vendor-supplied metadata service, is that it should streamline workflows and free up staff time for other technical projects.
EMMA BOOTH: That present, the amount of record checking and correction work that libraries have to do, indicates that they're not really experiencing the efficiencies that were promised In terms of the specific errors or problems that necessitate all of this quality control or remediation activity, the survey results showed that the most commonly encountered issue that academic library catalogers experience with vendor-supplied metadata is incomplete or brief records that do not fully support end use of discovery.
EMMA BOOTH: Other regularly encountered issues focused on metadata being inaccurate or incorrect. For example, records containing erroneous or inconsistent ISBN information, or data that has been incorrectly mapped to the MARC21 format or some other schema. Whilst the data from the survey responses demonstrates that libraries experience a range of issues with vendor-supplied metadata, the majority of them fall into two main categories-- inaccurately or incorrectly recorded metadata, or missing and omitted metadata.
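One of the inaccuracies mentioned here, erroneous ISBN information, is partly catchable by machine, because ISBN-13 carries a check digit. The sketch below is not from either report; it simply applies the standard ISBN-13 checksum rule (digits weighted alternately 1 and 3, weighted sum divisible by 10):

```python
def isbn13_is_valid(isbn: str) -> bool:
    """Validate an ISBN-13 check digit. Digits are weighted
    alternately 1 and 3 from the left, and the weighted sum must
    be a multiple of 10 (the standard ISBN-13/EAN-13 rule).
    Hyphens and spaces in the input are ignored."""
    digits = [int(c) for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0
```

A record carrying a well-formed ISBN such as `978-0-306-40615-7` passes, while any single mistyped digit fails the check, so a validation pass like this can flag many (though not all) ISBN errors before records reach the catalog.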
EMMA BOOTH: So all of these issues have led 94% of the survey respondents to agree or strongly agree that it would save them time and effort and improve the discovery experience for their users if shelf-ready and vendor-produced records were supplied to an agreed standard of quality. But what does that standard look like? And what do libraries mean by quality metadata? In order to provide clarity for both libraries and those further upstream in the content and metadata supply chain, the NAG project aimed to produce clear recommendations for suppliers regarding quality standards and specifications for book metadata.
EMMA BOOTH: Data from the NAG survey about which metadata elements libraries considered to be essential for supporting end-user discovery of books and e-books has been used to inform these recommendations. The majority of survey respondents indicated that the descriptive and authority-controlled metadata elements onscreen here are essential to their requirements. Whilst this case study just focused on MARC21 metadata for books, the essential metadata elements that have been identified here are relevant to the other formats of bibliographic description and data transfer use throughout book publishing and information supply chain.
EMMA BOOTH: Furthermore, these metadata elements are useful to all stakeholders in the book supply chain, regardless of the schema or systems that they use. The essential metadata elements identified here fall into five distinct categories-- titles, names, dates, book identifiers, including those related to resource type and format, and subjects. These same five categories of essential metadata were also identified by the NISO E-book Metadata working group, in their recent recommended practice document, Outlining E-book Bibliographic Metadata Requirements in the Sale, Publication, Discovery, Delivery, and Preservation Supply Chain.
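As an illustration of how five essential categories of this kind could translate into an automated check, here is a sketch of a record-completeness test. The category-to-field mapping is hypothetical, invented for the example; it is not drawn from the NAG or NISO documents.

```python
# Hypothetical mapping from essential categories (titles, names,
# dates, identifiers, subjects) to made-up record field names.
ESSENTIAL = {
    "titles": ["title"],
    "names": ["author", "contributor"],
    "dates": ["publication_date"],
    "identifiers": ["isbn", "doi"],
    "subjects": ["subjects"],
}

def missing_categories(record):
    """Return the essential categories for which the record has no
    populated field -- a simple completeness check a library could
    run on incoming vendor-supplied records."""
    return [cat for cat, fields in ESSENTIAL.items()
            if not any(record.get(f) for f in fields)]

# A brief record of the kind the survey respondents complained about.
brief_record = {"title": "Example Book", "isbn": "9780306406157"}
gaps = missing_categories(brief_record)
```

A check like this makes "incomplete or brief records" measurable: records with a non-empty `gaps` list can be routed for enrichment rather than loaded as-is.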
EMMA BOOTH: This recommended practice was open for public consultation last summer, and aims to articulate the e-book metadata requirements for different stakeholders and use cases in the supply chain. It's therefore encouraging to see alignment here with the NAG recommendations for shelf-ready and vendor-supplied book metadata for libraries. Indeed, both the NAG and NISO reports suggest that a consensus can be found regarding a definition of metadata quality for books and e-books.
EMMA BOOTH: And if implemented, it should help to ensure that metadata in the book supply chain is functional enough to support a variety of activities and purposes across all stakeholders. From the library perspective, there are additional desirable metadata elements beyond the essential elements already described, that help support end user discovery and add extra detail to assist with resource identification and selection.
EMMA BOOTH: The majority of these desirable metadata elements are recorded in note fields in the MARC record, and are particularly useful if a library's discovery layer can be configured to index the data in these fields for keyword searching. These desirable elements provide added value to library metadata, as long as they are accurately recorded. And this is really the crux of what libraries mean by quality metadata.
EMMA BOOTH: Guaranteeing metadata quality, for now and for the future, is not just about defining the number or type of essential fields or metadata elements required. It's also about ensuring the accuracy, reliability, and standardization of the data that is recorded in those fields or elements. This is especially true if we want to move beyond a particular format or schema and transform metadata from a flat record structure to an entity-based network environment of linked data.
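The move from a flat record to an entity-based graph can be sketched very simply: each populated field becomes a (subject, predicate, object) triple. This is a toy illustration with a made-up `ex:` prefix, not a real RDF serialization or either report's method.

```python
def record_to_triples(record, subject_uri):
    """Flatten a dict-style record into (subject, predicate, object)
    triples -- the basic shape of a linked-data graph. Multi-valued
    fields (lists) become one triple per value."""
    triples = []
    for field, value in record.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            triples.append((subject_uri, f"ex:{field}", v))
    return triples

# A made-up flat record for illustration.
flat = {"title": "Hamlet", "creator": "Shakespeare, William",
        "subjects": ["Tragedy", "Denmark"]}
triples = record_to_triples(flat, "ex:work/1")
```

The payoff of the triple form is that string values like "Shakespeare, William" can later be swapped for shared identifiers (a Wikidata QID, say), which is what lets records link across systems, and it only works if the source data was recorded accurately and consistently in the first place.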
EMMA BOOTH: For all members of the information community, quality metadata is metadata that is both detailed and accurate, so that it can serve a variety of purposes, including enabling the discovery and identification of works. Better metadata is metadata that is credible, complete, and compatible. That means it must follow international standards so that it is open for use and can be transferred across different systems and communities.
EMMA BOOTH: It also means that it is interoperable and is future proof as possible. So the effort is not duplicated in mapping or translating in its different formats and schemas. It's therefore, in the interests of all stakeholders in the information and metadata supply chain, to work together to ensure the quality and accuracy of the metadata that flows between them, regardless of the format schema or systems that they are using.
EMMA BOOTH: Because when we settle for inadequate metadata, everyone suffers as a consequence. Thank you for listening.
PETER MURRAY: Thank you, Diane and Emma. This concludes the prepared recording part of the session. We will now move to the Zoom meeting for reactions and discussions on these topics and related ideas. With that, we will close this recording, and see you in the Zoom meeting. Thank you. [MUSIC PLAYING]