Name:
Just connecting things? How creatives are keeping the metadata flowing Recording
Description:
Just connecting things? How creatives are keeping the metadata flowing Recording
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/b5502236-ee25-4380-a11b-f0ff1fd61f3a/videoscrubberimages/Scrubber_3.jpg
Duration:
T00H35M47S
Embed URL:
https://stream.cadmore.media/player/b5502236-ee25-4380-a11b-f0ff1fd61f3a
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/b5502236-ee25-4380-a11b-f0ff1fd61f3a/Just connecting things How creatives are keeping the metadat.mp4?sv=2019-02-02&sr=c&sig=xzdWnip0OnnjxJ%2BThM7u9WkYq0Dlx%2Fl3%2BcWxUjxF33U%3D&st=2024-11-22T04%3A03%3A00Z&se=2024-11-22T06%3A08%3A00Z&sp=r
Upload Date:
2024-03-06T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
Hello, everybody, and welcome to this NISO Plus 2023 session.
The focus of our conversation today is going to be on how metadata and content flows are being adapted across the creative and cultural industries. And I'm delighted to be joined by Rachel Kotecki, who is head of research infrastructure services at the British Library, and Sarah Brennan, who is product manager at Academic Video Online. Welcome to both of you. Each will be talking a little bit about their own experiences and how specific projects might lead to greater connectivity, accessibility and discoverability.
So without further ado, let's dig in. And I'm going to hand over to Rachel. Perfect, thank you very much, Catherine. And as I mentioned, I'm Rachel Kotecki. I am the head of research infrastructure services at the British Library. So what that means is I am head of a team who provide all different kinds of services to researchers at the British Library, which is the national library of the United Kingdom: repository services, data services, lots of things, of which I am going to concentrate on just two aspects today.
And they are one service, which is called EThOS, which is an electronic theses online service, and then I will look towards some work we have done on a project called Heritage PIDs, which is the short name of the project. And I'll give you the long name later, but I'll jump straight in because I've got quite a lot to tell you about and limited time to tell you about it. So we will start with EThOS, which aggregates all the examined doctoral theses in the UK, and it was created by Jisc and the British Library in around 2008.
The initial purpose of this service was to aggregate information on UK doctoral theses to enable digitization. So we harvest metadata and links to full text, but also part of the service is for anyone to be able to request and pay for digitization of an individual hard copy thesis, and any subsequent user can then actually access that for free. So the first person pays the initial fee for digitization and then it's openly available for anyone to access after that point.
And actually this is still a key part of what the service offers, but it is diminishing. So we were seeing just prior to the pandemic a few dozen requests for digitization on a monthly basis. We had to switch off digitization during the pandemic because we couldn't get on site to do the digitization. But actually when we switched it back on, we've had very few requests and no complaints at all actually when we switched off digitization.
And that's purely because actually lots of the theses being published now that people need for contemporary research are already digital. So the digitization is really diminishing month on month. There's also a lot of bulk digitization that institutions are doing and that wholesale removes a lot of the demand, which is great because it means we can just give people immediate access to a lot of the content.
We do get regular updates from harvesting of UK institutional repositories as we go along, through memorandums of understanding that we have with the universities. And there are two flavors of that MOU. So some institutions only want to contribute the metadata and they want us to link back to the digital copies of their theses on their own repository. And the others allow us to actually harvest the full text as well.
And I'll come on to the impact of the full text harvesting of EThOS later. Just to paint a bit more color into what EThOS is: our stats show that EThOS is actually a well valued and well used service. So you can see in the graph at the top right that usage has increased consistently year on year. So we see a starting rate, a baseline of about 10,000 theses views per month in 2011 when we first started recording this, rising to a baseline in the lean summer months of about 50,000 views of thesis records in the summer of 2019.
And then we see a massive surge in use at the start of the pandemic as people lost physical access to libraries and were turning to open, accessible services like EThOS. Maybe they also had a bit more time to read and process a long form document like a thesis. Some of them can be a couple of hundred pages in length. That's a lot of dense knowledge to get through. But that use we did see settle back down as we've come towards the latter stages of the pandemic, and users returned to a similar profile with a kind of peak at the start of the academic year.
But it's settled down to a rate that is still much higher than we would have expected without the pandemic intervening. So we're currently looking at a baseline of about 80,000 theses views per month in those summer months. And we do also talk about how we have near complete coverage. So some internal documents suggest that we might be missing about 3% of all theses published in the UK, which goes back to 1768.
So missing 3% over 300 years is not too bad a miss rate. And actually, because of that long term view and the completeness that we do have, we can offer unique insight into the UK research landscape. So just a couple more stats that color that in even further. So we see a huge increase in doctoral research in recent decades. So the pie chart shows that although we have records dating back to the 1700s, records from the 18th and 19th centuries don't even make up 1% of all the records that we have.
20th century theses, or those published in the 1900s, make up about a third of what we have. But then the final two chunks are actually only from the last two decades. So only one fifth of a full century represents 2/3 of that pie chart, really. And this is a really stunning expansion of doctoral research in the UK. And then if we just briefly look at subject coverage, there is a strong skew towards STEM, although social, economic and political studies is well represented and is increasing.
So it's taken us a lot of work to bring together the EThOS service that people can use today. And it's not been an easy process. There are many ways in which UK PhD theses differ, so there are institutional differences: the repository platforms that we have to harvest, and their own different ways of dealing with metadata and the process of harvesting from them.
So we do use OAI-PMH, but actually that only requires you to make available the metadata in Dublin Core, which doesn't cover all of the useful aspects such as supervisors or funders. That is really helpful information for us. Then there are even school or faculty level differences in the metadata which also make their way into institutional repositories. But we have had some successes in standardization.
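As a rough illustration of the harvesting described here, the Python sketch below builds a ListRecords request for the OAI-PMH harvesting protocol using the Dublin Core prefix (`oai_dc`) and pulls titles and creators out of a response. The base URL, the sample record, and the function names are invented for illustration, and nothing is fetched over the network. Note that the simple Dublin Core elements include no dedicated field for supervisors or funders, which is the gap mentioned above.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical repository endpoint, for illustration only.
BASE_URL = "https://repository.example.ac.uk/oai"

# Namespaces used by OAI-PMH 2.0 responses carrying simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL; 'from' limits the harvest to newer records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Invented sample response (not real repository data).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.ac.uk:1234</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example doctoral thesis</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:type>Thesis</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Extract the identifier, titles and creators from each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS)],
            "creators": [e.text for e in rec.iterfind(".//dc:creator", NS)],
        })
    return records


if __name__ == "__main__":
    print(list_records_url(BASE_URL, from_date="2023-01-01"))
    print(parse_records(SAMPLE_RESPONSE))
```

A richer application profile would be requested the same way, by passing a different `metadataPrefix` that the repository supports.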
So the UKETD application profile is made available by many institutions for harvest so that we can aggregate a significant level of detail. This has increased since work in 2016, where we expanded that profile to include more persistent identifiers, specifically researcher identifiers. So we're talking about digital object identifiers, DOIs, for the theses, ORCIDs for the authors and supervisors, but also International Standard Name Identifiers,
so ISNIs, for authors. And this allows the theses to be represented in the research graph of the UK, kind of the connected network of how authors and their outputs all link to each other, and that itself becomes a significant source of research knowledge. I mentioned ISNIs, these International Standard Name Identifiers, and these are super important for EThOS because of the age of some of our records.
So there are authors and supervisors that will never have an ORCID because they're retired, they may even have passed away, but actually we still need to represent them. They will never have an ORCID because they cannot claim one. And so we have to use those ISNIs. And these are actually also where EThOS contributes back to the assignment of identifiers. So the British Library is also a registration agency for ISNI, and a thesis in many cases will be the first or even the only publication from a lot of authors.
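Both ORCID iDs and ISNIs are 16-character identifiers whose final character is a check character computed with ISO 7064 MOD 11-2, so a harvested name identifier can be sanity-checked before it is matched or stored. The following is a minimal sketch of that check; the function names are illustrative, and the sample value is the iD of the fictitious researcher Josiah Carberry that ORCID uses in its own documentation.

```python
def mod11_2_check_char(first15):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in first15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_name_identifier(value):
    """Check the length, digits and check character of an ORCID iD or ISNI."""
    # Accept the usual display forms: hyphens (ORCID) or spaces (ISNI).
    compact = value.replace("-", "").replace(" ", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return mod11_2_check_char(compact[:15]) == compact[15]


if __name__ == "__main__":
    print(is_valid_name_identifier("0000-0002-1825-0097"))  # well-formed iD
    print(is_valid_name_identifier("0000-0002-1825-0098"))  # wrong check digit
```

A passing check only shows that the string is well formed; it does not show that the identifier has been assigned or that it belongs to the right person.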
So we can submit our EThOS metadata to ISNI for assignment of ISNIs, but we can also then pull back those matched or newly assigned identifiers into the metadata, which can then be re-harvested by the institutions. So they may not have name identifiers in the metadata that we get from them, but we can provide it back. So if we look very briefly at the UKETD profile, we have mandatory, advised and optional fields.
I've highlighted here the addition of identifiers. So that's the thesis identifier, which is advised, not mandatory, because not all institutions provide that at the moment, and the identifiers for the authors and supervisors. So again, for authors a persistent identifier is advised, but for the supervisors it's optional, also because we might not have the supervisors listed. So we can't make them have an ID when we don't even have a name.
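The three requirement levels can be sketched as a simple record check: a missing mandatory field is an error, a missing advised field is only a warning, and a missing optional field is ignored. The field names below are illustrative labels rather than the profile's actual element names, and the two mandatory entries are assumptions made for the example; only the advised and optional identifier levels come from the talk.

```python
from enum import Enum


class Level(Enum):
    MANDATORY = "mandatory"
    ADVISED = "advised"
    OPTIONAL = "optional"


# Illustrative field names, not the application profile's element names.
FIELD_LEVELS = {
    "title": Level.MANDATORY,                 # assumption for the example
    "author_name": Level.MANDATORY,           # assumption for the example
    "thesis_identifier": Level.ADVISED,       # advised, per the talk
    "author_identifier": Level.ADVISED,       # advised, per the talk
    "supervisor_identifier": Level.OPTIONAL,  # optional, per the talk
}


def check_record(record):
    """Return (errors, warnings): missing mandatory fields and missing advised fields."""
    errors, warnings = [], []
    for field_name, level in FIELD_LEVELS.items():
        if record.get(field_name):
            continue  # field is present and non-empty
        if level is Level.MANDATORY:
            errors.append(field_name)
        elif level is Level.ADVISED:
            warnings.append(field_name)
    return errors, warnings


if __name__ == "__main__":
    sample = {
        "title": "An example doctoral thesis",
        "author_name": "Doe, Jane",
        "author_identifier": "0000-0002-1825-0097",
    }
    print(check_record(sample))
```

Treating advised fields as warnings rather than failures lets an aggregator accept records from institutions that cannot yet supply identifiers while still reporting what is missing.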
On the second page you can see that we have quite specific ETD metadata such as qualification level, which is important for us in filtering because we only include doctoral theses; we don't include masters or undergraduate dissertations. Dewey is there, but it's not usually included in the harvest. So the British Library actually pays to have this added, and institutions, as with the identifiers, are able to harvest that improved metadata back.
We have a core metadata expert who works on EThOS who not only harvests and adds in the Dewey and ISNI; they do a great deal of QA and deduplication. So all that enrichment, enhancement and deduplication is able to be harvested back to the institutions. So there's a great benefit there in their inclusion and working with us, beyond the increased profile for their doctoral research, but also in terms of the information they hold as well.
So what else can folks do with EThOS? And this is where we get into our EThOS case studies. And one key thing I've already touched on is the understanding of the UK research landscape, particularly the pipeline of research careers. EThOS has provided insights into dementia and immunology researchers and their career tracks, where it's possible to look at the thesis topic and carry that through into later publications.
So does an author drop out of research altogether, or do they go on to another subject area? And all of this is going to be much easier to examine now that we do have increased use of ORCID and ISNI. And some of this also then relies on the addition of our subject indexing. Another powerful activity from EThOS is text and data mining. So the Royal Society of Chemistry mined EThOS full text to pull out chemical compounds that are not present in the national database, the National Compound Collection.
There are issues with the text and data mining approach in EThOS, and most of those are sources of potential bias. So less than half of all the records that we do have are actually available in full text, either directly from us or from links to the awarding institution repository. And where you don't have a full corpus, there will always be bias.
There will be bias on what is embargoed, by whom and why. There will be bias from what is digitized and why. And there's even bias in who gets to participate and successfully complete a thesis, ultimately. But what we can say is that the information it provides us could help to identify some of those sources of bias and make the research pipeline potentially more equitable in future. So EThOS has its uses.
It has some problems we'd like to overcome. Where do we go next? What is there still to do? Well, we'd like to see much more use of identifiers, not just where they're supported, but for awarding institutions as well. So ISNI or ROR as organization identifiers. Again, we need to look to ISNI for organization identifiers because ROR does not necessarily have in scope those institutions that no longer exist and maybe haven't existed for 200 years or more.
In 2016, we looked at the power of identifiers to link data with publications and allow institutions to start breaking the data out of the back of the PDF. This is already increasing without much intervention on our part since 2016, but actually the more reusable data that we can break out of theses, the better. And then finally, further digitisation, as I've mentioned, is not only important for avoiding that bias, but it will further open up the UK's historical research. Aggregating full texts
in EThOS will then enable much more use and research across the corpus. So this is where I stop talking about EThOS, which is a fairly contemporary national collection. But we've also dealt with some more historical collections and knowledge. In the UK, there are hundreds of organizations that collect, curate and preserve cultural heritage collections. This has created silos of information that can make it difficult to understand what is where; it can make it difficult to navigate
that, especially for researchers. And Towards a National Collection is a program from the UK's Arts and Humanities Research Council to break down those silos, or the walls between them, and open up heritage collections for research. And at the British Library, our work with persistent identifiers through DataCite and some EU funded projects led us to propose that an increased use of persistent identifiers would help to break down some of those silos.
We already saw through use of DOIs and ORCIDs that you can link research across institutions, funders, et cetera in a global setting. So what was stopping the same from happening in cultural heritage collections? What we wanted to do was identify those barriers, develop tools to support wider use of persistent identifiers, or PIDs, in the sector, and then broaden awareness in order to support investment in the use of persistent identifiers.
And what we wanted to have at the end of this project was a toolkit of resources, a framework for decision making. And those would be decisions such as what identifier to use for what, when, what kind of information. And we did successfully do that. So this is the Heritage PIDs project, which I was going to mention. The long form name is Persistent Identifiers as IRO Infrastructure.
So IRO, oh yes, there's another acronym in there, stands for independent research organizations. So we're talking about heritage organizations that are actually allowed to apply for funding from UK national funders, but they're not higher education institutions. Yeah, hopefully that explains that a bit. So you'll see Heritage PIDs used because it is much less of a mouthful than the full project title.
But we did finish the project at the beginning of last year and we have created this toolkit. So I'll provide links to everyone who wants to have a look. But yes, it's all available through our Developing Identifiers for Heritage Collections resource. And case studies, coming back to those again, are part of that resource. We wrote each one with one of the partner institutions involved, and across those case studies you can see use of identifiers in other standards.
So IIIF, or the International Image Interoperability Framework, being one of those use cases. But more generally in connecting and standardizing information across systems, connecting collections with research in a way that would allow you to see whose collection got fragmented across different organizations and do better reunification projects, et cetera. So there were lots of use cases explored in those case studies, which again I can link you to.
I'll give you one quick example of how we connect with research on the next slide. So this is a presentation given by our co-investigator Professor Rod Page at the University of Glasgow, who created a little bookmarklet that allows you to show the connection between collection items and research. So I'm going to press play for you now.
So this is another academic paper. This is the Edinburgh Journal of Botany. It's a paper on begonias. Now, this paper is behind a paywall, so we can't see a great deal about it. There's a DOI for the paper there. There's literature cited. If I want to do anything more, I need to spend 25 pounds.
I'm not going to do that. But what we can tell you is something about the specimens that this paper uses. So we go up here to annotate it. It says, OK, I know what paper you're looking at. And here are some herbarium specimens. These ones on the bottom, these are from the Natural History Museum in London, and the one at the top is from Edinburgh.
Again, each of these links is the persistent identifier for that specimen. So let's go and have a look at that. So now we're off to Edinburgh. This is the Royal Botanic Garden. This is their herbarium catalog page showing the specimen of begonia. And again, it's a typical kind of natural history collection web page: a bit of data, a picture of the specimen, and that's it.
So again, there's no indication on this page that anybody cares about the specimen because the search results are quite. Let's look. I did not mean to press play again next, but fantastic. So that's a quick run through of the Heritage PIDs work and the use of persistent identifiers to kind of open up our collections, not just for research ultimately, but actually for anything that we might not have thought of, including reunifications and community use.
And what have we learnt from these two examples, I guess, is what I want to leave us with. So firstly I want to say that we cannot collect everything together, and it's not always possible to do what we've done with EThOS and bring everything together in one place. But through linking and interoperability and a few key standards, we can support these kinds of uses and research that we cannot yet imagine, and that might not even quite yet be possible.
As long as we can make it available for the future through these standards, anything will become possible. And finally, to say that we don't need to homogenize everything which is controversial potentially for this audience, I know, but actually a few key selected areas of standardization. For instance, the use of persistent identifiers to demonstrate when we do and don't mean the same thing will actually get us really far while still being able to represent the unique diversity of what we have as collections holding organizations and the unique and diverse ways that sometimes we need to describe those collections as well.
So that's it for me. A quick thank you to our funders, our project partners and the folks in my repository services team at the British Library. That's Jenny, Heather and Sarah, who retired last year. But her spirit is still very much with us. And thank you to you, my lovely audience. I will hand back to Catherine.
Thank you very much, Rachel. That was a fascinating presentation and really rich with the longitudinal view of the EThOS project and all of the things that are happening in the Towards a National Collection program. Really excited to hear what we are going to be talking about next as we meet our next speaker. And thank you, Rachel.
That was. Thanks so much. Hi, everyone. I'm so glad to be here. My name is Sara Brennan. I am the lead product manager for Academic Video Online, a streaming video database available from ProQuest, part of Clarivate.
And I'm also a librarian and I was just really delighted to present alongside Rachel on this topic today centered around creative content publishing and the application of metadata to enhance that content, all with the goal, obviously, of making it discoverable and usable among scholars and thinkers, researchers, students. And I was particularly excited about this presentation topic because it focuses on the unique challenges manifested by nontraditional formats like streaming video, which is where I've worked for the majority of my career.
And having been fortunate to work in streaming video for the past several years, I've really observed just giant leaps forward in how academia and scholarly publishing think about video as an academic format. More and more video is just an accepted format as a part of scholarly output, and it runs the gamut of types of content that are being published as video, from journal articles to dissertations, recorded lectures, student projects, conference presentations like this one.
And I know it's a cliché to say it, but I think the pandemic also really contributed to that. I think creativity really expanded during that phase, partly by necessity, but a fortunate outcome was that we saw a big increase in the creative expression and creative output by academics and scholars during those years. And looking ahead, I still see so much creativity yet to come. There are so many forms of multimedia that are becoming more and more adopted, things like podcasts and virtual reality.
And as an aside, I was fortunate to attend the educators conference in 2022, and I saw some really amazing, incredible creative work being done in virtual reality as scholarship. I think there's sort of a stigma sometimes around multimedia, and I was really pleased to see that some of the work being done in virtual reality was really top notch, excellent, creative, and I was glad to see it at the conference. And I'm sure there are many formats still to come that haven't been invented yet, of course, but I'm just really pleased at how we keep pushing the envelope on what we consider scholarship.
And because of that growing interest in publishing as video and other forms of media, I've been really pleased and proud to be serving on a NISO committee that developed in 2019, sort of recognizing that streaming video is here to stay and it's a permanent form of scholarship publication. But unfortunately, you know, there's the existing problem that there aren't clear metadata standards for these emerging formats, and particularly for streaming video.
So this committee formed in 2019. It's a very diverse mix of stakeholders. It's publishers, academics, distributors, librarians. And I want to specifically acknowledge the co-chairs on this slide. So Barbara Chen, formerly of the MLA, Violaine Iglesias of Cadmore Media, Bill Kasdorf of Kasdorf & Associates, and Michelle Urberg of a different MLA, the Music Library Association.
And we were really expertly led by Nettie at NISO. And much of my presentation today is kind of shaped around the work that this committee did, and there's a link in the slide to see the full list of committee members and also the forthcoming recommended practice, which is about to be published. And again, the committee formed because the more we embrace these creative forms of expression like streaming video, the more we all need, as an information professional community, to ensure we're thinking about how others are going to find and consume that content, how it will be discoverable.
And accessible, and then also to prioritize how it will be preserved long term. And so I'm preaching to the choir in this session. But metadata, what does that mean in streaming video? Metadata in multimedia obviously means several different things all at once, like it does in other formats too: administrative metadata, title, date, version, et cetera. Semantic indexing exists for multimedia.
So subject classifications and keywords within the context of the content, unique to the disciplinary focus of the specific content. There's technical metadata in multimedia. Usually that means the media type, file type, file size, encoding, bitrate. There's rights metadata like there is for any format: who owns it, who's aggregating it, licensing it, publishing it, distributing it. And then accessibility
metadata really matters in multimedia. So closed captioning, transcripts, audio description, all of that has to be captured for the content to truly be accessible. And then streaming video is delivered in a whole variety of different applications, maybe more so than other traditional formats. But there's certainly the scholarship path. So publishers, licensors, distributors, aggregators. Video can be a standalone piece of content, a film, but it's often also supplemental material to something else.
So a video introduction to a dissertation or an accompaniment to a journal article, a video that's given as a conference presentation in support of a conference proceeding paper. So it really, you know, it rides both worlds and then there's non scholarly publications of video. So TV broadcasting, theatrical screening, obviously more and more media is digital born. So web broadcasting and all of those applications don't necessarily talk to each other.
There are no crosswalks between the metadata that's required for each of those applications. And that's really the basis for the committee work and why that committee formed: to address that very problem. So all of that is great in the abstract, but I really wanted to center my presentation around a specific example of creative output and how the metadata and those problems sort of manifest in one specific case.
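The five kinds of metadata listed earlier (administrative, semantic, technical, rights, accessibility) and the crosswalk problem can be sketched as a small data structure plus a mapping step. This is an illustration only: the two target formats, every field name, and the mappings are hypothetical and are not taken from the committee's recommended practice or from any real schema.

```python
from dataclasses import dataclass, field


@dataclass
class VideoMetadata:
    """One video's metadata, grouped by the categories named in the talk."""
    administrative: dict = field(default_factory=dict)  # title, date, version
    semantic: dict = field(default_factory=dict)        # subjects, keywords
    technical: dict = field(default_factory=dict)       # file type, size, encoding, bitrate
    rights: dict = field(default_factory=dict)          # owner, licensor, distributor
    accessibility: dict = field(default_factory=dict)   # captions, transcript, audio description


# Hypothetical crosswalks: target field -> (source category, source key).
CROSSWALKS = {
    "library_record": {
        "title": ("administrative", "title"),
        "subject": ("semantic", "subjects"),
        "publisher": ("rights", "publisher"),
    },
    "festival_submission": {
        "film_title": ("administrative", "title"),
        "running_time_minutes": ("technical", "duration_minutes"),
        "captions_available": ("accessibility", "closed_captions"),
    },
}


def crosswalk(meta, target):
    """Map a record into a target format and list the target fields left unfilled."""
    mapped, missing = {}, []
    for target_field, (category, key) in CROSSWALKS[target].items():
        value = getattr(meta, category).get(key)
        if value is None:
            missing.append(target_field)
        else:
            mapped[target_field] = value
    return mapped, missing


if __name__ == "__main__":
    meta = VideoMetadata(
        administrative={"title": "Example film"},
        technical={"duration_minutes": 12},
    )
    print(crosswalk(meta, "festival_submission"))
    print(crosswalk(meta, "library_record"))
```

Returning the unfilled fields alongside the mapped record makes the gap visible: the same source record can satisfy one outlet and still be missing what another outlet asks for.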
So what you see on the screen is a film poster on the right. It's for a film called Anna, which is a short form animated film. And I'm going to give some more background about it. And then on the left hand side is a MARC record, so a very traditional form of metadata used by libraries, obviously, for the film. So this film is a project that came out of a team of scholars at the University of Nebraska-Lincoln.
It's directed by Michael Burton. The film is the output of historical research that was done by scholars centered around a woman named Anne Williams. She was an American woman. She lived and died in 1815, and she was identified through abolitionist papers and a host of other primary sources, including court documents and oral history and testimony of ex-slaves.
And these historians specifically chose to publish a short form animated film as the vehicle to publish their research, and that was an intentional choice by them. They really felt like film was the best medium to tell her story. They wanted it to be visual. They really felt like a film would humanize her story. And perhaps more than anything else, they really wanted their work to be accessible to young students, school age students, more than like a dense text publication would be.
And so that's what they made. They chose this creative form of expression. It's a beautiful film. I've included a link to the film, which I encourage everyone to watch, but also a link to just read more about the project and sort of the evolution of their project and why they chose film, better than I can do it justice in this short presentation. But both links are in the slide.
And so the film is really beautiful and it was screened theatrically, so it was celebrated as a film, and they applied and were selected for a whole bunch of film festivals. To their credit, it's very acclaimed. But the metadata required to produce a film and have it be seen in movie theaters, screened theatrically and submitted to film festivals, the metadata that those outlets are requesting, is totally different from the types of metadata that scholarship requires or that a library would request for a record.
And I think this example really illustrates that difficulty of crosswalking those requirements, and also the burden on the content creator, on these scholars, to imagine all of these intended audiences and be capturing metadata in all the right ways to satisfy all these different needs. And so I just really thought that this example illustrates the exact problem that the committee is hoping to help solve.
And when someone is creating or authoring a piece of scholarly multimedia, it's not always top of mind for them to be thinking about that nuance. Like, here are these brilliant, creative, expressive filmmakers. A MARC record is not top of mind to them. And as information professionals, we as librarians, etc., that's our goal. But how do we better work with content creators to make them aware of the ultimate need for that type of metadata?
And the goal of a talk like this isn't to be overwhelming or intimidate. I think often content creators feel that they're busy doing many things, especially when they're undergoing a creative project. And so what metadata should I capture isn't something that they're even going to ask. And so the burden switches to us as publishers, aggregators, licensors, distributors to help them navigate that and to help them imagine what metadata they should be providing.
And then I want to speak specifically about preservation for a moment, because it's really hard to give a talk about video or multimedia formats and not mention preservation. And this slide is a bit of a graveyard of former formats. You can see there's 16 millimeter and VHS, Betacam and then LaserDisc. And I think I won't go so far as to call these formats obsolete. Certainly preservationists and archivists have equipment that can utilize these formats, and they have preserved.
There are efforts to preserve this type of content. But media is just such an evolving technology, and the media that we use today even will probably be obsolete sooner than we'd all like to admit. And what we capture as metadata, especially in terms of technical metadata for that scholarly record, really matters. And it's impossible to future proof metadata. And at the time that content is being created, the technical metadata sometimes seems obvious, like who would imagine that DVD would become obsolete at the time that it was the mainstream format being
used? But it's really important to think about users and generations yet to come and how they're going to access this content. And I don't think it's specifically unique to multimedia, but digital born content obviously is way more susceptible to becoming ephemeral. The place where it's hosted can easily change or vanish. And I think the librarians who work specifically in media would tell you that they still actively acquire physical formats, especially DVDs, to protect against that. And maybe more than for other formats, for video in particular, the places that host and stream and distribute and aggregate it are also evolving very quickly and fluidly, and even streaming video that was available within the past few years
can suddenly not be there the next day. So just the more metadata we capture, the more we safeguard a little bit against content vanishing or assist preservationists in the work that they do. And so this just in sum, how can we better assist content creators in keeping all of that top of mind? Both how do you make the content accessible and how do you keep it?
How do you increase the longevity of that content and help preserve it? And so this came right out of the recommended practice: this little sort of mini checklist for those who work with content creators, to help them think about who the intended user is and what the primary use case will be for the output. And then is that finished piece of media a standalone asset or is it going to accompany something else?
And those two questions really help inform the third, which is what standard already exists that can best support this project. And not written there, but sort of implied, is what unique identifier could be used. Unlike other formats, video doesn't have its own standardized unique identifier. Books have ISBNs and journal articles usually use DOIs, and answering those first two questions will help a content creator determine what standard they can lean on or apply towards their content to kickstart that metadata process.
And then the last thing I really want to emphasize is that metadata isn't a one time exercise. So working in streaming video, I mean, every week a filmmaker will have added a foreign language track a year later, or some accessibility tools, which is the best thing of all, that they've added audio description or a transcript. Really often a year later, a filmmaker will come back and say, here is
a lesson guide or a discussion guide that we've added to accompany the film. And the idea that we can still update that metadata on an ongoing basis, that it doesn't have to be static, is the most important thing. And again, it really impacts that preservation model of making this content have a longer lifespan and be accessible to a bigger and wider audience of users. So I'll close there.
I just really wanted to repeat how pleased I am that this topic is on the conference agenda. I think it really speaks to how creative scholarship is becoming, and I'm really eager to see what comes next and excited about continuing the conversation with all of you. Thanks. Thank you so much, Sarah. That was a fantastic tour de force around the content and metadata creation.
With video in mind, I really look forward to our upcoming conversation. We'll see you later.