Name:
Improving the Preservability of your Complex Digital Publications
Description:
Improving the Preservability of your Complex Digital Publications
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/f36f176f-5b96-4a7f-b126-2107e7d8d61e/videoscrubberimages/Scrubber_1.jpg
Duration:
T00H39M49S
Embed URL:
https://stream.cadmore.media/player/f36f176f-5b96-4a7f-b126-2107e7d8d61e
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/f36f176f-5b96-4a7f-b126-2107e7d8d61e/session_3f___improving_the_preservability_of_your_complex_di.mp4?sv=2019-02-02&sr=c&sig=8O%2BGavFYe73PVSxgKC5%2F7iQya6DNAVvUqTm3j2WHJO0%3D&st=2025-04-29T20%3A37%3A20Z&se=2025-04-29T22%3A42%3A20Z&sp=r
Upload Date:
2024-12-03T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
All right. So I myself am not a member of the Society for Scholarly Publishing. However, I do like to read the Society for Scholarly Publishing code of conduct, as we do sometimes at the beginning of meetings, just to establish some norms. The Society for Scholarly Publishing is committed to diversity, equity and providing a safe, inclusive and productive meeting environment that fosters open dialogue and the free expression of ideas, free of harassment, discrimination, and hostile conduct.
Visit the website to read the full code of conduct. And I think it is also on various panel boards throughout the venue. Welcome, everyone. You are in a homework session, just so you know: we're going to be doing a little bit of workshop, hands-on work. It's not much, so don't be daunted.
It'll be, we hope, fun. We have prepared a case study for you to work on with a self-assessment tool. My name is Thib Guicherd-Callin. I am the program manager of the LOCKSS Program at Stanford Libraries, and I'm here in my capacity as the provider of technology for CLOCKSS. CLOCKSS is the entity that is named in the grant that we're going to talk about in just a moment.
I would like to have my colleague Angela introduce herself. Do you have a microphone? There we go. And hello. OK. Hi, everyone. I'm Angela Spinazzè. I'm the project manager for the Embedding Preservability for New Forms of Scholarship project,
which you're going to learn a little bit about. I'm based in Chicago and I'm happy to be here with my colleague Thib. Thank you, Angela. And we would like to introduce our three colleagues who were supposed to be here; through a complex set of circumstances, all three of them are ill and unable to travel. And so it's just the two of us.
But we had a go/no-go yesterday, and we feel good about this presentation with three tables, even though there are only two of us. So we think that will work just fine. But I would like to mention our colleagues for their invaluable contributions. This was the product of all five of us. Jonathan Greenberg is the digital scholarly publishing specialist at NYU Press and NYU Libraries.
We have Karen Hanson, the lead research developer at Portico and ITHAKA, and Scott Witmer, who is a digital preservation specialist at the University of Michigan. And we are very sorry that you do not get to meet them this time. But a similar workshop to this one will be presented at iPRES in Ghent, Belgium in September, and you will get to meet more of us then.
All right. So let me first give an overview of what we're going to do today. We are going to give you a taste of what the Preserving New Forms of Scholarship grant is, what it does, and how it fits into this picture. We are going to introduce the preservability self-assessment tool that we have developed as part of this work. We will then have an exercise, which I assure you will not be too strenuous, where you will work in small groups.
So table by table, using this tool, based on an exercise that we have prepared for you to consider. And then we will come back together and compare notes to derive some lessons and learnings. So where do we start? We start in the early 2000s, when e-publications became proxies for print. They were basically proxies for print in the sense that they were linear text and images, and you could page through them in a linear or consecutive kind of way.
And as time went by, researchers started to experiment with all the possibilities that digital publishing and online viewing afforded, and more and more of these publications became enhanced with various features over time. This led to a variety of publications that included features such as supplemental data; code that you can download or run or compile; multimedia, audio and video that might be embedded or that might be remote, away from the main intellectual work; many kinds of interactive visualizations, perhaps maps; perhaps some non-linear navigation; perhaps some databases that you can filter and display in various ways and draw graphs from.
I mean, it's been very exciting, obviously, and you can see some examples on the screen here of works that were part of a previous version of this grant that we studied. So maps and videos and all sorts of things. So these publications are much harder to preserve from an archiving point of view, and especially at scale because archiving at scale requires being able to process large corpora of content quickly and automatically as much as possible, using hopefully similar formats that don't require much user input or researcher input.
And these new so-called enhanced publications, these multimedia and dynamic publications, are going to become unavailable for future users unless more thought is put into their preservability. So to explore this challenge, New York University Libraries partnered with several preservation services for a project called Enhancing Services to Preserve New Forms of Scholarship. This was a grant from the Mellon Foundation, and it was a two-year project that looked at complex publications that were already published to see if they could be preserved at scale using the technologies of the various preservation agencies that were named as part of the grant.
And from this analysis, the preservation services identified what are now the 68 recommendations, which were designed to guide publishers as they create digital publications that are more likely to be preserved. So we refer to these as the guidelines, the 68 of them. And all of you have handouts on your table with three URLs for various pieces of documentation that we want you to take with you and look up when you return. We made the URLs as short as we could so that it would be easy for you to access these resources.
Yes, of course. So they're called the Guidelines for Preserving New Forms of Scholarship, and they're available online on the NYU Libraries site. There's a screenshot of the beginning of it at preserving new forms, dot, dot. All right. So here is an example of one of those guidelines, for captions.
So to give you a sense of what that looks like, we often observed that when publications included embedded video, whether on YouTube or some other video streamer, they often did not include a caption, which means that if the link were to break in the future, it would look like the image on the right that just has an error message with no context available. And so these features are more likely to break than basic text and images which are probably just embedded in the work.
This led to guideline number 16, which is to create meaningful captions for all non-text features in a publication. Pretty simple. Once the guidelines were published, we realized that more work was needed, which I think is a very classic ending to any paper or report to a funding agency.
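As an illustration of how guideline 16 could be checked mechanically, here is a minimal sketch, not part of the project's tooling. It assumes the publication is HTML and that a caption means a non-empty `figcaption` inside the same `figure` as the media element; the function name `find_uncaptioned` and the example URLs are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: these tags stand in for "non-text features".
MEDIA_TAGS = {"img", "video", "audio", "iframe"}


class CaptionAudit(HTMLParser):
    """Collects media elements that have no non-empty <figcaption>."""

    def __init__(self):
        super().__init__()
        self.figures = []        # stack of open <figure> elements
        self.in_caption = False  # True while inside a <figcaption>
        self.uncaptioned = []    # src values of media lacking a caption

    def handle_starttag(self, tag, attrs):
        if tag == "figure":
            self.figures.append({"media": [], "has_caption": False})
        elif tag == "figcaption" and self.figures:
            self.in_caption = True
        elif tag in MEDIA_TAGS:
            src = dict(attrs).get("src") or "(no src)"
            if self.figures:
                self.figures[-1]["media"].append(src)
            else:
                # Media outside any <figure> cannot have a figcaption.
                self.uncaptioned.append(src)

    def handle_data(self, data):
        if self.in_caption and self.figures and data.strip():
            self.figures[-1]["has_caption"] = True

    def handle_endtag(self, tag):
        if tag == "figcaption":
            self.in_caption = False
        elif tag == "figure" and self.figures:
            figure = self.figures.pop()
            if not figure["has_caption"]:
                self.uncaptioned.extend(figure["media"])


def find_uncaptioned(html: str) -> list[str]:
    """Return the src of every media element without a caption."""
    audit = CaptionAudit()
    audit.feed(html)
    audit.close()
    return audit.uncaptioned


page = (
    '<figure><iframe src="https://example.org/v1"></iframe>'
    "<figcaption>Interview clip, 2017</figcaption></figure>"
    '<iframe src="https://example.org/v2"></iframe>'
)
print(find_uncaptioned(page))
```

A check like this only confirms that a caption exists; whether the caption is meaningful still needs a human reader.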
More work is needed. So not only were these guidelines challenging to navigate, there are 68 of them. It's very substantial. They're all pretty easy to understand, but there are 68 of them and they cover quite a lot of ground. And they were based on things already published, and written by preservation services outside of the context of a publication workflow. So a second proposal was initiated, this time for a three-year project, which we are reaching the end of as we speak.
And the idea was that we would test whether the guidelines were plausible to implement in the real world. We would do this by embedding a team, which included representatives and experts from the various preservation services, with publishers as they were working on a new publication, and who were also working with the platforms that were used, to identify ways to improve the functionality of the platform specifically for preservation.
And we used the guidelines to provide advice, and we tracked what was implemented. The goal was to use this evidence to build a new version of the guidelines, based on this practice of embedding much more upstream in the production process of a digital work. So a major finding of this work is that many publishers and publishing platforms are willing to take action where they can for preservability.
So that was very gratifying, to see that this work was useful. But they found the guidelines daunting at best, or overwhelming at worst, without us, the preservation specialists, as intermediaries, and they wanted something more usable. So this led us down the path we are now on, which is a new deliverable for this grant: a preservability self-assessment tool, so that platforms and publishers and people related to the publication workflow can assess the gaps in their preservability within their own environment.
And we have built part of it to share with you today, and you are the first ones to see this tool in action. So we're very proud to bring this to you. So we are going to go through an exercise, which will not require you to read all 68 guidelines right now. We will introduce the self-assessment tool that we have started to develop. We'll show you how it works using a use case that we encountered in our research.
And then we'll split up into the three tables. So each table will work together with the tool on one of those use cases for a short time. And we have contrived a question and answer format for you, and then we'll come back together and talk about our findings in relationship to the recommendations we made during the project. And you can tell us more about what you learned and how you view through this small keyhole, this work that we are currently doing to build this assessment tool.
So this self-assessment tool has four parts. Part one is focused on creating context. It aims to collect some foundational information about the publication, as well as consider your preservation objectives as a publisher. Part 2 is focused on defining the core intellectual components that make up the publication that should be preserved and identifying risk categories associated with them. Part 3 is focused on exploring the risk categories and identifying the recommendations related to them from the guidelines.
So this is the part where we try to provide a guiding map and a summary entry point into the otherwise lengthy guidelines. And part 4, which is not going to be covered at all today, is focused on using the relevant guidelines to form an action plan. So obviously, you can extrapolate what that looks like. After you learn a little bit more today, you will see: OK, well, then I would take this information to my team and we would improve preservability using x, y and z means.
So part one will be about creating context. And before we go through these four parts, we would like to introduce the work that we have selected as the backdrop for the exercise today. It is a work called Owning My Masters (Mastered). And Angela, I was hoping that you could tell us more about this very exciting work.
I would be happy to. And I will point out that we are fortunate to have with us Sarah Jo Cohen, the editorial director from University of Michigan Press, with whom we worked on this publication. And so, Sarah, these are your words, actually. So one of the things I wanted to add is that the self-assessment tool is based on the process that we used to embed with publishers and publishing platforms.
So we took the process that we used and we crafted it, because we think it would be helpful for you as well. And it's in the form of conversation questions, so that you can basically talk together about: what does it mean, what is this concept of preservability, what are we doing now, and what are some things we can be doing moving forward? So for part one, creating context, we asked a bunch of questions of our publisher partners.
So in this case, we asked: what's the name of the publication? And please describe it for us. So this is what we received back from Sarah and her team from University of Michigan, helping us understand who Carson is and what Owning My Masters (Mastered) is. In 2017, Carson defended his dissertation, which was a 34-track rap album. On the album, Carson works through what it means to be a Black graduate student at a university located on a former plantation during the early days of the Black Lives Matter movement.
Carson argues that some traditional hip hop scholarship, that is, hip hop scholarship that's published in article or ebook form, renders Blackness pathological by virtue of attempts to confront the notion that hip hop culture is deviant, bad or unworthy of study. As a remedy, he offers hip hop performance as scholarship and invites us to think of rappers as scholars. Through Owning My Masters and his work more broadly,
Carson offers a model for engaged public scholarship and social justice work. While Carson gained a reputation as the guy who rapped his dissertation, Carson's dissertation is so much more than a rap album. In addition to the tracks and liner notes for them, Carson created a mixtape each semester he was at Clemson, created 14 videos, and built a website that includes a timeline that contextualizes the making of the album in relation to larger events on the Clemson campus and in the US.
As a result, the products of this project are currently spread out in several locations, not the same location. Because he had a positive experience working with Fulcrum on i used to love to dream, Carson approached University of Michigan Press about creating a version of Owning My Masters for publication on the Fulcrum platform.
The liner notes will be more substantial and will include a brief introduction. Lyrics for the tracks, with links to the tracks, will also be included, along with a photo gallery, a version of the timeline that he published on his website, and a bibliography. So this new form of scholarship will contain all these things. Several of the videos that he created for the project would be published in Fulcrum as resources.
So that's the context. Thank you, Angela. So we are very excited, of course, to study this work because it provided exactly the kinds of complex, dynamic, unusual nonlinear aspects and multimedia aspects that make what can be very exciting new scholarship. Not a linear dissertation, but something containing emotion, non-linearity and audio visuals.
All right. So part 2 is identifying core intellectual components and potential risks. So what are the core components that are desired for preservation for this publication. So it was determined that preserving the EPUB was extremely important. Preserving the resource files themselves, either embedded or not.
Specifically, the music tracks, the still images, the videos. There were some playlists, so sort of arbitrary sets of tracks. Preserving the resource metadata, making sure that relationships to the main project were expressed. They were not interested in preserving the resources linked outside of the EPUB on other platforms; it was just necessary to make sure that they would be playable. The exception was the interactive timeline: for that one, it would be very nice to have a representation of it, because if you go browse this work on the internet, you will see that the interactive timeline is really the centerpiece of the work.
So I think people have access to this table as well? Yes. Yeah. So if you look in your handouts, table 1, core intellectual components, you will see that the first one is "EPUB file of main text", and the preservation requirement is "required" in this case. And you then use table 2 to assess a risk category.
And I'll show you what that looks like on the screen. Table number 2 is for the risk category. So you can find the fifth one, the "publication is an EPUB" category. So then you can turn to table number 3. So there's this cascade effect from 1 to 2 to 3. Table 1 is the list of features that are desired. With table 2, you need to identify what risk category those items have.
Table 3 gives you some questions that you can follow along and hopefully self-assess which guidelines are important. So table number 3, under the risk category "publication is an EPUB", starts with the question: are EPUBs validated using EPUBCheck? If no, see guideline 37. So that's what we'll do. Guideline 37 is: validate the EPUB using EPUBCheck and resolve issues.
So among the 68 guidelines there is this one that is related to EPUBs, and the self-assessment tool provides a pathway so that you do not have to read all 68 guidelines and then say, OK, well, some of these are about EPUBs, and this is an EPUB, so I should keep an eye on this particular guideline. It is a way to find your way, a pathway to the guidelines that are related to the work you're looking at.
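The cascade from table 1 to table 2 to table 3 can be sketched as a pair of lookups. This is only an illustration under stated assumptions, not the tool itself: the dictionary layout, the `Question` class, and the function name `guidelines_to_review` are hypothetical, and only the one row walked through above (the EPUB file, its risk category, the EPUBCheck question, and guideline 37) is filled in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    text: str             # yes/no question from table 3
    guideline_if_no: int  # guideline to consult when the answer is "no"


# Table 1: core intellectual component -> preservation requirement
TABLE_1 = {"EPUB file of main text": "required"}

# Table 2: core intellectual component -> risk category
TABLE_2 = {"EPUB file of main text": "Publication is an EPUB"}

# Table 3: risk category -> yes/no questions
TABLE_3 = {
    "Publication is an EPUB": [
        Question("Are EPUBs validated using EPUBCheck?", 37),
    ],
}


def guidelines_to_review(component: str, answers: dict[str, bool]) -> list[int]:
    """Follow table 1 -> table 2 -> table 3 for one component.

    `answers` maps question text to True (yes) or False (no).
    Unanswered questions are treated as "no" so nothing is skipped.
    """
    if TABLE_1.get(component) != "required":
        return []
    category = TABLE_2[component]
    return [
        question.guideline_if_no
        for question in TABLE_3[category]
        if not answers.get(question.text, False)
    ]


answers = {"Are EPUBs validated using EPUBCheck?": False}
print(guidelines_to_review("EPUB file of main text", answers))
```

Answering "no" to the EPUBCheck question surfaces guideline 37 without anyone having to scan the full list of 68.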
So this is the part where we begin the exercise. That's the homework. We want you to take a turn at this using tables 1, 2, and 3 for the third item on the list of core intellectual components. So the third one is: audio tracks to the album, stored as supplements on the same platform as the publication, are required. So for this, you will want to see my.
Did I go too fast. I'm sorry. There we go. You want to do the challenge also or later. No, you can mention it now. You can mention it now. So then the ninth one here is a bonus point question for those of you who do homework. And I know that some of you do.
What would you do in response to the interactive timeline, the one that's linked from inside the publication but is hosted on a separate platform from the main publication? So if you are interested, you definitely can work on this challenge question. But the one that is more straightforward, to get familiar with the self-assessment tool, is the third one, for the required core intellectual component of the audio tracks that are on the same platform as the publication.
Did you want to add anything, Angela, before we go? The only thing I'll add is: we know this is going to be overwhelming. We've just walked through an example very quickly. So the first thing that we would recommend is to just take a quiet moment and look at all the components that are in front of you. We've got the context. Remember, the context is Owning My Masters (Mastered).
That's the enhanced publication. That's the new form of scholarship. So that's the lens through which to think about the audio tracks, and then the timeline, if you decide to do the challenge as well. The other thing that we would recommend is: you don't have to do this by yourself. You can do it as a full table, four people, five people together.
Or if you want to work in pairs or trios, pick a partner to make it easier, because we're going to spend about 20, 25 minutes. We're going to just give you some time to read through this. We're going to float between the tables to talk you through any questions, to provide any additional context, and to ask you questions if there's something that doesn't seem to make sense to you.
This is a learning experience for you and as well as for us. So we appreciate your again, I did say this as people were walking in the room. It's 4:00 in the afternoon, so we thought maybe a little paper might be good. So if you want to stand up and move around and there are other tables here with some of the materials, feel free to spread out. We just had you all together to begin with, just to make it easier to distribute the materials.
So thanks for your patience, and we hope you have fun. It is meant to be a fun exercise, and we're here to help out. We are going to switch to the discussion portion, where we are going to share our homework answers quickly and see what we think, but also discuss what we learned, what risks you identified, and what guidelines you were not so sure about.
And before we start, I think Angela and I already have some feedback for you: this seemingly small, limited exercise had an invaluable impact for us, because this is the first time that we're presenting this at, say, a conference or a scholarly environment. And we can see that it resonates with many people. You all have totally different backgrounds: people from publishing, books, journals, public health, very many different backgrounds.
And yet everybody is struggling with these issues and what to think about them. And having a tool seems to have resonated with you. And the feedback that we're already getting from you is that this is a good direction to go in, and to continue refining this tool. So I would like to thank you all for helping us with this. Yes, I will add thanks many times over, because we really had no clue if this was going to be something that would work or not.
So thank you. But let's get back to what you just did. So what risks did you identify related to preservability of audio files stored as supplements? Any group? OK, outdated file formats. And which risk category was that? H.
OK OK. We had f. We have h. Let's talk about it. OK OK. And this is the object contains the metadata. No embedded multimedia. All right, so let's do it this way.
How many H's do we have out there? OK, we have some H's. Let's start there. And then we'll move on to K. How many K's do we have out there? There's really audio tracks. Yeah. OK. All right, so this table's got K. We've got H over here. What other letters do we have?
Who has F? OK, F over here. And how about the front table? What was your letter on table 2 for the audio tracks? H. Sorry, you were H's as well. OK. All right. So let's talk about H. How did you arrive at H?
Thank so locally hosted interactive supplement not downloadable. OK Anyone want to. OK, sir. Seibel one. OK suggest to me that they are separate files that you download.
OK, media player. But excellent. So because we didn't specify downloadable or not downloadable, that's how we got to K. Exactly right. So it's one of the reasons why we kept it vague, because that is a decision: are you going to offer that up as something that's downloadable, or is it part of the publication, so it's not considered a separate entity that you can access?
We think of music as always being downloadable. Well, maybe not always, but, right, that is something that could be downloadable. I shouldn't say always. But because we didn't specify, H and K are both valid. All right. And where's our metadata?
Did somebody say metadata. You have to separate. So we're just going to focus on these two, the audio tracks. And then the interactive timeline. So let's hop down to the bottom there for whoever did whichever groups did the challenge on the interactive timeline and it's linked from inside the publication and hosted on a separate platform from the publication.
So the link is internal, but the actual resource is external. So which risk categories are we talking about for the interactive timeline. J OK. And which ones. J third party hosted interactive supplement. OK OK.
J. Any other J's? J, J, J. OK. Good, excellent. Any other letters? Any other categories? P. OK. Talk about P: metadata for embedded components or supplements.
Any other P's out there? All right. So talk a little more about how you arrived at P. I'm going to give you the metadata for any component. Sorry, I think you need metadata for any component. So you need to describe what it is and how it relates to the rest of all of the stuff that you mentioned.
Yeah, as part of the... Yeah. So this goes back to the caption example. So it's metadata about the timeline itself, not necessarily the components that are within the timeline. So what is this thing? Why was it created? And part of the context-creating descriptive information that we received from the press, which was great, was about the fact that it was built using a tool that came out of a journalism lab at Northwestern University, but it was hosted at the University of Virginia.
This is a person who was studying at Clemson. So geography played a huge role in this. And then what was the point of the timeline? What we learned, actually, from talking to Sarah and her team was how important the timeline was for the author. And this is something that actually came up a lot in the conversations that we were having. So you noticed the publisher said, well, we'd like to have a representation of the timeline.
The author said: the timeline is hugely important to me. It was how I was keeping track of what was going on while I was doing my dissertation work, while my research was happening. This was like my touchstone. And so I created it so I could help myself remember what was happening around me on campus and around the country at the time. And so in the end, what happened is that the timeline
actually shifted into the required category for preservability. And so, Thib, do you want to talk a little bit about how Sarah worked with Scott on campus at University of Michigan and the preservation team that provided an intern? I'm sorry, but I don't remember. Sorry, I was just... You seem to, but why don't you tell us.
I'm sorry. I don't recall. That's all right. I was just trying to do a little back and forth here. Yeah. So no, the interesting thing that happened is, again, two things. One is the timeline, rather than being something where it would be nice to have a representation of it, moved up in priority to something that needed to be preserved, because the author's intention played a key role in the decision making.
And so how they ended up being able to take care of it is, and this is at University of Michigan, they worked with our other team member, who's a preservation specialist on campus there, who brought in an intern to create a WARC file for them, so that they would have a version of the timeline that they could put into the preservation package that will be delivered to the preservation service. That way, when that link to the timeline breaks someday in the future, and nobody knows when, the metadata will be there to describe it, but they will also have a file that can be accessed using whatever technology in the future
might be used to play it. So that also goes to the question that you had earlier about some of the things that you can do for interactive components. But it was interesting, because of course for Sarah and her team, this was bringing some new information to them: here is an option that you might consider using. And in this case, because this component was so important, they collaborated on campus and managed to figure out a way to do it.
And so that's part of what we're trying to convey with the guidelines: that there might be some things available for you, some options that you might not have considered before. And some of the technical things are discussed there in detail. But we actually like the fact that it ended up being two other people, who wouldn't normally have been involved in the process, who actually helped to solve the problem.
So that was part of what we were trying to convey by having the interview conversations that we were having with each of the publishers: who do you know who might be able to help you do this? Yes? Is there an existing vendor-neutral format for describing an interactive timeline? For the description of them? No, for the timeline itself. I don't think so. Yeah. So the thing you observed was a piece of running code?
Yes, right. Yeah, it's a widget. Exactly. And actually, one of the things we discovered, I mean, this was one of these examples that you kind of can't believe happens: we found a video from a computer science researcher who actually walked through all the steps to create the preservable file.
So this app happens to be used to create timelines. It's clearly out there in the world and it's used. But because it relies on Google Sheets, which is not a persistent and sustainable technology, the point that this person was trying to make was: here's something that works, and you could do these few little steps differently, and that will result in a much more preservable file format that you can actually put in your preservation package to send off to your preservation partner or your repository partner, and give a much greater probability that you'll be able to play this thing in the future.
So we just lucked out in terms of this particular publication and the components that it had. But that's the kind of thing that has happened over the past three years as we started to talk about the importance of things. One of the guidelines, for example, is about author intent: having conversations with your authors to understand what is critical to the making of their scholarly argument in their article or their enhanced scholarship, so that at least you have an understanding from their perspective of what they deem are the important components and the important enhancements.
You might not agree with them, and you might not be able to preserve all of them. But what we found was that this key conversation, in a lot of cases, just wasn't happening. And so even broaching the subject at whatever point in time fits in your production process, that conversation could be a key to having a greater understanding of where you need to focus your attention.
As our workshop draws to a close, I think that the part that Angela and I, and my three colleagues when they see this recording, will be most interested in is question one and question... I'm sorry, question 4 and question 5, the last two. We want to hear your feedback, whether right now or in the hallway later or during the cocktail hour, about whether or not you see this tool as useful once it is more complete: this mapping from risk categories to a yes-and-no question format and the related guidelines you can go through to learn more about this risk category.
Is this tool useful in your own environment, and would it be useful to improve preservability of various kinds of things? Obviously, it doesn't have to be books. It could be really any number of scholarly materials. And if you have any other feedback for us as to the format and the guidelines themselves, we would be extremely interested in getting this feedback from you, either now or at a later time, because it will help us refine this tool and make the self-assessment tool more useful.
So that in turn, the guidelines themselves, which are largely no-nonsense, common sense, but there are just too many of them, can be more useful. So please hold on to these materials. These handouts are yours to keep, and please visit the three websites that we have listed on the links document for you to consider outside the context of this workshop. And if anyone has a comment to share right now, we'd love to hear it.
Yes, please. You guys upped the. Let's say. But that makes it a lot clearer. Each ticket. That could help alleviate these questions and actually get a statement into the particular guidelines. They actually looked at the digital object in question.
OK, great. Thank you. Thank you very much. Any other comments you want to share right away. Yes from serving and presentation. You're so welcome.
That's kind of great. You're welcome. You're welcome. That's great. Sometimes old school is the way to go, right Yes use it quickly. An unreliable third party. That's one of the 68 guidelines. Says not to do that.
Yep there's your rules. Good point. Thank you. We debated that honestly. We did. Because we said BDQ, whatever did it. Should we use that. And yeah, so Thanks. We really appreciate you pointing that out.
We broke our own rules today. We invite you to join us at iPRES 2024 in Ghent, in Belgium, where we will give a two-hour version of this workshop with more examples to look at. It will be a little less rushed, a little more practical. By then, the assessment tool will be developed further, so there will be more to look at. And so we would love to see you a second time to see the progress that we make on this work. Yes. And I will just say in closing, we did put sign-in sheets on each of the tables.
We would love to keep in touch with you. So if you're willing to share your name and email and say, hey, I was here, I'd love to hear what you're doing, we might call you up, or email you, probably, and say: could we have a chat with you, now that it's been whatever, a week, a month, a couple of months, something like that? The more people we can talk to and get feedback from, and hear how you think something like this might be useful and used, the better, because that's really our goal.
We'd really appreciate it. Thank you. Thank you.