The essential nature of digital preservation
https://asa1cadmoremedia.blob.core.windows.net/asset-8f72cc1a-0494-4869-bc38-861efa35a73b/41 - The essential nature of digital preservation-HD 1080p.mov
CLARISSA WEST-WHITE: Welcome to NISO Plus 2021. Thank you for joining us in the session, The Essential Nature of Digital Preservation. I am your moderator, Clarissa West-White, reference librarian and research instructor at Bethune-Cookman University. This session intends to use the [INAUDIBLE] of NISO attendees to map out pathways to digital preservation solutions, outlining the problems and then asking the fun questions, what can take away the pain?
CLARISSA WEST-WHITE: How? What needs to be in place? What's stopping us from doing it right now? To begin the discussion, we have Oya Rieger, senior strategist on Ithaka S+R's Libraries, Scholarly Communication, and Museums team, whose responsibilities include spearheading projects that re-examine the nature of collections within the research library, helping secure access to and preservation of the scholarly record, and exploring the possibilities of open-source software and open science.
CLARISSA WEST-WHITE: During her presentation, she will answer, what is a digital preservation system anyway? Following her is Paul Stokes, product manager at Jisc, who has had a varied career in both the commercial sector and academia, and all points in between. At present, he leads on preservation for Jisc's preservation service. He is a director of the Digital Preservation Coalition and a director of the Open Preservation Foundation.
CLARISSA WEST-WHITE: He's been passionate about preservation for many decades, and currently has a number of bees in his bonnet regarding cost, value, sustainability, and storage. During his session, he will address pains associated with preservation and ask, why aren't you doing it now? Oya.
OYA RIEGER: Thank you very much for the introduction. Let me open my presentation. Well, hello to everyone from sunny but very cold upstate New York. It's a great pleasure to participate in NISO Plus 2021 and collaborate with Clarissa and Paul on this session.
OYA RIEGER: As you know, our cultural, historic, and scientific heritage is increasingly being produced and shared in digital form. So therefore, organizations with stewardship roles have increasing dependency on digital platforms to support creation, discovery, and long-term management of this digital content. Yet, some of these systems and tools have been shown to have sustainability challenges.
OYA RIEGER: So I'm going to share with you some early findings from a study I'm involved in related to this question. Just a few words about my organization-- Ithaka is a not-for-profit organization with the mission to improve access to knowledge and education for people around the world. And S+R is the strategic guidance and research arm of the organization. We were delighted to receive funding from the Institute of Museum and Library Services to examine and assess how digital preservation and curation systems are being developed, deployed, and sustained.
OYA RIEGER: We have two basic-- I would say, two fundamental research questions. One of them is we want to look at the business approaches of community-based and commercial initiatives. And a very important one of the core questions we have is, as you know, there are all kinds of cultural heritage organizations, from small historic societies, all the way to well-endowed and well-resourced large university libraries.
OYA RIEGER: So one big question we have is these systems, are they inclusive? Are these systems affordable? Are they accessible? And ultimately, we are hoping to contribute to the sustainability dialogue that our community is really very strongly engaged in. The study methodology will be case studies. I'll talk a bit later, but we will select some digital preservation systems and look at them deeply.
OYA RIEGER: Before we start the case studies, we wanted to interview some experts and ask this fundamental question-- what is digital preservation from your perspective? And what does it mean to use a digital preservation system? So during the last few months, we interviewed more than 20 colleagues from different digital heritage organizations, whether they are libraries or service providers.
OYA RIEGER: There are many definitions out there. And digital preservation really is kind of a well-established field. And I think there's strong recognition that it involves a range of processes. Some of them are managerial. Some of them are technical, and some of them are related to policy. However, when I started talking with colleagues about digital preservation,
OYA RIEGER: what became very obvious is that, although it's a well-established concept, it is very situated, and it's interpreted differently, depending on institutional context. Some colleagues look at digital preservation as a process to refresh the file formats of their digital content to make sure that they're accessible and usable. Some feel that digitizing old audiovisual content is digital preservation.
OYA RIEGER: And some colleagues feel that digitizing some content, especially analog primary materials, is digital preservation. And by the same token, the definitions of what a digital preservation service is vary. And one thing that became obvious, as I was talking with colleagues, is that, as you know, libraries now have a very, very rich, diverse range of services from web archiving to digital curation to research data.
OYA RIEGER: And this is actually a terrific development in terms of specialization. But on the other hand, what we are seeing is preservation services, which really kind of, in many ways, span these services, sometimes don't have a common mission or a common system. And when institutions are looking for systems and services for their digital preservation, again, whether they are dealing with digital humanities projects or web archiving or they are harvesting social media content or they have digital books, they are trying to figure out what systems are out there and, very importantly, how these systems talk with each other, how they communicate with each other.
OYA RIEGER: Just some direct quotes from my conversations. I especially want to draw attention to the first one, which really highlights that digital preservation is about resources as much as it is about technologies. So let me actually talk a bit about what a digital preservation and curation system is, pointing out that we haven't really addressed the question, what is digital preservation? But definitely, Paul and I are looking forward to talking with you about this when we have our discussion session.
OYA RIEGER: So for our study, as I mentioned, we will look deeply into eight or so preservation systems and assess their organizational framework, their governance model, their marketing model, and so forth. It's not a study to look into technologies. It's more about usability and business approaches. Rather than having a kind of definitive definition of what a digital preservation and curation system is, we are really taking a very operational, very pragmatic approach.
OYA RIEGER: For us, any system a heritage organization considers a preservation system is a preservation system. And as you can imagine, it includes long-term storage services, digital asset management software, and so forth. We have really benefited from many studies related to digital preservation and sustainability, as we started this IMLS-funded research.
OYA RIEGER: And especially, I want to give a loud shout out to the POWRR project that has done some really amazingly wonderful work. One thing they did is they looked at different digital preservation systems, and they tried to create this taxonomy just to indicate how we are really kind of into this microservices realm now, that there are many digital preservation services.
OYA RIEGER: But they are sometimes just really targeting specific aspects of the digital information lifecycle. And kind of from a holistic perspective, I keep on thinking about the Open Archival Information System. As you may know, this tries to provide a conceptual framework and common vocabulary. And this starts from creating digital content, ingesting it in a system, managing it, and all the way to access. So, therefore, in our study, we wanted to stay away from microservices, where the preservation service is only focusing on one stage or one component.
OYA RIEGER: And instead, we are looking at systems that are, I wouldn't call holistic, but try to cover many of these processes in this lifecycle. With that conceptual framework, we have identified 36 digital preservation systems. This is by no means a comprehensive study. We have a case study-based research. Therefore, our goal here is really to identify and [INAUDIBLE] identify probably eight or so systems in this list to be able to do deep dives.
OYA RIEGER: I want to actually take us back to the initial interviews because I wanted to highlight some issues. Now that I hope I was able to share with you what we mean by saying "digital preservation system," first of all, every digital preservation system has some requirements in terms of having some institutional policies. And again, from our interviews, what we are hearing is many cultural heritage organizations really lack consensus on what collections to preserve and what to prioritize, and then of course, how to fund them.
OYA RIEGER: And one of the dilemmas seems to be building in-house preservation systems versus outsourcing them. And whether it's outsourcing or building in-house, one of the really critical issues here is, how do you bring together these systems and tools so that as you are managing a range of digital assets, there's some cohesiveness among these systems?
OYA RIEGER: In our interviews, very often we heard that there is some tension between community-based and commercial products. By the way, by no means is our study trying to binarize commercial systems versus open-source or community-based. We do understand that there are some wonderful collaborations and intersections. But just for the sake of this discussion, what we are seeing is many cultural heritage organizations really prefer using community-based systems because they feel that their values are aligned.
OYA RIEGER: However, for some of them, from a resources perspective, from a programmatic perspective, sometimes commercial systems make more sense. But we are definitely seeing this tension and this dilemma. But we're also kind of-- as we start looking at these preservation systems, what we're also seeing is that some commercial systems are very value-driven. They work very closely with the cultural heritage organizations.
OYA RIEGER: Whereas some community-based systems may not have the same deep engagement. So, again, we really don't want to binarize this space. Ultimately, though, through these interviews, we identified two questions to look into when we start the case studies, when we start these deep dives based on eight or so digital preservation systems. One of them is, as you know, community-based initiatives very often are based on one-time funding from foundations, grants, from membership fees.
OYA RIEGER: And what we are hearing is that they really-- some of them are having difficulty in developing sufficient capital and being agile and competing with commercial products. And the second question is, again, whether they are commercial or community based, as we have institutions investing in these systems, whether they are purchasing them or whether they are funding development of open source, we are going to look into how we could have stronger engagement from the funder community so that they really have a say, and they have some power in shaping these systems and making sure that these systems are meeting the diverse needs of our community.
OYA RIEGER: So it's a great pleasure just to share with you some findings, some insights from the study that we are involved in. And Paul and I are going to be sharing with you some questions, hoping that they will be a springboard. But I am now happy to turn it over to Paul. Thank you, and I look forward to talking with you again.
PAUL STOKES: Thank you so much, Oya. Now I'm just going to share my screen, the traditional pause between presentations, where we decide whose bit is whose. Somewhere down here, I have the right one. It will be a second. You might have to cut this bit out, Jason. [LAUGHS] I'll just do with that. That's better.
PAUL STOKES: I can see it now. And do that again. Okeydoke. All right. And we have the usual, I hope you can see my screen question. So I just need to move this across so I can see what I'm doing. Right.
PAUL STOKES: Here we go. So addressing the pain in preservation. Well, first, real quickly, this is me. I'm Paul Stokes. I'm a senior co-design manager or product manager at Jisc. Jisc is the UK's National Research and Education Network provider, [INAUDIBLE] preservation in the open research services team there. I'm also a director of the Digital Preservation Coalition, the DPC, and the director and past chairperson of the Open Preservation Foundation, the OPF.
PAUL STOKES: I'm also a digital data hoarder. Now this is going to be short. I want you to have your say, rather than suffer death by PowerPoint. So on with the show. So I think we can take it as read that it's widely agreed that digital preservation is a "good thing." And I would go so far as to say that, almost certainly, most people here believe that-- at least I hope you do, it will be that.
PAUL STOKES: So when did it begin? Digital preservation in a form we would recognize as such gelled as an idea in the late '80s and the early '90s. And standards were formulated or [INAUDIBLE] by other persons as formalized and so on. Today, 30 years on, it's still not business as usual. And I want to ask why. Today, I want to set the stage for our upcoming discussion, as mentioned by Oya earlier, regarding what's stopping us from preserving data, the pain points.
PAUL STOKES: Before I do that, I'd like to step back briefly to address that question-- what is digital preservation? Oya has given us a detailed overview of the technicalities and what it means from different perspectives. But what does it mean in practice? Well, this is the definition I would like to use-- keeping digital "stuff" usable, and stuff has a wide definition.
PAUL STOKES: Expanding on that slightly, it's storing and processing selected information to enable access and use of that version of the information in the future and storing metadata to facilitate discovery and use. It usually involves the creation of more than one copy of the information. Digital preservation or digital curation is an active and ongoing process. It's intended to extend the ability to use the information beyond the lifetime of current systems and performance usable.
PAUL STOKES: Let's unpick that. What makes stuff unusable? Well, any and all of the above. Hardware failure, file damage, loss of habitat, loss of files, obsolete formats, all that sort of stuff here. And I can guarantee that everyone here today will have experienced one or more of these data losses in some form. And coming back to that question of digital preservation, well, another way of looking at it is about managing and mitigating risk, the risk that things will happen that make your information unusable.
PAUL STOKES: And how do you do that? Well, there are processes such as file format migration, changing a file into a supported format suitable for long-term preservation. Things such as emulation, emulating whole systems in software and virtual machines to allow old files and software to be used, keeping multiple copies. That's always useful, in case your disk dies, you've got another copy somewhere else, and authenticity checking, and so on.
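[EDITOR'S NOTE: The "multiple copies" and "authenticity checking" steps mentioned above can be illustrated with a checksum comparison, often called a fixity check. What follows is a minimal sketch and not something shown in the session. The function names `sha256_of` and `check_copies`, the file names, and the choice of SHA-256 are all assumptions made for illustration.]

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(expected: str, copies: list[Path]) -> dict[str, str]:
    """Compare each copy against the digest recorded at deposit time.

    Each copy is reported as "ok", "damaged", or "missing".
    """
    report: dict[str, str] = {}
    for copy in copies:
        if not copy.exists():
            report[str(copy)] = "missing"
        elif sha256_of(copy) == expected:
            report[str(copy)] = "ok"
        else:
            report[str(copy)] = "damaged"
    return report


if __name__ == "__main__":
    # Hypothetical demonstration: one good copy, one altered, one lost.
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "copy_a.bin"
        altered = Path(tmp) / "copy_b.bin"
        lost = Path(tmp) / "copy_c.bin"
        good.write_bytes(b"annual report 2021")
        recorded = sha256_of(good)  # digest stored when the file was deposited
        altered.write_bytes(b"annual rep0rt 2021")  # simulated file damage
        for name, status in check_copies(recorded, [good, altered, lost]).items():
            print(Path(name).name, status)
```

[A damaged or missing copy found this way can then be restored from one that still matches the recorded digest, which is one reason to keep more than one copy.]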
PAUL STOKES: This is all well and good, but to the crux of the matter, why aren't people and you, in particular, doing it? Well, these are some of the excuses-- reasons, if you insist-- that I've come across. I don't really know what preservation is. Well, you've made the right step. Having listened to Oya and me, you should have a better idea.
PAUL STOKES: I'm happy to talk to anyone about preservation. You just try and stop me. Ask. Google is your friend. The Digital Preservation Coalition's website is a good starting point. Appoint preservation champions, send them on courses, become informed. Now the common one, I have backups.
PAUL STOKES: I have a repository. Repeat after me, a backup is not a repository. A repository is not a digital preservation solution. Sorry, Oya, I'm not sure if I entirely agree with the loose definition you gave us earlier on. Why? Well, backup is storing information as a hedge against [INAUDIBLE]. It's current information.
PAUL STOKES: If current information is damaged, it can be restored from a backup. But it's usually automated. It usually only covers a limited time period, and it's often applied with a blanket approach, entire systems. There's no selection involved. Deposits in the repository, well, that [INAUDIBLE] involve storing selected information, which is good, to enable access to that version of the information in the future.
PAUL STOKES: It usually stores metadata to facilitate discovery and use and so on. But it's a one-time activity. The ability to access the information may deteriorate over time. And as I said a moment ago, preservation is storing and processing selected information to enable access to that version of the information in the future, with metadata and so on and so on, the key concepts here being selection, use, reuse, and active and ongoing.
PAUL STOKES: [INAUDIBLE] other things we got here? Well, I don't know what I've got. I don't know where it is. Well, I'm sorry. That's inexcusable. Find out. Well, you knew I was going to say that, now didn't you. Run a data asset survey. Ask people.
PAUL STOKES: Ask them what they have, what they create, what they use, what they might use. Use tools such as the DPC RAM. It's the Rapid Assessment Model to help. [INAUDIBLE] just find out. How about this other one-- I don't know what it's worth. Well, this is perhaps the $64 million question. Valuing data is hard.
PAUL STOKES: There's no doubt it has value. I mean, look at companies like Facebook and Google and so on. They have-- they're worth billions. But as I hinted at, its value is in the eye of the beholder. Some questions to ask yourself, if I had to replace my data, what would it cost? If I wanted to sell it, what would someone pay? If, heaven forbid, there was a ransomware attack, what would I pay to get it back?
PAUL STOKES: Well, if you don't like the thought of ransomware attacks, what would an insurer value it at? You need to get a handle on value before you can make a case to preserve it because preservation costs. It ain't cheap. And if you're seeking funds, you need to be able to make a credible business case. And to do that, you need to understand the value of the asset you're dealing with.
PAUL STOKES: This is another one-- we don't have the people. We don't have the budget. Simple: recruit, and/or upskill from within. Preservation is not a problem to push off the agenda for a future officeholder to deal with. It's about the now. Failing to act now will cost more in the long run. There's a tool out there, for instance, called the Cost of Inaction Calculator.
PAUL STOKES: Look it up. It costs more than you think to do nothing. We don't have the budget? Why, yes, as I said, act now, because it'll cost more later. And this one, as was touched on by Oya just now, policies. We haven't got an agreement. We haven't got a preservation policy. This one is interesting as well because perhaps we should start.
PAUL STOKES: Policy drives investment and actions. In particular, you need a policy that has high-level buy-in and wide acceptance in an organization. And that's achieved by including all stakeholders, of course. The DPC, mentioned again, have a preservation policy toolkit to help you get started that I highly recommend. So look on the Digital Preservation Coalition website for that. The other side of the argument, preservation is not always a no-brainer.
PAUL STOKES: Sometimes it doesn't make sense to try and keep something usable. Sometimes, hard as it may be to contemplate, especially for me as a data hoarder, deleting information is the right thing to do. We shouldn't and couldn't and can't keep everything. Digital curation is about throwing things away, as well as keeping things. Curation involves asking and answering some or all of these questions.
PAUL STOKES: And does it have value? If it doesn't, why are you keeping it? Does it cost more to keep than it's worth? Again, that's an unsustainable position. What's the environmental cost? Increasingly, this is becoming important to people working in the IT industry. And preservation obviously is a significant part of the IT industry, one hopes.
PAUL STOKES: Carbon cost is something you need to be thinking about. Do we have enough storage? Believe it or not, the world is running out of storage. We are producing data faster than we can find places to put it, which, again, is unsustainable. And who are we keeping it for? This is the designated community in OAIS terms. If we don't know who we're keeping it for, if there is no designated community, then why are you keeping it?
PAUL STOKES: So going back to the original questions and faults and so on, if you're going to take away three arguments from my presentation today, they would be these. Preservation is about the present. What you do now will have long-term ramifications. So doing nothing is not an option. It's not small, fudgy stuff. Address the pain.
PAUL STOKES: And next thing, all organizations can benefit from preservation, and I mean all. These days, every organization has something that's digital that needs to be preserved. How long you need to do it for and so on, that's another question. And the last one, well, I'm all out. Just keep on banging on that one till the cows come home. You can read that as well as I can.
PAUL STOKES: So I mentioned the questions you want to try and answer. Well, these are the two key questions that I'm particularly interested in getting answered during our discussion. I hope you come prepared to take part in an active discussion. What is/are your pain points? What's stopping you? Why haven't you started? And for those lucky few who have actually started, what were your pain points?
PAUL STOKES: What did you find? What did you do to solve them? What will you do to solve them? So those are the two core questions I want to get onto today. So hopefully, the lively discussion will be starting shortly. So now it's over to you, Clarissa.
CLARISSA WEST-WHITE: Thank you, Oya and Paul, for your insight. We are thankful that you joined us for this discussion. And we cannot wait to see you all in the discussion. Thanks for coming. [MUSIC PLAYING]