Name:
Preservation of new media - roles and responsibilities
Description:
Preservation of new media - roles and responsibilities
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/ff6c5d3a-ad69-4269-b936-83f60066a60c/videoscrubberimages/Scrubber_1.jpg?sv=2019-02-02&sr=c&sig=Xy%2F8OCHSQZNfTt6dzhKaqZGKtT8sW1jwkLFNR4Fc9dM%3D&st=2024-12-26T17%3A50%3A22Z&se=2024-12-26T21%3A55%3A22Z&sp=r
Duration:
T00H37M45S
Embed URL:
https://stream.cadmore.media/player/ff6c5d3a-ad69-4269-b936-83f60066a60c
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/ff6c5d3a-ad69-4269-b936-83f60066a60c/15 - Preservation of new media - roles and responsibilities-.mov?sv=2019-02-02&sr=c&sig=HgWLzpGQqRCXZZ5boj0FIhbGXmAZLyomzkmu0uwwA%2Bs%3D&st=2024-12-26T17%3A50%3A23Z&se=2024-12-26T19%3A55%3A23Z&sp=r
Upload Date:
2023-02-13T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
[MUSIC PLAYING]
WENDY QUEEN: Welcome to the NISO Plus Digital Preservation Panel. I'm Wendy Queen. I'm the Director of Project MUSE and a NISO board member, and I will be moderating this panel tonight. We welcome questions throughout the presentation in the chat. And when we are at the Q&A session of the evening, we will address those live. So I'll start by introducing our very impressive panelists. First being Heather Staines, who is currently an independent consultant in the scholarly communications space.
WENDY QUEEN: Her prior roles include Head of Partnerships for Knowledge Futures Group, Director of Business Development at Hypothesis, as well as positions at ProQuest, SIPX, Springer SBM, and Greenwood Publishing Group. She's a frequent speaker and participant at industry events, including the COUNTER Board of Directors, the Charleston Library Conference, the STM Future Lab, Society of Scholarly Publishing, the NISO Transfer Standing Committee, and the NASIG Digital Preservation Committee.
WENDY QUEEN: And she has a PhD in military and diplomatic history from Yale University. And you can understand why I need to read these introductions, they're very impressive. On a personal note, beyond Heather's impressive dossier, I will add that I find Heather one of the most well-rounded and insightful colleagues in the industry. And I'm pretty certain there is nothing that she cannot do.
WENDY QUEEN: So moving to Mark, equally as impressive, and equally, I need to read his introduction as well, has created and managed innovative online products and services since 1984. As Director of the Wayback Machine, he is responsible for capturing, preserving, and helping people discover and use more than 1 billion new web captures each day. Prior to that, Mark was a Senior Vice President with NBC News Digital, where he managed several business units, including GardenWeb and Stringwire, a live mobile video platform for collaborative citizen reporting.
WENDY QUEEN: Mark was Senior Vice President of technology with iVillage, an early internet company that focused on women and community. He co-founded Rojo Networks, one of the first large-scale feed aggregators and personalized blog readers. In the early days of the net, he managed technology and business development at The WELL and led their effort to build the first web-based interface for online forums and also helped bring the pre-web internet to millions of people by running AOL's Gopher Project as part of their internet center.
WENDY QUEEN: He managed technology for the pioneering US-Soviet Sovam Teleport Email Services and co-founded and managed PeaceNet, one of the first online communities for progressive social change, and later IGC.org, one of the world's first ISPs. He also co-founded the global NGO, APC.org. Mark's early training and experience with computer-mediated communications was acquired while he served in the US Air Force, spending more than three years working at the Air Force Data Services Center at the Pentagon.
WENDY QUEEN: Mark's non-profit work includes volunteering with the Open Education Libraries OER Commons, the Nuclear War Risk Reduction Organization, N Square, and the Interspecies communications group, Interspecies I/O. So while I met Mark just organizing this panel, I can tell you that I left the conversation inspired and energized, and I'm really looking forward to hearing the panelists.
WENDY QUEEN: And I'm certain that you, too, will leave this presentation with the same exact experience I had. So we're going to start tonight's panel with a series of questions that I will ask the panelists. And I will start with Heather and then move to Mark with, what got you interested in digital preservation?
HEATHER STAINES: Thanks so much, Wendy. I'm really excited to be able to attend this, my first NISO Plus, and look forward to the after times when we can get together in person. I'll admit, I hadn't heard of digital preservation until I started in 2008 as a product manager at Springer. And I was informed my second week on the job that I would be managing the digital preservation relationships. So I was definitely thrown into the deep end from the beginning.
HEATHER STAINES: So I was incredibly lucky, I got to work with everyone at CLOCKSS, at Portico, as well as the KB in the Netherlands and the German National Library as well. So I quickly learned about the critical importance of digital preservation, particularly in working with the librarians that I spoke with every day. One of the things which really impressed upon me the essential need for digital preservation was back in 2011 when the earthquake and tsunami hit Japan.
HEATHER STAINES: I was serving on the CLOCKSS board at the time representing the publisher side. And if any of you were familiar with CLOCKSS, you'll know that there are distributed archive nodes around the world. After the earthquake and tsunami, the archive node in Japan was shut down with the impression that the power was probably going to go out. After things recovered a bit and came back online, that archive node came up, it did exactly what it was supposed to do.
HEATHER STAINES: It synced its content with all of the other nodes. And so it was such a tragic situation, but we got to see, in reality, how a natural disaster can be mitigated and protect the content from any loss on that side. When I left Springer, I was really sad to be leaving digital preservation behind, but I had an opportunity about four or five years ago. NASIG formed a task force around digital preservation, and a former colleague, knowing my extreme interest in digital preservation, tapped me.
HEATHER STAINES: And I was super happy to join. That task force is now the NASIG Digital Preservation Committee, and I started as the co-chair back in the summer of 2020. I'll just say one more thing before we go over to Mark. As Wendy mentioned, I'm a historian by training. And so I think a lot about what we need to preserve from now for future historians and future anthropologists. So when we think of digital preservation, it's not just going backwards, but it's looking forward ahead.
MARK GRAHAM: Great. So first of all, I, too, I'm just thrilled to be here and to share about the work of the Internet Archive and the Wayback Machine and to invite everyone's participation in the effort that we're talking about today around digital preservation. I was thinking, I haven't thought about this for a while, but almost 40 years ago I spent a fair amount of time in the National Archives in DC. I was stationed at the Pentagon at the time.
MARK GRAHAM: And as a student and on my own initiative, I wanted to learn more about the origins of the nuclear war problem. And I ended up spending a lot of time going through these dusty boxes of papers in the modern military branch there, discovering documents that had been recently declassified. And at the time, we had no laptops, obviously, in the early 80s.
MARK GRAHAM: And I was thinking about how one could create indexes and discovery methods to be able to learn about these new documents as they came into the public record. So I guess early on, I was-- I had some exposure to some of the challenges around archiving and opportunities around archiving, and maybe some of the role that digital could play in the process.
MARK GRAHAM: The next several decades, I guess I just made a lot of digital-- helped to make a lot of digital content, as opposed to try to archive it but just trying to create it. A little more than five years ago, I had an opportunity to join the team at the Internet Archive and to work alongside Brewster Kahle and dozens of other engineers and people around the world to help pursue the goal of universal access to all knowledge.
MARK GRAHAM: And in there, I managed the Wayback Machine with the mission of helping to make the web more useful and reliable. I guess, so it's been a bit of-- a little bit of an on the job training for me. In my first few weeks, I began gathering reams of academic papers and books and conference proceedings, and it's been a never ending journey of exploring this vast field of archiving and digital archiving in particular.
MARK GRAHAM: Just one more personal note, I've accumulated a lot of books in my life. And at one point, my wife said, we can't afford a bigger house, so you're going to have to get a shipping container. So anyway, so I do have a shipping container of books. I'm in the process of emptying it right now by digitizing the books. So I like to get my hands dirty and to actually do some of this work myself.
MARK GRAHAM: I physically digitize some of these books myself, most of them are being shipped off to our digitization center in the Philippines. So I'd say, I like to learn by doing and freeing up some of the shelf, because these books that I've got here, I want other people to enjoy them. And they're not really a benefit when they're just here taking up space.
WENDY QUEEN: Mark, I have to know, where exactly is the shipping container located? I just have to know.
MARK GRAHAM: Princeton by the Sea in Half Moon Bay. I could give you the address, but I have some degree of privacy.
WENDY QUEEN: No, no, no, just curious.
MARK GRAHAM: You can find it. In Google Maps, you can probably find it right now if you set your mind to it.
WENDY QUEEN: OK, so I'm going to move to the next question. So digital preservation has been underway for some time now, aren't we done yet? And I'll say personally, in my everyday job, sometimes I ask myself that question. Can we be done? So I'm going to let Heather answer that question first, and then I'll move to Mark.
HEATHER STAINES: Yeah, thanks, Wendy. I have so many things in my life that I wonder that same question. So the reason this question really resonates with me is I spent a lot of time over the course of the fall last year reaching out to librarians to really find out what their priorities were around digital preservation. And in fact, was it a priority? Did it actually rise to front of mind?
HEATHER STAINES: Or did they simply have so many things on their plate, particularly with the COVID and the shutdowns and the economic impact? But I wanted to really find out what they were thinking the main challenges were. And many of them actually indicated to me that this was a reaction that their colleagues had-- aren't we done yet? We work with CLOCKSS, we work with Portico, we have maybe some national initiatives.
HEATHER STAINES: What more is there to do? So from their side, they're always looking for new ways to just raise awareness around the issue. So it's really fantastic that NISO is highlighting digital preservation as a topic this year. Another thing that they really thought was important to distinguish was the difference between safeguarding and more true resilient preservation, which I'm sure is something that Mark is going to mention as well.
HEATHER STAINES: They also, from a librarian standpoint, talked about some uncertainty they had around their rights as librarians, what they could preserve from their collections. And also, given all of the things that were going on last summer and last fall, they raised issues about trust and equity. Who is responsible for making preservation decisions and making decisions about what should be preserved?
HEATHER STAINES: And is this an equitable process? They talked to me about coverage gaps. And I should say that when I reached out to librarians, I did not find a librarian from Antarctica. I didn't have that much time, but I had librarians, fortunately, from every other continent and region in the world. And what they talked about were coverage gaps, preservation of content in local languages as an issue, maybe uneven distribution or preservation across disciplines, with, certainly, large STM publishers preserving their content, but maybe that being less the case when you come to humanities, for example.
HEATHER STAINES: And they also mentioned money. Of course, money is tight, and there's a lot of economic uncertainty right now and competing priorities in the library. So when they think about what they can do, they think about their staff resources and technical capacities being limited, and that against the ever increasing scale of content production and format proliferation, and just, again, the decisions about what should be preserved.
HEATHER STAINES: So there is much work to be done. And I have some things I'm excited about, which I'll talk about a little bit later. But from the library standpoint, we want to actually help raise awareness and educate folks to the fact that this isn't done and that there's much that they can do to help participate in the preservation process.
WENDY QUEEN: Thank you, Heather.
MARK GRAHAM: Yeah, gosh, what a great question. We've just begun, and there's so much more to do. And I'm usually a glass half full kind of guy, but in this case, the glass is just starting to fill up a little bit, I think. First of all, I should just share from the Internet Archive's perspective, we take digital material-- sorry, we take analog material, we digitize it, and we preserve it, we make it available. But we also collect digital material and preserve it and make it available.
MARK GRAHAM: And in both cases, we're talking about the use of this material, not just preserving it, not just putting it away in some place where it's safe, but, rather, helping to make it more useful and available to people so they can get value from it. There is some examples of that where-- and at first, I'm remembering Jonathan Zittrain and Larry Lessig's article from 2013 about link rot and content drift, where they document that 50% of the citations of the Harvard Business Review, in the time of the study, no longer linked to the digital asset that they had originally linked to, and that something like 50% of the papers in Supreme Court opinions also had suffered link rot of some kind.
MARK GRAHAM: And in our own work with Wikimedia Foundation and more than 300 Wikipedia sites, for example, here you have a format which is entirely digital. It's digital first, full of links, tens of millions of links. And because they're linking to digital assets on the web, and because the web has no inherent backup system or no inherent version control system, we've discovered more than 14 million of those links don't work as they were originally intended.
MARK GRAHAM: And so through the Wayback Machine, we've gone in and we've done a lot of remedial work to repair those links, to take out a broken link to the live web and insert a link to an archived version, either with the Wayback Machine or a number of other web archives. So I would say, overall, we're making good progress. Having things be digital first, in some ways, is exacerbating an ongoing problem of preservation because there's maybe a built-in assumption that if things are born digital, they're going to be persistent.
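The repair Mark describes here, swapping a dead live-web link for an archived copy, can be sketched in a few lines of Python. This is only an illustration, not the Internet Archive's actual tooling for Wikipedia. It assumes the public Wayback Machine availability endpoint at archive.org/wayback/available and the JSON fields it returns (archived_snapshots, closest, available, url); the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint of the Wayback Machine availability API.
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the query URL asking for the snapshot closest to `timestamp`."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp  # YYYYMMDD or a longer prefix
    return AVAILABILITY_API + "?" + urllib.parse.urlencode(params)


def closest_snapshot(response):
    """Pull the archived URL out of a parsed API response, if one exists."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def fetch_json(query):
    """Default fetcher: performs the real HTTP request."""
    with urllib.request.urlopen(query, timeout=30) as resp:
        return json.load(resp)


def repair_link(url, timestamp=None, fetch=fetch_json):
    """Return an archived replacement for `url`, or `url` itself if none is found."""
    response = fetch(availability_query(url, timestamp))
    return closest_snapshot(response) or url
```

Passing a stand-in `fetch` function lets the logic be tried without touching the network. A link with no snapshot is returned unchanged, so nothing is lost if the archive has no copy.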
MARK GRAHAM: There was a recent study in the Columbia Journalism Review where 19 of the 21 newspapers-- online newspapers that were studied said that they had no formal digital preservation practice in place. And I can say, with my years at NBC News, we had more than 100 websites, many of them leading news websites. Preserving those resources was the last thing we were thinking about, right?
MARK GRAHAM: We were always thinking about the next day, like what we're going to be publishing tomorrow or the upgrade to some new system, not how to preserve what we had. And then I'll just finally close by saying, I'm sure people read the Nature-- or it was a paper that was published in Nature, I think it was last year, documenting more than 170 open access journals that had vanished or disappeared in some fashion.
MARK GRAHAM: And there's actually a lot more to the story than that. Many of them had been preserved in some way, maybe under a different title or in a way that wasn't as easy to discover. But that was actually a good thing because it sparked conversation, and people came together. And new initiatives have sprung up to bring attention to the need to preserve these open access digital journals, as well as those by the publishers.
MARK GRAHAM: And I'm just also reminded today, because today is what-- we're recording this on February 5th, 2021, and there was an announcement from the Confederation of Open Access Repositories about a new project called Notify, which is helping to ensure the understanding and knowledge of open access papers and citations to them and reviews to them, et cetera, to help fill in the ecosystem. I think that can only be a good thing because this is more understanding and awareness of the resources that are available.
MARK GRAHAM: There's more opportunities to preserve them and more opportunities to build services with them that help them make them more useful and reliable for future generations.
WENDY QUEEN: So Mark, is there anything further you'd like to add about the connection between digitization and preservation?
MARK GRAHAM: Other than that, I'd say digitization opens up opportunities that you just don't have when things are analog, obviously. There was an article that came out a few days ago by Nieman Lab about some work at the Internet Archive relative to newspapers, where we're taking a lot of microfilm now and we're digitizing it. And that's good because now it's no longer in a medium that's physically fragile.
MARK GRAHAM: But also, once you have it digitized, you can do more interesting things. You can index it. You can translate it to other languages. You can do word frequency analysis. You can do meta-analysis across papers and across journals, et cetera. So while preservation is certainly key, to me, it's the starting point.
MARK GRAHAM: Getting things digital and then being able to preserve them is a requirement, but then on top of that, you can do much more interesting things. And so that's what excites me. When I first started working at the Internet Archive, one of my concerns, I have to say, was I was going to say, oh, I've always been involved in the future and trying to create new things. And I thought, how am I going to like working for an archive?
MARK GRAHAM: It's going to be looking backward a lot. And while I love looking backward, I found that the opposite is true. And that it's really all about looking forward, because unless we try to envision what it is the future generations or tomorrow might require from us today, then we're probably not doing all that we can be doing today. And also, I'm just going to put it-- and I know this is about research, but I think I take a pretty broad perspective on research.
MARK GRAHAM: For example, I think of Wikipedia as the card catalog system of academic publishing and of reference material overall. But I think, in this time of Twitter and where we've just seen the records of the President of the United States disappeared off the net-- now they're archived all over the place, but the official version of them-- that we run the risk of losing our ability to remember.
MARK GRAHAM: And we are what we remember, right? And so it's just, I say, what I would add is that paying attention to preservation initially is an absolute requirement, but it doesn't end there. It's just the beginning of many, many other things.
WENDY QUEEN: Great, thank you. And that's a wonderful segue to the next question for Heather, which is, who is responsible for preservation and how is that shifting?
HEATHER STAINES: Yeah, it's a great question. And I spent a lot of my career working for publishers, and, I think, from the outset, publishers were largely responsible for preserving the content that they were producing. And a lot of the impetus behind this was the licenses and the contracts that were signed with libraries. And understandably, if you're a librarian, you're going to spend hundreds of thousands, if not millions, of dollars on these content licenses.
HEATHER STAINES: You want to make sure that no technological failure or a natural disaster is going to cause that content to disappear. But interestingly, when we are now in the midst of a shift to open access content, that leverage of the library license is not-- it's not at the forefront. And so the vanishing journals that Mark referred to, many of them are open access.
HEATHER STAINES: And we do typically see less participation amongst the pure gold open access publishers in preservation initiatives. And perhaps the thought is, well, anyone can get the content, anyone can download, it will just be preserved as a matter of course. But we need to make sure that scholarly record continues to be intact.
HEATHER STAINES: In some cases, my last role at the Knowledge Futures Group, we worked with a lot of researcher-led journals and scholar-led projects. And there were entirely volunteer staffs, there was no budget. So from a financial and a technology standpoint, they didn't have the resources to necessarily be able to participate in digital preservation initiatives.
HEATHER STAINES: And so it will be fantastic in the future if we can make sure that through self-service models, or other types of things, that the rich variety of projects that are being produced can be preserved. There is also now lots of talk in the news about preprints. Preprints have been around, as we know, for quite some time. They have gotten a lot of attention lately.
HEATHER STAINES: Preprints are still-- the folks who are behind the preprint servers and behind experimentation with that are still trying to figure out what kind of a business model there is, or if there needs to be a business model. And so who will pay to preserve those preprints? To what extent do the preprints need to be preserved if the published articles are going to be preserved?
HEATHER STAINES: So lots of discussion around that. But you probably are aware, librarians take preservation quite seriously. And so whether or not the publishers continue to play the largest role remains to be seen. And I look forward to talking further with librarians around ideas that they might have on shifting responsibilities for this type of activity.
WENDY QUEEN: Thank you. So Mark, and then we'll go to Heather, what are some challenges and what are you most excited about?
MARK GRAHAM: Gosh, so many things to be excited about because there's so many challenges, right? The challenges themselves are exciting. So the Internet Archive is a non-profit, and we don't charge for people to use our service and we don't have advertisements. We do have a subscription service that's about 800 partners that use that. These are libraries, museums, universities, governments, et cetera.
MARK GRAHAM: Many of them to preserve academic literature. We do have a project called what will be scholar.archive.org. Today, it's scholar-qa.archive.org. And it's a catalog of academic publishing. There's more than 15 million full text papers available through it that are all archived in the Wayback Machine. There's a larger metadata catalog associated with it called Fatcat Wiki, as in fat catalog.
MARK GRAHAM: It's a wiki. And these are based on open source software and its open APIs, and anyone can contribute to them. This was written about in a blog post that-- I'm going to share some of the URLs of things that I talked about here in the chat so you'll be able to see them-- but a blog post from September of last year, "How the Internet Archive is Ensuring Permanent Access to Open Access Journal Articles." And it goes into many of these issues in more detail.
MARK GRAHAM: So I would say, I'm excited about all of the opportunity that we have to work collaboratively with open standards and with APIs and with open source software as part of a community of people that are passionate about preservation. And they're passionate about use and are working together to help us get more value and utility out of these materials that are being produced.
WENDY QUEEN: Great. Heather, would you like to share your thoughts?
HEATHER STAINES: Yeah, like Mark, I have so many thoughts. There are many tough digital preservation challenges to tackle. Increasingly, we're seeing dynamic content and interactive content. I had the opportunity to work with some of that. When I was working on the PubPub hosting platform, we had the Harvard Data Science Review journal, which had a lot of interactives. And it's just amazing the creativity that authors can bring to bear, but how do you preserve that in a meaningful way?
HEATHER STAINES: And sometimes you have to just take a snapshot or almost destroy the original vision in order to get something preserved. So I think that's-- it's a challenge and an opportunity. Also, increasingly, we're seeing content that's not just in one place. You might have an article in one platform, standalone peer review, almost like an overlay journal, on another.
HEATHER STAINES: You might have the data in yet a third place. And so I think preserving not just those items but the connections between them will really mean that entire knowledge graphs need to be preserved. As you know, Wendy, when we worked with open access books, they may be in a whole variety of different locations. And so keeping track of which items are the same as which other items can be a challenge.
HEATHER STAINES: There's a lot of-- there's annotations. People know I can't get through a presentation without talking about annotations, but things that are related to the text. There's underlying code, videos, supplemental information, data. We want to make sure that the context of all of that can be preserved. I'm also really excited about how digital preservation interest is expanding worldwide.
HEATHER STAINES: I had the opportunity to learn recently about some of the local organizations that are taking up the challenge for digital preservation. And some of them that I learned about are the African Library and Information Associations and Institutions, AfLIA, and also the West and Central Africa Research and Education Network, WACREN. There are new initiatives in Australia.
HEATHER STAINES: And recently, the Digital Preservation Coalition expanded into Australia. So I'm very heartened that we are seeing a dramatic expansion amongst the places who really think and understand that digital preservation is important and can then take that out and talk to the general public in the community about it.
WENDY QUEEN: Wonderful, thank you. And that leads me to my favorite question. This is where it gets good. Anything you want attendees to take away, right? So I will start with Mark, then go to Heather.
MARK GRAHAM: Well, I guess, I would just like to just invite people to collaborate and participate and to share. My email address is mark@archive.org. I welcome your questions. If you see something, save something. web.archive.org/save is available to everyone to share and to archive things from the public web. And if you have a file, just go to archive.org and hit that little upload at the top right and upload it.
MARK GRAHAM: Last night, for example, I was on Twitter, and there was a paper that had just been published by the Polarization Project at Stanford University. And it was a paper about the 2020 elections, and there was no URL for it. It was published on Dropbox. And so I downloaded it, and I put it up on the Internet Archive.
MARK GRAHAM: And then I was able to get a URL for it, and I was able to save it on the Wayback Machine. And I wrote to the author on Twitter, and he was like, thank you so much for archiving that. And I don't want to disparage Dropbox, but it was not the best persistent place to have something if you want it to be accessible and available to people over the long term. So there are a lot of open source tools out there and platforms.
MARK GRAHAM: And I just would encourage anyone who-- just to try things. Get your hands dirty, experiment a little bit. You can't mess anything up. And ask for help and be of service to others, because we're all working in this effort together.
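For anyone who wants to take up that invitation programmatically, the sketch below shows one way to ask the Wayback Machine to capture a public page. It is an illustration under stated assumptions, not official client code: it relies on the save address Mark mentions (web.archive.org/save) accepting the target URL appended to it, and the function names and User-Agent string are hypothetical. Rate limits and response details are not covered here.

```python
import urllib.request

# Save Page Now address mentioned in the talk; the target URL is appended to it.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_url(page_url):
    """Build the address that asks the Wayback Machine to capture `page_url`."""
    if not page_url.startswith(("http://", "https://")):
        raise ValueError("expected a full http(s) URL, got: " + page_url)
    return SAVE_ENDPOINT + page_url


def request_capture(page_url, timeout=60):
    """Send the capture request and return the final URL after redirects.

    This performs a real network request; what comes back is decided by
    the service, so treat the return value as informational only.
    """
    req = urllib.request.Request(
        save_url(page_url),
        headers={"User-Agent": "save-page-sketch/0.1"},  # hypothetical identifier
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.geturl()
```

Calling `save_url("https://example.com/paper.pdf")` only builds the address, which can also be pasted into a browser; `request_capture` is the step that actually contacts the service.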
WENDY QUEEN: Thank you, Mark. Heather?
HEATHER STAINES: Wow, that's such great advice, Mark. It's tough to follow. I guess I would point to some resources that I have found to be valuable myself. So our NASIG Digital Preservation Committee on the NASIG website-- and I will follow Mark's lead and put links in the chat-- we've developed some materials particularly for early career folks who don't have much of a background yet in digital preservation.
HEATHER STAINES: So we put together a digital preservation 101, and here are questions that you should ask your publishers, and a guide to the Keepers Registry, which is a fantastic project where librarians can see where content is preserved, because content should be preserved in multiple initiatives in order for it to be secure. There's fantastic resources available via the CLOCKSS and Portico sites, as well as the LOCKSS network, which maybe we can talk about in the question period, because we didn't really get to talk about that and that's amazing.
HEATHER STAINES: And then get involved, as Mark said, there continues to be a need for best practices. One group I'll give a shout out to at the University of Michigan Publishing, they're putting together a digital preservation baseline group, which is intended to give advice to libraries and potentially other types of institutions about what, at a minimum, you should be preserving, because it can get to be quite a rabbit hole, as Mark had attested to.
HEATHER STAINES: And then I mentioned that need for licenses to leverage preservation of content, and so our Digital Preservation Committee at NASIG is also working on a project called the Model License Project in association with the Library Publishing Coalition and the Society for Scholarly Publishing to come up with a template that libraries and publishers, and potentially other cultural heritage institutions, can use to ensure that preservation is being thought about and, hopefully, put into practice.
WENDY QUEEN: Wonderful. I love that we have a tagline out of our panel, too-- if you see it, save it. And it feels like we need t-shirts for that as well. So before we go live into our Q&A period, I just did want to take a moment to thank the panelists. I can't express enough that they are both amazing, and I couldn't be happier to have had the opportunity to basically watch the way their minds work through this process and share with us.
WENDY QUEEN: So thank you very much.
HEATHER STAINES: Thank you.
MARK GRAHAM: Thank you, Wendy. [MUSIC PLAYING]