Name: How is ‘Data’ Understood in the Humanities and What Does it Mean for Open Scholarship and Data Sharing Policies?
Uploaded: 2024-12-03T00:00:00.0000000
Duration: T00H59M30S
Description: How is ‘Data’ Understood in the Humanities and What Does it Mean for Open Scholarship and Data Sharing Policies?

Thumbnail URL: https://cadmoremediastorage.blob.core.windows.net/5fa323f6-0d97-43ca-b049-6af906492041/videoscrubberimages/Scrubber_1.jpg

Embed URL: https://stream.cadmore.media/player/5fa323f6-0d97-43ca-b049-6af906492041

Content URL: https://cadmoreoriginalmedia.blob.core.windows.net/5fa323f6-0d97-43ca-b049-6af906492041/session_4c___how_is_%e2%80%98data%e2%80%99_understood_in_the_humanities_and_.mp4?sv=2019-02-02&sr=c&sig=aWSxs4bLS4%2F68HnE3e%2FUyo4paYq234wF%2B3GIwZC2o3M%3D&st=2026-04-01T12%3A41%3A16Z&se=2026-04-01T14%3A46%3A16Z&sp=r

Transcript: Language: EN.
Segment:0 .
So we have a little bit of a change to the session. I'm Jennifer Kemp. I'm a director of consulting services at Stratos, which is Strategies for Open Science, or scholarship; we want to be inclusive of the humanities, of course. We were supposed to have a third speaker today who couldn't make the meeting. So not to worry.
We have fantastic speakers. We have plenty of time for Q&A, and we've also got some last-minute input to help bring the publisher perspective to this meeting. So I'm going to be consulting my phone for some notes; I'll come back to that later. But before we get started, let's get a show of hands from the audience: who's a publisher in this room?
OK. Librarians? OK. Do we have any researchers here? OK. Consultants? Research infrastructure? And who works primarily in the humanities? OK. Primarily with qualitative data? OK, fantastic.
So I'm going to let the speakers introduce themselves. This is, I believe, the third humanities session at this meeting, which is very exciting. Raise your hand if you've been to all three sessions. OK. We'll have plenty of time for discussion later, so please think of questions, think of comments. We very much want to hear from you.
So I'm going to go through just a little bit of background on the session: why we're here today, what we hope to cover. As you all know, research data has been a big topic of discussion over the last couple of years. That tends to focus primarily on quantitative data and what we consider the hard sciences, and the humanities, often the social sciences, and qualitative data in general tend to get left out of that discussion.
That only seems to have accelerated with the 2022 OSTP Nelson memo. So we're all kind of waiting to see what's going to happen with the National Endowment for the Humanities policy in particular. But for the purposes of this session, we are defining data very broadly. And we know that not all humanists consider what they do to produce data.
So that is one of the things that we'll discuss. We are talking about it broadly here in terms of qualitative data in general. And similarly, we are talking about policy very broadly here. Funder policies probably come to mind, but we're also thinking of publisher policy. So if it becomes common that what might be considered data in the humanities is shared more widely, how do publishers, for example, accommodate that?
What might they provide in their guidelines to authors? Those of you who were in the other humanities sessions may remember that yesterday, in the crisis in the humanities session, which was excellent and very depressing, there was talk of a focus on humanities research. That's very much our focus for today as well. So I'm going to turn things over to our panelists in just a minute to introduce themselves.
And we've got some examples to share and, like I said, plenty of time for questions. Before I turn it over to them, a special thanks, because we have one person who traveled across the country to be here and another who got up in the middle of the night to be here. So these are very enthusiastic experts that we have with us today. So, Kristy, do you want to start?
Sure. Hi.
My name is Kristy Golubiewski-Davis. I am the director for the University Library Center for Digital Scholarship at UC Santa Cruz. And in my work, I actually come from a research background. My PhD is in anthropology, and anthropology is one of those weird things where sometimes it's a social science and sometimes it's a humanities, depending on where you go and the work you're doing. But in my current work, I also work with a lot of historians, a lot of people in Latin American studies, a lot of people in arts and digital media.
And what we do in our unit is a lot of consultations around how to use digital tools in your method, thinking about how that impacts your work, how that impacts your thinking, and how using a digital tool is going to change the way you understand your data, because there are affordances and limitations that each digital tool has. So we do a lot of that conversation. Some of our conversations do start with the scope of: do I have data?
And so we have those conversations with people. We work with faculty, we work with graduates, we work with undergraduates. We also do a lot of work in my unit providing access to tools and equipment, and training on those tools and equipment.
I'm Katherine Skinner. I am currently the research lead for Invest in Open Infrastructure, which is a not-for-profit organization that is focused on advancing open, meaning both the investment in open and also the adoption of open in all its forms.
My background here, though, really stems back to some of the earliest days of my career, when I was at Emory University. My PhD is in sociology, and I was a sociology candidate in the late 1990s and early 2000s. I thought that I was heading for something that was not tenure track, but I did not anticipate anything that was going to be digital or anything along those lines.
Again, it was 1999, 2000 as I was entering, and within the first two years of my being at Emory, I had a side job where I was managing a couple of Mellon-funded projects on OAI-PMH, so on data sharing, basically archival data sharing. And my eyes were opened very quickly to a sociological principle that I was studying in the music industry in my dissertation research, but that I saw unfolding in the knowledge industry, which is that industries don't like to change, and people within industries, especially the big players, tend to try to keep things in line, not because they want to be conservative, but because it suits the different methods that we have in place.
And what I was watching from my vantage point, working in the libraries and studying sociology, was this absolute chaos that was to come. That was pre-Google. It was a moment where it was very apparent to me that knowledge was unlikely to be seen as anything except a commodity unless some pretty severe steps were taken. So I threw myself fully into the open space, not blindly, and not without full awareness that corporate models can be and should be blended into that space.
At Emory, I worked for several years with what is now the Center for Digital Scholarship and helped to found it. And then I founded a nonprofit that wound up being the home of several programs, including the Library Publishing Coalition, which some of you may have heard of (it intersects a little bit with SSP), as well as a digital preservation unit and several other tool groups.
And in that work, I was struck over and over and over again by how often the term data comes up, and by the myriad expressions on people's faces within any mixed audience when it does. So when I was at Educopia, which is the nonprofit that I founded, we worked with libraries, archives, museums and publishers. It's always an interesting intersection, trying to play to the strengths of all of those groups or to find a common language across those groups.
And that was one of the spaces where, as I'll inevitably describe in a little bit, I was seeing how data was perceived, whether it was in the area of ETDs, electronic theses and dissertations (what are these affiliated files that people are trying to deposit, and what do I do with them?), or whether it was things like transatlantic slave trade data and mapping tools and different approaches to scholarship that were moving things much faster than a lot of people were comfortable supporting or being a part of.
And so now here we are 20 years later, 24 years later. And the thing that shocks me the most is that we're still having this conversation: what is data in the humanities?
Thank you. Great introduction. So next, we'll start where we started when we were first discussing this panel and what we were going to talk about,
which is: how does this come up in your everyday conversations, and what are some examples of humanities projects that use and make available data?
So in my everyday life, it usually comes up in the form of talking to a researcher who wants to do something digital and either has a really clear idea of what they want to do, or knows that there's something there that could help them but doesn't quite know what it is.
So usually it's the start of this conversation of: what are your goals? What are you trying to do with the information you have? What information do you have? And then kind of moving forward into more research questions and how to leverage the information that you have meaningfully. And, like I said before, how to make sure that when you're doing something...
So if you're doing text analysis... why don't you pull up the text analysis example that I have here. This is not a project that I work on, but I think it's really cool. This is copyright in the early English book market, an algorithmic study, and it's looking at how different pieces of early text were copied; there was a lot of copying happening in different types of publications, both images and text.
And so how do you share this with somebody when you're using code to analyze something, when you're using lots of text in the background? What I like about this example: you can see this is just an image, but if you go to the link, you can see that it's interactive. You can actually hit the refresh button and it'll bring up different examples. So what's cool about this to me is that you don't have to actually dig into all of the individual pieces in the background.
You can just see what researchers are seeing at the same time. But there's this question of how you share that, and how you share it in a way that meets our goals as researchers to be transparent about our methods and to hopefully allow for some reproducibility of the work that we're doing. Of course, in the humanities, the perspective that you're bringing is a really important part of it.
And so that's not going to be pulled away altogether. But if you're using a large underlying set of information, you want to be clear about how you're using that, particularly in a situation like this. And so I will talk to people about different options that they have in sharing that. And often we'll have questions like: well, I'm using a lot of text that already exists elsewhere. Do I need to completely reproduce all of that text as an appended document, or are there other ways that I can share it?
Am I constrained by funds and by the places where I can put something up? That's a thing that we see a lot. And it's really individualized, which is one of the problems with this. My answer to that question is going to be dependent on the research they're doing and the resources that they have available.
But we kind of find different ways to approach that question. I'm going to share another example now as well, from a project that I was on, which is the lithic collection. I was the graduate student on this one, and I was in charge of the scans. I have a lot of background in 3D scanning, and there's this process in 3D scanning where you gather your original data and then you edit the scans, because you have to; they're not perfect the first time.
And that's a human intervention. And so how do you share that information? A lot of people in the scanning community would love to share the original scans, the edited scan just before merge and completion, and the completed scan. But if you try to share all of that on a data platform, you're talking about tens of thousands of dollars just for data storage. And for most people that's not within their budget.
And so in this particular case, what we did is there's a CSV attachment, and in that CSV attachment we actually quantify for each scan what changes we made. We had kind of a set of changes that we would choose between, and we recorded that information as we did the editing and then shared it. And in the cases where we did an edit that wasn't part of our standard edits, we made sure to share that as well.
So we don't have the ability here to share all that data, but we do have the final scans, and we also have PNG images of the scans, because 3D scans are also a type of data that is still relatively new, and so there's this question about what the longevity of that is. So that's kind of my everyday engagement with thinking about data with this humanistic view: how are we sharing it, how are we talking about it, what's our format, what are our options, and how can we share it when we don't have the opportunity to share all of the bits of it?
Yeah. So I've gotten to wear a bunch of different hats, both as a graduate student working within projects and then also as an advisor to groups of students and faculty members who were working on projects. And so you can pull up the slave trade database. Yes. So one of the most informative and helpful moments in my journey, particularly within this kind of humanities and data space, was back in 2007.
I helped to create slavevoyages.org from the Transatlantic Slave Trade Database, which was a long-standing collection that five professors had led a lot of the charge on; others had been contributing to it throughout. One of those professors, Dr. David Eltis, wanted to combine this data set that had been building over 50 years, of different slave voyages and all the different information about them: embarkation points, disembarkation points, how many people were on the ship,
who the captain of the ship was, what's known about what happened when the ship was in action. Was there a revolt? It chronicles even some of those kinds of things that happened within those more than 35,000 voyages. They combined five data sets into one back in the 90s, and they created a CD-ROM from it. And they were historians, so this was huge. Who knew how to do such a thing?
They were using CD-ROM, and Cambridge was the one that published it, and they licensed it. And then Cambridge was like, please take your copyright back, because we don't know what to do with this. We don't know how to preserve this. You can publish it again in a different way, but don't involve us. So David was at Emory University as a professor at the time, and in the library, where I was a student worker at the time, we had the luck of intersecting with David through a dean who knew of our work in the digital humanities and knew that what David was really doing was digital humanities.
He just didn't have that name for it, because he was a historian who had been doing this for a lot longer than there had been digital humanities. So David became a really huge part of our team and our program. We watched what happened as we tried to code that project and build a data model that could actually work within internet-based forms instead of just within a more static space, building ways for people to interact with it: not just interacting with the data through visualizations and maps and all sorts of things (it's a remarkable resource in terms of what it does), but also submitting to it, and making sure that whole process was there and engaged from the beginning.
We had to grapple with all the different questions about what data was and what the different types of data were that we were trying to compile, and then what we do with that data to make sure that we don't lose pieces of it along the way, that the model doesn't get compromised. It was an incredible effort. So I've had that kind of experience. I'll skip experiences from the in-between years and switch to where I am right now, which is looking at it from more of a policy and practice perspective, and thinking about what happens now for NEH and for IMLS as a result of the Nelson memo and the policies that are coming out from that.
And what does that do to the whole constellation of players that currently serve to provide information about research and those research assets? How are policies now going to change the way that we do that work? And then what do you do when a lot of the professors, even the ones that are getting grants and need to know how to do this, still don't think that what they're doing is creating data?
So if you try to talk to them about making sure that they are storing their data somewhere safe, they look at you funny. If you talk about the data being published, they aren't quite sure. There are translations that need to be done between different research communities and publisher communities, societies and librarians, and a lot of that hasn't happened well already.
And now there's a government element on top of it that introduces all kinds of interesting things. So I've got a study right now that I'm working on that is looking closely at institutional responses to the Nelson memo. We're working with 30 institutions across the US, everything from Yale to community colleges. We wanted a good cross section of different institutional types, to try to understand what the internal workflows are between their office of sponsored research, their faculty members, who are the ones actually doing the research, and usually the library, IT, and different units on campus that are involved in advising how to publish and how to deposit in the light of the changing policies.
And so we don't have results from that yet. We're just at the beginning of that project. But that's the lens through which I'm now looking at this kind of conundrum of data and data maintenance and replicability and things like that within the humanities.
Do you mind if I add something to that?
Please.
So one of the really interesting things that we've been talking about as well in preparing for this panel is this question, again, of what is data in the humanities.
What is the expectation about what would be preserved, and how it would be preserved? And part of that question is about funding. If you have digital material that's going to cost you $30,000 to put up and that is now required to be put up, how do you manage that? But there's also this question around the fact that it's not unusual in the humanities for your data to not be your data.
And what I mean by that: there was a really good example. Please forgive me while I pull up an image of a poster from yesterday. And please forgive me if I pronounce this person's name wrong, but Camilla Livio from Georgia, UGA Libraries, was doing a project on Twitter data. And if you've been following anything in the Twittersphere, what happened in about the last year and a half is that access to the API was essentially completely cut off unless you have deep pockets.
But there's also a policy change that has happened: if someone was working with the data previously and had the data to do the research on, they can no longer share that underlying data, by policy. They can't do that because it's not theirs to own and share. And when we're working with humanities data, we are often working with these larger sets of things, because that's what we as humans deal with.
We want to be where the people are. And how do we grapple with that when talking about a policy that's now requiring us to make some of that information available? I don't have any answers to it, but I know that those are conversations that I'm having on my campus. That was a conversation that we were prepared to talk about earlier this week.
But the regulations haven't come out yet, so it's still kind of up in the air. And I think it really comes down to this tension around how policies are made in realms where we don't have agreed-upon definitions or an understanding of what data is. And so now we're making policies around that. And are those policies being made with even an understanding that there are data to work with?
I don't know, because I'm not in those rooms. But in interrogating that question, I'm really looking forward to seeing the results of some of your work, because hopefully it will help inform those of us who are grappling with that question right now. And I think there's a lot of fear among a lot of the people that I work with that there are going to be requirements to make accessible things that they don't have the funds or rights to make accessible in their research.
And then what do you do with that? Does that mean that you just avoid that research?
That is exactly it. Yeah. The concern is that there are whole swaths of research that will no longer be able to be federally funded because they can't comply with some of the policy guidance that may be there. And it's all still hypothetical, but this is one of the concerns: what if the area that I'm trying to study cannot be studied in the open?
What does that mean for the viability of my receiving funding, if it's known from the beginning that the assets that I'm working with can't be deposited or redeposited in a way that someone else can use or look at?
That's a really good point. So you both touched on a couple of other questions that have come up, around sharing less widely or more informally, and data curation: who would be responsible, who has the tools, who has the knowledge, what policies would be around this.
And I mentioned that we had a sort of late-breaking conversation with John Lenahan from JSTOR, who was in one of the humanities panels yesterday. Some of you probably arrived after I mentioned this, but we were supposed to have a third panelist, who very unfortunately could not make this meeting, and we wanted to see if we could still have some platform and publisher representation. And most of you probably know JSTOR.
So when it comes to data curation, again: who has the tools, who has the time, who has the resources? JSTOR works with a lot of smaller publishers. I mean, even a lot of larger publishers wouldn't have the resources to do this kind of data curation, but smaller publishers definitely don't. So do you want to talk a little bit about data curation and that sort of informal sharing?
Yeah. I think, actually, this would be a really good opportunity for the What's on the Menu example that we had prepared.
We pulled together a bunch of examples to think with and use to speak about different issues. What's on the Menu, if you're not familiar with it, was a New York Public Library project where they took images of menus from the early 1900s and back into the 1800s, crowdsourced the work of going from the image to text, and then started looking at things like: how did prices change? What were the different names that people used for foods?
Because that's something that happens. And how does this affect how we think about food? It was a really cool project. It is mostly defunct at this point in time. The website still exists. You can still see the thumbnails, but you can't see the original images anymore. So this is a project that had a lot of really good intentions, and for reasons unknown to me (maybe somebody else knows what happened with this), it just fell apart because it wasn't able to be maintained.
Also, at one point in time there was a CSV of data that was downloadable. It's not anymore. I'm not sure why that happened. It says now that it'll be available soon, but I know at one point in time it was available. It's possible that there was something wrong with the data that they wanted to update, or something else going on.
It's a really cool example of crowdsourcing. It's a really cool example of thinking about how things change over time and using the resources of a large public library to engage with this type of research. But it's very representative of the issue around how we curate and store this, and how we find resources, in particular for some of these projects that are very interactive, including the slave voyages project, that require maintenance on platforms that regularly change.
I've done some pedagogy work in creating online tools, and those were created a while ago using Adobe Flash. And when Adobe Flash went defunct, my colleague and I who worked on it got loads of questions from people: are you going to remake it? And we're like, we would love to, but that's a lot of time, and there are two of us. And so how do you...
It's a question I don't have an answer to, but it's a question I know a lot of people, including myself, are grappling with: how do you deal with this long-term issue? And I think another point that has come up is that it's not like everything is the same. Look at the data we've discussed: we've talked about maps, we've talked about community input data.
We've talked about 3D scans. We have some examples here that have audio. They're all different. And it's hard to develop easy workflows when you don't have a sense of what you're trying to create a workflow for, because there's so much variety to it.
That's why Cambridge University Press was like, hey, David and company, please take back this stuff, because we don't have the knowledge, the systems, the infrastructure, or the desire to create it for this project at this time.
We need you to do something else with your 35,000 data points on slave voyages. Super important historical information that lots of work has gone into gets trapped in platforms that just don't age well at all. And we know this. We've been through cycle after cycle after cycle of this. One of the first times that I had this: I was a founder of Southern Spaces (my background is sociology and American studies), and Southern Spaces is one of the first online journals that was really trying to use the internet's capabilities.
It was not PDF oriented. It was very much video, audio, interlinking, mapping and things of that nature, trying to make sure that we were actually integrating new scholarly practices and methods and discovering new things through the different types of artifacts that we could make visible or audible on the web. The system that we started with was driven by the editorial side; this was 2001, 2002.
So we were early adopters of Open Journal Systems, but we couldn't use that for the front, so we built our own HTML page for the front. I wrote the HTML. That was a very, very bad idea. And then a few years later, we had to shift it to XML and make sure that there were actually some workflows that were working well in the background and that the repository did what a repository needed to do.
That site alone (I'm still on the editorial board) has gone through 16 iterations. And so far, we haven't lost any data. But the amount of data that is now behind all of those humanities publications, dating back to about 2004, which I think is when we published the first set of publications, is astronomical. And if Emory had not founded the Center for Digital Scholarship, it wouldn't exist.
There's no way that Emory would have continued to pay for that as a professor-based project, especially after several of us who were working on it had left the university. And so there are these huge questions around how we maintain and when we maintain digital scholarship and digital data in the humanities space. When is it important enough to do that?
Who should do it? Is it the Wayback Machine? Which is what I tried after I saw this. I was like, I want to go back and see if those images are there. Well, they're under a DDoS attack right now. If folks didn't know, the Wayback Machine is in deep doo-doo until they can stop this sustained multi-day attack that they've been under.
But the Internet Archive shouldn't be the one in charge of preserving the images from the New York Public Library's menus. It's this very convoluted space. And then where that hits policy today in a really worrisome way is if you take even the generalist repositories that are available for data deposit. So let's say, and this is jumping ahead of ourselves because, of course, the policy is not there yet,
but let's say that NEH says to all of its grantees from here forward: whatever data you are working with does need to be deposited, as long as it can be made publicly accessible, and you need to figure out where to do that. NEH is not going to run its own repository; here's our list of suggestions. Well, maybe Dryad is on that list of suggestions. I certainly hope it is.
It's one of my favorites. Maybe Figshare is on that list. Maybe there are others on that list. How do the faculty members decide where they're going to plop that data? And do they know what kind of curation practices are going to happen once it's plopped? If they're putting it somewhere that is free to them,
are they putting it somewhere that's in more jeopardy of being dissolved in the future, because there's not a business model undergirding it that allows it to actually stay alive? What are the curation practices? Some of the generalist repositories do very minor things. They back up the data, but they certainly don't do heavy, bit-level checking and things of this nature.
And they certainly don't do migrations and other things as a matter of course. What is it that the different options that we lean on are providing, and do we know enough about those options to know what to recommend to different scholars? And so one of the things that I think is going to happen, and this is, again, the reason that we're doing this study right now, is that you're going to hit all kinds of people who are going to be coming to you and going, which repository do I use, Kristy?
I already do.
But especially once the policy is driving them to you. And they're going to have panic in their faces: I have to comply. Where do I put it, Kristy?
And my panic is that we've talked to repositories about some of this humanities-related data and been told, no, we are not the repository for you.
What do you do with that? I will say this is a place where, if you're in a university setting and your university has the resources to do so, which is not always true, I've seen some really good examples of repositories from libraries. University of Minnesota has a really good repository. California Digital Library serves the whole UC system.
There are a lot of things that I like about eScholarship. They are one of the repositories that has told us this is maybe not the right place for that type of data. And then we go to Dryad, who's also turned us away for some things. And a lot of that has to do with file format types. Often, with the types of things that we're working with, unless you're working with really traditional audio, video, and text file formats, that's where repositories get a little uncomfortable.
And what University of Minnesota has done is said, we will take the files that you give us. We have some recommendations for those three big ones that are regular. If you have something that's different, we will take it. We can't promise anything about it, but that's better than nothing. But you do lose a lot. The digital humanities projects that I've worked on, or digital scholarship projects that I've worked on, have this really interesting and important engagement layer on top of them.
And that's another place where Internet Archive is kind of the place that I know everybody goes to. But it's problematic for a number of reasons, not only the ones that you've talked about, but also you lose some of that interaction. Why can't I remember the name of the platform? Thank you. So the 3D Saqqara project is actually published through a publisher.
I don't remember the specific publisher at the time, but it's on the Scalar platform. And Internet Archive can't handle Scalar. It can't handle any of the interactivity. And this is a project by Dr. Elaine Sullivan, who is out of UC Santa Cruz. And one of the things that we dealt with there. And one of the things I really like about her work. So these are 3D models.
They're not 3D scans, they're 3D models that she's created based on site plans and based on other kinds of archaeological data about what the actual landscape looked like and where the buildings were. And she's been working really hard on this other question, which kind of leads into our metadata conversation, of how do you add citations to 3D models? I have a colleague who works in podcasting. How do you add citations to podcasting in reasonable ways?
So that's another conversation that's happening in this realm. Like, how do you cite your sources when you're creating digital media? And I know you have some interesting things to say about metadata. Everybody's favorite topic. Jennifer, you may have more to say on metadata than I do. In all honesty, I may pitch that one back over to you because we haven't gotten enough of your voice in here.
Sure. So a couple of notes, because you mentioned platforms and policy and metadata. So one is that most of you probably know that the OSTP memo mentions book chapters, not the book title level. So you talk to a lot of people in humanities, they don't think in terms of chapters, because a monograph is meant to be consumed as a whole. And there's not a lot of content at the chapter level.
So that's one interesting aspect of it. Another is that humanities often has such a focus on books, even though there's a lot of content now, because of open access, in journals, and that content is distributed across multiple platforms. That's long been the case for books. And that's only going to grow. So if you have this data available and you do find a repository for it, how do you then link the data in a repository here,
again, just defining data loosely, to maybe all of the places where this content exists, right? So there's a lot of challenges that kind of overlap here in terms of policy and metadata. I mean, metadata on books often isn't great in the first place. So if you're talking about linking underlying data to any kind of published research output, that's not often great, even in the so-called hard sciences. When you get into territory like this, where you might have multiple data sets of different kinds, maybe those get spread out among different repositories because one can't accommodate them all, and you have the associated book or book chapters in multiple places.
It gets a lot more complicated for everybody, publishers and platforms included, to link all of that together. And then the thing that I'll add to that, coming at it again from the kind of policy-centric perspective that, for better or for worse, I occupy right now because of this project: one of the biggest problems with metadata right now in a whole host of spaces is that you can't figure out who funded a project because most people don't use good funder data.
They don't use good RORs. There are all kinds of things that they're not using, but they especially don't put the funder data in there. And so if, let's say, you're the National Science Foundation and you want to identify what publications have come out of the research that you have funded, right now that is a crapshoot game even for a group like NSF, much less when you're looking to engage in other, more humanities-centered spaces.
And so the norms and the standards that we use have to catch up with the policies that we're starting to set, or else there is no way to assess the use of the policy, if you can't actually reference that, yeah, this was funded by the NIH, consistently and in ways that you can pull from the metadata. So I think... sorry, I was going to say, and this is where I will put in my plug.
I know there are a lot of working groups on the researcher side who are having conversations about what are the things that they think should be important in the metadata. There's an IMLS funded project looking at metadata for 3D objects. I'm almost certain that my colleague who works in audio podcasting is part of working groups around what that type of metadata should look like.
But every working group that I know of is primarily researchers and doesn't have publishers or institutional repositories represented on it. So if you are working in a field that has data where you want to see metadata like funding information or other things like that, or you have a better sense of what the underlying structures are, reach out.
See if you can find somebody who works in that field, reach out to them, see if they know of any working groups. I think that's a really good place to start having some more collaboration on what that looks like. Yeah, one of the things that came up in conversation with John is this need for cross-stakeholder groups and more communication to figure all this out. I mean, none of us are going to figure it out in our own stakeholder silos.
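As a rough illustration of the funder metadata point made above, here is a minimal Python sketch, not any repository's or publisher's actual schema. The record shape, the field names, the function names, and the placeholder funder identifiers are all assumptions made up for the example; a real system would carry registry identifiers such as ROR IDs in whatever metadata standard it already uses.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical placeholder identifiers, NOT real registry IDs.
FUNDER_A = "https://ror.org/EXAMPLE-FUNDER-A"
FUNDER_B = "https://ror.org/EXAMPLE-FUNDER-B"


@dataclass
class Funder:
    name: str
    ror_id: str  # persistent identifier for the funding organization


@dataclass
class OutputRecord:
    title: str
    funders: List[Funder] = field(default_factory=list)


def outputs_funded_by(records: List[OutputRecord], ror_id: str) -> List[OutputRecord]:
    """Return the records that name the given funder by identifier."""
    return [r for r in records if any(f.ror_id == ror_id for f in r.funders)]


def missing_funder_metadata(records: List[OutputRecord]) -> List[OutputRecord]:
    """Return the records that carry no funder information at all."""
    return [r for r in records if not r.funders]


records = [
    OutputRecord("Monograph chapter 3", [Funder("Example Funder A", FUNDER_A)]),
    OutputRecord("Interview audio set", [Funder("Example Funder B", FUNDER_B)]),
    OutputRecord("3D site model"),  # funder never recorded
]

for record in outputs_funded_by(records, FUNDER_A):
    print("Funded by A:", record.title)
for record in missing_funder_metadata(records):
    print("No funder recorded:", record.title)
```

The point of the sketch is only that a funder can ask "what came out of the work we funded?" when the identifier is recorded consistently; the third record is invisible to that question.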
We've got one more question before we get to the Q&A. But actually, I want to ask an audience question first, for the publishers in the room: are there any publishers who are providing guidance, even if it's maybe informal but frequent, to humanities authors about data and repositories in general? OK, stick around for the Q&A, please. Also, so let's say the NIH policy comes out tomorrow and they talk about data, they want data, they want data associated with chapters, whatever the case may be.
How many of the publishers in the room feel reasonably confident they can adapt and respond fairly quickly without too much pain? OK, great. Great. Yes. Please, please stick around. So a lot of this is really about just giving visibility to this work. Maybe you consider it data, maybe you don't.
But a monograph being the published output of something is great. But particularly in the context of funding and of policy, understanding that there is so much work that goes into it, that it's work that others could benefit from, that the humanities should not be left out of conversations when it comes to funding and policy, is really what a lot of this is about. So we wanted to take a couple of minutes to see what would it look like if there was a lot more visibility, and what would it take to have infrastructures and policies in place to fill in the gaps that we've got now?
What are we lacking now? I know it's a big question. So we could spend a whole half day on that. I think I would say the two things that I see as lacking are central places to have these conversations, because we're having these conversations in a lot of places. But I feel like they can sometimes be separated, both in terms of what's happening.
There are places online, there are online repositories for certain types of data, that are just completely hidden to other groups that work with that data, for no other reason than they're just not aware that they exist. So I think that would be really important: how do we get everyone in the same room, which is the perennial question, and how do we create a system that recognizes that there are always going to be outliers, especially in humanities data?
There's always going to be a set of things that we're used to working with, and then there's going to be something that's new and different, and how do we incorporate that into the structure? Yeah, I would say that the thing that I worry the most about is infrastructure and how it gets funded. And there, I don't care what your business model is, I think everybody is relatively screwed right now by the lack of understanding of how much work and how much cost goes into preservation of anything.
So whether you're talking about various types of data or whether you're talking about the publications themselves, the journal articles, et cetera, there is so little understanding of the real costs that everybody is accruing. So whether you are a commercial player or whether you are a non-profit player, I don't care. Those are business models. The problem is that we don't have a good investment structure that allows us to build tools that can be used by enough people that we can get to a level of scale that doesn't bankrupt all of us.
Now we have a whole bunch of little systems running in a whole bunch of big publishers and little publishers and a whole bunch of libraries, an institutional repository for every library, a different one, often homegrown. It is a magnitude of things to try to upkeep and we can't do it. It's impossible. And so figuring out how to merge private and public interests right now is the piece that I'm actually the most interested in and most flummoxed by.
People don't want to have the conversation. People don't know how to have the conversation. The fact that I put together a proposal for a project called Reasonable Costs landed me in hot water with half of the societies in the country because they thought that what I was trying to do was claim that their costs were too high. That's not the point of the project. The point of the project is to point out that we don't have the infrastructure yet that we need.
None of us do. Elsevier doesn't have it. Nobody has it. And I don't know how we address that individually. And the work that we've done so far to do that has led us into spirals of an institutional repository at every library, which doesn't talk to other institutional libraries. And then it gets hidden because, why should it?
I want it my way. I think it's remarkable. But I think this also speaks to part of the conversation yesterday about the humanities crisis. Part of it is also identifying that this is valuable and convincing people that this is valuable. I mean, I believe it's valuable. So if you don't think it's valuable, I'm happy to talk to you about why I think it's valuable.
You don't have to agree with me, but I will talk your ear off if you want to hear it. But we need to get that word out in a way that resonates with people, because otherwise we're just talking to ourselves. And then we run into these issues where, yes, this repository works for me, and I don't have the bandwidth or the funding or the buy-in from other people to have those larger conversations.
Well, we do have a few more minutes for conversation, so let's open it up to questions and comments. Thank you so much. Let me get the microphone for the recording. And everybody say who you are as you start speaking, please. I said, say who you are as you start speaking, if you can make the microphone work. Hi, I'm Brian Moynihan from ORCID, but a lapsed librarian.
And my question is: a lot of research institutions in Europe have specialized data steward positions. I don't really know if they're focused on humanities at all. My guess is probably more on hard data, as you say. But are you aware at all of awareness in universities and among deans that this might be a possible something to help, like the sort of beginning of having someone completely focused on this, and not just someone like a former collections librarian who's now a research data librarian and maybe not a specialist in whatever field?
Just curious about that. So I'll say any time that I work with my colleagues who are in Europe, I'm always astounded at how much better their infrastructure is. And I wish that we could just tap into that somehow. And I'm not sure where that blockage is. My experience actively working in a library right now is that there's no funds for that. There is desire there.
But often when my university librarian is having to make decisions about positions that we have, there are other pressing needs on campus that need to take priority, because our library at UC Santa Cruz was hit severely by the 2008 crisis and hasn't come back from that. And we're about to get hit again. And so just building up that base is the priority. I know that there's interest there.
It's just a lack of capacity. And that's really, really hard to struggle with when you're actively dealing with those questions. The only thing I'll add to that is that it's going to get worse, I think, across the board, and certainly in the US, but in a lot of other economic spaces as well. I think we're going to see a really significant hit to higher ed. I'm not saying anything novel, but I will say I've got a 13-year-old and a 15-year-old. I have a PhD, my husband is a professor and has a PhD.
We both value education. We both value the work that we do. I don't know that either one of my children will go to a traditional four-year college. It is moving that fast at this point. And I think whatever is coming in the next couple of years is likely to have all of our heads spinning all over again, and again begs all these questions of, if we're not consolidating our efforts around a smaller portfolio of projects and programs and infrastructures, then what's going to be lost?
I would say it's here. Yeah, we had to cancel two searches this year. Yeah. OK. Sorry, that's such a bleak note that we're just on right now. I'm just trying to gather my thoughts. No, no, not at all. This has been great. Thanks so much for putting on this panel. I'm Méabh Liz Lyons.
I'm actually at Emory. I'm in the Center for Humanistic Inquiry, but I work really, really closely with the Center for Digital Scholarship. Yeah, they're amazing. And so in my job, I work with researchers who are writing books and who want to explore digital publishing and have deluxe digital editions of what they're doing with various kinds of interactivity.
And a lot of the conversation that we have (my background is in digital humanities research and project management) is: is this a digital humanities project or is this a book? And I think publishing can mean a lot of different things. But I think one thing that I haven't really heard in this conversation is how to distinguish between the scholarly record, the final version-of-record output that really we want to preserve for learning and future researchers, versus amazing interactivity.
But we say maybe this has a shelf life of five years and then we sunset it. And so I wonder what your thinking is about those questions. Thank you. My answer to everyone who wants to do something interactive online is, you should assume after five years it won't work anymore. Not everybody that I work with is comfortable with that.
The project that we had up earlier from Dr. Sullivan is an example of that. And she was able to find a press that works with Scalar. We were not able to help much in that process. She worked more closely with the press, but that's usually what I tell people, is that you should be thinking about these more traditional types of media as that long-term backup. Because if you're thinking about long-term preservation, that's where we have the ability right now to have more confidence that that's going to exist in 10, 15, 20 years.
Yeah, and there's been some great work, not on how to preserve the interfaces, because what you have to do to emulate those interfaces and do all the things to make this work is insane, but on how do you capture enough of the experience so that someone can do a fly-through? I mean, at Emory's Center for Digital Scholarship, some of the work that they're doing right now has a lot to do with gaming technologies being applied to sites that you want to be able to go on an archaeological journey through.
How do you make a recording of what it felt like to be a part of that? Or how do you document what that is? And there are some methods. There are even some standards that have already been pioneered for that, but they're not being used nearly enough. And especially for that author-approved manuscript, or whatever you want to call it in this space, having some sense of what it is that you want to have persist at the moment of publication, production, whatever it is, seems really, really important.
And I think the hardest thing is convincing people that, as much as you would like it to be otherwise, you should also still be thinking about that. Thank you. I just wanted to do a brief shout out to Jasmine Mulliken, who's at Stanford University Press and was the preservations manager for that digital conference, because she has come up with great protocols around preserving complex digital projects, and they're very elaborate.
So if anyone is interested in this stuff for their own press, she's the person to talk to. Thank you. Yeah, good news. Hi, I'm Matt Canham from Routledge, Taylor and Francis. I work in the open research team. I had more of a comment, really, and it's a shout out for the Research Data Alliance.
They are a group that brings together lots of different stakeholders to work on really complex issues around research data. And they have lots of working and interest groups focusing on different areas. I was just having a quick look through their very new website. And I don't think there's anything really going on with humanities data at the moment. I know there's some social science working groups, and there's working groups on everything.
There's a rice data working group. So I think there should be some more stuff going on in the humanities. But I think there's lots of things going on there that are really interesting. In their most recent meeting they were talking about this idea of a reliquary, which is going to be almost like a digital envelope to capture diaries of data sources in different things.
So when you were talking about how can we pull together lots of different types of data that are all stored in different repositories and different kinds of chapters or books and things. They're working on solutions to do that, and they have publishers in the room to work out how we're going to publish those links, how it's going to be captured, referenced in the metadata. And the thing that they really want to do is work out how that's all going to trickle down.
So everything that's inside that digital envelope will get the credit, the recognition, the kudos, the visibility and all of that sort of thing. So yeah, I think there's definitely room for more of these conversations to happen at RDA, so I'd be happy to help or talk about that if that's useful. Yeah, that was my main point. Yeah, I think there have been cycles of the humanities' interaction with RDA, and then it goes back more towards the hard sciences, but there's always at least a trickle of humanists at those meetings.
And yeah, it's a great shout out. Vincent and Francis. And we're also part of the LOCKSS archive, or CLOCKSS archive. And you mentioned a couple of archives and some of the problems that you're facing, and LOCKSS works on some of those very same problems, with format migration and creating sustainable long-term archiving solutions, working with libraries. So it's kind of just a shout out.
If you haven't heard of LOCKSS, it's a great shout out. I founded the first private LOCKSS network, which is called MetaArchive, when I was at Educopia, and it actually was founded at Emory University. So this goes back to the Emory days, too. It's why we founded Educopia in the first place, to give it a broader home. LOCKSS is, this is going to be my quick soapbox. I will make it very, very quick.
LOCKSS was created in the late '90s. It is still the best preservation solution that I have seen anywhere, and I have studied this stuff for a very long time. It handles all kinds of security issues differently than most other systems do. It is elegant and simple in all kinds of ways. It's open source and, y'all, it's constantly in danger of not surviving because it's at Stanford and it's supported by one university primarily.
And then, yes, CLOCKSS builds on that, and there's some money that comes through. MetaArchive builds on that. There's some money that comes through. But it is one of the many chronically underfunded, underappreciated, underused, under-adopted open standards that are out there that have proven themselves now, I mean, for three decades.
And still it just bubbled back up with some folks in Canada that just contacted me to say, will you talk to us about your experience. Because we're thinking about. And I'm like, well, good. I'm glad that this is making a comeback again. But it really is these core technologies that do already give us some of the functionality that we need instead of building on those, we're innovating and we're creating new things.
And each of the commercial players is building its own proprietary thing. And then each of the nonprofits is either trying to get into one of those proprietary spaces or they're trying to build open and they can't sustain it, and then it falls apart, the universities all back out. I mean, it's this pattern over and over again. We're wasting so much time, money, and effort, and we can't afford to do it anymore.
So we have one minute left. Does anybody else have any questions or comments? Do we have any good news to end on? Any other shout outs? Maybe? Well, thank you so much for joining on this beautiful Friday. And many, many thanks to our speakers, present and not present. There was a lot of support that went into making this panel, but thank you so much.