Name:
New Directions in Data and Scholarly Publishing
Description:
New Directions in Data and Scholarly Publishing
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/a93a84ce-e631-4352-83bd-f299187642fb/thumbnails/a93a84ce-e631-4352-83bd-f299187642fb.png
Duration:
T00H54M08S
Embed URL:
https://stream.cadmore.media/player/a93a84ce-e631-4352-83bd-f299187642fb
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/a93a84ce-e631-4352-83bd-f299187642fb/new_directions_in_datascience_and_scholarly_publishing_2022 .mp4?sv=2019-02-02&sr=c&sig=KLOVWayn5Rndl1GgYywfqmGXrrtZ3yDWbPC0QAihBKQ%3D&st=2025-01-22T10%3A41%3A37Z&se=2025-01-22T12%3A46%3A37Z&sp=r
Upload Date:
2024-04-10T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
Hi, everybody. Welcome to New Directions in Data and Scholarly Publishing, the final session, last but not least, of the New Directions seminar. What we're trying to do today is just kind of spotlight or showcase three interesting data-driven projects, hoping you will get inspired, maybe to launch your own data-driven project. My name is Rebecca MacLeod.
I'm the managing director of Harvard Data Science Review, and I'll be serving as your moderator. So I'm super excited to be participating in the session with our three awesome panelists: Lillian Selznick from PNAS, Dylan Ruediger, and Christina Drummond. And so they each have been asked to discuss how and why their organizations are collecting and using data. So they each will briefly talk about their data projects, and then I will ask a few questions, and then we'll open up to the audience.
But first, I want to give a quick shout out to Alexa Colella. This panel was her brainchild and she unfortunately couldn't join us today. So thank you, Alexa, for bringing us together. And I'm hoping you're zooming in from home, so I'm going to hand it over to Lillian. Thank you, Rebecca.
Actually, my name is Lillian Wang Selznick, not Lillian Grace Selznick, like it is on the slide. But Grace is a nice name too. So yeah, I am an assistant managing editor at the Proceedings of the National Academy of Sciences. And I was a little bit surprised, actually, when Alexa reached out and invited me to join this panel because, as assistant managing editor, I manage a team that manages peer review and also corrections.
And I didn't think of myself as having a particularly data-driven role. But as we went deeper into these conversations, I realized, well, actually, I am participating in a project that spans the departments within PNAS that is very much using data to take a look at our own editorial workflows and sort of inform the ways that we improve and try to optimize them. So that's my angle here today.
So first, a little bit about PNAS. So we are a multidisciplinary science journal published by the National Academy of Sciences. So we publish peer-reviewed research across the biological, physical and social sciences, and we also have a news magazine section. We were established in 1914 as a venue for members to publish their research, but have been accepting submissions from non-NAS members since 1995.
We generally accept only the top articles in each discipline that have a broad appeal to a general scientific audience. We've been green open access since 2004 and, as we saw yesterday, there are a lot of different definitions for what green is. For us, that means that all articles are publicly available after six months, and we also have an immediate option.
So to give you a sense of why we needed to embark on a project to optimize our editorial workflows, this is just a sense of the scope. At PNAS in 2021, we received over 21,000 submissions and we published almost 4,000 articles. We did this with the help of 290 editorial board members, 1280 member editors, 1475 guest editors, and almost 50,000 reviewers.
So there's a lot going on day to day. So all this was facilitated with 30 full-time editorial staff who handle peer review operations and keep the gears running behind the scenes. We do all of this on a highly customized submission site that we've been using since about 2004. So it's been iterated upon for the last basically 20 years and has become very complicated.
So the goal of the project was to establish a baseline for what it is exactly that our staff do in conducting peer review operations: figure out exactly how they do it, how long it takes them to do it and how they feel about it. We found that there was a lot of individualization within workflows, so people were doing what felt best for them. And in management, we didn't always have insight into what exactly people were doing on a day-to-day level.
So that brings us to our editorial workflow optimization project, or EWOP for short. So this consists of generally three phases. The third phase is ongoing, so we don't have a lot of neat results to share with you today. But I just want to share more insights into how we developed the project and sort of things that you might be able to take to your own organizations.
So our first phase was planning, where we did some of the sort of boring infrastructure stuff, so establishing which tools we're going to use. So we made heavy use of project management tools like Confluence and Jira. We also decided which staff would participate. We developed the scope of the plan, generated a project charter and developed a communication plan to keep all of our staff in the loop.
And we found that that was super important to make sure that we were being very transparent. No one was worried about are they doing this project so that they can fire us all and outsource our work, that kind of stuff. That was not the goal, by the way. And we decided on deliverables. So for this project, that was a series of so-called swim lane diagrams for our workflows and recommendations for future implementation.
The actual implementation of these optimizations is going to be a whole other separate big project. So phase two was our discovery phase, and this was really the meat of it. So it was about 20 hours of in-depth walkthroughs of our workflows with subject matter experts that we drew from the staff. And these were really granular, particularly in the beginning when we were establishing what buttons are you pushing and how long does it take you to do these things?
From these discovery interviews, we identified additional data that we need. And the data that we were looking into fell into three broad categories. So there was data that we could pull from our site. There is survey data from staff and also a time study that is actually going on this week and next week. So phase 3 is analysis.
And so we're in the process of consolidating and reflecting on our discovery data and everything that sort of spun out from the discovery meetings. We're compiling the diagrams, and that's helping us visualize all these different handoffs between staff, between departments, between editors and staff, and all these different places where bottlenecks could occur.
And we are using qualitative data from these staff interviews to develop priorities and see what's really important, because we found that we generated a lot of data, but the numbers don't really mean anything without the context of, well, is this something that is a problem or not? And we sort of discover that through the qualitative aspects.
I have some takeaways here, but I think we'll be sort of exploring these more during the discussion portion. So I think I'm going to pass it off to Dylan. Hi, everybody. My name is Dylan Ruediger.
I'm a senior analyst at Ithaka S+R, and I'm here today to kind of shift the lens a little bit away from publishers and towards the researchers who are submitting articles to you, and particularly to talk about researchers' perspectives on data publication. This is something that Ithaka S+R has been working on for quite some time, and my team in particular has been exploring this through a series of projects over the last several years.
Rather than talk about one specific project, I'm going to be kind of trying to synthesize very quickly things we've learned about research communities across disciplines over the course of several different projects. So collectively, these projects are really often rooted in large-scale qualitative analysis. Together, they give us a pretty comprehensive sense of researchers' behaviors and experiences with data sharing and publication, and what that looks like and feels like to researchers.
And as I mentioned, I'm just going to share a couple of high level findings from this long series of initiatives that we've been working on, which are rooted in a very large corpus of long form interviews. We've done about 1,600 of them at this point with researchers from essentially every discipline one can imagine existing and a lot of interdisciplinary fields as well.
We do this work in collaboration primarily with university libraries, who have worked with us in large numbers to help us understand what's going on among researchers and also how libraries and other parts of campus can support the work that's going on on campus. One of the key takeaways is that there are a lot of barriers to sharing data that researchers face, and there's actually still a wide amount of reluctance to share research data on the part of many individuals.
And the cultural norms that are going to support a data sharing ecosystem are really unevenly adopted from discipline to discipline. There are lots of different barriers. They're hard to synthesize, but, in very short order, the four that seem to pop up again and again and again start with academic incentive structures that devalue the work of sharing and publishing data and treat it as a lesser kind of output than the journal article, which retains its kind of status as the most prestigious type of output.
A lack of expertise on the part of researchers. Preparing data for sharing is a very difficult thing. It requires specialized knowledge that many researchers do not have and have uneven levels of access to, and this creates a barrier in its own right. A lot of researchers don't really know how to do this very well. Metadata, ontologies, things that I'm sure have been talked about throughout the seminar this week are very challenging for people who are not trained to do that kind of work.
And while universities are developing an infrastructure to support this, that too is very uneven from discipline to discipline, institution to institution. And researchers have very uneven levels of engaging with those resources or in fact, even knowing that they exist. A third major barrier has to do with time, just like sharing data is hard. It is also a very time consuming process.
And given the fact that it's not necessarily a very prestigious way to invest one's resources, people are very reluctant to engage in the amount of time it takes to prepare data to be published, especially according to FAIR standards and the best practices that have been developed. And finally, there is a lack of funding that's a problem, particularly in some fields and particularly around long-term preservation of research data.
And I see a few phones. I'll leave this up for a sec, even though I turned it on late. Even though there are all these barriers, it's also the case that there are a wide number of communities of researchers who have built cultures of data sharing within communities of researchers. We call these things data communities at Ithaka S+R, and they are clusters of researchers who share overlapping research agendas and a commitment to solving particular shared problems.
And in some cases, in fact, quite often also they have interpersonal relationships with the researchers who are engaged with similar issues over and over again. We found that these are the types of situations in which data sharing is actually most effective, in part because when you share data with someone who's working on a similar problem as you are, it's most likely to end up being reused.
And re-use is a really important factor in not only successful data sharing and open science, but in people's willingness to engage in the work of preparing data for sharing. Often these data communities are organized around domain repositories. The NSF, NIH, and other agencies have invested quite a bit of money into developing these domain repositories.
Over time, those often end up serving as kind of a hub where communities of researchers who are interested in particular types of problems can come together and work together. And a good example of that would be something like FlyBase, which compiles information on fruit flies and finds researchers who are working on fruit flies, which is, as some of you probably know, a very popular thing to study, believe it or not.
I want to just briefly mention the Nelson memo, which I'm sure has been talked about quite a bit, both at this meeting and at the ISO workshop earlier this week, and I'm sure in boardrooms and conversations of most people in this room, and talk a little bit about the opportunities and implications this has for researchers and for publishers.
OSTP issued new guidelines that will direct federally funded research to be open across agencies. And among the changes are the journal article changes, the immediate open publication for journal articles that have been talked about quite a bit, but also the immediate publication of data and the implication that soon those requirements will be extended not only to data that's associated with peer-reviewed publications, but to all research data that is generated via federally funded projects.
So a very large potential amount of data may soon be subject to data publication standards. Right now, the federal government encourages researchers to use those domain repositories wherever they exist. And this has been really important because it makes sure that data gets into the communities that are most likely to re-use this data.
And overall, the OSTP memo signifies that this approach of preferring domain repositories is likely to continue to be the status quo. But at the same time, it also introduces potential new market dynamics that may change the actual practices of researchers in ways that potentially benefit publishers and may also, I think, blunt the impact of the memo and its goals. The new guidelines suggest that researchers will be allowed to use grant funds to pay for services involved in preparing data for publication.
Also, because the data deposit and the date of publication have to occur simultaneously, they create a situation in which publishers become a fairly obvious kind of gateway to publishing the journal article and the data at the same time in the same place, as an easy workflow issue for researchers. It also suggests that publishers may be well positioned to monitor compliance, which is perhaps another reason why some of the dynamics are kind of suggesting that publishers may be encouraged to play an even more active role in data deposit going forward.
And we can talk about this more in the Q&A. But I just want to very briefly say that one of the things that I think is really important, as we look towards this potential world where the workflows of data and publication are more combined into a more centralized series of vendors, is to make sure that in the process we don't lose sight of the data communities who are actually reusing the data, and that whatever publishers are doing to try to help a new kind of data ecosystem emerge takes into account the power that comes from also creating communities around data deposits, rather than just treating data deposit as kind of a repository where things go to be a check in the box for compliance.
I will leave it there for now and pass things over to Christina. Thank you. Hey, everyone. Well, it's fun being the last speaker, so let me take a moment of privilege while we transition here and just ask a question. This is a raise-your-hand.
I want you all to be active, not falling asleep. Raise your hand if you think we can create economies of scale across our organizations through shared cyberinfrastructure. Give me any cyberinfrastructure. I'm not talking about any specific one. So, not unanimous, which I think is interesting. So I am here. I'm Christina Drummond.
I'm the executive director for the Open Access Book Usage Data Trust, which is an effort that's looking to shift from the collection to the exchange of usage data between parties, both open and, I would say, sensitive or proprietary data. And we're trying to do this in a community-governed way because we want to improve quality and interoperability of that data. For us, it really focuses on how we shift from a collection perspective, where we hoover up whatever we can find that's out and open, to bringing those open data components together with data that is provided for a specific purpose, with ethical guardrails, into this data exchange or data marketplace model.
And we want to do that in a way that is trusted. We heard a lot in the last panel about how important trust is, not only trust in the data, but trust in the governance and trust in the systems, as well as making sure that those systems are equitable for all involved. And that's where these principles come in. I'll note throughout this, there are lots of ways to keep us focused. I think one of the things that really struck me when we started our panel is we were talking earlier about how a lot of this comes back to workflows.
Well, for us, we're looking at how you can foster the global exchange of open access usage data, because, you know, if you want to tell the story of, we'll say, an open access book, that book lives across the internet on publishers, aggregators and digital libraries all over. And if an author wants to tell the story of their impact, they need all of that data. And those authors, their libraries, their institutions right now have to pull all of that data together, every single one of them.
So our effort has been very thankfully funded by the Mellon Foundation since 2015 to pull together stakeholders in this space, to understand what the opportunities are and what the challenges are. We started by hosting a workshop, and you'll see here Exploring Open Access Ebook Usage, a report that came out of that workshop that really identifies some of the major themes that we're looking to address.
And that set the stage for our last grant that ran from 2020 to 2022, in which we really needed to get to the specifics. So it's enough to know that we have opportunities to build economies of scale. But what does that mean in terms of specific use cases? How are publishers using this usage data? What's the difference between the commercial publishers and the university presses and how they use this data?
What about the libraries? What about the scholars? So we knew we needed to document that, but we also needed to understand the data ecosystem. Where is this data coming from? What standards are engaged, what parties are engaged, and who's trying to pull it all together? We also wanted to understand some of the technologies that could be used and applied for this from an open source perspective and see if we could actually then use the experience of putting together dashboards, where we aggregate open access book usage data together, to learn more about what the challenges are in doing so.
And then, of course, because we want to be a global effort, we had to start talking about, well, what does community governance mean when you want to be trusted, when you want to be equitable? OK, so I always like putting our reports here, because that's like 70 pages and you don't want me to talk you through 70 pages. So we actually, over a year, did focus groups, communities of practice, and interviews to understand what the specific use cases are.
And this is incredibly small. I don't know how many of you can read this. But what we found out is there were use cases not only in publishers and libraries, but also the platforms and services that support both, around not just reporting. We all know that there is reporting. We have to tell our authors, you know, they want the report to give to their funder. Here's what my book and what my scholarship is doing.
Here's who it reaches. Here's what the impact is. We know the funding agencies want that, too. But what was really informative about this process is that each of these organizations wants to take that usage data and build it into their own internal operations. They want to use it for strategy, they want to use it to inform their operations.
To do that, it has to be high quality. It has to be more timely than once a year. And so this really started us thinking about what that means for us. Again, moving from collecting to exchanging. It also surfaced for us how every single one of these players is going through that step of usage data management and curation. They need the expertise to be able to interpret this open access usage data, because even though we have standards like COUNTER, they don't necessarily always get applied, especially in lower-resourced institutions that are like, hey, we may have only the capacity to give you Google Analytics.
That's what we have the time for in all of our pulling together of those reports. I noted how we looked at the data ecosystem, which is really, really messy. And so, I'm a visual person. I love this. I know not everyone does. But what I'll draw your attention to is when we come back to book usage reporting, you'll see where the lines come together.
Those solid lines are usage reports. And every library, every publisher, and the systems that support them that have to work with usage have staff in their organization who have to go through this process. So again, I ask you, can we do better? Can we find economies of scale? Surprise, we think we can. I know.
Not a surprise. We actually were just awarded another award from the Mellon Foundation to look at how we can build the governance building blocks to support this data trust value proposition. Because at the end of the day, we recognize we need a neutral third party in the middle to foster that exchange of information, not to provide the analytics.
We have lots of folks who are providing analytics, building dashboards, using APIs to pull this data in, but how do we facilitate that data management? And, I would argue, also the legal data sharing agreements and processes that sit on top of all this. We recognize there isn't only an opportunity to facilitate the data curation and aggregation step, but if we have community support and we have an exchange that's neutral and trusted in the middle, does that open up the opportunity for us to then start benchmarking across not only open data, but many of these controlled proprietary sets of usage data as well?
I actually have a background in data policy and have worked with a lot of lawyers over my career. And so I know at the end of the day, this all boils down to those legal agreements. What does it mean when you share data? Well, if I'm going to share it with you and transfer it to you, of course I'm going to have a data transfer agreement that puts guardrails in place.
If I'm using data from somewhere, there are going to be terms and conditions in place about how I can use that data. And when we're talking about open access usage, and with the Nelson memo, a lot of us will be having this conversation even more. These come into play when we have that point of data exchange. And I would argue, on the same point, right now this is manual.
This is, I think, a great opportunity for us to look to what does this mean to automate these processes, to make them machine readable and to help build in that contracting into the process? The problem is you need to have community agreement. We have to all understand what these terms of participation are. So for our project over the next year, we're going to be creating a data rulebook.
What does it mean when you have a book usage data and you want to participate either as a creator of this data who wants to contribute it to the data trust or someone who wants to participate the data trust to pull that data out by API reports. So we need to create those standards together. This can't be a top down thing. This has to be a community led thing.
So with that, we are looking over the next few years to actually formalize our governance. Right now, we are a research project. I'm based at the University of North Texas as our fiscal host, but we have an international board of trustees. We have international governance project advisory board for this project, and we're working in partnership with Oprah and open air in Europe. And I'll note one of the things for us when we come back to neutrality and trust is we've been very heavily influenced by some of the data regulations emerging in Europe, specifically the data governance act, for those of you who don't know specifically talks about how these third party kind of intermediaries that are helping to foster data exchange need to be neutral in order to be trusted.
I cannot be putting a report out that ranks my data providers because those on the bottom of the list aren't going to want to participate. And so because of that regulatory framework that we know is going to impact a large amount of our participants or publishers or libraries, this is something that for us brought us to focus on the data exchange aspect and how we can foster that through trust and community governance.
So we have lots more to come. You can find us on Twitter and I'll note, if anyone's interested in participating in the technical pilot, please reach out. So my first question would direct it to Lillian on. I noted on your slides that you had feelings or data and have the importance of weaving in the qualitative data to understand the quantitative data.
So can you talk about bit more about that? Yeah so I think speaking as a publisher whose bread and butter has been peer review for the last several years. We're not always accustomed to thinking of ourselves as employees, as generating data. We think of data as being more like what Dylan and Christine are talking about, sort of this capital data that's all in Excel sheets or being generated by researchers.
And so I think when we came together to form our wap project, there was some question about like, well, where do we begin? And what has something that sort of emerged as a theme of what's guiding us into which areas to prioritize has been seen, how, how our staff feel about the work that they're doing.
And so this is actually something that a previous boss, Lois Jensen at apa, taught me. She said, pay attention to what annoys you about what you're doing every day. And, and when you identify something that makes you really mad, like make it better. And so I think of irritation, annoyance as being a, an amazing flag for importance, for something that, that screws up your day to day.
And that is sort of where we're concentrating our efforts. It's, I think something that is a little unexpected is that it's also become we're treating our feelings as data that informs the data that we collect. But then it's also sort of a two way street where there are some cases where the gut feelings are not correct. And I think that's something Paige talked about yesterday, you know, using the data to figure out which gut feelings are wrong.
And so there are some I think one interesting example of that is there's one element of our editorial workflow assignments that is divided amongst, I think, about 12 different editorial staff and individually each. None of them complained about it because they say, oh, you know, it takes me 15 minutes a day. It's fine. But once we looked at it in aggregate, we realized this is something that adds up to a full time workload for one employee.
So it's just because it's divided amongst the team. No one's complaining about it. And so it's sort of being able to shift perspective. Take a look at like walking in our employee shoes and saying like what is affecting them versus being able to Zoom out and say what can't they see at the ground level versus what? What can't we see because we're not doing their day to day work.
I also was thinking when you talk about feelings or when you launch into a data driven project, you know, the danger sometimes is if you have an agenda, right, you have the feelings and you want to use the data to prove your point. But you're hoping going in with neutral, you know, what am I going to discover? Because we know the data can be used to illuminate it. Also, it can be used in nefarious ways.
So I of you to want to elaborate on anything in terms of a larger scale, in terms of qualitative versus quantitative data and how the importance of them being enmeshed in these projects. As a qualitative researcher, I have a little bit of skin in this game, I guess, but I think Lillian really put it very nicely, like when you're trying to understand workflows, there's been a kind of revolution in the way that we can use analytics and other kinds of metrics to understand how work happens, where choke points are and whatnot.
And those are obviously really important and essential tools. But it's often the case that. They are not very useful for understanding people's experiences and the kind of cultural norms that make that are also part of systems and how they work. And we've seen this a lot in the way that we've tried to understand data sharing efforts, and this is part of why we looked at these data communities.
We wanted to understand the kind of positive experience of people who were sharing data, not because they had to, but because they wanted to. And to understand what it was that made people want to engage in data sharing under the hunch. That, I think has been largely true up to this point that building a technical infrastructure for something like data sharing is obviously a necessary part of achieving open science.
But if you don't build a cultural infrastructure to go along with it, even with mandates, you're going to find the impact of that to be muted and to be not as widespread as one would hope. So I think this connection between qualitative understanding of experience and workflow issues is a really important one. If I can just kind of build on that, I absolutely think need both.
But that said, I think a lot of the technology is already there. We're not innovating technology. The trick is figuring out how to take that technology and build a system of governance that instills trust. On top of that, recognizing the diversity of all of the players at the table, because those what trust means to me is going to be different than what trust means to each of you.
And so when we're looking at these systems, thinking about how do we put governance in place, not only at an organizational and community level to instill trust, but at the data layer. And this is where I think the importance of data governance comes in, because we really need both of those layers in place and trusted for this to work. It's not a technology problem. Controlling my data governance.
We had a conversation at the ISO forum on Tuesday, and Christina made a good point about most organizations, large or small, have a marketing person, has an it person or whatever. But what about a data, a data person, a data steward? Who is that that's overseeing each organization's governing that? When who did your project? A bunch of people would have.
What if you had a data expert who oversaw your project? So can you talk more about that, the importance of maybe it sounds like, especially now with the Nelson memo, publishers are going to need some data. In house experts, the people who may be data scientists, but they understand publishing at the same time. Is that something that we're it seems like people, organizations will need? Can you talk about that more?
Christina, we talked about this the other day. And I'm happy to start with that. I think it's at multiple levels. So obviously we need domain experts who understand the specifics of a usage data or access usage data, who can navigate those. And it varies discipline by discipline. But I do think there is even a higher level, in the same way as chief privacy officers at each of our institutions, who are looking at, are we in compliance with GDPR?
We have a growing space for not only embedded data stewards, but someone to play that chief data officer role for an organization who's looking across everything, not just are we in compliance, but can we be more effective across all of our units? Can we be more innovative if we partner? And what does that look like? And I think, again, it's very different than the technological layer at that point.
It really is a data governance layer. I'll take a crack at the second part of your question about the Nelson memo. The memo opens up an opportunity for services to assist researchers in preparing data for publication. And I suspect that we'll see a lot of different kinds of entities trying to develop services that might fit into that.
Those could be libraries, which could potentially be building services based on taking control of a little bit of grant funds to do work for researchers. Repositories may be able to develop these things, both domain repositories and also larger, either commercial or non-profit repositories, Dryad and whatnot, and also publishers. And because a lot of the barriers around data sharing have to do with money and time and the effort that goes into preparing data, the expertise that you need, it seems possible, and much is up in the air with the Nelson memo, obviously, but it seems possible that researchers will prefer to outsource some of this if the opportunity arises, and they now have the capacity to use grant funds to do so.
I think if publishers are interested in kind of getting into that space, and as I already said in my earlier remarks, I think it's logical to think that some will, the things that I would be trying to keep track of are talking with researchers and also with information professionals at libraries who have this kind of expertise already, and figuring out where the common challenges and pain points are, and really getting a good understanding of the work that particularly librarians are already doing in this space to assist researchers, and also to really keep in mind the importance of making sure the data gets found by the people who are going to use it.
And, you know, I've been thinking about this a little bit, and my first inclination was that, like, journals might be a kind of umbrella to kind of gather together a community. And I think in some contexts, that may very well be true. But most of the data communities that we've seen thus far, they're interdisciplinary. They are connected by things that aren't going to necessarily be neatly bundled into the way that journals are often bundled, both because journals are either more specialized or overly broad in general.
And I think that, to the extent that the goal here is to make an infrastructure in which data is not only deposited but reused, so that the potential of data actually is manifested, it's going to be important to really pay attention to building systems and services that also allow for communities to form around them so that people can actually reuse this stuff. Yeah, coming from the publisher side, I do think that it's going to become increasingly important for publishers to have people with specific training in data governance, to not just make sure that authors and the publisher are in compliance with federal guidelines, but to go beyond that and actually engage with researchers, to engage with the data communities, to not just do the bare minimum and sort of enforce a one-size-fits-all data sharing policy, because I think that's what ends up happening when, you know, as much as I think on-the-job learning and training is important, and sort of it's been the backbone of my career so far.
There are some things that you can't just pick up from going to a few webinars. So I think, you know, actually engaging with people who have that expertise, who have that training, who have come from research communities is going to be critical moving forward. I could just build on that. I think we are.
We're following in the footsteps of other industries that have been working on this. I think of fintech, financial technology systems, the health sector. The COVID Cohort Collaborative, for those who don't know, I think is an excellent illustration of how they're like, OK, we know we need to be able to share clinical trial data across and during the pandemic as fast as possible, but we can't just make it open, and I think that is something that we are quickly approaching.
We were in this era of open data commons as best, and I think we are appreciating some of the ethical challenges of that approach and what that could potentially mean in terms of threats to privacy. And I think that means we're shifting into this new era where it's not just open, but it's as open as possible, but as controlled as necessary for us to be able to get everyone to the table exchanging that data for the public good.
Thank you. So we're going to open up to the audience. Any questions? Jackie, do you have any questions on the chat? We do. What are the most important things that research data is used for? That's a really difficult question because research data is such a diverse category of things.
But I would point just to one example of the development of COVID vaccines as a kind of collective process in which data was being shared and reused quite widely. And the infrastructure that we've built up to enable the exchange of information really worked to speed up that process. And I think I could give essentially an infinite number of responses to what research data is used for, but that would probably be a really good recent example that demonstrates high impact use of data in a way that really relies on leveraging the entire infrastructure and a very profound cultural incentive that also gave people a strong motive to use that infrastructure.
If I may riff on that for a second. I think one of the fun thought exercises that we often encounter in privacy circles is, well, what negative ways could this data be used to harm society or to have a disparate impact? And I think that's a question we don't always ask ourselves. We think about all the benefits, but we don't think of potential challenges. So when I hear things like, hey, let's give every scholar in the world a unique identifier and we can pull up all of their work they've ever done, even from when they were a student all the way through their career.
I think about what does that mean for scholarly freedom? And so I think we're rapidly approaching a place where we have to start thinking about those potential negatives and what ethical guardrails need to go into place because of those. Hi is this on? Yes, it's on. Is it on?
So in thinking about the value of data points and of sharing those data points and how you can have some insights when we can combine our data and that being a good thing, but knowing that that is a very difficult thing. Do you see an opportunity for the usage of usage sharing frameworks similar to the distributed usage logging framework in helping to say like, OK, everyone, put your data in, organize it in this way so that we can have shared insights on this data and how could we maybe move that needle forward?
We talked about the distributed usage logging framework earlier in this meeting as well as on Tuesday at the ISO meeting and how it has this potential. But it's hard to sort of get that going. So I would love to hear your thoughts there. I think that was probably directed towards me, and I would say, yeah, so that project has actually come up quite a bit. And I think the role that the data trust will likely play is not only to develop that community framework here over the next year and then to document it in a rule book, but to also steward that framework.
And I think that's something that's really important here, is we think about we talk about community governance and trust. The framework is only as good as its adoption and its stewardship over time. And so I think that's why you need an organization and a community really to play that function. Any suggestions? Well, I would say, you know, look at building data trusts, but first, like, you know, and that's really the approach we're taking.
But I'll note we're really leading that work. There are a handful of international data spaces, which is actually the model we're learning from, that are growing in Europe to foster this kind of stewarded data exchange for a given industry, for transportation, for logistics, and to make sure that those exchanges interoperate with each other. And so I think as scholarly communication, scholarly publishing, we can learn from that work that's come before. We're specifically focused on open access book usage because it's a manageable use case.
But if you were to ask Christina personally, like, what do you think? I think this really comes back to national cyberinfrastructure and how do our systems interconnect with what's going on in Europe? I don't necessarily think we can have a global technical infrastructure to have them all, but I hope we can have interoperability with a global framework that is stewarded by all.
OK, I have a question. Sarah, there's no one online. Hi, everyone. Sarah from PLOS. So I feel like we're talking about two different sets of data: data that's generated from research, and data that's used to evaluate how things are used and if they're being used effectively.
And there's a lot of overlap. But those are kind of what I'm hearing. I was wondering, on the latter case, if you can speak to how organizations should be thinking about data collection when there's a lot of, like, oh, that would be cool to know. That would be nice to know. Like, that's neat. And then you end up potentially building and collecting huge amounts of information that, like, that's cool.
But so what? Like, what kind of guidance would you give to organizations that are thinking about engaging in this data collection in a thoughtful way that gets to an outcome you want, where you don't realize in the meantime, we just had things we were curious about, but none of this is actionable, we just were curious. And it's sort of a muddled question, but I see heads nodding, so maybe you know what I'm getting at.
I think this is for you, really. I think it's certainly applicable to my experience with our Ewok project. Just being organized in asking the questions that you want to ask is super critical. I think setting up the internal infrastructure for, once you ask these questions and someone gives you a report, where is it saved?
You know, don't just have Excel sheets flying around in email; that's a nightmare. And so I think building out a shared space in, you know, in the cloud where all of it is saved and organized and all of the file names make sense is important. It's the difference between just asking this question once every five years because it seems like, oh, this is something we should know.
And then just forgetting that you ever asked it, versus actually maybe taking action on the information. We need to wrap up soon, but do you want to add? Oh, I was just going to add really quick. Once you have that question you're trying to answer, I think you also have to double check, like, does context matter and does timeliness matter? Because your answer to those two questions is going to impact what kind of data you need.
We have time for one more question. Well, to Sarah's question, I kind of take the frame of view that the data is there to drive decision making. And so you start thinking of what your goals are in whatever project you want to do and then what data you need. So, I mean, it's a continuous, continuous process. So my question for Lillian, with your project in particular, which I found really interesting, did you start with, like, specific goals in mind, or was it more like collect different data and see where it goes?
And second part of the question is, you mentioned swim lanes. Did you use any other kind of Six Sigma type process improvement? I find those really helpful in tackling these types of projects. So I guess our goal. Well, so I wasn't actually at PNAS when the project began. It was one of the things that drew me to PNAS, hearing about this project in my interviews and realizing, oh, this is, you know, this is a journal that cares about being efficient and not just sort of resting on its long history and laurels.
But my understanding of how the project began was that it began with the realization that things are kind of a mess. And, you know, at a 100-year-old journal, things have progressed through sort of ad hoc changes over the years. And so really the goal was to establish a baseline of where we are now and then to look from there.
How can we improve? Yeah, the swim lane diagrams. So we had the help of our digital project manager, who I think is maybe more into the sort of Six Sigma stuff. So we benefited from some of those principles without.
Maybe, I guess I didn't realize that came from that sort of business philosophy world. But it's been certainly very helpful to have someone who's more sort of project management oriented guiding these things that I think in the past are generally just done through intuition and, you know, saying, well, in my experience, the peer review or turnaround time should be 14 days or it should be 10 days, actually.
And those things that came from the gut are now coming from data. OK. Thank you. Thank you all. Excellent. I have to follow up to find out your phase three, and then how the governance is going and everything. And then after the Nelson memo is up for a bit.
So you do a part two. Great. Thank you. Thank you. Thank you so much to our wonderful speakers and panelists. This concludes the 2022 SSP New Directions Seminar.