Name:
OA Usage Reporting - Understanding stakeholder needs and advancing trust through shared infrastructure Recording
Description:
OA Usage Reporting - Understanding stakeholder needs and advancing trust through shared infrastructure Recording
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/cc5542eb-5be6-4a6c-b532-8bbb2b9c42fc/videoscrubberimages/Scrubber_3.jpg
Duration:
T00H42M10S
Embed URL:
https://stream.cadmore.media/player/cc5542eb-5be6-4a6c-b532-8bbb2b9c42fc
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/cc5542eb-5be6-4a6c-b532-8bbb2b9c42fc/OA Usage Reporting.mp4?sv=2019-02-02&sr=c&sig=NslEBXd2O6xuDIQXO3HavComOHwutrxet9ziFFtKjFo%3D&st=2024-11-23T22%3A20%3A01Z&se=2024-11-24T00%3A25%3A01Z&sp=r
Upload Date:
2024-03-06T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
JENNIFER D'URSO: Hello! Thank you for joining us today.
JENNIFER D'URSO: My name is Jennifer D'Urso, and I'm the production manager at Project MUSE. I'll be moderating the panel on usage reporting: understanding stakeholder needs and advancing trust through shared infrastructure. Our panel will begin with four brief recorded presentations from speakers who all work in the trenches of OA usage reporting and who each approach the challenges and opportunities presented by usage data from slightly different but intersecting angles.
JENNIFER D'URSO: After the recorded presentations, we hope that you will join us on Zoom for a live discussion of the topic. So without further ado, allow me to introduce you to our panelists in the order in which they will speak. First, we will hear from Tim Lloyd. Tim is the founder and CEO of LibLynx, a business that helps the knowledge industry to manage identity and access to online resources, and to better understand the usage impact of those resources. Tim is a member of COUNTER's Open Access Unpaywalled subgroup.
JENNIFER D'URSO: He's also a member of SeamlessAccess.org's Governance committee and co-chair of its Outreach committee. His career spans several decades in a variety of product development and operational roles in online publishing, with a particular focus on developing innovative products and services to support online learning and research. The second speaker will be Tricia Miller.
JENNIFER D'URSO: Tricia is the marketing manager for sales partnerships and initiatives at the nonprofit publisher Annual Reviews. She has worked in scholarly communications for more than a decade to support marketing and sales efforts for society and nonprofit publishers. With a background in marketing, environmental horticulture, socio-cultural communication, and technical and professional communication, her work emphasizes improving the user experience while supporting systems and best practices for accessing and understanding scholarly information.
JENNIFER D'URSO: She is driven by supporting equity in academia and larger DEI opportunities within scholarly publishing. She acknowledges that the land on which she lives in Sun Prairie, Wisconsin, is the unceded ancestral land of the Ho-Chunk Nation. Our third speaker will be Christina Drummond. As the executive director for the OA Book Usage Data Trust effort, Christina works to improve the quality, completeness and timeliness of OA impact data while reducing reporting costs through better global usage data exchange, aggregation and governance.
JENNIFER D'URSO: In addition to being certified in information privacy, data stewardship and design thinking, Christina holds an MA in International Science and Technology Policy from the George Washington University, a certificate in International Business from the University of Washington and a BS in the Social Sciences from The Ohio State University. She has served as co-chair for the Professionalising Data Stewardship Interest Group of the Research Data Alliance and as an appointee on the Mid-Ohio Regional Planning Commission's Regional Data Advisory Committee.
JENNIFER D'URSO: Christina also led the community consultation strategy for the It Takes a Village (ITAV) - Developing Practical Tools for Open Source Software effort. Our fourth and final speaker is Jennifer Kemp. Jennifer is head of partnerships at Crossref, where she works with members, service providers and metadata users to improve metadata, community participation and discoverability.
JENNIFER D'URSO: Prior to Crossref, she most recently worked for Springer Nature, where she was the senior manager of policy and external relations, North America. Her experience in scholarly publishing began with her work as a publication manager at HighWire Press, where she had a variety of clients publishing in a wide range of disciplines. Jennifer's perspective on the industry remains influenced by her years as a librarian, and she is an active member of community initiatives.
JENNIFER D'URSO: At Crossref, she facilitates the books interest group, the funder advisory group and the metadata user working group. She also serves on the next generation library publishing advisory board, the library publishing coalition preservation task force and the open access e-book usage board of trustees. So with that, I'm happy to turn the floor over to Tim Lloyd.
Segment:1 Tim Lloyd.
TIM LLOYD: Thank you very much. Hi everyone. Good morning, good afternoon, or good evening, wherever you are. I'm kicking off this session by exploring how the changing environment for scholarly publishing is driving complexity in open access usage reporting.
TIM LLOYD: I'll touch on a number of issues that Tricia, Christina and Jennifer will explore in more detail from their various perspectives as an OA publisher, an emerging usage data trust, and research infrastructure. Next, please. So a model that I find helpful for thinking about usage reporting is to break the process down into three components.
TIM LLOYD: There's data capture, where we capture raw event data from the publishing and distribution platforms that are delivering open access content to users. There's the processing of that raw event data, which can include custom logic for enriching the data. And then the delivery of the processed output in various formats, such as aggregate reports and processed event data. Next. So, I'm going to start with the high level drivers that impact usage reporting across the board.
TIM LLOYD: And my first one is Scale. Usage of publicly available content is at least an order of magnitude greater than paywalled content. Most industry applications for usage reporting were developed for paywalled content around a traditional model of month-end batch processing of COUNTER reports. These reporting architectures bake in the assumption that processing can wait until all the relevant data has been assembled, and that reprocessing is a fairly rare occurrence.
TIM LLOYD: In my experience, these systems struggle to transition to an environment where reporting requirements are more frequent, perhaps on demand, and where reprocessing is more common because data isn't perfect and some data may only become available at a later date, for example. If future reprocessing is a consideration, then you also need systems to keep the significant volume of raw event data available, at least for a period.
TIM LLOYD: Next is Granularity. Stakeholders are increasingly interested in understanding usage at more granular levels: the chapter of a book, an article within a journal, an audio or video segment, et cetera. This correlates to the Item in COUNTER reporting. And it's no coincidence that COUNTER's proposed 5.1 update to their code of practice makes the item the default level of reporting, versus the title, which it is currently.
TIM LLOYD: Item level reporting obviously significantly increases the volume and detail of data flowing through the system. Next. Usage of open access content is also attracting interest from a broader set of stakeholders outside the traditional library audience for COUNTER reports. Those managing institutional research funds want to understand the impact of their funding. Those managing and negotiating publisher relationships want to understand how their organization is generating new open access content as well as consuming it.
TIM LLOYD: Authors have increasing choice over where to publish, and usage reporting informs their understanding of these choices. And all this is in addition to editorial staff wanting to understand usage trends as input into their editorial strategies. These emerging stakeholders drive new use cases and add additional complexity to the process. And last, but not least, is data privacy.
TIM LLOYD: COUNTER's code of practice includes a statement on data confidentiality that is based on the current ICOLC guidelines. That's the International Coalition of Library Consortia. And this statement prohibits the release of any information about identifiable users, institutions or consortia without their permission. As usage reporting grows in scale and granularity, and data is made available to a wider range of stakeholders,
TIM LLOYD: we need to be thoughtful about ensuring privacy is maintained. Next, please. Let's go down a level of detail, and I'm going to start with data capture. So, one major change that we'll see over the next few years is an increasing number of publishers syndicating their content so that usage occurs on multiple platforms, rather than just on the content owner's platform.
TIM LLOYD: While this model is already familiar to those in book distribution, journal publishers are experimenting with distributing content via platforms like ResearchGate. As an example, just this week there was a press release from De Gruyter about a content partnership with ResearchGate. This requires the usage from these third-party platforms to be integrated into a publisher's own usage reports in order to provide a comprehensive understanding of usage, and it adds more layers of complexity.
TIM LLOYD: A great example of this is the next point: Diverse formats and metadata. As raw usage data is sourced from a wider range of platforms, a more diverse range of inputs is to be expected at the format level. Files can be tabular, like comma-delimited CSVs, or structured, like JSON. In some cases, usage data for a single platform has to be split across multiple files due to the peculiarities of the database exporting the raw events.
TIM LLOYD: At the metadata level, examples include the use of free-text fields versus standard identifiers, and varied conventions for timestamping. As a community, we've yet to coalesce around a fixed set of standards for usage data, which is to say that plenty of standards exist, but we aren't consistently selecting and using them, at least in the open access arena.
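To illustrate the kind of normalization this implies, here is a minimal Python sketch. The field names and sample exports are invented for the example and don't reflect any particular platform's schema.

```python
# A sketch of normalizing usage events that arrive in different formats
# (comma-delimited CSV vs. JSON) into one common structure.
# All field names here are illustrative assumptions, not real platform exports.
import csv
import io
import json

def events_from_csv(text):
    return [
        {"item_id": row["doi"], "timestamp": row["date"], "ip": row["ip"]}
        for row in csv.DictReader(io.StringIO(text))
    ]

def events_from_json(text):
    return [
        {"item_id": e["identifier"], "timestamp": e["accessed_at"], "ip": e["client"]}
        for e in json.loads(text)
    ]

csv_export = "doi,date,ip\n10.1234/abc,2024-03-01T12:00:00Z,192.0.2.10\n"
json_export = '[{"identifier": "10.1234/xyz", "accessed_at": "2024-03-01T13:00:00Z", "client": "198.51.100.7"}]'

events = events_from_csv(csv_export) + events_from_json(json_export)
print(events)  # two events in one shared schema, regardless of source format
```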
TIM LLOYD: As a result, we're trying to build aggregated reports from diverse data sets and, as is often the case, this ends up forcing us down to the lowest common denominator rather than being treated as an opportunity for best practice. Next, please. So let's look at data processing. So, the new use cases I referred to earlier will likely necessitate additional metadata to drive new reporting.
TIM LLOYD: For example, content topics to enable analysis against research priorities. Or identifiers to enable reporting to be filtered for particular funders or authors. Anyone who has tried to disambiguate author names across multiple systems will be familiar with the huge challenges that can lie there. And similarly, we're seeing a need for new processing logic. In the paywalled world, platforms know the identity of the organization accessing content, which is how we're able to generate COUNTER reports for librarians. In an open access world, we typically don't know anything about the user
TIM LLOYD: and so new processing logic is needed to affiliate usage with an organization, such as matching IP addresses to registered organizational ranges. Another example would be using third-party databases to look up funder IDs for a journal article.
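As an illustration of the affiliation logic described here, the following is a minimal sketch, assuming a small, made-up registry of organizational IP ranges; real implementations typically rely on much larger registries of institutional ranges.

```python
# A minimal sketch of affiliating otherwise-anonymous open access usage with an
# organization by matching the request IP against registered IP ranges.
# The ranges and organization names below are invented for illustration.
import ipaddress

REGISTERED_RANGES = {
    "Example University": [ipaddress.ip_network("192.0.2.0/24")],
    "Example Research Institute": [ipaddress.ip_network("198.51.100.0/25")],
}

def affiliate(ip_string):
    """Return the organization whose registered range contains the IP, if any."""
    ip = ipaddress.ip_address(ip_string)
    for org, networks in REGISTERED_RANGES.items():
        if any(ip in net for net in networks):
            return org
    return None  # unattributed usage

print(affiliate("192.0.2.10"))    # "Example University"
print(affiliate("203.0.113.99"))  # None, i.e. unattributed
```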
TIM LLOYD: Next, please. And finally, what about delivery? These new use cases also spawn a need for new reporting formats. We need to support both machine-driven reporting needs - think bulk exports, APIs - as well as ones designed for human consumption - web pages, spreadsheets, PDFs. We'll see more reporting using graphical formats that make it easy to consume at a glance, as well as the traditional tabular reporting. And we're seeing a demand for a greater frequency of reporting. COUNTER reports capture a month of usage, but increasingly usage data is flowing in real time and driving on-demand reporting that can cover custom date ranges and power reporting applications that are also working in real time.
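The shift from fixed month-end batches to on-demand reporting can be pictured with a small sketch like the one below; the event structure and date range are illustrative assumptions.

```python
# A sketch of on-demand, item-level reporting over a custom date range,
# as opposed to a fixed month-end batch. Event fields are illustrative.
from collections import Counter
from datetime import datetime, timezone

def usage_report(events, start, end):
    """Count item-level usage for events whose timestamp falls in [start, end)."""
    return dict(Counter(
        e["item_id"]
        for e in events
        if start <= datetime.fromisoformat(e["timestamp"].replace("Z", "+00:00")) < end
    ))

events = [
    {"item_id": "10.1234/abc", "timestamp": "2024-03-01T12:00:00Z"},
    {"item_id": "10.1234/abc", "timestamp": "2024-03-15T09:30:00Z"},
    {"item_id": "10.1234/xyz", "timestamp": "2024-04-02T08:00:00Z"},
]

start = datetime(2024, 3, 1, tzinfo=timezone.utc)
end = datetime(2024, 4, 1, tzinfo=timezone.utc)
print(usage_report(events, start, end))  # {'10.1234/abc': 2}
```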
TIM LLOYD: Next please. So this future has important implications for the systems and workflows needed to support open access usage reporting. They need to be exponentially scalable and support granular, real time or near real time reporting. They need to be able to flexibly cope with a variety of input and output formats, and swap in custom processing logic for different use cases.
TIM LLOYD: They obviously need to be standards-based, which should help reduce the variety we have to cope with. But standards are a journey, not a destination, so we need to be practical about supporting what's possible now while we transition and get community buy-in. And they need to be reliable and auditable, so our community can understand how they're created and rely on them for decision making.
TIM LLOYD: In short, this is a big change. It won't happen overnight, and it doesn't need to. But this is the future we need to prepare for. And, with that, I will pass to Tricia.
Segment:2 Tricia Miller.
TRICIA MILLER: Thanks, Tim. Hello, I am Tricia Miller. I'm the marketing manager for sales, partnerships, and initiatives at Annual Reviews, and today I'll present on open access usage and what publishers are looking for.
TRICIA MILLER: So with the increase of open access usage, the data describing who is using scholarly resources, and where, is changing and will continue to do so. These changes, however, may not align with the original standards describing the value and use by institutions. The audiences, contexts, and purposes for use of scientific literature are growing as more open access content is published, and therefore the measurement and significance of open access usage data have to be reimagined.
TRICIA MILLER: This puts publishers in the ambiguous position of balancing paywalled and open access usage data to meet the needs of our users, other stakeholders, and ourselves. So, to start responding to the changes in usage reporting, publishers must have access to, and be able to report upon, how open access impacts our entire scholarly communications community. Next slide.
TRICIA MILLER: For Annual Reviews, the challenges are amplified by our publishing model, Subscribe to Open, which is a non-APC open access model that still necessitates institutional, government, academic, and other subscriptions to fund open access publishing. Not only does open access usage impact more audiences, but currently it must also be trusted to correspond with traditional usage metrics in order for our subscribers to rationalize their financial support.
TRICIA MILLER: So how do we create a framework that correlates both usage types and can satisfy open access publishing under whatever model a publisher uses, whether that be publishing articles, supporting author manuscripts or APCs, or fulfilling funder and government mandates? All of those stakeholders need to be recognized. So, I'll start with the reasoning for open access usage, which is the access and impact for a global audience.
TRICIA MILLER: Then I'll discuss collaborating to create a new framework for shared understanding and standards. And finally, the necessity to trust the data reported to us. Next slide, please. So, at the heart of open access is the impact on, and access for, a global audience beyond the traditional audiences, that is, those accessing articles within institutional confines.
TRICIA MILLER: Open access publishing creates a complex network of users. In 2017, Annual Reviews, with grant funding from the Robert Wood Johnson Foundation, opened the Annual Review of Public Health to help understand how access affects usage and impact of scholarly review articles. What we were able to find was who is using our content, in what context, and for what purpose. I'm going to briefly show you how our open access usage data reporting helped us to visualize the impact of open access publishing.
TRICIA MILLER: Next slide. So after the first year of opening access to our journal, our open access usage data showed us that we were able to increase usage by about 40%, and in 2022 usage was 130% higher than it was when the journal was behind a paywall. Next slide.
TRICIA MILLER: It's no surprise to anyone that about 90% of our usage is at academic institutions. But what our open access usage data told us that was noteworthy is that the variety of usage beyond academic institutions continues to grow. The data showed us 94 different types of institutions downloading full-text HTML and PDF articles. The variety of institution types within academic, government, and corporate usage proves that there's a need and a value for scholarly literature that non-open-access publishing models are leaving out.
TRICIA MILLER: This granularity of data is important to evaluating and supporting the needs of all of our users. We found usage that took place at construction companies, banks, food producers and even prisons, just to name a few different institution types. Next slide. Another example of the granularity of data we have is the range of areas of interest of our users that reflects on the purpose for access and the impact that articles that are open access can support.
TRICIA MILLER: For Annual Reviews, our data indicated 326 different areas of interest by our users. Next slide. The last metric I'll share on the granularity of open access data is global usage. Articles from the Annual Review of Public Health were accessed from 55 different countries in 2016, when it was behind a paywall. In 2022, that number jumped to 187 countries.
TRICIA MILLER: So, from here, we can clearly see how open access can impact usage, and how our open access data captures that impact. Now we have to understand the needs of a truly global and diverse audience. How are all of our stakeholders impacted when our audience and their needs are changing? Next slide. This leads to our need to develop a collaborative framework. It has to be based on the integrity of our data, the availability of data to all stakeholders, and the reproducibility and consistency of the data, which are all objectives that are possible to attain through collaboration and standards.
TRICIA MILLER: But they also matter for the interpretation of the data to accomplish both individual and collective goals. Next slide. The traditional usage framework is based around cost per download and institutional attribution, but now that's just part of the usage interpretation that open access reporting can provide.
TRICIA MILLER: Many stakeholders are now aware of and concerned with a mission-driven framework, where the benefit to their communities, to society, and to global knowledge sharing can exist alongside their institutional benefits. Equity of access for a global audience is one of the significant reasons why open access publishing is accelerating so quickly.
TRICIA MILLER: What publishers need to understand and undertake is how these needs can coexist and be communicated effectively. Next slide. The final point from the publisher perspective that matters to open access usage reporting is trust. Having trust that our open access data is accurate builds trust in the publisher generally, but also in their approach to open access publishing as a community working together to achieve our collective goals.
TRICIA MILLER: It's our open access data, our sharing of results, and our interpretation that help build trust in our relationships through these collaborations. As I mentioned earlier, open access data is much more granular, and so we also have to balance the transparency of usage data with privacy for our users and their institutions. Next slide.
TRICIA MILLER: The open access usage data that Annual Reviews gathers is both attributed based on IPs and unattributed, meaning that it's not associated with a particular institution that we know of. But that doesn't mean the information about unattributed users is unknown. To generalize, it's different from traditional paywalled usage reporting or COUNTER reporting that Tim spoke about, where the bulk of reports are pulled based on IPs and institutional subscribers.
TRICIA MILLER: Of course, there's a lot more to it, but for time, what's important is that the granularity of open access data and the lack of current standards means that there may not be consistency in how data is gathered, presented or interpreted, which can cause a lack of trust. Next slide.
TRICIA MILLER: So to summarize from the perspective of a publisher: open access data can offer evidence of who our real audiences are so that we can support them better, allowing us to ask about and understand their needs. We also need to listen to institutions, libraries, funders, authors, and other publishers to help us reimagine a framework that meets the needs of all of us and is equitable, trusted, and collaborative.
TRICIA MILLER: Thank you.
CHRISTINA DRUMMOND: So that's a wonderful lead-in to what I'll be talking about, which is a bit about how we might be moving forward. And so I'm Christina Drummond. I'm the Executive Director for the OA Book Usage Data Trust effort, and these issues that we've been hearing Tricia and Tim talk about are very much at the core of what we've been looking at with respect to OA books for a couple of years now. We've been working with the global OA book stakeholder community to look at how we can better foster the community-governed exchange of reliable, granular, on-demand usage data in a way that is trusted and equitable.
CHRISTINA DRUMMOND: So in other words, addressing those issues of complexity while also enabling the use cases you've just heard about. In the coming few minutes, I'm going to give you a little bit of background about where we've been and how we got to where we are as an effort, but also give you a preview of what we're looking at this year. That's where we get into these conversations around community principles and guidelines, and how we get towards developing the organizational, policy, and technical governance mechanisms that we're going to need as an ecosystem, not only to improve the interoperability of our systems, but to integrate our systems to make it easier for OA book stakeholders to participate in something like a data trust and to more freely and accurately exchange this data according to our community principles.
CHRISTINA DRUMMOND: Next slide. For those of you who may be unfamiliar with our effort, I should note that everything took root and started at conversations at a 2015 Scholarly Communications Institute session. Our first set of principal investigators are thought leaders in our field, including folks like Charles Watkinson, Rebecca Welzenbach, Brian O'Leary, Cameron Neylon, Lucy Montgomery, Kevin Hopkins, Katherine Skinner. Supported by the Mellon Foundation, we've now completed two project phases where community stakeholders came together to document shared opportunities, challenges and both the reporting and analytical use cases related to OA book usage, not only from the data reporting side and all of the different reports that you've just heard Tricia explain for different stakeholders, but also from an operational analytics and strategic decision making side of things.
CHRISTINA DRUMMOND: This foundational research prepared us for what we're looking at now on our current grant, also funded through Mellon: how do we create an ecosystem, or cyber-infrastructure network, for all of us to interoperate through, and what are the governance building blocks we would need for such an ecosystem, which we're talking about in terms of an industrial data space? I'll explain what that is in a couple of minutes.
CHRISTINA DRUMMOND: Long story short, we're aligning with some European frameworks that are emerging to support this trusted data intermediation and exchange across not only public agencies and organizations and nonprofits, but private and corporate organizations as well. Next slide. Earlier, our effort explored how data collaboratives and data trusts could be a way to support a global data exchange clearinghouse by providing those economies of scale.
CHRISTINA DRUMMOND: When I think about economies of scale, I'm really referring to usage data aggregation and benchmarking, and all of the curation and management that organizations today have to do to be able to provide those reports when the data is coming from so many sources. The question really becomes one of how you structure such governance and technological or cyber infrastructure to sustainably integrate the many standards and PID authorities that we're all working with today around OA book usage.
CHRISTINA DRUMMOND: Next slide. So this past July, the Mellon Foundation awarded our team $1.2 million to continue to advance this work through the OA Book Usage Data Trust. What we're focusing on now is how we evolve our community governance and engage community stakeholders to focus first on the ethical participation and data exchange guidelines.
CHRISTINA DRUMMOND: How do we foster that trust that you just heard Tricia mention, and make sure that we're not doing any harm for those who are participating, but also for those who are reflected within the data. We also, of course, need to understand the return on investment for us to work together on a shared solution as opposed to trying to do this each on our own. We're honored to work with a diverse project advisory board and my Co-PIs, the Secretary General of OPERAS and the Chief Legal Advisor for OpenAIRE, to help us advance this project, to really build out these community governance mechanisms.
CHRISTINA DRUMMOND: And I should note that when we talk about community governance, we're not just focusing on the governing boards and policies; we're truly trying to define those participation rules of the road. What are the ethical guidelines for us to be thinking about and operationalizing around OA book usage? Our hope is that in later years we'll take this information to hone how to sustain such a shared cyber infrastructure, building upon what we're learning in separate technical pilots.
CHRISTINA DRUMMOND: Next slide. If you've seen a talk of mine, you know how much I love to refer to this foundational work that came out of our last two grants. Given the nature of today's workshop, I just wanted to flag two reports, this being the first, which really goes into detail
CHRISTINA DRUMMOND: on things you've heard both Tim and Tricia mention so far. What are those specific queries, use cases, and reports from an analytics and reporting perspective? There are many, but what you'll note here is that many are shared among the libraries and the publishers. Key for us in our project is that we realize there are two different pieces going on here. One is meeting that reporting and analytics need, and there are many different organizations trying to do that.
CHRISTINA DRUMMOND: There's a lot of innovation and needs there that need to be met. However, in all cases, there is this shared component around the usage data curation and management and that's what we're trying to address with the data trust itself. Next slide. In 2020, we were lucky enough to come across a growing body of work taking place in Europe to prepare for the European data marketplace regulations that are going to be emerging here this year and next.
CHRISTINA DRUMMOND: Heavily funded by GAIA-X and Horizon Europe, multiple industries are actively developing these frameworks and standards that are related to the exchange and processing of data through cyber infrastructure networks called data spaces. And this is to make it easier for each participant who's looking to either share and contribute data, which you see on the left as data providers, or for those who are looking to innovate and use that information on the right, making it easier for them to connect to the system without each one having to connect to everyone else in this data space.
CHRISTINA DRUMMOND: This work is referred to as international data spaces or industrial data spaces - IDS for short. And I should note that from the data trust side, we adopted this model because at its core it had two key concepts that differ from what you might see in data harvesting approaches or data lakes. One - it's oriented around maintaining trust for all involved. Trust in the processes, the systems and the data quality.
CHRISTINA DRUMMOND: Two - it also focuses on leveraging these new technologies like blockchain and ways to account for data provenance, transformation and access, all while leaving that door open for people to join and leave this ecosystem with their data as they so choose. And so as we're looking to bring together both public and private sources of that usage data across commercial publishers, public repositories and many other stakeholders, we recognize that such concepts were core to what we were trying to do as well.
CHRISTINA DRUMMOND: And so this became a very aligned framework and set of certification standards for us to incorporate. Next slide. Through Mellon funding we're now working towards hosting workshops and virtual community consultations to develop the principles and requirements for data trust participation. Issues we know we're going to need feedback on include community governance, the necessary technical and security requirements, and also the kind of ethical principles or guardrails, if you will, around data processing and usage, as well as the compliance mechanisms that would have to be in place to make sure that everyone plays by those rules.
CHRISTINA DRUMMOND: This coming April we'll be hosting an in-person workshop to generate a first draft of principles that we'll then be sharing over the summer months for broad community consultation. The key for us is to determine how we share this data in a way that's as open as possible, but also as controlled as necessary to ensure that we have the data granularity and quality we need, while also controlling for potential harms.
CHRISTINA DRUMMOND: This means we have to think through things like data transfer requirements, processing principles and also usage policies that define what's OK and perhaps what isn't in terms of how that data that is accessed through the data trust can be leveraged and used. Next slide. Once we have such principles and we have that community consensus around them, we're going to be transforming those into not only technical functional requirements related to security, and data policies related to ethical use and privacy, but participation terms for interacting with the data trust itself.
CHRISTINA DRUMMOND: Our next six months are going to focus on the principle development as we prepare and fundraise for our technical infrastructure buildout, which is on the horizon for later this year and next. Our aim is to meet certification requirements for that international data space, the IDS, so that we'll be ready to engage with our European-based OA book stakeholders, recognizing that if we can meet those very specific needs for our European partners, we'll be well prepared to meet other national data regulations that may emerge around data sovereignty.
CHRISTINA DRUMMOND: So with that, I want to share one last slide. And this is kind of relevant to what you've been hearing over the two talks before mine, which is really thinking about how the usage data is flowing through the ecosystem. Some may have seen this before. This is work that was done in our prior project by Michael Clark and Laura Ricci. What this particular slide, and the report, goes into detail on is how usage data for open access books in particular flows throughout the ecosystem, from the point of data creation on those readers, Kindles, what have you, all the way upstream to those doing the funding.
CHRISTINA DRUMMOND: And they elegantly document here how it travels from the right to the left, from library management systems, book aggregators, and repositories up through those who need to generate the structured usage reports. So, if you look, there are points where multiple lines come together, and that's what we're trying to focus on: alleviating some of that pressure as a data trust.
CHRISTINA DRUMMOND: We recognize that right now that is a very manual process, pulling that information together to meet the needs we've just been talking about. The QR code is there if folks want to scan it and look at these reports. Thanks so much. With that, I'm going to transition to Jennifer.
JENNIFER KEMP: Thank you, Christina, and to all of my fellow panelists. I'm Jennifer Kemp from Crossref, and I'm going to try to wrap things up here a bit by talking about how open infrastructure, and Crossref in particular, fit in here. Next slide, please. So first I thought it would be useful to start off with an overview of exactly what is included in Crossref. And you can see here, it's a very large number of records.
JENNIFER KEMP: It's mostly books and journals. What I think is interesting about that is the percentages of both of those have stayed about the same, even as we've seen considerable growth in newer content types, like peer review reports and preprints, which you see sort of popped out at the top there.
JENNIFER KEMP: Usage information isn't included in Crossref records, so I think I should say that up front. But the size of this corpus means that it's really very widely used as a discovery tool to see what's been published and to connect to other information. Next slide. So next I want to look at some of what's in an actual record, because you can see here that while some information is required, it's really pretty basic.
JENNIFER KEMP: It's roughly a citation plus a DOI in most cases, but there's a lot of optional information that can and should be included. Skimpy metadata doesn't do anyone any favors: not the authors, not the systems that use the metadata, and maybe least of all publishers. I like using books for this example because they're possibly the most complex of content types. They live on multiple platforms, probably more often than not.
JENNIFER KEMP: So I included here a pointer to the best practices developed by the Crossref community books group. But I think the takeaway here is to get the content registered. And we know that there are a lot of books, like professional medical titles, for example, that often don't get registered at all. And even with the books that do, chapters aren't often included. For librarians and research administrators, for example, that makes it very difficult to know everything that their faculty have published in the first place, let alone to connect usage to it.
JENNIFER KEMP: So getting the content registered is key. And then, for this particular scenario, making sure that affiliation information, ideally including RORs, is included in each record and for all contributors.
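For readers who want to check what a registered record actually carries, the Crossref REST API (https://api.crossref.org) exposes each work's metadata as JSON. The sketch below is hedged: the endpoint is real, but the exact shape of affiliation and ROR fields varies by record, so the code reads them defensively, and the example DOI is only a placeholder.

```python
# A hedged sketch: fetch a Crossref work record and list any author
# affiliation data it carries (including structured identifiers such as RORs,
# where the record provides them). Field shapes vary by record, so use .get().
import json
import urllib.request

def affiliations_for(doi):
    url = f"https://api.crossref.org/works/{doi}"
    with urllib.request.urlopen(url) as resp:
        work = json.load(resp)["message"]
    found = []
    for author in work.get("author", []):
        for aff in author.get("affiliation", []):
            name = aff.get("name")
            # Some records also carry structured affiliation identifiers here.
            ids = [i.get("id") for i in aff.get("id", []) if isinstance(i, dict)]
            found.append((author.get("family"), name, ids))
    return found

# Placeholder DOI; substitute any registered DOI of interest.
print(affiliations_for("10.5555/12345678"))
```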
JENNIFER KEMP: Next slide. Of course, there are a number of user and use case types for this metadata, certainly including metrics and analytics. I think the takeaway for this slide is really how heavily the metadata is used, and by such a wide variety of systems and tools. So don't think of getting or having a DOI; think of describing an output to a system that will in turn distribute that information to readers. And those readers, as Tim mentioned earlier, may be humans or machines, or very often both. The systems that use this information are generally designed, in whole or in part, to point readers to relevant content.
JENNIFER KEMP: So if the content isn't there, or if the information associated with it, the metadata, is lacking, the communication about that work will be too. And that's not going to help usage either. Next slide. One of the other things I want to note here that I think is important is the network of open infrastructure, which in some cases is formal and in some cases isn't. POSI, or the Principles of Open Scholarly Infrastructure, is meant to be a guide for like-minded organizations on things like sustainability and transparency.
JENNIFER KEMP: But I think it also serves as a useful reminder of just how many kinds of organizations may be involved, directly or indirectly, in having a full, healthy picture of research. So if one is looking at usage of a book or a journal article, for example, that work may have an underlying data set that is deposited with Dryad, for example. Just understanding that, and how these organizations evolve, operate, and work together, is probably useful.
JENNIFER KEMP: Next slide. And finally, on that note, I want to close on the Research Nexus vision that hopefully at least rings a bell by now. We talk so much about more and better metadata, getting all of that affiliation information, for example, that I mentioned earlier. And I think we really need to start to understand that that includes connecting individual records through relationships in that metadata.
JENNIFER KEMP: So translations, versions, connecting preprints to versions of record: those are all examples of relationships in the metadata, and they're important because they contribute so much to that fuller picture of the entire research landscape, from the initial funding of research through to publication, and including post-publication discussion of research. Layering usage in, or weaving it in might be a better way to put it, is what makes this network, this vision, really much more complete.
JENNIFER KEMP: And I think that is a good example of how infrastructure enables these sorts of analyses. And to do that, it really requires participation by all of the stakeholders involved. Thank you.
JENNIFER D'URSO: Yes I almost jumped in and did that spontaneously. Thank you very much to all of our panelists. This was fascinating and raised a number of issues that I know we would all like to discuss more. Please join us in the live Zoom session where we'll talk about as many of these issues as we possibly can. Thank you very much.