Name:
Unpacking OA Usage Reporting: What Do Stakeholders Want?
Description:
Unpacking OA Usage Reporting: What Do Stakeholders Want?
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/a01d734e-dc9e-4dbc-9963-b2c089cd11f9/videoscrubberimages/Scrubber_1.jpg
Duration:
T00H54M46S
Embed URL:
https://stream.cadmore.media/player/a01d734e-dc9e-4dbc-9963-b2c089cd11f9
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/a01d734e-dc9e-4dbc-9963-b2c089cd11f9/session_2a__unpacking_usage_reporting (360p).mp4?sv=2019-02-02&sr=c&sig=3S310fKB%2FrVnX3gN0r%2BrvJSp%2BmWrxGBu9gXXIPSv5ao%3D&st=2024-11-20T01%3A06%3A55Z&se=2024-11-20T03%3A11%3A55Z&sp=r
Upload Date:
2024-02-23T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
Welcome, everyone. Thank you to those of you who are virtual and have been waiting patiently, along with the people here who've been waiting patiently, and sorry we started a little bit late, but the room is now pretty much ready. So this session is titled Unpacking Open Access Usage Reporting.
What do stakeholders want? And what we wanted to achieve with this is talk and have a pretty open ended conversation about what are the implications of open access usage reporting. How is it different from what we've been doing traditionally, and what do all these different parties involved in the ecosystem need? And so we've assembled a really great set of speakers. I'm going to kick off with a sort of an overview.
I'm going to be followed by Tricia, Christina, and Jill, who is not here in person, with a surprise guest. And we'll explain it when we get there. We'll all introduce ourselves as and when we get there, and then we'll hopefully leave plenty of time for Q&A at the end. So let's go. First, the code of conduct, which hopefully many of you have seen many times, and obviously that is guiding our behavior in this session.
And the four values, the core values, of the Society for Scholarly Publishing. OK, now we're into it. So I'm going to explore how the changing environment for scholarly publishing is driving greater complexity in open access usage reporting. I'll touch on a number of issues that Tricia, Christina, and, in her absence, Jill will explore in more detail from their various perspectives: an open access publisher, an emerging usage data trust, and a library supporting a research institution.
But first, some metrics to underline the importance of analyzing open access usage. I forgot to introduce myself. So I'm Tim Lloyd. I run a company called LibLynx. We do identity and access, but we also do analytics, and so we do a lot of usage reporting. We processed about a billion events last year.
We took a mixed sample of 500 million of them, covering both open access and paywalled usage events, and compared how many more requests were made for open access content versus paywalled content. And we found it was 7 and 1/2 times. While it seems pretty intuitive that open access content should get more usage, it's useful to put a number on that difference, and that underlines why it's so important to give consideration to open access usage reporting as a distinct challenge we face.
Open access content also receives usage from a much more diverse community than paywalled content. So we took the same sample of 500 million events and analyzed the organizational source of open access usage in situations where the user's IP address matched a registered organizational IP address, and we presented that as a heatmap.
So as you can see, while there was usage from 189 countries, the USA and China dominate the image, accounting for over 40% of global usage. So nothing unusual there. But if we filter down to countries with less than 2% of global usage, which only removes nine countries, we start to see a different picture. While European countries still have the highest usage, you can start to see other countries show up, like Brazil and Sweden.
Lowering that filter to less than 1% of global usage reveals a whole host of lower-middle-income and low-income countries that are getting value from open access content, including South Africa, Ethiopia, Pakistan, and Mexico. Enabling researchers and other stakeholders in these communities to access research is one of the cornerstones of the open science movement.
And one of the challenges for open access usage reporting is to more effectively communicate the impact to funders. So now I'm going to switch back to the challenges we face as an industry in collecting and analyzing usage of open access content. So one of the models I find helpful for thinking about usage reporting is to break the process down into three components.
So there's data capture where we capture raw event data from the publishing and distribution platforms that are delivering open access content to users. There's a processing of that raw event data, which can include custom logic for enriching the data. And there's delivery of the processed output in various formats, such as aggregate reports or processed event data. And I'm going to start with some of the high level drivers that impact usage reporting across the board.
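Before getting into those drivers, here is a minimal, purely illustrative sketch of that capture, process, and deliver model. The event fields, the placeholder enrichment step, and the report shape are all hypothetical rather than any specific platform's schema.

```python
# Illustrative only: the three components of usage reporting described above.
from collections import Counter
from dataclasses import dataclass

@dataclass
class UsageEvent:
    item_id: str     # hypothetical identifier for an article, chapter, etc.
    timestamp: str   # ISO 8601 string as delivered by the source platform
    ip: str          # retained for later organizational affiliation

def capture(raw_rows):
    """Data capture: turn raw platform rows into a common event structure."""
    return [UsageEvent(r["item_id"], r["timestamp"], r["ip"]) for r in raw_rows]

def process(events):
    """Processing: placeholder for custom logic such as deduplication or enrichment."""
    return [e for e in events if e.item_id]  # e.g. drop events with no identifier

def deliver(events):
    """Delivery: aggregate processed events into a simple per-item report."""
    return dict(Counter(e.item_id for e in events))

report = deliver(process(capture([
    {"item_id": "doi:10.0000/example.1",
     "timestamp": "2024-01-05T12:00:00Z",
     "ip": "203.0.113.7"},
])))
print(report)  # {'doi:10.0000/example.1': 1}
```

The point of keeping the stages separate is that each one can be swapped out independently, which matters later when custom processing logic and new delivery formats come up.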
And my first one is maybe unsurprisingly, scale. You saw that from that first metric, 7 and 1/2 times more usage. Usage of publicly available content can be an order of magnitude greater than paywalled content. Most industry applications for usage reporting were developed for paywalled content around a traditional model of month end batch processing. So these reporting architectures bake in the assumption that processing can wait until all the relevant data has been assembled and that reprocessing is a rare occurrence.
In my experience, these systems struggle to transition to an environment where reporting requirements are more frequent, maybe on demand and reprocessing is more common because data isn't perfect and some data may only become available at a later date. For example, if future reprocessing is a consideration, then you also need systems to retain and make the significant volume of raw event data available, at least for a period.
So some big data storage challenges there. The next one is granularity. Stakeholders are increasingly interested in understanding usage at more granular levels: the chapter of a book, an article within a journal, an audio or video segment, et cetera. This correlates to the item in COUNTER reporting, and it's no coincidence that COUNTER's recent 5.1 update to their Code of Practice makes the item, not the title,
the default level of reporting. So in COUNTER, a journal is a title and an article is an item; a book is a title and a chapter is an item. Item-level reporting understandably, and significantly, increases the volume and detail of data flowing through the system. Usage of open access content is also attracting interest from a broader set of stakeholders outside the traditional library audience for COUNTER reports: those managing institutional research funds want to understand the impact of their funding.
Those managing and negotiating publisher relationships want to understand how their organization is generating new open access content as well as consuming it. Authors have increasing choice over where to publish, and usage reporting informs their understanding of these choices and the impact of their publishing. And all this is in addition to editorial staff wanting to understand usage trends as input into their editorial strategies.
So these emerging stakeholders drive new use cases that add additional complexity to the process. And last but not least is data privacy. COUNTER's Code of Practice includes a statement on data confidentiality that's based on the current ICOLC guidelines; that's the International Coalition of Library Consortia. This statement prohibits the release of any information about identifiable users, institutions, or consortia without their permission.
As usage reporting grows in scale and granularity, and data is made available to a wider range of stakeholders, we need to be thoughtful about ensuring that privacy is maintained. So let's go down a level of detail, and let's start with data capture. So one major change we'll see over the next few years is an increasing number of publishers syndicating their content.
So the usage occurs on multiple platforms rather than just the content owner's platform. While this model is already familiar to those in book distribution, journal publishers are experimenting with distributing content via platforms like ResearchGate. I was in a session just earlier, I think it was one on marketing, where a speaker was talking about their experience of distributing content via ResearchGate and the impact that might have on understanding authors' intentions.
Unfortunately, this also requires usage from these third-party platforms to be integrated back into a publisher's own reports to provide a comprehensive view of usage, and that adds more layers of complexity. And a great example of that is my next point: diverse formats and metadata. So as raw usage data is sourced from a wider range of platforms, a more diverse range of inputs is to be expected.
At the format level, the files can be tabular, like comma-delimited CSVs, or structured, like JSON. In some cases, usage data for a single platform can be split across multiple files due to the peculiarities of the database exporting their events. And these are all real scenarios we've encountered. At the metadata level, examples include the use of free-text fields versus standard identifiers and varied conventions for timestamping.
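As a rough illustration of that diversity, here is a hedged sketch of normalizing two hypothetical feeds, one CSV with a day/month/year date column and one JSON with ISO 8601 timestamps, into a common structure. The column and field names are invented for the example and are not any real platform's export format.

```python
# Illustrative only: normalizing usage events from two differently shaped feeds.
import csv
import io
import json
from datetime import datetime, timezone

def from_csv(text):
    """Hypothetical tabular feed: an 'Item_ID' column plus a DD/MM/YYYY 'Date' column."""
    return [
        {"item": row["Item_ID"],
         "used_at": datetime.strptime(row["Date"], "%d/%m/%Y").replace(tzinfo=timezone.utc)}
        for row in csv.DictReader(io.StringIO(text))
    ]

def from_json(text):
    """Hypothetical JSON feed: 'itemId' plus an ISO 8601 'usedAt' timestamp."""
    return [
        {"item": rec["itemId"],
         "used_at": datetime.fromisoformat(rec["usedAt"])}
        for rec in json.loads(text)
    ]

events = (
    from_csv("Item_ID,Date\ndoi:10.0000/example.1,05/01/2024\n")
    + from_json('[{"itemId": "doi:10.0000/example.1", "usedAt": "2024-01-05T12:00:00+00:00"}]')
)
print(events)
```

Each source needs its own small adapter; the harder part in practice is agreeing on the common structure they all map into, which is the standards gap described next.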
As a community, we've yet to coalesce around a fixed set of standards for usage data in these scenarios, which is to say that plenty of standards exist, but we aren't consistently selecting and applying them. As a result, we're trying to build aggregated reports from diverse data sets, and as is often the case, this ends up forcing us to the lowest common denominator rather than being treated as an opportunity for best practice.
Moving on to data processing. The new use cases I referred to earlier will likely necessitate additional metadata to drive new reporting; for example, content topics to enable analysis against research priorities, or identifiers to enable reporting to be filtered for particular funders or authors. One of the things that COUNTER reports do very well is provide very granular detail,
or very aggregated detail. But that middle ground, which enables you to get a sense of huge volumes of data, is much harder, and that's the sort of area where subject and topic taxonomies can really help. Similarly, we're seeing a need for more processing logic. In a paywalled world, platforms know the identity of the organization accessing content, which is how we're able to generate COUNTER reports for libraries. In an open access world,
we typically don't know anything about the user, and so new processing logic is needed to affiliate usage with an organization, such as matching IP addresses to registered organizational ranges (a rough sketch of this kind of matching follows below). Another example would be using third-party databases to look up the funder IDs for a journal article to allow filtering by funder ID. Finally, what about delivery? So these new use cases also spawn a need for new reporting formats.
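Here is that sketch: a hedged illustration, using Python's standard ipaddress module, of matching an event's IP address against registered organizational ranges. The organizations and ranges are invented for the example.

```python
# Illustrative only: affiliating open access usage with an organization via IP ranges.
import ipaddress

# Hypothetical registry of organizational IP ranges (in practice these would come
# from ranges that institutions register with a service).
ORG_RANGES = {
    "Example University": [ipaddress.ip_network("203.0.113.0/24")],
    "Example Research Institute": [ipaddress.ip_network("198.51.100.0/25")],
}

def affiliate(ip):
    """Return the organization whose registered range contains this IP, if any."""
    addr = ipaddress.ip_address(ip)
    for org, networks in ORG_RANGES.items():
        if any(addr in net for net in networks):
            return org
    return None  # unattributed usage: no registered range matched

print(affiliate("203.0.113.7"))  # Example University
print(affiliate("192.0.2.10"))   # None (unattributed)
```

Events that match no registered range simply stay unattributed, which is why attribution rates can vary so widely between data sets, as comes up again in the Q&A.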
We need to support both machine-driven reporting needs, like bulk exports and APIs, as well as formats designed for human consumption: web pages, spreadsheets, PDFs. You know, one of the takeaways I took from this morning's plenary was how the change to machine-automated and machine-readable formats needs to happen in parallel with making this information more accessible for users to understand.
That just drives a lot of complexity in the process. We'll see more reporting using graphical formats that make it easy to consume at a glance, as well as traditional tabular reporting. And as open access publishing becomes increasingly global, we'll also see a greater need for internationalization of reporting formats: date formats, number formats, multilingual support. And we're seeing demand for a greater frequency of reporting.
COUNTER reports capture a month of usage, but increasingly, usage data is flowing in real time, and enabling on-demand reporting that can cover custom date ranges and be used to power reporting applications in real time is something that's going to be a bigger facet. So to summarize, before I hand over to my colleagues: this future has important implications for the systems and workflows needed to support open access usage reporting.
They need to be exponentially scalable and support granular, real-time or near-real-time reporting. They need to be able to flexibly cope with a variety of input and output formats and swap in custom processing logic for different use cases. All this opportunity for experimentation and variety also increases the importance of standards to ensure reporting is consistent, credible, comparable to COUNTER. It will also require the development of policies to underpin data collection practices and ensure they're legal and ethical.
But standards and policies are a journey, not a destination. So we need to be practical about supporting what's possible now. And they need to be reliable and auditable so our community can understand how they're created and rely on them for decision making. In short, this is a big change. It won't happen overnight. It does not need to.
But this is the future we need to prepare for. And with that, I'm going to pass over to Tricia. Hi, everyone. My name is Tricia Miller. I am the marketing manager for sales, partnerships, and initiatives at Annual Reviews. And today I want to talk a little bit about the open access usage data that we've gathered and the perspective of what publishers are looking for.
So with an increase in open access usage, the data that describes who is using our scholarly resources, and where, has changed, and we know it will continue to accelerate in that direction. These changes, however, may not align with the original standards that describe the value and use by institutions. The audiences, contexts, and purposes for the use of scientific literature are growing as more open access content is published, and therefore the measurement and significance of open access usage data need to be re-examined.
It puts publishers in an ambiguous position, having to balance the paywalled and open access usage data to meet the needs of our users, ourselves, and these newly involved stakeholders. To start to respond to these changes in usage reporting, publishers must have access to and report upon how open access impacts our entire scholarly communications community.
For Annual Reviews, the challenges are amplified by our publishing model, Subscribe to Open, which is a non-APC open access model that necessitates institutional, government, and corporate subscriptions to fund open access publishing. It also means that our subscribers are acting as funders of our open access model, and we need to go beyond the traditional usage metrics to describe the impact of supporting open access publishing.
Not only does open access usage impact more audiences, but it currently must also be trusted to correspond with traditional usage metrics in order for all of our subscribers to rationalize their support. So how do we create a framework that correlates both usage types and can satisfy successful open access publishing and the continued financial support that's required for any open access publishing model, whether it be for accessing articles, supporting author manuscripts, or fulfilling funder and government mandates?
We start with the reasoning for open access usage, that is, the impact and access for a truly global audience. I'll then discuss collaborating to create a framework for shared understanding and standards. And finally, the necessity to trust the data that's reported between us. So at the heart of open access is the impact and access for a global audience beyond the traditional audience, those accessing articles within their traditional institutional confines.
Open access publishing creates a complex network of new users. In 2017, Annual Reviews, with grant funding support from the Robert Wood Johnson Foundation, opened up one of our 51 journals, the Annual Review of Public Health, to help understand how open access affected the usage and impact of scholarly review articles. I'll be using the LibLynx platform, which has all of the Annual Reviews open access data, to show you some visualizations of what we found.
So what we were able to find was who was using our content, in what context, and for what purpose. And I'm going to briefly show you how open access data helped us visualize the impact of our open access publishing, using the data that we collected from the Annual Review of Public Health as a case study for the potential open access has. So after the first year of open access, usage of the Annual Review of Public Health increased by just over 40%, and in 2022, usage was 130% higher than it was when the journal was behind a paywall.
It should be no surprise that 90% of our usage is currently coming from academic institutions. But what's noteworthy is that the variety of usage beyond academic institutions is continuing to grow. Our data from the Annual Review of Public Health showed 94 different types of institutions downloading full-text HTML and PDF articles. And the variety of institution types within academic, government, and corporate users proves that there's a need and value for scholarly literature that non-open-access publishing models are leaving out.
The granularity of data is important in evaluating and supporting the needs of all of our users. And we found usage in places like construction companies, banks, food producers, and even prisons that are using the Annual Review of Public Health. And I want to point out that this is all new detail that open access usage is able to uncover, detail that we previously didn't have access to. Another example of the granularity of data we have is the range of areas of interest of our users.
It reflects on the purpose of the access and the impact that open access articles can support. For Annual Reviews, our data indicated 326 different areas of interest among our users. The last metric I want to share regarding the granularity of data that our open access usage uncovered is global usage. For the Annual Review of Public Health, prior to being open access, we found access from 55 different countries, and in 2022 that jumped to 187 different countries.
So from here, we can clearly see how open access impacts usage. But now we must understand the needs of a truly global, diverse audience. How are all of our stakeholders impacted when our audiences and their needs are changing? This data is providing publishers with the opportunity to consider and create new products, services and business models that go beyond supporting only those that can pay to participate.
It leads to our need to develop a collaborative framework that's based on the integrity of our data, the availability of data to all stakeholders, and the reproducibility and consistency of that data, all of which are objective and possible to obtain through different platforms, but are also very subjective in their interpretation, and all of which have the aim of accomplishing our individual and collective goals.
So the traditional usage framework we all know, based on cost per download and institutional attribution, is now just part of the usage interpretation that open access reporting is providing. Many stakeholders are now also aware of, and very concerned with, a mission-driven framework where the benefit to the community, society, and global knowledge sharing exists alongside the institutional benefit.
The equity of access for a global audience is one significant reason why open access publishing is accelerating so quickly. And what publishers need to work out is how these usage needs can co-exist and be communicated effectively. It's uncovering cultural differences that also look at who can, who should, and who is participating in scholarly communications.
The final point from the publisher perspective that matters in open access usage reporting is trust. Having trust that our open access data is accurate builds trust generally in a publisher, but also in our approach, in any publisher's approach, to open access publishing. We know that there are many pathways to achieving open access, but we also know that they need to be sustainable, transparent, and trusted.
As a community working together to achieve our collective goals, sharing our open access data, the results, and how we've reached our interpretations helps build trust in the relationships that these collaborations to build standards have created. As I mentioned earlier, open access data is much more granular, and so we also need to balance the transparency of usage data with the privacy of the users and their institutions.
The open access usage data that Annual Reviews gathers is both attributed, based on known institutions, and unattributed, meaning that it's not associated with any particular institution that's known to us or a current subscriber. But that doesn't mean that the information about the unattributed user is unknown. So to generalize, this is different from the traditional paywalled usage reporting, or COUNTER usage reporting, that most of us are familiar with, where the results are based on institutional subscribers that pay to access content.
And of course, there's a lot more to it, but for time: the granularity of open access data shows us much more detail about the individuals who are accessing it. The data may not be consistently gathered, presented, or interpreted, and that, of course, can cause some privacy and trust issues. So to summarize, from the perspective of a publisher, open access data can offer evidence as to who our real
audiences are, thereby allowing us to ask about and understand their needs. And we must also listen to institutions, libraries, other publishers, and funders to help us reimagine a framework that's needed for all of us and is sustainable, equitable, trusted, and collaborative. So now I'll hand it over to Christina. Thank you. Hi, everyone.
So I'm Christina Drummond, and I'm honored to have this opportunity to shine a light on two efforts that are moving forward to improve the quality and efficiency of global usage data exchange. Before I get started, I want to note that I'd like to make things interactive. So if you have your Cell phones handy, whether you're in person or online, know that we'll be having QR codes pop up here to either access resources or interact with some polls as we go.
So with that, let me start by sharing information about an NSF-funded workshop that I was privileged to co-host in April of this year. Recognizing that advanced data governance controls or related cyberinfrastructure may be necessary to support the trusted exchange of this granular and, dare I say, sensitive usage data, Charles Watkinson of the University of Michigan Press and I held a stakeholder workshop to explore whether national infrastructure is needed to support impact and usage data exchange for analytics and reporting.
This daylong event was held in conjunction with the spring CNI meeting, bringing together thought leaders to discuss the current state and explore how to improve the fairness, if you will, of usage data. Talks from invited experts on the state of usage, metrics and analytics were recorded and made available, and you can access them here. It essentially documents our current state and what the perspectives are today of funding agencies, cyber infrastructures, library consortia, publishers and repositories.
We discussed at this event not just how to improve the fairness, if you will, of usage and impact metrics, but we also explored how to extend the CARE Principles for Indigenous Data Governance to the realm of scholarship usage and impact. For those who are unfamiliar with the CARE principles, they emerged from a November 2018 event led by Stephanie Russo Carroll and Maui Hudson, held in conjunction with International Data Week and the Research Data Alliance.
And even though these principles have been in existence for five years, many agreed at this April event that the CARE principles are necessary to ensuring trust and transparency for scholarship usage and impact metrics. The phrase emerged that we must be FAIR and CARE to share. The workshop proceedings will be released later this year, and initial takeaways fell into four categories.
First, related to education and advocacy: attendees agreed that common terms and vocabularies are needed so that we can improve upon this current state together. Second, related to regulation: stakeholders recognized that we need a solid understanding of not only what our contractual obligations are, but also the regulatory and legal obligations with respect to security, handling, licensing, and reuse of granular metrics.
Third, there was a shared interest in developing shared values and principles to define ethical use of usage data, especially given AI and the potential for unintended negative or disparate impacts based on how this data could be repurposed. Finally, there is an interest in leveraging ongoing research and development from the Open Access Book Usage Data Trust and the European industry data space efforts that are ongoing,
looking at how we can extend what's currently being done for books to other scholarly outputs. For those of you who haven't heard about the Book Usage Data Trust, it's a mouthful, I know; I like to just call it the OAeBU. It's an outgrowth of over five years of R&D supported by over $2.5 million from the Mellon Foundation.
The effort, which is actively maturing into an infrastructure consortium, aims to implement the European industry data space framework, or IDS, to support the exchange of reliable usage data across our global ecosystem in a trusted, transparent, and community-governed way. Through our past round of Mellon support, Kevin Hawkins and I documented the usage data analytics and reporting use cases to illuminate those specific questions, queries, and applications that were sought after by book publishing and discovery stakeholders.
Our data trust effort homed in on the need to streamline the aggregation, curation, and governance of this usage data, as necessary to reduce the resources that are spent both on sending usage to others and on aggregating data from multiple platforms and sources. Given the nature of the session today, I should note that we found stakeholders wanted to leverage granular usage and impact metrics for internal operations and strategic data-driven decision making.
Publishers wanted to use trusted, timely data for editorial strategy, for print strategy, and to evaluate promotions, marketing, and discovery. And we looked at other stakeholders too; the link to the report will take you to those reports if you're interested. As a tech person, I always like to start by understanding what the MVP is.
What is it you're actually building? At its simplest, the data trust surveyed our engaged stakeholders, and we're now moving towards offering a short list of APIs to facilitate multi-platform usage data exchange, sourcing, and aggregation. But under the hood, we're working towards simplifying the legal, security, and data governance layers that are involved
when you're trying to bring together information across platforms and services. As a data intermediary service, we're looking to provide, and really improve, the quality and timeliness of usage data exchange in support of existing business relationships, much like Switchboard does for financials. But we're looking to do it for usage, securely routing that usage data between those with the authority to share it and those who have the authority to see it.
While we adhere to the Principles of Open Scholarly Infrastructure, we recognize that the data itself has to be as open as possible, but as controlled as necessary, to foster granular access to both public and commercially generated usage. In our current round of support from Mellon, which runs through 2025, we're actually developing mechanisms to govern and sustain this usage data space.
We call these our governance building blocks. We're lucky to be leveraging the data space framework coming out of Europe and its multidisciplinary approach to resolving issues with the controlled exchange and computation of sensitive data, frankly, between competitors. We're taking the lessons that are being learned in sectors such as health care, transportation, and banking and applying them to scholarly communication. We're adapting these frameworks and emerging global standards
so as not to reinvent the wheel as we knit together the fabric of trusted data infrastructures that we all use to connect. In addition to developing robust community governance in line with the Principles for Open Scholarly Infrastructure, our current efforts are focused on determining the ethical guidelines for usage data stewardship, exchange, security, and reuse, which we collectively refer to as our data rulebook.
We're preparing for a 2025 service launch by also addressing sustainability and studying the participation return on investment, if you will, with a small set of organizations. The data rulebook we're creating will make transparent to general audiences how this exchange is community governed, and how the technology stack functions through scalable partnerships and processes, while also outlining data use principles, terms of participation and compliance mechanisms that are needed for all involved to ultimately have that transparency and trust in the data that's being exchanged.
The rulebook will become the foundation for standard contracts once the service launches. In Brussels this past April, representatives from publishers, libraries, standards organizations, and services came together to draft an initial set of principles and identify what's needed for that data exchange to be both trusted and ethical.
Those gathered began to define requirements and identify compliance mechanisms to retain trust between the data trust participants. Directions identified are now being shared for community consultation. And this is where I'd like to ask you to get out your phone, because I'm going to share some of that now and invite you to reflect and add.
So first up, we have concerns that came up that must be addressed for those who create usage data to be comfortable sharing granular, sensitive metrics with a third-party service. These centered on the risk of data leaks and potentially damaging use of the data by those receiving it. Suggested mitigation strategies included industry-standard cybersecurity, controlled usage data access, and accountability measures for those who choose not to follow the data sharing and use agreements.
So my question for those of you in the audience: if you generate usage data, what would you add to this list? What concerns do you have when thinking about exchanging granular usage metrics? So if you have your phone, you're welcome to take a screenshot of that QR code and it will take you to a Mentimeter, and I'm going to ask our friends in the back to toggle the screen
so we can actually see as a group what you put in. I'm going to ask you to take 30 seconds to reflect on that question. And if you have any ideas on additional mitigation strategies, you can add those too. And if you're online, you're welcome to do this as well; go to menti.com and add that code.
So we're seeing a lot about community input and the power of that community governance and how relevant it truly is to this process. Cybersecurity is coming up. The importance of privacy. The importance of thinking about those risks, and when data can and can't be shared.
Just want to go ahead and give it a couple more seconds; you're welcome to keep adding to this while we do. I'm going to ask our friends in the back to toggle back to the slides. So next up, we asked this question: what is it that you need if you're somebody who relies on the data? So we heard concerns from those who need to aggregate and base analytics on these metrics.
You know, they were concerned about people gaming the system, perhaps inflating data or adding noise to the numbers, selectively making data available, or perhaps worse, misleading others; having folks hack the system or having misinformation targeted; or tying the ability to participate in this exchange network to good behavior. Solutions that surfaced included ensuring that the data contributors had the authority to both share and manage the data and that clear practices were communicated
if you needed to trust a feed of metrics information from multiple platforms and services. I'm curious to hear what you think we need to add to this list, either in terms of controls or in terms of potential solutions. So again, I'm going to ask those in the back to switch back to that Mentimeter. And if you had your phone open, it should have automatically advanced for you.
You can go ahead and add your thoughts here as well. Authenticity of the data. Data provenance; we had a lot of conversation around that. How do we know the source of the data, how the data has been transformed, the transparency of those algorithms? Any others?
As somebody who is relying on usage data, what is it you would need to trust it? Protecting that usage data. You know, something we talk about a lot is making sure that as you participate in such a data collaborative or exchange, there's no harm. Those who contribute the information and those who rely on it want to make sure that there's no harm done.
And we'll come back to that in a moment, because it's not just those who are creating the data or relying on it that could be harmed; their reputations could be harmed, but it's also authors and readers we need to think about. So with that, I'm going to ask our friends in the back to switch back to the slides one last time, because the next thing I want to think about here is this: we just talked about trust requirements for those creating and relying on the usage and impact metrics.
It's also important to consider the people that are reflected in those metrics. In our workshop, we asked folks to think about what concerns needed to be addressed to trust that granular metric exchange would not harm readers or authors. Privacy, censorship, reputational impacts, and negative targeting campaigns all came up, with solutions focusing on shifting from repositories to decentralized, distributed ledger technologies,
zero-copy solutions such as what we're looking to implement with the international data space, avoiding individually identifiable aggregation or benchmarks, and ensuring that exchange supports contractual and regulatory compliance, which of course we all have to do anyway. And so I'm curious, in this room, one last time: what, if anything, would you add to this? Are there other concerns we need to be thinking about with respect to potential harms for readers or scholars, especially related to the granular metrics and the sharing, aggregation, or benchmarking of those across platforms?
Similarly, if you have ideas for solutions, go ahead and add those too. One last time, our friends in the back could switch. The importance of anonymity. I apologize, I can't easily read all of that sideways. More quantitative measures of success
that incentivize the wrong focus. Things to keep in mind. How do these metrics affect what is published or rejected? I often think about scholarly freedom. These are all really important things for us to keep in mind as we talk about this shift to granular metrics and what that means if we start having operations and strategic decision making based on it.
So feel free to go ahead and continue adding your thoughts here. I'll ask our friends to switch back to the slide deck. And I'll just note, you know, there's a lot here to think about. We really are at this turning point. More importantly, as we look to advance this conference's themes of digital transformation, trust, and transparency, you know, it's going to be important that we work together as an ecosystem.
And so feel free to use either of those QR codes that were shown on the screen to engage and work on this together. So with that, I'm going to hand the mic back to Tim to talk about Jill's thoughts. Thank you. Okey dokey. Thank you very much, Christina.
So we are starting to run out of time a little because we started late. Um, so this part of the presentation was going to be delivered by Jill Emery, who is a professor and the collection development and management librarian at Portland State. Very sadly, Jill was unable at short notice to be here in person. Jill, if you're there virtually, we wish you well.
Uh, however, we have Jill's slides, but in the interest of time, I also have a special guest to join us. So at far shorter notice, I'd like to introduce a special guest who only 24 hours ago found out that they were going to join us on stage. So this is Elliot Hibbler. Elliot is the head of scholarly communications for the Boston College Libraries and very bravely volunteered, without any slides or planning, to come up and give the librarian and university perspective.
So, Elliot, would you please come up and join us. And can we get a round of applause for the person who joined at the last moment? Um, and I'm just going to pass it over to you. So this is really Elliot's reflections from a library and institutional perspective. Thank you for inviting me. Like Tim said, I didn't have a lot of time to prepare slides, but that's OK because I could not have done as good a job as everyone here did.
So I can just say, oh, I didn't have time, and you don't have to actually see what my slides look like. So I really want to point to two different tracks where I see this usage reporting being very useful for libraries: one is for an external audience, and two is internally, for what libraries can do. So at its most basic, people still ask librarians, and by people I mean administrators, authors. They will ask, how often was my work downloaded?
And, you know, it seems like that would be easy to answer, but it's not always easy to answer, particularly when we need to pull information from a lot of different places. I feel like with consistent, reliable usage reporting, it will be a lot easier for us to reliably provide that information to authors and administrators and then to leverage that information into encouraging more and more usage. Now I do want to put in the proviso that I don't think download count is the best way to measure impact, and I definitely don't think universities should use that too heavily.
And, you know, big tenure decisions. But people ask and we need to answer in reliable, consistent ways. The other thing for me in particular, one of my duties is to administer our open access fund. We use that open access fund to provide coverage for authors who don't have, you know, funding they can use from a grant to publish. This happens a lot more in humanities and social science.
And being able to get that usage data back will help us make the case for more funding in the future. Right now, our provost office generously funds us to some level, but if we had, say, more usage reporting or even up to a dashboard to show some of these download counts, I think that would really help us make the case for future funding. The other place I'm really thinking about this usage is internally for licensing resources.
Now I don't do the licensing myself; we have a great team that does that. But, you know, being here at the meeting this week, I can already tell that there's more and more open access material every year, particularly in hybrid journals. And hybrid journals are probably the toughest thing for libraries to figure out how to price in negotiations, because on the one hand, you know, with a regular subscription package, we can use our COUNTER data, we can look at, you know, our cost per use, who's using these journals, how much are they being used?
But what happens when every year the percentage of open access stuff in a journal gets bigger, while the percentage of stuff we are paying for, you know, gets smaller? What does that do to that cost per use? And then how do we use that information to negotiate what a great deal is? And I do want to point out that it's not just down to a cost-per-use element. I thought it was great hearing about how with this data we can look at where things are downloaded from, and ideally we can use that to think about things like equity and where we want to put our money, and thinking about how we want to fund, you know, how we want to pay for some of this material.
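To illustrate that cost-per-use ambiguity with made-up numbers: suppose a hybrid journal subscription costs $10,000 and the journal gets 5,000 downloads in a year, of which a growing share are open access articles the library is not paying for. A rough sketch:

```python
# Made-up numbers, purely to illustrate the cost-per-use question for hybrid journals.
subscription_cost = 10_000      # hypothetical annual subscription cost
total_downloads = 5_000         # all downloads, open access + paywalled
oa_share = 0.40                 # fraction of downloads on open access articles

paywalled_downloads = total_downloads * (1 - oa_share)

cost_per_use_all = subscription_cost / total_downloads        # counts every download
cost_per_use_paid = subscription_cost / paywalled_downloads   # counts only paid content

print(round(cost_per_use_all, 2), round(cost_per_use_paid, 2))  # 2.0 vs 3.33
```

As the open access share rises, the same subscription looks worse measured against paid-content use and unchanged measured against all use, which is exactly the negotiation ambiguity just described.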
The other thing I did want to touch on was privacy. Now, in some ways privacy can almost, you know, cut against some of these things. It would be a lot easier if we didn't care about privacy. But libraries still really do value privacy, so we can't go too far in our usage reporting, you know, in revealing things about people. That's why I really am optimistic about COUNTER as a good balance of trying to get information while at the same time trying to preserve, you know, some patron privacy, because libraries just aren't going to, you know, ditch privacy in the pursuit of data.
So we still need to find that right balance. I know we're very tight on time, so that's the end of my remarks. And yes, we can get back to questions right now. Awesome. So we want to make this an interactive session. And I know we have both extroverts and introverts in the room.
So if you're an extrovert and you want to come up to the mic and ask a question of any of us, please, by all means; Tim is our roving mic individual in the room. And for those who are perhaps a little more introverted, I'm going to ask our friends in the back to go back to that Mentimeter. Because, oops, I just went through the slides.
One of the things that we wanted to do was to ask some questions. Go ahead and advance if you can. There we go. I want to get your thoughts on: what else do you think we need to have to trust and foster transparency with usage and impact data? Do you have any thoughts on this?
So feel free to add details on your device, but we also want to welcome you to come and take the mic and share your reflections. That's, I think, perhaps the easier of the two question prompts we have ready for you. Anyone? I think they need coffee.
So, the great thing about traditional subscription-based usage reporting is it's so easy to tie a download or HTML view back to a specific institution. And I was just wondering, with open access usage reporting, perhaps one of you can speak to this: what percentage of the usage is not able to be attributed to a specific institution, and how can we attribute some of that usage to institutions?
Generally, I think it's different for each publisher. We're using the LibLynx platform and an integration with PSI, which is an optional service for institutions to add their IP addresses so that we can attribute usage. I think for us we're in the realm of about 50%, but that also means that a lot of that usage is coming from places that had never accessed our content before, because it was behind a paywall and they were unable to.
I think Tim can probably talk to a broader idea of what that number might be. So, yeah, it varies very widely. It can be down as low as 5% in some data sets; it can be much higher. If it's a large enough data set, then you get statistically significant results, even if it's a relatively low value.
But I think one of the areas that is interesting from my perspective is exploring other ways to fill in the other gaps. I've seen some very creative stuff done by institutions, so this is one I cite regularly, but if you haven't come across it before, Harvard has a great repository where when you go to access some of the open content, you're invited to leave a comment as to what was the value you got from it.
So this is qualitative rather than quantitative. Those comments can then be shared; you can either attribute them or they can be anonymous. But it's a great way to collect feedback from your community as to why they're getting value from, in this case, an institutional repository. Other methods would be to invite individuals to maybe register, if they're prepared to, and provide more information about the value they got.
And there could be an opportunity for us as a community to make it a valuable thing to let open access publishers know why you got value from it; that's not something we really promote, but that could be something we could do. Thank you. My question, and you brought this to mind, is about repositories: how do I know that this data is being stored in a quality long-term repository, so that we know it's there today, it's there five years from now, and that it's still quality, with the provenance and all of the updates being done?
So that is a long term challenge as well. And how can this help with that? And if I can just jump in on that, I think we also are hearing concerns from folks, you know, so let's say you treat your usage statistics as sensitive, commercially sensitive. Do you want another entity managing that, storing that, preserving that, or do you want to keep it in house and provide access on demand as needed in an accountable way with that metadata and provenance?
I'm hearing the latter. But what that means is that as an ecosystem, we need to start thinking about how we're going to manage that information long term. And I'd just add one more point to that: in the COUNTER reporting world, you're only required to hold 24 months of rolling data. In my experience of doing this over many years, we have typically held it for far longer, and very, very, very few publishers we work with have shown any interest in holding it for longer than 24 months.
So your question is an interesting one. You know, are there publishers who are interested? There may well be institutions that are interested. And I don't think it's something we've thought of as an industry. Anyone else? Oh, great. There's a question in the chat from remote attendees.
In the category of unintended consequences: if today you could take the time machine back to 2005, what would you have told the COUNTER board? Let's take... Whoa. Does someone else want to take that one? Consistency, I think, would be number one. I mean, we've learned from each iteration of COUNTER, but I think we've all felt the frustration; from, you know, COUNTER 3 to COUNTER 4 maybe was the specific one that was really hard, and just reconsidering what is use and what is not use.
I feel like we learned a lot along the way, but to have more informed conversations towards the beginning, as a publisher, I feel like would have been something really helpful. Oh, I would just pat them on the back and say, you know, keep up the good work, because it's been a long time to get to where we are now. And I think, you know, with every iteration it just gets a little better.
Any other questions? I'm not sure that I've organized this very well as a question yet, but I keep thinking about the transition away from IP addresses as a good way to identify even an institution, even if we're not talking about open access as people are coming in from whatever device or thing they're on and they're using SAML for authentication.
And so you get attributes. And if they're a customer, you get the institution. But it seems like I wonder if we can replace that with raw or something. And if that could help with the whole open access counting as well, and attributing it to an institution while maintaining an individual's privacy. So again, not a question, but maybe I'd love to hear if that's a viable solution from people who think about this more than I do.
I think about it a lot, and it's challenging, because with Federated authentication, you know, the user is logging in, and typically people are trying to remove barriers to access for open content. And I think it will be challenging as a community to start putting barriers in. So with all the flaws of IP authentication, it's, you know, the famous quote, the least worst option, to give us some information now.
But I think there's lots of room for experimentation here. I've previously had conversations with institutions about experimenting with pop-ups and questionnaires, and again, this culture of, you know, things are free, but in order to keep them free, there's some value you can give back as a community. And part of that is expressing why this was of value to you. And a lot of what we're doing with analytics is trying to perceive what that value is from a distance, and it's way more powerful to have the users themselves tell you what value they got from it.
I'd be fascinated to try some experiments in this area if anyone wants to have a go. Oh, and I'd just point out that even if, you know, some of our data is not perfect, one of the good uses of data can really be to just look at the delta over time. So even if we don't know the right institution count, whatever count we have, if it's consistent and we can see it going up or down, we can make some decisions based on that.
We're over time. I want to thank you as an audience. I want to hugely thank Elliot. You did a sterling job at short notice. Thank you very much indeed. Bye, everyone.