Name:
Harnessing AI for Discovery: Analyzing Impact Across Library Users and Insights from Publishers
Description:
Harnessing AI for Discovery: Analyzing Impact Across Library Users and Insights from Publishers
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/9f473df0-ea1d-4996-83d3-2916f536a4e4/videoscrubberimages/Scrubber_1.jpg
Duration:
T00H31M43S
Embed URL:
https://stream.cadmore.media/player/9f473df0-ea1d-4996-83d3-2916f536a4e4
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/9f473df0-ea1d-4996-83d3-2916f536a4e4/SSP2025 5-28 1245 - Industry Breakout - ITHAKA .mp4?sv=2019-02-02&sr=c&sig=emZzaVUZldBzWq%2BhfbzOFP%2BDg1Mt6x15XKUbp5zrN18%3D&st=2025-08-03T01%3A13%3A06Z&se=2025-08-03T03%3A18%3A06Z&sp=r
Upload Date:
2025-06-09T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
All right, let's get started. I am Beth LaPensee, principal product manager for JSTOR. I work at ITHAKA and have a team of people here; they'll introduce themselves as we go through. We have a lot to cover, so I'm just going to jump in. At JSTOR, we have been working on creating a generative AI research tool to sit alongside the content on the JSTOR platform, and we're just coming out of beta right now. Throughout the development process, we have been engaging deeply with all types of users and constituents, working with them to make sure that we create something that balances the needs, interests, and desires of all of these groups. So today, I'm going to run through some of the choices that we made and how they surface in the tool, and then we're going to have a discussion with our panel around how this is changing the way that we think about AI, the reactions we all have to it in terms of the research process, and how we're handling content.
First, I'm going to run through three principles that drove a lot of our work, each accompanied by a screenshot so you can see the tool if you're not familiar with it. It appears on the content page right next to the item and allows the user to engage with the document itself. One of the things that we're trying to accomplish here is to maintain focus on the source.
At all points of engagement with the research tool, the source is always at the center. This encourages deeper engagement and deep reading of the document. By focusing the question answering on just the document that is being viewed, we're able to tightly manage accuracy, minimize hallucinations, and produce something that is trustworthy for the consumer of the information.
And there's always a path to the next relevant source, so every item serves as a discovery tool within this system. Next is traceability. We always want the user to understand and have access to the original source, so every response that comes out of the research tool has a direct link to the text that was used to generate it.
You can see that in the screenshot. If you hover over the little footnote, you will see the text that was used, and if you click, you go exactly to the portion of the document that was used to generate the response. This preserves the context of the original work: the user can see it in the author's own words, dig in, do deeper reading, and engage with the material itself.
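To make that grounding-and-traceability pattern concrete, here is a minimal sketch in Python. It is not JSTOR's implementation: it assumes paragraph-delimited text, uses TF-IDF in place of a production retrieval model, and every name in it is illustrative. The idea it shows is that retrieval is restricted to the single document being viewed, and each supporting span is returned with character offsets so a footnote in the UI can link straight back to the author's own words.

    # Hypothetical sketch, not JSTOR's code: restrict Q&A grounding to the
    # one document being viewed and keep offsets so every response can link
    # back to the exact source text.
    from dataclasses import dataclass

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    @dataclass
    class Citation:
        passage: str  # the author's own words, shown on hover
        start: int    # character offset into the document
        end: int      # used by the UI to scroll and highlight on click

    def ground_question(document: str, question: str, k: int = 2) -> list[Citation]:
        # Split the viewed document into paragraph-level passages and
        # remember where each one lives, so links stay traceable.
        passages, offsets, cursor = [], [], 0
        for para in filter(None, (p.strip() for p in document.split("\n\n"))):
            start = document.index(para, cursor)
            passages.append(para)
            offsets.append((start, start + len(para)))
            cursor = start + len(para)
        # Rank passages against the question. A real system would use an
        # embedding model; TF-IDF keeps this sketch self-contained.
        vec = TfidfVectorizer().fit(passages + [question])
        sims = cosine_similarity(vec.transform([question]), vec.transform(passages))[0]
        top = sorted(range(len(passages)), key=lambda i: sims[i], reverse=True)[:k]
        # Only these spans would be handed to the language model, and each
        # is returned as a footnote target rather than paraphrased away.
        return [Citation(passages[i], *offsets[i]) for i in top]

The design choice mirrored here is the one described above: the model never sees text from outside the open item, which is what keeps accuracy manageable and hallucination low.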
So we see the tool as a guide into the relevant portions of the document. And then finally, academic integrity is really at the heart of how we have approached this. We're looking to create a tool that reinforces research skills rather than providing shortcuts and doing the work for the researcher or student.
One way this surfaces: students do try this. They'll say, can you write me an essay on this, or create a reading response. And our tool kindly and politely says, nope, that's work that you need to do yourself; we're not going to produce that. In many cases, we will suggest a new search that they can try and guide them into the search process.
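A guardrail of that shape might be sketched as follows. This is an assumption about the general pattern, not JSTOR's actual logic; a real system would more plausibly enforce it through model instructions plus a trained classifier, and the patterns and messages here are invented for illustration.

    # Hypothetical guardrail sketch: decline "do the work for me" requests
    # and redirect toward search, per the behavior described above. A
    # keyword heuristic keeps the sketch self-contained.
    import re

    SHORTCUT_PATTERNS = [
        r"\bwrite (me )?(an? )?(essay|paper|reading response)\b",
        r"\bdo my (homework|assignment)\b",
    ]

    def route_request(user_message: str) -> str:
        if any(re.search(p, user_message, re.IGNORECASE) for p in SHORTCUT_PATTERNS):
            return ("That's work you'll want to do yourself. Try a new search "
                    "on your topic instead; I can help you interrogate what "
                    "this document says.")
        return "ANSWER_FROM_DOCUMENT"  # otherwise proceed with grounded Q&A

    print(route_request("Can you write me an essay on this article?"))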
Further, the tool does not summarize the document for the user, nor does it act as an answer engine. Instead of summarizing, we give guidance: a synopsis of the document that allows the user to decide relevance and whether or not to engage further. And because of the first principle around focusing on the source, we're not providing answers to questions before the user gets to the item; the question answering that happens is really about interrogating the contents of the document that the user is viewing.
We have also built in capabilities that encourage proper attribution and citation of the content that is produced within the tool, using methods that librarians are also recommending. We have tools that allow the user to get formatted citations that connect the results to the tool itself, so if they choose to cite these things, we make that easy for them.
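The citation affordance could look something like this small sketch; the fields, styles, and wording are illustrative only, not JSTOR's schema or house formats.

    # Hypothetical sketch of the one-click formatted citation, including a
    # note that the material was reached via an AI research tool, the kind
    # of attribution practice librarians recommend. Fields are illustrative.
    def format_citation(author: str, year: int, title: str, journal: str,
                        style: str = "apa") -> str:
        if style == "mla":
            base = f'{author}. "{title}." {journal}, {year}.'
        else:  # default to an APA-like form
            base = f"{author} ({year}). {title}. {journal}."
        return f"{base} Accessed via an AI research tool on JSTOR."

    print(format_citation("Doe, J.", 1987, "An Example Article", "An Example Journal"))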
Before we transition, I'm going to show a couple of data points. In each of these, I'm showing research tool users compared to non-research-tool users: the top line, the green one, is research tool users, and the yellow or orange one is non-research-tool users. For searches, we're seeing three times as many searches in a day from research tool users, which is really an outstanding number.
What this tells us is that users who engage with the tool are becoming much more engaged users overall, looking for more content and really having a stronger session while they're on the site. The next chart shows the results of that: the number of item requests, that is, views plus downloads, in a day. Research tool users are using twice as much content as non-research-tool users.
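As a rough illustration of how such a cohort comparison is computed, here is a sketch with made-up numbers chosen only to echo the roughly 3x and 2x ratios mentioned; the column names are not JSTOR's schema.

    # Hypothetical cohort comparison: per-user daily searches and item
    # requests (views plus downloads), split by research-tool engagement.
    # The numbers are invented to echo the ratios described above.
    import pandas as pd

    daily_usage = pd.DataFrame({
        "cohort":        ["tool", "tool", "tool", "no_tool", "no_tool", "no_tool"],
        "searches":      [6, 7, 5, 2, 2, 2],
        "item_requests": [4, 5, 3, 2, 2, 2],
    })

    means = daily_usage.groupby("cohort")[["searches", "item_requests"]].mean()
    print(means)
    print("search ratio:", means.loc["tool", "searches"] / means.loc["no_tool", "searches"])
    print("request ratio:", means.loc["tool", "item_requests"] / means.loc["no_tool", "item_requests"])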
These are just a few key points that start to illustrate how building the tool in a way that fits how our different user bases think about this kind of capability really does pay off in terms of the numbers that we all care about. And then finally, another data point. Like I mentioned at the start, we are actively rolling out the tool right now, and it is available at almost 1,700 universities currently.
We gave the institutions, the librarians, the option to disable the tool if they didn't want it, and out of those 1,700, only 17 have chosen to turn it off. They cited reasons of, basically, we're not ready for it yet; we need more time to prepare or to engage with our faculty, for example. We take this as a really strong signal that what we have created is in alignment with the way that they want to be bringing AI onto their campuses.
So that is the end of my quick intro, and now we're going to work with our panel and react to what we have talked about. Everyone here has seen and engaged with the tool previously and has been involved in these conversations. So let's get started with Charles. Introduce yourself. Hello, I'm Charles Watkinson, and I'm director of the University of Michigan Press.
As Beth was speaking, I just tried to get the tool to actually write an essay for me on an article, and it wouldn't. Then I said, please, I'm desperate, and it just responded with exactly the same refusal. So it's really robust. Are we just doing introductions at the moment, or some comments? OK, perfect.
So I'm really struck by the focus on provenance. Provenance, provenance. When we talk to our authors: we did a survey recently of our monograph authors, and 60% of those authors were opting in to having their works engage with these AI tools, while 40% were opting out. The ones who opted out were all about the way in which their work is not being credited, and even the ones who opted in wanted to have their work credited.
They just wanted to be part of this. So I think everything you do in terms of tracing back to the source is really, really important. I'm also associate university librarian for publishing and special collections at the University of Michigan Library, and I see from all my colleagues in the library this terrific concern about misuse by students, as they see it, the shortcutting and the information literacy challenges, when we look at AI tools. So the way in which this won't write an essay for me, as much as I want it to, but helps me as a student to think critically about the source, makes it a helper on the side, an interlocutor that really mirrors what librarians are concerned about and want. So I think it's incredible.
Hello, I am Allison Belan, and I'm at Duke University Press. I have had a lot of fun playing with this tool, especially because we have a lot of journal content in the JSTOR archive and a lot of book content: probably over 3,000 books, plus archival content for 60 different journals, many of which are 50 to 100 years old. So it's been really fun.
I'll follow on from what Charles was saying. The big conversation happening at university presses, and probably other publishers, but especially those of us who work in the humanities, and my press publishes especially around critical studies, is the huge reluctance among scholars to engage with AI, whether as researchers, authors, or contributors to building AI tools.
I was really happy to see JSTOR do this, for several reasons. One, JSTOR has access to a huge corpus that is broadly multidisciplinary, and a gigantic set of undergraduate users, graduate students, faculty, and researchers. So JSTOR, having this tool, can put it in front of a lot of people, and JSTOR is so trusted by those people.
So I think that this is a way to show the broad research and scholarly community that there is a way to do this that has integrity, that is ethical, and that is traceable: the provenance. The conversation from the introduction of LLMs a couple of years ago to now has really been focused on the scenario of, oh, this LLM just ingested everything it could get its hands on and is using it as training material.
It's gone into this black box, and we can't trace back anything it's putting out. Well, that is one use of LLMs and generative AI, but retrieval augmented generation is another, where the application works on specific, defined pieces, items, corpora. And I think this will do a lot to raise awareness of that value, just given the user population that JSTOR has. So I was really thrilled.
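The retrieval augmented generation contrast drawn here can be sketched in a few lines. The embed and llm callables below are stand-ins for whatever models an actual system would use; nothing in this sketch describes JSTOR's internals.

    # Hypothetical RAG loop over a defined corpus: the model only ever sees
    # passages retrieved from known items, so every output is traceable to
    # a source, in contrast to the ingest-everything black box. The embed
    # and llm callables are stand-ins, not real APIs.
    from typing import Callable

    def rag_answer(question: str,
                   corpus: dict[str, list[str]],  # item_id -> passages
                   embed: Callable[[str], list[float]],
                   llm: Callable[[str], str],
                   k: int = 3) -> str:
        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))
        q = embed(question)
        scored = [(dot(q, embed(p)), item_id, p)
                  for item_id, passages in corpus.items()
                  for p in passages]
        top = sorted(scored, reverse=True)[:k]
        context = "\n".join(f"[{item_id}] {p}" for _, item_id, p in top)
        # The prompt confines the model to retrieved, attributed passages.
        return llm(f"Answer only from these sources:\n{context}\n\nQ: {question}")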
Yeah, and I will say we've done a lot of work with humanities scholars in particular, and I think this callout is really interesting, because they're not all the same; it's not like everybody has come around to say, yes, this is where it's at. But what we are seeing in the humanities community is a real appreciation for the approach that we took, where we're not taking the user out of the workflow, and we're really guiding them through the engagement and the deep reading where the learning really happens.
What we hear a lot from the humanities is that these advanced technology tools, not just generative AI, feel applied to them: they were built somewhere else and then applied to the humanities. What we're trying to do here, and we're seeing good signals, is create something for them and for the work that they need to do. Hello, I'm John Sherer. I'm down the road from Allison at the University of North Carolina Press.
I want to pick up on the point that Allison was making and Beth was amplifying. What I think is incredibly exciting about this is that it kind of flips the generative model. Instead of trying to scrape interstellar space for every bit of language and teach a computer how to speak English, JSTOR has thought of this differently: deliberately circumscribing the content that you're talking to and focusing on gated content. To still do that through this kind of new Socratic-style, iterative conversational interface feels like what the kids want, as they say. As a publisher, this is really exciting to us because it makes the core text the centerpiece, and of course, we're in the business of producing those core texts. So it's very exciting to see new possibilities for those core texts to be used beyond just the straight, linear narrative reading experience, which is what we've banked on up until this time.
So that's really exciting. I would actually be excited to see the tool eventually be able to work across the whole JSTOR corpus, to ask questions of, as you just referred to, that large body of content. That would be really cool. I don't know how many books and journal articles are in JSTOR. Like, lots.
Yeah, thousands. 150,000, right? So, to have that kind of Socratic interface with that. OK, so it's not good enough yet. The two things that I would like to see emphasized going forward, and they're interrelated, are good usage being reported back to publishers, tied to remuneration models that reward authors.
These are super hard things to do, and I don't mean to complain, but the early data that you shared is very interesting. For example, today I learned that people have spent 4.7 million seconds looking at my content. I do not know what to do with that information, but it's really exciting to me; I'm just imagining a bar chart where the y-axis goes crazy. To me, that goes hand in hand with all of this, because I operate in a university environment where accountability for every dollar being spent is super important.
The people ahead of me in line getting money out of the provost have dashboards and databases and usage reports, and I assume the librarians are the same way. To get money out of those very tight wallets, I think we have to show usage and impact. So it's good to see that you're already doing it, but I think we're going to need a lot more of it.
So, Beth, before I cover a couple of points, do you want to respond to what John was saying, hopefully about the ability to search across maybe the whole platform? I heard some nods and yeses that that would be something you'd want to see, so it might be good to cover that. Yeah, we actually have it in development currently. There's a handful of search-related capabilities that we're building.
One is a very similar research tool that sits alongside your search results and allows you, not to ask a question and get an answer, but rather to inspect the contents of the results, to zero in on which items are the ones you're looking for. So you can ask questions and get: this article talks about that concept this way, this one talks about it in this other way, with direct links to those portions of the documents.
And then we're just starting to think through the idea of an agent, agentic AI, which is honestly a little bit scary. We don't want our system to do all of the work for the users, but how might we use that kind of capability in a way that is in alignment with our principles? So hopefully that will also be coming soon. I'm John Lenahan with ITHAKA, and I'll cover a couple of points about publisher involvement.
I work with an editorial team on the publishing of content and help manage the relationships with our publishers. One of the key things when the research tool was first coming out was how accurately it was responding to the questions that students would be asking of the content. So we spent a lot of time on the training side within our own capabilities: flagging, for example, that certain answers didn't feel like a good representation of an article.
There was a lot of training and a lot of interaction across JSTOR, the librarians, and the editorial team to get it to the place where we see it today. On top of that, we spent a lot of time interviewing publishers, because in the end it's their content, the publishers are representing authors, and authors have concerns. It's a newer technology, and we need to ensure it's not creating a derivative of what already exists on the platform.
So we spent a good amount of time asking publishers where their concerns are, giving them the ability to come into the tool itself to provide input, and letting them feel that they were part of building the tool from the start. I think that helped us: when surveyed at the end, 95% felt that we were taking the right approach in how we were bringing in a new technology, incorporating their concerns and what they wanted to see in the tool.
At the same time, we knew that there would be some authors, some publishers, and some content that might not be included, so we provide the flexibility in the tool to adjust for that. So I feel like we gathered good feedback from the publishers that are engaged with their content on JSTOR and the tools that we have on this end. Beth, I'm not sure if there's any last part; do you want to wrap up?
And we probably have a few minutes to see if there are some questions from the audience as well. OK, I just wanted to add some observations from a different direction. One of my responsibilities at Duke University Press is our digital content platforms, so in addition to having a lot of material on JSTOR, we also have our own platform.
I've at times been really deeply involved with user experience on research platforms, and one of the biggest challenges we have is that we use our authors and our editors as stand-ins for scholars and researchers. And they are, you may know this, not the best narrators of the researcher experience when they have their author cap on.
Beth was able to share a little snapshot of some of the queries, and just looking at that limited list of about 80 different questions that were asked, it's immediately clear that if we had access to this kind of data as a participating publisher, we would be gaining insights into the real ways our content is being used and what people are looking for when they engage with one of our articles or chapters, not how our authors think readers are going to engage with their book or their chapter.
And it's just a wide variety of stuff. I think this will also give us insights into course adoption, which is hard to see, since course adoption has moved out of course packs and course-pack licensing, which we get all that good reporting on, and into library holdings and digital course packs. But it's clear from this data that some of this use is being driven by instructor assignments, and just to have insight into that is something that COUNTER usage certainly doesn't give us.
And web analytics, Google Analytics, doesn't give it to us either. So I'm really excited about that and hope that's an aspect JSTOR develops: how can we gain insights to guide our own understanding of how we should be selecting, shaping, and putting our content out? That's great. Just building on that a little bit: one of the most exciting things that JSTOR is doing at the moment is around sustainable models for open access, books especially.
There is now quite a lot of open access content on JSTOR, so when we're looking at these uses, we're also seeing people who are not necessarily academics or students. What they want to do is fascinating, because of that question with open access, where we are at the moment, which is: OK, we've removed the barrier of price, but have we actually done the extra bit of getting our content into a form that is actually useful beyond the academy?
So it's really exciting to see this natural language querying. It will also be exciting, I think, to really lean into other ways of visualizing information; I'm hoping that maybe one could query to get a visualization of an article as well. All right, so there we go. We heard a lot of great discussion here on our product development.
We are always looking for this kind of feedback and reaction to what we're doing; it helps us to improve the tool and make it more available and valuable to all constituents. I love this idea of using the conversation data, and we can find ways to surface that. It looks like we have five minutes left, so we can entertain some questions.
Hi, I'm Rachel, and I have a question about how you selected the universities you piloted with. I think it's really succinctly conveyed in your statement about how we bring this outside the academy. From my perspective, as a librarian who serves almost entirely first-generation and disenfranchised groups, their academy is not the same as Michigan's or Duke's academy.
In your product testing, did you make it a point to include more diverse universities, and did you also include laypeople from outside the academy? Yeah, so we have gone through many stages of our rollout, starting with a very small group of 14 institutions. They were selected for size and geographic location.
We had one community college and one high school, so we went for a diverse set within that very small group. We then rolled it out to another 300 or so, and we did that by volunteers: we put out a call asking who wants to test this out, and we took everybody who volunteered within that window. And then the next set, which has gotten us up to this 1,700, was not selective:
we took portions of our participating institutions and turned them on in phases, and we will be finishing that up later in the summer. So we did go for diversity in many different ways as we did that. If you're interested and it's not on for your institution, we can just turn that on; come up and talk to me. And just one other point, because I want to make sure: were you asking from the publisher or content contributor side, or the user side, as a library user?
The user side. OK, perfect. I just wanted to make sure, because on the publisher and content contribution side of that beta, or that pilot, when we were talking about bringing publishers in, we looked at commercial publishers, a university press, a society. We looked at those that produce research reports, and also at the scholar side of publishing the content, making sure we had a good, diverse set of voices in there too.
So they understood, from their authors' view, that we were getting that right. Hi, my question is also somewhat from the user side. JSTOR has a wonderful archive of literature from many decades, including important transitions in intellectual thought, especially in the social sciences. So I'm wondering, as you were looking at these plain language synopsis options, whether you are planning deeper investigation into how the AI handles the shift in concepts and the language of prior generations versus the language of the present. The literature about intellectual disability pre-1960s is not the same thing at all as the literature about intellectual disability today. Great question. So, we have done evaluation of the tool and the responses that it gives using experts in the field.
We had one social science person, and a few different people, assessing and rating the responses, and we used their assessments to build an automated evaluation tool. That only covers a little bit, though, not necessarily what you're talking about. I think there is a lot of room for us to do more of that kind of discipline-specific evaluation, and we would love to do that.
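One plausible shape for that expert-calibrated automated evaluation is sketched below with toy data. This is an assumption about the general technique, not ITHAKA's pipeline; the example responses, labels, and features are all invented for illustration.

    # Hypothetical sketch of turning expert ratings into an automated
    # evaluator: subject experts score a sample of tool responses, and
    # those scores train a cheap model that can flag weak responses at
    # scale. Examples and features are invented for illustration.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    expert_rated = [
        ("The article argues X, citing the 1962 survey data.", 1),  # rated good
        ("This article is about many interesting topics.", 0),      # rated vague
        ("The author links policy Y to outcome Z in section 3.", 1),
        ("It discusses things related to the question.", 0),
    ]
    texts, labels = zip(*expert_rated)

    vectorizer = TfidfVectorizer().fit(texts)
    judge = LogisticRegression().fit(vectorizer.transform(texts), labels)

    def auto_eval(response: str) -> float:
        """Estimated probability that the expert raters would approve."""
        return judge.predict_proba(vectorizer.transform([response]))[0, 1]

As the speaker notes, a rater pool this thin covers only a little; discipline-specific shifts in language, like the intellectual disability example, would need experts from each field in the calibration set.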
Any other questions? I just wanted to add something, because that's a really provocative idea: we're the first generation of publishers where nothing's going to go out of print. So the temporal context of how we write matters. I publish a lot on race, and I can't go back and reprint books that were published in the 40s and 50s from UNC Press, because it would be intellectually offensive to do that.
And yet I'm publishing books now that are going to be offensive to people 40 years from now. So that's a really interesting thing to have, almost like a metadata field that carries the context about language. I appreciate you raising that. Yeah, go ahead. Hi, I was wondering if you could speak to how you're planning to communicate to users and customers, or how you already have,
how this product is really different from what people might be used to with AI: how it's a bit more responsible, or really just how it's different, I guess. Yeah, so: how are we going to get people to understand the choices that we've made? Like I've said, we've done a ton of engagement within the community, and so one hope is that the institutions we've worked with in these early stages also share this information.
We've been working on case studies and things like that, which you can see on our blog; JSTOR has a blog section on our About site, and we're publishing examples of that kind of engagement there. We have a few webinars that we're starting as we roll it out further and further. So, a whole variety of ways. But you're right, this is one of the big things that we need to be able to do, because expectations are set outside of the JSTOR platform.
So we know that's something we're up against in terms of conveying the value. Well, thank you, everybody. Our time is up. Feel free to come up if you've got further questions.