Name:
AI: The Scholarly Ally – From Open Science to Global Dissemination
Description:
AI: The Scholarly Ally – From Open Science to Global Dissemination
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/c70bdbf3-a6ab-422b-bff1-122ebc873943/thumbnails/c70bdbf3-a6ab-422b-bff1-122ebc873943.png
Duration:
T00H59M39S
Embed URL:
https://stream.cadmore.media/player/c70bdbf3-a6ab-422b-bff1-122ebc873943
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/c70bdbf3-a6ab-422b-bff1-122ebc873943/GMT20241001-200224_Recording.cutfile.20241002122522617_1920x.mp4?sv=2019-02-02&sr=c&sig=kCO0Bbbw51Vawz%2B4RRT0NzM1tKzCpMZsxy1FsEhy8oE%3D&st=2025-04-10T03%3A47%3A31Z&se=2025-04-10T05%3A52%3A31Z&sp=r
Upload Date:
2025-04-10T03:52:31.8515248Z
Transcript:
Language: EN.
Segment:0 .
Test we're good. I think so. Can you hear me? OK all right, everyone, welcome to the last panel of day one of SSP new directions. I, the scholarly ally from open science to global dissemination. So it's time to address the elephant in the room that's been underscoring all of our talks today.
Or maybe the threads of the rug beneath our feet is a little bit more apt analogy. We have an excellent panel. My name is Jamie Devereaux. I'm a senior general manager with Mary Ann Lieber, and I will serve as your moderator today. It's great to have such a diverse and engaged audience, both in this room and online. And we're all grappling with AI right now.
You know, how is it reshaping the scholarly publishing fields in transformative ways, helping researchers manage the vast amounts of information that's coming through things like streamlining literature review, process efficiency and peer review, enhancing editorial workflows, and of course, much more. So, as with any disruptive technology, it's bringing a lot of challenges as well. So things like trustworthiness, bias, transparency and of course, the role of human oversight.
So those are things we're going to get into in this discussion a little bit today. I'm fortunate to have here with me 3 experts that are bringing a diverse experience and viewpoints into this issue. I'll introduce them in turn and then turn it over to them to introduce themselves. The format of the panel. After the introductions will define a few terms that we're going to talk about.
Go over some questions and then open it up to the room. We also have a live poll. If you're able to get to it in the Zoom event, we'd love to hear your feedback on what AI you're using, if it's relevant to your role, and we'll bring that into the discussion as well. So please do chime in and co-create this event with us. So with that, please join me in welcoming dipika Bajaj. Over to you, Topeka, to introduce yourself and talk about your connection with AI.
Hello, everyone. Nice to have you all here. And this is the last session for the day. So thank you so much for staying back and listening to us. We've prepared well because Jamie did make us work very hard. My name is deepika Bajaj. I have been part of the academic publishing world as a role with atypon and John Wiley, and since 2019, I have actually moved very close to the AI ecosystem.
And I've been working with data and cloud and storage and AI tools. Right now, I'm working with lots of companies to deploy and develop AI solutions that will become commercialized in the next five years, and we are looking at some of the challenges around trust and what's the new responsible AI framework that we should be building. And then I have friends like letty who invited me here today to kind of share with you out of an industry perspective on what's happening in the space and how that might add some perspective for the conversation today.
So thank you very much for having me here. Excellent welcome to PCA. Over to you, Dan Berger from ACEs, bringing a deep experience in AI and publishing workflows. So over to you for a brief introduction in your connection with. Yes Hello. Thank you. Thanks, everybody here.
And Thanks to everybody online for joining us today and sticking with us. My name is Dan Berger with American Chemical Society and been with ACEs for almost a year and a half. Not quite, but I have been working in technology and publishing and scholarly publishing for a lot longer than that. 1915 years depends on how you count, but have a good amount of experience.
I come to the discussion today as a product manager. I'm not a researcher. I'm not a data scientist. I'm not a policy person. But I am somebody who builds solutions and find solutions for people and users. And that's really how I've been looking at the world of AI and all the technologies there. How can it help us solve some of the problems that we're facing?
Thanks great. Thanks, Dan, and over to you. Mary Ann Callahan, joining us from DCL. Hello, everyone. My name is Mary Ann. I'm with data conversion laboratory and I'm I come to the panel today as a service provider. We work with many publishers. Many of you in the room and AI related technologies are not new to DCL.
Anyone who have had the opportunity to talk to in the past couple of years, I probably bore and remind them that OCR is one of the original machine learning technologies. That's optical character recognition, and, you know, that's teaching a machine how to discern the difference between a zero and the letter O sounds. It's commoditized now.
And perhaps one day some of the GPT technology we're going to be talking about today that feels so new will also become just another tool in our box of what we use. But Yeah, that's my role. So so we our my company's solutions tend to involve a lot of AI related tools and technologies. And I think very soon we're going to go on to definitions, which I'm a real stickler for, because we keep saying I, I, I, and for the most part, I think we're talking about large language models and generative pre-trained texts in this conversation.
That's exactly right, Mariana. You'll notice I moved to the generative AI definition slide just in time for your comment. We often say AI and mean generative AI. In the purposes of this discussion, we are talking mostly about generative AI, a subset of AI that focuses on generating content, text, and images. Of course, chatgpt we all know has some more examples.
Illicit Trnka site I don't know if the site people are. It's a great tool. Scholar C and many more. And chatgpt I was looking back at when it started. It was only back in late 2022. So here we are, not even two years in. Since the advent and introduction of that, you could see the, you know, effects and changes that has had.
And we're not going to go through a bunch of definitions, but we'll also touch on the hallucination aspect. This meme is now quite dated, but it really does make the point. Well, in that, you know, it's an instance where AI generates outputs that are factually incorrect or nonsensical, often due to its limitations in its training data. So I think we're moving away from this more and more, at least in my experiences with AI, but still certainly something that's happening and something that will be coming up in our talk.
OK, so our first question to the panel and something that has been running through the course of our discussions today, is ensuring integrity and trust in AI generated content prior even to thinking about the logistics of putting it into publishing and the research process and across the board. So how are we seeing this done with respect to the integrity of and trustworthiness of the content?
And we'll start with you, Topeka. Great Thank you. Very, very broad question. So I'm going to just speak about looking at content and what AI technologies are doing from like three step process. Number one, AI solutions are looking at cost and sustainability, right? Like how can we reduce cost for the work that we do, but also at the same time doesn't take too much of processing or be sort of like adversarial to our environment.
Second is responsible AI, which is about having the ability to create technology or AI solutions that are safe for humanity. And I think finally, it is about looking for compliance. What kind of guardrails that we need to put in that will actually produce the outputs that we want. So those are the three things that have to go in order to do the integrity.
So I'm going to talk to you about the ecosystem of AI. There are some developers of AI solutions. And then we are the consumers of I, who are the big developers of AI. Microsoft meta. You know Google. These are the big developers. They have already embraced a framework for responsible AI, which actually checks for biases, and they have made themselves responsible for making technology that will meet those standards.
The real question, again, is if we give corporate America the idea to be responsible for the solution that they create, is that really responsible of us? So that's yet to be determined. On the consumer side, we look at the outputs of these solutions that are coming out a little bit like the hallucinations that Jamie touched on. So what is happening in that space right now?
AI technology, as you know, is a growing field, right? And now there are many companies that are coming in that are offering single API solutions that test for data leakage and hallucinations. I'll just give you a name of the few that are kind of growing in this space. There's one called the trust wise. That's a single API. They offer it on top of your LLM and you can.
Or your rag, which is your regenerated augmented generation of data. And it gives you a very fine confidence score of what a trustworthy content means. Right I'm sure you guys have heard about ithenticate and you've used about used a lot, but that's purely for Scholarly Publishing, and that can be a very effective. And these are some of the things that are happening in the space with respect to integrity.
But I do want you to know what is the genesis of integrity. It's all about data. We are feeding the training data to the LLM. If we have hallucination in our data, it's going to show up in the hallucination of the, you know, output. So just to give you an example, I was just sharing with my table over there, Amazon had an application tracking system that they had put in place.
And what happened was that they started to see more, and this was for engineering roles, and many of them saw that it was only accepting male white men for those roles and just eliminating all women, because there was no data that proved that women had done this role in the past. So they had to revisit this whole and change their entire database and do a complete data revamp to again get the outputs that we wanted.
So think about if we have bias in our data. It's showing up in the LM. So it's not that the technology is doing it to us sort of like that. It's also if we as a society have not created that content, then it's not there. Well, that's about it. And that was trust wise. I was your example.
Yeah that's a really excellent tool. So I will note that again for the group trust wise I and anything to add from that Dan or Marianne to your comments or just in general. And your choice just in general. Yeah I mean, I think that there's a lot here with regard to the integrity and how it's used. You know, you spoke about the integrity of actually building some of these lm models.
You know, I think, you know, I could talk a little bit about how integrity plays into how it actually gets used. After it becomes a model. And, you know, in, in the application, in scholarly publishing and other uses. You know, one of the first things that comes to mind for me in terms of integrity is transparency. Transparency of who is using some of these AI tools and what they're using them for.
I think transparency goes a long way. If you read something and you think, I wonder if this was written by chatgpt or some other tool like that. If you come out right up front and say, I used chatgpt or some other tool to help me write the first draft of my paper. That's going to go a long way for people to say, OK, great. Now let me read what else is here.
So that transparency piece for me becomes a really big and important piece. And right now we don't have a good way to do that. There's no there's nothing on the front end. You know, I'm speaking as a platform product manager. There's nothing on our platform that we have today that says this, this article, this piece of content has this aspect of AI that was used to develop it. And so I would really be interested to see how we evolve our standards so that we have things like this in place on our platforms that state.
These things, and not just on the platforms, but within the metadata itself, so that as these articles move and as the content moves from platform to indexer, that becomes part of the metadata so that that transparency is really important, you know, and as a side, I think it's really also important getting back to what you said earlier is that these LMS that we're talking about right now, they were built and trained on existing content and existing data.
They're very, very clever with language and they're really good at ideas, but they cannot generate new knowledge. And so that's an area. I think we need to really be really careful about how these things are used and how we're interpreting their use, and be really suspicious. If you see something, you know, that really is, has that's representing knowledge or new knowledge and making sure that actually comes from real research.
Yeah and I would add, Dan, when I there's a developer on my team I speak with a lot and he always reiterates GPTS are language prediction engines and nothing more. They do not reveal truth. And when we think about how these things were trained, yes. Now that that data are very old really when we think about it and truth changes. So just scientific.
The scientific methodology itself is an evolution. So we have to be able to have models or understand how these models work, so that we know that what comes out of them still does not represent scientific evidence, truthful information necessarily. And I think the other side of that is also like educating ourselves, educating our staff, educating our children.
One of the first things we did at DCL was create an AI acceptable usage policy. Part of that is we are an ISO certified company. We handle a lot of your data, not our data. So we have very strict constraints. And make sure our staff understand that external data cannot go into an lm. We we can't put in internal things into an external system. And so Yeah, I just think those sort of understanding how these things work to the best of our knowledge, because there is a lot of Black box development technology out there, but we have to kind of have a bigger picture with that.
Yes and we're going to move into that kind of trustworthy and verifiable data piece. I wonder if we could show the results of the first question poll just for a moment, so we can see what the folks in the room and online are saying. So the question our AI tools relevant to your current role, 70% said yes, 30% said no. Not able to see the short answer right now, but I think we probably know what many of those would entail.
LMS, things like ithenticate and grammarly and so on. So let's move to the next question. If you would like to post the next poll question to the group, we would like for you to keep engaged. That way if you'd like to respond. And also to the folks online. Maybe this needs to close before I can advance.
OK, so there's our next poll question. The next question for the group. I have it here too. And when we can get to the slide we'll get there. It's data governance piece. So building on what we've been talking about with this, you know, trustworthy and verifiable data. What you know goes in needs to be clean to come out, you know, in a trustworthy way.
So thinking about data governance, referring to a systematic approach in overseeing the management and utilization of AI data within an organization. Definition for you, Marianne, I know that you want to hear that. So let me Pose to the group what you're thinking about with things building on things like we're talking about with the security risk sensitive information going into AI.
And that data security and stewardship piece, so to speak. Over to you. OK, great. So I just want to also play the devil's advocate here. So that we have an interesting conversation. Right so just talking about the points that Dan and Marianne were pointed out, you know, we're talking about social component of AI, which is transparency, bias, fairness, you know, responsibility, accountability.
But there is a technical issue attached to it, right? It's a machine that we're talking about. So what happens is sometimes I feel that technology is something that we humans create. So we actually have the reins right now to control what the output can be. Very soon that situation will not exist. So let me just kind of give you an example. When people like yourselves complained about the hallucinations, now it's become important that the developers of these LMS are putting policies that are integrated into the LLM.
Right so, for example, if you think about now, let's look at the complete ecosystem of AI. Nvidia is the owner of the AI ecosystem. Why? because it has gpus that actually are optimized for these LMS. And hence they will be developing the next generation AI deployments. Right? so if you don't have a GPU you cannot develop an LLM. Right so there are many constraints as well.
So what I want to kind of hone in into is if you look at it right now, the conversation should be about our own knowledge of this technology that we are looking at. It is going to grow. And if we are looking at some of the societal issues. We need to be right now, having that conversation with these developers as business leaders to say we want these outcomes, not wait till it's done and then kind of catch up, right?
That's what I want to kind of develop. Now I want to touch on the question on the frameworks. So there are many societies that have come in. One is specifically for the research community like yourself, which is called the Center of AI safety. They are creating the solutions that are deployable for scholarly publishers. So something to look out for. I also want to talk about some of the indexing work that we all talked about.
Part of our content creation. Some scholarly publishers are already using AI tools and they're deployed at large, one being Scopus and the other being PubMed. They are already in with this solution, and they're taking advantage of what AI brings. One of the things I heard on our conversation, Thanks to Nikola, was cost was a big deal. You know, scholarly publishers are looking to bring down costs.
So indexing is going to be something that is definitely helpful in bringing some costs down because you don't need a human in that picture, but you need a human in the loop to check the output. Right? so if you can just think about it from that perspective, I think those are some of the frameworks. I also want to talk about now, LMS are refusing outputs if they do not match their schema that we have programmed.
So let's say if there is a false negative that shows up, the LMS are being programmed right now with some technology and researchers who are supporting, like the work here that refuse that output as well. So there's a lot of growth happening. I don't know the end picture. I'm just telling you the story of how things are evolved. Yeah, I have some notes on this and I'll get to them in a second.
But something you said really brought up a question for me that I think we do need to address, and that is this, this notion of cybersecurity and LMS. You know, I think that all of these safeguards that you're talking about are great until they get hacked. And they're going to get hacked. So this is a little bit off topic I apologize, but but if you'll sort of if you'll let us run on this one for a little bit.
And if you have something to share, I'd really appreciate that because I think it's something that we're all going to have to understand. See, I knew Dan will take it to another level. So he has done that. So very well put, Dan. Cyber security. OK let me just explain you how cyber security attacks happen so that you understand how simple it is to actually hack an LLM, you know, prompt engineering.
If I tell the system to give me a response, which is kind of like attacking, it will start it will start the cybersecurity attack, right? So it's very easy now to have these kind of attacks. Then it was before. Right so he brings up a very valid question here. Now what are the safeguards around cyber security. Right cyber security is now CIOs in big corporations cannot sleep.
If they are held against cyber security attacks. They will be like out of a job next day. I mean, that's just the nature of the game. So, you know, this is a very, very important issue. So in response to that, the community that we are now developing is creating a lot of elements about how to manage, you know, data into a stored and a secured environment. So you are creating your own dedicated environments that are not accessible outside.
So let's say you have your own GPU, your own LLM that's positioned and on top of it, your own application that is not taking advantage of any of the, you know, outside non-proprietary software. So you're buying those licenses. That's what the it industry has done to kind of done in the past that they give you your own, you know, instance versus trying to deploy anything that's outside of that framework.
And so let's say if you were working on a Microsoft Azure platform, then it gives you that security. But you cannot guarantee that on OpenAI. Now that that's your decision, whether you want to take an OpenAI platform. But think about it, there are certain industries that cannot make a mistake. Finance, for example, is the biggest use case of your claim. Processing is all be done by AI.
In the next two years. Nothing will be done by humans, right? Like claim processing is completely now AI function. Then secondly, in health is becoming a great like all the radiology images across the world are right now going through LMS and then any kind of patient outcomes are being decided by them. So I wanted to give you examples of the secured LMS while, you know, you know, talking about it because I wrote it down.
So there's a rad LLM which is radiology language model. And then there is Med langs GPT which is medical imaging generative model. So these are secured, closed, fine tuned for that particular use case. Just to give you an example. But yes, right now there are many threats. But I'm giving you some examples of where we are at right now. Thanks Thanks.
And I'm sure those closed systems can still be hacked, but we'll go down that path later on, maybe after a drink or two. Right once they're hacked. You know, getting back to the idea of governance here. But, you know, I think that there. Again, there's a couple things, couple thoughts that come to mind in terms of governance. You know, one is politically, especially here in the US, we're really averse to regulations, especially tech regulations.
You can look to what the federal government and Congress, the executive branch, have tried to do with social media and really not been very successful at it. So, you know, I'm not sure where this goes in the US in terms of governance. I think probably Europe will be able to get more done. But it's but we'll see what happens there and how that gets received. In terms of institutionally, you know, institutions as we've heard today are kind of all over the place.
Some have very sophisticated policies, others have very loose policies, and some don't have any policies at all. And and it becomes difficult to, to operate in a world where there are, there are no standards and there's that word standard. And so I think that that standards that will emerge from the industry are, are going to be the most, the most useful things, and probably the thing we can count on the most right now.
Standards, though, tend to be sort of looking backwards. We sort of wait and see how things go, and then we develop standards based on that versus kind of this sort of governance that's sort of based on looking forward. Yeah And I'll just add in, I do think there's something we do very well in our industry and that's develop standards. And probably a lot of people sitting in this room are also involved with niso.
And that's a great outlet to each of us individually can personally get involved with helping to support, to identify and work toward the creation of a standard. And just to comment about other countries coming up with regulations, I think that the US can still benefit even if they originate abroad, for example, GDPR. I was speaking with Nicola at our table. You know, I do marketing in the US, but I am GDPR compliant because I market to a global audience.
So I think that same applicability would take place with data governance with AI. A couple of standards that scholarly publishing is already working on. ACM has artificial intelligence standards that they have shared, and we are all supposed to follow them. And NIST and ieee. Ieee is done absolutely phenomenal work in putting those standards out there.
So something to consider as you are working on it. And EU AI act is a big, big framework already there. And Governor Newsom in California just vetoed the first Bill that was for the deployment of AI technology. So obviously, we in the US are still trying to figure out how to standardize it. That just happened yesterday, I believe, or day before yesterday.
Yeah so I look forward to international governance framework perhaps next year when we come back for new direction. That'll be next year. We'll turn to our British or like the European friends over there, like we'll take their lead on that. That's right. And before I get into this next point, I wonder if we could show the poll of the trustworthiness question if that's not too difficult.
Wow almost half and half. So do you trust ai? Yes do you trust AI as part of the scholarly publication process? Almost divided down the middle between yes and no. So that's really fascinating. I was expecting it to be different. We both do. And we both don't trust AI.
So that's maybe after we wrap with our questions, we can hear people come to the mic with differing opinions. If we can make that work, because that would be really interesting. Thank you so much. So if you could close that, please. And move to the next poll question, we really appreciate that. So we've been talking about data integrity integrity of research frameworks.
So now we're going to bring it back to what we do in this room within scholarly publishing. So how and this is another broad question that we can kind of follow. Where this leads is how are we as a scholarly publishing community being transformed because of Gen AI tools right now? And then we can follow up with that a little bit more. But Dan, would you like to kick this one off?
Yeah, sure. I can get us started with regard to how AI or generative AI are transforming the scholarly publishing industry. I think about the, the, the ways in sort of three, three areas. One is how it's transforming or affecting publishers. Another is how it's affecting researchers. And the third is how it's affecting kind of everybody else, everybody in other industries, if you will.
For publishers, I think that the biggest thing that I've seen so far, and this is so far, is really inefficiency. The availability of AI tools has really kind of democratized efficiency, if you will, where some of these tools were available and have been available for a while. If you weren't a computer scientist or a data scientist, you really couldn't use them.
Or if you didn't, if you didn't know and were best friends with one, they really weren't available to you. But now these tools are available to everybody. A couple bucks a month or there's free versions. So all these, these tools that can really help us with efficiency are available to everybody doing any piece of the any piece of the life cycle here. So I think that that's a big deal.
These tools have also allowed us to build new products and features. And we're still experimenting with what are meaningful products and features versus what are kind of fun things for about five minutes and then nobody really cares anymore. And we've already seen I've already seen a handful of those. So I think that that, you know, where we go with, with building products and features is really going to be telling.
I think we'll probably at some point get a little bit into the conversation of how these things are affecting search and search tools. Some of the biggest challenges, I think, for scholarly publishers with regard to these really goes into sort of two areas. So, you know, one of them is just this, this notion of now we have these tools that can help us generate all this content.
And now we've got all this content. And, you know, as the product manager, I hear a lot from researchers, we reach out to researchers a lot and ask them what their biggest pain points are. What kind of features they want to see, what are their stumbling blocks? And the thing that I constantly hear is, well, there's just so much content.
I can't I can't figure out how to find what I want to find. I can't figure out how to sift through this. And if you guys can help me with some tools, that would be really helpful. So I think that that, you know, AI is not the only reason that there's a proliferation of content right now. But it certainly is part of the reason, one of the other big challenges.
I think that we're going to have to figure out and face is licensing the licensing models for content. Now, both I and subscription really don't take into account the use of the content in, in training models. And there's a, there's just so many questions. We could spend an hour at least on that topic right there. I'll leave it at that. But but licensing and usage is going to be a big, big challenge that we'll have to figure out.
The next bucket of uses is researchers. And this is something I just don't know a whole lot about. And I'd really like to learn more about how researchers are using AI, and not just in those efficiency reasons. You know, help me draft a first. Help me write a first draft to my paper. But how are researchers using it to analyze, to manipulate their data? How are they using it to build models?
How are they using it to figure out how and what questions to ask in the research? I don't know a whole lot about that, so I'd really be interested to learn more about that. But I think as publishers, we're going to have to know what researchers are doing, because we have to provide the tools for them. And then in the last bucket of other, other industries, you know, I think that the content like I just talked about licensing and training, you know, that's really valuable content, especially if you can translate it for other uses.
And how is that going to be used? And what are other industries going to want to do with the content? There's going to be a lot of pressure for, for the use of, of that in training. I think that the, the, the example that comes to mind first is, of course, medical information and medical research. If you've ever had, you know, had a stomach ache or a headache and looked on WebMD and then kind of left after you almost had a heart attack based on what you read.
That's not a really good way to, to sort of to, to, to get some help. But if you were to just ask a chat bot, hey, I have a headache and here's a few other things and get a more reasonable response. You know, that could be a real use case for some of that information. And so that's, that's where I think some of this other content is going to go to is some of that, that top of funnel kind of research and, and other industries there.
I feel so sorry for the licensing people because as much as we're dealing with things. Boy, that's a big that's a big problem. There's another layer that we're seeing at my company with the proliferation of some of these tools. There are issues with scale and quantity. We're seeing it like with paper Mills. So, so many more papers being submitted because maybe they're, you know, being written by these GPTS.
But then other tools like, like ithenticate is it is an example in which we're working with a, a customer authenticate is authenticate is such a great tool. But then, you know, the results are just mountainous and a person can't go through some of the results that these AI related tools are providing. And it's like there's another layer of AI to deal with the quantity of results coming for tools.
If through some of these tools, if that makes sense. So that's a place where we're starting to take our take what we do well, you know, process, text, analyze, put structure, put weights in and then kind of make something a little more manageable for human. So I think that sort of industry, those other services are going to just grow. And when we come back next year, we might have other examples.
So that was just something else I wanted to just thinking about while you were talking. I have a follow up question. So I hear you both from two different perspectives. You want to use it for catching plagiarism, but you don't want to use it. Let people search like content per se, like so you want the customer to not use it and you want to use it, or versus you want researchers to use it, but you want to define the use case for which they want to use it.
Me personally, I don't have I don't I'm not a publisher. So I don't, I don't I don't I can't make a statement on that. But like me, me personally, I'm continuously redefining my own use cases for where a GPT fits in my world. I would never use it to try to find the answer to a question unless it were. I knew that it was a very dedicated GPT, and I don't know if there's anyone here from Isa.
International society for automation. I've been following their product. They've released, I think it was this year. Called called MIMO. It's an LLM that's been educated on their content. So that's the kind of sauce that I would trust if I were a researcher. But I, I still would go to Google for a general, you know, query over a GPT.
Right but the way, the reason I'm asking that question is that the use case for AI is very different from what the searches, but right now, because it's one of the vastly trained models, if you had to look into information like you just wanted to do, what is the research on some Henry Vi, you know, you can go to chatgpt. It'll give you a timeline in less than 5 seconds versus if you look at Google, you'll have all these links and have to go through it.
Now who can say that Google is the right source as well? I'm not talking about the trust of the point being that it's designed to give you information in a very limited time frame without you having to search through that content. So I think that's getting into a different area, which is the, the, this issue of, of search and rag rag is, is for those who don't know, is an acronym for retrieval, augmented generation.
It's a sort of a newer approach to search that involves using traditional document based index based search. But instead of getting back a list of documents, you get a chatbot or an LLM to kind of interpret the results for you first so that you can do some sort of reading through it, as opposed to having to figure out which of the 10,000 results you got back is useful. It'll give you that kind of high level overview and summary of search and that's really what it is.
And it's really great for getting a timeline of Henry VIII or Henry Vi. It's also really great for figuring out how long to cook chickpeas in a pressure cooker. For things like that. But if you really want to dive into something into the research, you know, chemistry research or some sort of biomed research, you're not going to get a really good response from a rag.
You're still going to have to go combing through the documents to find what you want. So it's really good at that high level, high funnel, you know, top of funnel search. But it's not great when you get lower down the funnel, at least not right now. Then the way rag would be programmed for your use case would be that the data set of the chemistry journal will be tuned as.
So rag is retrieval from the entire database of the world. Or it could be websites. Anything that was published whatever. And augmented is augmented with your specific data set to generate a response in NLP, which is not natural language processing, in order to give you answer relevant to the search of the context. So if you put the chemistry database into it, chances are it will be pretty precise on the response as well.
But if it's not that, then it won't. But you still might have to go into the research to get all of the information unless you keep funneling it down. That's what it would do for you because it will augment their search information, looking at all the content that you have provided within the research context. So hopefully that will be the case. But Yeah, this is a new emerging.
It's called fine tuning and rag. It's like an emerging technology. And many people are moving in that direction. And it's become one of the most relevant ways for predicting accurate and relevant responses from chatgpt. I definitely think that's on the path to validation. The skeptic in me just doesn't trust it yet. And and coming back to budget too, you know, like some publishers can create these and others can't.
So like for I just want to get a show of hands. How many people are right now looking at AI for the like tools within your organizations like not from. Yeah go ahead. Please raise your hands. Yes there you go. Almost everyone. Wow so almost everyone.
We are two different ways. And how many of you think that this is the right time to invest in these tools? OK, so look, looks like this technology is going to get adapted pretty quickly from the show. Excellent and we're going to move on and dig in a little bit more. And then we're going to open it up to Q&A. I wonder if we might show the results of our next question if that's up there.
OK so the question was one common concern about AI is maintaining content integrity. Do you consider this a primary issue with adopting AI in scholarly publishing? 85% said yes. So I think that builds on our conversation and is quite telling. OK so you can go ahead and close that one out. Thank you so much.
This is this is two questions. We've been talking about AI being used in peer review. And Dan, you brought up a point about comments and concerns and questions from researchers. So I wonder if we want to dig into that a little bit more, or we're getting close to our time so we can move on to the call to action. So let me put it to the panel and see if there's Oh, and we have one more one more poll question for you.
So please come in and chime in. We'd love to hear your thoughts. But thinking about concerns and questions from researchers, the researcher side, the publisher side and across our community. So would anyone like to chime in on that before we move on to the call to action? Or do we feel that that's been covered?
I feel like it's been covered, but I've had so many conversations outside of this table that, you know, was it covered over there or was it covered during the roundtable or just in the past two years? Have we covered it? Yeah I mean, I think it's an interesting question. I've already exposed myself as not really knowing a whole lot about how researchers are using it, but wanting to learn more.
I know there's a lot of talk about how it could be used in peer review, and I do not feel qualified to speak on that, so I'd really be curious to see what others who are doing peer review or working with peer review feel about how its use in peer review. You know, how that could go forward and what it could mean for the peer review process. Obviously there's again, that efficiency thing. It could certainly I could certainly see how it could be used for increasing efficiency, but I'm not quite sure what the other concerns are.
And I'd rather hear from somebody working in peer review about that. Yeah is anybody working in peer review using an AI tool here? Is somebody exploring an AI tool? Please go ahead. Go to the microphone, please.
Just very briefly. I'm no expert, but the tools that we are using are ithenticate and figshare or kruphixx. Sorry and I think those tools have pretty highly sophisticated AI facilitated support behind them. So that's the legitimate extent of it. But the more I have questions from editors and editorial board members, et cetera, I get questions like when can I use chatgpt to do peer review?
And I say never, never, but not right now. But I know that that is the future. I'm sorry to say it, but I think that's exactly where you are. You are definitely a power user of AI tools. Do you think that it should be allowed to do peer review choosing. I'm just curious from your perspective. I think we talked about this a little bit earlier and we just had peer review week.
And a lot of the comments around peer review, the future of peer review, how to amplify peer review, support burnt out peer reviewers and researchers does point in the direction of finding solutions through AI. So it is. It is definitely a topic of consideration. I think we're moving in that direction. And whenever I present it to young researchers at conferences or even start talking about it, they are very much in support of finding a way to help them do better peer review.
And if we can find a if there are tools that are accepted by the community and authenticated, then I think that there's I think it's going to happen. I do. But, you know, my question is those are just those are efficiency tools. Right but what about the idea that in a panel of four peer reviewers, one of them is actually a chat bot, and maybe it's trained in its finely tuned, but you give it the paper and it gives you back a formatted peer review response.
A human can read it, but that that becomes one of the four entities in your peer review group. Again, I you know, I'm wondering if anybody's tried this, if it's, you know, what others might think about it. Hey so yes, Jay Patel with cactus. So we actually have launched an AI peer review assistant within our paper pal solution for authors, so authors can bring their manuscript in, or they can actually type their manuscript in paper pal.
And what they can do is they can choose to have our AI critique their paper. So it checks for language, grammar, some technical stuff, but then also give them preset prompts that they can click on to get a review of their paper. Or they can bring their own prompt. So we already started doing that. It's fairly new, but as we collect data, I can certainly share that.
That'd be awesome. That's great. Thank you. So that's the preflight question for you is, how do we know that the peer review is done with integrity like it is, the output? How do you propose your solution will help us with that. Yeah I mean it's still early days. So we're waiting for feedback from the authors that actually use it to see how useful it was for them and how accurate it was.
But like I said, it's fairly new since we released it. So as we collect more feedback from the authors that use it, we'll certainly share that. So the more the feedback he gets and he trains his model, it will change to the state of what the correction it needs, so it will get more and more accurate over a period of time. So feedback is a big important loop of training these models as probably. And so that would essentially be the most important thing in order to get the better outcome I think too, and I've seen that tool from cactus.
It's really cool. And and I think there's also going to be or is there's a separation between sort of mechanical aspects of the manuscript and then sort of the developmental or the science, the research aspects. And, you know, I think back to the late 80s, early 90s of copy editing, similar parallel, like when I was a copy editor, I couldn't imagine not having a red pen and marking up a piece of paper.
And, you know, now, when I see some of the manuscript preflight or the pre-checks, it's certainly better than I ever was as a copy editor. So separating some things that we know that the, the engines and the, the AI technologies can do well is, you know, that's where we scholarly publishing industry, I think have, have always sort of excelled. And I want to be cognizant of time.
We have about eight minutes left. So do you have a question. Please ask. And then we'll go to a closing from the group. And then we'll wrap up with our last poll question if any other questions in the room. Sorry, I have more of a statement please. Heather kotula access innovations I am married to a researcher who authors papers and peer reviews papers and consumes published papers.
And Jay, what you got for paper pal? You know, doing peer review and stuff like that. It can check all the mechanical things like Marianne was saying, but it can't check the validity of the science. And I think that's a huge part of what human peer reviewers do, is say, you know, the way you modeled your experiment was stupid or something like that. So do they do that experiment and then make the validation or that's a hypothesis.
How do humans validate that experiment? Do they do it in their lab and then make sure that it is accurate sometimes. I mean, can research be reproduced? That's a huge thing for all of us, right? And every time I say that, I think back to the joke about the Journal of irreproducible research. But I don't know where that came from originally. Sometimes they reproduce it, sometimes they just look at the parameters and, you know, based on their expertise.
I would never do it that way, you know? OK, they did it this way. I would never do it that way. Did it work? You know, or was just it just off the rails? Really? good point. I mean, I imagine that someday I will be able to rerun that science that, that, that, that research and check its validity.
Yeah, it's actually technically possible. The, the definition of it will come from your community of how you want it to, you know, perform that experiment. I mean, we're going to go to space using it. So that's definitely something that says something. So here we are at the end of the panel. So we talked about a call to action. You know we're discussing a lot of questions.
We discussed this topic around in our roundtable. So how do we move from questions to solutions. We thought a little bit about frameworks US and globally. So let me put that to each of you in turn. And then we'll open it up for any group questions. But Heather has a question, so. Oh, Yeah. Oh I'm sorry I don't want to disrupt the flow, but just kind of apropos to a lot of the ways these conversations have gone.
I'm running Q&A session at Frankfurt at STM this year, and the researcher who's on that panel is from the University of Mannheim. And he said something which scared us all, which is we are now in this generation where we're bridging the before times and the now times and like back to, you know, the fax machine and the stuff. But a lot of the comments and responses have said you can look at that and using your expertise, know that it's just not right.
Or, you know, you think that they're not using it, you suspect they are or whatever. But he's worried that the researchers that he sees that are growing up using these tools and not learning how to do things properly is like a loaded word, but they don't know how to sort of do it the hard way. Right? that they're not going to have the expertise to actually look at it and say.
Something's wrong with this. And that is worrisome because I feel like we've all got kids that are in, you know, coming up through the system and where they're not, you know, learning things and where they are. But if we keep the human in the loop. But what if the human doesn't have the expertise to like, you know, raise that flag? It's not a question, but how are we going to cope with that?
Hopefully not with like another chatbot, because this seems like we're just piling on. Anyway, that's my rambling. Good good point. And Thank you for the word of caution, Heather. We have to keep the what do you call like the intelligence of humankind in the past and kind of transpose it into the future kind of thing. So I agree with that.
That won't go away. However, there will be new ways of doing the same things that we've done in the past, like we've transformed from internet generation. This will become another tool to transform to the AI transformation, right? So let's say I want to just for the last question, how many people think that you don't use ai?
That must have been all the online people. Like 30% or something that they didn't use AI. You don't use AI. But but they know you. So that's the problem. So I think there is obviously the frameworks are very important. And that's why, one of the questions when we were talking about this panel was, you know, we talk about a lot of areas of concern, but we don't know how to move forward.
So, you know, Jamie was very kind to kind of include this question as to and this is actually to you guys, not to us per se. We want to hear from you. Same questions, but how do we move forward? What's the next best step that can actually take us one place closer to adopting it without fear. Yes Hi, I'm Doctor Sarah Wright from the American Veterinary Medical Association, so I'm an associate editor, but I'm also a researcher in the aquatic animal health space.
And as a scientist, I make evidence based decisions. And I also do that to for our journals. So if there were data to show how I could be beneficial in processes, I think that would go a long way from the researcher perspective. Now peer review is a whole different subject. I'm not going to approach that, but I do think that the peer and peer review is there for a reason, and there really are components of that you unfortunately cannot duplicate or replace.
But I do think that it will become integrated at some point. But again, the peer and peer review is there for a reason. Really well said. And we're coming up on our final 2 minutes. So I wanted to check if there was any online questions that we'd like to share. Would that be Jackie or susan? None OK, great. So please do to the open ended poll we could share.
Yes so the question, what was the question? The question was, how do you foresee AI changing the role of human editors and peer reviewers in the scholarly publishing process over the next five years? Some of the responses received were I will help do a first pass at editing or summarizing lengthy text. We're going to have to be a lot sharper and read more closely to detect undisclosed use.
Use of AI in manuscripts and reviews. I hope it doesn't change the role of human editors too much, but I do think AI will eventually take over the peer reviewer role, flagging data inconsistency concerns. Hopefully I will save editors and reviewers time and performing repetitive tasks pattern recognition more quickly than humans can do alone.
I should help the human editors focus on the content aspect of the process. I don't know if you want to respond to any of those comments, or if anybody else would like to add on them. I think great comments. I agree with, like we've talked about it. Like the text summarization and indexing and metadata enhancement will definitely be useful with AI.
So thank you everyone online. I think they also stayed back and didn't sleep on the session. So great comments. Thank you so much. Thank you to our panel and Thank you to the room for participating. And Thank you everyone for your participation in today's seminar.
For those in the room, if you're thirsty and interested in continuing the conversation, please join us at the mission DuPont, which is just a couple blocks down. The address is 1606 20th Street and no longer raining, so there's no excuse. Come join us. This is not a sponsored social hour. This is a BYOD buy your own beverage. So really, do still hope you'll join us those online.
Thank you so much for being part of this conversation. And Thank you for everyone in the room who included folks who are joining remotely. Thanks again to our speakers for your time, your expertise, and joining us today. The engaging presentations and conversations. Thanks again, of course, to our working group to SSP staff Susan, Jackie, Melanie and the Au tech Wizards, Bob and the staff for Au.