Name:
The Role of GenAI in Peer Review - Balancing Innovation and Integrity
Description:
The Role of GenAI in Peer Review - Balancing Innovation and Integrity
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/bfb819b1-f555-49d4-bc08-3d50d74d9f0e/videoscrubberimages/Scrubber_1.jpg
Duration:
T00H26M50S
Embed URL:
https://stream.cadmore.media/player/bfb819b1-f555-49d4-bc08-3d50d74d9f0e
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/bfb819b1-f555-49d4-bc08-3d50d74d9f0e/SSP2025 5-28 1245 - Industry Breakout - Enago and Charleswor.mp4?sv=2019-02-02&sr=c&sig=x7Lh6tlZxJPZJSvXJRQ%2FBKaU%2B2eIZTFGuey79tixbr8%3D&st=2025-06-15T19%3A52%3A54Z&se=2025-06-15T21%3A57%3A54Z&sp=r
Upload Date:
2025-06-05T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
OK, good afternoon, everyone. It's quarter to, so I'll make a start. So hi, everyone. My name is Mary Miskin. I am the Operations Director for Charlesworth. And I'm joined this afternoon by my colleague Tony O'Rourke, who's just coming back to the stage. Tony is the VP of Publisher Partnerships at Enago.
This is my first SSP conference and my first time presenting an industry breakout session. And I arrived from the UK at midnight last night, so if I'm struggling to make my brain work, please go easy on me. So today, Tony and I are going to talk to you about the role of AI in peer review, and also talk specifically about a product we have launched called the Enago AI Peer Review Assistant.
So as I mentioned, I work for Charlesworth and Tony works for Enago, but together the two companies have combined our technology expertise and are working jointly on new software products for the publishing market. You may be familiar with some of our existing software products. At Charlesworth, we have the Charlesworth Gateway, which is an author communications software solution that sends manuscript notifications in their native language to authors primarily in the Asia-Pacific market, using WeChat, LINE, Kakao and also WhatsApp.
And the Charlesworth Gateway is also up for an EPIC Award at the SSP conference this year, so fingers crossed for that. And then on the Enago side, we have the Trinka product, which is a language writing and editing tool for researchers. We have the Enago Reports product, which is a manuscript submission screening tool aimed at both authors and editorial office staff.
And then we have Enago Read, which is a manuscript review and literature summarization tool, and it's the Enago product that we have extended to create the Enago AI Peer Review Assistant. So, just looking back through the evolution of technology within peer review and what we've seen over the last 20 years: the mid-2000s is when we first started to see some basic automation for peer review.
So this was kind of rule-based, non-AI automation to speed up routine checks: things like keyword-based plagiarism checks, or email-based manuscript tracking and Excel sheets. And then from 2010 to 2015 is when we started to see the introduction of AI and ML tools. So here we've got classical machine learning for editorial decision support. This would be things like language editing and proofreading, readability scores and some basic reviewer matching algorithms.
And then late 2016 to 2020 is when we started to see some process efficiencies coming with NLP. This enabled things like automated technical checks with hybrid ML and rule-based systems, and generative NLP models for aiding author and editorial efficiency. And then from 2020 to the present day is when we started to see the emergence of GenAI and large language models for things like AI-generated content summaries, literature review, insight generation and peer review support.
And for those of you who are less familiar with some of the acronyms I've just given you there: I work with our AI teams on a day-to-day basis, and sometimes I have to go back and fact-check what I am talking about. So this is kind of a hierarchical summary of what the terms are and how they correspond to one another.
So ML is machine learning, and that sits at the top level under AI. And then under ML we have NLP, natural language processing. This is a subfield of ML that's focused specifically on processing and analyzing human language. And then within NLP we have generative AI, GenAI. This technology analyzes structured and unstructured data and summarizes key trends, providing a natural language explanation instead of raw data.
And then a subset of GenAI is the LLM, the large language model, and the key difference between the LLM and GenAI is that the LLM works just with text. It is a large language model, whereas GenAI will work with multiple different content formats: images, audio, code, et cetera. So then, also looking at some of the products you may be familiar with through that timeline of changes in technology, we put together this timeline of some of the key products we've seen emerge within editorial and peer review workflows.
So 2004 is when iThenticate was launched, the plagiarism detection tool we're all so familiar with. And then 2005: I've had to make this date up, because I could not actually find the genuine date when Manuscript Central, ScholarOne Manuscripts, was launched. But 2005 was when I could see the majority of press releases coming out from editorial offices starting to integrate their peer review workflows onto Manuscript Central.
2002? OK, I should have put that earlier in the timeline. But so, 2005-ish for me. I remember I started working in editorial in 2007 at Emerald Publishing, and that was when Emerald was just putting their journals onto Manuscript Central. Then in 2012 we saw the launch of Publons, which allows researchers to track and showcase their peer review and editorial contributions.
And that was acquired by Clarivate in 2017 and integrated into the Web of Science platform. 2018 was when scite was launched. Then in 2019 we have AuthorONE, which is what our Enago Reports product used to be called: manuscript screening and assessment software. And then, also in 2019, we have UNSILO technical checks being integrated into ScholarOne to assist with manuscript evaluation.
In 2020, Enago launched Trinka, which is our grammar checking and language enhancement tool, and around the same time Paperpal Preflight was also launched by Cactus. I also have Prophy launching in 2020, and then in 2025 we've got the Enago AI Peer Review Assistant. I'll pass it over to Tony.
Thanks, Mary. So I'm Tony O'Rourke, VP of Publisher Partnerships at Enago. This slide is just really summarizing the kinds of tools that are already out there, and helps to understand the context of what we're working towards with the Enago AI Peer Review Assistant. So tools already exist for manuscript summarization, for literature review, for meta reviews, for decision support. Research integrity and transparency is a really hot thing.
In fact, one of the things we're announcing this week is a service called Mark, which is about providing what we call the source of truth behind an article: how much of the article has been generated by AI, how much of the article has been cut and pasted from another source, how much of the article has been plagiarized. There'll be more on that later.
But ultimately, the tools are out there in a number of different places, and what we've been trying to do is bring these tools together in a single place to understand how AI tools can be leveraged in peer review. It's important to understand the different personas that exist within the publishing workflow, and effectively there are three personas.
You've got the novice, the kind of person who is making pretty much general-purpose use of AI. They've got a basic knowledge of AI, using large language models like ChatGPT or Gemini, and these reviewers don't do much to customize the models. They're pretty much taking the products as they are and using built-in capabilities, but what they are trying to do is work with a number of basic prompts. They want to understand how they can summarize sections, how they can clarify reviewer comments, for example, and, just scrolling down the screen, how they can draft responses, how they can improve language.
And these are the kind of typical simple prompts that you might expect an AI novice to use. The AI-literate persona is a researcher who wants to develop or adapt specialized AI tools for targeted peer review tasks, often integrating multiple models or customizing prompts for greater control and specificity. So, the use cases you can imagine.
One second. Things like automating structured review: the ability to create prompt chains that guide LLMs through journal-specific review criteria or targeted feedback; using AI to flag vague, unprofessional or incomplete comments, or even suggest improvements in the text; or even creating custom dashboards, building interfaces that aggregate AI-generated summaries, highlight potential weaknesses and assist within the decision-making process.
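[Editor's note: as a rough illustration of the prompt-chaining idea described above, here is a minimal sketch. The criteria, model name and OpenAI client usage are assumptions for illustration only, not part of the Enago product.]

```python
# Minimal sketch of a prompt chain that walks an LLM through
# journal-specific review criteria, one criterion at a time.
# The criteria, model name and client are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CRITERIA = [
    "Is the methodology described in enough detail to be reproducible?",
    "Are the conclusions supported by the data presented?",
    "Are key references to prior work included and discussed?",
]

def chained_review(manuscript_text: str) -> list[dict]:
    findings = []
    for criterion in CRITERIA:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": "You are assisting a peer reviewer. Answer only "
                            "from the manuscript text; say 'cannot determine' "
                            "if the text does not support an answer."},
                {"role": "user",
                 "content": f"Criterion: {criterion}\n\nManuscript:\n{manuscript_text}"},
            ],
        )
        findings.append({"criterion": criterion,
                         "assessment": response.choices[0].message.content})
    return findings
```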
And then you've got the AI agent creator. Now, I think six months ago when we were thinking about this presentation, this was very much the domain of the computer specialist. But now we're seeing this being used much more widely by non-computer specialists: somebody who's got a good grasp of computing and a good grasp of AI, but who may not necessarily see themselves as a computer specialist.
And these are usually agents or multi-agent systems that are acting semi-independently within the peer review process. They may be designed to handle quite specific or quite detailed end-to-end tasks, or might be used for specialized tasks, such as one for language, one for statistics or one for ethics. So again, the typical use cases we would imagine might be things like end-to-end review facilitation: agents that can analyze manuscripts, generate draft reviews, suggest reviewer matches and provide structured feedback with potentially minimal human intervention.
The amount of human intervention very much depends, of course, on the publisher and on the user. Another use case might be role-based and task-specific agents, where agents are assigned to distinct review roles: it could be ethical compliance, using the AI for statistical rigor, or even novelty assessment. But when we're building these tools, it's really, really important to understand who those users are and the expertise that those users have.
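[Editor's note: as a rough sketch of what such role-based agents could look like in code. The roles, prompts, model and client usage are assumptions for illustration, not Enago's implementation.]

```python
# Illustrative sketch of role-based review agents, each responsible for one
# aspect of the review (ethics, statistics, novelty). Roles, prompts and the
# model are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()

AGENT_ROLES = {
    "ethics": "Check the manuscript for ethical compliance issues "
              "(consent, approvals, data availability statements).",
    "statistics": "Assess the statistical rigor of the methods and results.",
    "novelty": "Assess how novel the contribution is relative to the "
               "related work the manuscript itself cites.",
}

def run_role_agents(manuscript_text: str) -> dict[str, str]:
    """Run each role agent over the manuscript and collect its report."""
    reports = {}
    for role, instruction in AGENT_ROLES.items():
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": instruction},
                {"role": "user", "content": manuscript_text},
            ],
        )
        reports[role] = response.choices[0].message.content
    return reports  # a human editor would still review these reports
```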
So I'll just spend a few minutes talking about the use of LLMs in peer review, some of the practical concerns, and how they affect journal policy. Obviously, a massive issue is data privacy and confidentiality. Uploading manuscripts to public LLMs like ChatGPT and Gemini, as we all know, comes with huge risks in terms of potential data breaches, impact on IP and lack of end-to-end encryption.
It also leads to a loss of control over data storage and possible deletion of important data. Uploaded content may be used to train models, and that risks exposure of unpublished work or identities appearing within the LLMs. There's another huge area where journal policies may be violated, where you have to take quite tight control: sharing manuscripts with public LLMs potentially breaches confidentiality.
And AI-generated content without disclosure may count as ghost authorship or plagiarism. I think the point is it comes with risks, but you have to ensure that those risks are being mitigated. So we've developed a peer review workspace to try and address many of these challenges: to create a secure environment for publishers to work in, to support the work of the reviewers, creating an integrated workspace that's scalable, efficient and rigorous, and providing AI assistance.
I'd also say it's important to say assistance, because it's AI with human, or human with AI; it depends really on the workflow that you prefer. It combines different aspects of the process workflow, whether it's manuscript evaluation, integrity checks, literature review, et cetera, recording reviewer actions to ensure transparency and full auditability throughout the process, and addressing ethical concerns.
Ensuring that there is a human in the loop as part of that process, and that the tools are fully customizable according to the requirements of the publisher. At which point, I'd like to hand back to Mary to give you a quick demo of the peer review assistant. OK, so what is the Enago AI Peer Review Assistant? Ultimately, it is a peer review workspace, so it's not a submission and peer review workflow system.
It is a standalone workspace that reviewers will use to complete the process of peer review, but it can be integrated into existing submission and peer review systems via API. So although the reviewer would get the review invitation through your existing systems, they would come to the Enago AI Peer Review Assistant to go through the process of reviewing the manuscript. The workspace uses our locally deployed LLMs, and we've allowed for either an AI-first or a human-first approach to completing the peer review.
And within our locally deployed, secure environment we have multiple LLMs available. So we've put in place a workflow through which multiple LLMs will verify the outputs of the previous LLM. This enables us to reduce hallucinations, ensure accuracy, and provide reviewers with a consensus response from multiple different LLMs. The workspace features a number of tools available to the peer reviewer. First, there's the Copilot paper querying.
This is fairly similar to ChatGPT, where you would ask a question and it will answer based upon the data available within the manuscript and the knowledge within the LLM. And then there's a structured peer review workspace, where reviewers work through the process of answering structured peer review questions. And all of this, as Tony has highlighted, is subject to complete human oversight at every step of the peer review.
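[Editor's note: a minimal sketch of the multi-LLM cross-verification idea described above. The model names, prompts and consensus step are assumptions for illustration, not the product's actual pipeline.]

```python
# Illustrative sketch of a cross-verification chain: each model reviews the
# previous model's answer, and a final call produces a consensus response.
# Model names and prompts are assumptions; locally deployed LLMs are assumed
# to be served behind an OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI()
MODELS = ["model-a", "model-b", "model-c"]  # stand-ins for locally deployed LLMs

def ask(model: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def consensus_answer(question: str, manuscript_text: str) -> str:
    answers = []
    previous = ""
    for model in MODELS:
        prompt = (f"Question: {question}\n\nManuscript:\n{manuscript_text}\n\n"
                  f"Previous model's answer (verify, correct or extend it):\n{previous}")
        previous = ask(model, prompt)
        answers.append(previous)
    # Final pass: merge the individual answers into one consensus response.
    merged = "\n---\n".join(answers)
    return ask(MODELS[0],
               "Combine these answers into a single consensus response, "
               f"flagging any points where they disagree:\n{merged}")
```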
For the publisher, we offer a complete analytics dashboard that allows you to track the work undertaken and the time that has been spent within the review workspace. This gives publishers peace of mind that the review has been completed with integrity. The metrics we can track include things like the time actively spent working on the review, the sections or pages of a manuscript that a reviewer has reviewed and annotated, any key concepts they've analyzed, the Copilot questions they may have asked, and the number and names of related papers that they may have explored in relation to the peer review. Does this work? Yes. So this is a screenshot of the peer review workspace. We can see on the left the paper that is under review. In the middle pane are the structured peer review questions.
And on the right is a specific question that is being answered by the AI. So I'll jump now into some pre-recorded screen demos of the product in action. So this is the first step we have in our structured peer review process. So you can see here that there's a hierarchical summary being created. So this is based on the structure of the manuscript under review.
It's not restricted to a predetermined format, so it will quickly and accurately summarize the relevant content from that paper. So this is the first step for the reviewer, enabling them to quickly understand the paper under review at a macro level. So now we're navigating to the structured peer review questions.
And you'll see here there's the ability to toggle between the AI-first and the human-first approach in the right-hand pane. What we're showing here is the AI first, and the question of whether the conclusions are supported by the data. So, you'll have seen it's gone a bit quicker than I am speaking there: in the process through which the question was answered, the LLMs went through three separate cycles of multiple different LLMs checking one another's responses.
And then what we can see in the top box is the consensus response from the three LLMs. And then there's the ability for the human to write their own additional review comments in the box below. So they can agree or disagree with the outputs of the AI and add their own additional review. It's worth pointing out here, in the middle, where you can see what we've called the SPR section. That's the structured peer review section.
These are questions that we have populated ourselves, based on standard structured peer review formats. They are fully customizable depending on what a publisher's or journal's specific peer review questions are. So now we're moving on to an example of the human-first workflow, and we have chosen the question of whether key references are included.
So you can see there the reviewer is able to look through the paper in the reading pane on the left-hand side. They can review the paper against that specific question, they can write their response, and then they've ticked 'validate by AI'. What's happening here is the AI has taken the text of what the reviewer has written, and it's analyzing that.
And it's going to state whether it agrees or disagrees with what the human reviewer has written. The AI will then add its additional review context, which the reviewer can read and agree or disagree with. We do have a further step in the UI that we're planning to implement, which is a final summary comment rather than just the agree, disagree and save review process.
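[Editor's note: a minimal sketch of what such a "validate by AI" step could look like behind the scenes. The prompt wording, model name and JSON output shape are assumptions, not taken from the product.]

```python
# Illustrative sketch of an AI validation pass over a reviewer's answer.
# The prompt wording, model name and JSON shape are assumptions only.
import json
from openai import OpenAI

client = OpenAI()

def validate_reviewer_answer(question: str, reviewer_answer: str,
                             manuscript_text: str) -> dict:
    """Ask an LLM whether it agrees with the human reviewer and why."""
    prompt = (
        "You are validating a peer reviewer's answer against the manuscript.\n"
        f"Question: {question}\n"
        f"Reviewer's answer: {reviewer_answer}\n\n"
        f"Manuscript:\n{manuscript_text}\n\n"
        'Reply as JSON: {"verdict": "agree" | "disagree", "context": "..."} '
        "where context adds evidence or points the reviewer may have missed."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```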
So moving on from the structured peer review form to look at some of the additional functionality within the workspace. So this is the Copilot paper querying space. So if the reviewer has any questions about the paper that they want to be answered quickly or want to have specific concepts explained to them, they can do this using the secure LLM chat functionality that you can see being worked through on the right here.
So we can see some of the questions that have been answered and the responses generated by the secure LLM that is plugged in at the back end. And then in addition to Copilot, we also have related papers functionality within the Explore section of the interface. So this feature allows the reviewer to quickly identify papers that are relevant to the paper under review.
The related papers are sourced from over 200 million academic papers that we have indexed within our own index of open data, and the suggested papers have been identified as being semantically relevant to the paper under review. The reviewer can also highlight specific sections of the manuscript and further refine the related papers that are being surfaced to them, asking just to look for papers related to a particular part of that paper.
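[Editor's note: to give a rough sense of what "semantically relevant" means in practice, here is a minimal sketch using an assumed sentence-embedding model and an in-memory set of candidates; it is not the product's actual retrieval stack.]

```python
# Illustrative sketch of semantic matching between a highlighted passage and
# candidate papers, using cosine similarity over sentence embeddings.
# The model name and in-memory "index" are assumptions for illustration.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def related_papers(query_text: str, candidate_abstracts: dict[str, str],
                   top_k: int = 5) -> list[tuple[str, float]]:
    """Rank candidate papers by semantic similarity to the query text."""
    titles = list(candidate_abstracts)
    query_emb = model.encode(query_text, convert_to_tensor=True)
    doc_embs = model.encode([candidate_abstracts[t] for t in titles],
                            convert_to_tensor=True)
    scores = util.cos_sim(query_emb, doc_embs)[0]
    ranked = sorted(zip(titles, scores.tolist()), key=lambda x: x[1], reverse=True)
    return ranked[:top_k]
```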
The reviewer can attach the related papers within the attachments folder, or they can also pin them to specific sections of the manuscript under review, if they want to remind themselves that a specific related paper was relevant to a specific section of the manuscript. Then finally, we're just showing the way in which the paper under review can be annotated and highlighted.
The reviewer can add specific notes to themselves as they're undertaking the review, or they can highlight sections of content and send it to Copilot for querying. And all of this is saved within the reviewing panel for the reviewer to come back to at a later stage when they're finalizing their review. So that is a quick overview of the product in action.
At the moment it is at prototype stage, and we are working with publishers, rolling out small-scale pilots with selected journals or editors, where we've got the ability to implement your own custom peer review questions and offer it to reviewers or editors as a standalone tool for them to use and feed back on as part of their reviewing and editing processes. So we're really keen to find more early adopters who are interested in working with us on pilots.
So do please get in touch with myself or Tony if you're interested in finding out more; we'd be happy to give a demo to you and your teams to give you more detail about the product. I think we've got six minutes left for questions. If anyone's got any questions, there's a mic in the middle of the floor there. Please say who you are and ask your question, and we'll try to answer it as well as we can.
Hi, I'm Elizabeth, I'm from Elsevier. This was really interesting. Can I ask a really silly question? What happens with the review? Is it just not stored at all, or is it deleted, or not? Not the article itself, but the review of it. That kind of depends on whether it's a publisher integration or a B2C use.
We also have a use case where a B2C user would just access this product independently, directly on Enago Read. So when it's directly on Enago Read and it's the reviewer independently completing the review, then they would permanently have their workspace available to them with their saved, completed reviews. Where it's a publisher integration, the reviewer will be given a link to the workspace for that specific review, and at the point when that review has been completed, after a certain amount of time, that link would become inoperable.
But we would expect the publisher to maintain the copy of the review reports; we would delete it after some period of time. Thank you. Any further questions? Hello, Milos from MDPI. So, nice presentation, really amazing product.
I'm coming from the tech side, so maybe a bit more of a techie question. From my experience, LLMs or GenAI chatbots are always tempted to agree with what we humans tell them, and here we see that in a peer review process they also need to disagree. Did you get any feedback from the end users about how that works exactly? Yeah, I would agree with your position there: on the whole, you do tend to see that
the AI wants to agree with what you have said. And to overcome that, we have carefully crafted the prompts that ultimately sit in the back end of each of those structured peer review questions. And in our testing, we have found that even if the AI says it agrees, it is being asked to add additional thoughts and insights, so it will still give more peer review feedback even if it ultimately agrees with what the human reviewer has said.
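[Editor's note: as a rough illustration of the kind of prompt wording that can counteract this agreement bias; purely illustrative, not the actual prompts behind the product.]

```python
# Illustrative anti-sycophancy prompt wording: even when the model agrees,
# it is instructed to add independent, critical observations of its own.
VALIDATION_PROMPT = """\
Assess the reviewer's answer strictly against the manuscript, not against
the reviewer's opinion. State clearly whether you agree or disagree, and
whichever you choose, you must add at least two independent observations
(evidence, limitations or missed points) that are not already in the
reviewer's answer. Do not simply restate or endorse the reviewer's wording.
"""
```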
If there are no further questions, thank you very much, everyone. I hope you found it useful, and we look forward to any private conversations you may wish to have. Thank you.