Name:
                                The Role of GenAI in Peer Review - Balancing Innovation and Integrity
                            
                            
                                Description:
                                The Role of GenAI in Peer Review - Balancing Innovation and Integrity
                            
                            
                                Thumbnail URL:
                                https://cadmoremediastorage.blob.core.windows.net/bfb819b1-f555-49d4-bc08-3d50d74d9f0e/videoscrubberimages/Scrubber_1.jpg
                            
                            
                                Duration:
                                T00H26M50S
                            
                            
                                Embed URL:
                                https://stream.cadmore.media/player/bfb819b1-f555-49d4-bc08-3d50d74d9f0e
                            
                            
                                Content URL:
                                https://cadmoreoriginalmedia.blob.core.windows.net/bfb819b1-f555-49d4-bc08-3d50d74d9f0e/SSP2025 5-28 1245 - Industry Breakout - Enago and Charleswor.mp4?sv=2019-02-02&sr=c&sig=x%2BU%2F5xLhcWuuHoO%2BH9fXRVQPxj%2BG5qpHc%2BaEv0qpD18%3D&st=2025-10-25T22%3A31%3A48Z&se=2025-10-26T00%3A36%3A48Z&sp=r
                            
                            
                                Upload Date:
                                2025-06-09T00:00:00.0000000
                            
                            
                                Transcript:
                                Language: EN. 
Segment:0 . 
 OK, good afternoon, everyone.  It's quarter to, so I'll make a start.  So hi everyone.  My name is Mary Miskin.  I am the operations director for Charlesworth.  And I'm joined this afternoon by my colleague Tony O'Rourke, who's just coming back to the stage.  Tony is the VP of publisher partnerships at Enago.   
This is my first SSP conference and my first time presenting an industry breakout session.  And I arrived from the UK at midnight last night.  So if I'm struggling to make my brain work, please go easy on me.  So today, Tony and I are going to talk to you about the role of AI in peer review and also talk specifically to a product we have launched called Enago AI Peer Review Assistant.   
 So as I mentioned, I work for Charlesworth and Tony works for Enago, but together the two companies have combined our technology expertise and are working jointly on new software products for the publishing market.  You may be familiar with some of our existing software products.  So at Charlesworth, we have the Charlesworth Gateway, which is an author communications software solution that sends manuscript notifications in native language to authors, primarily in the Asia-Pacific market, using WeChat, LINE, Kakao, and also WhatsApp.   
And the Charlesworth Gateway is also up for an EPIC Award at the SSP conference this year.  So fingers crossed for that.  And then on the Enago side, we have the Trinka product, which is a language writing and editing tool for researchers.  We have the Enago Reports product, which is a manuscript submission screening tool aimed at both authors and editorial office staff.   
And then we have Enago Read, which is a manuscript review and literature summarization tool.  And it's the Enago Read product that we have extended to create Enago AI Peer Review Assistant.   So just looking back through the evolution of technology within peer review.  What we've seen over the last 20 years: the mid-2000s is when we first started to see some basic automation for peer review.   
So this was kind of rule-based, non-AI automation to speed up routine checks.  So things like keyword-based plagiarism checks or email-based manuscript tracking and Excel sheets.  And then from 2010 to 2015, this is when we started to see the introduction of AI and ML tools.  So here we've got classical machine learning for editorial decision support.  So this would be things like language editing and proofreading, readability scores and some basic reviewer matching algorithms.   
And then in late 2016 to 2020 is when we started to see some process efficiencies coming with NLP.  So this enabled things like automated technical checks with a hybrid ML and rule-based system, and generative NLP models for aiding author and editorial efficiency.  And then from 2020 to the present day, this is when we started to see the emergence of Gen AI and large language models for things like AI-generated content summaries, literature review, insight generation and peer review support.   
 And for those of you who are less familiar with some of the acronyms I've just given to you there, I work with our AI teams on a day to day basis, and sometimes I have to come back and fact-check what I am talking about.  So this is kind of a hierarchical summary of what the terms are and how they correspond to one another.   
So ML is machine learning and that's the top level of AI.  And then under ML we have NLP, natural language processing.  So this is a subfield of ML that's focused specifically on processing and analyzing human language.  And then within NLP, we have generative AI, Gen AI.  And this technology analyzes structured and unstructured data and summarizes key trends, providing a natural language explanation instead of raw data.   
And then a subset of Gen AI is the LLM, the large language model, and the key difference between the LLM and Gen AI is that the LLM is just looking at text.  It is a large language model, whereas Gen AI will work with multiple different content formats.  So images, audio, code, et cetera.   So then also looking at some of the products you may have been familiar with through that timeline of the changes in technology, we put together this timeline of some of the key products we've seen emerge within editorial and peer review workflows.   
So in 2004, that's when iThenticate was launched, the plagiarism detection tool we're all so familiar with.  And then in 2005, I've had to make this date up because I could not actually find the genuine date when Manuscript Central, ScholarOne Manuscripts, was launched.  But 2005 was when I could see the majority of press releases coming out from editorial offices starting to integrate their peer review workflows onto Manuscript Central.   
[inaudible]  OK, should have put that earlier in the timeline.  But so 2005-ish for me.  I remember I started working in editorial in 2007 at Emerald Publishing, and that was when Emerald was just putting their journals onto Manuscript Central.  Then in 2012, we saw the launch of Publons, which allows researchers to track and showcase their peer review and editorial contributions.   
And that was acquired by Clarivate in 2017 and integrated into the Web of Science platform.  2018 was when scite was launched.  Then in 2019 we have AuthorONE.  So this is what we used to call our Enago Reports product, manuscript screening and assessment software.  And then in 2019, we have UNSILO technical checks being integrated into ScholarOne to assist with manuscript evaluation.   
In 2020, Enago launched Trinka, which is our grammar checking and language enhancement tool, and around the same time Paperpal Preflight was also launched by Cactus.  I also have profi launching in 2020 and then in 2025 we've got Enago AI Peer Review Assistant.   I'll pass it over to Tony.    
Thanks, Mary.  So I'm Tony O'Rourke, VP publishing partnerships at Enago.  So this slide is just really summarizing the kind of tools that are already out there.  And it helps to understand the context of what we're working towards at Enago with the peer review assistant.  So tools already exist for manuscript summarization, for literature review, for meta-reviews, decision support.  Research integrity and transparency is a really hot thing.   
In fact, one of the things we're announcing this week is a service called Mark, which is about providing what we call the source of truth behind an article.  How much of the article has been generated by AI?  How much of the article has been cut and pasted from another source?  How much of the article has been plagiarized?  In fact, there'll be more of that later.   
But ultimately, the tools are out there in a number of different places.  But what we've been trying to do is bring these tools together in a single place to understand how AI tools can be leveraged in peer review.  It's kind of important to understand the different personas that exist within the publishing workflow.  And effectively there are three personas.   
You've got the novice, the kind of person who is making pretty much a general purpose use of AI.  They've got a basic knowledge of AI, using large language models like ChatGPT or Gemini, and these reviewers don't do much to customize the models.  They're pretty much taking the products as they are and using built-in capabilities, but what they are trying to do is respond to a number of basic prompts.  They want to understand how they can summarize sections, how they can clarify reviewer comments, for example, and, just scrolling down the screen, how they can draft responses, how they can improve language.   
And these are the kind of typical simple feedback, simple prompts that you might expect an AI novice to use.  The AI literate persona is a researcher who wants to develop or adapt specialized AI tools for targeted peer review tasks, often integrating multiple models or customizing prompts for greater control and specificity.  So the use cases you can imagine.    
One second.  Things like automating structured review, the ability to create prompt chains that guide LLMs through journal-specific review criteria or targeted feedback.  Using AI to flag vague or unprofessional or incomplete comments, or even suggest improvements in the text, or even creating custom dashboards, building interfaces that aggregate AI-generated summaries, highlight potential weaknesses and assist within the decision making process.   
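[Editor's sketch] The "prompt chains" mentioned above could look roughly like the following. This is a minimal illustration only, not anything shown in the talk: the function names, the two criteria, and the stand-in echo_llm model are all hypothetical, and a real setup would call an actual model in place of the stub.

```python
from typing import Callable, Dict, List

# Hypothetical review criteria; a real journal would supply its own.
CRITERIA: List[str] = [
    "Are the conclusions supported by the data?",
    "Are key references included?",
]


def run_prompt_chain(
    manuscript: str,
    criteria: List[str],
    llm: Callable[[str], str],
) -> Dict[str, str]:
    """Summarize the manuscript once, then ask each criterion against the summary."""
    # Step 1 of the chain: condense the manuscript.
    summary = llm(f"Summarize this manuscript for a peer reviewer:\n{manuscript}")
    feedback: Dict[str, str] = {}
    # Step 2 of the chain: one targeted prompt per journal-specific question,
    # each built on the output of step 1.
    for question in criteria:
        prompt = (
            f"Manuscript summary:\n{summary}\n\n"
            f"Review question: {question}\n"
            "Answer with specific, professional feedback."
        )
        feedback[question] = llm(prompt)
    return feedback


def echo_llm(prompt: str) -> str:
    """Stand-in model (assumption): echoes the first prompt line so the sketch runs offline."""
    return "[model output for] " + prompt.splitlines()[0]


result = run_prompt_chain("Example manuscript text.", CRITERIA, echo_llm)
for question, answer in result.items():
    print(question, "->", answer)
```

Passing the model in as a plain callable keeps the chain logic independent of any particular LLM provider.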
And then you've got the AI agent creator.  Now, I think six months ago when we  were thinking about this presentation,  this was kind of very much the domain of the computer  specialist.  But now we're seeing this being used much more widely  by the non-computer specialists, somebody who's  got a good grasp of computing and a good grasp of AI,  but who may not necessarily see themselves  as a computer specialist.   
And these are usually agents or multi-agent systems that are kind of acting semi-independently within the peer review process.  They may be designed to handle quite specific or quite detailed end to end tasks, or ones that might be used for specialized tasks, such as one for language, one for statistics or one for ethics.  So again you see the typical use cases.  Again, we would imagine these might be things like typical end to end review facilitation agents that can analyze manuscripts, generate draft reviews, suggest reviewer matches, provide structured feedback with minimal, potentially minimal human intervention.   
The amount of human intervention very much depends, of course,  on the publisher and on the user.  Another use case might be role based and task specific agents,  where agents are assigned to distinct review roles,  so it could be ethical compliance  using the AI for statistical rigor or even  novelty assessment.  But when we're building these tools,  it's really, really important to understand who those users are  and the expertise that those users have.   
 So I'll just spend a few minutes talking about the use of LLMs in peer review and some of the practical concerns, and how they affect journal policy.  Obviously, a massive issue is one of data privacy and confidentiality.  Uploading manuscripts to public LLMs like ChatGPT and Gemini, as we all know, comes with huge risks in terms of potential data breaches, impact on IP, lack of end to end encryption.   
It almost leads to a loss of control over data storage and possible deletion of important data.  Uploaded content may be used to train models, but that risks exposure of unpublished work or identities appearing within the LLMs.  There's another huge area where journal policies may be violated, where you have to take quite tight control.  So sharing manuscripts within public LLMs potentially breaches confidentiality.   
And AI-generated content without disclosure may count as ghost authorship or plagiarism.  I think the point is it comes with risks.  But you have to ensure that those risks are being mitigated.   So we've developed a peer review workspace to try and address many of these challenges, to create a secure environment for publishers to work in, in order to support the work of the reviewers, creating an integrated workspace that's scalable, that's efficient, that's rigorous, and providing AI assistance.   
I'd also say it's important to say assistance because it's AI with human or human with AI.  It depends really on the workflow that you prefer, combining different aspects of the process workflow, whether it's manuscript evaluation, integrity checks, literature review, et cetera, recording reviewer actions, ensuring transparency and full auditability throughout the process, addressing ethical concerns.   
Ensuring that there is a human in the loop as part of that process, and that the tools are fully customizable according to the requirements of the publisher.  At which point, I'd like to hand back to Mary to give you a quick demo of the peer review assistant.   OK, so what is the Enago AI Peer Review Assistant?  Ultimately, it is a peer review workspace, so it's not a submission and peer review workflow system.   
It is a standalone workspace that reviewers will use to complete that process of peer review.  But it can be integrated into existing submission and peer review systems via API.  So although the reviewer would get the review invitation through your existing systems, they would come to the Enago AI Peer Review Assistant to go through that process of reviewing the manuscript.  So the workspace uses our locally deployed LLMs, and we've allowed for either an AI first or a human first approach to completing that peer review.   
And within our locally deployed secure LLMs,  we have multiple LLMs available.  So we've put in place a workflow through which  multiple LLMs will verify the outputs of the previous LLM.  This enables us to reduce any hallucinations  and ensure accuracy, and provide reviewers  with a consensus response from multiple different LLMs.   The workspace features a number of tools  available to the peer reviewer, so there's the Copilot paper  querying.   
This is fairly similar to ChatGPT,  where you would ask a question and it  will answer based upon the data available within the manuscript  and the knowledge within the LLM.  And then there's a structured peer review workspace,  where reviewers work through the process of answering  structured peer review questions.  And all of this, as Tony has highlighted,  is overseen by complete human oversight  with every step of the peer review.   
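[Editor's sketch] The talk describes multiple LLMs verifying the output of the previous LLM and returning a consensus response, but it does not say how that consensus is formed. The sketch below assumes a simple majority vote over short answers. The Model type, the three stub models, and verify_in_sequence are hypothetical names for illustration, not the product's implementation.

```python
from collections import Counter
from typing import Callable, List, Tuple

# A "model" here is any callable taking (question, previous_answer) and
# returning its own answer. Real deployments would wrap actual LLM calls.
Model = Callable[[str, str], str]


def verify_in_sequence(question: str, models: List[Model]) -> Tuple[str, List[str]]:
    """Pass each answer to the next model for checking, then take a majority vote."""
    answers: List[str] = []
    previous = ""
    for model in models:
        # Each model sees the question plus the previous model's answer.
        previous = model(question, previous)
        answers.append(previous)
    # Majority vote is an assumed stand-in for the consensus step.
    consensus, _count = Counter(answers).most_common(1)[0]
    return consensus, answers


# Stub models (assumptions) so the sketch runs without any LLM backend.
def model_a(question: str, previous: str) -> str:
    return "supported"


def model_b(question: str, previous: str) -> str:
    # Agrees with whatever it is shown, or answers fresh if shown nothing.
    return previous or "supported"


def model_c(question: str, previous: str) -> str:
    return "not supported"


consensus, answers = verify_in_sequence(
    "Are the conclusions supported by the data?",
    [model_a, model_b, model_c],
)
print(consensus, answers)
```

In practice free-text answers rarely match exactly, so a real consensus step would need some way of comparing or merging responses rather than counting identical strings.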
 For the publisher, we offer a complete analytics dashboard that allows you to track the work undertaken and the time that has been spent within the review workspace.  So this gives publishers peace of mind that the review has been completed with integrity.  So the metrics we can track include things like the time actively spent working on the review, the sections or pages of a manuscript that a reviewer has reviewed and annotated, any key concepts they've analyzed,   
the Copilot questions they may have asked, and the number and names of related papers that they may have explored in relation to the peer review.   Does this work?  Yes.  So this is a screenshot of the peer review workspace.  We can see on the left the paper that is under review.  In the middle pane are the structured peer review questions.   
And on the right is a specific question that  is being answered by the AI.  So I'll jump now into some pre-recorded screen  demos of the product in action.  So this is the first step we have in our structured peer  review process.  So you can see here that there's a hierarchical summary being  created.  So this is based on the structure of the manuscript  under review.   
It's not restricted to a predetermined format, so it will quickly and accurately summarize the relevant content from that paper.  So this is the first step for the reviewer, enabling them to quickly understand the paper under review at a macro level.   So now we're navigating to the structured peer review questions.   
And you'll see here there's the ability to toggle between the AI first and the human first approach in the right hand pane.  What we're showing here is the AI first.  And the question of are the conclusions supported by the data.  So you'll have seen it's gone a bit quicker than I am speaking there.  In the process through which the question was asked, the LLMs went through three separate cycles of multiple different LLMs checking one another's responses.   
And then what we can see in the top box  is the consensus response from the three LLMs.  And then there's the ability for the human to write in the box  below their own additional review comments.  So they can agree or disagree with the outputs of the AI  and add their own additional review.  It's worth pointing out here in the middle  where you can see what we've called the SPR section.  That's the structured peer review section.   
These are questions that we have populated ourselves based on standard structured peer review formats.  These are fully customizable depending on what a publisher's or journal's specific peer review questions are.   So now we're moving on to an example of the human first workflow.  And we have chosen the question of are key references included.   
So you can see there the reviewer is able to look through the paper in the reading pane on the left hand side.  They can review the paper and that specific question.  They can write the response and then they've ticked validate by AI.  What's happening here is the AI has taken the text of what the author has written, and it's analyzing that.   
And it's going to state whether it agrees or disagrees with what the human reviewer has written.  The AI will then add their additional review context that the author can read and agree or disagree with.  We do have a further step in the UI we're planning to implement, which is a final summary comment rather than just the agree, disagree and save review process.    
So moving on from the structured peer review  form to look at some of the additional functionality  within the workspace.  So this is the Copilot paper querying space.  So if the reviewer has any questions about the paper  that they want to be answered quickly  or want to have specific concepts explained to them,  they can do this using the secure LLM chat functionality  that you can see being worked through on the right here.   
So we can see some of the questions that  have been answered and the responses generated  by the secure LLM that is plugged into the back.  And then in addition to Copilot, we also  have related papers functionality  within the Explore section of the interface.  So this feature allows the reviewer  to quickly identify papers that are  relevant to the paper under review.   
The related papers are sourced from over 200 million academic papers that we have indexed within our own index of open data.  And the suggested papers have been identified as being semantically relevant to the paper under review.  The reviewer can also highlight specific sections of the manuscript and further refine the related papers that are being surfaced to them, asking just to look for papers related to a particular part of that paper.   
The reviewer can attach the related papers  within the attached folder, or they can also  pin them to specific sections of the manuscript under review.  If they want to remind themselves that just  one specific related paper was relevant to a specific section  of the manuscript.   Then finally, we're just showing the way  in which the paper under the review  can be annotated and highlighted.   
The reviewer can add specific notes to themselves  as they're undertaking the review,  or they can highlight sections of content  and send it to Copilot for querying.  And all of this is saved within the reviewing panel  for the reviewer to come back to at a later stage  when they're finalizing their review.   So that is a quick overview of the product in action.   
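[Editor's sketch] The related papers feature described earlier surfaces papers that are semantically relevant to the manuscript or to a highlighted passage. The talk does not say how relevance is computed, so the sketch below substitutes a plain bag-of-words cosine similarity for a semantic model. The three-entry INDEX and all function names are made up for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts: a crude stand-in for a semantic embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_related(
    passage: str, index: Dict[str, str], top_k: int = 2
) -> List[Tuple[str, float]]:
    """Score every indexed abstract against the passage and keep the best matches."""
    query = vectorize(passage)
    scored = [
        (title, cosine(query, vectorize(abstract)))
        for title, abstract in index.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Tiny made-up index; the talk mentions an index of over 200 million papers.
INDEX = {
    "Paper A": "peer review quality in scholarly journals",
    "Paper B": "protein folding with deep learning",
    "Paper C": "automated peer review with language models",
}

# Refining by a highlighted section amounts to changing the query passage.
ranked = rank_related("language models for peer review", INDEX)
print(ranked)
```

At the scale quoted in the talk, scoring every paper in a loop would not be practical; some form of precomputed index or approximate nearest-neighbour search would be needed instead.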
At the moment it is at prototype stage.  And we are working with publishers rolling out  small scale pilots with selected journals or editors  where we've got the ability to implement your own custom peer  review questions and offer it to reviewers or editors  as a standalone tool for them to use and feedback  on as part of their reviewing and editing processes.  So we're really keen to find more early adopters who  are interested in working with us on pilots.   
So do please get in touch with myself or Tony  if you're interested in finding out more,  we'd be happy to do a demo to you  and your teams to give you more detail about the product.  I think we've got six minutes left for questions.  If anyone's got any questions, there's  a mic in the middle of the floor there.  And please say who you are and your question,  and we'll try and answer that question as well as we can.   
 Hi, I'm Elizabeth, I'm from Elsevier.  This was really interesting.  Can I ask a really silly question?  What happens with the review product?  Is it just not stored at all, or is it deleted?  Not the article itself, but the review of it.  That kind of depends on whether it's a publisher integration or not.   
We also have a use case where a B2C user would just access this product independently, directly on Enago Read.  So when it's directly on Enago Read and it's the reviewer independently completing the review, then they would permanently have their workspace available to them with their saved completed reviews.  Where it's a publisher integration, the reviewer will be given a link to the workspace for that specific review, and at the point when that review has been completed, after a certain amount of time, that link would become inoperable.   
But we would expect the publisher to maintain the copy of the review reports, and we would delete it after some period of time.  Thank you.  Any further questions?   Hello, Milosh from MDPI.  So nice presentation.  Really amazing product.   
I'm coming from the tech side, so maybe a bit of a more techie question.  From my experience, LLMs or Gen AI chatbots are always tempted to agree with what we humans tell them.  And here we see that in a peer review process, they need also to disagree.  Did you get any feedback from the end users about how that works exactly?  Yeah, I would agree with your position there that on the whole you do tend to see   
AI wanting to agree with what you have said, and to overcome that, we have carefully crafted the prompts that ultimately exist in the back end of each of those structured peer review questions.  And in our testing, we have found that even if the AI says it agrees, it is being asked to add additional thoughts and insights, so it will additionally give more peer review feedback, even if it ultimately agrees with what the human reviewer has said.   
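[Editor's sketch] The answer above describes back-end prompts that ask the AI for additional insights even when it agrees with the reviewer. The actual prompts are not shown in the talk. The wording and the build_validation_prompt helper below are hypothetical, meant only to illustrate asking for a verdict plus further review points.

```python
def build_validation_prompt(question: str, reviewer_comment: str) -> str:
    """Assemble a prompt that asks for a verdict plus further review points."""
    return (
        "You are assisting a human peer reviewer.\n"
        f"Review question: {question}\n"
        f"Reviewer's comment: {reviewer_comment}\n\n"
        # Ask for an explicit verdict so agreement is a stated choice.
        "First state AGREE or DISAGREE with the reviewer's comment, "
        "judging it only against the manuscript.\n"
        # Ask for extra feedback regardless of the verdict, as the talk describes.
        "Then, whichever verdict you give, list additional points the "
        "reviewer has not raised."
    )


prompt = build_validation_prompt(
    "Are key references included?",
    "Yes, the reference list looks complete.",
)
print(prompt)
```

Requesting additional points independently of the verdict means the reviewer still receives new feedback even when the model agrees.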
 If there are no further questions.  Thank you very much, everyone.  I hope you found it useful and we look forward  to any private conversations you may wish to have.  Thank you.