Name:
Drowning in Submissions? Smart Guide to AI-powered Peer Review (The Only Answer to the Deluge)
Description:
Drowning in Submissions? Smart Guide to AI-powered Peer Review (The Only Answer to the Deluge)
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/3ef8e781-cf89-40b9-ad27-14dd361dbca1/videoscrubberimages/Scrubber_1.jpg
Duration:
T00H34M29S
Embed URL:
https://stream.cadmore.media/player/3ef8e781-cf89-40b9-ad27-14dd361dbca1
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/3ef8e781-cf89-40b9-ad27-14dd361dbca1/SSP2025 5-28 1415 - Industry Breakout - Hum.mp4?sv=2019-02-02&sr=c&sig=%2FEMZZLVWmARZlcaMxTJcxBsJ2Bv17b3kZRQ%2FaHRhfys%3D&st=2025-07-02T01%3A35%3A02Z&se=2025-07-02T03%3A40%3A02Z&sp=r
Upload Date:
2025-06-06T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
It wasn't on. It wasn't on. God, is that you? Yeah, please buy our technology. So that's good. Closed captions. That's so awesome. Yeah, look at the closed captions. So people tend to get frustrated when we put up a slide.
The problem is huge. The deluge is huge. A massive number of annual submission increases. And so this is more of a thought-provoking statement. Some of the early returns from AIP's experimentation and the editors who are playing around with it. So why don't we take one big step back? Go ahead. And OK, so for those of you that don't know, AIP Publishing started working with Hum a while back on some issues and ideas around peer review.
And then, as you'll hear in the session after us, Purpose-Led Publishing was born, and that effort wound up folded into Purpose-Led Publishing, which includes AIP Publishing, IOP Publishing and APS. So what we're starting to talk about here is some of what we've been learning around piloting a peer review assistant. And so Dustin, take it away. Thank you.
It's just hard not to just jump in. And that's why we have each other. Yeah, I know. Rein him back. Rein him back. So one of the things we found in early testing of Alchemist Review is one of the editors cited: it used to take me an hour to review a manuscript; now it takes me 40 to 50 minutes.
Does anybody here know what Alchemist Review is? Show of hands. Have you heard of Alchemist Review? Yeah, a handful of people. OK, well, that's still a third of the people here. We're going to show them, though. OK, we're going to show instead of just tell. I know, we're a vaudeville act. And so this is more of, let's breeze through the premises.
If you had all of that time across all of the editors that worked in scholarly publishing, instead of saying, we're going to eliminate jobs, it's actually you could use that time so much better. And the central premise here is you can actually make this whole editorial enterprise much more efficient, more pleasant. You can take the time that is being used on lower level and menial tasks, and you can apply it towards a better author and reviewer and frankly, editor experience.
So those are some of the motivating factors behind Alchemist Review and especially what is happening with PLP. And so we thought we'd, so we're going to pause for one second. So as you can tell, this is just like being in a meeting with us. It really is. So just taking another step back. So one of the things when we started at AIP looking at peer review, and then with PLP, is there were a few things going on that were interesting, and some of them concerning.
So one thing that was concerning was getting reports that people would take manuscripts and throw them into foundation models like ChatGPT and Claude, and get either a review written for them or get some kind of information back from them. And while this may seem, oh, well, that's not so bad, the reality is this is someone's work, and someone else is taking that and putting it into an environment that's open and not always clear about how it would be used.
So that really is a problem. We don't want to do that. So that was one thing. Another thing that was happening is, as everybody in the world knows, peer review is just getting, it's going to be harder and harder and harder to find peer reviewers, but really it's because there's so much content. There's so many manuscripts out there.
There are only so many reviewers. It's hard to bring on new reviewers. It's hard to train new reviewers. All these challenges exist, so that all of a sudden the demand for peer review is far outstripping the supply. And so we were thinking, OK, well, what could you do about that? And we were talking with Hum about this.
And so we also were very sensitive to the fact that people are afraid. Afraid is a strong word. I don't mean to sound condescending, but AI was an unknown to a lot of people, and therefore the disposition of something that was used with AI was unclear and sometimes worrisome to people. So we wanted to be sensitive to that.
And so that's when we started to put together a prototype with, and this at this point was PLP, putting together a prototype with Hum. And we decided, well, let's address some things that are very clear and easy to see, where we're not asking for anybody to make a decision. We're not asking for a machine to make a decision, but we're providing a reviewer with information that may be useful to them.
And so we started with, and by reviewer you mean editor. Yes, oh, and that's the other thing. One of our core premises is that the editor is the doorway to the reviewer, and that the editor has to be super comfortable with something and see its value before it starts to get disseminated to reviewers. Thank you. Now, what did we do first?
Yeah so we're not starting with decisioning. We are starting with information and insights. This is one of the existing views that does deep textual analysis. So trying to understand various sorts of things like the summaries and the methods and how they were applied. Some of the key author contributions. What is the author saying that they are actually contributing to the literature.
Things that can be evaluated by expert editors, and various sorts of things like taxonomy and key concepts, which can help with some of the jobs like reviewer selection. We work with a partner called Grounded AI, and Grounded AI does something which really no human does. And so they go through each citation, each reference, and they evaluate those with a suite of tests, and are ultimately looking at things like, is there a match found in the literature?
Is that match relevant to the way in which it was cited? And no human is going to go through 400 different citations, even the most diligent ones. And so the citation analysis has a mix of individual flags, as well as some of the bubble-up graphs to look at things like when the citations were published, which authors and what's their concentration, which flags things like self-citation and high author concentration, and which journals the citations were coming from.
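Purely as an illustration of the kind of per-citation checks and bubble-up views being described (not Grounded AI's actual pipeline), a minimal Python sketch might look like this; the field names and the 0.5 relevance cutoff are invented for the example.

```python
# Hypothetical sketch of per-citation checks and bubble-up flags; not
# Grounded AI's actual pipeline. Fields and thresholds are invented.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Citation:
    raw_text: str             # reference string as it appears in the manuscript
    matched_doi: str | None   # best match found in the literature, if any
    relevance_score: float    # how well the match supports the citing sentence (0-1)
    authors: list[str]
    journal: str
    year: int

def evaluate_citations(citations: list[Citation], manuscript_authors: set[str]) -> dict:
    flags = []
    for i, c in enumerate(citations):
        if c.matched_doi is None:
            flags.append((i, "no match found in the literature"))
        elif c.relevance_score < 0.5:          # arbitrary illustrative cutoff
            flags.append((i, "match looks weakly relevant to how it was cited"))
        if manuscript_authors & set(c.authors):
            flags.append((i, "self-citation"))

    # Bubble-up views: publication years, author concentration, journal concentration
    years = Counter(c.year for c in citations)
    top_authors = Counter(a for c in citations for a in c.authors).most_common(5)
    top_journals = Counter(c.journal for c in citations).most_common(5)
    return {"flags": flags, "years": years,
            "author_concentration": top_authors,
            "journal_concentration": top_journals}
```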
So flagging things which may look out of line. So this is something, this is a tool that's available, which no editor, no reviewer would ultimately be able to do on their own. And this is really shepherding attention, and one of those time-saving sort of elements. And Anne mentioned passing content to ChatGPT, passing it to Claude. All of this is done in a private data and AI cloud, and there is a chatbot which speaks academic.
And so it already has both the original manuscript preloaded as well as all of these assets. And so if you want to know what a protoplanetary disk is, you can just ask the assistant. And it already has the entire paper passed to it. You can ask it, what is a protoplanetary disk? It is a pancake of gas, ice and dust which surrounds a young star. My wife's an astrophysicist, so this is actually one of her students' papers that we're using.
Which is not published yet. So, yes, she consented. Becky's great. So this is a very simple version of that. But ultimately, we see people asking fairly intricate questions, up to and including editors asking to create a review, to benchmark an AI-generated review against some of the things that are coming in from humans.
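As a rough illustration of what it means for the assistant to have the manuscript and the generated assets preloaded, here is a hedged sketch; `llm_complete` stands in for whatever private, access-controlled model endpoint is actually used and is not a real API.

```python
# Illustrative only: a manuscript-aware assistant where the paper and the
# generated digest are preloaded into the prompt. `llm_complete` is a
# placeholder for a private model endpoint, not a specific product's API.
def build_assistant_context(manuscript_text: str, digest: dict) -> str:
    return (
        "You are an assistant for journal editors. Answer questions using the "
        "manuscript and analysis below; say so if the answer is not in them.\n\n"
        f"MANUSCRIPT:\n{manuscript_text}\n\n"
        f"DIGEST (summary, methods, key claims, taxonomy):\n{digest}\n"
    )

def ask(question: str, manuscript_text: str, digest: dict) -> str:
    prompt = build_assistant_context(manuscript_text, digest)
    return llm_complete(system=prompt, user=question)  # hypothetical call

# e.g. ask("What is a protoplanetary disk, and where does the paper rely on it?",
#          manuscript_text, digest)
```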
We did want to preview a handful of the things that are coming soon. So one is using vision language models to look at figures. This is something where, if you're an editor, figures come at the bottom of a manuscript and you're scrolling up and down to find where each one is actually referenced within the text. You're also doing deeper analysis of the scale and error bars present.
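A minimal sketch of the figure checks just described, assuming a placeholder vision-language-model call (`vlm_describe` is hypothetical): locate each figure, confirm it is referenced in the body text, and ask about labels, scale and error bars.

```python
# Hedged sketch of figure checks: is each figure referenced in the text, and
# what does a vision-language model say about labels, scale, and error bars?
import re

def figure_is_referenced(fig_label: str, body_text: str) -> bool:
    # e.g. fig_label = "Figure 3" -> look for "Figure 3", "Fig. 3", "Fig 3"
    number = fig_label.split()[-1]
    pattern = rf"\bFig(?:ure)?\.?\s*{re.escape(number)}\b"
    return re.search(pattern, body_text, flags=re.IGNORECASE) is not None

def check_figures(figures: list[dict], body_text: str) -> list[dict]:
    report = []
    for fig in figures:  # each dict: {"label": ..., "image_bytes": ...}
        notes = vlm_describe(  # placeholder VLM call, not a real library function
            fig["image_bytes"],
            questions=["Are axes labeled?", "Is a scale bar present?",
                       "Are error bars shown?"],
        )
        report.append({
            "label": fig["label"],
            "referenced_in_text": figure_is_referenced(fig["label"], body_text),
            "vlm_notes": notes,
        })
    return report
```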
So we're ultimately doing this sort of work up here and providing a set of flags: what should you pay more attention to, what should you dig into further and potentially provide feedback on. We're also pulling related papers. One of the things we find, especially with some of the larger journals and non-employee editors, is it's a manuscript plus Google.
And so they're doing just a ton of googling. And one of the things they're googling for is related papers. So we have an agent go out and we retrieve papers. We do a quick summary of each paper and suggest what the relevance is. So that has multiple effects. But one of the things they're doing is taking those key author claims and validating the extent to which this is actually a novel contribution to the literature.
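Roughly what such a related-papers agent amounts to, sketched with placeholder calls (`search_index` and `llm_summarize` are stand-ins, not a specific product's API): retrieve candidates, summarize each, and relate it back to the authors' key claims.

```python
# Illustrative related-papers agent: retrieve, summarize, and relate each
# candidate back to the manuscript's key claims. Retrieval and model calls
# are placeholders for whatever index and LLM are actually used.
def find_related_papers(key_claims: list[str], taxonomy_terms: list[str], k: int = 10) -> list[dict]:
    query = " ".join(taxonomy_terms + key_claims[:2])   # crude query construction
    candidates = search_index(query, limit=k)           # placeholder literature search
    related = []
    for paper in candidates:
        summary = llm_summarize(paper["abstract"])
        relevance = llm_summarize(
            f"In two sentences, how does this paper bear on these claims: {key_claims}?\n"
            f"Paper summary: {summary}"
        )
        related.append({
            "title": paper["title"],
            "summary": summary,
            "relevance_to_claims": relevance,
            "authors": paper["authors"],   # also reusable as reviewer candidates
        })
    return related
```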
That's also a key way in which they're looking for reviewers, if you don't have some of the more sophisticated reviewer identification tools; you can pull from those related papers. And the last bit is journal fit. So this is looking at a suite of evaluations. One is relevance to the aims and scope of the journal, but also novelty, potential impact and scientific rigor.
So as a managing editor, those are the key factors you're using to ultimately apply the test of: is this a good fit for my journal? And various journals have various stringency criteria that they apply to this, but it allows you to reshape processes. So if they're all green, that's something which should move out of the queue right away. If it's red on relevance to scope and green elsewhere, that's something which is a redirect candidate.
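A hedged sketch of that traffic-light routing, with invented status values: all green fast-tracks out of the queue, red on scope but green elsewhere becomes a redirect/cascade candidate, and everything else stays with the editor.

```python
# Hypothetical routing rules over the four journal-fit assessments
# (scope, novelty, potential impact, scientific rigor), each "green"/"amber"/"red".
def route_manuscript(fit: dict) -> str:
    scope = fit["relevance_to_scope"]
    others = [fit["novelty"], fit["potential_impact"], fit["scientific_rigor"]]

    if scope == "green" and all(v == "green" for v in others):
        return "fast-track"            # move it out of the queue right away
    if scope == "red" and all(v == "green" for v in others):
        return "redirect-candidate"    # into the redirect / cascade pool
    return "editor-review"             # everything else stays with the editor

# route_manuscript({"relevance_to_scope": "red", "novelty": "green",
#                   "potential_impact": "green", "scientific_rigor": "green"})
# -> "redirect-candidate"
```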
So get it into that redirect and cascade pool very quickly. So you can really rebuild processes around these. And it gets very interesting as you start to get to the bigger, more scaled journals. Can you go back two? One more. All right. You see the top of that, where it says Manuscript, Manuscript Digest, Citation Evaluation.
We're going to take a little digestive break. So in total, what we're looking at right here is, under that Manuscript tab, they can get the actual manuscript that's being looked at. When you go to the Manuscript Digest, that is a lot of the features and functions that Dustin just went through, where they can see it in a summary form. Citation Evaluation is the detailed analysis of citations, and so on.
So if you go back to our timeline, what basically happened is we started to work on this prototype, and the prototype became Alchemist Review, which is now obviously out there. And as Dustin elaborated, it's on to high school. Yeah, college soon. It's getting good grades so far, of which we're very proud. Yeah, so just so you understand, this is a web app. So it's not yet integrated into workflows.
The way that I know that AIP and the PLP folks, we would like to do that, and it's coming. But all of the first three tabs up there, Manuscript, Manuscript Digest, Citation Evaluation, they are all there right now. And that's what AIP, IOP and APS are using right now. That's what's providing a lot of the learning, which is then helping us focus on what to build next. Yes, indeed. So one of the things that people ask us is, where do we start?
Like, what's a good journal? Where can this make the biggest impact? So for one, it's picking the right sort of journal and set of individuals who are working on it. Often we find it's like a group of five to six very highly empowered people, a small team who is interested. This is very much an opt-in process. We see the frontier, basically prestige giants down through mega journals, as the places where you can make the highest impact.
So the notion of friction is really like lots of checks, high rigor, and you have things like Science, Nature, Cell, the field giants. We'll say AIP Advances, of course. So ultimately they have quite a bit of friction, but also huge volumes. And then you have maybe something like PLOS ONE over on the mega journal side. So as you get to scale, there's obviously a lot of places in which you can start to do the process redesign, but there's a lot of friction as you go to the prestige journals as well.
We see Alchemist Review as making a dent on all of these eventually. But as you think about the kind of first steps you need to make in choosing places to start, often we see publishers start with a journal or two; those are the places where you look first, and then you'd go to, are there the right sort of editors and editorial staff? We're also working with people like transfer editors and folks who are working at a journal's portfolio level.
So it really depends on where you ultimately have the pain points and where you want to demonstrate the impact. Point three, on the ROI, proof or political cover: some of this is deciding what is the decision architecture you have internally. Some groups, to move something like this forward, need to make money in six months. And some groups, they ultimately need to show that something works.
So this is kind of working in principle. So this is something where we help work through the decision matrix as we figure out how to get started. We're going to admit that not everything has gone right. One of the things we launched without was a manuscript ID, so it was just search by the title, and we had almost torches and pitchforks backlash from the editorial group.
This is just how they live. And so this was our top-priority feature that we launched. We can see the session replays, and we saw search-to-finding-the-manuscript increase by 25%. So this is just so deeply habituated into the way in which editors work in their day-to-day workflow. The more senior, highly trained, advanced editors were almost cruelly dismissive of summaries.
They hurt your feelings a little. Not hurt feelings. But it was like it didn't have to be that sharp for us to get the point. And it's like it's already there in the paper. And we're like, we know because this was looking at the paper. But ultimately, some of this feedback, especially from the editorial cohorts, is a little bit harsh. And the more fundamental point is some of these things are more useful for certain people at certain parts of the workflow.
And so a lot of this is just putting things out there and getting the response, and being OK when some of that response is quite harsh. And as we got started, it was a simple, provisional, more push-button start. It was basically like a data scientist saddling up and manually processing manuscripts. And so the initial processing times could be up to five days initially.
And that's like a complete non-starter. So it was like the first set of manuscripts that went through. People were reviewing it after the fact and they were like, well, if you read the whole manuscript and then you go to the analysis, it's like, well, maybe not that sort of surprising or exciting, but some things have gone well. So we track our weekly and monthly active users. So we have more than 22 editors.
We've exceeded the 1,000 manuscript mark. Incredible amount of feedback. And maybe this shouldn't be surprising from editors that they'd like to provide feedback, but they do. And so it's a mix of in-app feedback. It's conversations, it's surveys. And so the in-app feedback comes in via Slack channel. And it's just like raining all the time. Thumbs up and thumbs down.
You really have to brace yourself for that sort of stuff. But it's been really great, and it really helped us to hone in on what we needed to refine and basically do for the next iteration. And shout out to our friends at ScholarOne. One of the things is, certainly as a startup who's providing a technology solution, you can't exist in isolation, especially in this space. You need to get the manuscripts.
You need to ultimately get the right sort of flags and assessment and information into the right places. And ScholarOne has been wonderful in helping us facilitate the integration, especially on the manuscript feed side. So, as Anne mentioned, the ultimate final state is basically having a round-trip, two-way integration where we're basically the intelligence layer, passing the right things back to some of these other solutions.
So the image, I first did it with Anne facing you, and it was horrific. Not because Anne isn't wonderful and lovely, but because it was so uncanny. It was just off a little bit. You had a chin thing. Anyhow, so this is the fireside chat from the back. But I am curious. I know that there was a spreadsheet of 30 to 40 sort of ideas at one point, and it's not the only thing that you're working on.
But why did this get such time and attention relative to other things? So I think a few reasons. So first of all, I mean, not to sound silly about it, but we kind of had a partner lined up, which was really nice. It wasn't a question of figuring out how we were going to get this done and who we were going to get to help us. But when we started looking at different applications of AI and what we were doing, this just seemed like a really, really logical one.
And one of our premises was that peer review is kind of bundled, and people talk about it as this one monolithic thing. But the reality is it's a lot of different activities, and some of them are not really well suited for a human, and some of them are well suited for a human decision maker. But even that decision maker could really leverage some better refined and prepared information. So it jumped out first because it was an area where we felt like we could really make a difference.
It was also interesting. Melissa Patterson is here, and at AIP Publishing she runs our editorial groups, and she has a journal editors conference twice a year. And at that conference we did some polling. And interestingly, the editors were really on board with the idea of trying to get help, especially help with citation analysis and review. And in general, when we asked them, who would like to start, who wants to volunteer?
We had like 53% of the editors say that they would want their journal to do this. So I think that was another reason: there was excitement in the editor community within AIP Publishing and our colleagues at PLP about going into this area. One thing that you didn't ask me, but I would like to stress, because I think it's super important, is we were starting from scratch with Hum.
So what that means is Hum had great capabilities that they had built up through their CDP and other foundational products they were building, and that was a bit of a head start, which is nice, but this is not done. This is not fully baked. The features that we're prototyping are pretty well baked, and as Dustin said, they're adding to it. But it's a mindset that you have to maintain: that you are engaging to make something better and better and better.
You are not getting perfection out of the gate. And if that's what you want, not only is something like this difficult for you, but I believe this decade and the following ones will also be difficult for you. So just putting that out there, I mean, kudos to you and the team because I came to you and was like, the first interface is going to be a Google Doc. And you were like, sure, we'll pay for that.
Which is commendable, but it allows us to move so quickly. Not a ton. I just want to say, yeah, no, no, no, we're not retiring. But yeah, so early stage through middle stage, the high school version. Do you want to lose your job? Do you want to make some, no, friends? No, the cultural pushback bit. No, well, actually, I think I almost started to address that point: some folks expect something to be fully baked and to have a 0.0001 error rate.
That's not what this is. That's not what experimentation and exploration are. And people need to understand that. Another one is, when you're talking about ROI, well, what's the ROI when we want to give this to our editors and peer reviewers for free? The ROI isn't about dollars. It's about impact. It's about, one of the things that we said is we want to try to reduce the effort of, we want to increase the well-being of our peer reviewers.
And I don't even want to say burden because that is a value statement. I know there was a nature article that got panned because they basically were talking with how burdensome peer review was and all of these people on X or whatever it's called in other places, were saying it's part of my job. I love doing it, but why does it have to be so hard and time consuming.
So it isn't "reduce the burden," it's really bring to peer review some of the efficiencies, and not keep it all lumped together. So I think the cultural pushback came from "peer review is sacrosanct." Well, not really citation checking. OK, maybe the idea of a human making a decision about acceptance is sacrosanct, but there's a lot more to peer review than just that.
So getting people to pick that apart and think about what are the places that this could work, and the idea that this may not yield an ROI for a little while because it's an experiment, that whole mentality was, but they were great about it. When I say it was the hardest, it was more like the thing that there were maybe frequent reminders of. But it wasn't that anybody wasn't really on board.
It was nice seeing that many of the editors who have opted in are really more on the "I can see where this is going," or "could you do this," or "if only you could do this, then I could do that." And so it was like, this is useful for x and y context. You kind of need to have those people there. As opposed to, one of the things I pushed back against is the word trial, which British people love.
And it's like you're putting this very early thing on trial and we're evaluating it, which is not, I don't take trial that way. No, but I do, because we're taking a breath. I know, it's taking a breath, but this is the opposite of that. It's more of like we're opening up to see how this might evolve in the future. And that's a very excellent place to be. What's next? And so right now we are in the middle of what we're calling our prototype phase, which is Alchemist Review as it is present in the market now.
And so we're getting a lot of data. And we are going to take a limited period to, although access would likely continue for the editors using the tool, we're cutting off periods where we're taking feedback and saying, and we'll come back to you, Dustin, and say, well, and quite frankly, most of the time, Hum is before us, and that's saying, we're seeing this.
What do you think? But we're going to take stock and then start to think about how do we do more of an integration and how do we roll this out to more journals. Exciting. That was seven minutes. If you would like to ask a question, we can either have a runner run a microphone around, or there's one right in the middle of that aisle there. I think these are too dangerous to throw.
I mean, too soon to tell. Ask the question. So do you see it's really editors who are the primary audience for this tool? I heard mention of managing editors as well. So can you just talk about the different groups that might want to use it?
I'll start and give it to you. So we are starting with editors just because we feel as though they are the gatekeepers to peer reviewers, and not even just out of respect, but because of their knowledge and expertise. And really, what's the first review done? It's triage. It's desk rejects. So we're starting there.
But we have every intention of having this go broader. We've had dreams about, wow, wouldn't it be cool if we could use this to train peer reviewers in the future? Or wouldn't it be neat if, with the proper controls, authors could just put their manuscripts in there and see what kind of feedback their citation analysis yields? We don't want to give people ways to game the system.
But by the same token, there may be ways to actually expedite the process before we even believe it starts. So, cautiously. But yeah. So the first thing is editors; the next thing would be reviewers. And then, depending on how that went, thinking more broadly, certainly training peer reviewers. I think it would be a lot easier to train peer reviewers where they actually interact with something and get feedback than the way that it's done now.
And it would also be easy to do on their own time, that kind of stuff. Yes, the current version is focused on editors, mostly because it allows us to move most quickly. As we think about exposing it to authors, exposing it to reviewers, we want to not build things we don't have to, and just build the features that provide value. So we're working basically from an enhanced version of triage, through desk reject, through associate editor and handling editor assignment, through the kind of reviewer invite.
And so some people are introducing things like the methods and key claims into the reviewer invite emails as a sort of off-label use. And so, yeah, it is slightly off-label, not the primary intention of that. But we ultimately see this, and we've already had early feedback from both prospective reviewers and authors who are basically like, please expose this tool. It's more just a matter of time.
And then, just again, how is the manuscript getting in right now? It can be magical pipes. OK, yes. Some of those magical pipes come from ScholarOne, but it's basically the originating submission system then sends a feed to us. OK, short, very simple version. That guy in the back.
So that cheerful chap in the back asked about subject matter. So basically, are we only focused on physics, since we talked a lot about PLP and the physics orgs? So right now we're doing science, technology and engineering. So the STE of STEM. We're pushing off medicine, math, and humanities and social science until later in the year.
They have different data sources, some different content types. Math, for instance, has a whole bunch of proofs and formulas. And so ultimately there's a broad array of things which we'll support, but for now that's the key focus. So STE. All the way in the back. Great presentation. So you talked about figure analysis as it is in its current status.
Can it detect image manipulation? Or is it something that you're planning to add on later on? This is a, yes, sorry, in the current form, can the image analysis detect image manipulation? So for us, we tend to focus really heavily on the editor efficiency side. And if we're doing research integrity stuff, it kind of pulls you in a different direction.
And so the things that the existing class of models are really good at are, for one, finding what's the most prominent figure within a paper. And then we can use it in a bunch of visual formats, including that digest. We can then also see if the figures were referenced. So this is something where humans have to do that. Now that's a lot of wasted time. And then do analysis of the labels, the scale error bars and some of the other components.
So that's a sort of analysis which is, again, done by a human. It's not like a top-of-their-license sort of thing. And it's not that we'll never get there. It's just that's something which ImageTwin, with the big database of images and looking for deviations, is handling pretty well. And right now we're trying to step into gaps where nobody is.
So I've obviously heard about this a lot through Anne and through AIP's work. But I guess my question is, have you reached, probably not yet, but do you anticipate reaching that point where there's too much? So I guess what I mean is, if someone hands me a PowerPoint with all the notes, completely made, I then have to redo the whole PowerPoint because I can't present it like that.
Or is there going to be a tipping point where the machine is doing so much that the editors are like, I just need to actually get into the paper? Do you see that tipping point happening, or is there so much busy work that that's far away? I don't know if that question makes sense. If I understand it correctly, I think it makes sense. I mean, one of the things you see is us adding tabs, which is a lot of pages, a lot of pixels, and then the job is ultimately, how do you take that up to the top-level navigation?
How do you do flagging and notification? And what people actually, at least they're saying, want is a lot of that to happen within the manuscript submission system. So if there are flags, then ultimately you want, so the flagging, so journal fit, basically the four assessment criteria, all of those are green, so that ultimately is something that I can see on my sort of manuscript home page within ScholarOne or Editorial Manager, whatever it is, passing things like summaries and taxonomy.
So you get that in a single page. And that's an iterative thing where you develop more assessments. The other thing is, a lot of this is thinking about how does an individual person do this right now. And people are starting to think about how do you peel out new roles, new roles that are squarely focused on the image analysis side of things.
And so a lot of this is like new tools, recomposing existing roles, recomposing the interface, and that is a forever sort of thing. So we are so early on that. Assuming you could integrate with an editorial system, what do you imagine is the product that you would send back to the editorial system? Right now the user just goes to the web interface.
But in order for a user to access these tabs, what would go back, or the information that you're generating, what do you imagine would go back to an editorial system? XML, a token, a link to the JSON package, if you want the technical version. Yeah, so basically text, any sort of indicators, and then a link. And so if you want to go deeper, then you're able to go deeper.
If it's all green then you don't really need to go deeper. So one of the things too I think that's important is right now as standalone, what happens is someone has to go there. And then when they go back, they can't make a decision in the tool. So it would be really nice to just pull that. So that information about their feedback and their actual peer review report that they write or approve would also need to go back and whatever kind of way you want to put it.
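As a very rough illustration of the kind of hand-off being discussed (short indicators, a summary, and a deep link back to the full analysis), a payload might look something like this; the field names and values are invented, not a ScholarOne or Editorial Manager schema.

```python
# Invented example of what might pass back to a submission system: short
# indicators plus a deep link. Field names are illustrative, not a real schema.
import json

payload = {
    "manuscript_id": "ABC-2025-01234",            # hypothetical manuscript ID
    "journal_fit": {"relevance_to_scope": "green", "novelty": "green",
                    "potential_impact": "green", "scientific_rigor": "green"},
    "citation_flags": 2,
    "summary": "One-paragraph digest of the manuscript...",
    "taxonomy": ["protoplanetary disks", "planet formation"],
    "full_report_url": "https://example.org/alchemist/ABC-2025-01234",  # placeholder deep link
}
print(json.dumps(payload, indent=2))
```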
Yes, sure. And the option is either we pull some of the state changes, advance to peer review or redirect, directly into Alchemist Review and trigger that via API, or we pass some of the things back to the submission system, or both. And I know we're over time, so we're probably being told that. One other thing I was just thinking about is what we're doing now.
OK, what we're doing now in the prototype stage is that we are going to make a PDF of every bit of information that the tool provided, so that when the manuscript itself has aged out of the prototype and it doesn't exist anymore, for an audit trail we'll have that PDF on record with the manuscript. So if there's ever an audit or anything, we can see what the peer reviewers saw.
So at least they have that information. We don't need that yet. I think we're done. Thank you. We got, oh, can we do Paul's question real quick? Come on, do it, big guy. Oh, it's Paul. I'm going to send you on a lost journey. Wait, this depends. You get to ask a question if you're coming to dinner tonight.
I'm just teasing. No, lord it over his head. So how far away would you be from making that application available to the author, so they could just accept the change, accept the revision request, draft it and kick it back in? Probably next year. So I mean, our ideal with the review side is not just that; the first step will likely be exposing some version of this sort of analysis.
But if you're talking about the review side of things, ideally what we would want is a version of this to go to the reviewer, to have a conversational, basically AI-plus-reviewer process to have the review be written. And ultimately then that goes back, not to the editor yet, but to an AI, which then evaluates all of them and gets a final package back to the editor, which then goes back to the author. If you want to talk about that, probably 2027.
Thanks, Paul. Thanks, everyone.