Name:
Safeguarding Research Integrity: Enhancing Identity Verification and Accountability with RICS
Description:
Safeguarding Research Integrity: Enhancing Identity Verification and Accountability with RICS
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/1df8ff9c-c5ac-4598-b57d-c835d5fec6d5/videoscrubberimages/Scrubber_1.jpg
Duration:
T00H22M18S
Embed URL:
https://stream.cadmore.media/player/1df8ff9c-c5ac-4598-b57d-c835d5fec6d5
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/1df8ff9c-c5ac-4598-b57d-c835d5fec6d5/SSP2025 5-28 1330 - Industry Breakout - HighWire Press.mp4?sv=2019-02-02&sr=c&sig=tdVFS5i38HwCtY1gnRIKUYYI6VmZdoqxEB4ImHT05UU%3D&st=2025-06-15T22%3A16%3A34Z&se=2025-06-16T00%3A21%3A34Z&sp=r
Upload Date:
2025-06-06T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
OK, you ready, Josh? OK, great. So, hello. My name is Tony Alves and this is Joshua Roth. We lead the product team at HighWire Press. HighWire is the platforms division of MPS. We're going to talk today about what we've learned from our work on research integrity, including my involvement with the STM Integrity Hub over the past four years.
And we'll also share what we've learned helping our partners approach these challenges. In particular, we're going to be talking about the Research Integrity Check System, which we call RICS and which we also presented at STM in December. So we're going to begin with a quick look at the scale of the problem facing scholarly publishing and the scholarly publishing community today.
And so in 2023, more than 10,000 research papers were retracted. This reflects a deep and growing vulnerability in the publishing process and in the publishing ecosystem. The consequences are not just reputational; they're also economic. According to an earlier study published in eLife, the estimated cost per article retracted due to misconduct was over $425,000, and that figure includes direct waste of research funding.
It also includes the cost of time, institutional investigations, and the ripple effects of misleading science on subsequent research and policy decisions. Perhaps more troubling is the long-term erosion of public trust. According to Pew Research, trust in science has dropped from 87% before the pandemic to 76% in 2024. That is a double-digit decline.
So the perception of integrity in science is at risk. And it's not just a one-off case, because what we see is that the volume of misconduct has been increasing. We are seeing more and more systems working to prevent this, and those systems are struggling to keep up with the volume of problems. So, to state the obvious, research integrity is a complex problem.
It's not going away. On the dark side, there is increasingly sophisticated AI generating highly believable and completely made-up research. But the industry really has responded to these threats. We have many initiatives, such as Retraction Watch. We have the NISO Recommended Practice for Communication of Retractions, Removals, and Expressions of Concern. We have the World Conference on Research Integrity that happens every year.
And there are now research integrity departments and research integrity officers, many of which did not even exist three years ago. There's also a proliferation of research integrity tools. There are many great research integrity tools that check for things like paper mill activity, manipulated images, conflicts of interest, plagiarism, reproducibility, and compliance. Many of our clients are asking us about this proliferation of research integrity tools and which ones they should use.
At HighWire, we've had the pleasure of talking to and working with many of these tool providers. We've done some hands-on integrations into our products, including our new submission and peer review tracking system, DigiCo Pro, and into our new Research Integrity Check System, which is a standalone dashboard that we will talk about in more detail. So with that, I'm going to turn it over to Josh for a bit. OK, thanks.
So we've got all these great tools, and it's no surprise that our clients are regularly asking us how to incorporate these tools into their workflows. At the same time as we started receiving this question, our colleagues in our publishing department were asking the same thing. We have about 100 staff working at MPS on research integrity. They process around 10,000 papers every year against several research integrity checks for many of the world's leading publishers, so they were very excited to see all these tools arrive.
They've subscribed to many of them, but when we were speaking to them as part of our product research interviews, the same questions and challenges kept coming back. In particular, this challenge: although the tools themselves are automated, they still require individual data harvesting. You have to go to one tool, get the data, go to another tool, get the data, and manually compile the output of those tools into a handwritten report for an editor-in-chief, for example.
And that's time-consuming, which means that fewer of those checks can be executed. So that's one of the problems we're trying to solve. We've been doing a whole bunch of additional research and market analysis as well, and we can boil the main challenges, as we see them, down into three core items. The first is that digital identity is the biggest problem right now.
It is very difficult, at scale, to properly evaluate each author and each reviewer to check that they are who they say they are and that they're qualified to do what they claim to be doing. So that's a problem to solve. The second is that it takes a great deal of time, as I just mentioned, and in many cases that particularly impacts smaller publishers, who are already very resource-constrained.
And thirdly, there is no automated solution which is perfect, so in particular, anything that involves edge cases still requires human intervention to work through them. These are three big challenges, and that gives us the framework within which we've been developing this new product, the Research Integrity Check System.
So for digital identity fraud, we now have a comprehensive set of identity checks, in line with the 2025 best-practice recommendations. For the issue of people being short on time, we're putting multiple different services all into a single dashboard, so they need to look in one place rather than many. And for the issue of requiring human input, we have an experienced team of humans.
And so we can draw on those humans to augment the automated processes. This is an example of the RICS dashboard, in which we're pulling data from multiple different sources into one place, categorizing it into identity, manuscript ethics, and data challenges. At any point, the user can then deep-dive into that data to see where it is coming from, and investigate it in more detail if they have time to.
I'd like to spend the rest of this talk going into a bit more detail about how we're addressing each of these challenges, starting with Tony. Great, so I'm just going to interrupt Josh for a bit to briefly talk about the STM Association's March 2025 report, "Trusted Identity in Academic Publishing." That report provides essential context for why we developed RICS and what it's designed to address. The report highlights a core vulnerability in scholarly publishing: the lack of reliable identity verification in submission and editorial systems.
That gap enables fraudulent activity to flourish, allowing fake authors, fake reviewers, and even misaligned editors to gain access to the publishing process. The report emphasizes that these bad actors don't just harm individual papers; they erode trust in the entire research endeavor. That's why a robust identity verification tool like RICS needs to operate across the entire editorial workflow.
It needs to catch problems early: during submission, during editor assignment, and then again at reviewer selection, so that fraud is prevented before it compromises the scholarly record. Did you switch slides? I just did. Yeah, OK. At the heart of the integrity crisis is a fundamental issue.
Most editorial and submission systems rely on weak or non-existent identity checks. Fraudulent actors exploit these gaps. They pose as authors, they pose as reviewers, and they pose as guest editors. They use fake email addresses and aliases, and they steal other researchers' credentials. The result is mistrust in the peer review process, misassigned editorial roles in the guest editor world, and then, of course, compromised publications that erode research integrity.
The STM report clearly outlines the top tactics used by bad actors. The number one method is the use of fake, personal, and non-institutional email addresses. Impersonation through domains that mimic legitimate institutions, and the hijacking of real researchers' identities, are also common problems.
Next slide. So this slide illustrates, well, I think we're one slide back. Oh, yeah, so we're probably off by one. Oh, I'll skip to this one. No, you were right, back one. Nope, oh, that one. Yeah, yep. So this slide just illustrates how fraud unfolds step by step.
The author registers with a personal email, they submit a fake manuscript with fabricated co-authors, and then they suggest reviewers that they control. In parallel, a guest editor, who might also be fraudulent, might invite colluding authors and colluding reviewers to populate a special issue. These orchestrated actions often lead to fraudulent publications, which are only discovered after the damage is done, triggering retractions.
And of course, that results in reputational harm, wasted resources, and that overall degradation of public trust in science. So one more slide, Josh, and then you can have it back. The STM-recommended approach is layered, proportionate identity verification, adjusted to the user's role, what role they're playing in the process, and what associated risk their involvement carries.
So first, you should be requiring institutional email or federated logins, such as a Shibboleth-type login, so that you can verify institutional affiliation. Second, cross-reference identity claims. You should be cross-referencing people's identity claims: for example, you want to look at their ORCID trust markers, examine their publication history, prior reviews, and their funding records. And then, in high-risk cases such as guest editors, perhaps integrate
third-party identity verification tools, or government IDs, or biometrics. And I know that can be really difficult. The goal is not just to block fraud, but to create a trustworthy and transparent publishing ecosystem. So now, Josh, back to you. Thanks. Got the right slide. So that identity report is absolutely invaluable.
I highly recommend it, and thank you to those who contributed to it; I think some of them are even in the room. But how do we put those principles into practice? How do we enable those checks to happen? Well, the report outlines a whole set of potential risk factors. We can analyze each of those risk factors, and then, with the help of our research integrity team, who have human experience in doing this, each of those factors can contribute to an overall score, which goes into a dashboard to give the user an overall picture of the risk.
So, for example, the email address of the reviewer or of the author is itself extremely useful. As Tony mentioned, a non-institutional email is much more suspicious than an institutional email, and that's pretty easy for us to check. We have a known list of institutions, so we can easily differentiate between a Gmail address and an institutional account, but some things are a bit trickier. For example, the example at the bottom there: David Baker at washington.edu. David Baker really is a Nobel Prize-winning scientist.
He really is at the University of Washington, but that's not his email address, and this is a very common attack vector. We can check whether it is his email address through tools such as the ORCID API. The ORCID API is obviously an absolute treasure trove of information for research integrity. We can detect if they have an ORCID iD at all, how old the ORCID account is, and whether they have any trust markers, which are obviously an extremely important thing to find, and we encourage all publishers to submit their trust markers to ORCID whenever they can.
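To make those first two signals concrete, here is a minimal Python sketch of an email-domain check and an ORCID account-age lookup via the public ORCID API. The free-email list, helper names, and field handling are illustrative assumptions, not RICS's actual rules.

```python
# Minimal sketch of two identity signals discussed above: whether an email
# address uses a known institutional domain, and how old an ORCID record is.
from datetime import datetime, timezone

import requests

FREE_EMAIL_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com", "163.com"}  # illustrative

def email_domain_signal(email: str, institutional_domains: set[str]) -> str:
    """Classify an email address as institutional, free, or unknown."""
    domain = email.rsplit("@", 1)[-1].lower()
    if domain in institutional_domains:
        return "institutional"
    if domain in FREE_EMAIL_DOMAINS:
        return "free"
    return "unknown"

def orcid_account_age_days(orcid_id: str) -> int | None:
    """Fetch the public ORCID record and return the account age in days, if known."""
    resp = requests.get(
        f"https://pub.orcid.org/v3.0/{orcid_id}/record",
        headers={"Accept": "application/json"},
        timeout=10,
    )
    resp.raise_for_status()
    submitted = (resp.json().get("history") or {}).get("submission-date") or {}
    if "value" not in submitted:
        return None
    created = datetime.fromtimestamp(submitted["value"] / 1000, tz=timezone.utc)
    return (datetime.now(tz=timezone.utc) - created).days
```

A brand-new ORCID record with no trust markers would not be disqualifying on its own; it would simply contribute one more signal to the overall risk picture described below.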
Quite a common thing is authors having published an unbelievable number of papers; 2,000 papers in the last three months is obviously a pretty major red flag. Another approach is to use the Retraction Watch database, which we can now get through the Crossref API. Again, a retracted paper does not mean the author has done something bad, but it is a potential flag that could contribute to an overall risk score.
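As a rough illustration of that check, the sketch below queries the Crossref REST API for records that editorially update a given DOI and looks for a retraction among them. It assumes the `updates` filter and the `update-to` field behave as publicly documented; the helper names are hypothetical.

```python
# Hedged sketch: look up whether any Crossref record retracts a given DOI.
import requests

def retraction_notices(doi: str) -> list[dict]:
    """Return records that editorially update (e.g. retract or correct) the DOI."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"filter": f"updates:{doi}"},
        timeout=10,
    )
    resp.raise_for_status()
    notices = []
    for item in resp.json()["message"]["items"]:
        for update in item.get("update-to", []):
            notices.append({"notice_doi": item.get("DOI"), "type": update.get("type")})
    return notices

def has_retraction(doi: str) -> bool:
    """True if at least one updating record is a retraction."""
    return any(n["type"] == "retraction" for n in retraction_notices(doi))
```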
Another approach is to have a look at the author's previous topics. If an author has only published in maths for the last 10 years, and then they publish a paper on art history, that could be an indication of a problem. And so we can use the OpenAlex API for this purpose, to look at their previous publication history.
We created an internal knowledge graph based on that API, which we can now query to assess that risk factor. Aside from the author and the reviewer, the other agent in this equation is the submitter, the actual person using the submission tool. We're using Sigma, which is our access control system, to help evaluate the submitter. So, for example, if a user has only ever signed in via Sigma from Australia for the last three years, and then suddenly they're submitting a new paper from France, that could be an indication that there's a fraudulent agent involved.
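As a rough sketch of the topic-history check just mentioned, the snippet below pulls an author's recent works from the public OpenAlex API and flags a submission whose broad field never appears in that history. The author ID, date cutoff, and the simple notion of a "mismatch" are illustrative assumptions, not the production knowledge-graph query.

```python
# Illustrative sketch of the OpenAlex topic-history check described above.
from collections import Counter

import requests

def recent_fields(openalex_author_id: str, since: str = "2015-01-01") -> Counter:
    """Count the broad fields of an author's recent works in OpenAlex."""
    resp = requests.get(
        "https://api.openalex.org/works",
        params={
            "filter": f"authorships.author.id:{openalex_author_id},"
                      f"from_publication_date:{since}",
            "per-page": 100,
        },
        timeout=10,
    )
    resp.raise_for_status()
    fields = Counter()
    for work in resp.json()["results"]:
        topic = work.get("primary_topic") or {}
        name = (topic.get("field") or {}).get("display_name")
        if name:
            fields[name] += 1
    return fields

def topic_mismatch(openalex_author_id: str, submission_field: str) -> bool:
    """Flag a submission whose field never appears in the author's recent history."""
    history = recent_fields(openalex_author_id)
    return bool(history) and submission_field not in history
```

A maths author suddenly submitting in art history would come back flagged, while a first-time author with no history would simply return no signal rather than a false positive.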
Suspicious affiliations are another thing we can detect from the OpenAlex and ORCID APIs, as are cases where reviewers and authors have been frequently reviewing each other's papers in a suspicious manner. So there are several identity checks that we can execute, actually dozens, and together they build up a picture of potential risk. The second of the three challenges we are addressing is the time it takes to execute these tasks.
These tools are excellent, and we're so happy that we can use them now, but they are so much more valuable when used together, because each of them contributes to an overall picture. So it's kind of a no-brainer for us to pull all of these tools, through their respective APIs, into a single dashboard. In the example there, all the articles of a journal have their risk factors aggregated on the left-hand side, and then the user can deep-dive into any of those factors, if they would like to,
on the right. So that's where we're trying to get to. This is a technical challenge, because normalizing a diverse set of data takes some engineering. Some of the services have a red/amber/green rating, others have a score out of 100, and others have qualitative feedback, so we need to find ways to pull that all into a single normalized view.
Our approach to this is threefold. Firstly, transparency: we're showing the calculation we have used to arrive at a score, so that the editor can review whether it makes sense to them as well. Secondly, it is configurable, because every publisher has a different view about what they see as the highest risk, and they can configure that in an interface such as the one on the right. And there's no avoiding the fact that it does require a whole bunch of maths and iterative testing to put it all together; that's involved as well.
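To show how heterogeneous outputs can be combined, here is a minimal sketch of that normalization and weighting step. The mappings, signal names, and weights are placeholders; in RICS they are configurable per publisher and tuned through the iterative testing just mentioned.

```python
# Minimal sketch: map red/amber/green ratings, 0-100 scores, and boolean flags
# onto one 0.0 (low risk) to 1.0 (high risk) scale, then combine with weights.
RAG_MAP = {"green": 0.0, "amber": 0.5, "red": 1.0}

def normalize(signal: dict) -> float:
    """Put one tool's output onto the common risk scale."""
    kind, value = signal["kind"], signal["value"]
    if kind == "rag":                # red / amber / green rating
        return RAG_MAP[value]
    if kind == "score_0_100":        # assumes higher score means higher risk
        return min(max(value / 100.0, 0.0), 1.0)
    if kind == "flag":               # qualitative finding reduced to a boolean
        return 1.0 if value else 0.0
    raise ValueError(f"unknown signal kind: {kind}")

def weighted_risk(signals: list[dict], weights: dict[str, float]) -> float:
    """Combine normalized signals using publisher-configured weights."""
    total = sum(weights.get(s["name"], 1.0) for s in signals)
    if total == 0:
        return 0.0
    return sum(normalize(s) * weights.get(s["name"], 1.0) for s in signals) / total

# Example: an amber email-domain check outweighs a clean plagiarism score.
overall = weighted_risk(
    [{"name": "email_domain", "kind": "rag", "value": "amber"},
     {"name": "plagiarism", "kind": "score_0_100", "value": 12}],
    weights={"email_domain": 2.0, "plagiarism": 1.0},
)
```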
And then finally, there is this issue of humans still being required. So many of the AI reports include this key recommendation of keeping humans in the loop, using humans to augment AI. Sorry, using AI to augment human processes, rather than the other way around, as I just said. So, for example, if you were evaluating an author and most of the checks look pretty good, but there are a couple of red flags there, it's kind of an amber case.
Would you reject it? Would you accept it? Would you spend more time investigating it further? That's a hard decision for many research integrity teams to make, and it often just takes more time, which again results in fewer of those checks being able to be executed. So wouldn't it be great if there were simply a button on the interface, a manual check, with which you can summon a human being into the equation, all from the same interface?
That human being can work to an agreed SLA, 12 hours or 24 hours, for example, do a deep dive into that particular author's identity, and then return to you, within the same interface, a more detailed report about that integrity check along with a recommendation. And we're able to do that because we have a globally distributed research integrity team in the background there as well.
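As an illustration of what that escalation could look like in data terms, here is a small sketch of a manual-check request and result; the field names and SLA values are hypothetical, not the actual RICS schema.

```python
# Hypothetical shape of the "manual check" escalation: a request handed to a
# human reviewer with an agreed SLA, returning a recommendation and report.
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class ManualCheckRequest:
    manuscript_id: str
    author_id: str
    reason: str                 # e.g. "amber identity score, conflicting signals"
    sla_hours: int = 24         # agreed turnaround, e.g. 12 or 24 hours
    requested_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    @property
    def due_by(self) -> datetime:
        """Deadline implied by the SLA."""
        return self.requested_at + timedelta(hours=self.sla_hours)

@dataclass
class ManualCheckResult:
    request: ManualCheckRequest
    recommendation: str         # e.g. "proceed", "reject", "investigate further"
    report: str                 # the human reviewer's detailed findings
```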
So we're very excited about where we are on RICS at the moment. We've been conducting a whole bunch of research and interviews and presenting at community events. I think it will save people a whole bunch of time, which means that more research integrity checks can happen, which will ultimately, hopefully, strengthen the overall scientific record. It's being built at the moment.
I should say it's not complete yet. The beta release is due at the end of this year, and if anyone is interested in finding out more, we'll be available in the main hall over there. We're also very interested in anyone who would like to help us with further user research. If you are yourself hands-on involved in research integrity, for example, or if there's a particular feature for a research integrity tool you would love to see, we would love to hear from you as well.
And that's it. Thanks very much. Do we have time? Yeah, we have about 10 minutes for questions, which we thought we would go over, so that's pretty good. I'm sure the first question is, why is Tony sitting and Josh standing? Well, I'm 20 years older than Josh, so I get to sit and he gets to stand.
So, any other questions? There's a microphone right in the middle of the floor. OK, I'm a first-time attendee, so forgive me, but we get a lot of authors from China and their email addresses are often not institutional, and I think it's because they've got so many firewalls. So have you thought about that? Yeah, so we have an office in China, and we've been talking with them about what some of the possible opportunities are there and what some of the possible solutions to that would be.
I don't know if you've had any more specific thoughts around that. We're acutely aware of the challenges that China presents, and we're happy to have a team in China, as I say, who can help us with that. There are certainly inappropriate broad-brushstroke approaches that we should not take. We don't want to say that anything from China automatically has a higher risk.
But there are both technical and governmental challenges that don't exist in other countries, of course. For example, many researchers do not have an institutional email address in the first place, and since that's the core mechanism for signing in, that obviously presents some challenges. So, frankly, we don't have all of the answers on that yet, but the solution is to speak directly to researchers in China and get them involved in the decision-making process.
Hello, I'm Konstantin from Thieme Medical Publishers. I think this is a great tool. My question: the way I understand this product, it's currently just analyzing individual authors. Is it also planned that you can, basically in simple terms, put a whole paper into this software and then get an instant assessment of all 10, 12, 15 authors at the same time?
Correct, the latter. Yeah, so it is not just an individual author check. We did go down that path for a while, but I think the picture it builds is much richer if we have full-text analysis involved as well. So we are focused on identity checks, because that's in many ways the biggest culprit, but it does also pull in a whole bunch of services that have nothing to do with identity.
Clear Skies, for example, checking whether within the full text there are any potential paper mill issues as well. OK, thank you very much. Thank you for your question. He was so worried we were going to run over time by about half an hour; we went the other way.
Last call for any questions. We were so articulate and clear. Yeah, well, where to find us: we're here at the booth in the main exhibitor hall. Thanks very much for your time. And it's our 30th birthday, so join us.