Name:
How Researchers Are Embracing AI Search: A Growth Opportunity for Journal Publishers
Description:
How Researchers Are Embracing AI Search: A Growth Opportunity for Journal Publishers
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/02402454-1a49-4bc1-bfa9-fe3999cefd91/videoscrubberimages/Scrubber_1.jpg
Duration:
T00H24M58S
Embed URL:
https://stream.cadmore.media/player/02402454-1a49-4bc1-bfa9-fe3999cefd91
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/02402454-1a49-4bc1-bfa9-fe3999cefd91/SSP2025 5-28 1415 - Industry Breakout - Digital Science.mp4?sv=2019-02-02&sr=c&sig=v6hV2AJNNcNpmvZmzRHST1itgWSvh2hqaLQQEIbS1m8%3D&st=2025-06-15T19%3A13%3A41Z&se=2025-06-15T21%3A18%3A41Z&sp=r
Upload Date:
2025-06-06T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
All right. Looks like the doors are closed. I'm sure some other people will sneak in, but we'll go ahead and get started. It is 2:16, one minute late. But I wanted to welcome everyone here today to this industry breakout at SSP 2025.
Today, I'm going to be talking about Dimensions Author Check, which is a set of tools from Digital Science to help publishers with research integrity. The agenda today will kick off with an introduction about me and Digital Science, just to give you some background if you don't know who we are. Then we'll do a 10,000-foot overview of what Dimensions Author Check is.
We'll spend probably the majority of our time talking about the data, which really is the foundation and underpinning of the platform itself. We'll talk about the two ways we've surfaced it as a product: we've got a web-based dashboard, and we also have an API that's built in a robust manner to integrate with submission systems. We are not going to do a live demo today, because that's a recipe for disaster.
But I did want to invite anybody to come to our stand in the exhibit hall, booth 303. Any one of myself or the rest of the Digital Science team would be happy to give you a demo. Look some people up, look yourself up, look some colleagues up, and see what kind of data we can surface there. And then we'll wrap up today with a little bit of Q&A. So first off, I'm Tyler Roose, Director of Publisher Solutions at Digital Science.
I've been here for more than eight years, and I spend almost all of my time working with publishers large and small on everything that Digital Science does that could be meaningful and useful to them, and I continue to do that as we iterate through new products and build new capabilities. My previous experience was at Ingram, Donnelley, and OCLC, so 20-plus years in the publishing and publishing technology spaces, in different segments.
So if you don't know who Digital Science is, I like to describe us as a company that invests in, supports, and nurtures innovative software companies across the research landscape. That means everything from technologies that help early-career researchers get their first grants, through researchers who have been awarded grants and are working in the lab and need to track their results and outcomes, all the way through to things that publishers are really interested in.
Some of the brands we have that are firmly entrenched in scholarly publishing include Dimensions, Altmetric, Figshare, Overleaf, and ReadCube, brands you might be very familiar with. What we do at the end of developing these products is fully invest in them and bring them into our overall Digital Science infrastructure, which makes them less of a separate company being nurtured and more of a product that's part of our portfolio.
OK, so at a very high level, what is Dimensions Author Check? It's a set of tools that surface publication anomalies at the author level. That's really the elevator pitch. What we're trying to do is give publishers the ability to look at people and understand whether there's anything in their publication history that could cause concern, be an area of risk, or simply warrant further investigation.
It's designed to help publishers promote transparency and trust in research, as all research integrity products are. What we wanted to do is reinforce that all the tools publishers can put in place increase that trust in the community overall. Generally speaking, from a tactical point of view, we were looking to lighten editorial workload and improve decision quality: at a glance, help editors make better decisions, faster, about the people submitting papers for review.
We also knew we needed to make it flexible and scalable so it could fit into and underpin existing publishing workflows. We didn't want to build a separate analysis workflow where you had to jump out of your system and go somewhere else to do background checking. So we did all of our work to make sure it could sit in workflow, exactly where your editors are.
So let's talk about data now. I asked Gemini to help give me a graphic to introduce the data section of my presentation. I simply said: give me an illustration that describes how important data is to scholarly publishers. Looking at it now, I don't see anything completely remarkable or disastrous.
It's just a bunch of charts and graphs and people looking at them. I find it most interesting, though, that it interpreted scholarly publishing not as an industry or a segment or a community, but as the name of a company. So we've got "Scholarly Publisher" as the name of the company in this example. All right.
So the foundation of data underneath Dimensions Author Check actually starts with Dimensions data. That's a separate analytical tool we've been selling to publishers for a long time, and they use that data for a number of different business use cases. Underlying all of it is a lot of publication data, around authors and around publications. We have disambiguated person records for over 35 million distinct authors, and we match those to over 155 million publications.
So that's the foundation data: we know a lot about people, where they've published, what areas they've published in, who they've published with. On top of that foundation, we have a couple of pillars of data specific to the Author Check platform that I'll describe in detail here. The first pillar is what we call atypical events. These are papers that have retraction notices or expressions of concern.
Those can come from Crossref, Retraction Watch, PubMed, and even some from Dimensions: the metadata we receive into Dimensions from a variety of sources often has "retracted" in the title, and so we know that particular publication has been retracted. We also developed a proprietary taxonomy for retraction reasons, and that's about ease of use and consistency, because the data we're getting comes from different sources.
They use different nomenclature for the reason for retraction, and what we wanted was for users to see a very simple and streamlined explanation of why that retraction was in place, because that's very important: if you're analyzing a particular person based on their retraction history, you'll want to know why those papers were retracted. In addition to that data, part of our atypical event data set includes alerts we receive from the Problematic Paper Screener.
They're looking at papers every day for things like tortured phrases and citations to problematic publications. So really, they're taking a global view of publications and looking for things within a paper that are problematic, in addition to papers that have actually already been retracted. For us, that pillar is what we call a publication flag. In that universe, a publication can have one or more flags, each one of these atypical events.
The next pillar is something we developed on our own. We're not aggregating this data from anywhere else; we actually built the tools to derive the data and create a unique data set. It's based on a methodology developed by Dr. Leslie McIntosh and Simon Porter, who are Digital Science colleagues of mine and who focus on forensic scientometrics.
They postulated that collaboration behavior could be an indicator of potential concern. The focus of the paper they wrote and published in Scientific Reports was that author collaborations are forensically important. If you're looking at paper mill output or potential authorship-for-hire enterprises, you can look at the collaborations of those authors and find certain signals that act as a fingerprint, if you will, letting you understand and define what authorship collaboration looks like on those types of papers. Based on that publication, they went through and described all of the different properties of a collaboration network that would be a cause for concern.
That then generates in Author Check what we call a network flag. So the second pillar we call a network flag; the first pillar is a publication flag. The network flag can be applied automatically and algorithmically, which is really great, because we built the algorithm and ran it across all 35 million authors and 155 million publications we've got as our foundation, so we could create a very robust data set of network-flagged people.
The properties involved are described in detail in that paper, if you really want to understand what constitutes an unusual collaboration graph. But they are things like being highly prolific, so publishing hundreds of papers in a given year, or publishing heavily early in your career: the normal pathway for a researcher is to publish fewer papers early in their career and then accelerate over time.
We found that one of the properties of authorship for hire was early-stage researchers publishing quickly, mainly to try to kick-start their careers. We also saw that collaborating with few mentors was a signal: having 20 authors on a paper, 19 of which are very junior and one very senior, is another marker, if you will, of potential authorship for hire. And then there are authorship collaborations with very few repeated collaborators and similar collaborations,
that is, authors that publish a lot, but always with different people. Those are the kinds of properties that fold into the algorithm and methodology we've built.
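As a rough illustration only, and not the published methodology or Digital Science's actual algorithm, checks along the lines of the properties just described might look something like the sketch below; the record fields and thresholds are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class AuthorRecord:
    """Hypothetical, simplified view of an author's publication history."""
    papers_per_year: dict          # year -> number of papers
    career_start_year: int
    coauthor_counts: dict          # co-author id -> number of shared papers
    senior_coauthor_share: float   # fraction of co-authors who are senior

def collaboration_signals(author: AuthorRecord) -> list:
    """Return illustrative signals resembling the properties described above.

    All thresholds are placeholders, not values from the Scientific Reports paper.
    """
    signals = []

    # Highly prolific: hundreds of papers in a single year.
    if any(count >= 100 for count in author.papers_per_year.values()):
        signals.append("highly_prolific")

    # Unusually heavy output in the first few career years.
    early_total = sum(
        n for year, n in author.papers_per_year.items()
        if year - author.career_start_year <= 3
    )
    if early_total >= 30:
        signals.append("heavy_early_career_output")

    # Few senior collaborators, e.g. 19 junior authors and one senior.
    if author.senior_coauthor_share < 0.1:
        signals.append("few_mentors")

    # Many distinct collaborators but almost no repeat collaborations.
    repeats = sum(1 for n in author.coauthor_counts.values() if n > 1)
    if len(author.coauthor_counts) > 50 and repeats <= 2:
        signals.append("few_repeat_collaborators")

    return signals
```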
And again, that result set means we've got the foundation layer of all the people, publication data, and publication history; on top of that, anything we know about flagged publications, meaning an individual paper with one or more flags; and whether that particular person is a member of the network-flagged community, where we found anomalies in their collaboration network that point to potentially risky activity in their publication past. So on top of that data, we've created two different products. The first is a simple dashboard. It's a web-based application, and it's really designed for one-off investigations and special projects.
I should have had an asterisk here, because although the platform is called Author Check, it's really for checking people. You could use this platform to look at editorial board candidates or new peer reviewer candidates, and vet people on an individual basis as a side project you might be doing.
It's also good for investigations. Post-publication, if you're looking into authorship and want to understand a publication you're digging into, or one that has had a problem, you can look at the authors' past and ask: should we have caught this up front if we'd had a particular tool or editorial check in place? It's really good for that sort of investigative work as you dig into those kinds of things.
It's a fairly straightforward application. Again, it's web based; it lets you find researchers, look at their profile, look at all the flagging details we have about them, whether that's the publication flag or the network flag, and then lets you bounce out into other resources and look at things like their co-authorship network. So here's how that's surfaced. In this particular use, I am searching on a person.
I'm going to further refine that by their institutional affiliation. I could also add their ORCID iD, or IDs of other published papers, to really find the right person, because obviously we've done a great job of disambiguating authors, but there are always multiple possible results based on middle initials, shortened first names, et cetera.
In this view, down at the bottom of the screen, you'll see a grid that shows the proposed search results based on the search I'm doing here. I searched on the person who's currently number one in the Retraction Watch top 100, so I knew there would be some examples to show you of what we're doing. That's the top-level researcher there, and my suggested result set is in the grid below.
What I'd like to call out here is that while we have columns for all the different things I'm pointing out, whether that's flagged publications, numbers of issues, membership in a network flag, et cetera, we're using shades of blue. One of the things we're deliberately not doing with this platform is designing a red light, yellow light system. We're not saying a researcher is good or a researcher is bad,
don't publish this one, do publish this one. No green check marks or red X's. We're really just highlighting visually something you might want to take a closer look at. The darker the shade of blue, the more outlier numbers you're going to see in those particular columns, which leads you toward maybe a stronger recommendation for digging in there.
But we really want you, as the publisher, as the editor, to be able to analyze and qualify the information we have in order to take action on it. Once you've found the person you're looking for in this application, you can link to additional resources for that person. You can link to their Author Check profile, which gives you all the detailed information about the flags we found for them.
You can link directly out into Dimensions and look at the overall universe of work for that researcher: their past funding, how many publications they've had, over what period of time, in what subject areas, through what journals and publishers. You can even look at things like whether they have any patent history, all kinds of big-picture views,
if you wanted to do real research into that particular person. And then we let you drill down directly into their collaboration network as a visualization, if you're interested in who they've co-authored with in the past and whether any of those people are suspicious people you'd want to factor into your decision-making. So if we bounce into the author profile, with the chart on the bottom right we can look directly at the flags.
In this particular case, this person has 711 flagged publications, which is 7% of their total publication output. And it gives you the exact reason: it'll tell you the title of the article, the publication year, where it was published, and specifically the flag we have in our publication flag database. So it'll tell you, for example, that we received a retraction notice from Retraction Watch,
give you the normalized reason for retraction and the date of retraction. The same goes if we've received an alert from the Problematic Paper Screener for tortured phrases. So really, at a glance, you can take a look and say: OK, these are valid flags, valid issues that I want to take into my consideration of this person, whether as a potential candidate for an editorial board, an author on a submission, whatever the workflow is. It really gives you all the details you need to say
that some of these flags are very concerning and some are not a concern at all. As I mentioned, you can bounce directly into Dimensions as well. This has given me a direct link, for Joachim, into his co-authorship network visualization. We've implemented VOSviewer in Dimensions, so you can easily see a nice, easy-to-click visualization of their co-authorship network and, again, understand who they typically publish with, where they typically publish, what disciplines they typically publish in, and really get a good picture of how co-authorship can factor into looking at author integrity.
I will say, we hope that the majority of authors aren't going to be people of concern. So in this particular example, I did a search on Simon Porter, the co-author of the paper I mentioned earlier in Scientific Reports describing our methodology for analyzing author collaborations.
Nothing to see here: just one light shade of blue that says he publishes a good number of papers, with no retractions and no areas of concern. Mostly, you're going to be able to set your own thresholds around this, so authors like that go straight through your review process and it doesn't even need to stop. But it's really nice to see that there are plenty of researchers in the database who have no issues at all.
The next product I want to highlight is something we've recently developed and finished, which is a full and robust API on top of all of that data. Going beyond the user interface, we knew we wanted to build a simple, easy-to-use, robust API that could be implemented at scale, typically in a manuscript submission system, so that you could receive submissions, post queries to the Author Check API, and receive data back about those particular authors.
We knew we had to build it so it could batch: if there are 20 or more authors on a particular paper, you don't have to do them one at a time, with your editor seeing just one response come back at a time until you get through the list. You can send it all in at once, process it all, and get it all back in workflow, so editors can really see whether there are any issues, based on the data we have, that they'd want to look into.
And again, you can customize and define those thresholds. It could be that submissions automatically go through if the Author Check response from this API has nothing above a threshold you set, say, an author with one or fewer retractions, for example; or you can set it higher. So you can do automation based on the result set coming back from this particular API, along the lines of the sketch below.
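To make that threshold idea concrete, here is a minimal sketch of the kind of triage logic a submission system could apply to a per-author summary; the field names, thresholds, and the shape of the summary are assumptions for illustration, not the actual Author Check API schema.

```python
def triage_submission(author_results, max_retractions=1, allow_network_flag=False):
    """Decide whether a submission passes automatically or is held for an editor.

    `author_results` is a hypothetical per-author summary, e.g.
    {"name": "...", "retraction_count": 0, "network_flag": False}.
    """
    for result in author_results:
        # Too many retractions on record for one of the authors.
        if result.get("retraction_count", 0) > max_retractions:
            return "hold_for_editor"
        # Author is part of an unusual collaboration network.
        if result.get("network_flag") and not allow_network_flag:
            return "hold_for_editor"
    # Nothing above the configured thresholds: let it continue in workflow.
    return "auto_proceed"

# Example: one clean author and one with several retractions and a network flag.
decision = triage_submission([
    {"name": "A. Author", "retraction_count": 0, "network_flag": False},
    {"name": "B. Author", "retraction_count": 3, "network_flag": True},
])
print(decision)  # hold_for_editor
```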
The data that comes back is really, really interesting, and all of it's optional to display. As you get data back from this API, the system you're working in can show a bunch of different things: the collaboration network, or a table and chart of co-authorships with other retractions. You can really take this data and do a lot with it to keep it in workflow and make it very usable.
And then of course, as I mentioned, scalability is really the key. We knew that most publishers are going to be looking at hundreds, thousands, tens of thousands of submissions over a period of time, with hundreds and thousands of authors on them, so it really can't be held up in a 24-hour batching mechanism or anything like that. So we built for high availability, high-scale output, and performance.
At the end of the day, though, it's actually a fairly simple API. Just like in the user interface I showed you, you're going to send in, through the API, information about people so we can make sure we're looking at the right person: name, affiliation, ORCID, IDs of previously published papers. If you're integrated with Dimensions, there are also unique IDs for every Dimensions person record, those 35 million,
so you could include those if you're fully integrated and using the Dimensions API for other things. And then the outputs are going to be fairly straightforward: for each person, you'll see publication history and areas of concern, so flagged publications, retraction details, Problematic Paper Screener details, network flags, collaboration details, et cetera. That's what comes back, roughly along the lines of the sketch below.
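As a sketch of that exchange only, here is what the request and response payloads might look like; every field name and value below is hypothetical and invented for illustration, not the documented Author Check API schema.

```python
# Hypothetical request: identify the people you want checked.
request_body = {
    "authors": [
        {
            "name": "Jane Q. Researcher",
            "affiliation": "Example University",
            "orcid": "0000-0000-0000-0000",
            "known_publication_dois": ["10.1234/example.doi"],
            "dimensions_researcher_id": None,  # include if you already use Dimensions IDs
        }
    ]
}

# Hypothetical response: per matched author, the kinds of data described in the talk.
example_response = {
    "authors": [
        {
            "name": "Jane Q. Researcher",
            "publication_count": 42,
            "network_flag": False,
            "flagged_publications": [
                {
                    "title": "An Example Paper",
                    "year": 2021,
                    "flag_source": "Retraction Watch",
                    "normalized_reason": "data fabrication",
                    "retraction_date": "2022-03-01",
                }
            ],
        }
    ]
}

# A submission system would post request_body to the API endpoint it has been given,
# then render or act on the response, for example via the triage sketch shown earlier.
for author in example_response["authors"]:
    print(author["name"], len(author["flagged_publications"]), author["network_flag"])
```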
Again, you'd display that and take action on it however you, as a publisher, a journal, or a discipline, find fits your particular space, so you can catch risky authors at the time of submission and then take action and do some research. You may catch one, do some research on them, and find it's all clear; that's fine too. But at least you're able to take control of the parameters you're looking at and the foundation you've established for treating authorship as a criterion.
So as I mentioned before, we're not going to do a live demo. But again, we invite you to come track us down; we're in booth 303, and we'd be happy to show you a demo of the platform itself, show you some of the data we're surfacing, and show how you can link out into Dimensions to do further investigation. In lieu of that, though, we did do a private beta test with publishers small and large at the end of last year, mostly to validate that the data was useful to them and meaningful enough for how they were analyzing and looking at people in their workflows and systems.
We got some very good feedback from that, and I thought it was interesting to highlight. Most of them talked about the co-authorship analysis as a component that was really useful, especially being able to pull all of that data into one view. One of our publishers said that would have saved them hours on a particular review of an incoming author, and that without it they potentially wouldn't even have been able to pull all of that data into one view.
So we got some really great feedback from that; it validated what we were working on and let us scale back up and build the API we've created. So with that, I'm going to say thank you, and I want to open it up to questions. First off, on this screen, the QR code, which my transcription is posted over, goes to the Author Check website,
if you'd like to see more details about the platform itself. And on the right-hand side there, you'll see the press release we did with Sage, who was the first publisher to adopt Author Check late last year. So with that, I'll open it up to questions. We do have a mic in the central aisle there;
if you have any questions, feel free to come up and ask away. Hello, Matthew Salter from the Mathematical Association of America. I've done some work with my consultancy on this; it's a really, really important area. Anecdotal evidence I've seen and heard suggests that for things like paper mills, there is an optimum number of authors that makes that work well,
and the number I keep hearing is six. I wondered if you had any thoughts about that. Well, I can tell you that Leslie McIntosh and Simon Porter, who wrote that paper, actually defined some of that. Six is about the right number. When they did their analysis and their proof points around it, they actually validated other numbers as well. And you're right, a higher number of authors tended not to be paper mill;
it tended to lean towards authorship-for-hire enterprises. So they were able to further define that, but that's about the right number for a paper mill. And I think it's really interesting, because that's exactly the type of approach we wanted to take: go beyond the properties of a paper that could tell you things like that, and look at the authorship and their previous collaborations as well, because you might see an author come in with a new submission who has a history of those types of collaboration networks, which would point you to say this person potentially publishes, or puts their name on, papers produced by a paper mill. Hi, Richard Winn. I did a poster on this topic last year at SSP, and basically I did an analysis of how frequently retracted authors were published in new publications every day on Crossref, and the number is often in the dozens, sometimes in the hundreds.
My conclusion from that, and here I'm going to be a little controversial just to stir things up, is that publishers have no interest at all in checking the integrity of their authors. The reason I'm saying that is they're not really making any effort to do the checks, because if you get told that someone has been retracted, it's a lot of work to figure out whether they were just a junior co-author on a paper.
And so my impression is that publishers and scholarly societies are just pushing stuff out to get the APC in and letting the marketplace decide afterwards whether the work was good. So I don't expect you to reveal pricing, but just as a philosophical question: how much should publishers pay for this kind of capability, and how much are they actually willing to pay, on a per-manuscript or per-search basis? That's a great question.
I'll try my best to answer it without actually putting numbers to it, but I will say: the more pressure that comes from post-publication issues, and the more visibility those get, the more pressure publishers are going to have to actually put some of these steps in place. So if we sold this for $100,000 and they had one article last year that was published and ended up being a $5 million lawsuit, obviously the ROI is there
if they'd just had that process in place. So we're taking a look at that. We're seeing that pressure driving the push to put something like this up front, and that helps define the price point to some extent, in addition to content and data licensing and technology support and all that kind of stuff. But that's how we see the price being driven: the cost, or potential cost, of not doing this.
So we see that as one driver, and we see a couple of other factors too. One is the value of being able to say you've got a really cool new research integrity and trust program in place, and being able to say to the community, to your authors and submitters, we're going to catch the bad actors here, so we want your quality submissions. So it's a marketing vehicle, if you will, to say this is a better journal to submit to.
And on top of that, we see that publishers are willing to invest in this if they have efficient tools. In one of our quotes here, the human couldn't do this without the app, because it would take them hours to do something this can do in minutes. And that, I think, is part of the trade-off. To your point, I know they're fighting for APCs and submissions and all that kind of stuff, but at some level they're also going to say: we can process more papers, we can approve more papers, if we've got something like this that catches the other ones.
And then the onus goes on the marketing department to get more submissions. Any other questions? All right, timing-wise I think we're wrapping up. I appreciate your time. Thank you very much for coming.
And again, have a great SSP.