Name:
AI Efficiencies to Optimize Workflow from Submission to Publication: 3 Case Studies
Description:
AI Efficiencies to Optimize Workflow from Submission to Publication: 3 Case Studies
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/caa82f28-0114-492d-b1e6-29089c579ef0/thumbnails/caa82f28-0114-492d-b1e6-29089c579ef0.jpg
Duration:
T01H06M39S
Embed URL:
https://stream.cadmore.media/player/caa82f28-0114-492d-b1e6-29089c579ef0
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/caa82f28-0114-492d-b1e6-29089c579ef0/2023 - AI Efficiencies to Optimize Workflow from Submission .mp4?sv=2019-02-02&sr=c&sig=wTKdMxW0VFsh8lKuiTZEoXjaJLutufXwpL2lmtyRyjs%3D&st=2024-11-19T15%3A27%3A11Z&se=2024-11-19T17%3A32%3A11Z&sp=r
Upload Date:
2024-07-22T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
Hi, everyone. Thank you for joining us. We'll get started in a moment. We're gathering, letting folks enter. We'll give it another 20 seconds or so.
And then we'll get started. OK, thank you and welcome to today's Scholarly Kitchen webinar, AI Efficiencies to Optimize Workflow from Submission to Publication. I'm Lori Carlin, SSP Education Committee webinar lead. Before we start, I want to thank our education sponsors, Morressier and Silverchair. We're very grateful for your support.
A few housekeeping items. Attendees' microphones have been muted automatically, so please use the Q&A panel to enter any questions for the panelists. Questions will be answered at the end of the session. You can also use the chat feature, of course, to communicate with other participants or the organizers. Closed captions have been enabled. You can view captions by selecting the More option on your screen and choosing Show Captions.
This one-hour session will be recorded and available to view on demand in a few days. And a quick note on the SSP code of conduct and today's meeting: we are committed to diversity, equity and providing an inclusive meeting environment, fostering open dialogue free of harassment, discrimination and hostile conduct. We ask all participants, whether speaking or in chat, to consider and debate relevant viewpoints in an orderly, respectful and fair manner.
And now I'm very happy to introduce Avi Staiman, who will be moderating this session. Avi Staiman is the founder and CEO of Academic Language Experts and a chef at the Scholarly Kitchen. Thanks so much, Lori. Really appreciate it. I'm excited to have such a big crowd here with us today, this evening, to talk about the practicalities and pragmatic aspects of AI in scholarly publishing.
From what I've seen, most of the attention thus far when it comes to AI has been focused on the question of, well, should we use it or shouldn't we use it? So we've been having a lot of discussions and debates about copyright issues. Is it plagiarism? What do we think about authorship? Is there a high risk of paper mills and fraud? We've also had questions around why we should or shouldn't be using AI.
Is it reliable? Does it make our job as humans obsolete? Do we see it as an advance in technology? Does it make us more efficient? I think all of these are important questions to ask, but they're a little bit disconnected from the reality of our day-to-day jobs. And the question I want to ask today is, how and when is it already being used?
How is AI being used in publishing workflows, and how can it or should it be used? I think once we do that, we can make the conversation a lot more pragmatic, lower the level of fear around it, and figure out where it's right to use and where it's not right to use. If you'll allow me before we start, I want to take a minute to get a little bit philosophical. In Christianity there's a term called casuistry.
What that term refers to is the idea that the best way to come up with rules is not by thinking abstractly and theoretically, but by actually looking at specific use cases, understanding the details and intricacies of those use cases, and then trying to learn general rules and greater principles from them. And I think that's potentially a better frame for AI.
And for the conversation we're going to have today. We're really going to get into the nitty gritty, into the details, and hear from leaders in the field of AI and publishing about one particular way they're using AI. I think that can be really, really helpful: once we have that granular lens, which really does a deep dive, then we can go back and ask some of the more abstract questions and come up with better answers.
So how do we actually go about the process of doing it? What I want to suggest for the audience at home is that the best way for us to do this is to first think about our processes and break them down into their most intricate details: a, b, c, d, here's how I'm currently doing x or y. Then ask, are there ways AI can address a part of that particular process that is slow, costly or inefficient?
And what might that look like? I want to start you off with one example that I've been doing a deep dive into over the last few weeks, and that is translation. I'm the owner of an author services company, so we do a lot of work in translation, translating research from other languages into English. Now there's a simplistic understanding, I would say, that what you need to do to translate a text using AI is take the text, throw it into OpenAI, get the text back out, and go publish it.
Well, when I hear people talking in that way, I think they haven't actually done the deep dive; they're just thinking about it in a theoretical sense. The actual process that we've built out is a lot more granular and a lot deeper. What I mean by that is, first we have to analyze the specific text to see whether it's a good fit for machine translation, for AI.
Or maybe it's not a good fit. We then use software to break down the text into small, sensible components that can work within the AI framework. We iterate continuously to improve the prompts, play around with prompts, and see what results we get from them. We have to retrain our staff, our managing editors, to review the work and ask ourselves, are we happy with the output or not?
And then, how do we go about identifying errors and issues and fixing those? How do we build glossaries for consistency of language? And finally, how do we make sure that we're avoiding hallucinations or made-up citations and references, and how do we deal with those issues? We're not here today to do a deep dive into AI for translation, but my point is that when we think about how AI can be used, we need to realize that the devil really is in the details.
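As a rough illustration of the kind of chunked, glossary-aware pipeline Avi sketches here, the snippet below is a minimal mock-up, not his company's actual workflow; the prompt wording, glossary, chunk size and model choice are all assumptions, and the downstream human review and reference checks he describes would sit on top of it.

```python
# Rough sketch of a chunked, glossary-aware AI translation pass (illustrative only).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

GLOSSARY = {"aprendizaje profundo": "deep learning"}  # illustrative source -> target terms

def chunk_paragraphs(text: str, max_chars: int = 2000) -> list[str]:
    """Split on blank lines so each request stays a manageable size."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

def translate_chunk(chunk: str) -> str:
    glossary_lines = "\n".join(f"{s} -> {t}" for s, t in GLOSSARY.items())
    prompt = (
        "Translate the following academic text into English.\n"
        "Use these glossary terms consistently:\n" + glossary_lines +
        "\nDo not add, remove, or invent any citations or references.\n\n" + chunk
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # model choice is illustrative
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

def translate_document(text: str) -> str:
    # Managing-editor review and citation checks would follow this step.
    return "\n\n".join(translate_chunk(c) for c in chunk_paragraphs(text))
```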
And that's what we want to try and get into today. There's no better person to start us off than Sarah Taylor from Springer Nature, who's going to give a little bit of an overview of how AI can impact publishing workflows. Following Sarah, we're actually going to have three deep dives into specific use cases for AI in publishing workflows: Julia Kostova from Frontiers, Hong Zhou from Wiley,
and to cap it off, Dustin Smith from Hum. All of them I've heard speak before, and they are really excellent. But let's start off with Sarah. Sarah B.J. Taylor has had a unique career trajectory, as she transitioned from a PhD scientist to business executive about 13 years ago. Since leaving academia, she's played a pivotal role in the growth and leadership of Research Square's author services and the Research Square preprint platform, most recently serving as Research Square's chief operating officer.
Her tenure at Research Square involved significant contributions to strategic and operational aspects of the company, including the introduction of AI and machine learning technologies over seven years ago to automate product delivery processes. Following Research Square's acquisition by Springer Nature in 2022, Sarah assumed the role of vice president of AI products. In this capacity, she is responsible for driving innovation, particularly through a digital writing assistant for researchers, which offers AI tools that Sarah led and integrated across Research Square.
With no further ado, Sarah, the floor is yours. All right. Can you see my slides and hear me OK? All right. Well, thank you so much, Avi, for the opportunity to participate today. As Avi said, I am the vice president of AI products for the Research Square Company.
And just a little bit more of an overview on that: Research Square is part of the Springer Nature group. Under the Research Square umbrella we offer professional services; that business was founded in 2004, and since then we've provided language editing, translation and other services for over 1 million authors who are preparing to submit a manuscript for publication, so right before the point of submission to a journal. We recently added our software-as-a-service product line, and we introduced the Research Square preprint platform in 2018, where we're currently preprinting approximately 1,800 manuscripts per week.
In a few minutes I'll talk more about our AI editing tool, but first, as Avi said, he asked me to start by providing a brief overview of the landscape, especially as it pertains to scholarly publishing. I think we're all familiar with what to me has felt like a wave of large language models that have come to the market in the past year. For me, it feels like every day when I wake up and check the news, there's some new technology or some new tool, and it can certainly feel overwhelming at times to keep up in our understanding of how the various tools work and how publishers and researchers are using these tools in the publishing process.
Gartner research uses the hype cycle to demonstrate that following the introduction of a new technology, there's a wave of excitement. But with that excitement can come a lot of fear and anxiety, during which people have an inflated expectation of what's possible and also uncertainty about how it will impact them in their daily work. What can be easy to forget is that neural networks and AI have been around for quite some time.
What has changed is how quickly models can be trained and how little data is required to build these high-quality models. That shift really began in 2017 with the introduction of the transformer architecture, which is essentially the ability to input language in one form and output language in another form, for example English to Spanish. And it does it in a very efficient and high-quality way.
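Sarah's English-to-Spanish example maps directly onto off-the-shelf sequence-to-sequence transformers; a minimal sketch using the Hugging Face transformers library, where the specific model named below is just one publicly available choice, not one Research Square uses.

```python
# A transformer in its simplest form: language in one form goes in, another form comes out.
from transformers import pipeline

# Helsinki-NLP/opus-mt-en-es is one publicly available English-to-Spanish model.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")

result = translator("The transformer architecture was introduced in 2017.")
print(result[0]["translation_text"])
```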
There have been several advancements outside of language as well; I'm going to talk about a language example today, but things like imaging and speech have seen a lot of advancements too, and these have all been employed in publishing, often behind the scenes in our operational workflows. Today we'll be providing some case studies to demonstrate this and hopefully shine a more positive light on what's possible with AI.
So in this image, I'm showing the relatively small time period in the publishing cycle when we provide services at Research Square: when authors are writing a manuscript and preparing to submit. Even in this one example, I'm showing just a small fraction of all the digital tools that are available to authors and publishers to prepare a manuscript.
And these are all, again, embedded in our operational processes, whether as a business or as an individual. Before I get into my example, I just wanted to say that, importantly, the biggest lesson that we've learned at Research Square and Springer Nature is that for something as important as the publication record, there are very few cases where a tool can fully replace a human.
Rather, what we have found time and time again is that tools should augment human work and not replace human intelligence. And so the more that we can augment human work, the more time people have to spend on complex decision making, which improves publishing quality for all of us. So expect that you'll see this as a running theme through the hour as we're all giving our examples.
For my case study, I'm going to focus on automating language editing. As I mentioned, since 2004 we've provided services to help authors prepare to publish, and our language editing service, performed by PhD editors in the US, is by far our largest product offering. So when the transformer architectures were published in 2017, we saw this as an opportunity to automate our internal language editing beyond using things like Word macros, which we had been using from the beginning.
The way that we train the tool is actually pretty similar to the way that you would train a translation tool, except instead of putting in English and the corresponding Spanish, we're putting in the before-editing and the after-editing language. So we train a neural network and we build an AI tool from that. Here's what our tool looks like. We're primarily working in Word, and the changes that the tool makes are in track changes.
In this one example, you can see some changes to singular/plural, capitalization, punctuation, articles, conciseness, tense, word choice, and even some phrasing down at the bottom. And this looks like a lot of editing by a tool, right? But actually, at the end of the editing process, our automated tool is only making about 60% of the changes. There are still changes, the other 40%, that humans must come in and make. What the tool does is make the easier changes.
Things like punctuation and terminology, things that a machine can easily learn as a rule, an either/or. But what the people still have to do is come in and understand the text. If I'm a cell biologist and I'm editing a cell biology paper, I need to know what the author is trying to say to really help make the best edits possible and, most importantly, to avoid changing the author's meaning.
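As a rough sketch of the "train it like a translation tool" idea Sarah describes, the snippet below assumes a hypothetical fine-tuned before/after editing checkpoint (the model ID is made up and does not exist) and uses a simple word-level diff to stand in for the track-changes view; it is not Research Square's implementation.

```python
# Sketch: treat editing as "translation" from unedited to edited text, then diff the result.
import difflib
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL = "example-org/before-after-editing-model"  # hypothetical fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

def edit(sentence: str) -> str:
    inputs = tokenizer(sentence, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

def track_changes(before: str, after: str) -> list[str]:
    """Word-level diff: '-' entries are deletions, '+' entries are insertions."""
    return [t for t in difflib.ndiff(before.split(), after.split()) if t[:1] in "+-"]

original = "The cells was incubated for 24 hour."
edited = edit(original)
print(edited)
print(track_changes(original, edited))
```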
So in addition to gaining some efficiencies in editor speed as they're going through a manuscript, we've also used the efficiencies and savings from this tool to scale our hiring as we've grown. As we've grown, we've not needed to hire as many people as we would have without the tool. And that leads to some savings, where we've been able to reduce customer prices and upskill employees.
So we've got life scientists doing linguistics, myself included. And we're reinvesting in new tools as well. Most recently, we launched our newest product, Curie, in September. Curie is a writing assistant that we're currently building, and the goal is to bring together a lot of those functionalities I showed you in that slide with all the logos, whether they are our tools or other tools, into one place where authors can go to write their manuscript.
Currently Curie is available as a subscription, and we're offering digital editing and digital translation, with a lot more features coming in 2024. Also, all Springer Nature authors can use our digital editing tool for free before they submit for publication, and that's accessible through Springer Nature's pre-submission checklist. So with that, I'm going to hand off to the next speaker. Thank you for your time.
I look forward to the discussion. Please do contact me if you'd like to follow up or have questions for me. Thank you. Thanks so much, Sarah. That was great, very clear and to the point. Just one quick question that I'm curious about, Sarah: where do you think we're currently at on the Gartner hype cycle?
That's a good question. I think Gartner would say that we're in the scaling phase, on the path to productivity, really realizing what the tools are capable of. I personally think publishing is still in the initial hype area, where we don't fully know how authors are using these tools and how publishers should prepare to respond to these new tools.
That's my thinking, and it's really just because we have a longer product cycle; it takes a long time for us to get through a manuscript. So in understanding all these cases, I think we've still got a little bit of time. Yeah, I think you're right. And I think it also depends on how involved we've been in testing out some of these tools.
I think you start off with this great hype, but then once you get into the weeds, you see all the problems and all the challenges you need to overcome. And then you have a much more realistic view on the other side of the potential benefits. Great, thanks, Sarah. Appreciate it. All right.
We will get to the Q&A at the end, so thank you for your patience. But we're going to continue ahead with our next speaker. Dr. Julia Kostova is the director of publishing and head of the US division at Frontiers, the third most cited and sixth largest scholarly publisher in the world. In this role, Dr. Kostova leads Frontiers' strategy in the US with the mission of making science open, accessible and trustworthy, supporting researchers, authors and institutions in this transition.
In her role, she is also actively engaged in science and technology policy, advocating for support for sustainable open science and investment to build trust in science. With Frontiers being a pioneer in AI in the scholarly publishing industry, Dr. Kostova has spoken at leading forums and events about the myriad ways in which this technology will transform the knowledge landscape. A 20-year veteran of the publishing industry, Dr. Kostova has worked at leading publishing houses, including Wiley's global research division and Oxford University Press.
Dr. Kostova is a strong advocate of cultivating and mentoring women executives and leaders, having held a Women in Power fellowship at the 92Y in New York City. She holds a PhD in French literature from Rutgers University and has taught at Columbia and Rutgers universities for over a decade. Julia, thank you so much for joining us today. Thank you so much for having me, Avi. And thank you to everybody who joined today to hear a little bit more about AI.
I am very excited to tell you a little bit more about how we at Frontiers are using AI to optimize workflows and, of course, to better meet researchers' needs, as we said in this wonderful introduction. Thank you very much for that. Frontiers was one of the first in the industry to develop AI tools for specific uses in the publishing process, and I'm pleased to tell you a little bit more about what this technology has done and how it has helped us serve our communities better.
Before I jump into it, let me tell you a little bit about Frontiers. Susan, if you don't mind, next slide, please. Frontiers was founded about 15 years ago by two neuroscientists who found themselves frustrated with the existing publishing system, which they felt didn't meet the needs of researchers and hampered scientific discovery.
One of their gripes was with the outdated technology that researchers often had to interact with during the publishing process. They launched Frontiers as an open science platform focused around researchers' needs and, of course, on advancing scientific discovery, and really committed to building it around cutting-edge technology, which is very much relevant to what we're going to be talking about today.
In the 15 years since we were founded, Frontiers has proven very popular with researchers all over the world. We have grown to be the sixth largest publisher and the third most cited publisher. The research that we publish has received over 2 billion views and downloads from readers all over the world. And of course, because our content is not locked behind paywalls, we get very robust usage from low- and middle-income countries.
That is really quite important to us. While Frontiers originally started in the neurosciences because of our founders' disciplinary affiliation, we now publish about 200 journals in a range of fields, including physical sciences, life sciences, health and biomedical sciences, sustainability, and humanities and social sciences among them, and in some of those fields Frontiers has the highest-ranking titles in their respective categories.
And as we have grown, of course, we have also partnered with a lot of institutions and funders around the world, counting over 700 of those. I've also taken the liberty of including here the average time from submission to publication for our journals from 2022, which is a metric that I'll touch on today in my talk as we discuss AI tools and how we've used them. Next slide, please.
Susan. Frontiers sees science and research as critical for advancing solutions to the many crises that we as a world, as a society, are faced with, from pandemics to climate change to disease, among the myriad of problems we need to solve. And so, having learned what we did during COVID about the potential of open science to accelerate solutions, our mission is to make all of science open so that we can rise to the challenge and solve the problems that we are faced with.
And AI has an important role to play in helping advance scientific discovery here, from our perspective. So today I want to tell you a little bit about the AI tool, thank you very much, the AI tool that we've built in house; that is the case study that I'll present to you. It is called AIRA, which stands for Artificial Intelligence Review Assistant.
It was launched in 2019, and as I like to joke, this was before everybody could use "AI" cogently in a sentence. AIRA has a bit of a story, to tell you the truth. Next slide, Susan. It's been featured in a variety of publications, including the New York Times and IEEE Spectrum magazine. So really a pioneering tool from our perspective that we're very, very proud to have made available.
So how does AIRA actually work? I want to tell you a little bit about the model and how we use it in our day to day. Next slide, please. AIRA relies on a wide range of data sources, including a lot of data about publications and researchers. It combines machine learning with knowledge graphs. We use AIRA in two key areas of the submission workflow: for quality checks during the submission and peer review process, to ensure quality and speed and to meet the researchers' needs that we want to serve, and for expert recommendations. Today I'm going to focus on the quality checks rather than the expert recommendation part of the tool, just for the purposes of this presentation.
So next slide, please. Every manuscript submitted to Frontiers is run through AIRA. AIRA performs over 20 checks, if you don't mind going to the next slide, Susan, thank you, starting with plagiarism, conflict of interest, data availability compliance, and ethics statements and guidelines.
Article guidelines compliance, formatting. Some of those, of course, are done industry wide, though they're not necessarily automated or done consistently, and sometimes it can take a couple of days to perform them. But AIRA goes beyond those. We also look at things like author identity, duplicate submissions, commercial copies, and trial registration numbers, among others, really to ascertain the provenance of the research that is submitted.
And going even further, it can help with the validation and assessment of certain elements of the content. So, for example, it checks images for manipulation. Is there patient data that hasn't been anonymized, for example via face detection? Is the manuscript suitable for the scope of the journal? It helps assess the quality of language, it helps assess references, and it checks whether the manuscript contains paper mill characteristics.
Again, in my experience, this level of thoroughness and transparency exceeds what is typical in the industry, and I'm really quite pleased to see that. And again, all of these checks are performed on every submission before it is sent to the editor for human evaluation. But wait, there's more, if you don't mind.
Next slide, please. We also use AIRA to perform quality checks during the review process. So AIRA helps us with things like confirming reviewer identity, and identifying or clearing potential conflicts of interest between the author and the reviewer, or the handling editor and the author, or the editor and the reviewer, because we really want to make sure that the process is free of undue influences and as kosher as it can be.
We also look at the quality of the submitted review. Is it only one word long, for example, a yes or a no? Has it answered all of the questions? It also performs resubmission analysis; in other words, how does the resubmission address or match up with the original manuscript, and certainly with the recommendations that were made during the peer review process?
So this is a level of thoroughness during the initial validation and review stage that I think is quite in-depth and that exceeds what we are typically able to do in the industry. And the other part that I think is worth mentioning here is that AIRA does this automatically and literally in a matter of seconds.
So next slide, please. What you see here on this slide is what the tool looks like for our internal users, or for the editors who are interacting with it. You see a list of categories that AIRA has checked, and in this particular example everything is green; that is, it is clear. AIRA currently supports over 20 of these checks, and it flags manuscripts which my colleagues, my team members, need to look at and evaluate in the context of the human assessment.
To the point that Sarah made earlier, it is very important to us that AIRA does not make any black-box decisions. AIRA is here to support the teams, to support the editors, but human validation is required for every decision that is made, and everything is transparently recorded in our system. In other words, it assists, it augments the decision-making process; it certainly does not replace it.
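Frontiers has not published AIRA's internals, so the following is only a generic sketch of the pattern Julia describes: run a battery of automated checks on every submission and route anything flagged to a human, with nothing auto-rejected. The check names, manuscript fields and logic below are placeholders.

```python
# Generic sketch of a pre-review checks runner; every flag goes to a human reviewer.
from dataclasses import dataclass

@dataclass
class Flag:
    check: str
    passed: bool
    detail: str = ""

def check_ethics_statement(ms: dict) -> Flag:
    ok = bool(ms.get("ethics_statement"))
    return Flag("ethics_statement", ok, "" if ok else "No ethics statement found")

def check_duplicate_submission(ms: dict, seen_titles: set) -> Flag:
    ok = ms["title"].strip().lower() not in seen_titles
    return Flag("duplicate_submission", ok, "" if ok else "Title matches an earlier submission")

def run_checks(ms: dict, seen_titles: set) -> list:
    return [
        check_ethics_statement(ms),
        check_duplicate_submission(ms, seen_titles),
        # ... plagiarism, image manipulation, conflicts of interest, scope fit, etc.
    ]

manuscript = {"title": "Autophagy in neurons", "ethics_statement": ""}
flags = run_checks(manuscript, seen_titles={"autophagy in neurons"})
for f in flags:
    if not f.passed:
        print(f"FLAG for human review: {f.check} - {f.detail}")
```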
Next slide, please. And this is what it looks like when the tool has flagged a problem. As you see, there's red there. It literally visualizes and explains what the issue is. In this particular example, AIRA has detected a figure duplication, and it leads you to the source.
It highlights the elements that are flagged and visualizes clearly where the issue is. It also leads you to the source so that you can go and examine, in this case, the images in a lot more detail. And again, next slide, please. This is a point that I think is worth mentioning, which is why I was so glad to see Sarah open with it: AIRA assists in the validation of submissions or reviews and flags potential problems.
But the final decision is always in the hands of a human editor, who evaluates the flags but also the context. And I think this is a really important point. There is a lot of expertise that our editorial team members and our editors bring to this, and an AI tool can support them with data, it can support them with specific uses, but it certainly does not replace the broader expertise that they bring to the publishing process.
So now that I've told you what AIRA does, I want to take a step back and tell you a little bit about why we're doing this, why Frontiers has invested resources to develop and continue to augment this system. Next slide. Thank you very much. In our view, AI enables us to focus on meeting researchers' needs.
I mentioned earlier that we know from survey after survey, and this is certainly something that all of you have seen and heard before, that researchers are very focused on the quality and the speed of the publication process. These are top-of-mind concerns for them. When it comes to speed, I'm not aware of any general industry data points that I could refer to.
But in my experience, it is typical for publishers to take a couple of days to complete those initial checks on manuscripts: the plagiarism check, conflict of interest, manuscript elements and that kind of stuff. Not all publishers perform image validation as a matter of course, and certainly not at scale, and in my experience, when it is done, it can also take a couple of days because it is often still done by human team members.
And so that's 3 or 4 days right there, whereas AIRA performs all of these checks in a matter of seconds, and so it really shaves a considerable amount of time off the review process. It speeds up the process for editors as well. We're also focused on the researcher experience as a whole, in whatever capacity they may engage with Frontiers.
And so in this case, AIRA helps editors and reviewers because it performs tasks that editors are many times burdened with performing manually themselves. We're very, very conscious of the fact that researchers are under a lot of pressure, under a lot of competing demands on their time, and we want to make sure that we use their time efficiently, and we see these tools as playing an important role in allowing us to do that.
The second thing is that, as you saw, AIRA performs quite detailed and expansive integrity checks. I think it is a truism by now that AI is very good at certain types of activities, like pattern recognition, detection of anomalies, or recommendation services, for example. And in my view, we are very wise to take advantage of this technology.
I think it's good for science, I think it's good for the scientific record, and I think it also allows us to identify issues upstream before we mobilize reviewers and editors to evaluate those manuscripts. And last but not least, it does allow for scalability and efficiency. It is a fact that scientific output has grown exponentially, certainly over the last 30 to 40 years, but even more so more recently.
I think we all saw that during COVID, right? And it is vital that we continue to evolve our workflows to meet the community's needs without unduly burdening them as well. And so we see tremendous potential in AI for transforming the way that research is disseminated. Next slide, please. From ensuring quality in a transparent way, to supporting and augmenting decision making with data-driven approaches, to allowing us to publish and disseminate research at scale and faster, and ultimately building trust in science and enabling scientists to see the broader impact of their work.
I'm happy to pause here now. Thank you very much. Thanks, Julia. That was excellent and fantastic. And it got me thinking about this question that I've seen a lot recently, which is, are we OK with AI doing peer review? That's one of the big questions that you might see debated on forums.
And most publishers, I think, would jump to say, no, that's terrible. What I was trying to convey in the intro is, well, let's break down what we mean when we say peer review. What are the different steps and functions that peer review actually addresses, and can we break them down into their smallest atoms? From what I saw in your presentation, there are more technical sorts of validation issues.
And if we can automate those, not only will it be more efficient and better for the reviewers, the editors of the journals will also be thanking us, because then they get to refocus their efforts and their brain power on the novelty of the article and how it fits into the greater scientific literature. So I think, whenever we ask one of these big questions, can AI do this, yes or no?
Well, then the question becomes, what is the actual problem that we're trying to resolve? What are the issues, and how do we go ahead and address them? Yeah, and I mean, this is an excellent observation, Avi. And just for the avoidance of doubt, I want to be clear that our tool is not replacing peer review, but for certain cases, like the image manipulation kind of issue that I brought up.
I think it is a fact that AI is better able to spot things. That is not to say that a human cannot, but sometimes it can identify or detect issues that perhaps are not so obvious because the images have been rotated, or because the background has been cleaned up, or whatever the circumstances might be. So to me, this is a really important way in which AI can support the decision making.
It flags and says, this seems to be identical to that other image, just flipped 90 degrees; what do you think? And then the human editor can go in and say, this is actually a good use, or this is not a permissible use. That is the way in which we're thinking about supporting the editors' roles and the reviewers' roles: making sure that they have whatever information they need, and then they can decide this is OK or this is not OK, and therefore make a decision that fits the context.
Fantastic, thanks, Julia. Appreciate you taking the time today. I want to move on to our second case study, with Hong Zhou from Wiley. Hong leads the intelligence services group in Wiley Partner Solutions, which designs, develops and promotes intelligence services and products to enable automated, intelligent and efficient research and publishing journeys by leveraging AI, big data and cloud technologies.
He also established and heads the AI R&D team, which designs award-winning AI-based solutions and next-generation information discovery systems for scholarly research. His personal research passion is supporting publishers' success in their transition to open access and open science and helping global researchers know more, do more, and achieve more. Before joining Atypon back in January 2017, Hong applied machine learning algorithms for the insurance industry as the CTO of Digital Fineprint, served as a senior software engineer and development manager at Schlumberger, and developed racing games for the British video game developer Eutechnyx.
Hong holds a PhD in 3D modeling with artificial intelligence algorithms from a university in Wales, an MBA in digital transformation and strategy from the University of Oxford, a master's degree in computer science from the University of Sheffield, and certifications in AI and cloud from Stanford University and Google. Hong, I think I could probably live five lifetimes and not have the credentials that you have.
That's amazing. As well as serving as a distinguished expert in the National Key Laboratory of Knowledge Mining and Service for Medical Journals in China, Hong is widely published on computer science and AI topics and presents regularly at prominent industry events. He was recently named a Scholarly Kitchen chef, and we were all delighted to have him join. Hong, the floor is yours.
Thank you. Thank you, everyone. Hello, everyone. I'm taking this opportunity to introduce one of our services, content classification. As many of you may know, automatic content classification is one of the most popular AI applications and is the foundation for many other applications.
So I would like to introduce this service, which Wiley Partner Solutions has designed and developed, with some real examples. I also want to highlight that although AI is one of the key enablers and a very popular and important technology now, in order to design, build and deliver a successful product that can actually be used in a real business environment, we should also consider many other important factors, sometimes even more important than the technology itself: for example the human element, as Sarah and Julia just mentioned in their presentations, and also the process, the operation, the product, et cetera.
So let's start. Content classification, often called auto-tagging, is an automatic process which can save publishers time, expense and effort by automatically classifying multiple types of content. It can also be used to drive revenue by generating new content bundles. There are three types of autotaggers, based on publisher-specific, public, or multidisciplinary taxonomies, which are all different.
There are three main challenges for content classification, relating to the taxonomy itself, the classification, and the operational workflow. Take the taxonomy, for example: creating and maintaining the tags is costly and time consuming for publishers, often requiring domain experts and extensive effort. For example, tagging 5,000 terms can involve three taxonomists and take 6 to 18 months.
Additionally, there are challenges in maintaining and updating the taxonomy. For classification, tagging and maintaining the documents is labor intensive, making it impractical at large scale. This leads to content being divided into publisher-specific data silos, hindering seamless query, retrieval and recommendation workflows. The process also often requires multiple vendors for taxonomy creation, classification and delivery, leading to fragmented content and a slow, complicated process as it moves between the different vendors.
It also lacks an effective feedback loop. The traditional classification workflow is shown on the left-hand side, which is more of a waterfall process, lacking a feedback loop and publisher involvement. The new approach, shown on the right-hand side, tackles all three challenges described on the previous slide and follows an agile process. It streamlines taxonomy creation and classification by integrating with Literatum, the largest scholarly publishing platform in the world, enabling a faster and more efficient process without third-party involvement.
Our workflow is markedly quicker than the traditional method, as we continuously update the taxonomy and autotagger using collective intelligence from publishers and from our team itself. Our improved autotagger enhances the content in two steps: the content is enriched with advanced technology, and then the AI is trained. For any given new article, we can then automatically tag it and predict its topics.
It is now almost 30% more accurate, it effectively handles rare labels, and it provides a confidence score for each tag, reducing curator effort by letting curators focus on the items with low scores. Our tools have consistently won the BioASQ international competition in biomedical content classification and discovery, sponsored by Google and the NIH, for the last six years in a row, recently outperforming 40 teams including Google, CMU and NIH.
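The autotagger itself is proprietary, but the basic pattern Hong describes, multi-label classification with a per-tag confidence score and low-confidence items routed to curators, can be illustrated with a small scikit-learn sketch; the documents, tags and threshold below are invented and much simpler than a production model.

```python
# Illustrative multi-label tagger with per-tag confidence scores (not Wiley's model).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

docs = [
    "CRISPR screening identifies regulators of autophagy",
    "A randomized clinical trial of a new statin",
    "Autophagosome biogenesis in yeast cells",
    "Statin therapy and cardiovascular outcomes",
]
labels = [["cell biology", "autophagy"], ["clinical trial", "cardiology"],
          ["cell biology", "autophagy"], ["cardiology"]]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)
clf = make_pipeline(TfidfVectorizer(),
                    OneVsRestClassifier(LogisticRegression(max_iter=1000)))
clf.fit(docs, y)

new_doc = "Membrane dynamics during autophagosome formation"
scores = clf.predict_proba([new_doc])[0]   # one confidence score per tag
LOW_CONFIDENCE = 0.5                       # arbitrary routing threshold
for tag, score in zip(mlb.classes_, scores):
    route = "auto-apply" if score >= LOW_CONFIDENCE else "send to curator"
    print(f"{tag}: {score:.2f} ({route})")
```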
To solve the pain point of publisher taxonomy creation, we offer taxonomy creation in a more cost-effective way. We have also created, and offer, a multidisciplinary taxonomy by leveraging AI and human subject matter experts. It is derived from the Microsoft fields of study, includes 250,000 tags covering 19 disciplines, and has six hierarchical levels, making it versatile and suitable for all publishers, whether they already have a semantic taxonomy or not.
This taxonomy is customizable to meet the specific needs of individual publishers. It features regularly updated metadata and is structured for accuracy and relevancy, including processes for de-duplication, refinement and hierarchy, and it leverages collective intelligence to ensure the tags remain current and relevant. So we still need to collaborate closely with humans; we need human input to refine this.
Here are a few real case scenarios where this autotagger has been deployed, from small to large taxonomies. More and more customers are using this auto-tagging service and are satisfied with it; here is a testimonial from one publisher. As already mentioned, our article taxonomy is continuously updated.
We leverage the collective intelligence of the participating parties to achieve this. Here's an example scenario: a publisher who adopted the article taxonomy suggested 56 new tags to be included. Thirty-one of those were resolved against existing tags, and 25 new tags were added, based on analysis and subject matter experts. The new version of the article taxonomy was released within one month, which is really quick, thanks to being part of the collective intelligence.
And the system is beneficial to publishers: their tailored needs are addressed, the article taxonomy and its autotagger are constantly and quickly improved, and participating publishers get the new versions for free. Here are some real case scenarios where the taxonomy and content classification are applied. For example, you can display the topics on topic pages and at the content item level.
At the article level, on the article page, you can rely on topics instead of raw search terms to improve content discoverability via topic search functionality, and rely on consistent topics for research analytics tasks, such as identifying trending topics and the research outcomes and influence for specific topics. Audience profiling can also be facilitated using topics.
You can infer users' interests and expertise, which leads to more accurate personalized services, more effective marketing campaigns, advertising targeting, et cetera. Lastly, topics can also be used in the pre-publication phase, especially in the submission and review phase, relying on topics to check the fit of submitted articles against the journal's scope or to quickly suggest relevant reviewers, which Julia also mentioned in her presentation, as the sketch below illustrates.
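As a toy illustration of that last point, scope fit and reviewer matching can be approximated from topic tags alone; the Jaccard-overlap scoring below is deliberately simplistic and is not Wiley's method, just the general idea of comparing topic sets.

```python
# Toy scope-fit and reviewer-match scoring from topic tags (Jaccard overlap).
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

submission_topics = {"autophagy", "membrane trafficking", "cell biology"}
journal_scope = {"cell biology", "molecular biology", "membrane trafficking"}
reviewers = {
    "Reviewer A": {"autophagy", "lysosomes", "cell biology"},
    "Reviewer B": {"cardiology", "clinical trials"},
}

print("Scope fit:", round(jaccard(submission_topics, journal_scope), 2))
for name, topics in sorted(reviewers.items(),
                           key=lambda kv: jaccard(submission_topics, kv[1]),
                           reverse=True):
    print(name, round(jaccard(submission_topics, topics), 2))
```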
So thank you. Thank you so much. Yeah, that was really interesting, and I have no doubt that we're going to follow up with some questions in the Q&A period. But in the interest of time, I'm going to move ahead now to our last case study, presented by Dustin Smith from Hum, about building special issues with data and AI.
Dustin is the co-founder and president of Hum, which provides AI and data intelligence solutions for publishers. For over 15 years, he's worked at the cross-section of scholarly publishing and tech innovation. He leads Hum's product vision, strategy and development, and oversees solutions that leverage AI to unify and activate first-party data, including Alchemist, Hum's deep AI suite.
He's particularly passionate about helping publishers harness data to drive reader engagement, content intelligence, author and reviewer recruitment, and more. Dustin, the floor is yours. Just in time. Well, thanks for the nice intro, Avi. So the closest category Hum sits in is a customer data platform for the AI era.
We have AI deeply at the core, and we'll talk a little bit about that, and do a little bit of talking and more showing. What we're going to talk about today is recruiting authors using a mix of first-party data, that is, data observed on your platform, and AI to really understand that data. And part of the efficiency notion here is that if you're targeting the right topic area, and you're targeting the right people in the right sort of nuanced way, then that is a very efficient process.
As opposed to going by intuition and sort of broadcast marketing tactics. Hi, Susan King. We're going to be using Rockefeller University Press and the Journal of Cell Biology as an example here, with real live data underneath and some of the things that are in the special issues pipeline. But just to give you a little background on what we talk about when we're saying things like first-party data, behavioral data, observed data: we're talking about all of the data that's coming off of user-content interactions.
So all the different types of audience members, all the different types of content; content is really anything that can hold information. And that's the bulk of the data that is coming off of a publisher's stack, more than 99% of it. It's valuable insofar as you're also combining it with all of the rest of the very good data that you have internally, and it serves that purpose both to observe that first-party data as well as to combine it with all of the other sources of data that you have, the very rich data assets that publishers have. And here's what this looks like in practice.
This is actually my profile from Rockefeller University Press, and it's connected to their Silverchair platform and to Campaigner, their email marketing system. And this is a record, a memory, a deep memory of all my interactions across the Silverchair platform. So this is an individual piece of content, the level and depth at which I've interacted with it and when I've interacted with it. It's understanding me as an individual person, an individual reader, and the depth at which I'm actually interacting. It's rolling in and combining all of the other data sources, and it has a deep understanding of the topic affinities I have and the intensity of those affinities.
We're going to be talking a fair amount about cell biology today, and ultimately that's reflected in the sort of topic affinities that you see there. That's an understanding based upon my aggregate content consumption behavior. And that's valuable at the individual level, but it's even more valuable in the aggregate, where you're able to use our Audience Explorer to build audiences based upon things like topic engagement.
So, show me the people who are actually engaged with a particular topic, or are likely or predicted to be engaged with a particular topic, and we'll show that in a sort of end-to-end fashion here shortly. This is an AI talk in some respects, so we did want to address that more head on. You do hear Hum as the company name; Hum is also the platform.
And that's really our nervous system. So for Rockefeller University Press, it's plugged into Silverchair, it's plugged into their other platforms, and it's receiving sort of electrical pulses from those systems. It's understanding individual people, it's resolving individual behaviour, and it's rolling that up into profiles about people and content and topics and organizations.
So, structured intelligence about those sorts of things. And Alchemist is our AI engine at the core of Hum; it's really doing the sort of deep understanding of all of that data, and it's driving the insights, recommendations and predictions. So it's really the brain that sits on top of that nervous system. At the foundation of Alchemist, and here we're introducing more brand names, is Lodestone.
Lodestone is our foundation model that we released as open source over the summer. We're already working on Lodestone V2, and this serves as the sort of foundation of understanding; it sits in the interpretive branch of LLMs. You're very familiar with the generative branch, which you saw earlier in the initial presentation. And Alchemist is a fusion of multiple models under the hood, combining the interpretive with the generative.
And the special issue example is going to show a mix of both. So, how do you really deeply understand things, and how do you actually generate them? They're actually most useful when used together. So this is an example of how publishers are using, and how you can potentially use, these sorts of tools, the mix of data and AI, to discover a special issue topic, to launch a special issue, and to target an audience and promote it.
So we'll just get right into it. This is, again, Rockefeller University Press, this is the Journal of Cell Biology, and this is a dashboard devoted to finding interesting areas for potential special topics. This is using Alchemist-applied keywords, the ones that Hum's Alchemist has applied, and it's only looking at things published within the last six months.
And only activity from the last six months. What this is showing is places where there's really high engagement and relatively low content count. So where are the topic areas that people are asking more for, or where there's a lot of energy building? It looks nicer in live software, but ultimately this topic of autophagy in cells is something which rose to the top of this individual dashboard for the Journal of Cell Biology.
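The dashboard logic Dustin describes, high engagement relative to how little content exists on a topic, reduces to a simple ratio; a pandas sketch with invented numbers, not Hum's actual scoring.

```python
# Sketch: surface topics with high engagement but relatively little published content.
import pandas as pd

df = pd.DataFrame({
    "topic": ["autophagy", "mitosis", "cell migration"],
    "engagement_events_6mo": [12400, 9800, 3100],   # invented figures
    "articles_published_6mo": [6, 40, 18],
})
df["engagement_per_article"] = df["engagement_events_6mo"] / df["articles_published_6mo"]
print(df.sort_values("engagement_per_article", ascending=False))
```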
And if you look in that center column there, there's a similar sort of bundle of content, six articles in total covering cell biology. So what we're going to do is take those individual articles, their titles and their abstracts, and pass them with a prompt to a generative model. In this case, we chose GPT-4 Turbo, which is their brand-new model that manages long context rather well.
And what we're going to ask for is, ultimately, recommendations of special issue topics and descriptions. We asked for three in particular, and these factor in those six articles, the titles and abstracts, along with the really tremendous knowledge base of a large language model, to say, here are some of the places you might want to pull further.
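A minimal sketch of that generative step, assuming the OpenAI API; the prompt wording and article bundle below are illustrative, not Hum's actual prompt.

```python
# Sketch: ask a generative model for special-issue ideas from a bundle of articles.
from openai import OpenAI

client = OpenAI()
articles = [
    {"title": "Autophagosome biogenesis in yeast", "abstract": "..."},
    {"title": "Membrane sources for autophagosomes", "abstract": "..."},
]
bundle = "\n\n".join(f"Title: {a['title']}\nAbstract: {a['abstract']}" for a in articles)
prompt = (
    "You are advising a cell biology journal. Based on the articles below, "
    "suggest three special issue topics, each with a one-paragraph description.\n\n" + bundle
)
resp = client.chat.completions.create(
    model="gpt-4-turbo",  # the long-context model mentioned in the talk
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)
```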
So we're going to pick the first one, Deciphering the Puzzle of Autophagosome Biogenesis. We could, of course, have picked an easier one to pronounce, but didn't. So we're going to pick that one, and in this transition here, we're going to pretend we've done all the creative on site and launched the special issue. This is something which they're considering for 2024.
So it hasn't been launched yet, but this is really inside the belly of the beast. What we're doing is taking that special issue description, this sort of text here, and we're going to go find people who are predicted to be a good match for this special issue. This is using Alchemist's deep understanding of both people and content.
So we just pasted this description into profile search, and you see the distribution of people who are interested in this topic area, this sort of fusion of topics; it's not picking out keywords, it's the full set of relations in how these words are structured. And we're only going to pick the top people here, since ultimately what we're doing is an author recruitment sort of workflow. If you were promoting the special collection, you'd probably pick a broader set of people with maybe a little bit less intense interest.
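Hum's Alchemist and Lodestone models are proprietary, but the "paste a description, find matching people" step is recognizably embedding similarity; a generic sketch with the sentence-transformers library and made-up profile summaries, standing in for whatever representation Hum actually builds from reading behavior.

```python
# Generic sketch: match an issue description to audience profiles by embedding similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model would do

issue_description = ("Deciphering the puzzle of autophagosome biogenesis: "
                     "membrane sources, nucleation, and expansion of the phagophore.")
# In practice each profile would be summarized from observed content interactions.
profiles = {
    "user_1": "reads heavily on autophagy, lysosomes, and membrane trafficking",
    "user_2": "reads about cardiology outcomes and clinical trial design",
    "user_3": "reads about organelle biogenesis and yeast cell biology",
}

query_emb = model.encode(issue_description, convert_to_tensor=True)
profile_embs = model.encode(list(profiles.values()), convert_to_tensor=True)
scores = util.cos_sim(query_emb, profile_embs)[0]

for (user, _), score in sorted(zip(profiles.items(), scores), key=lambda x: -float(x[1])):
    print(user, round(float(score), 3))
```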
We only want the most intensely interested people. So that's the top part of the tail of the distribution here, and there's about 50,000 out of the entire audience who fit that criteria. We'll layer in a couple of other things. Since we're going to be recruiting authors, we're going to go after people with a high level of engagement with RUP overall.
So we have this engagement score here, which we're able to dial in to people who are in the 75 to 100 range, so very engaged with RUP, and also people who have spent time on the Journal of Cell Biology in the last 90 days. Ultimately, what we want is a warmer audience that's deeply engaged and interested in this potential topic area. Many of these people are anonymous, but we're getting really, really good signals from all of the user-content interactions that we can use to put together a really highly refined and targeted group.
So we build a segment off of Audience Explorer. A segment is a group of people who meet common characteristics, and this is a live-updating segment which can be used in Hum and can be used in outside systems if they're synced as well. So we have this segment of target authors for this autophagosome biogenesis special issue, and we're going to create that segment.
And then we're going to launch a campaign off of that. We have what we call live engagement campaigns, where we're able to put messages directly in front of people on connected properties, Silverchair in this case. So we'll put together one of these campaigns. This is a campaign targeting those authors: we're naming it, we're picking the target segment, the one that we just created, the target author segment that meets that profile search and level of engagement.
We're going to go to the next page, we're going to pick the display rules, and we'll run this for four months or so. You see many folks who are running special issues typically keep that open in a 3 to 6 month sort of time horizon, and we'll display that once every 21 days, sort of as a reminder, if they end up dismissing and not converting.
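Those display rules amount to a small piece of scheduling logic; a sketch with placeholder dates, not Hum's actual rule engine.

```python
# Sketch of the display rules: a roughly four-month campaign window, re-shown at most every 21 days.
from datetime import datetime, timedelta
from typing import Optional

CAMPAIGN_START = datetime(2024, 1, 1)                 # dates are placeholders
CAMPAIGN_END = CAMPAIGN_START + timedelta(days=120)
REPEAT_AFTER = timedelta(days=21)

def should_display(now: datetime, last_shown: Optional[datetime], converted: bool) -> bool:
    if converted or not (CAMPAIGN_START <= now <= CAMPAIGN_END):
        return False
    return last_shown is None or now - last_shown >= REPEAT_AFTER

print(should_display(datetime(2024, 2, 1), last_shown=datetime(2024, 1, 15), converted=False))
```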
So we give you the ability to construct these custom prompts; we have a library of different prompt types. And you may want to do things like multi-step sorts of campaigns, where you're doing top, middle and bottom of the funnel. This is something you might do at the top of the funnel, where you're just guiding people to the page which has information about the special collection.
And so you'd be able to launch that. But you can go even further. You can do a lead generation form, where you're saying this is a really highly targeted group and ultimately we want to give them the ability to give their information, and we'll follow up with an email on that. So ultimately, what that will look like is somebody within that highly targeted group will land on an article page within the Journal of Cell Biology.
There will be a six-second delay, and ultimately they'll be presented with this message and the ability to convert on it. They drop their information into this form, and then they'll automatically be put into a drip campaign where they get an immediate email follow-up. So ultimately they're getting a note saying thank you for your interest, here are ways in which you can follow up further; and this sits in their inbox in case they're not ready to convert right away.
So ultimately, this sits in the context of a considerably larger set of workflows. Ultimately, you couldn't do this as well without AI, and that really gets to the sort of core fundamental efficiency here. But very pleased to be able to walk you through that. Thanks, Dustin. Appreciate it. OK, I know we're up against the hour and I don't want to hold folks.
I appreciate everyone taking the time to come out. So we're going to do one question that we are going to ask all the panelists, and then what I would recommend for anyone who has additional questions is this: I know that, at least for myself, I will try to give a link to my LinkedIn page so that people can reach out and ask any follow-up questions that they have.
And if our other panelists want to give some way to be reached, for specific questions that we don't have time to cover, they'll have the opportunity to do so. So, there was a question that I believe was addressed to Julia, but I want to address it to everybody. And that is the question of, OK, you've got these tools, you've implemented them in different ways and different functions, and they're giving you some sort of output depending on the particular use case.
Part of the question is: how do we know to what degree we should rely on the output that we're given, even if it does seem reasonable, if it's something that we can't check easily? And then, b, are there any metrics or evaluation methods that you're using in order to check the veracity and quality of the outputs that you're getting?
So, Julia, maybe we'll start with you and then we'll go around the room. Yeah, great. I mean, great question, and not an easy one to answer, I will say. So we certainly do track everything. And there was a question somewhere about the rate of error as well. We're looking at a lot of data points to ensure that we're getting the kind of results that we want.
I think the other thing that is important to say, that I perhaps didn't get an opportunity to mention, is that we also provide input to AIRA as we go through, to make sure that it continues to learn. Right, so: this is actually not a case of plagiarism, or this is not a case of a human face that we need to be worried about deleting. So I think that was an important piece: the algorithm is continuously being trained not only on additional data, but also on its own performance, in order to ensure that it continues to evolve.
So that's as quickly as I can answer this question, Avi. Thank you. That's great. Let's go back to the beginning. Sarah? All right. Yeah, great questions. How do we rely on the output?
I will say, for our tools, we always have a human who's depending on them somehow. And somebody also asked a question of how we have included our employees in this. We do have a team of about 100 internal full-time editors, and they are critical, and have been from the very beginning, as the point of feedback. We don't launch a new version of our model without a team within that group going through a whole process of approval.
And so we very much rely on our employees and our customers: how satisfied are they, what problems are they telling us about? And then, yes, we do track other metrics. We track things like customer satisfaction. But one number I should have mentioned in my talk is customer success. With our digital editing tool, we have done several trials with Springer Nature authors, and one that we did last year showed that Chinese authors who used the digital editing tool for Scientific Reports and the Discover series had a 14% increase in acceptance to their target journal.
So there are those kinds of metrics that we follow as well. Right, thanks, Sarah. Hong? Yeah. Normally we invite the customers to work with us at the very beginning. So the customer provides a lot of samples, which we call golden data, and then we run the solution to generate results for the customer to evaluate.
If they're happy, we can move forward. And also, I think another important thing is to build the user feedback loop and, as Julia said, continuously improve the model itself. And regarding the metrics, I think apart from precision, recall, F1 score, et cetera, another very important one is the false positive rate.
Because the AI is not perfect, it can generate many false positives. Although it can detect something, it is also annoying if it generates many false positives, because then users still have to spend a lot of effort to distinguish and deal with them, which is not good. So the false positive rate is also important.
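A toy example of those metrics, including the false positive rate Hong emphasizes, computed with scikit-learn on invented labels.

```python
# Toy example of the metrics mentioned: precision, recall, F1, and the false positive rate.
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]   # 1 = genuine issue, 0 = clean (invented labels)
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]   # what an automated detector flagged

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))
print("false positive rate:", fp / (fp + tn))  # share of clean items wrongly flagged
```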
Fantastic. Dustin? Well, one thing we developed is internal benchmarks, and one of the key ones that we use is recommendations. So you have kind of a basic baseline, which is last touch, which is basically contextual recommendations: if a user is on a given page or article, what would you recommend to them next given their last touch? And then you can move forward to more sophisticated methods, like moving averages.
And then we have very fancy neural net versions, which people call a black box. But if you can show them the fact that we've improved the distribution, made it tighter and brought it closer to 100, then that's ultimately saying we're actually improving the performance of the core function. And that's an internal reference point which we're able to use in multiple parts of the stack.
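Hum's internal benchmark isn't public, but one generic way to compare a last-touch contextual baseline against a learned recommender is to score both on held-out next-click events; a toy sketch, with an invented co-view table standing in for real behavioral data.

```python
# Sketch: compare recommenders on held-out "what did the reader open next" events.
def hit_rate_at_k(recommend, sessions, k=5):
    """recommend(current_article) -> ranked list; a hit means the true next click is in the top k."""
    hits = sum(1 for current, nxt in sessions if nxt in recommend(current)[:k])
    return hits / len(sessions)

# Toy held-out sessions: (article the reader is on, article they opened next).
sessions = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"), ("a3", "a1")]

def last_touch_baseline(article):
    # Purely contextual: articles co-viewed with the current one, in a fixed order.
    co_viewed = {"a1": ["a2", "a3"], "a2": ["a1", "a3"], "a3": ["a2", "a1"]}
    return co_viewed.get(article, [])

print("baseline hit@5:", hit_rate_at_k(last_touch_baseline, sessions))
# A learned model (e.g. a neural recommender) would be passed in the same way
# and should push this number, and the whole score distribution, higher.
```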
Right, fantastic. So before I hand it back off to Lori to finish up, first of all, I want to take this opportunity to thank all four of the panelists who joined us today and really prepared what I think were interesting, concise, different, but all valuable presentations and use cases, which again, I'm a big fan of. And one general takeaway that I think I've learned from today is that the right question is not what tools are out there that I can artificially implement, force into my workflow, but rather what are the pain points, what are the inefficiencies in my current workflow, what are the problems there?
And then, are there solutions, either ones I want to build out internally, or off the shelf, or from other publishers that I can work with and license from, that I can then plug in to solve a specific issue that I'm having? Sometimes the answer to that will be AI; sometimes the answer will be no, do not use AI for that, because it's not ready or mature enough. And I think it's important that we've moved past this sort of binary good-or-bad AI question and are instead getting into the details, so that we can actually find appropriate solutions that really solve real problems that we have.
So again, thank you to the panel. And Lori, I'll hand it back off to you. Yes, thank you. A quick 20-second closing. Thanks to everyone, and adding my thanks to Avi. Also a very big thanks to our sponsors, Silverchair and Morressier. We're very grateful for their support.
You will receive an evaluation; I believe there was a link in the chat as well. Please don't forget to fill that out. Look also for upcoming programming: we've got the Journals Academy at SSP that will be presented in December. And lastly, this session has been recorded, and registrants will receive a link. You'll have to log in with your SSP credentials to view it. Thank you again.
We're done for today. Thanks, everybody. Bye bye. Thank you.