Name:
Revolutionizing Scholarly Publishing: Practical Applications from the CACTUS AI Solutions Playground
Description:
Revolutionizing Scholarly Publishing: Practical Applications from the CACTUS AI Solutions Playground
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/a729334f-9ec9-4211-9894-f1bdfa3da9e7/videoscrubberimages/Scrubber_1.jpg
Duration:
T00H29M10S
Embed URL:
https://stream.cadmore.media/player/a729334f-9ec9-4211-9894-f1bdfa3da9e7
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/a729334f-9ec9-4211-9894-f1bdfa3da9e7/SSP2025 5-28 1245 - Industry Breakout - CACTUS.mp4?sv=2019-02-02&sr=c&sig=ywFo%2Bllg27D3RnPq%2FTGPE84M8NDx3nDqSNlwd1%2FlkUc%3D&st=2025-06-15T22%3A41%3A18Z&se=2025-06-16T00%3A46%3A18Z&sp=r
Upload Date:
2025-06-05T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
Let's get started. Maybe we'll get a little more time at the end for Q&A if you have questions. There's a microphone right in the middle; just walk up to it and ask your question, and hopefully I can answer it. So thanks for joining us. I'm Jay Patel with Cactus Communications, and I'm here to talk to you about our CACTUS AI Solutions Playground for publishers.
So, just to give you a little bit of an idea about Cactus: we have been around for 23 years, and we are a technology solutions and services company. We have over 1,400 folks globally, and well over 5,000 freelancers and partners spread out globally as well. Our AI journey really didn't start with the emergence of large language models. It started many years ago, probably about a decade or so ago, as we started to integrate a lot of AI solutions into our own workflows and into our own processes.
And over that time, through all of our learnings and also acquisitions, we started developing tools for publishers. So our story is really about more than large language models or chatbots. It really goes back to machine learning and natural language processing, and we've built up to this whole era of large language models. And even before large language models were a thing, we were developing knowledge models and language models of our own, before they really hit the public sphere.
So, just a snapshot. We have over 300 members in our tech team. More than 50% of those are actually dedicated to our AI, machine learning, and NLP business. We've developed well over 50 different tools over the past 10-some-odd years. We've processed well over 20 billion words through our different services. And we serve well over 5 million users.
So the way to think about Cactus is that we have the B2B side, and then we also have the B2C side. And so both of those sides come together. We serve well over 5 million users globally. A little bit about our team, at least the few of us that are here. This is our group CTO and EVP of products. He's in the room back there, if you want to say hi to him.
Nikesh is the president of global academic, and then of course myself. I'm based out of here in New Jersey. So why did we decide to build the AI Playground? Well, there are a few questions that always come up. Every time I talk to folks about AI, it's usually: How do I use it? Where do I start? Is it safe to use?
And so we worked with the team to basically build an AI Playground for publishing professionals to explore different AI technologies in a safe, secure fashion, without being worried about violating your legal or IT policies. Because the problem is, when you just go to ChatGPT, the public model, and you put something in there, you're most likely violating your legal or IT policies.
In our solution, it's actually all secure. We don't ingest any of your data. We don't use it for retraining purposes. Your data is your data. The output you receive is your output. We basically make it possible for you to safely engage with a lot of these emerging technologies and solutions. And there will be a QR code at the end. You can scan that and go read a little bit more about the AI Playground, and also schedule a demo, as well as register your interest for access.
So one of the first solutions we ever built for this playground was a concept extraction tool. So this is classical machine learning. It takes a document. It looks at what terms, what concepts, what keywords are mentioned within it. And it maps and matches them to other similar concepts and keywords.
And this is really important when you get down to tagging your content or organizing your content. It can help you create collections, help you serve better recommendations to your users, and really help you better understand what your content is. And concept extraction is really important for publishers, because it can also allow you to figure out how you're going to potentially license your content to AI systems or to AI developers. Because if you don't know what you have, if you don't know how many words you have, what topics you have,
it really puts you in a blind spot when it comes to saying to someone, well, OK, we have a billion words. How are you going to price it if you don't know that? That's usually how things get priced: by the number of words that you're providing to the AI solutions developers. So all of these things have a video. So we'll just play the video so you can take a look at how it functions.
Yeah, so as you can see here, you put in an abstract, and then it takes the different terms. It gives you the overall key terms. And then for each term, it also gives you related terms. And so this is a really great way, if you don't have a taxonomy, of building a taxonomy; if you have a taxonomy, of developing it out further; and if you want to expand your vocabulary, this is a great use for that. And it can really help,
as I said, with search and discovery on your platform. Here we go. All right. Something else that we developed probably well over two years ago now. So we've been playing with this whole concept of generative AI search, or RAG search, for well over two years. And so we've developed this capability, and it's actually live.
It's used by literally millions and millions of users globally through our R Discovery application, where they can come in, they can ask a question, and it queries well over 250 million research objects in order to provide an answer. And the answer it provides is also referenced and cited back to its source. And it's only answering from our data lake of research objects. It's not going out to the open web.
It's not going on Reddit. It's not going on X or anywhere else to actually answer it. It's using the research database that we have developed to answer those questions. So it's very well grounded. And it does cite what it outputs. So in this case, the question is: does the COVID-19 vaccine cause cardiovascular fatalities?
And so it gives an answer. And as you can see, there are blue links there. And each of them is linked down to the reference where it pulled that answer from. And I know this is a very popular, growing use case for large language models or for chatbots: to really do a Q&A sort of bot. I know there are folks that have developed it for membership services on the society side.
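The grounded, cited answering described here can be illustrated with a very small sketch. This is not the production system: it assumes a tiny in-memory corpus, uses plain keyword overlap where a real retrieval-augmented setup would use a search index and a language model, and every name in it (`ResearchObject`, `retrieve`, `answer_with_citations`) is hypothetical. What it shows is only the shape of the idea: answer from a closed corpus, and attach a numbered reference to every passage used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchObject:
    doc_id: str  # hypothetical identifier, not a real schema
    title: str
    text: str


def tokenize(s: str) -> set:
    """Lowercase words with surrounding punctuation stripped; very short words dropped."""
    words = (w.strip(".,?!;:").lower() for w in s.split())
    return {w for w in words if len(w) > 2}


def retrieve(question, corpus, k=2):
    """Return up to k corpus items that share at least one term with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(doc.text)), doc) for doc in corpus]
    hits = [(score, doc) for score, doc in scored if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep corpus order
    return [doc for _, doc in hits[:k]]


def answer_with_citations(question, corpus, k=2):
    """Build an answer only from retrieved passages, each tagged with a numbered reference."""
    sources = retrieve(question, corpus, k)
    if not sources:
        # Nothing in the closed corpus supports an answer, so none is given.
        return {"answer": "No supporting source found in the corpus.", "references": []}
    passages = [f"{doc.text} [{i}]" for i, doc in enumerate(sources, start=1)]
    references = [f"[{i}] {doc.title} ({doc.doc_id})" for i, doc in enumerate(sources, start=1)]
    return {"answer": " ".join(passages), "references": references}
```

The design point is the empty-result branch: when nothing in the corpus matches, the sketch declines to answer instead of reaching for outside material.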
I know there was just an announcement this week about ASCO developing something similar with Ovid. And of course there are other AI search engines out there; Perplexity, for instance, is very popular and widely used. The other key thing that we developed is the ability to generate Alt text. So I know this is really important with the European Accessibility Act coming into force in June, and also the ADA Title II, which is coming into force in April of next year.
And the thing with Alt text is that you need to be able to properly understand the image and the context of that image. And so what we do is we provide a couple of different ways for publishers to engage with our Alt text generation tool. So this is actually our user interface, but it's also available as an API. And we can also provide batch processing.
So if you have millions of documents or millions of images, we can batch process them all and then deliver the Alt text back to you. But simply, the way it works is you can either upload images or upload PDFs. And so if you upload a PDF, what we actually do is send it through a couple of different steps in order for it to generate Alt text. First, we process the PDF.
We understand the context. We understand the key terminology that's being used. We identify where the images are, the related text around the images, and we bring that together in order to generate the Alt text. So it's not simply see an image, generate an Alt text, it actually sees the image. It looks at context from across the manuscript and then puts that together in order to generate Alt text.
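The steps just described (find the image, gather the related text around it, combine that with key terminology) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual tool: the `Figure` type, the page-window heuristic, and the prompt wording are all hypothetical, and the call to an image-capable model is left out entirely. The three styles and two lengths mirror the output types mentioned in the talk.

```python
from dataclasses import dataclass

STYLES = ("academic", "social_media", "marketing")
LENGTHS = ("short", "long")


@dataclass
class Figure:
    image_id: str  # hypothetical identifier for an extracted image
    page: int
    caption: str


def nearby_text(paragraphs, page, window=1):
    """paragraphs is a list of (page_number, text); keep those within `window` pages."""
    return [text for p, text in paragraphs if abs(p - page) <= window]


def build_requests(fig, paragraphs, key_terms):
    """Create one generation request per style and length, each carrying manuscript context."""
    context = " ".join(nearby_text(paragraphs, fig.page))
    requests = []
    for style in STYLES:
        for length in LENGTHS:
            prompt = (
                f"Write {length} {style} alt text for image {fig.image_id}. "
                f"Caption: {fig.caption} "
                f"Key terms: {', '.join(key_terms)}. "
                f"Surrounding text: {context}"
            )
            requests.append(
                {"image_id": fig.image_id, "style": style, "length": length, "prompt": prompt}
            )
    return requests
```

Each request would then be sent, together with the image itself, to whatever model is in use; the resulting records map naturally onto CSV or JSON delivery.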
And it generates three types of Alt text: academic, social media, and marketing, each in short and long versions. So it can have various applications across your workflow. So as you can see here, it processed that image, and then it's able to give you an output in those three different formats, in both short and long text. So this one's actually processing a PDF.
So it'll read the PDF and it'll extract out all the images. And then each image will have Alt text associated with it. And then we can provide this back to you in multiple different formats. So it could be CSV files. It could be JSON. If you want, we can even append this Alt text to your existing documents and deliver them.
So there are multiple ways that we can deliver it back to you. One of the other use cases for this with the API is that you can also integrate it into your existing submission system or into your existing workflow. So then you can expose this to authors, so potentially the authors can have a look at it and verify and validate it in that process. It can be done pre-submission.
It can be done post-submission. It can be done at revisions. There are multiple places where you can actually insert this. If you want us to verify it, as I said, we have a lot of folks that work for us and with us who can also help you verify that Alt text, on either an ongoing basis or a bulk basis, so we can provide that service to you as well.
There we go. So I know the other thing that we're seeing a lot of, or there's been a lot of talk about for at least a decade, if not more, is plain language summaries and summarizing content for different audiences, be they professionals, practitioners, policymakers, the public, or patients.
So we've been doing AI summaries for quite a few years now. We've actually done maybe well over 7 million AI summarizations in our R Discovery app. We have done many, many more in partnership with publishers and with other platforms. And again, we can do it on an on-demand basis. We can do it in bulk, or we can provide you an API that can be integrated into your existing solutions as well.
But here you would just bring in your PDF file, you would deposit it, and then it's able to process it and develop a summary. And rather than doing it one file, one manuscript at a time, you can actually have multiple manuscripts processed together. As you can see here, you can scroll through the output, and it'll give you the different types of long and short form summaries as well.
And we can adapt the prompt to your style. So not every journal is going to want to present the summary in the same way. So we can adapt it to your style, to your journal style, and also to the audience. Now, one thing I'm really excited about is multilingual audio, because audio has, of course, taken off with podcasting.
Besides video, audio is one of the other main ways to engage with your audiences. And so it's really taken off. And I'm really excited about multilingual audio because this fits in really, really nicely with accessibility. It also fits in very nicely with global expansion of your audience. So if you operate in the US, great.
English is fine. But if you're looking at markets like China, Saudi Arabia, Indonesia, Brazil, you're going to need to be able to provide that content in different languages. So we built this mechanism to actually take text and turn it into audio, and to turn it into not just English but various different languages. And this is actually currently being utilized in our R Discovery app as well, where folks can come in and not just translate text to text, but also generate audio and go from text to audio.
And we're seeing a lot of engagement with that specific feature because people really do want to try to better understand the research. And to be able to apply that research in their day to day lives. So it goes through multiple different steps. So you can ask it to either do a full audio of the full paper or you can ask it to summarize.
So what it does first is it extracts the text and does the text summarization. And then the text is sent to the AI audio engine. And then that creates the audio version of the summary. Now, some use cases for this: of course, you can implement this on your websites so your users can come in and engage with it in the language that they prefer.
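The flow described here, full paper or summary first and then the audio engine, can be sketched as a small pipeline. This is only an illustration: `summarize` and `synthesize` are hypothetical stand-ins for whatever summarization model and audio engine are actually used, and the assumption that the audio step also handles the target language is mine, not something the talk specifies.

```python
from typing import Callable


def paper_to_audio(
    text: str,
    language: str,
    mode: str,
    summarize: Callable[[str], str],
    synthesize: Callable[[str, str], bytes],
) -> bytes:
    """Turn paper text into audio, either for the full text or for a summary of it."""
    if mode not in ("full", "summary"):
        raise ValueError("mode must be 'full' or 'summary'")
    # Step 1: choose the script, summarizing only when a summary was requested.
    script = summarize(text) if mode == "summary" else text
    # Step 2: hand the script and target language to the audio engine.
    return synthesize(script, language)
```

Passing the two engines in as functions keeps the sketch independent of any particular vendor, which matters when several models are evaluated per language.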
This is also a good way to potentially create podcasts, in English or in other languages. So there are multiple different applications for this. And sorry about that. Yeah, so the other thing, and this actually came out of conversations with clients, is that in an open access APC world,
you need to know what you're rejecting and where that's ending up. Are you rejecting perfectly good papers that are getting published at a competitor journal? And are you losing out on not just the APC revenue, but also citations, which are very important? So we've also developed a rejection analyzer tool. And again, this is an on-demand service available to publishers, where they can bring in their CSV file, their rejection data, upload it, and get a report back.
And so we use our extensive database, and also partnerships with other services, in order to identify where that article was published. We also look at how many citations it got. We estimate potentially how much revenue was generated. And so we provide a really detailed report back to you. And the match rate can be anywhere from 60% to, I've seen, as high as about 80% or 85%. And I would say if you bring in something that was rejected a month ago, maybe the match rate isn't as high.
But I would say at three months and older, you get a much better match rate, because it takes time for authors to submit and go through the approval process or the acceptance process. But this application is really helpful for editorial teams and for business teams in making better decisions on what type of manuscripts they accept, and on cascading policies.
So if you have multiple journals, how do you cascade to the right journal, so you're not losing out on really good papers that maybe don't fit the flagship journal but could get published elsewhere within your portfolio? So right now, as I was just checking, it's going through that and generating a report. And then you're able to see exactly what percentage was published, what the top publishers are, what the top journals are, and how long it took to get published after rejection.
It also looks at citations that were provided or earned by the accepting journal. And then we also give you an estimate on APC revenue, potential loss of APC revenue. So a lot of those things this can really help better inform your editorial decisions going down and also your journal launch policies. Maybe it's time to launch a new journal because you're rejecting perfectly good articles that should actually stay in-house, but you don't have a journal for it.
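The report contents listed above (match rate, top journals, time from rejection to publication, citations, and potential APC revenue) can be sketched as a simple aggregation. This assumes the matching step has already been done and that the data arrives as CSV; the column names (`rejected_date`, `published_journal`, `published_date`, `citations`, `apc_usd`) are hypothetical, and real matching against a publication database is out of scope here.

```python
import csv
import io
from collections import Counter
from datetime import date
from statistics import median


def load_rows(csv_text):
    """Parse rejection records from CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def summarize_rejections(rows):
    """Aggregate rejected manuscripts that were later matched to a publication elsewhere."""
    matched = [r for r in rows if r["published_journal"]]  # empty string means no match
    days = [
        (date.fromisoformat(r["published_date"]) - date.fromisoformat(r["rejected_date"])).days
        for r in matched
    ]
    return {
        "match_rate": len(matched) / len(rows) if rows else 0.0,
        "top_journals": Counter(r["published_journal"] for r in matched).most_common(3),
        "median_days_to_publication": median(days) if days else None,
        "citations_elsewhere": sum(int(r["citations"]) for r in matched),
        # An estimate only: sums the destination journal's assumed APC per matched paper.
        "estimated_apc_lost": sum(float(r["apc_usd"]) for r in matched),
    }
```

Recently rejected manuscripts will tend to show up as unmatched rows, which is why the match rate improves for rejections that are a few months old.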
So it helps make those decisions by using data rather than just gut feelings. So this is our website. You can see the QR code right there if you want to scan that. It's also on the last page. But you'll be able to get a better understanding of the different solutions we provide. You'll also be able to request a demo, and then we can also set you up for access
once we do a demo with you and answer some of your questions. And then just one other thing that we're launching. Actually, there are a lot of different things we're working on, but this is the one thing that's closest to launching. And there's other stuff I'd love to talk to you about, but we're not ready yet. But a lot of cool things are coming soon.
But one of the new things: we're also going to be launching chat with paper on the AI Playground. This is actually currently available in our Paperpal service, which authors use to write and edit manuscripts, find citations, and paraphrase or rephrase their content. It also has an AI review assistant available for authors, but they can also chat with their paper. Now, I find this really useful because I'm not a researcher.
I don't come from an academic background. I mean, I went to college, but I didn't really do science. There was a little chemistry, and physics, oh my God, and statistics, oh, God forbid. So I did marketing. But this is really cool for me because I'm able to take research papers, upload them, and then query the research papers for very specific things I want to learn.
So for me to sit there and actually read and absorb and understand a research paper would probably take a very long time. So doing things like AI summarization, doing things like chat with paper or chat with PDF, is really cool for me as an end user, because it allows me to better understand the very specific things I'm interested in from that paper. And then I can dig deeper, or I can take that answer and say, OK, now I understand it, or I can keep querying until I get a better picture of it.
And we know that this is extremely popular because since we launched it on Paperpal, I think we do, I forget how many, but it's probably well over a million PDFs a month, maybe more. And it just keeps growing because the users are really excited about this feature. And again, it comes down to the fact that a lot of our users are early career researchers.
A lot of them are students and they're in academia, and many of them are researchers with English as an additional or second language. So they want to try to fully understand what they're reading, and maybe they don't understand it when they just read it for the first time. But the Q&A functionality really helps them. All right.
Jeez, there we go. So thank you. I think we are way under time, so I'm more than happy to take questions. Yeah, there's a microphone right there. OK.
Hi. Thank you. Great presentation.
Thank you.
One thing that my suite of journals has been dealing with is that we do allow AI tools to be used in the production process of the articles for various reasons, including English as an additional language for some of our researchers. We have been running into an issue with detecting hallucinated references, because oftentimes they do look correct, or they'll have a DOI even though it's not accurate. Do you have anything to quickly and scalably detect those?
Yeah, so we do have Paperpal for editorial desks. So, not to confuse everyone, but Paperpal is three different models. One is Paperpal Edit, which is for authors to write and edit and get help with their manuscripts. One is Paperpal Preflight, which is used by journals for pre-submission grammar, language, and technical checks.
And then the third solution is Paperpal for Editorial Desk, which is only available to the editorial teams, and that has 25-plus different checks for research integrity purposes. And that includes things like checking references, making sure that they're actually valid, that the URLs are actually valid, that they haven't been hallucinated out of thin air.
It checks for paper mills and citation issues, like citation rings. It checks for conflict of interest and a whole bunch of other things that are required, like data availability statements, because a lot of papers are now data heavy, and you should have a data availability statement to make sure that people are able to find the original source of that data to validate it. So it provides about 25 different checks.
And yes, one of them is actually picking up on hallucinated references. It also has an AI detector. But the way we do it is we take a multitude of checks, and then it ranks the results, based on what fails and what passes, into basically three categories: pass, warning, or critical. And then based on that, your editorial teams can make a decision on whether they want to go ahead and accept the paper as is, or maybe tell the authors, hey, can you fix x, y, and z,
or, if it's really bad, say, we're just going to reject it. And it just helps editorial teams make those decisions much quicker and potentially make better decisions, so you don't have to go back and clean up the record if you made the wrong decision, let's say. Any other questions?
Thank you for the presentation. First of all, the QR code doesn't seem to be taking me to the link. I don't know if there's an issue with the QR code.
OK, I'll check that.
Yeah. So I want to know, at what level are you now with most of these tools, the chat with paper, the Alt text? Is it ready for the market as it is, or is it still at a trial and testing level?
No, no, it's all ready. I mean, all the solutions you see on here are actually ready for implementation.
And as I said, we can provide it to you as an API. You can use it through a user interface on an on-demand basis. Or you can contact us and say, hey, listen, I have 1,000 papers, 100,000, a million papers that I need to do something with. I mean, it could be Alt text, it could be summaries, it could be concept extraction to build a vocabulary. We have a team that can help you do all that.
So we'll do all the heavy lifting, and then we'll deliver the output in the fashion and in the way that you need it. So these are all ready to go. And as I said, they are already being utilized in our various other products, like R Discovery or Paperpal, for instance. Yeah. Yeah, please.
Hi, thanks. I'm Theresa from the American Mathematical Society. We're kind of dealing with Alt text generation and deciding where to go in this kind of space with the EAA. I was curious whether, with Paperpal Edit or anything like that, or with your Alt text generation, there's anything that authors can do on their end specifically, to put it in their hands versus in the publisher's hands. If they use Paperpal Edit, do they have tools there to generate Alt text, and how can they deal with it on their end?
Yeah, so there are a couple of ways to actually get this into the hands of the authors. One is to use our API and then integrate it into your workflow, so you expose it to the author at the right time. So it could be at the point of, hey, you're going to submit this paper; maybe you want to do this as well as a step in submitting the paper. Or you could do it at a revision point, and you can send them to the specific site that's hosting our API in order to do that and verify the content.
The other way, and we've been talking about it, and again, this is like a coming-soon preview thing, is that we are looking at integrating that into our Paperpal workflow as well. So when the author comes in, not only can they check their manuscript, but they can also get Alt text ahead of time. So it all comes in a single package versus doing it in steps after the fact.
So that is actually in the works, but we're more than happy to demo both Paperpal and the Alt text generation piece if you want to get in touch afterwards. Yeah, Bill.
Bill from Tripoli. For the multilingual audio tool that you're using, I would imagine the different dialects being picked up would need some human component in that process, to help verify some of the languages that are being translated into the text or into the audio.
Yeah, so that's a really good question. Actually, we are working with the American Society of Anesthesiology right now. And one of the things we're doing for them is taking their podcast and translating it into Korean. And so initially, when we started, they had a bit of hesitation: is the human better, or is the AI better, or which one do we prefer?
So we actually did both the human translation and the AI translation. And they evaluated them and said, well, the AI is just as good, if not better. So we're using AI to do Korean translation, for instance. And we can actually do it in 30-some-odd different languages, but we use multiple models. So it's not just one model we're using in order to do this. And so it comes down to what's the fastest and what delivers the best output.
And then based on that, we pick the model. So if you have a specific dialect in mind, we won't be able to cover every one of them. But with Mandarin or Cantonese, for example, you can pick between those two. Or maybe you need Arabic, but Arabic from a specific region, or Brazilian Portuguese versus Portugal Portuguese; those are two very distinct things. We can adapt accordingly.
And just a quick follow-up: since we're doing podcasts with you now, I'd be interested in seeing how we can integrate that into our current process.
Yeah, definitely. Definitely. And as far as the human validation, again, as I said, with all of the AI services we provide, we can actually do the human validation for you, because we have a network of experts that can come in and validate the output and make sure that it is what it's supposed to be in their native language.
Because the last thing you want is something translated mistakenly. Any more questions? We've still got four minutes. All right. Excellent. So, I mean, by a show of hands, how many of you are feverishly working on Alt text right now? OK, not bad, not bad.
Yeah. And how many of you get emails from your bosses going, why aren't we doing this, and pushing you on FOMO? Oh, yeah. There we go. Yeah, well, as I said, if you have any questions, I'll be around for the rest of the meeting. Just grab me in the halls, or we can just talk now, and I'll be more than happy to answer your questions or set up a time to go through more in-depth demos of these different solutions.
Thank you very much.