Name:
Moderated Debate on Trust in Scholarly Publishing: Artificial Intelligence will Fatally Undermine the Integrity of Scholarly Publishing
Description:
Moderated Debate on Trust in Scholarly Publishing: Artificial Intelligence will Fatally Undermine the Integrity of Scholarly Publishing
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/150c0858-3c78-4e3e-8e81-82639708ead1/thumbnails/150c0858-3c78-4e3e-8e81-82639708ead1.png
Duration:
T00H59M55S
Embed URL:
https://stream.cadmore.media/player/150c0858-3c78-4e3e-8e81-82639708ead1
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/150c0858-3c78-4e3e-8e81-82639708ead1/moderated_debate_on_trust_in_scholarly_publishing_ai_will_fa.mp4?sv=2019-02-02&sr=c&sig=69eF9vpxIRDkSfcGwQVYI%2Fty8LO2Y23%2BVv4X8DnI1IY%3D&st=2024-11-20T02%3A20%3A58Z&se=2024-11-20T04%3A25%3A58Z&sp=r
Upload Date:
2024-02-23T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
Conference, we have made it to the end. We have a very special treat for you all today. It's a moderated debate. I'd like to introduce our moderator, Rick Anderson, and he's going to introduce our speakers and explain how this debate thing works. Thanks. Oh, well, thank you. Well, we're starting with applause.
We should just stop now. So welcome. Yeah, this is going to be a formal debate, more or less modeled on the structure of an Oxford Union debate. So the way it's going to work is we're going to first start by taking a poll of everybody in the room and also everybody who is attending virtually. And the poll is going to measure the audience's agreement or disagreement with the proposition that's under debate today.
And the proposition is: Resolved, AI will fatally undermine the integrity of scholarly publishing. Once we've taken the poll, we'll then have an opening statement by our first debater, followed by an opening statement by our second debater, and then a three-minute response from each. And then after that we'll have time for discussion with the audience for questions and comments.
And then at the end of the discussion period, we'll take the poll again and whichever side has moved the most votes will be declared the winner. So this means a couple of things. One of them is that during the discussion period, you know, often in conferences, you get in trouble if you stand up and ask a question that's actually a comment. But in this case, because we're debating, you're welcome to make a comment.
You're welcome to just sort of throw your opinion into the mix if you want to try to sway the final vote either way. But you're also welcome to ask questions or ask for clarification or to argue with our debaters. So I will very briefly. Oh, I'm sorry. One more important point. We want everybody please to vote initially and then for the closing poll.
We only want you to vote if you voted in the opening poll. Right, we want as closely as possible for the closing poll to match the respondent pool of the opening poll. So please, everybody vote in the first one, and then in the second one, only vote if you voted in the first one. All right. I'm not going to waste time by reading the bios of our speakers.
They are both very illustrious figures and you can read their full bios in the program. I'll just say that our opening speaker will be Tim Vines, who is founder and CEO of DataSeer. And then our second debater is Jessica Miles, who is vice president for strategy and investments at the Holtzbrinck Publishing Group. So let's first start by taking the poll, if we could activate that.
Should happen any minute. And while we're waiting, Tim is going to sing a song. Good joke. Oh, OK, Tim has a joke. Good. Is this going to unfairly influence the voting? I think so. You think so?
Well, if you have a joke. But what if it's a really bad joke? It's a dad joke. Why can't you hear any sound when pterodactyls go to the toilet? The P is silent.
Yeah, I think that that may actually have influenced the debate, though maybe not in the direction intended. All right. So it turns out the poll is actually up. Oh, and it's been going great. I need to write it down. We'll let it run as long as the numbers are changing.
And then when it looks like it's slowing down, I'll call it and record the result. And it looks like it's getting pretty stable. All right. I think we're going to call it and say that it's 77% disagree, 23% agree. And with that, I will turn the time over to Tim for his opening statement.
Go ahead and start the timer. You're in a desert when you see a tortoise. You reach down and you flip the tortoise onto its back. The tortoise lays on its back, beating its leg in the hot sun, trying to turn itself over. But it can't. Not without your help.
But you're not helping. Why is that? The cinema enthusiasts among you might recognize this quote as part of the Voight-Kampff test in the original Blade Runner. The test is used to pick out artificial humans, a.k.a. replicants, by probing for unexpected emotional responses. As the movie progresses, it becomes clear that even this test struggles to identify some advanced replicants.
I'm going to argue here that AI will fatally undermine the integrity of scholarly publishing, and a great many other things besides. There are three reasons why scholarly publishing is particularly vulnerable. First, as with Blade Runner's replicants, it will soon become almost impossible to distinguish the products of artificial intelligence from products made by humans.
Unscrupulous researchers will be able to conjure up convincing research without the trouble of picking up a pipette. I sense that some of you have a spark of hope that new tools or better screening can detect these faked articles. Let me snuff out that spark right now. Humans are relatively good at spotting AI generated pictures of people and things because our ancestors have spent millions of years learning to spot uncanny faces or strange shadows.
But we have no such evolutionary history with scientific texts or data sets, and we must therefore rely on technology. And on top of that, an image has a multitude of elements that must be exactly right for it to pass as real. Text is simpler by several orders of magnitude, and hence there is far less that AI can get wrong. Even if we do find some bug that's a giveaway that an article was generated by AI, that bug will be fixed in the next version.
The arms race of using technology to spot fake research texts is a race we have already lost. Second, the task of spotting AI generated articles will fall to the editorial office. Yes, the perennially under-resourced, under-trained and understaffed editorial office. You may have faith that the editorial office has the tools and ability to weed out artificial articles.
You may want then to reflect that ORCID is now 10 years old, and most journals still don't require that the authors have one. If we as an industry are so delinquent in implementing basic author identifiers, what chance do we have to consistently detect research faked by sophisticated AIs? Another route to detecting articles generated by artificial intelligence is to require that the authors provide the data sets and code objects that underpin the conclusions presented in the article.
Faking these outputs individually imposes a significant extra burden on authors, but having them be sufficiently interoperable to generate the results in the article is very, very challenging indeed. Only if scholarly journals insist that authors provide their data sets and code during the peer review process, and then test the reproducibility of the authors' analysis, can we expect to weed out faked research.
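To make that argument concrete, here is a minimal sketch of what such a reproducibility check might look like during peer review, assuming a hypothetical submission folder containing the authors' deposited analysis script and a small file of the key values reported in the article. The file names, folder layout and tolerance are illustrative assumptions, not an established editorial workflow.

# Hypothetical sketch of a peer-review reproducibility check: re-run the
# authors' deposited analysis and compare its output to the values reported
# in the article. File names and the tolerance are illustrative assumptions.
import json
import subprocess

def check_reproducibility(submission_dir: str, tolerance: float = 1e-6) -> dict:
    # Re-run the authors' own analysis script against their deposited data.
    # This assumes analysis.py prints its key results as JSON to stdout.
    result = subprocess.run(
        ["python", "analysis.py"],
        cwd=submission_dir, capture_output=True, text=True, check=True,
    )
    recomputed = json.loads(result.stdout)  # e.g. {"mean_effect": 0.42}
    with open(f"{submission_dir}/reported_results.json") as fh:
        reported = json.load(fh)            # key numbers claimed in the article
    # Any reported value the re-run cannot reproduce is flagged for the editor.
    return {
        key: (reported[key], recomputed.get(key))
        for key in reported
        if abs(reported[key] - recomputed.get(key, float("inf"))) > tolerance
    }

# Example use: an empty result means the authors' numbers were reproduced.
# mismatches = check_reproducibility("submissions/manuscript-0421")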
Will this approach work long term given the breakneck pace of AI development? It seems certain that the capacity to fake data sets and functional code to go along with the fake article is not far off. Open science buys us time to develop new approaches, but it itself will not save us from the corrosive effects of AI generated fake research. Moreover, even if many publishers adopt and enforce rigorous open science policies, scholarly publishing has a dirty secret.
A substantial fraction of the industry prefers not to ask awkward questions of their authors. These publishers are instead happy to receive the author fee in return for publishing the article and have no incentive to weed out plausible fake articles. Why would they? The fakes are unlikely to be spotted by readers, and every author is a paying customer.
So even if some part of the scholarly publishing industry makes best efforts to spot fake papers, the proportion of the literature that is entirely AI generated will grow relentlessly. And that brings us to the third reason why artificial intelligence will, in time, fatally undermine the integrity of scholarly publishing. Once the scholarly record is contaminated with many thousands of plausible but fake articles, how can researchers build on previous work?
How can we train useful AI research assistants to draw new insights from the scholarly record when that record is deeply contaminated with nonsense? This dilemma echoes a story by Jorge Luis Borges about the fictional Library of Babel, a near infinite library containing all the knowledge in the universe. Despite this wealth of knowledge, the library is useless. It also contains every other possible book, containing every other arrangement of letters, so the useful information can never be found.
As the trickle of AI generated fake research grows into a flood, we must ask ourselves this: is scholarly publishing willing to do whatever it takes to act as a source of truth, to fight a constant battle to ensure that at least some published research is created by real humans in a real lab? I submit that given our abysmal progress with implementing things like ORCID and open science, the answer is clearly no.
We will be fatally undermined and we will fall. I'm so appreciative of the invitation to participate in this debate. I want to thank all of you for being here. And I'd also like to thank those colleagues who shared their time and insights with me in preparing my remarks today: Darrell Colthurst, Ariane Guttenberg, Lesley McIntosh, Henning Schönenberger and Reshma Shaik.
Although the theme of this meeting, transformation, trust and transparency, seems designed to examine the rise of AI in publishing, it was improbably announced in October 2022, predating the release of ChatGPT and the ensuing generative AI frenzy. Prescient as this choice seems, I think what the timing really reflects is the reality that scholarly publishing must constantly manage technological innovation.
As Todd Carpenter reminded us a few weeks ago in a post in the Kitchen, quote, the publishing process has always relied on technology, from the paper and ink upon which the scribe noted their work (yes, pen, paper and ink are all technologies) to the earliest typesetters and printers to the digital markup and repository tools of today. Trust and transparency have been critical for weathering centuries of transformation in response to upheaval.
We have come together as a community to create transparent and reliable systems and processes informed by a shared commitment to safeguarding the scholarly record. We will continue to do so in an age of AI. Over the last few decades, the publishing community has prioritized trust and transparency in the face of radical change. In response to the advent of the internet, the World Wide Web and other technologies like HTML, SGML and XML,
scholarly communities established clear industry-wide infrastructure and protocols and standards like digital object identifiers and Crossref to establish trust by creating a reliable, persistent system for linking scholarly references across the entire ecosystem. According to an STM report, since 2000 publishers have collectively invested over 2 billion dollars to digitize the scholarly record, make it more findable and accessible, and safeguard its integrity by developing more tools to identify plagiarism and other forms of fraud.
As research has become increasingly more collaborative and its outputs more diverse, we developed and implemented the Contributor Roles Taxonomy, more commonly known as CRediT, bringing more transparency to the myriad roles that researchers have and increasing trust in authorship attributions. More recently, STM formed a working group in 2019 to explore the implications of AI technologies for scholarly communications.
The group has since released a standard-bearing white paper. These examples of how we sustain trust and transparency by supporting industry-wide standards and developing technology and infrastructure in response to transformation provide a blueprint for how academic publishing will continue to evolve and endure in response to AI. Yet (there's always a yet), even in spite of these inspiring activities, we must acknowledge scholarly communications does face critical, potentially existential challenges: ensuring research integrity, evolving business models and sustaining peer review among them.
However, it is people, not technology like AI, that fuel these threats, and people, working collaboratively, can develop and implement strategies for overcoming potential crises. Digital transformation, far from undermining publishing, has made the publishing ecosystem resilient in the face of continued change. In another post in the Kitchen, Hong Zhou and Sylvia Izzo Hunter detail how automation, big data and cloud computing have expedited and improved submission and peer review for authors, reviewers and editors.
Publishers have invested in technological infrastructure to develop and enhance platforms and services for submission, peer review and production. Automation has been instrumental in accelerating production processes, including automated recognition and disambiguation of authors and institutions, as well as automated typesetting. At this point, some of you may be saying, wait, aren't some of these so-called automation developments you're referring to actually based on AI or machine learning technologies?
Yes, yes, actually. AI is ubiquitous in scholarly publishing. I'll repeat that. AI is ubiquitous in scholarly publishing. Recent discussions of AI like this one have focused on generative AI, a type of artificial intelligence technology that can produce various types of content. Large language models are one type of generative AI because they can generate novel combinations of text in the form of natural sounding language.
While this emerging technology has the potential to profoundly change scholarly communications, we shouldn't lose sight of earlier classic forms of AI and machine learning technologies that are, one, prevalent across publishing workflows and, two, importantly, have not undermined the integrity of scholarly publishing despite this prevalence. For example, the STM white paper that I referenced earlier highlights a few examples from prominent commercial and society publishers.
For example, Springer Nature uses these technologies to identify facts, concepts and relationships in scientific manuscripts and transforms these data into structured databases for downstream applications. Elsevier maintains a data integration platform that follows FAIR principles to help users access clean, reusable data and metadata to optimize decision making and improve data governance. ACS has integrated their AI-powered transfer tool, which leverages semantic analysis and publishing history, with a peer review system that drives insight for authors, editors and peer reviewers.
As these examples illustrate, AI is making publications more accessible and usable, demonstrating how our continued commitment to safeguarding the scholarly record is informing responsible engagement with technological innovation past and present. And we should learn from the past as we continue to confront persistent risks, because there is still a meaningful risk that publishers will create silos by relying on undisclosed technologies and internal standards of ethics and governance rather than industry wide protocols and guidance.
There is still a meaningful risk that small publishers will be excluded from this wave of change because of the price of developing and deploying digital resources. It will be difficult for us to ensure accessibility, and it will undoubtedly require a cross-publisher approach. But we have shown as a community that we are capable of innovation and collaboration and that we can leverage existing infrastructure to mitigate the risks associated with AI and realize its potential.
Now, in the conversations that I had with colleagues at the forefront of these technologies, a few key themes emerged. First, we must preserve trust by putting humans at the center of everything we do, with the mission of accelerating research, eradicating bias and fighting fraud to ensure quality and integrity. Both classic and generative AI approaches can give us additional tools to improve and scale these activities.
For example, AI tools for text summarization and translation can help authors who find it challenging to meet the requirements of publishing in journals in English. Broadening the accessibility of scholarly communications. Second, we must ensure transparency in the production and use of AI. Specifically, this includes an understanding of what data are used to develop these systems, how they are being deployed and how they make decisions.
At this point, you may recall recent remarks by Google CEO Sundar Pichai when he said that there were aspects of AI systems that experts and developers don't fully understand. More recently, a research team at Stanford refuted these claims in a preprint posted on arXiv. Their work provides evidence that so-called emergent abilities are not a fundamental property of AI models, but rather arise from a researcher's chosen metrics for analysis.
Related to this finding is the idea of explainable artificial intelligence, which seeks to develop tools that allow human users to understand and trust outputs created by machine learning algorithms. These are valuable methods for fostering transparency and human centeredness as we develop increasingly sophisticated technologies. Last, we must improve the data that AI technologies are using. At a forum on responsible research in AI
in Berlin last month, the old adage garbage in, garbage out was a major point of discussion, with the panel agreeing that many problems we've encountered with AI are related to the data these systems are trained on. An approach that keeps humans at the center of efforts to improve systems for evaluating data quality can help address these problems. One approach could see publishers collaborating to build a cross-publisher corpus for training a large language model exclusively for scholarly communications, to ensure that high quality input and training data are used with the shared resource.
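As a loose illustration only, the kind of shared-corpus curation being described might begin with something as simple as the following sketch, which filters retracted and duplicate articles out of a pooled collection before it is used as training data. The article records and the retraction list are hypothetical placeholders, not an existing cross-publisher system.

# Hypothetical sketch of basic quality filtering for a shared training corpus:
# drop retracted articles and exact-duplicate full texts before training.
# The article records and retraction list below are illustrative placeholders.
import hashlib

articles = [
    {"doi": "10.1234/example.1", "text": "Full text of a vetted article ..."},
    {"doi": "10.1234/example.2", "text": "Full text of a retracted article ..."},
]
retracted_dois = {"10.1234/example.2"}  # e.g. drawn from a retraction database

def build_corpus(records, retracted):
    """Keep only non-retracted, non-duplicate articles for model training."""
    seen_hashes = set()
    corpus = []
    for record in records:
        if record["doi"] in retracted:
            continue  # exclude retracted work so it never enters training data
        digest = hashlib.sha256(record["text"].encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # skip exact-duplicate full texts pooled from publishers
        seen_hashes.add(digest)
        corpus.append(record)
    return corpus

print(len(build_corpus(articles, retracted_dois)))  # prints 1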
In many ways, the path ahead is a promising one. Scholarly publishing will, as it has previously, adapt and evolve in this technological revolution. And importantly, we are supported in this work by the many countries whose governments are taking steps to regulate these technologies. The EU has led these efforts in developing the AI Act, with complementary efforts in other geographies. In short, most of us within and beyond the publishing world are striving to act responsibly.
Great generals are always prepared to fight the last war, not the current one. I'm sorry to say that my esteemed colleague arguing against the resolution has fallen into this trap, deploying the approaches that we use to tackle yesterday's problems against the entirely different crisis represented by generative AI.
I absolutely agree with her assertion that there are good uses for AI. We can train great AI to support research, but that does not mean that there are not also bad uses for AI, and that there are not people who seek to subvert and usurp the process of research for their own benefit. Moreover, it's not as if we, the scholarly publishing community, have been particularly successful in tackling yesterday's problems.
The literature is infested with garbage from paper mills, and image manipulation is rife. We are only just waking up to the scale of this problem, having been warned about it for years. Faced with generative AI, each publisher has a choice to make. You can either invest heavily in ensuring that the work presented in your journals is real research that actually happened,
or you can carry on as normal in the hope that the majority of the work you publish is real. But here's a warning: journals that don't want to certify their research as real will steadily become repositories of fabricated junk, fatally undermined by AI. Will that be all of us, or just most of us? That is up to you.
I find several of my esteemed colleague's statements to be problematic. First, he discusses science research to the exclusion of other domains, and in doing so, he conflates domain-specific data integrity concerns with the matter of integrity of the entire scholarly publishing enterprise. After all, an editor at a cell biology journal and an editor at a French studies journal may share concerns about emerging AI technologies, but the latter is less likely concerned about how to detect image manipulation in Western blots.
Even if we accept Tim's focus on science research, there are other issues with his account. I find his assertion that a substantial portion of scholarly publishers have no incentive to weed out plausible fake articles to be not only morally troubling, but also not grounded in fact. I've already shared many examples of our community's commitment to safeguarding examples that directly contradict this account.
Beyond moral imperatives, STEM publishers also have significant incentives to detect fraudulent submissions quickly. The time and money spent on these submissions represents an enormous waste of resources, including the precious and limited time and efforts of editors and peer reviewers, as well as the cost of managing retractions and other integrity issues. Adding to this, publishers at publicly traded companies have seen earlier this year that the financial markets do not look kindly upon pervasive misconduct.
The lesson learned being that they must do everything possible to prevent fraudulent research from being published. He also claims, without evidence, that the arms race of using technology to spot fake research texts is a race we have already lost. As I said before, it is people, not technology, that fuel these threats. As Dr. Bik noted in her opening remarks, it will take a village to ensure research integrity. Publishers can continue to develop and scale approaches for detecting research fraud.
Funders and institutions can play a bigger role in oversight by implementing their own systems for monitoring research integrity and imposing consequences for violations of publishing ethics or for retaliation against whistleblowers. In this way, transparency begets trust, with trust leading to transformation. All right.
Thanks very much to Tim and Jessica for their opening statements and responses. And now we're going to open up the floor, both physically and virtually, to anyone who would like to make a comment or ask a question. We've got some roving mics. If you are in the sort of hinterlands of this very large room, please jump up and wave your arms frantically so that our mic carriers can see you.
I see one question over here. Should I? I'll start then. Hi, Daniel Ochoa from the American Physical Society. Thank you both for your remarks. I didn't go to Oxford, but I'll give a try at this. I would like to point out, and I'm leaning towards disagreeing with the motion, that peer review is also a technology, and it's a technology that's evolved over the lifetime of scholarly publishing.
I have been in the field of publishing for nearly 20 years, and even in that short time span, a lot of things have changed. Maybe what will happen is a development in peer review towards something more adversarial, something where you start off from a position of distrust towards a submitted article if there is a lot of suspicion that there may be AI generated content. But that is something that we're going to have to do if it is the case that more AI submissions are an inevitable development.
So thank you. Raise your hands really high because it's kind of hard to see. Hi, up here. Hi, Alice Meadows, MoreBrains Cooperative. And for the sake of. You'll need to hold the mic right up to your mouth.
Sorry, is that better? Yeah, OK. Alice Meadows, MoreBrains Cooperative. And for the sake of context, I'm going to add that I previously worked at NISO and ORCID. So my comment is about the tools and technology that we have to address these issues. And I think, Jessica, that you're right that we've done a lot of work.
What worries me is that most of it has not been properly implemented. So my concern is, can we as a community actually do the implementation that's needed to use those tools and standards and so on at our disposal to address this issue? That's what my concern would be. You know, that's a great point and something that my colleague Tim alluded to.
And I think the first thing that comes to mind is, you know, necessity is the mother of invention. When we look at, for example, how many journals have now moved to a digital environment, that's something that's seen extremely high adoption because it's had to happen. So with the tools that we have, I think more and more journals will realize, in the face of this new technological wave of change, OK, now is the time, we really do need to get on board.
And of course, it's not just editorial offices. It's every part of the ecosystem that will come to the fore. But to acknowledge your point, it's not going to just magically happen. We need to dedicate time and effort and make sure that everyone is aligned and pushing this to implementation. Yeah, I sort of wanted to respond to both points there. I think they're very closely aligned, in that we do maybe need to adopt this very defensive stance that
this article probably is fabricated and it's wrong, and the authors have to prove to us that they're legitimate and correct. Open science is a big part of that: these are the data sets, I am trustworthy, I have done a good job, with a lot of signaling around that, especially signals that are difficult to fake.
The big issue I see with AI is that it becomes possible to actually just create that trust, to leap past those barriers, because the generative AI looks so convincing and it's so difficult to tell what's real and what's not real. Rachevsky from Prophy. I'm speaking now not as a Prophy founder, but as a scientist.
And my question is mostly to Tim. You seem to assume that people did not cheat in the past because they didn't have enough capacity to cheat. What makes you think that scientists are such a bad flock of people that we would actually devote our lives to faking something, for not so much pay, for not such a great lifestyle, for unlimited working hours?
Do you really think that we didn't fake because we're not smart enough? Sounds like a Tim question. Sounds like a me question. I absolutely agree. I've worked with many scientists and many of them are just absolutely wonderful people who really search for the truth no matter what it takes.
And it's inspiring. I think a big part of the problem, and it comes back to what Jessica was saying, is that it is about people, and there is a significant fraction of people who are forced to publish as part of their professional advancement. In fact, millions of people are forced to publish as part of their professional advancement, and they don't really want to publish, and they have to put something in a journal by whatever means necessary.
And I think they are a very natural audience for paper mills, authorship schemes, reviewer schemes, you name it. These are people who want to take a shortcut despite all the other things they have to do. But also, science is hard. And if the motivations grow and the ability to actually get away with cheating grows,
it becomes easier and easier to cheat, and the possibility of being caught goes down and down. Who knows what's going to happen? Because if getting that extra bit of funding makes the difference to keeping your graduate students fed and housed, you might just take that risk. Right now it is quite hard to make up research, as Jonathan Pruitt discovered when he started cutting and pasting data within his spreadsheets and including formulae for how the data should be faked.
Once that goes away, once it becomes incredibly hard to spot research that's been faked, then I do worry that people will be like, well, I know how this should have gone, so I'm just going to make it like it should be. And then I think we're on some very dangerous ground. Actually, I do want to take the opportunity to say something that I wasn't able to incorporate in my remarks, but something that I've been thinking about over these last few days of the meeting, and actually spoke to Oleg about this morning, so I can give credit where credit is due.
You know, the question of, as mentioned, existing fraud in research, and the question of why that hasn't fatally undermined the enterprise, is one that's, I think, been on all of our minds. And I was thinking back to the image that Dr. Bik presented of this brick wall and the idea that science is building on itself. And I wholeheartedly agree.
But I think it goes beyond being a simple brick wall, because we understand in the process of doing science that not all bricks will stand the test of time. And what I mean by that is, you know, I was thinking about Tim, as he has a background in evolution, and I was thinking about Lamarck, right? Lamarck, you know, famously had this theory of the inheritance of acquired characteristics, which is discredited, but that exists in the scientific record, right?
We don't retract or remove ideas that are, through the process of debate, experimentation, discussion, ultimately sort of disproved. And so, you know, Oleg was saying, certainly it's a foundation, but it's maybe more like a house that stands over multiple years, housing generations and generations. People come, people go, they leave their mark on it.
Things are built, they fall in. Some parts of it are maybe in disrepair and need to be replaced. It's an incredibly dynamic system. And in that way, I think even with the challenges we face today, the ones we faced in the past, there is that resiliency and ability, and it's not static. It's able to evolve. And so I think that will help us a great deal.
So I just saw my opportunity to mention that. I think it's an important point. Hi. Oh, that was loud. I'm Cassandra Larose. I'm a librarian. I am an early career fellow, and I'm a bit scared about what I'm about to ask. We've talked a lot about trust and transparency and maybe not as much about transformation.
And I'm wondering if, as we talk about this in terms of the scientific record, in terms of a very kind of narrow scope of written publication, peer reviewed publication, are we being a bit shortsighted perhaps, as we think about the fact that that is the type of publication that is preferred and that is what people are having to do to get ahead? But should we be considering other ways of sharing our knowledge, knowing that we are at the Society for Scholarly publishing?
Again, I am very pro publication, but are there other ways perhaps that AI could be used, where we might look towards other ways of sharing knowledge, and not necessarily just the ways that it might undermine the current ways we do things? Absolutely, I am very shortsighted. I should point out I've got my glasses on. I have two things I want to say to that. One is that scholarly publishing is communication, and communication
depends on both a sender and a receiver having an agreed format. So if the sender radically changes what they're sending, if the reader doesn't intuitively understand that because it doesn't match what they expect to receive, that communication fails. And so that's why scholarly communication is quite slow to change.
It's because it depends on the sender and receiver both changing, agreeing to change in parallel. However, I do believe that there will come a new type of reader that's not human, and we need to think about research articles that are entirely machine readable, where every aspect of them relates to an assertion, backed up and broken down in a way that a machine can understand it as well as a human reader can understand it.
So I think that might be a direction that this goes, but that doesn't directly hit what we're talking about here, which is: can we spot the bad actors? I mean, I absolutely agree that AI will change things in a lot of ways. What I'll say to that is, I agree with you. And I think I'm shortsighted, too, in that way.
And, you know, even beyond articles, right? We've been talking about article, article, article, but even beyond an article, right, are there ways to think about communication outputs that are potentially less vulnerable to the types of manipulations that we've been discussing? I think that that's an open question and a very exciting one.
Hi, Roy Kaufman from CCC. And I promise I won't sing about PIDs. So I'm going to just ask this question, which is: if we see, when we see, a proliferation of fake science and fake articles throughout repositories and preprint servers and everything else, might that not make the validated record of science infinitely more valuable?
Because now there's a contrast point between stuff that's been identified and validated by a proper journal, by an editorial board, by checks and balances. And, you know, ChatGPT hallucinates not just because it's poorly programmed, but because it was trained on garbage as well as good stuff. But if you're only training on good stuff, it's going to be more reliable and more valuable.
So doesn't that, in fact, doesn't that in a weird way, the fact that you can generate papers with ChatGPT and LLMs, increase the value of the rest of the scholarly publishing output? Absolutely, absolutely. And I think I tried to allude to that point, in that the way I see it now, there's a broad acceptance of the quality of the scholarly literature.
I think that's going to narrow greatly to the journals that really do put in a lot of effort to ensure that their research is real. And that's a lot more effort than anyone really is currently putting in. And the material in those journals is going to become extremely valuable, because that is what's going to be used to train the AI models of the future.
You can't just take the literature at large and use that to train, because a lot of it is going to be unreliable, and this is going to be iterative. Remember, 20 years from now, we will look back in fondness at 2022 and think, wow, that was the last year that most of the stuff written in the world was written by people. Going forward, most of it is going to be written by machines, and we need to know what is written, what is real, and use that as the basis of training science going forward.
And if not enough journals are producing real research, then we are going to be in deep, deep trouble. I will say, as a follow up point, that, right, you have the ability to train on kind of pre-2022 data. You can also train on invalid articles, right? And so you can train a model to spot, OK, what are the shared commonalities in fraudulent research. You can also, in the post-2022 generative AI world, you know, pick out these subsets that are generative AI and teach a model to be able to detect those signatures as well.
So to your point there, this does bring about inherent value, but then you also have the ability to train models to do very specific things that will, again, kind of help promote the integrity of what's published. Hello, it's Robert Harington from the American Mathematical Society. And I was just musing on what you're both saying, and I'm thinking that generative AI, or AI tools in general, have such potential for enhancement of wisdom, enhancement of global health outcomes, and actually global equity of the input from people.
So I just wanted to put that in your heads, that this global equity of input from more people in the world with wisdom and knowledge might actually lead to an enhancement of the output, sort of the trust in the output, rather than a degradation of it. What is your response to that, Jessica? It sounds like Robert's kind of responding to you. What's your response to that?
So I would agree. And I think I attempted to allude to a bit of that in what I was saying with respect to the idea of generative AI and other AI technologies kind of lowering the barrier. For example, most journals are published in English, and so if that is something that poses a difficulty for an author or a group of authors, having these tools lowers that barrier and makes publishing more accessible.
But certainly, even just beyond the language, you can think of these technologies promoting equity in other ways, you know, democratizing access to knowledge, right. There are kind of myriad examples one could probably think of. Tim, do you think Robert makes a valid point? Do you think this outweighs the dangers that you're talking about?
It's going to be fantastic that people will be able to bring their knowledge, like I was saying about the mode of communicating research, into that mode, so that it reaches readers in a way that they're going to be able to understand and ingest into their understanding of the world. That's great.
Doesn't mean that there's still not a ton of people out there that are highly motivated to publish fake research to get through these qualifications, this professional advancement that they need to do. And they're going to do that. And it's up to us to somehow spot them and prevent them from doing that. Both things are going to happen, and I think the latter is going to become a bigger and bigger problem.
Whereas the former, I mean, there's going to be lots more papers from maybe previously excluded groups, and that's fantastic. But whether or not that's going to make a difference: if the scientific literature becomes 50% larger, but 40% of that is more garbage, then I don't think that's a good thing. Hello, my name is Jack Myers with the American Physical Society.
And I just wanted to commend this topic and the panel. I wanted to sort of look at this from a first principles perspective, in that the peer review process, like somebody else mentioned, is technology to ensure quality and integrity within research. There's another mechanism, which is the replication of results that's done by the community. And I think what AI offers, and it just hasn't quite surfaced yet, is a massive increase in productivity to the scientific domain, meaning that when we equip AI into robots and the future automation potential, we'll be able to do a lot more science than we're currently able to do today.
And so I don't know what the time horizon of this proposition is, but I would like to sort of go back to the panel here and ask, well, is there a future where peer review is just one aspect of this process? And in fact, there is the ability to replicate results in a more rapid way, or to crowdsource and aggregate information around the world, and the use of AI. And I'll end with saying that, you know, AI is sort of an interesting paradoxical technology, in the sense that the same technology can be used to combat the ill intent of those that use it in that way.
You know, just the simple example is this plagiarism aspect, right? Plagiarism detection is using machine learning to identify when things look very similar to other things it knows about. Well, in the same way, you know, AI can be applied in this way to enhance and to scale our ability as humans to detect these things as well. So I just wanted to have you guys look out a little bit and see if there's something longer term that helps to solve this problem.
Jessica, I see you nodding. I'm really loving your bold vision. And, you know, I'm thinking about this, and I can even think of a world in which, you know, that reproducibility check is done concurrently with peer review. Right? You know, you have not only the peer reviewer evaluating the paper or whatever output, but also the results of the robot from the lab or the execution of the code, or whatever we're using for reproducibility.
So, you know, absolutely, I think you're right. We're limited by our own imaginations, I think. Yeah, I absolutely agree. Well, in 20, 25 years it will be profoundly different doing science. I think most labs will have their own in-house AI assistant.
That's essentially a sort of incredibly sophisticated reference manager, but it is also able to parse the literature and describe experiments, and it understands the lab's questions because it's been individually trained, that sort of thing. I think that will be very transformative. And if it's able to do experiments itself and then even publish the results of those experiments,
these are things we might need to embrace. But at the same time, it doesn't mean, like I keep saying, that the bad things are going to go away. It doesn't mean that for us as publishers, standing there trying to gatekeep, there won't be a lot of different stuff coming at us, and we need to be sure that what we're putting out with our stamp on it, saying as best we can tell this is real, that we've done that work, that we've actually established it's real. Tim, do you think it's more likely that in that scenario, where every lab has its own highly sophisticated AI machine, that in those labs?
Do you think it's more or less likely that those machines will be used for good, in terms of scientific integrity, or for ill? Well, I think mostly for good. I mean, they will be neutral around whether something's good or bad, but the people using them, I think, would use them for good, because the stuff you would need to fake research would actually be a completely different module that you would go off and find somewhere online:
I want an article that says X about this, including images and the data set, and it comes back 10 minutes later. Just a quick point. I would maybe disagree that they'll be neutral. I mean, if you think about it, how many of your employer or university computers let you go to every single website you want to? Right?
I would assume that if the institution is paying for this, they're going to impose guardrails around what can and cannot be generated. And again, not to say that nefarious actors won't find their way around those guardrails, but that's one more barrier, I think, to producing fraudulent research. So we're down to about 10 minutes left. Let's have one more question or comment, and then we will open the poll for the new vote.
Hi. This is Susan Wellner with the American Society of Nephrology. And boy, the pressure's on to make this question a good one. I'm going to go back to the debate topic for a second and say, if the premise being discussed is, will AI undermine the integrity of scholarly publishing, then if the record, the LLMs, if the record is rife with junk, for lack of a better phrase, if there's bad information in it, then how does that not then reduce the trust?
And if the trust decreases, how does that not undermine the integrity of scholarly publishing? It was said earlier, junk in, junk out. If more and more junk is reproduced, then what's left? I'd like for both of you to address that. I think she's arguing against your position, Jessica.
I definitely appreciate that as a last question, because I think it gives me a chance to address a fundamental assumption: that because making fraudulent research is easier with generative AI, there will correspondingly be an increase in it. Right? And I think it's worth questioning that assumption. I will just put that out there. I think we've talked about the ways in which these technologies could be used to reduce the amount of fraudulent research.
So that would be, I guess, the inverse of the situation you proposed. But I do think both Tim and I would accept that if the scientific record becomes rife with fraudulent data, that undermines its integrity. I think my position is that that will not happen, and here's why, whereas his position is that we are fatally doomed, this will happen, and here's why. If that's an accurate summation, I'm going to tell you why I think that is: it's because, how long have we been trying to do open science for?
About 15, 20 years. We all know it's a really, really good thing to do. Multiple benefits. It's really important. It's great for the scientific record. It's great for future science. How far have we gotten? Nowhere, really. Ten, 20,
15% of journals require open data. Progress is very slow. It's always someone else's problem. And I think that we know that open science is where we should go to fix quite a lot of the problems facing the community and facing scholarly publishing at the moment. And still, it's always someone else's problem, or we can't find the resources to do it.
And that's why I kind of despair when we're faced with the difficulty, the next-level problem, of dealing with properly faked research that is almost impossible to distinguish from real research. I just don't think we're going to pull our socks up and actually do the work. All right. And that's where we have to leave it.
So first, could you all join me in thanking our excellent debaters? And now let's activate the poll again. And again, I will emphasize that we need everyone who voted, and only those who voted, in the opening poll to please vote again. All right. There it is.
And we'll see what happens. The general shape is remaining more or less the same. But it's looking like I'm going to give it 5 more seconds.
It's looking like our final result is 68%. Sorry, 68% disagree and 32% agree. Which means that the disagrees have lost, and that Tim has gained more people agreeing with him than Jessica has.
So we have to declare Tim the winner. Congratulations and well done. And just to close this out, I want to take the opportunity, the opportunity that I have started taking whenever I moderate a debate like this, and that is to point out that you have seen two things modeled here today. The first is that civil discourse is not, in fact, dead.
That two people who disagree on important topics can do so civilly and rationally. And second of all, it is not true that minds can no longer be changed by argument. If the next time you hear anybody say nobody ever changes their mind in the face of rational argument, please tell them that you saw minds changed by rational argument here in real time at the SSP meeting.
So thank you again to Tim and Jessica. You were both magnificent. Thanks so much to all of you for your comments, and I believe we're done. Thank you. I don't know about you all, but I love that. That was amazing. Please give them another round of applause.
So this has been a great annual meeting. I appreciate so many of you all for sticking in on the last day, for so late in the day. It's been heartwarming to see old friends, make new connections and learn from thought leaders that I call heroes. I'd like to again salute who I call Madam Prez, my good friend Miranda Walker, for her courageous leadership.
Today, June 2nd, 2023, is a special day for me. Today marks my parents' 60th wedding anniversary. Their love is a demonstration of the commitment, support, dedication and faithfulness that shaped my personal perspectives.
That framed my worldview. Their journey wasn't easy. Day by day, year by year and decade after decade. Consistency over time. When SSP declared their commitment to diversity, equity and inclusion, I was skeptical. I was hesitant. It was music to my ears.
It was exactly what I wanted to hear. It was too good to be true. But true to their word, SSP has demonstrated that commitment. Day by day, year after year. They said, bring your whole self. I threw my whole self into SSP. And others did the same.
And countless others supported, leveraging their privilege and their platforms to ensure SSP is diverse, inclusive, equitable and accessible. To ensure that SSP is stronger. For those that are still hesitant and skeptical, my presence on this stage right now is the personification of SSP's commitment. The support from across the industry is the personification of that commitment.
Because of that commitment. I'm proud to say Happy Pride to our LGBTQ family. This year's theme transformation, trust and transparency provided sessions and discussions to help shape the way in which we evolve. We face the challenges of today in true collaboration with one another.
To me, it's an exciting time, a time of opportunity embedded in the sea change in scholarly communications. It's not just one big thing. It's beautifully complicated. Whether it's exposing the fraud of paper mills or optimizing the value of our metadata or reining in ChatGPT, we're going to be different. Change doesn't have to be scary.
Change can be wondrous, like caterpillars that transform into butterflies. And with that, I'd like to thank all the speakers who have contributed to SSP's 45th annual conference. I'd like to thank you for attending. Great to see everyone, and shout out to the virtual attendees. I'd like to recognize our sponsors and exhibitors. I'd like to thank our program chairs, Lori Carlin, Tim Lloyd and Emily Farrell, and the annual meeting program committee.
Recordings of all sessions will be available in the app within 48 hours. Next year's meeting is in Boston, Massachusetts, from May 29th to May 31st. Don't forget that SSP offers year-round webinars and the hybrid New Directions seminar in DC and online October 4th through the 5th. Please complete the meeting evaluation. And for anyone that's sticking around until tomorrow, we provide a variety of activity suggestions in the Whova meeting app.
Thank you all and please get home safe.