Name:
AI in Publishing
Description:
AI in Publishing
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/b8647ba2-b944-4d60-a291-fe6b25b76320/thumbnails/b8647ba2-b944-4d60-a291-fe6b25b76320.png
Duration:
T00H59M43S
Embed URL:
https://stream.cadmore.media/player/b8647ba2-b944-4d60-a291-fe6b25b76320
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/b8647ba2-b944-4d60-a291-fe6b25b76320/GMT20221006-150045_Recording_gallery_1760x900.mp4?sv=2019-02-02&sr=c&sig=3eh8K94939r8M1bJ6rvM6kxXwH4iG%2BXohnYmQ57tGko%3D&st=2024-11-26T08%3A19%3A58Z&se=2024-11-26T10%3A24%3A58Z&sp=r
Upload Date:
2024-02-23T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
Thank you and welcome to today's Ask the Experts panel. We are pleased you can join our discussion on AI and publishing. I'm David Myers, SSP Education Committee member and a lead publisher at Wolters Kluwer. Before we start, I want to thank our 2022 education sponsors: RFA, J&J Editorial, OpenAthens, Silverchair, 67 Bricks, and Taylor & Francis F1000.
We are grateful for their support. A few housekeeping items. Phones have been muted, so please use the Q&A panel to enter questions for the panelists; our agenda is to cover whatever questions you have, so please don't be shy about participating. This one-hour session will be recorded and available after today's broadcast. A quick note on SSP's code of conduct and today's meeting.
We are committed to diversity, equity and providing an inclusive meeting environment, fostering open dialogue free of harassment, discrimination and hostile conduct. We will ask all participants, whether speaking or in chat, to consider and debate relevant viewpoints in an orderly, respectful and fair manner.
It is my pleasure to introduce our moderator, Anita de Waard, Vice President of Research Collaborations at Elsevier. As VP, Anita focuses on working with academic and industry partners on projects pertaining to progress modes and frameworks for scholarly communication. Since 1997, she has worked on bridging the gap between science publishing and computational and information technologies. Efforts include working on a semantic model for research papers, co-founding FORCE11 (the Future of Research Communications and e-Scholarship), supporting the development of standards and models for research data management, and a series of workshops on scholarly document processing.
Anita has a degree in low-temperature physics from Leiden University and worked in Moscow before joining Elsevier as a physics publisher in 1988. And now over to you, Anita. Thank you so much, David, and thank you to all who have joined. This is, to me, a very exciting panel. As David mentioned, I started in publishing in the late 80s because I was a super fan of science, and I was very interested in how publishing can improve science.
But really, since the mid-nineties, it's been so fantastic to see how information technologies can support scientific publishing, and thereby science. And I'm very honored and pleased to invite our three panelists. I will let them introduce themselves, but I will ask them a question as we do so. So our three panelists, in the order in which I will invite them to introduce themselves, are Helen King, who is head of transformation at SAGE Publishing.
In itself, that is already a fantastic title, and she's done very pivotal work in the area of AI and publishing. Paul Groth, who's professor of algorithmic data science at the University of Amsterdam, and was also involved in FORCE11 from the early days onwards. And Lucy Lu Wang, assistant professor at the University of Washington Information School, who is also involved with the Allen Institute for AI and has developed and led a lot of efforts that are really transforming publishing.
So very excited to start today's talk. And I would like to ask Helen to start, to perhaps introduce herself. And Helen, after you introduce yourself, can you please give us your definition of AI? What do you think AI is? Thanks, Anita. So, hi, everyone. I'm Helen King.
I'm the head of transformation at SAGE Publishing and I'm based in London. And my role is really to help SAGE make better use of technology, which includes looking at technology right across the board, but I think at the moment particularly AI-type technologies and services. You can also find me at the TechRadar newsletter, where I write about startups, particularly startups in the AI and machine learning space.
And yeah, I dig into all the interesting things that they are working on. So what does AI mean to me? Well, I'm going to have a quick look at my notes, because I'm going to confess this isn't a term I use very often. So I like the definition that AI is an umbrella term for a range of algorithm-based technologies that solve complex tasks which previously required human thinking.
As I said, it's not a term I tend to use very much. When I think about AI, I don't really think of it as perhaps a technologist would define it. I just tend to talk about software solutions that help with decision making, which in companies means tools and services that are clearly AI, but also some that are using older technologies and perhaps wouldn't fall so much into that category.
So for me, on the business side of things, I use a very broad definition of what AI is. Thanks so much. I like this phrase of "previously required human thinking." I think that's something we can delve into a bit deeper, further in the conversation. But first, I'd like to move to Paul. Paul, lovely to have you here.
If you could introduce a bit about yourself, and how do you define AI? Great to be here. I'm Paul Groth, a professor of data science at the University of Amsterdam. I think one relevant piece of my background for this call: before I was a professor, I worked at Elsevier as a disruptive technology director. And so I've been really active in this space of how we can use AI and intelligent systems to help publishing.
Currently, along with Elsevier, we also run a lab called the Discovery Lab, which is around helping accelerate scientific discovery through artificial intelligence. So that's kind of the positioning here. To jump over to the definition: I like this kind of funny theorem that we have in computer science, or in AI. It's from Larry Tesler, and it says AI is whatever hasn't been done yet.
Right? So we said that chess playing, oh, that was AI, but we did it, so now that's just a computer program. We said Go, oh, that's AI, but we did it, and now that's just a computer program. And so there's this sense of, oh, we have to keep going, and that's what AI is. But actually, I think really going back to almost the textbook definition of AI, it's the building and design of intelligent agents.
And what do we mean by intelligence? I think we mean two things. One is learning and the other is problem solving. So how do we build systems that are able to learn and then able to problem-solve? And I think in publishing in particular, what we're seeing is the rise of machine learning: the ability to learn patterns, to automate things.
And so that's kind of my definition of AI, and it's great to be here. Thanks so much. I also love that definition of "it's whatever hasn't been done yet." And again, that's something we'd love to get back to. Thanks so much, Paul. Lucy, over to you. How do you define AI, and can you say a bit about your own background and work?
Thanks, Anita. So my name is Lucy Lu Wang and I am an assistant professor at the University of Washington Information School. And Anita also mentioned that I am a research scientist at the Allen Institute for AI, which is a non-profit AI research institute based in Seattle. I have actually been there for the last three years working on the research team of Semantic Scholar, which some in the audience may know of.
It's an open tool that's meant to help scholars discover, understand and interpret scientific literature. And I have also been involved in running a series of workshops, one called SciNLP and the Scholarly Document Processing workshop, which I collaborate with Anita on, to foster more research in this area, specifically of language and textual processing in scholarly documents.
And what is my definition of AI? I mean, I think Helen and Paul both gave really wonderful definitions. The thing that I like to focus on is that AI refers to technologies that have the ability to perform tasks that have typically been done by humans, and that really require some sort of higher-level intelligence or knowledge in order to perform.
And these days, in the machine learning paradigm, we typically have these technologies learn directly from data, rather than, for example, encoding intelligence in a different way, as in rule-based systems that have historically maybe been more popular. So yeah, that's kind of my definition. Thank you.
So this offers a very interesting overview, I think: software that helps with decision making that previously required human thinking, which I thought was great; then Paul saying they're intelligent agents that learn and are good at problem solving; and Lucy saying they perform tasks that have previously been done by humans and require higher-order knowledge.
I think that's a very interesting, well-rounded series of definitions. I want to very briefly involve the audience. We see that there are 46 folks on the call right now, and I'm wondering if any of you are using AI at this moment within your work as a publisher. If so, could you just briefly type that in the chat? If I can ask that of the audience, then I'd like to turn to Helen.
And Helen, you mentioned your blog, which I think is very pivotal. Yes is the answer, OK, great: yes, AI is being used. Helen, could you say a bit about what you see as the key AI components, or, as you said, computational technologies, that are currently being utilized in publishing?
Can you give us a brief overview of some things that you think are really happening right now, that are at the forefront? Yeah, sure. I mean, I think it's hard to think of an area of publishing where AI isn't starting to touch people's workflows, or where there are solutions being developed that are maybe not quite ready for implementation, but are almost there.
So I think if we start with the article writing process, then there are tools and services like Paperpal or Writefull that will help you write your paper and give you suggestions for improvements. Then when you submit, you've got really interesting things publishers are developing, like Wiley's ReX solution, where it's automatically extracting the data from your paper to really speed up the submission process.
So you don't have to do so much manual data entry. As those papers come into the publisher, there are a lot of tools out there that are screening and checking those manuscripts for all kinds of things, whether they're looking at methods or language or all sorts of aspects that a publisher might be interested in. Then we move through to peer review.
There are certainly tools that will assist with peer review checking. I think we're not really there with stats checks; there are some basic rule-based systems that will help check statistics, but that's perhaps one area where it's not so developed. Certainly, again, there are tools picking out areas of the paper where things might be missing or where the paper isn't so strong.
Looking at the citations, whether those citations support the paper or not: scite certainly does some interesting work in that area. Then we could go into production, and there are a huge number of tools now that will take your manuscript and automatically create a proof. Again, that's perhaps more used in book publishing than journals publishing, but certainly some really interesting developments there.
Then we move on to post-publication, where I think everyone will be very familiar with recommendation-type technology and automated classification services. So that's sort of the editorial workflow. But of course, behind the scenes in the publishing business, you've got solutions that are helping customer service teams categorize messages as they come in.
You've certainly got solutions that marketing teams are using to help target content. And then on the business side, again, you've got tools which are really helping publishers answer: what areas should I go into next? They're using the data on what's been published, looking for where there are gaps, and then matching that with other data sources to say, well, there's a gap in the market there.
Perhaps you should build a product or service in that area. That was a very quick run through. That was actually fantastic. I was trying to keep up while writing, and I was thinking this would be such a helpful overview to have somewhere, Helen. Perhaps you've already published this, but this is really super, super helpful.
Thank you. Paul, you were mentioning some of the work; of course, we at Elsevier are working with you on the Discovery Lab. But I was wondering, from your viewpoint of moving through publishing (you were an academic, then you were a publisher, now you're back in academia), what are your thoughts on the developments happening in AI where you really see publishing can benefit? And where do you think this interaction between publishing and the technologies that you work on is heading? Yeah, I think the two places where I see really immediate additional impact, beyond what Helen was mentioning, are first in terms of summarization. I think there's a lot of exciting work on how we can extract bullet points, or, I think there's work by Scholarcy on summarizing papers, or in general just that area of shortening up papers and producing digests for reviews.
I think summarization is really going to be, well, it's right here, right now. And in terms of implementation, it's something that I think you're going to see a lot more of coming from publishing. The other place, which I'm always quite a fan of, and I think our technology for it is getting better, is using published material as data, fundamentally.
So your classic natural language processing and information extraction, building what Lucy has been building over at the Allen Institute for AI: better semantic search, so better search capabilities for end users looking for data, but also using that as an additional pipeline for data sets that you can essentially provision as a publisher to different companies.
Right, and I think we saw that when we started with citation databases, right, Scopus and Web of Science. But I see that expanding into a lot of different areas as our ability to do information extraction gets better and better all the time. So those are probably the two kind of top-of-the-line areas. Thanks so much. Yeah, so summarization and using the publications themselves as underlying data.
And I'm seeing some conversations already in the chat where publishers are making their open access content available for doing AI projects like concept extraction, and they're mentioning SDGs as well, which is very interesting. Lucy, I share Paul's admiration of Semantic Scholar. I'm always astounded at how quickly you incorporate amazing new ways to deal with scientific materials into your products and projects.
And I'd love to know what you think. Where is this heading? What is the intersection of really cutting-edge research right now in AI, and where are we going? What's next for science publishing? Yeah, absolutely. So I think there are many steps in the publishing, and also the consuming, process of these publications.
And Helen already mentioned some of these: search, recommendation, access, reading, writing and many other auxiliary tasks around them. I would say certainly at any organization, including Semantic Scholar, one can't focus on every single task in this pipeline. So some of the things that I'm most interested in are AI technologies that assist with reading, as well as with interpreting documents in the context of the remaining scholarly literature.
I think it's really these kinds of multi-document or cross-document tools, ones that are able to draw connections between one work and the rest of the scholarly literature, that are going to be the next generation of scholarly AI tools. So in terms of reading, I think there's lots we can do to guide scholars through these documents.
I think summarization came up before, and Semantic Scholar has this TLDR feature, which helps people identify which papers to read. It's a very short, usually one, maybe one-and-a-half to two sentence summary of these papers on the search results page, so people can skim quickly and try to decide what they want to read. But then once you decide to open a paper, you're still faced with anywhere between 5 to sometimes 50 pages of very dense text.
And if you think about even longer works, like books, it gets even denser. So what kinds of technologies can we use to help people get to the right place in those books or in those papers? Are there section-specific summaries, or are there question-answering interfaces that we can build, so that maybe people can actively search for what they're looking for within a paper?
I think those are really the things I'm looking forward to working on. What does TLDR stand for? It's an acronym that stands for "too long; didn't read," I think. And it comes, yeah, it comes from the social media world, and we sort of borrowed it for this purpose. Thank you. Thank you.
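As an aside for technically minded readers, the idea of compressing a paper to one skimmable line can be sketched in a few lines of code. To be clear, Semantic Scholar's TLDRs come from a trained neural summarization model; the frequency-based extractive scorer below is only a hypothetical stand-in that picks the sentence whose words dominate the document.

```python
import re
from collections import Counter

def extractive_tldr(text: str) -> str:
    """Pick the single sentence whose words are most frequent in the text."""
    # Split into sentences on terminal punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Document-wide word frequencies: sentences about the dominant
    # topic of the document accumulate the highest scores.
    freq = Counter(re.findall(r"[a-z]+", text.lower()))
    def score(sentence: str) -> int:
        return sum(freq[t] for t in re.findall(r"[a-z]+", sentence.lower()))
    return max(sentences, key=score)
```

Run on an abstract, this tends to surface the sentence that restates the paper's main topic; a real TLDR model instead generates a fresh sentence rather than extracting an existing one.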
You mentioned this point on linking across documents. And I recall we were involved at one point in a DARPA project that had to do with finding biological pathway information across documents, the Big Mechanism program at the time. And there was a question also in the chat: any thoughts on finding similarities between documents? I'm wondering if any of you, Lucy or Helen or Paul, have any thoughts on what is currently the state of the art.
How easy is it to say, I have this document, and that one is similar? What kinds of dimensions does that similarity take? I'll hand it to you first, Lucy, but Helen and Paul, feel free to step in. Yeah, happy to speak to this. So I think similarity can mean a lot of different things in different contexts. But some of the work I've seen thus far that is quite promising is building out systems that can verify claims made in one article with evidence that's described in other work.
So essentially, if you're working on a research problem, you want to find all of the other related work that has worked on a similar problem, or on a variant of that problem, and what their results and methods were. There are some preliminary models that can do this kind of identification. It's very hard to verify whether they're complete, because now we're essentially asking models to perform something like a literature review, and we all know it's very hard to be thorough in a literature review, even for humans. But I think there's a lot of promise in what models can do for this particular application, and in how it could help scholars be a lot faster with that in the future.
And I work in this space, too. I specifically look at clinical research questions and ways to help speed up systematic literature review in clinical research. Helen, did you want to add something? You were nodding.
I was thinking more from, well, I'm not sure of the context of the question, but if you're thinking in terms of finding paraphrased documents and that side of things, Crossref, I think, are working on a project in this space with Turnitin, and certainly the STM Collaboration Hub are looking at trying to find similarities between submissions. So those are tools that are actively in development, but not quite ready for consumption just yet.
But there's certainly quite a lot of work going on in that area behind the scenes. So that's actually finding documents that may be plagiarized, rather than, in a positive way, saying, here's a document that has evidence for a claim made in a different document. Yeah, that's correct. Yeah, so those are sort of two sides of the same coin, in a way.
Paul? Yeah, no, I mean, I think these are actually super great examples. I think also some simpler technology is useful here, right? So AI systems are really good at telling you the similarity of different documents. We have large language models; we can embed documents, and you can do interesting things just by computing similarity scores.
And of the two examples I would point to, I'll just drop one in the chat: this really interesting example where they were automatically clustering papers. This is from a startup here in Amsterdam; they were automatically clustering papers coming out of a recent conference, just doing that automatically and providing interesting visualizations.
Also, I've seen it used often for what are called zero-query-result answers. So sometimes you're building a search engine, you get no results, and you still want answers to come back. And then you can use these similarity mechanisms to find, hey, there are these other papers that might be similar to what you're looking for. And essentially these algorithms are very good at learning representations that allow these kinds of similarities to be computed.
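Paul's description (embed the documents, then compute similarity scores) can be illustrated with a toy sketch. The bag-of-words "embedding" and the example documents below are assumptions for illustration; production systems use dense vectors from large language models, but the cosine-similarity step and the zero-result fallback work the same way.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: a sparse bag-of-words count vector. A real system
    # would use dense vectors from a large language model instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def most_similar(query: str, docs: list) -> str:
    # Zero-result fallback: even when no document matches the query
    # exactly, return the nearest one by embedding similarity.
    return max(docs, key=lambda d: cosine(embed(query), embed(d)))

docs = [
    "machine learning for peer review screening",
    "low temperature physics experiments",
    "semantic search over scholarly documents",
]
```

Here a query like "search scholarly papers" matches no document verbatim, yet the similarity scores still rank the semantic-search paper first, which is exactly the behavior the zero-result fallback relies on.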
Thanks so much. Can I jump in with just one more? So we talked a little bit about the reviewing process earlier. And I think one thing where similar-paper identification could be really helpful is in citation recommendations: identifying missing work that really is related and maybe should be cited. I generally don't think full automation of the review process is a good idea, and I can talk more about that later.
But I think for this one slice, which is identifying missing citations, AI technologies have a lot of potential. Yep, so that's very interesting. We're getting a lot of questions about pre-screening, streamlining tasks and workflows in editorial offices, scope checks and such. I think there's a question: can we find all of this information in one place?
I was hoping Helen would go off mute on that one. I can probably put something together. I have something that's about two years old, but it's quite out of date now because there have been so many new products and services launched. I know Martin Delahunty did a webinar a few days ago; I can try and find the link. That had quite a good summary of tools and services across the workflow.
But I'm not aware of a really nice list where you can go and say, well, I want to do a scope check, who builds those? I'm not sure anyone's created that yet. Thanks, but I think your blog would be a very good start, for sure. Lucy? Yeah, I feel like another blog that's a good resource is Aaron Tay's blog.
I believe he's a librarian, but he often does reviews of the latest AI tools in scholarly publishing as well, and does some nice comparisons. Yeah, thanks so much. And so I think this is an interesting question. Lucy, you were saying that you don't think that AI has a role to play in peer review. I'd like to throw that out there.
Do you all think that peer review can or should be done by AI, or how could it be supported by AI? Happy to open that up to any of the panelists. What do you think the role of AI in peer review could or should be? So I will say I absolutely think there is a role for AI.
I think there are really two bottlenecks in the review process, from a non-editor perspective. I mostly review for conferences, so I'll just put that out there. One is finding appropriate reviewers and getting them to say yes, and one is actually getting high-quality reviews that provide reasonable feedback that the authors can iterate on.
For the first part, I think we've had a lot of work on trying to identify authors with similar catalogs of papers and doing automatic assignment within some pool of reviewers. And this mostly works OK; you can get people who are fairly close to the topic of the paper. But I find that if you're just sending out a lot of invitations without the human touch of really having a connection with the reviewer and explaining to them why you're asking them to review a paper, people tend to either say no, or be less responsive and write less good reviews.
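The automatic-assignment step Lucy describes can be sketched as ranking candidates by overlap between a submission and each reviewer's catalog. Everything here (the keyword sets, the reviewer names, Jaccard as the overlap measure) is a made-up illustration; real conference systems typically compare learned paper embeddings rather than raw keywords.

```python
def jaccard(a: set, b: set) -> float:
    # Set overlap: size of intersection over size of union.
    return len(a & b) / len(a | b) if (a or b) else 0.0

def rank_reviewers(submission: set, catalogs: dict) -> list:
    # Rank candidate reviewers by topical overlap between the submission's
    # keywords and the keywords of each reviewer's published catalog.
    return sorted(catalogs,
                  key=lambda name: jaccard(submission, catalogs[name]),
                  reverse=True)
```

As Lucy notes, a high rank only gets you a plausible invitee; whether the person accepts, and reviews well, still depends on the human touch in the invitation.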
So if there's some way of merging the AI aspect of recommending good reviewers with the human touch of, why should you review this paper, why should you do this thing for me and for the community, that would be wonderful. And then, of course, on the review side: what parts of it bog reviewers down?
So I mentioned something like identifying missing citations; that could be automated. Whereas actually engaging with the content of the paper and making suggestions to the authors to improve their work, I think that maybe is still something that requires a more human touch. Thanks so much. Paul? Yeah, I mean, I agree with Lucy, but I'm just going to go out on a limb, because we're in a webinar.
Why not? And I think it really depends on what kind of reviewing system we want, right? So if you think about where some of us are going in publishing, where it's no longer about publishing potentially novel things, right, so you don't have to judge novelty, where you're just thinking about, did this person do the science correctly?
Right, I think there is maybe even deeper involvement we could have for intelligent systems. So then you're talking just about, hey, did you do your experiment right? Did you register your resources in the right place? Did you report all the things that our STAR Methods required you to report? I think all of you publishers know the amount of checklists that we now produce for authors to fill in.
And maybe if we move towards those environments where it's just about methodological rigor, it may be that the amount we could leave over to an automated system could be a lot higher. And I think that's a model people haven't really considered yet, because they've thought about it as replacing what I do when I read a paper, which is a very different thing. That's about taste and about my knowledge and about what I think the field is doing, and less of just, hey, did you do the science?
Right. Do you want to add to that, Helen? Yes, I think peer review is a tricky one, because what it means depends on who you talk to. But I think from a publisher perspective, if you're talking about screening papers, perhaps before they go out to peer review, then I think automated technologies have a huge role to play.
There are tools like Penelope, SciScore or Ripeta that will help with a lot of those checklists that are out there; there are many others, I can't remember them all. I think that's one aspect: making sure the figures and tables are there, those kinds of checks that are quite basic but very frustrating.
If you know, as a publisher or an author, you've got to resubmit something because you left the figure out or you didn't put the figure caption in. So those kinds of things, I think are sort of becoming quite common, although perhaps not everybody's implemented them yet. I think things like using AI, if you're in the sort of preclinical image space or one of these areas where image fraud is a problem, I think that will become particularly important.
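Since these pre-review checks are largely rule-based, a skeletal version is easy to sketch. The required fields below are invented for illustration; real checkers such as the ones Helen mentions encode far richer, discipline-specific rules.

```python
# Hypothetical checklist; actual publishers' required elements differ.
REQUIRED_SECTIONS = [
    "methods",
    "data_availability",
    "figure_captions",
    "ethics_statement",
]

def screen_submission(manuscript: dict) -> list:
    # Return the checklist items that are missing or empty, so the
    # editorial office can bounce the paper back before peer review.
    return [field for field in REQUIRED_SECTIONS if not manuscript.get(field)]
```

A submission missing its figure captions would be flagged immediately, instead of a reviewer discovering the gap weeks later.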
I think publishers, you know, there are people trying to defraud the system, and we have to be aware of that and keep an eye out for it. And I think that's perhaps much more of a publisher responsibility than a peer reviewer responsibility, in my mind. I think it's quite hard to pick up those kinds of things at peer review.
And I would also say, on the publisher side, I think some of the checks around identity, like, are you actually the person you say you are? I know perhaps we're used to working in a world where everybody trusts each other, but actually, I think increasingly some of these papers aren't actually coming from real people. And so we need, as publishers, to be looking: really, is this person real or not?
Is this person likely to have published this body of research or not? Is this collaboration probable? I mean, if you have three collaborators on a paper and one's from geology and one's from history and one's from economics, that's possible, but they may have paid to put their names on the paper. So I think those kinds of quality checks, that perhaps rather depressing side of publishing, will become more common and much more of a publisher responsibility, I think, than a peer review process.
Helen, I have a question for you. One of the things, and maybe Lucy experiences this too, is there's just so much pressure on the review system; it's just so difficult to get reviewers. How much do you think we can lighten the load? I ask just selfishly. Well, I think most publishers want to be able to find a reviewer that will review the first time.
So there are lots of companies and publishers working on review systems that will narrow down to the person most likely to peer review, and make sure they send you the message at the time when you're most likely to accept it. However, I'm not sure that really lightens the load. I think we're going to have to move more towards automated papers, where parts of the paper are automated, parts are written by the author, and the peer review focuses on the bits that you really need a human for.
I think we've spoken about this; the methods section particularly, and that's particularly an STM thing. If you're using a machine that can automate the methods, just take that and put it in the paper; we don't need someone to write that and review that. I don't think I have any brilliant solutions for decreasing the volume. I would love to hear them if you do.
I would love to hear them if you do. So what I find fascinating about the way in which this conversation about peer review is going, is that what I'm getting, both from all three of you, really, is that we're less seeing we're sort of seeing a merge between the AI work and the human work. And I love that really all of you were saying. But Lucy was really emphasizing need the human touch. You need to be convinced by your human that this paper is worth reading.
At the same time, you're all saying there are elements in science that are largely done by machines. Maybe they could be produced by machines and checked by machines. If you think of methodology and poll, I remember work that you did earlier on, which I always find very fascinating of there are fully automated labs. They could also generate fully automated reports, of course, but then who would read those reports?
So I'm very interested in this sort of balance between if you're seeing the human touch and you're saying the specific thing that humans can do that computers can't do, what is that? And in particular, I'd love to take this moment to go to a slightly darker side. There are currently already papers that have been fully generated by machines, and as people are also posting in the chat, there are people who are not just paper Mills but computationally generated papers.
How do you think we as humans can make sure that we keep on top of this, and that the technologies don't take over entirely? Is there a negative side to having a greater interaction between the humans and the computers? Happy to throw that open to whoever feels most compelled. Lucy's nodding, so I'll hand it to you first, Lucy. Well, it's quite a question.
So luckily, I think I have not had too much direct interaction with these sorts of generated papers. But I think one question we should be asking ourselves is: what is the purpose of papers and scientific dissemination? Is the goal to create papers for other people to read? What is the substance of the paper that authors want to communicate with the scientific community?
If there's maybe a way of highlighting the specific contributions the paper is making, it might be able to lighten the load on reviewing, lighten the load on reading, and all these things. But another question is, maybe we are really just trying to produce more papers for computers to read, for only automated consumption. So in the case where you're running an experiment in a machine that is able to generate a report that replicates all of its settings, you can go to another machine of the same type and replicate that experiment.
A human doesn't really need to know the details of how that experiment was produced, only that it was a reasonable experimental setting, and here are the results and how one should maybe interpret them. So I think there's room for both. I think the types of AI that we're talking about now are able to ingest huge volumes of data and kind of make connections between them.
So maybe there's room for these kinds of system-report-like things that are primarily meant for machine consumption. And maybe that's OK. I don't really know, I'm just kind of speculating: is that an OK thing? And maybe the scientists can focus more on interpreting and communication. Yeah, I mean, that's an extremely good point, because I think one of the big successes in science from AI in the last year is AlphaFold.
What is AlphaFold? It's just generating lots of potential protein structures, right? And we don't think that's a bad thing. Sorry, can you explain what AlphaFold is? AlphaFold was a model that was trained to produce protein structures that, should I say, should be capable of existing. I'm not a biologist here, somebody help me, somebody correct me if I'm wrong. So that's kind of generally what AlphaFold is: doing all these structure predictions that are very difficult to do, but just generating lots of potential, feasible protein structures.
And that's been seen as a great thing, as a way to find new drugs or promote drug discovery. If I'm a scientist now looking up potential ideas for a drug, and I can see, oh, there's a protein structure that this AI has generated, that makes me potentially more positive about actually deploying that or testing that usage. Now, that's different than a generated paper, right?
But it's also a generated aspect from AI. And maybe, as Lucy was saying, we should actually think about what we're trying to do; maybe as publishers, we should be thinking, oh, we can make these databases of interesting resources available. So sorry, it's not as dark as it sounds. Oh, no, I was just waiting for Helen to jump in.
Yeah, I agree with what Paul and Lucy said. I think on the humanities side, peer review is a different process. I'm making a huge generalization here, but it's much more about making sure that the arguments in the paper are really clear, so peer review serves a slightly different function. So I think what this is going to be, it's not going to be one size fits all.
I think it's harder right now to generate papers in the humanities that can get past human editors. I'm not sure that will be the case in a few years, but right now it's still quite tricky. I think we're going to have an awful lot of problems with things like DALL·E, the image generation tools. I can imagine archaeology journals being absolutely full of rather interesting things that have been computer generated but don't exist in the real world.
So yeah, I think that's a real problem, and I think we're going to see different solutions in the different subject areas. And I see some areas of science being very computer generated, as Paul and Lucy have said. Yeah, I wanted to do one thing: I wanted to point people to this initiative that I'm fairly excited about.
It's called the Content Authenticity Initiative, and it's from Adobe and Microsoft, primarily about image deepfakes, which you've heard of. It's about how do I authenticate the images that I produce in a cryptographic way, while still allowing you to do things like Photoshop edits, but reporting the transformations or the provenance of those images.
So for publishers, I think this is definitely something to look at, not just for AI generated work, but in general for making sure that the content is authenticated. No, this is an incredibly interesting direction. I think this idea of authenticity, and the earlier idea that perhaps there are elements that computers can generate.
Your example of the proteins, you know, generating all the proteins that could be folded and having them help us there. I do want to push a little bit, because there are, of course, lots of issues with bias that is exacerbated through the use of AI systems. There are biases; there are certain people who benefit more from machine generated systems than others.
It is very much a concern. Within Elsevier we have a large responsible AI group to make sure that the systems we use inside the publishing company do not exacerbate biases. And in fact, we had a project last year where some of my colleagues looked at reviewer recommenders, and we found that the automated reviewer recommenders were steering in a biased direction compared to human reviewers.
We've since then noticed that and changed it. But it can be a danger if you have, so to speak, a self-steering mechanism, that it might benefit certain people more than others, and in particular, people who already have will get more, because they're good at getting whatever it is. So I just wanted to allow you the opportunity to comment on that point. Is there something that we as publishers should be looking at?
Is there something we as a society should be looking at? Is it something where you see a role for technology, or the interplay between publishers, technology, and the scientific community as a whole, to be looking for these types of bias and keeping a finger on the pulse in that direction? Any thoughts on that? I can.
Oh, sorry, Lucy. Go ahead. Lucy, then Helen. All right. So I guess this is a discussion around this rich-get-richer phenomenon, or Matthew effect. And I think when we train models, we often learn from previous data, which likely exhibits these biases that you're talking about, Anita.
And I think it's really hard, even as a reader or an editor, or a writer for that matter, not to be impacted by authors' affiliations. I like papers published by people I know who are in my network; those tend to show up more in my work because I'm closer to that work, I'm more familiar with it.
And I feel like I can vouch for it more deeply, because I know some of the details or can get the answers to those details. So how can we make this process more equitable? I think certainly creating tools that grant broader access to lots of scholars, scholars from all over the world with different levels of resources.
I think that helps to even the publishing playing field, essentially giving people more equal access to content, giving everybody access to these types of AI tools. So something like these reading assistants or writing assistants, I think these can be really helpful in some contexts. Something else that we haven't really touched on is that there are different norms in different countries about things like borrowing from other work. Maybe some tools, which are not AI specific, that help people understand these norms, or detect these things while people are writing papers and make suggestions as to how they should change their writing or their citations to be more aligned with the scientific norm, is another potential intervention that could make this process more equitable.
Thanks, interesting suggestions. Yeah, Helen. So I think as publishers, we have to be really mindful. If you are going to build an algorithm to predict acceptance or predict impact, you really need to think about what the algorithm is actually doing. Is it doing what you think it's doing, or is it just saying, well, this person is a white male from a prestigious institution, therefore they'll go to the top of the pile, because that's the kind of data it's going to be learning from.
So I think we have to be really mindful. COPE has some excellent guidelines in this space. I actually think within publishers there's an awful lot of understanding about what is good and what is bad, what's acceptable, what's unacceptable, what is borderline. And as long as this isn't technology people working by themselves, and instead this is teams of people working together, where you can bring that ethical expertise from the publishing side together with the technology, then these problems won't go away, but at least there'll be some awareness, and there's somebody saying, actually, I don't think we should be doing that, or that's not acceptable.
So yeah, I think we just have to be mindful. But there's no magic bullet, I think. Well, yeah, I think this point about mindfulness is really the key thing. We should be looking at what these are, because these are systems; usually it's not just one model that we're building or training.
These are processes. These are being fed by data. There are humans making annotations. There are decisions about your supply chain of data, and also about how you build these models and how they integrate with the rest, how you visualize the data or display the outcomes on the screen. So the only way to think about this is: what do we as publishers have as our values, and how do we make sure that the products we're developing represent those values?
And maybe that means we don't do something, or maybe it means we put filters on top, or we're mindful of what this is actually doing. So the way I would frame this is: focusing just on the AI model is just a really small piece of the puzzle, because these are very complicated systems. I know enough publishers to know that they have very complicated systems. Right, and those systems are sociotechnical systems.
Right, they're not just these machines on their own, right? So that human touch, as Lucy called it, it's key that that stays involved, if I can paraphrase something that you said. So we have some questions in the chat, which I'd like to hand over. So, the key question: what are the risks? Computational anonymization in peer review, closed-source de facto standards for algorithms that skew decisions.
I'm not sure if that's a question. And Nicholas, if you're there, I don't know if you're going to unmute yourself, but if anybody has a thought on that question. So what are the risks that we need to look at? I guess that's a good follow-up question. Thank you. So how do we do that?
How do we make sure that we stay on top of this? How do we stay in that loop? Go ahead. I actually would say for publishers, if you're operating in Europe, you really need to be aware of the legal risks around these issues. There's a lot of law there, and I'm not a lawyer, about decision-making processes that are automated and very opaque.
And so any time you're making decisions, especially in Europe, based on an automated process, you really have to talk to your lawyer about what the implication is. So I think for a publisher, that's probably the number one risk I would think about, because I think you can ameliorate some of the, how do you say, potential risks around algorithms that perform badly or are not doing what you expect, by building better systems and by being mindful.
But some of these legal risks are very hard to understand with these decision-making processes. Thanks. Comments on that question, Lucy or Helen? I have a comment on this idea of trying to remove things like demographic features from data. So I think at a surface level, it seems like a good idea towards equity.
But I also think there are many places in science and the humanities where identity kind of matters. And it's not as if removing just someone's name or affiliation from a paper is sufficient to truly anonymize who the authors are, right? There are just a lot more women and gender nonbinary people working in gender studies than in other domains; there are these types of associations.
And I think sometimes removing that identity does some injustice to the work as well. And then there are other things: there are studies that have shown that things like gender and ethnicity really impact how we write, how we cite. So there are just a lot of invisible features there that are hard to anonymize.
I think, back to Helen's previous point, it's more about what the goals are, right? Is the goal to judge each paper completely impartially based on some singular standard, or is it to take the merits of each paper into account as it's going through the review process and make a judgment based on that?
And these are quite difficult questions that take a lot of time and resources and people to think over. But I think we really need to collectively think about what we want the future of publishing to look like. Is it just going to be more papers, growing at some exponential rate? I just think it's not a very sustainable process right now.
So we need to consider how to make it more sustainable and also more valuable for the community. And I think on the issue of anonymization, it really depends what you're trying to do. Anonymizing might be the right thing to do, but actually you may want to keep that data in, because you might want to increase the proportion of some group going through to the next stage, in the hope that that will change things later on.
So I think it's really complex. It really depends. You've got to be very clear about what you're trying to do and then really be mindful about the risks. Is it going to do what you think it will do? Is it actually possible to do what you want it to do? Is it desirable? Those are the kinds of questions you've really got to think through carefully, with diverse groups in the room.
I would say that because one thing that I think might be awful, somebody else will think, well, no, that's going to improve my chances. So I really do think these are really complex questions. Thank you so much. We're almost at the top of the hour, and this has been an absolutely fascinating discussion.
And thank you so much. And I love the idea, Lucy, of you challenging us to think of the future of publishing within this context. I wanted to close with a very brief, maybe few-word response from each of you. We've asked you a lot about what AI can give to publishing, but I was wondering, is there anything that publishing can give to AI? Is there anything that you would like publishers to do to support your work as a scientist and as an AI researcher?
I'm going to start with Paul, for totally random reasons. Oh, I think the number one thing is to reach out, right? It's an amazing field and it's moving super fast. I think reaching out and interacting; it's such a cutting-edge field that cooperating with researchers in AI, it's a really great time for you to work with folks. That's probably what I would say.
Thanks. Helen? I guess what we can give back is probably lots of nice, clean, processed data sets. Fantastic, so we do a lot of the processing. And Lucy, I'll give you the last word. Is there anything that publishing can do for AI, for your research and all the work you've been involved with?
This is very related to what Helen said, but I think data is going to become increasingly important. One thing that's really hard to tell right now is what data exactly is available for secondary use. So, for a paper, is the metadata available for secondary use? Are the images? Is the text? The citation network, and so on?
It would be really great if the answers to these were really obvious, embedded in the document, something that we don't have to work as hard for, or be risky in literally making a decision as to what data is usable. I think that would be incredibly valuable for AI going forward. So if I understand you correctly, making it clear for each document what elements of it can be reused, is that correct?
Yeah, and how. Great, thank you so much. Well, I just want to thank you all. This has been a fascinating discussion, and thank you so much for all your insights and all your thoughts. I have lots to think about for sure, but I'm handing it back over to David. Thank you, Anita.
Thanks to everybody who participated and sent in their comments in the chat, and thanks to the panelists for an engaging discussion today. We also want to thank our 2022 education sponsors again: ARPHA, J&J Editorial, OpenAthens, Silverchair, 67 Bricks, and Taylor & Francis F1000. Evaluations will be sent by email, and we hope that you'll provide feedback. Please visit the SSP website for information on upcoming programs.
And this discussion was recorded, so everyone will receive a link when it's posted on the SSP site. This session is now concluded. Thank you. Thank you so much again.