Name:
Plenary and Moderated Discussion: The Rise of the Machines
Description:
Plenary and Moderated Discussion: The Rise of the Machines
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/c91e0d6a-e6e7-434c-b954-14c23fcf9c69/videoscrubberimages/Scrubber_1.jpg
Duration:
T00H59M59S
Embed URL:
https://stream.cadmore.media/player/c91e0d6a-e6e7-434c-b954-14c23fcf9c69
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/c91e0d6a-e6e7-434c-b954-14c23fcf9c69/plenary__rise_of_the_machines_2024-05-30 (360p).mp4?sv=2019-02-02&sr=c&sig=6Om14aX87P2cs4jQZIdPpoVwY1vMJZ4kxtlX8%2BvjFEc%3D&st=2025-04-29T20%3A37%3A21Z&se=2025-04-29T22%3A42%3A21Z&sp=r
Upload Date:
2024-12-03T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
Good morning. Good morning. Is everybody feeling OK? I heard there were some wild get-togethers last night. I don't know. Make sure you stay hydrated and caffeinated; it's going to be a busy day.
You don't want to miss anything. Believe it or not, today is my last day as your president. I thought someone was going to laugh about that. Like, yes, good. Thank you. Thank you. If you see me walking around slowly or taking in what's going on around me, I'm just soaking it all up.
This has been very special for me. So thank you all for the experience. I have a quick question. Are there any librarians in the room? Let's give it up for the librarians. I want to just say we need more of you here. We need more librarians, and the humanities journals.
Anyone here representing humanities journals? We need more of you here as well. We need your perspectives to help shape our conversations. So hopefully you're having a great time. Tell all your colleagues and friends you have to come to SSP; give them FOMO. Make sure you have a great time while you're here. We're halfway through. Be intentional about it.
I'd like to recognize our generous sponsors and encourage attendees to visit the exhibit hall. I'd like to thank the program chairs, Tim Lloyd, Erin Foley, and Jesse Slater. You did a great job.
And the annual meeting program committee as well. Give them a round of applause, please. The agenda for the meeting is in the program app. I still have to watch my Ps, I'm so sorry. As I said, in the program app you can connect with fellow attendees, both in person and virtually.
I'd like to give a big, big shout-out to everyone who's joining us virtually. Thank you for spending this time with us. The Wi-Fi network, again, one of the most important things: SSP underscore annual meeting. Annual underscore meeting. I said the same thing yesterday and got it wrong. It's SSP_annual_meeting, and the password is #SSP2024.
Please remember to silence your mobile devices. SSP is committed to diversity, equity, and inclusion, and to providing a safe, inclusive, and productive meeting environment that fosters open dialogue and free expression of ideas, free of harassment, discrimination, and hostile conduct. Creating that environment is a shared responsibility for all participants. Please be respectful of others and observe our code of conduct.
If you need assistance during the meeting, please stop by the registration desk outside of the grand ballroom. Recordings of all sessions will be available in the app within 48 hours. Please join me at this time in welcoming annual meeting program committee co-chair, and apparently everybody's buddy, Erin Foley.
Good morning. I hope everybody got sleep last night. Who got sleep last night? I'm real impressed. Whatever you guys are doing, please come see me later. So it is my absolute honor to welcome everyone to the SSP annual meeting today.
For those of you who don't know me, my name is Erin Foley. I am one of the co-chairs of the SSP annual meeting program committee, and it's been such a good conference already. Yesterday, for those of you who were able to join us, we had an incredible keynote from Deborah Blum of Undark on publishers in the age of mistrust. If you didn't attend yesterday, I highly recommend that you go on to the app and watch the recording later.
I hope everyone has had some time, either last night or at one of the two breakfasts this morning, to catch up with some old friends, I see lots of old friends in this room, and make some new connections. Networking is one of the keystones of SSP, so I hope everybody takes real advantage of it this week. Most of all, as I said, I hope you guys got some good sleep last night, because we have such an action-packed day today.
And I'm really excited to kick it off. So when we were thinking about the morning plenary session today, we really felt as a committee that it was essential to address the topic that seems to have been taking over publishing for the last year or so, which, no surprise, is AI. So while generative AI gets closer to peak expectations on the hype cycle, automation and machine learning have long been a feature of scholarly communication.
However, the explosive growth in community and capabilities and expectations over the last year or so has started to reshape our visions of the future and of our industry. To dig into this critical and timely topic, we're kicking today off with a discussion on the rise of the machines, moderated by Andromeda Yelton, who's sitting right here.
Andromeda is a lead software engineer at ITHAKA and participates in ventures in the JSTOR Labs group. She has prior experience as a software engineer and librarian at the Berkman Klein Center for Internet and Society at Harvard University, and also as a lecturer at the San José State University iSchool. Andromeda has a BS in mathematics from Harvey Mudd College, an MA in classics from Tufts University, and an MLS from Simmons University.
And she's also a former president of the Library and Information Technology Association, or LITA, for those of you who are familiar. So she's a really busy person. She also won a 2010 LITA/Ex Libris Student Writing Award and was a 2011 ALA Emerging Leader. So we're really glad that we have such a busy person moderating our panel today. So thank you very much, Andromeda.
And on behalf of my co-chairs, Tim and Jesse, and also the rest of the AMPC volunteers, I'm really excited to welcome you all to this morning's plenary. And I'm really grateful to Andromeda and all of our panelists for being here today. So now I'll turn it over to them for a discussion on how AI may profoundly change what we do, whether automation will take over every aspect of publishing, and what roles humans will continue to play in this tech-driven future.
So, Andromeda, over to you. Thank you, Erin. Great, sounds like the mic is on. So I'm going to start by introducing my panelists, and then we'll dive right in. I'm starting to my left and going farther. So this is Xiao-Li Meng, who is the founding editor-in-chief of Harvard Data Science Review and the Whipple V. N. Jones Professor of Statistics at Harvard.
Meng was named the best statistician under the age of 40 by the Committee of Presidents of Statistical Societies in 2001, and he's the recipient of numerous awards and honors for his more than 150 publications. In 2020, he was elected to the American Academy of Arts and Sciences. Meng received his BS in mathematics from Fudan University in 1982 and his PhD in statistics from Harvard in 1990.
He was on the faculty of the University of Chicago from 1991 to 2001 before returning to Harvard, where he has served as the chair of the Department of Statistics and the dean of the Graduate School of Arts and Sciences. Then, continuing further to my left, your right: Dr. Chhavi Chauhan is director of scientific outreach at the American Society for Investigative Pathology, a program manager for the Women in AI Accelerate and Raise program, and currently serves on the boards of eight different mission-driven organizations in the spheres of scholarly publishing, digital pathology, AI ethics, and youth education.
She's a former biomedical researcher, expert scholarly communicator, and a sought-after mentor in the fields of scientific research, scholarly publishing, and ethics, especially for women and minorities. She's a thought leader, a renowned international speaker, and a strong advocate for equitable and accessible health care. She was named in the AI Makers 150 AI and Analytics Leaders and Influencers 2021 list, the 100 Brilliant Women in AI Ethics 2022 list, and the Top 100 Women of the Future 2023 list, and was a finalist for the AI Researcher Award 2023 from Women in AI, with another award to be announced in July of this year. She's also the co-facilitator of SSP's AI Community of Interest, with 200-plus active members, so maybe talk to her about that. And continuing all the way to my left, Dave Flanagan is senior director of AI product strategy at Wiley.
After completing his PhD in polymer science and engineering from UMass and Caltech in 2004, he joined Wiley's materials science journals program. He's been the editor-in-chief of Advanced Functional Materials and general manager of Wiley's cheminformatics program, and has headed a data science team. In his current role, he leads Wiley's AI product strategy, and he's based in Germany. So, hi SSP, welcome to Boston.
I live here. I'm glad you can experience our phenomenal weather and traffic, and I hope everyone is having a great time. So as I was prepping for this panel, for a very literary room, I found myself thinking about AI in literature. In literature, we see examples of AI that are clearly very non-human, like Naomi Kritzer's delightful short story "Cat Pictures Please," in which the internet is very helpful, although it does not necessarily understand what would be most helpful to you.
And it loves cat pictures. But we also see, for instance, HAL, who is every bit as creepy if you watch the movie now as he was when it first came out. We also see examples of AI that is human, or very nearly so. For instance, the AI in Blade Runner is just indistinguishable from humans, and it is dystopian and violent. On the other hand, we have Data from Star Trek, who is brilliant and reliable and not quite human, but in being not quite human sheds so much light for us on what it means to be human. Or Asimov's R. Daneel Olivaw, who decides the Three Laws of Robotics are not enough to protect humanity, goes further, invents a Zeroth Law, and secretly manipulates history for human benefit.
So clearly we have a lot of ideas as to what AI looks like and whether it's a good thing or a bad thing. And similarly, I hope we will have a lot of divergent ideas and perspectives on this panel. Also, similarly to science fiction, we're not here to teach you how AI works. There are a ton of panels and vendors and posters, and many of you are clearly experts already.
We're here to speculate on the impact of AI on publishing 5 to 10 years out. So we'll start with the concrete, day-to-day work of publishing and move outward to the industry as a whole and its impacts on the world. Along the way, we'll weave in some SSP core values: community, inclusivity, adaptability, and integrity. And ultimately, I hope you walk away with your brains buzzing with ideas about what AI is and what it means.
And with that, let's have some questions. So, panelists, let's say we have leveraged human expertise and AI technology to build different workflows over the next 5 to 10 years. What will humans' jobs in publishing look like? What are humans still going to be better at? Are there new jobs humans do? Are there jobs that have gone away? Did we fire everybody and let the machines take over?
We'll start with Dave and move toward me. Yeah, so first of all, just thanks again to the organizing committee, everybody, for inviting us here today. I think when we think about AI and how it's going to transform publishing, there are a lot of things that are important today that computers, AI, machine learning aren't going to replace. Just think about the meeting here.
AI is not going to replace community: making those connections with people, making those sort of personal bonds that you have between people when you can meet them in person. But there's a lot that AI and machine learning can do to give us tools to do what we're doing better. So we can think about, to use Steve Jobs' analogy about bicycles, how can we give people bicycles so that they can do more with the capability, the expertise, that they have?
How can we use AI as an assistant that can do a lot of the drudgery work that, frankly, we'd rather not be doing today? Because if we can give that to the computer, if we can give that to our AI assistants and interns, then we can focus on the more interesting parts of our jobs. We can focus on the parts we got into publishing to do.
Thank you for having me here today. So when I think about AI, especially in the last couple of weeks with the advent of GPT-4o, which is so user-friendly, can do amazing things, is so interactive, and is going to be very intuitive about the needs of an individual, I think of AI as artificial augmentation, and there is no doubt in my mind that we will be augmenting each and every process of scholarly publishing in the coming years.
But having said that, and I was told there would be some cool music playing, the statement I want to make today is: the rise of the machines is not the fall of humans. Especially in our industry, with the interactions, as David mentioned, and the involvement of humans to maintain the high standards of rigor and reproducibility, trust and transparency, that Deborah emphasized so much yesterday in her amazing talk.
So there is no reason why humans should be replaced. But having said that, with the advent of the new technology I just mentioned, everything surrounding scholarly publishing is going to change. And we are service providers, after all. So we are dealing with marketing, we are dealing with member engagement, and all these services are going to see a huge shift. That's going to touch the jobs, but not by replacing humans, just by augmenting our skills to make our jobs more interesting.
Thank you. And thank you for leaving me 40, no, two minutes to answer the entire question. Let's try. Thank you. I am a scholar and I publish. That doesn't mean I know anything about scholarly publishing. The reason I'm here, I guess, is that I've been fortunate to have this opportunity to be the founding editor-in-chief of Harvard Data Science Review.
You'll be hearing me advertising this all the way, and I even have a copy for some of you. And for a good question, I will sign it; you can sell it on eBay. But my most important qualification today is that you all have to listen to me, because I'm the boss of your president-elect. I'm extremely proud. Rebecca, where's Rebecca?
Somewhere right there. Thank you, Rebecca, for your wonderful job in making this happen. Now, OK, let me seriously answer that question. When we talk about generative AI, particularly as statisticians, we now call ourselves data scientists, we obviously have lots of these same questions. And you know what? Will statisticians even still be needed?
We all know statistics will still be needed, but everybody is replacing us. One of my board members, who was at the Northeastern meeting this afternoon, sent me a picture, a news clip from years ago when the calculator was invented. The newspaper clip, I wish I could display it, showed a group of math teachers protesting the rise of the calculator, because they were worried it was going to replace students learning how to do calculations and everything else.
But I guess you guys probably don't feel that was a threat. We all use calculators these days. And also, remember at the very beginning, I do need to take my 40 minutes, sorry. My father, who is 95, was probably the first prompt engineer, because when the calculator was invented, I got him one. He was having fun with it. He kept pushing it, trying to fool the calculator: 3 plus 3 is 6?
That's right. 6 plus 6? 12. That was right. He was very frustrated; he could not break it. The reason I mention this is that, as you said, the technology is going to change us, it is changing us, but it's not going to replace us. We just published an article in HDSR.
I welcome you all to read it. It's about how generative AI is going to change data science education, and the particular topic it addresses is that at this moment there are lots of software engineers whose job is being changed from coding to becoming more of a software manager. So this has great implications for the system itself. Lots of software engineers get worried about the code. I haven't coded myself for a long time, but now I can code using GPT, so that definitely helps us.
But it's very much like the calculator helping us: we don't have to do these long divisions anymore. And I know it's a long division, so I'm going to stop here; too long. Thank you very much. Thank you. Yeah, so it seems like we have a common theme from all of us of AI as an augmentation, but not as a replacement, perhaps.
And agreeing is fine, but disagreeing is also fine on this panel. Anyway, I've been thinking about how scholarly publishing comes in lots of different sizes. Xiao-Li doesn't work for a publishing company; he works for Harvard. He founded this journal, which is housed at a small academic press. Chhavi works for a professional society with a handful of employees.
I looked up your 990; it's like four. Dave works for a publisher with thousands of employees in almost two dozen countries. I can imagine AI really benefiting small publishers by reducing costs and putting more types of workflow within reach. But I can also imagine it creating a type of corporate digital divide, wherein large companies are the only ones that can develop custom AI tools.
So how is AI going to affect the publishing marketplace? Are there winners and losers? And let's start with Chhavi. Well, there certainly are winners and losers in today's world. So GPT-4o got released two weeks ago, and the mobile version was very strategically placed: you can use it for free, but if you want the mobile version, you have to pay $20 a month.
A subscription fee that doesn't sound like too much. On the very first day of the release, there was a 22% increase in their revenue. On the second day, they doubled their profits by making $900,000, and within one week of the release of the product, OpenAI made $4.2 million. For a small society like mine, that's a huge revenue stream.
In the first two weeks of its release, there were only 10 major countries that had started using the mobile version. So the $4.2 million is coming from 10 countries of the world. And there's your digital divide in the making, because these countries, the individuals that are using it, the organizations in these countries that are using it, are already ahead of the curve, embracing the technology to build products that are going to augment our performance.
So AI is close to me, and SSP is very close to my heart, and I have gained so much from it. One of the major things I have gained from SSP is mentoring people, and also reaching out to the leaders in the field for my own career development. And I'll just mention I have a brave mentee, someone I didn't know, from Nigeria, who just reached out to me on his own, and he needed some guidance.
For a few weeks, we were just struggling to schedule a video call to discuss his needs. So that's the digital divide I'm talking about here. There are challenges we are aware of, challenges we foresee. There are challenges we don't even know exist. How are we going to overcome those? We'll have to overcome those collectively.
So there is a digital divide in the making, but collectively we can identify these barriers and bridge the gaps that exist in our industry. Let's do a minute or two for Xiao-Li and then a minute or two for Dave. Only a minute. But seriously, I will not be able to say anything about who is going to be the winners and losers in the publishing space.
But I can say, again everything comes from HDSR, that we interviewed a leader in a UN organization. She talked about how the use of these AIs can help her organization, which very much reminds me: at the very beginning, lots of lawyers worried about whether they would still have a job. And the idea was, somebody reminded them, AI is not going to replace you, but AI is going to replace those lawyers who do not use AI.
And so this particular UN organization, they need to do a lot of legal work to help migrants and immigrants in lots of countries. And now the AI tools can help them to search the legal documents, search for what the case is, all these things much, much faster than otherwise; otherwise it's all human time doing so. So I think those who are not trying to utilize these techniques might be replaced, which should encourage all of us to really be familiar with what's coming.
Yeah, I think that tools like ChatGPT and GPT-4 and Gemini and these new tools are really democratic, in that I have the same access to the most advanced language models on the planet as an 18-year-old working in his basement has. And so I think it's going to encourage innovation. We're going to see more competition, more ideas.
I mean, yes, large publishers versus small publishers: large publishers may have more resources they can throw at it initially, which is why we're putting a lot of emphasis on AI. But there are also opportunities to partner. I don't want to plug, but if you go to Wiley Partner Solutions, you get access to the same tools that we're using internally.
Or you can work with an industry group like the STM Integrity Hub and get access to their AI tools. So I think there's a lot of opportunity for everyone to be able to use this, as much as they can come up with good ideas on the best ways of using it. All right. So we've heard both that not everybody has access and that everybody does have access. So I'm glad we have our first significant disagreement of the panel.
Let's see. We know people are already using generative AI to write papers; there are lots of indicators of that that people have seen in the published scholarly literature. However, there's also the reviewing side of the equation. And we were talking in our meetings before this panel about how a lot of incentive structures collapse if it's easy to write 10 papers but nobody can review them all that fast.
And even if we figure out how to accelerate review, maybe also with AI, and there are probably many vendors who are happy to talk to you about that on the exhibit floor right now, we might also need different indicators of research quality. If writing and reviewing both get significantly easier, what metrics and standards will need to be developed to reflect this AI-driven shift in scholarly communications culture?
And start with Xiao-Li. Thank you very much. That's really a tremendous question, one I have been struggling with a lot as a scholar, because review is where you hold the high standard. But now we have this problem of just way too many articles. For those of you who are familiar with the big computer science conferences, one of them got 10,000 submissions for one meeting, and they are basically trying to outsource all the review work to anybody who can basically breathe and count, and maybe counting is even optional.
So that's a tremendous problem. I run into the other kind of review problem, and I would love to hear your suggestions on how to handle it, which is that the science community very much emphasizes reproducibility and replicability, particularly in the data science space. So we require people, when you submit a paper, particularly to get it published, to submit your code, submit your data.
We have repositories, we do those things. But I can openly admit, this is not a secret: unlike the articles, which are reviewed, the code and the data, certainly for my journal, we have no capacity to review. So how good is it to put them there? In fact, it might do harm, because people could put anything there and claim it's now in a Harvard Data Science Review repository and get some credibility.
At the same time, we have no capacity to do it, and it's simply not possible for me at this moment. And this is where I'll be looking forward to all these AIs, AI techniques. Now, I know they're not perfect. Some people get scared: how can we trust those things? I say, well, compared to the alternative, what? We're not even doing anything right now.
So I think that is one tremendous area, and I would absolutely love to hear whatever you hear about how we do that, and I really hope that AI can help. Yeah, I really resonate with that whole thing, in part because I am one of the warm bodies who has been asked to review for computer science conferences, even though I don't go to them and don't have a PhD in computer science.
I just write a lot of software, and my husband is also in software and is constantly running into the problem that there's a cool paper that looks relevant to his interests, and it has code, and the code doesn't run, and he can't reproduce the result and use it in his work. And it's a bummer. But at least they ran the code; there are computer science venues that don't touch code at all. I know, I have also reviewed for some of theirs.
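To make that concrete: nobody on the panel describes an actual system here, but the minimum automated check a journal could run on a code submission might look something like the sketch below. The entry-point name, the requirements.txt convention, and the timeout are all illustrative assumptions, not any journal's real pipeline.

```python
# Hypothetical smoke test for a code submission: does it install and run at all?
# A sketch only, not Harvard Data Science Review's (or anyone's) actual process.
import subprocess
import sys
from pathlib import Path

def smoke_test(repo: Path, entry_point: str = "main.py", timeout: int = 300) -> bool:
    """Install a submission's declared dependencies and run its entry point.

    Returns True only if the script exits cleanly within the timeout;
    a crash, a missing dependency, or a hang counts as a failed check.
    """
    reqs = repo / "requirements.txt"
    if reqs.exists():
        install = subprocess.run(
            [sys.executable, "-m", "pip", "install", "-r", str(reqs)],
            capture_output=True, text=True)
        if install.returncode != 0:
            print("FAIL: dependencies did not install")
            return False
    try:
        result = subprocess.run(
            [sys.executable, str(repo / entry_point)],
            capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        print(f"FAIL: {entry_point} still running after {timeout}s")
        return False
    if result.returncode != 0:
        print(f"FAIL: {entry_point} crashed:\n{result.stderr[-500:]}")
        return False
    print("PASS: the code at least runs end to end")
    return True

if __name__ == "__main__":
    smoke_test(Path(sys.argv[1]))
```

Even this shallow check would catch the papers whose code simply doesn't run; verifying that the code produces the claimed results is the much harder problem the panel is pointing at.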
Anyway, Chhavi, how about you? So I think as an industry we all sort of struggle with having one key metric for determining the impact of science. I know if I say the words "impact factor," some of you, or many of you, will cringe in your seats. So I do have a wish list, and my wish list has global standards and best practices for using generative AI in scholarly publishing, both in writing and in peer review.
It would be nice to have those, but as I said, it's on my wish list, and I don't know, as our best practices for using generative AI evolve, whether we are all going to be in agreement, because there are so many standards. There's regulation, which has been lagging forever, and it continues to lag on the use of generative AI; that's going to be the trend. But there are nations, there are countries, that are putting the regulation of generative AI up front, right?
So they're going to dictate the way we approach any content, data in general, and especially scholarly content, when we think about using these tools. I suddenly lost my train of thought. Actually, I realize I haven't answered your question. So let me just take over, please, while you think it over. Seriously, in terms of building metrics, at least for the scholarly part, I think these are very important, because I was a former dean and chair.
So the thing is, we should talk to academia about what metrics actually matter in the scholarly publishing world, for example for promotions. I hated myself when I was a dean, because I always joked about deans being bean counters, right? We count how many papers. And I was not going to do that. But once I became dean, I felt it was inevitable, because I was in charge of so many different disciplines.
I have no idea what all these disciplines are. So all I can do is just look at their CVs. And what do you look at in a CV? You don't read the titles; you count how long it is. So I realized that I was driven to doing those things. So I think it's very important for us to engage all the stakeholders as we build these metrics. This might be a time to really rethink all the things we're doing, which are really not quite right.
Yeah, and thanks for bringing me back on track. So the thing I was going to mention is the impact of the science or any scholarly content that we are putting forward. So going back to rigor and reproducibility, and, you know, the longevity of the content we are putting out and the lasting, continued impact it will make.
I think that will dictate how peer review should be approached and how scientific writing should be approached. Yeah, I agree that reproducibility is probably the most important thing going forward. In blockchain you have the concept of proof of work: you have to solve a mathematical problem in order to get your Bitcoin out, or whatever. And in publishing, part of the proof of work until now has been: I can write a long, detailed paper, I can write a thoughtful review, I can look at the reviews that have been written for this paper, and that's sort of the proof that the work you've said you've done has actually been done.
But imagine five years from now, ten years from now, as these large language models get better and better, the amount of effort required to produce a full paper starts to approach almost zero. The amount of work to review a paper almost goes to zero, so your proof of work disappears. And so if you can't count on a paper being written, published, and reviewed as being the most important thing, then reproducibility, being able to show that the code compiles,
being able to show that the statistics you have from your experiment are valid, and being able to show that you can do the same thing again, which we kind of penalize nowadays because we're always concerned about novelty, if we can show that you can do the same thing again, that forms the solid bedrock for science and research, even if ChatGPT can write our reviews and write our papers for us.
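For readers unfamiliar with the analogy, here is a toy version of blockchain-style proof of work in Python. The parameters are illustrative (real Bitcoin mining uses double SHA-256 and very different difficulty targets); the point is the asymmetry Dave is drawing on, expensive to produce but cheap to verify, which is exactly what large language models erode for prose.

```python
# Toy proof of work: find a nonce whose hash clears a difficulty bar.
# Producing the nonce takes many hash attempts; checking it takes one.
import hashlib

def proof_of_work(payload: str, difficulty: int = 4) -> int:
    """Return a nonce such that sha256(payload + nonce) starts with
    `difficulty` hex zeros."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{payload}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

def verify(payload: str, nonce: int, difficulty: int = 4) -> bool:
    """A single hash suffices to confirm the work was done."""
    digest = hashlib.sha256(f"{payload}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

if __name__ == "__main__":
    n = proof_of_work("my manuscript")
    print(n, verify("my manuscript", n))  # slow to find, instant to check
```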
This is such an interesting point that all of you are making about reproducibility, in part because, like you say, people have looked to the thing that takes a lot of work as proof that the work is there. And so with maybe traditional metrics being less useful in the AI world, I see you've gone to the thing that takes even more work than that. So I hope AI is saving us a great deal of work on other things, to free up that kind of labor.
Although I also think this dovetails with the president's earlier question of who here does humanities journals, because reproducibility is not really the coin in that realm. I'll give you, like, a moment, because I see you're dying here. Yes, sorry. That's my trademark now. Speaking of reproducibility, I want to just remind the general audience that reproducibility is the minimum requirement.
Reproducibility simply says we can verify what you said you have done. It doesn't say anything about replicability, which is, scientifically, whether those conclusions are valid. In fact, the current standard of replicability itself does not really say anything about scientific reliability. Let me give you an example.
The 2016 elections. I know elections are on everybody's mind. The polls saying Clinton was going to win were replicated many, many times, that's how we all got convinced, and they turned out to be all wrong, because they were not reliable. Why? Because you have this whole issue of people not reporting to you, so there's a hidden bias, and so on and so forth.
The reason I want to raise this is that these are the kinds of issues that AI is not going to be able to detect. All the surveys said one thing, but we humans understand: I'm not going to tell people I'm voting for somebody who's really unpopular. That kind of bias, only humans catch. Now, if we had enough training data, hopefully the machine would too.
If we have enough training data, the machine will learn. But most of the time machines won't; we don't have enough data, and human behavior is the thing that humans need to understand. So in that sense, I'm trying to emphasize that human reviews will not be replaced if we really take scientific reliability seriously. That's a great point.
And I think that leads us right into the next question, which is: when we were chatting earlier, Dave said that pretty much everything anyone asks him about these days is research integrity and AI. So, hey, Dave, followed by Chhavi and Xiao-Li: talk to me about AI and research integrity. OK, so research integrity and AI. I want to quote the famous thought leader Patton Oswalt, who described generative AI as being like an axe.
An axe is a tool. You can use it to chop wood; you can use it productively. You can drop it on your foot and hurt yourself because you weren't sure how to use it. And you can also chase after somebody with an axe. And I like that analogy, because, like Professor Blum was saying in the talk yesterday, we're looking for use of generative AI in papers and using that as a potential disqualification.
And I think it really depends more on the intent. So if I'm using generative AI because English is my second language, or I'm using it to improve my thinking, I think that's a good and legitimate use of it. At Wiley, we ask authors to disclose if they've used it, and that's because we're still getting to grips with this; we're still getting to grips with how we use these kinds of tools. But you can also imagine that I could be using generative AI to generate tens, hundreds, thousands of papers that are completely fake and can flood the system.
So we need to think about how, on the one hand, we can use generative AI productively to help people do more and better research and communicate it better. And on the other hand, how can we use these tools to detect problems or attacks on our systems that we haven't been able to detect or defend ourselves against before? Yeah, so just building on what you said, David, and also going back to what Professor Deborah Blum said yesterday: we are fumbling, generative AI is in use, and we don't know as an industry how to deal with it.
So I think the approach is to act collectively, collaboratively. And I don't know how many of you are familiar with the STM Integrity Hub, but it's a huge resource available to our community. There are 35 publishers who are part of the consortium right now, and they're collectively utilizing all the tools to look for scientific misconduct. David mentioned the paper-mill-generated content that is inundating our editorial offices. So they're looking at the submission of this paper mill content across the platforms, through different publishers, which we in a silo cannot do.
So I think research integrity issues and scientific misconduct can be identified easily, at a scalable level, using AI tools, so we should embrace their use. But at the same time, the malicious actors are also embracing these technologies to get ahead of the game. And for that, we have to figure out ways and put in place some, I mentioned standards earlier, standards,
so we can contain those practices. One of the other things I wanted to say was, again going back to the collaborative effort: SSP has the AI Community of Interest, and you mentioned that and said we'd get back to it, so I'm going to use this opportunity for a few extra minutes to talk about it. We have over 200 members from the scholarly publishing community right now, and we have three dedicated working groups.
There are vendors, there are publishers, there are individual members of SSP and beyond who are part of this group. We get together and we talk about so many different aspects. We are putting together a glossary of terms, so that we are all talking about the same things and attacking the same problems in a similar way. We are putting together a series of best practices, because each one of us is doing a different thing that applies to us, but we are unaware of some of the other challenges.
Some of the other things we are doing, collectively, despite being different vendors, include building tools which will help us in identifying malicious acts. And as we do it collectively as a community, we are poking holes to see if our approaches are going to be ethical, responsible, sustainable, and scalable. So in fact, right after this conversation, we are meeting in the Revere room here.
So if any one of you would like to meet the people, the amazing subject matter experts, as well as all the co-leads and the leads, I would really like to invite you to join us there. So for me, I think the first time I really needed to deal with this research-and-AI integrity issue was as a professor giving homework. And the first thing that came to my mind was: what do you do? What's your policy?
And so from the very beginning, I adopted a policy knowing that humans cannot resist technology. Trying to say "don't do it" just doesn't work; you have no way to enforce it. I'd rather my policy be a way to educate the students. I tell them: I trust you're here to learn for your own sake, right? Don't waste anybody's time.
Use it. Try whatever you want, as long as it helps you. But you need to disclose it if it's a major use; if you just choose some words, that's fine. And I think that's a great part of it. I also want to emphasize, and I don't know how much this society is engaged on the education side, with the students:
I think, if anything, if you want to promote this for the future generation, you start with the students, because that's where research integrity starts; it should be part of graduate school training. But I also mentioned the technical side: again from HDSR, we will be having an article, currently in progress, trying to use algorithms, statistical tests, to see, sentence by sentence, which sentences are more likely machine generated and which are more likely human edited.
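As a bridge for readers: sentence-level tests of the kind Xiao-Li describes are often built on language model likelihoods. The sketch below is a hypothetical illustration of that general idea, not the method of the HDSR article he mentions; the model choice (GPT-2) and the interpretation of the scores are assumptions.

```python
# Hypothetical sentence-level scorer: compute each sentence's perplexity
# under a small language model. A rough heuristic is that LLM-generated
# text tends to look unusually probable (low perplexity), while
# idiosyncratic human phrasing scores higher. Illustrative only.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_perplexity(sentence: str) -> float:
    """Perplexity of one sentence under GPT-2; lower = more model-like."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token cross-entropy
    return float(torch.exp(loss))

for s in [
    "The results demonstrate a statistically significant improvement.",
    "My father kept poking at the calculator, trying to fool it.",
]:
    print(f"{sentence_perplexity(s):8.1f}  {s}")
```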
So we'll have these technologies coming. But I have to say that the best detectives are probably still human beings. I have my wonderful Rebecca and Amara; I write my editorials, and once in a while I will use ChatGPT. And one day they looked and said, Xiao-Li, this is not by you; it doesn't sound like you. And it turned out to be right, because I had chosen some words, because I speak Chinglish, I don't have too much vocabulary, so sometimes it's the choice of words.
And she said, wow, that doesn't sound like you. She was right, because I have no idea where that word came from. Can I jump in with just one last thing? We're talking about research integrity, and I think we're talking about a symptom. We're not talking about the underlying problem. We're not talking about the reason why somebody working 60 hours at a hospital is considering buying a paper so they can get a promotion.
You mentioned your bean-counter experience before. So I would ask anybody in the room, if you have influence back home at your institution or with your society, to look at other ways of evaluating people, of figuring out who should get promoted or who should get tenure, so that we don't have as much of this pressure on our systems and we can keep the literature in better shape.
Yeah, I agree with that so much. In a past life I taught middle school: I taught Latin to middle school boys at a Boston-area prep school, which was definitely an experience because I went to public school in West Virginia. Anyway, I had to deal with academic integrity cases from time to time, which was like my least favorite part of the job. And 100% of the time it was people who felt overwhelmed, needed to turn in their homework, and didn't have the skills to get out of the situation they were in.
Like maybe they just had too much on their plate, or they didn't have time management skills; but it wasn't like a fundamental character flaw. It was a skills deficit and it was the high-pressure system they were in, and it needed a different kind of solution than just yelling at them. Anyway, so we have started with the granular workflows of writing and evaluating papers, moved up through the marketplace, and up even higher to big questions of the meaning of scholarship.
And we're going to continue moving outward to the impact on the world as a whole. So SSP's core values include community and inclusivity. We know that AI creates both challenges and opportunities with respect to diversity, equity, and inclusion. For instance, when AI is trained on predominantly English texts from the Global North, it can leave out a lot of perspectives and disseminate racist and sexist ideas lurking in its training set.
On the other hand, non-native English speakers report that ChatGPT can really help them in their writing. So what kinds of risks and opportunities do we see for AI and DEI in scholarly communications? And Chhavi, you hit this idea already, so I'll let you start. I have a lot to say on this subject, but I think I want to shift the lens a little bit here, maybe starting with the opportunity.
I think there are many risks too. So I recently attended an annual meeting in Portland, and on a networking lunch break I happened to speak to an individual from Norway, and it was enlightening. This woman told me that the Norwegian government funds the medical science research there so the researchers can publish all their work for free, which is a very different model from any of the models we are aware of.
But they have to publish that research in Norwegian journals, in the Norwegian language. So there is this corpus, a body of work of medical literature and clinical developments, sitting in Norway. It's an extensive opportunity for us to leverage that content if we can just convert it into different languages, English being the predominant language for communication here in the Western world. And then it's a unique opportunity for us to see the developments that are happening, see the impact of those findings on the masses, and at the same time start building on those.
So I feel like the opportunities are immense. Definitely, as you mentioned, for multilingual authors it's easier to publish their research when the formatting of the manuscripts is simpler; it takes away from the authors the burden of publishing in specific journals with specific formatting requirements. As a consumer, as a reader, especially if you are, again, a multilingual person, it's easier for you to process that content in your own native language.
And again, with AI we can break it down into different formats, whatever is going to be most useful for my needs, if I'm maybe a clinician, versus a layperson who wants to know what the impact is going to be on me. So the opportunities are immense. The risks are associated with the rewards. I think one of the biggest risks I see from the DEI perspective is that if our data sets are not diverse enough, there are going to be several unintended consequences.
And I think it's our responsibility, as we start developing, deploying, and scaling these tools, to make sure that not just the rewards but the risks as well get distributed among all stakeholders. And for that, it's immensely important that all the data sets are diverse and representative of different populations. Xiao-Li next.
Thank you. And we have 10 minutes left. So I want to follow up on what you just said. To me, there is a tremendous opportunity for AI here, and I hope that some of you will take on this idea. We talk about language issues; that's a barrier. But at least in the research realm, another big barrier is a different kind of language, which is discipline language.
I write as a statistician; no matter how carefully I want to write, to be broad, I have a way of framing things. And a social scientist coming to it, a clinician coming to it, they have their own language, their own thinking. And here, I think, is one of the areas where AI can help: using AI to write different versions for different disciplines. Let's say you take an article from someone like Mr. Jones.
If we develop this software, and I do think those things will come, you can have the AI repurpose the article: take the language, translate it into a different discipline's language, select examples that may be more useful, more relatable to others. So it's a different kind of translation. And I definitely think there's a business idea here, so any publisher who wants to work with me, please.
But seriously, I do think AI has the ability to learn this in general, to really collect all the examples of how people think about these issues. And I think that is one of the things that can make these articles really inclusive. Lots of times the barrier is: I have to learn so much to understand the jargon before I can understand the paper. And here, I think, whatever journal can provide that service can probably attract more submissions.
Yeah, I would just add that in software engineering you have the concept of security by design. Security is something that you think about when you start building something; you don't just sprinkle it on at the end and hope for the best. And with AI and generative AI, we have to think about ethics by design as well, safety by design. So how do we make sure that we're using diverse models, that we're getting results that test well, thinking about that from the very beginning instead of at the end, or when it's in the hands of the customer and we find out that it's not representing some people or is discriminating against some people? We need to do that from the very beginning of the process.
Yeah, I agree with that wholeheartedly. And it's also one of the things that really concerns me about OpenAI being everyone's go-to, because we kind of don't know what's in their training data.
I mean, we have some hypotheses, we have some evidence for bits and pieces, but they're not transparent about what's in it. And if that's what we're building all of our tools off of, we've sort of waved our hands around the first step of that process. And I guess it depends on whether we can do the kind of post-hoc evaluation that shows things are sufficient. Or were you going to say something, Chhavi? Yeah, I just wanted to say that scholarly content is so different from the content that maybe OpenAI is getting trained on.
It's probably mostly the content that's freely available: it's social media, it's probably the preprint servers, because they're not behind the paywall. The high-quality, peer-reviewed content that meets the standards of rigor and reproducibility and reliability, that has been properly vetted by the experts in the field, sits behind the paywall. So I think we are not ready to start using some of these commonly, openly available tools.
But there is a need for us to come together to build large language models that are exclusively trained on domain-specific data sets, because what's happening in one domain may be very different from another scientific discipline. So it has to be discipline-specific, and I think a one-size-fits-all approach is not going to work when it comes to scholarly publishing and AI.
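As a loose illustration of what "discipline-specific" could mean in practice, here is a minimal fine-tuning sketch. The three-sentence corpus and the small public base model (gpt2) are placeholders; an actual effort of the kind described would need a large, properly licensed, vetted scholarly corpus and far more compute.

```python
# Sketch of domain-specific fine-tuning of a causal language model.
# Everything here (corpus, base model, hyperparameters) is illustrative.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Stand-in for a vetted, licensed corpus of peer-reviewed domain text.
corpus = [
    "Immunohistochemical staining revealed elevated marker expression.",
    "The cohort was stratified by histological subtype before analysis.",
    "Replication of the assay confirmed the originally reported effect.",
]
enc = tokenizer(corpus, padding=True, truncation=True, return_tensors="pt")

class TinyCorpus(torch.utils.data.Dataset):
    """Wraps the tokenized sentences; labels equal inputs for causal LM loss."""
    def __len__(self):
        return enc["input_ids"].size(0)
    def __getitem__(self, i):
        item = {k: v[i] for k, v in enc.items()}
        item["labels"] = item["input_ids"].clone()
        return item

args = TrainingArguments(output_dir="domain-lm", num_train_epochs=1,
                         per_device_train_batch_size=2, report_to=[])
Trainer(model=model, args=args, train_dataset=TinyCorpus()).train()
```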
All right, look: 9:52, last question time. So we started with scholarly publishing workflows, spiraled outward through publishers, the pursuit of knowledge more broadly, and the effect on humanity as a whole. I am going to stick with the idea of human impact here, but I'm going to focus it down to one story. I have a 17-year-old daughter. We have homeschooled her to varying degrees all along. So this past semester I taught her how to write a research paper; shout out to all of you for making that possible.
And one of my key lessons for her was that it's not really about producing the deliverable. It's not really about writing the paper. The most important part is the research. The research is the thinking. It's the place to spend most of your time. And it turned out also to be the part where she came alive. She was so engaged in getting to dig into ideas that were interesting to her and getting to really dig for obscure facts with sophisticated search strategies.
And I worry that generative AI makes producing the deliverable of the paper so easy that, especially under the kinds of pressures we've mentioned earlier, we skip the part where we actually think. So how do we use AI to build tools that encourage creativity and engagement and thinking, rather than alienating us from our own minds, chasing ever-faster deliverables? I'll start with Dave and move toward me, so we can end with the educator.
Yeah, and that's my point about proof of work. The important part is being able to think, putting the effort into it. I mean, when I use generative AI, I'm not using it to write stuff; I'm using it to help me think about what I should be writing. I can bounce a bunch of ideas off ChatGPT or something like that, and in an afternoon I can refine my thinking, use it as a tool for thinking, because it has superhuman recall, and really get a handle on what I'm trying to figure out.
So for that, it's a really good tool. We need to figure out how to differentiate between "I've been doing the thinking and then I go and write something" versus "I used ChatGPT and generated a huge wall of text that's not useful." But you can start to see now, when you're interacting with people, that they've written something but they haven't thought about it. And getting past that, I think, is something that as a society we'll be learning how to do over the next couple of years.
Yeah, I have to agree with you there, Dave. So it's about how you use the tools. Like you mentioned before, Andromeda, and it's a very relevant point, there are published studies now showing that when people are either overwhelmed or critically close to a deadline, that's when they tend to leverage some of these tools to just generate content. I think as humans we need the time and space and that pause to get clarity about what the question is and what outcome we're looking for.
As you mentioned, Dave, you can get the entire output, but that's not going to help the process. If you think about the use of AI in peer review, it's not going to help, because it's going to be a very homogeneous process. The AI is not going to be able to add any value to the content in the context of human intellect; the actual intelligence is going to be missing from it. So I think we have to train ourselves, and, like you said, it's going to keep evolving, for us to come together and figure it out.
What are the questions that we want to ask? What are the gaps we want to identify? So if you have content in front of you, you can use AI to identify some of the things that could improve the impact of this science: look at what's out there in the literature to see the trends, where this could be leading, what more I as a researcher can add.
What are some of the questions I can ask? How can I move this field forward? Maybe do a comparison with other disciplines and look at what's happening in other fields to get some creative ideas about a problem that exists but that you have not thought about yet. So I think it's the approach, how we ask questions. The AI tools are available to us to do a whole bunch of different things.
We have to continue to keep probing them, and not just stop when we get the answer, but take it a step further with our own creativity: what is it that we are missing that we can enhance using these tools? Thank you for giving me the last word. And I have to say that, having worked with many, many talented students, I am not worried about that particular issue, for two reasons.
Number one, it's a very competitive market out there. The students know that if they are using it very passively, not being creative, they're not going to be competitive. Look at all the races to figure out how to use these generative tools, how much human intelligence is being developed at this moment, because you've got all these kids so excited, their brains working like crazy. That's one reason I'm not worried. The second is that while it will replace some older kinds of thinking,
it creates new kinds of thinking. For example, now we have to take this notion of human-machine interaction very seriously, because every time you are working, doing your job, writing a prompt, you are actually engaging with the machine, and you have to think about what that means. My first encounter with the machine, you probably won't be surprised,
was, as with everything, about Harvard Data Science Review. The first thing I asked, the very first line, was: give me a fundraising strategy for Harvard Data Science Review. It came out in two seconds, literally, I don't need to repeat it anymore, with a list of eight things on how you do fundraising. None of them were surprising. But I actually looked at it, because I was a dean;
I do fundraising. Everything looked right. That prompted me to think: OK, what can I do next? So there's lots of thinking; it's just a different kind of thinking now. But I do think we need to create a culture among students, and I keep emphasizing this, I think you made a very important point: we need to emphasize to the students that, at least during the time they are students, what they should emphasize is not the publication, not the product, but rather the process.
It is the process that makes us better, and I think that's the message we should deliver. And I'm quite sure that this engagement with the machine is really changing our way of thinking, particularly, for example, in how we're integrating the humanities, right, it's all about language, with the STEM side. And there's a lot more opportunity out there. I could take another two hours, but I will stop right here.
Thank you very much. That was a fascinating experience, because I feel like I prompted you with an issue that was on my mind and you came back with a reframing, and it's kind of like talking to ChatGPT in that way, but also better. Very much so; I'm also hallucinating all the time. Well, given how little sleep I hear everyone got, maybe everyone's hallucinating.
Anyway, I would love to have a broader conversation with all of you as the audience, but I see a 55 second countdown in front of me and that's counting up. So I think we are over time. And unfortunately, we have a hard stop because they are resetting this room for lunch. And I'm sure we are all enthusiastically in favor of that. So please, if you have any questions for any of the panelists, leave those in the extremely lively conference app and we will monitor that and continue the discussion there.
Thanks, everybody.