Name:
2025: Oxford-Style Debate: AI Is Fair Use
Description:
2025: Oxford-Style Debate: AI Is Fair Use
Thumbnail URL:
https://cadmoremediastorage.blob.core.windows.net/4ec402ce-dd98-4467-b888-0c04ddda16bc/thumbnails/4ec402ce-dd98-4467-b888-0c04ddda16bc.png
Duration:
T01H01M20S
Embed URL:
https://stream.cadmore.media/player/4ec402ce-dd98-4467-b888-0c04ddda16bc
Content URL:
https://cadmoreoriginalmedia.blob.core.windows.net/4ec402ce-dd98-4467-b888-0c04ddda16bc/SSP2025 5-30 1500 - Closing Plenary.mp4?sv=2019-02-02&sr=c&sig=XsqW8F5012aw6ILlK1hJOYHHYChEPp8xzoKGRkcXRgc%3D&st=2025-12-05T20%3A58%3A27Z&se=2025-12-05T23%3A03%3A27Z&sp=r
Upload Date:
2025-07-17T00:00:00.0000000
Transcript:
Language: EN.
Segment:0 .
Now have you all enjoyed the conference so far. Terrific that's what we want to hear. So one last time I'm going to say let's hear it for my fellow co-chairs, the entire annual meeting program committee, the SSP leadership and staff for putting together yet another fantastic meeting.
It's extremely gratifying to see so many of you still here on a Friday afternoon, although I'm not seeing you all that well because these are my screen glasses, not my distance glasses, so you're a little bit blurry. We're absolutely thrilled to once again close out this year's annual meeting with an Oxford-style debate, which, for my money, is one of the best ideas SSP has ever had. This debate on AI and fair use will be moderated by Etzer Boats.
Etzer is the senior director of enterprise strategy at Wiley, where he establishes partnerships, drives creation of new products and services, and looks to extend Wiley's market reach. Prior to joining Wiley in 2021, he held similar strategy positions at Kaplan and Deloitte Consulting, and before that, he served as the Director of Financial Oversight and Management for the New York City Department of Education.
After graduating from NYU, Etzer earned his master's in public policy, strategic management, and finance from the Harvard Kennedy School. And now I'm going to turn it over to Etzer, who will introduce the debaters and go over the debate's format and ground rules. Sir, it's all you. Thank you.
Great, thank you. Thank you, Greg, for that introduction. And thank you to the entire host committee of SSP for inviting me to moderate this debate. Can we get another round of applause for the SSP group for a wonderful job well done on their 47th annual meeting? So let me just say that I'm honored to be a part of this conversation today because, like many of you, my colleagues at Wiley and I spend a lot of our time thinking about how we can push our thinking and move the needle on AI and scholarly publishing.
So, selfishly, I'm excited to be here today to learn, challenge my thinking, and grow alongside all of you. But before we officially dive in on our debate and I introduce our debaters, I'd like to just go over two housekeeping items. First, in the spirit of transparency, I'd like to disclose that I did not use AI, whether ChatGPT, Gemini, DeepSeek, or any other tool, in preparing the opening of this debate. I like to do things the old-fashioned way: I use Google. Secondly, I just want to take a pulse check on the energy of the room to see how everyone's feeling. So I know it's been a long but hopefully an inspiring week. So let's just see how we're feeling. Show of hands: how many of you are here today ready to engage in vigorous debate on the future of scholarly communication, creation, dissemination, and consumption in the age of AI?
Show of hands. Thank you. And how many of you are here simply because your flight isn't for another 3.5 hours? OK, without doing precise math, it seems like most of you answered in the affirmative to the first one, so hopefully we won't have to work that hard to keep you engaged in this debate. That brings us to the resolution of our debate.
AI is fair use. Or is it? The timing couldn't be more on the nose for this topic. Why? Because every day, it seems, there's a new lawsuit, a new deal being signed, another policy issuance that shifts the landscape of this debate, and yet seems to raise more questions for us. Speaking of questions, there are a few I'd like to clear up before we get into our debate.
What do we mean when we say AI and fair use? When we talk about AI today, we're referring broadly to the practice of training generative AI models like ChatGPT, Claude, Gemini, DeepSeek, and a host of others on copyrighted work, including the kinds of work that we all spend our time on day in and day out: scholarly content, peer-reviewed journal articles, and the like. When we say fair use today, we're referring to the legal doctrine governing use of copyrighted material without the express permission of the copyright holder.
Now, it's important to note in a global environment that fair use frameworks vary significantly by country. So for the purposes of this debate, we're going to be using the frameworks that are set forth by US copyright law, which surprise, surprise, leaves a lot of room for debate, which is why we're here. So without further ado, allow me to introduce your speakers in today's debate. First, speaking for the affirmative AI is fair use.
We have Adam Eisgrau. Let's welcome Adam to the stage. Thank you. Thank you, Adam. So Adam serves as senior director of AI, creativity, and copyright policy at the Chamber of Progress, where he leads their Generate and Create campaign to unleash the full creative potential of AI for artists, entrepreneurs, scientists, educators, and others.
He's worked as a lobbyist, strategic communications consultant, and policy director on many tech-driven copyright matters, including the 1996 WIPO Copyright Treaty, the Digital Millennium Copyright Act, and efforts to regulate peer-to-peer software technology. Again, please give another warm welcome to Adam Eisgrau. Taking the opposing position, arguing in the negative that AI is not fair use,
is David Atkinson. Welcome, David, to the stage, everyone. Thank you, David. David is a lecturer in the Department of Business, Government and Society at UT Austin's McCombs School of Business. His work focuses on the intersection of law, ethics, and AI. If you get to know David, you'll learn that he loves to write his Substack, Intersecting AI, which is full of great content on the subject. He's also written an open-source textbook on law, ethics, and AI, and as recently as yesterday was quoted in a Fortune article on AI and copyright. All very timely. Again, please welcome both of our debaters to the stage. OK, so a little word on how this debate is going to work. For those of you who are not aware or uninitiated to the Oxford-style debate, the most important thing is that this debate depends on your participation.
So we're going to start with a poll in which you're going to vote on where you stand on the issue. For that, you'll use your smartphone; we'll flash a QR code up on the screen, or you can sign into a website, at which you will indicate where you stand on the issue. And then each of our debaters will give their opening arguments. They will have precisely 10 minutes to give their argument.
First, we'll have the argument in the affirmative, and then we will have the argument in the negative. After they give their opening arguments, we will have a rebuttal. They will have three minutes to give their rebuttals. And then after that, it's your turn. We're going to open up the stage. We're going to open up to Q&A, and you'll have opportunities to ask questions of our debaters.
After the Q&A period, we will have two minutes for closing arguments in which they get to sum up their case. And then we will take a final poll to see how you have moved on the issue. The winner, which we will declare at the end, is the debater who moves the most votes from one side to the other. Make sense? OK, so let's start with our very first poll.
Let's see if we can get it on the screen. I think we're having... there we go. AI is fair use. So use your smartphone, or go to menti.com and use the code 22311102, to take your vote. OK, has everyone taken their vote?
And with that, we will start with our first debater. Adam, welcome. Thank you. Yep, good afternoon. Curious to know what that number is, but I'm more curious to know what the final one will be later. Are you ready? Let me know.
I'll start. First, let me, if I may, ask the moderator's indulgence just to say very briefly that when I got Etzer's invitation by email, I looked at it hard and said, that's great. And then I got a handwritten note addressed to Daniel, I'm not quite sure why, and it was signed "Lion," inviting me to his den to have a conversation with a few friends about current events.
How do you turn down a lion who wants to talk about current events with friends? So here I am in the lion's den to talk about current events. Click ahead, if I may. OK. We don't have slides? Sorry, yeah, it went to the poll app. There we are. Progress. OK.
Technology is our friend. Let's just keep that in mind as we go through all of this. Thank you very much. Vote early and often, preferably for the affirmative. Etzer helpfully clarified that what we are really talking about is not "AI is fair use," because AI is a lot of stuff, as we'll come to, and fair use is a very specific thing; we'll also come to that. So what we're really talking about, and of course my opponent and the organizers were aware of this formulation ahead of the debate, is: resolved, training generative AI models on copyrighted material is fair use. That's really what we're talking about: generative AI, copyright, and fair use. So with that, pushing the right button this time, let's talk about fair use, shall we? Etzer indicated it is a specific legal doctrine.
Copyright, of course, starts with rights for owners. You're all very well familiar with that in the publishing context: distribution, copying, reproduction, derivative works, performances, at times very specific rights in a specific statutory regime. Fair use is equally specific, equally specialized. Why am I emphasizing this? Because we all think we understand what the word fair means.
Why wouldn't we? It's an English word. We all speak English. Fair dealing. Fair shake. Fair enough. Fair hair. In copyright land, which is much more like the Twilight Zone, the word does not mean what we think it means. It means what the statute says.
And that's going to be important, because concepts of fairness, fairness in the sense of "gee, isn't that a rip-off," are not what's at issue here. What's at issue is whether training generative AI on copyrighted materials is fair use, and I make that distinction for reasons that will become very clear. So copyright itself: where does it come from? The Constitution.
And, I realize this is going to sound like heresy, that's the lion's den part, it's the only clause in the Constitution where the framers indicated the purpose. The purpose of copyright is to promote progress in science and useful arts, and it does that. The means to the end is, of course, to incentivize people commercially.
But the Supreme Court has noted many, many, many times that the point of copyright is not remunerating people; it's promoting that progress. And this goes back for centuries, including way back to Lord Ellenborough, who nailed it. He takes a back seat to no one, he says, in protecting people's copyrights, obviously. But it can't be a regime that manacles science. And that's going to be important as we go forward as well.
So what about generative AI? What are we talking about here? Well, Etzer indicated it directly: AI is very broad. There are multiple kinds. You've all used it, whether you know it or not. Generative AI is a very specific thing, and it requires enormous amounts, as it indicates here, of information, including, absolutely, copyrighted information.
And that goes in. The three steps in the middle generally have the caption "and then magic happens." I was an English major, don't ask me. But amazing stuff is done, in ways we'll touch on in just a second, and then eventually an output is produced. But we are not here to talk about the outputs. We're here to talk about the inputs and the training of generative AI, because that's what we're looking at to determine whether that process, that usage of copyrighted information, is fair use under the law, not fair in Webster's dictionary.
So these LLMs, to take one type of large language model, one type of generative AI system, into account, are enormous. How much information goes in there? It's measured in tokens, and trillions of tokens are what we're talking about. Right now, the smallest of these models as of a year ago was about 70 billion tokens. That's an enormous amount.
And tokens are essentially chopped up pieces of information, which are later analyzed by the machine in those incomprehensible to an English major kind of ways to be able to statistically assess the information and answer your questions and produce new and fabulous, interesting, innovative material. Tokens can be visualized this way too. I found this rather helpful.
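The "chopping up" described above can be sketched in a few lines of code. This is a deliberately simplified illustration: real LLM tokenizers use subword schemes such as byte-pair encoding rather than whitespace splitting, and the function and variable names here are purely illustrative.

```python
# Toy illustration of tokenization: chop text into pieces ("tokens")
# and map each distinct piece to a numeric ID, the form in which a
# model actually "sees" the text. Real tokenizers split into subwords
# via byte-pair encoding; whitespace splitting is a simplification.

def toy_tokenize(text):
    """Split text into lowercase word 'tokens'."""
    return text.lower().split()

def build_vocab(tokens):
    """Assign each distinct token a numeric ID, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

text = "the model sees the text as tokens"
tokens = toy_tokenize(text)
vocab = build_vocab(tokens)
ids = [vocab[tok] for tok in tokens]

print(tokens)  # ['the', 'model', 'sees', 'the', 'text', 'as', 'tokens']
print(ids)     # [0, 1, 2, 0, 3, 4, 5]
```

The statistical machinery of an LLM then operates entirely on those ID sequences, which is why the speaker emphasizes that what goes into training is tokens, not books as books.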
So all of these tokens (50 billion tokens, for example, if each were a drop of water, is what it would take to fill up an Olympic swimming pool, according to Google Gemini) are used to produce LLMs. They're used to produce generative AI. They're not used to produce other copyrighted works. They're used to produce this incredible, effectively software, product. Yes, copyrighted information: scholarly journals, The Great Gatsby.
A Karsh photo, the Beatles' music, all kinds of stuff goes into that. But the key thing is it's not just that information. Remember those drops going into the pool. Very few of those drops would be the Beatles, or Winston Churchill or any other photograph, for that matter, or The Great Gatsby, or any other book or journal article for that matter. Information about climate goes in, information about cancer research goes in, information about molecular structures and how they might relate to each other.
Protein folding, mathematics, you name it, that goes in. That's how you get to trillions of tokens, ultimately. But what's being argued about, and why we're here today, is whether or not that actually is fair use under US copyright law. And it's being argued in many, many cases across the country, 30-plus, as you see here; they keep consolidating and getting introduced, so the exact number may be a mystery. So fair use, that statutory regime, comprises four factors that courts look at.
Courts, by statute, must look at these four factors. And I would respectfully submit to you, dear audience members, as people who now understand that fair doesn't mean fair: in fair use, it means what comes out the other end of this four-factor analysis. Two factors are especially significant, the first and the fourth. May I ask the time remaining, please?
You have three minutes left. Thank you. Factor one is all about, you guessed it, transformers: transformation. What generative AI does is turn those billions and billions and trillions of tokens into incredible new kinds of information and products and analyses. And the courts have recognized this: the central purpose of the investigation under factor one is to see whether and to what extent the new work is transformative.
Take my word for it; David will call me out if I'm wrong. The Supreme Court has said this many times, and many other appellate courts following the Supreme Court have agreed, including very recently in the Andy Warhol case, which went against a finding of fair use. So it's not always that you're going to find transformation in a particular case; this one will be debated for some time. But here the Warhol court hearkens back to that earlier decision in Acuff-Rose and says that the first factor asks whether and to what extent the use has a purpose or character different from the original.
The larger the difference, the more likely the first factor weighs in favor of fair use. And in Acuff-Rose, not incidentally, if you find transformation under the first factor, it can outweigh many of the other factors as well, such as whether you use the entire work; it's fine to use the entire work if you're making a transformative use. So what about the effect on the market?
You're all in publishing: dollars and cents. So, the effect on the market. Well, what's the market? It's the market, as we see in point number one here, for the original copyrighted work that went into the model, not the market for a set of training data, because courts have recognized that under factor four, you'd be stuck in a circle.
Somebody comes up with an idea for training data with an LLM in mind, and then the owner of the copyrighted information that goes into the training data says, oh, wait, you want it. So there must be a market. Courts have written about this a lot. I'm not making it up. And they say, no, no, no, that's not the way it works. We've got to focus on the market for the original work.
So if your book is one drop in that Olympic swimming pool, really, is training an LLM going to affect the market for your particular work, when it's blown into all those tokens and turned into something entirely different, and that something entirely different is used to create something even more different on the back end? I don't think so. Substitution for the original work is what's not legit.
But if what's produced ultimately, a transformative work, competes with the original work, not only is that OK, it's encouraged by the law, because the Constitution talks about progress in science and the useful arts. And the idea that the training data market is going to exist very soon is belied by a number of facts that I hope we can talk about in the questions and beyond. Specifically, there's just so much material.
No offense to any individual creator, but the value of their specific work, and that's what's at issue, as we just heard from the Supreme Court, might be pennies. Then there's the practicality of coming up with licensing regimes across all different kinds of material, multiple kinds of copyrights, partial reservations of some rights.
Forget about it. I grew up in the Bronx; that's the way to put it. Adam, time. Thank you. May I wrap? Yes. Thank you. And lastly, a rebuttal point before we get to rebuttal: you're going to hear a lot about the fact that what's really unfair about fair use is that some of the work, a lot of the work, in the training data sets is copyrighted and is there without permission.
Well, that's the point of fair use. But what this indicates to you is that a commentator and a judge heavily relied on by the courts, including the Supreme Court, says morality, just like fair in the usual sense, doesn't enter into the equation. If an author steals the source material for their book and writes the book, that's not material to whether it becomes copyrightable.
And the same holds for fair use. We may not like it a lot, but there are giant reasons: progress in society, fair use fueling progress. That makes a big difference. So we've talked about training, we've talked about on copyrighted material, and we've talked about why it's fair use. And with that, I humbly urge you to vote in the affirmative.
Everyone, let's give Adam a round of applause for his argument. Thank you, Adam. OK, to the stage, David. All right. Thank you. David, you let me know when you want me to start. All right, well, the timer should be going on again. Or is that just like a... It may start, but I'm holding it here.
Does it start? All right, I'm good. Let's do it. Let's go. All right. OK. So a finding of fair use would mean that generative AI, or Gen AI, companies such as OpenAI, Google, and Meta don't have to pay for the copyrighted work they use without authorization or compensation.
Copyright law stems from the US Constitution, as we just heard, and when thinking about the issue, the most important consideration must be whether the finding would promote the progress of science and useful arts. Most arguments around fair use tend to get bogged down in legal technical discussions based on case law. Comparisons can feel strained because we are necessarily trying to draw an analogy or distinction between VCRs and ChatGPT, or a plagiarism checker and ChatGPT.
While I'm happy to explain how each major case is distinguishable from current arguments for fair use, I think it more often confuses the issue and overwhelms listeners; it sheds more heat than light. But I don't want to dodge the issue, so here we go. As a foundational matter, Gen AI models do not involve criticism or commentary, a search or indexing utility, software interoperability, or any other purpose recognized as transformative under fair use precedents. Nor can they claim that the outputs of their models offer commentary, searchability, or other functionality with respect to the works they're trained on. We can tidily group the distinctions of case law as follows: Google Books, HathiTrust, iParadigms, Perfect 10, and Kelly all decided that the use at issue was functional and not based on expressiveness.
For example, in iParadigms, the product simply told users how much of the submitted work may have been plagiarized. For Google Books and HathiTrust, the product provided information about the books, including whether a specific word was present and how many times it appeared in the text. Perfect 10 was just about how search engines could show low-quality thumbnail images. In no case did the expressiveness of the text or image matter to the product.
Google Books, turnitin.com, and image search would all work exactly the same if the words in the text were a jumbled mess of random letters, or the pixels in the image were the visual equivalent of white noise. The other most-cited cases are all about interoperability. Again, none are about exploiting the expressiveness of content. For example, in Sega, the engineers only copied all the code so they could identify the segment that would allow their games to be interoperable with the console.
They did not use any of Sega's expressive elements. The facts were almost identical in Sony. For Oracle, the court determined that Google was only after the functional aspects of the commonly used code, so the programmers could then provide the creative elements. All of these cases can be readily distinguished from AI models. Unlike the products in Google Books, Perfect 10, and iParadigms, the quality of the expressiveness matters to Gen AI. That's why they want books more than X posts, professional artwork rather than drawings by preschoolers, and professional songs rather than those of your second cousin's garage band. How can AI models produce text, images, and songs? Because the developers took the creative works and fed them into the models in their entirety, in the order that they appear in the original works.
This is the only reason why models are able to generate competing works in the style of particular individuals. The models don't just train on the tokens and then forget them; many are stored within the models in chunks, just as they appear in the original, as you can see here. Finally, and fatally for any argument hinging on intermediate copying, the companies creating the AI models at issue aren't merely looking at the functional aspects of text or images and then providing all of their own expressiveness for everything the models generate.
In fact, unlike the engineers in Sega, Gen AI companies provide virtually none of the expressiveness. My opponent has a further hurdle to clear: he must argue that downloading tens of millions of files, including books, articles, and papers, from known, notorious pirate websites to train Gen AI is perfectly legal. Why? Because all major AI models, including those made by members of his organization, do it.
Meta, not a member of the Chamber of Progress, for example, argued that once a book is published, it does not matter how anyone acquires it, including by downloading it from a pirate site, where the creators of the books are never compensated and where Meta never provides any attribution. Google, Amazon, and NVIDIA, all members of the Chamber of Progress, must all make a similar claim. For context:
Meta torrented hundreds of terabytes of files, which, as the plaintiffs noted, was 20 times more than all the printed works in the world's largest library, the Library of Congress. My opponent's clients, some with trillion-dollar market caps, say their technology is revolutionary and can solve some of nature's most difficult problems, ones even people with PhDs struggle with.
And yet, it's just too gosh-darn hard to figure out how to collect training data legally. When enacting the Digital Millennium Copyright Act (DMCA) in 1998, which updated the Copyright Act for the digital age, Congress's expressed intent was to ensure a thriving online marketplace for copyrighted works and those seeking to use them, by safeguarding the ability of copyright owners to distribute their works in protected formats, that is, DRM. A determination of fair use in this case would directly undermine that objective by rewarding the intentional exploitation of stolen works as an alternative to authorized access. There is one more thing proponents of fair use must contend with: every argument they make in favor of fair use would work at least as powerfully for any individual human.
That is, if their overall arguments prevail, nobody should have to pay for any digital book, article, song, movie, or artwork again. For example, the use of copyrighted works by humans is highly transformative: the input is copyrighted content, but the output is entirely different, driven by statistical patterns recognized by brain neurons rather than the mere compression of data or the stitching together of data sources.
Brains are not like databases storing copies of the original. This allows brains to perform such actions as creating a song about living in the ghetto in the style of Johnny Cash. For the second factor, Gen AI companies absolutely train on highly creative works, but even their best arguments work just as well for humans. Brains weren't specifically designed to capture expressive elements.
Rather, brains aim to understand the underlying relationships between words and sentences, colors and shapes, sounds and rhythms, and so on, and they store some representation of them. For factor three, humans do not require mass copyright infringement to be useful, and that should count in favor of humans, just as this factor has always favored less unauthorized reproduction of copyrighted works, not more. Finally, for the fourth factor, humans rarely unfairly compete with copyright owners, and do so no more frequently than AI models.
Though someone may read a J.K. Rowling book, almost nobody can memorize large chunks of the books even if they want to, unlike Gen AI. This makes humans less likely to create an output that is substantially similar to the original copyrighted works, which in turn makes humans less likely to produce infringing outputs. Seeing that the four factors are of no help, proponents turn to more exotic arguments.
I've identified the eight most common and highlighted the ones I'll emphasize more. The first is that AI companies should have a right to learn, just like humans are free to read content and learn from it. But humans can't legally download every song, book, and movie ever created. Even if we would be physically incapable of listening, reading and watching it all.
The exclusive focus on output in evaluating generative AI infringement would mean that anyone can make copies of anything, as long as the subsequent creation, after reading, listening to, and viewing the copies, is not substantially similar to the input. So a human could read as many pirated books as they want, so long as none of the human's writings after reading those books closely resembles the books, which would be basically all the time, for every copyrighted work you encounter.
Finally, it is essential to distinguish between innovation and progress, because not all innovation equates to societal advancement. Even if Gen AI does lead to progress in some ways, it's not clear that its overall impact is positive. More to the point, humans are far more likely to make meaningful discoveries and useful innovations, and they do so even with an enforceable copyright law.
Let's play a game. I'll list important advancements made by humans, many of whom probably read fewer than 1,000 books, and then I'll list all the discoveries of a similar caliber made by Gen AI. Humans: the laws of physics, calculus, the theory of evolution, quantum mechanics, microRNA, antibiotics, DNA structure, thermodynamics. Gen AI: ...
If you really want to juice the progress of science and the useful arts, humans have a much better argument for removing copyright law, for everything being fair use. The unfortunate side effect of obliterating copyright law, though, is the tiny matter of destroying the economy. In sum, prior case law is fundamentally different from the facts and context at issue here. To build the most capable AI systems, companies must download millions of books and articles from pirate sites. If companies are allowed to download from pirate sites, it will encourage more people and companies to do so, not just one LLM but all LLMs. Why buy the cow when you can get the milk for free? Lastly, every argument a company could make applies at least as powerfully to humans. Notably, humans cannot legally download anything they want to learn from without authorization or payment.
So the real question is: why should the companies at issue, which are all for-profit, receive special legal privileges denied to humans? Thank you. Thank you both for those compelling arguments. Now we will have a rebuttal from each of our debaters. First, Adam, to give a three-minute rebuttal to David.
Copyright law is really, really strong and really, really extensive, and fair use is a defense to liability. So if you're using the word with any seriousness at all, you're using it in court. Every innovative technology challenges courts, because it's never exactly the same as the ones that came before. Which means that the notion, fanciful as it is and entertaining as it was, that acknowledging fair use in this particular case somehow leaves copyright law inadequate to protect existing rights simply makes no sense at all, to me anyway, and I hope not to you.
Secondly, let's talk about innovation and progress. We don't know what kind of progress comes from an innovation until long after the innovation is invented. That's why we want to encourage it. We want to incentivize people, we being the framers of the Constitution, that is not me. We want to incentivize people to create to the maximum degree possible, but not a whit more than that.
Lots of commentary about that from the framers and judges beyond. So the idea that we're going to impede innovation, when there's a long tradition of fair use applied to new technology, makes no sense from a constitutional point of view. And it also makes no economic sense, because look at the technologies, not the crickets but the technologies, that came along that have been disruptive to society but which produced enormous amounts of progress, and which often involved copyrighted material.
The list actually is quite long, and it includes, for example, as David mentioned, the VCR, which seems like a somewhat more frivolous example. But how about the interoperability of all software? We take it for granted now. We take the existence of browsers for granted. So what's at stake here is, in fact, enormous. But what's at stake is not the entire corpus of human knowledge and copyright protection, as David suggests, because his argument about humans simply goes way too far.
Nobody is suggesting, by analogizing what generative AI does to how humans learn, that they're really the same thing. Of course they're not. It's a machine. It's a process of tokenization, among other things, that produces a specific piece of software. And that's what it uses the copyrighted material for. It does not substitute for it.
So if you follow David's argument all the way, we don't have to throw out copyright law; we have to throw out fair use. And with it, we have to throw out the constitutional purpose of promoting progress in science and the useful arts, and leave behind a lot of profit that we won't know how to realize just yet, but that we'll be glad, 20 or 30 or 50 years from now, that we did. I suggest progress and innovation are the same thing, especially if you wait long enough and enable them with fair use.
Thank you, Adam, for that rebuttal. All right, David, you have three minutes to offer your rebuttal. Excellent. Thank you. All right. I was at a slight disadvantage for this rebuttal because Adam received a full copy of my opening statement, but I was only provided his outline. So this rebuttal is based on that outline, not necessarily what he just said.
Adam's statement about models not storing a duplicate of their training data is misleading. A recent paper, co-authored by none other than Stanford Law professor and former consultant for Meta's fair use defense Mark Lemley, notes that Meta's model, quote, memorizes some books, like Harry Potter and 1984, almost entirely, unquote. It's also misleading to say the models don't compete with the works they are trained on.
Suno, for instance, is designed to compete with the works it's trained on. It only makes songs, and all the songs it makes are based on training on tens of millions of mostly pirated songs. In fact, some plaintiffs made a web page that compares the AI version with the original, so you can go judge for yourselves. Otherwise, Adam and I agree on most of the basics.
The rub is that those basic facts favor my side. For example, the goal of copyright is to promote progress in science and the arts. We don't allow humans to ignore copyright, even though humans make more transformative use of copyrighted works. And declaring everything fair use for everyone, including AI companies, or even for only AI companies, may spur innovation in the short term, but it would certainly harm innovation over time, because people would lose the legal protections that incentivize creating and sharing content.
This is already happening, with people moving stuff behind accounts and paywalls. To put it more pointedly: why should we believe that permitting humans to consume copyrighted material without authorization would hinder science and the arts, but allowing models to do the same thing would not? Next, it's true copyright doesn't protect underlying ideas, but AI doesn't somehow magically collect underlying ideas and discard expression.
Memorizing Harry Potter isn't merely memorizing ideas from Harry Potter. And of course, it would be silly to think that when a machine learns, it picks up only ideas, but when a human learns, we pick up both ideas and expression. Moreover, the way models transform the training data is not any more impressive or deserving of fair use deference than how your brain perceives images or sounds.
When it receives signals from photons and acoustic waves, they are converted to electrochemical signals that traverse trillions of cells, and somehow we're able to make sense of it. Your brain doesn't receive words; it receives signals that it learns to perceive as words, after being trained over many years of looking at them and learning how they work in context with other words, like an LLM.
Don't be misled by companies that try to make model development seem super technical, complex, or mystical. Your brain is more complex and mysterious. I want to conclude by noting that the purpose of fair use isn't to make life as simple and frictionless for tech companies valued in the hundreds of billions or trillions of dollars, so that they can go on to make another trillion dollars.
It may be inconvenient that they should have to license training material in order to learn, just like humans, but that is not a tragedy. Thank you. Thank you very much for that rebuttal. So now we're going to open the floor to questions for our debaters. But first, I just want to let you know I'd love to read out the results.
So you guys know where the original votes stood, to give you both the opportunity to see what kind of hill you have to climb. In the affirmative, that AI is fair use, the original vote was 55 votes in your favor; against AI being fair use, 123 votes, in favor of David. So, Adam, no pressure, but you're going to have to nail these questions.
Are you ready. So let's start. Let's open it up to our audience. Do we have a first question. It's a little difficult to see. So why don't you raise your hand. And our facilitators will bring the mic over. Have 1 over here. OK great.
Judy has one first. OK, awesome. If you would state your name and the organization that you're here representing, we would love to get your question after that. And I just want to give a sort of disclaimer to everyone: let's please keep our questions concise and succinct. All right.
Let's start. Judy, you're first, I think. Yeah, sorry, I'm sorry, I think I saw you raise your hand over here. Yeah, it's a little difficult. Yeah, we got you. OK, OK. That's one.
Go in there. Yes, back here first and then Tim. OK, hello. Jenny Podrasky from Ars. I wanted to ask specifically to the point of the competing value, comparing the value of one specific work to the value of an LLM, and wondering if there's any weight to collective value, any thought to, even if one particular work doesn't directly compete with an LLM.
Is there any weight to thinking about the collective value of the works that it is trained on? How does that factor into that analogy of a drop in a swimming pool? It doesn't. And there's a good reason that it doesn't. First off, copyright law is, as I indicated, extensive, and it's composed almost entirely of the rights that owners have.
Fair use is an exception to those rights for very specific purposes. And there's lots of precedent on fair use. And there's even more precedent on actual fundamental copyright rights. And we would have to junk the entire system, at least on the fair use side, if we adopted that way of thinking. These quotations that I was able to share with you from our transformer friends are not isolated instances.
They are, as the boldface indicated, black-letter law, as the real lawyers say. So we can think about it from a macro societal level, and that would be interesting and useful and productive for Congress to hear about, to think about whether they want to change copyright law as they have in the past, like in 1976, when they codified the whole thing and added fair use for the first time, 50 years ago.
So your point is well taken. It's interesting. It may be relevant, but it's not relevant with respect to this debate. And just to be clear, whenever you ask a question, please note who you're directing it to. Was that a question for both of the debaters?
I would love to hear both debaters' thoughts. Yes, OK. Yeah, I'm more agnostic about the right approach to this. I would say that even that individual person should be compensated for that one individual work, the same way that when I hear a song on the radio on the way here, or I buy a book and read it, that probably has very little bearing on my brain, what I remember, how I'm thinking, and things like that.
But I still had to pay for it. I couldn't just make a copy of it for free and read every book that I want. The same should apply to LLMs, even if it just makes up a tiny fraction, a drop in the bucket. That's how almost all data is to all of us when we learn things, and I don't know why we should treat them exceptionally. Thank you.
OK, next question. Hey, Tim Vines from an AI company called DataSeer. So one of the things about current LLMs that may be coloring this debate is that they're great averagers. They just produce the next token, on average, across a vast amount of data, which means they're kind of useless for science, actually, because science is not an average of all previous papers.
It is the best work. The best-supported ideas have outsized importance compared to a lot of the guff that otherwise gets published. And so I want to hear from the presenters how, if the next generation is able to understand the weight of evidence, the importance of what it is reading, that changes their perspectives. OK, yeah.
So you're absolutely right. It's not great for science right now. That's why it hasn't invented anything useful or anything of note. Adam mentioned that maybe in the future it would. This is all very speculative. I don't really know what to say to add on to that. I agree with you for the same reason.
There was a recent opinion piece in The Washington Post. It looked at a study where judges were given information on a trial and asked how they'd rule on it. The judges were all more lenient if the person said, I feel bad about what I did. But when they tested an LLM on it, it just stuck to the law.
And so maybe this is a sign that it's more formal: it's going to follow the law, it won't be swayed by emotion. And then they gave the test to law students, and they did the same as the LLMs did; they're just trying to be formal and go forward with it. And when I read this, the way I think about it is that the law, and science, would probably evolve much more slowly over time if they were mostly led by LLMs, because LLMs are just looking at the distribution of their tokens.
It's not looking for novel insights. It doesn't know how to think. A counter perspective, a counter view. They're not really great at hypothesizing. They don't know how to structure a study. They can't do any of this autonomously, by the way. So I don't see any argument for how they're going to promote science any more than any human would.
But the factor that we're leaving out of that equation is humans. With LLMs or diffusion models, or any other kind of generative AI, we're producing a multi-purpose tool, the classic Swiss army knife, if you will, or chainsaw, pick your favorite, that is wielded by humans. Homo erectus showed up and started using tools 1.6 to 2 million years ago.
We keep inventing them. We keep using them. We keep innovating. We keep making progress. And though I'm not a scientist, so I will defer to the real ones in the room who can talk about the fifth level of AI training. These models may not have come up with calculus or invented a new branch of philosophy, but they have figured out protein folding in a way that's going to help us cure cancer a lot faster.
They've figured out the ways that various drugs interact, not just to kill people but by looking at combinations of them and seeing patterns that no human could, but that humans can instruct them to look for. They've figured out new drug compounds that are saving lives now. They're working with radiologists to read MRI images and spot cancer, particularly ophthalmological cancer, that the humans, as they quite acknowledge, didn't see. No pun intended.
So the issue is not whether or not Gen AI is going to get good enough to replace humans, or to predict the next great Broadway musical. The issue is: do we want to interfere with a long-standing, well-reasoned, consistently reasoned body of existing law that provides enormously strong protections to copyright owners, but does not let them stop progress, including by not stopping competition?
It's the substitution for a copyright owner's work that the courts pay attention to, and should. And there are lots and lots of situations where people copying other people's work is going to land them big copyright penalties. Thank you. Bo, next one. Eric, thank you. Hello, if you can see me, I'm standing up.
My name is Morel Yano. I'm from Springer Nature. I had to write down my notes just to ask the question. So from what I've understood of fair use, it is for educational purposes. It seems like we're going a little off topic when we're saying it's for the learning of a machine. So I'm looking at machine learning versus human learning.
With machine learning, it seems like we're talking about what comes out of it, innovation, et cetera. But from everything I've learned, fair use is for the educational purposes of a human. So I guess I would like a kind of confirmation or description of how you are saying that fair use now means a machine is getting the same rights as a human. Thank you for the question. Had time permitted, we'd be able to go into this in more detail instead of just flashing Section 107, the fair use provision of the copyright law, on the screen.
It's understandable why that's your impression, because Section 107 very clearly says that works may be used without the permission of the owner in certain circumstances, and its list of examples covers exactly the type of things you're talking about. But it is not intended by Congress, as their reports indicate, and it has not been found by courts, decision after decision after decision, to be limited to that.
And that's why we've got lots and lots of cases where fair use has been found when new technologies come along, where the issue is: is the new technology itself a substitute for the original copyrighted work? Did it use too much, to your point about quantity, of the original work? Or is it justified because what's produced is transformative? And even where the transformative use competes,
if it's transformative enough, it's OK. Sega, the game cartridge company, wanted to keep a monopoly on the game cartridge market and the console market, which was even more lucrative. A company named Accolade came along and said, we'd like to get in on that action and make game cartridges that fit the Sega machine. They bought one and did what's called reverse engineering, which included copying certain kinds of Sega's code in order to create cartridges that would not get kicked out by the Sega machine.
A classic example, taught in every copyright class, of transformative fair use with a new technology, and it had nothing to do with education. David, do you have a response? Sure, yeah. So it probably won't surprise you to learn that I have a different take on Sega. In that situation, Sega would not allow them to have a copy to learn from, so they could learn how to make their games interoperable with the console.
Sega just said no. In this case, there are publishers the companies could go to and license the content from. So that's already a key difference. The second part of it is that when they took the code, they were only after the functional aspects of it. But AI is after the expressive elements. When you're talking about tokenizing, you take the words, you break them into segments, you feed that into the machine.
Those are embedded within the parameters of the models. Those tokens are there. It's inside it. It's a copy of it. So it's very different from that case when they were only focused on the functional aspects of the code and provided all of their own expressiveness after they got that functional part. All right.
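As an editorial aside for readers of this transcript: the tokenization David describes, breaking text into segments and mapping each segment to a number that is fed into the model, can be sketched in a few lines of Python. This is a toy illustration only; the vocabulary and the greedy longest-match rule here are invented for the example and are not any real model's tokenizer.

```python
# Toy illustration of tokenization: text is split into subword
# segments, and each segment is mapped to an integer ID. Real
# tokenizers (e.g. byte-pair encoding) learn their vocabulary from
# data; this vocabulary is invented for the example.

TOY_VOCAB = {"the": 0, "cat": 1, "sat": 2, "un": 3, "happy": 4, "ly": 5}

def tokenize(text):
    """Greedily match the longest known piece at each position."""
    tokens = []
    for word in text.lower().split():
        i = 0
        while i < len(word):
            # Try the longest substring starting at i that is in the vocab.
            for j in range(len(word), i, -1):
                piece = word[i:j]
                if piece in TOY_VOCAB:
                    tokens.append(TOY_VOCAB[piece])
                    i = j
                    break
            else:
                i += 1  # no vocabulary piece starts here; skip one character
    return tokens

print(tokenize("the unhappy cat sat"))  # -> [0, 3, 4, 1, 2]
```

Real systems learn subword vocabularies from large corpora rather than using a hand-written table, but the basic idea, text in, integer IDs out, is the same; it is those IDs, and the model parameters trained on them, that the debaters are arguing about.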
Thank you very much, David. All right. At this time, we will have a final closing argument from each of our debaters. You will get two minutes. Exactly afterward, we will have a vote and we will see who moved the most to their side. Are you ready, Adam.
Technology is scary. It's scary to me because I supposedly get a paycheck for trying to understand it and being able to talk about it to people like you, but it's also scary because it is so disruptive. It's not understood well. And one of the first things people come to understand is that Scarlett Johansson's voice got used without her permission, which doesn't sound fair at all, and arguably cost Scarlett a bunch of money.
Or in the case of somebody not as famous as Scarlett Johansson, it's going to have an effect on their livelihood. If you're a screen actor or if you're a voiceover artist. Generative AI may well have an effect on that, and using your voice without your permission, for example, is a violation of all kinds of law, and soon it will be a violation of actual federal copyright law because they're about to change it. My point is that what's fair under copyright law is fair for the specific reasons that copyright law exists.
You've heard all about promoting progress in science and the useful arts, and innovation and progress, and that's held up for a really long time for really good reasons. And we play with that at our peril. Copyright, to get to a point that David made, does not protect facts; it just protects expression. But learning, loosely analogized to human learning, from that expression,
to statistically be able to predict and to enable new creators to create new material, is what's critical and is the fundamental objective of copyright law. Copyright in that way is a fundamental pillar of our democracy, and so are fair use and its democratizing effect. Remember, not all AI developers are gigantic companies. Apple, when it invented the computer, started in a garage. So we play with and undermine fair use at our peril.
Instead, we should embrace it. And I urge you again, respectfully, to vote in the affirmative. Thank you. Let's have a round of applause for that closing argument. And, David, we'll have your closing argument. Thank you. All right. Yeah, so you've heard both sides of this, and we are pretty far apart, I think, in how we're interpreting all of this.
When we talk about prior fair use in these other cases, as I laid out in my slides, those are based on functional or interoperability reasons: search, indexing, commenting, criticizing, making a parody of something. It was not about taking expressive works for their expressiveness, exploiting the actual expressiveness of the works. That is very different.
So AI is just very different in this regard. And regardless of that, if the main point of copyright is to promote science and the useful arts, you have to have a strong case that it's actually doing that, which I would argue it does not; it just hasn't materialized. And we shouldn't grant blanket fair use based on some probable or possible future. Furthermore, and this is kind of my hobby horse because I've written some law review articles on it.
It's this concept of AI exceptionalism, and I've laid this out a few times here: we should not grant AI companies rights or privileges that we do not give to humans. And if the GenAI companies say they need this content in order to learn, they can pay for it, just like you and I do, because their use of it is not that different. We make better use of it when we get the same content, and therefore we would have a stronger argument for making everything fair use, and yet we don't.
If the companies can step up their game and require less material to do as well as humans, that's fine. Or they can license the material; they should. And then finally, because I've got 20 seconds here, I hope you guys find me on LinkedIn. I would love to continue this conversation if you have any more questions. I have a Substack where I talk about fair use a lot, and other legal things related to technology, so please find me on LinkedIn.
Thanks. Thank you very much. Nothing like a little shameless promotion to close us out. Awesome. Thank you all. So now we are going to take our vote. Remember, this is not a debate based on popularity; this is a debate based on persuasion. So it is based on how many folks you move to your side. All right.
So like you did before, see the QR code, go to the website, and log your vote. After a certain point, we will cut off the vote, tally it up, and sum up the difference.
Does anyone need any more time? OK, we'll give it 30 more seconds just to make sure.
All right. How are we doing? We're good. All right. One last refresh in five, four, three, two, one. OK, let me see.
All right. OK, so we started at 55 in your favor, and actually we went down on your side. David, I'm sorry, Adam. And then 123 went to 118.
So you lost a few votes as well, Adam, which is probably due to the fact that some people who voted before didn't vote again, or maybe some people left. So in sum, what it really means is: who lost fewer votes? I'd like to say congratulations, David. You were the lesser loser.
And therefore, you can be declared the winner of this debate. Can everyone please give both of our debaters a round of applause and thank them for their time? Thank you both for your presentations. Thank you both for your preparation. I hope you all learned a lot today in this debate; I certainly have. I hope it has challenged you and challenged your thinking, and hopefully we have moved the needle today in this AI conversation.
With that, I will hand it over to Melanie. Thank you. Thank you so much, Etzer, and thank you to our debaters as well. That's one of the tougher formats for folks to prepare for, so we really appreciate your participation and your good humor as well. So just to wrap things up for the meeting, thank you all for coming to network and learn and collaborate with us this week.
It was so wonderful to see the energy and excitement of the meeting this year. Once again, I want to recognize our sponsors and our exhibitors for their support. We hope that you made some connections with them as well this week. And thank you to our program chairs, Aaron Foley, Jessica Slater, Greg Fagin, and the entire annual meeting program committee.
The program was absolutely amazing this year. Thank you so much. Most sessions were recorded this week and will be available to you within the next week or so, as we get them posted in the on-demand library. We also have the annual meeting highlights virtual event this year on June 17, which is a recap of what you might have missed this week as an annual meeting attendee.
It is absolutely free for you. You'll get an email about it, but you do have to register so we can do the appropriate planning on our side. But it is free. Watch your inbox for that and we hope to see y'all again next year in Chula Vista, California. May 27 through the 29 for the 48th annual meeting. And we also offer educational opportunities year round through our webinars and our new directions seminar.
That app that you have on your phone, that is your guide for this meeting, will also tell you about all of our future events. And you can continue to collaborate with our community through the community feed and SSP Engage. So don't delete that app when you're done; it is a wonderful connection back to our community. And finally, please complete the meeting evaluation. Your feedback is incredibly important.
We always try to raise the bar every year with this meeting, and that is how we do it: we look at what you liked, what you didn't, and what was important for you. And for anyone that's sticking around tomorrow, there are a variety of activity suggestions in SSP Engage. For those of you that are heading home, safe travels, and thank you so much. We really appreciate your participation this week.