This episode is the “Copyright and open sharing of heritage collections for data: bounty or bane for creativity in the age of AI?” panel from the Generative AI & the Creativity Cycle Symposium hosted by Creative Commons at the Engelberg Center. It was recorded on September 13, 2023. This symposium is part of Creative Commons’ broader consultation with the cultural heritage, creative, and tech communities to support sharing knowledge and culture thoughtfully and in the public interest, in the age of generative AI. You can find the video recordings for all panels on Creative Commons’ YouTube channel, licensed openly via CC BY.
Marta Belcher (Filecoin Foundation) moderating a conversation with Aviya Skowron (EleutherAI), Dave Hansen (Authors Alliance), Rebekah Tweed (All Tech Is Human), and Eryk Salvaggio (Siegel Family Endowment)
Announcer 0:03
Welcome to Engelberg Center Live, a collection of audio from events held by the Engelberg Center on Innovation Law & Policy at NYU Law. This episode is the “Copyright and open sharing of heritage collections for data: bounty or bane for creativity in the age of AI?” panel from the Generative AI & the Creativity Cycle Symposium hosted by Creative Commons at the Engelberg Center. It was recorded on September 13, 2023. This symposium is part of Creative Commons’ broader consultation with the cultural heritage, creative, and tech communities to support sharing knowledge and culture thoughtfully and in the public interest in the age of generative AI. You can find the video recordings for all panels on Creative Commons’ YouTube channel, licensed openly via CC BY.
Marta Belcher 1:02
All right, hi. I'm so glad to be after lunch, when everyone's glucose levels are up. This is great, this is always a good slot. Everyone's in a good mood. And we'll try to keep the conversation really quick and really introduce a lot of ideas in a very succinct way. We have a fabulous panel; I'll introduce them as we go. But I'm really excited to be talking about AI and copyright, which is a huge, huge, huge issue right now. I'd like to sort of start by framing our conversation. I think when it comes to AI and copyright issues, we really have three distinct areas. So the first area is inputs: when you have inputs to models, what are the copyright implications there? The second is outputs: when the outputs look a lot like other works, what are the copyright implications there? And then the third is really copyrightability of AI-authored works. So I'm going to kind of frame the conversation that way. And I'm going to start with inputs, because I think that's where a lot of the action is right now. I'm gonna start by kicking it over to Dave Hansen, who's the executive director of the Authors Alliance. Dave, can you start by telling us a little bit about copyright and inputs and how we should think about that?
Dave Hansen 2:19
Yeah, happy to, and thanks, everybody, for coming back and being lively, even though it's after lunch. So, you know, the input question is kind of interesting to me in that, in a lot of circles and a lot of discussions, it's presented as if it's this totally new phenomenon that we're seeing, you know, organizations and projects taking large amounts of copyrighted works and doing these kinds of things that we see being done to them to then produce different AI models. But there's actually a pretty long history of this kind of work, going back really to the early 2000s. And at least in US copyright law, there are two things that really have contributed to that. One is a recognition within the law that it's very important to protect the sort of free space, the commons of ideas. And so regardless of all the other things that copyright may do, it will never protect facts or ideas. And so that's kind of principle number one that's been really important, just for the development of the internet and online communication. The second thing has been that technologies that work to basically extract, or develop tools to take, facts or ideas out of creative works are very, very valuable for a whole variety of different applications. And so the way that we get to that is through the doctrine of fair use, which I'm sure we're going to talk about a bit more today. But we've seen precedents over the years that have really very strongly valued this idea that copyright doesn't protect facts and ideas, and that it is okay to copy works for the purpose of pulling that kind of information out. And so you see that in applications all the way from plagiarism detection software: there was a pretty big lawsuit over 10 years ago now establishing a precedent saying that's legal. Google Books and HathiTrust, those two lawsuits are often raised in the generative AI discussion for training data. Those are really critical precedents. This goes much broader.
I mean, it's kind of the foundation of, like, web search and web scraping, this idea that we can take lots of data and do interesting things with it. And so my perspective on it is, you know, I think that those precedents are probably right, in that when we look at new technology and we judge it, it's most appropriate to judge it not by the technical process that we use to pull apart data and extract facts or ideas, but to look at what the outputs are and assess what those look like in relation to existing creative inputs, and whether there is a copyright issue there.
Marta Belcher 5:08
Awesome. Dave, this is, I think, such an important perspective, and one that I think is underrepresented a lot of times in these conversations, which is that we have these really important precedents around how data can be used, that are the things that enable things like Google search, you know, being able to actually index and use data for machines to learn. And then when we have AI, and the kind of response to AI, the gut reaction of a lot of folks is something along the lines of, well, hey, I don't want my data to be part of that input. And the issue there, as I hear you saying, Dave, is basically, well, yeah, but at the same time, we have these really important precedents that when there's data out there, machines can use it. So I do want to make sure that we represent both sides of this issue, and talk a little bit about how others are thinking about inputs as well, just so we can really have a sense of what this issue space looks like. So I think the next person I'd love to kick it over to is Eryk Salvaggio, who is an artist and a researcher who recently organized an exhibition of AI pieces at DEF CON. How do you think about this issue of inputs to AI models as an artist?
Eryk Salvaggio 6:42
As an artist, I've been able to use these models, I've been able to use datasets, back to the GAN days, right? People put these datasets out there. Most of the time, no one wants to look specifically for public domain images or anything like that, right? They're scraping things off the web. And this has been sort of open to everybody, because they've been saying, well, it's research, right? But one of the things that I think has really changed about this conversation is that we're no longer talking about research that is distinct from the original purposes of these datasets. And by datasets, I should clarify: if my mom puts up 40 oil paintings of the beach in Maine to a website, she has built a dataset, right? That's what we're talking about when we're talking about datasets. If we are then taking that dataset that my mom has built and creating a tool that is essentially going to say, now I'm going to create 1,000 pictures of the beach in your style, you're creating an output system that is in competition with the original intention of that data. So we've shifted from this thing where data was about the material that was being studied, it was about the material being researched. Very solid, very important work being done, right? You want to understand hate speech on Reddit or Twitter, you want that data, you want to be able to find things and create novel insights into it, right? Putting all that information into a dataset, and then using that dataset to generate a competitive product, seems like a very different use to me. And it seems like something that calls for a very different set of priorities in thinking about what we can do to protect people like my mom, who was building a dataset of her paintings. So I don't disagree that, like, yeah, this is all fine and good under copyright. But, and I don't like to be an AI hype guy, the law is based on precedent, right?
Lawyers are always telling me I can't do things because it's against the law. And that's okay, but what happens when you have an unprecedented technology? Is it an opportunity to rethink some of those rules and rethink our approaches? And one of the ways we can rethink that approach is by thinking about this material, about datasets. When we talk about all this data, like, what's in that data? What is that data, actually? It's my mom's paintings, right? It's poetry that people have written. It is expression. And so using that data to analyze and reproduce, that seems like it should shift the conversation a bit.
Marta Belcher 9:24
Thanks for that perspective, Eryk. I think it's an important one. I think you've articulated really well what we're hearing a lot of people say with regards to AI and inputs, so I think it's a really important perspective, and definitely the one that I think a lot of folks have almost reflexively, so it makes a lot of sense. Aviya, I would love to also kick it over to you to talk a little bit about this issue of inputs, and specifically around how you're thinking about the data that goes into these datasets as an ethics and policy researcher at an AI company. Nonprofit, nonprofit.
Aviya Skowron 10:03
Important distinction, I think, in this debate: what we do is primary research into things like interpretability and memorization. We do a lot of language models, so I'd better keep my remarks to just text, because I do think that there are different problems that arise with images. We convey a lot of information through language and text. Mom's paintings, that's a different category; mom's poetry, yeah. There are a lot of issues that I would love to bring to this debate from the technical side. This includes stuff like: we actually don't currently have a way to verify whether a model was trained on a given input without data documentation. So what I'm saying is that unless a dataset is documented, we actually don't know, and have no way to confirm, what went in. What this means, I think, is that documentation will become a critical piece of this conversation, one that is unfortunately getting shoved to the side. And the push in the United States currently is, unfortunately, in the opposite direction. That's just the way incentives are aligning for the largest players in the field: currently, it's to not document. Maybe they have internal documentation, but they certainly aren't making it public. Thinking about it more holistically, there's also the question of attribution. I'm thinking of it from, like, an artist's side, in the sense that attribution is important. I used to draw, which is kind of where this is coming from, not exactly from my research. Attribution is very important to people.
And this is another thing that we technologically cannot have. There is no way for me to take an output and then tie it back to what was put in, like, oh, this is what contributed to making this output this particular way. You know, there's no way to thoroughly make a list of citations. There is an implementation of something like that in Copilot, as far as I know, but what they're doing actually is just taking the output and searching for it in a training dataset, and saying that that's where it came from. Well, decent guess, but that's actually different from what we mean by traditional attribution, right?
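Editor's illustration: the search-based approach described in this answer can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Copilot's actual implementation, which the panel does not describe in detail. The function names, the word-level 5-gram window, and the toy corpus are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in `text` (lowercased, split on whitespace)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def find_overlapping_documents(
    output: str, corpus: Dict[str, str], n: int = 5
) -> List[Tuple[str, int]]:
    """Rank corpus documents by the number of n-grams they share with `output`.

    A hit only shows that the same text also appears in the corpus.
    It does not show that the model relied on that document.
    """
    output_grams = ngrams(output, n)
    matches = []
    for doc_id, doc_text in corpus.items():
        shared = len(output_grams & ngrams(doc_text, n))
        if shared > 0:
            matches.append((doc_id, shared))
    # Most-overlapping documents first.
    return sorted(matches, key=lambda item: item[1], reverse=True)


# Hypothetical toy corpus standing in for a documented training dataset.
corpus = {
    "poem_a": "the quick brown fox jumps over the lazy dog near the river",
    "essay_b": "copyright does not protect facts or ideas only expression",
}
hits = find_overlapping_documents("A quick brown fox jumps over the lazy dog", corpus)
```

A lookup like this can only run against a dataset that is documented and available, and a match is a guess about provenance rather than a causal trace, which is the gap between this kind of search and traditional attribution raised above.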
Marta Belcher 12:45
So this is a really important point that I think you're making here, or a very, very interesting point, which is the issue of where things actually come from, and whether you are tracking whether an input makes it into the output, and how you track that. And I think one thing you said that was particularly interesting there was this idea that, you know, if you're an AI company and you're building something, your incentive is to not track that and to not be transparent about that. And frankly, I don't think you can blame the companies, because as soon as you say that the inputs are, you know, from X, Y, and Z, suddenly you're going to have three lawsuits: one from X, one from Y, one from Z, right? And so, just given the current state of the law, it's a really interesting point there about how we create those incentives. And we'll definitely come back to that point as well. Rebekah, I'd love to hear from you. As the executive director of All Tech Is Human, how do you think about this issue of inputs into AI models?
Rebekah Tweed 13:48
Yeah, thanks, Marta. So I'm Rebekah Tweed. I'm the new executive director at All Tech Is Human. We are a nonprofit based right here in Manhattan, but we have a global lens and reach. We are committed to tackling thorny tech and society issues in order to co-create a tech future that is aligned with the public interest. And I think, as we've seen here today, generative AI and copyright is definitely thorny. All Tech Is Human is building and strengthening the responsible technology ecosystem. So, part of what we're doing: we understand that the future of technology is actually intertwined with the future of democracy, with the future of work, with the future of the human condition itself. And it is extremely important that we get this right. And it's crucial that we all have a voice in how these technologies are developed, because we all have a stake in this. All of these technologies do impact society broadly, but they don't impact all of us equally. And we believe that those who are the most at risk for potential negative impacts deserve a prominent seat at the table. So that is why part of All Tech Is Human's mission is career development efforts around diversifying that traditional tech pipeline, so that we can include more backgrounds, more disciplines, perspectives, and lived experiences. And with generative AI, we have this incredible opportunity, actually, to use our voice, because it is such a prominent and very public, out-there kind of technology. AI has been quietly impacting us for years. But because generative AI is this massive, popular phenomenon, we actually have this opportunity to bring the conversation to a lot more stakeholders. And not just the general public, but also policymakers, who are now feeling the pressure and understand that it is urgent, that the time is now, that we actually build some appropriate guardrails around these powerful new technologies that do impact everybody.
And I think one thing we should acknowledge is that technology does not fall fully formed from the heavens. We human beings are the ones who build these tools, and we should have a say in this. And I think generative AI has given us that opportunity. We now have these fairly straightforward avenues to having a say in what happens to our technological future. We have existing laws on the books that we are trying to determine how to interpret now that we have these artificial intelligence tools. So you have, for instance, the US Federal Trade Commission determining how AI is impacting consumer protection laws. And there's an opportunity for us to make our voices heard: the FTC is accepting comments. Also, the US Copyright Office is now accepting comments through mid-October. So we have an opportunity, as the responsible tech community, as the open source community, to make our voices heard. And the second way is through new guardrails, new legislation. The moment is now. There is so much momentum around this issue, finally, that it feels like if there are going to be laws enacted, now is the time. So you have the EU AI Act, which is in the final stages and should likely pass by the end of the year. And now, finally, in the United States, we have a few different options on the table. Senator Schumer has the SAFE Innovation Act, and, you know, even today he is meeting with various stakeholders, including a few civil society folks, in DC. And then we also have, as of last Friday, the US AI Act, a bipartisan bill that was introduced by Senators Blumenthal and Hawley, that has a lot of support from civil society and looks very comprehensive, and might actually be the one that gets across the finish line. So we have these opportunities. And finally, I want to say, we have an opportunity today. You know, we are shaping the norms around the usage of these tools.
It's not just slow laws, or even interpreting laws, which will still take years to determine. We get to have a say in the norms around these tools and how we use them. And I think this symposium is a part of shaping those norms. And I appreciate Creative Commons for giving us this space today to have this conversation.
Marta Belcher 18:06
That's great. And I'm so glad that you covered those topics, Rebekah, and the point you're making about norms is a really important one. You know, as we're talking about this issue of inputs, I think you've now heard from Aviya and Eryk really about the reaction that folks have. Again, it's almost visceral, right? The reaction to these works, you know, whether it's your mom's beach photos or whatever it is, being used in these models. There's this reaction of, ooh, that maybe doesn't feel right, or that we should know what's going into it, or, you know, people should be compensated for that. And I think a really important thing you heard from Dave as well is on the other side of that: under existing precedent that's very important, the company should actually be able to go ahead and do that, and machines should be able to learn from what's out there. That's why we have things like Google search. And what Rebekah is saying here is really interesting around norms. I think that what you're saying raises the question of what we need in order to address this tension, right? We're seeing this tension even just within the field: on the one side, wanting creators to be compensated, wanting transparency into what's going into these models, and on the other side, really important precedent around machines being able to actually use what's out there and learn from it in order to do useful things like search, index, you know, basically learn, right? And so, is copyright law, is law in general, the right way of addressing that, or is it norms, right? And so I think that's a really interesting double click on what you're saying here, Rebekah.
So Dave, I'm going to turn it over to you. Having heard from folks on what the initial feeling is here around inputs, what is at stake, from a legal perspective, if that reaction is what informs changes in law?
Dave Hansen 20:20
Yeah, thank you for the question. So I think it's worth taking a second to step back and just kind of understand copyright as contrasted with a variety of other legal regimes that I think are implicated here, and that may be actually a little bit more of an appropriate route for dealing with some of the harms or potential harms that could come from generative AI. But, you know, copyright is a pretty blunt instrument, and is extremely broad. So a lot of times when we talk about copyright, we're thinking about, like, painting or poetry or written published works. But copyright is this instantaneous thing that attaches to every creative work. I can almost guarantee you, each one of you has created some sort of copyrighted work while you've been sitting in this room since you got here this morning. And with that work, if you had the full weight of the law, and we didn't have access to, you know, defenses like fair use, for example, right, somebody quotes from you, and under existing law, that's potentially a lawsuit that's worth $150,000 per work infringed. And so that's, like, pretty extreme. And I think sometimes it's important to understand the full scope of what that means when we're saying things like, it should be copyright infringement if you're not attributed appropriately, or other things like that. You know, that takes things to a level that can have real chilling effects on research that's been using this kind of technology for a long time, and on interesting new applications where people are using it in their creative process. In the workshop yesterday, we heard a lot about different people using this in their work currently, and so it can have some real potential knock-on effects there.
But I think, beyond a lot of what we are seeing right now with generative AI, that sort of a change in the law, one that would eliminate or minimize the ability of organizations or people to essentially extract facts and information from creative works, and then use that to detect patterns, to identify themes, and to create these generative AI tools and other tools, would have a dramatic impact, I think, across a lot of other areas where it will take a while to see. You know, I work a lot with researchers who are doing text and data mining work. Really fascinating. They're able to take medical imaging, for example, which by the way we don't think of as a copyrighted work, but actually is in a lot of cases, and use it to detect diseases and to identify symptoms, and then come up with potential plans of action. And you see it in all sorts of research, from that to an interesting project out at Stanford where they were taking police body cam footage and then trying to understand, like, how did the police actually interact with the people who they're talking with, and doing that sort of sociological study. Those are the kinds of things where I think, you know, if fair use doesn't apply here, it's hard to see how it wouldn't also impact those kinds of applications. I mean, I think those research uses are incredibly important to protect.
Marta Belcher 23:41
So this is a great point, Dave, which is, you know, the idea of these copyright law precedents as being very important in general. And while at the same time acknowledging the response that folks have to AI, it's important to keep in mind that maybe the right way to address it isn't necessarily an overhaul of copyright law, that maybe what we need to do is continue to make sure that we protect fair use and related doctrines. So I guess, on that topic, I'd like to open it up to the rest of the panel: what are the right instruments for addressing that? Is it norms? For example, with web scraping, there is an ability that is not in the law, but really is just a norm, for websites to say, you know, actually, don't scan this website, in robots.txt, right? So is it norms? Are norms the right tool for addressing these concerns? Is it technology? Aviya, you were talking a little bit about tech. Or do other panelists think it is changes in law, and if so, is it copyright law, is it a different law? What is it? So I want to open it up: how do we address those concerns while at the same time preserving these precedents? Well, and if you also, because we also had, you know, some really interesting tech insights here.
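Editor's illustration: the robots.txt norm mentioned in this question can be sketched with Python's standard-library `urllib.robotparser`. This is a minimal sketch under stated assumptions: the crawler name `ExampleAICrawler`, the `is_allowed` helper, and the example.com URLs are hypothetical, and the check only tells a well-behaved crawler what the site has asked for. Nothing in the file itself enforces compliance, which is why it works as a norm rather than a law.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is asked to stay out entirely,
# and every other crawler is asked to avoid only /private/.
ROBOTS_TXT = """\
User-agent: ExampleAICrawler
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt content permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ai_crawler_ok = is_allowed(ROBOTS_TXT, "ExampleAICrawler", "https://example.com/gallery/beach.jpg")
search_bot_ok = is_allowed(ROBOTS_TXT, "SearchBot", "https://example.com/gallery/beach.jpg")
```

In this sketch the same page is open to a general crawler and closed to the named one, which is the kind of per-use signal the panel goes on to discuss.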
Eryk Salvaggio 25:19
I mean, I would just say quickly, I think one of the opportunities this gives us, one of the things we could be doing with this conversation, not this particular conversation but generative AI, is thinking about what data rights mean to us, and whether or not we actually think that putting something online means that we lose control over it. Fundamentally, that doesn't seem right. It doesn't seem like that is at all going to encourage the type of free sharing that Creative Commons was created to advocate for, if my fear is that if I share this drawing, it's going to be used against me by a model, by a company, for profit. And so there needs to be some kind of mechanism. And I think part of those mechanisms comes from: who are we sharing these images with, right? Usually it's social media companies, it's platforms that are taking that, and they're saying, oh, we have the right to sell this. And I think that is one lever where we can think: do they? Should they? Is that something that we can resist somehow, or have greater agency over as individual creators? So this is a question of data rights, and what organizations, online archives, social media websites can do with the information we share. If we really want people to share things that are thoughtful, if we really want to look at someone's illustration that they've actually put effort and time into, which could be AI-generated, by the way, although that's a totally different conversation, do we want to say they have to give up their control over how that's used because they've shared it on Twitter? I don't think we should. I don't think that supports a healthy ecosystem. I don't think that is encouraging people to share. And if we actually want to encourage that, then we need to rethink exactly how we are going to do that, and what these relationships with these companies are. In the US, I think data rights is a really undervalued conversation. We don't have a lot of it. And I'd love to see this, maybe, you know, maybe it starts a flourishing of data rights positions and policies and debates. That would be incredible.
Marta Belcher 27:40
Eryk, that's a great point. And I think one of the things you said that was particularly interesting is thinking about the commons, right, and how we protect the commons. And, you know, Creative Commons created CC licenses specifically to enable creators to clearly communicate to others how others can use their works, which really enables people to share things more, right, not in order to take things out of the commons. So when it comes to talking about how our works can be used by AI, how do we both clearly communicate, and enable creators to clearly communicate, in order to share more rather than taking things out of the commons? How do we protect the commons and protect these important fair use rights while we address these concerns? So we'd also love to hear from Aviya and Rebekah: is it norms? Is it tech? Like, what are your thoughts and ideas on how we address those concerns while protecting important precedents and protecting the commons?
Aviya Skowron 28:51
I think an important part of this is also understanding the use cases. People aren't just against the data being used. There are all these people who have different concerns: they don't want it to be misused, or they would like to be compensated, or they would like to at least be attributed, you know, they would like to be acknowledged as contributors to a given system. So I bring this up to sort of point out that this is a complicated debate. And I do agree with Dave. I don't think that copyright is the one solution to all the societal problems that we're talking about. Speaking of which, another approach that we actually support is that taken by the Writers Guild of America, who identified that their problem is a labor one, right? It's a labor issue in the sense that people are concerned about being able to make a living. And from that perspective, we think that the WGA approach actually makes a lot of sense and correctly identifies the leverage and where the pressure is being applied. You know, it almost doesn't matter what the tech is doing. If an executive decides that it's now doing your job, it's immaterial what the tech is doing in that situation.
Rebekah Tweed 30:24
Yeah, yeah, no, I completely agree with that. I think one thing we have to acknowledge is that we're not talking just about copyright. We're not talking just about one person's one piece of art, we aren't even talking about one person's collected body of 40 paintings. You know, it is linked to this issue of creative professionals in general, the future of their ability to make a living at this, if everyone's copyrighted works are collectively scraped and utilized by for-profit companies to then do your job for you. You know, because those are intertwined as issues, copyright is not explicitly the only space where we need to be talking about things. I think this is an opportunity. You know, novel problems call for novel solutions. I think this is an opportunity, to Eryk's point, to be rethinking how we want our data to be used, and how we want these technologies to be used more broadly. And I do want to say, I think the Creative Commons exploration around preference signaling is a great kind of adaptation, to be able to have the opportunity as creators to say if, and under what circumstances, I would like my own data and creations to be utilized. So I think it needs to be a multi-pronged approach going forward.
Marta Belcher 31:55
Do you have anything to add here?
Dave Hansen 31:58
Yeah. You know, I think in terms of particular regulation outside of copyright, where I see a lot of the harms is on the output side, particularly where generative AI is used for fraud or disinformation. I think there's probably some room there for regulation, probably also around exploitation of people's personalities or personas, rights of publicity. But really what I wanted to say, actually, was going back a little bit to the discussion about the commons. I think it's important to appreciate that, taking generative AI out of the discussion, we live in a society where we have all benefited from uncompensated and unconsented-to exploitation of other people's works. That's how we learned, right? Like, every time we read a book, we don't ask, am I allowed to process and absorb this information and use it in my own life, right? And the fact that we're using technology to help facilitate that kind of process, it does change the conversation. I don't think that it's exactly the same. But I do think it's important to recognize that this is the world that has produced an incredible amount of innovation, because we do have this free exchange of ideas. And I think it's extremely important to protect that.
Marta Belcher 33:17
Great. And so, you know, Dave, I think you have given us a good segue into outputs. I mentioned at the beginning that when you think about copyright and AI, you really can divide it into three buckets, which is inputs, which we've been discussing so far, and then outputs, so what is the model outputting, and then authorship and copyrightability. So I think, fairly, we have spent most of the time on inputs, because that is in fact where all the action is at the moment, and where there are a lot of really interesting questions. But in our remaining 10 minutes, I do want to at least touch on these other two distinct areas of copyright issues involving AI. And I think on outputs, you know, Dave, you started us thinking about outputs, and what happens when the outputs are too similar to the inputs, and also what happens when the outputs are, as you were just saying, used for evil instead of good. So maybe if you want to just start by giving us a little bit of your thoughts on outputs, and then I'll kick it over to others to talk about outputs as well. And then we'll try to at least hint at the topic of copyrightability.
Dave Hansen 34:33
Sure. So, copyright 101: if you photocopy somebody's stuff and your output is almost identical, that's copyright infringement unless you have some sort of extra permission or defense like fair use. And the standard that the courts have used for that is substantial similarity. So it doesn't have to be a perfect photocopy. There can be all sorts of other elements that you've taken that feed into that output that can indicate that you've infringed the owner's rights. And that standard actually seems to me like a pretty good place to start for assessing whether there's copyright infringement in generative AI outputs. And you can see this looking at certain models, right? There's a law professor at Emory, Matt Sag, who writes a lot about this and the problem of, essentially, memorization, and protecting against producing clearly infringing outputs. He calls it the Snoopy problem, right? If you ask for a drawing of Snoopy, you're gonna get Snoopy. Like, that's a copyright problem. And there are ways to protect against that, I think, at the system level. But just looking at the individual output and whether it's infringing, that seems to me like probably the right place to start from a copyright perspective.
Marta Belcher 35:51
And for others on the panel, I'd love to hear your views on this. Aviya, especially you, because I think, you know, the tech that you've been talking about is also really interesting here. So maybe starting with you on your thoughts on outputs, and then over to others on the panel.
Aviya Skowron 36:05
The good news is that memorization is an active area of study. It's what we study, actually. And, again, this is still, like, emerging literature. But there are certain things that you can do in order to minimize the risk of memorization. And the highest percentage anyone has ever found in a study is like under 5%, and in a real-world setting, it's like way less than one-tenth of 1%. So fortunately, it's not that often. I do agree, and this is what Matthew Sag argues too, that when you have a customer-facing product, there are additional guardrails that can be built so that you translate, like, "Mickey Mouse" to some other prompt that doesn't immediately invoke the image of Mickey Mouse, for example. So yeah, we're working on it. The problem is that, you know, everyone's working on it as we're trying to find a solution to these issues. So it's like we're all on this rapid journey.
Marta Belcher 37:10
Yeah. And any other thoughts from other panelists on outputs before we talk about copyrightability?
Eryk Salvaggio 37:18
My thoughts apply to copyrightability. So I'll wait.
Marta Belcher 37:20
Okay, great. So I also want to briefly cover this third topic within AI and copyright, which is copyrightability. When you have an AI-generated work, can you go and register that copyright? If so, whose copyright is it? Is it the person who wrote the code? Is it the machine? Is it the person who put in the prompt? Really interesting, and there's a lot of action there. Dave, do you want to give us just a quick, sort of like, where are we on that? And like, what are the thoughts on that? And then I'll turn it over to Eryk and Rebekah and Aviya for their thoughts on copyrightability.
Dave Hansen 38:00
Sure. Um, so the current state of the law is that to obtain a copyright, you have to have a human author. You have to have human authorship. And we have one case on this so far. The Copyright Office has established that standard for registering a copyright, and there was a suit that went up in the DC District Court, Thaler v. Perlmutter. And the court affirmed that idea that, like, if you are going in and you have an AI-generated work, and there is really no human creativity added to the output, that's not going to be protectable. The Office has also released some guidance that isn't necessarily binding on anybody except for the process of registration, to kind of suss out, like, well, what happens when you have output that's partly AI-generated, but also has human input added to it? And how do you have to handle that? The Office's approach has been to basically say you have to disclose the parts that are AI-generated, and those don't receive protection. But that's pretty challenging for a lot of people where human creation and AI creation are very melded together. But that's sort of the state of the law right now.
Marta Belcher 39:06
Eryk, what are your thoughts on this?
Eryk Salvaggio 39:09
To me, this hits on one of the things that is a real paradox with AI, which is that we tend to give it too much credit. AI is not like a guy in a box, right? And this guy in a box isn't the one making the work. AI is a system. AI is a set of entanglements, I like to say, between the people making the training data, the people building the models, the people building the content moderation systems that block certain outputs on those models, and it is also the people who are taking those images and recirculating them, right? Every single piece of that is driven by a human being. This is a system that is built by humans, trained on human training data. It is steered by humans who are typing prompts into the window, who are then taking these images and recirculating them according to their own contexts and demands and desires. And I think to say "the AI made the picture" cuts out a crucial systemic understanding of what AI is, and all the things that come along with that. It also is, like, dangerously close to saying that the AI is creative. And you can get into a lot of conversations about what it means to be creative. That's fine. We can call it creative by some definitions and not by others. But in copyright law, it seems unhelpful to define it as the creative agent in this process. I don't know, I am not a lawyer. I'm sure the lawyers are laughing at a lot of my ideas, and that's fine. But it seems to me like one of the things that we really should be thinking about is sort of systemic authorship. Because this is what we have: we have a systemic authorship situation. And it might be more comparable to looking at something like the film industry, where you have the best boy and the gaffer and the writer and the director, right? This is a similar set of relationships that might apply to protecting the outputs of these works.
But saying "the AI did this, and the AI can't give us permission to copyright it, so therefore it can't be copyrighted," that's just not helpful. Whether or not it's right or wrong, it's not useful. And so I think, again, it's an opportunity to think about systemic creativity, as opposed to this focus on individuals and individual artists that we have, and individual technologies that we have. Why not use this opportunity to say, actually, we are, like, connected in the creative fabric? We always keep hearing this thing, "Everything is a Remix," right? That doesn't mean throw out authorship, doesn't mean throw out citation, doesn't mean ignore the labor. It means let's look at that collaborative process that went into these outputs and think about how to treat that as a collective authorship problem.
Marta Belcher 42:02
Eryk, you make really great points. And you know, I'm of course reminded of the fact that when the camera came out, there was a case that went all the way up to the Supreme Court about whether the person who took the photo has the ability to copyright their work, given that it went through this machine. So I think the moderator's most important job is to end on time. But you've heard, through this meandering through copyright and AI, about inputs, outputs, and authorship. On inputs, you've heard about this sort of reaction of really wanting, you know, potential compensation for creators, and also transparency about what is being used and how. But at the same time, you've heard about the importance of protecting precedent that really enables our commons and free sharing and speech and fair use. And then you've heard about the questions around what are the right tools for addressing those concerns. And you know, is it copyright? Is it other laws that aren't copyright? Is it norms? All important questions, and this is really just the beginning of the conversation, not the end. So please join me in thanking the panelists.
Announcer 43:18
The Engelberg Center Live podcast is a production of the Engelberg Center on Innovation Law & Policy at NYU Law and is released under a Creative Commons Attribution 4.0 International license. Our theme music is by Jessica Batke and is licensed under a Creative Commons Attribution 4.0 International license.