Note: This column is also available free to the public. If you like my writings on genre issues, consider backing my Patreon.
Earlier this year I wrote a special report on what artificial intelligence (AI) generated art and writing might mean for artists and authors. The report examined the current state of AI language and art models and gamed out possible futures for how all this might affect people. I think the report is still relevant and I stand by what I wrote.
One of the people I interviewed for the report was Amit Gupta, a science fiction writer and co-founder of the AI-writing tool Sudowrite. In recent days that startup has landed in a major controversy over the launch of its new AI tool for writing long-form stories.
On May 17, Gupta's fellow Sudowrite co-founder James Yu announced the launch of Story Engine. As Yu wrote on Twitter, "Our awesome team worked with hundreds of novelists for months to build the ideal interface for writers and machines to collaborate on a narrative. … Story Engine is the first serious tool for long-form AI writing."
In the video attached to the announcement, Yu said Story Engine would allow authors to "write an entire novel in just a few days."
As Yu noted, previous AI writing models had difficulty staying on track with a story "for more than a few paragraphs," meaning most "AI products were optimized for short generations." Story Engine evidently fixes this by allowing people to provide the system with character information, overall plot points, themes, and more data tailored for longer-form stories. This helps the AI remain focused on generating a story that would, in Yu's words, stay true to the "author's vision."
And the reaction from the writing community to this development? Perhaps this headline in Gizmodo best sums that up: "Sudowrite Launches Novel-Writing AI Software for Pseudo-Writers."
To say that Yu's announcement was ratioed hard is an understatement, with his tweet receiving over 8.2 million views by May 21 but only 182 regular retweets and 963 likes. Instead, there were over 4,500 quote retweets where people generally ripped into the announcement.
For example, novelist Molly McGhee responded to the announcement by saying "The ongoing scam of A.I. erases that it is an algorithmic tool built off of someone else's labor without consent or compensation." Manny Edeko wrote, "I thought the whole point of creative writing was the creative part." M. L. Krishnan said, "If you're relying on AI to write your stories in any form, shape, or fashion, you are not a writer. You are not." And A.R. Moxon snarked, "Finally, a writing tool for people who hate to write, or hate writers, or both."
A few writers defended Sudowrite, most notably S.B. Divya who said, "Sudowrite is founded by writers for writers. It's not here to take your job - it's here to help unlock and enhance your creative abilities. Their new feature, Story Engine, is pretty fun to play with!" A number of people including Zinnia Z pushed back on Divya about this, with Divya later adding "I didn't use any AI tools for MERU (or any other fiction I've written). I prefer to use my own voice while writing, but there are others with different fiction writing goals who might find it useful."
Protests Over AIs Built Off Our Work
One reason for the pushback against Sudowrite's new tool is the growing understanding that the large language models powering ChatGPT and these new writing systems have been trained not only on content from across the internet but also on copyrighted works by authors.
A paper released this month described how researchers discovered GPT-4 had "memorized titles such as the Harry Potter children's books, Orwell's Nineteen Eighty-Four, The Lord of the Rings trilogy, the Hunger Games books, Hitchhiker's Guide to the Galaxy, Fahrenheit 451, A Game of Thrones, and Dune, among others." (Note: the paper was originally reported on by New Scientist. Their article is paywalled, but this article in The Register gives a deep dive into the findings.)
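For context, the researchers reportedly probed memorization with a "name cloze" style test: mask a character's name in a passage from a book and ask the model to fill it in. An exact guess on a rare, distinctive name is hard to explain unless the model saw that text during training. Here's a minimal sketch of the idea in Python; the helper names are my own invention, not from the paper, and a real probe would send the prompt to an actual LLM API rather than inspecting a canned answer:

```python
def name_cloze_prompt(passage: str, name: str) -> str:
    """Build a cloze prompt by masking every occurrence of a character name."""
    masked = passage.replace(name, "[MASK]")
    return (
        "Fill in [MASK] with the single proper name it replaces.\n"
        f"Passage: {masked}\nName:"
    )

def looks_memorized(model_answer: str, name: str) -> bool:
    """An exact match on a rare name is taken as evidence of memorization."""
    return model_answer.strip() == name

# Example: mask the name, then check a (hypothetical) model response.
prompt = name_cloze_prompt("It was Katniss who volunteered first.", "Katniss")
```

Across many such passages per book, a high exact-match rate suggests the book was in the training data; the method can't prove it, but it's one of the few probes available when a model's training set is secret.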
This research doesn't mean ChatGPT or the models built upon ChatGPT contain the full text of copyrighted books. But training AIs on these works still raises serious ethical and legal concerns. And as authors and other creatives learn more about how large language models have been built off their own work, their outcry will likely grow even stronger. Especially since these systems appear to threaten many people’s future in the creative professions they love.
You saw this in the reaction to Sudowrite's announcement. Back in January Sudowrite launched a product called Shrink Ray, which allowed authors to upload their manuscripts and "get loglines, blurbs, synopsis, and full outlines automatically." This led people on Twitter to wonder whether the uploaded manuscripts had been used as "data collection" for the new Story Engine.
And it didn't help that two days before Sudowrite announced Story Engine, Wired published an exposé claiming Sudowrite's system understood a highly specific, original sexual term known only to the online fanfiction community of Omegaverse writers. As the Wired article stated, "Because the Omegaverse has such specific terms and phrases associated with it, ones that are found within fan fiction and nowhere else, it's an ideal way to test how generative AI systems are scraping the web."
This echoed accusations from last year that Sudowrite had possibly scraped fanfiction site Archive of Our Own (AO3).
Amit Gupta has previously denied that Sudowrite scraped anything from AO3, and he repeated that denial when I interviewed him for my report in February. And James Yu recently stated on Twitter that while Sudowrite uses OpenAI and Anthropic (Claude) base models, the company has "done no scraping or fine-tuning ourselves without consent."
In a new interview I conducted with Gupta yesterday, he repeated that Sudowrite hasn't scraped or used anyone's materials without permission. "No, we didn't use any works we don't have rights to," he told me. "We didn't use any works period. We didn't use those (uploaded) manuscripts for anything."
But the problem is that by building Sudowrite on top of OpenAI's ChatGPT and other systems, the startup still benefits from OpenAI's possible use of bestselling novels and other copyrighted works to train its large language models. As the Wired article explained, Sudowrite knows specific sexual terms and tropes from Omegaverse because it was built in part using ChatGPT. And OpenAI has notoriously refused to say which works it scraped and trained its models on.
I asked Gupta about this: even if Sudowrite wasn't directly trained on works it had no right to use, wasn't it still benefiting from ChatGPT doing exactly that? Gupta admitted there were ethical concerns around this and that this was an area where the law hadn't kept up with the technology.
He then added, "I think there do need to be a lot of conversations and a lot of thought into creating equitable solutions around all this. I think AI has the potential to help a lot of people and to advance so many industries. … My fear is throwing out the baby with the bathwater. We have to come up [with] regulation, we have to come up with ethical guidelines for how these models should be trained in the future and we have to have a way to enforce that. But as a society we're in this messy middle ground or middle period where we haven't quite figured out how to do this. So we're all figuring this out as we go."
After my question about ChatGPT, Gupta shared that they'd been advised not to talk to journalists right now but he'd made an exception because of our previous discussions on this topic. "Anything I say in the hands of someone else, potentially, is going to be used against us," he added.
The interview ended at that point. The complete interview is below this report.
In some ways Sudowrite is trapped. By using ChatGPT and other large language models as the basis for its system, Sudowrite will be blamed for everything those other companies do, without any way to correct the problem. And because the organizations creating large language models, like OpenAI, are so secretive, writers must frequently assume the worst about whether their works are being used to train these systems.
For example, in a Twitter thread that includes screenshots of emails, David Lee Zweifler said that Amit Gupta used to be a contributor to the online writing workshop Critters. In one of the screenshots shared by Zweifler, Gupta told him "I'm working on an AI-powered tool (Sudowrite.com) and one of the unreleased features is a story critique. It doesn't work well, but for my amusement, and perhaps yours, I fed your story in to see what it would say."
Zweifler said he wasn't amused and never gave Gupta consent for his "work to be 'fed' to Sudowrite." Gupta later responded to Zweifler with his own screenshots of messages between the two.
In my interview with Gupta, he said none of what Zweifler wrote was used by Sudowrite for anything.
Should We Call It Plagiarism Software?
One thing that has changed in recent months regarding large language models is, ironically, the language used to describe them. As I said in my previous report, while these programs are called artificial intelligence, there's actually no intelligence behind them. Instead, what we have is machine learning and algorithms created from data samples.
Despite this, these creativity programs are called artificial intelligence, which as branding goes is brilliant because it makes what the systems are doing sound inevitable. But branding is still merely branding and there's no true intelligence behind what these systems are doing to our world. Only humans.
As Malka Older said, "I have no problem with the idea of collaborating with a machine intelligence, but that's not what this is. It's a regurgitation machine invisibly supported by underpaid workers to let the overclass feel pleased with itself and grift off that feeling."
The Writers Guild of America has taken up this fight by insisting that Hollywood writers not be replaced with or forced to use AI systems. The WGA has also made this a central part of their current strike.
As a recent WGA statement said, "It is important to note that AI software does not create anything. It generates a regurgitation of what it's fed. If it's been fed both copyright-protected and public domain content, it cannot distinguish between the two. Its output is not eligible for copyright protection, nor can an AI software program sign a certificate of authorship. To the contrary, plagiarism is a feature of the AI process."
Some of the WGA's protest signs and images are even more blunt, quoting photographer B.A. Van Sise who said it's not artificial intelligence, it's "plagiarism software."
Plagiarism software. If OpenAI had used that term to describe ChatGPT instead of artificial intelligence, the company would never have gotten off the ground. If that term went viral, these machine learning systems would have a much harder time becoming established.
As I said in my earlier report, I believe many writers and artists in the years to come will use machine learning as a tool. But I also believe the need for original stories and art created by humans for humans will continue to survive.
But machine learning will also likely upend the lives and livelihoods of many people across the world. And that's not something to be ignored or downplayed or trivialized.
OpenAI's CEO Sam Altman recently called for greater federal regulation of AI along with allowing people to opt out of ChatGPT using their data (something that users of ChatGPT in Europe can already do).
But in a just world, people would be able to "opt in" to letting these systems benefit from their work and lives, not be forced to opt out. And note that what Altman is proposing would only allow people to opt out of AIs knowing their personal and financial data. His proposal wouldn't stop OpenAI's systems from training on copyrighted works created by the very artists and writers who stand to be hurt the most.
But of course the entire world opting out isn't truly what Altman, OpenAI and others pushing these systems want. As James Yu said in that Wired article, if people were able to opt out at scale, the machine learning models wouldn't work as well as they do.
There's a lack of trust in what OpenAI and other companies are doing because people have seen repeated cases in recent decades of the technology industry destroying and harming countless lives. With the current WGA strike and the pushback against Sudowrite's new Story Engine, it's possible we're witnessing a sea change in people's willingness to tolerate this.
If I'd had a chance to talk more with Amit Gupta, I'd have asked how he and Sudowrite could better work with and involve the writing community in the development and rollout of their Story Engine system. I'd also have asked what they'd learned from the negative reaction to their rollout and whether they understood the anger from other writers.
Finally, I'd have asked why the companies like OpenAI behind these groundbreaking "artificial intelligences" are so secretive over technologies that have the potential to dramatically change our world and hurt people’s lives.
I didn't get a chance to ask those questions.
# # #
Interview with Amit Gupta, Co-Founder of Sudowrite
Note: This interview has been edited slightly for length and clarity.
Jason Sanford: Sudowrite asked novelists to submit full manuscripts in exchange for outlines and such. Did you use those manuscripts to train your AI? Have you used any works you don't have rights for to train Sudowrite's AI?
Amit Gupta: No, we didn't use any works we don't have rights to. We didn't use any works period. We didn't use those (uploaded) manuscripts for anything.
Sanford: I know you've seen the comment by David Zweifler about stories from Critters.org. Were those used to train Sudowrite?
Gupta: None of that was used for anything and I wrote him a critique myself. I don't know if you saw the emails I posted, but basically I had given him a critique. And we were at the time building this tool that would allow Sudowrite to give people feedback on their short stories, so I had used that tool to generate some feedback for him.
Using that tool to generate feedback doesn't change the tool or improve the tool or feed anything into the tool. I think people have this conception that the AI is just constantly getting better, and every time you use it it's learning from what you do when you're using it. That's not true.
The analogy is if you use a spellchecker to check spelling on a piece of grammar or whatever else, it doesn't actually make the spellcheck more intelligent. It's just running its program. The same is true of these large language models. Now, you can improve spellcheckers and you can improve large language models, but the process is very different. It's a distinct process of training. We don't have the capability of training GPT-3 or GPT-4, that's OpenAI's model which we use through their API. We haven't done any training, we haven't done any modifications to the models, we haven't taken any of David's works or anyone else's works and used it to improve these models.
Sanford: Evidence has emerged that ChatGPT trained on copyrighted works by bestselling authors, possibly even my own writings and the writings of authors I know. Even if Sudowrite hasn't trained on works you don't have the rights to, isn't your system benefiting from ChatGPT doing that? That cuts to the heart of what a lot of writers are concerned about.
Gupta: I'll preface by saying I haven't read the (New Scientist) article. I don't know what method they use to determine what material is for training, my knowledge on this isn't complete because I'm not OpenAI. They're the only ones who can really know if it has been trained (on that).
But that said, this is an issue we're all grappling with as people in society and as writers and as creators of these tools. I think the laws haven't kept up with the technology. I believe in the United States, the form of training OpenAI has performed is legal currently, I think it's legal for them to scrape the internet and to train these models. Now I'm not saying it's ethical and I'm not saying I would do it, and I haven't done it. But I think that's part of the reason there's so many people today in a position of power in the tech industry who are speaking up about slowing down the pace of AI development and are speaking to the government about increasing regulation of this space.
And I think there do need to be a lot of conversations and a lot of thought into creating equitable solutions around all this. I think AI has the potential to help a lot of people and to advance so many industries. These same models that are used by writers are the same models being used to do all sorts of other things. You see the reports on how these models perform better medical diagnoses or whatever else, all these seemingly magical feats. They're not different models, they're the same model that was trained on the internet. My fear is throwing out the baby with the bathwater. We have to come up [with] regulation, we have to come up with ethical guidelines for how these models should be trained in the future and we have to have a way to enforce that. But as a society we're in this messy middle ground or middle period where we haven't quite figured out how to do this. So we're all figuring this out as we go.