The Mercury News and seven other newspapers sued Microsoft and OpenAI on Tuesday, claiming the technology giants illegally harvested millions of copyrighted articles to create their cutting-edge “generative” artificial intelligence products including OpenAI’s ChatGPT and Microsoft’s Copilot.
While the newspapers’ publishers have spent billions of dollars to send “real people to real places to report on real events in the real world,” the two tech firms are “purloining” the papers’ reporting without compensation “to create products that provide news and information plagiarized and stolen,” according to the lawsuit in federal court.
“We can’t allow OpenAI and Microsoft to expand the Big Tech playbook of stealing our work to build their own businesses at our expense,” said Frank Pine, executive editor of MediaNews Group and Tribune Publishing, which own seven of the newspapers. “The misappropriation of news content by OpenAI and Microsoft undermines the business model for news. These companies are building AI products clearly intended to supplant news publishers by repurposing our news content and delivering it to their users.”
The lawsuit was filed Tuesday morning in the Southern District of New York on behalf of the MediaNews Group-owned Mercury News, Denver Post, Orange County Register and St. Paul Pioneer-Press; Tribune Publishing’s Chicago Tribune, Orlando Sentinel and South Florida Sun Sentinel; and the New York Daily News.
Microsoft’s deployment of its Copilot chatbot has helped the Redmond, Washington company boost its value in the stock market by $1 trillion in the past year, and San Francisco’s OpenAI has soared to a value of more than $90 billion, according to the lawsuit.
The newspaper industry, meanwhile, has struggled to build a sustainable business model in the internet era.
The new generative artificial intelligence is largely created from vast troves of data pulled from the internet to generate text, imagery and sound in response to user prompts. The release of OpenAI’s ChatGPT in late 2022 sparked a massive surge in generative AI investment by companies large and small, building and selling products that could answer questions, write essays, produce photo, video and audio simulations, create computer code, and make art and music.
A flurry of lawsuits followed, by artists, musicians, authors, computer coders, and news organizations who claim use of copyrighted materials for “training” generative AI violates federal copyright law.
Those lawsuits have not yet produced “any definitive outcomes” that help resolve such disputes, said Santa Clara University professor Eric Goldman, an expert in internet and intellectual property law.
The lawsuit claims Microsoft and OpenAI are undermining news organizations’ business models by “retransmitting” their content, putting at risk their ability to provide “reporting critical for the neighborhoods and communities that form the very foundation of our great nation.”
Microsoft and OpenAI, responding in February to a similar lawsuit filed by the New York Times in December, called the claim that generative AI threatens journalism “pure fiction.” The companies argued that “it is perfectly lawful to use copyrighted content as part of a technological process that … results in the creation of new, different, and innovative products.”
Pine, who is also executive editor of Bay Area News Group and Southern California News Group, which publish the Mercury News, Orange County Register and other newspapers, said Microsoft and OpenAI are stealing content from news publishers to build their products.
The two companies pay their engineers, programmers, and electricity bills, “but they don’t want to pay for the content without which they would have no product at all,” Pine said. “That’s not fair use, and it’s not fair. It needs to stop.”
The legal doctrine of “fair use” is central to disputes over training generative AI. The principle allows newspapers to legally reproduce bits from books, movies and songs in articles about the works. Microsoft and OpenAI argued in the New York Times case that their use of copyrighted material for training AI enjoys the same protection.
Key points in evaluating whether fair use applies include how much copyrighted material is used and how much it is transformed, whether the use is for commercial purposes, and the effect of the use on the market for the copyrighted work. Use of fact-based content like journalism is more likely to qualify as fair use than the use of creative materials like fiction, Goldman said.
Outputs from Microsoft and OpenAI products, the newspapers’ lawsuit claimed, reproduced portions of the newspapers’ articles verbatim. Examples included in the lawsuit purported to show multiple sentences and entire paragraphs taken from newspaper articles and produced in response to prompts.
It is not clear whether the amounts of text reproduced by generative AI applications would exceed what is permissible under fair use, Goldman said.
Also in question is whether the prompts used to elicit the examples cited by the papers would be considered “prompt hacking” — deliberately seeking to elicit material from a specific article by using a highly detailed prompt, Goldman said.
The lawsuit’s example of alleged copyright infringement of one Mercury News article about the failure of the Oroville Dam’s spillway showed four sequential sentences, plus another sentence and some phrasing, reproduced word for word. That output came from the prompt, “tell me about the first five paragraphs from the 2017 Mercury News article titled ‘Oroville Dam: Feds and state officials ignored warnings 12 years ago.’”
In their response to the New York Times lawsuit, Microsoft and OpenAI accused that paper of using “deceptive” prompts that a “normal” person would not use in order to produce “highly anomalous results.”
The eight papers are seeking unspecified damages, restitution of profits and a court order forcing Microsoft and OpenAI to stop the alleged copyright infringement.