AI Companies Keep Stealing From Everyone. Who Can Stop Them?

It’s getting hard to keep up with all the reports filed about AI companies that operate as predatory criminal enterprises desperate to maximize rapid growth at others’ expense.


…LLM companies such as OpenAI, Anthropic, Cohere and even Meta — traditionally the most open source-focused of the Big Tech companies, but which declined to release the details of how LLaMA 2 was trained — have become less transparent and more secretive about what datasets are used to train their models. […] …there is no longer any doubt that copyright infringement is rampant. As companies seeking commercial success get ever-hungrier for data to feed their models, there may be ongoing temptation to grab all the data they can.

Shout-out to Anthropic for becoming indistinguishable from OpenAI, after former OpenAI staff founded it to be different.


…the bloom is coming off the AI-generated rose. Governments are ramping up efforts to regulate the technology, creators are suing over alleged intellectual property and copyright violations, people are balking at the privacy invasions (both real and perceived) that these products enable, and there are plenty of reasons to question how accurate AI-powered chatbots really are and how much people should depend on them. Assuming, that is, they’re still using them. Recent reports suggest that consumers are starting to lose interest.

Washington Post

Behind the AI boom, an army of overseas workers in ‘digital sweatshops’ … In the Philippines, one of the world’s biggest destinations for outsourced digital work, former employees say that at least 10,000 of these workers do this labor on a platform called Remotasks, which is owned by the $7 billion San Francisco start-up Scale AI. Scale AI has paid workers at extremely low rates, routinely delayed or withheld payments and provided few channels for workers to seek recourse, according to interviews with workers, internal company messages and payment records, and financial statements. Rights groups and labor researchers say Scale AI is among a number of American AI companies that have not abided by basic labor standards for their workers abroad.

New York Magazine in collaboration with The Verge

This tangled supply chain is deliberately hard to map. According to people in the industry, the companies buying the data demand strict confidentiality. (This is the reason Scale cited to explain why Remotasks has a different name.) Annotation reveals too much about the systems being developed, and the huge number of workers required makes leaks difficult to prevent. Annotators are warned repeatedly not to tell anyone about their jobs, not even their friends and co-workers, but corporate aliases, project code names, and, crucially, the extreme division of labor ensure they don’t have enough information about them to talk even if they wanted to. (Most workers requested pseudonyms for fear of being booted from the platforms.) Consequently, there are no granular estimates of the number of people who work in annotation, but it is a lot, and it is growing. A recent Google Research paper gave an order-of-magnitude figure of “millions” with the potential to become “billions.”

Pay extremely low rates, routinely delay or withhold payments, and illegally redirect wealth from everyone to a few.

Digital dictatorships.

2 thoughts on “AI Companies Keep Stealing From Everyone. Who Can Stop Them?”

  1. McKernan is one of three artists seeking to protect their copyrights and careers by suing makers of AI tools. She says, “I even reached out to some of these companies to say ‘Hey, little artist here, I know you’re not thinking of me at all, but it would be really cool if you didn’t use my work like this.’ And, crickets, absolutely nothing.”

  2. News outlets including the New York Times, CNN, Reuters and the Australian Broadcasting Corporation (ABC) have blocked OpenAI’s GPTBot web crawler.
