Data Integrity Breaches Are Killing Trust in AI

Here’s the money quote from Roger McNamee:

So long as we build AIs on lousy content, the results are going to be lousy. AI will be right some of the time, but you won’t be able to tell if the answer is right or wrong without doing further research, which defeats the purpose.

I generally disagree with the GIGO (garbage in, garbage out) meme, but here I love that McNamee calls out the lack of value. You ask the computer for the meaning of life and it spits out 42? Who can tell whether that’s right unless they do the math themselves?

Actually, it gets even better.

Engineers have the option of training AIs on content created by experts, but few choose that path, due to cost.

Cost? Cost of quality data?

That’s a symptom of the last decade. Many rushed into an unregulated “data lake” mentality to amass quantity (variety and volume at velocity), with a total disregard for quality.

The “get as many dots as possible so you can someday connect them” mindset (a sort of rabid data consumption and hoarding) has gradually given way to collecting only the things you can use.

While McNamee claims to be writing about democracy, what he’s really saying is that the market is ripe for a data innovation revolution that reduces integrity breaches.

Technology desperately needs to be brought into such “save our democracy” discussions, grounding them in practical solutions.

A simple example is the W3C Solid protocol. It’s technology that offers real and present steps toward doing the right thing, and it puts AI companies far ahead of the safety baseline now looming from smart regulators like Italy’s.
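Under the hood, Solid is just HTTP plus RDF, so a personal data store (a “pod”) can be read and updated with ordinary web requests. Below is a minimal TypeScript sketch of that idea; the pod URL and resource path are hypothetical, and a real private resource would need a Solid-OIDC authenticated fetch rather than the bare fetch used here.

```typescript
// Minimal sketch of Solid's resource model: plain HTTP plus RDF (Turtle).
// The pod URL below is hypothetical; private resources on a real pod
// require a Solid-OIDC authenticated fetch instead of the bare global fetch.

const resourceUrl = "https://alice.example.org/profile/notes.ttl"; // hypothetical pod resource

// Read: a GET with an RDF Accept header returns the resource as Turtle.
async function readNotes(): Promise<string> {
  const res = await fetch(resourceUrl, {
    headers: { Accept: "text/turtle" },
  });
  if (!res.ok) throw new Error(`GET failed: ${res.status}`);
  return res.text(); // Turtle document the owner controls and curates
}

// Write: a PUT with Turtle content replaces the resource in the owner's pod.
async function saveNote(turtle: string): Promise<void> {
  const res = await fetch(resourceUrl, {
    method: "PUT",
    headers: { "Content-Type": "text/turtle" },
    body: turtle,
  });
  if (!res.ok) throw new Error(`PUT failed: ${res.status}`);
}

// Example: the owner curates a small, high-quality note in their own store,
// then reads it back.
async function main() {
  await saveNote(`
    @prefix schema: <http://schema.org/> .
    <#note1> a schema:Review ;
      schema:reviewBody "Verified summary written by the data owner." .
  `);
  console.log(await readNotes());
}

main().catch(console.error);
```

The point of the sketch is that the owner, not the platform, decides what sits in the store, which is exactly the kind of curated, higher-quality source an AI could be trained on.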

Taking regulatory action against one of the worst abusers of users, OpenAI, is definitely the right move here.

Last week, the Italian Data Protection Watchdog ordered OpenAI to temporarily cease processing Italian users’ data amid a probe into a suspected breach of Europe’s strict privacy regulations. The regulator, which is also known as Garante, cited a data breach at OpenAI which allowed users to view the titles of conversations other users were having with the chatbot. There “appears to be no legal basis underpinning the massive collection and processing of personal data in order to ‘train’ the algorithms on which the platform relies,” Garante said in a statement Friday. Garante also flagged worries over a lack of age restrictions on ChatGPT, and how the chatbot can serve factually incorrect information in its responses. OpenAI, which is backed by Microsoft, risks facing a fine of 20 million euros ($21.8 million), or 4% of its global annual revenue, if it doesn’t come up with remedies to the situation in 20 days.

It’s the right move because the breach reported by OpenAI users is far worse than the company is admitting, mainly because integrity failures are not regulated well enough to force disclosure (they fall far behind confidentiality and privacy laws).

20 days? That should be more than enough time for a company that rapidly dumps unsafe engineering into the public domain. I’m sure they’ll have a fix pushed to production in 20 hours. And then another one. And then another one…

But seriously, the systemic and lasting remedies they need (such as building personal data stores so owners can curate quality) have been sitting right in front of them. Maybe the public loss of trust from integrity breaches, coupled with regulatory action, will force the necessary AI innovation.
