Generative AI generates real worries about identity privacy and data accuracy

Generative AI models are trained on massive data-sets of text and code, which can inadvertently include personally identifiable information (PII) such as names, addresses and financial logs.

Summary

  • A fierce chatbot race is underway among AI pioneers. Privacy and accuracy may end up as casualties. How well might rival bots fare?

On 6 December, Google revealed that its work on Bard, a Generative Artificial Intelligence (AI) model, had resulted in a much more formidable competitor to ChatGPT, the chatbot from Microsoft-backed OpenAI. Google’s new model, called Gemini, can work with video, images and, of course, text, and is an attempt to re-establish Google as a world leader in AI. The model is already available in over 170 largely English-speaking countries and comes as an add-on to Bard. According to Google, as of 13 December, application programming interfaces (APIs) to this system will be made available to software developers for use in their own systems.

Google’s initial attempts earlier this year to prove that it had an answer to ChatGPT were a fiasco. Its parent company, Alphabet, lost $100 billion in market value in the wake of a fouled-up demonstration that gave a wrong answer to a question about the James Webb Space Telescope. In one of its answers, Bard said the telescope was used to take the first pictures of a planet outside the Earth’s solar system, but America’s space agency Nasa confirmed that those were taken by a different telescope. The paradox is that Bard’s learning engine could have found that out by, well, just googling it.

To my mind, this proves an inescapable truth about using Generative AI chatbots to replace search engines for thorough and factually accurate searches. The logic in these chatbots’ programs causes them to make up an answer when they don’t directly have one in their learning repository. This is because they are trained essentially as ‘autocomplete’ programs: i.e., they are trained to ‘generate’ a response, and that response may be factually suspect, especially when their knowledge base is incomplete or incorrect.
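The ‘autocomplete’ behaviour can be illustrated with a toy sketch. This is not how Bard or ChatGPT are built; it is a minimal next-word predictor, assuming a made-up two-sentence training text and hypothetical helper names (`train_bigrams`, `autocomplete`). It only shows how a program that continues text word by word can produce a fluent sentence that its training text never stated.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count, for each word, which words follow it in the training text."""
    following = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following

def autocomplete(following, prompt, max_words=8):
    """Extend the prompt by repeatedly appending the most frequent next word."""
    words = prompt.lower().split()
    for _ in range(max_words):
        candidates = following.get(words[-1])
        if not candidates:
            break  # no known continuation for the last word
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

# Made-up training text: note that it credits a *different* telescope.
corpus = [
    "the webb telescope studies distant galaxies",
    "a different telescope took the first pictures of an exoplanet",
]

model = train_bigrams(corpus)
answer = autocomplete(model, "the webb telescope took the first")
print(answer)
# -> the webb telescope took the first pictures of an exoplanet
```

The completed sentence appears nowhere in the training text. The program has no notion of true or false; it simply stitches together the likeliest next words.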

Many are quick to point out that this sort of technology is still only experimental, and that expecting perfection while it is still in its infancy is unfair. Maybe so. But there are other important points about the rise of Generative AI models that risk getting lost in the noise. For instance, how do these programs treat privacy? As it is, we live in a world of dying privacy. I’m willing to bet that at least Apple, Google and Facebook (and likely many other firms) know exactly where I’m sitting as I type this column out on my keyboard. And so do a handful of governments, not all of them benign. When I used Bard earlier today, it gave me the option of Kannada; it’s clear it knows I’m in Bangalore.

Generative AI models are trained on massive data-sets of text and code, which can inadvertently include personally identifiable information (PII) such as names, addresses and financial logs. This information can be leaked in AI’s output generation, potentially exposing individuals to significant harm. This is certainly easy to believe in the context of the autocomplete engines I spoke of above. Even if PII is removed from data-sets in the interest of providing ‘anonymized’ data for their training models, Generative AI can still be used to de-anonymize individuals based on their unique vocabulary, writing or speaking styles, as well as other personal characteristics that act as give-aways. This can be used to track individuals or target them with personalized advertising, which is the bread-and-butter of most Big Tech firms.
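To make the point about ‘anonymized’ data concrete, here is a minimal sketch of pattern-based PII scrubbing. The patterns, the `redact` helper and the sample message are all assumptions made up for illustration; real scrubbing pipelines are considerably more involved.

```python
import re

# Deliberately simple, illustrative patterns; they will miss many real-world formats.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s-]{8,}\d")

def redact(text):
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

# Made-up sample message.
sample = "Righto, ping me at asha@example.com or +91 98765 43210, cheers!"
print(redact(sample))
# -> Righto, ping me at [EMAIL] or [PHONE], cheers!
```

The explicit identifiers are gone, but the writer’s turns of phrase (“Righto”, “cheers”) survive untouched. Those stylistic give-aways are exactly what could be used to re-identify someone.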

It’s often unclear what private data is being collected by Generative AI models and how this data is used. This lack of transparency makes it difficult for you and me to understand and control how our data is deployed. Bard’s disclaimers, written in legalese, make this amply clear. And yet in our excitement to get our hands on a shiny new toy, we usually agree to all the legal conditions in a heartbeat, just as we would rip off the wrapping paper of a gift or the packaging of a new electronic device.

There are also issues around intellectual property. Who owns the output that a Generative AI model comes up with for you? It is equally difficult to hold anyone accountable for harmful output generated by these models, which makes it hard to curb the spread of misinformation or hate speech. Both ChatGPT and Bard are open-access models, meaning anyone can use them to generate text, so there is little to stop someone from producing harmful or offensive content.

Also, while we know the overall process by which these models work, little is known of their innards, and their output can be unreliable or just plain wrong, as we saw publicly earlier this year with Bard.

While Google has made strides in data privacy and responsible AI development, it has faced more criticism than Microsoft. Google has been accused before of collecting and using user data without the knowledge or consent of users. When I asked Bard to compare its own and Google’s privacy norms with those of Microsoft and ChatGPT, this is what I got: “Overall, Microsoft’s alliance with OpenAI and ChatGPT appears to offer a more robust approach to data privacy and responsible AI development than Google Bard. Microsoft’s commitment to transparency, OpenAI’s privacy-preserving techniques, the decentralized nature of ChatGPT, and Microsoft’s focus on explainability and ethical development all contribute to a more trustworthy and user-centric approach to generative AI.”

Hmmm…. That response may have been accurate prior to the Altman debacle, but now I’m not so sure. Maybe Gemini/Bard’s training data isn’t recent enough to analyse what happened with Altman at OpenAI just a few short days ago.
