Zoom under fire amid rumoured plans to collect personal data for AI training

Zoom says it isn’t training AI on calls without consent.

Discussions about Zoom’s Terms of Service have flooded social media after the video-conferencing giant appeared to signal interest in making user data part of its AI training set.

The platform is one of a slew of Silicon Valley tech giants that have aggressively pushed for more data for their artificial intelligence models, even as public uproar over AI developments grows. Recently, Hollywood came under scrutiny for scanning, digitising, and taking ownership of actors’ likenesses.

“Zoom terms of service now require you to allow AI to train on ALL your data — audio, facial recognition, private conversations — unconditionally and irrevocably, with no opt out,” read one widely-shared tweet this week that has since been deleted. “Don’t try to negotiate with our new overlords.”

The original quoted tweet does not seem to be accurate — *all* Zoom data is not (per their ToS / post) subject to collection for AI training. But we're still not comfortable using Zoom, especially at their price tag, with this trend. Better to split too soon than too late.
— Aric Toler (@AricToler) August 7, 2023

The company quickly pushed back on the claim in a press release that stressed it “will not use audio, video, or chat customer content to train our artificial intelligence models without your consent.”

Although that phrasing now appears in the terms of service, some experts warned that the original wording could have allowed Zoom to access more user data than needed, including from customer calls.

What about other companies?

Google has also updated its privacy policy to state that it uses publicly available data in its AI training. It will scrape readily available information to strengthen its recently released Bard AI.

“Our privacy policy has long been transparent that Google uses publicly available information from the open web to train language models for services like Google Translate,” Google spokesperson Christa Muldoon told The Verge. “This latest update simply clarifies that newer services like Bard are also included. We incorporate privacy principles and safeguards into the development of our AI technologies, in line with our AI Principles.”

The updated policy specifies that “publicly available information” is used to train Google’s AI products but doesn’t say how (or if) the company will prevent copyrighted materials from being included in that data pool.

Many publicly accessible websites have policies in place that ban data collection or web scraping for the purpose of training large language models and other AI toolsets.

What does this mean?

There are two causes for concern when discussing AI and data collection.

The first is privacy, including who has access to data online. Users may rightly feel uneasy about their doctor’s appointments over Zoom or their purchases on Google becoming identifying information. They would also be rightly uneasy if an AI exhibited behaviours based on personal characteristics they wouldn’t want reflected.

The second is ownership of the data. The debate is most heated when data is collected from commercial websites and used to train an AI that competes with the original sources.

Whether the fair use doctrine extends to this kind of application remains a legal grey area.

The uncertainty has sparked various lawsuits and pushed lawmakers in some nations to introduce stricter laws that are better equipped to regulate how AI companies collect and use their training data.

It also raises questions about how this data is processed to ensure it doesn’t contribute to dangerous failures within AI systems. The people tasked with sorting through these vast pools of training data are often subjected to long hours and extreme working conditions.