Organizations using machine learning systems need data to train them. But where does that data come from? And can these organizations get into trouble if they don’t have the rights to use it? The short answer is yes: they can get into trouble if they aren’t careful.
A few recent cases show the risks companies face when they allegedly use personal information to train AI systems without authorization. First, Burke v. Clearview AI, Inc., a class action filed in federal district court in San Diego at the end of February 2020, involves a company, Clearview, accused of “scraping” thousands of sites to obtain three billion images of individuals’ faces for use in training AI algorithms for facial recognition and identification purposes. “Scraping” refers to the use of automated processes to scan the content of websites, collect certain content from them, store that content, and use it later for the collecting company’s own purposes. The basis for the complaint is that Clearview AI failed to obtain consent to use the scraped images. Moreover, given the vast scale of the scraping (three billion images), the risk to privacy is tremendous.
In Stein v. Clarifai, Inc., a class action filed earlier in February in Illinois state court, the plaintiffs claim that investors in Clarifai who were founders of the dating site OKCupid used their access to OKCupid’s database of profile photographs to transfer the database to Clarifai. Clarifai then supposedly used the photos to train its algorithms for analyzing images and videos, including for purposes of facial recognition. Clarifai is the defendant in this case and will have to fight claims that it wasn’t entitled to take the OKCupid photos without notifying the dating site’s users and obtaining their consent. OKCupid is potentially a target too. It isn’t clear whether the plaintiffs are saying that OKCupid’s management approved the access to its database, but if it did, the plaintiffs may have claims against OKCupid as well.