Photographs of Australian children were used without their consent to train artificial intelligence (AI) models that generate images.
A new report from the non-governmental organisation Human Rights Watch found personal information, including photographs, of Australian children in a large dataset called LAION-5B. The dataset was created by scraping content from the publicly available internet. It contains links to around 5.85 billion images paired with captions.
Companies use datasets like LAION-5B to "teach" their generative AI tools what visual content looks like. A generative AI tool like Midjourney or Stable Diffusion then assembles images from the thousands of data points in its training material.
In many cases, the developers of AI models and of their training datasets appear to be flouting privacy and consumer protection laws, believing that they can achieve their business goals by developing and deploying the models while the law, or law enforcement agencies, are still catching up.
The dataset analysed by Human Rights Watch is managed by the German nonprofit organisation LAION. Stanford researchers have previously found images of child sexual abuse in this same dataset.
LAION has now committed to removing the photos of the Australian children found by Human Rights Watch. However, AI developers who have already used this data cannot make their AI models "unlearn" it. And the broader problem of privacy breaches remains.
Is it allowed if it's on the internet?
It is a fallacy to say that simply because something is in the public domain, privacy laws don't apply. Publicly available information can still be personal information under the Australian Privacy Act.
In fact, there is a relevant precedent: in 2021, the facial recognition platform Clearview AI was found to have violated the privacy of Australians. The company had collected images of individuals from websites across the internet for use in a facial recognition tool.
The Office of the Australian Information Commissioner (OAIC) determined that even though these photos were already available on websites, they were still personal information. Moreover, they were sensitive information.
It found that Clearview AI had breached the Privacy Act by failing to comply with its obligations when collecting personal information. This means that in Australia, personal information includes publicly available information.
AI developers should be very careful about the origin of the datasets they use.
Can privacy law be enforced?
This is where the Clearview AI case is relevant. There may be a strong argument that LAION breached applicable Australian privacy laws.
One such argument concerns the collection of biometric information in the form of facial images without the consent of the individuals concerned.
The Australian Privacy Commissioner ruled that Clearview AI collected sensitive information without consent and did so using "improper means": people's facial information was scraped from various websites for use in a facial recognition tool.
Under Australian privacy law, the organisation collecting the data must also provide notice to the individuals concerned. With practices like this, where images are collected from across the internet on a large scale, the likelihood of an organisation providing notice to everyone affected is slim to none.
If the Australian Privacy Act is found to have been breached in this case, we may see strict enforcement action from the Privacy Commissioner. For example, for a serious invasion of privacy, the Commissioner can seek a very large fine: 50 million Australian dollars, 30% of turnover or three times the profit made, whichever is greater.
The federal government is expected to release draft amendments to the Privacy Act in August. This follows a comprehensive review of privacy law carried out in recent years.
As part of these reforms, there have been proposals for a Children's Privacy Code, recognising that children are in an even more vulnerable position than adults in relation to the potential misuse of their personal data. They often have no control over what is collected, how it is used, and the impact it has on their lives.
What can parents do?
There are many good reasons not to post images of your children online. These include unwanted surveillance, the risk of children being identified by people with criminal intent, and their images being used in deepfakes, including child sexual abuse material. These AI datasets are one more reason. For parents, this is a constant battle.
Human Rights Watch found photos in the LAION-5B dataset that were taken from unlisted, unsearchable YouTube videos. In its response, LAION argued that the most effective protection against misuse is to remove personal photos of children from the internet.
But even if you choose not to publish photos of your children, there are many situations in which your child may be photographed by other people and the photos made available on the internet. These include, for example, daycare centres, schools and sports clubs.
If we as individual parents choose not to publish our children's photos, that's great. But avoiding this problem across the board is difficult, and we shouldn't put all the blame on parents when these images end up in AI training data. Instead, we need to hold the tech companies accountable.