Two types of dataset poisoning attacks can corrupt AI system results


It usually costs ≤$60 USD to control at least 0.01% of the data. Cost is calculated by purchasing domains in order of lowest cost per image first. Credit: arXiv (2023). DOI: 10.48550/arxiv.2302.10149
A team of computer science researchers with members from Google, ETH Zurich, NVIDIA and Robust Intelligence is highlighting two types of dataset poisoning attacks that bad actors can use to corrupt AI system results. The team has written a paper outlining the types of attacks they identified and posted it on the arXiv preprint server.
With the rise of deep learning neural networks, artificial intelligence applications have become big news. And because of their unique learning abilities, they can be applied in a wide variety of environments. However, as the researchers behind this new effort note, one thing they all have in common is the need for quality training data.
Because such systems learn from whatever they are shown, they have no way of knowing when the data is bad and thus incorporate it into their rule set. Consider, for example, an AI system trained to recognize patterns on mammograms as cancerous tumors. Such a system is trained by showing it many examples of real tumors collected during mammograms.
But what if someone inserted images of cancerous tumors into the training dataset but labeled them as non-cancerous? Very quickly, the system would start missing those tumors because it had been taught to treat them as non-cancerous. In this new effort, the team has shown that the same can happen with AI systems trained on publicly available data from the Internet.
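To make that mechanism concrete, here is a minimal, hypothetical sketch of label-flip poisoning on a toy binary classifier. The dataset, model, and 30% poison rate are illustrative assumptions, not the authors' experiment.

```python
# Toy label-flip poisoning demo (illustrative only, not the paper's setup).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Poison the training set: flip the labels of 30% of the positive class,
# mimicking tumors relabeled as non-cancerous.
rng = np.random.default_rng(0)
pos = np.flatnonzero(y_train == 1)
flipped = rng.choice(pos, size=int(0.3 * len(pos)), replace=False)
y_poisoned = y_train.copy()
y_poisoned[flipped] = 0

clean = LogisticRegression(max_iter=1000).fit(X_train, y_train)
poisoned = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)

# The poisoned model tends to miss positives it was taught to ignore.
print("clean accuracy:   ", clean.score(X_test, y_test))
print("poisoned accuracy:", poisoned.score(X_test, y_test))
```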
The researchers begin by noting that ownership of URLs on the Internet often expires, including URLs that have been used as data sources by AI systems. That makes them available for purchase by nefarious types looking to disrupt an AI system. If those URLs are bought and then used to host websites carrying misinformation, the AI system will add that misinformation to its knowledge bank just as readily as real information, and that will lead the AI system to produce worse results than it should.
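As a rough illustration of how such a dataset might be audited, the sketch below scans a list of source URLs for domains that no longer resolve and could therefore be re-registered. The URL list is a made-up stand-in, the check requires network access, and a real audit would consult WHOIS/registration data rather than DNS alone.

```python
# Hypothetical audit: flag dataset domains that no longer resolve via DNS.
import socket
from urllib.parse import urlparse

def unresolvable_domains(urls):
    """Yield domains from the URL list that fail DNS resolution."""
    seen = set()
    for url in urls:
        domain = urlparse(url).netloc
        if not domain or domain in seen:
            continue
        seen.add(domain)
        try:
            socket.gethostbyname(domain)
        except socket.gaierror:
            yield domain  # candidate for expiry / takeover

# Illustrative URL list, not a real dataset manifest.
example_urls = [
    "https://example.com/cat.jpg",
    "https://no-such-domain-123456.example/dog.jpg",
]
for d in unresolvable_domains(example_urls):
    print("possibly expired:", d)
```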
The team calls this type of attack split-view poisoning. Their testing shows that such an approach can be used to purchase enough URLs to poison a large portion of mainstream AI systems for as little as $10,000.
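One natural countermeasure is a hash-based integrity check of the kind that can blunt split-view poisoning: record a cryptographic hash of each item when the dataset is curated, and reject any later download that no longer matches. The manifest format and verify_download helper below are illustrative assumptions, not code from the paper.

```python
# Minimal sketch of hash-based verification for re-downloaded dataset items.
import hashlib

def sha256_digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify_download(url: str, data: bytes, manifest: dict) -> bool:
    """Reject content whose hash no longer matches the curated snapshot."""
    expected = manifest.get(url)
    return expected is not None and sha256_digest(data) == expected

# At curation time the maintainer records hashes of the original content...
manifest = {"https://example.com/cat.jpg": sha256_digest(b"original image bytes")}

# ...and at training time a changed payload (e.g. served from a
# re-registered domain) fails verification and is dropped.
print(verify_download("https://example.com/cat.jpg", b"attacker payload", manifest))  # False
```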
There is another way AI systems can be subverted: by manipulating data in well-known repositories such as Wikipedia. The researchers note that this can be done by modifying articles just before a scheduled data dump, leaving moderators no time to detect and revert the changes before they are distributed to, and used by, AI systems. They call this approach frontrunning poisoning.
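One mitigation a snapshot builder could apply, sketched below with assumed data structures, is to accept only revisions old enough for moderators to have had time to revert vandalism. The 24-hour window and the revision records are hypothetical, not Wikipedia's actual pipeline.

```python
# Illustrative frontrunning mitigation: only snapshot "aged" revisions.
from datetime import datetime, timedelta, timezone

MODERATION_WINDOW = timedelta(hours=24)  # assumed review window

def stable_revisions(revisions, snapshot_time):
    """Keep only revisions that predate the snapshot by the full window."""
    cutoff = snapshot_time - MODERATION_WINDOW
    return [r for r in revisions if r["timestamp"] <= cutoff]

now = datetime(2023, 3, 7, 12, 0, tzinfo=timezone.utc)
revisions = [
    {"id": 1, "timestamp": now - timedelta(days=3)},    # old, presumed vetted
    {"id": 2, "timestamp": now - timedelta(minutes=5)}, # edited just before dump
]
print([r["id"] for r in stable_revisions(revisions, now)])  # [1]
```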
More information:
Nicholas Carlini et al., Poisoning Web-Scale Training Datasets is Practical, arXiv (2023). DOI: 10.48550/arxiv.2302.10149
© 2023 Science X Network
Citation: Two types of dataset poisoning attacks that can corrupt AI system results (2023, March 7) retrieved March 7, 2023 from https://techxplore.com/news/2023-03-dataset-poisoning-corrupt-ai-results.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.