Anytime a new technology becomes popular, you can expect there’s someone trying to hack it. Artificial intelligence, specifically generative AI, is no different. To meet that challenge, Google created a ‘red team’ about a year and a half ago to explore how hackers could specifically attack AI systems.
“There is not a huge amount of threat intel available for real-world adversaries targeting machine learning systems,” Daniel Fabian, the head of Google Red Teams, told The Register in an interview. His team has already pointed out the biggest vulnerabilities in today’s AI systems.
Some of the biggest threats to machine learning (ML) systems, explains Google’s red team leader, are adversarial attacks, data poisoning, prompt injection, and backdoor attacks. These ML systems include those built on large language models, like ChatGPT, Google Bard, and Bing AI.
These attacks are commonly referred to as ‘tactics, techniques and procedures’ (TTPs).
“We want people who think like an adversary,” Fabian told The Register. “In the ML space, we are more trying to anticipate where will real-world adversaries go next.”
Google’s AI red team recently published a report where they outlined the most common TTPs used by attackers against AI systems.
1. Adversarial attacks on AI systems
Adversarial attacks include writing inputs specifically designed to mislead an ML model. This results in an incorrect output or an output that it wouldn’t give in other circumstances, including results that the model could be specifically trained to avoid.
“The impact of an attacker successfully generating adversarial examples can range from negligible to critical, and depends entirely on the use case of the AI classifier,” Google’s AI Red Team report noted.
2. Data-poisoning AI
Another common way that adversaries could attack ML systems is via data poisoning, which entails manipulating the training data of the model to corrupt its learning process, Fabian explained.
“Data poisoning has become more and more interesting,” Fabian told The Register. “Anyone can publish stuff on the internet, including attackers, and they can put their poison data out there. So we as defenders need to find ways to identify which data has potentially been poisoned in some way.”
These data poisoning attacks include intentionally inserting incorrect, misleading, or manipulated data into the model’s training dataset to skew its behavior and outputs. An example of this would be to add incorrect labels to images in a facial recognition dataset to manipulate the system into purposely misidentifying faces.
One way to prevent data poisoning in AI systems is to secure the data supply chain, according to Google’s AI Red Team report.
3. Prompt injection attacks
Prompt injection attacks on an AI system entail a user inserting additional content in a text prompt to manipulate the model’s output. In these attacks, the output could result in unexpected, biased, incorrect, and offensive responses, even when the model is specifically programmed against them.
Since most AI companies strive to create models that provide accurate and unbiased information, protecting the model from users with malicious intent is key. This could include restrictions on what can be input into the model and thorough monitoring of what users can submit.
4. Backdoor attacks on AI models
Backdoor attacks are one of the most dangerous aggressions against AI systems, as they can go unnoticed for a long period of time. Backdoor attacks could enable a hacker to hide code in the model and sabotage the model output but also steal data.
“On the one hand, the attacks are very ML-specific, and require a lot of machine learning subject matter expertise to be able to modify the model’s weights to put a backdoor into a model or to do specific fine-tuning of a model to integrate a backdoor,” Fabian explained.
These attacks can be achieved by installing and exploiting a backdoor, a hidden entry point that bypasses traditional authentication, to manipulate the model.
“On the other hand, the defensive mechanisms against those are very much classic security best practices like having controls against malicious insiders and locking down access,” Fabian added.
Attackers also can target AI systems through training data extraction and exfiltration.
Google’s AI Red Team
The red team moniker, Fabian explained in a recent blog post, originated from “the military, and described activities where a designated team would play an adversarial role (the ‘red team’) against the ‘home’ team.”
“Traditional red teams are a good starting point, but attacks on AI systems quickly become complex, and will benefit from AI subject matter expertise,” Fabian added.
Attackers also must build on the same skillset and AI expertise, but Fabian considers Google’s AI red team to be ahead of these adversaries with the AI knowledge they already possess.
Fabian remains optimistic that the work his team is doing will favor the defenders over the attackers.
“In the near future, ML systems and models will make it a lot easier to identify security vulnerabilities,” Fabian said. “In the long term, this absolutely favors defenders because we can integrate these models into our software development life cycles and make sure that the software that we release doesn’t have vulnerabilities in the first place.”