Google expands AI bug bounty program rewards 

Share post:

Google has expanded its bug bounty program to include new categories of attacks specific to AI systems. The program will reward security researchers for reporting issues such as prompt injection, training data extraction, model manipulation, adversarial perturbation attacks, and data theft targeting model-training data.

The scope of the AI bug bounty program includes new categories of designed to encourage security researchers to identify and report potential threats to the security and reliability of Google’s AI products.

Prompt attacks are one of the categories, and they involve the creation of adversarial prompts that can manipulate a model’s behavior, potentially leading to harmful or offensive content being generated. This kind of attack can pose significant risks, and rewarding researchers for identifying such vulnerabilities is a proactive step in enhancing AI security.

Training data extraction is another category. Attackers may attempt to reconstruct verbatim training examples that contain sensitive information from AI models. This poses not only privacy concerns but also the risk of uncovering and exploiting model biases. Rewarding the discovery of such vulnerabilities can help in mitigating these risks.

Model manipulation is a category that involves covertly altering a model’s behavior to trigger predefined adversarial actions. By rewarding researchers for identifying instances of model manipulation, Google aims to ensure that AI models remain robust and resistant to unauthorized tampering.

Adversarial perturbation is another area of concern, where inputs are designed to provoke highly unexpected and deterministic outputs from AI models. These types of attacks can lead to unpredictable and potentially harmful consequences. By offering rewards for the discovery of adversarial perturbations, Google reinforces the importance of addressing this vulnerability.

Model theft or exfiltration is the fifth category, where attackers may attempt to steal details about a model, such as its architecture or weights. This information can be exploited for unauthorized use or for creating similar models, potentially leading to intellectual property theft. Rewarding researchers for identifying such breaches can help protect Google’s AI assets.

Google is also providing more specific guidance on what types of reports are in scope for rewards. For example, prompt injection attacks that are invisible to victims and change the state of the victim’s account or assets are in scope, while using a product to generate violative, misleading, or factually incorrect content in your own session is out of scope.

The sources for this piece include an article in TheVerge.

SUBSCRIBE NOW

Related articles

A new MacOS attack from malware-as-a-service

Cado Security recently exposed a new macOS-targeted malware known as "Cthulhu Stealer," which operates as malware-as-a-service (MaaS). The...

Crowdstrike criticizes competitors who are taking advantage of the

CrowdStrike’s president, Michael Sentonas, has strongly criticized competitors for taking advantage of the company’s recent IT outage to...

Toyota confirms leak of 240GB of sensitive data in recent hack

Toyota recently confirmed a significant data breach after 240GB of sensitive information, including employee and customer data, was...

Ransomware payments reach record levels in 2024

Ransomware has become increasingly profitable in 2024, with cybercriminals collecting a staggering $459.8 million in ransom payments during...

Become a member

New, Relevant Tech Stories. Our article selection is done by industry professionals. Our writers summarize them to give you the key takeaways