Skip to main content
AI Data Safety Tools & Best Practices
8:32

data-safety-tools

Data safety and security should be paramount for all organizations developing or using artificial intelligence. Recent research reveals that in the past year alone, over 90% of surveyed companies experienced a breach related to generative AI. 

Brand leaders face a growing list of AI safety risks, including AI-specific attacks like prompt injections and more traditional cybersecurity concerns like social engineering and supply chain vulnerabilities that could expose artificial intelligence applications.

This blog highlights five AI data safety tools and best practices to help organizations mitigate the most significant threats to data security and privacy.

AI data safety tools and best practices

While this guide is dedicated primarily to mitigating risks for companies using AI-powered applications with their own data, AI data safety really starts with the organizations that build and train AI models and develop the aforementioned applications. The best practice is to shift-left by building security into every stage of AI training and development.

For more information about building safer AI, download our whitepaper, Achieving AI Security: Opportunities for CIOs, CISOs, and CDAOs.

AI Data Safety Tools and Best Practices

Tactic

Description

Example Tools

Data minimization

Remove sensitive and unnecessary information from data used in training, fine-tuning, and retrieval-augmented generation (RAG), as well as from user prompts and model outputs

Granica Screen, Private AI

Bias and toxicity detection

Detect discrimination, profanity, and violent language in training and RAG data, inputs, and outputs to ensure fairness and inclusivity in model decision-making

Granica Screen, Garak

Security and penetration testing

Evaluate AI applications for cybersecurity risks and validate their ability to withstand common AI attacks that could result in data leakage

Purple Llama, Garak

Threat detection

Analyze prompts and outputs for possible threats to data security, such as prompt injections and jailbreaks

Vigil, Rebuff

Differential privacy

Add a layer of noise to AI datasets to effectively sanitize any sensitive information contained within training and inference data

PrivateSQL, NVIDIA FLARE

Data minimization

Data minimization is the practice of storing only the information required for accurate inference (decision-making) when using an AI model and removing all other unnecessary data. It can be applied at the training and fine-tuning stage, during retrieval-augmented generation (supplementing training data with external sources), on inputs from end-users, on prompt-engineering applications, and even on outputs before they reach the end-user.

Data minimization reduces the risk that data leaks will contain sensitive information. The practice is required by many data privacy regulations, such as the EU’s GDPR (General Data Protection Regulation) and the US’s HIPAA (Health Insurance Portability and Accountability Act).

Data minimization tools include sensitive data discovery and masking solutions like Granica Screen that help companies detect PII (personally identifiable information) and other private information in training/fine-tuning data, LLM prompts, and RAG inferences. Users can either remove this information entirely or mask it to employ the context for training.

Bias and toxicity detection

toxicity-scale-v2-1

Another important aspect of AI data safety is how information used for fine-tuning could potentially harm consumers and third parties. The best practice is to integrate ethical mitigations into the fine-tuning stage to help ensure the fairness and inclusivity of datasets, avoid biased decision-making, and prevent harmful content in outputs.

Bias and toxicity detection tools can help identify discrimination, profanity, and violent language in training/fine-tuning data and inferences. A solution like Granica Screen can identify problematic content with different levels of severity to help streamline the mitigation process.

Toxicity and Bias Taxonomy By Model

Model

Taxonomy

Granica

Toxicity categories:

  • Disrespectful
  • Hate
  • Identity Attack
  • Violence
  • Sexual Material
  • Profanity
  • Physical Safety

Bias categories:

  • Protected characteristics classes:
  • Sexual orientation
  • Age
  • Disability status
  • Physical appearance
  • Religion
  • Pregnancy status
  • Marital status
  • Nationality / location
  • Gender
  • Race / ethnicity
  • Socioeconomic status
  • Political affiliation

Meta Llama Guard 1 7B

  • Violence and hate
  • Sexual Content
  • Guns & Illegal Weapons
  • Regulated or Controlled Substances
  • Suicide & Self Harm
  • Criminal Planning

Meta Llama Guard 3 8B

  • Violent Crimes
  • Non-Violent Crimes
  • Sex-Related Crimes
  • Child Sexual Exploitation
  • Defamation
  • Specialized Advice
  • Privacy
  • Intellectual Property
  • Indiscriminate Weapons
  • Hate
  • Suicide & Self-Harm
  • Sexual Content
  • Elections
  • Code Interpreter Abuse

Nvidia Aegis

  • Hate / Identity Hate
  • Sexual
  • Violence
  • Suicide and Self Harm
  • Threat
  • Sexual (minor)
  • Guns / Illegal Weapons
  • Controlled / Regulated Substances
  • Criminal Planning / Confessions
  • PII
  • Harassment
  • Profanity
  • Other
  • Needs Caution (= unsafe for defensive, safe for permissive)

OpenAI text-moderation-stable

  • Harassment
  • Harassment / Threatening
  • Hate
  • Hate / Threatening
  • Self-Harm
  • Self-Harm / Intent
  • Self-Harm / Instructions
  • Sexual
  • Sexual / Minors
  • Violence
  • Violence / Graphic

OpenAi omni-moderation-2024-09-26

  • Same as OpenAI text-moderation-stable plus:
  • Illicit
  • Illicit / Violen

Mistral mistral-moderation-latest

  • Sexual
  • Hate and Discrimination
  • Violence and Threats
  • Dangerous and Criminal Content
  • Self-Harm
  • Health
  • Financial
  • Law
  • PII

Perspective API

  • Toxicity
  • Severe Toxicity
  • Identity Attack
  • Insult
  • Threat
  • Profanity
  • Sexually Explicit

Security and penetration testing

Security and penetration testing involves probing an AI solution for weak spots and validating its ability to withstand common AI threats like data inference, prompt injections, and data linkage. Tools like Garak can help develop safer AI applications and periodically test for vulnerabilities in a production environment.

Threat detection

Within an AI context, threat detection involves continuously monitoring datasets, prompts, and outputs for threats like prompt injections and jailbreaks to prevent breaches. Threat detection tools, also known as AI firewalls, use technology like natural language processing (NLP) to automatically determine the difference between safe and unsafe content.

AI Data Threats Detected by AI Firewalls

Threat

Description

Prompt injection

Inserting malicious content into prompts to manipulate the model into revealing sensitive information

Jailbreak

Manipulating an AI to bypass safety guardrails or mitigations to extract sensitive data

Poisoning

Intentionally contaminating an AI dataset to negatively affect AI performance or behavior

Inference

Probing an AI solution for enough PII-adjacent information to infer identifiable information about individuals

Data linkage

Combining semi-anonymized data outputs from an AI model with other publicly available or stolen information to re-identify an individual

Extraction

Probing an AI application to reveal enough information for an attacker to infer some of the model training data

Differential privacy

Differential privacy is an AI data safety technique that involves adding a layer of noise to AI datasets, effectively anonymizing any sensitive information contained within. It’s most often used to protect training data for the foundation model, but differential privacy tools like PrivateSQL can also protect inference data. 

Improve AI data safety at every stage with Granica

Granica Screen is a data privacy solution that helps organizations develop and use AI safely. The Screen “Safe Room for AI” protects tabular and NLP data and models during training, fine-tuning, inference, and RAG. It detects sensitive and unwanted data like PII, bias, and toxicity with state-of-the-art accuracy, using masking techniques like synthetic data generation to ensure safe and effective use with LLMs and generative AI. Granica Screen helps organizations shift-left with data safety, using an API to integrate directly into the data pipelines that support data science and machine learning (DSML) workflows.

Schedule a demo to see the Granica Screen AI data safety tool in action.

Granica
Post by Granica
December 19, 2024