AI Data Safety Tools & Best Practices

Data safety and security should be paramount for all organizations developing or using artificial intelligence. Recent research reveals that in the past year alone, over 90% of surveyed companies experienced a breach related to generative AI.

Brand leaders face a growing list of AI safety risks, including AI-specific attacks like prompt injections and more traditional cybersecurity concerns like social engineering and supply chain vulnerabilities that could expose artificial intelligence applications.

This blog highlights five AI data safety tools and best practices to help organizations mitigate the most significant threats to data security and privacy.

AI data safety tools and best practices

While this guide is dedicated primarily to mitigating risks for companies using AI-powered applications with their own data, AI data safety really starts with the organizations that build and train AI models and develop the aforementioned applications. The best practice is to shift-left by building security into every stage of AI training and development.

For more information about building safer AI, download our whitepaper, Achieving AI Security: Opportunities for CIOs, CISOs, and CDAOs.

AI Data Safety Tools and Best Practices
Tactic	Description	Example Tools
Data minimization	Remove sensitive and unnecessary information from data used in training, fine-tuning, and retrieval-augmented generation (RAG), as well as from user prompts and model outputs	Granica Screen, Private AI
Bias and toxicity detection	Detect discrimination, profanity, and violent language in training and RAG data, inputs, and outputs to ensure fairness and inclusivity in model decision-making	Granica Screen, Garak
Security and penetration testing	Evaluate AI applications for cybersecurity risks and validate their ability to withstand common AI attacks that could result in data leakage	Purple Llama, Garak
Threat detection	Analyze prompts and outputs for possible threats to data security, such as prompt injections and jailbreaks	Vigil, Rebuff
Differential privacy	Add a layer of noise to AI datasets to effectively sanitize any sensitive information contained within training and inference data	PrivateSQL, NVIDIA FLARE

Data minimization

Data minimization is the practice of storing only the information required for accurate inference (decision-making) when using an AI model and removing all other unnecessary data. It can be applied at the training and fine-tuning stage, during retrieval-augmented generation (supplementing training data with external sources), on inputs from end-users, on prompt-engineering applications, and even on outputs before they reach the end-user.

Data minimization reduces the risk that data leaks will contain sensitive information. The practice is required by many data privacy regulations, such as the EU’s GDPR (General Data Protection Regulation) and the US’s HIPAA (Health Insurance Portability and Accountability Act).

Data minimization tools include sensitive data discovery and masking solutions like Granica Screen that help companies detect PII (personally identifiable information) and other private information in training/fine-tuning data, LLM prompts, and RAG inferences. Users can either remove this information entirely or mask it to employ the context for training.

Bias and toxicity detection

toxicity-scale-v2-1

Another important aspect of AI data safety is how information used for fine-tuning could potentially harm consumers and third parties. The best practice is to integrate ethical mitigations into the fine-tuning stage to help ensure the fairness and inclusivity of datasets, avoid biased decision-making, and prevent harmful content in outputs.

Bias and toxicity detection tools can help identify discrimination, profanity, and violent language in training/fine-tuning data and inferences. A solution like Granica Screen can identify problematic content with different levels of severity to help streamline the mitigation process.

Toxicity and Bias Taxonomy By Model
Model	Taxonomy
Granica	Toxicity categories: Disrespectful Hate Identity Attack Violence Sexual Material Profanity Physical Safety Bias categories: Protected characteristics classes: Sexual orientation Age Disability status Physical appearance Religion Pregnancy status Marital status Nationality / location Gender Race / ethnicity Socioeconomic status Political affiliation
Meta Llama Guard 1 7B	Violence and hate Sexual Content Guns & Illegal Weapons Regulated or Controlled Substances Suicide & Self Harm Criminal Planning
Meta Llama Guard 3 8B	Violent Crimes Non-Violent Crimes Sex-Related Crimes Child Sexual Exploitation Defamation Specialized Advice Privacy Intellectual Property Indiscriminate Weapons Hate Suicide & Self-Harm Sexual Content Elections Code Interpreter Abuse
Nvidia Aegis	Hate / Identity Hate Sexual Violence Suicide and Self Harm Threat Sexual (minor) Guns / Illegal Weapons Controlled / Regulated Substances Criminal Planning / Confessions PII Harassment Profanity Other Needs Caution (= unsafe for defensive, safe for permissive)
OpenAI text-moderation-stable	Harassment Harassment / Threatening Hate Hate / Threatening Self-Harm Self-Harm / Intent Self-Harm / Instructions Sexual Sexual / Minors Violence Violence / Graphic
OpenAi omni-moderation-2024-09-26	Same as OpenAI text-moderation-stable plus: Illicit Illicit / Violen
Mistral mistral-moderation-latest	Sexual Hate and Discrimination Violence and Threats Dangerous and Criminal Content Self-Harm Health Financial Law PII
Perspective API	Toxicity Severe Toxicity Identity Attack Insult Threat Profanity Sexually Explicit

Security and penetration testing

Security and penetration testing involves probing an AI solution for weak spots and validating its ability to withstand common AI threats like data inference, prompt injections, and data linkage. Tools like Garak can help develop safer AI applications and periodically test for vulnerabilities in a production environment.

Threat detection

Within an AI context, threat detection involves continuously monitoring datasets, prompts, and outputs for threats like prompt injections and jailbreaks to prevent breaches. Threat detection tools, also known as AI firewalls, use technology like natural language processing (NLP) to automatically determine the difference between safe and unsafe content.

AI Data Threats Detected by AI Firewalls
Threat	Description
Prompt injection	Inserting malicious content into prompts to manipulate the model into revealing sensitive information
Jailbreak	Manipulating an AI to bypass safety guardrails or mitigations to extract sensitive data
Poisoning	Intentionally contaminating an AI dataset to negatively affect AI performance or behavior
Inference	Probing an AI solution for enough PII-adjacent information to infer identifiable information about individuals
Data linkage	Combining semi-anonymized data outputs from an AI model with other publicly available or stolen information to re-identify an individual
Extraction	Probing an AI application to reveal enough information for an attacker to infer some of the model training data

Differential privacy

Differential privacy is an AI data safety technique that involves adding a layer of noise to AI datasets, effectively anonymizing any sensitive information contained within. It’s most often used to protect training data for the foundation model, but differential privacy tools like PrivateSQL can also protect inference data.

Improve AI data safety at every stage with Granica

Granica Screen is a data privacy solution that helps organizations develop and use AI safely. The Screen “Safe Room for AI” protects tabular and NLP data and models during training, fine-tuning, inference, and RAG. It detects sensitive and unwanted data like PII, bias, and toxicity with state-of-the-art accuracy, using masking techniques like synthetic data generation to ensure safe and effective use with LLMs and generative AI. Granica Screen helps organizations shift-left with data safety, using an API to integrate directly into the data pipelines that support data science and machine learning (DSML) workflows.

Schedule a demo to see the Granica Screen AI data safety tool in action.