Granica Screen

The Data Privacy Service for Enterprise AI

Background

Data warehouses typically follow a standard architecture where data flows from sources through an ETL process, populates core and derived data tables, and ultimately feeds online systems. However, many data sources contain private or sensitive information, and ensuring its protection throughout this process is crucial.

Challenges in Data Privacy

Sensitive data can be exposed at two key points:

  1. ETL process: Before data lands in the core data lake/warehouse, it's essential to clean sensitive data to prevent unauthorized access.
  2. Derived data generation: While allowing sensitive data in core tables might be necessary for specific use cases, direct access to such data should be restricted. When generating derived data sources, it's vital to clean/omit sensitive data beforehand to prevent exposure.

Requirements for a Solution

An ideal solution for this scenario should address the following:

  • Accurate identification and handling of private/sensitive data: The solution should effectively pinpoint and manage sensitive data across the data warehouse.
  • Performance and scalability: The data cleaning process must handle large data volumes efficiently, especially when applied during ETL.
  • Cost-effectiveness: Because the data volumes involved are large, inefficient processing can significantly increase data warehouse operating costs.

Where Granica Screen Fits In

Granica Screen offers a highly accurate, scalable, and efficient data classification engine combined with a flexible system for cleaning, redacting, or otherwise obfuscating sensitive data as needed. Screen is deployed into your VPC.

High-Level Architecture

Granica Screen Deployment Modes

Continuous scan

Monitors a specified cloud storage path (e.g., table, namespace, entire data warehouse) for new and existing data, generating reports on detected sensitive data.

On-demand scan (coming soon)

Scans a defined data set once and generates reports on sensitive data findings.

API (coming soon)

Provides programmatic access to Granica Screen's data classification and transformation capabilities.

Screening Modes

Detection Reports

Granica Screen scans data automatically, generating reports on sensitive data findings without impacting existing workflows. You can choose which data to scan anywhere in the warehouse (staging during ETL, core tables, derived data) and gain insights into the location of sensitive data.
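
As a rough illustration of what a detection report contains, the sketch below counts findings per location and category without modifying the scanned data. It is not Granica Screen's classifier: the regex patterns, the `build_detection_report` function, and the sample locations are all hypothetical, and a real classification engine would cover far more data types with far more robust detection.

```python
import re
from collections import Counter

# Illustrative patterns only (assumption); not Granica Screen's detectors.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def build_detection_report(records):
    """Count findings per (location, category); the input is never modified."""
    report = Counter()
    for location, text in records:
        for category, pattern in PATTERNS.items():
            hits = len(pattern.findall(text))
            if hits:
                report[(location, category)] += hits
    return dict(report)

# Hypothetical sample spanning staging, core, and derived data.
sample = [
    ("staging/events/part-0", "user alice@example.com logged in"),
    ("core/users/part-0", "ssn 123-45-6789, contact bob@example.org"),
    ("derived/daily_counts/part-0", "logins=42"),
]
report = build_detection_report(sample)
for (location, category), count in sorted(report.items()):
    print(f"{location}\t{category}\t{count}")
```

The report is keyed by location so that findings can be traced back to the stage of the warehouse (staging, core, or derived) where they occur.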

Data Transformation (PII removal) and Detection Reports

Granica Screen integrates into the data transformation pipeline. Configure a destination location and a transformation configuration, and Granica Screen writes the transformed data to that destination. How you set this up depends on your access control needs:

  • No sensitive data in the warehouse: The ETL process outputs data to a staging table, where Granica Screen redacts it before transferring it to the desired core tables.
  • Sensitive data in core tables, but not derived tables: As in the previous case, a dedicated staging step is used, but here it sits in front of specific derived data tables. Data is redacted in staging before it populates those tables, so sensitive data stays out of them.
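
The staging pattern in both cases can be sketched as a redaction pass between a staging location and its destination tables. This is a minimal sketch under stated assumptions, not Granica Screen's transformation engine: the `redact` and `promote` functions, the single email pattern, and the `[REDACTED_EMAIL]` placeholder are hypothetical.

```python
import re

# Illustrative pattern only (assumption); a real transformation configuration
# would cover many categories of sensitive data.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def redact(text, placeholder="[REDACTED_EMAIL]"):
    """Replace every detected email address with a placeholder."""
    return EMAIL.sub(placeholder, text)

def promote(staging_rows):
    """Return redacted copies of staging rows, ready for the destination table."""
    return [
        {col: redact(val) if isinstance(val, str) else val
         for col, val in row.items()}
        for row in staging_rows
    ]

# Hypothetical staging rows produced by the ETL process.
staging = [{"id": 1, "message": "reset link sent to alice@example.com"}]
core = promote(staging)
print(core)
```

Because `promote` returns new rows rather than editing the staging rows in place, the unredacted data never has to be written to the destination tables.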

Typical Scenarios

Self-managed data warehouse (S3/GCS storage)

  • Scenario: Logs are exported to s3://my-org-logs/.
  • Policy options:
    • Add s3://my-org-logs to the Granica policy include filter.
    • Run granica crunch s3://my-org-logs for non-crunch data.
  • Detection: Reports are written to s3://n-hawkeye-report-... and displayed in the Dojo dashboard.
  • Transformation: Cleaned logs are exported to s3://my-org-logs-cleaned/ with corresponding object names.
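
To make "corresponding object names" concrete, the sketch below maps a source object URI to its counterpart in the cleaned bucket by keeping the object key and swapping the bucket. The `cleaned_uri` helper is hypothetical and assumes the key is preserved unchanged; the bucket names are the ones used in the scenario above.

```python
def cleaned_uri(source_uri,
                source_bucket="my-org-logs",
                dest_bucket="my-org-logs-cleaned"):
    """Map a source object URI to the same key in the cleaned bucket.

    Assumption: the object key is carried over unchanged.
    """
    prefix = f"s3://{source_bucket}/"
    if not source_uri.startswith(prefix):
        raise ValueError(f"{source_uri!r} is not under {prefix!r}")
    return f"s3://{dest_bucket}/" + source_uri[len(prefix):]

print(cleaned_uri("s3://my-org-logs/2024/01/app.log"))
```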

External tables in managed data warehouse (S3/GCS storage)

  • Scenario: Consider the table creation statement:

    CREATE TABLE logs LOCATION 's3://depts/finance/my-org-logs';

  • Granica setup: The same setup as above applies. However, the external table needs to be configured to use the new output location containing the cleaned data:

    CREATE TABLE logs_cleaned LOCATION 's3://depts/finance/my-org-logs-cleaned';

Managed tables in data warehouse (Snowflake, BigQuery, Redshift, Databricks)

  • API integration (future release): Granica Screen will provide API connectors to:
    1. Get notified of new data.
    2. Read data.
    3. Write data (if transforming).
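
Since these connectors are planned for a future release, no API is documented here. The sketch below only illustrates the shape of the three steps (notification, read, write) using an entirely hypothetical interface: `WarehouseConnector`, `InMemoryConnector`, and `process` are invented for this example and are not part of any Granica API.

```python
from typing import Callable, Iterable, Protocol

class WarehouseConnector(Protocol):
    """Hypothetical connector covering the three steps listed above."""
    def new_batches(self) -> Iterable[str]: ...                   # 1. notification
    def read(self, batch_id: str) -> list[dict]: ...              # 2. read
    def write(self, batch_id: str, rows: list[dict]) -> None: ... # 3. write

class InMemoryConnector:
    """Toy stand-in for a managed warehouse, used only for illustration."""
    def __init__(self, batches):
        self.batches = dict(batches)
        self.written = {}

    def new_batches(self):
        return list(self.batches)

    def read(self, batch_id):
        return self.batches[batch_id]

    def write(self, batch_id, rows):
        self.written[batch_id] = rows

def process(connector: WarehouseConnector,
            transform: Callable[[dict], dict]) -> None:
    """For each new batch: read it, transform each row, write the result."""
    for batch_id in connector.new_batches():
        rows = connector.read(batch_id)
        connector.write(batch_id, [transform(row) for row in rows])

conn = InMemoryConnector({"b1": [{"id": 1, "email": "alice@example.com"}]})
process(conn, lambda row: {**row, "email": "[REDACTED]"})
print(conn.written)
```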

Get a demo

Contact us to get a live demo and see Granica Screen in action.
