Architecture

Understand the Granica architecture.

Granica consists of control and data planes that run entirely in your cloud environment and interact with data in your cloud data lakes and their underlying object stores. Because your data never leaves your environment, Granica services can integrate with your applications (for example, Apache Spark-based apps) to securely process and manage that data. This architecture also enables us to address some of the biggest AI challenges:

  • Increase data privacy where you’re most vulnerable. Accurately protect the sensitive information with the largest attack surface: high-volume unstructured data sets. Learn more about Granica Screen.

  • Shrink cloud costs for really big data. Reduce monthly costs by up to 80% by deeply compressing high-volume data sets. Simply pay us a percentage of the savings. Learn more about Granica Crunch.

  • Surface insights in seconds. Visualize and understand your data lake, discover new datasets for training and gain actionable insights about your files and their usage, fast. Learn more about Granica Chronicle AI.

In the context of data de-identification and data privacy with Granica Screen, applications read/write as usual. Using the Granica S3-compatible API/SDK is an optional security and privacy enhancement to protect data as it is being written by your applications. Screen does not de-identify source data in place, but rather creates a de-identified copy in a specified target location. In this way Screen is a straightforward data transformation easily integrated into applications and data pipelines.

In the context of data compression and cost control with Granica Crunch, your applications write data into buckets as usual (no changes) and that data is automatically and losslessly compressed, in place, to save you money. Your applications then read data, whether compressed or not, via the Granica S3-compatible API/SDK. Your applications can optionally write data inline through the Granica API/SDK to further increase efficiency and savings. Granica Crunch can be used on the same data as Granica Screen, in which case applications and data pipelines reading the de-identified data must use the Granica S3-compatible API/SDK.

Detailed Architecture

Some key architectural characteristics:

  • Your data never leaves your environment. Essential capabilities such as encryption for data (in transit and at rest) and fine-grained access controls are built in, providing enterprise-grade data security.
  • Our control and data planes self-deploy into a dedicated account/project and VPC in your environment and run as a single tenant, respecting all of your security policies. Our architecture is optimized to efficiently utilize multiple availability zones (AZs) for availability and reliability. Our platform leverages VPC peering to connect our services with your applications, minimizing cross-AZ charges.
  • Granica delivers the data security, compliance and control benefits of a traditional VPC combined with the vendor-managed benefits of SaaS. This is why enterprises, especially those in regulated industries with highly sensitive data such as financial services, healthcare, and government agencies, choose Granica.

Shared infrastructure

All Granica products are built on shared Granica infrastructure, the majority of which exists to ensure things “just work”. Robust infrastructure exists for data integrity, availability, elastic scaling, encryption, 100% non-disruptive upgrades, telemetry and more. Granica also provides customer-facing capabilities for reporting, billing and analytics. Let’s explore some of this internal shared infrastructure in more detail.

Multi-tenant data isolation

If you have multiple end-user customers or tenants in your environment you may want (or need) to separate and isolate the data from each tenant for security and compliance purposes. But properly implementing and maintaining data isolation across your apps can be a real challenge.

Granica makes data isolation easy, with out-of-the-box support for complete data isolation of crunched data via separate buckets and associated access controls for each tenant. Just read and write through the Granica API and isolation happens automatically. Check out our data isolation page for more details.

Data integrity (Granica Crunch)

Granica implements multiple levels of data integrity to ensure your data is always protected. In fact, we have more lines of integrity code than compression code.

Object integrity

  1. Crunch-time object integrity validation. Immediately after Crunch reduces a source object, Granica performs MD5 validation against the source object.
  2. Pre-clean-up object integrity validation. Before allowing Crunch to clean up source objects (i.e. DELETE them to start saving money), Granica performs another MD5 validation with the associated reduced objects.
  3. Background object integrity validation. Automatically and in the background, Granica randomly selects reduced objects, hydrates them, computes MD5 values, and compares them with the MD5 values obtained when the object was first reduced, i.e. at crunch-time.
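The three validation stages above can be sketched in a few lines of Python. This is a minimal illustration only, not Granica's implementation: `zlib` stands in for Crunch's compression, and the function names (`reduce_object`, `safe_to_clean_up`, `background_check`) are hypothetical.

```python
import hashlib
import zlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def reduce_object(source: bytes) -> tuple[bytes, str]:
    """Stage 1: reduce the object, then validate it against the source MD5.

    zlib is only a stand-in for the real compression step (assumption).
    Returns the reduced bytes and the MD5 recorded at crunch time.
    """
    source_md5 = md5_hex(source)
    reduced = zlib.compress(source)
    if md5_hex(zlib.decompress(reduced)) != source_md5:
        raise ValueError("crunch-time integrity validation failed")
    return reduced, source_md5


def safe_to_clean_up(source: bytes, reduced: bytes) -> bool:
    """Stage 2: re-validate before the source object may be deleted."""
    return md5_hex(zlib.decompress(reduced)) == md5_hex(source)


def background_check(reduced: bytes, crunch_time_md5: str) -> bool:
    """Stage 3: hydrate a reduced object and compare with the crunch-time MD5."""
    return md5_hex(zlib.decompress(reduced)) == crunch_time_md5
```

In this sketch the source object would only be deleted after `safe_to_clean_up` returns `True`, and `background_check` would be run later on randomly sampled reduced objects.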

Metadata integrity

All Granica metadata is stored in multiple locations, both in cloud storage and on each node. Granica runs full metadata integrity checks in the background every hour, using these redundant copies.
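One way to picture a check across redundant copies is to digest each copy and flag any that disagree. The sketch below is purely illustrative: the source does not describe how Granica compares copies, so the majority-vote rule, the SHA-256 digest, and the names `digest` and `find_divergent_copies` are all assumptions.

```python
import hashlib
import json
from collections import Counter


def digest(metadata: dict) -> str:
    """Digest a metadata record in a canonical (key-sorted) JSON form."""
    canonical = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def find_divergent_copies(copies: dict[str, dict]) -> list[str]:
    """Return the names of copies whose digest differs from the majority.

    Assumption: the most common digest is treated as the correct one.
    """
    digests = {name: digest(meta) for name, meta in copies.items()}
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, d in digests.items() if d != majority)
```

An empty result means all copies agree; any returned name identifies a copy that would need repair from the others.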

Integrity failure handling

In the unlikely event of an integrity failure (whether object or metadata), our team is alerted and Granica stops Crunch from crunching new objects, directing it to store them unreduced instead. Once the integrity failure is resolved, Granica allows Crunch to resume crunching incoming objects, as well as any objects that were temporarily stored in unreduced form.
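The failure-handling behavior described above amounts to a small state machine. The following is a hedged sketch of that logic only; the `CrunchGate` class and its method names are hypothetical and do not correspond to any Granica API.

```python
from dataclasses import dataclass, field


@dataclass
class CrunchGate:
    """Hypothetical model of pausing and resuming reduction around a failure."""

    healthy: bool = True
    reduced: list[str] = field(default_factory=list)
    unreduced_backlog: list[str] = field(default_factory=list)
    alerts: list[str] = field(default_factory=list)

    def report_integrity_failure(self, detail: str) -> None:
        # Alert the team and stop reducing new objects.
        self.healthy = False
        self.alerts.append(detail)

    def ingest(self, key: str) -> str:
        # While unhealthy, objects are stored as-is rather than reduced.
        if self.healthy:
            self.reduced.append(key)
            return "reduced"
        self.unreduced_backlog.append(key)
        return "unreduced"

    def resolve(self) -> None:
        # Resume reduction and work through the temporarily unreduced objects.
        self.healthy = True
        self.reduced.extend(self.unreduced_backlog)
        self.unreduced_backlog.clear()
```

The point of the design is that writes keep succeeding during an incident: objects are still stored, just not reduced until the failure is resolved.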

High availability (HA)

Granica provides >99.99% high availability (HA) for Granica products, building upon cloud-native primitives such as AWS EKS Availability Groups. We deliver this availability extremely cost-efficiently by automatically leveraging spot instances where possible.

All efficiency services are implemented using Kubernetes pods running across a cluster of compute instances or nodes. These pods are made highly available using a minimum 2-node cluster consisting of 24x7 on-demand instances. As you enable more applications to use our services, Granica elastically spins up additional spot instances and service pods to handle the increased load on the system. We distribute these pods across the cluster instances, using a Broker pod to manage requests to these distributed service pods.

In addition to providing HA for service pods running on the 2-node on-demand cluster, Granica provides HA for service pods distributed across any spot instances. In this way we deliver >99.99% availability for all pods across all instances, even if no spot instances are available.

Elastic scaling

Granica takes full advantage of the inherent elasticity of public cloud infrastructure to ensure consistent performance regardless of your data scale.

For reads, the Crunch data compression service delivers sustained per-node throughput of up to 3 GB/s, nearly saturating a 25 Gbps network connection. And for writes, Crunch delivers sustained per-node throughput of up to 1.5 GB/s, nearly saturating a 12.5 Gbps network connection. All compute resources are also completely elastic, automatically and dynamically scaling from zero to n nodes and potentially back to zero depending on load. In this way Granica delivers effectively infinite scalability for your environment.
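The "nearly saturating" figures follow from converting bytes to bits (1 GB/s = 8 Gbps). A short worked check, with hypothetical helper names:

```python
def to_gbps(gigabytes_per_s: float) -> float:
    """Convert a throughput in GB/s to Gbps (8 bits per byte)."""
    return gigabytes_per_s * 8


def link_utilization(gigabytes_per_s: float, link_gbps: float) -> float:
    """Fraction of a network link consumed by the given throughput."""
    return to_gbps(gigabytes_per_s) / link_gbps


# Reads: 3 GB/s is 24 Gbps on a 25 Gbps link.
print(f"reads:  {link_utilization(3.0, 25.0):.0%}")   # reads:  96%
# Writes: 1.5 GB/s is 12 Gbps on a 12.5 Gbps link.
print(f"writes: {link_utilization(1.5, 12.5):.0%}")   # writes: 96%
```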

Whenever possible, Granica utilizes the largest available spot instances for these elastic nodes in order to complete compute work as fast as possible (and then shut down), thus maximizing performance with minimum cloud infrastructure costs.

Regarding read latency, Crunch pods on each node cache objects locally both in memory and on NVMe SSD, thus dramatically reducing read latency for data in the cache relative to vanilla S3 and GCS. For cache misses, Crunch adds less than 50 ms of additional latency relative to vanilla S3 and GCS reads. Regarding writes, Crunch adds less than 100 ms of additional latency on the write path relative to vanilla S3 and GCS. For most situations, this small added read/write latency makes no difference to end users, and also doesn't affect throughput.

Encryption

The Granica data path runs entirely within your VPC and respects your security policies.

Your data never leaves your cloud environment, and Granica (and Crunch by extension) relies upon the native encryption services provided by your chosen public cloud provider (e.g. S3 server-side encryption) to encrypt both data at rest and data in motion. As a result, all data at rest is AES-256 encrypted, and all data in motion is TLS encrypted.

Granica also makes Amazon S3 more secure than it is natively, particularly when it comes to buckets with public access. Unlike the ACL-by-ACL approach AWS provides, Granica offers global policies for data access with a centralized Open Policy Agent and Gatekeeper services account. Your apps can continue to define their own ACLs, and then handshake with Granica for a global allow/deny decision, for example to globally block public access.
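The layering of per-app ACLs under a global policy can be illustrated with a small decision function. This is a conceptual sketch only: real Open Policy Agent policies are written in Rego, and the `Request` type, the ACL shape, and the `block_public_access` flag here are hypothetical stand-ins rather than Granica's policy model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    principal: str
    bucket: str
    public: bool  # True if the request is anonymous/public access


def app_acl_allows(acl: dict[str, set[str]], req: Request) -> bool:
    """Per-app ACL check: the bucket lists the principal, or "*" for public."""
    allowed = acl.get(req.bucket, set())
    return req.principal in allowed or (req.public and "*" in allowed)


def global_policy_allows(req: Request, block_public_access: bool) -> bool:
    """Global policy check (assumed rule): optionally deny all public access."""
    return not (block_public_access and req.public)


def is_allowed(acl: dict[str, set[str]], req: Request, block_public_access: bool) -> bool:
    """A request must pass both the app's own ACL and the global policy."""
    return app_acl_allows(acl, req) and global_policy_allows(req, block_public_access)
```

With `block_public_access=True`, a bucket whose ACL grants `"*"` still denies anonymous requests, while named principals remain governed by the app's ACL.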

Non-disruptive upgrades

Granica upgrades are transparent to your applications. This enables you to easily take advantage of new features and capabilities and maximize the value you get from our platform.

Granica implements a rolling upgrade (and rollback) approach across all service pods and containers, as well as underlying Kubernetes cluster infrastructure. This approach ensures that upgrades do not affect availability or performance. By default, your Granica deployment stays up to date with new versions automatically. You can manually initiate an upgrade at any time via the granica update command. And you can optionally configure upgrades to only happen when initiated manually.

See also