As more companies move their data to the cloud, especially for large-scale data projects such as large language models (LLMs) and other generative AI technologies, the number of cloud data breaches is rising.
According to the Harvard Business Review, over 80% of data breaches in 2023 involved data stored in the cloud, highlighting the urgent need to improve data security on cloud platforms. This guide lists some of the biggest cloud data security challenges of 2024 and discusses new best practices and technology solutions for solving them.
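While the cloud simplifies some aspects of IT operations, a lack of control over the underlying infrastructure and the need to scale rapidly for AI, edge computing, and other new technologies create challenges such as confusion over shared security responsibilities, hard-to-scale access controls, human error, exposed sensitive data, and attacks on AI models.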
The modern approach to cloud data security involves breaking down silos between security solutions, adopting policy-based access control (PBAC), using automation to reduce human error while improving efficiency, and implementing technology to identify, mask, and protect sensitive information in cloud data lake storage and AI models.
A fundamental misunderstanding of the shared responsibility model may lead some organizations to entrust most or all of their cloud security controls to their cloud service providers (CSPs). Although each cloud service type divides security obligations differently between provider and customer, the customer is always responsible for identity and access management (IAM) and data security.
The infrastructure-as-a-service (IaaS) model offers the most security control but also places the most responsibility on customer organizations. Platform-as-a-service (PaaS) and software-as-a-service (SaaS) have similar security handoff points; the main difference is application security, which the customer handles in PaaS and the provider handles in SaaS.
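To make the handoff points concrete, here is a minimal Python sketch that models the shared responsibility split as a lookup table. The layer names and the exact split are simplified assumptions for illustration, not any provider's official matrix; the only parts taken from this guide are that IAM and data security always fall to the customer and that application security separates PaaS from SaaS.

```python
from enum import Enum


class Party(Enum):
    PROVIDER = "provider"
    CUSTOMER = "customer"


# Illustrative, simplified matrix; layer names are hypothetical and real
# divisions vary by provider and by individual service.
RESPONSIBILITY = {
    "IaaS": {"physical": Party.PROVIDER, "os_and_network_config": Party.CUSTOMER,
             "application": Party.CUSTOMER, "iam": Party.CUSTOMER, "data": Party.CUSTOMER},
    "PaaS": {"physical": Party.PROVIDER, "os_and_network_config": Party.PROVIDER,
             "application": Party.CUSTOMER, "iam": Party.CUSTOMER, "data": Party.CUSTOMER},
    "SaaS": {"physical": Party.PROVIDER, "os_and_network_config": Party.PROVIDER,
             "application": Party.PROVIDER, "iam": Party.CUSTOMER, "data": Party.CUSTOMER},
}


def customer_duties(model: str) -> list:
    """Return the layers the customer must secure under the given service model."""
    return sorted(layer for layer, owner in RESPONSIBILITY[model].items()
                  if owner is Party.CUSTOMER)


# IAM and data security stay with the customer in every service model.
for duties in map(customer_duties, RESPONSIBILITY):
    assert {"iam", "data"} <= set(duties)
```

With this table, `customer_duties("SaaS")` returns `['data', 'iam']`, while IaaS adds the application and OS/network configuration layers.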
The first steps toward improving data security on cloud platforms involve understanding where the provider’s responsibilities end and taking control of cloud security aspects within your organization’s purview. Additionally, the 2023 Gartner® Predicts 2024: IAM and Data Security Combine to Solve Long-Standing Challenges report recommends that you “Optimize your IAM technology portfolio by including critical data management capabilities and practices in the core requirements for IAM solutions.”**
Many organizations find it challenging to apply the granular, role-based access controls (RBAC) favored by the zero-trust security methodology across multi-vendor, multi-cloud deployments. These policies and authentication controls are also challenging to scale efficiently alongside the surging number of authorization rules required in large data platforms, especially for LLMs and other AI models. To address these difficulties, modern cloud security approaches use data security platforms (DSPs) to implement policy-based access controls (PBAC).
PBAC combines RBAC with attribute-based access control (ABAC), which evaluates the characteristics of the requesting user, the data being accessed, the user’s attempted action, and the context of the request.
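As a rough illustration of how PBAC layers attribute checks on top of roles, the Python sketch below grants access only when a policy matches the user's role (RBAC) and its condition on user, data, action, and context attributes (ABAC) also passes. The policy names, roles, and attributes are hypothetical, and this is not how Varonis or any specific DSP represents policies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Request:
    roles: Set[str]                                          # RBAC: roles the user holds
    action: str                                              # ABAC: attempted action
    user: Dict[str, str] = field(default_factory=dict)      # ABAC: user attributes
    resource: Dict[str, str] = field(default_factory=dict)  # ABAC: data attributes
    context: Dict[str, str] = field(default_factory=dict)   # ABAC: request context


@dataclass
class Policy:
    name: str
    roles: Set[str]                       # who may use this policy (RBAC)
    actions: Set[str]                     # which actions it covers
    condition: Callable[[Request], bool]  # attribute checks (ABAC)


def is_allowed(req: Request, policies: List[Policy]) -> bool:
    """Default deny: permit only if some policy matches role, action, and attributes."""
    return any(
        bool(req.roles & p.roles) and req.action in p.actions and p.condition(req)
        for p in policies
    )


# Hypothetical example policies.
POLICIES = [
    Policy(
        name="analysts-read-internal-data-on-corporate-network",
        roles={"analyst"},
        actions={"read"},
        condition=lambda r: (r.resource.get("sensitivity") in {"public", "internal"}
                             and r.context.get("network") == "corporate"),
    ),
    Policy(
        name="privacy-officers-read-pii-in-their-own-region",
        roles={"privacy_officer"},
        actions={"read"},
        condition=lambda r: (r.resource.get("sensitivity") == "pii"
                             and r.user.get("region") is not None
                             and r.user.get("region") == r.resource.get("region")),
    ),
]
```

Because the default is deny, supporting a new dataset or role means adding a policy rather than editing a long list of per-user grants, which is the scaling benefit PBAC aims for.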
A DSP such as Varonis uses these access control mechanisms to define and enforce dynamic, context-aware data access policies that scale as data and users grow. A DSP also consolidates data access and security controls in a centralized platform so IT teams can manage security workflows consistently and efficiently across complex multi-cloud deployments.
Many mistakes occur during tedious, repetitive tasks, like configuring IP and port allow lists on cloud platforms. Fortunately, these are often the easiest workflows to automate. For example, IT teams can use a tool like Red Hat Ansible to create an infrastructure as code (IaC) playbook that automatically provisions new cloud resources with the same security settings.
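Ansible playbooks are written in YAML, but the underlying idea is language-agnostic: every new resource is rendered from one baseline so it inherits the same security settings. The Python sketch below assumes a hypothetical resource spec with an encryption flag, a public-access flag, and an ingress allow list; it is not Ansible's API.

```python
import copy
from typing import List, Optional

# Hypothetical baseline security settings shared by every new resource.
BASELINE = {
    "encryption_at_rest": True,
    "public_access": False,
    "ingress_allow_list": [{"cidr": "10.0.0.0/8", "port": 443}],
}


def render_resource(name: str, extra_ingress: Optional[List[dict]] = None) -> dict:
    """Build a resource spec from the baseline; callers can add ingress rules
    but have no way to drop the baseline settings."""
    spec = copy.deepcopy(BASELINE)  # deep copy so one resource never mutates the baseline
    spec["name"] = name
    spec["ingress_allow_list"].extend(extra_ingress or [])
    return spec


# Every resource provisioned this way starts from identical security settings.
SPECS = [render_resource(n) for n in ("db-1", "api-1")]
```

The deep copy matters: without it, appending an ingress rule for one resource would silently change the baseline for every resource provisioned afterward.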
These automation tools can also test new configurations and applications for security vulnerabilities and prevent unauthorized changes that could introduce weaknesses. Essentially, automation provides a safety net for oft-overworked IT teams to help reduce the risk of mistakes.
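A pre-deployment check is one way automation can catch risky changes before they ship. This hypothetical Python sketch compares a proposed resource spec against a baseline and flags any ingress rule open to the whole internet; the spec format and rule set are assumptions, and real policy-as-code tools cover far more cases.

```python
import ipaddress
from typing import List

# Hypothetical baseline; in practice it would come from the IaC template itself.
BASELINE = {"encryption_at_rest": True, "public_access": False}


def find_violations(spec: dict, baseline: dict = BASELINE) -> List[str]:
    """Report settings that drifted from the baseline and ingress rules open to the internet."""
    problems = []
    for key, expected in baseline.items():
        if spec.get(key) != expected:
            problems.append(f"drift: {key}={spec.get(key)!r}, expected {expected!r}")
    for rule in spec.get("ingress_allow_list", []):
        net = ipaddress.ip_network(rule["cidr"], strict=False)
        if net.prefixlen == 0:  # 0.0.0.0/0 or ::/0 admits every address
            problems.append(f"port {rule['port']} open to the entire internet ({net})")
    return problems


def gate(spec: dict) -> None:
    """Fail a deployment pipeline step if the proposed config has any violation."""
    problems = find_violations(spec)
    if problems:
        raise ValueError("; ".join(problems))
```

Running `gate` in a CI pipeline turns these checks into a hard stop, so an unauthorized change fails loudly instead of reaching production.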
Companies often expose sensitive information inadvertently because they don’t know where it’s stored and, as a result, don’t apply the proper security controls. Data security platforms typically include data discovery capabilities to find personally identifiable information (PII) and other sensitive or valuable data in cloud data stores. This feature gives IT teams centralized visibility into all the cloud platforms used across an organization so they can ensure sensitive data is properly stored and protected.
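Data discovery tools vary widely in technique, but a simplified version of the idea looks like the Python sketch below: scan records from several stores with pattern detectors and report where each PII type appears. The store names and regular expressions are illustrative assumptions; production tools typically combine patterns with context and machine learning models to reduce false positives and misses.

```python
import re
from typing import Dict, List, Tuple

# Illustrative detectors only; real discovery tools use far more robust methods.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def discover(stores: Dict[str, List[str]]) -> List[Tuple[str, int, str]]:
    """Return (store, record_index, pii_type) for every detector that fires on a record."""
    findings = []
    for store, records in stores.items():
        for index, text in enumerate(records):
            for kind, pattern in PII_PATTERNS.items():
                if pattern.search(text):
                    findings.append((store, index, kind))
    return findings


# Hypothetical store names and records.
SAMPLE_STORES = {
    "lake/raw_events": ["page_view home", "signup jane@example.com"],
    "lake/support_tickets": ["caller 555-123-4567 gave SSN 123-45-6789"],
}
```

A single findings list across every store is what gives IT teams the centralized view described above: it shows which stores need stricter controls before anyone opens the data.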
In addition, LLMs and generative AI increase risk because these models can unintentionally ingest PII and other sensitive material through training data and user prompts, making them attractive targets for cybercriminals. Completely preventing a model from taking in sensitive data is logistically challenging, especially when users unwittingly include PII and confidential information in their prompts.
Limiting data ingestion also negatively affects model accuracy, preventing companies from getting the full value from their AI investments. PII data discovery tools like Granica Screen help identify sensitive information in AI datasets stored in cloud data lakes. Screen provides data masking to automatically remove the discovered PII, and can also replace the discovered PII with realistic synthetic data to improve AI model accuracy while mitigating breach risk.
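The difference between masking and synthetic replacement can be sketched in a few lines of Python. This example only handles email addresses and uses a keyed hash (HMAC) so the same real address always maps to the same fake one, which keeps records joinable; the key, regex, and fake format are hypothetical and do not reflect how Granica Screen works internally.

```python
import hashlib
import hmac
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical secret; keying the hash means pseudonyms cannot be reversed by
# hashing guessed addresses, as long as the key stays secret.
SECRET_KEY = b"replace-with-a-managed-secret"


def mask(text: str) -> str:
    """Redact every email address with a fixed placeholder."""
    return EMAIL.sub("[EMAIL]", text)


def synthesize(text: str) -> str:
    """Swap each email for a realistic fake one; the same input always maps to the same fake."""
    def fake(match: re.Match) -> str:
        tag = hmac.new(SECRET_KEY, match.group(0).encode(), hashlib.sha256).hexdigest()[:8]
        return f"user_{tag}@example.com"
    return EMAIL.sub(fake, text)
```

Masking is simpler but leaves placeholder tokens a model cannot learn much from, while synthetic values keep the shape of real data, which is the accuracy trade-off described above.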
Malicious actors have found ways to interfere with AI models and extract enough data to identify individuals or infer sensitive information, even after PII has been removed or masked. Examples of AI attacks include inference, in which a malicious actor probes the model for enough PII-adjacent information to identify someone, and data linkage, which combines semi-anonymized model outputs with other information to fill in the blanks.
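Linkage attacks are often explained with k-anonymity, a privacy measure not named above but commonly used to reason about this risk: if a combination of quasi-identifiers (such as ZIP code and birth year) is unique in a released dataset, anyone holding an outside dataset with the same fields can re-attach names. The Python sketch below uses made-up toy records to show both the check and the join.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def _key(row: Row, quasi_ids: Sequence[str]) -> tuple:
    return tuple(row[q] for q in quasi_ids)


def k_anonymity(rows: List[Row], quasi_ids: Sequence[str]) -> int:
    """Smallest group size when rows are grouped by their quasi-identifier values."""
    return min(Counter(_key(r, quasi_ids) for r in rows).values())


def link(released: List[Row], external: List[Row],
         quasi_ids: Sequence[str]) -> List[Tuple[object, object]]:
    """Join de-identified rows to an outside dataset; a unique match can re-identify a row."""
    index: Dict[tuple, list] = {}
    for person in external:
        index.setdefault(_key(person, quasi_ids), []).append(person["name"])
    matches = []
    for row in released:
        names = index.get(_key(row, quasi_ids), [])
        if len(names) == 1:
            matches.append((names[0], row["diagnosis"]))
    return matches


# Made-up toy data: names were removed from RELEASED, but quasi-identifiers remain.
QUASI_IDS = ("zip", "birth_year")
RELEASED = [
    {"zip": "94107", "birth_year": 1980, "diagnosis": "asthma"},
    {"zip": "94107", "birth_year": 1980, "diagnosis": "flu"},
    {"zip": "10001", "birth_year": 1975, "diagnosis": "diabetes"},
]
EXTERNAL = [  # e.g. a hypothetical public directory
    {"name": "A. Smith", "zip": "10001", "birth_year": 1975},
    {"name": "B. Jones", "zip": "94107", "birth_year": 1980},
    {"name": "C. Lee", "zip": "94107", "birth_year": 1980},
]
```

Here the released data has k = 1, and the join ties the only record from ZIP 10001 with birth year 1975 to A. Smith. Raising k, for example by generalizing birth year into ranges, makes such unique matches less likely.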
AI-centric data privacy tools help prevent model tampering by monitoring inputs and outputs for malicious or unexpected content. For example, Robust Intelligence provides an AI firewall that continuously scans for vulnerabilities, validates inputs and outputs, and monitors for compliance violations.
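The input and output validation idea can be sketched as a wrapper around any model call. The Python example below uses a few naive string and regex rules as stand-ins; the marker phrases, patterns, and `guarded_call` helper are hypothetical, and this is not Robust Intelligence's API or detection method, which would rely on far more capable detectors.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical, deliberately naive rules standing in for real detectors.
INJECTION_MARKERS = ("ignore previous instructions", "reveal your system prompt")
PII_OUTPUT_PATTERNS = (
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                          # US SSN format
    re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),  # email address
)


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""


def check_input(prompt: str) -> Verdict:
    lowered = prompt.lower()
    for marker in INJECTION_MARKERS:
        if marker in lowered:
            return Verdict(False, f"possible prompt injection: {marker!r}")
    return Verdict(True)


def check_output(completion: str) -> Verdict:
    if any(p.search(completion) for p in PII_OUTPUT_PATTERNS):
        return Verdict(False, "output contains PII")
    return Verdict(True)


def guarded_call(prompt: str, model: Callable[[str], str]) -> str:
    """Validate the prompt, call the model, then validate the completion."""
    verdict = check_input(prompt)
    if not verdict.allowed:
        return f"[blocked: {verdict.reason}]"  # the model is never called
    completion = model(prompt)
    verdict = check_output(completion)
    if not verdict.allowed:
        return f"[blocked: {verdict.reason}]"
    return completion
```

Checking both directions matters: input rules catch attempts to manipulate the model, while output rules catch leaks even when a malicious prompt slips through.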
Granica Screen is a data privacy service for cloud data lake storage, data analytics, and AI applications, especially those leveraging LLMs. Screen delivers state-of-the-art accuracy across 100+ languages and 25+ regions to improve security, privacy, and compliance globally. It also has 5-10X lower cloud infrastructure costs than traditional tools, allowing companies to protect 5-10X more cloud data than other tools for the same money.
To learn more about improving data security on cloud platforms and data lakes with Granica, request a free demo.
Gartner, Predicts 2024: IAM and Data Security Combine to Solve Long-Standing Challenges, by Joerg Fritsch, Andrew Bales, Nathan Harris, Homan Farahmand, 29 November 2023.
GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.