2020 saw an increase in cloud data platforms used for operation, analytic, and data science workloads at neck-breaking speed. In a rush to get to the fastest "time to data and insights," organizations are left with no option but to think about data governance and security last. The first phase of migration to the cloud involved applications and infrastructure. Now organizations are moving their data to the cloud as well. As organizations shift into high gear with data migration to the cloud, it's time to adopt a cloud data governance and security architecture to support this massive exodus to the major cloud data platforms (Snowflake, AWS, BigQuery, Azure) at scale.
Who’s accessing your data?
DalleMule & DavenPort, in their article What's your data strategy? , say that more than 70% of employees have access to data they should not, and 80% of analysts' time is spent simply discovering and preparing data. We see this firsthand when we work with small and large organizations alike, and this is a widespread pattern. Answering the question of who has access to what data for one cloud data platform is hard enough; imagine answering this question for a multi-cloud data platform environment.
Let's say you're using Snowflake and AWS Redshift. Your critical analytic and data science workloads are spread across both. How do we solve the challenge of answering who has access to what data consistently and across those two cloud data platforms? For companies that are heavily regulated, you must answer these questions while using a specific regulatory lens such as GDPR, HIPAA, CCPA, or PCI. These regulations further complicate things.
The tension between security and innovation
The struggle for balance between complying with regulations and promoting the fastest time to data means the experience for developers, analysts, and data scientists must be pleasurable and seamless. Data governance and security historically has introduced bumps on the road to velocity. DalleMule & DavenPort’s article presents a robust data strategy framework; they look at a data strategy as a "defensive" versus an "offensive" one. The defensive strategy focuses on regulatory compliance, governance, and security controls whereas the offensive approach focuses on business and revenue generation. The key, they say, is striking a balance; and we agree.
A shared data governance and security architecture
From a technical strategy perspective, in order to implement either a defensive or offensive strategy and achieve a continually shifting balance across multiple cloud data platforms, you need a shared data governance and security architecture. This architecture must transparently observe, detect, protect, and secure all sensitive data while increasing performance over time.
Snowflake famously separated compute and storage. Data governance, security, and data should follow suit. Making the shift from embedded role-based and identity security and access controls in the cloud data platform to an external intelligent multi-cloud data governance and security architecture allows for the optimum flexibility and ability to apply consistent governance and security policies across various data sources and elements. Organizations will define data governance and security policy once and have it instantly applied in all distributed cloud data platforms.
Avoiding governance, security, and access policy lock with one cloud data platform provider will be critically important to adopt a multi-cloud strategy. Think of it this way: suppose you implement data access and security controls for data in Redshift. In that case, you can't expect the same policy to automatically be implemented consistently in your Azure, Snowflake, or Google BigQuery data workloads. This type of automation would require an open and flexible multi-cloud data governance and security architecture. It's essential to avoid the unnecessary complexity and cost of having data governance and security silos across cloud data platform providers. Unnecessary complexity doesn't make technical or business sense. Not having multi-cloud data governance and security architecture will negatively impact data observability, governance, and security costs significantly. The more data you migrate to the cloud, the more your cost increases. Worldwide data is expected to increase by 61% to 175 zettabytes, most of which will be residing in cloud infrastructures. Think about what this will do to governance and security costs across multiple cloud data platform environments.
You can’t protect what you can’t see
This massive movement of data to the cloud will require an incredibly robust data discovery and classification capability. This capability will answer where the data is and what type of data it is. AI and ML will be critical to making sense of the discovery and classification meta-data across these data workloads. You can't protect what you can't see. The discovery of vulnerable assets like data has been the age-old challenge with implementing security controls over large enterprises. With observability, discovery, and governance, you will now be inundated with a tremendous amount of data about people's access and security controls in place to mitigate potential data security risks.
Check out part two of this series to learn how a properly designed and implemented multi-cloud data governance and security architecture can reduce costs and introduce automation around data discovery, classification, and security.