The Evolution of Data Warehouses
The past 15 years have seen great improvements in the cost, scalability, flexibility, and simplicity of data warehouses. Today, companies can use cloud data warehouses (CDWs) from vendors like Snowflake, Amazon Redshift, and Google BigQuery to collect and analyze data from many internal sources to improve operations and increase business insights.
CDWs make it simple to store vast quantities of data cheaply, with minimal configuration effort and zero new hardware. Organizations can swipe a credit card for immediate setup of multiple CDWs under the control of individuals and small teams who can manage data easily and scale up or down as needed.
Contrast this to how it was in 2005, when establishing a traditional on-premises enterprise data warehouse (EDW) involved an initial seven-figure price tag, a single shared project, an inflexible, highly structured environment, and a complex access process. Flash forward to 2020: CDWs empower data-driven enterprises by democratizing data access.
The Missing Piece: Data Governance
Unfortunately, there’s a catch. Although traditional EDWs were ponderous and expensive, the up-front planning and budgeting they required meant that they were typically implemented with thoughtful governance and oversight already in place. Not so much with CDWs.
While CDW vendors do supply important security measures, they’re not in the business of providing detailed data access governance. Meanwhile, the emphasis on speed and direct access for a wide user base means that organizations using CDWs must contend with important gaps in oversight and protection that are not addressed by traditional governance approaches.
The good news is that a small group of the right people within an organization can work quickly to improve data access governance for CDWs and fill those gaps while keeping compliance and ease of use firmly in mind. Although the titles represented in this group will vary, they usually include data scientists, data engineers, and the Chief Privacy Officer or an equivalent role. The starting point for this process is determining what constitutes sensitive data for the purposes of the organization.
Context is Crucial for Identifying Sensitive Data
When the U.S. Privacy Act was passed in 1974, much corporate data was still kept in paper files stored in filing cabinets. Access to those files could be easily monitored, granted, or denied, and the most sensitive information could be given special security measures. There was certainly no expectation that most of the critical data within a company would be available via computer.
By contrast, the vast amounts of data that EDWs and CDWs store, and the ready access that so many users have to that data, make it crucial for each organization to determine just what it should regard as “sensitive data.” There’s a fine line between protecting too much data and not protecting enough; it’s up to each organization to determine its own best practices.
To decide which data is sensitive, context is key. Consider the example of IP address data. An IP address might be considered personal data because it identifies an individual computer, but in many cases the IP address by itself is not necessarily sensitive data. It depends on how it is being used. Many organizations, in fact, routinely collect IP addresses to analyze traffic to their websites. Now imagine that sort of innocuous data collection — but for a health organization whose website helps users find HIV services. The organization would rightly feel an obligation to treat its IP address data as sensitive. Otherwise, it would run the risk of lax data governance exposing the sensitive health information of its user base.
The point is clear: Discussing an element of data in isolation, even a seemingly simple one, just isn’t enough. When in doubt, make these considerations more personal so that you can better consider the impacts. Ask two questions:
- How would I feel if this were my data?
- How would I feel if suddenly all this data were gone, or leaked to the outside world?
While regulations like the EU’s General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and others are important to know and keep up with, they are also constantly changing. When building a CDW infrastructure, the most efficient place to put effort is in three foundational concepts that these privacy regulations are based on:
- Transparency: “Here’s the data we collect, and here’s how we use it.”
- Minimization: “We collect only the minimum data needed for these specific purposes.”
- Use limitation: “We limit how we use the data to the specific purposes designated.”
An organization that builds its CDW infrastructure on these three principles will meet the large majority of these regulations’ requirements (roughly 95 percent, as a rule of thumb). It can confidently leave the remaining 5 percent to someone on the implementation team who knows the updated regulations inside and out.
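As a minimal sketch of how the three principles might be made checkable, the Python below keeps a small data inventory in which every collected field carries a description (transparency), only listed fields may be collected (minimization), and each field may be used only for its declared purposes (use limitation). The field names, purposes, and the `INVENTORY`, `may_collect`, and `may_use` names are hypothetical illustrations, not part of any regulation or product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRecord:
    """One entry in a hypothetical data inventory."""
    name: str
    description: str      # transparency: what is collected and why
    purposes: frozenset   # use limitation: the only approved purposes


# Minimization: a field that is not listed here is not collected at all.
# All entries are illustrative assumptions.
INVENTORY = {
    "ip_address": FieldRecord(
        "ip_address",
        "Visitor IP address, used to analyze website traffic",
        frozenset({"traffic_analytics"}),
    ),
    "email": FieldRecord(
        "email",
        "Contact address, used to send account notices",
        frozenset({"account_notices"}),
    ),
}


def may_collect(field_name: str) -> bool:
    """A field may be collected only if it appears in the inventory."""
    return field_name in INVENTORY


def may_use(field_name: str, purpose: str) -> bool:
    """A field may be used only for a purpose declared in its record."""
    record = INVENTORY.get(field_name)
    return record is not None and purpose in record.purposes
```

For example, `may_use("ip_address", "marketing")` is false under this inventory because marketing was never declared as a purpose for that field.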
How to Fill the Gaps in CDW Data Access Governance
The data experts who establish and review these principles will formulate good policies and supply your organization with the advice it needs. However, even the best advice is meaningless without the right technology to implement it.
The right mix of technology to address data access governance will operate at three levels:
Level One — Identity Authentication
Identity authentication is the first line of defense for access governance. It typically uses multi-factor authentication to make sure that a user is who they say they are. By itself, this is not enough, since many security and privacy events occur when user credentials or identities are shared or compromised. (According to Verizon’s 2020 Data Breach Investigations Report, stolen or misused credentials were a factor in 37% of breaches.)
Level Two — Data Resource Access Controls
By guarding access to specific data stores, tables, and fields, these essential controls determine who gets to see which data once they have logged in. To take one example, think of all the many people who should have access to the names in an employee directory, versus the very few people who should have access to those employees’ dates of birth and Social Security Numbers. Unfortunately, minimizing or removing access is difficult to implement and scale with traditional technologies.
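To make the employee-directory example concrete, here is a minimal Python sketch of field-level access control. It assumes a hypothetical `COLUMN_GRANTS` mapping from role to visible columns; real CDWs express this through their own grant and masking mechanisms, which are not shown here.

```python
# Hypothetical role-to-column grants for an employee table.
# Role names and column names are illustrative assumptions.
COLUMN_GRANTS = {
    "all_staff": {"name", "department"},
    "hr_admin": {"name", "department", "date_of_birth", "ssn"},
}


def project_row(role: str, row: dict) -> dict:
    """Return only the columns the role is granted; unknown roles see nothing."""
    allowed = COLUMN_GRANTS.get(role, set())
    return {column: value for column, value in row.items() if column in allowed}


employee = {
    "name": "Ada Example",
    "department": "Engineering",
    "date_of_birth": "1990-01-01",
    "ssn": "000-00-0000",
}

directory_view = project_row("all_staff", employee)  # name and department only
hr_view = project_row("hr_admin", employee)          # all four fields
```

Defaulting to an empty set for unknown roles keeps the sketch deny-by-default, which is usually the safer starting point.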
Level Three — Consumption Controls
These query-level controls enforce who is allowed to consume which data, how much data at a time, when, and from where. This new approach, which is focused on consumption, enables the use of complex and powerful governance policies.
This third level is essential to handle data access governance and fill the gaps left by traditional approaches; historically, however, it has been an exhaustingly manual process. Technically, organizations could fall back on older technology to fill some of these gaps, for example by using a proxy to manage access to the CDW. But that would hamper performance, defeating one of the main purposes of a CDW, while leaving many vulnerabilities unaddressed.
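The who, how much, when, and from where checks described for Level Three can be sketched as a single policy evaluation. The Python below is an illustration under stated assumptions: the `ConsumptionPolicy` and `QueryRequest` types, the row limit, the working-hours window, and the network range are all hypothetical, and a real system would evaluate such rules inside or alongside the warehouse rather than in application code.

```python
import ipaddress
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class ConsumptionPolicy:
    role: str             # who
    max_rows: int         # how much data at a time
    start: time           # when: window start (same-day windows only)
    end: time             # when: window end
    allowed_network: str  # from where, as a CIDR range


@dataclass(frozen=True)
class QueryRequest:
    role: str
    rows_requested: int
    at: time
    source_ip: str


def violations(policy: ConsumptionPolicy, request: QueryRequest) -> list:
    """Return the list of policy dimensions the request violates."""
    problems = []
    if request.role != policy.role:
        problems.append("role")
    if request.rows_requested > policy.max_rows:
        problems.append("volume")
    if not (policy.start <= request.at <= policy.end):
        problems.append("time")
    network = ipaddress.ip_network(policy.allowed_network)
    if ipaddress.ip_address(request.source_ip) not in network:
        problems.append("location")
    return problems


# Hypothetical policy: analysts, up to 10,000 rows, 08:00-18:00, internal network.
ANALYST_POLICY = ConsumptionPolicy("analyst", 10_000, time(8, 0), time(18, 0), "10.0.0.0/8")
```

An empty list means the request is within policy; returning the violated dimensions rather than a bare yes or no leaves room for graduated responses such as masking or throttling.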
With an understanding in place for how the company will address each level of access governance, the team can now start mapping what data access will look like for different roles in the organization.
How Mapping Consumption Control Works in Practice
Begin by looking at how many people in different roles have access to different data products within the organization — from more technical ones like data science applications and raw SQL tables to processed business outputs like financial reports and operational dashboards.
Make sure that the details discussed are put into the context of people’s daily work and clearly understood by all involved, whether the use cases are more technical or less.
Map out levels of access according to different user roles in a table like the one below. For example, data scientists will often have unlimited access to many types of data, but there will be only a few of them in comparison to casual business users. Keep building out the table across roles to the casual users, agreeing on access and consumption levels at each stage. Keep in mind that, while risk does increase as the number of users increases, the opportunity for consumption controls increases, too. The biggest group, which contains the casual users, often has view-only access to data, and only in the form of charts or dashboards.
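One hypothetical way to capture such a mapping so that it can be reviewed and versioned is as plain data. In the Python sketch below, the data scientist and casual user rows follow the description above; the intermediate `business_analyst` role, all headcounts, and the access labels are illustrative assumptions, not figures from any real organization.

```python
# Hypothetical role-to-access map. Headcounts and the middle role are
# illustrative assumptions only.
ACCESS_MAP = [
    {"role": "data_scientist", "headcount": 5,
     "products": ["raw SQL tables", "data science applications"],
     "access": "unlimited"},
    {"role": "business_analyst", "headcount": 40,
     "products": ["financial reports"],
     "access": "row-limited"},
    {"role": "casual_user", "headcount": 500,
     "products": ["operational dashboards"],
     "access": "view-only"},
]


def format_access_table(rows: list) -> str:
    """Render the access map as a fixed-width text table."""
    lines = [f"{'Role':<18}{'Users':>6}  {'Access':<12}Data products"]
    for row in rows:
        lines.append(
            f"{row['role']:<18}{row['headcount']:>6}  "
            f"{row['access']:<12}{', '.join(row['products'])}"
        )
    return "\n".join(lines)


print(format_access_table(ACCESS_MAP))
```

Keeping the map as data rather than prose makes it easier to revisit when roles or responsibilities change.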
Remember that these roles are not static, in terms of either responsibilities or who holds the jobs, so it’s important that governance rules not become static either. Make a plan to reevaluate at regular intervals, and to evolve your organization’s approach. Keep in mind the guiding principle that the easier it is for each user to securely access the information they need to do their job, the better.
Pairing Ease of Use With a Zero-Trust Approach
When thinking about data security, we can learn important lessons from the financial services industry. In particular, consider banks’ twin focus on ease of use and what cybersecurity experts call “zero trust”: an environment in which every user must be validated every time they try to access resources. From ATMs to mobile banking, banks know that if they don’t make their services easy to use, no one will want to use them. Yet they also must keep rigorous security measures and consumption controls in place to prevent money from falling into the wrong hands.
Consider the example of an ATM. When someone uses one, it’s not enough that they’re a bank customer with the correct PIN; the system will enforce very specific limitations on whatever they try to do. Before they can make a withdrawal or transfer, the system checks that the money is available. Rather than allow them to clean out an account all at once, the system enforces a single-transaction or daily limit to prevent that. And if they finish their transaction, walk away, and then walk back when they remember something else they meant to do, the system makes them go through authentication again. All while a camera records every move.
Query-level data governance for a CDW treats data much like an ATM treats cash. It puts zero-trust limitations in place so that unusual behavior can be intercepted at any point along the way. The system implements observability for every data request, tracking who is making the query, which data (and how much of it) is being requested, when the request is happening, and where the request comes from. As needed, the system implements controls on consumption by masking sensitive data that a user shouldn’t see, slowing down requests that come at unusual times or request too much data, and simply blocking requests that violate certain rules or thresholds. Finally, it records an immutable log of what happened — including queries, the resultant datasets, and administrative actions — so that important patterns in data access can be uncovered and any anomalies can be analyzed for their security and compliance ramifications.
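The sequence described above (observe each request, then allow, mask, slow down, or block it, then log the outcome) can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the thresholds, the role name, the `decide`, `mask`, and `AuditLog` names, and the use of a SHA-256 hash chain, which makes tampering detectable rather than making the log truly immutable.

```python
import hashlib
import json


def decide(role: str, rows_requested: int, hour: int, wants_sensitive: bool) -> str:
    """Return 'block', 'throttle', 'mask', or 'allow' (thresholds are hypothetical)."""
    if rows_requested > 1_000_000:
        return "block"       # violates a hard volume threshold
    if hour < 6 or hour >= 22 or rows_requested > 100_000:
        return "throttle"    # unusual time or unusually large request
    if wants_sensitive and role != "privacy_officer":
        return "mask"        # user may query, but not see sensitive values
    return "allow"


def mask(value, keep: int = 4) -> str:
    """Hide all but the last `keep` characters; fully hide short values."""
    text = str(value)
    if len(text) <= keep:
        return "*" * len(text)
    return "*" * (len(text) - keep) + text[-keep:]


class AuditLog:
    """Append-only log in which each entry's hash covers the previous hash."""

    GENESIS = "0" * 64

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(prev: str, event: dict) -> str:
        payload = json.dumps(event, sort_keys=True)
        return hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()

    def append(self, event: dict) -> str:
        prev = self._entries[-1]["hash"] if self._entries else self.GENESIS
        digest = self._digest(prev, event)
        self._entries.append({"event": event, "prev": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = self.GENESIS
        for entry in self._entries:
            if entry["prev"] != prev or self._digest(prev, entry["event"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

A caller would record each request together with the decision, for example `log.append({"user": "u1", "rows": 500, "decision": decide("analyst", 500, 10, True)})`, and run `log.verify()` during audits.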
This allows data users who are working within the scope of their roles to get their jobs done with no impediments, while also preventing improper credentialed access to data, whether accidental or intentional. It’s a smart approach to data governance that mitigates risk.
The Power of DSaaS to Protect Sensitive Data in the CDW
Data security as a service (DSaaS) gives organizations the most control, the most visibility, and the most context for data governance, all at the query level, without compromising the flexibility, speed, or scalability of the CDW. DSaaS distributes security across all of your data wherever it is, with virtually no impact on performance.
When looking for a DSaaS solution, make sure it is optimized to accomplish four key goals:
1. Match cloud with cloud: use cloud-native tools when using a CDW.
Not all technology is made to integrate with cloud solutions. Rather than using bolted-on tools that aren’t cloud-native, look for tools that are purpose-built to run in the cloud to boost the value of the CDW. Cloud-native tools make data access governance easier to implement, and empower the organization to leverage the cloud intelligently.
2. Future-proof security and privacy compliance: enable effective data governance without constant re-engineering.
Data regulations are continually changing, as are people’s perceptions of data privacy. Create instrumentation and embed tools that allow changes to be adopted as the data-privacy landscape changes. DSaaS makes it easy for administrators, compliance officers, and security personnel to establish and adapt rulesets that govern the flow of data, without requiring developers to code and test the logic from the ground up.
3. Balance data security with innovation: allow data users to get what they want without creating trouble.
By placing security and governance within the application itself, DSaaS enables granular governance so that each user can access the data they need to get their job done — but only that data, and only at times and in quantities that make sense. Beyond that, analytical insights created by DSaaS allow administrators to gain better understanding of data consumption within the organization. From there, they can enact the best policies and rules to follow privacy and compliance regulations without stifling users.
4. Adopt a transactional mindset: fill gaps with granular visibility and control.
Broad policies allow things to slip through the cracks. DSaaS works all the way down to the level of individual queries and applies zero trust consistently, not only when a user begins a CDW session but also at each step along the way. Even better, it does so in real time, supporting intelligent policies with instant enforcement to protect the organization and minimize security, privacy, and compliance risks.
Freedom of Data Access and Control of Consumption Go Hand in Hand
CDWs are ushering in a new era of convenience in data access. But that broader access brings with it critical concerns about data governance, especially in light of new privacy and security regulations. With DSaaS in place covering all of these aspects, organizations are uniquely set up not only to monitor access or raise a flag when someone breaks the rules, but also to cut off access and consumption in real time, all without slowing down any other functionality.
Freedom of data and control of consumption aren’t as antithetical as they may seem. Through DSaaS, there is a new opportunity to control both access to data and consumption of that data. Combining observability with that level of control keeps data safer than ever.
CDWs increase the value of data, while DSaaS reduces the attendant risks. Using both together enables organizations to improve privacy and compliance while taking full advantage of the portability, scalability, innovation, and speed of the cloud. Whether you’re responsible for implementing a security solution or not, you still play a part in limiting the risk to your organization’s most valuable asset: its data. Because applications like cloud data warehouses are so easy to set up, convenience often overshadows security. With DSaaS, you finally get both.