Many of us have become more aware of the power of increased knowledge around our activities over the last few years – whether it’s a FitBit monitoring our steps or an energy audit delivering a detailed view of how everyone in your home uses lights, appliances, electronics, and other things that need power. Each month your utility company monitors your usage, and the details can help you recognize ways to lower your bill and identify current problems that are making your home less energy efficient. You can do the same with data usage observability. By capturing and monitoring who is running queries on data and when, you can make informed decisions to prevent data breaches and leaks. ALTR Heat Maps and Analytics Dashboards can make this information easily viewable and digestible so you can see where the issues might be.
This blog provides a high-level explanation of what data observability is and why it’s important, how it works if you do it manually in Snowflake, and how it works if you use ALTR to automate the process. We’ve also included a few use cases and a how-to demonstration video. We hope you find it helpful!
What is Data Observability and Why is It Important?
At a high level, data observability is presenting information about how users are accessing data in an easy-to-consume visual format. Operationalizing this through data observability tools is critical to helping you understand what’s occurring and to gauging abnormal events. The payoff for using ALTR's data usage observability tools is that they provide the information needed to meet two key data security policy goals:
- Ensure that you have policies in place for all roles who access sensitive data
- Help you understand what normal access looks like because you can't identify what is abnormal without a baseline to compare to
Snowflake logs capture this access information, and the events shown can help your Data Security team spot issues and minimize time spent on bottlenecks, speed, or other problems. But this information is delivered in a plain text format that requires a lot of work to extract those insights from.
How Snowflake Data Observability Works if you DIY
Snowflake provides the foundational query history data needed for data usage analytics via Snowflake logs; however, to be useful, the data must be processed to get it in a visual form that is easy to interpret.
To do data observability manually in Snowflake, you must follow the steps below.
- Parse the SQL query text to extract a list of columns that you’d like to request and then filter it to only include columns that contain sensitive data. NOTE: This will require you to write SQL statements.
- Next, tabulate the count of records that each user has accessed each column for each minute, hour, and day of the past 24 hours. NOTE: This will require even more SQL coding.
- Last, convert the data set into an interactive visual chart that will display the information in a more understandable format to view the results and drill through them. NOTE: This will require full stack development skills to implement.
As you can see from the steps that are required, data observability done manually will require more time and lots of coding.
How Snowflake Data Observability Works Using ALTR to Automate
ALTR's Dashboard provides a high-level view of everything that’s happening in ALTR. For example, it will show you how many locks and thresholds you have, open anomalies, databases that are connected, columns that are governed, and other detailed data.
The built-in ALTR Heatmap (also what we refer to as ‘Data Analytics’ or ‘Query Analytics’) delivers data observability in a visual representation of how your users are accessing data. It shows you the roles and specific users in those roles who are querying different data sets. The analytics will give insight about how your data is being used to help you identify where you need to assign policy or if you’ve already assigned policy to confirm how people are querying data within those policies.
When you hover over the heatmap (as shown in figure 4) you can see the total number of values accessed by your assigned user groups in the columns you’re governing. You also can drill down and view a more granular level of what data is accessed by your specific users, user groups, and data types.
If you check Add Data Usage Analytics when connecting your data source, ALTR will import the past 30 days of Snowflake's history to show you your organization's usage trends. From there, your query history will automatically sync daily on all columns in your connected database. See figure 3 for context.
Snowflake Data Observability Use Cases
Here are a couple of use case examples where ALTR’s automated data observability capability can benefit your business as it scales with Snowflake usage.
Use Case 1. You want to determine a typical consumption pattern and restrict access to no more data than is normal for the user’s role.
You’re an Administrator who has already created policies on your different column-level data but want to determine if you should create an additional one to your Credit Card column. You could view the data usage over the last 7 or 30 days to see what a typical consumption pattern is and then decide what to set your time or access-based threshold to.
Use Case 2. You want to determine if anything looks strange that may require action and ensure all roles accessing data have policies that cover them.
You’re a Data Scientist and want to confirm that the right user groups are accessing column-level data that you’ve created policies for. If anything looks strange (for example, certain roles are querying data on the weekends instead of your business hours) then you can determine if you need to block access or trigger an alert (anomaly) to protect your data.
Automate Snowflake Data Observability with ALTR
By operationalizing metrics through ALTR’s data observability tools , you can minimize data breaches and make informed decisions to stay in compliance with SLAs and certain regulations. The detailed data that the ALTR Dashboard and Analytics (Heatmap) provides is a must-have for an effective data security strategy. We’ve made using ALTR to view everything that’s going on within it and your analytics so convenient that you don’t have to write any SQL code. It’s a simple point-and-click in ALTR and you’re done!