What is SQL, and why is it important?
SQL: Structured Query Language.
In this blog, we’re going to explain why SQL is so important without getting too technical. It’s the query language for relational databases from Oracle, Microsoft, IBM, Snowflake, and others, which often store and process sensitive information such as personally identifiable information (PII), protected health information (PHI), and payment card (PCI) data. The often-quoted 2020 Verizon Data Breach Investigations Report shows that SQL injections are the second most prevalent attack threat (right behind credentialed threats, as seen in our last blog), so it’s certainly worth talking about.
Let’s use Snowflake (which, coincidentally, has its own version of SQL) as an example. It offers one of the best examples of a secure data environment: SSO, 2FA, RBAC, secure views, you name it. That makes it difficult to misuse data access... but not impossible. The aforementioned security features depend entirely on trusting the identity of the user. If someone can present the correct sequence of bytes over the internet to Snowflake, then they can pretend to be someone else.
In a world where you can hardly trust your food to be delivered with integrity, how can you trust a solution that depends entirely upon the validity of the user? If someone will steal your lunch, then someone with access to your data will certainly be tempted to steal it, or be targeted by criminals who want it, for reasons far more lucrative than your $3 taco.
What do we do about it?
Extend the idea of Zero Trust into the SQL layer, of course! For an in-depth look at Zero Trust, you can check out our webinar with Forrester Analyst Heidi Shey (Forrester coined the term Zero Trust, for reference). But for our purposes here, let’s just define it as “never trust, always verify”. In other words, each time someone wants access to data, confirm they should be able to do that. Think of it like an ATM: you walk up to an ATM and verify your identity in order to withdraw cash. Even after your identity is verified, you can still only take out a certain amount. If you go across town to a different ATM, you’ve got to verify your identity again to request money. But if you have reached your daily withdrawal limit, then you are done. It doesn't matter that it is actually you asking for the money; the bank assumes it isn't you.
Verifying a user’s identity means a lot more than just 2FA, SSO, and RBAC. If someone’s credentials get stolen, you’d think it’s virtually impossible to know, right? Nope.
Every time a user wants to access data – regardless of their identity, role or title – you should check their previous use of data (per minute, hour, day, week, etc.) and other factors like what device or application they’re coming from. If it doesn’t match up with typical user patterns, then you know there’s a problem.
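To make that check concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the UserBaseline record, the anomalies function, and the tolerance of three standard deviations are hypothetical and do not come from Snowflake or any particular product. It simply compares one request against what has been seen from that user before.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class UserBaseline:
    """Hypothetical record of what is normal for one user."""
    known_devices: set = field(default_factory=set)
    known_apps: set = field(default_factory=set)
    hourly_rows: list = field(default_factory=list)  # past rows-per-hour samples


def anomalies(baseline, device, app, rows_this_hour, tolerance=3.0):
    """Return the reasons a request departs from the user's history."""
    reasons = []
    if device not in baseline.known_devices:
        reasons.append("unfamiliar device")
    if app not in baseline.known_apps:
        reasons.append("unfamiliar application")
    # Only judge volume once there is some history to compare against.
    if len(baseline.hourly_rows) >= 2:
        typical = mean(baseline.hourly_rows)
        spread = pstdev(baseline.hourly_rows)
        # Assumed rule: flag anything above mean + tolerance * std deviation.
        if rows_this_hour > typical + tolerance * spread:
            reasons.append("unusual volume")
    return reasons


baseline = UserBaseline(
    known_devices={"laptop-1"},
    known_apps={"bi-tool"},
    hourly_rows=[100, 120, 80, 110, 90],
)
print(anomalies(baseline, "laptop-1", "bi-tool", 130))   # []
print(anomalies(baseline, "laptop-1", "bi-tool", 5000))  # ['unusual volume']
print(anomalies(baseline, "phone-9", "curl", 130))
```

An empty list means the request looks like this user's normal behavior. A real system would likely weigh more signals and use a more careful statistical model; the point is only that stolen credentials show up as a change in behavior, not a change in identity.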
Regardless of what information the user is trying to request, if the rate of data consumption breaks a limit that the business deems appropriate, then the user (whoever they are) should not get the data. Period. Even if their title is CEO, they should not be able to access all the PII in the table in one query. Why? Because more access = more risk.
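The hard limit described above works much like the ATM's daily cap. The Python sketch below is one assumed way to express it: a sliding time window that counts rows returned per user. The ConsumptionLimiter class, the 1,000-row budget, and the one-hour window are all hypothetical values chosen for the example, not settings from any real product.

```python
from collections import defaultdict, deque


class ConsumptionLimiter:
    """Hypothetical per-user row budget over a sliding time window."""

    def __init__(self, max_rows, window_seconds):
        self.max_rows = max_rows
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # user -> deque of (timestamp, rows)

    def _rows_in_window(self, user, now):
        events = self._events[user]
        # Drop reads that have aged out of the window.
        while events and events[0][0] <= now - self.window_seconds:
            events.popleft()
        return sum(rows for _, rows in events)

    def allow(self, user, rows_requested, now):
        """Return True and record the read if it fits in the remaining budget."""
        used = self._rows_in_window(user, now)
        if used + rows_requested > self.max_rows:
            return False  # over the limit, no matter who is asking
        self._events[user].append((now, rows_requested))
        return True


limiter = ConsumptionLimiter(max_rows=1000, window_seconds=3600)
print(limiter.allow("ceo", 600, now=0))    # True
print(limiter.allow("ceo", 600, now=60))   # False: would exceed 1000 rows
print(limiter.allow("ceo", 400, now=120))  # True: exactly at the limit
```

Notice that the check never looks at the user's role or title. The budget applies to the CEO the same way it applies to everyone else, which is the whole idea.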
Being able not only to know when too much data is being queried but also to stop it in real time is game-changing and will effectively subdue the threat of SQL injections.
Why is Snowflake a great example case?
Snowflake Cloud Data Warehouse has shown the world that separating compute from storage is the best path forward to scale data workloads. They also have an extensive security policy that should make any CISO/CIO comfortable moving data to Snowflake.
So why would Snowflake need Zero Trust at the SQL layer given the statement above? It comes down to the shared customer responsibility model that comes with using IaaS, PaaS, or, like Snowflake, SaaS. Under that model, with any SaaS provider the customer still has two very important problems left to solve: identity and data access.
Okta, for example, does a great job of solving for the identity portion of the matrix. And Snowflake has done everything possible to help with the data consumption side (I would say that using Snowflake is the safest way today to store and access data), but there is still this last remaining “what if” out there: what if someone steals my credentials? Or decides they want to do something malicious? The insider threat to data is very difficult for any organization to handle on its own.
What does Snowflake + Zero Trust SQL look like?
It starts by enhancing your Snowflake account with a solution that can detect and respond to abnormal data consumption in real time. This will give your organization complete control over each user’s data consumption, regardless of access point (due to cloud integration).
This means that every time an authorized user requests data within Snowflake, they get evaluated and verified by a Zero Trust risk engine (like the ATM example). If abnormal consumption is detected based on the policies of the governing risk engine, then you can cut off access in real time. The best part is that because it is integrated into Snowflake, users don’t see any changes to their day-to-day work, your Snowflake bill won’t increase, and your security team can finally stop credentialed breaches and SQL injection attacks for good.
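For readers who want to picture the flow, here is a rough sketch in Python of how such a risk engine might evaluate each request and cut off access. It is an illustration under stated assumptions only: RiskEngine, row_cap, approved_apps, and the request fields are hypothetical names, and nothing here is Snowflake's API or any vendor's actual implementation.

```python
class RiskEngine:
    """Hypothetical engine: every request runs through every policy."""

    def __init__(self, policies):
        # Each policy takes a request dict and returns a denial reason or None.
        self.policies = list(policies)
        self.blocked = {}  # user -> reason access was cut off

    def evaluate(self, request):
        user = request["user"]
        # Once cut off, a user stays cut off until someone reinstates them.
        if user in self.blocked:
            return False, self.blocked[user]
        for policy in self.policies:
            reason = policy(request)
            if reason is not None:
                self.blocked[user] = reason
                return False, reason
        return True, None

    def reinstate(self, user):
        self.blocked.pop(user, None)


def row_cap(limit):
    """Policy factory: deny any single query that returns more than limit rows."""
    def policy(request):
        if request["rows"] > limit:
            return f"query exceeds {limit} rows"
        return None
    return policy


def approved_apps(apps):
    """Policy factory: deny requests from applications not on the list."""
    def policy(request):
        if request["app"] not in apps:
            return "unapproved application"
        return None
    return policy


engine = RiskEngine([row_cap(500), approved_apps({"bi-tool"})])
print(engine.evaluate({"user": "ana", "rows": 10, "app": "bi-tool"}))
print(engine.evaluate({"user": "ana", "rows": 9999, "app": "bi-tool"}))
print(engine.evaluate({"user": "ana", "rows": 10, "app": "bi-tool"}))
```

The first request passes, the second trips the row cap, and the third is refused even though it looks harmless, because access has already been cut off. Keeping the policies as small separate functions is a design choice for the sketch: the business can change a limit without touching the engine itself.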