What is tokenization of data?
A few years ago, a handful of “tokens” used to be as good as gold at the local arcade. It meant a chance to master Skee-Ball or prove yourself a pinball wizard by getting your initials on the leaderboard. But what's "tokenization of data?” It's kind of the same thing except instead of exchanging money for tokens, you exchange sensitive data. It's a data security solution alternative to encryption or anonymization. By substituting a “token” with no intrinsic value in place of sensitive data, such as social security numbers or birth dates that do have value, usually quite a lot of it, companies can keep the original data safe in a secure vault, while moving tokenized data throughout the business.
And today, one of the places just about every business is moving data is to the cloud. Companies may be using cloud storage to replace legacy onsite hardware or to consolidate data from across the business to enable BI and analysis without affecting performance of operational systems. To get the most of this analysis, companies often need to include sensitive data.
Tokenization of data is ideal for sensitive data security in the cloud data warehouse environment for at least 3 reasons:
#1 Tokens have no mathematical relationship to the original data, which means unlike encrypted data, they can’t be broken or returned to their original form.
While many of us might think encryption is one of the strongest ways to protect stored data, it has a few weaknesses, including this big one: the encrypted information is simply a version of the original plain text data, scrambled by math. If a hacker gets their hands on a set of encrypted data and the key, they essentially have the source data. That means breaches of sensitive PII, even of encrypted data, require reporting under state data privacy laws. Tokenization on the other hand, replaces the plain text data with a completely unrelated “token” that has no value if breached. Unlike encryption, there is no mathematical formula or “key” to unlocking the data – the real data remains secure in a token vault.
#2 Tokens can be made to match the relationships and distinctness of the original data so that meta-analysis can be performed on tokenized data.
When one of the main goals of moving data to the cloud is to make it available for analytics, tokenizing the data delivers a distinct advantage: actions such as counts of new users, lookups of users in specific locations, and joins of data for the same user from multiple systems can be done on the secure, tokenized data. Analysts can gain insight and find high-level trends without requiring access to the plain text sensitive data. Standard encrypted data, on the other hand, must be decrypted to operate on, and once the data is decrypted there’s no guarantee it will be deleted and not be forgotten, unsecured, in the user’s download folder. As companies seek to comply with data privacy regulations, demonstrating to auditors that access to raw PII is as limited as possible is also a huge bonus. Tokenization allows you to feed tokenized data directly from Snowflake into whatever application needs it, without requiring data to be unencrypted and potentially inadvertently exposed to privileged users.
#3 Tokens maintain a connection to the original data, so analysis can be drilled down to the individual as needed.
Anonymized data is a security alternative that removes the personally identifiable information by grouping data into ranges. It can keep sensitive data safe while still allowing for high-level analysis. For example, you may group customers by age range or general location, removing the specific birth date or address. Analysts can derive some insights from this, but if they wish to change the cut or focus in, for example looking at users aged 20 to 25 versus 20 to 30, there’s no ability to do so. Anonymized data is limited by the original parameters which might not provide enough granularity or flexibility. And once the data has been analyzed, if a user wants to send a marketing offer to the group of customers, they can’t, because there’s no relationship to the original, individual PII.
Tokenization of data essentially provides the best of both worlds: the strong at-rest protection of encryption and the analysis opportunity provided by anonymization. It delivers tough security for sensitive data while allowing flexibility to utilize the data down to the individual. Tokenization allows companies to unlock the value of sensitive data in the cloud.