What is PII Data? A Definition
PII security has become something just about everyone has had to think about in the last few years, given the increase in personal data breaches and the passage of the GDPR in Europe. But that doesn’t mean it’s well understood. What do we mean when we talk about PII data anyway? Personally Identifiable Information, or PII data, generally refers to information that is related to or key to identifying a person. There are broader terms such as “personal data” or “personal information,” but “PII” has become the standard acronym used to refer to private or sensitive information that can identify a specific individual. The US NIST framework defines Personally Identifiable Information as any representation of information that permits the identity of an individual to whom the information applies to be reasonably inferred by either direct or indirect means.
While the abbreviation “PII” is commonly used in the United States, the phrase it abbreviates is not always the same: there are common variants based on “personal” or “personally,” and “identifiable” or “identifying.” The meaning of PII data therefore varies depending on the jurisdiction and the purpose for which the term is being used. For example, where the General Data Protection Regulation (GDPR) is the primary law regulating PII data, the term “personal data” is used instead and is significantly broader. Regardless of the definition used, the focus on PII security is growing quickly.
PII Data Examples
The first step to PII security is understanding what is considered PII data. As mentioned above, it’s more complicated than it may first appear. Not all private information is PII and not all PII data is private information. In fact, much of the information considered PII data and covered by regulation is actually publicly available information, such as an individual’s name or phone number. However, some of the information, especially when combined and in the hands of bad actors, can lead to negative consequences for individuals. Here are some PII examples:
- Names: full name, maiden name, mother’s maiden name, or alias
- Individual identification numbers: social security number (SSN), patient identification number, passport number, driver’s license number, taxpayer identification number, financial account number, or credit card number
- Personal address: street address or email address
- Personal phone numbers
- Personal characteristics: photographic images (particularly of a face or other identifying physical characteristics), fingerprints, handwriting
- Biometric data: retina scans, voice signatures, facial geometry
- Information identifying personal property: VIN or title number
- Technical Asset information: Internet Protocol (IP) or Media Access Control (MAC) addresses that consistently link to a particular person’s technology
What is Not PII Data?
PII security becomes easier if you understand what is not PII data. The examples below are not considered PII data alone as each could apply to multiple people. However, when combined with one of the above examples, the following could be used to identify a specific person:
- Date of birth
- Place of birth
- Business telephone number
- Business mailing or email address
- Geographical indicators
- Employment information
- Medical information
- Education information
- Financial information
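To make the point about combinations concrete, here is a minimal sketch of how fields that each apply to many people can still narrow down to a single person once they are put together. The records, the field names, and the `unique_fraction` helper are all made up for illustration; they are assumptions, not part of any regulation or product.

```python
from collections import Counter

# Hypothetical records: none of these fields identifies a person on its own.
records = [
    {"birth_year": 1985, "zip": "30301", "employer": "Acme"},
    {"birth_year": 1985, "zip": "30301", "employer": "Globex"},
    {"birth_year": 1985, "zip": "30302", "employer": "Acme"},
    {"birth_year": 1990, "zip": "30301", "employer": "Acme"},
    {"birth_year": 1990, "zip": "30301", "employer": "Acme"},
]


def unique_fraction(rows, fields):
    """Return the share of rows whose combination of `fields` is unique."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(rows)


# Each added field makes more rows stand out from the rest.
for fields in (["birth_year"],
               ["birth_year", "zip"],
               ["birth_year", "zip", "employer"]):
    print(fields, unique_fraction(records, fields))
```

In this toy data set no record is unique by birth year alone, but three of the five records become unique once birth year, ZIP code, and employer are combined. Attaching any one of those unique combinations to a name or phone number would tie the rest of the record to a specific person.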
PII vs PHI vs PCI Data
PII data has much in common and some overlap with other forms of sensitive or regulated data such as PHI and PCI, but it is not the same. Confusion often arises around whether PII means information that is identifiable (can be associated with a person) or identifying (associated uniquely with a person, so that the PII actually identifies them). In narrow data privacy rules, such as the Health Insurance Portability and Accountability Act (HIPAA), PII items have been specifically defined. In broader data protection regulations such as the GDPR, personal data is defined in a non-prescriptive principles-based way. Information that might not count as PII under HIPAA could be considered personal data per GDPR.
PHI data is protected health information as defined under the Health Insurance Portability and Accountability Act of 1996 (HIPAA). HIPAA provides federal protections for protected health information held by covered entities and gives patients an array of rights with respect to that information. At the same time, HIPAA permits the disclosure of protected health information needed for patient care and other important purposes. This federal law required the creation of national standards to protect sensitive patient health information from being disclosed without the patient’s consent or knowledge. The US Department of Health and Human Services (HHS) issued the HIPAA Privacy Rule to implement the requirements of HIPAA. The HIPAA Security Rule protects a subset of the information covered by the Privacy Rule. There is also some overlap with PII: when PII data like name, date of birth, and address is tied to health information, it is considered PHI as well.
PCI stands for “payment card industry,” and PCI data is defined by a consortium of financial institutions comprising the Payment Card Industry. The definition comes from the rules for protecting data in the Payment Card Industry Data Security Standard (PCI-DSS). The PCI Security Standards Council (SSC) defines “cardholder data” as the full Primary Account Number (PAN) or the full PAN along with any of the following identifiers: cardholder name, expiration date or service code. The rules were implemented to create an additional level of protection for card issuers by ensuring that merchants meet minimum levels of security when they store, process, and transmit cardholder data.
In the past, PCI data might have been considered the most valuable and most at risk because it was financial data that could be used to directly access money. However, as many of us have unfortunately learned from rampant credit card fraud over the last few years, credit card numbers can be easily changed. It’s not nearly as easy to move, or to change your social security number or even your name. Those who have dealt with identity theft understand how devastating it can be when unknown loans or other fraud show up on your credit report. And health information is simply unchangeable, as it’s part of a person’s permanent “life record.” That puts PII data and PHI data in the lead in the race for data value and data risk. PII data might be considered more at risk due to its proliferation, so PII security should always be a priority.
PII Security and the Internet
Before 1994, very little of our PII data was easily accessible, so PII security wasn't as critical. If you wanted someone’s phone number, you had to know their name and have a hefty copy of what we called the “white pages” (a phone book) in order to look them up. Maybe a bank or telephone company had access to thousands of phone numbers, but not the average person. All of that changed with the advent of the Internet. The concept of PII data has become prevalent as information technology and the Internet have made it easier to collect PII. Every online order requires a name and email, not to mention a physical address or phone number. This has led to a profitable market in collecting and reselling PII. PII can also be exploited by criminals in stalking or identity theft, or to aid in the planning of criminal acts. In reaction to these threats, many website privacy policies now specifically inform users about the gathering of PII, and lawmakers have enacted a series of regulations to limit the distribution and accessibility of PII, making PII security a priority for consumers and companies.
PII Security Regulations
The era of stringent PII data privacy regulations that require PII security really kicked off with the implementation of the European Union’s General Data Protection Regulation (GDPR) in May 2018. This regulation requires organizations to safeguard personal data and uphold the privacy rights of anyone in EU territory. The regulation includes seven principles of data protection that are required and eight privacy rights that must be enabled. It also gives data protection authorities at the member-state level the power to enforce GDPR with sanctions and fines. The GDPR replaced a country-by-country patchwork of data protection laws and unified the EU under a single data protection regime. The regulation doesn’t apply to just European companies, however. Any company holding personal data of people in the EU must comply.
The US is further behind in the PII privacy regulation game. There is as yet no federal or national privacy regulation that applies across the country. The US is still in the patchwork era, with some states like California, Utah, Colorado, Connecticut and Virginia passing state-level regulations. Five more states have introduced regulations. In 2022, a new bipartisan bill called the American Data Privacy and Protection Act was introduced in the US House of Representatives. It follows the direction of GDPR and would apply to data controllers and accessors. It is effectively a consumer “Bill of Rights” around PII data privacy. The legislation currently sits in the House of Representatives for approval.
4 Steps to Complete PII Security
These privacy regulations have specific rules around PII security – what data should be protected and how. But in order to comply fully and reduce risk of censure, fees or fines, companies will need to take 4 key steps:
- Data classification: The first step to PII security is to identify sensitive information stored in your company’s databases. This can be done manually by reviewing all the databases and tagging columns or rows that contain PII. Some database solutions also allow you to write SQL processes to do this. However, it’s much faster and less error-prone to use an automated solution to find and tag social security numbers, dates of birth or other key information wherever it’s located.
- Data access controls: Once PII data is identified, controls that allow only approved individuals to access sensitive data should be applied. These controls can include data masking (changing characters to ***) and row or column-level access policies. A common additional requirement is auditable documentation of who has accessed what data and when.
- Data rate limiting: Because any credentials could be compromised at any time, it’s best to limit the amount of damage even authorized access can do. Instead of allowing millions of lines of data to be downloaded, apply controls that limit the amount of data by role, by location, or by time of access to reduce the risk of a massive breach.
- Data tokenization: Finally, the most sensitive data should be secured via a data tokenization solution that ensures that even if “data” is accessed by a bad actor, they will only get their hands on tokens that are useless to them. The real data is stored in an encrypted token vault.
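The four steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a production design: the two regex patterns, the helper names (`classify_columns`, `mask`, `read_rows`, `TokenVault`), the sample rows, and the per-role row limits are all hypothetical, and a plain in-memory dictionary stands in for what would really be an encrypted token vault.

```python
import re
import secrets

# Assumed patterns for two PII types; a real classifier would cover many more.
PII_PATTERNS = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}


def classify_columns(rows):
    """Step 1: tag a column when every non-empty value matches a PII pattern."""
    tags = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [str(row[col]) for row in rows if row[col]]
        for label, pattern in PII_PATTERNS.items():
            if values and all(pattern.match(v) for v in values):
                tags[col] = label
    return tags


def mask(value, visible=4):
    """Step 2: replace all but the last `visible` characters with '*'."""
    keep = min(visible, len(value))
    return "*" * (len(value) - keep) + value[len(value) - keep:]


def read_rows(rows, role, tags, limits):
    """Steps 2 and 3: cap the rows a role may read, then mask tagged columns."""
    allowed = rows[: limits.get(role, 0)]  # unknown roles get no rows
    return [
        {col: mask(str(val)) if col in tags else val for col, val in row.items()}
        for row in allowed
    ]


class TokenVault:
    """Step 4: swap values for random tokens (in-memory stand-in for a vault)."""

    def __init__(self):
        self._value_to_token = {}
        self._token_to_value = {}

    def tokenize(self, value):
        # Reuse the existing token so the same value always maps to one token.
        if value not in self._value_to_token:
            token = "tok_" + secrets.token_hex(8)
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenize(self, token):
        return self._token_to_value[token]


# Made-up sample data; column names deliberately do not reveal their contents.
rows = [
    {"name": "Ana", "tax_id": "123-45-6789", "contact": "ana@example.com"},
    {"name": "Ben", "tax_id": "987-65-4321", "contact": "ben@example.com"},
    {"name": "Cy", "tax_id": "555-12-3456", "contact": "cy@example.com"},
]
tags = classify_columns(rows)   # columns tagged by content, not by name
limits = {"analyst": 2}         # hypothetical per-role row cap
vault = TokenVault()

print(tags)
print(read_rows(rows, "analyst", tags, limits))
print(vault.tokenize(rows[0]["tax_id"]))
```

Here the classifier tags `tax_id` and `contact` from their contents alone, an analyst sees at most two rows with those columns masked, and a role that is not listed in `limits` gets nothing back. The token printed on the last line is random and carries no information about the underlying value; only code with access to the vault can reverse it.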
PII Security Conclusion
The problem of PII security is only growing. As companies extract more insight and value from personal data on consumers, product users and customers, they’ll continue to gather, hold, share and use that data. In fact, companies are not just collecting data for their own use, but also monetizing it by selling insights about their own customers to others. While data collection and storage are increasing, laws regulating how this data can be stored and used are also increasing. Companies can stay ahead of the curve with processes and solutions that help scale PII security with the growth of PII data.