Thanks to the recent news of the US Department of Justice's requests for information from the major web search engines, I've seen a number of people express surprise and dismay that online services track information that they'd consider private. A term that I've seen bandied about a lot recently is Personally Identifiable Information (PII), which I'd never heard before starting work at MSN.

The Wikipedia definition for Personally Identifiable Information (PII) states:

In information security and privacy, personally identifiable information or personally identifying information (PII) is any piece of information which can potentially be used to uniquely identify, contact, or locate a single person.

Items which might be considered PII include, but are not limited to, a person's:

Information that is not generally considered personally identifiable, because many people share the same trait, include:

  • First or last name, if common
  • Country, state, or city of residence
  • Age, especially if non-specific
  • Gender or race
  • Name of the school they attend or workplace
  • Grades, salary, or job position
  • Criminal record

When a person wishes to remain anonymous, descriptions of them will often employ several of the above, such as "a 34-year-old black man who works at Target". Note that information can still be private, in the sense that a person may not wish for it to become publicly known, without being personally identifiable. Moreover, sometimes multiple pieces of information, none of which are PII, may uniquely identify a person when brought together; this is one reason that multiple pieces of evidence are usually presented at criminal trials. For example, there may be only one Inuit person named Steve in the town of Lincoln Park, Michigan.

In addition, there is the notion of sensitive PII. This is information which can be linked to a person which the person desires to keep private due to potential for abuse. Examples of "sensitive PII" are a person's medical/health conditions; racial or ethnic origin; political, religious or philosophical beliefs or affiliations; trade union membership or sex life.

Many online services such as MSN have strict rules about when PII should be collected from users, how it must be secured and under what conditions it can be shared with other entities. However, many Internet users don't understand that they disclose PII when using online services. Not only is there explicit collection of PII, such as when users provide their name, address and credit card information to online stores, but there is often implicit PII collected which even savvy users fail to consider. For example, most Web servers log the IP addresses of incoming HTTP requests, which can then be used to identify users in many cases. It's easy to forget that practically every website you visit stores your IP address somewhere on its servers as soon as you hit the site. Other examples aren't so obvious. There was a recent article on Boing Boing entitled Data Mining 101: Finding Subversives with Amazon Wishlists which showed how to obtain sensitive PII such as people's political beliefs from their Amazon wishlists. A few years ago I read a blog post entitled Pets Considered Harmful which showed how one could obtain sensitive PII such as someone's email password by obtaining the name of the person's pet from reading their blog, since "What is the name of your cat?" was a question used by GMail to allow one to change their password.
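To make the point about server logs concrete, here is a minimal Python sketch of how little effort it takes to turn an ordinary access log into a per-visitor browsing history. It assumes the log uses the common Apache-style Common Log Format; the sample lines, the addresses (taken from ranges reserved for documentation) and the `requests_by_ip` helper are all made up for illustration, and a real server's log layout may differ.

```python
import re
from collections import defaultdict

# Hypothetical sample lines in Common Log Format. The IP addresses come from
# documentation-only ranges and the request paths are invented.
SAMPLE_LOG = """\
192.0.2.10 - - [26/Jan/2006:10:00:01 +0000] "GET /search?q=knee+surgery+recovery HTTP/1.1" 200 5120
198.51.100.7 - - [26/Jan/2006:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 1043
192.0.2.10 - - [26/Jan/2006:10:02:17 +0000] "GET /search?q=divorce+lawyers+seattle HTTP/1.1" 200 4876
"""

# One Common Log Format entry: client IP, two identity fields, timestamp,
# quoted request line, status code and response size.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+$'
)


def requests_by_ip(log_text):
    """Group requested paths by client IP address, in log order."""
    history = defaultdict(list)
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        history[match["ip"]].append(match["path"])
    return dict(history)


if __name__ == "__main__":
    for ip, paths in requests_by_ip(SAMPLE_LOG).items():
        print(ip, paths)
```

Nothing here requires the visitor to log in or fill out a form. The address and the requested URLs arrive with every request, and grouping them is a few lines of code.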

The reason I bring this stuff up is that I've seen people like Robert Scoble make comments about wanting "a button to click that shows everything that’s being collected from their experience". This really shows a lack of understanding about PII. Would such a button prevent users from revealing their political affiliations in their Amazon wishlists or giving would-be email account hijackers the keys to their accounts by blogging about their pets? I doubt it.

The problem is that most people don't realize that they've revealed too much information about themselves until something bad happens. Unfortunately, by then it is usually too late to do anything about it. If you are an Internet user, you should be cognizant of the amount of PII you are giving away by using web applications like search engines, blogs, email, instant messaging, online stores and even social bookmarking services.

Be careful out there.


Thursday, January 26, 2006 11:39:58 PM (GMT Standard Time, UTC+00:00)
I get the 'be careful' message; of course we need to be aware of what PII we're leaving on the net as we browse. But does the fact that people are so ignorant of this excuse indefinite retention of PII and only vague 'privacy policies' that are unenforceable? It's hard enough getting unsubscribed from a mailing list; how can we know what PII companies are retaining and for how long? How can we be careful? There's no disclaimer notice prior to entering a site, and if there were it would have the same effect as the EULA splash screen when installing software.
As an article on Slate comments, we could browse the internet knowing that all our information /could/ appear in a major newspaper, but that's the exact opposite of what we (should) expect from the internet.
We can't rely on websites to play nice, and apparently US citizens can't rely on their president to do the same; what chance for us mere users?
I know that by default a lot of PII is logged; what if we started making a stand and didn't log it? Or at least, when we do, explain somewhere publicly what we're tracking, why, and how long that information will be retained.
