Sunday, August 22, 2010

Inferring Your Ties on Google Buzz Using What Your Friends Say


A key component of privacy in social networks is the extent to which your connections and associations are public. For instance, Facebook makes your friends' names and IDs available to any application you use, though it allows you some control over how this information is revealed on your profile page. Google Buzz allows you to remove the lists of people you are following and who follow you from your public profile. An interesting question for a Buzz user is how effectively this feature allows you to hide your ties to others from public view.

The graph above shows that in a sample of over seven thousand Google Buzz users, I was able to infer approximately 40% of their ties without referring to the ties they reported themselves. I used only the ties their friends reported. With a sufficiently comprehensive crawl, this percentage would approach 70%, which is the percentage of people I estimate make their following and follower lists public.

In other words, in social networks, what your friends say about your ties reveals a lot, even if you yourself keep the information hidden.

How I performed this analysis

Using the student-derived data set I reported on previously, I did the following:

  • I collected the follower and following information of 7,225 network participants reporting their following and follower lists publicly.
  • I counted a tie when one participant appeared in both the following and follower lists of another participant. This method allowed me to infer when a person who kept their lists hidden was tied to another person.
  • For users reporting their ties publicly, I plotted the relationship between inferred and reported ties.
  • Regressing reported ties on inferred ties for these public reporters revealed that for every inferred tie, the person reported approximately 2.5 ties. Stated otherwise, inferred ties represent 40% of reported ties (n.b., 0.4 or 40% is the reciprocal of 2.5).
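
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the analysis: the function names (count_ties, ols_slope), the input layout, and the toy users are all hypothetical, and the regression is assumed to be ordinary least squares with an intercept, which the post does not specify.

```python
from collections import defaultdict


def count_ties(public_lists):
    """Return (reported, inferred) ties from public following/follower lists.

    public_lists maps each public user to a (following, followers) pair of sets.
    A tie is counted when another person appears in both of a user's lists.
    """
    reported = {}                # ties each public user reveals directly
    inferred = defaultdict(set)  # ties revealed about a person by others
    for user, (following, followers) in public_lists.items():
        mutual = set(following) & set(followers)
        mutual.discard(user)     # ignore any self-reference
        reported[user] = mutual
        for other in mutual:
            # `other` may keep their lists hidden; the tie is still exposed.
            inferred[other].add(user)
    return reported, dict(inferred)


def ols_slope(xs, ys):
    """Slope of a least-squares line of ys on xs (intercept included)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy data (hypothetical): "cat" and "dan" keep their lists hidden.
public_lists = {
    "ann": ({"bob", "cat", "dan"}, {"bob", "cat", "eve"}),
    "bob": ({"ann", "dan"}, {"ann", "dan"}),
}
reported, inferred = count_ties(public_lists)

# For public reporters, pair inferred and reported tie counts for the regression.
pairs = [(len(inferred.get(u, ())), len(reported[u])) for u in public_lists]
```

In this toy example, "dan" never publishes a list, yet inferred["dan"] contains "bob" because bob lists dan as both followed and follower. With real data, ols_slope applied to the inferred counts (x) and reported counts (y) in pairs would give the reported-per-inferred ratio described above.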

That's not the end of it

One might assume that if everyone kept their following and follower lists hidden, that would be the end of it. Well, not really. Ties can simply be inferred from public communication patterns. The lesson here is that to the extent that any of your interactions take place in a public space, inhabitants of that space will be able to infer things about you and the people you are connected to.
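
As a rough illustration of inferring ties from communication alone, the sketch below treats two people as tied when each has publicly commented on the other's posts. This is one hypothetical rule, not a method from this analysis: the function name ties_from_comments, the (commenter, post_author) input format, and the min_each_way threshold are all assumptions.

```python
from collections import Counter


def ties_from_comments(comments, min_each_way=1):
    """Infer undirected ties from public (commenter, post_author) pairs.

    Two people are counted as tied when each has commented on the other's
    public posts at least min_each_way times.
    """
    # Count comments in each direction, ignoring comments on one's own posts.
    directed = Counter((c, a) for c, a in comments if c != a)
    ties = set()
    for (commenter, author), n in directed.items():
        if n >= min_each_way and directed.get((author, commenter), 0) >= min_each_way:
            ties.add(frozenset((commenter, author)))
    return ties


# Toy data (hypothetical): ann and bob comment on each other; cat only on ann.
comments = [("ann", "bob"), ("bob", "ann"), ("cat", "ann")]
ties = ties_from_comments(comments)
```

Here only the ann and bob pair qualifies, since cat's comment is never reciprocated. No following or follower list is consulted at any point.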

Areas that require further work

As in my prior post, my sampling approach here is not random. In particular, my students were following people who they could find publicly, so the estimate of the percentage of people hiding their following and follower lists is likely low. Further, I'm assuming that people who hide their following and follower lists are similar to those who report them publicly.

The solution to both these issues is better study design with random sampling. Further, the issue of hidden follower and following lists can be addressed by getting those users' permission to access their lists.