Privacy Awareness Week: Part 1: How much data is really being collected about you?

By Michael Hodson, Security Architect and AI Strategist at Digital Resilience.

Given the technological landscape in which we live, it is understandable that the privacy of information about ourselves is of high concern. Almost everything we do is now online: communicating with friends, purchasing goods and services, research, entertainment, and banking are all done largely, if not solely, online. I think it is well understood that the online platforms we use for these purposes collect various pieces of information about us, including not only things like names and phone numbers, but also search histories, websites we have visited, content we have engaged with, and preferences.

While it may be understood that this occurs, the extent to which data is collected about us is far less clear, and that extent is key to understanding privacy in our modern world. This article explores some of the ways data is collected and some examples of when collection has gone too far.

Surveillance Capitalism

Since being popularised by Shoshana Zuboff in her book ‘The Age of Surveillance Capitalism’, Surveillance Capitalism has become the standard term used to describe the collection of data for the purposes of making a profit, primarily through selling targeted advertising. This concept was first introduced by Google through its AdWords platform. Initially, Google was primarily a search engine that enabled its users to search for information on the Internet. As the platform evolved, it began to collect more and more information to produce better search results. Google realised it could use this information to sell targeted ads to advertisers, and the age of surveillance capitalism began. This is now Google’s primary source of revenue. Ever felt like your phone was listening to you? This is why.

Data Breaches

As we rely more on technology to deliver services such as finance, health, utilities, etc., we inevitably need to share more and more personal information with more companies. Each of the companies that we engage with for services needs to collect certain information about us so they can provide the requested service.

There is no problem with this per se. The problem is that these companies are now the custodians of that data. And cyber criminals want it.

In October 2022, Australia’s largest private health insurer, Medibank, was hacked. The attacker stole the Private Health Information (PHI) of over 9.7 million Australians, including the details of health insurance claims. After Medibank refused to pay the ransom, this data was released on the Dark Web.

Data Brokers

While sharing (or losing) small amounts of personal information may not seem like an issue to most, the problem becomes more pronounced when the data is aggregated to form detailed profiles. Data Brokers collect information from various sources, such as public records, credit companies, and social media, to build profiles on people that can then be sold for various purposes, including advertising. It is estimated that Data Brokers can have up to 1,500 pieces of information on an individual and that the industry is worth hundreds of billions of dollars each year.

Aggregation and data broking are also common in the criminal world. Data from multiple data breaches can be combined with data obtained from legal and public sources to form detailed profiles of individuals, which can then be used for identity theft and extortion.
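To make the aggregation step concrete, here is a minimal sketch in Python of how records from unrelated sources can be stitched into a single profile. Everything in it is an assumption for illustration: the function name aggregate_profiles, the source names, the use of an email address as the linking key, and the records themselves, which are entirely made up. It is not how any particular data broker works, only a simple instance of the general idea.

```python
from collections import defaultdict


def aggregate_profiles(sources, key="email"):
    """Merge records from several sources into one profile per key value.

    `sources` maps a (hypothetical) source name to a list of record dicts.
    Records are linked on `key`, compared case-insensitively.
    """
    profiles = defaultdict(dict)
    for records in sources.values():
        for record in records:
            identifier = record.get(key)
            if identifier is None:
                continue  # no linking key, so this record cannot be matched
            profile = profiles[identifier.lower()]
            for field, value in record.items():
                if field != key:
                    # Keep the first value seen; later sources only fill gaps.
                    profile.setdefault(field, value)
    return dict(profiles)


# Entirely fictional records from three hypothetical sources.
sources = {
    "public_record": [
        {"email": "jo@example.com", "name": "Jo Citizen", "suburb": "Carlton"}
    ],
    "loyalty_program": [
        {"email": "JO@example.com", "purchases": ["vitamins"]}
    ],
    "breach_dump": [
        {"email": "jo@example.com", "phone": "0400 000 000"}
    ],
}

profiles = aggregate_profiles(sources)
print(profiles["jo@example.com"])
```

Each source on its own reveals very little. Linked on one shared identifier, the three records become a single profile with a name, a suburb, a purchase history, and a phone number, which is why a seemingly harmless identifier like an email address is so valuable.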

When it goes too far

The Cambridge Analytica (CA) data scandal is an excellent example of data collection gone too far. CA built a Facebook app called ‘This Is Your Digital Life’ that asked simple questions in the form of a Facebook quiz to build a personality profile of users. CA then combined the answers from the questionnaire with other data they were able to obtain from Facebook, including what users had ‘liked’, to build profiles with up to 5,000 data points on each user. The combination of data from Facebook and the personality survey enabled CA to build a sophisticated algorithm that could predict the Big Five personality traits (Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism) of people based on what they were doing on Facebook.

However, the CA scandal gets much bigger. Facebook allowed CA to collect not only the Facebook data of the 270,000 users who were paid to use the app, but also the data of all the Facebook friends of those people. With this data, combined with data collected from data brokers and their new profiling algorithm, CA were able to build psychometric profiles on an estimated 87 million people, which they then used to micro-target political advertising based on each person's character traits and preferences.

In 2017, The Australian was shown leaked documents from Facebook claiming that the company had sophisticated algorithms that enabled them to identify when teenagers as young as 14 were feeling ‘stressed’, ‘defeated’, ‘overwhelmed’, ‘anxious’, ‘nervous’, ‘stupid’, ‘silly’, ‘useless’, and a ‘failure’, and could sell this information to advertisers. Most users, I think, would be shocked to find that a social media platform was tracking when they were feeling ‘worthless’.

Summary

As we have seen, data collection evolved from a genuine desire to improve the services offered by technology companies. But once these companies had collected vast amounts of information, they discovered that they could monetise it. In the case of Google, this was through selling targeted advertising.

When it became evident that companies had collected vast amounts of data, others found ways to make money from it. Cyber criminals worked out how to steal it and monetise it through identity theft and extortion, data brokers worked out how to aggregate, enrich, and sell it, and technology companies used complex algorithms to infer new and valuable information from it.

But the question remains: why should we care? In the next post we will discuss the impact on privacy.