A Brave New World: Big Data's Big Dangers
New technologies are not all equal. Some do nothing more than add a thin extra layer to the topsoil of human behavior (e.g., Teflon and the invention of non-stick frying pans). Some technologies, however, dig deeper, uprooting the norms of human behavior and replacing them with wholly new possibilities. For the last few months I have been arguing that Big Data, the machine-based collection and analysis of astronomical quantities of information, represents such a turn. And, for the most part, I have painted this transformation in a positive light. But last week's revelations about the NSA's PRISM program have put the potential dangers of Big Data front and center. So, let's take a peek at Big Data's dark side.
The central premise of Big Data is that all the digital breadcrumbs we leave behind as we go about our everyday lives create a trail of behavior that can be followed, captured, stored and "mined" en masse, providing the miners with fundamental insights into both our personal and collective behavior.
The initial "ick" factor from Big Data is the loss of privacy, as pretty much every aspect of your life (location records via mobile phones, purchases via credit cards, interests via web-surfing behavior) has been recorded — and, possibly, shared — by some entity somewhere. Big Data moves from "ick" to potentially harmful when all of those breadcrumbs are thrown in a machine for processing.
This is the "data-mining" part of Big Data and it happens when algorithms are used to search for statistical correlations between one kind of behavior and another. This is where things can get really tricky and really scary.
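To make that concrete, here is a minimal sketch of what "searching for statistical correlations" can look like. It is an illustration only, not any company's actual method: the behavior names, the numbers and the 0.8 threshold are all invented for the example, and real systems work on vastly more data with more sophisticated statistics.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / sqrt(var_x * var_y)


def strongest_correlations(records, threshold=0.8):
    """Compare every pair of behaviors and keep the strongly correlated ones."""
    fields = sorted(records[0])
    hits = []
    for a, b in combinations(fields, 2):
        r = pearson([rec[a] for rec in records], [rec[b] for rec in records])
        if abs(r) >= threshold:
            hits.append((a, b, round(r, 3)))
    return sorted(hits, key=lambda hit: -abs(hit[2]))


# Synthetic "breadcrumbs" for five hypothetical people (made-up values).
RECORDS = [
    {"late_night_purchases": 1, "missed_payments": 0, "gym_visits": 4},
    {"late_night_purchases": 2, "missed_payments": 1, "gym_visits": 9},
    {"late_night_purchases": 3, "missed_payments": 1, "gym_visits": 2},
    {"late_night_purchases": 4, "missed_payments": 2, "gym_visits": 7},
    {"late_night_purchases": 5, "missed_payments": 3, "gym_visits": 5},
]

if __name__ == "__main__":
    for a, b, r in strongest_correlations(RECORDS):
        print(f"{a} <-> {b}: r = {r}")
```

Notice what the code does not do: it never asks why two behaviors move together. It flags the pairing and moves on, which is exactly why a correlation found this way can end up counting against someone who has done nothing wrong.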
Consider, for example, the age-old activity of securing a loan. Back in the day you went to a bank and they looked at your application, the market and your credit history. Then they said "yes" or "no." End of story. In the world of Big Data, banks now have many more ways to assess your creditworthiness.
"We feel like all data is credit data," former Google CIO Douglas Merrill said last year in The New York Times. "We just don't know how to use it yet." Merrill is CEO of ZestCash, one of a host of start-up companies using information from sources such as social networks to determine the probability that an applicant will repay their loan.
Your contacts on LinkedIn can be used to assess your "character and capacity" when it comes to loans. Facebook friends can also be useful. Have rich friends? That's good. Know some deadbeats? Not so much. Companies will argue they are only trying to sort out the good applicants from the bad. But there is also a real risk that you will be unfairly swept into an algorithm's dead zone and disqualified from a loan, with devastating consequences for your life.
Jay Stanley of the ACLU says being judged based on the actions of others is not limited to your social networks:
Credit card companies sometimes lower a customer's credit limit based on the repayment history of the other customers of stores where a person shops. Such "behavioral scoring" is a form of economic guilt-by-association based on making statistical inferences about a person that go far beyond anything that person can control or be aware of.
The link between behavior, health and health insurance is another gray (or dark) area for Big Data. Consider the case of Walter and Paula Shelton of Gilbert, Louisiana. Back in 2008, Business Week reported how the Sheltons were denied health insurance when records of their prescription drug purchases were pulled. Even though their blood pressure and anti-depression medications were for relatively minor conditions, the Sheltons had fallen into another algorithmic dead zone in which certain kinds of purchases trigger red flags that lead to denial of coverage.
Since 2008 the use of Big Data by the insurance industry has only become more entrenched. As The Wall Street Journal reports:
Companies also have started scrutinizing employees' other behavior more discreetly. Blue Cross and Blue Shield of North Carolina recently began buying spending data on more than 3 million people in its employer group plans. If someone, say, purchases plus-size clothing, the health plan could flag him for potential obesity—and then call or send mailings offering weight-loss solutions.
Of course no one will argue with helping folks get healthier. But with insurance costs dominating company spreadsheets, it's not hard to imagine how that data about plus-size purchases might someday factor into employment decisions.
And then there's the government's use, or misuse, of Big Data. For years critics have pointed to no-fly lists as an example of where Big Data can go wrong.
No-fly lists are meant to keep people who might be terrorists off planes. It has long been assumed that data harvesting and mining are part of the process for determining who is on a no-fly list. So far, so good.
But the stories of folks unfairly listed are manifold: everything from disabled Marine Corps veterans to (at one point) the late Sen. Ted Kennedy. Because the methods used in placing people on the list are secret, getting off the list can, according to Conor Friedersdorf of The Atlantic, be a Kafkaesque exercise in frustration.
A 2008 National Academy of Sciences report exploring the use of Big Data techniques for national security made the dangers explicit:
The rich digital record that is made of people's lives today provides many benefits to most people in the course of everyday life. Such data may also have utility for counterterrorist and law enforcement efforts. However, the use of such data for these purposes also raises concerns about the protection of privacy and civil liberties. Improperly used, programs that do not explicitly protect the rights of innocent individuals are likely to create second-class citizens whose freedoms to travel, engage in commercial transactions, communicate, and practice certain trades will be curtailed—and under some circumstances, they could even be improperly jailed.
So where do we go from here?
From credit to health insurance to national security, the technologies of Big Data raise real concerns about far more than just privacy (though those privacy concerns are real, legitimate and pretty scary). The debate opening up before us is an essential one for a culture dominated by science and technology.
Who decides how we go forward? Who determines if a technology is adopted? Who determines when and how it will be deployed? Who has the rights to your data? Who speaks for us? How do we speak for ourselves?
These are the Big Questions that Big Data is forcing us to confront.