Behind “Big Data”
Over the last few years, the idea has taken hold that “big data” is driving far-reaching, and typically positive, change. “How Big Data Changes the Banking Industry,” “Big Data Is Transforming Medicine,” and “How Big Data Can Improve Manufacturing” are characteristic headlines. “Big data” has become ubiquitous, powering everything from models of climate change to the advertisements sent to Web searchers.
Even in a society in which acronyms and sound-bites pass for knowledge, this familiar formulation stands out as vacuous. It offers us a reified name rather than an explanation of what the name means. What is the phenomenon denoted by “big data”? Why and when did it emerge? How is “it” changing things? Which things, in particular, are being changed – as opposed to merely being hyped? And last, but hardly least, are these changes desirable and, if so, for whom?
Big data is usually defined as data sets that are so large and complex – both structured and unstructured – that they challenge existing forms of statistical analysis. For instance, Google alone processes more than 40 thousand search queries every second, which equates to 3.5 billion in a day and 1.2 trillion searches per year; every minute, Facebook users post 31.25 million messages and view 2.77 million videos, and 347,222 tweets are generated; by the year 2020, 1.8 megabytes of new information are expected to be created every second for every person on the planet.
The compounding production of data – “datafication,” in one account – is tied to proliferating arrays of digital sensors and probes, embedded in diverse arcs of practice. New means of storing, processing, and analyzing these data are the needed complement.
A quick etymological search finds that the term “big data” began to circulate during the years just before and after 2000. Its deployments then quickened; but this seemingly sharp-edged transition into what Andrejevic and Burdon call a “sensor society” actually possesses a deeper-rooted history.
The uses of statistics in prediction and control have long been entrenched, and have increased rapidly throughout the last century – as is pointed out by a working group on “Historicizing Big Data” established at the Max Planck Institute for the History of Science. The group emphasizes that big data must not be stripped out of “a Cold War political economy,” in that “many of the precursors to 21st century data sciences began as national security or military projects in the Big Science era of the 1950s and 1960s.”
Indeed not, for the U.S. military has continued to play a leading role. In the years immediately after 2000, existing technologies for data gathering and analysis were deployed by the soon-infamous “Total Information Awareness” (TIA) program supported by the Defense Advanced Research Projects Agency (DARPA). TIA – a mass-scale data mining project to monitor all citizens as part of the “war on terror” – was considered the brainchild of ex-admiral John Poindexter, who was by then an employee of a private DARPA contractor called BMT Syntek. While different levels of financial support for TIA have been reported, and not all its funding was related to big data, the sums were substantial: the Electronic Privacy Information Center (EPIC) estimated that TIA-related programs totaled $112 million in FY2003 and $240 million for the three-year period, FY2001-FY2003.
If military funding gave big data both an institutional base and economic momentum, then interlockings between the military and private companies considerably boosted the business. In addition to its contract with Syntek, DARPA also enlisted Booz Allen & Hamilton, Lockheed Martin Corporation, Schafer Corporation, SRS Technologies, Adroit Systems, CACI Dynamic Systems, and ASI Systems International, as well as Cornell, Columbia and the University of California at Berkeley. Intelligence agencies were especially keen to leverage cutting-edge research in Silicon Valley. The CIA’s venture capital firm In-Q-Tel has been a major funder of big data start-ups like Keyhole, Palantir, and MemSQL. A shared concern for big data likewise brings the National Security Agency’s (NSA) surveillance programs close to Silicon Valley tech companies.
TIA may have been roundly discredited; but today, everything from battlefield maneuvers to military logistics to drone strikes relies on big data. The Obama Administration has further widened the Government’s focus, by announcing a $200 million initiative aiming to “address the challenges of, and tap the opportunities afforded by, the big data revolution to advance agency missions”; in 2016, DARPA’s budget request for big data research and development programs is expected to grow by a hefty 39 percent.
Jon Schwarz states that “the Pentagon’s drone program uses [big data] in almost precisely the same way IBM encourages corporations to use it to track customers. The only significant difference comes at the very end of the drone process, when the customer is killed.” Schwarz’s grimly acerbic characterization leads on to a closely related aspect of the big data phenomenon. Big data is a name, but what it denotes is not ethereal; it refers to and relies on what some analysts call “technology platforms” that tap medical test equipment, freeways, satellite remote sensing, seismic measurements, factory machinery, and smart phones – and soon household appliances, cars and the much-vaunted “Internet of things.” Such platforms need to be built and maintained; and this is no trivial matter. If the military supplied major early funding support, then, as corporate capital stampeded in, a much wider foundation was laid. Big data possesses not only an etymology, but also a political economy.
Datafication has been shaped increasingly into a new industrial profit site. Big data has been described as the “new oil” for the 21st century; but the oil industry, for one, is literally attempting to extract profit from big data. Royal Dutch Shell, one of the energy sector’s super-majors, has been developing the concept of a “data-driven oilfield” in its bid to reduce the cost of drilling – when a single deep-water well can cost over $100 million.
The following are representative of innumerable other corporate profit projects: Amino is an online consumer service built on a database that includes “nearly every practicing doctor in America and the treatment of 188 million people,” as records of what is billed and paid for stream in from a dozen different data sources. Sensors mounted on planters, tractors and combines are tracking data about planting, spraying and harvesting and sending it back to Monsanto and other agro-chemical companies and commodity futures traders, which are seeking new markets and competitive advantages armed with sophisticated data-analysis tools. The Weather Company’s smartphone app has been downloaded onto 40 million iPhones and iPads; doubtless unknown to most users, the app generates barometric pressure readings, which the Weather Company has decided to monetize. It sold its digital unit to IBM for $2 billion. Calculating that $500 billion in annual commerce is heavily dependent on the weather, IBM is counting on new applications for what is treated as entirely proprietary data, notwithstanding that 40 million users are needed to generate it.
Datafication is not limited to already existing industries. Museums in the US are mining visitors’ behavioral data, deploying the same tactics as Netflix and Walmart for everything from curatorial decisions to gift shop marketing. This puts them squarely on a path to join up with the lucrative data business that is being cultivated by the Internet firms. Microsoft, Google, Amazon and others, whose businesses are built on data and large computing infrastructure, are expanding into enterprise data storage, processing infrastructure, data monitoring, and analytic services. According to IDC, the big data and analytics market overall will reach $125 billion worldwide in 2015.
These and other military and profit-driven applications of big data raise an acute, overarching problem: How to make them democratically accountable? Rather than foregrounding war-making and profit, can technology policy be recast to serve democratic ends?
This is a complicated question, and we leave it for another occasion – except for one crucial point. Democratizing technology policy means rejecting the tenet that, if investors decide to bankroll a project, then this in itself is sufficient to render that project legitimate. The result of democratization would be both to slow and to narrow the field of big data projects. Putting the issue this way shows how much big data incarnates the same steam-roller quality that David F. Noble directly challenged thirty years ago: “There is a war on, but only one side is armed: this is the essence of the technology question today.” But today, the technology question is far more urgent. Strip-mining privacy is an obvious aspect; but vital issues cut in other directions too, including employment impacts, environmental sustainability, and the stampeding corporate takeover of every kind of cultural production.
There’s a lot more “big data” on the horizon. Richard Waters explains that the political economy of big data is becoming engorged by “whole classes of information” as capital prepares a new wave of profit projects, many still far down the line. The companies that provide or parasitize the technology platforms – the sensors and probes, cloud services and software applications, data analytics, skilled personnel, and market support – for yet other emerging applications are intent on sweeping up (“disrupting”) existing fields of practice, wholesale. In their search to control data – our data, not theirs – they are without a care for these “externalities.” Why ought we to let them?
Bernard Marr, “Big Data: 20 Mind-Boggling Facts Everyone Must Read,” Forbes, September 30, 2015.
Viktor Mayer-Schönberger and Kenneth Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think. Boston: Houghton Mifflin Harcourt, 2013, 73-97.
Steve Lohr, “The Origins of ‘Big Data’: An Etymological Detective Story,” New York Times, February 1, 2013; Francis X. Diebold, “A Personal Perspective on the Origin(s) and Development of ‘Big Data’: The Phenomenon, the Term, and the Discipline,” PIER Working Paper No. 13-003, November 26, 2016.
Mark Andrejevic and Mark Burdon, “Defining the Sensor Society,” Television and New Media 16 (1), January 2015: 19-36.
Dan Bouk, “Insurance and the Origin of Big Data,” Le Monde Diplomatique, November 2015; Dan Bouk, How Our Days Became Numbered: Risk and the Rise of the Statistical Individual. Chicago: University of Chicago Press, 2015.
Syntek worked on data mining and an analytic system called the Genoa project, which was adopted by the Defense Intelligence Agency. See Adam Mayle and Alex Knott, “Outsourcing Big Brother,” The Center for Public Integrity, December 17, 2002.
Total Information Awareness Programs: Funding, Composition, and Oversight Issues, Congressional Research Service Report RL31786, March 21, 2003.
Robert O’Harrow, No Place to Hide. New York: Free Press, 2005, 179.
Murad Ahmed, “Palantir goes from CIA-funded start-up to big business,” Financial Times, June 24, 2015.
“Why CIA’s In-Q-Tel Likes Big Data Player MemSQL,” Insider Surveillance, January 16, 2015.
Of prime interest to NSA, reportedly, are software to process and analyze big data; machine learning and natural language processing; and hardware and infrastructure for data storage and processing. See Kurt Marko, “The NSA and Big Data: What IT can learn,” InformationWeek, July 18, 2013.
Jeff Bertolucci, “Military Uses Big Data As Spy Tech,” InformationWeek, April 22, 2013; Colin Wood, “How Does the Military Use Big Data?” Emergency Management, January 6, 2014; Alex Woodie, “How Analytics Is Driving Military Intelligence,” Datanami, February 3, 2014.
Executive Office of the President, Big Data Across the Federal Government, March 29, 2012; Jason Mick, “Obama Admin. Plans $200M USD ‘Big Data’ Spending Spree,” Daily Tech, April 2, 2012.
Bernard Marr, “Big Data in Big Oil: How Shell Uses Analytics To Drive Business Success,” Forbes, May 26, 2015.
The company says that it anonymizes these records. Steve Lohr, “Amino Harnesses Health Industry Data for Consumers,” New York Times, October 20, 2015.
“IDC Predicts the 3rd Platform Will Bring Innovation, Growth, and Disruption Across All Industries in 2015,” Press Release, December 2, 2014.
David F. Noble, “Present Tense Technology,” Democracy 3 (2), Spring 1983: 8.
Richard Waters, “IBM’s acquisition of Weather Co is a test for the big data economy,” Financial Times, October 30, 2015.