सड़कनामा : May 23, 2015

Saturday, May 23, 2015

Missing the big picture on Big Data | G Sampath

Why is the government so anxious to make the ownership of an Aadhaar card, which is officially voluntary, practically mandatory? Why did the online fashion store Myntra.com recently turn app-only, which means you can’t shop on it through the website or mobile browser but only by downloading the app?

Why is Facebook developing solar-powered drones to beam Internet from the sky? And why do both Facebook and Gmail keep badgering you for your cell phone number?

What is the need for something called internet.org when there is already an Internet out there? Why does our government want to invest in ‘Smart Cities’ when it is unwilling to invest adequately in education?

The answer to all these questions, as Bob Dylan might have said, is flowing in optic fibre cables. If not, it is definitely stored in a non-meteorological cloud somewhere. Its name: Big data.

The UID-Aadhaar project will be the largest citizen database of its kind on the planet. The reason Myntra wants its customers to transact only from apps is that consumer data is most valuable when tied to specific individuals, as it enables a closer tracking of user behaviour. It is also why Google, Facebook, and other tech companies want your mobile number.

It is because Mark Zuckerberg does not possess a search engine like Google does — which, as the entry point to the Internet for most people, is the ultimate instrument for generating consumer data — that he wants to start another, smaller ‘internet’ for those who cannot afford the full-size one.

As for Smart Cities, the idea is a blatant scheme to ensure that every citizen is dragooned into a digital grid at all times, so that she secretes a non-stop data trail from birth to death. This data trail, or big data, would be continuously captured and processed for optimal value extraction (read monetisation).

If a world governed on the basis of big data is indeed the future, then what does this bode for humanity? The dominant consensus right now is overwhelmingly positive. But if we delve deeper into the use of big data and what it would entail for the future of human lives, a problematic picture emerges.

ILLUSTRATION: SATWIK GADE

According to the optimists, big data — in combination with what is described as the Internet of Things (IoT), a world where the vast majority of gadgets, machines, and humans are connected to the internet and to each other — promises a future where all important decisions about business, life, and society would be taken purely (and happily?) on the basis of data.

Human judgment, which is typically partial, flawed, and conflicted — and often distorted by factors that are not measurable, and do not compute, such as moral qualms, or empathy — need never come into the picture. This, they believe, would make for greater efficiency, higher productivity, and the optimal utilisation of resources for the greatest good of the greatest number.

There is a name for such decision-making driven purely by big data analytics. It’s called ‘evidence-based decision-making’. Its semantic twin is ‘actionable information’. Evidence-based decision-making can and does pay off brilliantly in business operations; this is what enterprise software solutions do, and they were indeed the tech precursors of big data analytics. It is ideal also for, say, predicting the weather, or earthquakes, and for identifying bankable talent in team sports, as the bestselling book and film Moneyball showed.

Besides, big data already plays a major role in the management of infrastructure and industry, not to mention security, military affairs, health, and geopolitics, as the Snowden leaks made amply clear.

Purveyors of technological determinism like to argue that, with the advances in cloud and mobile computing, the non-stop generation of data on a never-before scale is bound to change how humans think, and therefore act and live.

This was the contention of Chris Anderson, the former editor-in-chief of Wired magazine, in a widely debated piece titled “The End of Theory: The data deluge makes the scientific method obsolete”. Anderson’s logic is simple: since processing of big data can give us correlations that can predict accurately, causation is no longer relevant. So theory, or explanation of the world based on the model of cause and effect (which is how humans have traditionally made sense of the world), is now obsolete.

In other words, we no longer need to think. Collect data, feed them into the maw of analytics, and wait for solutions to emerge. In Anderson’s words, “With enough data, the numbers speak for themselves.”

Well, this is no longer solely the chartered accountant’s motto. It is the article of faith among the world’s movers and shakers.

The World Economic Forum, an annual gathering of the global power elite, notes in its Global Information Technology Report 2014 that data is “a new form of asset class”, adding, “data are now the equivalent of oil or gold. And today we are seeing a data boom rivaling the Texas oil boom of the 20th century and the San Francisco gold rush of the 1800s.”

In his foreword to this very report, John Chambers, the chairman and CEO of Cisco Systems, points out that, with the number of app downloads growing from 10 billion in 2010 to 77 billion in 2014, there is “a $19 trillion global opportunity to create value over the next decade.”

As per industry estimates, in India alone, the big data analytics market is set to touch $1 billion in 2015.

Will politics surrender to analytics?

No doubt, there is overpowering business logic to the rise and rise of big data analytics. But does this mean it should get a leading role in the domain of politics and public policy?

The answer to this question may already have been decided, going by the frequency with which “evidence-based policy-making” and “actionable information” pop up in government documents and the reports of bodies such as the United Nations or the World Bank.

India, too, is well and truly on board the bandwagon. On the one hand, the large pool of English-speaking engineering and mathematics graduates makes India an attractive destination for the off-shoring of big data analytics, which Indian tech entrepreneurs are well placed to exploit.

On the other hand, with several citizen-to-government transactions, such as passport applications and tax payments, migrating online, the state unwilling to relax its grip on Aadhaar, and plans afoot to digitise medical records, it is clear that big data will come to play a major role.

Besides, many examples have been cited to prove that big data can be harnessed for social good. We have been told that cellphone call logs can help locate survivors during a natural disaster. Online searches can yield data to predict a disease outbreak (the principle behind applications such as Google Flu Trends).

And given the billions of dollars — the preferred term is ‘value’ — riding on the so-called ‘information economy’, it is unlikely that the raw material for data manufacture (also known as ‘people’) will have much say in the matter. We are already beginning to see this in India, with ‘evidence-based decision-making’ being trotted out as an argument against the precious few welfare schemes still left for India’s poor, such as the public distribution system (where data show that it is leaky), or the rural jobs scheme (where data show it is riddled with corruption).

A policy determined by such evidence alone would seek to scrap both schemes and replace them with cash transfers, as the incumbent government seems keen to do. But big data, by definition, is the wrong tool with which to understand the social consequences of giving cash instead of food grains — a critical policy input that can come only from politics, not analytics.

Where is big data taking us?

The exponential growth of big data analytics, and its increasing utilisation in government policy, is premised on many things, including growth in IT infrastructure, the digital inclusion of those hitherto excluded by poverty, and an overarching colonisation of the analog universe by the digital.

But what it needs above all is the erasure of the very concept of privacy. Many of us have already voluntarily surrendered our privacy, either for the sake of convenience or to save costs — by ticking the ‘I accept’ box when we sign on to a social media or email service.

But privacy, while critical for a functional democracy, is not the only casualty of big data. The graver threat is a digital replay of colonial era exploitation, with data replacing mineral resources and raw materials as the source of value.

We already have a bizarre scenario in several developing countries (including India) — a scenario that is somehow no longer perceived as bizarre — where people don’t have toilets (an amenity with tremendous public health consequences) but own cell phones, and their mobile data is being captured for ‘actionable information’ on the status of their health, and for ‘evidence-based’ framing of health policy.

It is in the context of such anomalies that a term coined by a Tanzanian health minister becomes relevant: data colonialism.

The expression gained traction when Najeeb Al Shorbaji, Director, Knowledge and Management at the World Health Organisation (WHO), gave a speech in 2013, titled ‘Data Colonialism’. Shorbaji used the term to describe a scenario where the West has been mining African nations for health data without the Africans benefiting in any way.

He uses the same data-as-gold metaphor used by the WEF report to draw an analogy between the flow of raw materials from the colonies to Europe, and the flow of data from the erstwhile colonies to the developed West today. The objective in both cases is the same: extraction of value.

Today, useless data (or ‘data exhaust’ as it’s called) has to flow from the developing markets to the West (via Google or Amazon or their equivalent) in order to be commodified as information. Shorbaji illustrates the social dynamic of data-driven exploitation with an example from a domain that is usually touted as a poster boy for the benefits of big data analytics: healthcare.

He describes how impoverished Africans, who are not even aware of the concept of informed consent, living as they do in countries with no legislative framework for data collection and usage, agree to become guinea pigs for risky clinical trials in exchange for a little money or free medical treatment.

The animating logic of big data

This brings us to the philosophical basis of big data, which is rooted in the abstractions of statistics. It is well known that statistics grew as a discipline to address the needs of the modern state, which had to administer populations on a big scale. In big data, the post-modern state has found a fitting collaborator for monitoring, and pre-emptively controlling, sections of the population that, in circumstances of prolonged deprivation or injustice, can be prone to unseemly eruptions against those who control the levers of the state.

Typically, the ‘big’ of big data is construed as a reference to the sheer volume, velocity (of generation) and variety (of sources) of the datasets involved. But perhaps the real reason why ‘big’ data is big is that it seeks to decisively appropriate human agency and transfer it to data and algorithms.

Even the term ‘actionable information’, often invoked in the context of big data, suggests that it is not humans who have to decide what is to be done, and therefore take responsibility for the choices being made, but somehow the data or information itself which decides (for humanity) the action to be taken. This, finally, is the inescapable social cost of big data analytics.

And so, finally, we come to the big question about big data: Can analytics find solutions to humanity’s problems? Yes, but not to the problems that human beings choose not to address.

Many global problems have their origins in deprivation. We don’t need big data analytics to tell us this. It is common sense that if such widespread deprivation is addressed (which requires solving the problem of extreme inequalities in wealth and income), a great many problems, such as hunger and disease, can be resolved.

In fact, there already is ample data, including an OECD study, which confirms that reducing inequality boosts economic growth. But this has hardly prompted a corresponding change in government policies anywhere. While evidence-based policy-making may be good for business and the tech industry, it is only politics-driven policy-making that can make a positive difference to people’s lives. For, as data evangelists never tire of pointing out, big data is no different from gold — it is firstly, and ultimately, a commodity.

The writer is the Social Affairs Editor of The Hindu. Email: gsampath.thehindu@gmail.com
