There's a logical fallacy that mathematicians are fond of quoting when
humans exercise their considerable built-in pattern-recognition
abilities to draw conclusions that could just be coincidence:
correlation does not imply causality. But, as Kenneth Cukier and Viktor
Mayer-Schönberger argue in Big Data: A Revolution That Will Transform
How We Live, Work, and Think,
what Big Data brings with it is a profound shift in our attempts to
understand How the World Works. In their view, correlation may now be
good enough all by itself.
For centuries we have focused on causation as a way of deriving general
principles from specific cases. For example, once we understood that
plants grew in response to ready supplies of sunlight, water and
nutrients in the soil, we were able to apply this knowledge to promote
more rapid and reliable growth.
What's happening now is that by churning through huge masses of data
we can find patterns that would not be trustworthy in smaller samples,
and derive value from them whether or not we understand the underlying
causality.
If studying millions of patient records shows that this weird complex
of symptoms indicates a particular rare illness and this particular
drug ameliorates it, does it matter why? The result, the authors argue, will be to kill off
disciplines like sampling and habits of mind like the desire for
exactitude and causality. Being approximately right is good enough; we
don't need to risk being exactly wrong.
Cukier is the data editor of The Economist; Mayer-Schönberger, an Oxford
professor, is best known for his 2009 book Delete, in which he proposed
the "right to be forgotten". This book seems to
reflect their disparate interests. The first half talks about the state
of Big Data, the kinds of new insights it's bringing and the changes
it's making in various industries, while the second studies its risks.
It's tempting to attribute the two halves to Cukier and Mayer-Schönberger
respectively, but it's always dangerous to guess the mechanics of
collaboration: the sample size is too small.
The state-of-the-art story is relatively familiar. Quantity can
compensate for some lack of quality. Medical diagnostics. Spotting flu
outbreaks using Google's search data. Moneyball (about which I pause to
complain that a book citing a non-fiction work should cite the original
book rather than the movie).
The risks story is more interesting once it gets past the obligatory
references to Minority Report and Google's 41 shades of blue as examples
of the potential "dictatorship of Big Data".
Big Data profoundly changes the problem of privacy — another reason why
the US's data-driven companies are lobbying so hard to use the review of
data protection law to weaken it. One of the fundamental data
protection principles is that consent should be obtained for a change of
use. But secondary uses are where much of the value of Big Data is
derived. No one, for example, consented to the use of their search
engine queries to track flu outbreaks, yet using the data in this way is
clearly a public benefit. At least, it is until or unless some
enterprising government decides that putting all the people in those
areas under quarantine is a good idea.
Cukier and Mayer-Schönberger end up suggesting that we need to shift
from the present approach of restricting how data may be used to
accountability for its use. This is an idea we hear a lot these
days, and it suffers from the problem that sometimes the damage of
disclosure may be bad enough that no amount of accountability can fix
it. Plus, as Simon Davies, the founder of Privacy International, is so
fond of saying, "Companies are pathologically unable to regulate
themselves".
Overall, this is probably the best-rounded book on Big Data to date.
Most just cheerlead, while a few are all doom and gloom. This one aims
at balance and provides a thorough grounding.
Source:
http://www.zdnet.com/big-data-book-review-7000016654/