What I thought big data was…

I read with interest an article in RWW entitled “Big Data: What Do You Think It Is?”

I suppose the term “big data” is akin to “cloud computing” – you hear the term bandied about quite a bit, but often without a clear definition. Unless you’re talking about legal presentations, where you’ll see a zillion definitions of cloud computing trotted out, including, yes, once again, the NIST definition of cloud computing (PDF), amongst various others.

In any event, it was nice to see an article asking the question and (hopefully) providing a clear answer. Perhaps surprisingly (or not), when folks were surveyed on what they thought big data meant, there was no clear consensus:

Harris asked 154 C-level executives from U.S.-based multi-national companies last April a series of questions, one of them being to simply pick the definition of “Big Data” that most closely resembled their own strategies. The results were all over the map. While 28% of respondents agreed with “Massive growth of transaction data” (the notion that data is getting bigger) as most like their own concepts, 24% agreed with “New technologies designed to address the volume, variety, and velocity challenges of big data” (the notion that database systems are getting more complex). Some 19% agreed with the “requirement to store and archive data for regulatory and compliance,” 18% agreed with the “explosion of new data sources,” while 11% stuck with “Other.”

The author then goes on to attempt to create a generally applicable definition:

Essentially, Big Data tools address the way large quantities of data are stored, accessed and presented for manipulation or analysis.

Perhaps rightly or wrongly, that hasn’t quite been the impression I’ve had when reading articles about big data. I’m not at all suggesting that the proposed definition is incorrect. In fact, perhaps the opposite. That being said, when I have seen the term “big data” in the past, it was almost always used to describe not the technologies used to store, access or present data, but rather primarily (or almost exclusively) the analysis of very large datasets in order to develop new knowledge, ideas or products. Or starting to collect data that had previously not been collected (at least not in easily manipulated digital form) for the purposes of such analysis. For example, figuring out, based on purchasing patterns, that someone is pregnant in order to market baby supplies to them. Or using cell-phone records to detect disease outbreaks, analyzing listening data to figure out how a recording artist becomes a star, analyzing information collected from smart meters to find ways to reduce energy consumption, using algorithms to analyze server and device logs to manage IT infrastructure, etc., etc. – see a nice collection of stories in GigaOm.

Of course, all of that necessarily presumes that the technology exists to record and access such large datasets. So that may well be properly considered part of big data, I suppose. Though perhaps not quite as interesting as what you can do with it. At least to me.