what i thought big data was…

I read with interest an article in RWW entitled “Big Data: What Do You Think It Is?”

I suppose the term “big data” is akin to “cloud computing”: you hear it bandied about quite a bit, but often without a clear definition. Unless, that is, you’re talking about legal presentations, in which case you’ll see a zillion definitions of cloud computing trotted out, including, yes, once again, the NIST definition of cloud computing (PDF), amongst various others.

In any event, it was nice to see an article asking the question and (hopefully) providing a clear answer. Perhaps surprisingly (or not), when folks were surveyed on what they thought big data meant, there was no clear consensus:

Harris asked 154 C-level executives from U.S.-based multi-national companies last April a series of questions, one of them being to simply pick the definition of “Big Data” that most closely resembled their own strategies. The results were all over the map. While 28% of respondents agreed with “Massive growth of transaction data” (the notion that data is getting bigger) as most like their own concepts, 24% agreed with “New technologies designed to address the volume, variety, and velocity challenges of big data” (the notion that database systems are getting more complex).  Some 19% agreed with the “requirement to store and archive data for regulatory and compliance,” 18% agreed with the “explosion of new data sources,” while 11% stuck with “Other.”

The author then goes on to attempt a generally applicable definition:

Essentially, Big Data tools address the way large quantities of data are stored, accessed and presented for manipulation or analysis.

Perhaps rightly or wrongly, that hasn’t quite been the impression I’ve had when reading articles about big data. I’m not at all suggesting that the proposed definition is incorrect. In fact, perhaps the opposite. That being said, when I have seen the term “big data” in the past, it was almost always used to describe not the technologies used to store, access or present data, but rather primarily (or almost exclusively) the analysis of very large datasets in order to develop new knowledge, ideas or products. Or the collection of data that had previously not been collected (at least not in easily manipulated digital form) for the purposes of such analysis. For example, figuring out, based on purchasing patterns, that someone is pregnant in order to market baby supplies to them. Or using cell-phone records to detect disease outbreaks, analyzing listening data to figure out how a recording artist becomes a star, analyzing information collected from smart meters to figure out ways to reduce energy consumption, using algorithms to analyze server and device logs to manage IT infrastructure, etc. etc. See a nice collection of stories in GigaOm.

Of course, all of that necessarily presumes that the technology exists to record and access such large datasets. So that may well be properly considered part of big data, I suppose. Though perhaps not quite as interesting as what you can do with it. At least to me.

from the “this is potentially very cool if it works” dept.

Came across this story by chance via an article in a Twine update that I was about to delete. Anyway, I caught the name Wolfram so thought I’d take a peek. The name might ring a bell: it’s Wolfram as in Wolfram Research, as in Stephen Wolfram of Mathematica fame. No slouch when it comes to all things mathematical. In any event, apparently in May he will unveil Alpha, which, I gather from the article, is a “computational engine” that will actually compute answers to plain-language queries. A brief sampling from the article:

For those who are more scientifically inclined, Stephen showed me many interesting examples — for example, Wolfram Alpha was able to solve novel numeric sequencing problems, calculus problems, and could answer questions about the human genome too. It was also able to compute answers to questions about many other kinds of topics (cooking, people, economics, etc.). Some commenters on this article have mentioned that in some cases Google appears to be able to answer questions, or at least the answers appear at the top of Google’s results. So what is the Big Deal? The Big Deal is that Wolfram Alpha doesn’t merely look up the answers like Google does, it computes them using at least some level of domain understanding and reasoning, plus vast amounts of data about the topic being asked about.

It will be interesting to see how (and whether) it actually performs. Given Wolfram’s credentials, the huge effort (undertaken in stealth mode, it seems) and the data that have gone into it, and the positive articles to date (such as the one below), it does sound very promising.

From a legal perspective, it will be interesting to see how content has been used in the engine and how the rights to such content (assuming at least some non-public-domain material is used) have been dealt with. From a tech perspective, it will be very interesting to see what the iron powering this thing will look like, particularly if it starts getting millions of queries a day, how the underlying algorithms work, and the extent to which it can evolve and improve over time (I hesitate to use the word “learn”). And from a biz perspective, it will be interesting to see whether Wolfram takes a Google-type approach to revenue generation (i.e. ads) or whether he has something else up his sleeve. Check it out for yourself in May.

via Wolfram Alpha is Coming — and It Could be as Important as Google | Twine.