Mine is 6 Terabytes. How Big is Yours?

What exactly is Big Data anyway?

According to the internet, Big Data is:

  • Any collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications.
  • Any amount of data that’s too big to be handled by one computer.
  • An easily scalable system of unstructured data with accompanying tools that can efficiently pull structured datasets.
  • [Big Data is] what happened when the cost of storing information became less than the cost of making the decision to throw it away.
  • [Big Data is] when the size of the data becomes part of the problem.

There you have it: Big Data defined. Feeling well informed now? The definition is clearly a moving target, so here are some thoughts that will hopefully help improve your aim.

Big Data can certainly involve a galactic scale in its definition, but if we step away from the data science groups and toward the people building business applications, here are two definitions that help bring it down to Earth:

Big Data is…

1) When the volume, velocity and variety of data exceed an organization’s storage or compute capacity for accurate and timely decision making.
2) Who cares? It’s what you’re doing with it that matters.

These statements point to a simple, time-honored distinction that helps clear up the confusion: data vs. information. Data is disparate facts, quantities or characters of raw content: each item you purchase, each GPS coordinate your phone logs, each step you take, and so on. Information is data organized to provide some insight or answer a question. It is data made useful.

For example,
DATA:
The average distance from the Earth to the moon is 238,900 miles.
A dollar bill has a thickness of 0.0043 inches.

INFORMATION:
It would take roughly 3.52 trillion one-dollar bills to make a stack that reaches the moon.

The usefulness of that information might be debatable, but it does relate two disparate pieces of data and provide an answer to a question.
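Turning those two pieces of data into that one piece of information takes only a unit conversion and a division. Here is a minimal Python sketch, assuming 63,360 inches per mile (5,280 feet × 12 inches) and the two figures quoted above; `bills_to_moon` is a hypothetical helper name used only for illustration.

```python
# 5,280 feet per mile x 12 inches per foot
MILES_TO_INCHES = 63_360


def bills_to_moon(distance_miles: float, bill_thickness_in: float) -> float:
    """Relate two raw data points (distance, bill thickness) into one answer:
    how many bills, stacked flat, would span the distance."""
    distance_in = distance_miles * MILES_TO_INCHES  # convert miles to inches
    return distance_in / bill_thickness_in          # bills per stack


bills = bills_to_moon(238_900, 0.0043)
print(f"{bills / 1e12:.2f} trillion bills")  # prints "3.52 trillion bills"
```

Neither input answers the question on its own; the answer only appears once the two are combined, which is the data-to-information step described above.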

Scale, perspective, and context are three related and often-used words when discussing Big Data. It’s important to understand the scale (scope, or size) of “Big,” the context of a problem, and the audience’s perspective on the problem and its answer.

Appreciation of scale:
Item #1: Even with Warren Buffett’s vast wealth, it would take 71 of his fortunes to build a stack of dollar bills to the moon.

Item #2: The US Debt Clock reports $2.95 trillion as the current US tax revenue.

Thus, even Mr. Buffett’s vast wealth can’t begin to build the money stack to the moon, but the 316 million “contributing” US citizens could nearly complete it right now. This illustrates a scalable solution to a problem: one extraordinary contributor falls far short of the goal, while a large number of ordinary contributors reach it quickly.
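The comparison between one extraordinary contributor and many ordinary ones can be checked with a few lines of Python. This sketch recomputes the stack size from the distance (238,900 miles) and bill thickness (0.0043 inches) given earlier and uses only the figures quoted in the post: 71 fortunes, $2.95 trillion in tax revenue, and 316 million contributors. The "implied fortune" is simply what those figures imply, not an independent estimate of Mr. Buffett’s net worth.

```python
MILES_TO_INCHES = 63_360  # 5,280 feet x 12 inches

# Stack size in $1 bills, from the post's distance and thickness figures
stack_bills = 238_900 * MILES_TO_INCHES / 0.0043  # about 3.52 trillion

BUFFETT_FORTUNES = 71    # fortunes needed, as quoted in the post
US_TAX_REVENUE = 2.95e12  # US Debt Clock figure quoted in the post, in dollars
CONTRIBUTORS = 316e6      # "contributing" US citizens quoted in the post

# One extraordinary contributor: the fortune size these figures imply
implied_fortune = stack_bills / BUFFETT_FORTUNES

# Many ordinary contributors: share of the stack covered, and average per person
share_of_stack = US_TAX_REVENUE / stack_bills
per_person = US_TAX_REVENUE / CONTRIBUTORS

print(f"One fortune: ${implied_fortune / 1e9:.1f} billion")       # $49.6 billion
print(f"Tax revenue covers {share_of_stack:.0%} of the stack")    # 84%
print(f"Average contribution: ${per_person:,.0f}")                # $9,335
```

One fortune covers only 1/71 of the stack (about 1.4%), while roughly $9,335 from each of many ordinary contributors covers about 84% of it. Adding contributors, rather than finding a bigger single contributor, is what makes the approach scale.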

Applying a scalable solution to a truly vast amount of data is the quantum leap in problem solving that data science groups are working on: using the daily contributions of billions of people’s lives as input data, organized into informed answers to problems of previously unimagined scale (e.g., cancer, disease, genomics, why people like that TV show or that hairstyle).

Moving from the galactic back down to more earthly quantum leaps, businesses of all sizes, as well as individuals, will use similar tools and apply similar techniques to answer their larger questions. These scalable techniques, and the benefits of relating previously separate data sets, will certainly find their way into our daily use. But if your organization’s storage or compute capacity for accurate and timely decision making centers on spreadsheets, Word documents and static web pages, you will likely struggle to find good questions, let alone answers.

Start taming Big Data, or pick up the pace if you already have. And remember to Make It Useful.