What is Big Data

I have clients who work with what I consider to be Big Data. Recently, one of those clients was asked, “What the hell is [B]ig [D]ata, anyway?” and he replied, “Nobody knows what it means.” Challenge accepted.

In the movie Big (which came out before the Big Data era), Tom Hanks's character Josh is turned away from a carnival ride because he is not tall enough (“You must be at least this BIG to ride this ride”). As he walks away in despair, he comes upon the now-famous Zoltar Speaks arcade machine and drops in a coin. The front of the machine lights up: “Zoltar Says: Make Your Wish.” Josh says out loud, “I wish I was Big.” He wakes up the next morning and he is big, not just in physical size but in age: he is a grown-up. He spends the rest of the movie learning what it means to be “big.”

The word choice fits both contexts because “big” has many meanings, and several of them apply at once. While Josh just wanted to be taller, he also became older, and all the little things in his life became bigger as well.

The same applies to Big Data. The more common definitions typically revolve around quantifiable things like volume, velocity and variety. So Big Data is big in all three senses: there is a lot of it (volume), it comes in very fast (velocity), and it includes many different types of information (variety), such as numbers, words, pictures, video and metadata. But big also refers to the scope and ubiquity of the data, as well as the analytics.

Traditionally, big data was hard to analyze because of its size, velocity and variety. Large data sets of different types required different custom analytics, which produced incompatible outputs that took further effort to normalize. One of the reasons we can talk about Big Data today is that we now have tools that make it user-friendly, enabling us to recognize its value and use it to our benefit. Those tools continue to improve, letting us focus on the data and customize the analytics to suit our changing needs. These tools are an enabling technology, so Big Data is also about the analytics.

Consider wearable device data. You wear a device that monitors one or, more usually, many aspects of your physical being: heart rate, temperature, blood pressure, respiration rate, physical motion. That device can be linked to your smartphone, which collects that information and supplements it with geolocation data. Your smartphone also allows additional data to be added: for example, location can be used to look up environmental data if the phone does not provide it directly. All this data is collected and uploaded to the cloud, where a vendor’s app tries to make sense of it. Some systems don't do as good a job as you might think. A single user can generate hundreds of diverse data points per day. Hundreds or thousands of users mean that millions of data points can be generated every month. That’s Big Data: it has volume, velocity and variety. Even so, it is small in comparison to Facebook, where approximately 510,000 comments are posted, 293,000 statuses are updated, and over 136,000 photos are uploaded every minute.
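To make the arithmetic concrete, here is a rough back-of-envelope sketch in Python. The figure of 200 data points per user per day and the 30-day month are illustrative assumptions chosen only to match "hundreds per day"; they are not measurements from any real device or vendor.

```python
# Back-of-envelope estimate of monthly data volume from wearable devices.
# All inputs are illustrative assumptions, not measured values.

def monthly_data_points(points_per_user_per_day: int, users: int, days: int = 30) -> int:
    """Return the total number of data points generated over `days` days."""
    return points_per_user_per_day * users * days


if __name__ == "__main__":
    # Assume each user generates 200 data points per day.
    for users in (100, 1_000, 10_000):
        total = monthly_data_points(points_per_user_per_day=200, users=users)
        print(f"{users:>6} users -> {total:,} data points per month")
```

Under these assumptions, a thousand users already produce several million data points in a month, which is the scale the paragraph above describes.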

And this may be only the tip of the iceberg. Big Data produces more Big Data: your personal health data can be analyzed to detect trends (more data about the Big Data!) and, based on certain assumptions, to make suggestions about your needs. So if your data indicates that you run in the morning every Tuesday and Thursday, the app can check the weather and make suggestions. It’s supposed to rain Tuesday morning, so maybe you should go to the gym instead and run on Wednesday, which is supposed to be clear and in the mid-60s. Or it could simply suggest you wear rain gear or dress warmly. By combining Big Data from multiple sources, we can provide benefits on an individualized basis. One expectation is that by monitoring personal health information, we will one day be able to predict the onset of disease and disease symptoms. Someday your smartphone may save your life: it will tell you to go see your doctor or go to the hospital because you are about to have a heart attack.
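The running example above can be sketched as a simple rule in Python. This is only an illustration of combining a detected habit with a forecast. The `Forecast` type, the `suggest_runs` function and the sample weather values are all hypothetical, and a real app would presumably draw on a weather service and far richer data.

```python
from dataclasses import dataclass

WEEK = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


@dataclass
class Forecast:
    rain: bool    # True if rain is expected that morning
    temp_f: int   # expected temperature in degrees Fahrenheit


def suggest_runs(usual_days, forecast):
    """Return one suggestion per usual run day.

    usual_days: days the user normally runs, e.g. ["Tue", "Thu"]
    forecast:   dict mapping a day name to a Forecast (hypothetical data)
    """
    suggestions = []
    for day in usual_days:
        today = forecast[day]
        if not today.rain:
            suggestions.append(f"{day}: run as usual ({today.temp_f}F, clear)")
            continue
        # Rain expected: see whether the following day is a clear, free day.
        next_day = WEEK[(WEEK.index(day) + 1) % 7]
        alt = forecast.get(next_day)
        if alt is not None and not alt.rain and next_day not in usual_days:
            suggestions.append(
                f"{day}: rain expected; go to the gym and run on {next_day} "
                f"({alt.temp_f}F, clear)"
            )
        else:
            suggestions.append(f"{day}: rain expected; wear rain gear or use the gym")
    return suggestions


if __name__ == "__main__":
    # Made-up forecast matching the scenario in the text.
    forecast = {
        "Tue": Forecast(rain=True, temp_f=58),
        "Wed": Forecast(rain=False, temp_f=65),
        "Thu": Forecast(rain=False, temp_f=62),
    }
    for line in suggest_runs(["Tue", "Thu"], forecast):
        print(line)
```

The point of the sketch is that the suggestion only becomes possible once two separate data sources, the user's activity history and the weather, are joined.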

We are not there yet; we are still learning what we can and cannot do with Big Data. The first small steps relate to analyzing trends. By analyzing location and search data, Google has demonstrated the power of trends. You might have heard of one well-known application, Google Flu Trends, which shows where flu outbreaks are occurring. The value of Big Data comes from the ability of the analytics to put it in context, make it relevant to our interests and enable us to make better decisions. Big Data can be worldwide in scope but have local impact.

Big Data has the possibility of being transformative to the healthcare community and life sciences industry. The question we are asking at Nixon Peabody is whether our legal and regulatory structures are ready.