Jason Atchley : eDiscovery : What is Big Data and the Revolution?



Jason Atchley

Rediscovery, Part 1: What Is Big Data And The Big Data Revolution?


A revolution is underway.

This is Part 1 of a multi-part blog series that discusses lessons from the big data revolution and their application to electronic discovery activities. Click here to view Part 2 of this series. Click the following link to download the related white paper, “Rediscovery – Lessons For Electronic Discovery From The Big Data Revolution.”

In just the time it takes you to finish reading this blog entry, more than 400,000,000 e-mails will be sent.[1] As large as that number is, it is not the revolution. Nor is the revolution the more than 1,000 new websites that will be created or the more than 600 new blog posts that will be published on WordPress.[2] Nor is it the more than 1,200,000 new pieces of content that will be shared on Facebook or the more than 200,000 tweets that will be sent on Twitter.[3] Although vast, none of these growing oceans of data – nor the many others like them – is the revolution.
The availability of these vast electronic oceans is a new phenomenon, made possible by the intersection of cheap storage and processing, novel software solutions, and ubiquitous connectivity. From these new oceans, businesses and academics of all types are extracting astonishing intelligence. For example, Google has discovered that it can mine the searches users run to identify regional flu outbreaks a week or more before they are identified by the Centers for Disease Control and Prevention.[4] But these novelties are not the revolution either.
The big data revolution underway is a revolution of management.
It is axiomatic that you cannot manage what you cannot measure, and the advent of big data makes it possible to measure more than ever before. In every area, from the purchasing behaviors of customers to the arrival times of airplanes, big data is being mined for insight and intelligence that can be used to make better decisions.
This shift, towards data-driven decision making, is the revolution.
Data-driven decision making is equally valuable to lawyers. From predicting probable patent case outcomes to quantifying the profitability of a law firm’s lawyers, data-driven decision making is already being leveraged throughout the legal field. In the context of electronic discovery, data-driven decision making has the potential to profoundly increase efficiency, quality, and control for corporate and law firm practitioners alike. 
This blog entry is the first in a four-part series that examines big data, data-driven decision making, how corporations and law firms are leveraging it, and how it applies in the context of electronic discovery. We begin below with a deeper dive into just what we mean when we say “big data.”

What is Big Data?

Big data is traditionally characterized by reference to four V’s: velocity, variety, volume, and veracity.[5]
  • Velocity refers to the speed at which data is created.[6] The proliferation of network-connected users and network-connected devices has dramatically increased the speed at which new data is being generated. For example, Google receives and processes more than 2,000,000 queries every minute.[7]
  • Variety refers to the diversity of structured and unstructured formats in which data is created.[8] In earlier eras of computing, the vast majority of the data created and stored was structured data housed in relational databases; now, 90% of the data generated by organizations is unstructured (e.g., text documents and multimedia files).[9]
  • Volume refers to the immense quantities of data being created.[10] As networked users, devices, and sensors have proliferated, the volume of data being generated has increased exponentially. Google CEO Eric Schmidt has explained that from “the dawn of civilization through 2003” approximately 5 exabytes (5 billion gigabytes) of information was created but, as of 2010, that much information was being “created every 2 days.”[11]
  • Veracity refers to the reliability or accuracy of the data.[12] As more and more attention has been paid to this data and what can be done with it, the veracity of the data and of the analyses performed upon it has also become an important dimension of big data. “Having a lot of data in different volumes coming in at high speed is worthless if that data is incorrect.”[13] Today, 1 in 3 business leaders do not trust the information upon which they rely to make decisions.[14]
Today, a majority of companies in the United States have at least 100,000 gigabytes of data stored.[15] In a 2013 Gartner survey, 64% of the organizations surveyed indicated that they had invested or planned to invest in leveraging those stores of data.[16] Nearly half were looking to big data projects as a way to improve process efficiency.[17]


[1] Josh James, How Much Data is Created Every Minute?, Domosphere (Jun. 8, 2012), http://www.domo.com/blog/2012/06/how-much-data-is-created-every-minute/.
[2] Id.
[3] Id.
[4] Miguel Helft, Google Uses Searches to Track Flu’s Spread, The New York Times (Nov. 11, 2008), http://www.nytimes.com/2008/11/12/technology/internet/12flu.html.
[5] Mark van Rijmenam, Why The 3V’s Are Not Sufficient To Describe Big Data, BigData Startups (Aug. 7, 2013), http://www.bigdata-startups.com/3vs-sufficient-describe-big-data/.
[6] Id.
[7] Id.
[8] Id.
[9] Id.
[10] Id.
[11] Marshall Kirkpatrick, Google CEO Schmidt: “People Aren’t Ready for the Technology Revolution,” ReadWrite Web (Aug. 4, 2010), http://readwrite.com/2010/08/04/google_ceo_schmidt_people_arent_ready_for_the_tech#awesm=~osi7er7BOIOYkx.
[12] See supra note 5.
[13] Id.
[14] The FOUR V’s of Big Data, The Big Data & Analytics Hub, http://www.ibmbigdatahub.com/infographic/four-vs-big-data (last visited Jan. 9, 2014).
[15] Id.
[16] Gartner Survey Reveals That 64 Percent of Organizations Have Invested or Plan to Invest in Big Data in 2013, Gartner (Sep. 23, 2013), http://www.gartner.com/newsroom/id/2593815.
[17] Id.

