What is Big Data?
Big data revenues are estimated to reach $135 billion by the end of 2019. While the technical definition of big data varies by company, there is general agreement that it is defined by the volume, variety, and velocity of the data. The concept of big data arose from a combination of advancements over the years: technology becoming more affordable, the rise of mobile computing, increased social networking, and the ease of cloud computing. Together these increased the volume of stored transactional data, which can now be analyzed with tools that combine open-source technology and commodity hardware.
Traditional data systems can be slow, whereas big data tools can store, process, and analyze large volumes of data quickly. Big data analysis can seek to extract insight from petabytes of data or more. It is usually based on a distributed database architecture, in which large data sets are broken down into smaller pieces for analysis. Several computers on the network typically work together, each handling a share of the data, to produce a result. Whereas a traditional approach might rely on fixed data fields, big data draws on vast numbers of semi-structured and unstructured data sources that can provide valuable insights into business problems.
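The split-and-combine pattern described above can be sketched in a few lines of Python. This is only a minimal illustration, not a real cluster: worker threads on one machine stand in for separate computers, and the function names, sample records, and chunk size are all assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Break a large list of records into smaller pieces."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def count_words(chunk):
    """Map step: count the words in one chunk independently of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(partial_counts):
    """Reduce step: combine the per-chunk results into one answer."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def distributed_word_count(records, chunk_size=2, workers=4):
    chunks = split_into_chunks(records, chunk_size)
    # Worker threads stand in for the separate computers of a real cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    return merge_counts(partial_counts)


# Hypothetical sample records; real inputs would be far larger.
sample_records = [
    "big data tools store data",
    "big data tools analyze data",
    "traditional systems can be slow",
]

totals = distributed_word_count(sample_records)
print(totals.most_common(2))
```

Because each chunk is counted without reference to the others, the same structure carries over when the chunks live on different machines; only the way work is handed out and results are collected changes.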
Free Access to Big Data
Hundreds of free big data sets are available today, transforming the playing field for producing new insights. For example:
- Facebook provides data on users’ Facebook profiles that aren’t private: https://developers.facebook.com/docs/graph-api
- There are tools to help programmatically gather and respond to big data on Twitter: www.big-data.tips/twitter-api
- Amazon Web Services shares public data including the 1000 Genomes Project: http://aws.amazon.com/datasets
- Google shares world development indicators and economic data across the world: https://www.google.com/publicdata/directory
- The Broad Institute provides cancer-related datasets: portals.broadinstitute.org/cgi-bin/cancer/datasets.cgi
- The World Health Organization shares data on hunger, health and disease: www.who.int/gho/en/
- HealthData.gov shares US healthcare data, including claim-level Medicare data, epidemiology and population statistics: https://www.healthdata.gov/
- The US Census Bureau has population, geographic and educational data: http://www.census.gov/data.html
- The US Government shares data on everything from climate to crime: http://data.gov
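Many of these sources offer downloads in CSV form. As a rough sketch of how such a file might be summarized without loading it all into memory, the example below streams rows one at a time and keeps only running totals. The column names and figures are invented for illustration and do not come from any of the sites listed above.

```python
import csv
import io
from collections import defaultdict


def total_by_key(csv_file, key_column, value_column):
    """Stream rows one at a time and sum value_column for each key_column value."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        totals[row[key_column]] += int(row[value_column])
    return dict(totals)


# Hypothetical stand-in for a downloaded file; the columns and numbers are made up.
sample_csv = io.StringIO(
    "state,county,population\n"
    "A,North,100\n"
    "A,South,250\n"
    "B,East,75\n"
)

population_by_state = total_by_key(sample_csv, "state", "population")
print(population_by_state)
```

With a real download, a file opened via `open(path, newline="")` can be passed in place of the in-memory sample.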
What are your favorite big data analytics tools?