Big Data - bobbae/gcp GitHub Wiki

Big data has been used in the industry to provide customer insights for transparent and simpler products, by analyzing and predicting customer behavior through data derived from social media, GPS-enabled devices, and CCTV footage. The Big Data also allows for better customer retention from insurance companies.

Big Data was originally associated with three key concepts: volume, variety, and velocity.

Current usage of the term Big Data tends to refer to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from big data, and seldom to a particular size of data set.

The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s.

Based on an IDC report prediction, the global data volume was predicted to grow exponentially from 4.4 zettabytes to 44 zettabytes between 2013 and 2020. By 2025, IDC predicts there will be 163 zettabytes of data.

The term Big Data may refer to a dataset which is too large or too complex for ordinary computing devices to process.

Origin

Big data emerged along with three papers from Google, Google File System(2003), MapReduce(2004), and BigTable(2006).

https://bowenli86.github.io/2016/10/23/distributed%20system/data/Big-Data-and-Google-s-Three-Papers-I-GFS-and-MapReduce/

Dataproc is a fully managed and highly scalable service for running Apache Hadoop, Apache Spark, Apache Flink, Presto, and a long list of open source tools and projects .

Ethics

Big data ethics also known as data ethics refers to systemizing, defending, and recommending concepts of right and wrong conduct in relation to data, in particular personal data and privacy.

Big Data Hadoop Tutorial

https://www.guru99.com/bigdata-tutorials.html