1.2.1. Big Data and Data Mining - sj50179/IBM-Data-Science-Professional-Certificate GitHub Wiki
Foundations of Big Data
"Big Data refers to the dynamic, large and disparate volumes of data being created by people, tools, and machines. It requires new, innovative, and scalable technology to collect, host, and analytically process the vast amount of data gathered in order to derive real-time business insights that relate to consumers, risk, profit, performance, productivity management, and enhanced shareholder value."
- Ernst and Young
The V's of Big Data: velocity, volume, variety, veracity, value
- Velocity is the speed at which data accumulates. Data is being generated extremely fast, in a process that never stops. Near-real-time or real-time streaming, local, and cloud-based technologies can process information very quickly.
- Volume is the scale of the data, or the increase in the amount of data stored. Drivers of volume are the increase in data sources, higher-resolution sensors, and scalable infrastructure.
- Variety is the diversity of the data. Structured data fits neatly into rows and columns in relational databases, while unstructured data, such as Tweets, blog posts, pictures, numbers, and video, is not organized in a predefined way. Variety also reflects that data comes from different sources, machines, people, and processes, both internal and external to organizations. Drivers are mobile technologies, social media, wearable technologies, geo technologies, video, and many more.
- Veracity is the quality and origin of data, and its conformity to facts and accuracy. Attributes include consistency, completeness, integrity, and ambiguity. Drivers include cost and the need for traceability. With the large amount of data available, the debate rages on about the accuracy of data in the digital age. Is the information real, or is it false?
- Value is our ability and need to turn data into value. Value isn't just profit. It may have medical or social benefits, as well as customer, employee, or personal satisfaction. The main reason that people invest time to understand Big Data is to derive value from it.
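To make one of the veracity attributes listed above more concrete, here is a minimal Python sketch of a completeness check. It is only an illustration and not part of the course material: the `records` data, the field names, and the `completeness` helper are all hypothetical.

```python
# Hypothetical sample records; names and values are made up for illustration.
records = [
    {"id": 1, "name": "Ana", "email": "ana@example.com"},
    {"id": 2, "name": "Ben", "email": None},
    {"id": 3, "name": None, "email": None},
    {"id": 4, "name": "Dee", "email": "dee@example.com"},
]


def completeness(rows, fields):
    """Return, for each field, the fraction of rows with a non-missing value."""
    total = len(rows)
    if total == 0:
        # No rows at all: report zero completeness instead of dividing by zero.
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for row in rows if row.get(field) is not None) / total
        for field in fields
    }


scores = completeness(records, ["id", "name", "email"])
for field, score in scores.items():
    print(f"{field}: {score:.0%} complete")
```

A low score for a field is one possible signal that the data may need cleaning or a closer look at its origin before any value is derived from it. Consistency and integrity would need separate checks.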
Lesson Summary
In this lesson, I have learned:
- How Big Data is defined by the Vs: Velocity, Volume, Variety, Veracity, and Value.
- How Hadoop and other tools, combined with distributed computing power, are used to handle the demands of Big Data.
- What skills are required to analyse Big Data.
- About the process of Data Mining, and how it produces results.