1.2.1. Big Data and Data Mining - sj50179/IBM-Data-Science-Professional-Certificate GitHub Wiki

Foundations of Big Data

"Big Data refers to the dynamic, large and disparate volumes of data being created by people, tools, and machines. It requires new, innovative, and scalable technology to collect, host, and analytically process the vast amount of data gathered in order to derive real-time business insights that relate to consumers, risk, profit, performance, productivity management, and enhanced shareholder value." - Ernst and Young

The Vs of Big Data: velocity, volume, variety, veracity, and value

  1. Velocity is the speed at which data accumulates. Data is generated extremely fast, in a process that never stops. Near-real-time or real-time streaming, local, and cloud-based technologies can process this information very quickly.
  2. Volume is the scale of the data, or the increase in the amount of data stored. Drivers of volume include the growing number of data sources, higher-resolution sensors, and scalable infrastructure.
  3. Variety is the diversity of the data. Structured data fits neatly into the rows and columns of relational databases, while unstructured data is not organized in a predefined way, like Tweets, blog posts, pictures, numbers, and video. Variety also reflects that data comes from different sources (machines, people, and processes), both internal and external to organizations. Drivers include mobile technologies, social media, wearable technologies, geo technologies, video, and many more.
  4. Veracity is the quality and origin of data, and its conformity to facts and accuracy. Attributes include consistency, completeness, integrity, and ambiguity. Drivers include cost and the need for traceability. With so much data available, the debate over the accuracy of data in the digital age continues: is the information real, or is it false?
  5. Value is our ability and need to turn data into value. Value isn't just profit; it may include medical or social benefits, as well as customer, employee, or personal satisfaction. The main reason people invest time in understanding Big Data is to derive value from it.
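Several of the veracity attributes listed above (completeness, consistency, integrity) can be measured directly. Below is a minimal Python sketch, assuming records arrive as dictionaries. The helper names (`profile_records`, `QualityReport`, `is_missing`), the sample records, and the `age` range rule are hypothetical illustrations, not part of the course material or any library. Ambiguity is left out because it usually needs human judgment.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class QualityReport:
    completeness: float  # share of required fields that are present and non-empty
    duplicate_ids: list  # ids that appear more than once (an integrity problem)
    invalid_rows: list   # indices of rows that break a consistency rule


def is_missing(value: Any) -> bool:
    """Treat None and empty strings as missing values (a hypothetical convention)."""
    return value is None or value == ""


def profile_records(records, required_fields, id_field, rules):
    """Compute simple veracity indicators for a list of dict records.

    `rules` maps a field name to a predicate that a valid value must satisfy.
    """
    # Completeness: fraction of (row, required field) cells that hold a value.
    total = len(records) * len(required_fields)
    present = sum(
        1
        for row in records
        for name in required_fields
        if not is_missing(row.get(name))
    )
    completeness = present / total if total else 1.0

    # Integrity: the id field should be unique across records.
    seen, duplicates = set(), []
    for row in records:
        key = row.get(id_field)
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)

    # Consistency: every non-missing value must pass its field's rule.
    invalid = [
        i
        for i, row in enumerate(records)
        if any(
            not is_missing(row.get(name)) and not check(row[name])
            for name, check in rules.items()
        )
    ]
    return QualityReport(completeness, duplicates, invalid)


# Hypothetical sample records and rule, for illustration only.
records = [
    {"id": 1, "age": 34, "country": "KR"},
    {"id": 2, "age": -5, "country": "US"},    # age outside the plausible range
    {"id": 2, "age": 41, "country": ""},      # duplicate id, missing country
    {"id": 3, "age": None, "country": "DE"},  # missing age
]
report = profile_records(
    records,
    required_fields=["id", "age", "country"],
    id_field="id",
    rules={"age": lambda v: 0 <= v <= 120},
)
print(f"{report.completeness:.2f}")  # 0.83
print(report.duplicate_ids)          # [2]
print(report.invalid_rows)           # [1]
```

Each indicator is kept separate rather than merged into one score, so a low completeness value is not masked by good consistency, and the reverse.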

Lesson Summary

In this lesson, I have learned:

  • How Big Data is defined by the Vs: Velocity, Volume, Variety, Veracity, and Value.

  • How Hadoop and other tools, combined with distributed computing power, are used to handle the demands of Big Data.

  • What skills are required to analyse Big Data.

  • About the process of Data Mining, and how it produces results.
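Hadoop handles large jobs with the MapReduce model. Map tasks run in parallel over separate splits of the input, a shuffle groups their output by key, and reduce tasks combine each group. The following is a minimal single-machine sketch of that data flow in Python, counting words. It is not Hadoop's API: the threads only simulate worker nodes, and the input text and function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the grouped values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical input: each string stands in for a block of a large file
# that a distributed file system would store on a different node.
splits = [
    "big data needs scalable tools",
    "hadoop splits big data across nodes",
    "nodes process data in parallel",
]

# Threads stand in for worker nodes running map tasks in parallel.
with ThreadPoolExecutor(max_workers=3) as pool:
    mapped = list(chain.from_iterable(pool.map(map_phase, splits)))

counts = reduce_phase(shuffle_phase(mapped))
print(counts["data"])   # 3
print(counts["big"])    # 2
print(counts["nodes"])  # 2
```

On a real cluster, the framework itself handles splitting the input, scheduling tasks near the data, performing the shuffle, and re-running failed tasks. The sketch shows only the shape of the computation.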


Quiz