Interview Questions for Data Engineering Candidates with 10 Years of Experience - RatneshKumarSrivastava/Ratnesh GitHub Wiki

Python

  1. What is a decorator, and where is it used?
  2. List-based program (mandatory).
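
A minimal sketch that touches both Python topics above: a decorator that times a function, applied to a typical list exercise (removing duplicates while keeping order). The names `timed`, `last_elapsed`, and `dedupe_keep_order` are made up for illustration and are not from any particular interview.

```python
import functools
import time


def timed(func):
    """Decorator that records how long the most recent call to func took."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Store the duration on the wrapper so callers can inspect it.
            wrapper.last_elapsed = time.perf_counter() - start

    wrapper.last_elapsed = None
    return wrapper


@timed
def dedupe_keep_order(items):
    """List exercise: drop duplicates while keeping first-seen order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
```

Decorators suit cross-cutting concerns such as timing, logging, retries, and caching, because the wrapped function's own body stays unchanged.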

SQL

  1. Window-function questions (mandatory).
  2. Summarizing sales data.
  3. The explode method (mandatory).
  4. Indexing.
  5. Partitioning, clustering, and nested partitioning.
  6. SCD Type 1, Type 2, and Type 3.
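
For the SCD topic, a rough sketch of a Type 2 update in plain Python may help when explaining the idea: the current row is closed and a new version is appended, so history is preserved. The column names (`customer_id`, `city`, `valid_from`, `valid_to`, `is_current`) and the `HIGH_DATE` marker are assumptions chosen for illustration. In practice this is usually done with a SQL `MERGE` or an equivalent in the warehouse.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed "open-ended" marker for current rows


def apply_scd2(dimension, incoming, load_date):
    """Return a new dimension list after applying an SCD Type 2 update.

    dimension: list of dicts with the assumed columns described above
    incoming:  dict mapping customer_id -> latest city
    """
    def new_version(customer_id, city):
        return {
            "customer_id": customer_id,
            "city": city,
            "valid_from": load_date,
            "valid_to": HIGH_DATE,
            "is_current": True,
        }

    result = []
    current_keys = set()
    for row in dimension:
        row = dict(row)  # copy so the caller's rows are not mutated
        if row["is_current"]:
            current_keys.add(row["customer_id"])
            new_city = incoming.get(row["customer_id"])
            if new_city is not None and new_city != row["city"]:
                # Close the old version, then append the new one.
                row["valid_to"] = load_date
                row["is_current"] = False
                result.append(row)
                result.append(new_version(row["customer_id"], new_city))
                continue
        result.append(row)

    # Keys with no current row are inserted as brand-new versions.
    for customer_id, city in incoming.items():
        if customer_id not in current_keys:
            result.append(new_version(customer_id, city))
    return result
```

By contrast, Type 1 would overwrite `city` in place and keep no history, and Type 3 would keep the previous value in an extra column on the same row.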

GCP

  1. All services and their uses (questions to be added here).

Spark

  1. Spark architecture.
  2. Spark optimization.
  3. Predicate pushdown and Catalyst optimization.
  4. Spark UI questions.
  5. Jobs, stages, and tasks, and how many stages are created for a particular job.
  6. How much data have you handled?
  7. How would you process a 100 GB file, and how many cores, executors, and how much executor memory would it require?
  8. Data skew, with an example.
  9. What to do if a job takes too long to complete.
  10. Out-of-memory failures.
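
The 100 GB sizing question is usually answered with back-of-the-envelope arithmetic. The sketch below shows one common way to reason about it. Every default is an assumption and a rule of thumb, not a Spark requirement: 128 MB input partitions, 5 cores per executor, 4 waves of tasks, roughly 512 MB of heap per core, and an overhead of the larger of 384 MB or 10% of heap. The function name `estimate_spark_resources` is hypothetical.

```python
import math


def estimate_spark_resources(
    file_size_gb,
    partition_mb=128,        # assumed input partition size
    cores_per_executor=5,    # common rule of thumb, not a hard limit
    waves=4,                 # assumed number of task waves per core
    memory_per_core_mb=512,  # assumed heap per core (about 4x partition size)
    overhead_fraction=0.10,  # assumed off-heap overhead fraction
):
    """Rough sizing estimate for reading one large file; all inputs are heuristics."""
    partitions = math.ceil(file_size_gb * 1024 / partition_mb)
    total_cores = math.ceil(partitions / waves)
    executors = math.ceil(total_cores / cores_per_executor)
    executor_memory_mb = cores_per_executor * memory_per_core_mb
    overhead_mb = max(384, int(executor_memory_mb * overhead_fraction))
    return {
        "partitions": partitions,
        "total_cores": total_cores,
        "executors": executors,
        "executor_memory_mb": executor_memory_mb,
        "overhead_mb": overhead_mb,
    }


estimate = estimate_spark_resources(100)
```

With these assumptions, a 100 GB file gives 800 partitions, 200 cores, and 40 executors with about 2.5 GB of heap each. In an interview, the reasoning and the stated assumptions matter more than the exact numbers, which depend on file format, compression, shuffle volume, and cluster limits.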

Project

  1. Data flow of your projects.
  2. Batch and real-time data ingestion.
  3. Data validation tools and validation steps.