Interview Questions for 10 Years of Experience in Data Engineering - RatneshKumarSrivastava/Ratnesh GitHub Wiki
Python
- What is a decorator, and where would you use it?
- List-based program (mandatory).
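
As a rough illustration of the two Python topics above, here is a minimal sketch of a decorator applied to a small list exercise. The names `timed` and `dedupe_keep_order` are hypothetical and chosen only for this example; timing is just one common use of decorators (others include logging, retries, and caching).

```python
import functools
import time


def timed(func):
    """Decorator that records how long the most recent call to func took."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Store the duration on the wrapper so callers can inspect it.
            wrapper.last_elapsed = time.perf_counter() - start
    wrapper.last_elapsed = None
    return wrapper


@timed
def dedupe_keep_order(items):
    """List exercise: drop duplicates while preserving first-seen order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


unique = dedupe_keep_order([3, 1, 3, 2, 1])
print(unique, dedupe_keep_order.last_elapsed)
```

The decorator wraps the function without changing its body, which is the usual reason to reach for one: cross-cutting behavior stays in one place and can be reused across functions.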
SQL
- Window-function-based questions (mandatory).
- Summary of sales data.
- Explode method (mandatory).
- Indexing.
- Partitioning, clustering, and nested partitioning.
- SCD Type 1, Type 2, and Type 3.
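
For the SCD topic above, a minimal in-memory sketch of Type 2 handling may help when preparing an answer. It is written in plain Python rather than SQL, and the function name `apply_scd2` and the columns `valid_from`, `valid_to`, and `is_current` are assumptions made for this example. In a warehouse the same logic is typically expressed as a merge statement.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, as_of):
    """Return a new dimension list with SCD Type 2 changes applied.

    dimension: existing rows (dicts) with valid_from, valid_to, is_current
    incoming:  latest source records (dicts)
    key:       business key column name
    tracked:   columns whose changes should create a new version
    as_of:     effective date for the change
    """
    result = [dict(row) for row in dimension]  # copy so the input is not mutated
    current = {row[key]: row for row in result if row["is_current"]}

    for record in incoming:
        existing = current.get(record[key])
        if existing is not None and all(existing[c] == record[c] for c in tracked):
            continue  # nothing changed, keep the current version

        if existing is not None:
            # Type 2: close the old version instead of overwriting it.
            existing["valid_to"] = as_of
            existing["is_current"] = False

        new_row = {key: record[key]}
        new_row.update({c: record[c] for c in tracked})
        new_row.update({"valid_from": as_of, "valid_to": None, "is_current": True})
        result.append(new_row)
        current[record[key]] = new_row

    return result


dim = [{"customer_id": 1, "city": "Pune", "valid_from": date(2023, 1, 1),
        "valid_to": None, "is_current": True}]
updates = [{"customer_id": 1, "city": "Delhi"}, {"customer_id": 2, "city": "Agra"}]
for row in apply_scd2(dim, updates, "customer_id", ["city"], date(2024, 1, 1)):
    print(row)
```

By contrast, Type 1 would overwrite `city` in place with no history, and Type 3 would keep the prior value in an extra column such as a hypothetical `previous_city`.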
GCP
- All services and their uses. (Questions will be added here.)
Spark
- Spark architecture.
- Spark optimization.
- Predicate pushdown and Catalyst optimization.
- Spark UI questions.
- Stages, jobs, and tasks, and how many stages are created for a particular job.
- How much data have you handled?
- How would you handle a 100 GB file, and how many cores, executors, and how much executor memory would it require?
- Data skew, with an example.
- If a job is taking too long to complete, what would you do?
- Out-of-memory failures.
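
The 100 GB sizing question above is usually answered with back-of-the-envelope arithmetic, which can be sketched as follows. Every default here is a rule-of-thumb assumption rather than a fixed answer: 128 MB input partitions (the default of `spark.sql.files.maxPartitionBytes`), 5 cores per executor, 4 GB of memory per core, and processing the partitions in 4 waves. The function name `estimate_spark_resources` is hypothetical.

```python
import math


def estimate_spark_resources(file_size_gb, partition_mb=128, cores_per_executor=5,
                             memory_per_core_gb=4, waves=4):
    """Rough sizing estimate; all defaults are assumptions, not Spark requirements."""
    # One task per input partition.
    partitions = math.ceil(file_size_gb * 1024 / partition_mb)
    # Running every task at once is rarely necessary; process them in a few waves.
    total_cores = math.ceil(partitions / waves)
    executors = math.ceil(total_cores / cores_per_executor)
    executor_memory_gb = cores_per_executor * memory_per_core_gb
    return {
        "partitions": partitions,
        "total_cores": total_cores,
        "executors": executors,
        "executor_memory_gb": executor_memory_gb,
    }


print(estimate_spark_resources(100))
```

With these assumptions, 100 GB splits into 800 partitions, which works out to 200 cores across 40 executors of 20 GB each. In an interview the reasoning matters more than the exact figures, since real sizing also depends on file format, compression, shuffle volume, and cluster limits.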
Project
- Data flow of your projects.
- Batch and real-time data ingestion.
- Data validation tools and validation steps.