AFTER INTERVIEW - RatneshKumarSrivastava/Ratnesh GitHub Wiki

List of companies where I have attended interviews:
Infosys
L&T Infotech
TCS
DataMetica
Xebia
ITC Infotech
Deloitte
Tech Mahindra
DXC
Infosys (selected)
Yes Bank
Legato
Emids (selected)
Mphasis
Relevance Lab()
L&T Infotech()
Mobilia()
Infosoft
Societe Generale
dunnhumby


1. Broadcast variables in Spark.
2. reduceByKey, groupByKey …
3. File formats.
4. Internal processing of Spark.
5. Map-side join, reduce-side join.
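
For question 2, the usual talking point is that reduceByKey combines values inside each partition before the shuffle, while groupByKey ships every pair across the shuffle. The sketch below only imitates that behaviour in plain Python, with no Spark involved; the two partitions, the function names and the "shuffled record" counter are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical input: two partitions of (key, value) pairs, as an RDD might hold them.
partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]

def group_by_key_then_sum(parts):
    """groupByKey style: every pair crosses the shuffle, values are summed afterwards."""
    shuffled = [pair for part in parts for pair in part]  # all records are shuffled
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}, len(shuffled)

def reduce_by_key(parts):
    """reduceByKey style: combine inside each partition first, then merge the partials."""
    shuffled = []
    for part in parts:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value          # map-side combine
        shuffled.extend(local.items())   # one record per key per partition
    merged = defaultdict(int)
    for key, value in shuffled:
        merged[key] += value
    return dict(merged), len(shuffled)

grouped_result, grouped_shuffled = group_by_key_then_sum(partitions)
reduced_result, reduced_shuffled = reduce_by_key(partitions)
print(grouped_result, "records shuffled:", grouped_shuffled)
print(reduced_result, "records shuffled:", reduced_shuffled)
```

Both paths give the same totals; the difference is how many records have to move between partitions.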
-————————————————————————-

Deloitte interview
1. Difference between managed and external tables.
2. If we give the default location of a managed table when creating an external table, is that possible?
3. Oozie and AutoSys.
4. MapReduce code and UDFs.
5. Insert and update logic in Hive, with a question.
For example:
A
1
2
3
4

Write a SQL query to update: if 3 comes then 2, and if 4 comes then current_timestamp. Multiple approaches are needed.

6. If you want to execute a query in Hive using the Spark engine, is that possible?
7. How many engines are there to execute Hive queries?
8. Full explanation of the project.
9. select count(*) from table: how many mappers and reducers are working here?
10. Spark SQL.
11. Load data into a table and retrieve data from it.
12. File formats ORC and Parquet: which is better?
13. Where to use which format?
14. Joiner transformation?
15. row_number() over (partition by ... order by ...)
16. Versions.
17. Brief of the project.
18. Cloudera Manager?
19. NameNodes, and why only 5? Can we increase the size of the nodes to reduce their number?
20. About backup of the NameNode, the fsimage and the edit logs.
21. Secondary NameNode.
22. What is the size of the cluster, and why? How much data do you have, and how much are you going to store?
23. If the number of reducers is 2 in one case and 1 in the other, which program is faster?


1. Query to find the second highest salary of an employee?
Answer:
select distinct Salary from Employee e1 where 2 = (select count(distinct Salary) from Employee e2 where e1.Salary <= e2.Salary);
2. Query to find duplicate rows in a table?
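
Both questions can be tried out with Python's built-in sqlite3 module. This is only a sketch: the Employee table and Salary column come from the answer above, while the name column and the sample rows are made up. The duplicate-row query (group by every column, keep the groups with more than one row) is one common approach, not necessarily the answer the interviewer expected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (name TEXT, Salary INTEGER)")
# Hypothetical sample rows; ("C", 200) is inserted twice on purpose.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("A", 100), ("B", 300), ("C", 200), ("D", 300), ("C", 200)],
)

# Second highest salary: exactly two distinct salaries are >= the one we want.
second_highest = conn.execute(
    """
    SELECT DISTINCT Salary FROM Employee e1
    WHERE 2 = (SELECT COUNT(DISTINCT Salary) FROM Employee e2
               WHERE e1.Salary <= e2.Salary)
    """
).fetchall()

# Duplicate rows: group by all columns and keep groups that occur more than once.
duplicates = conn.execute(
    """
    SELECT name, Salary, COUNT(*) AS copies
    FROM Employee
    GROUP BY name, Salary
    HAVING COUNT(*) > 1
    """
).fetchall()

print(second_highest)
print(duplicates)
```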
-———————————————————————————————————————————————————
Yes Bank ::
1) Difference between coalesce and repartition
2) Transformation and action
4) DataFrame to RDD
5) DataFrame to Hive table
6) How to execute a query on a DataFrame
7) How to save a DataFrame into HDFS
8) reduceByKey and groupByKey
9) Suppose an emp table holds your purchase data: if you purchased 10 items in a day, the table contains 10 records for you, each with your id, name and stuff. Now group all the stuff by id. The output should look like: 10, Ratnesh, pen,pencil,book,dress,mobile,computer
10) How to send 10 records from Oracle to HDFS.
11) Why Oracle to Netezza, since both are RDBMSs?
12) How do you transfer data from Oracle to Hive in your project?
13) If you have 100 clusters, how do you find which data is stored in which cluster, and on which nodes the blocks are?
14) Boundary query
15) Sqoop in depth
16) DataFrame in depth
17) Impala internal architecture
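
Question 9 (collecting all stuff per id) can be sketched with sqlite3 and GROUP_CONCAT. The emp table and its id, name and stuff columns follow the question; the rows are made up. In Hive the same idea is usually written with concat_ws(',', collect_list(stuff)), which is not run here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, name TEXT, stuff TEXT)")
# Hypothetical purchase rows: one record per purchased item.
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        (10, "Ratnesh", "pen"),
        (10, "Ratnesh", "pencil"),
        (10, "Ratnesh", "book"),
        (11, "Amit", "mobile"),
    ],
)

rows = conn.execute(
    """
    SELECT id, name, GROUP_CONCAT(stuff, ',') AS items
    FROM emp
    GROUP BY id, name
    ORDER BY id
    """
).fetchall()

# The order of items inside GROUP_CONCAT is not guaranteed, so sort before comparing.
purchases = {(emp_id, name): sorted(items.split(",")) for emp_id, name, items in rows}
for row in rows:
    print(row)
```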
-—————————————————————————————————————————————————————-
DataMetica interview ::

First round ::
1. Static and dynamic partitions?
2. Add partition, drop partition, rename table.
3. Bucketing.
4. How to transfer data from one cluster to another, such as from the dev environment to production?
5. How to do an incremental load using Sqoop, and how to update the last value in production for the incremental load? How does a Sqoop job store the last value?
6. Use of split-by in Sqoop. Difference between split-by and boundary query in Sqoop. If a table has no primary key and no split-by is given, will the Sqoop command run or not? If yes, then how?
7. Difference between sort by, order by, distribute by and cluster by.
8. File formats?
9. What is a RecordReader?
10. Role of the driver and worker nodes in Spark?
11. You have one table, employee, with data like ::
1,ratnesh,2-jan
2,amit, 12-jan
1,ratnesh, 23-jan

Find the latest updated record for each employee.
select id, name, date from employee where rowid in (select max(rowid) from employee group by id)
(this assumes a later record always has a larger rowid)
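
An alternative that does not depend on rowid is to take the maximum date per id and join back. A minimal sqlite3 sketch, assuming the dates are stored as sortable ISO strings (the year 2024 and the column name updated are made up; the three rows mirror the sample above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, updated TEXT)")
# Same three records as in the question, with an assumed year so the strings sort correctly.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        (1, "ratnesh", "2024-01-02"),
        (2, "amit", "2024-01-12"),
        (1, "ratnesh", "2024-01-23"),
    ],
)

# Latest record per id: find the max date for each id, then join back to get the full row.
latest = conn.execute(
    """
    SELECT e.id, e.name, e.updated
    FROM employee e
    JOIN (SELECT id, MAX(updated) AS max_updated
          FROM employee
          GROUP BY id) m
      ON e.id = m.id AND e.updated = m.max_updated
    ORDER BY e.id
    """
).fetchall()

print(latest)
```

The same result is often written with row_number() over (partition by id order by updated desc) and a filter on row number 1.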

Second round ::
1. All Unix commands.
2. Pig.
3. Parameters in Hive, like: select * from $emp;
4. How to set the number of mappers in Hive.
5. How to set the number of replicas using an HDFS command?
6. What are the HDFS commands?
7. Map join in Hive?
8. How to set Spark memory for a particular job?
9. How did you schedule jobs for Hive and Sqoop?
10. If you have a Hive script like abc.hql, how can you execute it?
11. How to load updated records into Hive using Sqoop?
12. spark-submit command.

