Hadoop Architect Interview Requirement - vidyasekaran/bigdata_frameworks_components GitHub Wiki

Book to master big data architecture

Learning Path: Architect and Build Big Data Applications

https://www.analyticsvidhya.com/blog/2017/03/big-data-learning-path-for-all-engineers-and-data-scientists-out-there/

Read the MapReduce Design Patterns book from O'Reilly (soft copy download): http://barbie.uta.edu/~jli/Resources/MapReduce&Hadoop/MapReduce%20Design%20Patterns.pdf

-Proficient understanding of distributed computing principles

-Management of a Hadoop cluster, with all included services

-Ability to solve ongoing issues in operating the cluster

-Must possess strong client-facing experience, with the ability to facilitate Big Data requirements sessions, lead teams, and present analysis in business terms.

-Must have led, managed, and executed an analytics-driven Big Data project through the complete project life cycle.

-Designed and worked on continuous improvement of Hadoop- and NoSQL-based solutions.

-Created end-to-end solution designs and development approaches in Hadoop/NoSQL environments with analytic EIM platforms.

-Experience with the Agile development methodology

-Experience with ETL, data warehouses, BI platforms, RDBMS, and SQL

-Must be able to manage project schedules and ensure they are adhered to per client specifications, with deliveries meeting time and quality standards; coordinate with on-site engineers and attend customer calls

-To create project plans and track schedules for on-time delivery per the defined quality standards

-To ensure process improvement and compliance; participate in technical design discussions and review technical documents; raise risks and issues, escalating to senior management and the customer

-To create all quality documents, collect metrics data, and conduct audits

-Proficiency with Hadoop v2, MapReduce, HDFS
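To make the MapReduce requirement concrete, here is a minimal pure-Python sketch of the map / shuffle / reduce phases for the classic word-count problem. This is not Hadoop API code; it only illustrates the semantics that the framework distributes across a cluster.

```python
from collections import defaultdict

def map_phase(document):
    """Mapper: emit a (word, 1) pair for each word in the input split."""
    for word in document.lower().split():
        yield (word, 1)

def shuffle(mapped_pairs):
    """Shuffle/sort: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return (key, sum(values))

# Two hypothetical input splits
documents = ["big data on hadoop", "hadoop stores big data in hdfs"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["big"], counts["hadoop"])  # → 2 2
```

In real Hadoop the mappers and reducers run as separate JVM tasks on different nodes, and the shuffle moves data over the network; the logic per phase is the same.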

-Experience building stream-processing systems using solutions such as Storm or Spark Streaming

-Good knowledge of Big Data querying tools, such as Pig, Hive, and Impala
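The core idea behind micro-batch stream processors such as Spark Streaming is grouping an unbounded event stream into fixed time windows and aggregating per window. A minimal pure-Python sketch of tumbling-window counting (the event data here is hypothetical):

```python
from collections import Counter

def tumbling_window_counts(events, window_size):
    """Group a timestamped event stream into fixed (tumbling) windows
    and count occurrences per key within each window."""
    windows = {}
    for timestamp, key in events:
        window_start = timestamp - (timestamp % window_size)
        windows.setdefault(window_start, Counter())[key] += 1
    return windows

# (timestamp_seconds, event_key) pairs arriving on a stream
events = [(0, "click"), (3, "view"), (4, "click"), (11, "click"), (12, "view")]
result = tumbling_window_counts(events, window_size=10)
print(result[0]["click"], result[10]["click"])  # → 2 1
```

A real engine adds distribution, fault tolerance, and late-data handling (watermarks), but the windowed aggregation above is the mental model interviewers usually probe.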

-Experience with Spark and NoSQL databases, such as HBase, Cassandra, MongoDB

-Experience with integration of data from multiple data sources

-Knowledge of various ETL techniques and frameworks, like Flume

-Experience with various messaging systems, such as Kafka or RabbitMQ
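The pattern beneath Kafka and RabbitMQ is producer/consumer decoupling through a broker. A minimal in-process stand-in using Python's standard `queue` module (the message names and the `.upper()` processing step are hypothetical); real brokers add durability, partitioning, and consumer groups on top of this:

```python
import queue
import threading

# In-process stand-in for a broker topic
topic = queue.Queue()
SENTINEL = object()  # signals end-of-stream to the consumer

def producer(messages):
    for msg in messages:
        topic.put(msg)       # publish, decoupled from whoever consumes
    topic.put(SENTINEL)

consumed = []

def consumer():
    while True:
        msg = topic.get()
        if msg is SENTINEL:
            break
        consumed.append(msg.upper())  # hypothetical processing step

t = threading.Thread(target=consumer)
t.start()
producer(["order-created", "order-shipped"])
t.join()
print(consumed)  # → ['ORDER-CREATED', 'ORDER-SHIPPED']
```

The producer never blocks on, or knows about, the consumer; that decoupling is what lets messaging systems absorb bursts and let producers and consumers scale independently.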

-Experience with Big Data ML toolkits, such as Mahout, SparkML, or H2O

-Minimum of 4 years' experience in the relevant area, with either the Hadoop stack (Pig, Oozie, Spark, Sqoop, Flume, Hive) or NoSQL databases (MongoDB, Cassandra, HBase, Neo4j)

-Very comfortable with a backend language

-Experience with Hadoop, MapReduce, HDFS

-Experience with Spark

-Helpful: experience with Storm, Hive, Pig, Kafka

-Big Data software platforms (Hadoop ecosystem, Teradata); stream data processing (Kafka, Storm, Spark, etc.)

-Statistics: hypothesis testing, discriminant analysis, linear/logistic regression, PCA, vector analysis

-Forecasting: time series

-Machine Learning & AI: classification, clustering, collaborative filtering/MBA, neural networks

-Multimedia analytics: text mining (with sarcasm identification), video and voice analytics

-Advanced analytics: deep learning, cognitive analytics, customer journey analytics/mapping

-Optimization: Monte Carlo simulation, stochastic analytics

-Good understanding of Lambda Architecture, along with its advantages and drawbacks
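The Lambda Architecture's key idea is serving queries by merging a batch view (recomputed periodically from an immutable master dataset) with a speed-layer view (incrementally updated from recent events). A minimal sketch with hypothetical page-view data:

```python
from collections import Counter

# Batch layer: immutable master dataset, periodically recomputed into a batch view
master_dataset = ["pageview:home", "pageview:home", "pageview:about"]
batch_view = Counter(master_dataset)

# Speed layer: incremental view over events that arrived after the last batch run
recent_events = ["pageview:home", "pageview:pricing"]
realtime_view = Counter(recent_events)

def query(key):
    """Serving layer: merge batch and real-time views at query time."""
    return batch_view[key] + realtime_view[key]

print(query("pageview:home"))  # → 3
```

The advantage is that batch-layer recomputation corrects any speed-layer errors; the drawback (the usual interview follow-up) is maintaining the same logic in two codebases, which is what the Kappa Architecture tries to avoid.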

-Experience with Cloudera/MapR/Hortonworks

-Extensive experience architecting and designing solutions with Big Data/Hadoop/Cloudera