Hadoop for Developers (4 days) Training Course
Type: Course
Location: City of London
Description
Apache Hadoop is one of the most popular frameworks for processing Big Data on clusters of servers. This course introduces developers to the main components of the Hadoop ecosystem: HDFS, MapReduce, Pig, Hive and HBase.
Subjects
- Java API
- Java
- Apache
- SQL
- Secondary
- Design
- Communications
- Writing
- Programming
- Architecture Design
Course programme
Section 1: Introduction to Hadoop
- Hadoop history and concepts
- ecosystem
- distributions
- high-level architecture
- Hadoop myths
- Hadoop challenges
- hardware / software
- lab : first look at Hadoop
Section 2: HDFS
- Design and architecture
- concepts (horizontal scaling, replication, data locality, rack awareness)
- Daemons : NameNode, Secondary NameNode, DataNode
- communications / heart-beats
- data integrity
- read / write path
- NameNode High Availability (HA), Federation
- labs : Interacting with HDFS
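As a rough illustration of the block and replication concepts listed above, the following plain-Java sketch estimates how many blocks a file occupies and how much raw cluster storage its replicas consume. It does not use the Hadoop API. The class and method names are hypothetical, and the 128 MB block size and replication factor of 3 are assumed values matching common `dfs.blocksize` and `dfs.replication` defaults; a real cluster may be configured differently.

```java
// Plain-Java arithmetic sketch; no Hadoop classes are used.
// Class and method names are hypothetical, chosen for illustration only.
class HdfsBlockMathSketch {

    // Number of blocks needed for a file: ceil(fileBytes / blockBytes).
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Raw storage used across the cluster: every byte is stored once per replica.
    // The last block of a file only occupies its actual size, so no padding is added.
    static long rawBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileBytes = 1000 * mb;   // a 1000 MB file
        long blockBytes = 128 * mb;   // assumed block size
        int replication = 3;          // assumed replication factor

        System.out.println("blocks: " + blockCount(fileBytes, blockBytes));
        System.out.println("raw MB: " + rawBytes(fileBytes, replication) / mb);
    }
}
```

With these assumed values the file splits into 8 blocks (seven full blocks plus one partial block) and consumes 3000 MB of raw storage.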
Section 3: MapReduce
- concepts and architecture
- daemons (MRv1) : JobTracker / TaskTracker
- phases : driver, mapper, shuffle/sort, reducer
- MapReduce version 1 and version 2 (YARN)
- Internals of MapReduce
- Introduction to writing a MapReduce program in Java
- labs : Running a sample MapReduce program
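The phases named above (driver, mapper, shuffle/sort, reducer) can be sketched with a word count written in plain Java. This is a single-process simulation for illustration only: it does not use the Hadoop `Mapper` or `Reducer` classes, and the class and method names are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process simulation of the MapReduce phases; no Hadoop classes are used.
class MapReducePhasesSketch {

    // Mapper: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle/sort: group all values by key, with keys kept in sorted order.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reducer: sum the values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: wires the phases together for a list of input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(mapped).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {brown=1, dog=1, fox=2, lazy=1, quick=1, the=3}
        System.out.println(run(Arrays.asList("the quick brown fox", "the lazy dog", "The fox")));
    }
}
```

On a real cluster the map and reduce steps run as parallel tasks on different nodes, and the shuffle moves data across the network; the sketch only shows how data flows between the phases.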
Section 4: Pig
- Pig vs Java MapReduce
- Pig job flow
- Pig Latin language
- ETL with Pig
- Transformations & Joins
- User defined functions (UDF)
- labs : writing Pig scripts to analyze data
Section 5: Hive
- architecture and design
- data types
- SQL support in Hive
- Creating Hive tables and querying
- partitions
- joins
- text processing
- labs : various labs on processing data with Hive
Section 6: HBase
- concepts and architecture
- HBase vs RDBMS vs Cassandra
- HBase Java API
- Time series data on HBase
- schema design
- labs : interacting with HBase using the shell; programming with the HBase Java API; schema design exercise
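For the time series and schema design topics above, one commonly used pattern is a row key that combines an entity id with a reversed timestamp, so that the newest readings for an entity sort first. HBase orders rows by comparing row-key bytes lexicographically. The sketch below uses only the JDK rather than the HBase client API; the key layout, the `sensorId` parameter and all class and method names are assumptions made for illustration, not a prescribed design.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// JDK-only sketch of a time-series row key; the layout is hypothetical.
class TimeSeriesRowKeySketch {

    // Assumed layout: <sensorId as UTF-8> 0x00 <8-byte big-endian reversed timestamp>.
    // Reversing the timestamp makes newer readings sort before older ones.
    static byte[] rowKey(String sensorId, long epochMillis) {
        byte[] id = sensorId.getBytes(StandardCharsets.UTF_8);
        long reversed = Long.MAX_VALUE - epochMillis;
        return ByteBuffer.allocate(id.length + 1 + Long.BYTES)
                .put(id)
                .put((byte) 0)      // separator between id and timestamp
                .putLong(reversed)  // ByteBuffer writes big-endian by default
                .array();
    }

    // Unsigned lexicographic byte comparison, the ordering HBase applies to row keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] older = rowKey("sensor-42", 1_000L);
        byte[] newer = rowKey("sensor-42", 2_000L);
        // prints true: the newer reading sorts before the older one
        System.out.println(compare(newer, older) < 0);
    }
}
```

A key that starts with the entity id keeps all readings for one sensor adjacent, which suits scans per sensor. Whether that fits a given workload depends on its read and write patterns, which is the kind of trade-off a schema design exercise explores.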