Apache Hadoop: Manipulation and Transformation of Data Performance Training Course

Name: Apache Hadoop: Manipulation and Transformation of Data Performance Training Course
Brand: Nobleprog Limited
Price: 3750 GBP

Price on request

Description

Type

Course

Location

City of London

This course is intended for developers, architects, data scientists, or anyone whose role requires intensive or regular access to data.
The major focus of the course is data manipulation and transformation.
Among the tools in the Hadoop ecosystem, this course covers Pig and Hive, both of which are heavily used for data transformation and manipulation.
This training also addresses performance metrics and performance optimisation.
The course is entirely hands-on and is punctuated by presentations of the theoretical aspects.

Facilities

City Of London (London)

Token House, 11-12 Tokenhouse Yard, EC2R 7AS

Start date

On request

Subjects

Apache

Course programme

1.1 Hadoop Concepts

1.1.1 HDFS

The Design of HDFS
Command line interface
Hadoop File System

1.1.2 Clusters

Anatomy of a cluster
Master Node / Slave Node
Name Node / Data Node

1.2 Data Manipulation

1.2.1 MapReduce in detail

Map phase
Reduce phase
Shuffle
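
To illustrate how the three phases listed above fit together, here is a minimal in-memory sketch of a word count in plain Python. It does not use Hadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions and are not part of the course material or of any Hadoop API.

```python
from collections import defaultdict


def map_phase(records):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in records:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group values by key and order the keys, mimicking what the
    framework does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Sum the list of values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records):
    """Chain the three phases on a small in-memory dataset."""
    return reduce_phase(shuffle(map_phase(records)))
```

Calling `word_count(["a b a", "b c"])` groups the emitted pairs by word before summing them. In a real cluster each phase runs distributed across nodes, which this single-process sketch does not attempt to model.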

1.2.2 Analytics with MapReduce

Group-By with MapReduce
Frequency distributions and sorting with MapReduce
Plotting results (gnuplot)
Histograms with MapReduce
Scatter plots with MapReduce
Parsing complex datasets
Counting with MapReduce and Combiners
Build reports

1.2.3 Data Cleansing

Document Cleaning
Fuzzy string search
Record linkage / data deduplication
Transform and sort event dates
Validate source reliability
Trim Outliers

1.2.4 Extracting and Transforming Data

Transforming logs
Using Apache Pig to filter
Using Apache Pig to sort
Using Apache Pig to sessionize

1.2.5 Advanced Joins

Joining data in the Mapper using MapReduce
Joining data using Apache Pig replicated join
Joining sorted data using Apache Pig merge join
Joining skewed data using Apache Pig skewed join
Using a map-side join in Apache Hive
Using optimized full outer joins in Apache Hive
Joining data using an external key value store
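
Several of the joins listed above (joining in the mapper, the Pig replicated join, the Hive map-side join) rest on the same idea: the smaller dataset is held in memory on every mapper so that the larger one can be joined record by record without a shuffle. The following plain-Python sketch shows that idea only; `replicated_join` and its parameters are hypothetical names, and it assumes the join key is the first field of the small table and is unique there.

```python
def replicated_join(large_records, small_table, key_index=0):
    """Inner join performed record by record, as a mapper would do it.

    Assumptions of this sketch: small_table fits in memory, its first
    field is the join key, and each key appears at most once.
    """
    # Build the in-memory lookup once, before streaming the large side.
    lookup = {row[0]: tuple(row[1:]) for row in small_table}
    for record in large_records:
        match = lookup.get(record[key_index])
        if match is not None:  # inner join: drop unmatched records
            yield tuple(record) + match
```

For example, joining click records `(user_id, url)` against a small user table `(user_id, name)` yields `(user_id, url, name)` for every click whose user is known. No reduce phase is involved, which is why this style of join is attractive when one side is small.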

1.3 Performance Diagnosis and Optimization Techniques

Map
- Investigating spikes in input data
- Identifying map-side data skew problems
- Map task throughput
- Small files
- Unsplittable files
Reduce
- Too few or too many reducers
- Reduce-side data skew problems
- Reduce task throughput
- Slow shuffle and sort
Competing jobs and scheduler throttling
Stack dumps & unoptimized code
Hardware failures
CPU contention
Tasks
- Extracting and visualizing task execution times
- Profiling your map and reduce tasks
Avoid the reducer
Filter and project
Using the combiner
Fast sorting with comparators
Collecting skewed data
Reduce skew mitigation
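
One common way to mitigate reduce-side skew is key salting: a hot key is split into several sub-keys so that its records spread over several reducers, and a second pass merges the partial results. The sketch below shows this two-stage pattern in plain Python for a simple count. It is an assumption about one possible technique, not necessarily the one taught in the course; `salted_count` and `num_salts` are hypothetical names, and the approach only applies to aggregates that can be merged (sums, counts, minima, maxima).

```python
from collections import Counter


def salted_count(keys, num_salts=4):
    """Count keys in two stages so that no single group holds a hot key.

    Returns (partial, total): the stage-1 counts per (key, salt) pair
    and the merged stage-2 counts per key.
    """
    # Stage 1: attach a round-robin salt so each key is spread over
    # up to num_salts sub-keys, then count the salted keys.
    partial = Counter()
    for i, key in enumerate(keys):
        partial[(key, i % num_salts)] += 1

    # Stage 2: drop the salt and merge the partial counts per key.
    total = Counter()
    for (key, _salt), count in partial.items():
        total[key] += count
    return partial, total
```

With ten occurrences of one key and four salts, no stage-1 group receives more than three records, whereas an unsalted count would send all ten to a single reducer. The cost is the extra merge pass.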
