Description

Type: Course

Location: London-city

Both Data Science and Big Data have risen to prominence recently. Whilst they are not inextricably linked, many data scientists work extensively with big data. Both topics are so new that they are poorly understood; nevertheless, there is considerable interest in them and a significant shortfall in the number of trained data scientists in the job market. This course introduces both the job role of the data scientist and big data itself.

This course is vendor neutral; it is not about how to use any one vendor's products, it is about the fundamental underpinnings of these two important subject areas. It is aimed at people who are trying to understand data science and big data and want to know about the range of skills, technologies and techniques that are appropriate to these new areas. So, for example, if you have already chosen, say, Hortonworks as a platform and want to acquire specific skills in that area, then take a look at 'Data Science for the Hortonworks Data Platform'. On the other hand, if you fit the profile below, then this is the course for you.

Target Audience:

This course is intended for people aspiring to be data scientists and/or to work with Big Data. Others who may take this course include Business Intelligence (BI) professionals who want to work with big data and/or are looking to move into Data Science. People coming into the course are expected to have at least 3 years' experience working in the IT field, typically in databases, BI, analytics or related areas.

Learning Objectives

At the end of this course you will be able to:

Understand the role of the data scientist
Understand big data and what makes it different
Use the CAP theorem to choose a database engine for a given situation
Select a specific NoSQL database engine
Use the analytical language R
Identify continuous and discontinuous data
Understand normal distributions, mean, mode, median and standard deviations
Take good samples from...

Facilities

London-City

EC3V 9LJ

Start date

On request

About this course

Requirements

Knowledge of at least one relational database engine (Oracle, SQL Server, DB2 etc.)
An understanding of relational database modelling and design
Basic understanding of data and how business systems use both data and information; this would be gained by at least a year's experience in IT or business systems development.


Subjects

Business Intelligence
Database training
Database
Data Mining

Course programme

Module 1: Introduction

This module introduces both Data Science and Big Data.

The origins of data science
What skills does a data scientist need?
How do they differ from those required for BI (Business Intelligence)?
Case Studies

Module 2: Big Data

This module outlines the difference between the tabular data that underpins the relational model and Big Data. It explores not only what Big Data is, but why the commercial (and scientific) worlds are so interested in it. Finally, it looks at how we can cross-analyse tabular data and Big Data.

Data Science isn't just about big data, but the two are certainly related
Atomicity of data
Tabular and big data
What is big data?
Why are we interested in big data?
Where is the business value?
How can we cross-analyse tabular and big data?

Module 3: Finding the patterns in data

One of the vital skills for a data scientist is to be able to understand how numbers behave, how they are distributed and how we can determine the significance of any differences that we observe between numerical values. This involves an understanding of normal distributions, means, modes and standard deviations as well as, for example, Chi squared and t tests. This module covers these topics.

Continuous and discontinuous data
Random numbers aren't
Flat distributions
How multiple independent factors interact
Normal distributions
Mean, mode and median
Standard deviation
Sampling populations
Causality
Correlations
Chi squared
t test
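
As an illustration of the summary statistics listed in this module, here is a minimal sketch using Python's standard `statistics` module. The sample values, the variable names and the `welch_t` helper are all assumptions made for this example; they are not taken from the course material, which uses R in its labs.

```python
import math
import statistics

# Hypothetical sample: daily order counts (made-up numbers for illustration).
sample = [12, 15, 15, 18, 20, 22, 15, 30]

mean = statistics.mean(sample)      # arithmetic average
median = statistics.median(sample)  # middle value of the sorted sample
mode = statistics.mode(sample)      # most frequent value
stdev = statistics.stdev(sample)    # sample standard deviation (n - 1 denominator)


def welch_t(a, b):
    """Welch's t statistic for two independent samples (illustrative helper).

    This returns only the test statistic; turning it into a p-value
    additionally needs the t distribution, which is not computed here.
    """
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# A second hypothetical sample to compare against the first.
other = [10, 11, 14, 13, 12, 15, 11, 12]
t_value = welch_t(sample, other)
```

A large absolute `t_value` suggests that the difference between the two sample means is unlikely to be due to chance alone; how large is "large" depends on the sample sizes and the chosen significance level.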

Module 4: Data models - relational and NoSQL

This module describes the different models that are used to represent data and specifically contrasts the relational and NoSQL worlds. It covers the CAP theorem and why that is relevant to data models.

Schema and schema-less storage
Deciding what analysis can be performed and where
CAP theorem
NoSQL databases
VoltDB

Module 5: Hadoop, HDFS and MapReduce

Hadoop, HDFS and MapReduce are well-established examples of tools and methodologies for manipulating Big Data.

Hadoop
HDFS and MapReduce
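
To give a feel for the MapReduce idea named above, the following is a single-process sketch of a word count in plain Python. It does not use Hadoop or its API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and only mirror the three conceptual stages. In a real cluster each stage would be distributed across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three stages in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data", "big ideas"])
```

Because each map call looks at one document and each reduce call looks at one key, both stages can in principle run in parallel, which is the property that makes the approach suit very large data sets.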

Module 6: Data visualisation

The ability to create data visualisations that have meaning for a given set of data and the target audience is a major part of being a Data Scientist. This module describes how to plan and deploy visualisations and provides two case studies of where this has been successfully achieved.

Different visualisations for different types of data
Visualising data - a case study

Module 7: Introduction to R

R is a well-established, open source language, very specifically aimed at analysis. This module introduces the language and provides some practical work in using it.

Introduction to R
Lab: Using R

Module 8: Data mining

This module introduces not only data mining but also the CRISP-DM methodology, which helps to ensure that data mining is carried out effectively. It also introduces the Monte Carlo methodology for modelling and analysing systems.

What is data mining?
Data mining v. querying
CRISP-DM
- Business understanding
- Data understanding
- Data preparation
- Modelling
- Evaluation
- Deployment
Result validation
Change and monitor
Dangers of over-fitting
Outliers (and how to deal with them)
False positives
Monte Carlo simulations
Specific data mining techniques
Clustering - Design an algorithm
- Classification
- Decision trees
- Regression
- Segmentation
- Association
- Sequence analysis
- Neural nets
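
The Monte Carlo simulations topic in this module can be illustrated with a classic toy problem: estimating pi by random sampling. This is a minimal sketch, not an example from the course; the function name `estimate_pi`, the sample count and the fixed seed are assumptions chosen for the illustration.

```python
import random


def estimate_pi(samples, seed=0):
    """Estimate pi by sampling random points in the unit square.

    The fraction of points that fall inside the quarter circle of
    radius 1 approximates pi / 4.
    """
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4 * inside / samples


pi_estimate = estimate_pi(100_000)
```

The estimate improves slowly as the number of samples grows: the error shrinks roughly in proportion to one over the square root of the sample count, which is typical of Monte Carlo methods.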


Understanding Data Science and Big Data
