Description

Type

Course
Methodology

Online

Start date

Different dates available

This course teaches the techniques used by working data scientists in the tech industry and prepares you for a move into this career path. It includes hands-on Python code examples that you can use for reference and practice, plus an entire section on machine learning with Apache Spark, which lets you scale these techniques up to "big data" analysed on a computing cluster.

Frank Kane spent 9 years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to millions of customers. Frank holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. He also started his own successful company, Sundog Software, which focuses on virtual reality environment technology and on teaching others about big data analysis.

This course is intended for software developers or programmers who want to transition into the lucrative data science career path. It would also suit data analysts in finance or other non-tech industries who want to move into the tech industry. You will learn how to analyse data using code instead of tools, and the course covers the machine learning and data mining techniques that employers are looking for.

Facilities

Online

Start date

Different dates available

Enrolment now open

Subjects

Computing
Data Mining
Apache
Internet
IT
Technology
Industry
Cleaning
IT Security
Probability

Course programme

Introduction
Getting Started
[Activity] Installing Enthought Canopy
Python Basics, Part 1
[Activity] Python Basics, Part 2
Running Python Scripts
Statistics and Probability Refresher, and Python Practice
Types Of Data
Mean, Median, Mode
[Activity] Using mean, median, and mode in Python
[Activity] Variation and Standard Deviation
Probability Density Function; Probability Mass Function
Common Data Distributions
[Activity] Percentiles and Moments
[Activity] A Crash Course in matplotlib
[Activity] Covariance and Correlation
[Exercise] Conditional Probability
Exercise Solution: Conditional Probability of Purchase by Age
Bayes' Theorem
Predictive Models
[Activity] Linear Regression
[Activity] Polynomial Regression
[Activity] Multivariate Regression, and Predicting Car Prices
Multi-Level Models
Machine Learning with Python
Supervised vs. Unsupervised Learning, and Train/Test
Bayesian Methods: Concepts
[Activity] Implementing a Spam Classifier with Naive Bayes
K-Means Clustering
[Activity] Clustering people based on income and age
Measuring Entropy
[Activity] Install GraphViz
Decision Trees: Concepts
Ensemble Learning
Support Vector Machines (SVM) Overview
[Activity] Using SVM to cluster people using scikit-learn
Recommender Systems
User-Based Collaborative Filtering
Item-Based Collaborative Filtering
[Activity] Finding Movie Similarities
[Activity] Improving the Results of Movie Similarities
[Activity] Making Movie Recommendations to People
[Exercise] Improve the recommender's results
More Data Mining and Machine Learning Techniques
K-Nearest-Neighbors: Concepts
[Activity] Using KNN to predict a rating for a movie
Dimensionality Reduction; Principal Component Analysis
[Activity] PCA Example with the Iris data set
Data Warehousing Overview: ETL and ELT
Reinforcement Learning
External Resources
Dealing with Real-World Data
[Activity] K-Fold Cross-Validation to avoid overfitting
Data Cleaning and Normalization
[Activity] Cleaning web log data
Normalizing numerical data
[Activity] Detecting outliers
Apache Spark: Machine Learning on Big Data
[Activity] Installing Spark - Part 1
[Activity] Installing Spark - Part 2
Spark Introduction
Spark and the Resilient Distributed Dataset (RDD)
Introducing MLLib
[Activity] Decision Trees in Spark
TF / IDF
[Activity] Using the Spark 2.0 DataFrame API for MLLib
[Activity] Searching Wikipedia with Spark
Installing Spark file
Experimental Design
A/B Testing Concepts
T-Tests and P-Values
[Activity] Hands-on With T-Tests
Determining How Long to Run an Experiment
A/B Test Gotchas
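
As a flavour of the early statistics activities listed in the programme, here is a minimal sketch of computing the mean, median, mode, and standard deviation in Python. It is not taken from the course materials: the sample data is invented, and it uses only the standard-library `statistics` module, whereas the course's own activities may rely on other libraries.

```python
import statistics

# Hypothetical sample data (not from the course): ages of ten customers.
ages = [23, 25, 25, 31, 35, 35, 35, 42, 51, 68]

mean_age = statistics.mean(ages)      # arithmetic average
median_age = statistics.median(ages)  # middle value of the sorted data
mode_age = statistics.mode(ages)      # most frequently occurring value
spread = statistics.pstdev(ages)      # population standard deviation

print("mean:", mean_age)
print("median:", median_age)
print("mode:", mode_age)
print("standard deviation:", round(spread, 2))
```

The single large value (68) pulls the mean above the median, which is the kind of effect the "Mean, Median, Mode" lecture title points at.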

What the course includes:

Includes 68 lectures and 9 hours of video content.
Learn how to perform machine learning on "big data" using Apache Spark and its MLLib package.
Apply best practices in cleaning and preparing your data prior to analysis.
Be able to design experiments and interpret the results of A/B tests.
Suitable for software developers or programmers who want to transition into the data science career path.
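
To make the A/B-testing item above a little more concrete, the following is a rough sketch of how a t statistic for two groups might be computed in Python. It is an illustration under stated assumptions rather than the course's own code: the order values are invented, the helper name `welch_t` is hypothetical, and it uses Welch's unequal-variance form of the test with only the standard library.

```python
import math
import statistics


def welch_t(a, b):
    """Return Welch's t statistic for two independent samples (hypothetical helper)."""
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    # Standard error of the difference between the two sample means.
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Invented order values for a control group (A) and a treatment group (B).
group_a = [20.0, 22.0, 19.0, 24.0, 25.0]
group_b = [28.0, 27.0, 30.0, 26.0, 29.0]

t_stat = welch_t(group_a, group_b)
print(f"t = {t_stat:.2f}")
```

A t statistic far from zero suggests the difference between the groups is unlikely to be chance alone. Turning it into a p-value requires the t distribution, which a statistics library would normally supply.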
