Name: Distributed Machine Learning with Spark - University of California
Brand: edX

Description

Type: Course
Methodology: Online
Duration: 4 Weeks
Start date: Different dates available

Learn the underlying principles required to develop scalable machine learning pipelines and gain hands-on experience using Spark. With this course you earn while you learn: you gain recognized qualifications, job-specific skills and knowledge, all of which help you stand out in the job market.

Facilities

Location: Online
Start date: Different dates available (enrolment now open)

About this course

Requirements

Python programming background; experience with PySpark equivalent to CS105x: Introduction to Spark; comfort with mathematical and algorithmic reasoning; familiarity with basic machine learning concepts; exposure to algorithms, probability, linear algebra and calculus.

Subjects

Spark
Machine Learning
Data analysis
Statistics

Course programme

Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability and optimization. Learning algorithms enable a wide range of applications, from everyday tasks such as product recommendations and spam filtering to bleeding-edge applications like self-driving cars and personalized medicine. In the age of ‘big data’, with datasets rapidly growing in size and complexity and cloud computing becoming more pervasive, machine learning techniques are fast becoming a core component of large-scale data processing pipelines.

This statistics and data analysis course introduces the underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines. We present an integrated view of data processing by highlighting the various components of these pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation.

You will gain hands-on experience applying these principles using Spark, a cluster computing system well-suited for large-scale machine learning tasks, and its packages spark.ml and spark.mllib. You will implement distributed algorithms for fundamental statistical models (linear regression, logistic regression, principal component analysis) while tackling key problems from domains such as online advertising and cognitive neuroscience.
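
As a rough illustration of what "distributed" can mean for a model such as linear regression, the sketch below mimics a map-and-reduce pattern in plain Python rather than in Spark. Each partition computes its own partial gradient, and the partial results are summed before the model is updated. The toy data, the two-partition split, the function names (partition_gradient, combine, fit) and the step size are all assumptions made for this example; none of them come from the course material.

```python
from functools import reduce

# Hypothetical toy data: points on the line y = 2x + 1, split into
# "partitions" the way a cluster might split a large dataset.
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
]

def partition_gradient(points, w, b):
    """Map step: sum the squared-error gradients over one partition."""
    gw = sum((w * x + b - y) * x for x, y in points)
    gb = sum((w * x + b - y) for x, y in points)
    return gw, gb, len(points)

def combine(left, right):
    """Reduce step: add two partial results element-wise."""
    return left[0] + right[0], left[1] + right[1], left[2] + right[2]

def fit(partitions, steps=2000, lr=0.05):
    """Gradient descent where only small summaries leave each partition."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        partials = map(lambda p: partition_gradient(p, w, b), partitions)
        gw, gb, n = reduce(combine, partials)
        # Update the shared model using the averaged gradient.
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

w, b = fit(partitions)
print(f"w = {w:.4f}, b = {b:.4f}")
```

With this toy data the fitted values should approach w = 2 and b = 1. The point of the sketch is that only three numbers per partition are combined at each step, which is what lets the same idea scale to data that does not fit on one machine.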
