From 0 to 1 : Spark for Data Science with Python

Course

Online

£ 10 + VAT

Description

  • Type

    Course

  • Methodology

    Online

  • Start date

    Different dates available

Taught by a 4-person team including 2 Stanford-educated ex-Googlers and 2 ex-Flipkart Lead Analysts. This team has decades of practical experience in working with Java and with billions of rows of data.

Get your data to fly using Spark for analytics, machine learning and data science. Let’s parse that.

What's Spark? If you are an analyst or a data scientist, you're used to having multiple systems for working with data: SQL, Python, R, Java, etc. With Spark, you have a single engine where you can explore and play with large amounts of data, run machine learning algorithms and then use the same system to productionize your code.

Analytics: Using Spark and Python you can analyze and explore your data in an interactive environment with fast feedback. The course will show how to leverage the power of RDDs and Dataframes to manipulate data with ease.

Machine Learning and Data Science: Spark's core functionality and built-in libraries make it easy to implement complex algorithms like Recommendations with very few lines of code. We'll cover a variety of datasets and algorithms including PageRank, MapReduce and Graph datasets.

What's Covered:

Lots of cool stuff...

  • Music Recommendations using Alternating Least Squares and the Audioscrobbler dataset
  • Dataframes and Spark SQL to work with Twitter data
  • Using the PageRank algorithm with Google web graph dataset
  • Using Spark Streaming for stream processing
  • Working with graph data using the Marvel Social network dataset

...and of course all the Spark basic and advanced features:

  • Resilient Distributed Datasets, Transformations (map, filter, flatMap), Actions (reduce, aggregate)
  • Pair RDDs, reduceByKey, combineByKey
  • Broadcast and Accumulator variables
  • Spark for MapReduce
  • The Java API for Spark
  • Spark SQL, Spark Streaming, MLlib and GraphFrames (GraphX for Python)

Using discussion forums

Please use the discussion forums on this course to engage with other students and to help each other out.

Facilities

  • Location

    Online

  • Start date

    Different dates available. Enrolment now open

About this course

Use Spark for a variety of analytics and Machine Learning tasks
Implement complex algorithms like PageRank or Music Recommendations
Work with a variety of datasets from Airline delays to Twitter, Web graphs, Social networks and Product Ratings
Use all the different features and libraries of Spark : RDDs, Dataframes, Spark SQL, MLlib, Spark Streaming and GraphX
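
As a rough illustration of the PageRank algorithm named in this list, here is a minimal plain-Python sketch of the iterative update. It is not taken from the course material and does not use Spark. The tiny link graph, the damping factor of 0.85, the iteration count and the function name `pagerank` are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate PageRank for a dict {page: [pages it links to]}.

    Assumes every page has at least one outgoing link (no dangling pages)
    and that every link target is itself a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page keeps a baseline share from the "random jump" term.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page splits its current rank equally among the pages it links to.
            share = damping * ranks[page] / len(outgoing)
            for target in outgoing:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical four-page web graph, made up for this example.
toy_graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = pagerank(toy_graph)
for page, rank in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {rank:.3f}")
```

Each pass redistributes rank along the links, so a page that many others point to ends up with a larger score than a page nobody links to. On a real web graph the same update would be run over distributed data rather than a Python dict.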

Reviews

This centre's achievements

2021

All courses are up to date

The average rating is higher than 3.7

More than 50 reviews in the last 12 months

This centre has featured on Emagister for 4 years

Subjects

  • Truth
  • Mac
  • Programming
  • Team Training
  • Mac-OS
  • Algorithms
  • Data analysis
  • Mac OS
  • SQL
  • Linux
  • Java

Course programme

You, This Course and Us
1 lecture 02:15

  • You, This Course and Us

Introduction to Spark
9 lectures 01:30:06

  • What does Donald Rumsfeld have to do with data analysis?

    He has a great categorization for insights in data, really! There is a profound truth in here which data scientists and analysts have known for years.

  • Why is Spark so cool?

    Explore, investigate and find patterns in data. Build fully fledged, scalable production systems. All using the same environment.

  • An introduction to RDDs - Resilient Distributed Datasets

    RDDs are pretty magical; they are the core programming abstraction in Spark.

  • Built-in libraries for Spark

    Spark is even more powerful because of the packages that come with it: Spark SQL, Spark Streaming, MLlib and GraphX.

  • Installing Spark

    Let's get started by installing Spark. We'll also configure Spark to work with IPython Notebook.

  • The PySpark Shell

    Start munging data using the PySpark REPL environment.

  • Transformations and Actions

    Operations on data: transform data to extract information, then retrieve the results.

  • See it in Action : Munging Airlines Data with PySpark - I

    We've learnt a little bit about how Spark and RDDs work. Let's see it in action!

  • [For Linux/Mac OS Shell Newbies] Path and other Environment Variables

    If you are unfamiliar with software that requires working with a shell/command line environment, this video will be helpful for you. It explains how to update the PATH environment variable, which is needed to set up most Linux/Mac shell-based software.

Resilient Distributed Datasets
9 lectures 01:11:51

  • RDD Characteristics: Partitions and Immutability

    RDDs are very intuitive to use

  • Special Transformations and Actions

    Pair RDDs are special types of RDDs where every record is a key-value pair. All normal...
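
To make the ideas of transformations, actions and key-value records from the lectures above more concrete, here is a small plain-Python sketch. It deliberately does not call Spark, so it runs without any installation. `reduce_by_key` is a hypothetical helper written for this example that mimics, on a single machine, what `reduceByKey` does conceptually, and the flight records are made up.

```python
from functools import reduce


def reduce_by_key(pairs, func):
    """Merge the values of each key with func (single-machine stand-in)."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged


# Made-up records: (airline code, origin airport, delay in minutes).
flights = [
    ("AA", "JFK", 10),
    ("DL", "ATL", -3),
    ("AA", "LAX", 25),
    ("UA", "ORD", 0),
    ("DL", "JFK", 40),
]

# Transformation-like steps: generators only describe the work, nothing runs yet.
delayed = (f for f in flights if f[2] > 0)                            # like filter
delay_pairs = ((airline, delay) for airline, _origin, delay in delayed)  # like map

# Action-like steps: these consume the pipeline and hand back concrete results.
total_delay_by_airline = reduce_by_key(delay_pairs, lambda a, b: a + b)
total_delay = reduce(lambda a, b: a + b, total_delay_by_airline.values())

print(total_delay_by_airline)
print(total_delay)
```

The split mirrors the one in the programme: the filter and map steps only build up a description of the computation, and the per-key merge and the final reduce are the points where results are actually produced.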

Additional information

The course assumes knowledge of Python. You can write Python code directly in the PySpark shell. If you already have IPython Notebook installed, we'll show you how to configure it for Spark.

For the Java section, we assume basic knowledge of Java. An IDE which supports Maven, like IntelliJ IDEA/Eclipse, would be helpful.

All examples work with or without Hadoop. If you would like to use Spark with Hadoop, you'll need to have Hadoop installed (either in pseudo-distributed or cluster mode).
