Apache Arrow for Data Analysis across Disparate Data Sources Training Course
Course in City of London
Description
Type: Course
Location: City of London
Apache Arrow is an open-source framework for in-memory data processing, built around a language-independent columnar memory format. It is often used alongside other data science tools to access disparate data stores for analysis. It integrates well with technologies such as GPU databases, machine learning libraries and tools, execution engines, and data visualization frameworks.
In this onsite, instructor-led, live training, participants will learn to integrate Apache Arrow with various data science frameworks to access data from disparate data sources.
By the end of this training, participants will be able to:
Install and configure Apache Arrow in a distributed clustered environment
Use Apache Arrow to access data from disparate data sources
Use Apache Arrow to bypass the need for constructing and maintaining complex ETL pipelines
Analyze data across disparate data sources without having to consolidate it into a centralized repository
Audience
Data scientists
Data engineers
Format of the Course
Part lecture, part discussion, exercises and heavy hands-on practice
Note
To request customized training for this course, please contact us.
Subjects
- Data analysis
- Apache
- Access
Course programme
Introduction
Apache Arrow vs Parquet
Installing and Configuring Apache Arrow
Overview of Apache Arrow Features and Architecture
Exploring Data with Pandas and Apache Arrow
Exploring Data with Spark and Apache Arrow
Exploring Data with R and Apache Arrow
Exploring Data with MapD and Apache Arrow
Other Data Analysis Integrations
PySpark, Parquet files on S3, Oracle tables, and Elasticsearch indices
Troubleshooting
Summary and Conclusion
