infyni

Big Data with Apache Spark, Kafka and Cassandra

In this live course you will learn to develop and run code that analyzes gigabytes of data using Apache Spark, Kafka and Cassandra.

Live Course

Live Class: Wednesday, 06 Mar

Duration: 37 Hours

Enrolled: 1

Offered by: infyni

$455 40% off

$273

About Course

In the last few years, adoption of Apache Kafka has grown significantly. Current users of Kafka include Uber, Twitter, Netflix, LinkedIn, Yahoo, Cisco and Goldman Sachs. Apache Kafka is an open-source distributed event streaming platform for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. Kafka is a scalable pub/sub system: users publish large numbers of messages to it and consume those messages through subscriptions, in real time.
While Hadoop typically holds a copy of all types of data, it is impractical to feed every other system from Hadoop, since many of them need data closer to real time than Hadoop can provide. Kafka is designed as a multi-subscription system in which the same published data set can be consumed multiple times.
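
The multi-subscription idea can be illustrated with a small sketch. This is a toy in-memory model, not the Kafka client API: the `ToyTopic` class, its `publish` and `poll` methods, the group names and the event names are all hypothetical and exist only for illustration. It assumes a single log where each subscriber group tracks its own read position, so every group sees the full stream independently.

```python
from collections import defaultdict


class ToyTopic:
    """Toy in-memory stand-in for one topic partition (hypothetical, not the Kafka API)."""

    def __init__(self):
        self._log = []                    # append-only list of published messages
        self._offsets = defaultdict(int)  # next offset to read, kept per subscriber group

    def publish(self, message):
        """Append a message to the log and return the offset it was stored at."""
        self._log.append(message)
        return len(self._log) - 1

    def poll(self, group, max_messages=10):
        """Return unread messages for `group`, advancing only that group's offset."""
        start = self._offsets[group]
        batch = self._log[start:start + max_messages]
        self._offsets[group] = start + len(batch)
        return batch


topic = ToyTopic()
for event in ["ride_requested", "ride_started", "ride_completed"]:
    topic.publish(event)

# Two independent subscriber groups each receive the same published data.
analytics = topic.poll("analytics")
billing = topic.poll("billing")
print(analytics)
print(billing)

# A group that has caught up gets nothing new until more data is published.
print(topic.poll("analytics"))
```

Because reading does not remove messages from the log, adding a third subscriber group later would still replay the data from the beginning, which is the property the paragraph above describes.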

Skills You Will Gain

Big Data, Apache Spark, SQL, Kafka Programming, Cassandra Architecture, DataFrames, NoSQL, CRUD Operations, Replication, Partitioning, Clustering, NodeTool

Course Offerings

  • Instructor-led interactive classes
  • Clarify your doubts during class
  • Access recordings of the class
  • Attend on mobile or tablet
  • Live projects to practice
  • Case studies to learn from
  • Lifetime mentorship support
  • Industry specific curriculum
  • Certificate of completion
  • Employability opportunity

Topics

  • Overview of the Linux environment & frequently used commands
  • What is Big Data & Data engineering?
  • Understanding Big Data pipelines
  • Introduction to Big Data Ecosystem
  • Instructions for Installations
  • Hadoop Ecosystem & core components
  • Understanding Hadoop Distributed File System
  • HDFS Commands Hands on
  • YARN Cluster Manager
  • Basics of the language required for programming Spark applications
  • Introduction to Apache Spark
  • Spark Installation Demo
  • Overview of Spark on a cluster
  • Spark Deploy Modes
  • Invoking Spark Shell
  • Understanding Drivers & Executors
  • Intro to RDD & DataFrame
  • Transformation & Actions
  • Wide & Narrow Transformations
  • Understanding Execution Plan
  • Setting up a free Dataproc cluster
  • Cluster overview
  • Using HDFS & Spark-shell on cluster
  • RDD versus DataFrame/Dataset
  • Working with different file formats – JSON, Parquet, Avro, XML
  • Working with Columns
  • Filter API
  • String/Date Manipulation
  • Joining Datasets
  • Aggregating Datasets
  • UDF Functions
  • Linking with Spark SQL
  • Initializing Spark SQL and executing basic queries
  • Working with Hive tables
  • IntelliJ Setup
  • Writing Spark code in an IDE
  • Configuring Spark
  • Understanding execution plan
  • Setting up EMR Cluster
  • Using spark-submit
  • Packaging Code
  • Running Spark on a cluster
  • Intro to Spark Streaming
  • Streaming from Files/Sockets
  • Understanding Triggers & Watermarks
  • Windows in Spark Streaming
  • Example: streaming data from Twitter
  • Kafka Introduction
  • Topics, Partitions & Offsets
  • Brokers & Topics
  • Topic Replication
  • Producers & Message Keys
  • Consumers & Consumer Group
  • Consumer Offsets
  • Delivery Semantics
  • Kafka Broker Discovery
  • ZooKeeper
  • Kafka Guarantees
  • Intro to Kafka Programming
  • Java Producer
  • Java Consumer
  • Configuring Producer & Consumer
  • What is Kafka Connect
  • Kafka Connect Architecture
  • Connectors & Configurations
  • Setup Kafka Connect
  • Kafka Connector Source
  • Kafka Connector Sink
  • Ingest Twitter Stream via Kafka Connect
  • Integrating Kafka with Spark
  • Writing processed data to a JDBC sink
  • Introduction to NoSQL
  • CAP Theorem
  • Intro to Cassandra
  • Cluster Setup
  • Understanding Data Models
  • CRUD Operations
  • Partitioning/Clustering Key
  • Data Types
  • Replication
  • Read/Write Consistency
  • Gossip Protocol
  • Read/Write Anatomy
  • Compaction
  • Creating DataFrames from a Cassandra table
  • Processing data
  • Pushing data into a Cassandra table
  • Local Setup
  • Cassandra config files
  • Using Nodetool
  • Real-world case study using Kafka Connect, Spark Streaming and Cassandra