infyni

About Course

Adoption of Apache Kafka has grown significantly over the last few years. Users of Kafka include Uber, Twitter, Netflix, LinkedIn, Yahoo, Cisco and Goldman Sachs. Apache Kafka is an open-source distributed event streaming platform for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. It is a scalable publish/subscribe (pub/sub) system: producers publish large volumes of messages to it, and consumers subscribe to read those messages in real time.

While Hadoop typically holds a copy of all types of data, it is impractical to feed all other systems off Hadoop, since many of them need data closer to real time than Hadoop can provide. Kafka is designed as a multi-subscriber system in which the same published data set can be consumed multiple times.
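To make the multi-subscriber idea concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the Kafka client API: the `ToyTopic` class, its `publish` and `poll` methods, and the subscriber names `analytics` and `billing` are all hypothetical names chosen for illustration. It assumes a single append-only log and one read position per subscriber group, which is enough to show why the same data can be consumed more than once.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a single-partition topic (not the Kafka client API)."""

    def __init__(self):
        self._log = []                    # append-only list of published messages
        self._offsets = defaultdict(int)  # next offset to read, tracked per group

    def publish(self, message):
        """Append a message and return the offset it was stored at."""
        self._log.append(message)
        return len(self._log) - 1

    def poll(self, group, max_messages=10):
        """Return unread messages for `group`, advancing only that group's offset."""
        start = self._offsets[group]
        batch = self._log[start:start + max_messages]
        self._offsets[group] = start + len(batch)
        return batch


topic = ToyTopic()
for event in ["ride_requested", "ride_started", "ride_completed"]:
    topic.publish(event)

# Two independent subscribers (hypothetical names) read the same published data.
analytics_batch = topic.poll("analytics")
billing_batch = topic.poll("billing")

# A second poll by the same subscriber returns nothing new.
second_poll = topic.poll("analytics")

print(analytics_batch)
print(billing_batch)
print(second_poll)
```

Because reading never removes messages from the log, each subscriber group moves through the data at its own pace. A real Kafka deployment adds partitions, replication, and retention policies on top of this basic idea.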

Skills You Will Gain

Big Data Apache Spark SQL Kafka Programming Cassandra Architecture DataFrames NoSQL CRUD Operations Replication Partitioning Clustering NodeTool

Course Offerings

Instructor-led interactive classes
Clarify your doubts during class
Access recordings of the class
Attend on mobile or tablet
Live projects to practice
Case studies to learn from
Lifetime mentorship support
Industry specific curriculum
Certificate of completion
Employability opportunity

Topics

1 Introduction to Linux

Overview of frequently used Linux environment & commands

2 Introduction to Big Data & Hadoop

What is Big Data & Data engineering?
Understanding Big Data pipelines
Introduction to Big Data Ecosystem
Instructions for Installations

3 Hadoop Eco-System

Hadoop Ecosystem & core components
Understanding Hadoop Distributed File System
HDFS Commands Hands on
YARN Cluster Manager

4 Introduction to Python/Scala

Basics of the language required for programming Spark applications.

5 SPARK: Introduction

Introduction to Apache Spark
Spark Installation Demo
Overview of Spark on a cluster
Spark Deploy Modes

6 Spark Internals

Invoking Spark Shell
Understanding Drivers & Executors
Intro to RDD & DataFrame
Transformation & Actions
Wide & Narrow Transformations
Understanding Execution Plan

7 Google Cloud Dataproc Cluster

Setting up a free Dataproc cluster
Cluster overview
Using HDFS & Spark-shell on cluster

8 Working with Dataframes

RDD Versus DataFrame/Dataset
Working with different file formats – JSON, Parquet, Avro, XML
Working with Columns
Filter API
String/Date Manipulation
Joining Datasets
Aggregating Datasets
User-Defined Functions (UDFs)

9 Spark SQL: Analyzing Structured Data

Linking with Spark SQL
Initializing Spark SQL and Executing Basic Queries
Working with Hive tables

10 Using IntelliJ IDE

IntelliJ Setup
Writing Spark in IDE
Configuring Spark
Understanding execution plan

11 Running Spark on EMR Cluster

Setting up EMR Cluster
Using spark-submit
Packaging Code
Running Spark on Cluster

12 Spark Streaming

Intro to Spark Streaming
Streaming from Files/Sockets
Understanding Triggers & Watermarks
Windows in Spark Streaming
Streaming data from Twitter Example

13 Kafka: Fundamentals

Kafka Introduction
Topics, Partitions & Offsets
Brokers & Topics
Topic Replication
Producers & Message Keys
Consumers & Consumer Group
Consumer Offsets
Delivery Semantics
Kafka Broker Discovery
ZooKeeper
Kafka Guarantees

14 Kafka Programming

Intro to Kafka Programming
Java Producer
Java Consumer
Configuring Producer & Consumer

15 Kafka Connect

What is Kafka Connect
Kafka Connect Architecture
Connectors & Configurations
Setup Kafka Connect
Kafka Connector Source
Kafka Connector Sink

16 Kafka with Spark Streaming

Ingest Twitter Stream via Kafka Connect
Integrating Kafka with Spark
Producing processed data into JDBC sink

17 Cassandra: Introduction

Introduction to NoSQL
CAP Theorem
Intro to Cassandra
Cluster Setup

18 Data Model & CRUD

Understanding Data Models
CRUD Operations
Partitioning/Clustering Key
Data Types

19 Cassandra Architecture

Replication
Read/Write Consistency
Gossip Protocol
Read/Write Anatomy
Compaction

20 Spark with Cassandra

Creating data frames from Cassandra table
Processing data
Pushing data into Cassandra table

21 Cassandra Configurations

Local Setup
Cassandra config files
Using Nodetool

22 Data Engineering: Final Project

Real-world case study using Kafka Connect, Spark Streaming, and Cassandra

Navdeep Kaur

About Instructor

I am a Big Data Architect and Global Technical Trainer with expertise in Amazon Cloud, Big Data technologies, Kubernetes, Cassandra, Kafka, and Java, and a history of working in the information technology and services industry. I have strong technical and analytical skills and hold a Bachelor of Technology (B.Tech.) in Information Technology from YMCA Institute of Engineering and Technology.

0 Ratings, 0 Reviews, 1 Student, 1 Course

Big Data with Apache Spark, Kafka and Cassandra

$455 40% off

$273