
Big Data with Spark, Cassandra and Kafka (NRIVA)

This is a live online training course with lectures, demos, and lab sessions built around real-time use-case scenarios. In this interactive course, you will learn how to develop and run code that analyzes gigabytes of data using Apache Spark, Apache Cassandra, and Apache Kafka.

Live Course

Live Class: Thursday, 07 Mar

Duration: 30 Hours

Enrolled: 0

Offered by: infyni

(10)

Price: $273 (40% off the regular price of $455)

About Course

In the last few years, there has been significant growth in the adoption of Apache Kafka. Current users of Kafka include Uber, Twitter, Netflix, LinkedIn, Yahoo, Cisco and Goldman Sachs.  

Apache Kafka is an open-source distributed event streaming platform for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. 

Big data analytics relies on a range of specialized tools. Ten widely used ones are:
    • Apache Hadoop
    • Apache Spark
    • Apache Flink
    • Apache Storm
    • Apache Cassandra
    • MongoDB
    • Apache Kafka
    • Tableau
    • RapidMiner
    • R

Skills You Will Gain

Big Data, Spark, Cassandra, Kafka

Course Offerings

  • Instructor-led interactive classes
  • Get your questions answered during class
  • Access recordings of the class
  • Attend on mobile or tablet
  • Live projects to practice
  • Case studies to learn from
  • Lifetime mentorship support
  • Industry-specific curriculum
  • Certificate of completion
  • Employment opportunities

Topics

  • Understanding Disk-Based vs. In-Memory Computing
  • How Spark works
  • Understanding the Spark architecture and why it is better than MapReduce
  • RDD: Unit of Data in Spark
  • Architecture of Spark
  • Understanding SparkContext, Worker Nodes, Executors and Tasks
  • How Spark supports multiple languages
  • Benefits of the unified Spark API
  • Transformations and Actions on RDDs
  • How DAGs are formed
  • How Spark's lazy evaluation works
  • Broadcast Variables and Accumulators in Spark Core
  • Persisting and Caching Data in Spark
  • DAG scheduler
  • Physical plan, logical plan
  • Common problems
  • Shuffles, Spills, small files
  • Optimizations
  • Spark Tuning
  • Memory Tuning
  • Garbage Collection Tuning
  • Data Structure Tuning
  • Why Spark SQL?
  • Understanding How Spark SQL Works Under the Hood
  • How RDDs Still Form the Foundation of Spark SQL
  • DataFrame: Basic Unit of Data in Spark SQL
  • Dataset: Type-Safe Unit of Data in Spark SQL for Object-Oriented Languages like Java and Scala
  • Loading Data from the Following Formats into Datasets or DataFrames: JSON, CSV, Parquet, ORC
  • Using Joins in Spark
  • Connecting Spark with Hive
  • Connecting Hive with HBase
  • Exporting Data from Spark
  • Understanding What Cassandra Is
  • Learning What Cassandra Is Being Used For
  • Understanding That Cassandra Is a Distributed Database
  • Learning What Snitch Is For
  • Learning What Gossip Is For
  • Learning How Data Gets Distributed
  • Learning About Replication
  • Learning About Virtual Nodes
  • Downloading Cassandra
  • Ensuring Oracle Java Is Installed
  • Installing Cassandra
  • Viewing The Main Configuration File
  • Providing Cassandra with Permission to Directories
  • Starting Cassandra
  • Checking Status
  • Accessing The Cassandra system.log File
  • Understanding Ways To Communicate With Cassandra
  • Using Cqlsh
  • Understanding A Cassandra Database
  • Defining A Keyspace
  • Deleting A Keyspace
  • Creating A Table
  • Defining Columns and Data Types
  • Defining A Primary Key
  • Recognizing A Partition Key
  • Specifying A Descending Clustering Order
  • Understanding Ways To Write Data
  • Using The INSERT INTO Command
  • Using The COPY Command
  • How Data Is Stored In Cassandra
  • How Data Is Stored On Disk
  • Understanding Data Modeling In Cassandra
  • Using A WHERE Clause
  • Understanding Secondary Indexes
  • Creating A Secondary Index
  • Defining A Composite Partition Key
  • Updating Data
  • Understanding How Updating Works
  • Deleting Data
  • Understanding Tombstones
  • Using TTLs
  • Updating A TTL
  • Understanding Cassandra Drivers
  • Exploring The DataStax Java Driver
  • Setting Up A Development Environment
  • Creating An Application Page
  • Acquiring The DataStax Java Driver Files
  • Getting The DataStax Java Driver Files Through Maven
  • Providing The DataStax Java Driver Files Manually
  • Connecting To A Cassandra Cluster
  • Executing A Query
  • Displaying Query Results
  • Brokers and Topics
  • Topic Replication
  • Producers and Message Keys
  • Consumers & Consumer Groups
  • Consumer Offsets & Delivery Semantics
  • ZooKeeper
  • Kafka Guarantees
  • Windows - Download Kafka and PATH Setup
  • Windows - Start ZooKeeper & Kafka
  • Windows - Summary
  • Kafka Topics CLI
  • Kafka Console Producer CLI
  • Kafka Console Consumer CLI
  • Kafka Consumers in Group
  • Kafka Consumer Groups CLI
  • Resetting Offsets
  • CLI Options that are good to know
  • What about UIs? Conduktor
  • Conduktor - Demo
  • KafkaCat as a replacement for Kafka CLI
  • Installing Java & IntelliJ Community Edition
  • Creating Kafka Project
  • Java Producer
  • Java Producer Callbacks
  • Java Producer with Keys
  • Java Consumer
  • Java Consumer inside Consumer Group
  • Java Consumer Seek and Assign
  • Client Bi-Directional Compatibility
  • Configuring Producers and Consumers
  • Real World Project Overview
  • Twitter Setup
  • Producer Part - Writing Twitter Client
  • Producer Part - Writing the Kafka Producer
  • Producer Configurations Introduction
  • acks & min.insync.replicas
  • retries, delivery.timeout.ms & max.in.flight.requests.per.connection
  • Idempotent Producer
  • Producer Part - Safe Producer
  • Producer Compression
  • Producer Batching
  • Producer Part - High Throughput Producer
  • Producer Default Partitions and Key Hashing
  • [Advanced] max.block.ms and buffer.memory
  • Refactoring the Project
  • Setting up Elasticsearch in the Cloud
  • Elasticsearch
  • Consumer Part - Setup Project
  • Consumer Part - Write the Consumer & Send to Elasticsearch
  • Delivery Semantics for Consumers
  • Consumer Part - Idempotence
  • Consumer Poll Behavior
  • Consumer Offset Commit Strategies
  • Consumer Part - Manual Commit of Offsets
  • Consumer Part - Performance Improvement using Batching
  • Consumer Offsets Reset Behavior
  • Consumer Part - Replaying Data
  • Consumer Internal Threads
  • Kafka Connect Introduction
  • Kafka Connect Twitter Hands-On
  • Kafka Streams Introduction
  • Kafka Streams Hands-On
  • Kafka Schema Registry Introduction
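The syllabus item on producer default partitions and key hashing can be illustrated with a small, self-contained Java sketch. It reimplements, rather than calls, what the Kafka Java client's default partitioner does for a record that has a non-null key and no explicit partition: hash the serialized key bytes with 32-bit murmur2, clear the sign bit, and take the result modulo the topic's partition count. It assumes string keys serialized as UTF-8 (as `StringSerializer` does); the class name `KeyPartitioner`, the sample keys, and the partition count of 6 are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a sketch of key-hash partitioning, assuming UTF-8 string keys.
final class KeyPartitioner {

    // 32-bit murmur2 hash of a byte array (the hash family Kafka clients use for keys).
    static int murmur2(byte[] data) {
        int length = data.length;
        int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;

        int h = seed ^ length;
        int length4 = length / 4;

        // Mix the input four bytes at a time, read little-endian.
        for (int i = 0; i < length4; i++) {
            int i4 = i * 4;
            int k = (data[i4] & 0xff)
                    + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16)
                    + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }

        // Mix in the remaining 1 to 3 bytes; the case fall-through is intentional.
        switch (length % 4) {
            case 3:
                h ^= (data[(length & ~3) + 2] & 0xff) << 16;
            case 2:
                h ^= (data[(length & ~3) + 1] & 0xff) << 8;
            case 1:
                h ^= data[length & ~3] & 0xff;
                h *= m;
        }

        // Final avalanche so that small key differences spread across all bits.
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Clear the sign bit; Math.abs would stay negative for Integer.MIN_VALUE.
    static int toPositive(int n) {
        return n & 0x7fffffff;
    }

    // Map a string key to a partition index in [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return toPositive(murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 6; // hypothetical topic with 6 partitions
        for (String key : new String[] {"user-1", "user-2", "user-1"}) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

Because the partition is derived from the key hash modulo the partition count, every record with the same key lands on the same partition, which is what gives Kafka its per-key ordering. The trade-off is that adding partitions to an existing topic changes the mapping for many keys. Records without a key are handled differently and are not covered by this sketch.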
