Complete Big Data Engineering Masterclass – Beginner to Advanced Training

 

Organizations are gaining better insights by making smarter use of data. This has led to strong demand for certified Big Data Hadoop developers in the market. Our Big Data Engineering Masterclass is a comprehensive training designed by top industry experts around current industry job requirements, providing in-depth learning on the core concepts of the Big Data Hadoop and Spark ecosystem.


Course Overview

  • Introduction to Big Data

    Core concepts of big data, capabilities, solutions architecture. Problems, applications and systems related to big data.
  • Introduction to the Hadoop Framework

    Hadoop framework features, installation, and running sample programs.
  • MapReduce

    Phases of the MapReduce paradigm and various input/output formats.
  • Apache® Spark

    Spark’s architecture and programming model in detail; install, run, and interact with Spark to analyze real-world data.
  • Apache® Kafka

    Build real-time streaming applications and data pipelines through a combination of messaging, storage, and stream processing. Transform data as it arrives.
  • Apache® Pig

    Execute Pig data flow language scripts as Hadoop jobs in MapReduce and Spark.
  • Apache® Hive

    Hive Query Language (HQL) statements. Hive architecture, creating databases and tables, and performing various operations (DDL, DML, DQL, etc.). Performance optimization.
  • Apache® HBase

    Combine data sources that use a wide variety of different structures and schemas.
  • Apache® Sqoop

    Import and export data between traditional relational databases (such as Oracle and other SQL databases) and Hadoop using Apache® Sqoop.
  • Apache® Oozie & Airflow

    Implement tasks and workflows to schedule Hadoop jobs.

Features

Teaching Hours

Assignments

Project Hours

Hands-on Coding Hours

eLearning Dashboard

Even if you miss a class or a new tool comes in, you can always check in to learn the latest.

Job Assistance Program

We will make sure you get a chance to prove your worth.

 

Real world projects

Add impressive projects to your portfolio.

 

Hands-on coding

Practice makes perfect; we strongly believe in learning by doing.

 

Our Students Speak

Beating the Automation Layoffs was a big challenge for me, but, with Big Data training from Factlabs, I got more confident and able to get over it to excel and grow.

Gaurav

Big Data Engineer, Major IT MNC

Enough research, comparison, and heads-ups. Let’s get started and get going!

ENROLL NOW!

Syllabus

Hadoop, MapReduce and Pig

Why Is Data So Important?

Pre-Requisite – Data Scale

What Is Big Data?

Big Bank: Big Challenge

Common Problems

3 Vs Of Big Data

Defining Big Data

Sources Of Data Flood

Exploding Data Problem

Redefining The Challenges Of Big Data

Possible Solutions: Scaling Up Vs. Scaling Out

Challenges Of Scaling Out

Solution For Data Explosion-Hadoop

Hadoop: Introduction

Hadoop In Layman’s Terms

Hadoop Ecosystem

Evolutionary Features Of Hadoop

Hadoop Timeline

Why Learn Big Data Technologies?

Who Is Using Big Data?

HDFS: Introduction

Design Of HDFS

Why Hadoop Cluster?

HDFS Blocks

Components Of Hadoop 1.X

NameNode And Hadoop Cluster

Arrangement Of Racks

Arrangement Of Machines And Racks

Local FS And HDFS

NameNode

Checkpointing

Replica Placement

Benefits-Replica Placement And Rack Awareness

URI

URL And URN

HDFS Commands

Problems With HDFS In Hadoop 1.X

HDFS Federation (Included In Hadoop 2.X)

HDFS Federation

High Availability

Anatomy Of File Read From HDFS

Data Read Steps

Important Java Classes To Write Data To HDFS

Anatomy Of File Write To HDFS

Writing File To HDFS: Steps

Building Principles

Introduction To MapReduce

MR Demo

Pseudo Code

Mapper Class

Reducer Class

Driver Code

InputSplit

InputSplit And Data Blocks – Difference

Why Is The Block Size 128 MB?

RecordReader

InputFormat

Default InputFormat: TextInputFormat

OutputFormat

Using A Different OutputFormat

Important Points

Partitioner

Using Partitioner

Map Only Job

Flow Of Operations In MapReduce

Hive

Serialization In MapReduce

Custom Writable In MapReduce

Custom WritableComparable In MapReduce

Schedulers In YARN

FIFO Scheduler

Capacity Scheduler

Fair Scheduler

Differences Between Hadoop 1.X And Hadoop 2.X

Introduction To Apache Pig

Why Pig?

Apache Pig Architecture

Simple Data Types

Complex Data Types

Sample Execution

Pig Operators Demo

Parameter Substitution

Macros

Anatomy Of Reduce-Side-Join

Job Optimizations In Pig

UDFs In Pig

Execution Of XML And CSV Files In Pig

Introduction To Hive

Hive DDL

Demo: Databases.Ddl

Demo: Tables.Ddl

Hive Views

Demo: Views.Ddl

Architecture

Primary Data Types

Data Load

Demo: ImportExport.Dml

Demo: HiveQueries.Dml

Demo: Explain.Hql

Table Types

Demo: ExternalTable.Ddl

Complex Data Types

Demo: Working With Complex Datatypes

Hive Variables

Demo: Working With Hive Variables

Hive Variables And Execution Customization

Advanced Hive

Working With Arrays

Sort By And Order By

Distribute By And Cluster By

Partitioning

Static And Dynamic Partitioning

Bucketing Vs Partitioning

Joins And Types

Bucket-Map Join

Sort-Merge-Bucket-Map Join

Left Semi Join

Demo: Join Optimisations

Input Formats In Hive

Sequence Files In Hive

RC File In Hive

File Formats In Hive

ORC Files In Hive

Inline Index In ORC Files

ORC File Configurations In Hive

SerDe In Hive

Demo: CSVSerDe

JSONSerDe

RegexSerDe

Analytic And Windowing In Hive

Demo: Analytics.Hql

HCatalog In Hive

Demo: Using_HCatalog

Accessing Hive With JDBC

Demo: HiveQueries.Java

HiveServer2 And Beeline

Demo: Beeline

UDF In Hive

Demo: ToUpper.Java And Working_with_UDF

Optimizations In Hive

Demo: Optimizations

HBase

Challenges With Traditional RDBMS

Features Of NoSQL Databases

NoSQL Database Types

CAP Theorem

What Are HBase Regions

HBase HMaster ZooKeeper

HBase First Read

HBase Meta Table

Region Split

Apache HBase Architecture Benefits

HBase Vs. RDBMS

Shell Commands

Oozie, Sqoop, Flume

Introduction To Oozie

Oozie Architecture

Oozie Workflow Nodes

Oozie Server

Oozie Workflow

Sqoop Architecture

Sqoop Features

Sqoop Hands On

Flume: Introduction

Flume Architecture

Example Description

Transactions

Batching

Exec Source

Spooling Directory Source

File Channel

Memory Channel

Logger Sink

HDFS Sink

Partitioning

Project Discussion

Scala

Introduction To Functional Programming Languages And Scala

Functional Vs OOP

Variable

Functions

Using If And While To Define Logic

Loops In Scala

Collections In Scala

Object Oriented Programming

Classes And Objects

Traits In Scala

Constructors In Scala

Method Overloading

Implicit Parameter Usage

Inheritance – OOP

Override Modifier

Polymorphism

Invoking Superclass Methods

Final Members

Traits In Detail

Control Structures In Detail

Exception Handling

Coding Without Break And Continue

Coding The Functional Way

Case Classes In Scala

Implicit Conversions And Implicit Parameter In Depth

Spark

Introduction To Apache Spark

Map Reduce Limitations

RDDs

Spark Context – SQLContext And HiveContext

Programming With RDDs

Creating RDDs From Text Files

Transformations And Actions

How Does Spark Execution Work

RDD APIs – Filter

FlatMap

Fold

Foreach

Glom

GroupBy

Map

ReduceByKey

Zip

Persist

Unpersist

Read/Write From Storage

RDD Examples

RDD APIs – Aggregate

Cartesian

Checkpoint

Coalesce

Repartition

Cogroup

CollectAsMap

CombineByKey

Count And CountApprox Functions

More RDD Examples

Schema – StructType

StructFields

DataType

DataFrame APIs And Examples

Spark SQL, Machine Learning, Spark GraphX

Create Temporary Tables

SparkSQL

Parquet Vs Avro

Examples And Problem Solving On Real Data Using RDD And Converting The Same To Dataframe

Create A Spark Project

SBT / Maven

How Maven Repositories Work

Accumulators

BroadCast Variables

Query Execution Plan

Internal Workings Of Spark

Kafka

Projects And Production Deployment

Project Discussion
