-
About the course
In the software domain, many technologies receive too little attention. This course brings together Big-Data and Hadoop development for software professionals, system architects and IT managers who need to manage large, complex data sets and scale them from a single server to thousands of machines. Trainees are exposed to both basic and advanced concepts of Big-Data and Hadoop, along with their implementation in varied industry use cases.
Course objective
Exploring the Hadoop 2.x architecture.
Understanding the need for and advantages of Big-Data and Hadoop.
Mastering the concepts of HDFS and the MapReduce framework.
Setting up a Hadoop cluster and writing complex MapReduce programs.
Implementing HBase and MapReduce integration.
Performing data analytics using Pig, Hive and YARN.
Who should do this course?
This course is for all professionals who are keen to learn how to manage large, complex data sets and scale them from a single server to thousands of machines.
Pre-requisites
Anyone who wants to learn Big-Data and Hadoop development should have basic knowledge of the Java programming language.
-
Course Curriculum
Big-Data and Hadoop
Introduction to Big-Data & Hadoop
1. Limitations of RDBMS
2. Need for Big-Data
3. 3 Vs of Big-Data - Volume, Velocity and Variety
4. Introduction to Hadoop
5. History of Hadoop Evolution
6. Organizations using Hadoop
7. Hadoop Job Trend in India
Hadoop Components
Hadoop Core Components
HDFS
Regular File System vs. HDFS
NameNode
DataNode
Secondary NameNode
Data-Block Split
Benefits of Data Block Approach
HDFS-Block Replication Architecture
Data Replication Technology
HDFS Access
Configure System
1. Introduction to VirtualBox
2. Creating and configuring Linux Ubuntu Server 14.04 machines in VirtualBox
3. Network configuration in VirtualBox for communication between machines
4. Introduction to the Linux environment
5. SSH
6. SCP
7. Setting up passwordless SSH between two machines
8. Java setup on Linux
Configure Hadoop
1. Hadoop 1.x installation
2. Configuring the different Hadoop 1.x configuration files
3. Configuring Hadoop environment variables
4. Running Hadoop 1.x on Linux and viewing the HDFS daemons: NameNode, DataNode, Secondary NameNode
5. Starting/stopping Hadoop daemons together
6. Starting/stopping Hadoop daemons individually
7. HDFS operations from the command line
8. HDFS web interface for viewing HDFS components
MapReduce
Introduction to MapReduce
1. MapReduce Overview
2. Introduction to JobTracker (Hadoop 1.x)
3. Introduction to TaskTracker (Hadoop 1.x)
4. Hadoop 1.x architecture: from job submission to job completion
5. MapReduce Analogy (Sort-Shuffle)
Word Count App
1. Developing a Word Count application in Eclipse
2. Running the Word Count application on Hadoop 1.x
3. Analyzing the application through the command line
4. Analyzing the application through the web interface
Hadoop I/O
1. Hadoop I/O
2. Different I/O formats
3. Input Split
4. Writable Interface
Files
1. Sequence File
2. Map File
MapReduce Partitioner
1. Understanding the MapReduce Custom Partitioner
2. Creating a Custom Partitioner application in Eclipse and running it on YARN
3. Understanding the Combiner Function
4. Creating a Combiner Function application in Eclipse and running it on YARN
MapReduce Features
1. Understanding Map Side Join
2. Understanding Distributed Cache
3. Creating a MapReduce application for Distributed Cache and running it on YARN
4. Understanding Partial Sort
5. Creating an application for Partial Sort and running it on YARN
6. Understanding Total Order Sort
7. Creating an application for Total Order Sort and running it on YARN
Sorting
1. Understanding Reduce Side Join
2. Understanding Secondary Sort
3. Creating an application for Secondary Sort and running it on YARN
YARN
Introduction to YARN
1. YARN (Hadoop 2) Overview
2. Understanding ResourceManager
3. Understanding NodeManager
4. YARN Job Submission
Configure YARN
1. Installing YARN on a Linux machine
2. Running YARN daemons
3. Starting/stopping daemons individually
4. Starting/stopping daemons together
5. YARN command-line utility for HDFS interaction
6. YARN web interface
7. Running the Word Count application on YARN
Hadoop Administration
1. Hadoop 2 (YARN) cluster setup
2. Hadoop Administration
3. Understanding Safe Mode
Hive
Introduction to Hive
1. Hive Overview
2. Hive History
3. Hive Installation
4. Configuring Hive configuration files
5. Configuring environment variables for Hive
6. Creating tables and loading data into tables
7. Understanding the Hive warehouse file system through the HDFS web interface
8. Running SQL queries on Hive tables from the Hive shell
9. Running SQL queries from a file in Hive
Hive Features
1. Understanding Hive external tables
2. Creating and tweaking external tables
3. Installing MySQL on a Linux system
4. Configuring the Hive metastore with MySQL
5. Understanding Hive joins
6. Performing joins over Hive tables
Hive Partitioning
1. Understanding the DISTRIBUTE BY clause
2. Hive Partitioning
3. Strict Mode and Dynamic Partitioning
4. Bucketing
Hive Functions
1. Understanding Hive Functions
2. Understanding UDF (User Defined Functions)
3. Creating a User Defined Function in Eclipse
4. Configuring and running a UDF from a SQL query
Hive Security
1. Understanding Security in Hive
2. Implementing Security
-
Mock-up Tests and Assignments
TrainingNCR.com provides weekly mock-up tests and regular assignments to help students cement their foundations and experience real work-like scenarios. There is no limit on the number of tests and assignments for diligent students.