Apache Hadoop Training with Multisoft Systems
Description:
This 5-day course provides administrators with the fundamentals required to
successfully implement and maintain Hadoop clusters. The course combines
interactive lectures with extensive hands-on lab exercises. After successfully
completing this course, each student will receive one free voucher for the
Hadoop Certified Administrator exam.
Multisoft Systems' four-day developer training course delivers the key
concepts and expertise needed to create robust data-processing applications
using Apache Hadoop. Through lectures and interactive, hands-on exercises,
attendees will navigate the Hadoop ecosystem, learning topics such as:
• MapReduce and the Hadoop Distributed File System (HDFS), and how to write
MapReduce code
• Best practices and considerations for Hadoop development,
debugging techniques and implementation of workflows and common algorithms
• How to leverage Hive, Pig, Sqoop, Flume, Oozie and other
projects from the Apache Hadoop ecosystem
• Optimal hardware configurations and network considerations
for building out, maintaining and monitoring your Hadoop cluster
• Advanced Hadoop API topics required for real-world data
analysis
AUDIENCE:
This course is intended for experienced developers who wish
to write, maintain and/or optimize Apache Hadoop jobs. A background in Java is
preferred, but experience with other programming languages such as PHP, Python
or C# is sufficient.
Course Objectives:
By taking this course, administrators will be able to:
• Utilize best practices for deploying Hadoop clusters
• Determine hardware needs
• Monitor Hadoop clusters
• Recover from NameNode failure
• Handle DataNode failures
• Manage hardware upgrade processes including node removal,
configuration changes, node installation and rebalancing clusters
• Manage log files
• Install, configure, deploy, verify and maintain Hadoop clusters, including:
• MapReduce
• HDFS
• Pig and Hive (and MySQL)
• HBase (and ZooKeeper) and HCatalog
• Oozie
• Mahout
Target Audience
• Administrators who are interested in learning how to deploy and manage a
Hadoop cluster. We recommend that students have previous experience with UNIX.
Course Outline:
Introduction: The Case for Apache Hadoop
• A Brief History of Hadoop
• Core Hadoop Components
• Fundamental Concepts
The Hadoop Distributed File System:
• HDFS Features
• HDFS Design Assumptions
• Overview of HDFS Architecture
• Writing and Reading Files
• NameNode Considerations
• An Overview of HDFS Security
• Hands-On Exercise
MapReduce:
• What Is MapReduce?
• Features of MapReduce
• Basic MapReduce Concepts
• Architectural Overview
• MapReduce Version 2
• Failure Recovery
• Hands-On Exercise
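The map, shuffle/sort and reduce phases covered in this module can be illustrated with a minimal in-memory word-count sketch. It does not use Hadoop at all; the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical and only mimic the data flow that the framework manages across a cluster.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_words(line: str) -> Iterator[tuple[str, int]]:
    # Map phase: emit (word, 1) for every whitespace-separated word.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> list[tuple[str, list[int]]]:
    # Shuffle and sort: group all values emitted for the same key and
    # return the groups in key order, as the framework does before reducing.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce phase: collapse the list of counts for one word into a total.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(key, values) for key, values in shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["the cat sat", "The dog sat"]))
    # prints {'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}
```

On a real cluster, map tasks run in parallel on the nodes holding each HDFS block, and the shuffle moves data over the network; the sketch keeps only the logical contract between the phases.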
An Overview of the Hadoop Ecosystem
• What is the Hadoop Ecosystem?
• Integration Tools
• Analysis Tools
• Data Storage and Retrieval Tools
Planning Your Hadoop Cluster:
• General Planning Considerations
• Choosing the Right Hardware
• Network Considerations
• Configuring Nodes
Hadoop Installation:
• Deployment Types
• Installing Hadoop
• Using Hadoop Manager for Easy Installation
• Basic Configuration Parameters
• Hands-On Exercise
Advanced Configuration:
• Advanced Parameters
• Configuring Rack Awareness
• Configuring Federation
• Configuring High Availability
• Using Configuration Management Tools
Hadoop Security:
• Why Hadoop Security Is Important
• Hadoop’s Security System Concepts
• What Kerberos Is and How It Works
• Configuring Kerberos Security
• Integrating a Secure Cluster with Other Systems
Managing and Scheduling Jobs:
• Managing Running Jobs
• Hands-On Exercise
• The FIFO Scheduler
• The Fair Scheduler
• Configuring the Fair Scheduler
• Hands-On Exercise
Cluster Maintenance:
• Checking HDFS Status
• Hands-On Exercise
• Copying Data Between Clusters
• Adding and Removing Cluster Nodes
• Rebalancing the Cluster
• Hands-On Exercise
• NameNode Metadata Backup
• Cluster Upgrading
Cluster Monitoring and Troubleshooting:
• General System Monitoring
• Managing Hadoop’s Log Files
• Using the NameNode and JobTracker Web UIs
• Hands-On Exercise
• Cluster Monitoring with Ganglia
• Common Troubleshooting Issues
• Benchmarking Your Cluster
Populating HDFS from External Sources:
• An Overview of Flume
• Hands-On Exercise
• An Overview of Sqoop
• Best Practices for Importing Data
Installing and Managing Other Hadoop Projects:
• Hive
• Pig
• HBase
Hadoop Distributed File System (HDFS):
Recognize and identify daemons and understand the normal operation of an
Apache Hadoop cluster, both in data storage and in data processing. Describe
the current features of computing systems that motivate a system like Apache
Hadoop:
• HDFS Design
• HDFS Daemons
• HDFS Federation
• HDFS HA
• Securing HDFS (Kerberos)
• File Read and Write Paths
Developing Solutions Using Apache Hadoop
AUDIENCE:
This course is intended for experienced developers who wish
to write, maintain and/or optimize Apache Hadoop jobs. A background in Java is
preferred, but experience with other programming languages such as PHP, Python
or C# is sufficient.
Course Outline:
Introduction: The Motivation for Hadoop
• Problems with Traditional Large-Scale Systems
• Requirements for a New Approach
• Introducing Hadoop
Hadoop: Basic Concepts:
• The Hadoop Project and Hadoop Components
• The Hadoop Distributed File System
• Hands-On Exercise: Using HDFS
• How MapReduce Works
• Hands-On Exercise: Running a MapReduce Job
• How a Hadoop Cluster Operates
• Other Hadoop Ecosystem Projects
Writing a MapReduce Program:
• The MapReduce Flow
• Basic MapReduce API Concepts
• Writing MapReduce Drivers, Mappers and Reducers in Java
• Writing Mappers and Reducers in Other Languages Using the Streaming API
• Speeding Up Hadoop Development by Using Eclipse
• Hands-On Exercise: Writing a MapReduce Program
• Differences Between the Old and New MapReduce APIs
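The Streaming API lets mappers and reducers be written in any language that reads standard input and writes standard output, using tab-separated key/value lines by default. The following is a minimal word-count sketch, assuming a hypothetical script named `wordcount.py` that takes `map` or `reduce` as its first argument; it is an illustration, not course material.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    # Streaming mappers read raw text lines and write "key<TAB>value" lines.
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    # The framework sorts mapper output by key before the reducer sees it,
    # so all lines for one key arrive consecutively and groupby can merge them.
    # Assumes every non-blank line contains a tab, as mapper() guarantees.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical convention: "wordcount.py map" or "wordcount.py reduce".
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Because the reducer only relies on sorted input, the pipeline can be tried locally without a cluster, for example `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`. On a cluster, the same commands are passed to the streaming JAR through its `-mapper` and `-reducer` options, and the script must also be shipped to the worker nodes.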
Unit Testing MapReduce Programs:
• Unit Testing
• The JUnit and MRUnit Testing Frameworks
• Writing Unit Tests with MRUnit
• Hands-On Exercise: Writing Unit Tests with the MRUnit Framework
Diving Deeper into the Hadoop API:
• Using the ToolRunner Class
• Hands-On Exercise: Writing and Implementing a Combiner
• Setting Up and Tearing Down Mappers and Reducers by Using the Configure and
Close Methods
• Writing Custom Partitioners for Better Load Balancing
• Optional Hands-On Exercise: Writing a Partitioner
• Accessing HDFS Programmatically
• Using the Distributed Cache
• Using the Hadoop API’s Library of Mappers, Reducers and Partitioners
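A partitioner decides which reducer receives each key. Hadoop's default `HashPartitioner` computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; the sketch below reproduces that formula for a Java `String` key. This is an assumption-laden illustration: Hadoop's `Text` type hashes its bytes differently, so actual partition numbers for `Text` keys may differ. `first_letter_partition` is a hypothetical custom partitioner, included only to show how key ranges can be routed to specific reducers.

```python
def java_string_hash(s: str) -> int:
    # Emulates Java's String.hashCode(): h = 31 * h + c over the UTF-16 code
    # units of the string, with overflow wrapped to a signed 32-bit int.
    data = s.encode("utf-16-be")
    h = 0
    for i in range(0, len(data), 2):
        code_unit = int.from_bytes(data[i:i + 2], "big")
        h = (31 * h + code_unit) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def hash_partition(key: str, num_reducers: int) -> int:
    # Same formula as Hadoop's default HashPartitioner: clear the sign bit,
    # then take the remainder modulo the number of reduce tasks.
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def first_letter_partition(key: str, num_reducers: int) -> int:
    # Hypothetical custom partitioner: split the alphabet into contiguous
    # ranges so that each reducer receives one alphabetical slice of the keys.
    first = key[:1].lower()
    if not ("a" <= first <= "z"):
        return 0  # digits, punctuation and empty keys all go to reducer 0
    return (ord(first) - ord("a")) * num_reducers // 26
```

Skewed keys are the usual reason to write a custom partitioner: if a few keys dominate the data, hashing alone can leave one reducer with most of the work.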
Practical Development Tips and Techniques:
• Strategies for Debugging MapReduce Code
• Testing MapReduce Code Locally by Using LocalJobRunner
• Writing and Viewing Log Files
• Retrieving Job Information with Counters
• Determining the Optimal Number of Reducers for a Job
• Creating Map-Only MapReduce Jobs
• Hands-On Exercise: Using Counters and a Map-Only Job
Data Input and Output:
• Creating Custom Writable and WritableComparable Implementations
• Saving Binary Data Using SequenceFile and Avro Data Files
• Implementing Custom Input Formats and Output Formats
• Issues to Consider When Using File Compression
• Hands-On Exercise: Using SequenceFiles and File Compression
Common MapReduce Algorithms:
• Sorting and Searching Large Data Sets
• Performing a Secondary Sort
• Indexing Data
• Hands-On Exercise: Creating an Inverted Index
• Computing Term Frequency-Inverse Document Frequency (TF-IDF)
• Calculating Word Co-Occurrence
• Hands-On Exercise: Calculating Word Co-Occurrence (Optional)
• Hands-On Exercise: Implementing Word Co-Occurrence with a Custom
WritableComparable (Optional)
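An inverted index maps each word to the documents that contain it. In the MapReduce formulation, mappers emit `(word, doc_id)` pairs and reducers collect the distinct document IDs for each word. The in-memory sketch below follows that flow; `build_inverted_index` is a hypothetical name and the whitespace tokenization is deliberately naive.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, list[str]]:
    # Map: emit a (word, doc_id) pair for every word occurrence.
    pairs = [
        (word, doc_id)
        for doc_id, text in docs.items()
        for word in text.lower().split()
    ]
    # Shuffle: group document IDs by word; the set drops repeated occurrences.
    postings: dict[str, set[str]] = defaultdict(set)
    for word, doc_id in pairs:
        postings[word].add(doc_id)
    # Reduce: emit one sorted posting list per word, in word order.
    return {word: sorted(ids) for word, ids in sorted(postings.items())}


if __name__ == "__main__":
    index = build_inverted_index({
        "d1": "Hadoop stores data",
        "d2": "Hadoop processes data data",
    })
    print(index["data"])  # prints ['d1', 'd2']
```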
Joining Data Sets in MapReduce Jobs:
• Writing a Map-Side Join
• Writing a Reduce-Side Join
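In a reduce-side join, mappers tag each record with its source data set and emit it under the join key, and each reducer combines the records that share a key. The sketch below simulates that flow in memory as an inner join of two hypothetical data sets (customers and orders); it does not use the Hadoop API.

```python
from collections import defaultdict


def reduce_side_join(
    customers: list[tuple[int, str]], orders: list[tuple[int, str]]
) -> list[tuple[int, str, str]]:
    # Map: tag each record with its source so the reducer can tell them apart.
    tagged = [(cid, ("C", name)) for cid, name in customers]
    tagged += [(cid, ("O", item)) for cid, item in orders]
    # Shuffle: group the tagged records by join key.
    groups: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for key, record in tagged:
        groups[key].append(record)
    # Reduce: pair every customer with every order under the same key;
    # keys present on only one side produce no output (an inner join).
    joined = []
    for key in sorted(groups):
        names = [value for tag, value in groups[key] if tag == "C"]
        items = [value for tag, value in groups[key] if tag == "O"]
        joined.extend((key, name, item) for name in names for item in items)
    return joined
```

A map-side join avoids the shuffle entirely, but it requires one input to be small enough to load into memory (for example through the Distributed Cache) or both inputs to be sorted and partitioned identically.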
Integrating Hadoop into the Enterprise Workflow:
• Integrating Hadoop into an Existing Enterprise
• Loading Data from an RDBMS into HDFS by Using Sqoop
• Hands-On Exercise: Importing Data with Sqoop
• Managing Real-Time Data Using Flume
• Accessing HDFS from Legacy Systems with FuseDFS and HttpFS
Machine Learning and Mahout:
• Introduction to Machine Learning
• Using Mahout
• Hands-On Exercise: Using a Mahout Recommender
An Introduction to Hive and Pig:
• The Motivation for Hive and Pig
• Hive Basics
• Hands-On Exercise: Manipulating Data with Hive
• Pig Basics
• Hands-On Exercise: Using Pig to Retrieve Movie Names from Our Recommender
• Choosing Between Hive and Pig
An Introduction to Oozie:
• Introduction to Oozie
• Creating Oozie Workflows
• Hands-On Exercise: Running an Oozie Workflow
Industry Interface Program
Projects:
• Modular Assignments
• Mini Projects
• 1 Major Project
Domains / Industry:
• Retail Industry
• Banking & Finance
• Service
• E-Commerce
• Manufacturing & Production
• Web Application Development
• Research & Analytics
• HR & Consultancy
• FMCG
• Consumer Electronics
• Event Management Industry
• Telecom
Training & Performance Tracking:
The program builds knowledge of current technologies and corporate-level
deliverables through continuous training and assessment, to make you
industry-ready. Throughout the training curriculum, each candidate goes
through a scheduled assessment process that includes:
• Continuous Assessments
• Practical Workshops
• Modular Assignments
• Case Studies & Analysis
• Presentations (Latest Trends & Technologies)
• Tech Seminars
• Technical Viva
• Observing Live Models of Various Projects
• Domain Specific Industry Projects
Skills Development Workshop:
Communication is something all of us do from the very first day of our lives,
yet one question haunts us much of the time: “Did I express myself correctly
in that situation?” The answer is tricky: in some cases we leave a strong,
positive impression, but in others we fail even to get our idea across. This
happens mostly because we don’t know how to act in certain situations. Each
failure teaches us something, but knowing the same lesson beforehand is far
more valuable, because then we could have turned that failure into success.
The workshop focuses on many aspects of personality, such as:
• Building positive relationships with peers & seniors
• Building self-confidence and developing clear communication skills
• Exploring and working on factors that help or hinder effective
interpersonal communication
• Learning the impact of non-verbal behavior and dealing with difficult
situations and difficult people
The Workshop Consists of the Following Activities:
• Personality Development
• Group Discussions & Debates
• Seminars & Presentations
• Case Studies & Analysis
• Corporate Communication Development
• HR & Interview Skills
• Management Games & Simulations
• Aptitude, Logical & Reasoning Assessments & Development