Big Data with Hadoop Training
We are currently offering a world-class Big Data with Hadoop training program for interested students and professionals. Because the course is delivered online, registration is open to anyone in the world.
1. Taught by Srini Ramineni, the founder of DBA University.
2. Students get 12 months of 24/7 on-demand access to video recordings of a previous live online training.
3. Dedicated cloud lab access for each student for 6 months (with an option to extend access to 1 year).
4. Lab work also covers how to install and maintain a 3-node Cloudera Hadoop cluster (CDH) in the Amazon cloud.
5. Lab exercises with real-world data sets to challenge students on course topics. Many lab exercises use JSON (JavaScript Object Notation) data sets.
6. This training uses the Cloudera distribution of Apache Hadoop.
7. All students receive our high-quality, 250-page training material as a Dropbox download sent by email.
8. The tuition fee is $649 (all-inclusive).
Welcome to a new world. Welcome to the world of Big Data. According to a recent McKinsey Global Institute report, there are almost 200,000 Big Data analytical positions available, and 1.5 million more data-savvy managers are needed to take full advantage of Big Data in the United States. The report sees transformational potential for Big Data in the following five domains.
1. Health Care (United States).
2. Public sector administration (European Union).
3. Retail (United States).
4. Manufacturing (Global).
5. Personal location data (Global).
Introduction to Big Data and the Hadoop Framework.
What is Big Data and what are the 3 characteristics of Big Data.
Introduction to Apache Hadoop.
History and current popular distributions of Hadoop.
The Big Data with Hadoop job market: current trends and future predictions.
Use cases of Hadoop and an overview of the entire Apache Hadoop ecosystem.
Lab Practice: Connect to the DBA University single-node Hadoop server and browse its setup. Lab work on a fully distributed Hadoop cluster follows later in the course.
Lab Practice: How to set up a single-node Hadoop server on your own PC.
The Hadoop Distributed File System (HDFS).
Introduction to the Hadoop Distributed File System (HDFS).
What is the replication factor in HDFS, and what are the best practices in HDFS design.
What are the NameNode, the Secondary NameNode and the DataNodes in HDFS.
Browse the HDFS using the web interface.
Identify configuration parameters for the NameNode and DataNodes.
High availability of data and metadata (NameNode) in HDFS.
Practice lab exercises working with HDFS using City of Chicago data sets.
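As a rough illustration of how block size and replication factor drive storage needs in HDFS, the sketch below estimates how many blocks a file is split into and how much raw disk space its replicas occupy. The 128 MB block size and replication factor of 3 are common defaults in recent Hadoop releases, but both are configurable, so treat them as assumptions. The function name `hdfs_footprint` is hypothetical and not part of any Hadoop API.

```python
import math

def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Estimate block count, replica count and raw storage for one file.

    Assumed defaults: 128 MB blocks and a replication factor of 3.
    """
    # A file is split into fixed-size blocks; the last block may be smaller.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times across the DataNodes.
    block_replicas = blocks * replication
    # The last block is not padded, so raw usage is file size x replication.
    raw_storage_mb = file_size_mb * replication
    return blocks, block_replicas, raw_storage_mb

blocks, block_replicas, raw_storage_mb = hdfs_footprint(1000)
print(blocks, block_replicas, raw_storage_mb)
```

Under these assumptions, a 1,000 MB file is split into 8 blocks, stored as 24 block replicas, and consumes about 3,000 MB of raw disk space across the cluster.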
MapReduce computation paradigm.
Introduction to the MapReduce computation paradigm for Big Data processing.
What are mappers and reducers.
Learn about distributed data processing in MapReduce.
Understand the differences between MapReduce 1.0 and the latest MapReduce with YARN (MRv2) version.
Learn about the different components of the MapReduce computation framework.
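To make the roles of mappers and reducers concrete, here is a minimal single-process sketch of the classic word count in plain Python. It only imitates the map, shuffle and reduce phases in memory; it is not Hadoop code, and the function names (`mapper`, `shuffle`, `reducer`, `word_count`) are chosen for illustration only.

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(word, counts):
    """Reduce phase: collapse all values for one key into a single total."""
    return word, sum(counts)

def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return dict(reducer(word, counts) for word, counts in grouped.items())

sample = ["big data with hadoop", "hadoop stores big data"]
print(word_count(sample))
```

In a real cluster the framework runs many mappers and reducers in parallel on different nodes and performs the shuffle over the network, but the contract of each phase is the same as in this sketch.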
Apache Sqoop and Hadoop.
Introduction to the Apache Sqoop tool.
Prerequisites for the Sqoop data connector for Oracle and Hadoop.
How to import data from a relational database to Hadoop using Sqoop.
How to export data from Hadoop to a relational database using Sqoop.
Practice lab exercises with Apache Sqoop and an Oracle database.
Data warehousing in Hadoop (Apache Hive)
Introduction to Apache Hive.
Understand the components and architecture of Apache Hive.
The command line interfaces for running HiveQL: hive and beeline.
Learn about Hive Partitions and Buckets.
Learn and practice HiveQL statements.
How to work with the Twitter API to download tweet data.
Practice lab exercises working with real-time data sets in Hive.
What is Apache Pig.
Learn about the Pig Data Model.
What are the rules and syntax of the Pig Latin language.
What is a JSON data object and how to load and analyze JSON data sets using Pig.
Practice lab exercises working with real-time JSON data sets in Pig.
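For readers new to JSON, the sketch below shows what a JSON data object looks like and how a small line-delimited JSON data set can be loaded and grouped. It uses plain Python rather than Pig Latin, purely to illustrate the load-then-group pattern. The sample records and the field names (`user`, `lang`, `retweets`) are made up for this example.

```python
import json
from collections import Counter

# Hypothetical sample data: one JSON object per line.
raw_lines = [
    '{"user": "alice", "lang": "en", "retweets": 3}',
    '{"user": "bob", "lang": "es", "retweets": 0}',
    '{"user": "carol", "lang": "en", "retweets": 5}',
]

def load_records(lines):
    """Parse each non-empty line into a Python dict."""
    return [json.loads(line) for line in lines if line.strip()]

def count_by(records, field):
    """Count records per distinct value of `field` (a group-and-count)."""
    return Counter(record[field] for record in records)

records = load_records(raw_lines)
lang_counts = count_by(records, "lang")
print(lang_counts)
```

The same two steps, loading semi-structured records and then grouping them by a field, are what the lab exercises perform at much larger scale on the cluster.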
Install a Cloudera Hadoop cluster in the cloud.
Choosing the hardware and compute resources for the servers (nodes) in a Hadoop cluster.
Software installation prerequisites of the Cloudera Hadoop cluster (CDH).
Understand the Cloudera Director and Cloudera Manager software components.
Learn how to use cloud computing with Amazon Web Services (AWS).
Lab Practice: How to install a 3-node Cloudera Hadoop cluster (CDH) in the cloud (AWS).
Lab Practice: How to administer, manage and monitor the Hadoop cluster nodes using Cloudera Manager.
Learn about the Hue web interface.
Lab Practice: Use the Hue web interface to run Hive, Sqoop and Pig commands.
Introduction to Apache Spark.
Compare Apache Spark with the MapReduce computational framework.
Learn about Spark SQL and DataFrames.
How to store and analyze JSON documents using the Apache Spark framework.
Introduction to Cloudera Impala.
Key Features of Cloudera Impala.
Cloudera Impala vs. the MapReduce computational framework.
Comparison of Apache Hive, Pig and Impala.
Practice lab exercises using Cloudera Impala.
Apache Flume and real world use cases.
What are the various components of Apache Flume.
Flume agent configuration.
Practice lab exercises using Apache Flume.