Best Big Data Training Institute in Dilsukhnagar
Module 1
- Big Data Introduction and Hadoop
- Fundamentals
- Data Storage & Analysis
- Comparison with RDBMS
- HDFS ARCHITECTURE
- Basic Terminologies
- HDFS Block Concepts
- Replication Concepts
- Basic reading & writing of files in HDFS
- Basic processing concepts in MapReduce
- Data Flow
- Anatomy of file READ and WRITE
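As a rough illustration of the block and replication concepts listed above, the sketch below estimates how many blocks a file occupies and how much raw disk it consumes. It assumes a 128 MB block size and a replication factor of 3 (common Hadoop Gen 2 defaults, which a cluster may override); the function name `hdfs_footprint` is made up for this example.

```python
import math


def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Return (blocks, block_replicas, raw_storage_mb) for one file.

    block_size_mb and replication are assumed defaults, not values
    read from a real cluster.
    """
    # A file is split into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times on different DataNodes.
    block_replicas = blocks * replication
    # A partial last block only uses the space it needs, so raw usage
    # is the file size times the replication factor.
    raw_storage_mb = file_size_mb * replication
    return blocks, block_replicas, raw_storage_mb


blocks, block_replicas, raw_mb = hdfs_footprint(500)
print(blocks, block_replicas, raw_mb)  # 4 12 1500
```

Under these assumptions, a 500 MB file is stored as 4 blocks, 12 block replicas and about 1500 MB of raw disk.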
Module 2
- HADOOP ADMINISTRATION
- HADOOP GEN 1 VS HADOOP GEN 2 (YARN)
- Linux commands
- Single and Multinode cluster installation (HADOOP Gen 2)
- AWS (EC2, RDS, S3, IAM and CloudFormation)
- Cloudera and Hortonworks distribution installation on AWS
- Cloudera Manager and Ambari
- Hadoop Security; Commissioning and Decommissioning of nodes
- Sizing of Hadoop Cluster and NameNode High Availability
Module 3
- DATA INGESTION
- Sqoop:
- Migration of data from MySQL/Oracle to HDFS.
- Creating Sqoop jobs.
- Scheduling and monitoring Sqoop jobs using Oozie and crontab.
- Incremental and last-modified modes in Sqoop.
- Talend:
- Installation of Talend Big Data Studio on Windows Server.
- Creating and scheduling Talend jobs.
- Components: tMap, tMSSqlInput, tMSSqlOutput, tFileInputDelimited, tFileOutputDelimited, tMSSqlOutputBulkExec, tUniqRow, tFlowToIterate, tIterateToFlow, tLogCatcher, tFlowMeterCatcher, tFileList, tAggregateRow, tSortRow, tHDFSInput, tHDFSOutput, tFilterRow, tHiveLoad.
- Flume:
- Flume Architecture
- Data Ingest in HDFS with Flume
- Flume Sources
- Flume Sinks
- Topology Design Considerations
Module 4
- DATA PROCESSING
- MapReduce:
- Env Setup
- Tool and ToolRunner
- Mapper
- Reducer
- Driver program
- How to package the job?
- MapReduce WebUI
- How does a MapReduce job run?
- Shuffle & Sort
- Speculative Execution
- InputFormats
- Input Splits and Record Reader
- Default Input Formats
- Implement Custom Input Format
- OutputFormats
- Default Output formats
- Output Record Writer
- Compression
- Map Output
- Final Output
- Data types – default
- Writable vs WritableComparable
- Custom Data types – Custom Writable/WritableComparable
- File Based Data structures
- Sequence file
- Reading and Writing into Sequence file
- Map File
- Tuning MapReduce Jobs
- Advanced MapReduce
- Sorting
- Partial Sort
- Total Sort
- Secondary Sort
- Joins
- Hive:
- Comparison with RDBMS
- HQL
- Data types
- Tables
- Importing and Exporting
- Partitioning and Bucketing – Advanced.
- Joins and Join Optimization.
- Functions – Built-in & User Defined
- Advanced Optimization of HQL
- Storage File Formats – Advanced
- Loading and Storing Data
- SerDes – Advanced
- Pig:
- Important basics
- Pig Latin
- Data types
- Functions – Built-in, User Defined
- Loading and Storing Data
- Spark:
- Spark introduction
- Spark vs MapReduce
- Intro to Spark libraries (Spark SQL, Spark Streaming, Spark Core)
Module 5
- An Introduction to Python
- 1.1 Brief about the course
- 1.2 History/timeline of Python
- 1.3 What is Python?
- 1.4 What can Python do?
- 1.5 How Python got its name
- 1.6 Why Python?
- 1.7 Who uses Python
- 1.8 Features of Python
- 1.9 Python installation
- 1.10 Hello world
- 1. Using cmd
- 2. Using IDLE
- 3. Using a .py script
- 4. Using the Python command line
- 2: Beginning Python Basics
- 2.1. The print statement
- 2.2. Comments
- 2.3. Python Data Structures
- 2.4. Variables & Data Types
- 1. Rules for variable names
- 2. Declaring variables
- 3. Assigning values to variables
- 4. Operations with variables
- 5. Reserved keywords
- 2.5. Operators in Python
- 2.6. Simple Input & Output
- 2.7. Examples for variables, data types, operators
- 3: Python Program Flow
- 3.1. Indentation
- 3.2. The if statement and its related statements
- 3.4. The while loop
- 3.5. The for loop
- 3.6. The range function
- 3.7. break
- 3.8. continue
- 3.9. pass
- 3.10. Examples for looping
- 4: Functions & Modules
- 4.1. System-defined functions (SDFs): number system and its SDFs, strings and their SDFs
- 4.2. Create your own functions (user-defined functions)
- 4.3. Function Parameters
- 4.4. Variable Arguments
- 4.5. An Exercise with functions
- 5: Exceptions
- 5.1. Errors
- 5.2. Exception Handling with try
- 5.3. Handling Multiple Exceptions
- 5.4. raise
- 5.5. finally
- 5.6. else
- 6: File Handling
- 6.1. File Handling Modes
- 6.2. Reading Files
- 6.3. Writing & Appending to Files
- 6.4. Handling File Exceptions
- 7: Data Structures and Data Structure Functions
- 7.1. List and its SDFs
- 7.2. Tuple and its SDFs
- 7.3. Dictionary and its SDFs
- 7.4. Set and its SDFs
- 7.5. Use cases and practical examples
- 8: Casting
- 8.1. Intro to casting
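To give a flavour of how the file handling, exception and casting topics above fit together, here is a small sketch. The function name `sum_numbers` and the sample file contents are invented for illustration and are not taken from the course material.

```python
import os
import tempfile


def sum_numbers(path):
    """Add up the integers in a text file (one per line).

    Returns (total, skipped); total is None if the file is missing.
    """
    total = 0
    skipped = []
    try:
        with open(path, "r") as handle:      # file handling: read mode
            for line in handle:
                text = line.strip()
                if not text:
                    continue                 # ignore blank lines
                try:
                    total += int(text)       # casting: str -> int
                except ValueError:
                    skipped.append(text)     # keep lines that are not integers
    except FileNotFoundError:
        return None, skipped
    return total, skipped


# Demo with a temporary file so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "numbers.txt")
    with open(sample, "w") as handle:        # file handling: write mode
        handle.write("10\n20\nabc\n30\n")
    print(sum_numbers(sample))                               # (60, ['abc'])
    print(sum_numbers(os.path.join(folder, "missing.txt")))  # (None, [])
```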
Module 6
- NOSQL
- Cassandra:
- Cassandra cluster installation
- Cassandra Architecture
- cqlsh
- Replication strategy
- Tools: OpsCenter, nodetool and CCM
- Cassandra use cases
- Labs:
- Real-time use cases and datasets covered (10+ real-time datasets)
- Word count, Sensors (Weather Sensors) Dataset, Social Media data sets like YouTube, Twitter data analysis
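For readers who have not seen the word count lab before, the sketch below imitates the map, shuffle-and-sort and reduce steps in plain Python on a single machine. It is only a conceptual illustration and does not use Hadoop; the function names and the two sample lines are invented for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle & sort: group values by key and order the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


lines = ["big data big ideas", "data moves fast"]
print(word_count(lines))
# {'big': 2, 'data': 2, 'fast': 1, 'ideas': 1, 'moves': 1}
```

On a real cluster the mapper and reducer run as distributed tasks and the framework performs the shuffle; here all three steps are ordinary function calls.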