Sqoop Hadoop Tutorial | Apache Sqoop Tutorial | Sqoop Import Data From MySQL to HDFS | Simplilearn
This presentation about Sqoop will help you learn what Sqoop is, why Sqoop is important, the different features of Sqoop, the architecture of Sqoop, how Sqoop import and export work, how Sqoop processes data, and finally how to work with Sqoop commands. Sqoop is a tool used to transfer bulk data between Hadoop and external data stores such as relational databases. This tutorial will help you understand how Sqoop can load data from a MySQL database into HDFS and process that data using Sqoop commands. Finally, you will learn how to export the table imported into HDFS back to the RDBMS. Now, let us get started and understand Sqoop in detail.

The following topics are explained in this Sqoop Hadoop presentation:
1. Need for Sqoop
2. What is Sqoop?
3. Sqoop features
4. Sqoop Architecture
5. Sqoop import
6. Sqoop export
7. Sqoop processing
8. Demo on Sqoop

What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark developer course has been designed to impart in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.

What are the course objectives?
This course will enable you to:
1. Understand the different components of the Hadoop ecosystem, such as Hadoop 2.7, YARN, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
2. Understand Hadoop Distributed File System (HDFS) and YARN as well as their architecture, and learn how to work with them for storage and resource management
3. Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
4. Get an overview of Sqoop and Flume and describe how to ingest data using them
5. Create database and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
6. Understand different types of file formats, Avro schema, using Avro with Hive and Sqoop, and schema evolution
7. Understand Flume, its architecture, sources, sinks, channels, and configurations
8. Understand HBase, its architecture, data storage, and working with HBase. You will also understand the difference between HBase and RDBMS
9. Gain a working knowledge of Pig and its components
10. Do functional programming in Spark
11. Understand resilient distributed datasets (RDDs) in detail
12. Implement and build Spark applications
13. Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
14. Understand the common use-cases of Spark and the various interactive algorithms
15. Learn Spark SQL, including creating, transforming, and querying DataFrames

Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training.

Published in: Education
  1–9. What’s in it for you? (Agenda, built up one item per slide): Need for Sqoop · What is Sqoop? · Sqoop Features · Sqoop Architecture · Sqoop Import · Sqoop Export · Sqoop Processing · Demo on Sqoop
  10. Need for Sqoop
  11. Need for Sqoop (Data processing): Processing huge volumes of data requires loading data from diverse sources into Hadoop clusters. This process of loading data from heterogeneous sources comes with a set of challenges.
  12–16. Need for Sqoop, Challenges: (1) Maintaining data consistency; (2) Ensuring efficient utilization of resources; (3) Loading bulk data to Hadoop was not possible; (4) Loading data using scripts was slow. Solution: Sqoop helped in overcoming all the challenges of the traditional approach and could load bulk data from RDBMS to Hadoop very easily.
  17. What is Sqoop?
  18–20. What is Sqoop? Sqoop is a tool used to transfer bulk data between Hadoop and external data stores such as relational databases (MS SQL Server, MySQL). SQOOP = SQL + HADOOP. (Diagram: data is imported from the RDBMS into Hadoop and exported back.)
  21. Sqoop Features
  22. Sqoop Features: Parallel import/export · Connectors for all major RDBMS databases · Kerberos security integration · Import results of SQL query · Provides full and incremental load
  23. Parallel import/export: Sqoop uses the YARN framework to import and export data. This provides fault tolerance on top of parallelism.
  24. Import results of SQL query: Sqoop allows us to import the result returned from an SQL query into HDFS.
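The free-form query import described above can be sketched as a command line. Everything here (host, database, table, columns, paths) is hypothetical, and the command is assembled and printed rather than executed, since actually running it needs a live Hadoop cluster and a reachable database. Sqoop requires the literal token $CONDITIONS in the WHERE clause so that each mapper can substitute its own split predicate, and --split-by names the column to partition on.

```shell
# Hypothetical free-form query import (sketch only; all values are made up).
# $CONDITIONS is a placeholder Sqoop replaces with each mapper's range filter.
cmd="sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username dbuser \
  --query 'SELECT o.id, o.total FROM orders o WHERE \$CONDITIONS' \
  --split-by o.id \
  --target-dir /user/hadoop/orders"
echo "$cmd"
```

The same import arguments apply as for a table import; --query simply replaces --table.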
  25. Connectors for all major RDBMS databases: Sqoop provides connectors for multiple relational database management systems (RDBMS), such as MySQL and MS SQL Server.
  26. Kerberos security integration: Sqoop supports the Kerberos computer network authentication protocol, which allows nodes communicating over a non-secure network to prove their identity to one another in a secure manner.
  27. Provides full and incremental load: Sqoop can load the whole table or parts of the table with a single command. Hence, it supports full and incremental load.
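The incremental load mentioned above can be sketched with Sqoop's --incremental append mode: only rows whose check column exceeds --last-value are fetched on this run. Connection details and values are hypothetical; the command is printed, not executed, since running it requires a cluster.

```shell
# Hypothetical incremental import in "append" mode (sketch only).
# Only rows with id > 10000 would be imported on this run; on the next run,
# --last-value would be updated to the highest id seen.
cmd="sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username dbuser \
  --table orders \
  --incremental append \
  --check-column id \
  --last-value 10000 \
  --target-dir /user/hadoop/orders"
echo "$cmd"
```

Sqoop also offers an "lastmodified" incremental mode keyed on a timestamp column, useful when existing rows can be updated rather than only appended.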
  28. Sqoop Architecture
  29. Sqoop Architecture: The client submits the import/export command to import or export data.
  30. Sqoop Architecture: Data from different databases (document-based systems, relational databases, enterprise data warehouses) is fetched by Sqoop. Connectors (for data warehouses, for document-based systems, for RDBMS) help in working with a range of popular databases.
  31. Sqoop Architecture: Multiple mappers perform map tasks to load the data on to HDFS/HBase/Hive.
  32. Sqoop Architecture: Similarly, multiple map tasks will export the data from HDFS on to the RDBMS using the Sqoop export command.
  33. Sqoop Import
  34–36. Sqoop Import: (1) Sqoop introspects the database to gather metadata (primary key information) from the RDBMS data store. (2) Sqoop divides the input dataset into splits and submits a map-only job to the Hadoop cluster; individual map tasks push the splits to HDFS storage.
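The split step above can be made concrete: for an integer split column, Sqoop queries the column's MIN and MAX and divides that range evenly among the mappers, so each map task imports one contiguous key range. A simplified sketch with made-up boundary values (real Sqoop obtains them via a "SELECT MIN(id), MAX(id)" boundary query):

```shell
# Simplified sketch of Sqoop's split computation for an integer primary key.
# min/max are hypothetical; Sqoop would fetch them from the database.
min=1
max=1000
mappers=4
step=$(( (max - min + 1) / mappers ))   # rows per mapper: 250
lo=$min
for i in 1 2 3 4; do
  # last mapper takes any remainder up to max
  if [ "$i" -eq "$mappers" ]; then hi=$max; else hi=$(( lo + step - 1 )); fi
  echo "mapper $i: WHERE id >= $lo AND id <= $hi"
  lo=$(( hi + 1 ))
done
# prints the ranges 1-250, 251-500, 501-750, 751-1000
```

This is why a numeric, evenly distributed split column matters: heavily skewed keys produce unbalanced map tasks.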
  37. Sqoop Export
  38. Sqoop Export: (1) Sqoop introspects the database to gather metadata (primary key information) from the RDBMS data store. (2) Sqoop divides the input dataset into splits, submits a map-only job to the Hadoop cluster, and uses individual map tasks to push the splits to the RDBMS. Sqoop exports Hadoop files back to RDBMS tables.
  39. Sqoop Import
      $ sqoop import (generic args) (import args)
      $ sqoop-import (generic args) (import args)

      Argument                            Description
      --connect <jdbc-uri>                Specify JDBC connect string
      --connection-manager <class-name>   Specify connection manager class to use
      --driver <class-name>               Manually specify JDBC driver class to use
      --hadoop-mapred-home <dir>          Override $HADOOP_MAPRED_HOME
      --username <username>               Set authentication username
      --help                              Print usage instructions
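Putting the arguments from the table together, a complete invocation might look like the following. The connection string, credentials, table, and paths are all hypothetical; the command is assembled as a string and printed, since executing it needs Hadoop and a MySQL server.

```shell
# Hypothetical end-to-end table import (sketch only; values are made up).
# --password-file points at an HDFS file holding the password, which avoids
# exposing it on the command line; --num-mappers controls parallelism.
cmd="sqoop import \
  --connect jdbc:mysql://dbhost:3306/employees_db \
  --username dbuser \
  --password-file /user/hadoop/.db_password \
  --table employees \
  --target-dir /user/hadoop/employees \
  --num-mappers 4"
echo "$cmd"
```

With --num-mappers 4, the output directory would contain one part file per map task (part-m-00000 through part-m-00003).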
  40. Sqoop Export
      $ sqoop export (generic args) (export args)
      $ sqoop-export (generic args) (export args)

      Argument                            Description
      --connect <jdbc-uri>                Specify JDBC connect string
      --connection-manager <class-name>   Specify connection manager class to use
      --driver <class-name>               Manually specify JDBC driver class to use
      --hadoop-mapred-home <dir>          Override $HADOOP_MAPRED_HOME
      --username <username>               Set authentication username
      --help                              Print usage instructions
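A matching export invocation might look like this; again, all names are hypothetical and the command is printed rather than executed. The target table must already exist in the database, and --input-fields-terminated-by must match the delimiter of the HDFS files being exported.

```shell
# Hypothetical export of the previously imported HDFS files back to an
# RDBMS table (sketch only; values are made up).
cmd="sqoop export \
  --connect jdbc:mysql://dbhost:3306/employees_db \
  --username dbuser \
  --table employees_copy \
  --export-dir /user/hadoop/employees \
  --input-fields-terminated-by ','"
echo "$cmd"
```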
  41. Sqoop Processing
  42–45. Sqoop Processing: (1) Sqoop runs in the Hadoop cluster. (2) It imports data from an RDBMS or NoSQL database to HDFS. (3) It uses mappers to slice the incoming data into multiple formats and loads the data in HDFS. (4) It exports data back into the RDBMS while making sure that the schema of the data in the database is maintained.
  46. Demo on Sqoop
