Sqoop
Import with Append mode
and Last Modified mode
Getting data from a MySQL database: import
$ sqoop import --connect jdbc:mysql://localhost/db_1 --username root --password root --table student_details --split-by ID --target-dir studentdata;
$ hadoop fs -ls studentdata/
Now we can see multiple part-m files in the folder. This is
because Sqoop uses multiple map tasks to produce the output:
each mapper writes a subset of the rows, and by default
Sqoop uses 4 mappers, i.e. the output is divided among the
mappers.
Use the cat command to view the contents of each mapper's
output:
part-m-00000: row data such as abc 12 TX
part-m-00001: --- no data ---
part-m-00002: row data such as ecg 56 FL
part-m-00003: --- no data ---
Rupak Roy
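The empty part files above come from how Sqoop divides the split column's value range among mappers. The sketch below is not Sqoop's actual code, just a rough Python illustration of the idea: Sqoop queries the minimum and maximum of the split column, divides that span evenly, and any mapper whose sub-range happens to contain no real IDs writes an empty part-m file. The ID values are hypothetical.

```python
def split_ranges(lo, hi, num_mappers=4):
    """Divide [lo, hi] into num_mappers contiguous ranges,
    roughly like Sqoop's default integer split logic."""
    size = (hi - lo + 1) / num_mappers
    ranges = []
    for i in range(num_mappers):
        start = lo + round(i * size)
        end = lo + round((i + 1) * size) - 1 if i < num_mappers - 1 else hi
        ranges.append((start, end))
    return ranges

# Hypothetical skewed IDs: rows cluster at the low and high ends of the range.
ids = [12, 13, 14, 56, 57, 58]
ranges = split_ranges(min(ids), max(ids))
rows_per_mapper = [sum(lo <= i <= hi for i in ids) for lo, hi in ranges]
print(rows_per_mapper)  # [3, 0, 0, 3] -> the two middle mappers emit empty part-m files
```

This is why a badly skewed split column produces unbalanced mappers, which the next slide explains.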
 The reason behind this is that we are not using the primary
key for splitting, which results in unbalanced tasks where some
mappers process more data than others.
By default, Sqoop uses the table's primary key as the
split column.
Alternatively, we can address this issue by explicitly
declaring which column will be used to split the rows
among the mappers.
#explicitly declaring
sqoop import --connect jdbc:mysql://localhost/db_1 --username root --password root --table student_details --target-dir studentdata --split-by ID;
Now using a primary key
#add a primary key to our table in the database
mysql> ALTER TABLE student_details
ADD PRIMARY KEY (ID);
#now use the same query to load the same data from the MySQL database into HDFS
$ sqoop import --connect jdbc:mysql://localhost/db_1 --username root --password root --table student_details --target-dir student_details1
Note: it will throw an error if the target directory already contains a student_details1 folder.
#check the data
$ hadoop fs -ls /user/hduser/student_details1
$ hadoop fs -cat /user/hduser/student_details1/part-m-00000
#list the databases and tables
$ sqoop list-databases --connect jdbc:mysql://localhost/ --username root --password root
$ sqoop list-tables --connect jdbc:mysql://localhost/db_1 --username root --password root
Controlling Parallelism
We know Sqoop by default uses map tasks to
process its job.
However, Sqoop also provides the flexibility to
change the number of map tasks depending on
our job requirements.
 Controlling the number of map tasks, i.e. the
degree of parallel processing, helps to
control the load on our database.
 More mappers doesn't always mean faster
performance. The optimal number depends
on the type of database, the hardware of the
nodes (systems), and the number of concurrent job requests.
Controlling Parallelism
sqoop import --connect jdbc:mysql://localhost/db_1 --username root --password root --table student_details --num-mappers 1 --target-dir student_details2;
Alternatively,
sqoop import --connect jdbc:mysql://localhost/db_1 --username root --password root --table emp -m 1 --target-dir emp;
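One direct consequence of --num-mappers is the number of output files: each map task writes its own part-m-XXXXX file, so -m 1 yields a single file (and no split column is needed). A minimal sketch of that naming convention:

```python
def part_files(num_mappers):
    """Names of the part files an import with num_mappers map tasks produces."""
    return ["part-m-%05d" % i for i in range(num_mappers)]

print(part_files(1))  # ['part-m-00000'] -> one mapper, one output file
print(part_files(4))  # ['part-m-00000', ..., 'part-m-00003'] -> the default 4 mappers
```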
Now if we want to import only the
updated data
 This can be done by using 2 modes:
1) Append Mode
2) Last-Modified Mode
1) Append Mode:
First add new rows
mysql> use db_1;
> insert into student_details(ID,Name,Location) values (44,'albert','CA');
> insert into student_details(ID,Name,Location) values (55,'Zayn','MI');
#note: the check column needs an integer data type to detect the last value in append mode
mysql> ALTER TABLE student_details
MODIFY ID int(30);
Then import
$ sqoop import --connect jdbc:mysql://localhost/db_1 --username root --password root --table student_details --split-by ID --incremental append --check-column ID --last-value 33 --target-dir Appendresults/
Here --incremental append sets the mode, --check-column names the column to check, and --last-value 33 is the highest value already imported.
Therefore Append Mode is used only when the table is populated with new rows.
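Logically, append mode is just a filter on the check column: only rows whose value exceeds --last-value are selected on the next run. A rough Python sketch of that selection, using hypothetical rows matching the example above:

```python
# Rows already in the table; 44 and 55 were inserted after the last import.
rows = [
    {"ID": 12, "Name": "abc"},
    {"ID": 33, "Name": "ecg"},
    {"ID": 44, "Name": "albert"},
    {"ID": 55, "Name": "Zayn"},
]

last_value = 33  # the --last-value from the previous run
new_rows = [r for r in rows if r["ID"] > last_value]  # what append mode imports
print([r["ID"] for r in new_rows])  # [44, 55] -> only the new rows
# After the run, 55 becomes the --last-value for the next import.
```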
Now if we want to import only the
updated data
2) Last-Modified Mode: is used to overcome Append
Mode's limitation of capturing only new rows, not
updates to existing ones. Hence it is suitable when the table is
populated with both new rows and updated columns.
Whenever the table gets updated, last-modified
mode uses the timestamp attached to each
update to import only the newly modified data rows
and columns into HDFS.
Add
#add a timestamp column
mysql> ALTER TABLE student_details
ADD COLUMN updated_at
TIMESTAMP DEFAULT CURRENT_TIMESTAMP
ON UPDATE CURRENT_TIMESTAMP;
#add a new column
mysql> ALTER TABLE student_details
ADD COLUMN YEAR char(10)
AFTER Location;
#add values to the new column
mysql> insert into student_details(YEAR)
values(2010);
OR
mysql> UPDATE student_details
SET YEAR = 2010
WHERE Location = 'FL';
 repeat again for the remaining rows
#then import
$ sqoop import --connect jdbc:mysql://localhost/db_1 --username root --password root --table student_details --split-by ID --incremental lastmodified --check-column updated_at --last-value "2017-01-15 13:00:28" --target-dir lmresults/
Here --incremental lastmodified sets the mode, and
--check-column updated_at compares the
timestamp column against --last-value "2017-01-15
13:00:28".
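The lastmodified selection works like the append filter, but on the timestamp column, so it also catches rows that were UPDATEd, not just INSERTed. A small sketch with hypothetical timestamps around the --last-value used above:

```python
from datetime import datetime

# Hypothetical rows: ID 33 was updated after the previous import ran.
rows = [
    {"ID": 12, "updated_at": "2017-01-14 09:12:00"},
    {"ID": 33, "updated_at": "2017-01-15 14:30:10"},
]

fmt = "%Y-%m-%d %H:%M:%S"
last_value = datetime.strptime("2017-01-15 13:00:28", fmt)

# Rows whose updated_at is at or after --last-value get re-imported.
changed = [r for r in rows if datetime.strptime(r["updated_at"], fmt) >= last_value]
print([r["ID"] for r in changed])  # [33] -> only the modified row
```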
Append Mode vs Last Modified
Mode
 Each mode has its own advantages over the
other's limitations.
In append mode you don't have to delete the existing
output folder in HDFS; Sqoop creates another file and
names it sequentially by itself.
But in last-modified mode Sqoop needs the existing
output HDFS folder to be empty.
Also,
append mode imports only the data after the given last
value, whereas last-modified mode takes all
the newly modified rows & columns into account.
Next
 In real life it might not be efficient or
practical to remember the last value each
time we run Sqoop. To overcome this issue, Sqoop
provides another feature called a Sqoop job.