SlideShare a Scribd company logo
1 of 13
Download to read offline
Best ETL, Data Warehousing Questions
srini
Created with myebookmaker
www.myebookmaker.com
Table of Contents
ETL Questions
Data warehousing UNIX Questions
1) What is meant by ETL?
The overall data acquisition process, called ETL (extraction, transformation,and loading), is
generally grouped into three main components :
-Extraction: Involves obtaining the required data from the various sources.
-Transformation: Source data undergoes a number of operations that prepare it for import into
the data warehouse (target database). To perform this task, integration and transformation
programs are used which can reformat, recalculate, modify structure and data elements, and add
lime elements. They can also perform calculations, summarization, de-normal-ization, etc.
-Loading: Involves physically placing extracted and transformed data in the target database.
The initial loading involves a massive data import into the data warehouse. Subsequently, an
extraction procedure periodically loads fresh data based on business rules and a pre–determined
frequency.
2) What are the types of dimensional schema?
There are two types of dimensional schema :
– Star schema
– Snowflake schema
3) What is star schema?
Star schema, there is only one central fact table, and a set of dimension tables, one for each
dimension. In star schema, each dimension is represented by only one table, and each table
contains a set of attributes.
4) What is snowflake schema?
-A snowflake schema avoids the redundancy of star schemas by normalizing the dimension
tables. Therefore, a dimension is represented by several tables related by referential integrity
constraints.
-Snowflake schema means a central located fact table is surrounded by denormalized
dimension tables again these denormalized dimension table is splited one or more normalized
dimension tables.
5) What is meant by starflake schema?
A starflake schema is a combination of the star and the snowflake schemas where some
dimensions are normalized while others are not.
6) What is Operational Data Store?
Operational Data Store (ODS) is a hybrid data architecture to cover the requirements for both
analytical and operational tasks.
Also Read:Top 20 ultimate ETLQuestions really good for interviews
7) What is the difference between star schema and snowflake schema?
The star schema consists of a fact table with a single table for each dimension. The snowflake
schema is a variation on the star schema where the dimensional tables from a star schema are
organized into a hierarchy by normalizing them. A fact constellation is a set of fact tables that
share some dimension tables.
8) What is data staging?
Data staging is the process of transferring the data from the data sources (operational systems)
into the target database of the data warehouse.
9) What is a session?
A session is a set of instructions that describes how and when to move data from sources to
targets.
More: ETLTesting from beginner to expert
10) What is mapping?
A mapping is a set of source and target definitions linked by transformation objects that define
the rules for data transformation.
11) What is Datadriven?
The informatica server follows instructions coded into update strategy transformations with in
the session maping determine how to flag records for insert,update,delete or reject. If you do not
choose data driven option setting, the informatica server ignores all update strategy
transformations in the mapping.
12) What are the three major types of metadata in a data warehouse?
Metadata in a data warehouse fall into three major categories :
Operational Metadata
Extraction and Transformation Metadata
End–User Metadata
13) What is OLAP?
Allow users to run complex dimensional queries
Enable users to generate canned queries.
Two categories of online analytical processing are multidimensional online analytical
processing (MOLAP) and relational online analytical processing (ROLAP).
14) What is meant by geographic information system(GIS)?
A software system that allows users to define, create, maintain, and control access to a
geographic database.
15) What is dimension table?
A relational table that contains dimension data.
16) What is the diffrence between OLTP and OLAP?
The main differences between OLTP and OLAP are:
OLTP systems are for doing clerical/operational processing of data whereas OLAP systems are for carrying out
analytical processing of the data.
OLTP systems look at data in one dimension; whereas in OLAP systems, data can be viewed in different dimensions
and hence interesting business intelligence can be extracted from the data.
Operational personnel of an organization use the OLTP systems whereas
management uses OLAP systems, though operational personnel may also use portions of OLAP system.
OLTP systems contain the current data as well as the details of the transactions. OLAP systems contain historical
data, and also data in summarized form.
OLTP database size is smaller as compared to OLAP systems. If the OLTP database occupies Gigabytes (GB) of
storage space, OLAP database occupies Terabytes (TB) of storage space.
More: Automating ETL: complete five projects
17) What is DTM?
DTM transform data received from reader buffer and its moves transformation to
transformation on row by row basis and it uses transformation caches when necessary.
18) What is meant by spatial data warehouse?
A data warehouse that manipulates spatial data, thus allowing spatial analysis. This is to be
contrasted with conventional and temporal data warehouses.
19) What is a Batch?
Batches provide a way to group sessions for either serial or parallel execution by the
Informatica Server.
20) What are the types of batch?
There are two types of batches are :
Sequential batch : Runs sessions one after the other.
Concurrent batch : Runs sessions at the same time.
there are many ways to do this. However the easiest way to display the first
line of a file is using the [head] command.
$> head -1 file. Txt
no prize in guessing that if you specify [head -2] then it would print first 2
records of the file.
another way can be by using [sed] command. [sed] is a very powerful text
editor which can be used for various text manipulation purposes like this.
$> sed '2,$ d' file. Txt
how does the above command work?
The 'd' parameter basically tells [sed] to delete all the records from display
from line 2 to last line of the file (last line is represented by $ symbol). Of
course it does not actually delete those lines from the file, it just does not
display those lines in standard output screen. So you only see the remaining
line which is the 1st line.
how to print/display the last line of a file?
the easiest way is to use the [tail] command.
$> tail -1 file. Txt
if you want to do it using [sed] command, here is what you should write:
$> sed -n '$ p' test
from our previous answer, we already know that '$' stands for the last line of
the file. So '$ p' basically prints (p for print) the last line in standard output
screen. '-n' switch takes [sed] to silent mode so that [sed] does not print
anything else in the output.
how to display n-th line of a file?
the easiest way to do it will be by using [sed] i guess. Based on what we
already know about [sed] from our previous examples, we can quickly deduce
this command:
$> sed –n ' p' file. Txt
you need to replace with the actual line number. So if you want to print the
4th line, the command will be
$> sed –n '4 p' test
How to print/display the first line of a file?
of course you can do it by using [head] and [tail] command as well like
below:
$> head - file. Txt | tail -1
you need to replace with the actual line number. So if you want to print the
4th line, the command will be
$> head -4 file. Txt | tail -1
how to remove the first line / header from a file?
we already know how [sed] can be used to delete a certain line from the
output – by using the'd' switch. So if we want to delete the first line the
command should be:
$> sed '1 d' file. Txt
but the issue with the above command is, it just prints out all the lines except
the first line of the file on the standard output. It does not really change the file
in-place. So if you want to delete the first line from the file itself, you have two
options.
either you can redirect the output of the file to some other file and then
rename it back to original file like below:
$> sed '1 d' file. Txt > new_file. Txt
$> mv new_file. Txt file. Txt
or, you can use an inbuilt [sed] switch '–i' which changes the file in-place.
See below:
$> sed –i '1 d' file. Txt
how to remove the last line/ trailer from a file in unix script?
always remember that [sed] switch '$' refers to the last line. So using this
knowledge we can deduce the below command:
$> sed –i '$ d' file. Txt
how to remove certain lines from a file in unix?
if you want to remove line to line from a given file, you can accomplish the
task in the similar method shown above. Here is an example:
$> sed –i '5,7 d' file. Txt
the above command will delete line 5 to line 7 from the file file. Txt
how to remove the last n-th line from a file?
this is bit tricky. Suppose your file contains 100 lines and you want to remove
the last 5 lines. Now if you know how many lines are there in the file, then you
can simply use the above shown method and can remove all the lines from 96
to 100 like below:
$> sed –i '96,100 d' file. Txt # alternative to command [head -95 file. Txt]
but not always you will know the number of lines present in the file (the file
may be generated dynamically, etc. ) in that case there are many different ways
to solve the problem. There are some ways which are quite complex and fancy.
But let's first do it in a way that we can understand easily and remember easily.
Here is how it goes:
$> tt=`wc -l file. Txt | cut -f1 -d' '`;sed –i "`expr $tt - 4`,$tt d" test
as you can see there are two commands. The first one (before the semi-colon)
calculates the total number of lines present in the file and stores it in a variable
called “tt”. The second command (after the semi-colon), uses the variable and
works in the exact way as shows in the previous example.
how to check the length of any line in a file?
we already know how to print one line from a file which is this:
$> sed –n ' p' file. Txt
where is to be replaced by the actual line number that you want to print. Now
once you know it, it is easy to print out the length of this line by using [wc]
command with '-c' switch.
$> sed –n '35 p' file. Txt | wc –c
the above command will print the length of 35th line in the file. Txt.
how to get the nth word of a line in unix?
assuming the words in the line are separated by space, we can use the [cut]
command. [cut] is a very powerful and useful command and it's real easy. All
you have to do to get the n-th word from the line is issue the following
command:
cut –f -d' '
'-d' switch tells [cut] about what is the delimiter (or separator) in the file,
which is space ' ' in this case. If the separator was comma, we could have
written -d',' then. So, suppose i want find the 4th word from the below string:
“a quick brown fox jumped over the lazy cat”, we will do something like this:
$> echo “a quick brown fox jumped over the lazy cat” | cut –f4 –d' '
and it will print “fox”
how to reverse a string in unix?
pretty easy. Use the [rev] command.
$> echo "unix" | rev
xinu
how to get the last word from a line in unix file?
we will make use of two commands that we learnt above to solve this. The
commands are [rev] and [cut]. Here we go.
let's imagine the line is: “c for cat”. We need “cat”. First we reverse the line.
We get “tac rof c”. Then we cut the first word, we get 'tac'. And then we
reverse it again.
$>echo "c for cat" | rev | cut -f1 -d' ' | rev
cat
how to get the n-th field from a unix command output?
we know we can do it by [cut]. Like below command extracts the first field
from the output of [wc –c] command
$>wc -c file. Txt | cut -d' ' -f1
109
but i want to introduce one more command to do this here. That is by using
[awk] command. [awk] is a very powerful command for text pattern scanning
and processing. Here we will see how may we use of [awk] to extract the first
field (or first column) from the output of another command. Like above suppose
i want to print the first column of the [wc –c] output. Here is how it goes like
this:
$>wc -c file. Txt | awk ' ''{print $1}'
109
the basic syntax of [awk] is like this:
awk 'pattern space''{action space}'
the pattern space can be left blank or omitted, like below:
$>wc -c file. Txt | awk '{print $1}'
109
in the action space, we have asked [awk] to take the action of printing the
first column ($1). More on [awk] later.
how to replace the n-th line in a file with a new line in unix?
this can be done in two steps. The first step is to remove the n-th line. And
the second step is to insert a new line in n-th line position. Here we go.
step 1: remove the n-th line
$>sed -i'' '10 d' file. Txt # d stands for delete
step 2: insert a new line at n-th line position
$>sed -i'' '10 i this is the new line' file. Txt # i stands for insert
how to show the non-printable characters in a file?
open the file in vi editor. Go to vi command mode by pressing [escape] and
then [:]. Then type [set list]. This will show you all the non-printable
characters, e. G. Ctrl-m characters (^m) etc. , in the file.
how to zip a file in linux?
use inbuilt [zip] command in linux
how to unzip a file in linux?
use inbuilt [unzip] command in linux.
$> unzip –j file. Zip
how to test if a zip file is corrupted in linux?
use “-t” switch with the inbuilt [unzip] command
$> unzip –t file. Zip
how to check if a file is zipped in unix?
in order to know the file type of a particular file use the [file] command like
below:
$> file file. Txt
file. Txt: ascii text
if you want to know the technical mime type of the file, use “-i” switch.
$>file -i file. Txt
file. Txt: text/plain; charset=us-ascii
if the file is zipped, following will be the result
$> file –i file. Zip
file. Zip: application/x-zip

More Related Content

What's hot

Data Warehouse and Business Intelligence - Recipe 2
Data Warehouse and Business Intelligence - Recipe 2Data Warehouse and Business Intelligence - Recipe 2
Data Warehouse and Business Intelligence - Recipe 2Massimo Cenci
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RYanchang Zhao
 
Import web resources using R Studio
Import web resources using R StudioImport web resources using R Studio
Import web resources using R StudioRupak Roy
 
Tree representation in map reduce world
Tree representation  in map reduce worldTree representation  in map reduce world
Tree representation in map reduce worldYu Liu
 
Recipe 5 of Data Warehouse and Business Intelligence - The null values manage...
Recipe 5 of Data Warehouse and Business Intelligence - The null values manage...Recipe 5 of Data Warehouse and Business Intelligence - The null values manage...
Recipe 5 of Data Warehouse and Business Intelligence - The null values manage...Massimo Cenci
 
Apache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathurApache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathurSiddharth Mathur
 
Apache Hive Table Partition and HQL
Apache Hive Table Partition and HQLApache Hive Table Partition and HQL
Apache Hive Table Partition and HQLRupak Roy
 
1 list datastructures
1 list datastructures1 list datastructures
1 list datastructuresNguync91368
 
Dynamic memory allocation
Dynamic memory allocationDynamic memory allocation
Dynamic memory allocationMoniruzzaman _
 
Introduction To R Language
Introduction To R LanguageIntroduction To R Language
Introduction To R LanguageGaurang Dobariya
 
Recipes 8 of Data Warehouse and Business Intelligence - Naming convention tec...
Recipes 8 of Data Warehouse and Business Intelligence - Naming convention tec...Recipes 8 of Data Warehouse and Business Intelligence - Naming convention tec...
Recipes 8 of Data Warehouse and Business Intelligence - Naming convention tec...Massimo Cenci
 

What's hot (20)

Chapter16
Chapter16Chapter16
Chapter16
 
DataBase Management System Lab File
DataBase Management System Lab FileDataBase Management System Lab File
DataBase Management System Lab File
 
Data Warehouse and Business Intelligence - Recipe 2
Data Warehouse and Business Intelligence - Recipe 2Data Warehouse and Business Intelligence - Recipe 2
Data Warehouse and Business Intelligence - Recipe 2
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in R
 
Import web resources using R Studio
Import web resources using R StudioImport web resources using R Studio
Import web resources using R Studio
 
Sql ppt
Sql pptSql ppt
Sql ppt
 
Data Management in Python
Data Management in PythonData Management in Python
Data Management in Python
 
ADVANCE ITT BY PRASAD
ADVANCE ITT BY PRASADADVANCE ITT BY PRASAD
ADVANCE ITT BY PRASAD
 
Tree representation in map reduce world
Tree representation  in map reduce worldTree representation  in map reduce world
Tree representation in map reduce world
 
BIG DATA Session 7 8
BIG DATA Session 7 8BIG DATA Session 7 8
BIG DATA Session 7 8
 
Recipe 5 of Data Warehouse and Business Intelligence - The null values manage...
Recipe 5 of Data Warehouse and Business Intelligence - The null values manage...Recipe 5 of Data Warehouse and Business Intelligence - The null values manage...
Recipe 5 of Data Warehouse and Business Intelligence - The null values manage...
 
Apache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathurApache pig presentation_siddharth_mathur
Apache pig presentation_siddharth_mathur
 
Apache Hive Table Partition and HQL
Apache Hive Table Partition and HQLApache Hive Table Partition and HQL
Apache Hive Table Partition and HQL
 
Data Management in R
Data Management in RData Management in R
Data Management in R
 
1 list datastructures
1 list datastructures1 list datastructures
1 list datastructures
 
Dynamic memory allocation
Dynamic memory allocationDynamic memory allocation
Dynamic memory allocation
 
Introduction To R Language
Introduction To R LanguageIntroduction To R Language
Introduction To R Language
 
Big Data Analytics Lab File
Big Data Analytics Lab FileBig Data Analytics Lab File
Big Data Analytics Lab File
 
Recipes 8 of Data Warehouse and Business Intelligence - Naming convention tec...
Recipes 8 of Data Warehouse and Business Intelligence - Naming convention tec...Recipes 8 of Data Warehouse and Business Intelligence - Naming convention tec...
Recipes 8 of Data Warehouse and Business Intelligence - Naming convention tec...
 
Datamining with R
Datamining with RDatamining with R
Datamining with R
 

Viewers also liked

Surveillance of Borrelia Miyamotoi in the Greater Chicago Area
Surveillance of Borrelia Miyamotoi in the Greater Chicago AreaSurveillance of Borrelia Miyamotoi in the Greater Chicago Area
Surveillance of Borrelia Miyamotoi in the Greater Chicago AreaMallory Orth
 
Charlasobreeducacionsexual 121118143600-phpapp02
Charlasobreeducacionsexual 121118143600-phpapp02Charlasobreeducacionsexual 121118143600-phpapp02
Charlasobreeducacionsexual 121118143600-phpapp02Sergio Gustavo
 
Writing command macro in stratus cobol
Writing command macro in stratus cobolWriting command macro in stratus cobol
Writing command macro in stratus cobolSrinimf-Slides
 
Échange et interopérabilité des données structurées sur le Web
Échange et interopérabilité des données structurées sur le WebÉchange et interopérabilité des données structurées sur le Web
Échange et interopérabilité des données structurées sur le WebAntidot
 
Customer Review for TriMethyl-Histone H3-K56 Polyclonal Antibody (STJ29403)
Customer Review for TriMethyl-Histone H3-K56 Polyclonal Antibody (STJ29403)Customer Review for TriMethyl-Histone H3-K56 Polyclonal Antibody (STJ29403)
Customer Review for TriMethyl-Histone H3-K56 Polyclonal Antibody (STJ29403)St John's Laboratory Ltd
 
Customer Review For mTOR Polyclonal Antibody (STJ94280)
Customer Review For mTOR Polyclonal Antibody (STJ94280)Customer Review For mTOR Polyclonal Antibody (STJ94280)
Customer Review For mTOR Polyclonal Antibody (STJ94280)St John's Laboratory Ltd
 
Fórmulas de Matemática Financiera I
Fórmulas de Matemática Financiera IFórmulas de Matemática Financiera I
Fórmulas de Matemática Financiera Ibillod
 
Bab 3 teori gelagat pengguna
Bab 3 teori gelagat penggunaBab 3 teori gelagat pengguna
Bab 3 teori gelagat penggunaSyahira Md Desa
 
Historia de la contabilidad de costos
Historia de la contabilidad de costosHistoria de la contabilidad de costos
Historia de la contabilidad de costosbillod
 
GELAGAT PENGGUNA
GELAGAT PENGGUNAGELAGAT PENGGUNA
GELAGAT PENGGUNACkg Nizam
 

Viewers also liked (18)

Aaaa
AaaaAaaa
Aaaa
 
О.В.Кубська. Формування слухацької культури учнів мультимедійними засобами
О.В.Кубська. Формування слухацької культури учнів мультимедійними засобамиО.В.Кубська. Формування слухацької культури учнів мультимедійними засобами
О.В.Кубська. Формування слухацької культури учнів мультимедійними засобами
 
W2W
W2WW2W
W2W
 
Surveillance of Borrelia Miyamotoi in the Greater Chicago Area
Surveillance of Borrelia Miyamotoi in the Greater Chicago AreaSurveillance of Borrelia Miyamotoi in the Greater Chicago Area
Surveillance of Borrelia Miyamotoi in the Greater Chicago Area
 
Charlasobreeducacionsexual 121118143600-phpapp02
Charlasobreeducacionsexual 121118143600-phpapp02Charlasobreeducacionsexual 121118143600-phpapp02
Charlasobreeducacionsexual 121118143600-phpapp02
 
Gran vía
Gran víaGran vía
Gran vía
 
Writing command macro in stratus cobol
Writing command macro in stratus cobolWriting command macro in stratus cobol
Writing command macro in stratus cobol
 
Échange et interopérabilité des données structurées sur le Web
Échange et interopérabilité des données structurées sur le WebÉchange et interopérabilité des données structurées sur le Web
Échange et interopérabilité des données structurées sur le Web
 
Customer Review for TriMethyl-Histone H3-K56 Polyclonal Antibody (STJ29403)
Customer Review for TriMethyl-Histone H3-K56 Polyclonal Antibody (STJ29403)Customer Review for TriMethyl-Histone H3-K56 Polyclonal Antibody (STJ29403)
Customer Review for TriMethyl-Histone H3-K56 Polyclonal Antibody (STJ29403)
 
perdagangan internasional
perdagangan internasionalperdagangan internasional
perdagangan internasional
 
Customer Review For mTOR Polyclonal Antibody (STJ94280)
Customer Review For mTOR Polyclonal Antibody (STJ94280)Customer Review For mTOR Polyclonal Antibody (STJ94280)
Customer Review For mTOR Polyclonal Antibody (STJ94280)
 
Rome
RomeRome
Rome
 
Indirect Flow Cytometry Protocol
Indirect Flow Cytometry ProtocolIndirect Flow Cytometry Protocol
Indirect Flow Cytometry Protocol
 
Fórmulas de Matemática Financiera I
Fórmulas de Matemática Financiera IFórmulas de Matemática Financiera I
Fórmulas de Matemática Financiera I
 
Bab 3 teori gelagat pengguna
Bab 3 teori gelagat penggunaBab 3 teori gelagat pengguna
Bab 3 teori gelagat pengguna
 
Immunocytochemistry Protocol
Immunocytochemistry ProtocolImmunocytochemistry Protocol
Immunocytochemistry Protocol
 
Historia de la contabilidad de costos
Historia de la contabilidad de costosHistoria de la contabilidad de costos
Historia de la contabilidad de costos
 
GELAGAT PENGGUNA
GELAGAT PENGGUNAGELAGAT PENGGUNA
GELAGAT PENGGUNA
 

Similar to The best ETL questions in a nut shell

data stage-material
data stage-materialdata stage-material
data stage-materialRajesh Kv
 
22827361 ab initio-fa-qs
22827361 ab initio-fa-qs22827361 ab initio-fa-qs
22827361 ab initio-fa-qsCapgemini
 
MDF and LDF in SQL Server
MDF and LDF in SQL ServerMDF and LDF in SQL Server
MDF and LDF in SQL ServerMasum Reza
 
Db2 migration -_tips,_tricks,_and_pitfalls
Db2 migration -_tips,_tricks,_and_pitfallsDb2 migration -_tips,_tricks,_and_pitfalls
Db2 migration -_tips,_tricks,_and_pitfallssam2sung2
 
Top answers to etl interview questions
Top answers to etl interview questionsTop answers to etl interview questions
Top answers to etl interview questionssrimaribeda
 
Chapter 1( intro & overview)
Chapter 1( intro & overview)Chapter 1( intro & overview)
Chapter 1( intro & overview)MUHAMMAD AAMIR
 
Etl interview questions
Etl interview questionsEtl interview questions
Etl interview questionsashokvirtual
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Serban Tanasa
 
ata Warehouse and Business Intelligence - Recipe 7 - A messaging system for O...
ata Warehouse and Business Intelligence - Recipe 7 - A messaging system for O...ata Warehouse and Business Intelligence - Recipe 7 - A messaging system for O...
ata Warehouse and Business Intelligence - Recipe 7 - A messaging system for O...Massimo Cenci
 
Free ebooks download ! Edhole
Free ebooks download ! EdholeFree ebooks download ! Edhole
Free ebooks download ! EdholeEdhole.com
 
Free ebooks download ! Edhole
Free ebooks download ! EdholeFree ebooks download ! Edhole
Free ebooks download ! EdholeEdhole.com
 
Recipe 14 - Build a Staging Area for an Oracle Data Warehouse (2)
Recipe 14 - Build a Staging Area for an Oracle Data Warehouse (2)Recipe 14 - Build a Staging Area for an Oracle Data Warehouse (2)
Recipe 14 - Build a Staging Area for an Oracle Data Warehouse (2)Massimo Cenci
 

Similar to The best ETL questions in a nut shell (20)

data stage-material
data stage-materialdata stage-material
data stage-material
 
22827361 ab initio-fa-qs
22827361 ab initio-fa-qs22827361 ab initio-fa-qs
22827361 ab initio-fa-qs
 
MDF and LDF in SQL Server
MDF and LDF in SQL ServerMDF and LDF in SQL Server
MDF and LDF in SQL Server
 
Db2 migration -_tips,_tricks,_and_pitfalls
Db2 migration -_tips,_tricks,_and_pitfallsDb2 migration -_tips,_tricks,_and_pitfalls
Db2 migration -_tips,_tricks,_and_pitfalls
 
Top answers to etl interview questions
Top answers to etl interview questionsTop answers to etl interview questions
Top answers to etl interview questions
 
Flink internals web
Flink internals web Flink internals web
Flink internals web
 
Chapter 1( intro & overview)
Chapter 1( intro & overview)Chapter 1( intro & overview)
Chapter 1( intro & overview)
 
Module02
Module02Module02
Module02
 
Etl interview questions
Etl interview questionsEtl interview questions
Etl interview questions
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
 
Cassandra data modelling best practices
Cassandra data modelling best practicesCassandra data modelling best practices
Cassandra data modelling best practices
 
Lecture 9.pptx
Lecture 9.pptxLecture 9.pptx
Lecture 9.pptx
 
Data structure
 Data structure Data structure
Data structure
 
ata Warehouse and Business Intelligence - Recipe 7 - A messaging system for O...
ata Warehouse and Business Intelligence - Recipe 7 - A messaging system for O...ata Warehouse and Business Intelligence - Recipe 7 - A messaging system for O...
ata Warehouse and Business Intelligence - Recipe 7 - A messaging system for O...
 
Bt0065
Bt0065Bt0065
Bt0065
 
B T0065
B T0065B T0065
B T0065
 
Free ebooks download ! Edhole
Free ebooks download ! EdholeFree ebooks download ! Edhole
Free ebooks download ! Edhole
 
Free ebooks download ! Edhole
Free ebooks download ! EdholeFree ebooks download ! Edhole
Free ebooks download ! Edhole
 
Recipe 14 - Build a Staging Area for an Oracle Data Warehouse (2)
Recipe 14 - Build a Staging Area for an Oracle Data Warehouse (2)Recipe 14 - Build a Staging Area for an Oracle Data Warehouse (2)
Recipe 14 - Build a Staging Area for an Oracle Data Warehouse (2)
 
Oracle tutorial
Oracle tutorialOracle tutorial
Oracle tutorial
 

More from Srinimf-Slides

software-life-cycle.pptx
software-life-cycle.pptxsoftware-life-cycle.pptx
software-life-cycle.pptxSrinimf-Slides
 
Python Tutorial Questions part-1
Python Tutorial Questions part-1Python Tutorial Questions part-1
Python Tutorial Questions part-1Srinimf-Slides
 
Cics testing and debugging-session 7
Cics testing and debugging-session 7Cics testing and debugging-session 7
Cics testing and debugging-session 7Srinimf-Slides
 
CICS error and exception handling-recovery and restart-session 6
CICS error and exception handling-recovery and restart-session 6CICS error and exception handling-recovery and restart-session 6
CICS error and exception handling-recovery and restart-session 6Srinimf-Slides
 
Cics program, interval and task control commands-session 5
Cics program, interval and task control commands-session 5Cics program, interval and task control commands-session 5
Cics program, interval and task control commands-session 5Srinimf-Slides
 
Cics data access-session 4
Cics data access-session 4Cics data access-session 4
Cics data access-session 4Srinimf-Slides
 
CICS basic mapping support - session 3
CICS basic mapping support - session 3CICS basic mapping support - session 3
CICS basic mapping support - session 3Srinimf-Slides
 
Cics application programming - session 2
Cics   application programming - session 2Cics   application programming - session 2
Cics application programming - session 2Srinimf-Slides
 
CICS basics overview session-1
CICS basics overview session-1CICS basics overview session-1
CICS basics overview session-1Srinimf-Slides
 
The best Teradata RDBMS introduction a quick refresher
The best Teradata RDBMS introduction a quick refresherThe best Teradata RDBMS introduction a quick refresher
The best Teradata RDBMS introduction a quick refresherSrinimf-Slides
 
IMS DC Self Study Complete Tutorial
IMS DC Self Study Complete TutorialIMS DC Self Study Complete Tutorial
IMS DC Self Study Complete TutorialSrinimf-Slides
 
How To Master PACBASE For Mainframe In Only Seven Days
How To Master PACBASE For Mainframe In Only Seven DaysHow To Master PACBASE For Mainframe In Only Seven Days
How To Master PACBASE For Mainframe In Only Seven DaysSrinimf-Slides
 
Assembler Language Tutorial for Mainframe Programmers
Assembler Language Tutorial for Mainframe ProgrammersAssembler Language Tutorial for Mainframe Programmers
Assembler Language Tutorial for Mainframe ProgrammersSrinimf-Slides
 
The Easytrieve Presention by Srinimf
The Easytrieve Presention by SrinimfThe Easytrieve Presention by Srinimf
The Easytrieve Presention by SrinimfSrinimf-Slides
 
PLI Presentation for Mainframe Programmers
PLI Presentation for Mainframe ProgrammersPLI Presentation for Mainframe Programmers
PLI Presentation for Mainframe ProgrammersSrinimf-Slides
 
PL/SQL Interview Questions
PL/SQL Interview QuestionsPL/SQL Interview Questions
PL/SQL Interview QuestionsSrinimf-Slides
 

More from Srinimf-Slides (20)

software-life-cycle.pptx
software-life-cycle.pptxsoftware-life-cycle.pptx
software-life-cycle.pptx
 
Python Tutorial Questions part-1
Python Tutorial Questions part-1Python Tutorial Questions part-1
Python Tutorial Questions part-1
 
Cics testing and debugging-session 7
Cics testing and debugging-session 7Cics testing and debugging-session 7
Cics testing and debugging-session 7
 
CICS error and exception handling-recovery and restart-session 6
CICS error and exception handling-recovery and restart-session 6CICS error and exception handling-recovery and restart-session 6
CICS error and exception handling-recovery and restart-session 6
 
Cics program, interval and task control commands-session 5
Cics program, interval and task control commands-session 5Cics program, interval and task control commands-session 5
Cics program, interval and task control commands-session 5
 
Cics data access-session 4
Cics data access-session 4Cics data access-session 4
Cics data access-session 4
 
CICS basic mapping support - session 3
CICS basic mapping support - session 3CICS basic mapping support - session 3
CICS basic mapping support - session 3
 
Cics application programming - session 2
Cics   application programming - session 2Cics   application programming - session 2
Cics application programming - session 2
 
CICS basics overview session-1
CICS basics overview session-1CICS basics overview session-1
CICS basics overview session-1
 
100 sql queries
100 sql queries100 sql queries
100 sql queries
 
The best Teradata RDBMS introduction a quick refresher
The best Teradata RDBMS introduction a quick refresherThe best Teradata RDBMS introduction a quick refresher
The best Teradata RDBMS introduction a quick refresher
 
IMS DC Self Study Complete Tutorial
IMS DC Self Study Complete TutorialIMS DC Self Study Complete Tutorial
IMS DC Self Study Complete Tutorial
 
How To Master PACBASE For Mainframe In Only Seven Days
How To Master PACBASE For Mainframe In Only Seven DaysHow To Master PACBASE For Mainframe In Only Seven Days
How To Master PACBASE For Mainframe In Only Seven Days
 
Assembler Language Tutorial for Mainframe Programmers
Assembler Language Tutorial for Mainframe ProgrammersAssembler Language Tutorial for Mainframe Programmers
Assembler Language Tutorial for Mainframe Programmers
 
The Easytrieve Presention by Srinimf
The Easytrieve Presention by SrinimfThe Easytrieve Presention by Srinimf
The Easytrieve Presention by Srinimf
 
PLI Presentation for Mainframe Programmers
PLI Presentation for Mainframe ProgrammersPLI Presentation for Mainframe Programmers
PLI Presentation for Mainframe Programmers
 
PL/SQL Interview Questions
PL/SQL Interview QuestionsPL/SQL Interview Questions
PL/SQL Interview Questions
 
Macro teradata
Macro teradataMacro teradata
Macro teradata
 
DB2-SQL Part-2
DB2-SQL Part-2DB2-SQL Part-2
DB2-SQL Part-2
 
DB2 SQL-Part-1
DB2 SQL-Part-1DB2 SQL-Part-1
DB2 SQL-Part-1
 

Recently uploaded

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 

Recently uploaded (20)

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 

The best ETL questions in a nut shell

  • 1.
  • 2. Best ETL, Data Warehousing Questions srini
  • 4. Table of Contents ETL Questions Data warehousing UNIX Questions
  • 5. 1) What is meant by ETL? The overall data acquisition process, called ETL (extraction, transformation,and loading), is generally grouped into three main components : -Extraction: Involves obtaining the required data from the various sources. -Transformation: Source data undergoes a number of operations that prepare it for import into the data warehouse (target database). To perform this task, integration and transformation programs are used which can reformat, recalculate, modify structure and data elements, and add lime elements. They can also perform calculations, summarization, de-normal-ization, etc. -Loading: Involves physically placing extracted and transformed data in the target database. The initial loading involves a massive data import into the data warehouse. Subsequently, an extraction procedure periodically loads fresh data based on business rules and a pre–determined frequency. 2) What are the types of dimensional schema? There are two types of dimensional schema : – Star schema – Snowflake schema 3) What is star schema? Star schema, there is only one central fact table, and a set of dimension tables, one for each dimension. In star schema, each dimension is represented by only one table, and each table contains a set of attributes. 4) What is snowflake schema? -A snowflake schema avoids the redundancy of star schemas by normalizing the dimension tables. Therefore, a dimension is represented by several tables related by referential integrity constraints. -Snowflake schema means a central located fact table is surrounded by denormalized dimension tables again these denormalized dimension table is splited one or more normalized dimension tables.
  • 6. 5) What is meant by starflake schema? A starflake schema is a combination of the star and the snowflake schemas where some dimensions are normalized while others are not. 6) What is Operational Data Store? Operational Data Store (ODS) is a hybrid data architecture to cover the requirements for both analytical and operational tasks. Also Read:Top 20 ultimate ETLQuestions really good for interviews 7) What is the difference between star schema and snowflake schema? The star schema consists of a fact table with a single table for each dimension. The snowflake schema is a variation on the star schema where the dimensional tables from a star schema are organized into a hierarchy by normalizing them. A fact constellation is a set of fact tables that share some dimension tables. 8) What is data staging? Data staging is the process of transferring the data from the data sources (operational systems) into the target database of the data warehouse. 9) What is a session? A session is a set of instructions that describes how and when to move data from sources to targets. More: ETLTesting from beginner to expert 10) What is mapping? A mapping is a set of source and target definitions linked by transformation objects that define the rules for data transformation. 11) What is Datadriven? The informatica server follows instructions coded into update strategy transformations with in the session maping determine how to flag records for insert,update,delete or reject. If you do not choose data driven option setting, the informatica server ignores all update strategy transformations in the mapping.
  • 7. 12) What are the three major types of metadata in a data warehouse? Metadata in a data warehouse fall into three major categories : Operational Metadata Extraction and Transformation Metadata End–User Metadata 13) What is OLAP? Allow users to run complex dimensional queries Enable users to generate canned queries. Two categories of online analytical processing are multidimensional online analytical processing (MOLAP) and relational online analytical processing (ROLAP). 14) What is meant by geographic information system(GIS)? A software system that allows users to define, create, maintain, and control access to a geographic database. 15) What is dimension table? A relational table that contains dimension data. 16) What is the diffrence between OLTP and OLAP? The main differences between OLTP and OLAP are: OLTP systems are for doing clerical/operational processing of data whereas OLAP systems are for carrying out analytical processing of the data. OLTP systems look at data in one dimension; whereas in OLAP systems, data can be viewed in different dimensions and hence interesting business intelligence can be extracted from the data. Operational personnel of an organization use the OLTP systems whereas management uses OLAP systems, though operational personnel may also use portions of OLAP system. OLTP systems contain the current data as well as the details of the transactions. OLAP systems contain historical data, and also data in summarized form. OLTP database size is smaller as compared to OLAP systems. If the OLTP database occupies Gigabytes (GB) of storage space, OLAP database occupies Terabytes (TB) of storage space. More: Automating ETL: complete five projects 17) What is DTM? DTM transform data received from reader buffer and its moves transformation to transformation on row by row basis and it uses transformation caches when necessary. 18) What is meant by spatial data warehouse?
  • 8. A data warehouse that manipulates spatial data, thus allowing spatial analysis. This is to be contrasted with conventional and temporal data warehouses. 19) What is a Batch? Batches provide a way to group sessions for either serial or parallel execution by the Informatica Server. 20) What are the types of batch? There are two types of batches are : Sequential batch : Runs sessions one after the other. Concurrent batch : Runs sessions at the same time.
  • 9. there are many ways to do this. However the easiest way to display the first line of a file is using the [head] command. $> head -1 file. Txt no prize in guessing that if you specify [head -2] then it would print first 2 records of the file. another way can be by using [sed] command. [sed] is a very powerful text editor which can be used for various text manipulation purposes like this. $> sed '2,$ d' file. Txt how does the above command work? The 'd' parameter basically tells [sed] to delete all the records from display from line 2 to last line of the file (last line is represented by $ symbol). Of course it does not actually delete those lines from the file, it just does not display those lines in standard output screen. So you only see the remaining line which is the 1st line. how to print/display the last line of a file? the easiest way is to use the [tail] command. $> tail -1 file. Txt if you want to do it using [sed] command, here is what you should write: $> sed -n '$ p' test from our previous answer, we already know that '$' stands for the last line of the file. So '$ p' basically prints (p for print) the last line in standard output screen. '-n' switch takes [sed] to silent mode so that [sed] does not print anything else in the output. how to display n-th line of a file? the easiest way to do it will be by using [sed] i guess. Based on what we already know about [sed] from our previous examples, we can quickly deduce this command: $> sed –n ' p' file. Txt you need to replace with the actual line number. So if you want to print the 4th line, the command will be $> sed –n '4 p' test How to print/display the first line of a file?
  • 10. of course you can do it by using [head] and [tail] command as well like below: $> head - file. Txt | tail -1 you need to replace with the actual line number. So if you want to print the 4th line, the command will be $> head -4 file. Txt | tail -1 how to remove the first line / header from a file? we already know how [sed] can be used to delete a certain line from the output – by using the'd' switch. So if we want to delete the first line the command should be: $> sed '1 d' file. Txt but the issue with the above command is, it just prints out all the lines except the first line of the file on the standard output. It does not really change the file in-place. So if you want to delete the first line from the file itself, you have two options. either you can redirect the output of the file to some other file and then rename it back to original file like below: $> sed '1 d' file. Txt > new_file. Txt $> mv new_file. Txt file. Txt or, you can use an inbuilt [sed] switch '–i' which changes the file in-place. See below: $> sed –i '1 d' file. Txt how to remove the last line/ trailer from a file in unix script? always remember that [sed] switch '$' refers to the last line. So using this knowledge we can deduce the below command: $> sed –i '$ d' file. Txt how to remove certain lines from a file in unix? if you want to remove line to line from a given file, you can accomplish the task in the similar method shown above. Here is an example: $> sed –i '5,7 d' file. Txt the above command will delete line 5 to line 7 from the file file. Txt how to remove the last n-th line from a file? this is bit tricky. Suppose your file contains 100 lines and you want to remove the last 5 lines. Now if you know how many lines are there in the file, then you
  • 11. can simply use the above shown method and can remove all the lines from 96 to 100 like below: $> sed –i '96,100 d' file. Txt # alternative to command [head -95 file. Txt] but not always you will know the number of lines present in the file (the file may be generated dynamically, etc. ) in that case there are many different ways to solve the problem. There are some ways which are quite complex and fancy. But let's first do it in a way that we can understand easily and remember easily. Here is how it goes: $> tt=`wc -l file. Txt | cut -f1 -d' '`;sed –i "`expr $tt - 4`,$tt d" test as you can see there are two commands. The first one (before the semi-colon) calculates the total number of lines present in the file and stores it in a variable called “tt”. The second command (after the semi-colon), uses the variable and works in the exact way as shows in the previous example. how to check the length of any line in a file? we already know how to print one line from a file which is this: $> sed –n ' p' file. Txt where is to be replaced by the actual line number that you want to print. Now once you know it, it is easy to print out the length of this line by using [wc] command with '-c' switch. $> sed –n '35 p' file. Txt | wc –c the above command will print the length of 35th line in the file. Txt. how to get the nth word of a line in unix? assuming the words in the line are separated by space, we can use the [cut] command. [cut] is a very powerful and useful command and it's real easy. All you have to do to get the n-th word from the line is issue the following command: cut –f -d' ' '-d' switch tells [cut] about what is the delimiter (or separator) in the file, which is space ' ' in this case. If the separator was comma, we could have written -d',' then. So, suppose i want find the 4th word from the below string: “a quick brown fox jumped over the lazy cat”, we will do something like this: $> echo “a quick brown fox jumped over the lazy cat” | cut –f4 –d' ' and it will print “fox” how to reverse a string in unix? pretty easy. Use the [rev] command.
  • 12. $> echo "unix" | rev xinu how to get the last word from a line in unix file? we will make use of two commands that we learnt above to solve this. The commands are [rev] and [cut]. Here we go. let's imagine the line is: “c for cat”. We need “cat”. First we reverse the line. We get “tac rof c”. Then we cut the first word, we get 'tac'. And then we reverse it again. $>echo "c for cat" | rev | cut -f1 -d' ' | rev cat how to get the n-th field from a unix command output? we know we can do it by [cut]. Like below command extracts the first field from the output of [wc –c] command $>wc -c file. Txt | cut -d' ' -f1 109 but i want to introduce one more command to do this here. That is by using [awk] command. [awk] is a very powerful command for text pattern scanning and processing. Here we will see how may we use of [awk] to extract the first field (or first column) from the output of another command. Like above suppose i want to print the first column of the [wc –c] output. Here is how it goes like this: $>wc -c file. Txt | awk ' ''{print $1}' 109 the basic syntax of [awk] is like this: awk 'pattern space''{action space}' the pattern space can be left blank or omitted, like below: $>wc -c file. Txt | awk '{print $1}' 109 in the action space, we have asked [awk] to take the action of printing the first column ($1). More on [awk] later. how to replace the n-th line in a file with a new line in unix? this can be done in two steps. The first step is to remove the n-th line. And the second step is to insert a new line in n-th line position. Here we go. step 1: remove the n-th line
  • 13. $>sed -i'' '10 d' file. Txt # d stands for delete step 2: insert a new line at n-th line position $>sed -i'' '10 i this is the new line' file. Txt # i stands for insert how to show the non-printable characters in a file? open the file in vi editor. Go to vi command mode by pressing [escape] and then [:]. Then type [set list]. This will show you all the non-printable characters, e. G. Ctrl-m characters (^m) etc. , in the file. how to zip a file in linux? use inbuilt [zip] command in linux how to unzip a file in linux? use inbuilt [unzip] command in linux. $> unzip –j file. Zip how to test if a zip file is corrupted in linux? use “-t” switch with the inbuilt [unzip] command $> unzip –t file. Zip how to check if a file is zipped in unix? in order to know the file type of a particular file use the [file] command like below: $> file file. Txt file. Txt: ascii text if you want to know the technical mime type of the file, use “-i” switch. $>file -i file. Txt file. Txt: text/plain; charset=us-ascii if the file is zipped, following will be the result $> file –i file. Zip file. Zip: application/x-zip