5. Apache Tajo
• Open-source big data warehouse (also called SQL-on-Hadoop) system
• Apache Top-level project since March 2014
• Supports SQL standards
• Low latency, and long-running batch queries
• 0.9.0 released in Oct 2014.
6. Hadoop eco-system Integration
• De-facto standard file format support
  – Parquet, RCFile, SequenceFile, and Text files
• HCatalog support
  – Enables Tajo to access existing tables used in Hive and others
• YARN support
  – Tajo can be run on a YARN cluster by using Apache Slider.
7. Overall Architecture
[Architecture diagram]
• Client (JDBC, TSql, Web UI): submits a query to the TajoMaster.
• Master Server (HA): the TajoMaster allocates the query, sends tasks to the slave servers, and monitors them; it manages metadata through the CatalogStore, backed by a DBMS or HCatalog.
• Slave Servers: each runs a TajoWorker composed of a QueryMaster, a Local Query Engine, and a StorageManager over the local file system and HDFS.
12. Tajo Cluster Mode
• Local mode
  – A local mode Tajo instance can start up with a very simple configuration.
• Fully distributed mode
  – A fully distributed mode enables a Tajo instance to run on HDFS. In this mode, a number of Tajo workers run across the physical nodes where the HDFS data nodes run.
13. Setting up a Local mode
• Local mode runs without a Hadoop cluster.
29. Directory configuration
<property>
  <name>tajo.rootdir</name>
  <value>file:///tajo/meetup/warehouse</value>
  <description>Base directory including system directories.</description>
</property>
<property>
  <name>tajo.worker.tmpdir.locations</name>
  <value>/tmp/tajo-${user.name}/tmpdir</value>
  <description>A base for other temporary directories.</description>
</property>
30. Setting up a Fully distributed mode
• conf/tajo-site.xml
  – Master and …
up
a
Fully
distributed
mode
• Make
base
directories
and
set
permissions
• Distribute
a
Tajo
home
to
workers
• Launch
a
Tajo
cluster
$
$TAJO_HOME/bin/start-‐tajo.sh
$
$HADOOP_HOME/bin/hadoop
fs
-‐mkdir
/tajo
$
$HADOOP_HOME/bin/hadoop
fs
-‐chmod
g+w
/tajo
46. First query execution
• Starting the Tajo Shell (tsql)
$ ${TAJO_HOME}/bin/tsql
• tsql usage
usage: tsql [options] [database]
 -B,--background        execute as background process
 -c,--command <arg>     execute only single command, then exit
 -conf,--conf <arg>     configuration value
 -f,--file <arg>        execute commands from file, then exit
 -h,--host <arg>        Tajo server host
 -help,--help           help
 -p,--port <arg>        Tajo server port
 -param,--param <arg>   parameter value in SQL file
47. First query execution
• Create tables in Tajo
  – Managed Table
  – External Table
• In the case of HDFS …
49. … change it.
default> create table table1 (id int, name text, score float, type text) using text;
default> create external table table1 (id int, name text, score float, type text)
         using text with ('text.delimiter'='|') location 'file:/tajo/meetup/table1';
50. First query execution
• Selecting data
default> select * from table1 where id > 2;
Progress: 100%, response time: 0.492 sec
id,  name,  score,  type
-------------------------------
3,  ghi,  3.4,  c
4,  jkl,  4.5,  d
5,  mno,  5.6,  e
(3 rows, 0.492 sec, 36 B selected)
default> \q
51. Maximum number of parallel running tasks
• Worker Heap Memory Size
  – The Tajo Worker …
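The truncated note above ties the number of concurrently running tasks to the worker's heap. As an illustrative sketch of where that knob lives (the variable name follows conf/tajo-env.sh; the 5000 MB value is an arbitrary example, not from this deck):

```shell
# conf/tajo-env.sh (sketch): give each TajoWorker a ~5 GB heap.
# A larger heap generally lets a worker run more tasks in parallel.
export TAJO_WORKER_HEAPSIZE=5000
```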
162. … uses it.
• create table text_table (id int, name text) using text
• create table json_table (id int, name text) using json
• Physical Properties
  – per file format …
173. … separated by a 1-byte character. Default: '|'
• CSV: 'text.delimiter'=','
• TSV: 'text.delimiter'='\t'
• Hive default: 'text.delimiter'='\u0001'
default> create external table table1 (id int, name text, score float, type text)
         using text with ('text.delimiter'='|') location '/tajo/meetup/table1';
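Independent of Tajo, a quick plain-shell sanity check shows how a 1-byte delimiter like '|' splits a row of the data file (the file name and row values here are invented for illustration):

```shell
# Build a sample pipe-delimited row like the ones table1 expects.
printf '3|ghi|3.4|c\n' > sample_row.txt

# Split on the 1-byte delimiter and print the second field.
awk -F'|' '{print $2}' sample_row.txt   # ghi
```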
179. … use an empty string.
• Hive default: 'text.null'='\N'
default> create external table table1 (id int, name text, score float, type text)
         using text with ('text.delimiter'='|', 'text.null'='\\N')
         location 'file:/tajo/meetup/table1';
194. Text Files
• Table Compression
  – Managed Table
default> create table table2 (id int, name text, score float, type text)
         using text with ('text.delimiter'='|',
         'compression.codec'='org.apache.hadoop.io.compress.DeflateCodec')
         as select * from table1;
  – External Table
$ gzip /tajo/meetup/table3/data.csv
$ bin/tsql
default> create external table table3 (id int, name text, score float, type text)
         using text with ('text.delimiter'='|',
         'compression.codec'='org.apache.hadoop.io.compress.GzipCodec')
         location 'file:/tajo/meetup/table3';
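The external-table example above assumes the data file was gzip-compressed in place before the table was defined. A minimal sketch of that preparation step (the /tmp path and the row contents are invented for illustration):

```shell
# Create a sample pipe-delimited data file in a scratch directory.
mkdir -p /tmp/tajo-demo/table3
printf '1|abc|1.2|a\n2|def|2.3|b\n' > /tmp/tajo-demo/table3/data.csv

# Compress it in place; the GzipCodec named in the table definition
# must match the codec actually used on the file.
gzip -f /tmp/tajo-demo/table3/data.csv

ls /tmp/tajo-demo/table3   # data.csv.gz
```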