Big data, Hadoop, Flume, Spark, Cloudera, Oracle Big Data Appliance, Apache, Oracle Loader for Hadoop, big data copy, Exadata to Big Data Appliance. Bilginc IT Academy.
Big Data: Big Picture
1. ZEKERIYA BEŞIROĞLU
BILGINC IT ACADEMY
ORACLE CLOUD DAY
19-11-2015
TROUG-TURKISH ORACLE USER GROUP
BIG DATA : BIG PICTURE
2. ZEKERIYA BEŞIROĞLU
▸ +18 IT
▸ +15 ORACLE DB&DWH
▸ +3 BIG DATA
▸ Leader of TROUG
▸ Instructor&Consultant
▸ http://zekeriyabesiroglu.com
▸ @zbesiroglu
TROUG BIG DATA
BIG PICTURE
TROUG @ZBESIROGLU BILGINC IT ACADEMY BIG DATA BIG
PICTURE
5. BIG DATA
Social networks
Banking and financial services
E-commerce services
Web-centric services
Internet search indexes
Scientific and document searches
Medical records
Web logs
7. COMPANIES MUST
ANALYZE THEIR
CUSTOMERS' DNA.
Zekeriya Beşiroğlu
TROUG
8. TROUG
WHAT IS THE GOAL IN BIG DATA? HOW SHOULD IT
BE DONE?
▸How can I add value to the business by using big data
technologies? Can I reduce some costs?
▸How do I integrate big data with a traditional database?
Combining structured, semi-structured, and unstructured
data
▸Reaching results with analytics tools: Oracle Advanced
Analytics, BI, and DW technologies
9. TROUG
DATA
▸ Today we do schema on write
▸ Let's do schema on read instead
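The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample records, the `SCHEMA` dictionary, and the `read_with_schema` helper are hypothetical and do not come from the slides. Raw records are kept exactly as they arrive, and a schema is applied only when the data is read.

```python
import json

# Raw records are stored exactly as they arrive; nothing is validated on write.
# (Hypothetical sample data, for illustration only.)
RAW_LINES = [
    '{"user": "a", "amount": "12.5", "city": "Istanbul"}',
    '{"user": "b", "amount": "7"}',
    '{"user": "c", "amount": "3.25", "device": "mobile"}',
]

# Hypothetical schema chosen by the reader: field name -> (type, default value).
SCHEMA = {"user": (str, ""), "amount": (float, 0.0), "city": (str, "unknown")}


def read_with_schema(lines, schema):
    """Parse raw JSON lines and project each record onto the schema at read time."""
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, (cast, default) in schema.items():
            # Missing fields get a default; extra fields are simply ignored.
            row[field] = cast(record[field]) if field in record else default
        rows.append(row)
    return rows


rows = read_with_schema(RAW_LINES, SCHEMA)
total = sum(row["amount"] for row in rows)
```

A different reader could apply a different schema to the same raw lines, for example one that keeps the `device` field, without rewriting the stored data.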
10. TROUG
BIG DATA PROJECT PHASES
▸Data Acquisition and Storage
▸Data Access and Processing
▸Data Unification and Analysis
11. DATA ACQUISITION AND STORAGE
HADOOP DISTRIBUTED FILE SYSTEM-HDFS
▸petabyte-scale distributed file system
▸linearly scalable on commodity hardware
▸Schema on Read
▸Cheaper
▸low security
▸Write once, read many
12. DATA ACQUISITION AND STORAGE
HADOOP DISTRIBUTED FILE SYSTEM-HDFS
▸Basic file system operations
▸I can load a JSON log file into HDFS (hadoop fs -put)
13. DATA ACQUISITION AND STORAGE
WHAT IS FLUME?
▸Avro Source
▸Memory Channel
▸HDFS Sink
14. DATA ACQUISITION AND STORAGE
ORACLE NOSQL DATABASE
▸Key Value Database
▸Access via Java API
▸Stores unstructured or semi structured data as byte arrays
▸Highly reliable
▸Scalable throughput and predictable latency
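To make the key-value model concrete, here is a toy sketch in Python. It is not the Oracle NoSQL Database API (which the slide says is accessed from Java); the `TinyKeyValueStore` class, the key format, and the sample record are all hypothetical. It only illustrates the idea that the store sees values as opaque byte arrays and the application decides how to serialize them.

```python
import json


class TinyKeyValueStore:
    """Toy in-memory key-value store; values are opaque byte arrays."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The store does not interpret the value; it only accepts raw bytes.
        if not isinstance(value, bytes):
            raise TypeError("value must be bytes")
        self._data[key] = value

    def get(self, key):
        # Returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key):
        # Returns True if the key existed and was removed.
        return self._data.pop(key, None) is not None


store = TinyKeyValueStore()

# The application chooses the serialization (JSON here) for semi-structured data.
profile = {"name": "example user", "tags": ["big data", "hadoop"]}
store.put("user/1000", json.dumps(profile).encode("utf-8"))

restored = json.loads(store.get("user/1000").decode("utf-8"))
```

Because the store never inspects the bytes, records with different fields can live side by side under different keys.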
15. DATA ACQUISITION AND STORAGE
RDBMS & NOSQL
16. DATA ACQUISITION AND STORAGE
HDFS & NOSQL
17. DATA ACQUISITION AND STORAGE
APPLICATION DATABASE TECHNOLOGY
▸High Volume with Low value
▸Dynamic application schema
▸If the answer is yes: NoSQL
18. DATA ACQUISITION AND STORAGE
NOSQL EXAMPLE
19. DATA ACCESS AND PROCESSING
MAP REDUCE
▸Write applications that process vast amounts of data in
parallel on large clusters of commodity hardware in a reliable,
fault-tolerant manner.
▸Storing data in HDFS is low cost, fault tolerant, and scalable.
▸Integrates with HDFS to provide parallel data processing
▸Batch-oriented
20. DATA ACCESS AND PROCESSING
MAPREDUCE EXAMPLE
map(String input_key, String input_value)
    foreach word w in input_value:
        emit(w, 1)
reduce(String output_key, Iterator<int> intermediate_vals)
    set count = 0
    foreach v in intermediate_vals:
        count += v
    emit(output_key, count)
(1000,’Galatasaray sampiyon olur’)
(2000,’beşiktas sampiyon olur’)
(2200,’Galatasaray Türkiyedir’)
(3000,’fenerbahce sampiyon olur’)
21. DATA ACCESS AND PROCESSING
MAPREDUCE EXAMPLE
Mapper output
(‘Galatasaray’, 1), (‘sampiyon’, 1), (‘olur’, 1), (‘beşiktas’, 1),
(‘sampiyon’, 1), (‘olur’, 1), (‘Galatasaray’, 1), (‘Türkiyedir’, 1), (‘fenerbahce’, 1),
(‘sampiyon’, 1), (‘olur’, 1)
Intermediate data sent to the reducer
(‘Galatasaray’,[1,1])
(‘sampiyon’,[1,1,1])
(‘olur’,[1,1,1])
(‘beşiktas’,[1])
(‘fenerbahce’,[1])
(‘Türkiyedir’,[1])
Final output of the reducer
(‘sampiyon’,3)
(‘olur’,3)
(‘Galatasaray’,2)
(‘fenerbahce’,1)
(‘beşiktas’,1)
(‘Türkiyedir’,1)
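The word-count walk-through above can be reproduced on a single machine with a short Python sketch. It follows the same three stages (map, group by key, reduce) using the four input records from the example. The names `map_fn`, `shuffle`, and `reduce_fn` are chosen here for illustration; this is plain Python, not the Hadoop MapReduce API, and it runs sequentially rather than across a cluster.

```python
from collections import defaultdict

# The four (key, text) input records from the example.
RECORDS = [
    (1000, "Galatasaray sampiyon olur"),
    (2000, "beşiktas sampiyon olur"),
    (2200, "Galatasaray Türkiyedir"),
    (3000, "fenerbahce sampiyon olur"),
]


def map_fn(input_key, input_value):
    """Emit (word, 1) for every word in the record; the input key is unused."""
    for word in input_value.split():
        yield word, 1


def shuffle(pairs):
    """Group the emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(output_key, intermediate_vals):
    """Sum the list of 1s collected for a single word."""
    return output_key, sum(intermediate_vals)


mapped = [pair for key, value in RECORDS for pair in map_fn(key, value)]
grouped = shuffle(mapped)
counts = dict(reduce_fn(word, ones) for word, ones in grouped.items())
```

Here `mapped` corresponds to the mapper output, `grouped` to the intermediate data sent to the reducer, and `counts` to the final reducer output.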
22. DATA ACCESS AND PROCESSING
HIVE
▸Query data in HDFS with HiveQL (a SQL-like language)
▸Hive transforms HiveQL queries into standard MapReduce
jobs
▸Schema on Read via InputFormat and SerDe
▸Not ideal for ad hoc queries (slow)
▸Immature optimizer
23. DATA ACCESS AND PROCESSING
HIVE
▸Log Processing
▸Text mining
▸Document Indexing
▸Business Analytics
▸Predictive Modeling
▸Not ideal for ad hoc queries
24. DATA ACCESS AND PROCESSING
PIG
▸Open source data flow system
▸Simple language for queries and data manipulation, which is
compiled into MapReduce jobs that run on Hadoop
▸Provides common operations like join, group, sort
▸Works on files in HDFS
▸Ad hoc queries across large data sets
▸Log analysis
25. DATA ACCESS AND PROCESSING
CLOUDERA IMPALA
▸Database-like SQL layer on top of Hadoop
▸Distributed, massively parallel processing database engine
▸SQL is the primary development language
▸Open source; Impala processes data in the Hadoop cluster
without using MapReduce
▸Interactive analysis on data stored in HDFS and HBase
26. DATA ACCESS AND PROCESSING
ORACLE XQUERY FOR HADOOP
▸A transformation engine for semi-structured data that is stored in
Apache Hadoop
▸Translates XQuery transformations into a series of
MapReduce jobs
▸Loads data efficiently into Oracle Database by using Oracle
Loader for Hadoop
▸Provides read and write support for Oracle NoSQL Database
27. DATA ACCESS AND PROCESSING
ORACLE XQUERY FOR HADOOP
28. DATA ACCESS AND PROCESSING
APACHE SPARK
▸Open Source parallel data processing
▸Develop Fast
▸Online Streaming
▸Interactive analytics
▸Machine Learning
▸Speed
29. DATA ACCESS AND PROCESSING
APACHE SPARK EXAMPLE
30. DATA UNIFICATION AND ANALYSIS
APACHE SQOOP
▸Batch Loading
▸Transfer bulk data between structured data stores and
Apache Hadoop
▸Data import and Export between external data stores and
Hadoop
▸Parallelizes data transfer for fast performance
31. DATA UNIFICATION AND ANALYSIS
ORACLE LOADER FOR HADOOP
▸Batch Loading
▸High performance loader for fast movement of data from
Hadoop into a table in Oracle Database
▸Loading using online and offline modes
▸Offloads expensive data processing from the database
server to Hadoop
32. DATA UNIFICATION AND ANALYSIS
COPY TO BDA
▸Batch Loading
33. DATA UNIFICATION AND ANALYSIS
ORACLE SQL CONNECTOR FOR
HADOOP
▸ Generate external table in database
pointing to HDFS data
▸ Load into database or query data in
place on HDFS
▸ Fine-grained control over type
mapping
▸ Parallel load with automatic load
balancing
34. DATA UNIFICATION AND ANALYSIS
ORACLE TECHNOLOGIES
35. DATA UNIFICATION AND ANALYSIS
ORACLE ADVANCED ANALYTICS
▸OAA = Oracle Data Mining + Oracle R Enterprise
▸Performance
▸Predictive Analytics
▸Easy
36. ORACLE BDA BENEFITS
▸ Ships with a leading Hadoop
distribution (Cloudera)
▸ HDFS, HBase, Hive, Flume, Kafka, Spark …
▸ Cloudera Manager
▸ Ships with great connectivity to Oracle
Database
▸ Big Data SQL
▸ Big Data Connectors & ODI