Building Your BI System - HadoopCon Taiwan 2015
1. BUILD YOUR BI SYSTEM
PRACTICE IN DATA LAKE ECOSYSTEM
Bryan@Vpon Data
2. ABOUT ME
• Experience
Vpon Data Engineer
TWM, Keywear, Nielsen
• Bryan’s notes for data analysis
http://bryannotes.blogspot.tw
• Spark.TW
• LinkedIn
https://tw.linkedin.com/pub/bryan-yang/7b/763/a79
13. 3 KINDS OF PROBLEMS
https://kavyamuthanna.wordpress.com/category/big-data/
14. BIG DATA BIG PROBLEM
http://www.mn.uio.no/ifi/studier/masteroppgaver/nd/masteroppgave_cloud_bigdata_hpc.html
15. BIG DATA BIG COST
• The cost of data storage
What data do we keep?
For how long?
• The cost of data management
Are the machines and infrastructure easy to maintain?
Data flow (ETL)?
• The time cost of data processing
How long can users wait?
Accessibility of the data
Human costs you cannot see
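The storage questions above ("what do we keep?" and "for how long?") turn into simple arithmetic once you fix a daily volume. The sketch below is illustrative only: the 100 GB/day volume, the compression ratios, and the unit price are hypothetical, and the default of 3 copies assumes HDFS's default replication factor (`dfs.replication`).

```python
# A rough disk-footprint estimate for retained data.
# All inputs are hypothetical; plug in your own volumes and prices.


def stored_tb(daily_gb: float, retention_days: int,
              replication: int = 3, compression_ratio: float = 1.0) -> float:
    """Disk space (TB) needed to keep `retention_days` days of data.

    replication: copies kept on disk (3 is the HDFS default for dfs.replication).
    compression_ratio: raw size / stored size (1.0 means uncompressed).
    """
    if retention_days < 0 or compression_ratio <= 0:
        raise ValueError("retention_days must be >= 0 and compression_ratio > 0")
    stored_gb = daily_gb * retention_days * replication / compression_ratio
    return stored_gb / 1024  # GB -> TB (binary units)


def monthly_storage_cost(tb: float, price_per_tb_month: float) -> float:
    """Monthly cost of keeping `tb` terabytes at a hypothetical unit price."""
    return tb * price_per_tb_month


if __name__ == "__main__":
    # Hypothetical: 100 GB of new logs per day.
    for days in (90, 365):
        for ratio in (1.0, 4.0):
            tb = stored_tb(100, days, compression_ratio=ratio)
            print(f"keep {days:3d} days, compression {ratio}x: {tb:7.1f} TB")
```

Retention and compression multiply directly into the footprint, so shortening how long raw data is kept is often as effective as buying more disks.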
23. OVERVIEW
Business intelligence (BI) is the set of techniques and tools for
the transformation of raw data into meaningful and useful
information for business analysis purposes. —Wikipedia
24. DIFFERENT FEATURES
Component        Price        Performance   Accessibility
Hadoop           Low          Medium        Low
SQL Server       Low-Medium   Depends       Medium
Data Warehouse   High         High          Medium
BI System        High         Depends       High
29. HIVE
• Created at Facebook
• Data warehouse in the Hadoop ecosystem
• HiveQL (SQL-like interface)
• Metastore (saves the schema of the data;
schema on read)
• UDFs (user-defined functions)
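"Schema on read" means raw files are stored untouched and a table definition is applied only when a query reads them. Hive keeps those definitions in its metastore. The plain-Python sketch below imitates the idea; it does not use Hive's API. The log format, the `SCHEMA_V1`/`SCHEMA_V2` definitions, and the `month_of` helper (standing in for a UDF) are all hypothetical.

```python
import csv
from typing import Callable, Iterable, Iterator, Optional

# Raw log lines stored exactly as they arrived; nothing is validated at write time.
RAW_LINES = [
    "2015-09-18,u1,impression",
    "2015-09-18,u2,click",
    "2015-09-19,u1,click",
    "malformed line",
]

Schema = list[tuple[str, Callable[[str], object]]]

# Hypothetical table definitions kept apart from the data (the role a metastore plays).
SCHEMA_V1: Schema = [("dt", str), ("user_id", str), ("event", str)]
SCHEMA_V2: Schema = SCHEMA_V1 + [("country", str)]  # new column, no data rewritten


def read_with_schema(lines: Iterable[str], schema: Schema) -> Iterator[dict]:
    """Parse and type each line only when it is read (schema on read).

    Missing trailing fields become None, loosely like the NULLs Hive returns.
    """
    for fields in csv.reader(lines):
        yield {
            name: cast(fields[i]) if i < len(fields) else None
            for i, (name, cast) in enumerate(schema)
        }


def month_of(dt: Optional[str]) -> Optional[str]:
    """A UDF-style helper: 'YYYY-MM-DD' -> 'YYYY-MM'."""
    return dt[:7] if dt else None


if __name__ == "__main__":
    for row in read_with_schema(RAW_LINES, SCHEMA_V2):
        if row["event"] == "click":
            print(row["user_id"], month_of(row["dt"]), row["country"])
```

Switching from `SCHEMA_V1` to `SCHEMA_V2` changes what queries see without touching the stored lines. That is the flexibility schema on read buys, at the cost of discovering bad rows only at query time.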
33. TERADATA
• Massively parallel processing (MPP)
• Each processor handles different threads
of the program, and each processor
has its own disk
• Teradata SQL is fully certified at the
SQL-92 level
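In a shared-nothing MPP system, each unit works only on the rows stored on its own disk, and the partial results are combined at the end. Teradata, for instance, assigns rows to its processing units by hashing the primary index. The toy simulation below illustrates that pattern in a single process. It uses `zlib.crc32` rather than Teradata's hashing, the four-unit count is hypothetical, and the "parallel" step runs sequentially here.

```python
import zlib
from collections import defaultdict

NUM_UNITS = 4  # hypothetical number of processing units


def unit_for(key: str, num_units: int = NUM_UNITS) -> int:
    """Deterministic hash so the same key always lands on the same unit."""
    return zlib.crc32(key.encode("utf-8")) % num_units


def distribute(rows: list[dict], key_field: str, num_units: int = NUM_UNITS) -> list[list[dict]]:
    """Place each row on exactly one unit (its 'own disk')."""
    partitions: list[list[dict]] = [[] for _ in range(num_units)]
    for row in rows:
        partitions[unit_for(row[key_field], num_units)].append(row)
    return partitions


def local_count(partition: list[dict], group_field: str) -> dict:
    """Each unit aggregates only its own rows; nothing is shared."""
    counts: dict = defaultdict(int)
    for row in partition:
        counts[row[group_field]] += 1
    return counts


def merge(partials: list[dict]) -> dict:
    """Combine the per-unit partial results into the final answer."""
    total: dict = defaultdict(int)
    for part in partials:
        for key, value in part.items():
            total[key] += value
    return dict(total)


def mpp_count(rows: list[dict], key_field: str, group_field: str,
              num_units: int = NUM_UNITS) -> dict:
    partitions = distribute(rows, key_field, num_units)
    # In a real MPP engine these local steps run in parallel on separate nodes.
    partials = [local_count(p, group_field) for p in partitions]
    return merge(partials)
```

Because every unit touches only its slice of the data, adding units divides the scan work; the price is that a skewed key (many rows hashing to one unit) leaves that unit as the bottleneck.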
55. HOW TO CHOOSE THE
COMPONENTS OF YOUR BI
FRAMEWORK?
• The cost of data storage
• The cost of data management
• The time cost of data processing
56. CONSIDERATIONS AND
SUGGESTIONS
• Time is money
• Trade HDD space (money) for time
• Understand the components and their
relationships
• Balance needs against costs
• A good framework helps the business grow
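One common way to trade HDD space for time is to materialize a summary table once and answer repeated queries from it instead of rescanning raw logs. The sketch below shows the idea with in-memory data; the event fields (`dt`, `event`) and the synthetic volumes are hypothetical.

```python
from collections import Counter


def build_daily_summary(events: list[dict]) -> Counter:
    """Materialize (dt, event) -> count once; this table costs extra disk space."""
    return Counter((e["dt"], e["event"]) for e in events)


def count_from_raw(events: list[dict], dt: str, event: str) -> int:
    """Answer by scanning every raw row, on every query."""
    return sum(1 for e in events if e["dt"] == dt and e["event"] == event)


def count_from_summary(summary: Counter, dt: str, event: str) -> int:
    """Answer with a single lookup in the small summary table."""
    return summary.get((dt, event), 0)


if __name__ == "__main__":
    # Synthetic data: 3 days, 3000 events, every 10th event is a click.
    events = [
        {"dt": f"2015-09-{18 + i % 3}", "event": "click" if i % 10 == 0 else "impression"}
        for i in range(3000)
    ]
    summary = build_daily_summary(events)
    print(len(events), "raw rows vs", len(summary), "summary rows")
    print(count_from_raw(events, "2015-09-18", "click"),
          count_from_summary(summary, "2015-09-18", "click"))
```

The summary only answers the questions it was built for, so it pays off for queries that repeat often, while genuinely new ad hoc questions still need the raw data.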
Big data brings problems in three ways:
Variety: many data types, data sources, and databases
Volume: log data, transaction data, crawler data
Velocity: real time, near real time, batch
Vpon is a big data advertising company. We receive and produce a large amount of data every day.
Business needs determine how long users can wait for processing.
We receive many ad hoc queries every day.
They come from every department: business development, sales, account services,
R&D, and so on.
For example: how many users a day, how many requests a day, click rate, etc.
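The typical ad hoc questions above (users a day, requests a day, click rate) reduce to a single pass of per-day aggregation. The sketch below assumes a hypothetical event schema `{"dt", "user_id", "event"}`, counts every logged event as one request, and defines click rate as clicks divided by impressions; your own definitions may differ.

```python
from collections import defaultdict


def daily_metrics(events: list[dict]) -> dict:
    """Per-day distinct users, total requests, and click rate.

    Hypothetical schema: {"dt": str, "user_id": str, "event": "impression" | "click"}.
    Every event counts as one request; click_rate = clicks / impressions.
    """
    users: dict = defaultdict(set)
    requests: dict = defaultdict(int)
    impressions: dict = defaultdict(int)
    clicks: dict = defaultdict(int)

    for e in events:
        dt = e["dt"]
        users[dt].add(e["user_id"])  # a set keeps each user once per day
        requests[dt] += 1
        if e["event"] == "impression":
            impressions[dt] += 1
        elif e["event"] == "click":
            clicks[dt] += 1

    result = {}
    for dt in sorted(requests):
        imp = impressions[dt]
        result[dt] = {
            "users": len(users[dt]),
            "requests": requests[dt],
            "click_rate": clicks[dt] / imp if imp else 0.0,  # avoid dividing by zero
        }
    return result
```

Distinct-user counts need the full set of IDs per day, which is why they get expensive at scale; large systems often switch to approximate counters for that column.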