Hadoop Summit Japan 2011 Fall - LT by IBM
1. Data Discovery Tool
BigSheets
MapReduce with No Coding?
Atsushi Tsuchiya (eAtsuhsi@JP.ibm.com)
Big Data Tiger Team
IBM Software
2. Looking at Data
• What would you do with big data?
• How would you make use of it?
• It is difficult, because the goal is too vague:
• No specific problem that needs to be solved.
• No specific question that needs to be answered.
• All you know is that you want to improve the business.
• But you do have *data*.
• So, what would you do first?
Looking at Data!
3. IBM with Hadoop
• IBM has been working with the open source community for a long time.
– Eclipse, Hadoop, and so on…
• BigInsights includes Hadoop.
4. BigInsights
• BigInsights is IBM's Hadoop product for big data analytics.
– Basic Edition (up to 10 TB): free of charge!
– Enterprise Edition
• Next version of BigInsights: coming soon
– v1.2 available
• And many more
5. BigInsights Components
• BigInsights includes:
– IBM Java
– JAQL: a language developed by IBM (open source)
– IBM Distribution of Hadoop
– BigSheets: data discovery tool
– FLEX scheduler for Adaptive MapReduce
– Orchestrator (workflow engine)
– SystemT (text analytics), SystemML (machine learning)
– LDAP
– Web Console / Developer Studio
6. BigInsights – Basic Edition
Function                                                          Version Basic   Enterprise
Integrated install                                                        Inc     Inc
Open source components:
  Hadoop (including common utilities, HDFS, MapReduce framework)  0.20.2  Inc     Inc
  Jaql (programming / query language)                             0.5.2   Inc     Inc
  Pig (programming / query language)                              0.7     Inc     Inc
  Flume (data collection/aggregation)                             0.9.1   Inc     Inc
  Hive (data summarization/querying)                              0.5     Inc     Inc
  Lucene (text search)                                            3.0.2   Inc     Inc
  ZooKeeper (process coordination)                                3.2.2   Inc     Inc
  Avro (data serialization)                                       1.3.0   Inc     Inc
  HBase (real-time read/write)                                    0.20.6  Inc     Inc
  Oozie (workflow/job orchestration)                              2.2.2   Inc     Inc
Online documentation                                                      Inc     Inc
Capability to integrate with DB2, InfoSphere Warehouse                    Inc     Inc
  (two DB2 UDFs to submit jobs and read results from BigInsights)
Note: versions will be updated in the Nov. release.
7. BigInsights – Enterprise Edition
Function                                Basic   Enterprise
R Connector                             n/a     Inc
  Jaql module to invoke R statistical capabilities from BigInsights
Netezza Connector                       n/a     Inc
  Jaql modules to read/write data from/to Netezza
LDAP                                    n/a     Inc
Web Console                             n/a     Inc
Workflow Engine                         n/a     Inc
Scheduler (Orchestrator)                n/a     Inc
Text Analytics Module (System T)        n/a     Inc
Eclipse support (for System T)*         n/a     Inc
BigSheets – Data Discovery Tool         n/a     Inc
IBM Optim Development Studio V2.2.1.0   n/a     Inc
Support by IBM                          n/a     Inc
13. [Diagram: Internet, Intranet, Logs, and Other data sources connected to BigInsights and BigSheets; step shown: Gather]
• BigInsights can gather data using:
– Predefined formats:
• BigSheets data reader
• Basic crawler data reader
• Basic crawler data reader (binary support)
• Character-delimited data reader
• Tab Separated Value (TSV) data reader
• JavaScript Object Notation (JSON) array reader
• Comma Separated Value (CSV) data reader
– Custom BigSheets readers
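To make the idea of per-format readers concrete, here is a minimal Python sketch of how several predefined formats (CSV, TSV, other character-delimited text, and JSON arrays) can be normalized into the same list of rows that a spreadsheet-style tool works on. It is an illustration under stated assumptions, not the BigSheets reader API: `read_delimited`, `read_json_array`, `read_sheet`, and the `READERS` table are hypothetical names, and the first line of delimited input is assumed to be a header row.

```python
import csv
import io
import json


def read_delimited(text, delimiter=","):
    """Parse character-delimited text into a list of row dicts.

    Assumes (for this sketch) that the first line is a header row naming the columns.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


def read_json_array(text):
    """Parse a top-level JSON array of objects into a list of row dicts."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array")
    return [dict(item) for item in data]


# Hypothetical registry: one reader per predefined format name.
READERS = {
    "csv": lambda text: read_delimited(text, ","),
    "tsv": lambda text: read_delimited(text, "\t"),
    "json": read_json_array,
}


def read_sheet(text, fmt):
    """Look up the reader registered for `fmt` and return uniform rows."""
    if fmt not in READERS:
        raise ValueError(f"no reader registered for format {fmt!r}")
    return READERS[fmt](text)


if __name__ == "__main__":
    csv_rows = read_sheet("name,state\nStore A,CA\nStore B,NY\n", "csv")
    tsv_rows = read_sheet("name\tstate\nStore A\tCA\nStore B\tNY\n", "tsv")
    json_rows = read_sheet(
        '[{"name": "Store A", "state": "CA"}, {"name": "Store B", "state": "NY"}]',
        "json",
    )
    # All three input formats yield the same rows.
    assert csv_rows == tsv_rows == json_rows
    print(csv_rows)
```

Any other character-delimited format, such as pipe-separated text, only needs a `read_delimited` call with a different `delimiter`; a custom reader would simply be one more entry in the registry.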
14. [Diagram: Internet, Intranet, Logs, and Other data sources connected to BigInsights and BigSheets; step shown: Gather]
• BigInsights can import structured and unstructured data:
– CSV
– Files
– Network
• http
• hdfs
• AWS (S3n/S3)
– Other
• Custom importer
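The import sources listed above differ mainly in how the data is addressed. A minimal Python sketch, assuming each source is identified by a URI and that the scheme alone decides which importer handles it, could classify inputs like this. The `IMPORTERS` table and `pick_importer` function are hypothetical and do not correspond to a BigInsights API; actually fetching from HDFS or S3 would need a client library that is not shown here.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to the import source categories on the slide.
IMPORTERS = {
    "": "files",  # bare local path such as /data/stores.csv
    "file": "files",
    "http": "network/http",
    "https": "network/http",
    "hdfs": "network/hdfs",
    "s3": "network/aws",
    "s3n": "network/aws",
}


def pick_importer(uri):
    """Return the importer category for `uri`, based only on its scheme."""
    scheme = urlparse(uri).scheme  # urlparse returns the scheme in lowercase
    if scheme not in IMPORTERS:
        # Anything else would fall under "Other" and need a custom importer.
        raise ValueError(f"no importer for scheme {scheme!r}")
    return IMPORTERS[scheme]


if __name__ == "__main__":
    for uri in [
        "/data/stores.csv",
        "http://example.com/stores.csv",
        "hdfs://namenode:9000/user/demo/stores.csv",
        "s3n://example-bucket/stores.csv",
    ]:
        print(uri, "->", pick_importer(uri))
```

Keeping the dispatch in a table means a custom importer is registered by adding one entry rather than by changing the lookup logic.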
15. [Diagram: Internet, Intranet, Logs, and Other data sources connected to BigInsights and BigSheets; step shown: Collection]
A complete list of McDonald's in North America.
16. [Diagram: Internet, Intranet, Logs, and Other data sources connected to BigInsights and BigSheets; steps shown: Import, Reformat, Calculate]
A complete list of McDonald's in North America.
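The Import, Reformat, and Calculate steps correspond roughly to a job that would otherwise be written by hand as MapReduce: a map step reformats each raw record and emits a key, the framework groups values by key, and a reduce step aggregates each group. The pure-Python sketch below shows that correspondence for counting stores per state. The record layout (`name, city, state`), the sample rows, and all function names are hypothetical; the rows are made up for illustration and are not the data set used in the talk.

```python
from collections import defaultdict

# Made-up sample rows for illustration only (assumed layout: name, city, state).
RAW_LINES = [
    "Store 1, Springfield, IL",
    "Store 2, Chicago, IL",
    "Store 3, Albany, NY",
    "Store 4, Fresno, CA",
]


def map_phase(line):
    """Reformat: split one raw line and emit a (state, 1) pair."""
    fields = [field.strip() for field in line.split(",")]
    state = fields[-1]
    yield state, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Calculate: sum the counts for one key."""
    return key, sum(values)


def count_by_state(lines):
    """Run map, shuffle, and reduce in sequence and return counts sorted by key."""
    pairs = (pair for line in lines for pair in map_phase(line))
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    print(count_by_state(RAW_LINES))  # {'CA': 1, 'IL': 2, 'NY': 1}
```

On a Hadoop cluster the map and reduce functions would run in parallel over HDFS blocks; the point of a spreadsheet-style tool like BigSheets is that the user describes the reformat and calculate steps without writing these functions.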
17. [Diagram: Internet, Intranet, Logs, and Other data sources connected to BigInsights and BigSheets; visualizations shown: Column chart, Heat map]
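A heat map over store locations needs one number per map region. One common way to get it, assuming each record carries latitude and longitude (an assumption: the columns of the original list are not shown here), is to bin points into a fixed grid and count the points in each cell. The sketch below uses a hypothetical `heat_map_cells` function with a cell size in degrees; a column chart is the one-dimensional analogue, with one bar per category count.

```python
import math
from collections import Counter


def heat_map_cells(points, cell_deg=1.0):
    """Count (lat, lon) points per square grid cell of `cell_deg` degrees.

    Returns a Counter keyed by (row, col) cell indices; the count in each
    cell is the value a heat map would color.
    """
    if cell_deg <= 0:
        raise ValueError("cell_deg must be positive")
    counts = Counter()
    for lat, lon in points:
        # floor() (not int()) keeps negative coordinates, such as western
        # longitudes, from being merged into the cell on the other side of zero.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    sample = [(40.7, -73.9), (40.6, -73.8), (34.05, -118.25)]
    for cell, count in sorted(heat_map_cells(sample).items()):
        print(cell, count)
```

A smaller `cell_deg` gives a finer map with fewer points per cell; the right choice depends on how dense the data is.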