Big Data & Analytics
Keshav Tripathy, Bharti Consulting Inc.
Outline
• Big Data
• Gartner Hype Cycle 2012
• Large scale data processing
• Visual Analytics
• Chances and Challenges
• Discussions
Big Data V3
• Volume: Gigabyte (10^9), Terabyte (10^12), Petabyte (10^15), Exabyte (10^18), Zettabyte (10^21)
• Variety: Structured, semi-structured, unstructured; text, image, audio, video, record
• Velocity (dynamic, sometimes time-varying)
Big Data refers to datasets that grow so large that they are difficult to capture, store, manage, share, analyze and visualize with typical database software tools.
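The volume scale on this slide can be made concrete with a small formatting helper. This is a minimal sketch, assuming decimal (SI) prefixes as listed on the slide; the function name `human_readable` is hypothetical and not part of the original deck.

```python
# Decimal (SI) byte units, matching the powers of ten listed on the slide.
UNITS = [
    ("ZB", 10**21),
    ("EB", 10**18),
    ("PB", 10**15),
    ("TB", 10**12),
    ("GB", 10**9),
]


def human_readable(num_bytes: int) -> str:
    """Format a byte count with the largest unit that fits (hypothetical helper)."""
    for name, size in UNITS:
        if num_bytes >= size:
            return f"{num_bytes / size:.1f} {name}"
    return f"{num_bytes} B"


print(human_readable(7 * 10**12))    # a 7 TB volume
print(human_readable(27 * 10**20))   # a 2.7 ZB volume
```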
Numbers
• How much data is there in the world?
• 800 Terabytes, 2000
• 160 Exabytes, 2006
• 500 Exabytes(Internet), 2009
• 2.7 Zettabytes, 2012
• 35 Zettabytes by 2020
• How much data is generated in ONE day?
• 7 TB, Twitter
• 10 TB, Facebook
Big data: The next frontier for innovation, competition, and productivity
McKinsey Global Institute 2011
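The worldwide figures above imply a compound growth rate, which can be worked out directly. A rough sketch, assuming the 2012 and 2020 numbers on the slide are comparable measurements; the function name `annual_growth_rate` is an illustrative choice.

```python
def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by two data points."""
    return (end / start) ** (1 / years) - 1


# Figures from the slide, in zettabytes: 2.7 ZB (2012) and a projected 35 ZB (2020).
rate = annual_growth_rate(2.7, 35.0, 2020 - 2012)
print(f"Implied growth: {rate:.1%} per year")
```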
Why Is Big Data Important?
Gartner Hype Cycle 2012
Large Scale Visual Analytics
• Definition: Visual analytics is the science of analytical reasoning facilitated by
interactive visual interfaces.
• People use visual analytics tools and techniques to
• Synthesize information and derive insight from massive, dynamic,
ambiguous, and often conflicting data
• Detect the expected and discover the unexpected
• Provide timely, defensible, and understandable assessments
• Communicate assessment effectively for action.
Inforviz Reference Model to Visual Analytics
Applications
• Terrorism and Responses
• Multimedia Visual Analytics
• Situation Surveillance and Awareness in Investigative Analysis
• Disease visual analytics for Disease outbreak Prediction
• Financial Visual Analytics
• Cybersecurity Visual Analytics
• Visual Analytics for Investigative Analysis on Text Documents
Techniques and Technologies
• A wide variety of techniques and technologies has been developed and adapted for
• Data aggregation
• Data manipulation
• Data analysis
• Data visualization
• These techniques and technologies draw from several fields including
• Statistics
• Computer science
• Applied mathematics
• Economics.
Techniques and Applications
• Statistics: A/B testing (split testing/bucket testing), spatial analysis, predictive modeling (regression)
• Machine Learning
• Unsupervised learning: cluster analysis
• Supervised learning: classification, support vector machines (SVM), ensemble learning
• Association rule learning
• Data Mining and Pattern Recognition: neural network, classification, clustering
• Natural language processing (NLP): Sentiment analysis
• Dimension Reduction: PCA, MDS, SVD
• Data fusion and data integration: Visual Word
• Time series analysis: Combination of statistics and signal processing
• Simulation: Monte Carlo simulations, MRF
• Optimization: Genetic algorithms
• Visualization: Scientific Viz, Inforviz, Visual Analytics
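As one concrete instance of the statistical techniques listed, an A/B test is often evaluated with a two-proportion z-test. The sketch below uses only the standard library; the conversion counts are made up for illustration, and the deck does not prescribe this particular test.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z statistic, two-sided p-value) for conversion rates of variants A and B."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal CDF, written with erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Made-up counts for illustration only: 200/5000 vs 260/5000 conversions.
z, p = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between the two variants is unlikely to be due to chance alone.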
Technologies
• Database and Data warehouse
• Google File System and MapReduce: Big Table
• Hadoop: HBase and MapReduce, open source Apache project
• Cassandra: An open source (free) DBMS, originally developed at Facebook and now an Apache Software Foundation project.
• Data warehouse: ETL (extract, transform, and load) tools and business intelligence tools.
• Business intelligence (BI): data warehouse, reporting, real-time management dashboards
• Cloud computing: Services, SOA, etc.
• Metadata: XML
• Stream processing
• R, SAS and SPSS
• Visualization: Tag cloud, Clustergram, History flow, ThemeRiver, Treemap
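The MapReduce programming model named above can be illustrated with a word count. This is a single-process sketch of the map, shuffle, and reduce phases in plain Python; it is not the Hadoop API, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    """Run the three phases in sequence on an in-memory list of documents."""
    emitted = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(emitted)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = word_count(["big data analytics", "visual analytics for big data"])
print(counts)
```

In a real cluster the map and reduce calls run on different machines and the shuffle moves data over the network; here everything happens in one process to keep the structure visible.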
Origin of Information Visualization
InforViz Techniques
• Scatterplot and Scatterplot Matrix
• Hierarchies Visualization: Node-Link Diagrams, Sunburst, Treemap, Circle-packing layouts
• Network Visualization: Force-Directed Layout, Arc Diagrams, Matrix Views
• Multidimensional Visualization/Parallel Coordinates
• Stacked Graphs
• Flow Maps
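Of the hierarchy techniques listed, the treemap is straightforward to sketch in code. The example below implements the classic slice-and-dice layout, which alternates the split direction at each tree level. It assumes every leaf has a positive size; the `Node` class and function name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    size: float = 0.0                      # used for leaves only
    children: list["Node"] = field(default_factory=list)

    def total(self) -> float:
        """Size of a leaf, or the summed size of all descendants."""
        if not self.children:
            return self.size
        return sum(child.total() for child in self.children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Assign each leaf a rectangle (x, y, width, height) with area proportional to size.

    Children are laid out side by side at even depths and stacked
    top to bottom at odd depths.
    """
    if out is None:
        out = {}
    if not node.children:
        out[node.name] = (x, y, w, h)
        return out
    total = node.total()
    offset = 0.0
    for child in node.children:
        share = child.total() / total
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


tree = Node("root", children=[
    Node("A", size=2.0),
    Node("B", children=[Node("B1", size=1.0), Node("B2", size=1.0)]),
])
rects = slice_and_dice(tree, 0.0, 0.0, 4.0, 2.0)
for name, rect in rects.items():
    print(name, rect)
```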
Scatterplot and Scatterplot Matrix
Tree Visualization(1)
Node-Link Diagrams
Sunburst
Tree Visualization(2)
Treemap
Circle-packing layouts
Network Visualization
Force-Directed Layout
Arc Diagrams
Matrix Views
Parallel Coordinates
Stacked Graphs
Flow Maps
Examples
Fraud Detection of Bank Wire Transactions
Displays and Views
A classical VA tool
GapMinder [Demo]
Smart Money Map [Demo]
A recent project
Chances and Challenges
• The basic techniques for large-scale simulation and computing are ready
• However, large and time-consuming computing tasks need steering, or visualization of the intermediate computing results.
• Most simulation and computing tasks require tuning hundreds of parameters.
• Smart/intelligent data mining and data processing algorithms are ready
• However, most data mining algorithms have high computational complexity: N^2 rather than N log(N) or N
• How can automatic computing (machine) be combined with high-level intelligence (human) to gain insight, and how can humans be involved in the computing?
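The complexity point above is easier to appreciate with numbers. The following sketch compares idealized operation counts for the three growth rates, ignoring constant factors; the function name `operation_counts` is illustrative.

```python
import math


def operation_counts(n: int) -> dict[str, float]:
    """Idealized operation counts for linear, linearithmic, and quadratic algorithms."""
    return {
        "N": float(n),
        "N log N": n * math.log2(n),
        "N^2": float(n) ** 2,
    }


for n in (10**3, 10**6, 10**9):
    c = operation_counts(n)
    ratio = c["N^2"] / c["N log N"]
    print(f"N={n:.0e}  N log N={c['N log N']:.1e}  N^2={c['N^2']:.1e}  ratio={ratio:.1e}")
```

The gap between N^2 and N log(N) widens as N grows, which is why quadratic algorithms become impractical at big-data scale.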
Recent Research Topics
• Unified Visual Analytics by Heterogeneous Data Sources (esp. Text)
• Structured and semi-structured data fusion framework
• Data indexing and similarity rank
• Visual analytics for high-dimensional heterogeneous data
• Domain Risk Management and Preventive Control by Sensor Data Collection and Data Mining
• Sensor techniques
• Data Warehouse
• Coordinated Views integrate visual analytic techniques
• Parallel/Distributed Computing Steering by Parameter Optimization and Visualization
• Parameter tuning and computing optimization
• Intermediate results visualization and task steering
• Markov Chain Monte Carlo (MCMC) Simulation
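To make the MCMC item concrete, here is a minimal random-walk Metropolis sampler, one common MCMC algorithm. It is a sketch under stated assumptions: a one-dimensional target, a Gaussian proposal, and a standard normal chosen as the example density. None of these choices come from the deck.

```python
import math
import random


def metropolis(log_density, start: float, steps: int,
               step_size: float = 1.0, seed: int = 0) -> list[float]:
    """Random-walk Metropolis sampler for a one-dimensional target density."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)
        # Accept with probability min(1, p(proposal) / p(x)), computed in log space.
        log_ratio = log_density(proposal) - log_density(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)
    return samples


# Example target (an assumption for illustration): standard normal, up to a constant.
samples = metropolis(lambda x: -0.5 * x * x, start=0.0, steps=20000)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean = {mean:.2f}, variance = {variance:.2f}")
```

The `step_size` parameter is the kind of value that typically needs tuning: too small and the chain explores slowly, too large and most proposals are rejected.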
Questions and Thanks!
