Outline
• Forecasts and Why Big Data?
• What is Big Data?
• Parts of the Puzzle
• Quick Tutorial
• Conclusion
Photo by John Trainor on Flickr
http://www.flickr.com/photos/trainor/2902023575/, Licensed under CC
Asimov Foundation
• 20 million worlds
• A scientist who mathematically calculates the fate of the Universe
• His effort to change that fate
• A Beautiful story
Consider a day in your life
• What is the best road to take?
• Would there be any bad
weather?
• What is the best way to invest
the money?
• Should I take that loan?
• Can I optimize my day?
• Is there a way to do this
faster?
• What have others done in
similar cases?
• Which product should I buy?
People have wanted (through the ages)
• To know (what happened?)
• To explain (why it happened)
• To predict (what will happen?)
Many Cultures had
Claim for Omniscience
• Oracle
• Astrology
• Book of Changes
• Tarot Cards
• Crystal balls
Claim for Omniscience
To know, explain and predict!!
• Grand challenge of our time
• We have been trying to do this by many other means
• Now we are trying to do this via science
Any sufficiently advanced technology
is indistinguishable from magic.
---Arthur C. Clarke.
• We see possibilities through “lots of data”
You do not seem to be convinced!!
• Why, sometimes I've believed as many as six
impossible things before breakfast.
Prophecies of our time
• We can predict Weather
• We understand language translation pretty well
• We can predict how an airplane behaves well enough to fly
• Our ability in forensics
• We understand remedies for many diseases
• We can tell how astronomical bodies will behave (e.g. comets)
This is being Done: Weather
• Remarkably accurate for a few days
– SL incident
• Data :- weather radars, satellite
data, weather balloons, planes
etc.,
• forecast models
– Simulation
– Numerical method
• Challenge is computing power
– algorithms that take more than
24 hours
– resolution is the key ……
Democratizing Analysis
• Forecasting was done, but in a limited manner and only by a few (e.g. National Labs, Intelligence community)
• That is changing!!
What is Big data?
• There is a lot of data available
– E.g. Internet of things
• We have computing power
• We have technology
• Goal is same
– To know
– To Explain
– To predict
• Challenge is the full lifecycle
Data, the wealth of our time
"Data is a precious
thing because it lasts longer than systems” - Tim Berners-Lee
• Access to data is becoming ultimate
competitive advantage
– E.g. Google+ vs. Facebook
– This is why many organizations try hard to give us free things and keep us always logged in (e.g. Gmail, Facebook, search engine toolbars)
Drivers of Big Data
Data Avalanche/ Moore’s law of data
• We are now collecting large amounts of data and converting them to digital form
• 90% of the data in the world today was created within the past two years.
• The amount of data we have doubles very fast
In real life, most data are Big
• The Web handles millions of activities per second, so a huge volume of server logs is created.
• Social networks e.g. Facebook, 800 Million active
users, 40 billion photos from its user base.
• There are >4 billion phones and >25% are smart
phones. There are billions of RFID tags.
• Observational and Sensor data
– Weather Radars, Balloons
– Environmental Sensors
– Telescopes
– Complex physics simulations
Why is Big Data hard?
• How to store? Assuming 1TB per computer, it takes 1000 computers to store 1PB
• How to move? Assuming a 1Gb network, it takes about 2 hours to copy 1TB, or about 93 days to copy 1PB
• How to search? Assuming each record is 1KB and one machine can process 1000 records per sec, it needs about 278 CPU hours to process 1TB and about 32 CPU years to process 1PB
• How to process?
– How to convert algorithms to work in large
size
– How to create new algorithms
http://www.susanica.com/photo/9
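The storage, transfer and scan estimates above are simple arithmetic, so they are easy to re-check. The sketch below is only a back-of-envelope calculator: it assumes decimal units (1TB = 10^12 bytes), links running at full line rate, and the same 1KB record and 1000 records/sec figures as the slide. The function names are made up for this example, and the transfer time is printed for both a 1 Gb/s and a 10 Gb/s link so the effect of the network assumption is visible.

```python
# Back-of-envelope estimates for storing, moving and scanning big data.
# Decimal units and ideal throughput are assumptions for illustration.
KB, TB, PB = 10**3, 10**12, 10**15


def machines_to_store(total_bytes, disk_bytes=TB):
    """Machines needed if each one holds disk_bytes (ceiling division)."""
    return -(-total_bytes // disk_bytes)


def copy_seconds(total_bytes, link_bits_per_sec):
    """Time to push total_bytes over a link running at full line rate."""
    return total_bytes * 8 / link_bits_per_sec


def scan_seconds(total_bytes, record_bytes=KB, records_per_sec=1000):
    """Single-CPU time to touch every record once."""
    return total_bytes / record_bytes / records_per_sec


if __name__ == "__main__":
    print("machines for 1 PB:", machines_to_store(PB))
    for gbps in (1, 10):
        hours = copy_seconds(TB, gbps * 10**9) / 3600
        days = copy_seconds(PB, gbps * 10**9) / 86400
        print(f"{gbps} Gb/s link: 1 TB in {hours:.1f} h, 1 PB in {days:.1f} days")
    print(f"scan 1 TB: {scan_seconds(TB) / 3600:.0f} CPU hours")
    print(f"scan 1 PB: {scan_seconds(PB) / (365 * 86400):.1f} CPU years")
```

Real systems will do worse than these figures, since disks, protocol overhead and contention all reduce effective throughput.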
Why is it hard (Contd.)?
• System built of many computers
• That handles lots of data
• Running complex logic
• This pushes us to the frontier of Distributed Systems and Databases
• More data does not mean there is a simple model
• Some models can be as complex as the system
http://www.flickr.com/photos/mariachily/5250487136,
Licensed CC
Big Data Architecture
Sensors
• There are sensors everywhere
– RFID (e.g. Walmart), GPS
sensors, Mobile Phone …
• Internet
– Click streams, Emails, chat,
search, tweets ,Transactions …
• Real World
– Video surveillance, Cash flows,
Traffic, Surveillance, Stock
exchange, Smart Grid,
Production line …
– Internet of Things
http://www.flickr.com/photos/imuttoo/4257813689/ by Ian Muttoo,
http://www.flickr.com/photos/eastcapital/4554220770/,
http://www.flickr.com/photos/patdavid/4619331472/ by Pat David copyright CC
Collecting Data
• Data is collected at sensors and sent to the big data system via events or flat files
• Event streams: we name the events by their content/originator
• Get data through
– Point to Point
– Event Bus
• E.g. Data bridge
– a Thrift-based transport we built that handles about 400k events/sec
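To make the "event bus" option concrete, here is a minimal in-process publish/subscribe sketch. It is not the Data bridge transport mentioned above: the `EventBus` class, the stream name `sensor.temperature` and the event fields are all hypothetical, chosen only to show events being named by stream and fanned out to every subscriber.

```python
from collections import defaultdict


class EventBus:
    """Tiny in-process publish/subscribe bus keyed by stream name."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, stream, handler):
        self._subscribers[stream].append(handler)

    def publish(self, stream, event):
        # Deliver to every handler registered for this stream; return the count.
        handlers = self._subscribers.get(stream, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = EventBus()
readings = []
alerts = []
bus.subscribe("sensor.temperature", readings.append)
bus.subscribe(
    "sensor.temperature",
    lambda e: alerts.append(e) if e["value"] > 40 else None,
)

bus.publish("sensor.temperature", {"sensor": "s1", "value": 21.5})
bus.publish("sensor.temperature", {"sensor": "s2", "value": 45.0})
```

With point-to-point delivery the sender would need to know each receiver; with a bus, publishers only know the stream name, which is what lets new consumers be added later without touching the sensors.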
Storing Data
• Historically we used databases
– Scale is a challenge: replication,
sharding
• Scalable options
– NoSQL (Cassandra, HBase) [if data is structured]
• Column families gaining ground
– Distributed file systems (e.g. HDFS) [if data is unstructured]
• NewSQL
– In-memory computing, VoltDB
• Specialized data structures
– Graph databases, data structure servers
http://www.flickr.com/photos/keso/363133967/
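Sharding, mentioned above as one way to scale storage, comes down to a rule that sends each record key to one of N machines. A minimal sketch of hash-based placement follows; the `shard_for` helper and the `user:N` keys are illustrative assumptions, and real stores such as Cassandra use their own partitioners rather than this exact scheme.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a record key to a shard with a stable hash (not Python's salted hash())."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


keys = ["user:1", "user:2", "user:3", "user:4"]
placement = {k: shard_for(k, 4) for k in keys}
```

A plain modulo like this moves most keys when the shard count changes, which is one reason production systems tend to prefer consistent hashing.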
Making Sense of Data
• To know (what happened?)
– Basic analytics +
visualizations (min, max,
average, histogram,
distributions … )
– Interactive drill down
• To explain (why)
– Data mining, classifications,
building models, clustering
• To forecast
– Neural networks, decision
models
To know (what happened?)
• Mainly Analytics
– Min, Max, average,
correlation, histograms
– Might join or group data in many ways
• Implemented with
MapReduce or Queries
• Data is often presented with
some visualizations
• Examples
– forensics
– Assessments
– Historical data/ reports/
trends
http://www.flickr.com/photos/isriya/2967310333/
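The "what happened?" analytics listed above (min, max, average, grouping) can be sketched in a few lines on a single machine before moving to MapReduce or queries. The records below are made-up (player, speed) pairs and `group_stats` is a hypothetical helper, not part of any tool named in the slides.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (player, speed) records; values are made up for illustration.
records = [("p1", 3.0), ("p2", 7.5), ("p1", 5.0), ("p2", 2.5), ("p1", 4.0)]


def group_stats(rows):
    """Group rows by key and report min, max and average per group."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {
        key: {"min": min(vals), "max": max(vals), "avg": mean(vals)}
        for key, vals in groups.items()
    }


stats = group_stats(records)
```

The same group-then-aggregate shape is what a SQL GROUP BY or a MapReduce job expresses at larger scale.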
Search
• Process and Index the data. The killer app of
our time.
http://pixabay.com, by Steven Iodice
• Web Search
• Graph Search
• Semantic Search
• Drug Discovery
• …
Patterns
• Correlation
– Scatter plot, statistical
correlation
• Data Mining (Detecting
Patterns)
– Clustering and classification
– Finding Similar items
– Finding Hubs and authorities
in a Graph
– Finding frequent item sets
– Making recommendation
• Apache Mahout
http://www.flickr.com/photos/eriwst/2987739376/ and http://www.flickr.com/photos/focx/5035444779/
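As one small instance of pattern detection, the sketch below counts frequent item pairs in shopping baskets. It is a brute-force illustration, assuming tiny made-up baskets; it is not how Apache Mahout implements frequent item set mining, and `frequent_pairs` and `min_support` are names chosen for this example.

```python
from collections import Counter
from itertools import combinations

baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs that occur together in at least min_support baskets."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


pairs = frequent_pairs(baskets, min_support=2)
```

Counting every pair grows quadratically with basket size, which is why large-scale implementations prune candidates instead of enumerating them all.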
Forecasts and Models
• Trying to build a model for the
data
• Theoretically or empirically
– Analytical models (e.g. Physics)
– Neural networks
– Reinforcement learning
– Unsupervised learning (clustering,
dimensionality reduction, kernel
methods)
• Examples
– Translation
– Weather Forecast models
– Building profiles of users
– Traffic models
– Economic models
• Remember: Correlation does not imply causation
http://misterbijou.blogspot.com/2010_09_01_archive.html
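The simplest empirical model is a straight line fitted to past observations and extended forward. The sketch below uses ordinary least squares on made-up numbers; it stands in for the much richer models listed above (neural networks, weather or traffic models) only to show the fit-then-forecast pattern, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


hours = [1, 2, 3, 4, 5]
load = [12.0, 14.0, 16.0, 18.0, 20.0]  # made-up observations
slope, intercept = fit_line(hours, load)
forecast = slope * 6 + intercept  # extrapolate one step ahead
```

A good fit on past data says nothing about why the values move together, which is the point of the correlation warning above.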
Information Visualization
• Presenting information
– To end user
– To decision makers
– To scientist
• Interactive exploration
• Sending alerts
http://www.flickr.com/photos/stevefaeembra/3604686097/
Napoleon's Russian Campaign
Show Hans Rosling’s Data
• This uses a tool called “Gapminder”
Usecase 1: Targeted Marketing
• Collect data and build a model on
– What the user likes
– What they have bought
– What their buying power is
• Making recommendations
• Giving personalized deals
Usecase 2: Travel
• Collect traffic data +
transportation data from
sensors
• Build a model (e.g. Car
Following Models)
• Predict the traffic and the possibility of congestion
• E.g. divert traffic, adjust tolls, adjust traffic lights
Practical Tutorial
• Data collection
– Pub/sub, event architectures
• Processing
– Store and Process: MapReduce/Hadoop
– Processing Moving Data: CEP
• Visualization
– GNU Plot/ R
DEBS Challenge
• Event Processing
challenge
• Real football game,
sensors in player
shoes + ball
• Events at 15 kHz
• Event format
– Sensor ID, TS, x, y, z, v, a
• Queries
– Running Stats
– Ball Possession
– Heat Map of Activity
– Shots at Goal
MapReduce/ Hadoop
• First introduced by Google,
and used as the processing
model for their architecture
• Implemented by opensource
projects like Apache Hadoop
and Spark
• Users write two functions: map and reduce
• The framework handles the details like distributed processing, fault tolerance, load balancing etc.
• Widely used, and one of the catalysts of Big Data
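// map: called once per input line; emits (word, 1) for every word
void map(ctx, k, v){
    tokens = v.split();
    for t in tokens
        ctx.emit(t, 1);
}
// reduce: called once per word with all of its counts; emits the total
void reduce(ctx, k, values[]){
    count = 0;
    for v in values
        count = count + v;
    ctx.emit(k, count);
}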
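The word-count pseudocode above can be tried on one machine by simulating the three phases (map, shuffle, reduce) in a single process. This is only a sketch of the programming model, not Hadoop's API: `run_mapreduce` is a hypothetical driver, and the distribution, fault tolerance and load balancing the framework provides are left out.

```python
from collections import defaultdict


def map_fn(key, value):
    """Emit (token, 1) for every whitespace-separated token in the line."""
    for token in value.split():
        yield token, 1


def reduce_fn(key, values):
    """Sum the counts emitted for one token."""
    yield key, sum(values)


def run_mapreduce(records, mapper, reducer):
    """Run map, shuffle (group by key) and reduce in a single process."""
    shuffled = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            shuffled[out_key].append(out_value)
    results = {}
    for key in sorted(shuffled):
        for out_key, out_value in reducer(key, shuffled[key]):
            results[out_key] = out_value
    return results


lines = [(0, "to be or not to be"), (1, "to know and to predict")]
counts = run_mapreduce(lines, map_fn, reduce_fn)
```

Because each mapper call sees one record and each reducer call sees one key, both phases can be spread across machines without changing the two user functions.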
MapReduce (Contd.)
Histogram of Speeds
Histogram of Speeds(Contd.)
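// map: parse one sensor event and emit (speed range, 1)
void map(ctx, k, v){
    event = parse(v);
    range = calculateRange(event.v);
    ctx.emit(range, 1);
}
// reduce: add up the events that fell into each speed range
void reduce(ctx, k, values[]){
    count = 0;
    for v in values
        count = count + v;
    ctx.emit(k, count);
}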
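A runnable single-process version of the speed histogram might look like the following. The comma-separated event layout follows the "Sensor ID, TS, x, y, z, v, a" format from the DEBS slide, but the exact encoding, the bucket width `BIN_WIDTH` and the sample events are assumptions made for this sketch.

```python
from collections import defaultdict

BIN_WIDTH = 5  # assumed bucket width, in the same unit as the speed field


def parse(line):
    """Parse 'sensor_id,ts,x,y,z,v,a' into a dict (assumed CSV layout)."""
    sid, ts, x, y, z, v, a = line.split(",")
    return {"sid": sid, "ts": int(ts), "v": float(v)}


def calculate_range(speed):
    """Label the bucket a speed falls into, e.g. '5-10'."""
    low = int(speed // BIN_WIDTH) * BIN_WIDTH
    return f"{low}-{low + BIN_WIDTH}"


def map_fn(line):
    yield calculate_range(parse(line)["v"]), 1


def reduce_fn(key, values):
    return key, sum(values)


def run(lines):
    groups = defaultdict(list)  # shuffle step: group mapper output by key
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())


events = ["4,100,0,0,0,3.5,0", "8,101,1,1,0,7.2,0", "4,102,2,0,0,6.1,0"]
speed_histogram = run(events)
```

The reducer is identical to word count; only the key chosen by the mapper changes, which is typical of counting-style MapReduce jobs.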
How long does a player spend near the ball?
How long does a player spend near the ball? (Contd.)
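// map: emit each position event under every grid cell it overlaps
void map(ctx, k, v){
    event = parse(v);
    cells = calculateOverlappingCells(event.x, event.y, event.z);
    for cell in cells
        ctx.emit(cell, event);
}
// reduce: within one cell, join ball and player events and emit the players near the ball
void reduce(ctx, k, values[]){
    players = joinAndFindPlayersNearBall(values);
    for p in players
        ctx.emit(p, p);
}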
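One way to realize the grid-cell join sketched above is shown below, simplified to two dimensions. The assumptions are loud ones: the cell size `CELL`, the distance threshold `NEAR`, the event dictionaries and the rule "ball goes to its cell and the 8 neighbours, players go to their own cell only" are all choices made for this example, not details taken from the slides. Because `NEAR` is no larger than `CELL`, any player within range of the ball is guaranteed to land in one of the ball's cells, and each pair is compared exactly once.

```python
from collections import defaultdict
from math import dist, floor

CELL = 1000  # assumed grid cell size, same unit as the coordinates
NEAR = 1000  # assumed "near the ball" threshold; must not exceed CELL


def home_cell(x, y):
    return floor(x / CELL), floor(y / CELL)


def map_fn(event):
    cx, cy = home_cell(event["x"], event["y"])
    if event["kind"] == "ball":
        # Replicate the ball into neighbouring cells so nearby players see it.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                yield (cx + dx, cy + dy), event
    else:
        yield (cx, cy), event


def reduce_fn(cell, events):
    balls = [e for e in events if e["kind"] == "ball"]
    players = [e for e in events if e["kind"] == "player"]
    for b in balls:
        for p in players:
            close = dist((b["x"], b["y"]), (p["x"], p["y"])) < NEAR
            if b["ts"] == p["ts"] and close:
                yield p["id"], p["ts"]


def run(events):
    groups = defaultdict(list)
    for event in events:
        for cell, value in map_fn(event):
            groups[cell].append(value)
    near_samples = defaultdict(int)  # samples spent near the ball, per player
    for cell, values in groups.items():
        for player_id, _ts in reduce_fn(cell, values):
            near_samples[player_id] += 1
    return dict(near_samples)


events = [
    {"kind": "ball", "id": "ball", "ts": 1, "x": 950, "y": 500},
    {"kind": "player", "id": "p1", "ts": 1, "x": 1200, "y": 600},
    {"kind": "player", "id": "p2", "ts": 1, "x": 5000, "y": 500},
    {"kind": "ball", "id": "ball", "ts": 2, "x": 1300, "y": 600},
    {"kind": "player", "id": "p1", "ts": 2, "x": 1250, "y": 650},
]
samples_near_ball = run(events)
```

Multiplying each player's sample count by the sampling interval would give the time spent near the ball.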
Hadoop Landscape
• HIVE - Query data using SQL style queries, and Hive will
convert them to MapReduce jobs and run in Hadoop.
• Pig - We write programs using data flow style scripts, and Pig converts them to MapReduce jobs and runs them in Hadoop.
• Mahout
– Collection of MapReduce jobs implementing many data mining and artificial intelligence algorithms
A = load 'hdi-data.csv' using PigStorage(',')
AS (id:int, country:chararray, hdi:float,
lifeex:int, mysch:int, eysch:int, gni:int);
B = FILTER A BY gni > 2000;
C = ORDER B BY gni;
dump C;
hive> SELECT country, gni from HDI WHERE gni > 2000;
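For readers without a Hadoop cluster at hand, the same filter can be tried locally. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in SQL engine; it is not Hive or Pig, the table holds only two of the columns, and the country names and GNI values are made up rather than taken from `hdi-data.csv`.

```python
import sqlite3

rows = [  # (country, gni) pairs; made-up values, not real HDI data
    ("Country A", 1500),
    ("Country B", 4200),
    ("Country C", 2600),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hdi (country TEXT, gni INTEGER)")
conn.executemany("INSERT INTO hdi VALUES (?, ?)", rows)

# Same filter as the Pig and Hive examples, with Pig's ORDER BY gni added.
result = conn.execute(
    "SELECT country, gni FROM hdi WHERE gni > 2000 ORDER BY gni"
).fetchall()
conn.close()
```

The difference in Hive and Pig is not the query but the execution: the same declarative statement is compiled into MapReduce jobs that run over files in the cluster.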
Is Hadoop Enough?
• Limitations
– Takes time for processing
– Lack of Incremental processing
– Weak with Graph usecases
– Not very easy to create a processing pipeline (addressed
with HIVE, Pig etc.)
– Too close to programmers
– Faster implementations are possible
• Alternatives
– Apache Drill http://incubator.apache.org/drill/
– Spark http://spark-project.org/
– Graph Processors like http://giraph.apache.org/
Data on the Move
• Idea is to process data as they are received, in streaming fashion
• Used when we need
– Very fast output
– Lots of events (a few 100k to millions)
– Processing without storing (e.g. too much data)
• Two main technologies
– Stream Processing (e.g. Storm, http://storm-project.net/ )
– Complex Event Processing (CEP) http://wso2.com/products/complex-event-processor/
Complex Event Processing (CEP)
• Sees inputs as event streams, queried with an SQL-like language
• Supports Filters, Windows, Join, Patterns and Sequences
from p=PINChangeEvents#win.time(3600) join
t=TransactionEvents[p.custid=custid][amount>10000]#win.time(3600)
return t.custid, t.amount;
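What the windowed join above does can be imitated in plain Python: keep the last hour of each stream, and whenever an event arrives on one side, match it against the other side's window. This is a sketch of the idea, not the CEP engine's implementation. It assumes events arrive in timestamp order with timestamps in seconds, and the class and method names are invented for this example.

```python
from collections import deque

WINDOW_SECONDS = 3600
AMOUNT_THRESHOLD = 10000


class PinChangeTransactionJoin:
    """Join two event streams over a sliding one-hour window, per customer."""

    def __init__(self):
        self.pin_changes = deque()   # (ts, custid)
        self.transactions = deque()  # (ts, custid, amount), large ones only

    def _expire(self, now):
        # Drop events that have fallen out of the time window.
        for window in (self.pin_changes, self.transactions):
            while window and now - window[0][0] > WINDOW_SECONDS:
                window.popleft()

    def on_pin_change(self, ts, custid):
        self._expire(ts)
        self.pin_changes.append((ts, custid))
        return [(c, amt) for _, c, amt in self.transactions if c == custid]

    def on_transaction(self, ts, custid, amount):
        self._expire(ts)
        if amount <= AMOUNT_THRESHOLD:
            return []  # filtered out before it enters the window
        self.transactions.append((ts, custid, amount))
        return [(custid, amount) for _, c in self.pin_changes if c == custid]


cep = PinChangeTransactionJoin()
alerts = []
alerts += cep.on_pin_change(0, "c1")
alerts += cep.on_transaction(600, "c1", 25000)   # within the hour: match
alerts += cep.on_transaction(700, "c2", 50000)   # different customer
alerts += cep.on_transaction(5000, "c1", 30000)  # pin change has expired
```

Only the events inside the window are held in memory, which is what lets this style of processing keep up with streams that are too large to store.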
Example: Detect ball Possession
• Possession is the time from when a player hits the ball until someone else hits it or it goes out of the ground
from Ball#window.length(1) as b join
Players#window.length(1) as p
unidirectional
on debs:getDistance(b.x, b.y, b.z, p.x, p.y, p.z) < 1000
and b.a > 55
select ...
insert into hitStream

from old = hitStream,
b = hitStream[old.pid != pid],
n = hitStream[b.pid == pid]*,
(e1 = hitStream[b.pid != pid]
or e2 = ballLeavingHitStream)
select ...
insert into BallPossessionStream
http://www.flickr.com/photos/glennharper/146164820/
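The possession rule can also be written as a small state machine over the hit events, which may help when reading the pattern query. The sketch assumes an already-derived, time-ordered stream of `("hit", ts, pid)` and `("out", ts, None)` tuples; this tuple format and the `possession_intervals` function are hypothetical and stand in for the hit and ball-leaving streams.

```python
def possession_intervals(events):
    """Turn ('hit', ts, pid) and ('out', ts, None) events into (pid, start, end) spans."""
    spans = []
    holder, start = None, None
    for kind, ts, pid in events:
        if kind == "hit":
            if pid != holder:
                # A different player hit the ball: close the previous span.
                if holder is not None:
                    spans.append((holder, start, ts))
                holder, start = pid, ts
        elif kind == "out" and holder is not None:
            # Ball left the field: possession ends with nobody holding it.
            spans.append((holder, start, ts))
            holder, start = None, None
    return spans


events = [
    ("hit", 0, "p1"),
    ("hit", 4, "p1"),   # same player again: possession continues
    ("hit", 9, "p2"),   # p2 takes over
    ("out", 15, None),  # ball leaves the field
    ("hit", 20, "p1"),  # still open at the end, so not reported
]
spans = possession_intervals(events)
```

Summing `end - start` per player over these spans gives the possession totals the challenge query asks for.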
Gnuplot
set terminal png
set output "buyfreq.png"
set title "Frequency Distribution of Items Bought by Buyer"
set ylabel "Number of Items Bought"
set xlabel "Buyers Sorted by Item Count"
set key left top
set log y
set log x
plot "1.data" using 2 title "Frequency" with linespoints
• Open source
• Very powerful
Big data lifecycle
• Realizing the big data lifecycle is hard
• Need wide understanding about many fields
• Big data teams will include members from
many fields working together
• Data Scientists: a new role with an evolving job description
• Their job is to make sense of data, using Big Data processing, advanced algorithms, statistical methods, data mining etc.
• Likely to be among the highest-paid/coolest jobs.
• Big organizations (Google, Facebook, banks) are hiring many math PhDs for this.
Data Scientist
Statisticians
System experts
Domain Experts
DB experts
AI experts
Dark Side
• Privacy
– Invasion of privacy
– Data might be used for
unexpected things
• Big Brother
– Data is likely to be used for control (e.g. by governments)
• If the technology is out there, maybe it is OK. It is very hard to hide anything, which works both ways
Challenges
• Speed (e.g. targeted advertising, reacting to data)
• Extracting semantics and handling multiple
representations and formats
• Security: data ownership, delegation, permissions, and privacy
• Making data accessible to all intended parties, from anywhere, anytime, from any device, through any format (subject to permissions).
• Map-Reduce good enough? What about other
parallel problems?
• Handling Uncertainty
Conclusions
• Lots of data; realizing data -> insight -> predictions
• We do a lot of predictions even now.
• There is a lot between data and forecasts, and it is OK to do those too.
– Analytics
– Visualizations
– Patterns
– Data mining
• If you are looking to start, learn MapReduce, CEP, and gnuplot.
• Visualization is the key!!
• Learn some distributed systems and AI
Questions?

VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectBoston Institute of Analytics
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 

KĂźrzlich hochgeladen (20)

Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

Introduction to Big Data

  • 1.
  • 2. Outline • Forecasts and Why Big Data? • What is Big Data? • Parts of the Puzzle • Quick Tutorial • Conclusion Photo by John Trainor on Flickr http://www.flickr.com/photos/trainor/2902023575/, Licensed under CC
  • 3. Asimov's Foundation • 20 million worlds • A scientist who mathematically calculates the fate of the Universe • His effort to change that fate • A beautiful story
  • 4. Consider a day in your life • What is the best road to take? • Would there be any bad weather? • What is the best way to invest the money? • Should I take that loan? • Can I optimize my day? • Is there a way to do this faster? • What have others done in similar cases? • Which product should I buy?
  • 5. Through the ages, people have wanted • To know (what happened?) • To explain (why did it happen?) • To predict (what will happen?)
  • 6. Many cultures had claims to omniscience • Oracles • Astrology • Book of Changes • Tarot cards • Crystal balls
  • 7. To know, explain and predict!! • Grand challenge of our time • We have been trying to do this by many other means • Now we are trying to do this via science • "Any sufficiently advanced technology is indistinguishable from magic." (Arthur C. Clarke) • We see possibilities through "lots of data"
  • 8. You do not seem to be convinced!! • "Why, sometimes I've believed as many as six impossible things before breakfast." (Lewis Carroll, Through the Looking-Glass)
  • 9. Prophecies of our time • We can predict the weather • We understand language translation pretty well • We can predict how an airplane behaves well enough to fly it • Our ability in forensics • We understand remedies for many diseases • We can tell how astronomical bodies will behave (e.g. comets)
  • 10. This is being done: Weather • Remarkably accurate for a few days – SL incident • Data: weather radars, satellite data, weather balloons, planes, etc. • Forecast models – Simulation – Numerical methods • The challenge is computing power – Algorithms that take more than 24 hours – Resolution is the key …
  • 11. Democratizing Analysis • Forecasting was done, but in a limited manner and only by a few (e.g. National Labs, intelligence community) • That is changing!!
  • 12. What is Big Data? • There is a lot of data available – E.g. Internet of Things • We have computing power • We have technology • The goal is the same – To know – To explain – To predict • The challenge is the full lifecycle
  • 13. Data, the wealth of our time • "Data is a precious thing and will last longer than the systems themselves." (Tim Berners-Lee) • Access to data is becoming the ultimate competitive advantage – E.g. Google+ vs. Facebook – This is why many organizations try hard to give us free things and keep us always logged in (e.g. Gmail, Facebook, search engine toolbars)
  • 15. Data Avalanche / Moore's law of data • We are now collecting large amounts of data and converting them to digital form • 90% of the data in the world today was created within the past two years. • The amount of data we have doubles very fast
  • 16. In real life, most data are Big • The Web handles millions of activities per second, so huge volumes of server logs are created. • Social networks, e.g. Facebook: 800 million active users, 40 billion photos from its user base. • There are >4 billion phones and >25% are smartphones. There are billions of RFID tags. • Observational and sensor data – Weather radars, balloons – Environmental sensors – Telescopes – Complex physics simulations
  • 17. Why is Big Data hard? • How to store? Assuming 1 TB of disk per computer, it takes 1000 computers to store 1 PB • How to move? Assuming a 1 Gb/s network, it takes about 2 hours to copy 1 TB, or about 93 days to copy 1 PB • How to search? Assuming each record is 1 KB and one machine can process 1000 records per second, it needs about 278 CPU hours (roughly 11.6 CPU days) to process 1 TB and about 32 CPU years to process 1 PB • How to process? – How to convert algorithms to work at large scale – How to create new algorithms http://www.susanica.com/photo/9
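Estimates like the ones on this slide are easy to recompute under different assumptions. The following is a minimal back-of-envelope sketch, not a benchmark: the disk size per machine, the link speeds, the record size and the processing rate are all illustrative parameters, and the function names are made up for this example.

```python
# Back-of-envelope estimates for storing, moving and scanning data.
# Every parameter here is an illustrative assumption; change it to match
# your own hardware.

TB = 10**12  # bytes
PB = 10**15  # bytes


def machines_to_store(total_bytes, disk_bytes_per_machine=TB):
    """Number of machines needed if each one holds disk_bytes_per_machine."""
    return -(-total_bytes // disk_bytes_per_machine)  # ceiling division


def seconds_to_copy(total_bytes, link_bits_per_sec):
    """Time to push total_bytes over one link, ignoring protocol overhead."""
    return total_bytes * 8 / link_bits_per_sec


def cpu_seconds_to_scan(total_bytes, record_bytes=1000, records_per_sec=1000):
    """CPU time for a single machine to touch every record once."""
    return (total_bytes / record_bytes) / records_per_sec


for label, size in (("1 TB", TB), ("1 PB", PB)):
    print(label)
    print("  machines to store:", machines_to_store(size))
    for gbps in (1, 10):
        hours = seconds_to_copy(size, gbps * 10**9) / 3600
        print(f"  copy over {gbps} Gb/s: {hours:,.1f} hours")
    days = cpu_seconds_to_scan(size) / 86400
    print(f"  scan at 1000 records/s: {days:,.1f} CPU days")
```

Changing one parameter at a time shows which resource becomes the bottleneck first, which is the point the slide is making.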
  • 18. Why is it hard? (Contd.) • A system built of many computers • That handles lots of data • Running complex logic • This pushes us to the frontier of distributed systems and databases • More data does not mean there is a simple model • Some models can be as complex as the system http://www.flickr.com/photos/mariachily/5250487136, Licensed CC
  • 20. Sensors • There are sensors everywhere – RFID (e.g. Walmart), GPS sensors, mobile phones … • Internet – Click streams, emails, chat, search, tweets, transactions … • Real World – Video surveillance, cash flows, traffic, surveillance, stock exchange, smart grid, production line … – Internet of Things http://www.flickr.com/photos/imuttoo/4257813689/ by Ian Muttoo, http://www.flickr.com/photos/eastcapital/4554220770/, http://www.flickr.com/photos/patdavid/4619331472/ by Pat David copyright CC
  • 21. Collecting Data • Data is collected at sensors and sent to the big data system via events or flat files • Event streams: we name the events by their content / originator • Get data through – Point to point – Event bus • E.g. Data Bridge – a Thrift-based transport we built that handles about 400k events/sec
  • 22. Storing Data • Historically we used databases – Scale is a challenge: replication, sharding • Scalable options – NoSQL (Cassandra, HBase) [if data is structured] • Column families are gaining ground – Distributed file systems (e.g. HDFS) [if data is unstructured] • NewSQL – In-memory computing, VoltDB • Specialized data structures – Graph databases, data structure servers http://www.flickr.com/photos/keso/363133967/
  • 23. Making Sense of Data • To know (what happened?) – Basic analytics + visualizations (min, max, average, histogram, distributions … ) – Interactive drill down • To explain (why) – Data mining, classifications, building models, clustering • To forecast – Neural networks, decision models
  • 24. To know (what happened?) • Mainly analytics – Min, max, average, correlation, histograms – Might join and group data in many ways • Implemented with MapReduce or queries • Data is often presented with some visualizations • Examples – Forensics – Assessments – Historical data / reports / trends http://www.flickr.com/photos/isriya/2967310333/
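The basic analytics named on this slide fit in a few lines on a single machine. The sketch below is only an illustration of min, max, average and a fixed-width histogram; the function name `summarize`, the bucket width and the sample values are assumptions made for this example.

```python
from collections import Counter
from statistics import mean


def summarize(values, bucket_width):
    """Return min, max, average and a fixed-width histogram of the values."""
    # Each value is assigned to the bucket that starts at the nearest
    # multiple of bucket_width at or below it.
    buckets = Counter((v // bucket_width) * bucket_width for v in values)
    return {
        "min": min(values),
        "max": max(values),
        "average": mean(values),
        "histogram": dict(sorted(buckets.items())),
    }


# Made-up sample values, only to show the shape of the result.
print(summarize([1, 2, 3, 12, 25], bucket_width=10))
```

At big data scale the same aggregates are computed per partition and then combined, which is what a MapReduce job or a distributed query does.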
  • 25. Search • Process and Index the data. The killer app of our time. http://pixabay.com, by Steven Iodice • Web Search • Graph Search • Semantic Search • Drug Discovery • …
  • 26. Patterns • Correlation – Scatter plot, statistical correlation • Data mining (detecting patterns) – Clustering and classification – Finding similar items – Finding hubs and authorities in a graph – Finding frequent item sets – Making recommendations • Apache Mahout http://www.flickr.com/photos/eriwst/2987739376/ and http://www.flickr.com/photos/focx/5035444779/
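Statistical correlation is the simplest of these patterns to compute. As a hedged illustration, here is the Pearson correlation coefficient written out from its definition; the function name and the sample series are assumptions for this example, and a real project would more likely use a statistics library.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and the two spreads, all without the 1/n factor,
    # which cancels out in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Made-up series: the second one rises with the first, the third falls.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))
```

A value near +1 or -1 only says the two series move together; it says nothing about which one, if either, causes the other.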
  • 27. Forecasts and Models • Trying to build a model for the data • Theoretically or empirically – Analytical models (e.g. physics) – Neural networks – Reinforcement learning – Unsupervised learning (clustering, dimensionality reduction, kernel methods) • Examples – Translation – Weather forecast models – Building profiles of users – Traffic models – Economic models • Remember: correlation does not imply causation http://misterbijou.blogspot.com/2010_09_01_archive.html
  • 28. Information Visualization • Presenting information – To end users – To decision makers – To scientists • Interactive exploration • Sending alerts http://www.flickr.com/photos/stevefaeembra/3604686097/
  • 30. Show Hans Rosling's Data • This uses a tool called "Gapminder"
  • 31. Use case 1: Targeted Marketing • Collect data and build a model of – What the user likes – What they have bought – What their buying power is • Making recommendations • Giving personalized deals
  • 32. Use case 2: Travel • Collect traffic data + transportation data from sensors • Build a model (e.g. car-following models) • Predict the traffic and the likelihood of congestion • Act on it, e.g. divert traffic, adjust tolls, adjust traffic lights
  • 33. Practical Tutorial • Data collection – Pub/sub, event architectures • Processing – Store and process: MapReduce/Hadoop – Processing moving data: CEP • Visualization – gnuplot / R
  • 34. DEBS Challenge • Event processing challenge • Real football game, sensors in the players' shoes + ball • Events at 15 kHz • Event format – Sensor ID, TS, x, y, z, v, a • Queries – Running stats – Ball possession – Heat map of activity – Shots at goal
  • 35. MapReduce / Hadoop • First introduced by Google, and used as the processing model for their architecture • Implemented by open source projects like Apache Hadoop and Spark • Users write two functions: map and reduce • The framework handles details like distributed processing, fault tolerance, load balancing, etc. • Widely used, and one of the catalysts of Big Data • Word count in pseudocode:
      void map(ctx, k, v) {
        tokens = v.split();
        for t in tokens
          ctx.emit(t, 1);
      }
      void reduce(ctx, k, values[]) {
        count = 0;
        for v in values
          count = count + v;
        ctx.emit(k, count);
      }
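To make the map and reduce roles concrete, the word count above can be run on one machine with a toy driver. This is a sketch under stated assumptions, not Hadoop's API: `run_mapreduce` is a hypothetical in-memory stand-in for the framework's shuffle step, and the two input lines are made up.

```python
from collections import defaultdict


def map_fn(key, line):
    """Emit (word, 1) for every word in the line; the key is ignored."""
    for token in line.split():
        yield token, 1


def reduce_fn(word, counts):
    """Sum all the 1s that were emitted for one word."""
    yield word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Toy single-process driver: map, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # the "shuffle" step
    results = {}
    for out_key in sorted(groups):
        for k, v in reducer(out_key, groups[out_key]):
            results[k] = v
    return results


lines = ["big data is big", "data is data"]
counts = run_mapreduce(enumerate(lines), map_fn, reduce_fn)
print(counts)  # {'big': 2, 'data': 3, 'is': 2}
```

In a real cluster the grouping happens across machines, and the framework reruns failed map or reduce tasks; the two user functions stay the same.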
  • 38. Histogram of Speeds (Contd.)
      void map(ctx, k, v) {
        event = parse(v);
        range = calculateRange(event.v);
        ctx.emit(range, 1);
      }
      void reduce(ctx, k, values[]) {
        count = 0;
        for v in values
          count = count + v;
        ctx.emit(k, count);
      }
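The same job can be sketched locally. The code below assumes each event is a comma-separated line in the order given for the DEBS data (sensor ID, TS, x, y, z, v, a); the bucket width, the helper names and the sample lines are assumptions for illustration and are not taken from the challenge data.

```python
from collections import defaultdict

BUCKET = 1000  # assumed width of one speed bucket, in the sensor's own units


def parse(line):
    """Parse 'sid,ts,x,y,z,v,a' (assumed field order) and keep what we need."""
    sid, ts, x, y, z, v, a = line.split(",")
    return {"sid": sid, "ts": int(ts), "v": float(v)}


def map_fn(line):
    """Emit (speed range, 1) for one event."""
    event = parse(line)
    low = int(event["v"] // BUCKET) * BUCKET
    yield (low, low + BUCKET), 1


def reduce_fn(speed_range, ones):
    """Count the events that fell into one speed range."""
    return speed_range, sum(ones)


def speed_histogram(lines):
    """Toy single-process driver standing in for the MapReduce framework."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(key, groups[key]) for key in sorted(groups))


# Made-up events, only to show the shape of the result.
sample = ["4,100,0,0,0,250,3", "4,101,0,0,0,1800,3", "8,100,0,0,0,900,1"]
print(speed_histogram(sample))  # {(0, 1000): 2, (1000, 2000): 1}
```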
  • 39. How long does a player spend near the ball?
  • 40. How long does a player spend near the ball? (Contd.)
      void map(ctx, k, v) {
        event = parse(v);
        cell = calculateOverlappingCells(event.x, event.y, event.z);
        ctx.emit(cell, event.x, event.y, event.z);
      }
      void reduce(ctx, k, values[]) {
        players = joinAndFindPlayersNearBall(values);
        for p in players
          ctx.emit(p, p);
      }
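One way to read the pseudocode above is as a grid join: positions are keyed by grid cell so that a reducer only compares objects that could be close. The sketch below is an interpretation under explicit assumptions, not the author's implementation: it works in 2D, events are `(ts, who, x, y)` tuples where `who` is `"ball"` or a player id, the ball is copied to neighbouring cells so that border cases are not missed, and `CELL` and `NEAR` are made-up values with `NEAR <= CELL`.

```python
from collections import defaultdict
from math import dist, floor

CELL = 1000  # assumed grid cell size, same units as the coordinates
NEAR = 1000  # assumed "near the ball" distance; must not exceed CELL


def map_fn(event):
    """Key each event by (timestamp, grid cell)."""
    ts, who, x, y = event
    cx, cy = floor(x / CELL), floor(y / CELL)
    if who == "ball":
        # Send the ball to its own cell and the eight neighbours, so a
        # player just across a cell border still meets it in some reducer.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                yield (ts, cx + dx, cy + dy), event
    else:
        yield (ts, cx, cy), event


def reduce_fn(key, events):
    """Emit (player, ts) for every player within NEAR of a ball in this cell."""
    balls = [e for e in events if e[1] == "ball"]
    for ts, who, x, y in events:
        if who != "ball" and any(
            dist((x, y), (b[2], b[3])) <= NEAR for b in balls
        ):
            yield who, ts


def samples_near_ball(events):
    """Toy driver: number of sampled timestamps each player was near the ball."""
    groups = defaultdict(list)
    for event in events:
        for key, value in map_fn(event):
            groups[key].append(value)
    seen = defaultdict(set)
    for key, values in groups.items():
        for who, ts in reduce_fn(key, values):
            seen[who].add(ts)
    return {who: len(stamps) for who, stamps in seen.items()}


# Made-up events: p1 is close at both timestamps, p2 never is.
events = [
    (1, "ball", 0, 0), (1, "p1", 500, 0), (1, "p2", 5000, 0),
    (2, "ball", 990, 0), (2, "p1", 1010, 0), (2, "p2", 3000, 0),
]
print(samples_near_ball(events))  # {'p1': 2}
```

The result counts samples rather than seconds; multiplying by the sampling interval turns it into time spent near the ball.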
  • 41. Hadoop Landscape • Hive: query data using SQL-style queries; Hive converts them to MapReduce jobs and runs them in Hadoop. • Pig: we write programs as data-flow style scripts; Pig converts them to MapReduce jobs and runs them in Hadoop. • Mahout: a collection of MapReduce jobs implementing many data mining and artificial intelligence algorithms • Pig example:
      A = load 'hdi-data.csv' using PigStorage(',') AS (id:int, country:chararray, hdi:float, lifeex:int, mysch:int, eysch:int, gni:int);
      B = FILTER A BY gni > 2000;
      C = ORDER B BY gni;
      dump C;
    Hive example:
      hive> SELECT country, gni from HDI WHERE gni > 2000;
  • 42. Is Hadoop Enough? • Limitations – Takes time for processing – Lack of incremental processing – Weak with graph use cases – Not very easy to create a processing pipeline (addressed with Hive, Pig, etc.) – Too close to programmers – Faster implementations are possible • Alternatives – Apache Drill http://incubator.apache.org/drill/ – Spark http://spark-project.org/ – Graph processors like http://giraph.apache.org/
  • 43. Data on the Move • The idea is to process data as it is received, in a streaming fashion • Used when we need – Very fast output – Lots of events (a few 100k to millions) – Processing without storing (e.g. too much data) • Two main technologies – Stream processing (e.g. Storm, http://storm-project.net/ ) – Complex Event Processing (CEP) http://wso2.com/products/complex-event-processor/
  • 44. Complex Event Processing (CEP) • Sees inputs as event streams, which are queried with an SQL-like language • Supports filters, windows, joins, patterns and sequences • Example:
      from p=PINChangeEvents#win.time(3600)
        join t=TransactionEvents[p.custid=custid][amount>10000]#win.time(3600)
      return t.custid, t.amount;
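What the example query expresses can be approximated in plain Python: report a large transaction when the same customer changed their PIN within the window. This is a simplified sketch and not how a CEP engine evaluates the query. It assumes events arrive in timestamp order, triggers only on transactions, and treats the window length 3600 as seconds; the class and method names are hypothetical.

```python
from collections import defaultdict, deque

WINDOW = 3600      # window length from the query, assumed to be in seconds
THRESHOLD = 10000  # amount threshold from the query


class PinChangeThenLargeTransaction:
    """Flags large transactions that follow a recent PIN change."""

    def __init__(self):
        # custid -> timestamps of that customer's recent PIN changes
        self.pin_changes = defaultdict(deque)

    def on_pin_change(self, ts, custid):
        self.pin_changes[custid].append(ts)

    def on_transaction(self, ts, custid, amount):
        recent = self.pin_changes[custid]
        # Expire PIN changes that have slid out of the time window.
        while recent and recent[0] < ts - WINDOW:
            recent.popleft()
        if amount > THRESHOLD and recent:
            return custid, amount
        return None


detector = PinChangeThenLargeTransaction()
detector.on_pin_change(0, "c1")
print(detector.on_transaction(100, "c1", 20000))   # ('c1', 20000)
print(detector.on_transaction(200, "c1", 500))     # None
print(detector.on_transaction(4000, "c1", 20000))  # None
```

A CEP engine does the same bookkeeping for you, across many queries at once, and lets the rule be changed without redeploying code.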
  • 45. Example: Detect ball possession • Possession is the time from when a player hits the ball until someone else hits it or it goes out of the ground
      from Ball#window.length(1) as b join Players#window.length(1) as p unidirectional
        on debs:getDistance(b.x, b.y, b.z, p.x, p.y, p.z) < 1000 and b.a > 55
      select ...
      insert into hitStream
      from old = hitStream, b = hitStream[old.pid != pid], n = hitStream[b.pid == pid]*,
        (e1 = hitStream[b.pid != pid] or e2 = ballLeavingHitStream)
      select ...
      insert into BallPossessionStream
    http://www.flickr.com/photos/glennharper/146164820/
  • 46. gnuplot • Open source • Very powerful
      set terminal png
      set output "buyfreq.png"
      set title "Frequency Distribution of Items Bought by Buyer"
      set ylabel "Number of Items Bought"
      set xlabel "Buyers Sorted by Item Count"
      set key left top
      set log y
      set log x
      plot "1.data" using 2 title "Frequency" with linespoints
  • 47. Big data lifecycle • Realizing the big data lifecycle is hard • It needs a wide understanding of many fields • Big data teams will include members from many fields working together
  • 48. Data Scientists • A new role with an evolving job description • The job is to make sense of data using big data processing, advanced algorithms, statistical methods, data mining, etc. • Likely to be among the highest paid / coolest jobs • Big organizations (Google, Facebook, banks) are hiring many Math PhDs for this • Diagram labels: Data Scientist, Statisticians, System experts, Domain experts, DB experts, AI experts
  • 49. Dark Side • Privacy – Invasion of privacy – Data might be used for unexpected things • Big Brother – Data is likely to be used for control (e.g. by governments) • If the technology is out there, maybe it is OK: it is very hard to hide anything, which works both ways
  • 50. Challenges • Speed (e.g. targeted advertising, reacting to data) • Extracting semantics and handling multiple representations and formats • Security: data ownership, delegation, permissions, and privacy • Making data accessible to all intended parties, from anywhere, anytime, from any device, through any format (subject to permissions) • Is MapReduce good enough? What about other parallel problems? • Handling uncertainty
  • 51. Conclusions • Lots of data; realizing data -> insight -> predictions • We make a lot of predictions even now. • There is a lot between data and forecasts, and it is OK to do those too – Analytics – Visualizations – Patterns – Data mining • If you are looking to start, learn MapReduce, CEP, and gnuplot. • Visualization is the key!! • Learn some distributed systems and AI