Big Data Intelligence
The Beginning
Prof.Ashok.R | +91-9943900101 | ashok@zettab.com
ZettaB.com
Big Data Training in Coimbatore
Ref: Ullman et.al, Mining Massive Datasets
Caution
• The grass is always green on the other side
Be inspired!
• Stories.. and more stories…
Be informed!
• The devil is in the details
Be challenged!
Hsuan-Tien Lin
The Dream in 1945
• A dream machine of Vannevar Bush (1945)
• An extended supplement to Human Memory
• A device that stores an individual's library, such as
books, records and communications
• Microfilms can be searched, copied and shared
• Useful to store and share information among
lawyers, patent attorneys, doctors and chemists
• The base concept from which WWW evolved
Leads to Web and Web Scale Data
• Data Volume: doubles every 1.2 years
5 EB – Total data produced in 5000 years till 2003
20 EB – Data collected by Google alone in a single day now
• Data Variety: structured, semi-structured and unstructured
XML, JSON, doc, pdf, html, email body, .mp4, .jpeg…
• Data Velocity: a lot happens in a minute
72 hours of new video uploaded to YouTube
3 million searches on Google
200 million emails sent
350 thousand tweets | 1 million searches on Twitter
690 thousand shares | 420 GB data handled on FB
20 million photo views on Flickr
Source: Qmee, Wikibon
https://blog.kissmetrics.com/facebook-statistics/
Unit | Size in bytes | Rice analogy
Byte | 1 | one grain of rice
Kilobyte | 2^10 | a cup of rice
Megabyte | 2^20 | 8 bags of rice
Gigabyte | 2^30 | 3 trucks of rice
Terabyte | 2^40 | 2 container ships
Petabyte | 2^50 | fills half the area of Tirupur
Exabyte | 2^60 | fills the area of South India
Zettabyte | 2^70 | fills the Indian Ocean twice
Scale labels: Desktop, Hobbyist, Internet, Big Data (PB/EB/ZB)
Source: What is big data, Slideshare.net
Big Data Intelligence (BDI)
The ability to understand all of us better by connecting the dots
from massive data sets (with TB/PB Volume, streaming
Velocity and Variety in sources) to
predict the future.
WHY DO WE PREDICT
To Survive
With the largest neural-network brain to store and process
a big volume of data: 100 billion neurons and 2.5
PB equivalent memory @ 100 million MIPS (33K i7 cores)
Vision | Touch | Hearing | Smell | Taste
Scientificamerican.com, Storagecraft.com
1250 MB/s | 125 MB/s | 12.5 MB/s | 1.25 MB/s
You only feel 0.7% of
What you sense
The Prediction Power
• 10,000 hours (7-8 years) of rigorous practice is required to become
a world-class expert in anything (Daniel Levitin, neuroscientist)
• This gives the ability to predict 2 seconds before others, the
“Two Second Advantage” that beats the competitors
– Wayne Gretzky, the greatest ice-hockey player of all time, was able to predict where the puck was
going to be, an instant before it arrived
– Sachin Tendulkar
– Warren Buffett
– Viswanathan Anand
Can Machines Think (to Predict)?
Alan Turing asked this question
in 1950 and proposed a test
to validate it.
Which is machine, and
which is woman???
Can Machine Imitate Brain?
Which is man,
and which is
woman???
Turing Test
Did any machine pass?
Any machine nearer? (near AI)
“William Wilkinson’s ‘An Account of the
Principalities of Wallachia and Moldavia’
inspired this author’s most famous novel.”
Jeopardy! Quiz Contest.
The challenge is to predict the question and
bet with reasonable confidence.
No.
IBM
Watson Computer
Near AI Solutions
• Natural language processing
• Machine learning
• Prediction analytics
• Face recognition
• Languages translation
• Speech recognition
Can machine predict?
• Share price in a stock market next day
• Top 5 products consumers want to buy next week
• Price of Tomato(1 Kg) next month
• No. of cars to be sold next quarter
• Potential criminals in the city/ mega event
• When machine/human will become sick
• Best matched course/school to study
• Best matched job/company to work
Google Story: Where it all began
• 50 billion indexed pages
• Thousands/Millions/Billions of pages may match each search
query
• How to rank them so that the most relevant
(important) pages are displayed at the top?
• Predict what you want to see. Not what you asked.
Do You Know?
4 billion searches happen in a day
Each query uses 1000 nodes
Result returned in 0.2 seconds.
20 billion pages crawled per day
20 Exabytes of data collected in a day
Page Rank
• Give pages ranks (scores) based on links to them
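– Links from many pages → high rank
– Link from a high-rank page → high rank
Parallel Programming With Spark, Matei Zaharia
“rank” r_j for page j:

$$ r_j^{(t+1)} = \sum_{i \to j} \frac{r_i^{(t)}}{d_i} $$

where the sum runs over the pages i that link to page j, and d_i is the number of out-links of page i.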
Matrix-Vector Multiplication
• Page rank equation in a practical form, for iteration t+1:

$$ r^{(t+1)} = A \cdot r^{(t)} $$

where A is the Google Matrix
(the rank vector r is the principal eigenvector of A)
Iteration is repeated until the rank vector converges
(or the maximum number of iterations is reached)
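The iteration described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the production algorithm: it assumes every page has at least one out-link and uses no damping factor, and the three-page web `toy_web` and the function name `pagerank` are made up for the example.

```python
def pagerank(out_links, max_iter=100, tol=1e-10):
    """Repeat r_j = sum over pages i linking to j of r_i / d_i until r settles.

    out_links maps each page to the list of pages it links to.
    Assumption: no dead ends (every page has at least one out-link).
    """
    pages = sorted(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}            # start from a uniform vector
    for _ in range(max_iter):
        new_rank = {p: 0.0 for p in pages}
        for i, targets in out_links.items():
            share = rank[i] / len(targets)        # r_i / d_i
            for j in targets:
                new_rank[j] += share
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:                          # rank vector has converged
            break
    return rank


# Hypothetical web: y links to y and a, a links to y and m, m links to a.
toy_web = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}
ranks = pagerank(toy_web)
print(ranks)
```

Each pass over `out_links` is one matrix-vector multiplication written out link by link, so only the existing links are touched.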
RAM is not Enough
• Won’t be a problem for small dimension (NxN)
• Consider, N=1 billion (pages that match a query)
• Dimension is now in billions
– A is billion x billion matrix
– r is billion size rank vector
– r(old,new) has two billion entries (16 GB with 8-byte double values)
– A has billion x billion entries (8 exabytes)
In actual implementations, though, methods such as sparse-matrix storage cut the memory needed.
RAM size of a highly configured server node: 128-512 GB
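The sizes on this slide follow from simple arithmetic, sketched below with decimal units. The sparse estimate rests on an assumption that is not in the slides: about 10 out-links per page, each stored as a pair of 4-byte integer ids.

```python
N = 10**9                      # pages, as on the slide
BYTES_PER_DOUBLE = 8

rank_vectors_bytes = 2 * N * BYTES_PER_DOUBLE    # old and new rank vectors
dense_matrix_bytes = N * N * BYTES_PER_DOUBLE    # dense billion x billion matrix

# Assumption for illustration only: ~10 out-links per page,
# stored as (source id, target id) pairs of 4-byte integers.
LINKS_PER_PAGE = 10
sparse_matrix_bytes = N * LINKS_PER_PAGE * 2 * 4

GB, EB = 10**9, 10**18
print("rank vectors :", rank_vectors_bytes / GB, "GB")
print("dense matrix :", dense_matrix_bytes / EB, "EB")
print("sparse matrix:", sparse_matrix_bytes / GB, "GB")
```

Under that assumption the link list is within reach of a single large server's disk or memory, while the dense matrix is not.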
Worker Node
Datacenter node: 16 cores
Main memory: 128-512 GB RAM, 40-60 GB/s
SSD: 1-4 TB, 1-4 GB/s (x4 disks)
Disks (secondary): 10-30 TB, 0.2-1 GB/s (x10 disks) (Seek)
Network: 1-10 Gbps
Source: AmpLab, UCB, Dell
Disk is slowest and not Enough
• 50 billion web pages x 20 KB = 1 PB
• 1 computer reads 30-35 MB/sec from disk
~11-13 months to read it all
• Also, it requires 1,000 hard drives (1 TB each) to store it all
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets,
http://www.mmds.org
Can you wait that long for one search?
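A back-of-the-envelope check of the numbers on this slide, using decimal units. The 1 TB drive size is an assumption taken from the node specs elsewhere in the deck; the function name `days_to_read` is made up for the example.

```python
PAGES = 50 * 10**9             # web pages
BYTES_PER_PAGE = 20 * 10**3    # 20 KB each
total_bytes = PAGES * BYTES_PER_PAGE

SECONDS_PER_DAY = 86_400


def days_to_read(num_bytes, mb_per_sec):
    """Days one machine needs to scan num_bytes at a sustained disk rate."""
    return num_bytes / (mb_per_sec * 10**6) / SECONDS_PER_DAY


DRIVE_BYTES = 10**12           # assumed 1 TB per hard drive
drives_needed = total_bytes // DRIVE_BYTES

print("total size :", total_bytes / 10**15, "PB")
print("at 35 MB/s :", round(days_to_read(total_bytes, 35)), "days")
print("at 30 MB/s :", round(days_to_read(total_bytes, 30)), "days")
print("drives     :", drives_needed)
```

With these inputs a single machine needs on the order of a year for one full scan, which is why the work has to be spread over many disks read in parallel.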
Parallelism using Cluster
• 8-64 nodes/rack, 4-16 racks in a cluster
• 1 Gbps bandwidth within rack, 8 Gbps out of rack
• Node specs :
8-16 cores, 128-512 GB RAM, 10×1 TB disks
Aggregation switch
Rack switch
ToR
But Nodes Fail at Scale
• One server may last up to 3 years (1,000 days)
• If you have 1,000 servers, expect to lose 1/day
• Google has 1 million servers
– Hence 1,000 machines will fail every day.
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets,
http://www.mmds.org
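The failure estimate above is a simple rate calculation. The sketch below assumes, for illustration, that each server fails independently with probability 1/lifetime on any given day; the function names are made up.

```python
def expected_failures_per_day(servers, lifetime_days=1000):
    """Average number of servers lost per day."""
    return servers / lifetime_days


def prob_at_least_one_failure(servers, lifetime_days=1000):
    """Chance that at least one server fails on a given day,
    assuming independent failures (an assumption, not a measured fact)."""
    return 1 - (1 - 1 / lifetime_days) ** servers


print(expected_failures_per_day(1_000))        # 1,000-server cluster
print(expected_failures_per_day(1_000_000))    # million-server fleet
print(prob_at_least_one_failure(1_000))
```

Even at 1,000 servers a failure on any given day is more likely than not, so fault handling has to be built into the storage and compute layers.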
Traditional RDBMS Fail
• Not designed for a variety of data types (text, video, images)
• Not capable of handling big volume (PB/EB/ZB)
• Not designed for parallelism
• Poor fault tolerance at scale (a million servers)
• Slows down due to joins, volume, ACID checks and high-velocity
requests
• Designed for transaction processing; not designed for deep
analytics (intensive computing)
Google Solution: DFS
• Distributed File System
• Divide the big data file into smaller chunks of size 16-64
MB and store them in different nodes in different racks.
• Chunks are replicated (2-3 times) for fault tolerance
[Figure: chunks C0, C1, C2, C3, C5, D0 and D1 replicated across chunk servers 1, 2, 3, …, N]
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets,
http://www.mmds.org
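The chunk-and-replicate idea can be sketched as follows. The round-robin placement policy, the cluster layout and the names `place_chunks` and `surviving_chunks` are all hypothetical; real systems such as GFS or HDFS also weigh load and free space when choosing nodes.

```python
import math


def place_chunks(file_size_mb, racks, chunk_mb=64, replicas=3):
    """Split a file into fixed-size chunks and put each replica of a chunk
    on a node in a different rack (simple round-robin, for illustration)."""
    rack_names = sorted(racks)
    if replicas > len(rack_names):
        raise ValueError("need at least as many racks as replicas")
    n_chunks = math.ceil(file_size_mb / chunk_mb)
    placement = {}
    for c in range(n_chunks):
        copies = []
        for r in range(replicas):
            rack = rack_names[(c + r) % len(rack_names)]
            nodes = racks[rack]
            copies.append((rack, nodes[c % len(nodes)]))
        placement[f"C{c}"] = copies
    return placement


def surviving_chunks(placement, failed_node):
    """Chunks that still have at least one replica after a node fails."""
    return {chunk for chunk, copies in placement.items()
            if any(node != failed_node for _, node in copies)}


# Hypothetical cluster: three racks with two nodes each.
cluster = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"], "rack3": ["n5", "n6"]}
layout = place_chunks(200, cluster)
for chunk, copies in layout.items():
    print(chunk, copies)
```

Because every chunk has copies on separate racks, losing one node (or one whole rack) leaves every chunk readable.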
& Map-Reduce
Map-Reduce environment (Master) takes care of:
• Handling machine failures (with replica nodes)
• Partitioning the input data
• Scheduling workers
• Performing the group by key step
• Managing inter-machine communication
J. Leskovec, A. Rajaraman, J. Ullman: Mining of
Massive Datasets, http://www.mmds.org
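What the programmer writes versus what the environment handles can be seen in a single-process sketch of a word count. It imitates the map, group-by-key and reduce steps in plain Python and leaves out the partitioning, scheduling and failure handling that a real framework provides. All function names are made up for the example.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the user's mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def group_by_key(pairs):
    """The step the framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the user's reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


# User-supplied functions for a word count.
def word_mapper(line):
    for word in line.lower().split():
        yield (word, 1)


def count_reducer(word, ones):
    return sum(ones)


lines = ["big data is big", "data about data"]
counts = reduce_phase(group_by_key(map_phase(lines, word_mapper)), count_reducer)
print(counts)
```

Only `word_mapper` and `count_reducer` are specific to the problem; the other three functions stand in for the environment.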
Big Data Platform
M-R App
MapReduce Stack
(Hadoop & Spark)
Distributed File Systems
(HDFS/ GFS)
BUT WITH RESTRICTION
DFS is useful, only when
• Size is big (> 1 TB)
• Files are rarely updated
– Works for Google (to store indexed pages)
– Will not be effective for Airline reservation system
(where frequent data updates are done)
M-R is useful, only when
• Dimension in billions
– Matrix-vector multiplication in Google Pagerank
• Graph with millions of nodes and billions of edges
– FB Network Graph
• Deep analytical application with intensive computing
– Useful in Finding users with similar buying pattern for products
recommendations in Amazon
– But not useful to manage online retail sales of Amazon (frequent data
updates, transactions)
BDI PARADIGM
Google Creates
• DFS (GFS)
• Map-Reduce
• Dremel (Big Query)
• Pregel
& Apache Follows
• GFS → HDFS
• Map-Reduce → Hadoop, Spark
• Dremel → Drill
• Pregel → Giraph
SCALA
• Uses and Runs on Java Virtual Machine
• Yet, simpler to write (succinct) than Java
– Strong Type Inference (statically typed)
– Lesser Code
• Functional Programming (+ OOP)
– First class functions
• Used to develop the Spark stack (which can run on Hadoop 2.0)
• Most suited for Map-Reduce applications
– Traits, collections, nested classes
– Immutable dataset
– Scalable
Mining
• Link Analysis
• Classification
• Content based recommendation
• Collaborative Filtering
• Finding similar items
• Clustering, Decomposition…
Machine Learning (Supervised/ Unsupervised)
Cloud
• Amazon AWS
• Google Cloud Platform
• IBM BlueMix
• OpenStack
• Data Bricks, Cloudera, HortonWorks, MapR,…
• SAP, Oracle…
Spark as a service, Hadoop as a service
One Circle
BIG POTENTIAL
Big Market
• $16.9 billion in 2015
• $50 billion by 2017
• 90 percent of the Fortune 500 already initiated big data
projects
• Big Data Spending : $8M Per company
• 200 TB of stored data per company
– with >1000 employees
Ref: McKinsey 2011
Big Players
• Leaders
– IBM, HP, Dell, SAP, Teradata, Oracle, SAS, Accenture
(>$400 Million)
• Pure players (100% revenue from Big data)
– Palantir, Pivotal, Splunk, Mu Sigma, Actian, Opera Solutions
(>$100 Million)
• Indian Players
– TCS, CapGemini
(>$10 million)
Ref: WikiBon 2013
Big Jobs
J. Leskovec, A. Rajaraman, J. Ullman: Mining of
Massive Datasets, http://www.mmds.org
BIG ENABLERS
Smart phones
• 1.2 billion sold in 2014
– 23.1 % increase over 2013
• Accounts for 27% of global handsets
– but consumes 95% of global traffic (1.5 EB/month)
• Daily SMS count exceeded the world population
Ref: Znet, Fool.com
Nielsen’s Law
bandwidth doubles every twenty-one months
5G in 2020 and 6G in 2030.
Moore's Law
Zilog PC (1980) | iPhone (2007)
Kryder's Law
In 2020, a 2.5-inch disk drive would store ~40 TB and cost about $40.
Storage capacity (doubles every 12 months) grows faster than Moore’s
law (processing capacity doubles every 18-24 months).
All Together
Law | Resource | Annual growth rate
Nielsen's Law | Internet bandwidth | 50%
Moore's Law | Computing power | 60%
Kryder’s Law | Storage capacity | 100%
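The annual growth rates and the doubling periods quoted for the three laws are two views of the same quantity: a resource growing by a fraction g per year doubles every ln 2 / ln(1 + g) years. A small sketch (the function name is made up):

```python
import math


def doubling_time_months(annual_growth):
    """Months needed to double at a given annual growth rate (0.5 means 50%)."""
    return 12 * math.log(2) / math.log(1 + annual_growth)


for law, growth in [("Nielsen", 0.50), ("Moore", 0.60), ("Kryder", 1.00)]:
    print(f"{law}: {doubling_time_months(growth):.1f} months")
```

The results land close to the 21, 18 and 12 month doubling periods stated in the deck.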
BIG SOURCES
Social Networks
as of August 2015
http://www.statista.com/
No. of active users in millions
Facebook
Ref: Chassis-plans.com, Wikibon
60 million posts per day
2.6 billion likes per day
375 million photos uploaded per day
15 TB data uploaded per day
600 TB data handled per day
700 TB Graph search DB
300 PB user data
http://allfacebook.com/orcfile b130817
Twitter
Ref: Chassis-plans.com, Wikibon
500 Million tweets per day
1.6 Billion search queries per day
316 Million monthly active users
80% active users on mobile
Youtube
• 100 hours of new video every minute
• 53% mobile traffic is video
• Avg Human vision input: ½ million hours/life
• Youtube new uploads: 15 million hours/ year
MORE STORIES
House of Cards
Big data analytics picked up on the success of the British version
of House of Cards, and on the popularity of movies by David Fincher
(director) and Kevin Spacey (actor).
Netflix then made a major decision to commit $100 million for
two 13-episode seasons of its remake (US version) with
the above team, streamed online.
Netflix earned $1 billion in that quarter.
The Atlantic: May 2012
https://gigaom.com/2013/04/22/netflix-q1-2014-earnings/
First Emmy-winning streaming show
Lumiata
creates personalized treatment recommendations based on patients'
health data, using 170 million data points
raised US$10 Million from VCs
Ash Damle
Founder & CEO
MedAware
Avoids prescription errors due to
Drug mix-up
Patient mix-up
Unawareness of clinical data
Dosage mix-up
Example: Chlorambucil (chemotherapy) prescribed to a patient without
cancer, instead of Chloramphenicol (antibiotic)
Uses a mathematical model derived from millions of EMRs, which
represents real-world treatment patterns
Raised US$1 million funding
Windward
• The only platform to analyze maritime data from ships and oceans
to maintain ship history, predict threats and help make huge
financial decisions on shipping and commodity flows
• Before 2010, it was impossible to know a vessel’s location
once it sailed more than 30 miles offshore. Commercial
satellites were then introduced, but the big data collected from
ships gave a corrupted picture.
Raised $15.8 million.
mnubo
• Analytics of IoT Data
• Analytics of data from Connected car for driving
habits, vehicle failure pattern, inventory
management, usage based insurance etc
(36M connected cars will be on the road in 2020)
Raised $6 million
rocana
How many of your servers
are talking to blacklisted IPs?
How long has your
business been hacked?
Rocana helps IT identify the root cause of performance or
security issues at any scale and complexity and resolve
underlying issues in real-time.
Instead of employing “brute force” searches against
millions of log entries, advanced analytics identifies
anomalies for investigation.
raised $19.4 million
Whetlab
• Only 5 data scientists worked there
• Twitter acquired it for an undisclosed amount to improve its ability to
show users the kinds of tweets and content they actually want
to see.
Applied Predictive Technologies
Cloud based cause and effect analytics platform to accurately
measure the profit impact of pricing, marketing,
merchandising, operations, and capital initiatives, tailoring
investments in these areas to maximize ROI.
Acquired by MasterCard for $600 million.
Netflix Challenge
• Data: How users have rated movies
– 100.5 million ratings by 5 lakh (500,000) users for 18K movies
• Goal: Predict how a user would rate an unrated movie
– A recommender system problem
– 10% improvement: 1 million dollar prize
Hsuan-Tien Lin
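The "10% improvement" in the Netflix Prize was measured as a reduction in root-mean-square error (RMSE) of predicted ratings. The sketch below shows how such an improvement is computed; the ratings and both sets of predictions are invented toy numbers, not contest data.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and true ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def improvement(baseline_rmse, new_rmse):
    """Fractional RMSE reduction relative to a baseline (0.10 means 10%)."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# Toy data for illustration only.
actual = [4, 3, 5, 1]
baseline_pred = [3, 3, 3, 3]      # e.g. always predict an average rating
model_pred = [4, 3, 4, 2]         # a hypothetical better recommender

base = rmse(baseline_pred, actual)
new = rmse(model_pred, actual)
print(f"baseline RMSE {base:.3f}, model RMSE {new:.3f}, "
      f"improvement {improvement(base, new):.1%}")
```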
KDD Cup Challenge
• Data: How users rated songs
– 252.8 million ratings by 1 million users for 650K songs (Yahoo!)
• Goal: Recommend new songs that the user would like
Hsuan-Tien Lin
BDI for National Security
• TIA after the September 11 attacks (11/9)
• NATGRID after the Mumbai attack (26/11)
– We could have stopped both, had we connected the pieces
of intel from all security agencies with the info tracked from suspects.
More Applications
• Building a Stock Investment Strategy Model
• Predicting Customer Transaction Behavior
• Failure Prediction
• Opinion Mining to Determine User Sentiments
• Financial Loss Prediction
• Insurance Claim Prediction Model
• Bond Trade Price Prediction
• Prediction of Number of Days in the Hospital
• Accelerating Discovery of Drugs for Mutants of H1N1
• Molecular Activity Prediction
• Job Recommendation Engine
https://insofeprojects.wordpress.com/insofe-projects/
WHAT NEXT
A first course on BDI
Day | Topics
Day 1 FN | BDI: The Beginning; DFS and Map-Reduce; Distributed Graph (Pregel); Page Rank algorithm
Day 1 AN | BDI Tools Landscape; Dremel and Big Query; Naïve Bayes Classifier
Day 2 FN | TF-IDF, Jaccard and Cosine; Collaborative filtering; Shingling, Minhashing; Locality Sensitive Hashing
Day 2 AN | Scala Basics for MR apps; Practice session; More fun with Scala
Day 3 FN | Spark projects using Scala
Day 3 AN | Student Projects ideas; Q&A
M.S. Options in USA
University | Program
Stanford University | M.S-CS, Specialization in Information Management and Analytics; Four course graduate certificate in mining massive datasets
Northwestern University | Master of Science in Analytics
DePaul University | Master of Science in Predictive Analytics
North Carolina State University | Master of Science in Analytics
University of Ottawa, Canada | M.Sc in Analytics
University of Connecticut | MS in Business Analytics and Project Management
informationweek.com
IBM Director Dr. Spohrer's short list
PG options in India
Institute | Program
Indian School of Business | Certified Program in Business Analytics (CBA)
Great Lakes Institute of Management | PGP in Business Analytics
IIM Bangalore | Analytics Essentials, BAI
IIM Ahmedabad | Advanced Analytics for Management
AnalyticsVidya.com, analyticsindiamag.com
Road Ahead
“The ultimate search engine would understand exactly what you mean and give back exactly what you want.”
- Larry Page
Evolution of Manager Desk
Tree is God and above all
Prof.Ashok.R | +91-9943900101 | ashok@zettab.com
ZettaB.com
Big Data Training in Coimbatore

➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...amitlee9823
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx9to5mart
 

Recently uploaded (20)

Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 

BDI- The Beginning (Big data training in Coimbatore)

  • 1. Big Data Intelligence The Beginning Prof.Ashok.R | +91-9943900101 | ashok@zettab.com ZettaB.com Big Data Training in Coimbatore Ref: Ullman et.al, Mining Massive Datasets
  • 2. Caution • The grass is always greener on the other side: Be inspired! • Stories.. and more stories…: Be informed! • The devil is in the details: Be challenged! Source: Hsuan-Tien Lin
  • 3. The Dream in 1945 • A dream machine of Vannevar Bush (1945) • An extended supplement to human memory • A device which stores an individual's library, such as books, records and communications • Microfilms can be searched, copied and shared • Useful to store and share information among lawyers, patent attorneys, doctors and chemists • The base concept from which the WWW evolved
  • 4. Leads to Web and Web Scale Data • Data Volume: doubles every 1.2 years – 5 EB: total data produced in 5000 years till 2003 – 20 EB: data collected by Google alone for a day now • Data Variety: structured, semi-structured and unstructured – xml, JSON, doc, pdf, html, email body, .mp4, .jpeg… • Data Velocity: a lot happens in a minute – 72 hours of new video uploaded to YouTube – 3 million searches on Google – 200 million emails sent – 350 thousand Tweets | 1 million searches on Twitter – 690 thousand shares | 420 GB data handled in FB – 20 million photo views on Flickr. Source: Qmee, Wikibon, https://blog.kissmetrics.com/facebook-statistics/
  • 5. Data sizes as rice: Byte (2^0 = 1): one grain of rice; Kilobyte (2^10): cup of rice; Megabyte (2^20): 8 bags of rice; Gigabyte (2^30): 3 trucks of rice; Terabyte (2^40): 2 container ships; Petabyte (2^50): fills half the area of Tirupur; Exabyte (2^60): fills the area of South India; Zettabyte (2^70): fills the Indian Ocean twice. Era labels on the slide: Desktop, Hobbyist, Internet, Big Data (PB/EB/ZB). Source: What is big data, Slideshare.net
  • 6. Big Data Intelligence (BDI): The ability to understand all of us better by connecting the dots from massive data sets (with TB/PB Volume, streaming Velocity and Variety in sources) to predict the future.
  • 7. Why do we predict?
  • 8. To Survive • With the largest neural-network brain to store and process a big volume of data: 100 billion neurons and 2.5 PB equivalent memory @ 100 million MIPS (33K i7 cores) • Vision | Touch | Hearing | Smell | Taste • 1250 MB/s | 125 MB/s | 12.5 MB/s | 1.25 MB/s • You only feel 0.7% of what you sense. Source: Scientificamerican.com, Storagecraft.com
  • 9. The Prediction Power • 10,000 hours (7-8 years) of rigorous practice is required to be a world-class expert in anything (Daniel Levitin, neuroscientist) • This enables the ability to predict 2 seconds before others, the “Two Second Advantage”, which wins over the competitors – Wayne Gretzky, the greatest ice-hockey player of all time, was able to predict where the puck was going to be an instant before it arrived – Sachin Tendulkar – Warren Buffett – Viswanathan Anand
  • 10. Can Machines Think (to Predict)? Alan Turing asked this question in 1950 and proposed a test to validate it.
  • 11. Turing Test: Can a machine imitate the brain? Which is man, and which is woman? Which is machine, and which is woman?
  • 12. Did any machine pass? No. Any machine nearer? (near AI) Jeopardy! Quiz Contest: the challenge is to predict the question and bet with reasonable confidence. Sample clue: “William Wilkinson’s ‘An Account of the Principalities of Wallachia and Moldavia’ inspired this author’s most famous novel.”
  • 13. IBM Watson Computer: “William Wilkinson’s ‘An Account of the Principalities of Wallachia and Moldavia’ inspired this author’s most famous novel.”
  • 14. Near AI Solutions • Natural language processing • Machine learning • Predictive analytics • Face recognition • Language translation • Speech recognition
  • 15. Can a machine predict? • Share price in a stock market the next day • Top 5 products consumers want to buy next week • Price of tomatoes (1 kg) next month • No. of cars to be sold next quarter • Potential criminals in the city or at a mega event • When a machine or human will become sick • Best matched course/school to study • Best matched job/company to work for
  • 16. Google Story: Where it all began • 50 billion indexed pages • Thousands/millions/billions of pages may match each search query • How to rank them in order to display the most relevant (important) pages at the top? • Predict what you want to see, not what you asked. Do you know? • 4 billion searches happen in a day • Each query uses 1000 nodes • Result returned in 0.2 seconds • 20 billion pages crawled per day • 20 exabytes of data collected in a day
  • 17. Page Rank • Give pages ranks (scores) based on links to them – Links from many pages → high rank – Link from a high-rank page → high rank • “Rank” $r_j$ for page $j$: $r_j^{(t+1)} = \sum_{i \to j} \frac{r_i^{(t)}}{d_i}$, where $d_i$ is the number of out-links of page $i$. Source: Parallel Programming With Spark, Matei Zaharia
  • 18. Matrix-Vector Multiplication • Page rank equation in a practical form (rank vector $r$ is the eigenvector of $A$): for iteration $t+1$, $r^{(t+1)} = A \cdot r^{(t)}$, where $A$ is the Google matrix. Iteration is repeated till the rank vector converges (or the maximum number of iterations is reached).
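The iteration on slides 17 and 18 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production algorithm: the three-page web `toy_web` is hypothetical, every page is assumed to have at least one out-link, and teleportation (the damping factor used in practice) is left out.

```python
def pagerank(links, max_iter=200, tol=1e-12):
    """Power iteration for the basic PageRank equation (no teleportation).

    links maps each page to the list of pages it links to.
    Assumes every page has at least one out-link (no dead ends).
    """
    pages = sorted(links)
    rank = {page: 1.0 / len(pages) for page in pages}  # start from uniform ranks
    for _ in range(max_iter):
        new_rank = {page: 0.0 for page in pages}
        for source, targets in links.items():
            share = rank[source] / len(targets)  # r_i / d_i
            for target in targets:
                new_rank[target] += share        # sum over all i -> j
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:  # rank vector has converged
            break
    return rank


# Hypothetical toy web: y links to y and a, a links to y and m, m links to a.
toy_web = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}
ranks = pagerank(toy_web)
print(ranks)
```

Each pass over `links` is one matrix-vector multiplication with the matrix stored as adjacency lists, which is why only the links, and never the full N x N matrix, need to be held.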
  • 19. RAM is not Enough • Won’t be a problem for small dimension (N x N) • Consider N = 1 billion (pages that match a query) • Dimension is now in billions – A is a billion x billion matrix – r is a billion-entry rank vector – r (old and new) has two billion entries (16 GB for 8-byte double values) – A has billion x billion entries (8 exabytes as a dense matrix) • RAM size of a highly configured server node: 128-512 GB • Sparse-matrix representations reduce the storage needed in actual implementations, since each page links to only a few others.
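The memory figures on slide 19 follow from simple arithmetic, sketched below. The sparse estimate rests on an assumption that is not in the slides: about 10 out-links per page, each stored as a pair of 4-byte page ids.

```python
BYTES_PER_DOUBLE = 8
N_PAGES = 10**9  # N = 1 billion pages, as on the slide

# Old and new rank vectors, one double per page each.
rank_vectors_bytes = 2 * N_PAGES * BYTES_PER_DOUBLE
# Dense N x N matrix, one double per entry.
dense_matrix_bytes = N_PAGES * N_PAGES * BYTES_PER_DOUBLE

# Assumption (not from the slides): about 10 out-links per page, each link
# stored as a (source, destination) pair of 4-byte page ids.
ASSUMED_LINKS_PER_PAGE = 10
BYTES_PER_LINK = 2 * 4
sparse_links_bytes = N_PAGES * ASSUMED_LINKS_PER_PAGE * BYTES_PER_LINK

GB, EB = 10**9, 10**18
print(f"rank vectors: {rank_vectors_bytes / GB:.0f} GB")
print(f"dense matrix: {dense_matrix_bytes / EB:.0f} EB")
print(f"sparse links: {sparse_links_bytes / GB:.0f} GB")
```

Under that assumption the link structure fits in the RAM of a single well-configured node, while the dense matrix is out of reach by many orders of magnitude.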
  • 20. Worker Node (datacenter node) • 16 cores • RAM (main memory): 128-512 GB at 40-60 GB/s • SSD: 1-4 TB at 1-4 GB/s (x4 disks) • Disks (secondary): 10-30 TB at 0.2-1 GB/s (x10 disks) (seek) • Network: 1-10 Gbps. Source: AmpLab, UCB, Dell
  • 21. Disk is slowest and not Enough • 50 billion web pages x 20 KB = 1 PB • 1 computer reads 30-35 MB/sec from disk: roughly 11 to 13 months to read it all • Also, it requires 1,000 hard drives to store it all • Can you wait that long for one search? Source: J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
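The read-time estimate on slide 21 can be redone as a short calculation. This is only a back-of-the-envelope sketch: it assumes decimal units (1 MB = 10^6 bytes), a perfectly sustained read rate, and, for the cluster case, a hypothetical 1,000 machines that each read an equal share in parallel.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def days_to_read(total_bytes, mb_per_second, machines=1):
    """Days needed to scan total_bytes at a sustained per-machine read rate."""
    bytes_per_second = mb_per_second * 10**6 * machines
    return total_bytes / bytes_per_second / SECONDS_PER_DAY


web_bytes = 50 * 10**9 * 20 * 10**3  # 50 billion pages x 20 KB = 1 PB

single_fast = days_to_read(web_bytes, 35)  # one machine at 35 MB/s
single_slow = days_to_read(web_bytes, 30)  # one machine at 30 MB/s
cluster_fast = days_to_read(web_bytes, 35, machines=1000)

print(f"one machine: {single_fast:.0f} to {single_slow:.0f} days")
print(f"1,000 machines: about {cluster_fast * 24:.0f} hours")
```

The same scan that takes a single disk the better part of a year drops to hours once the data is spread over many nodes, which is the motivation for the cluster on the next slide.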
  • 22. Parallelism using Cluster • 8-64 nodes/rack, 4-16 racks in a cluster • 1 Gbps bandwidth within a rack, 8 Gbps out of the rack • Node specs: 8-16 cores, 128-512 GB RAM, 10×1 TB disks [Figure labels: aggregation switch, rack switch (ToR)]
  • 23. But Nodes Fail at Scale • One server may stay up for about 3 years (1,000 days) • If you have 1,000 servers, expect to lose 1 per day • Google has 1 million servers, hence about 1,000 machines will fail every day. Source: J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
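The failure estimate on slide 23 is an expected-value calculation. The sketch below makes its assumptions explicit: each server fails independently, with a daily failure probability of 1 / (mean lifetime in days). Real failures are often correlated, so treat the numbers as rough guides.

```python
def expected_failures_per_day(num_servers, mean_lifetime_days=1000):
    """Expected daily failures if each server fails with probability 1/lifetime per day."""
    return num_servers / mean_lifetime_days

def prob_at_least_one_failure(num_servers, mean_lifetime_days=1000):
    """Chance that at least one server fails today, assuming independent failures."""
    p_survive_one = 1 - 1 / mean_lifetime_days
    return 1 - p_survive_one ** num_servers


small_cluster = expected_failures_per_day(1_000)
large_fleet = expected_failures_per_day(1_000_000)
p_any_today = prob_at_least_one_failure(1_000)

print(small_cluster, large_fleet, round(p_any_today, 3))
```

Even with only 1,000 servers there is a better-than-even chance of a failure on any given day, so fault tolerance has to be built into the storage and compute layers rather than handled by hand.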
  • 24. Traditional RDBMS Fail • Not designed for a variety of data types (text, video, images) • Not capable of handling big volume (PB/EB/ZB) • Not designed for parallelism • Poor fault tolerance at scale (million servers) • Slow down due to joins, volume, ACID checks and high-velocity requests • Designed for transaction processing, not for deep analytics (intensive computing)
  • 25. Google Solution: DFS • Distributed File System • Divide the bigger data file into smaller chunks of size 16-64 MB and store them on different nodes in different racks. • Chunks are replicated (2-3 times) for fault tolerance. [Figure: chunks of files C (C0, C1, C2, C3, C5) and D (D0, D1) replicated across chunk servers 1 to N] Source: J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
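The chunking and replication idea on slide 25 can be illustrated with a toy placement routine. This is a hedged sketch, not how GFS or HDFS actually place blocks: the function names, the round-robin policy, the server and rack names, and the 64-byte chunk size (standing in for 16-64 MB) are all assumptions made for the example.

```python
def split_into_chunks(data, chunk_size):
    """Cut a byte string into consecutive fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def place_replicas(num_chunks, servers, replication=3):
    """Assign each chunk to `replication` servers, preferring distinct racks.

    servers is a list of (server_name, rack_name) pairs.
    """
    placement = {}
    n = len(servers)
    for chunk_id in range(num_chunks):
        # Rotate the server list so consecutive chunks start on different servers.
        rotated = [servers[(chunk_id + k) % n] for k in range(n)]
        chosen, used_racks = [], set()
        # First pass: at most one replica per rack.
        for name, rack in rotated:
            if len(chosen) < replication and rack not in used_racks:
                chosen.append(name)
                used_racks.add(rack)
        # Second pass: with fewer racks than replicas, reuse racks but not servers.
        for name, rack in rotated:
            if len(chosen) < replication and name not in chosen:
                chosen.append(name)
        placement[chunk_id] = chosen
    return placement


# Hypothetical cluster: six chunk servers spread over three racks.
servers = [("s1", "rackA"), ("s2", "rackA"), ("s3", "rackB"),
           ("s4", "rackB"), ("s5", "rackC"), ("s6", "rackC")]
data = bytes(200)                                 # 200-byte stand-in for a large file
chunks = split_into_chunks(data, chunk_size=64)   # toy chunk size
placement = place_replicas(len(chunks), servers)
print(placement)
```

Spreading replicas over racks means that losing one server, or even one whole rack, still leaves a copy of every chunk reachable.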
  • 26. & Map-Reduce • The Map-Reduce environment (master) takes care of: – Handling machine failures (with replica nodes) – Partitioning the input data – Scheduling workers – Performing the group-by-key step – Managing inter-machine communication. Source: J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
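To make the division of labour concrete, here is a single-process word-count sketch of the map, group-by-key and reduce steps. It is only an illustration: `run_map_reduce` is a hypothetical helper that stands in for the work the master coordinates across machines, and it does no partitioning, scheduling or failure handling.

```python
from collections import defaultdict

def word_count_mapper(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def word_count_reducer(word, counts):
    """Reduce step: combine all values that share one key."""
    return word, sum(counts)

def run_map_reduce(documents, mapper, reducer):
    """Single-process stand-in for what the framework runs across workers."""
    # Map: on a real cluster each input partition goes to a different worker.
    pairs = [pair for doc in documents for pair in mapper(doc)]
    # Group by key: the framework's shuffle step.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce: one call per distinct key.
    return [reducer(key, values) for key, values in sorted(groups.items())]


documents = ["big data is big", "data beats opinion"]
word_counts = dict(run_map_reduce(documents, word_count_mapper, word_count_reducer))
print(word_counts)
```

The programmer writes only the mapper and the reducer; everything inside `run_map_reduce` is what the environment on the slide is responsible for.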
  • 27. Big Data Platform (layers): M-R App; MapReduce Stack (Hadoop & Spark); Distributed File Systems (HDFS/GFS)
  • 29. DFS is useful only when • Size is big (> 1 TB) • Files are rarely updated – Works for Google (to store indexed pages) – Will not be effective for an airline reservation system (where data is updated frequently)
  • 30. M-R is useful only when • Dimension is in billions – Matrix-vector multiplication in Google PageRank • Graph has millions of nodes and billions of edges – FB network graph • Deep analytical application with intensive computing – Useful for finding users with similar buying patterns for product recommendations at Amazon – But not useful for managing Amazon's online retail sales (frequent data updates, transactions)
  • 32. Google Creates • DFS (GFS) • Map-Reduce • Dremel (Big Query) • Pregel
  • 33. & Apache Follows • GFS → HDFS • Map-Reduce → Hadoop, Spark • Dremel → Drill • Pregel → Giraph
  • 34. SCALA • Uses and runs on the Java Virtual Machine • Yet simpler to write (more succinct) than Java – Strong type inference (statically typed) – Less code • Functional programming (+ OOP) – First-class functions • Used to develop the Spark stack (Hadoop 2.0) • Well suited for Map-Reduce applications – Traits, collections, nested classes – Immutable datasets – Scalable
  • 35. Mining • Link Analysis • Classification • Content based recommendation • Collaborative Filtering • Finding similar items • Clustering, Decomposition… Machine Learning (Supervised/ Unsupervised)
  • 36. Cloud • Amazon AWS • Google Cloud Platform • IBM BlueMix • OpenStack • Data Bricks, Cloudera, HortonWorks, MapR,… • SAP, Oracle… Spark as a service, Hadoop as a service
  • 39. Big Market • $16.9 billion in 2015 • $50 billion by 2017 • 90 percent of the Fortune 500 have already initiated big data projects • Big data spending: $8M per company • 200 TB of stored data per company (with >1000 employees) Ref: McKinsey 2011
  • 40. Big Players • Leaders – IBM, HP, Dell, SAP, Teradata, Oracle, SAS, Accenture (>$400 million) • Pure players (100% revenue from big data) – Palantir, Pivotal, Splunk, Mu Sigma, Actian, Opera Solutions (>$100 million) • Indian players – TCS, CapGemini (>$10 million) Ref: Wikibon 2013
  • 41. Big Jobs. Source: J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
  • 43. Smart phones • 1.2 billion sold in 2014 – 23.1% increase over 2013 • Account for 27% of global handsets – but consume 95% of global traffic (1.5 EB/month) • Daily SMS count exceeded the world population Ref: Znet, Fool.com
  • 44. Nielsen’s Law: bandwidth doubles every twenty-one months. 5G in 2020 and 6G in 2030.
  • 46. Kryder's Law: storage capacity (doubles every 12 months) grows faster than Moore’s law (processing capacity doubles every 18-24 months). In 2020, a 2.5-inch disk drive would store ~40 TB and cost about $40.
  • 47. All Together (annual growth rate) • Nielsen's Law, Internet bandwidth: 50% • Moore's Law, computing power: 60% • Kryder’s Law, storage capacity: 100%
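The annual growth rates on slide 47 can be derived from the doubling periods quoted on the earlier slides. A small sketch, assuming smooth exponential growth: a quantity that doubles every m months grows by a factor of 2^(12/m) per year.

```python
def annual_growth_rate(doubling_months):
    """Yearly growth (as a fraction) for a quantity that doubles every doubling_months."""
    return 2 ** (12 / doubling_months) - 1


nielsen = annual_growth_rate(21)   # bandwidth doubles every 21 months
moore = annual_growth_rate(18)     # computing power, faster end of 18-24 months
kryder = annual_growth_rate(12)    # storage capacity doubles every 12 months

for name, rate in [("Nielsen", nielsen), ("Moore", moore), ("Kryder", kryder)]:
    print(f"{name}: {rate:.0%} per year")
```

The results land close to the rounded 50%, 60% and 100% on the slide; the 60% figure for Moore's Law corresponds to the 18-month end of the quoted range.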
  • 49. Social Networks: number of active users in millions, as of August 2015. Source: http://www.statista.com/
  • 50. Facebook • 60 million posts per day • 2.6 billion likes per day • 375 million photos uploaded per day • 15 TB data uploaded per day • 600 TB data handled per day • 700 TB Graph search DB • 300 PB user data Ref: Chassis-plans.com, Wikibon, http://allfacebook.com/orcfile b130817
  • 51. Twitter • 500 million tweets per day • 1.6 billion search queries per day • 316 million monthly active users • 80% of active users on mobile Ref: Chassis-plans.com, Wikibon
  • 52. Youtube • 100 hours of new video every minute • 53% of mobile traffic is video • Avg human vision input: ½ million hours/life • Youtube new uploads: 15 million hours/year
  • 54. House of Cards • Big data analytics picked up on the success of the British version of House of Cards, and the popularity of movies directed by David Fincher and starring Kevin Spacey • Netflix then made a major decision to commit $100 million for two 13-episode seasons of its remake (US version) with the above team, streamed online • Netflix earned $1 billion in that quarter • First Emmy-winning streaming show. Sources: The Atlantic, May 2012; https://gigaom.com/2013/04/22/netflix-q1-2014-earnings/
  • 55. Lumiata creates personalized treatment recommendations based on patients' health data, using 170 million data points. Founder & CEO: Ash Damle. Raised US$10 million from VCs.
  • 56. MedAware avoids prescription errors due to • Drug mix-up • Patient mix-up • Unawareness of clinical data • Dosage mix-up. Example: Chlorambucil (chemotherapy) prescribed to a patient without cancer, instead of Chloramphenicol (antibiotic). Uses a mathematical model, derived from millions of EMRs, that represents real-world treatment patterns. Raised US$1 million in funding.
  • 57. Windward • Only platform to analyze maritime data from ships and the ocean to maintain ship history, predict threats and help make huge financial decisions on shipping and commodity flows • Before 2010, it was impossible to know a vessel’s location once it sailed more than 30 miles offshore; then commercial satellites were introduced, but the big data collected from ships gave a corrupted picture. Raised $15.8 million.
  • 58. mnubo • Analytics of IoT data • Analytics of data from connected cars for driving habits, vehicle failure patterns, inventory management, usage-based insurance etc. (36M connected cars will be on the road in 2020). Raised $6 million.
  • 59. Rocana • How many of your servers are talking to blacklisted IPs? How long has your business been hacked? • Rocana helps IT identify the root cause of performance or security issues at any scale and complexity and resolve underlying issues in real time. Instead of employing “brute force” searches against millions of log entries, advanced analytics identifies anomalies for investigation. Raised $19.4 million.
  • 60. Whetlab • Only 5 data scientists worked there • Twitter acquired it in an undisclosed deal to improve its ability to show users the kinds of tweets and content they actually want to see.
  • 61. Applied Predictive Technologies • Cloud-based cause-and-effect analytics platform to accurately measure the profit impact of pricing, marketing, merchandising, operations, and capital initiatives, tailoring investments in these areas to maximize ROI. Acquired by MasterCard for $600 million.
  • 62. Netflix Challenge • Data: how users have rated movies – 100.5 million ratings by 5 lakh (500,000) users on 18K movies • Goal: predict how a user would rate an unrated movie – A recommender system problem – 10% improvement: 1 million dollar prize. Source: Hsuan-Tien Lin
  • 63. KDD Cup Challenge • Data: how users rated songs – 252.8 million ratings by 1 million users on 650K songs (Yahoo!) • Goal: recommend new songs that a user would like. Source: Hsuan-Tien Lin
  • 64. BDI for National Security • TIA after (11/9) • NATGRID after the Mumbai attack (26/11) – We could have stopped both if we had connected the pieces of intel from all security agencies with the info tracked from suspects.
  • 65. More Applications • Building a Stock Investment Strategy Model • Predicting Customer Transaction Behavior • Failure Prediction • Opinion Mining to Determine User Sentiments • Financial Loss Prediction • Insurance Claim Prediction Model • Bond Trade Price Prediction • Prediction of Number of Days in the Hospital • Accelerating Discovery of Drugs for Mutants of H1N1 • Molecular Activity Prediction • Job Recommendation Engine. Source: https://insofeprojects.wordpress.com/insofe-projects/
  • 67. A first course on BDI (Day: Topics) • Day 1 FN: BDI: The Beginning; DFS and Map-Reduce; Distributed Graph (Pregel); Page Rank algorithm • Day 1 AN: BDI Tools Landscape; Dremel and Big Query; Naïve Bayes Classifier • Day 2 FN: TF-IDF, Jaccard and Cosine; Collaborative filtering; Shingling, Minhashing; Locality Sensitive Hashing • Day 2 AN: Scala Basics for MR apps; Practice session; More fun with Scala • Day 3 FN: Spark projects using Scala • Day 3 AN: Student project ideas; Q&A
  • 68. M.S. Options in USA (University: Program) • Stanford University: M.S-CS, Specialization in Information Management and Analytics; four-course graduate certificate in mining massive datasets • Northwestern University: Master of Science in Analytics • DePaul University: Master of Science in Predictive Analytics • North Carolina State University: Master of Science in Analytics • University of Ottawa, Canada: M.Sc in Analytics • University of Connecticut: MS in Business Analytics and Project Management. Source: informationweek.com, IBM Director Dr. Spohrer's short list
  • 69. PG options in India (Institute: Program) • Indian School of Business: Certified Program in Business Analytics (CBA) • Great Lakes Institute of Management: PGP in Business Analytics • IIM Bangalore: Analytics Essentials, BAI • IIM Ahmedabad: Advanced Analytics for Management. Source: AnalyticsVidya.com, analyticsindiamag.com
  • 70. Road Ahead: “The ultimate search engine would understand exactly what you mean and give back exactly what you want.” - Larry Page
  • 72. Tree is God and above all
  • 73. Prof.Ashok.R | +91-9943900101 | ashok@zettab.com ZettaB.com Big Data Training in Coimbatore

Editor's Notes

  1. Watson was developed by 25 researchers over four years. The software runs on a supercomputer with 2,880 IBM Power750 cores, or computing brains, and 15 terabytes of memory. One of Watson’s advantages is that it can hit the buzzer to answer a question faster than any human possibly can: six to 10 milliseconds. Watson won $1 million and all of its winnings will be donated to charity. Watson is an analytical computing system that specializes in natural human language and provides specific answers to complex questions at rapid speeds. Watson cannot respond to video or audio clues, so they were omitted by the Jeopardy! producers.
  2. An Osborne Executive portable computer from 1982, with a Zilog Z80 4 MHz CPU, and a 2007 Apple iPhone with a 412 MHz ARM11 CPU; the Executive weighs 100 times as much, has nearly 500 times as much volume, cost approximately 10 times as much (adjusted for inflation), and has about 1/100th the clock frequency of the smartphone.
  3. “House of Cards” is one of the first major test cases of this Big Data-driven creative strategy. For almost a year, Netflix executives have told us that their detailed knowledge of Netflix subscriber viewing preferences clinched their decision to license a remake of the popular and critically well regarded 1990 BBC miniseries. Netflix’s data indicated that the same subscribers who loved the original BBC production also gobbled down movies starring Kevin Spacey or directed by David Fincher. Therefore, concluded Netflix executives, a remake of the BBC drama with Spacey and Fincher attached was a no-brainer, to the point that the company committed $100 million for two 13-episode seasons.
  4. http://whatsthebigdata.com/big-data-startups/