Seminar on Big Data: Current Trends and Technology Landscape
Event Details:
Introduction to Big Data - Bijilash Babu, EY. (10:00 AM - 10:40 AM)
What is Big Data?
Why is it relevant?
Leveraging value from Big Data
Big Data Analytics
A Data-driven world
2. Big Data Analytics
Roadmap
❖ Data Deluge
❖ What’s Big Data?
❖ Why is it relevant?
❖ Is it just a hype?
❖ Hypothesis to a Data-driven world
❖ How do we leverage value from Data?
❖ What’s Analytics?
❖ Big Data Ecosystem
❖ Interesting Case studies
3. Big Data Analytics
Data Deluge
❖ Digitalisation [image]
❖ Sensors [image]
❖ Retailers [Amazon]
❖ Utilities[Smart meter]
❖ Automobiles[car cluster]
❖ Social media[Twitter, FB]
❖ Medical records
❖ Banking Transactions
❖ By 2020, 40 Zettabytes of data will be created
❖ An increase of 300 times from 2005
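The two figures above can be combined into a quick back-of-the-envelope check. The sketch below uses only the numbers quoted on the slide (40 ZB by 2020, a 300-fold increase from 2005) and assumes smooth compound growth, which is a simplification; the function names are illustrative.

```python
# Figures quoted on the slide: 40 ZB by 2020, 300 times the 2005 volume.
ZB_2020 = 40.0
GROWTH_FACTOR = 300.0
YEARS = 2020 - 2005


def implied_2005_volume_zb(volume_2020_zb: float, factor: float) -> float:
    """Volume in 2005 implied by the 2020 volume and the growth factor."""
    return volume_2020_zb / factor


def annual_growth_rate(factor: float, years: int) -> float:
    """Compound annual growth rate that yields `factor` over `years` years.

    Assumes steady compound growth, which is a simplification.
    """
    return factor ** (1.0 / years) - 1.0


if __name__ == "__main__":
    base = implied_2005_volume_zb(ZB_2020, GROWTH_FACTOR)
    rate = annual_growth_rate(GROWTH_FACTOR, YEARS)
    # 1 ZB = 1000 EB (decimal units)
    print(f"Implied 2005 volume: {base:.3f} ZB (about {base * 1000:.0f} EB)")
    print(f"Implied annual growth: {rate:.1%}")
```

Under these assumptions the 2005 volume comes out at roughly 0.13 ZB, and the implied growth is a little under 50% per year.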
5. Big Data Analytics
Big Data
❖ Big data is high-volume, high-velocity and high-variety
information assets that demand cost-effective, innovative
forms of information processing for enhanced insight and
decision making. - Doug Laney, Gartner
❖ Big Data software describes a new generation of software
and architectures designed to economically extract value
from very large volumes of a wide variety of data by enabling
high-velocity capture, discovery, and/or analysis. - IDC
❖ Big data is the kind of data that is difficult to extract value
from.
7. From Hypothesis to a Data-driven World
Beer and Diaper
❖ Teradata discovered an unlikely correlation
❖ A relationship between retail sales of beer and diapers!!!
❖ What could be the hypothesis?
❖ It would help, if you have more data, say other attributes…
❖ Purchases happened between 5:00 and 7:00 pm
❖ Young fathers were buying diapers on their way home from work
❖ …and picking up something for themselves at the same time.
► Are Pools more dangerous than guns?
► When it comes to children: a swimming pool is 100 times more
deadly.
[Table: Customer ID | Attribute 1 | Attribute 2 | … | Attribute n]
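The beer-and-diapers story is usually explained with market-basket measures: support, confidence and lift. The sketch below computes them with plain Python. The baskets are hypothetical and invented for illustration only; they are not the retailer's data, and the slides do not say which method was actually used.

```python
from typing import List, Set

# Hypothetical toy baskets, invented for illustration only.
BASKETS: List[Set[str]] = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"beer", "diapers", "milk"},
    {"bread", "chips"},
    {"milk", "bread", "diapers"},
]


def support(baskets: List[Set[str]], items: Set[str]) -> float:
    """Fraction of baskets that contain every item in `items`."""
    return sum(1 for b in baskets if items <= b) / len(baskets)


def confidence(baskets: List[Set[str]], antecedent: Set[str],
               consequent: Set[str]) -> float:
    """Estimated P(consequent | antecedent) over the baskets."""
    return support(baskets, antecedent | consequent) / support(baskets, antecedent)


def lift(baskets: List[Set[str]], antecedent: Set[str],
         consequent: Set[str]) -> float:
    """Confidence relative to the consequent's baseline frequency.

    A value above 1 means the items co-occur more often than chance.
    """
    return confidence(baskets, antecedent, consequent) / support(baskets, consequent)


if __name__ == "__main__":
    print("support  :", support(BASKETS, {"diapers", "beer"}))
    print("confidence:", confidence(BASKETS, {"diapers"}, {"beer"}))
    print("lift     :", lift(BASKETS, {"diapers"}, {"beer"}))
```

With this toy data, 60% of diaper baskets also contain beer, against a 50% baseline, giving a lift of 1.2. A lift above 1 signals correlation only; the "young fathers" explanation still needs the extra attributes mentioned above.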
8. Big Data Analytics
From Hypothesis to a Data-driven World…
❖ It’s good to look at data analytically and let the numbers speak for themselves
❖ Non-causal relationships are relevant, but difficult to establish manually.
❖ Before Big Data, such large correlation analyses were not practical
❖ Wine Quality = 12.145 + 0.1117 × winter rainfall + 0.06 × temp - 0.003 × harvest rain
❖ Even F = ma needed correction in some cases; Einstein had to correct it
❖ Airfare prediction, Oren Etzioni, Seat buy-back!
❖ Wal-Mart: Strawberry Pop Tarts and Hurricane
❖ Steve Jobs, gene sequencing, cancer treatment
❖ Discharged patients back with issues: MSR and MS Amalga
❖ Normal case: congestive heart failure
❖ Surprising case: initial complaints had words like depression, distress, …
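The wine-quality equation on this slide is an ordinary linear model, so it can be evaluated directly. The sketch below uses the coefficients exactly as printed on the slide; the published regression may use different values and specific units, which the slide does not state. The function name and the sample inputs are hypothetical.

```python
def wine_quality(winter_rainfall: float, temperature: float,
                 harvest_rainfall: float) -> float:
    """Evaluate the linear model quoted on the slide.

    Coefficients are copied from the slide as printed; the units of the
    inputs are not given there, so any inputs used here are assumptions.
    """
    intercept = 12.145
    return (intercept
            + 0.1117 * winter_rainfall
            + 0.06 * temperature
            - 0.003 * harvest_rainfall)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the direction of each effect.
    dry_harvest = wine_quality(winter_rainfall=10, temperature=17, harvest_rainfall=50)
    wet_harvest = wine_quality(winter_rainfall=10, temperature=17, harvest_rainfall=300)
    print(f"dry harvest: {dry_harvest:.3f}")
    print(f"wet harvest: {wet_harvest:.3f}")
```

The signs carry the message: more winter rain and a warmer growing season raise the predicted quality, while rain at harvest lowers it.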
10. Big Data Analytics
The Analytics Test
❖ A simple litmus test of where people are on their analytics journey.
❖ If a CIO says, “my business is unique,” they are just beginning.
❖ If a CIO says, “can you tell me, based on your experience, what is happening in
another industry that I might be able to apply?” they really get it, because they
realize that all of these techniques, while we use different terms and different data,
are the same techniques across industries - Keith Collins, SAS
12. Big Data Analytics
The brain connection
❖ Researchers in astronomy and high-energy physics (HEP) depend on data
❖ Each week, HST downlinks approx. 120 GB of data
❖ In 30 seconds, the human brain produces as much data as the Hubble Space
Telescope has produced in its lifetime.
- Nature, Vol. 499, 18 July 2013
❖ How about creating software that works like the brain?
❖ How could we do this?
❖ Biologically inspired algorithms
❖ Machine learning
❖ Neural networks
❖ Data mining
❖ Linear algebra
❖ ……
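As a small taste of the "neural networks" item above, here is a minimal sketch of a single perceptron, one of the simplest biologically inspired learners. It learns the logical AND function from four examples. The task, the learning rate and the epoch count are assumptions picked for illustration; the slides do not prescribe any particular algorithm.

```python
from typing import List, Tuple

# Training data for logical AND: ((input 1, input 2), target output)
AND_DATA: List[Tuple[Tuple[int, int], int]] = [
    ((0, 0), 0),
    ((0, 1), 0),
    ((1, 0), 0),
    ((1, 1), 1),
]


def predict(weights: List[int], bias: int, x: Tuple[int, int]) -> int:
    """Step activation: fire (1) if the weighted sum is positive."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train_perceptron(data: List[Tuple[Tuple[int, int], int]],
                     epochs: int = 20) -> Tuple[List[int], int]:
    """Classic perceptron rule with a learning rate of 1."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(weights, bias, x)
            # Nudge each weight toward the correct answer when wrong.
            weights = [w + error * xi for w, xi in zip(weights, x)]
            bias += error
    return weights, bias


if __name__ == "__main__":
    w, b = train_perceptron(AND_DATA)
    for x, target in AND_DATA:
        print(x, "->", predict(w, b, x), "(expected", str(target) + ")")
```

Because AND is linearly separable, the perceptron rule is guaranteed to converge on it. Real brain-inspired systems stack many such units and use far more data.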
13. Big Data Analytics
Analytics
► “Analytics leverage data in a particular functional process (or application) to enable
context-specific insight that is actionable.” - Gartner
❖ Use of non-trivial mathematical and statistical
methods, or any other algorithms, to extract hidden
insights from large amounts of data.
16. Healthcare
❖ “We already have extensive clinical, health financing and patient-
care administrative data from our IT systems.
❖ How do practitioners approach data analytics in the areas of
genomics, governance, nursing and clinical care?
❖ Combining this data with lifestyle, geospatial, behavioral and
genotype data will provide us with better insights into the health
risks of our population and discover new co-relationships
between data sets.”
❖ Automate best practices; move care to less expensive venues
and technology-involved care modes
17. Watson’s journey…
❖ Who is he?
❖ What does he do now?
❖ Sr. Resident at Memorial Sloan-Kettering Cancer Centre
❖ Help improve medical decision making
❖ Does ML and HPDA on Medical Data
❖ Assists other doctors and nurses with his insight
19. Oil and Gas, another promising vertical
❖ Massive amounts of data
❖ Front-end dashboards are widely used.
❖ Predictive maintenance techniques are used
❖ Historical data is not yet used for value addition
❖ Often due to semi-structured or unstructured data
❖ Expensive simulations could be avoided
22. Mobile data trends
❖ IDC forecasts a 30-fold increase in data volumes by 2020.
Mobile is playing a large part in driving this explosion in
data.
❖ Mobile big data isn't only a function of smart-phone
penetration and consumer usage patterns. The data is also
created by apps or other services working in the background.
23. Transportation
❖ Cars today are stuffed with sensors, chips and SW
❖ 30% of the price is due to the electronic components.
❖ Data contains info about how car parts work while on the road.
❖ The same data can be used to tune inefficient parts.
❖ A large car maker spotted issues with a sensor in the tank
❖ Developed and licensed a patent to the German OEM
❖ Though the car maker had an internal team, they went for an
extern..
24. On racing circuits
❖ Racing teams now eat Big Data for breakfast, lunch and dinner. And for
snacks in-between. -Doug Laney, Gartner.
❖ Data from clutch, gearbox, fuel system, oil,…, as well as the drivers’ health.
❖ Crunching them (1GB) between races, 1000 simulations during the race.
❖ After just a couple of laps they can predict the performance of each
subsystem with up to 90% accuracy.
❖ Tune these systems during the race to help the team win.
❖ And for each season, new cars are built from the ground up using 95% new
parts designed using this data.
❖ Proprietary suite of analytics tools for real time storage, analysis,
visualization and manipulation of data.
25. Transport
❖ A major jet engine manufacturer completely transformed its
business by not just making jet engines but also maintaining them
❖ The company no longer just sells jet engines but offers to monitor
them, charging customers for usage.
❖ From their office in the UK, they monitor and analyze the
performance of more than 3,700 Jet engines worldwide to spot
problems before the breakdown occurs.
❖ They repair or replace engines as required. Service now accounts
for 70% of the civil-aircraft engine division’s income.
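Spotting problems before a breakdown usually starts with flagging readings that drift away from recent behaviour. The sketch below is one simple way to do that, a rolling z-score over a sensor series. It is not the manufacturer's method, which the slides do not describe; the sensor values, window size and threshold are all hypothetical.

```python
from statistics import mean, stdev
from typing import List


def flag_anomalies(readings: List[float], window: int = 5,
                   threshold: float = 3.0) -> List[int]:
    """Return indices of readings that deviate from the mean of the
    preceding `window` readings by more than `threshold` sample
    standard deviations.
    """
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: no scale to compare against
        if abs(readings[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical exhaust-temperature readings with one sudden spike.
    temps = [600, 602, 599, 601, 600, 601, 640, 600]
    print("Anomalous reading indices:", flag_anomalies(temps))
```

On the sample series only the spike at index 6 is flagged. A production system would combine many sensors and physical models, but the principle of comparing each reading against its own recent history is the same.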