Published on Sep 16, 2015
The Internet of Things (IoT) is beginning to impact every aspect of our lives. We now do almost everything with our mobile devices – from turning on a coffee pot to counting our daily steps, even turning off the lights and locking the doors of our homes. The digital and physical universes are merging, creating massive amounts of data. And it isn't just IoT: IDC predicts that by 2020 we'll create 44 trillion gigabytes of data, much of it unstructured.
We're all aware that the dramatic increase in unstructured data will have an impact. But do you really know how to set up your business and your team for success? Are you truly future-proofed, or just hoping the patches will hold?
To successfully manage the big data deluge, companies must adopt strategic approaches that let them not only manage big data, but benefit from and even monetize it. Data is becoming a board-level discussion. NoSQL is emerging as a real game changer for organisations looking to leverage their data for competitive advantage and move away from complicated, expensive, and even outdated technologies. But for many, NoSQL is still untested and untried, and it requires a leap of faith that few enterprise businesses find easy to take.
The decisions you make on your data strategy are becoming both your greatest challenge and your greatest opportunity. Come discover business-critical insight into how to take advantage of emerging technologies for maximum business impact.
I discuss how big data will affect enterprises, including its benefits and challenges, as well as steps organizations can take to manage the big data deluge. This talk is designed to break down the true situation, uncover the facts and implications of both your existing and future technology stacks, demystify the challenge, and provide the insight you need.
3. 80 MILLION PEOPLE
1300 MESSAGES A SECOND
24 Hours a Day / 365 Days a Year
Information is requested and amended more
than 2.6 BILLION times a year
Has enabled over 42 MILLION Summary
Care Records to be created and stored
Has transmitted over 1.3 BILLION
prescription messages. 1800 messages/s
£21m savings in first year. 700hrs saved
every day
NHS Data Spine
4. Smart Thermostat Market is expected to grow from $146.9
million in revenue in 2014 to $2.3 billion by 2023.
-Navigant
THE CONNECTED HOME
Store historical data for analytics
Store User Schedules
Sessions Storage for Connected Users
5. 1.5 TRILLION RECORDS PER DAY
400% Year Over Year Increase
Ability to Monitor entire IT
environment from Single Portal
Array of Real-Time Statistics
and Insight
Centralize & Correlate Events,
Alerts, and Notifications
6. Cloud-based management tools and data analytics usually only
accessible to larger companies.
Millions of transactions each day.
POS FOR OVER 20K SMALL
BUSINESSES
Quick Service
Retail
Restaurant & Bar
7. 20 TERABYTES OF DATA
PER DAY BILLIONS OF
MOBILE DEVICES
10 BILLION data transactions a
day – 150,000 a second – Apple
Forecasting 2.8 BILLION locations
around the world
Generates 4GB OF DATA every
second
We’re focusing on helping
people make better decisions
with the weather.
8. WEATHER FORECAST
PREDICTS WALMART’S SALES
Ideal BERRY weather turns out to
be low wind with temperatures
below 28C.
People are more likely to eat STEAK
when it's warm out with higher winds
but no rain, but not if it gets too hot.
9. WHAT ARE THE IMPLICATIONS?
Enterprises must choose: Modernize or Sink!
• Adopt technologies to successfully manage data generated
by new data sources and consumed by users accessing
complex data types from around the world, 24 hours a day.
• Organizations must make plans to handle data ingestion,
data storage, and analytics in order to bring value to the
business.
• Increasingly look to distributed systems to help manage the
influx of traffic, and ease current operational challenges.
10. DISTRIBUTED WORKLOADS
[Diagram: client-server era – many apps virtualized onto one big server; cloud era – one app aggregated across many small servers]
Client-Server Era:
SMALL APPS
BIG SERVERS
ONE LOCATION
Cloud Era:
BIG APPS
SMALL SERVERS
MANY LOCATIONS
11. Everything works
at small scale
What happens when
something goes wrong
The customer
experience matters
WHY DISTRIBUTED SYSTEMS?
Scale out, up and down predictably
and linearly
Survive server, network or data
center failures
Data locality enables data operations
close to end-users
DISTRIBUTED SYSTEMS
DEVELOPERS OPERATIONS CUSTOMERS SALES
12. The growing hype surrounding
data lakes is causing substantial
confusion in the information
management space
Gartner
Challenge 1 – Isolation of Data
13. Challenge 2 – Consistency of Data
To Be or Not To Be
Consistent?
Understand the
question…
[Diagram: CAP triangle – Consistency, Availability, Partition tolerance; you can't have all three]
The CAP Theorem
Dr Eric Brewer
Describes the trade-offs involved in
distributed systems
14. Challenge 3 – Data Gravity
[Diagram: Data Gravity – as data grows over time, apps and services are pulled toward it by lower latency and higher bandwidth]
15. “Perhaps the biggest challenge is that the IoT has the
potential to generate orders of magnitude more data
than any other source in existence today. So, in the
world of the IoT we will test the limits of ‘big.’”
Bill Franks, Chief Analytics Officer for Teradata
Putting Challenges In Perspective
IoT – catchy phrase, but what does it really mean to your daily life? It means data is being captured, leveraged and enabled to make life easier…
For example, let's take a day in the life of interconnected devices and big data… look at how it might play out on the day I headed out the door to fly here…
Phone wakes me to nudge me to start a cup of coffee
One machine telling another machine to start something at my convenience
Quick cup of coffee
Work out in gym – tracking my data to synch later
Show that I’m losing in an activity competition to my wife
My Apple Watch chimes in that I need to get out the door or I'll miss my flight, which is still on time
Off to the airport…
Driving away, I realize I didn't lock the front door – and if I don't do it, my wife never will
Pull up my Smartthings app and lock the door and turn my office lights off
Just one more device enabling another in my world and collecting data on how I live along the way
Everything up to here has been about end devices enabling us while they collect data to allow us to be more efficient… improving our life.
However, it doesn’t stop there. End devices are collecting vast quantities of data to make us more efficient in millions of ways we don’t even notice…
GE & Pivotal have been capturing terabytes of real-time data to analyze and improve fuel usage, travel patterns, engine efficiencies and more, to optimize consumption and increase safety
IoT & Big Data isn't about after-the-fact review – it's about real-time applicability of the data and what it can do
GE's Flight Efficiency Services delivered Alitalia $46m in fuel savings through real-time analysis
IoT, Big Data, Machine-to-Machine – whatever you call it, it's beginning to impact every facet of our life…
http://www.forbes.com/sites/ptc/2014/06/23/will-the-internet-of-things-revolutionize-the-aircraft-industry/
All of this interaction and data collection is happening around me before I ever even get to the airport. Data that will be used to improve my daily life and help the companies I choose to do business with provide me more services and products that might be of interest to me.
And when I jumped on the plane to come here, companies like GE and Pivotal were capturing terabytes of data in real time on fuel usage, travel patterns, engine efficiencies and more, all to help streamline support, optimize consumption and increase safety.
IoT isn't about big data after the fact; it's becoming more about real-time applications that drive efficiencies – GE's Flight Efficiency Services saved Alitalia $46m in fuel costs through real-time analysis of engine data, and they're only beginning.
Interconnected world means greater efficiencies at every level
UK gov Spine project – giving 80 million Brits quicker access to medical info and prescription medicine
The first step has already generated 1.3B prescription messages – however, the long-term vision is to connect every medical data point together and start drawing correlations, to proactively provide services or ensure that when emergency activities happen, everything is known – unlimited advancements with access to big data
http://systems.hscic.gov.uk/spine
Tracking every data point for 80 million people to improve medical care and support. Right now it handles every prescription needed across the UK, and over time it will integrate every touch point in a person's daily medical needs, from prescriptions to emergency surgery.
Every interaction creates another data point for correlation and support.
The smart thermostat market is expected to grow from $146.9 million in 2014 to $2.3 billion by 2023 as part of "the connected home" and the Internet of Things (IoT).
The number of homes in North America and Europe with a smart thermostat grew by 105% to 3.2 million in 2014, according to research from Berg Insight.
Emerson Climate Technologies Smart Thermostats collect and store historical data for analytics.
Store user schedules and sessions
"We are already working with partners to develop an Application Programming Interface (API) for energy management solutions, and see this as just a piece of a larger connected home platform for Emerson Climate Technologies." – Ed Purvis, executive vice president at Emerson.
Use the service-support example of a breaking furnace
Boundary provides the ability to monitor an entire IT environment from a single portal.
Real-time statistics and insight.
Centralize and correlate events, alerts, and notifications.
Boundary agents collect and pass back an average of 1.5 trillion records per day representing a year-over-year increase of 400%.
A mid-size company can process 1.5 trillion transactions in real time, cost-effectively
iPad-based POS systems enabling easier commerce for small business
Over 10,000 small businesses with solutions for retail, quick service, and restaurant & bar POS
Reporting, analytics, inventory management, and marketing tools
Cloud-based, auto updating service.
Creating a POS experience for small businesses that 3–5 years ago would have been available only to large chains, at much greater cost – think what they can do to make recommendations with that data over time
A company that understands the power of Big Data intimately is The Weather Company (TWC)…
Collect 10B data transactions a day from Apple alone
They currently collect more weather data than anyone in the world, including governments, and they think they're only scratching the surface – and they're adding 20TB of data a day.
http://www.computerweekly.com/news/2240228147/CIO-interview-Bryson-Koehler-CIO-of-The-Weather-Company
http://www.informationweek.com/big-data/software-platforms/big-data-reshapes-weather-channel-predictions/d/d-id/1112776
Wal-Mart’s #2 ad spend – TWC
Draw correlations in real time to increase the daily take
Ideal berry weather or steak weather – localized and correlated with the POS data from Wal-Mart's system
More opportunity is created by enabling big data workloads across diverse geographies and environments
http://adage.com/article/dataworks/weather-forecast-predicts-sales-outlook-walmart/295544/
I'm sure if you were asked who Wal-Mart's second-largest advertising spend was with, you would not have said TWC before today.
When you have access to such large amounts of data, you can draw correlations in real time to help drive your business; the example above becomes a real-time reaction to what's happening that day in that geographic location.
New business models are being enabled by big data workloads running across diverse localities and environments…
The Internet of Things and big data explosion is here and it’s time to choose: modernize or sink? In a global economy companies must adopt strategic technologies to successfully manage data generated by new data sources and consumed by users accessing complex data types from around the world, 24 hours a day.
How the industry has changed…
The 90's: on-premise and one location – now, a horizontal world
It's everywhere, from public cloud and SaaS services, to service providers doing dedicated private hosting or hardware, to local colo of your own equipment.
Distributed workloads – a challenge I've taken on through T3, AWS and now Basho
The 90's into the early 21st century: client-server architecture. The 21st century and beyond: a HORIZONTAL WORLD. No longer just in your data center – it's everywhere, from public cloud and SaaS services, to service providers doing dedicated private hosting or hardware, to local colo of your own equipment. We are becoming a world of "distributed workloads" – an area I'm quite familiar with through my current and past companies…
Companies like Facebook, Amazon and Google built huge distributed systems with strict requirements around scalability, fault tolerance and global footprint – the same concepts must now be considered by the enterprise
Distributed systems that scale out horizontally – Assuming failure and latency is part of the equation
Must ensure data is close to the user to reduce latency
You’ll see more and more enterprises distributing components of their workloads on private and public clouds.
Concepts of governance, orchestration and availability are taking on whole new meanings
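Distributing data across those locations starts with placement. A minimal sketch of consistent hashing – the placement technique Dynamo-style stores such as Riak are known to use – follows; the node names and vnode count are illustrative assumptions, not any real cluster's configuration:

```python
import hashlib
from bisect import bisect_right

class HashRing:
    """Minimal consistent-hash ring: each node owns many small segments
    (virtual nodes), so adding or removing a node remaps only the keys
    that fall on that node's segments."""

    def __init__(self, nodes, vnodes=64):
        self.ring = []  # sorted list of (position, node)
        for node in nodes:
            for i in range(vnodes):
                digest = hashlib.md5(f"{node}:{i}".encode()).hexdigest()
                self.ring.append((int(digest, 16), node))
        self.ring.sort()

    def node_for(self, key):
        pos = int(hashlib.md5(key.encode()).hexdigest(), 16)
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect_right(self.ring, (pos,)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")  # deterministic: same key, same node
```

Because placement is a pure function of the key, any client anywhere can compute where data lives without a central coordinator – one reason these systems scale out horizontally.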
As you tackle these challenges to leverage new opportunities, keep three things in mind…
There is growing hype around data lakes. Data lakes focus on storing disparate data and ignore how or why data is used, governed, defined and secured.
Data in itself means nothing, its what you do w/ it in the distributed workload that adds value
By the way, if I hear one more exec say "we have a data lake" or "reservoir" and then act like everything's covered, I'm going to puke
Collecting data is only the beginning; how you leverage it is the goal – think through the implications, otherwise you've just bought yourself an expensive archival system
If data is isolated and not accessible, you've lost – ensuring that data doesn't get stranded in data lakes but stays connected across the distributed system is key
http://www.gartner.com/newsroom/id/2809117
Strong consistency vs eventual consistency – relational DB vs unstructured data – it's no longer a write-once, read-many world
All these workloads create data, and that data resides in different locations based upon who's running it
You no longer control all the variables, so assume that latency, availability and consistency will factor into your end-user experience
Chaos Monkey, if you're a Netflix fan – chaos is the norm and will impact the consistency of your data; factor it in
Trade-offs between consistency and availability must be considered in distributed systems
- Consistency, Availability and Partition Tolerance
In the world of relational databases, strong consistency has reigned as a requirement. In this new distributed world, use cases must weigh strong consistency against eventual consistency. It's no longer a write-once, read-many world.
All these workloads create data, and that data resides in different locations based upon who's running it. You no longer control all the variables, so assume that latency and consistency will factor into your end-user experience – or you'll be disappointed when something times out or delivers inaccurate data because it couldn't account for a component being unavailable. New words to live by: assume failure. Chaos Monkey, if you're a Netflix fan – chaos is the norm, and it will impact the consistency of your data, so factor it in.
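The strong-vs-eventual trade-off can be made concrete with a tunable-quorum sketch, in the style Dynamo-inspired stores such as Riak expose via N/R/W parameters. The in-memory `Replica` class and the particular subsets written and read below are illustrative assumptions, not a real client API:

```python
# N = 3 replicas; writes reach only W of them, reads consult only R.
class Replica:
    def __init__(self):
        self.version, self.value = 0, None

def quorum_write(replicas, w, value, version):
    # Only w replicas acknowledge; the rest stay stale (e.g. partitioned).
    for rep in replicas[:w]:
        rep.version, rep.value = version, value

def quorum_read(replicas, r):
    # Ask r replicas and trust the highest version seen.
    answered = replicas[-r:]  # deliberately a different subset than the write
    return max(answered, key=lambda rep: rep.version).value

replicas = [Replica() for _ in range(3)]      # N = 3
quorum_write(replicas, w=2, value="v1", version=1)

# R + W > N (2 + 2 > 3): the read quorum must overlap the write quorum,
# so at least one answering replica holds the newest value.
strong = quorum_read(replicas, r=2)           # "v1"

# R + W <= N (1 + 2 = 3): the lone replica asked may be stale.
maybe_stale = quorum_read(replicas, r=1)      # None here
```

With R + W > N a read always intersects the write quorum; relax either knob and you trade consistency for availability and latency – exactly the choice the CAP theorem says you cannot avoid.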
Data has weight
My friend Dave McCrory, Basho's CTO, wrote about Data Gravity – a foundational thesis behind the AWS and Equinix business models
Wherever data comes to reside, it tends not to move around much
Apps will naturally build around data to ensure reduced latency and drive consistency. Factor it into your distributed workloads because it will happen.
Distributed systems must be designed to ensure low latency and high throughput to ensure a great user experience across the globe.
Factor these things into your data strategy and you'll be better positioned to leverage data as an enabler, versus being run over by it
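A back-of-envelope sketch of why gravity wins – the query count and round-trip times below are illustrative assumptions, not measurements:

```python
# A chatty request that makes many small sequential queries against a
# data store, issued from near and from far.
queries_per_request = 20
local_rtt_ms = 1.0          # app co-located with its data
cross_region_rtt_ms = 80.0  # app a continent away from the data

local_total_ms = queries_per_request * local_rtt_ms          # 20 ms
remote_total_ms = queries_per_request * cross_region_rtt_ms  # 1600 ms
```

Twenty round trips turn 1 ms of local latency into 20 ms, but 80 ms of cross-region latency into 1.6 seconds – so the app moves toward the data, not the other way around.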
I don't want to block drinks, so I'll be around afterwards if anyone has questions
With these challenges comes great opportunity.
The ability to capture data to enable your business is exciting, and some of the projects being worked on now will literally change your everyday life. However, for those of us working on distributed workloads, don't assume that just having the data is enough – it's only the start.
Thank you.