4. Can you capitalize on the data explosion?
Human interaction data
Per-day posting to Facebook
across 1.1 billion active users
4kB per active user
Digitization of Analog Reality
40,000 petabytes a day*
10m self-driving cars by 2020
[Graphic: per-sensor data rates for one car: cameras (front and rear) at 20–40 MB/sec; front ultrasonic sensors at 10 kB/sec; radar sensors at 100 kB/sec]
* Driver assistance systems only
7. What are core Memory-Driven Computing components?
Fast memory fabric
New and Adapted
8. HPE introduces the world’s largest single-memory computer
The prototype contains 160 terabytes of memory
– 160 TB of shared memory
– 40 physical nodes
– Dual-socket-capable ARMv8-A workload-optimized System on a Chip.
– Photonics/optical communication links, including the new X1 photonics module.
– Software tools designed to take advantage
of persistent memory.
9. How big is 160 terabytes?
The 160 terabyte prototype can simultaneously work with approximately 160 million books' worth of content:
every book in the Library of Congress,
5 times over
German Center for Neurodegenerative Diseases (DZNE)
• Currently seeing over 40X speed improvements: results that used to take more than 25 minutes now arrive in 36 seconds.
• We expect up to 100X more performance when we expand these learnings to the other components of the pipeline.
• DZNE has never been able to work with so much data at one time, which means finding hidden correlations and better answers than ever before, ultimately resulting in new discoveries to help cure neurodegenerative diseases.
11. Memory-Driven Computing
will allow you to maximize on-time flights
by having control of every state of the
plane, and the operations associated
with it, avoiding delays.
Simulate likely disruptions and store
an almost infinite series of “what-ifs”
so you can easily look up the
scenario and immediately know what to do.
Optimizing over a large search space
solves problems before they occur.
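The "store every what-if, then look it up" idea above can be sketched as a precomputed in-memory table. This is an illustrative toy, not HPE software: the disruption names, the decision rule, and the table layout are all invented for illustration.

```python
import itertools

# Illustrative sketch: precompute an outcome for every combination of
# disruptions once, then answer "what should we do?" with a single
# in-memory lookup instead of re-running a simulation.
disruptions = {
    "weather":  ["clear", "storm"],
    "crew":     ["available", "short"],
    "aircraft": ["ready", "maintenance"],
}

def best_action(scenario):
    """Toy decision rule standing in for an expensive simulation."""
    if scenario["aircraft"] == "maintenance":
        return "swap aircraft"
    if scenario["weather"] == "storm":
        return "reroute"
    if scenario["crew"] == "short":
        return "call reserve crew"
    return "depart on time"

# Enumerate every combination up front; a memory-driven system could hold
# an enormous number of such entries and serve each lookup instantly.
table = {}
for combo in itertools.product(*disruptions.values()):
    scenario = dict(zip(disruptions.keys(), combo))
    table[combo] = best_action(scenario)

# At decision time: one O(1) lookup.
print(table[("storm", "available", "ready")])  # -> reroute
```

With only three binary disruptions the table has 8 entries; the point of a huge shared memory pool is that the same pattern keeps working when the scenario space grows by many orders of magnitude.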
Two curves on chart build from left to right. Numbers appear over each data point. “Capability gap” arrow appears once the two curves start to diverge
Every two years, we’re creating more data than through all of history
Our ambitions are growing faster than our computers can improve.
The definition of real time is changing
The old world analyzes the past, which gives us hindsight
Real time means analyzing the new while it’s still new
Real time means insight and foresight
Can you capitalize on the data explosion?
99% of data created at the edge is discarded today
Deciding what to keep introduces bias
Bias precludes new insight
Raw data is priceless.
Someone is going to work this out and create whole new industries.
Will it be you, or your competition?
Data Source: K. Rupp, http://www.karlrupp.net/2015/06/40-years-of-microprocessor-trend-data/
IDC’s Data Age 2025 study, sponsored by Seagate, April 2017
[Next slide: The New Normal, if you want to go deeper into this graph]
Once social media had taken off, we started to talk about the vast amounts of data that were being processed. It was far more than the structured data we’d processed up to that point.
For example, by May 2016, Facebook was processing 4 petabytes of data a day. That’s twice the entire Walmart transaction (structured) database, every day. While this is high, it’s actually only 4KB per active Facebook user.
However, we’re about to take another leap in magnitude in the amount of data we’ll be processing. For example, just one self-driving car will generate 4 terabytes of data a day (as the graphic on the right nicely shows). That’s just one car. There will be 10 million of them by 2020. And that’s just self-driving cars. We’ll be generating such levels of data from IoT sensors all over our “physical world”.
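As a back-of-the-envelope check on how one car can reach terabytes per day, this sketch sums per-sensor data rates. The individual rates and the hours of operation are assumptions chosen for illustration, not Intel's exact breakdown:

```python
# Rough, illustrative arithmetic only: the sensor rates and daily driving
# time below are assumptions, chosen to show how per-car volume adds up.
sensor_rates_bytes_per_sec = {
    "cameras":    40 * 10**6,      # assumed aggregate camera feed, ~40 MB/s
    "lidar":      70 * 10**6,      # assumed, ~70 MB/s
    "radar":      6 * 100 * 10**3, # assumed six radar units at ~100 kB/s each
    "ultrasonic": 10 * 10**3,      # assumed, ~10 kB/s
}

driving_seconds_per_day = 8 * 3600  # assume 8 hours of operation per day
total_rate = sum(sensor_rates_bytes_per_sec.values())
bytes_per_day = total_rate * driving_seconds_per_day
terabytes_per_day = bytes_per_day / 10**12

print(f"~{total_rate / 10**6:.1f} MB/s -> ~{terabytes_per_day:.1f} TB per car per day")
```

Under these assumptions the sustained rate is roughly 110 MB/s and the daily volume lands at a few terabytes per car, the same order of magnitude as the figure quoted above.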
Why so much data? Because we are moving to an era where we sense reality. And, unlike structured data and social media data, reality is not digital. We need to create a digital version of our analog reality. That takes a lot of data (actually, the amount of data generated is our choice: the higher the fidelity of our sensors, the more data we generate; and our analysis is better with high-fidelity sensors, so we will have a lot of data).
Walmart and Facebook, in their own ways, are extracting huge value from and monetizing their 2–4 PB of data (and may still be scratching the surface). When someone figures out how to extract value from that 40,000 PB a day, the things they will be able to do will seem like magic. To extract that value, we need new architectures and technologies: memory-driven architectures, AI/ML, and so on.
*This 40 EB per day figure is for the driver assistance/autonomous driving sensory systems only. Most modern vehicles today already have more than 100 microprocessors handling the vehicle’s operational technology, and all of that data is also thrown away, save the check-engine-light codes.
Note: HPE’s working set is 19PB across 20,000 databases
Car stats : Intel
Structured data SAS institute 2013
Facebook : brandwatch, May 2016
Number of self-driving cars : Business Insider, June 2016
[Next slide: Memory-Driven Computing]
[This is why Memory-Driven Computing is “Powerful” (addressing the first pillar). Note, this slide contains the single most important concept in the Memory-Driven Computing architecture.]
The challenge has always been building enough memory to keep up with compute. Memory has always been the scarce resource (never enough volume/resources).
Traditional computers chop up your information – the data – to match the limitations of the processor
Processor as gatekeeper
We flip that around and put the data first – bringing processing to the data
Processor almost irrelevant – can swap out to suit task
We call this Memory-Driven Computing
SoC, universal memory and photonics are the key parts of the architecture of the future
With this architecture, we can ingest, store and manipulate truly massive datasets while simultaneously achieving multiple orders of magnitude less energy/bit
Q: What is HPE doing here that is truly different?
A: New technologies are not substitutional - we’re re-architecting
[Next: Memory-Driven Computing concept build out…]
These are the key attributes that help us talk about Memory-Driven Computing, but please note this is not the actual framework that holds everything together.
Powerful: The shift to Memory-Driven Computing is a necessary evolution to overcome the limitations of today’s computing platform, which will soon reach its physical and computational limits and is unable to handle the unprecedented volume of data being created. Memory-Driven Computing will serve as the foundation for all technology, underpinning everything from the data center to the Internet of Things to the next supercomputers.
Open: HPE’s goal is to make Memory-Driven Computing an open ecosystem with many contributors. We believe the new architecture, combined with industry collaboration, can help solve the key challenges presented by today’s data explosion. This open collaboration is what’s behind our decision to put key development tools on GitHub, and why the company saw value in being a founding member of the Gen-Z Consortium (more about this later). We’re not trying to keep anyone out; we’re asking companies to join us in the open. Customers and developers ready to learn more can start by joining The Machine User Group (more information about this later).
Trusted: Today, security is bolted on as an afterthought, and the current design doesn’t leave much room for it. Modern attacks simply bypass anti-virus and strike below the OS, where they are undetectable. Memory-Driven Computing gives us the opportunity to start over and build in security from the start. We’re focused on support below the operating system to provide isolation, detection of unknown compromises, and encryption of data in flight, in use and at rest.
Simple: Current data models make it difficult to share and store data efficiently. Memory-Driven Computing radically simplifies the way you use, store and share data in real time, so you can manage your data efficiently. Benefit from a shorter path to persistence, rather than paying the cost of data translation as your data passes through various software layers.
[Next: Processor centric computing to Memory-Driven Computing]
The point of this slide is that you have to have all of these in order to have Memory-Driven Computing
[Next: Build anything with Memory-Driven Computing]
On May 16, 2017, Hewlett Packard Enterprise announced that the prototype of the first Memory-Driven Computing platform from The Machine research project is the largest single-memory system on the planet, capable of working with up to 160 terabytes (TB) of data at the same time. To put that into context, 160 TB would allow our new prototype to simultaneously work with the data held in every book in the Library of Congress five times over – or approximately 160 million books. We’ve never been able to work with data sets of this size in this way.
Based on the current system, HPE believes it could easily scale to a multi-exabyte single-memory system. In fact, we believe the technology can successfully scale to a nearly limitless pool of memory and will serve as the foundation for computing systems of the future. Every computing category and device – from smartphones to supercomputers – has something to gain from this new computing paradigm.
This work, which involves deriving insights from data at unprecedented levels, has implications for nearly every industry.
The image on this slide is the real 40 nodes that live in Ft. Collins, CO. Here are other assets that you might find helpful to show:
We also have a 360 degree video of our team working in the sandbox to show that it’s alive and well.
Video file: https://news.hpe.com/uploads/2017/05/HPE_360-with-Logo_V3.mp4
You tube link: https://www.youtube.com/watch?v=eu_OvJPyCdY&t=50s&index=1&list=PL0_ubpZ6vGcCmAhiKaqhT-4wEUtpebJGR
B-roll footage of the team collaborating meant as a “live feed” into Ft. Collins (~15 mins long): https://news.hpe.com/uploads/2017/05/Hour-Long-Live-View-Edit-04.mp4
Feel free to show the 2½-min. video “The Computer Built for the Era of Big Data”: https://www.youtube.com/watch?v=AE5VitamCEQc
We partnered with The Atlantic for the launch of The Machine 40-node prototype in Washington, D.C., where we focused on how HPE’s Memory-Driven Computing can help with the mission to Mars, which requires the most powerful computing system the world has ever seen: https://www.youtube.com/watch?v=IoVCOKh0Hcw&list=PL0_ubpZ6vGcCmAhiKaqhT-4wEUtpebJGR&index=4
Here is the blog announcement to reference https://news.hpe.com/a-new-computer-built-for-the-big-data-era/
The Large Scale Graph Inference (LSGI) application is running on the 40-node/160 TB prototype. To show that the prototype is alive and operational, we’re running the LSGI app on a security dataset of 200 million URLs and web links from the Internet in 2012, with a known list of benign and malicious websites, as well as access patterns based on data from HP’s DNS servers.
Based on the live LSGI app running on the prototype, the LSGI workload is running 20 million vertices on 40 nodes. The data is constantly changing, and LSGI re-computes the vertices’ values over about 100 iterations to reach a stable resolution/output.
When you get to the 98% mark, that’s when you start making business decisions. Imagine how much faster near real-time decisions become when reaching a stable resolution drops from 3 days to 3 minutes. Based on the current LSGI output, it makes rapid changes early on, and further improvements grow smaller the longer LSGI keeps running.
[Next: talk about The Machine prototype capacity]
For more technical explanation:
We looked at an application (Kallisto) that is part of a much larger computational pipeline used by DZNE, and ran it on both an emulated environment on the Superdome X and the 40-Node Prototype.
For that application, we saw over a 40x speed improvement (on the Superdome X) over published results. Some of the advantage came from better hardware (Superdome X), and some from refactoring our code to Memory-Driven Computing principles.
Our results on the 40-Node Prototype showed that as the number of nodes increases, we can increase the number of application instances linearly. This means that the overall work DZNE can do will scale linearly as nodes are added to their pipeline.
This is just one of many applications that are used by DZNE. If other applications show similar results, we believe that the overall computational pipeline can be made up to two orders of magnitude (100x) faster.
[Next: talk about our The Machine prototype node boards]