The velocity and volume of data are growing faster than ever before, and companies are looking for new methods to speed their data analytics. Using an innovative FPGA-based architecture, the Ryft ONE supercharges data analytics and provides you more value from your data.
2. Information—the fuel of business—is trapped in analysis platforms built on 70-year-old architectures.
3. Data volume and velocity challenge traditional computing methods
Traditional Approach:
• Commodity x86 based servers
• Cluster with open source software
• Scale for volume
• Scale for parallelism / performance
Challenges:
• High level languages can be inefficient
• Data intensive workloads drive in-memory solutions
• DRAM footprints at commodity prices are small
• Scaling out increases cost and complexity
4. Ryft delivers huge benefits in a small package.
Highest performance per watt and lowest total cost of ownership (TCO) of
any product on the market.
48 TB in 1U
• Data storage is abstracted as a set of Linux mount points
• Supports native encryption/decryption (AES-256) with no loss in performance
Simple API
• C library abstracts internal FPGA constructs to simplify programmability, allowing a programmer to invoke operations as simple function calls returning simple results
• Command line
• Web interface
Linux Front End
• Linux (Ubuntu 14.04 LTS) front end: standard build, unrestricted OS, apt-get
• API calls the FPGA fabric backend
• Linux services/protocols can be used: ssh/scp/rsync/sftp, standard monitoring agents, web services, security configuration
5. x86 Architecture vs. Systolic Arrays
[Diagram: in one ~100 ns clock cycle, an x86 core connects memory to a single processing element (PE); in the same ~100 ns cycle, an FPGA systolic array connects memory to many PEs in parallel.]
6. FPGA Benefits
x86:
• General-purpose computing
• Sequential in nature: problems are broken into a sequence of operations and processed serially
• Non-deterministic performance due to interrupts and memory allocation
• Increasing numbers of instructions mean increased overhead and increased power/cooling requirements
• Software can break problems down and bring parallelism between processors/cores and between servers, with output combined over interconnects
FPGA:
• Not general purpose: purpose-built algorithms, reprogrammable via firmware
• Parallel in nature: can execute many parallel operations in one clock cycle
• More output with less power and lower clock speed
• ~1000X fewer instructions to solve the same problem as x86
• 100% deterministic performance: no memory fetching or management, no interrupts
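A back-of-the-envelope model (illustrative only, not Ryft's actual design) makes the parallelism claim concrete: for a byte-wise pattern scan, a sequential core performs one comparison per cycle, while a systolic array with one processing element per pattern byte performs a whole pattern's worth of comparisons every cycle. The sizes below are assumed for illustration.

```python
# Toy cycle-count model of sequential vs. systolic pattern matching.
# Assumption: one comparison per cycle on the sequential core; one new
# data byte enters the systolic pipeline per cycle.

def sequential_cycles(data_len, pattern_len):
    """Naive scan: every alignment checks every pattern byte serially."""
    return data_len * pattern_len

def systolic_cycles(data_len, pattern_len):
    """Pipelined systolic match: results begin emerging once the
    pipeline fills, after (pattern_len - 1) cycles."""
    return data_len + pattern_len - 1

data_len, pattern_len = 1_000_000, 16
seq = sequential_cycles(data_len, pattern_len)
par = systolic_cycles(data_len, pattern_len)
print(f"sequential: {seq:,} cycles")   # 16,000,000 cycles
print(f"systolic:   {par:,} cycles")   # 1,000,015 cycles
print(f"speedup:    ~{seq // par}x")
```

For long patterns and streams the speedup approaches the pattern length, which is the intuition behind "many parallel operations in one clock cycle."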
8. The Ryft ONE is powered by a breakthrough in
Real-time Data Analysis.
The only 1U platform capable of analyzing streaming, historical,
unstructured, and multi-structured data in real-time at 10 GB/second.
Ryft ONE avoids bottlenecks that strangle conventional systems
by combining these two innovations:
The Ryft Analytics Cortex™
Ryft ONE leverages a massively parallel bitwise
computing architecture to deliver unprecedented
performance from the smallest possible form factor.
The Ryft Algorithm Primitives™ Library
Each Ryft ONE comes with a subscription to this
growing collection of pre-built algorithm components,
and an open API to leverage them.
+
9. “We see Spark Streaming scales nearly linearly to 100 nodes, and can
process up to 6 GB/s at sub-second latency on 100 nodes for Grep, 2.3
GB/s for the other, more CPU-intensive jobs”
UC Berkeley, “Discretized Streams: Fault-Tolerant Streaming Computation at Scale” (SOSP 2013)
http://www.cs.berkeley.edu/~matei/papers/2013/sosp_spark_streaming.pdf
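Working backward from the quoted figures gives a sense of scale: 6 GB/s across 100 nodes is roughly 60 MB/s of Grep throughput per node, so matching a 10 GB/s appliance at that per-node rate would take well over 100 nodes. A quick sanity check:

```python
# Back-of-the-envelope check on the quoted Spark Streaming numbers.
cluster_throughput_gb_s = 6.0   # Grep throughput, per the UC Berkeley paper
nodes = 100
per_node_mb_s = cluster_throughput_gb_s * 1000 / nodes
print(f"{per_node_mb_s:.0f} MB/s per node")           # 60 MB/s per node

# Nodes at that same per-node rate needed to reach 10 GB/s:
target_gb_s = 10.0
nodes_needed = target_gb_s * 1000 / per_node_mb_s
print(f"~{nodes_needed:.0f} nodes to match 10 GB/s")  # ~167 nodes
```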
10. Ryft transforms datacenter economics.
The Ryft ONE: Search = 10 GB/s, Term Frequency = 2.5 GB/s
Costly & Complex Clusters: Search = 6 GB/s, Term Frequency = 2.3 GB/s
11. Wikipedia Examples
• English XML Dump is offered by Wikipedia
• Total Corpus is 44GB
• Copying the data takes 44 seconds
• Fuzzy search would take 4.4 seconds
• Term Frequency would take 17.6 seconds
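The slide's timings are simple division: the 44 GB corpus over the claimed rate for each operation (assuming ~1 GB/s for the copy, 10 GB/s for fuzzy search, and 2.5 GB/s for term frequency):

```python
# Time to process the 44 GB Wikipedia corpus at each claimed rate.
corpus_gb = 44
rates_gb_s = {
    "copy": 1.0,            # assumed ~1 GB/s data copy
    "fuzzy search": 10.0,   # claimed Ryft ONE search rate
    "term frequency": 2.5,  # claimed term-frequency rate
}
times_s = {op: corpus_gb / rate for op, rate in rates_gb_s.items()}
for op, t in times_s.items():
    print(f"{op}: {t:.1f} s")
# copy: 44.0 s, fuzzy search: 4.4 s, term frequency: 17.6 s
```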
13. Data Exploration Use Case
• RDF—understanding of native formats
• Powerful no-index search
• Flexible query format with wildcarding
• Identify relationships between disparate data
14. Data Triage for Hadoop/Spark Use Case
[Diagram: raw data lands in HDFS and passes through MapReduce, NoSQL, Hive, and text-indexing stages before reaching the application; end-to-end latency of hours or days.]
15. Data Triage for Hadoop/Spark Use Case
[Diagram: Ryft ONE searches and minimizes the raw data at 10 GB/s, then ingests only the relevant subset into HDFS at 1-4 GB/s; results arrive in seconds.]
• Social media signal/noise
• Fuzzy searching at line rate
Example handles: @badguy1, @badguy2, @badguy01, @badboy01
Search: “badguy??”
16. Organizations who want real-time insights into all their data
Large data sets (changing, structured & unstructured, Text, Binary, Imaging)
High Velocity Data
• Logging
• Ad Data
• Twitter
Forensics & Legal Discovery
• Host based forensics
• E-discovery
Scientific Data
• Genomics
• Sensor Data
Financial
• Compliance
• Fraud Detection
Cyber Security
• PCAP
• Full packet capture
• Binary Analysis
Imagery Analysis
• Change Analysis
• High Performance Rendering
17. Revisiting Performance Results
Ryft ONE closes the industry’s data analytics performance gap
by combining the following into a single architecture:
Parallel FPGA architectures to accelerate performance
Dedicated storage/access/RAM
Elimination of data security performance bottlenecks
Elimination of operating system and high level language overhead
Minimizing the need to move data
Use Case         Single Ryft ONE Throughput   Spark Cluster to Match Performance
Search           ~10 GB/sec                   > 100 nodes¹
Fuzzy Search     ~10 GB/sec                   100-200 nodes²
Term Frequency   ~2.5 GB/sec                  100 nodes¹
18. Accelerate business insights with the only platform purpose-built
to simultaneously analyze any type of data—historical and
streaming, unstructured and multi-structured—
100X faster with 70% lower TCO.
The Ryft ONE: More data. Less center. Faster insights.
Legacy proprietary platforms are too slow and costly
No real-time performance; limited data formats
Priced out of the range of all but the largest enterprises
Hadoop/Spark running on clusters are slow, complex, and brittle
Significant technology, performance, and knowledge gaps remain
Slow and complex setup and maintenance; the x86 architecture is not sustainable
Demand for knowledgeable developers far exceeds supply
Need purpose built solutions that are open, high speed, and sustainable
Top ISV/OEMs working to unlock power of new architectures
Enterprises developing homegrown servers
Hyper growth emerging markets for applying HPC resources to data analysis
x86 servers are used universally across many problem areas:
Data analysis
Search
Simulation
Machine learning
Genome sequencing
Graph processing
Scale-out x86 clusters have advantages but also many drawbacks:
Increased node count to meet DRAM footprint requirements
Increased node count for CPU core requirements
Inefficient high level languages
Overhead of distributing data and combining results
Datacenter sprawl
Complex deployments
Increased operational cost
A New Approach is Needed
Highly distributed memory architectures turn complex analytics problems into I/O problems, because they must frequently move data between physically distributed memory, disk storage, processors, and networked nodes. The rising class of complex analytic workloads demands strong communications and near-real-time turnaround. Trying to partition (slice) these problems into smaller pieces that can run independently is like trying to cut a human into dozens of chunks and expecting each chunk to go on living.
Commodity hardware clusters using Hadoop/Spark are designed for compute-intensive workloads, not data analytics. Without purpose-built solutions for Big Data analytics challenges, IT has been forced to piece together a solution and scale out to larger and larger commodity hardware clusters that are strangled by I/O performance bottlenecks.
MapReduce/Hadoop tools were originally designed to run relatively simple, non-real-time tasks on highly distributed architectures such as clusters and clouds; these workloads frequently make the slow journey out to disk and back
Spark operates on similar principles but more efficiently — it saves up multiple tasks before going out to disk
JSON – JavaScript Object Notation; ODBC – Open Database Connectivity; OData – Open Data Protocol
Footprint Comparison
Years in the making, the Ryft ONE combines two proven innovations in hardware and software to optimize compute, storage and IO performance:
Fast Actionable Business Insights: analyze historical and streaming data at an unprecedented 10 gigabytes per second or faster
Traditional clustered systems approach Big Data analytics challenges by re-engineering old technologies to try to make them faster
Ryft’s revolutionary innovations in hardware and software dramatically reduce Mean Time to Decisions
High Velocity Data
These are use cases where the data arrives so rapidly that the indexing approaches don’t work well without expensive scaling and licensing.
Logging (enterprise level syslog or flume)
Ad data (Admeld)
Click stream (web logs)
Twitter firehose
Scientific Data
These are use cases where the data doesn’t format well for tokenizers and indexers.
Genomics (sequencing / bowtie and like algorithms)
Other sensor data
Financial
These use cases check multiple data sources to determine the legitimacy of an action. The turnaround time determines whether the result is an after-the-fact forensic finding or a prevented incident.
Compliance
Fraud Detection
Forensics and Legal Discovery
These users get data in a large package that can take vast amounts of time to index, and sometimes indexing isn't possible because unfamiliar formats can't be parsed and text-extracted. Our brute-force comparison methods sidestep many of these issues and allow analysts to find key pieces of data in seconds vs. days.
Host based forensics on disk images
E-discovery
E-mail
Databases
Documents
Messaging servers
Copier hard drive images
Cyber Security
PCAP
Full packet capture (includes payload analysis)
Binary analysis (malware/virus)
Configuration file diff checking
Imagery Analysis
Change analysis
Military airborne sensors
Security cameras
Aircraft radar
Astronomy
High Performance Rendering
The node/cluster configurations are noted in the footnotes in the slide, and also in the notes below. They were taken directly from published literature, which is why they differ across search/fuzzy/TF vs. Sort. Sort was a more recent publication which used higher-end hardware.
Each node in the Spark cluster for the search, fuzzy search and term frequency operations consisted of m1.xlarge EC2 nodes made up of 4 cores, 15GB RAM and 1.68TB storage each, as taken from an academic publication by UCB: http://www.cs.berkeley.edu/~matei/papers/2013/sosp_spark_streaming.pdf, and also from Amazon EC2 configuration information: http://aws.amazon.com/ec2/previous-generation/
Spark cluster configuration for the sort operation was taken from a more recent publication (https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html), which called out extremely high-end i2.8xlarge EC2 instances (the most expensive EC2 instance type, at $6.82/hour), where each node consists of 32 cores, 244 GB RAM, and 6.4 TB of storage. That’s an amazing and costly amount of resources!
The performance of any sort algorithm is highly dependent on the size of the sort key and the size of its accompanying data record. Ryft ONE’s worst case is on the order of 1GB/sec, and a typical real-world case can be upwards of 10GB/sec.
The equivalent number of Spark nodes for Sort is estimated at approximately 65 nodes. This estimate stems from an analysis of the latest Spark sort benchmark performance numbers as published in https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html coupled with estimated Spark performance degradation (at approximately 50%) when moving from the non-real-world sort benchmarks employed to a more realistic real-world sort. Even if the assumptions and estimates are off (say even by a factor of 2), the fact that a single 1U Ryft ONE can achieve the sort performance of a large cluster of nodes where each node is 32 cores, 244GB RAM and 6.4TB is simply amazing.
Massively valuable businesses and applications will be built off of rapidly increasing volume, velocity and variety of data.
Today, most enterprise big data initiatives struggle to make it out of the prototype stage, because current tools like Hadoop and Spark are complex to build and maintain, limited in capabilities, and built upon server clusters using von Neumann architectures designed 70 years ago. Today’s x86 systems, which rely on these legacy architectures, are not designed for high-performance data analysis and cannot answer the questions companies need to ask.
Businesses need a new category of high-performance, open, and low-maintenance platform that supports the volume, velocity and variety of big data—at a price tag that makes high performance computing capabilities attainable by all businesses.
Massively valuable businesses and applications can be built on the Ryft platform to enable companies to do things never before possible while transforming data center economics.