2. Confidential - © 2013 Equinix Inc. www.equinix.com
Big Data at Equinix
~2 million
Alarms
~200k
interconnections
~250k
Electrical circuits
Sensors across 95+ IBXs
~40k
Infrastructure objects
3.
Big Data at Equinix
Sensors across 95+ IBXs
Lead to / produce
Support for multiple protocols
Push as well as pull methods
Time series data
Cross-sectional data
Not so clean data
High velocity
Clean data
Lots and lots of noise
Some useful intel
4.
Big Data at Equinix
What do we use (or plan to use) this data for?
Customer Presentment
Billing
Operations
New Product & Services
5.
Big Data at Equinix
Use-case analysis: 80-20 rule
~80% of use-cases analyzed act upon "Hot Data"
~80% of the data for most use-cases analyzed is time-series.
All "quick win" use-cases need data mediation, aggregation and roll-up for presentment.
Real-time to near real-time processing of events
Collection, processing and storage technologies suitable for time-series data.
Collection, mediation, cross-referencing and correlation of data from different sources; roll-up and aggregation.
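The roll-up and aggregation step named above can be illustrated with a minimal sketch. It assumes readings arrive as (epoch seconds, sensor id, value) tuples and are summarized into fixed five-minute windows; the tuple layout, the function name `roll_up`, the sensor ids and the window size are illustrative assumptions, not details from the slides.

```python
from collections import defaultdict
from statistics import mean


def roll_up(readings, window_seconds=300):
    """Aggregate (epoch_seconds, sensor_id, value) readings into fixed windows.

    Returns a dict keyed by (sensor_id, window_start) whose values hold the
    min, max, average and count of the readings in that window.
    """
    buckets = defaultdict(list)
    for ts, sensor_id, value in readings:
        # Align each timestamp to the start of its window.
        window_start = ts - (ts % window_seconds)
        buckets[(sensor_id, window_start)].append(value)
    return {
        key: {
            "min": min(values),
            "max": max(values),
            "avg": mean(values),
            "count": len(values),
        }
        for key, values in buckets.items()
    }


if __name__ == "__main__":
    # Hypothetical sample readings for two sensors.
    sample = [
        (1000, "pdu-1", 10.0),
        (1100, "pdu-1", 14.0),
        (1300, "pdu-1", 20.0),
        (1000, "pdu-2", 5.0),
    ]
    for key, summary in sorted(roll_up(sample).items()):
        print(key, summary)
```

Keying the result by (sensor, window start) keeps each sensor's summaries ordered by time, which is the shape a presentment layer typically reads.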
6.
Big Data at Equinix
Our Approach: Equinix Big Data Platform
• Common platform to be shared by all initial Big Data use cases: multi-tenancy
• Built on inexpensive hardware using free or inexpensive software
• Seamless & massive scalability using scale-out
• High reliability: partial failover, graceful degradation, self-healing, self-balancing
• Data ingestion and processing capabilities for high volumes at high velocity
• Support for structured and semi-structured data
• Provides real-time processing abilities
• Provides parallel processing capabilities
• Support for low latency queries, wide range scan queries and search
• Provides abstraction via connectors, frameworks and libraries
• Support for predictive analytics using machine learning
Immediate requirements
Long term goals
Big Data Platform - Logical Architecture (technology agnostic)
7.
Big Data at Equinix
Requirements & Technologies considered for Big Data Platform
8.
Big Data at Equinix
Grand Finale
Hadoop Ecosystem vs. DataStax Enterprise
[Figure: architecture comparison. DataStax Enterprise: nodes providing Search, Analytics and Storage. Hadoop ecosystem: Hadoop Distributed File System (Storage/Analytics) with NameNode, Secondary NameNode and Data Nodes (Storage); HBase (Storage/Analytics) with HBase Master and HBase Region Servers; Search with Solr Nodes; Management Services with Cloudera Manager; Zookeeper.]
Hadoop Ecosystem
Pros
• Scalability
• Cloud readiness
• Resource availability
• Industry momentum
• Product eco-system maturity
• Technical support
Cons
• Infrastructure footprint
• Operational complexity
• Learning curve
• Availability
• Total cost of ownership
DataStax Enterprise
Pros
• Infrastructure footprint
• Operational ease
• Scalability
• Availability
• Cloud readiness
• Learning curve
• Resource availability
• Technical support
• Total cost of ownership
Cons
• Industry momentum
• Product eco-system maturity
9.
Big Data at Equinix
Cassandra vs. HBase
CAP Theorem Focus
  Cassandra: Availability, Partition-Tolerance
  HBase: Consistency, Partition-Tolerance
Data Partitioning
  Cassandra: Supports ordered and random partitioning; random partitioning is recommended.
  HBase: Ordered partitioning. Load balancing is achieved through resharding.
Distributed System
  Cassandra: P2P architecture (Amazon Dynamo)
  HBase: Master / slave via HDFS, Zookeeper for coordination
Administration & Maintenance
  Cassandra: Medium
  HBase: High
Single Write Master
  Cassandra: No (R + W > N to get strong consistency)
  HBase: Yes
Multi-tenancy
  Cassandra: Yes
  HBase: Yes
Secondary Indexes
  Cassandra: Supports secondary indexes on column families (CF) where the column name is known.
  HBase: Does not natively support secondary indexes.
Consistency
  Cassandra: Tunable consistency
  HBase: Strict consistency (not ACID)
Hot Spot Problem
  Cassandra: No; distributes load across nodes using the random partition strategy.
  HBase: Yes; one node may handle most of the traffic due to ordered partitioning.
Multi-Data Center Support and Disaster Recovery
  Cassandra: Asynchronous replication via WAN
  HBase: Asynchronous replication via WAN
Single Point of Failure
  Cassandra: Ring topology; there is no single point of failure.
  HBase: Although there is a concept of a master server, HBase itself does not depend on it heavily. An HBase cluster can keep serving data even if the master goes down. The Hadoop NameNode is a single point of failure.
Commercial Vendors
  Cassandra: DataStax, Acunu
  HBase: Cloudera, Hortonworks
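The strong-consistency condition behind Cassandra's tunable consistency is usually stated as R + W > N: with N replicas, a read that contacts R of them must overlap a write that was acknowledged by W of them. The sketch below only checks that arithmetic; the function names are illustrative and nothing here calls a Cassandra driver.

```python
def quorum(n):
    """Size of a majority quorum for n replicas: floor(n / 2) + 1."""
    return n // 2 + 1


def is_strongly_consistent(n, r, w):
    """Return True when every read set of r replicas must overlap
    every write set of w replicas, i.e. when r + w > n."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must each be between 1 and n")
    return r + w > n


if __name__ == "__main__":
    n = 3  # replication factor, chosen here only for illustration
    print("QUORUM/QUORUM:", is_strongly_consistent(n, quorum(n), quorum(n)))
    print("ONE/ONE:", is_strongly_consistent(n, 1, 1))
    print("read ONE, write ALL:", is_strongly_consistent(n, 1, n))
```

With a replication factor of 3, quorum reads plus quorum writes satisfy the condition, while reading and writing at a single replica does not; that trade between latency and consistency is what "tunable" refers to.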
10.
Why DSE Cassandra
Big Data at Equinix
Support for Analytics
Integrated search using Solr
Security features
Cluster management capabilities
Commercial support
DataStax would probably list many more reasons; these are the ones relevant to us.
11.
Big Data at Equinix
Grand Finale
Hadoop Ecosystem vs. DataStax Enterprise
✓ Sold on DataStax Enterprise
12.
Big Data at Equinix
How far are we on our Big Data journey?
✓ Pilot use-case from PoC to production
✓ Moved network statistics use case from RRD-based solution to DSE Cassandra
✓ Build in progress for
  • power monitoring use cases
  • data center monitoring
  • network monitoring
In plans
➢ Recommendation engine on interconnection platform
➢ Use case analysis and technology selection for connected data sets
➢ Building data science capabilities for use cases requiring predictive modeling
A few data points
Physical bare metal boxes for DSE nodes
Densely packed data nodes with 4 TB storage and 96 GB RAM on each node
About 250 million records a day
Also used for log analysis for internal IT systems monitoring use-cases
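To put the daily record count in perspective, the short calculation below converts it to an average ingest rate. It assumes records arrive evenly over 24 hours, which the slides do not state; real traffic is likely burstier, so the result is a floor for capacity planning rather than a peak figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_records_per_second(records_per_day):
    """Average ingest rate, assuming records are spread evenly over a day."""
    return records_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    daily = 250_000_000  # "about 250 million records a day" from the slide
    print(f"{average_records_per_second(daily):,.0f} records/second on average")
```

That works out to roughly 2,900 records per second on average across the cluster.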
13.
Big Data at Equinix
Experience so far
Lack of standards-based connectors / drivers
DataStax has developed a Java driver, but it doesn't support JDBC
No data visualization tools for low-latency access to Cassandra data
No data access tools (Toad equivalent) available yet; DevCenter is not there yet
We:
  used Astyanax and are evaluating the DataStax Java driver
  built libraries to abstract Astyanax for application engineering teams
  built REST services for data access by applications
Good reliability
Not many instances of nodes being down
Handled loads even when nodes were down
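The idea of an abstraction library that shields application teams from the underlying driver can be sketched as follows. This is not the library described in the slides: the `TimeSeriesStore` interface, the `InMemoryStore` stand-in and the `latest_value` helper are all hypothetical names, and a real implementation would wrap a Cassandra client instead of a dictionary.

```python
from typing import Dict, Iterable, List, Optional, Protocol, Tuple

Point = Tuple[int, float]  # (epoch_seconds, value)


class TimeSeriesStore(Protocol):
    """Hypothetical interface that application code depends on."""

    def write(self, series_id: str, points: Iterable[Point]) -> None: ...

    def read(self, series_id: str, start: int, end: int) -> List[Point]: ...


class InMemoryStore:
    """Stand-in backend; a driver-backed class would expose the same methods."""

    def __init__(self) -> None:
        self._data: Dict[str, List[Point]] = {}

    def write(self, series_id: str, points: Iterable[Point]) -> None:
        series = self._data.setdefault(series_id, [])
        series.extend(points)
        series.sort()  # keep points ordered by timestamp

    def read(self, series_id: str, start: int, end: int) -> List[Point]:
        # Half-open range: start <= timestamp < end.
        return [p for p in self._data.get(series_id, []) if start <= p[0] < end]


def latest_value(store: TimeSeriesStore, series_id: str,
                 start: int, end: int) -> Optional[float]:
    """Application-level helper written only against the interface."""
    points = store.read(series_id, start, end)
    return points[-1][1] if points else None


if __name__ == "__main__":
    store = InMemoryStore()
    store.write("port-1", [(20, 2.0), (10, 1.0)])
    print(latest_value(store, "port-1", 0, 100))
```

Because callers depend only on the interface, swapping one driver for another (for example, while evaluating a replacement) would touch the backend class and not the application code.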
14.
Big Data at Equinix
Where do we go from here?
Graph databases
Batch processing (Hadoop, Spark, MapReduce?)
Interactive queries
Online data processing
Data analytics
Data science and machine learning
Data visualization tools and applications
Developer toolkits
We are hiring
Big Data Architect
Big Data Engineers
Data Scientists
Send your resume to pkumar@equinix.com
18.
GLOBAL DATA CENTERS
95+ Data Centers
9M+ Square Feet
99.999% Uptime Record
INTERCONNECTION
950+ Networks
110,000+ Cross Connects
BUSINESS ECOSYSTEMS
Equinix Marketplace™
4,000+ Businesses
Revenue Opportunities
MOVING TOWARDS THE FUTURE | PLATFORM
Equinix: A Platform for Growth