CONFIDENTIAL
Equinix Big Data Platform & Cassandra
Praveen Kumar
Emerging Software Platforms, Global Software Engineering
Mar 2014
Confidential – © 2013 Equinix Inc. www.equinix.com 2
Big Data at Equinix
• ~2 million alarms
• ~200k interconnections
• ~250k electrical circuits
• Sensors across 95+ IBXs (International Business Exchange data centers)
• ~40k infrastructure objects
Sensors across 95+ IBXs lead to / produce:
• Support for multiple protocols
• Push as well as pull methods
• Time-series data
• Cross-sectional data
• High velocity
• Clean and not-so-clean data: lots and lots of noise, some useful intel
What do we use (or plan to use) this data for?
• Customer presentment
• Billing
• Operations
• New products & services
Use-case analysis: the 80-20 rule
• ~80% of the use cases analyzed act on “hot data”, which calls for real-time to near-real-time processing of events.
• ~80% of the data for most of the use cases analyzed is time series, which calls for collection, processing and storage technologies suited to time-series data.
• All “quick win” use cases need data mediation, aggregation and roll-up for presentment, which calls for collecting, mediating, cross-referencing and correlating data from different sources, then rolling it up and aggregating it.
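The roll-up requirement can be illustrated with a minimal sketch that aggregates raw time-series samples into fixed-width time buckets. It assumes samples arrive as (epoch-seconds, value) pairs; the names `roll_up` and `Rollup` are hypothetical, and this is not the platform's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass
class Rollup:
    """Summary statistics for one time bucket."""
    count: int
    total: float
    minimum: float
    maximum: float

    @property
    def mean(self) -> float:
        return self.total / self.count


def roll_up(samples: Iterable[tuple[int, float]], bucket_seconds: int) -> dict[int, Rollup]:
    """Aggregate (epoch_seconds, value) samples into fixed-width time buckets.

    Returns a dict keyed by bucket start time, ordered by time.
    """
    buckets: dict[int, Rollup] = {}
    for ts, value in samples:
        start = ts - ts % bucket_seconds  # align the sample to its bucket start
        bucket = buckets.get(start)
        if bucket is None:
            buckets[start] = Rollup(1, value, value, value)
        else:
            bucket.count += 1
            bucket.total += value
            bucket.minimum = min(bucket.minimum, value)
            bucket.maximum = max(bucket.maximum, value)
    return dict(sorted(buckets.items()))


samples = [(0, 1.0), (30, 3.0), (60, 5.0), (119, 7.0), (120, 2.0)]
for start, r in roll_up(samples, 60).items():
    print(start, r.count, r.mean, r.minimum, r.maximum)
# 0 2 2.0 1.0 3.0
# 60 2 6.0 5.0 7.0
# 120 1 2.0 2.0 2.0
```

Keeping the count and total rather than a stored mean lets buckets be merged later (for example, minute buckets rolled up into hours) without losing precision.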
Our approach: Equinix Big Data Platform
Immediate requirements and long-term goals:
• Common platform shared by all initial big data use cases (multi-tenancy)
• Built on inexpensive hardware using free or inexpensive software
• Seamless, massive scalability through scale-out
• High reliability: partial failover, graceful degradation, self-healing, self-balancing
• Data ingestion and processing capabilities for high volumes at high velocity
• Support for structured and semi-structured data
• Real-time processing capabilities
• Parallel processing capabilities
• Support for low-latency queries, wide-range scan queries and search
• Abstraction via connectors, frameworks and libraries
• Support for predictive analytics using machine learning
Big Data Platform: logical architecture (technology agnostic)
Data sources: Java messages, flat files, FTP, log streams, RDBMS, JSON files, unstructured files
Equinix Big Data Platform:
• Ingestion layer: Connector, Parser, Data Processor, Writer
• Real-time processing layer: real-time monitoring, real-time predictive analytics
• Repository: raw data, processed/derived data
• Parallel processing layer: Reconciliator, Deep Analyzer
• Access layer: low-latency ad-hoc access, batch framework, large-range data access
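The ingestion layer's four stages (Connector, Parser, Data Processor, Writer) can be sketched as a simple pipeline. This is an illustration under stated assumptions: input is JSON lines, records carry hypothetical `sensor` and `value` fields, and the in-memory `Repository` stands in for the raw and processed/derived data stores. It is not the platform's actual code.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Iterable, Iterator, Optional


@dataclass
class Repository:
    """In-memory stand-in for the repository's raw and processed/derived stores."""
    raw: list = field(default_factory=list)
    processed: list = field(default_factory=list)


def connect(source: Iterable[str]) -> Iterator[str]:
    """Connector: pulls raw records from a source (here, any iterable of strings)."""
    yield from source


def parse(raw: str) -> Optional[dict]:
    """Parser: decodes one JSON line; returns None for malformed or non-object input."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return record if isinstance(record, dict) else None


def process(record: dict) -> Optional[dict]:
    """Data processor: validates required fields and emits a normalized record."""
    if "sensor" not in record or "value" not in record:
        return None
    try:
        value = float(record["value"])
    except (TypeError, ValueError):
        return None
    return {"sensor": str(record["sensor"]), "value": value}


def ingest(source: Iterable[str], repo: Repository) -> None:
    """Runs connector -> parser -> processor -> writer for every record."""
    for raw in connect(source):
        repo.raw.append(raw)  # writer: keep every record exactly as received
        record = parse(raw)
        if record is None:
            continue
        derived = process(record)
        if derived is not None:
            repo.processed.append(derived)  # writer: store processed/derived data


repo = Repository()
ingest(['{"sensor": "pdu-1", "value": "12.5"}', "not json", '{"sensor": "pdu-2"}'], repo)
print(len(repo.raw), repo.processed)  # 3 [{'sensor': 'pdu-1', 'value': 12.5}]
```

Writing the raw record before parsing keeps malformed input available in the raw store for later reconciliation, while only validated records reach the processed store.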
Requirements & technologies considered for the Big Data Platform
Data sources
• Documents
• Sensors
• Sales Cloud, Service Cloud
• On-premise apps (Oracle eBiz, Siebel, …)
• Equinix custom apps
Data collection & ingestion
• Capabilities: scale-out system; real-time validation; real-time analytics; support for stream, batch and extraction over industry-standard protocols
• Data formats/types: system & app logs; usage data (time series); behavior tracking events; complex business events; transactional & operational data; master & metadata
• Technologies considered: Apache Kafka, Scribe (from Facebook), Apache Flume
Data processing & storage
• Parallel processing capabilities
• Scale-out system
• Runs on inexpensive hardware
• High availability
• Supports structured, semi-structured & unstructured data
• Fast write speed
• NoSQL capabilities
• Time-series data support
• Data mart capability
• Relational schema support
• Self-healing capability
Data intelligence
• Machine learning, pattern detection, regression analysis, time-series analysis, statistical modeling, clustering, classification, recommendation engine
Data visualization
• Real-time analytics, ad-hoc analysis, dashboards, log analysis, bulk/trend analysis, batch reporting, predictive modeling, alerts & notifications, search
Grand Finale
Hadoop Ecosystem vs. DataStax Enterprise
Hadoop ecosystem components:
• Hadoop Distributed File System (storage/analytics): NameNode, Secondary NameNode, DataNodes (storage)
• HBase (storage/analytics): HBase Masters, HBase Region Servers
• Search: Solr nodes
• Management services: Cloudera Manager, ZooKeeper
DataStax Enterprise components: nodes serving storage, analytics and search workloads
Hadoop ecosystem: pros
• Scalability
• Cloud readiness
• Resource availability
• Industry momentum
• Product ecosystem maturity
• Technical support
Hadoop ecosystem: cons
• Infrastructure footprint
• Operational complexity
• Learning curve
• Availability
• Total cost of ownership
DataStax Enterprise: pros
• Infrastructure footprint
• Operational ease
• Scalability
• Availability
• Cloud readiness
• Learning curve
• Resource availability
• Technical support
• Total cost of ownership
DataStax Enterprise: cons
• Industry momentum
• Product ecosystem maturity
Cassandra vs. HBase
• CAP theorem focus. Cassandra: availability and partition tolerance. HBase: consistency and partition tolerance.
• Data partitioning. Cassandra: supports ordered and random partitioning; random partitioning is recommended. HBase: ordered partitioning; load balancing is achieved through resharding.
• Distributed system. Cassandra: peer-to-peer architecture (modeled on Amazon Dynamo). HBase: master/slave on top of HDFS, with ZooKeeper for coordination.
• Administration & maintenance effort. Cassandra: medium. HBase: high.
• Single write master. Cassandra: no (strong consistency requires R + W > N, where N is the replication factor and R and W are the numbers of replicas that must answer a read and acknowledge a write). HBase: yes.
• Multi-tenancy. Cassandra: yes. HBase: yes.
• Secondary indexes. Cassandra: supports secondary indexes on a column family (CF) where the column name is known. HBase: no native secondary indexes.
• Consistency. Cassandra: tunable consistency. HBase: strict consistency (not ACID).
• Hot spot problem. Cassandra: no; the random partitioning strategy distributes load across nodes. HBase: yes; one node may handle most of the traffic because of ordered partitioning.
• Multi-data center support and disaster recovery. Cassandra: asynchronous replication over the WAN. HBase: asynchronous replication over the WAN.
• Single point of failure. Cassandra: none; ring topology. HBase: although there is a master server, HBase does not depend on it heavily and can keep serving data if the master goes down; the Hadoop NameNode, however, is a single point of failure.
• Commercial vendors. Cassandra: DataStax, Acunu. HBase: Cloudera, Hortonworks.
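Cassandra's tunable consistency can be reasoned about with the quorum-overlap rule: with N replicas, reads that wait for R replicas and writes that wait for W replicas are guaranteed to share at least one replica when R + W > N. The sketch below checks that rule against a brute-force enumeration of replica sets. The function names are hypothetical, and the rule captures only the overlap argument, not every failure scenario a real cluster can hit.

```python
from itertools import combinations


def quorum(replication_factor: int) -> int:
    """Majority quorum size, as used by Cassandra's QUORUM level: floor(N/2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """True when every read set must overlap every write set, i.e. R + W > N."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Brute force: does every r-replica read set intersect every w-replica write set?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


n = 3
print(is_strongly_consistent(n, quorum(n), quorum(n)))  # True  (QUORUM reads, QUORUM writes)
print(is_strongly_consistent(n, 1, 1))                  # False (ONE reads, ONE writes)
print(is_strongly_consistent(n, 1, n))                  # True  (ONE reads, ALL writes)
```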
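The hot-spot comparison can be illustrated by routing the same time-ordered row keys through two hypothetical four-node layouts: one assigns contiguous key ranges (ordered partitioning) and one hashes each key onto a token ring (random partitioning). The range boundaries, key format and MD5-based token are assumptions for illustration; they approximate the idea behind Cassandra's RandomPartitioner rather than reproduce it.

```python
import hashlib
from bisect import bisect_left, bisect_right
from collections import Counter

NODES = 4
RING_SIZE = 2 ** 127  # illustrative token space (RandomPartitioner-style range)

# Hypothetical ordered layout: node i owns a contiguous key range whose
# boundaries were chosen from older data.
ORDERED_BOUNDARIES = ["2013-06", "2013-09", "2013-12"]

# Hash layout: node i owns tokens in (NODE_TOKENS[i-1], NODE_TOKENS[i]], evenly spaced.
NODE_TOKENS = [(i + 1) * RING_SIZE // NODES for i in range(NODES)]


def ordered_node(key: str) -> int:
    """Ordered partitioning: route by where the raw key sorts among the boundaries."""
    return bisect_right(ORDERED_BOUNDARIES, key)


def hashed_node(key: str) -> int:
    """Random partitioning: hash the key to a token, then find the owning node."""
    token = int.from_bytes(hashlib.md5(key.encode()).digest(), "big") % RING_SIZE
    return bisect_left(NODE_TOKENS, token) % NODES


# One day of per-minute, time-ordered row keys, e.g. "2014-03-01T13:05".
keys = [f"2014-03-01T{m // 60:02d}:{m % 60:02d}" for m in range(24 * 60)]

ordered_load = Counter(ordered_node(k) for k in keys)
hashed_load = Counter(hashed_node(k) for k in keys)
print("ordered:", dict(ordered_load))  # every key lands on the last node
print("hashed: ", dict(hashed_load))   # keys spread across all four nodes
```

Because time-series keys grow monotonically, every new write sorts past the last boundary, which is why ordered layouts tend to concentrate the newest (hot) data on one node.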
Why DSE Cassandra
• Support for analytics
• Integrated search using Solr
• Security features
• Cluster management capabilities
• Commercial support
DataStax would probably list many more reasons; these are the ones that made sense to us.
Grand Finale (revisited): Hadoop Ecosystem vs. DataStax Enterprise
Verdict: sold on DataStax Enterprise
How far are we on our Big Data journey?
• Pilot use case taken from PoC to production: the network statistics use case moved from an RRD-based (round-robin database) solution to DSE Cassandra
• Build in progress for:
  • power monitoring use cases
  • data center monitoring
  • network monitoring
In plans
• Recommendation engine on the interconnection platform
• Use-case analysis and technology selection for connected data sets
• Building data science capabilities for use cases requiring predictive modeling
A few data points
• Physical bare-metal boxes for DSE nodes
• Densely packed data nodes, each with 4 TB of storage and 96 GB of RAM
• About 250 million records a day
• Also used for log analysis in internal IT systems monitoring use cases
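The figure of about 250 million records a day translates into an average write rate that is easy to work out. The sketch below does the arithmetic; the 200-byte record size and replication factor of 3 are hypothetical values chosen for illustration, not figures from the deployment, and real traffic is burstier than a daily average suggests.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(records_per_day: int) -> float:
    """Average ingest rate in records per second, assuming load is spread evenly over the day."""
    return records_per_day / SECONDS_PER_DAY


def daily_bytes(records_per_day: int, bytes_per_record: int, replication_factor: int = 1) -> int:
    """Raw bytes written per day across the cluster (ignores compression and storage overhead)."""
    return records_per_day * bytes_per_record * replication_factor


RECORDS_PER_DAY = 250_000_000  # from the slide (~250 million records a day)
BYTES_PER_RECORD = 200         # hypothetical average record size
REPLICATION_FACTOR = 3         # hypothetical replication factor

print(f"{average_rate(RECORDS_PER_DAY):,.0f} records/s on average")  # 2,894 records/s on average
print(f"{daily_bytes(RECORDS_PER_DAY, BYTES_PER_RECORD) / 1e9:.0f} GB/day before replication")  # 50
print(f"{daily_bytes(RECORDS_PER_DAY, BYTES_PER_RECORD, REPLICATION_FACTOR) / 1e9:.0f} GB/day with RF=3")  # 150
```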
Experience so far
Lack of standards-based connectors/drivers
• DataStax has developed a Java driver, but it does not support JDBC
No data visualization tools that can access Cassandra with low latency
No data access tools (a Toad equivalent) available yet
• DataStax DevCenter is trying to solve this problem
We:
• used Astyanax and are evaluating the DataStax Java driver
• built libraries that abstract Astyanax for application engineering teams
• built REST services for data access by applications
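Libraries that abstract a Cassandra client for application teams typically follow a familiar pattern: applications code against a storage interface, and only one implementation knows about the underlying client. The sketch below shows that pattern with a hypothetical `TimeSeriesStore` interface and an in-memory test double; it is not the Equinix library, and a real implementation would wrap a Cassandra client.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections import defaultdict


class TimeSeriesStore(ABC):
    """Hypothetical storage interface that application code depends on.

    A concrete subclass would wrap a specific client (for example, a service backed
    by Astyanax or the DataStax Java driver), so callers do not change if the client does.
    """

    @abstractmethod
    def write(self, series_id: str, ts: int, value: float) -> None:
        """Store one sample; writing the same timestamp again overwrites it (upsert)."""

    @abstractmethod
    def read_range(self, series_id: str, start: int, end: int) -> list[tuple[int, float]]:
        """Return samples with start <= ts < end, ordered by timestamp."""


class InMemoryStore(TimeSeriesStore):
    """Test double standing in for a Cassandra-backed implementation."""

    def __init__(self) -> None:
        self._series: dict[str, dict[int, float]] = defaultdict(dict)

    def write(self, series_id: str, ts: int, value: float) -> None:
        self._series[series_id][ts] = value

    def read_range(self, series_id: str, start: int, end: int) -> list[tuple[int, float]]:
        samples = self._series.get(series_id, {})
        return sorted((ts, v) for ts, v in samples.items() if start <= ts < end)


def peak(store: TimeSeriesStore, series_id: str, start: int, end: int) -> float | None:
    """Application-level helper written only against the interface."""
    values = [v for _, v in store.read_range(series_id, start, end)]
    return max(values) if values else None


store = InMemoryStore()
store.write("port-7", 120, 3.5)
store.write("port-7", 60, 9.0)
store.write("port-7", 120, 4.0)  # overwrites the earlier sample at ts=120
print(store.read_range("port-7", 0, 180))  # [(60, 9.0), (120, 4.0)]
print(peak(store, "port-7", 100, 180))     # 4.0
```

Because `peak` depends only on the interface, switching from Astyanax to the DataStax Java driver would mean writing a new implementation rather than changing application code.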
Where do we go from here?
• Graph databases
• Batch processing (Hadoop, Spark, MapReduce?)
• Interactive queries
• Online data processing
• Data analytics
• Data science and machine learning
• Data visualization tools and applications
• Developer toolkits
We are hiring big data engineers and data scientists. Send your resume to pkumar@equinix.com.
Thank you!
• pkmr.work@gmail.com
• pkumar@equinix.com
• www.equinix.com
WHO IS EQUINIX?
• Global data centers: 95+ data centers, 9M+ square feet, 99.999% uptime record
• Interconnection: 950+ networks, 110,000+ cross connects
• Business ecosystems: Equinix Marketplace™, 4,000+ businesses, revenue opportunities
MOVING TOWARDS THE FUTURE | PLATFORM
Equinix: A Platform for Growth
Solid. Powerful. Growing.
• $1.8B in annualized revenue
• Member of the NASDAQ-100
• $7B of investments in expansion
• 15 countries, 5 continents, 31 markets
HOW WE’RE DIFFERENT | GLOBAL FOOTPRINT
Where You Are. Where You Need To Be.
• Over 90% of internet routes pass through Equinix data centers
• 950+ network providers
• 450+ cloud & SaaS providers

 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Equinix Big Data Platform and Cassandra - A view into the journey

  • 1. CONFIDENTIAL. Equinix Big Data Platform & Cassandra. Praveen Kumar, Emerging Software Platforms, Global Software Engineering, Mar 2014
  • 2. Confidential – © 2013 Equinix Inc. www.equinix.com. Big Data at Equinix: sensors across 95+ IBXs; ~2 million alarms; ~200k interconnections; ~250k electrical circuits; ~40k infrastructure objects.
  • 3. Big Data at Equinix. Sensors across 95+ IBXs use multiple protocols, with both push and pull collection methods, and produce high-velocity time-series and cross-sectional data. Some of it is clean and some is not so clean: lots and lots of noise, with some useful intel.
  • 4. Big Data at Equinix. What do we use (or plan to use) this data for? Customer presentment, billing, operations, and new products & services.
  • 5. Big Data at Equinix. Use-case analysis: the 80-20 rule. ~80% of the use cases analyzed act upon “hot data”, so we need real-time to near-real-time processing of events. ~80% of the data for most of the use cases analyzed is time-series, so we need collection, processing and storage technologies suited to time-series data. All “quick win” use cases need data mediation, aggregation and roll-up for presentment, so we need collection, mediation, cross-referencing and correlation of data from different sources, followed by roll-up and aggregation.
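Slide 5 says the quick-win use cases need aggregation and roll-up of time-series data for presentment. A minimal Python sketch of such a roll-up, assuming readings arrive as (sensor_id, epoch_seconds, value) tuples and a fixed 5-minute window (both are assumptions, not details from the deck): raw points are grouped per sensor and per window, then reduced to count, min, max and mean.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

WINDOW_SECONDS = 300  # hypothetical 5-minute roll-up interval


@dataclass
class Rollup:
    count: int
    minimum: float
    maximum: float
    total: float

    @property
    def mean(self) -> float:
        return self.total / self.count


def roll_up(
    readings: Iterable[Tuple[str, int, float]],
    window: int = WINDOW_SECONDS,
) -> Dict[Tuple[str, int], Rollup]:
    """Aggregate raw (sensor_id, epoch_seconds, value) readings into
    per-sensor, per-window summaries keyed by (sensor_id, window_start)."""
    buckets: Dict[Tuple[str, int], Rollup] = {}
    for sensor_id, ts, value in readings:
        window_start = ts - ts % window  # align the timestamp to its window
        key = (sensor_id, window_start)
        agg = buckets.get(key)
        if agg is None:
            buckets[key] = Rollup(1, value, value, value)
        else:
            agg.count += 1
            agg.minimum = min(agg.minimum, value)
            agg.maximum = max(agg.maximum, value)
            agg.total += value
    return buckets


if __name__ == "__main__":
    raw = [
        ("pdu-1", 0, 10.0),
        ("pdu-1", 120, 14.0),
        ("pdu-1", 310, 12.0),
        ("pdu-2", 60, 3.5),
    ]
    for (sensor, start), agg in sorted(roll_up(raw).items()):
        print(sensor, start, agg.count, agg.minimum, agg.maximum, agg.mean)
```

Keeping running totals instead of the raw points means each window costs constant memory, which is the point of rolling up hot data before presentment.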
  • 6. Big Data at Equinix. Our approach: the Equinix Big Data Platform. Requirements (the slide groups these into immediate requirements and long-term goals): a common platform shared by all initial Big Data use cases (multi-tenancy); built on inexpensive hardware using free or inexpensive software; seamless and massive scalability using scale-out; high reliability (partial failover, graceful degradation, self-healing, self-balancing); data ingestion and processing capabilities for high volumes at high velocity; support for structured and semi-structured data; real-time processing abilities; parallel processing capabilities; support for low-latency queries, wide-range scan queries and search; abstraction via connectors, frameworks and libraries; support for predictive analytics using machine learning. [Figure: Big Data Platform - Logical Architecture (technology agnostic). Data sources: Java messages, flat files, FTP, log streams, RDBMS, JSON files, unstructured files. Platform components: Ingestion Layer (connector, parser, data processor, writer); Real-time Processing Layer; Repository (raw data, processed/derived data); Parallel Processing Layer (reconciliator, deep analyzer); real-time monitoring; real-time predictive analytics; Access Layer (low-latency ad-hoc access; batch framework for large-range data access).]
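The logical architecture names four ingestion-layer components (connector, parser, data processor, writer) and a repository that keeps both raw and processed/derived data. A minimal Python sketch of how such stages might be chained, assuming JSON-lines input and an in-memory repository; every class, function and field name here is hypothetical, and the deck does not say how these components were implemented.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Optional


class LineConnector:
    """Connector: yields raw records from a source (here, an in-memory list of lines)."""

    def __init__(self, lines: Iterable[str]):
        self._lines = lines

    def read(self) -> Iterator[str]:
        for line in self._lines:
            if line.strip():  # skip blank lines
                yield line


def parse(raw: str) -> Optional[dict]:
    """Parser: turns a raw JSON line into a dict, or None if it is malformed."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return record if isinstance(record, dict) else None


def process(record: dict) -> Optional[dict]:
    """Data processor: validates required fields and derives a normalized record."""
    if "sensor" not in record or "value" not in record:
        return None
    try:
        value = float(record["value"])
    except (TypeError, ValueError):
        return None
    return {"sensor": str(record["sensor"]).lower(), "value": value}


@dataclass
class Repository:
    """Repository: raw and processed/derived data kept side by side."""
    raw: List[str] = field(default_factory=list)
    processed: List[Dict] = field(default_factory=list)
    rejected: int = 0


def ingest(connector: LineConnector, repo: Repository) -> Repository:
    """Writer role: stores every raw record, plus the derived record when valid."""
    for raw in connector.read():
        repo.raw.append(raw)  # raw data is always retained
        record = parse(raw)
        derived = process(record) if record is not None else None
        if derived is None:
            repo.rejected += 1
        else:
            repo.processed.append(derived)
    return repo


if __name__ == "__main__":
    lines = ['{"sensor": "PDU-1", "value": "12.5"}', "not json", '{"sensor": "X"}']
    repo = ingest(LineConnector(lines), Repository())
    print(len(repo.raw), repo.processed, repo.rejected)
```

Retaining the raw record even when parsing or validation fails mirrors the "Raw Data" vs. "Processed/Derived Data" split in the diagram: noisy input can be reprocessed later without re-collecting it.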
  • 7. Big Data at Equinix. Requirements & technologies considered for the Big Data Platform. [Figure columns: Data Sources; Data Collection & Ingestion; Data Processing & Storage; Data Intelligence; Data Visualization.] Data sources: documents, sensors, Sales Cloud, Service Cloud, on-premise apps (Oracle eBiz, Siebel…), Equinix custom apps. Data ingestion capabilities: scale-out system; real-time validation; real-time analytics; supports stream, batch and extraction over industry-standard protocols. Data formats/types: system & app logs; usage data (time-series); behavior tracking events; complex business events; transactional & operational data; master & metadata. Ingestion technologies considered: Apache Kafka, Scribe, Apache Flume. Processing & storage requirements: parallel processing capabilities; scale-out system; runs on inexpensive hardware; high availability; supports structured, semi-structured & unstructured data; fast write speed; NoSQL capabilities; time-series data support; data mart capability; relational schema support; self-healing capability. Data intelligence: machine learning, pattern detection, regression analysis, time-series analysis, statistical modeling, clustering, classification, recommendation engine. Consumers and outputs: real-time analytics, ad-hoc analysis, dashboards, log analysis, bulk/trend analysis, batch reporting, predictive modeling, alerts & notifications, search, Equinix custom apps.
  • 8. Big Data at Equinix. Grand Finale: Hadoop Ecosystem vs. DataStax Enterprise. [Figure: side-by-side component stacks. Hadoop ecosystem: HDFS (storage/analytics) with NameNode, Secondary NameNode and DataNodes (storage); HBase (storage/analytics) with HBase Master and RegionServers; Solr nodes for search; ZooKeeper; Cloudera Manager for management services. DataStax Enterprise: search, analytics and storage nodes.] Hadoop ecosystem pros: scalability, cloud readiness, resource availability, industry momentum, product ecosystem maturity, technical support. Cons: infrastructure footprint, operational complexity, learning curve, availability, total cost of ownership. DataStax Enterprise pros: infrastructure footprint, operational ease, scalability, availability, cloud readiness, learning curve, resource availability, technical support, total cost of ownership. Cons: industry momentum, product ecosystem maturity.
  • 9. Big Data at Equinix. Cassandra vs. HBase
| Criterion | Cassandra | HBase |
| --- | --- | --- |
| CAP theorem focus | Availability, partition tolerance | Consistency, partition tolerance |
| Data partitioning | Supports ordered and random partitioning; random partitioning is recommended | Ordered partitioning; load balancing achieved through resharding |
| Distributed system | P2P architecture (Amazon Dynamo) | Master/slave via HDFS, ZooKeeper for coordination |
| Administration & maintenance | Medium | High |
| Single write master | No (R + W > N gives strong consistency) | Yes |
| Multi-tenancy | Yes | Yes |
| Secondary indexes | Supports secondary indexes on a column family where the column name is known | Does not natively support secondary indexes |
| Consistency | Tunable consistency | Strict consistency (not ACID) |
| Hot spot problem | No; distributes load across nodes using the random partitioning strategy | Yes; one node may handle most of the traffic due to ordered partitioning |
| Multi-data center support and disaster recovery | Asynchronous replication via WAN | Asynchronous replication via WAN |
| Single point of failure | Ring topology; there is no single point of failure | HBase has a master server but does not depend on it heavily, so the cluster can keep serving data if the master goes down; the Hadoop NameNode, however, is a single point of failure |
| Commercial vendors | DataStax, Acunu | Cloudera, Hortonworks |
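The single-write-master row of the Cassandra vs. HBase table refers to the quorum condition for strong consistency in Dynamo-style replication: with N replicas, a read that contacts R replicas and a write acknowledged by W replicas must share at least one replica whenever R + W > N, so every read reaches a replica holding the latest acknowledged write (Cassandra then reconciles by timestamp). A small Python check, assuming a single data center and modelling only the ONE, QUORUM and ALL levels, with QUORUM = floor(N/2) + 1; a brute-force subset search confirms the inequality.

```python
from itertools import combinations


def replicas_for(level: str, n: int) -> int:
    """Replicas that must respond at a consistency level (single data center)."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return n // 2 + 1
    if level == "ALL":
        return n
    raise ValueError(f"unsupported level: {level}")


def strongly_consistent(read_level: str, write_level: str, n: int) -> bool:
    """True when every read set must overlap every write set: R + W > N."""
    return replicas_for(read_level, n) + replicas_for(write_level, n) > n


def overlap_always(r: int, w: int, n: int) -> bool:
    """Brute force: does every r-subset of n replicas meet every w-subset?"""
    nodes = range(n)
    return all(
        set(rs) & set(ws)
        for rs in combinations(nodes, r)
        for ws in combinations(nodes, w)
    )


if __name__ == "__main__":
    n = 3  # hypothetical replication factor
    for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        r, w = replicas_for(read, n), replicas_for(write, n)
        print(read, write, r + w > n, overlap_always(r, w, n))
```

QUORUM reads with QUORUM writes is the usual middle ground: it satisfies R + W > N while still tolerating one unavailable replica out of three.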
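The "hot spot" row of the Cassandra vs. HBase comparison contrasts random (hash) partitioning with ordered partitioning. A toy Python simulation, assuming 4 nodes that each own an equal contiguous slice of the key or token range, and using MD5 as the hash (Cassandra's RandomPartitioner is MD5-based; its newer default, Murmur3Partitioner, uses Murmur3). With ordered placement, 1,000 consecutive recent keys, as time-series writes tend to be, all land on one node; hashing spreads them across the ring.

```python
import hashlib
from collections import Counter
from typing import Callable, List

NODES = 4
KEY_SPACE = 1_000_000  # hypothetical range of ordered keys (e.g. minute offsets)
TOKEN_SPACE = 2**128   # MD5 yields 128-bit tokens


def ordered_node(key: int) -> int:
    """Ordered partitioning: each node owns a contiguous range of key values."""
    return key * NODES // KEY_SPACE


def hashed_node(key: int) -> int:
    """Random partitioning: hash the key, then map the token to a node's range."""
    token = int.from_bytes(hashlib.md5(str(key).encode()).digest(), "big")
    return token * NODES // TOKEN_SPACE


def load(keys: List[int], place: Callable[[int], int]) -> List[int]:
    """Number of keys each node receives under a placement function."""
    counts = Counter(place(k) for k in keys)
    return [counts[node] for node in range(NODES)]


if __name__ == "__main__":
    recent = list(range(900_000, 901_000))  # 1,000 consecutive, recent keys
    print("ordered:", load(recent, ordered_node))
    print("hashed: ", load(recent, hashed_node))
```

The trade-off is that hashing gives up efficient range scans over the key order, which is why ordered stores like HBase keep it despite the hot spot risk.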
  • 10. Big Data at Equinix. Why DSE Cassandra: support for analytics; integrated search using Solr; security features; cluster management capabilities; commercial support. DataStax would probably list many more reasons; these are the ones that made sense to us.
  • 11. Big Data at Equinix. Grand Finale: Hadoop Ecosystem vs. DataStax Enterprise (the slide 8 comparison repeated, now stamped "Sold").
  • 12. Big Data at Equinix. How far are we on our Big Data journey? Pilot use case taken from PoC to production; network statistics use case moved from an RRD-based solution to DSE Cassandra. Build in progress for power monitoring, data center monitoring and network monitoring use cases. In plans: a recommendation engine on the interconnection platform; use-case analysis and technology selection for connected data sets; building data science capabilities for use cases requiring predictive modeling. A few data points: physical bare-metal boxes for DSE nodes; densely packed data nodes, each with 4 TB of storage and 96 GB of RAM; about 250 million records a day. The platform is also used for log analysis in internal IT systems monitoring use cases.
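For scale, "about 250 million records a day" works out to roughly 2,900 records per second on average, or about 10.4 million per hour. A quick Python check of that arithmetic; real traffic is bursty, so peak rates will be higher than this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(records_per_day: int) -> float:
    """Average ingest rate in records per second for a given daily volume."""
    return records_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    daily = 250_000_000  # "about 250 million records a day" (slide 12)
    per_second = average_rate(daily)
    print(f"{per_second:,.0f} records/s on average, {daily / 24:,.0f} records/hour")
```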
  • 13. Big Data at Equinix. Experience so far: lack of standards-based connectors/drivers (DataStax has developed a Java driver, but it doesn't support JDBC); no data visualization tools that can access Cassandra for low-latency queries; no data access tools (a Toad equivalent) available yet, though DataStax DevCenter is trying to solve this problem. We used Astyanax and are evaluating the DataStax Java driver. We built libraries that abstract Astyanax for application engineering teams, and built REST services for data access by applications.
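Slide 13 mentions libraries that abstract Astyanax for application engineering teams. A minimal Python sketch of that idea, assuming a hypothetical `MetricStore` interface: application code depends only on the interface, and a driver-backed class (Astyanax then, perhaps the DataStax Java driver later) would implement it. Only an in-memory stand-in is shown; no real driver API is used or implied, and the series name `ibx1.port7.bps` is made up for illustration.

```python
from abc import ABC, abstractmethod
from bisect import bisect_left, insort
from collections import defaultdict
from typing import Dict, List, Tuple


class MetricStore(ABC):
    """Hypothetical driver-agnostic interface exposed to application teams."""

    @abstractmethod
    def write(self, series: str, ts: int, value: float) -> None: ...

    @abstractmethod
    def read_range(self, series: str, start: int, end: int) -> List[Tuple[int, float]]:
        """Return (ts, value) points with start <= ts < end, ordered by ts."""


class InMemoryMetricStore(MetricStore):
    """Stand-in backend; a driver-backed class would implement the same methods."""

    def __init__(self) -> None:
        self._data: Dict[str, List[Tuple[int, float]]] = defaultdict(list)

    def write(self, series: str, ts: int, value: float) -> None:
        insort(self._data[series], (ts, value))  # keep points sorted by timestamp

    def read_range(self, series: str, start: int, end: int) -> List[Tuple[int, float]]:
        points = self._data.get(series, [])
        lo = bisect_left(points, (start, float("-inf")))  # first ts >= start
        hi = bisect_left(points, (end, float("-inf")))    # first ts >= end
        return points[lo:hi]


def peak(store: MetricStore, series: str, start: int, end: int) -> float:
    """Application code: depends only on the interface, not on any driver."""
    return max(value for _, value in store.read_range(series, start, end))


if __name__ == "__main__":
    store = InMemoryMetricStore()
    for ts, v in [(30, 2.0), (10, 5.0), (20, 7.5)]:
        store.write("ibx1.port7.bps", ts, v)
    print(store.read_range("ibx1.port7.bps", 10, 30))
    print(peak(store, "ibx1.port7.bps", 0, 100))
```

Swapping drivers then means writing one new `MetricStore` implementation rather than touching every application, which is the same motivation the slide gives for wrapping Astyanax and exposing REST services.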
  • 14. Big Data at Equinix. Where do we go from here? Graph databases; batch processing (Hadoop, Spark, MapReduce?); interactive queries; online data processing; data analytics; data science and machine learning; data visualization tools and applications; developer toolkits. We are hiring big data engineers and data scientists: send your resume to pkumar@equinix.com.
  • 15. Thank you! pkmr.work@gmail.com, pkumar@equinix.com, www.equinix.com
  • 17. WHO IS EQUINIX?
  • 18. Equinix: A Platform for Growth. Global data centers: 95+ data centers, 9M+ square feet, 99.999% uptime record. Interconnection: 950+ networks, 110,000+ cross connects. Business ecosystems: Equinix Marketplace™, 4,000+ businesses, revenue opportunities. Moving towards the future | Platform.
  • 19. Solid. Powerful. Growing. $1.8B in annualized revenue; member of the NASDAQ-100; $7B in expansion investments.
  • 21. HOW WE'RE DIFFERENT | GLOBAL FOOTPRINT. Where You Are. Where You Need To Be.
  • 22. Over 90% of Internet routes pass through Equinix data centers. 950+ network providers.