5. Infochimps Big Data Platform
[Diagram: platform components: HBase, Elasticsearch, Hadoop, Command Center, Platform API, Zabbix, Zookeepers, Chef, MySQL, NFS, Backup, Scheduler, Listener, Queue, Storm, HTTP(S), Syslog, Archive Storage]
You only worry about a tiny part of the overall platform.
9. Variety, Velocity, & Volume
[Diagram labels: Input Data (LOG, TXT, CSV, XML, HTTP, JSON); Cloud::Streams; Cloud::Queries; Cloud::Hadoop; Command Center; Your Application]
A complete managed service for custom analytics in the public, private, or hybrid cloud.
13. Infochimps Cloud Pillars
Fast
• Completely Integrated & Unified Architecture
• Deployed in hours
• Expanded in minutes
8/17/2013 Infochimps Confidential 13
Simple
• We focus on Infrastructure Managed Services
• Customers focus on data & applications
Flexible
• Cloud Agnostic
• Modular
• Portable
• Open Standards Based
Scalable
• Elastic Cloud Infrastructure
• Linearly Scalable Across All Big Data Functions
• Enterprise Class
Editor's notes
The only part you have to worry about is in the yellow circle. This is the same deploy pack that runs on your local machine for development.
Key Messages
We help you leverage the people and resources you already have.
Infochimps Cloud eliminates the implementation headaches caused by Big Data, so your Big Data applications can be completed quickly and fully achieve their objectives.
Working with Big Data shouldn’t require you to hire rocket scientists or send your team to 12 weeks of Hadoop boot camp. Infochimps and our partners empower your existing teams to implement any data-driven application your Big Data vision requires.

How it Works
• Your largest, fastest data sources are streamed in to the Infochimps cloud, where real-time transformation, aggregation, decoration, and matching can be done.
• Data is then saved to a database for querying (typically Elasticsearch, HBase, or MySQL).
• Simultaneously, data is saved to Hadoop for things like historical processing.

Technical Points
• Infochimps is a managed cloud service provider and handles everything except your application.
• All your application has to worry about is pointing to the database for querying data.
• The delivery time (ETA) for records is far less than 5 seconds, and we have customers with SLAs of under 1 second.
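
The "How it Works" flow above (transform each record in-stream, then save it to a query database and to an archive at the same time) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `decorate`, `run_stream`, `query_store`, and `archive` are hypothetical names, the two Python lists stand in for the query database and Hadoop, and none of this is the Infochimps API.

```python
import json
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]

def decorate(record: Record) -> Record:
    """Hypothetical in-stream step: clean one field and add a derived one."""
    out = dict(record)
    out["user"] = str(out.get("user", "")).strip().lower()
    out["is_error"] = int(out.get("status", 0)) >= 500
    return out

def run_stream(lines: Iterable[str],
               transform: Callable[[Record], Record],
               sinks: List[Callable[[Record], None]]) -> int:
    """Parse each JSON line, transform it, and fan it out to every sink."""
    count = 0
    for line in lines:
        record = transform(json.loads(line))
        for sink in sinks:
            sink(record)  # the same record goes to every destination
        count += 1
    return count

# Stand-ins (assumptions): a list for the query database, a list for Hadoop.
query_store: List[Record] = []
archive: List[Record] = []

incoming = ['{"user": " Alice ", "status": 200}',
            '{"user": "BOB", "status": 503}']
processed = run_stream(incoming, decorate, [query_store.append, archive.append])
```

After the run, `query_store` and `archive` hold the same cleaned records, which mirrors the point that the application only needs to read from the query database.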
Key Messages
From: http://www.infochimps.com/infochimps-cloud/cloud-services/cloud-streams/

Streaming data and real-time analytics: easily handle millions of events per second with in-stream ETL and analytics.
It’s not enough anymore to simply perform historical analysis and batch reports. In situations where you need to make well-informed decisions in real-time, the data and insights must also be timely and immediately actionable. Cloud::Streams lets you process data as it flows into your application, powering real-time dashboards and on-the-fly analytics and delivering data seamlessly to Hadoop clusters and NoSQL databases.
Single-purpose ETL solutions are rapidly being replaced with multi-node, multi-purpose data integration platforms: the universal glue that connects systems together and makes Big Data analytics feasible. Cloud::Streams is a linearly scalable, fault-tolerant distributed routing framework for data integration, collection, and streaming data processing. Ready-to-go integration connectors allow you to tap into virtually any internal or external data source that your application needs.

Benefits
• Easily integrate with virtually any data source, both live/in-motion and bulk/at rest
• Process data as it flows, at scale: not only generating real-time insights, but also delivering data to databases and Hadoop clusters that has already been cleaned, transformed, and augmented/enhanced
• Solve any business use case with the ability to handle business logic of any complexity and parallel stream computing
• Write your analytics once when leveraging Wukong, then run in both real-time with Cloud::Streams and in batch with Cloud::Hadoop
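
The "write once, run in both real-time and batch" idea in the last benefit can be illustrated with a minimal Python sketch. It does not use Wukong (a Ruby library) or any Infochimps API; `parse_event`, `stream_mode`, and `batch_mode` are hypothetical names, and the "level,message" input format is an assumption made for the example. The point is only that one per-record function can be reused by a streaming driver and a batch driver.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator

def parse_event(line: str) -> Dict[str, str]:
    """The shared analytic: parse one 'level,message' line (assumed format)."""
    level, _, message = line.partition(",")
    return {"level": level.strip().upper(), "message": message.strip()}

def stream_mode(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Real-time style: emit each event as soon as it has been processed."""
    for line in lines:
        yield parse_event(line)

def batch_mode(lines: Iterable[str]) -> Counter:
    """Batch style: process the whole data set, then aggregate by level."""
    return Counter(parse_event(line)["level"] for line in lines)

sample = ["info, started", "error, disk full", "info, stopped"]
first = next(stream_mode(sample))   # available before the rest is read
totals = batch_mode(sample)         # available only after a full pass
```

Keeping the per-record logic in one function is the design choice that lets the same code serve both modes; only the driver around it changes.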
Key Messages
Ad hoc and interactive analytics: power your Big Data applications with data you can query.
Cloud::Queries, a cloud service delivered by Infochimps Cloud, enables advanced distributed text search, any-format document storage, and database tables with more than 1B rows, structured and unstructured. Databases and data storage are provided as a cloud service, including worry-free database maintenance, updates, and support. Depending on your application requirements, multiple storage technologies may be appropriate, including NoSQL and NewSQL databases such as HBase, Cassandra, Elasticsearch, MongoDB, or even MySQL. Whatever your needs, with Cloud::Queries you’ll have the most powerful cloud database for the job, scaling to the needs of your business and providing APIs that will support your most demanding ad hoc and interactive queries and applications.

Benefits
• Eliminate frustrations of large-scale database administration and data management
• Tight integration with Big Data processing workflows and delivery paths results in a truly comprehensive Big Data stack
• Linearly scalable, distributed systems support the most demanding applications and analytics queries
Key Messages
From: http://www.infochimps.com/infochimps-cloud/cloud-services/cloud-hadoop/

Elastic Hadoop and large-scale batch analytics: the easiest way to configure and manage Hadoop clusters in the cloud.
Your team recognizes the power that massively parallel data analysis can provide, and Hadoop is the standard to handle massively scalable data. Cloud::Hadoop, a cloud service delivered by Infochimps™ Cloud, is the ideal Hadoop solution. Turn clusters on at a moment’s notice with advanced elastic spin-up/spin-down capabilities, scale and customize on the fly, and leverage tools such as Pig, Hive, and Wukong that make Hadoop easier to use and much more useful for enterprises.

Benefits
• Focus on building applications and answering business questions, not on keeping an extremely complex Hadoop cluster happy and performant
• Scale up to meet any data processing demand through superior elasticity
• Be more efficient with resources, while still having quick access to HDFS data, with instantly elastic and high-performing clusters
• Write your analytics once when leveraging Wukong, then run both in batch with Cloud::Hadoop and in real-time streaming with Cloud::Streams
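
For readers new to the batch model that Hadoop parallelizes, the map, shuffle, and reduce phases can be sketched on a single machine in plain Python. This is a toy word count under stated assumptions: `mapper`, `shuffle`, `reducer`, and `run_job` are hypothetical names, nothing here calls Hadoop, Pig, Hive, or Wukong, and a real cluster would distribute each phase across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: collapse one key's values into a single total."""
    return key, sum(values)

def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on local data."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())

counts = run_job(["big data big", "data platform"])
```

Because the mapper handles one line at a time and the reducer handles one key at a time, both phases can be spread over many machines, which is what makes elastic cluster sizing useful.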