Cloud platform aws-gcp-azure-bluemix
A couple of days ago I came across the article "Mapping AWS, Google Cloud, Azure Services to Big
Data Warehouse Architecture". I know a bit about data warehousing, and even about big data
warehouse architecture. What interests me, however, is the "map of various cloud services
against the big data warehouse architecture". More precisely, cloud services from "the three most
popular cloud platforms: Microsoft Azure, Google Cloud Platform, and Amazon AWS" are mapped to
their open source origins and/or counterparts. As a technical IBMer, my primary area is Big Data &
Advanced Analytics, but I happen to know a little about the IBM Bluemix platform. So, for (more)
completeness, here comes Bluemix! Note, though, that only the Bluemix services involved in big data
warehouse architecture are listed here. To explore more, see the Bluemix website.
Disclaimer
1. While I'm employed by IBM, this article represents entirely my personal viewpoints.
Furthermore, I've tried my best, but I still can't guarantee 100% completeness or accuracy,
or account for potential service changes.
2. The original author(s) of the aforementioned article own(s) the copyright, and by no means am I
modifying the content. I neither agree nor disagree with the author on the content. However,
for convenience, I'm putting the original table (or map) and its IBM Bluemix
counterparts side by side.
PS: Due to space limitations, all the open source items in the Bluemix column refer to the cloud service
provisioned by IBM Bluemix rather than the original open source software. For example, HDFS/Hadoop/Hive,
etc. mean the individual components within the BigInsights for Apache Hadoop or BigInsights for Apache
Hadoop (Subscription) service, and PostgreSQL refers to the ElephantSQL and/or Compose for
PostgreSQL service.
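The naming convention in the note above amounts to a small alias table. A minimal sketch, assuming a dictionary is an acceptable way to hold it; the names `BLUEMIX_SERVICE_FOR` and `resolve_bluemix` are hypothetical and only cover the examples the note itself gives:

```python
# Hypothetical alias table: an open source name as it appears in the Bluemix
# column, mapped to the Bluemix service(s) the note says it stands for.
BIGINSIGHTS = [
    "BigInsights for Apache Hadoop",
    "BigInsights for Apache Hadoop (Subscription)",
]

BLUEMIX_SERVICE_FOR = {
    "HDFS": BIGINSIGHTS,
    "Hadoop": BIGINSIGHTS,
    "Hive": BIGINSIGHTS,
    "PostgreSQL": ["ElephantSQL", "Compose for PostgreSQL"],
}


def resolve_bluemix(name):
    """Return the Bluemix service(s) behind a name in the Bluemix column.

    Names that are not aliases (for example a service that is already
    Bluemix-native) are returned unchanged, wrapped in a list.
    """
    return list(BLUEMIX_SERVICE_FOR.get(name, [name]))


if __name__ == "__main__":
    for entry in ("Hive", "PostgreSQL", "Message Hub"):
        print(entry, "->", ", ".join(resolve_bluemix(entry)))
```

Anything not mentioned in the note is deliberately left out of the table rather than guessed.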
Columns: Open Source, Amazon AWS, Microsoft Azure, Google Cloud, IBM Bluemix
Batch Ingest
  Open Source: Sqoop, File Transfer, Flume, StreamSets
  Amazon AWS: AWS Data Transfer Services (various options), Import/Export Service
  Microsoft Azure: Data Factory
  Google Cloud: Cloud Dataflow
  IBM Bluemix: Sqoop, File Transfer, Lift (Aspera), Flume, various services
Streaming Ingest
  Open Source: Flume, StreamSets
  Amazon AWS: Amazon Kinesis Firehose
  Microsoft Azure: Event Hubs, IoT Hub
  Google Cloud: Cloud Dataflow
  IBM Bluemix: Flume, Spark, Streaming Analytics
Persistent Storage
  Open Source: HDFS, RDBMS
  Amazon AWS: S3, Glacier, RDS
  Microsoft Azure: Storage Blob, HDFS, SQL Database
  Google Cloud: Persistent Disk, Google Cloud Storage, Cloud SQL
  IBM Bluemix: HDFS; RDBMS (IBM proprietary: Db2, dashDB, Informix ...; open source: MySQL, PostgreSQL ...); NoSQL: MongoDB, Redis, Cloudant ...; Block Storage, Cloud Object Storage, File Storage, CDN, etc.
Transient Storage
  Open Source: Kafka
  Amazon AWS: Kinesis
  Microsoft Azure: Event Hubs, IoT Hub, HDInsight (Kafka)
  Google Cloud: Cloud Pub/Sub, Cloud IoT Core
  IBM Bluemix: Kafka, Message Hub
Batch Processing
  Open Source: Hive, Flink, Spark, MapReduce, PostgreSQL
  Amazon AWS: EMR Spark, EMR Hadoop, EMR Presto, AWS Batch, Redshift
  Microsoft Azure: Azure Batch, HDInsight (Spark/MapReduce), SQL Data Warehouse, Data Lake Analytics
  Google Cloud: Cloud Dataflow (open source Apache Beam), Cloud Dataproc (Spark, Hadoop)
  IBM Bluemix: Hive, Spark, MapReduce, MySQL, PostgreSQL, Db2, Information Server on Cloud, etc.
Stream Processing
  Open Source: Flink, Spark, Beam
  Amazon AWS: Amazon Kinesis Streams, Amazon Kinesis Analytics, EMR Spark
  Microsoft Azure: Stream Analytics, HDInsight (Storm, Spark)
  Google Cloud: Cloud Dataflow (open source Apache Beam), Dataproc (Spark, Hadoop)
  IBM Bluemix: Spark, Streaming Analytics
Machine Learning
  Open Source: Scikit, TensorFlow, Spark MLlib, etc. (huge number of libraries)
  Amazon AWS: Lex, Polly, Rekognition, Amazon Machine Learning
  Microsoft Azure: Azure ML, Cognitive Services
  Google Cloud: Natural Language, Speech, Translation, Vision, Video, ML Engine
  IBM Bluemix: Data Science Experience (includes support for R, Python with scikit, TensorFlow, Spark with MLlib, etc.), Watson Machine Learning
Serving Storage: Graph
  Open Source: JanusGraph
  Amazon AWS: N/A (Marketplace only, e.g. OrientDB)
  Microsoft Azure: N/A (Marketplace only, e.g. OrientDB)
  Google Cloud: N/A
  IBM Bluemix: IBM Graph
Serving Storage: BI/EDW
  Open Source: Impala + Kudu
  Amazon AWS: Redshift, Athena
  Microsoft Azure: SQL Data Warehouse
  Google Cloud: BigQuery
  IBM Bluemix: Db2 for Warehouse, BigSQL
Serving Storage: Search (keywords + facets)
  Open Source: Solr
  Amazon AWS: Amazon CloudSearch, Amazon Elasticsearch
  Microsoft Azure: Azure Search
  Google Cloud: N/A (Marketplace, e.g. Solr)
  IBM Bluemix: Solr, Compose for Elasticsearch
Serving Storage: RDBMS
  Open Source: PostgreSQL
  Amazon AWS: RDS
  Microsoft Azure: SQL DB
  Google Cloud: Cloud SQL
  IBM Bluemix: IBM proprietary: Db2, dashDB, Informix ...; open source: MySQL, PostgreSQL ...
Serving Storage: NoSQL
  Open Source: HBase
  Amazon AWS: DynamoDB
  Microsoft Azure: HDInsight (HBase), Cosmos DB
  Google Cloud: Bigtable, Spanner, Datastore
  IBM Bluemix: HBase, MongoDB, Redis, Cloudant ...
Sandboxes: Notebook
  Open Source: Zeppelin
  Amazon AWS: EMR Zeppelin
  Microsoft Azure: Azure Notebooks
  Google Cloud: Cloud Datalab
  IBM Bluemix: Data Science Experience (Jupyter), Spark
Sandboxes: Data Science or Preparation Platform
  Open Source: Dataiku DSS Community Edition (not open source)
  Amazon AWS: N/A (Marketplace only, e.g. Dataiku DSS)
  Microsoft Azure: N/A (Marketplace only, e.g. Dataiku DSS)
  Google Cloud: Cloud Dataprep (beta). Under the hood this is Trifacta.
  IBM Bluemix: Data Science Experience
Clients/Data Apps
  Open Source: Superset (BI)
  Amazon AWS: QuickSight
  Microsoft Azure: Power BI
  Google Cloud: Google Data Studio
  IBM Bluemix: Data Science Experience, Watson Machine Learning, Decision Optimization
Orchestration
  Open Source: Airflow
  Amazon AWS: AWS Data Pipeline
  Microsoft Azure: Data Factory
  Google Cloud: N/A (Marketplace)
  IBM Bluemix: Workload Scheduler (?)
ETL Tool
  Open Source: N/A
  Amazon AWS: AWS Glue (beta)
  Microsoft Azure: Data Factory
  Google Cloud: N/A (Marketplace)
  IBM Bluemix: Data Connect, Information Server on Cloud
MDM Hub
  Open Source: N/A
  Amazon AWS: N/A (Marketplace)
  Microsoft Azure: N/A (Marketplace)
  Google Cloud: N/A (Marketplace)
  IBM Bluemix: MDM on Cloud
Lineage
  Open Source: N/A
  Amazon AWS: AWS Glue (beta)
  Microsoft Azure: N/A
  Google Cloud: N/A
  IBM Bluemix: Information Server on Cloud
Catalog
  Open Source: N/A
  Amazon AWS: AWS Glue (beta)
  Microsoft Azure: Data Catalog
  Google Cloud: N/A (Marketplace)
  IBM Bluemix: Information Server on Cloud
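
For readers who want to query the map rather than scan it, one option is to hold it as a nested dictionary keyed by architecture layer and then by platform. The sketch below is illustrative only: it copies three rows from the map (Transient Storage, Orchestration, Catalog), and the names `SERVICE_MAP`, `services`, and `layers_without_native_service` are hypothetical. Treating any entry that starts with "N/A" as "no native service" is an assumption of this sketch.

```python
# Three rows copied from the map; the remaining rows would follow the same shape.
SERVICE_MAP = {
    "Transient Storage": {
        "Open Source": ["Kafka"],
        "Amazon AWS": ["Kinesis"],
        "Microsoft Azure": ["Event Hubs", "IOT Hub", "HDInsight (Kafka)"],
        "Google Cloud": ["Cloud Pub/Sub", "Cloud IoT Core"],
        "IBM Bluemix": ["Kafka", "Message Hub"],
    },
    "Orchestration": {
        "Open Source": ["Airflow"],
        "Amazon AWS": ["AWS Data Pipeline"],
        "Microsoft Azure": ["Data Factory"],
        "Google Cloud": ["N/A Marketplace"],
        "IBM Bluemix": ["Workload Scheduler (?)"],
    },
    "Catalog": {
        "Open Source": ["N/A"],
        "Amazon AWS": ["AWS Glue (beta)"],
        "Microsoft Azure": ["Data Catalog"],
        "Google Cloud": ["N/A Marketplace"],
        "IBM Bluemix": ["Information Server on Cloud"],
    },
}


def services(layer, platform):
    """Return the services listed for one layer on one platform."""
    return SERVICE_MAP[layer][platform]


def layers_without_native_service(platform):
    """Return the layers whose entry for the platform is marked N/A.

    Assumption of this sketch: an entry beginning with "N/A" means the
    platform has no native service for that layer.
    """
    return [
        layer
        for layer, row in SERVICE_MAP.items()
        if any(entry.startswith("N/A") for entry in row[platform])
    ]


if __name__ == "__main__":
    print(services("Transient Storage", "IBM Bluemix"))
    print(layers_without_native_service("Google Cloud"))
```

Keeping each cell as a list of strings preserves the original wording of the map, including hedges such as "(beta)" and "(?)".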