In this presentation we go through the differences and similarities between Redshift and BigQuery. It was presented at the Athens Big Data meetup in May 2017.
4. Amazon Redshift
Released in 2012 (beta)
Based on ParAccel (derived from PostgreSQL)
Designed for OLAP and BI applications
Relational and columnar database
Petabyte to exabyte scale (Spectrum)
5. Google BigQuery
Evolution of Dremel (2006)
Initially launched in 2010
Web service on top of Dremel technology
More of a hybrid system (columnar + nested data)
Petabyte scale
6. Amazon Redshift vs Google BigQuery
Redshift:
Built on top of a proven technology
Relational
SQL
Analysts
BigQuery:
Built from scratch
Nested data structures are first-class citizens
NoSQL
Developers
12. Data Types
Redshift: Closer to the standard SQL data types (e.g. INT4, INT8) but does not support the full range of PostgreSQL data types
BigQuery: Smaller set of data types supported. But...
13. Data Types
Redshift: Very basic support for JSON
BigQuery: Support for ARRAY and STRUCT types. Nested data structures are first-class citizens.
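To make the nested-versus-flat distinction concrete, here is a minimal Python sketch. It is only an illustration: the `order` record, its field names and the `flatten` helper are hypothetical and not part of either product. It shows a single record carrying a repeated nested field (the shape an ARRAY of STRUCT column can hold in one row) and the flattening step a purely flat relational model would need.

```python
# Hypothetical order record with a repeated nested field: one parent row
# that carries several child elements.
order = {
    "order_id": 1,
    "customer": "alice",
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def flatten(record, repeated_field):
    """Yield one flat row per nested element, repeating the parent columns."""
    parent = {k: v for k, v in record.items() if k != repeated_field}
    for child in record[repeated_field]:
        yield {**parent, **child}


rows = list(flatten(order, "items"))
for row in rows:
    print(row)
```

With nested types the order stays a single row; in a flat model the same data becomes one row per item (or a separate child table joined back at query time).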
15. Data Manipulation
BigQuery used to be append-only; it now supports updates and deletes (DML), but this is still limited.
Redshift has always supported this via SQL, but with a catch (VACUUM).
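The VACUUM catch can be pictured with a toy model. In Redshift, deleted rows are only marked for deletion and an update is executed as a delete followed by an insert; the space is reclaimed and the sort order restored by VACUUM. The `ToyTable` class below is a hypothetical sketch of that behaviour, not a description of Redshift's actual storage engine.

```python
class ToyTable:
    """Toy model only: not how Redshift stores data internally."""

    def __init__(self):
        self.blocks = []  # each entry is [row_dict, deleted_flag]

    def insert(self, row):
        self.blocks.append([row, False])

    def delete(self, predicate):
        for entry in self.blocks:
            if not entry[1] and predicate(entry[0]):
                entry[1] = True  # mark only; the space is not reclaimed

    def update(self, predicate, changes):
        # An update is modelled as delete + insert of the new row version.
        live = [e[0] for e in self.blocks if not e[1] and predicate(e[0])]
        self.delete(predicate)
        for row in live:
            self.insert({**row, **changes})

    def vacuum(self, sort_key):
        # Drop the marked rows and restore the sort order.
        kept = [e for e in self.blocks if not e[1]]
        kept.sort(key=lambda e: e[0][sort_key])
        self.blocks = kept

    def select(self):
        return [e[0] for e in self.blocks if not e[1]]

    def stored_rows(self):
        return len(self.blocks)


table = ToyTable()
for i in (3, 1, 2):
    table.insert({"id": i, "v": "old"})

table.update(lambda r: r["id"] == 1, {"v": "new"})
stored_before = table.stored_rows()  # includes the dead version of id 1

table.vacuum("id")
stored_after = table.stored_rows()
print(stored_before, stored_after, table.select())
```

Until the vacuum runs, the table holds more rows than it returns and new rows sit unsorted at the end, which is the cost the slide refers to.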
16. Table Manipulation
BigQuery: Limited and expensive via standard SQL, or via
HTTP API (but you have to unload and reload the table).
Redshift: Supported via SQL
Both support views, but not materialized views
18. Data Consistency
Redshift supports transactions; BigQuery does not.
Deduplication is harder to achieve on BigQuery (and also costly).
It gets even more complex with streaming.
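Without transactions, deduplication usually means re-reading the data and keeping one row per key. The Python sketch below shows that logic in its simplest form; the `deduplicate` helper and the `event_id` / `ingested_at` field names are assumptions made for illustration. It is similar in spirit to the common SQL pattern of keeping the row where `ROW_NUMBER() OVER (PARTITION BY key ORDER BY version DESC)` equals 1.

```python
def deduplicate(rows, key, version):
    """Keep only the row with the highest `version` for each `key`."""
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[version] > latest[k][version]:
            latest[k] = row
    return list(latest.values())


# Hypothetical events; a retried insert or an at-least-once stream can
# deliver the same event more than once.
events = [
    {"event_id": "a", "ingested_at": 1, "value": 10},
    {"event_id": "b", "ingested_at": 2, "value": 20},
    {"event_id": "a", "ingested_at": 3, "value": 11},  # later copy of "a"
]

unique = deduplicate(events, key="event_id", version="ingested_at")
for row in unique:
    print(row)
```

The sketch has to look at every row to decide which ones survive, which is why the operation tends to be costly when queries are billed by the amount of data scanned.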
21. Cluster Management
Here is where BigQuery really shines. It is fully managed with support for HA.
Redshift does not completely abstract the hardware from the user, and it is difficult to run it as an HA service.
This changes with Spectrum.
26. Amazon Redshift vs Google BigQuery
Redshift:
Resources capped by your cluster size
No quotas related to inserts/updates etc.
BigQuery:
2,000 slots per account
Encourages the append-only model with strict DML quotas
Both have a limit of 50 concurrent queries.
Cluster resizing is a pain with Redshift.
39. Amazon Redshift vs Google BigQuery
Redshift:
More predictable costs
More intuitive data modeling (Analysts)
Options for optimizations
BigQuery:
Easier & cheaper to start with
Good for nested data
Easier to work with time series