Welcome!
DoIT International
Practicing multi-cloud since 2010.
Agenda
1. AWS Athena
2. Google BigQuery
3. Test Drive
4. Summary
5. Q & A
DoIT International confidential │ Do not distribute
About me
Vadim Solovey - CTO // DoiT International
AWS Athena vs Google BigQuery
for interactive SQL queries on large datasets (#20/16)
Vadim Solovey - CTO // DoIT International
Google Cloud Developer Expert | AWS Certified Solutions Architect
Athena (/əˈθiːnə/), the goddess of wisdom, craft, and war
OR
Will Athena slay BigQuery?
Warning: your mileage will vary
AWS Athena
• Serverless Analytical Columnar Database based on Facebook’s Presto
• Data:
  • External tables (*SV such as CSV/TSV, JSON, ORC, Parquet files in an S3 bucket)
• Ingestion:
  • Just store files on S3
  • Convert to a columnar/compressed format using EMR
• ANSI SQL 2011
• Priced at $5/TB of scanned data, plus standard S3 storage/ops costs
• Cost optimization: convert data into a columnar format, partition it, and limit the queried columns.
Google BigQuery
• Serverless Analytical Columnar Database based on Google Dremel
• Data:
  • Native tables
  • External tables (*SV such as CSV/TSV, JSON, Avro files stored in a Google Cloud Storage bucket)
• Ingestion:
  • File imports
  • Streaming API (up to 100K records/sec per table)
  • Federated tables (files in a bucket, a Bigtable table, or a Google Spreadsheet)
• ANSI SQL 2011
• Priced at $5/TB of scanned data + storage + streaming (if used)
• Cost optimization: partitioning, limiting queried columns, 24-hour cache, cold data.
Summary
Feature | AWS Athena | Google BigQuery
Data Formats | *SV, JSON, PARQUET/z, ORC/z | External (*SV, JSON, AVRO) / Native
ANSI SQL Support | Yes* | Yes*
DDL Support | Only CREATE/ALTER/DROP | CREATE/UPDATE/DELETE (w/ quotas)
Underlying Technology | FB Presto | Google Dremel
Caching | No | Yes
Cold Data Pricing | S3 Lifecycle Policy | 50% discount after 90 days of inactivity
User Defined Functions | No | Yes
Data Partitioning | On Any Key | By DAY
Pricing | $5/TB (scanned) plus S3 ops | $5/TB (scanned) less cached data
How we tested
• Datasets
  • New York Yellow Taxi Public Dataset (https://data.cityofnewyork.us) [130GB, 1.1B rows]
  • Akamai Log [30GB, 1B rows]
• BigQuery [NY Taxi]
  • Imported the data into a native table
  • External table on top of 500x uncompressed CSV files in a GCS bucket
  • Caching: off
• AWS Athena [NY Taxi]
  • Copied 500x uncompressed CSV files from GCS to an S3 bucket
  • Using EMR 5.2 (Hive/Presto), converted the data into ORC/z and PARQUET/z formats
Tables & Formats
BigQuery
• trips_ext (500x CSV files, 490MB each) [245GB in total]
• trips_nat (130GB total)
AWS Athena
• trips_csv (500x CSV files, 490MB each)
• trips_par (4 files, 3.2GB each)
• trips_parz (8 files, 1.7GB each)
• trips_orc (8 files, 2GB each)
• trips_orcz (8 files, 2.1GB each)
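The total size of each Athena table follows from the file counts and per-file sizes listed above. A quick sketch of that arithmetic; the dictionary only restates the slide's figures, and decimal units (1 GB = 1000 MB) are an assumption, chosen because 500 x 490MB then matches the 245GB total quoted for trips_ext.

```python
# (number of files, size per file in GB) as listed on the slide.
# Assumption: decimal units, so 490MB is written as 0.49 GB.
ATHENA_TABLES = {
    "trips_csv": (500, 0.49),
    "trips_par": (4, 3.2),
    "trips_parz": (8, 1.7),
    "trips_orc": (8, 2.0),
    "trips_orcz": (8, 2.1),
}


def total_gb(file_count, gb_per_file):
    """Total table size in GB: number of files times size of each file."""
    return file_count * gb_per_file


totals = {name: total_gb(*spec) for name, spec in ATHENA_TABLES.items()}

for name, gb in totals.items():
    print(f"{name}: {gb:.1f} GB")
```

These totals are approximate, since the per-file sizes on the slide are rounded.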
Test Drive Summary
Query Type | AWS Athena (data scanned / time) | Google BigQuery (data scanned / time) | Time diff %
[1] LOOKUP | 48MB (4.1s) | 130GB (2.0s) | -51%
[2] LOOKUP & AGGR | 331MB (4.35s) | 13.4GB (2.7s) | -48%
[3] GROUP/ORDER BY | 5.74GB (8.85s) | 8.26GB (5.4s) | -27%
[4] TEXT FUNCTIONS | 606MB (11.3s) | 13.6GB (2.4s) | -470%
[5] JSON FUNCTIONS | 29MB (17.8s) | 63.9GB (8.9s) | -100%
[6] REGEX FUNCTIONS | (1.3s) | 5.45GB (1.9s) | +31%
[7] FEDERATED DATA | 133GB (19.4s) | 133GB (36.4s) | +47%
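Because both services are priced at $5 per TB scanned, the scanned-data figures above translate directly into a per-query cost. A minimal sketch of that conversion, assuming decimal units (1 TB = 1,000,000 MB) and ignoring per-query minimums, caching, and storage charges; the function name is hypothetical and the inputs are the figures from the table above.

```python
PRICE_PER_TB_USD = 5.0  # on-demand price quoted in the deck for both services

# Assumption: decimal units (1 TB = 1,000,000 MB).
MB_PER_TB = 1_000_000


def scan_cost_usd(scanned_mb):
    """Cost of one query given the amount of data it scanned, in MB."""
    return scanned_mb / MB_PER_TB * PRICE_PER_TB_USD


# (Athena MB scanned, BigQuery MB scanned) per query, from the table above.
# Query [6] is left out because the Athena figure is missing from the slide.
SCANNED_MB = {
    "[1] LOOKUP": (48, 130_000),
    "[2] LOOKUP & AGGR": (331, 13_400),
    "[3] GROUP/ORDER BY": (5_740, 8_260),
    "[4] TEXT FUNCTIONS": (606, 13_600),
    "[5] JSON FUNCTIONS": (29, 63_900),
    "[7] FEDERATED DATA": (133_000, 133_000),
}

for query, (athena_mb, bigquery_mb) in SCANNED_MB.items():
    print(
        f"{query}: Athena ${scan_cost_usd(athena_mb):.5f}, "
        f"BigQuery ${scan_cost_usd(bigquery_mb):.5f}"
    )
```

The large gap in scanned data between the two columns is what drives the cost difference here; query time does not enter the on-demand price.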
What does Athena do better than BigQuery?
Advantages:
• Can be faster than BigQuery, especially with federated/external tables
• Ability to use regex to define a schema (query files without needing to change the format)
• Can be faster and cheaper than BigQuery when using a partitioned/columnar format
• Tables can be partitioned on any column
Issues:
• Converting data between formats is not easy
• Doesn’t support DML, i.e. no insert/update/delete
• Intermittently returns HIVE_UNKNOWN_ERROR
• No streaming support
• Struggles with really large datasets
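Partitioning on any column, listed among the advantages, typically relies on Hive-style key=value prefixes in S3. A minimal sketch of how such prefixes might be built; the helper name, bucket name, and partition columns are hypothetical and not taken from the deck.

```python
def partition_prefix(base, **partitions):
    """Build a Hive-style partition prefix: base/key=value/key=value/."""
    parts = [f"{key}={value}" for key, value in partitions.items()]
    return "/".join([base.rstrip("/"), *parts]) + "/"


# Hypothetical bucket and partition columns, for illustration only.
prefix = partition_prefix(
    "s3://example-bucket/trips_par", vendor_id="VTS", year=2014
)
print(prefix)
```

Queries that filter on the partition columns can then skip prefixes that do not match, which reduces the amount of data scanned.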
What does BigQuery do better than Athena?
• Native table support gives it better performance and more features
• It’s easy to manipulate data, insert/update records, and write query results back to a table
• Querying native tables is very fast
• Easy to convert non-columnar formats into a native table for columnar queries
• Supports UDFs (planned for Athena in the future)
• Supports nested tables (nested and repeated fields)
• Works well for petabyte-scale queries
Questions?
[1] Lookup Query
SELECT *
FROM trips_par
WHERE vendor_id = 'VTS'
LIMIT 10
[2] Lookup & Aggregation
SELECT max(passenger_count)
FROM trips_par
WHERE vendor_id <> 'VTS'
[3] GROUP BY / ORDER BY Query
SELECT substr(string(pickup_datetime),1,7) month,
COUNT(*) trips
FROM [doit-playground:playground.trips_nat]
WHERE substr(string(pickup_datetime),1,4) = '2014'
GROUP BY 1
ORDER BY 1
[4] ‘LIKE’ Functions Query
SELECT
count(*)
FROM
log_par
WHERE
UA LIKE '%AppleWebKit%' OR
[5] JSON Functions Query
SELECT
JSON_EXTRACT(Misc_Fields,'$.network.edgeIP') AS edgeIP, COUNT(*) AS total
FROM
[doit-playground:playground.akamai_errors]
GROUP BY
edgeIP
ORDER BY total DESC
LIMIT 10
[6] Regex Functions Query
SELECT *
FROM log_par
WHERE REGEXP_MATCH(reqPath, r'msn.*-home') LIMIT 10
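To see which request paths the pattern in query [6] would select, the same expression can be tried locally. This sketch uses Python's re.search as a stand-in for a partial regex match; the sample paths are invented for illustration, and Python's regex engine is not the one BigQuery uses, so treat the result as an approximation.

```python
import re

# Same pattern as in the query above.
PATTERN = re.compile(r"msn.*-home")

# Invented request paths, for illustration only.
sample_paths = [
    "/en-us/msn/news-home",
    "/msn-home",
    "/msn/sports",
    "/home-msn",
]

# Keep the paths where the pattern matches anywhere in the string.
matches = [path for path in sample_paths if PATTERN.search(path)]
print(matches)
```

Only paths that contain "msn" followed later by "-home" are kept.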

CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendArshad QA
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 

Kürzlich hochgeladen (20)

Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 

AWS Athena vs. Google BigQuery for interactive SQL Queries

  • 1. Welcome! DoIT International. Practicing multi-cloud since 2010.
  • 3. DoIT International confidential │ Do not distribute. About me: Vadim Solovey - CTO // DoiT International
  • 6. AWS Athena vs Google BigQuery for interactive SQL queries on large datasets (#20/16). Vadim Solovey - CTO // DoIT International, Google Cloud Developer Expert | AWS Certified Solutions Architect
  • 7. Athena (/əˈθiːnə/): the goddess of wisdom, craft, and war
  • 8. OR
  • 9. Will Athena slay BigQuery? Vadim Solovey - CTO // DoIT International, Google Cloud Developer Expert | AWS Certified Solutions Architect
  • 10. Warning: your mileage may (will) vary
  • 11. AWS Athena
      • Serverless analytical columnar database based on Facebook’s Presto
      • Data:
          • External tables (*SV, JSON, ORC, PARQUET files in an S3 bucket)
      • Ingestion:
          • Just store files on S3
          • Convert to a columnar/compressed format using EMR
      • ANSI SQL 2011
      • Priced at $5/TB of scanned data, plus standard S3 storage/ops costs
      • Cost optimization: converting data into a columnar format, partitioning, limiting queried columns
  • 12. Google BigQuery
      • Serverless analytical columnar database based on Google Dremel
      • Data:
          • Native tables
          • External tables (*SV, JSON, AVRO files stored in a Google Cloud Storage bucket)
      • Ingestion:
          • File imports
          • Streaming API (up to 100K records/sec per table)
          • Federated tables (files in a bucket, a Bigtable table or a Google Spreadsheet)
      • ANSI SQL 2011
      • Priced at $5/TB of scanned data + storage + streaming (if used)
      • Cost optimization: partitioning, limiting queried columns, 24-hour cache, cold data
  • 13. Summary
      Feature                | AWS Athena                   | Google BigQuery
      Data Formats           | *SV, JSON, PARQUET/z, ORC/z  | External (*SV, JSON, AVRO) / Native
      ANSI SQL Support       | Yes*                         | Yes*
      DDL Support            | Only CREATE/ALTER/DROP       | CREATE/UPDATE/DELETE (w/ quotas)
      Underlying Technology  | FB Presto                    | Google Dremel
      Caching                | No                           | Yes
      Cold Data Pricing      | S3 Lifecycle Policy          | 50% discount after 90 days of inactivity
      User Defined Functions | No                           | Yes
      Data Partitioning      | On any key                   | By DAY
      Pricing                | $5/TB (scanned) plus S3 ops  | $5/TB (scanned) less cached data
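Both services are listed at $5 per TB of scanned data, so a rough per-query cost follows directly from the number of bytes scanned. The sketch below is only an illustration of that arithmetic: the function name is hypothetical, it assumes decimal terabytes (10^12 bytes), and it ignores storage, S3 operations, streaming, caching, and any per-query minimums or rounding the providers may apply.

```python
BYTES_PER_GB = 10**9   # assumption: decimal units
BYTES_PER_TB = 10**12  # assumption: decimal units


def scan_cost_usd(bytes_scanned: int, price_per_tb: float = 5.0) -> float:
    """Estimate the scan charge for one query from the bytes it reads.

    price_per_tb defaults to the $5/TB figure quoted on the slides.
    """
    return bytes_scanned / BYTES_PER_TB * price_per_tb


if __name__ == "__main__":
    # A full scan of 1 TB versus a scan of 130 GB.
    for label, size in [("1 TB", 1000 * BYTES_PER_GB), ("130 GB", 130 * BYTES_PER_GB)]:
        print(f"{label}: ${scan_cost_usd(size):.2f}")
```

Because the charge scales linearly with bytes scanned, anything that reduces the scan (columnar formats, partitioning, selecting fewer columns) reduces the bill proportionally.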
  • 14. How we tested
      • Dataset
          • New York Yellow Taxi Public Dataset (https://data.cityofnewyork.us) [130GB, 1.1B rows]
          • Akamai Log [30GB, 1B rows]
      • BigQuery [NY Taxi]
          • Imported the data into a native table
          • External table on top of 500x uncompressed CSV files in a GCS bucket
          • Caching: off
      • AWS Athena [NY Taxi]
          • Copied 500x uncompressed CSV files from GCS to an S3 bucket
          • Converted the data into ORC/z and PARQUET/z formats using EMR 5.2 (Hive/Presto)
  • 15. Tables & Formats
      • BigQuery
          • trips_ext (500x CSV files, 490MB each) [245GB in total]
          • trips_nat (130GB total)
      • AWS Athena
          • trips_csv (500x CSV files, 490MB each)
          • trips_par (4 files, 3.2GB each)
          • trips_parz (8 files, 1.7GB each)
          • trips_orc (8 files, 2GB each)
          • trips_orcz (8 files, 2.1GB each)
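The file counts and per-file sizes listed for the Athena tables are enough to estimate each table's total footprint and how it compares with the raw CSV copy. The following is a small sketch of that arithmetic; the dictionary and function names are hypothetical, and the per-file sizes are the rounded figures from the slide, so the totals are approximate.

```python
# (file count, approximate size per file in GB), copied from the slide.
ATHENA_TABLES = {
    "trips_csv": (500, 0.49),
    "trips_par": (4, 3.2),
    "trips_parz": (8, 1.7),
    "trips_orc": (8, 2.0),
    "trips_orcz": (8, 2.1),
}


def total_gb(table: str) -> float:
    """Approximate total size of a table: file count times size per file."""
    count, size_gb = ATHENA_TABLES[table]
    return count * size_gb


def ratio_to_csv(table: str) -> float:
    """Size of a table as a fraction of the uncompressed CSV copy."""
    return total_gb(table) / total_gb("trips_csv")


if __name__ == "__main__":
    for name in ATHENA_TABLES:
        print(f"{name:10s} {total_gb(name):7.1f} GB  {ratio_to_csv(name):6.1%} of CSV")
```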
  • 16. Test Drive Summary
      Query Type          | AWS Athena (GB/time) | Google BigQuery (GB/time) | t.diff %
      [1] LOOKUP          | 48MB (4.1s)          | 130GB (2.0s)              | -51%
      [2] LOOKUP & AGGR   | 331MB (4.35s)        | 13.4GB (2.7s)             | -48%
      [3] GROUP/ORDER BY  | 5.74GB (8.85s)       | 8.26GB (5.4s)             | -27%
      [4] TEXT FUNCTIONS  | 606MB (11.3s)        | 13.6GB (2.4s)             | -470%
      [5] JSON FUNCTIONS  | 29MB (17.8s)         | 63.9GB (8.9s)             | -100%
      [6] REGEX FUNCTIONS | (1.3s)               | 5.45GB (1.9s)             | +31%
      [7] FEDERATED DATA  | 133GB (19.4s)        | 133GB (36.4s)             | +47%
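One simple way to read the timings in this table is as a ratio of Athena's elapsed time to BigQuery's for each query. The sketch below computes only that ratio from the times quoted on the slide. It does not attempt to reproduce the t.diff % column, since the slides do not state how that column was calculated; the names used here are hypothetical.

```python
# (query label, Athena seconds, BigQuery seconds), copied from the summary table.
TIMINGS = [
    ("[1] LOOKUP", 4.1, 2.0),
    ("[2] LOOKUP & AGGR", 4.35, 2.7),
    ("[3] GROUP/ORDER BY", 8.85, 5.4),
    ("[4] TEXT FUNCTIONS", 11.3, 2.4),
    ("[5] JSON FUNCTIONS", 17.8, 8.9),
    ("[6] REGEX FUNCTIONS", 1.3, 1.9),
    ("[7] FEDERATED DATA", 19.4, 36.4),
]


def time_ratio(athena_s: float, bigquery_s: float) -> float:
    """Athena time divided by BigQuery time.

    Values above 1 mean BigQuery finished sooner; values below 1 mean
    Athena finished sooner.
    """
    return athena_s / bigquery_s


if __name__ == "__main__":
    for label, athena_s, bigquery_s in TIMINGS:
        ratio = time_ratio(athena_s, bigquery_s)
        faster = "BigQuery" if ratio > 1 else "Athena"
        print(f"{label:20s} ratio={ratio:5.2f}  faster: {faster}")
```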
  • 17. What does Athena do better than BigQuery?
      • Advantages:
          • Can be faster than BigQuery, especially with federated/external tables
          • Ability to use regex to define a schema (query files without needing to change the format)
          • Can be faster and cheaper than BigQuery when using a partitioned/columnar format
          • Tables can be partitioned on any column
      • Issues:
          • It’s not easy to convert data between formats
          • Doesn’t support DML, i.e. no insert/update/delete
          • Randomly returns HIVE_UNKNOWN_ERROR
          • No streaming support
          • Struggles with really large datasets
  • 18. What does BigQuery do better than Athena?
      • It has native table support, giving it better performance and more features
      • It’s easy to manipulate data, insert/update records and write query results back to a table
      • Querying native tables is very fast
      • Easy to convert non-columnar formats into a native table for columnar queries
      • Supports UDFs, although they will be available in the future for Athena
      • Supports nested tables (nested and repeated fields)
      • Works well for petabyte-scale queries
  • 19. Questions?
  • 20. [1] Lookup Query
        SELECT *
        FROM trips_par
        WHERE vendor_id = 'VTS'
        LIMIT 10
  • 21. [2] Lookup & Aggregation
        SELECT max(passenger_count)
        FROM trips_par
        WHERE vendor_id <> 'VTS'
  • 22. [3] GROUP BY / ORDER BY Query
        SELECT substr(string(pickup_datetime),1,7) month,
               COUNT(*) trips
        FROM [doit-playground:playground.trips_nat]
        WHERE substr(string(pickup_datetime),1,4) = '2014'
        GROUP BY 1
        ORDER BY 1
  • 23. [4] ‘LIKE’ Functions Query
        SELECT count(*)
        FROM log_par
        WHERE UA LIKE '%AppleWebKit%' OR
  • 24. [5] JSON Functions Query
        SELECT JSON_EXTRACT(Misc_Fields,'$.network.edgeIP') AS edgeIP,
               COUNT(*) AS total
        FROM [doit-playground:playground.akamai_errors]
        GROUP BY edgeIP
        ORDER BY total DESC
        LIMIT 10
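To make the intent of the JSON query concrete, here is a rough in-memory analogue in Python: pull `network.edgeIP` out of each JSON document, count occurrences, and keep the ten most frequent values. This is only a sketch of the logic, not of how either engine executes it. The function name and the sample rows are hypothetical (the addresses come from a documentation-only IP range), and details such as how the SQL function formats the extracted value may differ.

```python
import json
from collections import Counter


def top_edge_ips(rows, limit=10):
    """Count rows per network.edgeIP value and return the most frequent ones.

    rows: iterable of JSON strings, standing in for the Misc_Fields column.
    Rows without the field are grouped under None, loosely like a NULL group.
    """
    counts = Counter()
    for misc_fields in rows:
        doc = json.loads(misc_fields)
        edge_ip = doc.get("network", {}).get("edgeIP")
        counts[edge_ip] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    # Hypothetical sample rows for illustration only.
    sample = [
        '{"network": {"edgeIP": "192.0.2.1"}}',
        '{"network": {"edgeIP": "192.0.2.1"}}',
        '{"network": {"edgeIP": "192.0.2.2"}}',
    ]
    for edge_ip, total in top_edge_ips(sample):
        print(edge_ip, total)
```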
  • 25. [6] Regex Functions Query
        SELECT *
        FROM log_par
        WHERE REGEXP_MATCH(reqPath, r'msn.*-home')
        LIMIT 10
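The pattern in the regex query, `msn.*-home`, can be tried locally to see which request paths it would select. The sketch below uses Python's `re.search`, on the assumption that the SQL function matches anywhere in the string rather than requiring a full-string match. The function name and the sample paths are hypothetical and are not taken from the Akamai log.

```python
import re

# Same pattern as in the query: "msn", then anything, then "-home".
PATTERN = re.compile(r"msn.*-home")


def matches_req_path(req_path: str) -> bool:
    """Return True if the pattern occurs anywhere in the request path."""
    return PATTERN.search(req_path) is not None


if __name__ == "__main__":
    # Hypothetical request paths for illustration only.
    for path in ["/msn/en-us-home", "/portal/msn-home/index.html", "/news/home"]:
        print(path, matches_req_path(path))
```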