© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Analytics is the discovery, interpretation, and communication of meaningful patterns in data. Typical questions:
• What is my revenue growth month by month?
• How is my marketing campaign working?
• Which age group had the most insurance claims?
• What is the crime rate by city?
Organic revenue growth: data grows >10x every 5 years, so data that lives for 15 years grows roughly 1,000x in scale, and it becomes more valuable as it accumulates.
How do I provide democratized access to data to enable informed decisions, while at the same time enforcing data governance and preventing mismanagement of the data?
The tension: democratization of data (open tools such as Hadoop, Elasticsearch, Presto, and Spark) versus governance & control.
A broken view of your business and your customers.
I WANT SUPPORT FOR…
• Any scale and concurrency, with low cost and high throughput & performance
• Data from new sources: streaming, batch, real-time
• Increasingly diverse types of data
• Democratization of data: usage by many people of various skill levels; make it easy to run & operate
• Choice of tools, techniques, and applications
Data Lakes Provide Customers with what they want…
• Single source of truth in a single store (data lake)
• Flexibility to grow to any scale, with low costs
• Choice to analyze data in a variety of ways
• Avoid lock-in; store data in open formats
• Democratize analytics with security & governance
The AWS data platform (with AWS Marketplace solution counts):
• Databases: RDS (MySQL, PostgreSQL, MariaDB, Oracle, SQL Server), Aurora (MySQL, PostgreSQL), DynamoDB (key-value, document), ElastiCache (Redis, Memcached), Neptune (graph), Timestream (time series), QLDB (ledger database), RDS on VMware; 730+ database solutions
• Analytics: Redshift (data warehousing), EMR (Hadoop + Spark), Athena (interactive analytics), Kinesis Analytics (real-time), Elasticsearch Service (operational analytics); 600+ analytics solutions
• Data lake: S3/Glacier, Glue (ETL & data catalog), Lake Formation (data lakes); 20+ data lake solutions
• Data movement: Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams | Data Pipeline | Direct Connect; 30+ solutions
• Business intelligence & machine learning: QuickSight, SageMaker, Comprehend, Rekognition, Lex, Transcribe, DeepLens; 250+ solutions
• Blockchain: Managed Blockchain, Blockchain Templates; 25+ blockchain solutions
#awsanalytics / #awsbuilders
Amazon Redshift: the most popular and fastest cloud data warehouse, with more than 15K customers, processing more than 2 exabytes of data.
[Chart: queries per hour (higher is better) for Redshift vs. Vendor 1 and Vendor 2, based on the cloud DW benchmark derived from the TPC-DS 30 TB dataset on a 4-node cluster.]
Fastest and most cost-effective: up to 75% less than the #2 cloud DW with on-demand pricing, and 75% less with Reserved Instances (RIs). Based on IDC's "ROI of Amazon Redshift" paper (2017): $758,845 average annual benefits per 100 TB, $319,300 higher revenue per 100 TB per year, and a 469% ROI.
Fastest, most cost-effective, and integrates with your data lake.
Analyst recognition:
• Forrester Wave™ Big Data Warehouse, Q4 2018: AWS rated top in the leader bracket and received a score of 5/5 (the highest score possible) in a number of areas, such as Use Cases, Roadmap, Market Awareness, and Ability to Execute
• Gartner Magic Quadrant for Data Management Solutions for Analytics, 2018: AWS positioned as a Leader
Amazon Redshift: a data warehouse that extends to, and integrates seamlessly with, the data lake.
• Fully managed
• Massively parallel OLAP architecture that scales to query GBs to EBs of data
• Automatic scaling
• Secure
• Highly rated and most popular
Five Key Highlights
• Amazon Redshift has a service SLA of 99.9%
• Amazon Redshift mirrors data onto a second node
• Amazon Redshift automatically detects and recovers from a disk or node failure
• Amazon Redshift automatically backs up your data
• Amazon Redshift can automatically replicate your backups to another AWS region (e.g. for a DR site)
[Architecture diagram: SQL clients/BI tools connect via JDBC/ODBC to the leader node, which coordinates the compute nodes (each e.g. 128 GB RAM, 16 TB disk, 16 cores, with slices 1, 2, 3, 4 … N) over a 10 GigE (HPC) network. Load, unload, backup, and parallel restore run against Amazon Simple Storage Service (S3); Amazon Redshift Spectrum queries data directly in S3.]
Compute node slices:
• A compute node is partitioned into either 2 or 16 slices; a slice can be thought of as a "virtual compute node"
• Each slice is allocated a portion of the compute node's memory and disk space, where it processes a portion of the workload assigned to the compute node by the leader node
• The leader node manages distributing data to the slices and apportions the workload for any queries or other database operations to the slices
• Slices are Redshift's symmetric multiprocessing (SMP) mechanism; they work in parallel to complete operations
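You can verify the slice layout of your own cluster from SQL. A minimal sketch using the STV_SLICES system view, which maps slices to nodes:

-- Count the slices on each compute node
SELECT node, COUNT(*) AS slices
FROM stv_slices
GROUP BY node
ORDER BY node;

On a 16-slice node type, each node should report 16 slices; skew in this count would indicate a configuration problem.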
A Redshift cluster can have up to 128 ds2.8xlarge nodes (2 petabytes of local storage) and can support exabytes of data with its Redshift Spectrum feature.
Note: AWS reserves the right to change instance types at any time. For example, DC1 is a deprecated dense-compute instance type that should not be used; instead, upgrade from DC1 to DC2 for the same price with better performance.
Instance Family | Instance Type | Disk Type | Capacity | Memory | # CPUs | # Slices | $
Dense-Compute   | DC2 large     | NVMe SSD  | 160 GB   | 16 GB  | 2      | 2        | $
Dense-Compute   | DC2 8xlarge   | NVMe SSD  | 2.56 TB  | 244 GB | 32     | 16       | $$
Dense-Storage   | DS2 xlarge    | Magnetic  | 2 TB     | 32 GB  | 4      | 2        | $
Dense-Storage   | DS2 8xlarge   | Magnetic  | 16 TB    | 244 GB | 36     | 16       | $$

• Dense-Compute (DC2) nodes: solid-state disks
• Dense-Storage (DS2) nodes: magnetic disks
• The key difference between instance types is the compute/storage ratio and storage latency (SSD vs. magnetic storage)

Redshift instance types are named according to their corresponding Amazon EC2 instance types; for more information, visit https://aws.amazon.com/ec2/instance-types/
How a query over S3 data executes (SQL clients/BI tools connect to the leader node via JDBC/ODBC; compute nodes communicate over 10 GigE (HPC); the Redshift Spectrum fleet sits between the cluster and Amazon S3, exabyte-scale object storage, with a Data Catalog / Apache Hive Metastore holding table metadata):

SELECT COUNT(*)
FROM S3.EXT_TABLE
GROUP BY …

1. The query is optimized and compiled using ML at the leader node, which determines what runs locally and what goes to Amazon Redshift Spectrum
2. The query plan is sent to all compute nodes
3. Compute nodes obtain partition info from the Data Catalog and dynamically prune partitions
4. Each compute node issues multiple requests to the Amazon Redshift Spectrum layer
5. Amazon Redshift Spectrum nodes scan your S3 data
6. Amazon Redshift Spectrum projects, filters, joins and aggregates
7. Final aggregations and joins with local Amazon Redshift tables are done in-cluster
8. The result is sent back to the client
• Redshift Spectrum seamlessly integrates with your existing SQL & BI apps
• Support for complex joins, nested queries and window functions
• Support for data partitioned in S3 by any key: date, time, and any other custom keys (e.g. year, month, day, hour)
• Leverages the AWS Glue Data Catalog or an Amazon EMR Hive Metastore
• No data loading required; reads different file formats, compressed files, and encrypted files; ANSI SQL
See https://docs.amazonaws.cn/en_us/redshift/latest/dg/c-using-spectrum.html
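Registering S3 data is done with CREATE EXTERNAL SCHEMA and CREATE EXTERNAL TABLE. A minimal sketch assuming the Glue Data Catalog; the schema name spectrum, database spectrumdb, IAM role ARN, S3 path, and column definitions are hypothetical:

CREATE EXTERNAL SCHEMA spectrum
FROM DATA CATALOG
DATABASE 'spectrumdb'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

CREATE EXTERNAL TABLE spectrum.sales (
  sale_id BIGINT,
  sale_dt DATE,
  amount  DECIMAL(12,2)
)
STORED AS PARQUET
LOCATION 's3://mybucket/data/sales/';

-- Query it like any other table, including joins with local cluster tables
SELECT COUNT(*) FROM spectrum.sales WHERE sale_dt >= '2019-01-01';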
• Amazon Athena, Redshift, & EMR have some shared analytical & data lake use cases, but they each address different needs & scenarios
• Amazon Redshift provides the fastest query performance for enterprise reporting & business intelligence workloads, particularly those involving extremely complex SQL with multiple joins and sub-queries. Redshift also supports querying an S3 data lake & joins between S3 data and local cluster data
• Amazon EMR makes it simple & cost-effective to run Hadoop, Spark, & Presto. EMR is flexible: you can run custom applications and code, and define specific compute, memory, storage, and application parameters to optimize for your analytic requirements
• Amazon Athena is a standalone service that provides the easiest way to run data exploration and discovery queries, as well as analytical queries on data lakes, geospatial data, and service logs, without the need to set up or manage any servers

When is Redshift strongly recommended over Athena?
• Latency has to be sub-second; Redshift employs multiple caches & an optimized query planner
• Data and workloads require a data warehouse
• Data is highly relational (e.g. normalized data that would be difficult or otherwise disadvantageous for the use case to de-normalize)
• Data has a transactional nature to it (e.g. data gets updated)
• Workloads involve many complex joins
• Workloads involve joins between data warehouse data & an S3 data lake: use Redshift (Redshift Spectrum)
• Redshift is a fully ACID- and ANSI SQL-compliant data warehouse
• Use cases relying on indexes can alternatively achieve fast query performance through parallelism and efficient data storage & I/O
• Table distribution styles, data compression, and sort keys significantly impact parallelism and the efficiency of data storage and I/O
• Redshift creates one database by default, but other databases can be created (note: having multiple databases could lead to one DB monopolizing the cluster's resources)
• Databases are autonomous units in Redshift, i.e. queries can join tables within a single database only
Redshift: Popular Data Models
Redshift can be used with a number of data models, including star, snowflake, and highly denormalized models.
• Row storage (e.g. MySQL): all row fields are stored together on disk (typically in a sequential file)
• Accessing a column (example: scanning the SSN of all residents) with row storage:
  • Scan the entire table
  • Resulting in unnecessary I/O and caching overhead
• Column storage (e.g. Amazon Redshift): each table column is stored separately on disk (typically in a separate file or set of files)
• Accessing a column (example: scanning the SSN of all residents) with columnar storage:
  • Only scan blocks for the relevant column(s)
  • Significantly less I/O
Given the following table definition and data for the deep_dive table, how will a simple SQL query behave in a row-based data store, and then in a column-based store?

CREATE TABLE deep_dive (
aid INT --audience_id
,loc CHAR(3) --location
,dt DATE --date
);

SELECT min(dt) FROM deep_dive;

Row-based storage behavior:
• Need to read everything
• Excessive & unnecessary I/O

Column-based storage behavior:
• Only scan blocks for the relevant column
• Significantly less I/O
• Redshift is a columnar database, which means data on disk is physically organized by column
• Column data is stored in 1 MB immutable blocks; a full block can contain as little as one value or as many as millions of values
• Each slice stores a set of blocks that contain a range of the values for each column
• Column stores compress very nicely: each value in a single column is the same data type, and a column is likely to have repeating values
• Redshift can typically achieve 3x-4x data compression ratios
• Compression reduces storage requirements, but also improves performance by reducing I/O
• Columns grow and shrink independently in Redshift
Note: in Redshift jargon, "column encoding" refers to compression.
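Encodings are declared per column with the ENCODE keyword. A minimal sketch on the deep_dive table from earlier; the specific codecs are illustrative choices, not recommendations:

CREATE TABLE deep_dive (
  aid INT     ENCODE zstd,     -- audience_id
  loc CHAR(3) ENCODE bytedict, -- low-cardinality column; dictionary encoding fits
  dt  DATE    ENCODE raw       -- sort key column left uncompressed (see the tips that follow)
)
SORTKEY (dt);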
• Redshift supports a number of compression algorithms (e.g. LZO, ZSTD, RUNLENGTH, etc.)
• Compression algorithms can achieve different compression ratios for different data types
• Use PG_TABLE_DEF to view/verify the current encoding applied to each column in a table:

SELECT * FROM pg_table_def
WHERE schemaname = 'myschema' AND tablename = 'mytable';
• Columnar compression is automatically and intelligently applied by the COPY command to empty tables
• Redshift's ANALYZE COMPRESSION command will analyze an existing table and recommend the best compression settings
• Compress everything except sort key columns
• In some cases, RAW (no compression) is the best compression option (e.g. sparse columns or relatively small tables: ~10k rows)
• Redshift's Column Encoding Utility automates the use of the ANALYZE COMPRESSION command with a data migration to change compression in place

Note: beware cases where you've tested COPY with a small number of rows before doing a full load; COPY will not re-evaluate encodings on non-empty tables.

ANALYZE COMPRESSION
[ [ table_name ]
[ ( column_name [, ...] ) ] ]
[COMPROWS numrows]

COPY's COMPUPDATE option controls automatic compression:
• COMPUPDATE PRESET: column compression is set based on the column's data type; no data is sampled
• COMPUPDATE [ON]: the best column compression is determined & set by applying different compression codecs to a sample set of column data
• COMPUPDATE OFF: skips any compression analysis
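For example, a first load into an empty table can let COPY pick encodings, and ANALYZE COMPRESSION can review an already-loaded table. A sketch; the S3 path and IAM role are hypothetical:

COPY deep_dive
FROM 's3://mybucket/deep_dive/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyCopyRole'
COMPUPDATE PRESET;  -- encodings chosen from column data types, no sampling

-- Report recommended encodings for an existing table (samples its data)
ANALYZE COMPRESSION deep_dive;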
• Redshift is a distributed database with a single leader node and one or more compute nodes, where data is stored on compute nodes or on Amazon S3
• Distribution style: a table property which dictates how that table's data is distributed on internal storage
• Distribution goals:
  • Distribute data evenly for parallel processing
  • Ensure each node has the same amount of data
  • Minimize data movement during query processing

Data distribution tips:
• A sub-optimal data distribution can lead to data skew and poor query performance; if unsure which distribution style to choose for a table, let Redshift pick for you (AUTO)
• Redshift's Column Encoding Utility can be used to change a table's distribution style
Four distribution styles to choose from in Redshift (see the sketch after this list):
• KEY: a column value is hashed, and the same hash value is placed on the same slice
• EVEN: data is evenly distributed across all slices using a round-robin distribution
• ALL: full table data is placed on each compute node's first slice
• AUTO: default option; Redshift starts the table with ALL, but switches the table to EVEN when the table grows larger

Data distribution tips:
• Consider using the ALL distribution style for all infrequently-modified small tables (~3 million rows or less)
• Distribution keys should have high cardinality to avoid data skew and "hot" nodes
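Distribution style and key are declared at table creation. A minimal sketch of the classic fact/dimension pattern; all table and column names are hypothetical:

CREATE TABLE sales_fact (
  customer_id BIGINT,
  sale_dt     DATE,
  amount      DECIMAL(12,2)
)
DISTSTYLE KEY
DISTKEY (customer_id);  -- co-locates rows that join on customer_id

CREATE TABLE customer_dim (
  customer_id BIGINT,
  name        VARCHAR(100)
)
DISTSTYLE ALL;          -- small, infrequently modified dimension table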
• Good distribution keys are frequently joined to other tables (e.g. a fact table joined with a dimension table)
• High cardinality: a high count of distinct values relative to the overall row count

select count(distinct <my_column>) unique_values,
count(9) total_rows from <my_table>;

• Low skew: each unique value in the column appears roughly the same number of times as every other value
• Use a date column only if cardinality is high enough, and queries don't typically filter on a very narrow date period (to avoid workload skew among the node slices)

Data distribution tips: use the query above to check the cardinality of your key column; an even distribution of values is better, and a skew check follows below.
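A sketch of a per-value frequency check to complement the cardinality query above; the table and column names are hypothetical:

-- A long tail of similar counts indicates low skew; a few dominant
-- values indicate a poor distribution key
SELECT customer_id, COUNT(*) AS rows_per_value
FROM sales_fact
GROUP BY customer_id
ORDER BY rows_per_value DESC
LIMIT 20;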
• Zone maps are minimum and maximum values for each 1 MB block of data
• Zone maps are stored in memory and automatically generated
• Zone maps allow Redshift to effectively prune blocks that cannot contain data needed for a given query, minimizing unnecessary I/O
• Along with sort keys, zone maps play a crucial role in enabling range-restricted scans to prune blocks and reduce I/O

[Diagram: Redshift stores data in 1 MB blocks; a zone map records the MIN and MAX values per block for each column, e.g. sales_dt and price.]
Redshift Sorting
• Redshift uses sort keys to physically order data on disk
• In combination with zone maps, sort keys enable range-restricted scans to prune blocks and reduce I/O
• Sort keys combined with zone maps function like an index for a given set of columns
• Sort keys benefit MERGE JOIN performance with a much faster sort
• Redshift supports two types of sort keys: compound sort keys (the default) and interleaved sort keys
Sort keys can be added to a table by specifying the SORTKEY table property on one or more columns.
• Optimal sort key:
  • Should consist of the columns most commonly found in WHERE clause filter predicates
  • Extremely common for the sort key to be a date
• Compound sort key tips:
  • Column order matters: there is no skip scanning
  • Order columns from lowest cardinality to highest, if possible (columns added to a sort key after a high-cardinality column are not effective)
  • Define four or fewer sort key columns; more will result in marginal gains and increased ingestion overhead
• If your table is frequently joined, then include the DISTKEY in the sort key as the first column (see the sketch after this list)
• A column that is CAST() to be joined or filtered will not be used as a sort key (e.g. casting DATE to TIMESTAMPTZ); modify the underlying data & then set this value as the sort key
• Sort keys are less beneficial on small tables
• With an established workload, the Redshift GitHub has scripts to help you find sort key suggestions
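A sketch combining the join and filter tips above, assuming deep_dive is frequently joined on aid and filtered on dt:

CREATE TABLE deep_dive (
  aid INT,
  loc CHAR(3),
  dt  DATE
)
DISTKEY (aid)
COMPOUND SORTKEY (aid, dt);  -- DISTKEY first because the table is frequently joined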
SELECT count(*)
FROM deep_dive
WHERE dt = '06-09-2017';

Sorted by date (block zone maps):
MIN: 01-JUNE-2017, MAX: 06-JUNE-2017
MIN: 07-JUNE-2017, MAX: 12-JUNE-2017
MIN: 13-JUNE-2017, MAX: 21-JUNE-2017
MIN: 21-JUNE-2017, MAX: 30-JUNE-2017

Unsorted table (block zone maps):
MIN: 01-JUNE-2017, MAX: 20-JUNE-2017
MIN: 08-JUNE-2017, MAX: 30-JUNE-2017
MIN: 12-JUNE-2017, MAX: 20-JUNE-2017
MIN: 02-JUNE-2017, MAX: 25-JUNE-2017

Zone maps and sort keys can serve as a significant optimization by reducing the number of blocks examined (and therefore I/O) during query execution: in the sorted table only one block can contain the filtered date, while in the unsorted table nearly every block might.
Redshift Temporary Tables
• Redshift supports the TEMPORARY keyword on CREATE TABLE and CREATE TABLE AS, and through the #<NAME> marker on SELECT:
SELECT ... INTO #MY_TEMP_TABLE FROM ...
• Temporary table characteristics:
  • Stored like all other Redshift tables, but only live for the lifetime of the session (dropped on session termination)
  • Default to no columnar compression & EVEN distribution; this is often the worst possible configuration for table storage
  • Do not have statistics by default
Tip: define temporary tables as you would a permanent table, with columnar compression and an appropriate distribution style, to increase performance.
• Capabilities:
  • Temp tables can be used exactly as permanent tables would in ETL jobs or analytics
  • Temp tables can participate in complex, multi-statement transactions
  • Temp tables exhibit faster I/O (they are not mirrored to other nodes)
  • You can COPY and UNLOAD temporary tables
  • SELECT INTO # does not provide the ability to set DISTSTYLE or column encoding
• Best practices:
  • Avoid the use of SELECT INTO # (use explicit CREATE TEMPORARY TABLE (AS) statements instead)
  • Include column encoding settings on the CREATE command
  • Include distribution keys or a distribution style when creating temp tables
  • Compute statistics when creating large temp tables as part of an ETL process
• Create a temporary table that is LIKE another table so that it inherits the parent table's column definitions, distribution style and sort keys:

create temp table temp_tbl (like parent_tbl);
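A CTAS sketch that follows these best practices instead of SELECT INTO #; the table names and filter are hypothetical:

CREATE TEMPORARY TABLE stage_deep_dive
DISTKEY (aid)
SORTKEY (dt)
AS
SELECT aid, loc, dt
FROM deep_dive
WHERE dt >= '2017-06-01';

-- Compute statistics for a large temp table used later in the ETL job
ANALYZE stage_deep_dive;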
The "SET DW" checklist:
• S (Sort keys): ensure sort keys exist to facilitate filters in the WHERE clause
• E (Encoding/compression): reduced I/O improves query performance
• T (Table maintenance: vacuum, analyze): current table statistics increase sort key effectiveness, and table defragmentation reduces wasted storage while improving query performance
• D (Data distribution): ensure distribution keys exist to facilitate the most common joins
• W (Workload management): machine learning algorithms profile queries to place them in the appropriate queue with the appropriate resources
Redshift/Data Lake Interactions
A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale.
• A table is a table, right? Nope: with data lakes, tables are collections of files
• Data lake file types have a huge influence on the performance of Redshift Spectrum queries
• Best practices:
  • The number of files in the data lake should be a multiple of your Redshift slice count (general best practice)
  • Redshift Spectrum can automatically split Parquet, ORC, text-format, and Bz2 files for optimal processing
  • File sizes should be in the range 64 MB – 512 MB
  • Files should be of uniform size (especially files that can't be automatically split by Redshift Spectrum, such as Avro and Gzip) to avoid execution skew
Redshift/Data Lake Interactions
Redshift Spectrum is a feature of Redshift that enables queries to reference external tables.
• Understanding query mechanics can maximize the work done by Redshift Spectrum
• Do as much as possible in Redshift Spectrum before bringing data back to your cluster
• Data lake best practices for Redshift Spectrum:
  • Use data lake file formats that are optimized for reads by Redshift Spectrum (and Athena!)
  • ORC and Parquet apply columnar encoding, similar to how data is stored inside Redshift
  • Redshift Spectrum can also work with Avro, CSV and JSON data, but these files are *much* larger on S3 than ORC/Parquet
Redshift/Data Lake Interactions
Open file formats such as Parquet and ORC are optimal for Redshift/data lake interactions because of their columnar structure.
• Partitions should be based on:
  • Frequently filtered columns (either through a join or the WHERE clause)
  • Business groups (user cohorts, application names, business units, etc.)
  • Date & time
• Consider how your users query data:
  • Do they look month by month, or current month and year vs. the previous year for the same month, etc.?
  • Do they understand the columns you have created?
• Date-based partition columns have a type:
  • Full dates included in a single value may be formatted or not (yyyy-mm-dd or yyyymmdd)
  • Formatted dates can only be strings
  • Either type of date needs to consider ordering (date=dd-mm-yyyy cannot be used in an ORDER BY clause, but date=yyyy-mm-dd can!)
S3 layout for external tables:
• Keep data with a similar security model in the same prefix: s3://mybucket/data
• Application or business unit prefixes can be helpful: s3://mybucket/data/marketing
• Each table resides in its own prefix: s3://mybucket/data/marketing/impressions
• Add high-level business unit partitions:
  s3://mybucket/data/marketing/impressions/application=flux_capacitor
  s3://mybucket/data/marketing/impressions/application=cold_fusion
• Add dates:
  s3://mybucket/data/marketing/impressions/application=flux_capacitor/date=20180122
  s3://mybucket/data/marketing/impressions/application=cold_fusion/date=20180123
  or
  s3://mybucket/data/marketing/impressions/application=flux_capacitor/yyyy=2018/mm=01/dd=22
Redshift Spectrum extends the same MPP principle used by Redshift clusters to query external data, using multiple Redshift Spectrum instances as needed to scan files. Place the files in a separate folder for each table.
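A sketch of a partitioned external table over the layout above, reusing the hypothetical spectrum schema from the earlier sketch; the column definitions are assumptions, and the partition column is named event_date rather than date to sidestep reserved-word concerns (the ADD PARTITION location mapping is explicit, so the S3 prefix can still say date=):

CREATE EXTERNAL TABLE spectrum.impressions (
  user_id    BIGINT,
  event_time TIMESTAMP
)
PARTITIONED BY (application VARCHAR(64), event_date CHAR(8))
STORED AS PARQUET
LOCATION 's3://mybucket/data/marketing/impressions/';

-- Register one partition per S3 prefix
ALTER TABLE spectrum.impressions
ADD PARTITION (application = 'flux_capacitor', event_date = '20180122')
LOCATION 's3://mybucket/data/marketing/impressions/application=flux_capacitor/date=20180122/';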
Redshift Workload Management
Workload management (WLM) is a feature that helps manage workloads and avoid short, fast-running queries getting stuck in queues behind long-running queries. The amount of memory available to a query is a function of the WLM queue where it runs, the percentage of memory assigned to that queue, and the number of query slots consumed by the query.

Three WLM methods that are complementary to each other:
• Queues (basic WLM): WLM always assigns every query executed in Redshift to a specific queue on the basis of user group, query group, or WLM rules (e.g. [return_row_count > 1000000])
• Short-Query Acceleration (SQA): Redshift uses machine learning to determine what constitutes a "short" running query in your cluster; "short" running queries are then automatically identified & run immediately in the short-query queue if queuing occurs
• Concurrency Scaling: Redshift uses machine learning to predict queuing in your cluster, and when queuing occurs, transient Amazon Redshift clusters are added to your cluster and queries are routed to them for execution
Default WLM setup:
• One default WLM queue: concurrency level of five (enables up to 5 queries to run concurrently) and no timeout
• Auto WLM enabled (automatic query concurrency & memory allocation)
• One superuser queue: concurrency level of one and no timeout
• SQA enabled (enabled/disabled via a checkbox in the Redshift console)
• Concurrency scaling disabled (enabled/disabled via the Redshift console)

Customizing WLM:
• Customize WLM queues via a few clicks on the Redshift console
• Up to 8 custom queues are allowed in a Redshift cluster
• WLM queues have four main "levers": concurrency level (aka "query slots"), memory allocation (%), targets (i.e. user groups, query groups, or query monitoring rules), and timeout (ms)

WLM queue setup via the Redshift console:
1. Click on Parameter Groups in the navigation pane and choose Create Cluster Parameter Group
2. Click the Add Queue button to add a new WLM queue
3. Associate the parameter group with your cluster
Auto WLM
• Automatic workload management ("Auto WLM") lets Amazon Redshift automatically manage query concurrency and memory allocation
• Auto WLM can create up to eight queues, each with a priority
• Auto WLM automatically determines the amount of resources that queries need, and adjusts the concurrency based on the workload:
  • Concurrency is set lower when queries requiring large amounts of resources are in the system (e.g. hash joins between large tables)
  • Concurrency is set higher when lighter queries (e.g. inserts, deletes, scans, or simple aggregations) are submitted
• Auto WLM & SQA work together to allow short-running and lightweight queries to complete even while long-running, resource-intensive queries are active
• Auto WLM is enabled by default when the default parameter group is used, and must be explicitly enabled when a custom parameter group is used. It can be enabled in a custom parameter group through the Amazon Redshift console by choosing Switch WLM mode and then choosing Auto WLM; with this choice, one queue is used to manage queries, and the memory and concurrency on main fields are both set to auto. When Auto WLM is not enabled, manual WLM requires you to specify values for query concurrency and memory allocation
Redshift Query Priorities
• WLM queues can be defined with a specific priority (relative importance) & queries inherit their queue's priority
• There are six possible query priorities: CRITICAL (superusers only), HIGHEST, HIGH, NORMAL (the default), LOW, and LOWEST
• Administrators can use priorities to prioritize different workloads (e.g. ETL, ingestion, audit, BI, etc.)
• Amazon Redshift uses priority when letting queries into the system and to determine the amount of resources allocated to a query
• Predictable performance for a high-priority workload comes at the cost of other, lower-priority workloads; lower-priority queries are not starved, but might run longer because they wait behind more important queries or run with fewer resources
• You can enable concurrency scaling to maintain predictable performance for lower-priority workloads
• Auto WLM automatically creates and assigns queues corresponding to priorities
• The CRITICAL priority is higher than HIGHEST and is available to superusers. To set it, use the functions CHANGE_QUERY_PRIORITY, CHANGE_SESSION_PRIORITY, and CHANGE_USER_PRIORITY (see the sketch below). Only one CRITICAL query can run at a time
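A sketch of the priority functions; the query ID 12345 is hypothetical:

-- Raise a specific running query to CRITICAL (superusers only)
SELECT CHANGE_QUERY_PRIORITY(12345, 'critical');

-- Lower the priority of your own session
SELECT CHANGE_SESSION_PRIORITY(PG_BACKEND_PID(), 'lowest');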
Redshift QMR
• Query monitoring rules (QMR) are intended to help automatically handle runaway (poorly written) local or Spectrum queries
• QMR can be defined for a WLM queue via the Redshift console (max 25 rules across all WLM queues), and each rule can take one of four actions for offending queries:
  • LOG: log info about the query in the STL_WLM_RULE_ACTION table
  • ABORT: log the action and terminate the query
  • HOP: log the action and move the query to another appropriate queue if one exists, otherwise terminate it
  • PRIORITY: change the query priority (only available with Auto WLM)
• Each query monitoring rule includes up to three conditions, or predicates, and one action, similar to an if-then statement: if {predicate(s)} then {action}. A predicate consists of a metric, an operator, and a value (e.g. rows_scanned > 1000000); if all of the predicates for any rule are met, that rule's action is triggered
• Common QMR use cases:
  • Guard against wasteful resource utilization, runaway costs, etc.
  • Log resource-intensive queries
  • "That user": every DB has that user who loves to execute queries with unnecessarily expensive behavior (e.g. a Cartesian product)
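Queries whose rules fire are recorded in STL_WLM_RULE_ACTION; a minimal sketch for reviewing recent actions:

SELECT userid, query, service_class, rule, action, recordtime
FROM stl_wlm_rule_action
ORDER BY recordtime DESC
LIMIT 50;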
Concurrency Scaling
• Concurrency Scaling is a Redshift feature that automatically adds transient clusters to your cluster within seconds to handle concurrent requests with consistently fast performance
• Free for over 97% of Redshift customers: for every 24 hours that your main cluster is in use, you accrue a one-hour credit for Concurrency Scaling; beyond that, customers are billed on a per-second basis per transient cluster
• Applies to both Redshift local & Spectrum queries
• Email notifications are issued when concurrency scaling occurs
Resizing a cluster is easily achieved with a few clicks on the Redshift console, and there are two resizing approaches to choose from.

Elastic Resize
• The existing cluster is modified to add or remove nodes
• During the actual resize, existing connections to the Redshift cluster are put on hold, no new connections are accepted until the resize finishes, and the cluster is unavailable for querying
• Typically completes within ~15 minutes or less

Classic Resize
• The Redshift cluster can be reconfigured to a different node count and instance type
• Involves streaming all data from the original Redshift cluster to a newly created Redshift cluster with the new configuration; during the resize, the original Redshift cluster is in read-only mode, and the customer is only charged for one cluster
• Depending on data size, may take several hours to complete
• Redshift provides a Postgres-compatible driver endpoint
• Two driver options for connecting to Redshift:
  • JDBC/ODBC Postgres driver
  • Proprietary Redshift driver: 35% faster than the Postgres driver, with support for IAM SSO
• Like other Postgres clients, you connect to Redshift as a database user, using a hostname, port, and database name (viewable on the Redshift console)

Examples (jdbc:[redshift|postgresql]://endpoint:port/databaseName):
• jdbc:redshift://demo.dsi9zn4ccku4.us-east-1.redshift.amazonaws.com:8192/pocdb
• jdbc:postgresql://demo.dsi9zn4ccku4.us-east-1.redshift.amazonaws.com:8192/pocdb
• Query Editor is a web-based query interface for running single-SQL-statement queries on an Amazon Redshift cluster directly from the AWS Management Console, without having to install & set up an external JDBC/ODBC client
• Query results are viewable in the console & downloadable to a CSV file
• Queries can be saved for convenient repeat execution
• Query execution steps & times can be viewed to isolate bottlenecks & optimize queries
• Other considerations:
  • Max 50 Query Editor users at the same time per cluster
  • Query Editor is applicable for short queries (runtime < 10 min)
  • Query result sets are paginated with 100 rows per page
  • Transactions & Enhanced VPC Routing are not supported
  • Access to Query Editor requires specific IAM permissions
• By default, Amazon Redshift clusters are locked down so nobody has access
• To grant other users inbound access, you must associate the Redshift cluster with a security group
• Use security groups to authorize other VPC security groups, or CIDR blocks, to connect:
  • VPC security groups should be used for AWS service & EC2 connectivity, or cross-account access (recommended approach)
  • CIDR blocks should be used for connections from on-prem/the other side of a customer gateway
• Having separate cluster security groups per application or cluster is a good practice
Redshift Security
• Schemas: collections of database tables and other database objects (similar to namespaces). In Amazon Redshift, schemas are similar to operating system directories, except that schemas cannot be nested
• Users: named user accounts that can connect to a database. Users can be granted access to a single schema or to multiple schemas
• Groups: collections of users that can be collectively assigned privileges for easier security maintenance
Redshift Security
Create a view to conceal rows or columns that a user or group of users is not authorized to access:

CREATE VIEW secure_view AS
SELECT col1, col3 FROM underlying_table;

GRANT SELECT ON secure_view TO GROUP restricted_group;

REVOKE ALL ON underlying_table FROM GROUP restricted_group;
Security has always been priority one at AWS, & Amazon Redshift is no exception:
• End-to-end data encryption
• IAM integration & integration with SAML IdPs for federation (SSO)
• Amazon VPC for network isolation
• Database security model (users, groups, privileges)
• Audit logging and notifications
• Certifications that include SOC 1/2/3, PCI-DSS, FedRAMP, & HIPAA
Redshift Security
SSL encryption can be used with client connections to Amazon Redshift; setting the require_ssl parameter to true in the cluster's parameter group requires SSL for all connections.
Redshift Security
Redshift clusters can be configured to encrypt data at rest through a simple checkbox in the Redshift console.
• Redshift can encrypt data at rest (data stored locally or in S3 backups) using the AES algorithm with a 256-bit key
• Key management can be performed by Redshift, AWS KMS, or your HSM
• You control rotation of encryption keys via the API
• Redshift blocks of data backed up to S3 are encrypted using the cluster's encryption key
• Redshift uses hardware-based crypto modules to keep the performance impact to ~20% or less
• Redshift clusters that need to comply with PCI, SOX, or HIPAA must be configured with encryption enabled
Audit Logs
• Stored in three log files: the connection log, the user log, and the user activity log
• Must be explicitly enabled
• Stored indefinitely unless S3 lifecycle rules are in place to archive or delete files automatically
• Cluster restarts don't affect audit logs in S3
• Access to the log files does not require access to the Redshift database
• S3 charges apply

System (STL) Tables
• Stored in multiple tables, including SVL_STATEMENTTEXT and STL_CONNECTION_LOG
• Automatically available on every node in the data warehouse cluster
• Log history is stored for two to five days, depending on log usage and available disk space
• Access to STL tables requires access to the Amazon Redshift database (see the sketch below)
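For example, recent statements can be pulled from SVL_STATEMENTTEXT; a minimal sketch:

-- SQL statements run in the last day (requires access to the database)
SELECT userid, starttime, TRIM(text) AS sql_text
FROM svl_statementtext
WHERE starttime > DATEADD(day, -1, GETDATE())
ORDER BY starttime DESC;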
CloudTrail Logs
• Stored indefinitely in S3, unless S3 lifecycle rules are in place to archive or delete files automatically
• CloudTrail captures the last 90 days of management events by default without charge (available using the CloudTrail APIs or via the console)
• Maintaining a longer history of events is possible, but additional delivery charges may apply, including S3 charges
• Access to the log files does not require access to the Redshift database
[Architecture diagram: web app data, on-premises data, streaming data, and other databases feed into Amazon S3; AWS Glue crawlers and the AWS Glue Data Catalog sit alongside Amazon RDS, Amazon Athena, Amazon EMR, Amazon SageMaker, Amazon QuickSight, and Amazon Redshift.]
1. Crawlers scan your data sets and populate the Glue Data Catalog
2. The Glue Data Catalog serves as a central metadata repository
3. Once catalogued in Glue, your data is immediately available for analytics
[Diagram: a marketing data source and other source systems land in S3, go through an ETL process back to S3, and are queried via Redshift Spectrum and Athena.]
aws.amazon.com/
Ricardo Serafim
Analytics Specialist Solutions Architect
rserafim@amazon.com
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.Amazon Web Services LATAM
 
Automatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWSAutomatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWSAmazon Web Services LATAM
 
Automatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWSAutomatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWSAmazon Web Services LATAM
 
Ransomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWSRansomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWSAmazon Web Services LATAM
 
Ransomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWSRansomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWSAmazon Web Services LATAM
 
Aprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWSAprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWSAmazon Web Services LATAM
 
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWSAprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWSAmazon Web Services LATAM
 
Cómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administradosCómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administradosAmazon Web Services LATAM
 
Os benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWSOs benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWSAmazon Web Services LATAM
 

Mehr von Amazon Web Services LATAM (20)

AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
 
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvemAWS para terceiro setor - Sessão 1 - Introdução à nuvem
AWS para terceiro setor - Sessão 1 - Introdução à nuvem
 
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e BackupAWS para terceiro setor - Sessão 2 - Armazenamento e Backup
AWS para terceiro setor - Sessão 2 - Armazenamento e Backup
 
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
AWS para terceiro setor - Sessão 3 - Protegendo seus dados.
 
Automatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWSAutomatice el proceso de entrega con CI/CD en AWS
Automatice el proceso de entrega con CI/CD en AWS
 
Automatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWSAutomatize seu processo de entrega de software com CI/CD na AWS
Automatize seu processo de entrega de software com CI/CD na AWS
 
Cómo empezar con Amazon EKS
Cómo empezar con Amazon EKSCómo empezar con Amazon EKS
Cómo empezar con Amazon EKS
 
Como começar com Amazon EKS
Como começar com Amazon EKSComo começar com Amazon EKS
Como começar com Amazon EKS
 
Ransomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWSRansomware: como recuperar os seus dados na nuvem AWS
Ransomware: como recuperar os seus dados na nuvem AWS
 
Ransomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWSRansomware: cómo recuperar sus datos en la nube de AWS
Ransomware: cómo recuperar sus datos en la nube de AWS
 
Ransomware: Estratégias de Mitigação
Ransomware: Estratégias de MitigaçãoRansomware: Estratégias de Mitigação
Ransomware: Estratégias de Mitigação
 
Ransomware: Estratégias de Mitigación
Ransomware: Estratégias de MitigaciónRansomware: Estratégias de Mitigación
Ransomware: Estratégias de Mitigación
 
Aprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWSAprenda a migrar y transferir datos al usar la nube de AWS
Aprenda a migrar y transferir datos al usar la nube de AWS
 
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWSAprenda como migrar e transferir dados ao utilizar a nuvem da AWS
Aprenda como migrar e transferir dados ao utilizar a nuvem da AWS
 
Cómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administradosCómo mover a un almacenamiento de archivos administrados
Cómo mover a un almacenamiento de archivos administrados
 
Simplifique su BI con AWS
Simplifique su BI con AWSSimplifique su BI con AWS
Simplifique su BI con AWS
 
Simplifique o seu BI com a AWS
Simplifique o seu BI com a AWSSimplifique o seu BI com a AWS
Simplifique o seu BI com a AWS
 
Os benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWSOs benefícios de migrar seus workloads de Big Data para a AWS
Os benefícios de migrar seus workloads de Big Data para a AWS
 

Kürzlich hochgeladen

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 

Kürzlich hochgeladen (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Immersion Day - Como simplificar o acesso ao seu ambiente analítico

  • 13. Fastest, most cost-effective: up to 75% less than the #2 cloud DW with on-demand pricing, and 75% less with Reserved Instances (RIs); $758,845 average annual benefits per TB per year; $319,300 higher revenue per 100 TB per year; 469% ROI (*based on IDC's "ROI of Amazon Redshift" paper, 2017)
  • 14. Fastest; most cost-effective; integrates with your data lake
  • 15. Forrester Wave™ Big Data Warehouse, Q4 2018: AWS rated top in the leader bracket and received a score of 5/5 (the highest score possible) in a number of areas, such as Use Cases, Roadmap, Market Awareness, and Ability to Execute. AWS positioned as a Leader in the 2018 Gartner Magic Quadrant for Data Management Solutions for Analytics. Amazon Redshift: a data warehouse that extends to, and integrates seamlessly with, the data lake
    • Fully managed
    • Massively parallel OLAP architecture that scales to query GBs to EBs of data
    • Automatic scaling
    • Secure
    • Highly rated and most popular
  • 16. Five Key Highlights:
    • Amazon Redshift has a service SLA of 99.9%
    • Amazon Redshift mirrors data onto a second node
    • Amazon Redshift automatically detects and recovers from a disk or node failure
    • Amazon Redshift automatically backs up your data
    • Amazon Redshift can automatically replicate your backups to another AWS region (e.g. a DR site)
  • 17. (Architecture diagram: SQL clients/BI tools connect via JDBC/ODBC to a leader node, which coordinates compute nodes 1..N (e.g. 128 GB RAM, 16 TB disk, 16 cores each); load, unload, backup, and parallel restore run against Amazon S3; Amazon Redshift Spectrum loads and queries S3 data directly.)
  • 18. Compute node slices:
    • A compute node is partitioned into either 2 or 16 slices; a slice can be thought of as a "virtual compute node"
    • Each slice is allocated a portion of the compute node's memory and disk space, where it processes a portion of the workload assigned to the compute node by the leader node
    • The leader node manages distributing data to the slices and apportions the workload for any queries or other database operations to the slices
    • Slices are Redshift's Symmetric Multiprocessing (SMP) mechanism: they work in parallel to complete operations
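  For reference, the slice layout of a running cluster can be inspected with a simple query; this is a minimal sketch using the STV_SLICES system view, which maps each slice to the node that owns it:

    -- List each slice and the compute node it belongs to
    SELECT node, slice
    FROM stv_slices
    ORDER BY node, slice;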
  • 19. A Redshift cluster can have up to 128 ds2.8xlarge nodes (2 petabytes of local storage) and can support exabytes of data with its Redshift Spectrum feature.
    • Dense-Compute (DC2) nodes: solid state disks
    • Dense-Storage (DS2) nodes: magnetic disks
    • The key difference between instance types is the compute/storage ratio and storage latency (SSD vs. magnetic storage)

    Instance Family | Instance Type | Disk type | Capacity | Memory | # CPUs | # Slices | $
    Dense-Compute   | DC2 large     | NVMe SSD  | 160 GB   | 16 GB  | 2      | 2        | $
    Dense-Compute   | DC2 8xlarge   | NVMe SSD  | 2.56 TB  | 244 GB | 32     | 16       | $$
    Dense-Storage   | DS2 xlarge    | Magnetic  | 2 TB     | 32 GB  | 4      | 2        | $
    Dense-Storage   | DS2 8xlarge   | Magnetic  | 16 TB    | 244 GB | 36     | 16       | $$

    Note: AWS reserves the right to change instance types at any time. For example, DC1 is a DEPRECATED dense-compute instance type that SHOULD NOT BE USED; instead, upgrade from DC1 to DC2 for the same price with better performance.
    Redshift instance types are named according to their corresponding Amazon EC2 instance types; for more information, visit https://aws.amazon.com/ec2/instance-types/
  • 20. Life of a Redshift Spectrum query (e.g. SELECT COUNT(*) FROM S3.EXT_TABLE GROUP BY ...):
    1. Query is optimized and compiled using ML at the leader node; determine what gets run locally and what goes to Amazon Redshift Spectrum
    2. Query plan is sent to all compute nodes
    3. Compute nodes obtain partition info from the Data Catalog (AWS Glue Data Catalog or Apache Hive Metastore); dynamically prune partitions
    4. Each compute node issues multiple requests to the Amazon Redshift Spectrum layer
    5. Amazon Redshift Spectrum nodes scan your S3 data (exabyte-scale object storage)
    6. Amazon Redshift Spectrum projects, filters, joins, and aggregates
    7. Final aggregations and joins with local Amazon Redshift tables are done in-cluster
    8. Result is sent back to the client
    (Diagram labels: SQL clients/BI tools connect via JDBC/ODBC; leader node and compute nodes communicate with the Redshift Spectrum fleet over 10 GigE (HPC).)
  • 21. Redshift Spectrum integration:
    • Redshift Spectrum seamlessly integrates with your existing SQL & BI apps
    • Support for complex joins, nested queries, and window functions
    • Support for data partitioned in S3 by any key: date, time, and any other custom keys (e.g. year, month, day, hour)
    • Leverages the AWS Glue Data Catalog or an Amazon EMR Hive Metastore
    • ANSI SQL; no data loading required; reads different file formats, compressed files, and encrypted files
    • https://docs.amazonaws.cn/en_us/redshift/latest/dg/c-using-spectrum.html
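  As a rough sketch of how S3 data is exposed to Redshift Spectrum through the Glue Data Catalog; the schema, database, table, role ARN, and S3 path below are all illustrative, not from the deck:

    -- Register an external schema backed by the AWS Glue Data Catalog (hypothetical names)
    CREATE EXTERNAL SCHEMA spectrum_demo
    FROM DATA CATALOG
    DATABASE 'demo_db'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
    CREATE EXTERNAL DATABASE IF NOT EXISTS;

    -- Define an external table over Parquet files in S3
    CREATE EXTERNAL TABLE spectrum_demo.clicks (
      user_id BIGINT,
      url     VARCHAR(2048)
    )
    PARTITIONED BY (dt DATE)
    STORED AS PARQUET
    LOCATION 's3://mybucket/data/clicks/';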
  • 22. Amazon Athena, Redshift, & EMR have some shared analytical & data lake use cases, but they each address different needs & scenarios:
    • Amazon Redshift provides the fastest query performance for enterprise reporting & business intelligence workloads, particularly those involving extremely complex SQL with multiple joins and subqueries. Redshift also supports querying an S3 data lake & joins between S3 data and local cluster data
    • Amazon EMR makes it simple & cost-effective to run Hadoop, Spark, & Presto. EMR is flexible: you can run custom applications and code, and define specific compute, memory, storage, and application parameters to optimize your analytic requirements
    • Amazon Athena is a standalone service that provides the easiest way to run data exploration and discovery queries, as well as analytical queries on data lakes, geospatial data, and service logs, without the need to set up or manage any servers
    When is Redshift strongly recommended over Athena?
    • Latency has to be sub-second; Redshift employs multiple caches & an optimized query planner
    • Data and workloads require a data warehouse
    • Data is highly relational (e.g. normalized data that would be difficult or otherwise disadvantageous for the use case to de-normalize)
    • Data has a transactional nature to it (e.g. data gets updated)
    • Workloads involve many, complex joins
    • Workloads involve joins between data warehouse data & an S3 data lake: use Redshift (Redshift Spectrum)
  • 23. Redshift is a fully ACID-compliant and ANSI SQL-compliant data warehouse:
    • Use cases relying on indexes can alternatively achieve fast query performance through parallelism and efficient data storage & I/O
    • Table distribution styles, data compression, and sort keys significantly impact parallelism and efficient data storage and I/O
    • Redshift creates one database by default, but other databases can be created (note: having multiple databases could lead to one DB monopolizing the cluster's resources)
    • Databases are autonomous units in Redshift, i.e. queries can join tables within a single database only
  • 24. Redshift: Popular Data Models. Redshift can be used with a number of data models, including star, snowflake, and highly denormalized.
  • 25. Row vs. column storage:
    • Row storage (e.g. MySQL): all row fields are stored together on disk (typically in a sequential file). Accessing a column (example: scanning the SSN of all residents) requires scanning the entire table, with resultant unnecessary I/O and caching overhead
    • Column storage (e.g. Amazon Redshift): each table column is stored separately on disk (typically in a separate file or set of files). Accessing a column only scans blocks for the relevant column(s): significantly less I/O
  • 26. Given the following table definition and data for the deep_dive table, how will a simple SQL query behave in a row-based data store, and then in a column-based store?

    CREATE TABLE deep_dive (
      aid INT        -- audience_id
      ,loc CHAR(3)   -- location
      ,dt DATE       -- date
    );

    SELECT min(dt) FROM deep_dive;

    • Row-based storage behavior: needs to read everything; excessive & unnecessary I/O
    • Column-based storage behavior: only scans blocks for the relevant column; significantly less I/O
  • 27. Redshift is a columnar database, which means data on disk is physically organized by column:
    • Column data is stored in 1 MB immutable blocks; a full block can contain as little as one value or as many as millions of values
    • Each slice stores a set of blocks that contain a range of the values for each column
  • 28. Column stores compress very nicely (note: in Redshift jargon, "column encoding" refers to compression):
    • Each value in a single column is the same data type, and a single column is likely to have repeating values
    • Redshift can typically achieve 3x-4x data compression ratios
    • Compression reduces storage requirements, but also improves performance by reducing I/O
    • Columns grow and shrink independently in Redshift
  • 29. Compression algorithms:
    • Redshift supports a number of compression algorithms (e.g. LZO, ZSTD, RUNLENGTH, etc.)
    • Compression algorithms can achieve different compression ratios for different data types
    • Use PG_TABLE_DEF to view/verify the current encoding applied to each column in a table:

    SELECT * FROM PG_TABLE_DEF
    WHERE SCHEMANAME = 'myschema' AND TABLENAME = 'mytable';
  • 30. Applying compression:
    • Columnar compression is automatically and intelligently applied by the COPY command to empty tables (a load example follows below)
    • Redshift's ANALYZE COMPRESSION command will analyze an existing table and recommend the best compression settings
    • Compress everything except sort key columns
    • In some cases, RAW (no compression) is the best compression option (e.g. sparse columns or relatively small tables: ~10k rows)
    • Redshift's Column Encoding Utility automates the use of the ANALYZE COMPRESSION command with a data migration to change compression in place
    • Note: beware cases where you've tested COPY with a small number of rows before doing a full load; COPY will not re-evaluate compression on non-empty tables

    ANALYZE COMPRESSION [ [ table_name ] [ ( column_name [, ...] ) ] ] [COMPROWS numrows]

    COPY's COMPUPDATE options:
    • COMPUPDATE PRESET: column compression is set based on the column's data type; no data is sampled
    • COMPUPDATE [ON]: the best column compression is determined & set by applying different compression codecs to a sample set of column data
    • COMPUPDATE OFF: skips any compression analysis
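  A minimal sketch of both paths, reusing the deck's deep_dive table; the S3 path and IAM role ARN are placeholders:

    -- Initial load into an empty table: COPY samples the data and applies compression
    COPY deep_dive
    FROM 's3://mybucket/staging/deep_dive/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    FORMAT AS CSV
    COMPUPDATE ON;

    -- Later, ask Redshift to recommend encodings for the now-populated table
    ANALYZE COMPRESSION deep_dive;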
  • 31. Redshift is a distributed database with a single leader and one or more compute nodes, where data is stored on compute nodes or on Amazon S3:
    • Distribution style: a table property which dictates how that table's data is distributed on internal storage
    • Distribution goals: distribute data evenly for parallel processing; ensure each node has the same amount of data; minimize data movement during query processing
    Data distribution tips:
    • A sub-optimal data distribution can lead to data skew and poor query performance; if unsure which distribution style to choose for a table, let Redshift pick for you (AUTO)
    • Redshift's Column Encoding Utility can be used to change a table's distribution style
  • 32. Four distribution styles to choose from in Redshift:
    • KEY: a column value is hashed, and the same hash value is placed on the same slice
    • ALL: full table data is placed on each compute node's first slice
    • EVEN: data is evenly distributed across all slices using a round-robin distribution
    • AUTO: default option; Redshift starts the table with ALL, but switches the table to EVEN when the table grows larger
    Data distribution tips:
    • Consider using the ALL distribution style for all infrequently modified small tables (~3 million rows or less)
    • Distribution keys should have high cardinality to avoid data skew and "hot" nodes (illustrative DDL sketches follow below)
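  Hedged DDL sketches of the four styles; table and column names are illustrative, not from the deck:

    CREATE TABLE sales (customer_id INT, amount DECIMAL(10,2))
      DISTSTYLE KEY DISTKEY (customer_id);  -- co-locate rows by the join column
    CREATE TABLE dim_country (code CHAR(3), name VARCHAR(64))
      DISTSTYLE ALL;                        -- copy a small dimension to every node
    CREATE TABLE staging_events (payload VARCHAR(65535))
      DISTSTYLE EVEN;                       -- round-robin across slices
    CREATE TABLE app_events (event_id BIGINT, dt DATE)
      DISTSTYLE AUTO;                       -- let Redshift decide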
  • 33. Good distribution keys are frequently joined to other tables (e.g. a fact table joined with a dimension table) and have:
    • High cardinality: a high frequency of unique values relative to the overall row count
    • Low skew: each unique value in the column appears roughly the same number of times as every other value
    • Use a date column only if cardinality is high enough, and queries don't typically filter on a very narrow date period (to avoid workload skew among the node slices)
    Data distribution tip: use the query below to compare unique values against total rows for your candidate key column (an even distribution is better):

    SELECT count(distinct <my_column>) unique_values,
           count(9) total_rows
    FROM <my_table>;
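  To see actual per-value skew, rather than just the unique/total ratio, a hedged variant groups by the candidate column; <my_table> and <my_column> remain placeholders, as in the slide:

    -- Top 20 heaviest values; a flat profile indicates low skew
    SELECT <my_column>, count(*) AS rows_per_value
    FROM <my_table>
    GROUP BY <my_column>
    ORDER BY rows_per_value DESC
    LIMIT 20;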
  • 34. Zone maps:
    • Zone maps are minimum and maximum values for each 1 MB block of data
    • Zone maps are stored in-memory and automatically generated
    • Zone maps allow Redshift to effectively prune blocks that cannot contain data needed for a given query, minimizing unnecessary I/O
    • Along with sort keys, zone maps play a crucial role in enabling range-restricted scans to prune blocks and reduce I/O
    (Diagram: Redshift stores data in 1 MB blocks; the zone map records each block's MIN and MAX per column, e.g. for sales_dt and price.)
  • 35. Redshift Sorting. Sort keys can be added to a table by specifying the SORTKEY table property on one or more columns:
    • Redshift uses sort keys to physically order data on disk
    • In combination with zone maps, sort keys enable range-restricted scans to prune blocks and reduce I/O
    • Sort keys combined with zone maps function like an index for a given set of columns
    • Sort keys benefit MERGE JOIN performance with a much faster sort
    • Redshift supports two types of sort keys: Compound Sort Key (default) and Interleaved Sort Key
  • 36. Optimal sort key: should consist of the columns most commonly found in WHERE clause filter predicates; it is extremely common for the sort key to be a date.
    Compound sort key tips:
    • Column order matters: there is no skip scanning
    • Order columns by lowest cardinality to highest, if possible
    • Define four or fewer sort key columns; more will result in marginal gains and increased ingestion overhead
    • If your table is frequently joined, then include the DISTKEY in the sort key as the first column
    • A column that is CAST() to be joined or filtered will not be used as a sort key (e.g. casting DATE to TIMESTAMPTZ); modify the underlying data & then set this value as the sort key
    • Sort keys are less beneficial on small tables
    • Columns added to a sort key after a high-cardinality column are not effective
    • With an established workload, the Redshift GitHub has scripts to help you find sort key suggestions (an illustrative DDL example follows below)
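  An illustrative DDL putting these tips together on the deck's deep_dive table, assuming, for the sake of the example, that it is frequently joined on aid and filtered on dt:

    CREATE TABLE deep_dive (
      aid INT,       -- audience_id
      loc CHAR(3),   -- location
      dt  DATE       -- date
    )
    DISTSTYLE KEY
    DISTKEY (aid)                  -- frequently joined column
    COMPOUND SORTKEY (aid, dt);    -- DISTKEY first, then the filter column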
  • 37. Example: SELECT count(*) FROM deep_dive WHERE dt = '06-09-2017';
    • Sorted by date, block zone maps are narrow and non-overlapping (01-JUNE-2017..06-JUNE-2017, 07-JUNE..12-JUNE, 13-JUNE..21-JUNE, 21-JUNE..30-JUNE), so only the one block whose range covers the predicate needs to be read
    • In an unsorted table, block ranges overlap (01-JUNE..20-JUNE, 08-JUNE..30-JUNE, 12-JUNE..20-JUNE, 02-JUNE..25-JUNE), so most blocks must be examined
    Zone maps and sort keys can serve as a significant optimization by reducing the number of blocks examined (and therefore I/O) during query execution.
  • 38. Redshift Temporary Tables. Redshift supports the TEMPORARY keyword on CREATE TABLE and CREATE TABLE AS, and the #<NAME> marker on SELECT:

    SELECT ... INTO #MY_TEMP_TABLE FROM ...;

    Temporary table characteristics:
    • Stored like all other Redshift tables, but only live for the session (dropped on session termination)
    • Default to no columnar compression & EVEN distribution
    • Do not have statistics by default
    These defaults are often the worst possible configuration for table storage. Tip: define temporary tables as you would a permanent table, with columnar compression and an appropriate distribution style, to increase performance.
  • 39. Temporary table capabilities:
    • Temp tables can be used exactly as permanent tables would be in ETL jobs or analytics
    • Temp tables can participate in complex/multi-statement transactions
    • Temp tables exhibit faster I/O (not mirrored to other nodes)
    • You can COPY and UNLOAD temporary tables
    • SELECT INTO # does not provide the ability to set DISTSTYLE or column encoding
    Best practices:
    • Avoid the use of SELECT INTO # (use explicit CREATE TEMPORARY TABLE (AS) statements instead)
    • Include column encoding settings on the CREATE command
    • Include distribution keys or style when creating temp tables
    • Compute statistics when creating large temp tables as part of an ETL process
    • Create a temporary table that is LIKE another table so that it inherits the parent table's column definitions, distribution style, and sort keys:

    create temp table temp_tbl (like parent_tbl);
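  A hedged sketch of the recommended pattern, with explicit encoding and distribution instead of SELECT INTO #; the table, column, and source names are illustrative:

    CREATE TEMPORARY TABLE stage_sales (
      customer_id INT           ENCODE ZSTD,
      amount      DECIMAL(10,2) ENCODE ZSTD
    )
    DISTSTYLE KEY
    DISTKEY (customer_id);

    INSERT INTO stage_sales SELECT customer_id, amount FROM sales;
    ANALYZE stage_sales;   -- compute statistics for large ETL temp tables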
  • 40. The "SET DW" checklist:
    • S (Sort Keys): ensure sort keys exist to facilitate filters in the WHERE clause
    • E (Encoding/compression): reduced I/O improves query performance
    • T (Table Maintenance: VACUUM, ANALYZE): current table statistics increase sort key effectiveness, and table defragmentation reduces wasted storage while improving query performance
    • D (Data Distribution): ensure distribution keys exist to facilitate the most common joins
    • W (Workload Management): machine learning algorithms profile queries to place them in the appropriate queue with the appropriate resources
  • 41. Redshift/Data Lake Interactions. A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale.
    • A table is a table, right? Nope: with data lakes, tables are collections of files
    • Data lake file types have a huge influence on the performance of Redshift Spectrum queries
    Best practices:
    • The number of files in the data lake should be a multiple of your Redshift slice count (general best practice)
    • Redshift Spectrum can automatically split Parquet, ORC, text-format, and Bz2 files for optimal processing
    • File sizes should be in the range 64 MB - 512 MB
    • Files should be of uniform size (especially files that can't be automatically split by Redshift Spectrum, such as Avro and Gzip) to avoid execution skew
  • 42. Redshift Spectrum is a feature of Redshift that enables queries to reference external tables. Understanding query mechanics can maximize the work done by Redshift Spectrum:
    • Do as much as possible in Redshift Spectrum before bringing data back to your cluster
    Data lake best practices for Redshift Spectrum:
    • Use data lake file formats that are optimized for reads by Redshift Spectrum (and Athena!)
    • ORC and Parquet apply columnar encoding, similar to how data is stored inside Redshift
    • Redshift Spectrum can also work with Avro, CSV, and JSON data, but these files are *much* larger on S3 than ORC/Parquet
  • 43. Partitioning (open file formats such as Parquet and ORC are optimal for Redshift/data lake interactions because of their columnar structure). Partitions should be based on:
    • Frequently filtered columns (either through a join or the WHERE clause)
    • Business units and business groups (user cohorts, application names, etc.)
    • Date & time
    Consider how your users query data:
    • Do they look month by month, or at the current month and year vs. the previous year for the same month, etc.?
    • Do they understand the columns you have created?
    Date-based partition columns have a type:
    • Full dates included in a single value may be formatted or not (yyyy-mm-dd or yyyymmdd); formatted dates can only be strings
    • Either type of date needs to consider ordering (date=dd-mm-yyyy cannot be used in an ORDER BY clause, but date=yyyy-mm-dd can!)
  • 44. S3 layout for external tables. Redshift Spectrum extends the same MPP principle used by Redshift clusters to query external data, using multiple Redshift Spectrum instances as needed to scan files; place the files in a separate folder for each table:
    • Keep data with a similar security model in the same prefix: s3://mybucket/data
    • Application or business unit prefixes can be helpful: s3://mybucket/data/marketing
    • Each table resides in its own prefix: s3://mybucket/data/marketing/impressions
    • Add high-level business unit partitions:
      s3://mybucket/data/marketing/impressions/application=flux_capacitor
      s3://mybucket/data/marketing/impressions/application=cold_fusion
    • Add dates:
      s3://mybucket/data/marketing/impressions/application=flux_capacitor/date=20180122
      s3://mybucket/data/marketing/impressions/application=cold_fusion/date=20180123
      or
      s3://mybucket/data/marketing/impressions/application=flux_capacitor/yyyy=2018/mm=01/dd=22
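  Assuming the impressions data above were registered as an external table, for example in the hypothetical spectrum_demo schema from the earlier sketch, new date partitions would be added along these lines:

    -- Register one application/date partition with its S3 location (illustrative)
    ALTER TABLE spectrum_demo.impressions
    ADD IF NOT EXISTS
    PARTITION (application = 'flux_capacitor', date = '20180122')
    LOCATION 's3://mybucket/data/marketing/impressions/application=flux_capacitor/date=20180122/';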
  • 45. Redshift Workload Management. Workload management (WLM) is a feature that helps manage workloads and avoid short, fast-running queries getting stuck in queues behind long-running queries. Three WLM methods are complementary to each other:
    • Queues (basic WLM): WLM always assigns every query executed in Redshift to a specific queue on the basis of user group, query group, or WLM rules (e.g. [return_row_count > 1000000])
    • Short-Query Acceleration (SQA): Redshift uses machine learning to determine what constitutes a "short" running query in your cluster; "short" running queries are then automatically identified & run immediately in the short-query queue if queuing occurs
    • Concurrency Scaling: Redshift uses machine learning to predict queuing in your cluster, and when queuing occurs, transient Amazon Redshift clusters are added to your cluster and queries are routed to them for execution
    The amount of memory available to a query is a function of: the WLM queue where it runs; the percentage of memory assigned to that WLM queue; and the number of query slots consumed by the query.
  • 46. Default WLM setup:
    • One default WLM queue: concurrency level of five (enables up to 5 queries to run concurrently) and no timeout
    • Auto WLM enabled (automatic query concurrency & memory allocation)
    • One superuser queue: concurrency level of one and no timeout
    • SQA enabled (enabled/disabled via a checkbox in the Redshift console)
    • Concurrency Scaling disabled (enabled/disabled via the Redshift console)
    Customizing WLM:
    • Customize WLM queues via a few clicks on the Redshift console
    • Up to 8 custom queues are allowed in a Redshift cluster
    • WLM queues have four main "levers": concurrency level (aka "query slots"); memory allocation (%); targets (i.e. user groups, query groups, or query monitoring rules); timeout (ms)
    WLM queue setup via the Redshift console:
    1. Click on Parameter Groups in the navigation pane and choose Create Cluster Parameter Group
    2. Click the Add Queue button to add a new WLM queue
    3. Associate the parameter group with your cluster
  • 47. Auto WLM:
    • Automatic workload management ("Auto WLM") lets Amazon Redshift automatically manage query concurrency and memory allocation
    • Auto WLM can create up to eight queues, with each queue having a priority
    • Auto WLM automatically determines the amount of resources that queries need and adjusts the concurrency based on the workload: concurrency is set lower when queries requiring large amounts of resources are in the system (e.g. hash joins between large tables), and higher when lighter queries (e.g. inserts, deletes, scans, or simple aggregations) are submitted
    • Auto WLM & SQA work together to allow short-running and lightweight queries to complete even while long-running, resource-intensive queries are active
    • Auto WLM is enabled by default when the default parameter group is used, and must be explicitly enabled when a custom parameter group is used. It can be enabled in a custom parameter group through the Amazon Redshift console by choosing Switch WLM mode and then choosing Auto WLM; with this choice, one queue is used to manage queries, and the memory and concurrency on main fields are both set to auto. When Auto WLM is not enabled, manual WLM requires you to specify values for query concurrency and memory allocation
  • 48. Redshift Query Priorities. There are six possible query priorities: LOWEST, LOW, NORMAL (default), HIGH, HIGHEST, and CRITICAL (superusers only).
    • WLM queues can be defined with a specific priority (relative importance), and queries inherit their queue's priority
    • Administrators can use priorities to prioritize different workloads (e.g. ETL, ingestion, audit, BI, etc.)
    • Amazon Redshift uses priority when letting queries into the system and to determine the amount of resources allocated to a query
    • Predictable performance for a high-priority workload comes at the cost of other, lower-priority workloads; lower-priority queries are not starved, but might run longer because they wait behind more important queries or run with fewer resources
    • Concurrency Scaling can be enabled to maintain predictable performance for lower-priority workloads
    • Auto WLM automatically creates and assigns queues corresponding to priorities
    • The CRITICAL priority is higher than HIGHEST and is available to superusers; only one CRITICAL query can run at a time. To set this priority, you can use the functions CHANGE_QUERY_PRIORITY, CHANGE_SESSION_PRIORITY, and CHANGE_USER_PRIORITY
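  Hedged examples of the priority-change functions named above; the query id, session pid, user name, and chosen priorities are all illustrative:

    -- Bump one running query to CRITICAL (superuser only)
    SELECT change_query_priority(1234567, 'critical');
    -- Raise the priority of everything in session pid 30311
    SELECT change_session_priority(30311, 'high');
    -- Lower the priority of all queries from a given user
    SELECT change_user_priority('reporting_user', 'low');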
  • 49. Redshift Query Monitoring Rules (QMR):
    • QMR are intended to help automatically handle runaway (poorly written) local or Spectrum queries
    • QMR can be defined for a WLM queue via the Redshift console (max 25 rules across all WLM queues), and each rule can take one of four actions for offending queries:
      • LOG: log info about the query in the STL_WLM_RULE_ACTION table
      • ABORT: log the action and terminate the query
      • HOP: log the action and move the query to another appropriate queue if one exists, otherwise terminate it
      • PRIORITY: change the query priority (only available with Auto WLM)
    • Each query monitoring rule includes up to three conditions, or predicates, and one action, similar to an if-then statement: if {predicate(s)} then {action}. A predicate consists of a metric, operator, and value (e.g. rows_scanned > 1000000). If all of the predicates for any rule are met, that rule's action is triggered
    • Common QMR use cases: guard against wasteful resource utilization, runaway costs, etc.; log resource-intensive queries; rein in "that user": every DB has that user who loves to execute queries with unnecessarily expensive behavior (e.g. a Cartesian product)
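  To review what QMR has done, the STL_WLM_RULE_ACTION table named above can be queried directly; a small sketch:

    -- Most recent rule firings: who, which query, which rule, and the action taken
    SELECT userid, query, rule, action, recordtime
    FROM stl_wlm_rule_action
    ORDER BY recordtime DESC
    LIMIT 50;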
  • 50. Concurrency Scaling:
    • Concurrency Scaling is a Redshift feature that automatically adds transient clusters to your cluster within seconds, to handle concurrent requests with consistently fast performance
    • Free for over 97% of Redshift customers: for every 24 hours that your main cluster is in use, you accrue a one-hour credit for Concurrency Scaling; beyond that, customers are billed on a per-second basis per transient cluster
    • Applies to Redshift local & Spectrum queries
    • Email notifications are issued when concurrency scaling occurs
  • 51. Resizing a cluster is easily achieved with a few clicks on the Redshift console, and there are two resizing approaches to choose from:
    • Elastic Resize: the existing cluster is modified to add or remove nodes. During the actual resize, existing connections to the Redshift cluster are put on hold, no new connections are accepted until the resize finishes, and the cluster is unavailable for querying. Typically completes within ~15 minutes or less
    • Classic Resize: the Redshift cluster can be reconfigured to a different node count and instance type. Involves streaming all data from the original Redshift cluster to a newly created Redshift cluster with the new configuration. During the resize, the original Redshift cluster is in read-only mode, and the customer is only charged for one cluster. Depending on data size, it may take several hours to complete
  • 52. Redshift provides a Postgres-compliant driver endpoint. There are two driver options for connecting to Redshift:
    • JDBC/ODBC Postgres driver
    • Proprietary Redshift driver: 35% faster than the Postgres driver, with support for IAM SSO
    Like other Postgres clients, you connect to Redshift as a database user, using a hostname, port, and database name (viewable on the Redshift console). Examples:
    jdbc:[redshift|postgresql]://endpoint:port/databaseName
    • jdbc:redshift://demo.dsi9zn4ccku4.us-east-1.redshift.amazonaws.com:8192/pocdb
    • jdbc:postgresql://demo.dsi9zn4ccku4.us-east-1.redshift.amazonaws.com:8192/pocdb
  • 53. Query Editor is a web-based query interface for running single-statement SQL queries in an Amazon Redshift cluster directly from the AWS Management Console, without having to install & set up an external JDBC/ODBC client:
    • Query results are viewable in the console & downloadable as a CSV file
    • Queries can be saved for convenient repeat execution
    • Query execution steps & times can be viewed to isolate bottlenecks & optimize queries
    Other considerations:
    • Max 50 Query Editor users at the same time per cluster
    • Query Editor is applicable for short queries (runtime < 10 min)
    • Query result sets are paginated with 100 rows per page
    • Transactions & Enhanced VPC Routing are not supported
    • Access to Query Editor requires specific IAM permissions
  • 54. Network access:
    • By default, Amazon Redshift clusters are locked down so nobody has access
    • To grant other users inbound access, you must associate the Redshift cluster with a security group
    • Use security groups to authorize other VPC security groups, or CIDR blocks, to connect
    • VPC security groups should be used for AWS service & EC2 connectivity, or cross-account access (recommended approach)
    • CIDR blocks should be used for connections from on-prem/the other side of a customer gateway
    • Having separate cluster security groups per application or cluster is a good practice
  • 55. Redshift Security:
    • Schemas: collections of database tables and other database objects (similar to namespaces)
    • Users: named user accounts that can connect to a database
    • Groups: collections of users that can be collectively assigned privileges for easier security maintenance
    In Amazon Redshift, schemas are similar to operating system directories, except that schemas cannot be nested. Users can be granted access to a single schema or to multiple schemas.
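  An illustrative end-to-end setup of these objects; all names and the password are placeholders:

    CREATE SCHEMA marketing;
    CREATE USER analyst1 PASSWORD 'Chang3MeS00n!';
    CREATE GROUP marketing_readers WITH USER analyst1;
    GRANT USAGE ON SCHEMA marketing TO GROUP marketing_readers;
    GRANT SELECT ON ALL TABLES IN SCHEMA marketing TO GROUP marketing_readers;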
  • 56. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Redshift Security
Create a view to conceal rows or columns that a user or group of users is not authorized to access:
CREATE VIEW secure_view AS SELECT col1, col3 FROM underlying_table;
GRANT SELECT ON secure_view TO GROUP restricted_group;
REVOKE ALL ON underlying_table FROM GROUP restricted_group;
  • 57. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Security has always been priority one at AWS, and Amazon Redshift is no exception:
• End-to-end data encryption
• IAM integration and integration with SAML IdPs for federation (SSO)
• Amazon VPC for network isolation
• Database security model (users, groups, privileges)
• Audit logging and notifications
• Certifications that include SOC 1/2/3, PCI-DSS, FedRAMP, and HIPAA
  • 58. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Redshift Security
SSL encryption can be used with client connections to Amazon Redshift; setting the parameter require_ssl to true in the cluster's parameter group rejects unencrypted connections.
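A sketch of setting that parameter with boto3 (the parameter group name is hypothetical; associated clusters generally need a reboot before the change takes effect):

import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Enforce SSL for all client connections; "demo-params" is a
# hypothetical parameter group associated with the cluster.
redshift.modify_cluster_parameter_group(
    ParameterGroupName="demo-params",
    Parameters=[{
        "ParameterName": "require_ssl",
        "ParameterValue": "true",
    }],
)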
  • 59. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Redshift Security
Redshift clusters can be configured to encrypt data at rest through a simple checkbox in the Redshift console:
• Redshift can encrypt data at rest (data stored locally or in S3 backups) using the AES algorithm with a 256-bit key
• Key management can be performed by Redshift, AWS KMS, or your HSM
• You control rotation of encryption keys via API
• Redshift blocks of data backed up to S3 are encrypted using the cluster's encryption key
• Redshift uses hardware-based crypto modules to keep the performance impact to roughly 20% or less
• Redshift clusters that need to comply with PCI, SOX, or HIPAA must be configured with encryption enabled
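Encryption can also be requested when creating a cluster programmatically. A minimal sketch with boto3 (all identifiers, credentials, and the KMS key ID are placeholders):

import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Launch a cluster with at-rest encryption backed by a KMS key;
# every identifier and credential below is a placeholder.
redshift.create_cluster(
    ClusterIdentifier="demo-encrypted",
    NodeType="dc2.large",
    NumberOfNodes=2,
    MasterUsername="awsuser",
    MasterUserPassword="Str0ngPassw0rd1",
    Encrypted=True,
    KmsKeyId="1234abcd-12ab-34cd-56ef-1234567890ab",
)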
  • 60. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Audit Logs
• Stored in three log files: Connection log, User log, and User Activity log
• Must be explicitly enabled
• Stored indefinitely unless S3 lifecycle rules are in place to archive or delete files automatically
• Cluster restarts don't affect audit logs in S3
• Access to log files does not require access to the Redshift database
• S3 charges apply
System (STL) Tables
• Stored in multiple tables, including SVL_STATEMENTTEXT and STL_CONNECTION_LOG
• Automatically available on every node in the data warehouse cluster
• Log history is stored for two to five days, depending on log usage and available disk space
• Access to STL tables requires access to the Amazon Redshift database
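A sketch covering both sides (the cluster, bucket, endpoint, and credentials are placeholders): boto3's EnableLogging switches on the S3 audit logs, while the STL tables are queried in-database like any other table.

import boto3
import psycopg2

redshift = boto3.client("redshift", region_name="us-east-1")

# Turn on S3 audit logging; the cluster, bucket, and prefix are
# placeholders, and the bucket policy must allow Redshift writes.
redshift.enable_logging(
    ClusterIdentifier="demo-cluster",
    BucketName="demo-audit-logs",
    S3KeyPrefix="redshift/",
)

# STL tables live inside the database, so they are queried over
# a normal client connection (endpoint/credentials placeholders).
conn = psycopg2.connect(
    host="demo.dsi9zn4ccku4.us-east-1.redshift.amazonaws.com",
    port=8192, dbname="pocdb",
    user="awsuser", password="example-password",
)
with conn.cursor() as cur:
    cur.execute("""
        SELECT recordtime, username, dbname, remotehost
        FROM stl_connection_log
        ORDER BY recordtime DESC
        LIMIT 20;
    """)
    for row in cur.fetchall():
        print(row)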
  • 61. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. CloudTrail Logs
• Stored indefinitely in S3, unless S3 lifecycle rules are in place to archive or delete files automatically
• CloudTrail captures the last 90 days of management events by default without charge (available using the CloudTrail APIs or via the console)
• Maintaining a longer history of events is possible, but charges for additional trail deliveries may apply, including S3 charges
• Access to log files does not require access to the Redshift database
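As a sketch, recent Redshift management events can be pulled from CloudTrail's default 90-day event history with boto3:

import boto3

cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")

# List recent management events emitted by the Redshift service
# from CloudTrail's free 90-day event history.
resp = cloudtrail.lookup_events(
    LookupAttributes=[{
        "AttributeKey": "EventSource",
        "AttributeValue": "redshift.amazonaws.com",
    }],
    MaxResults=20,
)
for event in resp["Events"]:
    print(event["EventTime"], event["EventName"], event.get("Username"))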
  • 62. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. [Architecture diagram: web app data, on-premises data, streaming data, and other databases are scanned by AWS Glue crawlers into the AWS Glue Data Catalog, which serves Amazon S3, Athena, EMR, Redshift, RDS, SageMaker, and QuickSight]
1. Crawlers scan your data sets and populate the Glue Data Catalog
2. The Glue Data Catalog serves as a central metadata repository
3. Once catalogued in Glue, your data is immediately available for analytics
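A minimal sketch of step 1 with boto3 (the crawler name, IAM role, catalog database, and S3 path are all hypothetical):

import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Define a crawler over an S3 prefix; the IAM role must grant
# Glue read access to the data. All names are placeholders.
glue.create_crawler(
    Name="demo-crawler",
    Role="GlueServiceRole",
    DatabaseName="demo_catalog_db",
    Targets={"S3Targets": [{"Path": "s3://demo-data-lake/raw/"}]},
)

# Run the crawler; discovered tables land in the Glue Data Catalog
# and become queryable from Athena, Redshift Spectrum, EMR, etc.
glue.start_crawler(Name="demo-crawler")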
  • 63. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. [Architecture diagram: a marketing data source and other source systems land data in S3; an ETL process prepares it, and the data is queried through Redshift Spectrum and Athena]
  • 66. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. aws.amazon.com/
  • 67. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Ricardo Serafim Analytics Specialist Solutions Architect rserafim@amazon.com