© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Building Your First Big Data
Application on AWS - ABD317
Ben Snively, Specialist Solutions Architect, AWS
Ryan Nienhuis, Sr. PM, Amazon Kinesis
Radhika Ravirala, EMR Solutions Architect, AWS
Dario Rivera, Specialist Solutions Architect, AWS
Allan MacInnis, Kinesis Solutions Architect, AWS
Chris Marshall, Solutions Architect, AWS
November 2017
AWS re:Invent
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Your Application Architecture
Amazon Kinesis Producer UI: generate web logs
Amazon Kinesis Firehose: collect web logs and deliver to S3
Amazon S3 bucket: raw web logs from Firehose
Amazon Kinesis Analytics: process and compute aggregate web log metrics
Amazon Kinesis Firehose: deliver processed web logs to Amazon Redshift
Amazon Redshift: run SQL queries on processed web logs
Amazon QuickSight: visualize web logs to discover insights
Amazon EMR: interactive analysis of web logs
Amazon Athena: interactive querying of web logs
AWS Glue: extract metadata, create tables, and transform web logs from CSV to Parquet
Amazon S3 bucket: transformed web logs from AWS Glue
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What is qwikLABS?
• Provides access to AWS services for this workshop
• No need to provide a credit card
• Lab resources are automatically deleted when you’re finished
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Sign in and start the lab
Once the lab is started, you will see a lab setup progress bar. It takes
~10 minutes for the lab to be set up.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Navigating qwikLABS
• Student Resources: scripts for your labs
• Open Console: opens the AWS Management Console
• Addl. Connection Details: links to different interfaces
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Everything you need for the lab
Open the AWS Management Console, log in, and verify that the following AWS
resources are created:
• One Amazon Kinesis Analytics application
• One Kinesis Analytics preprocessing AWS Lambda function
• Two Amazon Kinesis Firehose delivery streams
• One Amazon EMR Cluster
• One Amazon Redshift Cluster
Sign up (later) for:
• Amazon QuickSight
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Kinesis
Amazon Kinesis Streams: build custom applications that process and analyze streaming data
Amazon Kinesis Analytics: easily process and analyze streaming data with standard SQL
Amazon Kinesis Firehose: easily load streaming data into AWS
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Kinesis Streams
• Easy administration and low cost
• Build real-time applications with the framework of your choice
• Secure, durable storage
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Kinesis Firehose
• Zero administration and seamless elasticity
• Direct integration with data stores such as Amazon S3 and Amazon Redshift
• Serverless, continuous data transformations
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Kinesis - Firehose vs. Streams
Amazon Kinesis Streams is for use cases that require custom processing of
each incoming record, sub-one-second processing latency, and a choice of
stream processing frameworks.
Amazon Kinesis Firehose is for use cases that require zero administration,
the ability to use existing analytics tools based on Amazon S3, Amazon
Redshift, and Amazon Elasticsearch Service, and a data latency of 60
seconds or higher.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 1
Collect logs using a
Kinesis Firehose delivery stream
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Your Application Architecture
Amazon Kinesis
Producer UI
Amazon
Kinesis
Firehose
Amazon Kinesis Analytics
Amazon Kinesis Firehose
Amazon
EMR
Amazon
Redshift
Amazon
QuickSight
Generate web logs Collect web logs
and deliver to S3
Process & compute
aggregate web log metrics
Deliver processed web
logs to Amazon Redshift
Raw web logs from
Firehose
Run SQL queries on
processed web logs
Visualize web logs to
discover insights
Amazon S3
Bucket
Interactive analysis
of web logs
Amazon
Athena
Interactive querying
of web logs
AWS Glue
Extract metadata, create
tables and transform
Web logs from CSV to
Parquet
Amazon S3
Bucket
Transformed web
logs from AWS Glue
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Collect logs with a Kinesis Firehose
delivery stream
Time: 5 minutes
We are going to:
A. Write to a Firehose delivery stream - Simulate writing transformed
Apache Web Logs to a Firehose delivery stream that is configured to
deliver data into an S3 bucket.
There are many different libraries that can be used to write data to a
Firehose delivery stream. One popular option is called the Amazon
Kinesis Agent.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Collect logs with a Kinesis Firehose
delivery stream
Amazon Kinesis Agent
• Standalone Java application to collect and send data to Firehose
• Continuously monitors a set of files
• Handles file rotation, checkpointing, and retry on failure
• Emits Amazon CloudWatch metrics
• Pre-processes records parsed from monitored files (a minimal configuration sketch follows)
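The agent is driven by a small JSON configuration file (typically /etc/aws-kinesis/agent.json). The sketch below writes a minimal configuration as a Python dict; the delivery stream name, log path, and endpoint are placeholders for illustration, not values from this lab.

import json

# Minimal Kinesis Agent configuration (a sketch; the stream name and file
# path are placeholders, not this lab's resources).
agent_config = {
    "cloudwatch.emitMetrics": True,                      # publish agent metrics to CloudWatch
    "firehose.endpoint": "firehose.us-west-2.amazonaws.com",
    "flows": [
        {
            "filePattern": "/var/log/httpd/access_log*",        # files to tail (assumed path)
            "deliveryStream": "my-firehose-delivery-stream",    # placeholder name
            "dataProcessingOptions": [
                {
                    # convert Apache combined log lines to JSON before delivery
                    "optionName": "LOGTOJSON",
                    "logFormat": "COMBINEDAPACHELOG"
                }
            ]
        }
    ]
}

with open("agent.json", "w") as f:
    json.dump(agent_config, f, indent=2)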
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Collect logs with a Kinesis Firehose
delivery stream
For example, the agent can transform an Apache web log to JSON.
From:
125.166.52.103 - - [08/Mar/2017:17:06:44 -08:00] "GET /explore" 200 2503 "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 5.2; Trident/5.0)"
To:
{"HOST": "125.166.52.103",
"IDENT": null,
"AUTHUSER": null,
"DATETIME": "08/Mar/2017:17:06:44 -08:00",
"REQUEST": "GET /explore",
"RESPONSE": 200,
"BYTES": 2503,
"REFERRER": null,
"AGENT": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 5.2; Trident/5.0)"}
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Collect logs with a Kinesis Firehose
delivery stream
So that we don’t have to install or set up software on your machine, we are
going to use a utility called the Kinesis Data Generator (KDG) to simulate
the Amazon Kinesis Agent. The Kinesis Data Generator can populate a
Firehose delivery stream using a template and is simple to set up.
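Under the hood, both the agent and the KDG call the Firehose PutRecord/PutRecordBatch APIs. A minimal boto3 sketch is shown below; the delivery stream name and the sample log line are placeholders, not the lab’s generated values.

import boto3

firehose = boto3.client("firehose", region_name="us-west-2")

# A single Apache combined log line (placeholder data).
log_line = ('125.166.52.103 - - [08/Mar/2017:17:06:44 -08:00] '
            '"GET /explore" 200 2503 "Mozilla/5.0 (compatible; MSIE 9.0)"\n')

# Write one record to the delivery stream created by the lab
# (replace with your qls-...-FirehoseDeliveryStream-... name).
firehose.put_record(
    DeliveryStreamName="qls-xxxx-FirehoseDeliveryStream-xxxx",
    Record={"Data": log_line.encode("utf-8")},
)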
Let’s get started!
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 1A: Working with the KDG
Qwiklabs has already created and set up the Kinesis Firehose delivery
stream for us. All we have to do is start writing data to it.
1. Go to the Kinesis Data Generator (KDG) help section at
http://tinyurl.com/kinesispublisher, which will redirect to:
https://s3.amazonaws.com/kinesis-data-producer-test/help.html
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 1A: Working with the KDG
2. Click “Create Amazon Cognito User with AWS CloudFormation”
This link will take you to a service called AWS CloudFormation. AWS
CloudFormation gives developers and systems administrators an easy way
to create and manage a collection of related AWS resources, provisioning
and updating them in an orderly and predictable fashion.
We use AWS CloudFormation to create the necessary user credentials for
you to use the Kinesis Data Generator.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 1A: Working with the KDG
3. On the next screen, click next. (Here we are using a template stored in
an Amazon S3 bucket to create the necessary credentials).
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 1A: Working with the KDG
4. Specify a user name and password (and remember them!), and then
click next. This user name and password will be used to sign in to the
Kinesis Data Generator. The password must be at least 6 alpha-
numeric characters, and contain at least one number.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 1A: Working with the KDG
We use a service called Amazon Cognito to create these credentials.
Amazon Cognito lets you easily add user sign-up and sign-in to your
mobile and web apps. With Amazon Cognito, you also have the options to
authenticate users through social identity providers such as Facebook,
Twitter, or Amazon, with SAML identity solutions, or by using your own
identity system.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 1A: Working with the KDG
5. The next screen has some additional options for the CloudFormation
stack which are not needed. Click next.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 1A: Working with the KDG
6. This screen is a review screen so you can verify you have selected the
correct options. When you are ready, check the “acknowledge”
button and click create.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 1A: Working with the KDG
7. You are taken to a screen that shows the stack creation process. (You
may have to refresh the page). In approximately one minute, the stack
will complete and you will see “CREATE_COMPLETE” under status.
Once this occurs, a) select the template, b) select the outputs tab, c)
click the link to navigate to your very own Kinesis Data Generator
(hosted on Amazon S3).
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 1A: Working with the KDG
8. Log in using your user name and password
9. Select the region “us-west-2” (Oregon)
10. Select the delivery stream named (qls-<somerandomnumber>-FirehoseDeliveryStream-<somerandomnumber>-11111111111)
11. Specify a data rate (records per second). Please choose a number less than 10.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 1A: Working with the KDG
We have chosen a number less than 10 because everyone is on the same WiFi and
we want to be sure we don’t use all the bandwidth. Additionally, if you are not
plugged in, you may run into battery issues at a higher rate.
12. For the Record Template, use the Apache Combined Log Template found in the
Student Resources section. The template should look like the following:
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 1A: Working with the KDG
13. Click “Send Data to Kinesis”
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Review: Monitoring Your Delivery Stream
Go to the Amazon CloudWatch Metrics console and search for
“IncomingRecords”. Select this metric for your Firehose delivery stream
and choose a 1-minute SUM.
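If you prefer to pull the same numbers programmatically, here is a boto3 sketch; the delivery stream name is a placeholder.

from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")

# 1-minute SUM of IncomingRecords for the last 15 minutes.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Firehose",
    MetricName="IncomingRecords",
    Dimensions=[{"Name": "DeliveryStreamName",
                 "Value": "qls-xxxx-FirehoseDeliveryStream-xxxx"}],  # placeholder
    StartTime=datetime.utcnow() - timedelta(minutes=15),
    EndTime=datetime.utcnow(),
    Period=60,
    Statistics=["Sum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])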
What are the most important metrics to monitor?
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 2
Real-time data processing using
Kinesis Analytics
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Your Application Architecture
Amazon Kinesis
Producer UI
Amazon
Kinesis
Firehose
Amazon Kinesis Analytics
Amazon Kinesis Firehose
Amazon
EMR
Amazon
Redshift
Amazon
QuickSight
Generate web logs Collect web logs
and deliver to S3
Process & compute
aggregate web log metrics
Deliver processed web
logs to Redshift
Raw web logs from
Firehose
Run SQL queries on
processed web logs
Visualize web logs to
discover insights
Amazon S3
Bucket
Interactive analysis
of web logs
Amazon
Athena
Interactive querying
of web logs
AWS Glue
Extract metadata, create
tables and transform
Web logs from CSV to
Parquet
Amazon S3
Bucket
Transformed web
logs from Glue
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon Kinesis Analytics
• Powerful real-time applications
• Easy to use, fully managed
• Automatic elasticity
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Kinesis Analytics Applications
Easily write SQL code to process streaming data
Connect to streaming source
Continuously deliver SQL results
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Process Data using Kinesis Analytics
Time: 20 minutes
We are going to:
• Write a SQL query to compute an aggregate metric for an interesting statistic on the incoming data
• Write a SQL query using an anomaly detection function
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 2A: Start Amazon Kinesis Analytics
App
• Navigate to the Kinesis
dashboard
• Click on the Kinesis Analytics
Application
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 2A: Start Kinesis App
Click on “Go to SQL editor”. In the next screen, click on “Yes, start
application”
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
View Sample Records in Kinesis App
• Review sample records delivered to the source stream
(SOURCE_SQL_STREAM_001)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Kinesis App Metadata
Note that Amazon Kinesis adds metadata to each record being sent that
was shown in the formatted record sample:
• The ROWTIME represents the time the application read the record,
and is a special column used for time series analytics. This is also
known as the process time.
• The APPROXIMATE_ARRIVAL_TIME is the time the delivery stream
received the record. This is also known as ingest time.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Kinesis Analytics: How streaming app works
Writing SQL over streaming data using Kinesis Analytics follows a two part
model:
1. Create an in-application stream for storing intermediate SQL results. An
in-application stream is like a SQL table, but is continuously updated.
2. Create a PUMP which will continuously read FROM one in-application
stream and INSERT INTO a target in-application stream
[Diagram: the source stream SOURCE_SQL_STREAM_001 is read by AGGREGATE_PUMP into AGGREGATE_STREAM, which is read by OUTPUT_PUMP into DESTINATION_STREAM and sent to Amazon Redshift.]
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 2B: Calculate an aggregate metric
Calculate a count using a tumbling window and a GROUP BY clause.
A tumbling window is similar to a periodic report, where you specify your
query and a time range, and results are emitted at the end of the range
(for example, COUNT the number of items by key every 10 seconds).
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 2B: Calculate an aggregate metric
Tumbling windows
• Fixed size and non-overlapping
• Use the FLOOR() or STEP() function (coming soon) in a GROUP BY statement
Sliding windows
• Fixed size and overlapping; row boundaries are determined when new rows enter the window
• Use a standard OVER and WINDOW clause (e.g., COUNT(col) OVER (RANGE INTERVAL '5' MINUTE))
Custom windows
• Not fixed size and overlapping; row boundaries are determined by conditions
• Implementations vary, but typically require two steps (step 1: identify boundaries; step 2: perform the computation)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 2B: Calculate an aggregate metric
The window is defined by the following statement in the SELECT statement.
Note that the ROWTIME column is implicitly included in every stream query
and represents the processing time of the application.
This is known as a tumbling window. Tumbling windows are always included
in a GROUP BY clause and use a STEP function. The STEP function takes an
interval to produce the periodic reports. You can also use the SQL FLOOR()
function to achieve the same thing. A sketch of a full windowed query follows.
STEP(source_sql_stream_001.ROWTIME BY INTERVAL '10' SECOND)
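The exact SQL for this activity is in the Kinesis Analytics SQL file in Student Resources. The snippet below is only a hedged sketch of what a tumbling-window count looks like; the stream and column names (such as "request") are assumptions. It is kept in a Python string so you can paste it into the Kinesis Analytics SQL editor.

# Hedged sketch of a tumbling-window aggregate (not the lab's exact SQL).
TUMBLING_WINDOW_SQL = """
CREATE OR REPLACE STREAM "AGGREGATE_STREAM" ("request" VARCHAR(64), "request_count" INTEGER);

CREATE OR REPLACE PUMP "AGGREGATE_PUMP" AS
  INSERT INTO "AGGREGATE_STREAM"
  SELECT STREAM "request", COUNT(*) AS "request_count"
  FROM "SOURCE_SQL_STREAM_001"
  GROUP BY "request",
           STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '10' SECOND);
"""
print(TUMBLING_WINDOW_SQL)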
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 2B: Calculate an aggregate metric
Create an aggregate_stream using the SQL command found in the Kinesis
Analytics SQL file located in the Student Resources section of your lab.
Copy and paste the SQL into the SQL editor underneath the
“DESTINATION_SQL_STREAM” DDL. Click “Save and run SQL”.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 2C: Anomaly Detection
• Kinesis Analytics includes advanced algorithms that are extensions to the
SQL language. These include approximate count distinct (HyperLogLog),
approximate top-K (Space-Saving), and anomaly detection (random cut forest).
• The random cut forest algorithm detects anomalies in real time on
multi-dimensional data sets. You pass the algorithm any number of numeric
fields, and it produces an anomaly score for each record on your stream.
Higher scores are more anomalous.
• The minimum anomaly score is 0 and the maximum is log2(s), where s is the
subsample size parameter passed to random cut forest (the third parameter).
You will have to try the algorithm on your data to get a feel for the
anomaly score, as the score is data dependent. A hedged sketch of the query
shape follows.
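Use the SQL provided in Student Resources for the lab itself; the snippet below is only a sketch of how RANDOM_CUT_FOREST is typically invoked. The numeric column (here "response") and the parameter values are assumptions.

# Hedged sketch of an anomaly-detection query using RANDOM_CUT_FOREST
# (column names and parameter values are assumptions, not the lab's SQL).
ANOMALY_SQL = """
CREATE OR REPLACE STREAM "ANOMALY_STREAM" ("response" INTEGER, "ANOMALY_SCORE" DOUBLE);

CREATE OR REPLACE PUMP "ANOMALY_PUMP" AS
  INSERT INTO "ANOMALY_STREAM"
  SELECT STREAM "response", "ANOMALY_SCORE"
  FROM TABLE(RANDOM_CUT_FOREST(
         CURSOR(SELECT STREAM "response" FROM "SOURCE_SQL_STREAM_001"),
         100,     -- number of trees
         256,     -- subsample size (max score would be log2(256) = 8)
         100000,  -- time decay
         1        -- shingle size
       ));
"""
print(ANOMALY_SQL)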
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 2C: Anomaly Detection
Create an anomaly_stream using the SQL command found in the
Kinesis Analytics SQL file located in the Student Resources section of
your lab. Append the SQL in your SQL editor. Click “Save and run SQL”
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Review – In-Application SQL streams
Your application has multiple in-application SQL streams, including
DESTINATION_SQL_STREAM and ANOMALY_STREAM. These in-application streams are
like SQL tables that are continuously updated.
What else is unique about an in-application stream aside from its
continuous nature?
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 3
Deliver streaming results to Amazon
Redshift
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Your Application Architecture
Amazon Kinesis
Producer UI
Amazon
Kinesis
Firehose
Amazon Kinesis Analytics
Amazon Kinesis Firehose
Amazon
EMR
Amazon
Redshift
Amazon
QuickSight
Generate web logs Collect web logs
and deliver to S3
Process & compute
aggregate web log metrics
Deliver processed web
logs to Redshift
Raw web logs from
Firehose
Run SQL queries on
processed web logs
Visualize web logs to
discover insights
Amazon S3
Bucket
Interactive analysis
of web logs
Amazon
Athena
Interactive querying
of web logs
AWS Glue
Extract metadata, create
tables and transform
Web logs from CSV to
Parquet
Amazon S3
Bucket
Transformed web
logs from Glue
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 3: Deliver data to Amazon
Redshift using Kinesis Firehose
Time: 5 minutes
We are going to:
A. Connect to the Amazon Redshift cluster and create a table to hold
web log data
B. Update the Kinesis Analytics application to send data to Amazon
Redshift via the Firehose delivery stream.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 3A: Connect to Amazon Redshift
You can connect with pgweb:
• Already installed and configured for the Redshift cluster
• Just navigate to pgweb and start interacting
Note: From the qwikLABS console, open the pgweb link in a new window.
Or, use any JDBC/ODBC/libpq client:
• Aginity Workbench for Amazon Redshift
• SQL Workbench/J
• DBeaver
• DataGrip
If you use one of these SQL clients, the username/password is in your
qwikLABS console (scroll to the bottom). The endpoint is there too, and
the database is called logs.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 3B: Create table in Amazon
Redshift
Create a table named weblogs to capture the incoming data from the Firehose delivery stream.
Note: You can download the Amazon Redshift SQL code from the qwikLabs Student
Resources section (click on Open Redshift SQL).
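The exact DDL is in the Redshift SQL file in Student Resources. If you would rather run it from a script than from pgweb, the sketch below shows the idea with psycopg2; the endpoint, password, and column list are placeholders and assumptions, not the lab’s actual definition.

import psycopg2

# Connection details come from the qwikLABS console (placeholders here).
conn = psycopg2.connect(
    host="<your-redshift-endpoint>", port=5439,
    dbname="logs", user="dbadmin", password="<your-password>")

# A hedged guess at the weblogs table; use the DDL from Student Resources.
create_table_sql = """
CREATE TABLE IF NOT EXISTS weblogs (
    host      VARCHAR(50),
    datetime  VARCHAR(50),
    request   VARCHAR(1024),
    response  INTEGER,
    bytes     INTEGER,
    agent     VARCHAR(1024)
);
"""
with conn, conn.cursor() as cur:
    cur.execute(create_table_sql)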
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 3C: Deliver Data to Amazon
Redshift using Firehose
Update the Kinesis Analytics application to send data to the Firehose
delivery stream. Firehose delivers the streaming data to Amazon Redshift.
1. Go to the Kinesis Analytics console
2. Choose the Amazon Redshift delivery stream as the destination and click
on the edit button (see the pencil icon in the figure below)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 3C: Deliver Data to Amazon
Redshift using Firehose
4. Validate your destination
1. Choose the Firehose “qls-xxxxxxx-RedshiftDeliveryStream-
xxxxxxxx” delivery stream.
2. Keep the default for “Connect in-application stream”
3. Choose CSV as the “Output format”
4. Select “Choose from IAM roles that Kinesis Analytics can assume”
5. Click “Save and continue”
5. It will take about 1 – 2 minutes for everything to be updated and for
data to start appearing in Amazon Redshift.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 3C: Deliver Data to Amazon Redshift
using Firehose
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Review: Amazon Redshift Test Queries
Find the distribution of response codes over days (copy the SQL from the Redshift SQL file).
Count the number of 404 response codes.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Review: Amazon Redshift Test Queries
Show all request paths with status “PAGE NOT FOUND”.
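The Redshift SQL file has the exact queries; below is a hedged sketch of two of the checks, run with psycopg2. The column names (response, datetime) and the varchar date format are assumptions.

import psycopg2

conn = psycopg2.connect(
    host="<your-redshift-endpoint>", port=5439,
    dbname="logs", user="dbadmin", password="<your-password>")

with conn, conn.cursor() as cur:
    # Distribution of response codes over days
    # (assumes datetime is stored as a varchar like "08/Mar/2017:17:06:44 -08:00").
    cur.execute("""
        SELECT LEFT(datetime, 11) AS day, response, COUNT(*) AS requests
        FROM weblogs
        GROUP BY 1, 2
        ORDER BY 1, 2;
    """)
    print(cur.fetchall())

    # Number of 404 responses.
    cur.execute("SELECT COUNT(*) FROM weblogs WHERE response = 404;")
    print(cur.fetchone())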
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Extract, Transform and Load (ETL)
with AWS Glue
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Fully managed ETL (extract, transform, and
load) service
AWS Glue
• Categorize your data, clean it, enrich it and
move it reliably between various data stores
• Once catalogued, your data is immediately
searchable and query-able across your data
silos
• Simple, flexible and cost-effective
• Serverless; runs on a fully managed, scale-out
Spark environment
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Glue Components
Data Catalog
• Discover and organize your data in various databases, data warehouses, and data lakes
Job Authoring
• Focus on writing transformations
• Generate code through a wizard
• Write your own code
Job Execution
• Runs jobs in Spark containers, with automatic scaling based on SLA
• Glue is serverless; you only pay for the resources you consume
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Glue: How it works
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 4: Transform Web logs to
Parquet using AWS Glue
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Your Application Architecture
Amazon Kinesis
Producer UI
Amazon
Kinesis
Firehose
Amazon Kinesis Analytics
Amazon Kinesis Firehose
Amazon
EMR
Amazon
Redshift
Amazon
QuickSight
Generate web logs Collect web logs
and deliver to S3
Process & compute
aggregate web log metrics
Deliver processed web
logs to Redshift
Raw web logs from
Firehose
Run SQL queries on
processed web logs
Visualize web logs to
discover insights
Amazon S3
Bucket
Interactive analysis
of web logs
Amazon
Athena
Interactive querying
of web logs
AWS Glue
Extract metadata, create
tables and transform
Web logs from CSV to
Parquet
Amazon S3
Bucket
Transformed web
logs from Glue
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity: Catalog and perform ETL on
Weblogs
Time: 40 minutes
We are going to:
A. Discover and catalog the web log data deposited into the S3 bucket
using AWS Glue Crawler
B. Transform Web logs to Parquet format with AWS Glue ETL Job
Authoring tool
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 4A: Discover dataset with AWS
Glue
We use the AWS Glue crawler to extract metadata and create tables over the
data. From the AWS Management Console, select the AWS Glue service, then
click “Get Started” on the next screen.
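The console walk-through on the following slides can also be expressed as a couple of boto3 calls; this optional sketch uses placeholder names for the crawler, the IAM role, and the S3 path.

import boto3

glue = boto3.client("glue", region_name="us-west-2")

# Equivalent of the console steps below (role name and S3 path are placeholders).
glue.create_crawler(
    Name="weblogs-crawler",
    Role="AWSGlueServiceRole-default",
    DatabaseName="weblogdb",
    Targets={"S3Targets": [{"Path": "s3://<stack>-logs-<account>-us-west-2/raw/"}]},
)
glue.start_crawler(Name="weblogs-crawler")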
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 4A: Add crawler using AWS Glue
Select Crawlers section on the left and click on Add crawler
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 4A: ETL with AWS Glue
Specify a name for the crawler. Click on the folder icon to choose the
data store on S3. Click Next.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 4A: ETL with AWS Glue
Provide the S3 path where the web logs are deposited (navigate to the S3
path that contains the word “logs” and select the “raw” folder). Click
Select.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 4A: ETL with AWS Glue
Click Next on the next screen (do not add another data store).
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 4A: ETL with AWS Glue
In the IAM Role section, select “Create an IAM role”, add “default” to
the IAM role and click “Next”
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 4A: Add crawler in AWS Glue
Choose Run on demand to run the crawler now, and click Next.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 4A: Add crawler with AWS Glue
On the next screen, click on Add database to add a database (use weblogdb
for the database name)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 4A: Add crawler with AWS Glue
Review and click Finish to create a crawler
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 4A: Add crawler with AWS Glue
Click on Run it now link to run the crawler
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 4A: Add crawler with AWS Glue
Crawler shows a Ready status when it is finished running.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 4A: Table creation in AWS Glue
Observe that the crawler has created a table for your dataset.
The crawler automatically classified the dataset as the Apache combined log format.
Click the table to take a look at its properties.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 4A: Table creation in AWS Glue
Glue used the GrokSerDe (Serializer/Deserializer) to correctly
interpret the web logs
You can click on the View partitions link to look at the partitions in
the dataset
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 4A: Create ETL Job in AWS Glue
With the dataset cataloged and the table created, we are now ready to convert
the web logs from the Apache combined log format to the more query-efficient
Parquet format.
Click on Add job to begin creating the ETL job.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 4B: ETL Job in AWS Glue
• In the Job properties window, specify the job name
• In this example, we will create a new ETL script
• Glue automatically chooses the script file name and the path where the script will be persisted
• For the “Temporary Directory”, specify an S3 temporary directory in your lab account (use the s3://<stack>-logs-<account>-us-west-2/ bucket and append a folder named temp)
• The path should look like: s3://<stack>-logs-<account>-us-west-2/temp/
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 4B: ETL Job in AWS Glue
• DO NOT CLICK NEXT YET
• Copy the temporary path to a text editor, modify the path as follows, and save it in a file (we will need this path for storing the Parquet files):
s3://<stack>-logs-<account>-us-west-2/weblogs/processed/parquet
For example:
s3://qls-108881-f75177905b7b5b0e-logs-XXXXXXXX3638-us-west-2/weblogs/processed/parquet
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 4B: ETL Job in AWS Glue
• Expand the “Script libraries and job parameters” section, and increase the DPUs to 20
• Let’s pass a job parameter to supply the S3 path where the Parquet files will be deposited
• Specify the following values for Key and Value (a hedged sketch of the ETL script follows this list):
• Key: --parquet_path (notice the 2 hyphens at the beginning)
• Value: s3://<stack>-logs-<account>-us-west-2/weblogs/processed/parquet/
• Note: the Value is the S3 path we saved in the previous section
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 4B: ETL Job in AWS Glue
Click Next in the following screen and click Finish to complete the job creation.
Activity 4B: ETL Job in AWS Glue
• Close the Script Editor tips window (if it appears)
• In the Glue script editor, copy in the ETL code by clicking on the “Open Glue ETL Code” link in Student Resources
• Ensure that the database name (db_name) and table name reflect the database and table created by the Glue crawler
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 4B: ETL Job in AWS Glue
Click Save and then the Run job button to execute your ETL job. Click on
“Save and run job” in the next window.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 4B: ETL Job in AWS Glue
Click Run job to continue. This might take a few minutes. When the job
finishes, the web logs will have been transformed to Parquet format.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Interactive Querying with Amazon
Athena
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Interactive Query Service
• Query directly from Amazon S3
• Use ANSI SQL
• Serverless
• Multiple Data Formats
• Cost Effective
Amazon
Athena
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Familiar Technologies Under the Covers
Presto
• Used for SQL queries
• In-memory distributed query engine
• ANSI-SQL compatible, with extensions
Hive
• Used for DDL functionality
• Complex data types
• Multitude of formats
• Supports data partitioning
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Comparing performance and cost savings
for compression and columnar format
Dataset | Size on Amazon S3 | Query run time | Data scanned | Cost
Data stored as text files | 1 TB | 236 seconds | 1.15 TB | $5.75
Data stored in Apache Parquet format* | 130 GB | 6.78 seconds | 2.51 GB | $0.013
Savings / speedup | 87% less with Parquet | 34x faster | 99% less data scanned | 99.7% savings
(*compressed using Snappy compression)
https://aws.amazon.com/blogs/big-data/analyzing-data-in-s3-using-amazon-athena/
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 5
Interactive Querying with Amazon
Athena
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Your Application Architecture
Amazon Kinesis
Producer UI
Amazon
Kinesis
Firehose
Amazon Kinesis Analytics
Amazon Kinesis Firehose
Amazon
EMR
Amazon
Redshift
Amazon
QuickSight
Generate web logs Collect web logs
and deliver to S3
Process & compute
aggregate web log metrics
Deliver processed web
logs to Amazon Redshift
Raw web logs from
Firehose
Run SQL queries on
processed web logs
Visualize web logs to
discover insights
Amazon S3
Bucket
Interactive analysis
of web logs
Amazon
Athena
Interactive querying
of web logs
AWS Glue
Extract metadata, create
tables and transform
Web logs from CSV to
Parquet
Amazon S3
Bucket
Transformed web
logs from Glue
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity: Interactive querying with Amazon
Athena
Time: 15 minutes
We are going to:
A. Create a table over the processed web logs in S3. These are the
Parquet files created by the AWS Glue ETL job in the previous section
B. Run interactive queries on the Parquet-formatted web logs
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 5A: Setup Amazon Athena
1. From the AWS Management Console, search for Athena and click on the
service
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 5A: Setup Amazon Athena
2. Select Amazon Athena from the Analytics section and click on Get
Started on the next page
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 5A: Setup Amazon Athena
3. Dismiss the window for running the Athena tutorial.
4. Dismiss any other tutorial window
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 5A: Setup Amazon Athena
5. We are now ready to create a table in Athena. But before we do that, we
need to get the S3 bucket location where the AWS Glue job delivered the
Parquet files. Click on Services, then choose S3 from the Storage section.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 5A: Set Up Amazon Athena
6. Locate the bucket that looks like ‘qls-<stackname>-logs-#####-us-west-2’.
Navigate to the parquet folder. Copy the name of the bucket into a text
editor.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 5A: Setup Amazon Athena
7. Go back to the Athena console
8. Let’s create a table in Athena on the Parquet dataset created by AWS Glue
9. In the Athena console, choose ‘weblogdb’ from the database dropdown
10. Enter the SQL command (found by clicking on the “Open Athena SQL” link
in the Student Resources section of qwiklabs) to create a table
11. Make sure to replace <your-parquet-path> with the bucket location for
the Parquet files you copied in the previous step
(Screenshot on the next slide; a hedged scripted version is sketched below.)
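The exact DDL is in the Athena SQL file. The sketch below shows the general shape of an external Parquet table and how it could be submitted with boto3; the column names, table name, and result location are assumptions and placeholders.

import boto3

athena = boto3.client("athena", region_name="us-west-2")

# Hedged sketch of the DDL; use the statement from Student Resources and
# substitute <your-parquet-path> with the bucket/prefix copied above.
ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS weblogdb.weblogs_parquet (
    host      string,
    datetime  string,
    request   string,
    response  int,
    bytes     int,
    agent     string
)
STORED AS PARQUET
LOCATION 's3://<your-parquet-path>/'
"""

athena.start_query_execution(
    QueryString=ddl,
    QueryExecutionContext={"Database": "weblogdb"},
    ResultConfiguration={
        "OutputLocation": "s3://aws-athena-query-results-<ACCOUNTID>-us-west-2/"},
)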
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 5B: Working with Amazon Athena
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 5B: Working with Amazon Athena
8. The SQL DDL in the previous step creates a table in Athena based on the
Parquet files we created with the Glue ETL job
9. Select weblogdb from the database section and click on the three stacked
dots icon to sample a few rows of the S3 data
Activity 5C: Interactive Querying with Amazon Athena
• Run interactive queries (copy SQL queries from “Athena SQL” in
Student Resources) and see the results on the console
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 5C: Interactive Querying with Athena
2. Optionally, you can save the results of a query to CSV by choosing the
file icon on the Results pane.
3. You can also view the results of previous queries or queries that may
take some time to complete. Choose History then either search for your
query or choose View or Download to view or download the results of
previous completed queries. This also displays the status of queries that
are currently running.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Review: Amazon Athena Interactive Queries
Query results are also stored in Amazon S3 in a bucket called
aws-athena-query-results-ACCOUNTID-REGION.
Where can you change the default location in the console?
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data processing with Amazon EMR
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon EMR release
Storage: Amazon S3 (EMRFS), HDFS
Cluster resource management: YARN
Processing frameworks: MapReduce (batch), Tez (interactive), Spark (in memory), Flink (streaming)
Applications: Hive, Pig, Spark SQL/Streaming/ML, Mahout, Sqoop, HBase/Phoenix, Presto, Hue (SQL interface/metastore management), Zeppelin (interactive notebook), Ganglia (monitoring), HiveServer2/Spark Thrift Server (JDBC/ODBC)
Amazon EMR service
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
On-cluster UIs
• Manage applications
• Notebooks
• SQL editor, workflow designer, metastore browser: design and execute queries and workloads
• And more using bootstrap actions!
The Hadoop ecosystem can run in Amazon EMR
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Easy to use Spot Instances
• On-demand for core nodes: standard Amazon EC2 pricing for on-demand capacity; meet SLA at predictable cost
• Spot Instances for task nodes: up to 90% off Amazon EC2 on-demand pricing; exceed SLA at lower cost
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon S3 as your persistent data store
• Separate compute and storage
• Resize and shut down Amazon EMR clusters with no data loss
• Point multiple Amazon EMR clusters at the same data in Amazon S3
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
EMRFS makes it easier to leverage S3
Better performance and error handling options
Transparent to applications – Use “s3://”
Consistent view
• For consistent list and read-after-write for new puts
Support for Amazon S3 server-side and client-side encryption
Faster listing using EMRFS metadata
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Apache Spark
• Fast, general-purpose engine for large-
scale data processing
• Write applications quickly in Java, Scala,
or Python
• Combine SQL, streaming, and complex
analytics
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Apache Zeppelin
• Web-based notebook for
interactive analytics
• Multiple language back end
• Apache Spark integration
• Data visualization
• Collaboration
https://zeppelin.apache.org/
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 6
Interactive analysis using Amazon
EMR
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Your Application Architecture
Amazon Kinesis
Producer UI
Amazon
Kinesis
Firehose
Amazon Kinesis Analytics
Amazon Kinesis Firehose
Amazon
EMR
Amazon
Redshift
Amazon
QuickSight
Generate web logs Collect web logs
and deliver to S3
Process & compute
aggregate web log metrics
Deliver processed web
logs to Amazon Redshift
Raw web logs from
Firehose
Run SQL queries on
processed web logs
Visualize web logs to
discover insights
Amazon S3
Bucket
Interactive analysis
of web logs
Amazon
Athena
Interactive querying
of web logs
AWS Glue
Extract metadata, create
tables and transform
Web logs from CSV to
Parquet
Amazon S3
Bucket
Transformed web
logs from Glue
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 6: Process and Query data with
Amazon EMR
Time: 20 minutes
We are going to:
A. Use a Zeppelin Notebook to interact with Amazon EMR Cluster
B. Process the data in Amazon S3 using Apache Spark
C. Query the data processed in the earlier stage and create simple charts
Activity 6A: Open the Zeppelin interface
1. Copy the Zeppelin endpoint from the Student Resources section in qwiklabs
2. Click on the “Open Zeppelin Notebook” link in the Student Resources section to open the Zeppelin notebook in a new window
3. Download the file (or copy and save it to a file with a .json extension)
4. Import the notebook using the Import Note link on the Zeppelin interface
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 6A: Open the Zeppelin interface
• Use the s3://<stack>-logs-<account>-us-west-2/processed/parquet bucket where the processed Parquet files were deposited
• Run the paragraph
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 6B: Run the notebook
Enter the S3 bucket name where the parquet files are stored. The bucket
name begins with <stack>-*-logs-#####-region
Execute Step 1
• Enter bucket name (<stack>-*-logs-##########-us-west-2)
Execute Step 2
• Create a Dataframe with the parquet files from the Glue ETL job
Execute Step 3
• Sample a few rows
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 6B: Run the notebook
Execute Step 4 to process the data
• Notice how the ‘AGENT’ field contains the ‘BROWSER’ at the beginning of the column value. Let’s extract the browser from that field (a hedged sketch follows below).
• Create a UDF that will extract the browser part and add it to the Dataframe
• Print the new Dataframe
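The notebook already contains this step; the cell below is only a sketch of the idea. The column name ("agent"), the S3 path, and the way the browser string is extracted are assumptions, and the `spark` session is the one Zeppelin provides.

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# DataFrame over the Parquet files written by the Glue ETL job (placeholder path).
df = spark.read.parquet("s3://<stack>-logs-<account>-us-west-2/weblogs/processed/parquet/")

# Take the token before the first space or slash in the agent value as the
# browser (a simplification; the notebook's own UDF may differ).
def extract_browser(agent):
    if agent is None:
        return None
    return agent.split("/")[0].split(" ")[0]

extract_browser_udf = udf(extract_browser, StringType())

df_with_browser = df.withColumn("browser", extract_browser_udf(df["agent"]))
df_with_browser.show(5)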
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 6B: Run the notebook
Execute Step 6
• Register the data frame as a temporary table
• Now you can run SQL queries on the temporary tables.
Execute the next 3 steps and observe the charts created
• What did you learn about the dataset?
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Review : Interactive analysis using Amazon
EMR
You just learned how to process and query data using Amazon EMR with
Apache Spark.
Amazon EMR has many other frameworks available for you to use
• Hive, Presto, Flink, Pig, MapReduce
• Hue, Oozie, HBase
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Optional Exercise:
Data Visualization with Amazon
QuickSight
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon QuickSight
Fast, Easy Interactive Analytics
for Anyone, Everywhere
Ease of use targeted at business users.
Blazing fast performance powered by
SPICE.
Broad connectivity with AWS data
services, on-premises data, files and
business applications.
Cloud-native solution that scales
automatically.
1/10th the cost of traditional BI
solutions.
Create, share and collaborate with
anyone in your organization, on the
web or on mobile.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Connect, SPICE, Analyze
Amazon QuickSight allows you to connect to data from a wide variety of
AWS, third party, and on-premises sources and import it to SPICE or query
directly. Users can then easily explore, analyze, and share their insights
with anyone.
(Example sources: Amazon RDS, Amazon S3, Amazon Redshift)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 7
Visualize results in Amazon
QuickSight
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Your Application Architecture
Amazon Kinesis
Producer UI
Amazon
Kinesis
Firehose
Amazon Kinesis Analytics
Amazon Kinesis Firehose
Amazon
EMR
Amazon
Redshift
Amazon
QuickSight
Generate web logs Collect web logs
and deliver to S3
Process & compute
aggregate web log metrics
Deliver processed web
logs to Amazon Redshift
Raw web logs from
Firehose
Run SQL queries on
processed web logs
Visualize web logs to
discover insights
Amazon S3
Bucket
Interactive analysis
of web logs
Amazon
Athena
Interactive querying
of web logs
AWS Glue
Extract metadata, create
tables and transform
Web logs from CSV to
Parquet
Amazon S3
Bucket
Transformed web
logs from Glue
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 7: Visualization with Amazon
QuickSight
We are going to:
A. Register for an Amazon QuickSight account
B. Connect to the Amazon Redshift cluster
C. Create visualizations for analysis to answer questions like:
   A. What are the most common HTTP requests, and how successful (response code of 200) are they?
   B. Which are the most requested URIs?
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 7A: Amazon QuickSight
Registration
• Go to AWS Console, click on
QuickSight from the
Analytics section.
• Click on Signup in the next
window
• Make sure the subscription
type is Standard and click
Continue on the next screen
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 7A: Amazon QuickSight
Registration
• On the Subscription Type page,
enter the account name (see note
below)
• Enter your email address
• Select US West region
• Check the S3 (all buckets) box
• Note: Amazon QuickSight
Account name is the AWS
account number on the
qwikLabs console
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 7A: Amazon QuickSight
Registration
• If a pop-up box to choose S3 buckets appears, click Select buckets
• Click on Go To Amazon
Quicksight
• Dismiss welcome screen
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 7B: Connect to data source
• Click on Manage Data and
then select New Dataset to
create a new data set in
Amazon QuickSight
• Choose Redshift (Auto-
discovered) as the data
source. Amazon QuickSight
autodiscovers databases
associated with your AWS
account (Amazon Redshift
database in this case)
Activity 7B: Connect to Amazon Redshift
Note: Use ”dbadmin” as the
username. You can get the Amazon
Redshift database password from
qwikLABS by navigating to the
“Connection details” section (see
below)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 7C: Choose your weblogs Amazon
Redshift table
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 7D: Ingest data into SPICE
SPICE is Amazon QuickSight’s in-memory optimized calculation engine,
designed specifically for fast, interactive data visualization.
You can improve the performance of database data sets by importing the
data into SPICE instead of using a direct query to the database.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Activity 7E: Creating your first analysis
What are the most requested HTTP request types, and what are their
corresponding response codes for this site?
Simply select request and response, and let AutoGraph create the optimal
visualization.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Review – Creating your Analysis
Exercise: Add a visual to show which URIs are the most requested.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Please don’t forget to fill out your evaluations.
THANK YOU!
 
Serverless Architecture Patterns
Serverless Architecture PatternsServerless Architecture Patterns
Serverless Architecture Patterns
 
CON309_Containerized Machine Learning on AWS
CON309_Containerized Machine Learning on AWSCON309_Containerized Machine Learning on AWS
CON309_Containerized Machine Learning on AWS
 
How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Event...
How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Event...How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Event...
How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Event...
 
AWS Webinar CZSK Uvod do cloud computingu
AWS Webinar CZSK Uvod do cloud computinguAWS Webinar CZSK Uvod do cloud computingu
AWS Webinar CZSK Uvod do cloud computingu
 
What's New in Serverless - SRV305 - re:Invent 2017
What's New in Serverless - SRV305 - re:Invent 2017What's New in Serverless - SRV305 - re:Invent 2017
What's New in Serverless - SRV305 - re:Invent 2017
 
Interstella 8888: Advanced Microservice Operations - CON407 - re:Invent 2017
Interstella 8888: Advanced Microservice Operations - CON407 - re:Invent 2017Interstella 8888: Advanced Microservice Operations - CON407 - re:Invent 2017
Interstella 8888: Advanced Microservice Operations - CON407 - re:Invent 2017
 
Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017
Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017
Moving to Amazon ECS – the Not-So-Obvious Benefits - CON356 - re:Invent 2017
 
Using AWS Management Tools to Enable Governance, Compliance, Operational, and...
Using AWS Management Tools to Enable Governance, Compliance, Operational, and...Using AWS Management Tools to Enable Governance, Compliance, Operational, and...
Using AWS Management Tools to Enable Governance, Compliance, Operational, and...
 
How Chick-fil-A Embraces DevSecOps on AWS - SID306 - re:Invent 2017
How Chick-fil-A Embraces DevSecOps on AWS - SID306 - re:Invent 2017How Chick-fil-A Embraces DevSecOps on AWS - SID306 - re:Invent 2017
How Chick-fil-A Embraces DevSecOps on AWS - SID306 - re:Invent 2017
 
Design, Build, and Modernize Your Web Applications with AWS
 Design, Build, and Modernize Your Web Applications with AWS Design, Build, and Modernize Your Web Applications with AWS
Design, Build, and Modernize Your Web Applications with AWS
 
MCL306_Making IoT Smarter with AWS Rekognition.pdf
MCL306_Making IoT Smarter with AWS Rekognition.pdfMCL306_Making IoT Smarter with AWS Rekognition.pdf
MCL306_Making IoT Smarter with AWS Rekognition.pdf
 
MCL306_Making IoT Smarter with AWS Rekognition
MCL306_Making IoT Smarter with AWS RekognitionMCL306_Making IoT Smarter with AWS Rekognition
MCL306_Making IoT Smarter with AWS Rekognition
 

 

Build Your First Big Data Application on AWS

  • 10. Amazon Kinesis - Firehose vs. Streams: Amazon Kinesis Streams is for use cases that require custom processing, per incoming record, with sub-1 second processing latency and a choice of stream processing frameworks. Amazon Kinesis Firehose is for use cases that require zero administration, the ability to use existing analytics tools based on Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service, and a data latency of 60 seconds or higher.
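As a hedged illustration of the Firehose path (not one of the lab steps), a producer can push a record to a delivery stream with a single boto3 call; the stream name below is a placeholder for the qls-...-FirehoseDeliveryStream-... stream created in your lab.

```python
# Minimal sketch, assuming boto3 credentials are already configured.
# The delivery stream name is a placeholder, not the lab's actual name.
import boto3

firehose = boto3.client("firehose", region_name="us-west-2")

log_line = ('125.166.52.103 - - [08/Mar/2017:17:06:44 -08:00] '
            '"GET /explore" 200 2503\n')

firehose.put_record(
    DeliveryStreamName="qls-example-FirehoseDeliveryStream",  # placeholder
    Record={"Data": log_line.encode("utf-8")},
)
```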
  • 11. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 1 Collect logs using a Kinesis Firehose delivery stream
  • 12. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Your Application Architecture Amazon Kinesis Producer UI Amazon Kinesis Firehose AmazonKinesis Analytics AmazonKinesis Firehose Amazon EMR Amazon Redshift Amazon QuickSight Generate web logs Collect web logs and deliver to S3 Process & compute aggregate web log metrics Deliver processed web logs to Amazon Redshift Raw web logs from Firehose Run SQL queries on processed web logs Visualize web logs to discover insights Amazon S3 Bucket Interactive analysis of web logs Amazon Athena Interactive querying of web logs AWS Glue Extract metadata, create tables and transform Web logs from CSV to Parquet Amazon S3 Bucket Transformed web logs from AWS Glue
  • 13. Collect logs with a Kinesis Firehose delivery stream (Time: 5 minutes). We are going to: A. Write to a Firehose delivery stream - simulate writing transformed Apache web logs to a Firehose delivery stream that is configured to deliver data into an S3 bucket. There are many different libraries that can be used to write data to a Firehose delivery stream. One popular option is called the Amazon Kinesis Agent.
  • 14. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Collect logs with a Kinesis Firehose delivery stream Amazon Kinesis Agent • Standalone Java application to collect and send data to Firehose • Continuously monitors set of files • Handles file rotation, check-pointing and retry on failures • Emits Amazon CloudWatch metrics • Pre-process records parsed from monitored files
  • 15. Collect logs with a Kinesis Firehose delivery stream - For example, the agent can transform an Apache web log to JSON.
From: 125.166.52.103 - - [08/Mar/2017:17:06:44 -08:00] "GET /explore" 200 2503 "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 5.2; Trident/5.0)"
To: {"HOST" : "125.166.52.103", "IDENT" : null, "AUTHUSER" : null, "DATETIME" : "08/Mar/2017:17:06:44 -08:00", "REQUEST" : "GET /explore", "RESPONSE" : 200, "BYTES" : 2503, "REFERRER" : null, "AGENT" : "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 5.2; Trident/5.0)"}
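A small sketch of that same transformation in Python, purely for illustration (this is not the agent's code); the regex and field names mirror the slide above, and the example line omits the referrer and agent fields for brevity.

```python
# Illustrative only: parse one Apache combined-style log line into the JSON
# structure shown above. Field names follow the slide; not the agent's code.
import json
import re

LOG_PATTERN = re.compile(
    r'(?P<HOST>\S+) (?P<IDENT>\S+) (?P<AUTHUSER>\S+) \[(?P<DATETIME>[^\]]+)\] '
    r'"(?P<REQUEST>[^"]*)" (?P<RESPONSE>\d{3}) (?P<BYTES>\d+)'
)

line = '125.166.52.103 - - [08/Mar/2017:17:06:44 -08:00] "GET /explore" 200 2503'
match = LOG_PATTERN.match(line)

# "-" means a missing value in the combined log format.
record = {key: (None if value == "-" else value)
          for key, value in match.groupdict().items()}
record["RESPONSE"] = int(record["RESPONSE"])
record["BYTES"] = int(record["BYTES"])

print(json.dumps(record))
```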
  • 16. Collect logs with a Kinesis Firehose delivery stream - So that we don't have to install or set up software on your machine, we are going to use a utility called the Kinesis Data Generator to simulate using the Amazon Kinesis Agent. The Kinesis Data Generator can populate a Firehose delivery stream using a template and is simple to set up. Let's get started!
  • 17. Activity 1A: Working with the KDG - qwikLABS has already created and set up the Kinesis Firehose delivery stream for us. All we have to do is start writing data to it. 1. Go to the Kinesis Data Generator (KDG) help section at http://tinyurl.com/kinesispublisher, which will redirect to https://s3.amazonaws.com/kinesis-data-producer-test/help.html
  • 18. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 1A: Working with the KDG 2. Click “Create Amazon Cognito User with AWS CloudFormation” This link will take you to a service called AWS CloudFormation. AWS CloudFormation gives developers and systems administrators an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion. We use AWS CloudFormation to create the necessary user credentials for you to use the Kinesis Data Generator.
  • 19. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 1A: Working with the KDG 3. On the next screen, click next. (Here we are using a template stored in an Amazon S3 bucket to create the necessary credentials).
  • 20. Activity 1A: Working with the KDG - 4. Specify a user name and password (and remember them!), and then click next. This user name and password will be used to sign in to the Kinesis Data Generator. The password must be at least 6 alphanumeric characters and contain at least one number.
  • 21. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 1A: Working with the KDG We use a service called Amazon Cognito to create these credentials. Amazon Cognito lets you easily add user sign-up and sign-in to your mobile and web apps. With Amazon Cognito, you also have the options to authenticate users through social identity providers such as Facebook, Twitter, or Amazon, with SAML identity solutions, or by using your own identity system.
  • 22. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 1A: Working with the KDG 5. The next screen has some additional options for the CloudFormation stack which are not needed. Click next.
  • 23. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 1A: Working with the KDG 6. This screen is a review screen so you can verify you have selected the correct options. When you are ready, check the “acknowledge” button and click create.
  • 24. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 1A: Working with the KDG 7. You are taken to a screen that shows the stack creation process. (You may have to refresh the page). In approximately one minute, the stack will complete and you will see “CREATE_COMPLETE” under status. Once this occurs, a) select the template, b) select the outputs tab, c) click the link to navigate to your very own Kinesis Data Generator (hosted on Amazon S3).
  • 25. Activity 1A: Working with the KDG - 8. Log in using your user name and password 9. Select the region "us-west-2" (Oregon) 10. Select the delivery stream with the name (qls-<somerandomnumber>-FirehoseDeliveryStream-<somerandomnumber>-11111111111) 11. Specify a data rate (records per second). Please choose a number less than 10.
  • 26. Activity 1A: Working with the KDG - We have chosen a number less than 10 because everyone is on the same WiFi and we want to be sure we don't use all the bandwidth. Additionally, if you are not plugged in, you may run into battery issues at a higher rate. 12. For the Record Template, use the Apache Combined Log Template found in the Student Resources section. The template should look like the following:
  • 27. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 1A: Working with the KDG 13. Click “Send Data to Kinesis”
  • 28. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Review: Monitoring Your Delivery Stream Go to the Amazon CloudWatch Metrics Console and search “IncomingRecords”. Select this metric for your Firehose delivery stream and choose a 1 Minute SUM. What are the most important metrics to monitor?
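If you prefer to check the metric programmatically, a minimal boto3 sketch (the delivery stream name is a placeholder) pulls the same 1-minute SUM of IncomingRecords:

```python
# Sketch: fetch the 1-minute SUM of IncomingRecords for a delivery stream.
# The delivery stream name is a placeholder for the lab's stream.
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Firehose",
    MetricName="IncomingRecords",
    Dimensions=[{"Name": "DeliveryStreamName",
                 "Value": "qls-example-FirehoseDeliveryStream"}],  # placeholder
    StartTime=datetime.utcnow() - timedelta(minutes=15),
    EndTime=datetime.utcnow(),
    Period=60,
    Statistics=["Sum"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])
```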
  • 29. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 2 Real-time data processing using Kinesis Analytics
  • 30. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Your Application Architecture Amazon Kinesis Producer UI Amazon Kinesis Firehose AmazonKinesis Analytics AmazonKinesis Firehose Amazon EMR Amazon Redshift Amazon QuickSight Generate web logs Collect web logs and deliver to S3 Process & compute aggregate web log metrics Deliver processed web logs to Redshift Raw web logs from Firehose Run SQL queries on processed web logs Visualize web logs to discover insights Amazon S3 Bucket Interactive analysis of web logs Amazon Athena Interactive querying of web logs AWS Glue Extract metadata, create tables and transform Web logs from CSV to Parquet Amazon S3 Bucket Transformed web logs from Glue
  • 31. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Kinesis Analytics • Powerful real time applications • Easy to use, fully managed • Automatic elasticity
  • 32. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Kinesis Analytics Applications Easily write SQL code to process streaming data Connect to streaming source Continuously deliver SQL results
  • 33. Process Data using Kinesis Analytics (Time: 20 minutes). We are going to: • Write a SQL query to compute an aggregate metric for an interesting statistic on the incoming data • Write a SQL query using an anomaly detection function
  • 34. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 2A: Start Amazon Kinesis Analytics App • Navigate to the Kinesis dashboard • Click on the Kinesis Analytics Application
  • 35. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 2A: Start Kinesis App Click on “Go to SQL editor”. In the next screen, click on “Yes, start application”
  • 36. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. View Sample Records in Kinesis App • Review sample records delivered to the source stream (SOURCE_SQL_STREAM_001)
  • 37. Kinesis App Metadata - Note that Amazon Kinesis adds metadata to each record being sent, as shown in the formatted record sample: • The ROWTIME represents the time the application read the record and is a special column used for time series analytics. This is also known as the processing time. • The APPROXIMATE_ARRIVAL_TIME is the time the delivery stream received the record. This is also known as the ingest time.
  • 38. Kinesis Analytics: How a streaming app works - Writing SQL over streaming data using Kinesis Analytics follows a two-part model: 1. Create an in-application stream for storing intermediate SQL results. An in-application stream is like a SQL table, but is continuously updated. 2. Create a PUMP, which will continuously read FROM one in-application stream and INSERT INTO a target in-application stream. (Diagram: the source stream SOURCE_SQL_STREAM_001 is read by OUTPUT_PUMP and AGGREGATE_PUMP, which feed DESTINATION_STREAM (Part 1) and AGGREGATE_STREAM (Part 2); results are sent on to Redshift.)
  • 39. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 2B: Calculate an aggregate metric Calculate a count using a tumbling window and a GROUP BY clause. A tumbling window is similar to a periodic report, where you specify your query and a time range, and results are emitted at the end of a range. (EX: COUNT number of items by key for 10 seconds)
  • 40. Activity 2B: Calculate an aggregate metric - Window types:
• Tumbling - fixed size and non-overlapping; use the FLOOR() or STEP() function (coming soon) in a GROUP BY statement
• Sliding - fixed size and overlapping; row boundaries are determined when new rows enter the window; use a standard OVER and WINDOW clause (i.e. count(col) OVER (RANGE INTERVAL '5' MIN))
• Custom - not fixed size and overlapping; row boundaries are determined by conditions; implementations vary, but typically require two steps (Step 1 - identify boundaries, Step 2 - perform computation)
  • 41. Activity 2B: Calculate an aggregate metric - The window is defined by the following statement in the SELECT statement: STEP(source_sql_stream_001.ROWTIME BY INTERVAL '10' SECOND). Note that the ROWTIME column is implicitly included in every stream query and represents the processing time of the application. This is known as a tumbling window. Tumbling windows are always included in a GROUP BY clause and use a STEP function. The STEP function takes an interval to produce the periodic reports. You can also use the SQL FLOOR() function to achieve the same thing.
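The lab's exact statements are in the Kinesis Analytics SQL file referenced on the next slide. Purely as a hedged illustration of the in-application stream, pump, and STEP pattern (the RESPONSE column is an assumption about the web-log schema), an aggregate query has roughly this shape:

```python
# Hedged illustration only -- use the SQL from Student Resources in the lab.
# The SQL is held in a Python string; you would paste it into the Kinesis
# Analytics SQL editor. RESPONSE is an assumed web-log field.
aggregate_sql = """
CREATE OR REPLACE STREAM "AGGREGATE_STREAM" ("RESPONSE" INTEGER, "RESPONSE_COUNT" INTEGER);

CREATE OR REPLACE PUMP "AGGREGATE_PUMP" AS
INSERT INTO "AGGREGATE_STREAM"
SELECT STREAM "RESPONSE", COUNT(*) AS "RESPONSE_COUNT"
FROM "SOURCE_SQL_STREAM_001"
GROUP BY "RESPONSE",
         STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '10' SECOND);
"""
print(aggregate_sql)
```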
  • 42. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 2B: Calculate an aggregate metric Create an aggregate_stream using SQL command found in the Kinesis Analytics SQL file located in the Student Resources section of your lab. Copy and paste the SQL in the SQL editor underneath the “DESTINATION_SQL_STREAM” DDL. Click “Save and run SQL”
  • 43. Activity 2C: Anomaly Detection • Kinesis Analytics includes some advanced algorithms that are extensions to the SQL language. These include approximate count distinct (HyperLogLog), approximate top-K (space saving), and anomaly detection (random cut forest). • The random cut forest algorithm detects anomalies in real time on multi-dimensional data sets. You pass the algorithm n numeric fields and it produces an anomaly score on your stream data. Higher scores are more anomalous. • The minimum anomaly score is 0 and the maximum is log2(s), where s is the subsample size parameter passed to random cut forest (the third parameter). You will have to try the algorithm on your data to get a feel for the anomaly score, as the score is data-dependent.
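As with the aggregate query, the lab's actual anomaly-detection SQL is in the Kinesis Analytics SQL file; a hedged sketch of the usual RANDOM_CUT_FOREST pattern (BYTES is an assumed numeric web-log field, and the output column name is assumed too) looks something like this:

```python
# Hedged sketch only -- the lab's real SQL is in Student Resources.
# Paste the SQL into the Kinesis Analytics SQL editor; BYTES is an assumed
# numeric field from the web logs.
anomaly_sql = """
CREATE OR REPLACE STREAM "ANOMALY_STREAM" ("BYTES" INTEGER, "ANOMALY_SCORE" DOUBLE);

CREATE OR REPLACE PUMP "ANOMALY_PUMP" AS
INSERT INTO "ANOMALY_STREAM"
SELECT STREAM "BYTES", "ANOMALY_SCORE"
FROM TABLE(RANDOM_CUT_FOREST(
    CURSOR(SELECT STREAM "BYTES" FROM "SOURCE_SQL_STREAM_001")
));
"""
print(anomaly_sql)
```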
  • 44. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 2C: Anomaly Detection Create an anomaly_stream using the SQL command found in the Kinesis Analytics SQL file located in the Student Resources section of your lab. Append the SQL in your SQL editor. Click “Save and run SQL”
  • 45. Review - In-Application SQL streams: Your application has multiple in-application SQL streams, including DESTINATION_SQL_STREAM and ANOMALY_STREAM. These in-application streams are like SQL tables that are continuously updated. What else is unique about an in-application stream aside from its continuous nature?
  • 46. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 3 Deliver streaming results to Amazon Redshift
  • 47. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Your Application Architecture Amazon Kinesis Producer UI Amazon Kinesis Firehose AmazonKinesis Analytics AmazonKinesis Firehose Amazon EMR Amazon Redshift Amazon QuickSight Generate web logs Collect web logs and deliver to S3 Process & compute aggregate web log metrics Deliver processed web logs to Redshift Raw web logs from Firehose Run SQL queries on processed web logs Visualize web logs to discover insights Amazon S3 Bucket Interactive analysis of web logs Amazon Athena Interactive querying of web logs AWS Glue Extract metadata, create tables and transform Web logs from CSV to Parquet Amazon S3 Bucket Transformed web logs from Glue
  • 48. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 3: Deliver data to Amazon Redshift using Kinesis Firehose Time: 5 minutes We are going to: A. Connect to Amazon Redshift cluster and create a table to hold web logs data B. Update Kinesis Analytics application to send data to Amazon Redshift, via the Firehose delivery stream.
  • 49. Activity 3A: Connect to Amazon Redshift - You can connect with pgweb: • Already installed and configured for the Redshift cluster • Just navigate to pgweb and start interacting (Note: from the qwikLABS console, open the pgweb link in a new window.) Or, use any JDBC/ODBC/libpq client: • Aginity Workbench for Amazon Redshift • SQL Workbench/J • DBeaver • DataGrip. If you use one of these SQL clients, the username/password is in your qwikLABS console (scroll to the bottom). The endpoint is there too, and the database is called logs.
  • 50. Activity 3B: Create table in Amazon Redshift - Create a table named weblogs to capture the incoming data from the Firehose delivery stream. Note: you can download the Amazon Redshift SQL code from the qwikLABS Student Resources section (click on Open Redshift SQL).
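A hedged sketch of doing the same from Python with psycopg2 (the endpoint and password come from the qwikLABS connection details, and the column list below is an assumption about the web-log schema; the authoritative DDL is in the Redshift SQL file):

```python
# Sketch only: connect to the lab's Redshift cluster and create a weblogs
# table. Host and password are placeholders from qwikLABS; columns are an
# assumed web-log schema -- use the "Open Redshift SQL" DDL in the lab.
import psycopg2

conn = psycopg2.connect(
    host="<redshift-endpoint-from-qwiklabs>",  # placeholder
    port=5439,
    dbname="logs",
    user="dbadmin",
    password="<password-from-qwiklabs>",       # placeholder
)

with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS weblogs (
            host          VARCHAR(50),
            request_time  VARCHAR(50),
            request       VARCHAR(1024),
            response      INTEGER,
            bytes         INTEGER,
            referrer      VARCHAR(1024),
            agent         VARCHAR(2048)
        );
    """)
```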
  • 51. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 3C: Deliver Data to Amazon Redshift using Firehose Update Kinesis Analytics application to send data to Firehose delivery stream. Firehose delivers the streaming data to Amazon Redshift. 1. Go to the Kinesis Analytics console 2. Choose the Amazon Redshift delivery stream as destination and click on the edit button (see the pencil icon in the figure below)
  • 52. Activity 3C: Deliver Data to Amazon Redshift using Firehose - 4. Validate your destination: a. Choose the "qls-xxxxxxx-RedshiftDeliveryStream-xxxxxxxx" Firehose delivery stream b. Keep the default for "Connect in-application stream" c. Choose CSV as the "Output format" d. Select "Choose from IAM roles that Kinesis Analytics can assume" e. Click "Save and continue" 5. It will take about 1-2 minutes for everything to be updated and for data to start appearing in Amazon Redshift.
  • 53. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 3C: Deliver Data to Amazon Redshift using Firehose
  • 54. Review: Amazon Redshift Test Queries - Find the distribution of response codes over days (copy the SQL from the Redshift SQL file). Count the number of 404 response codes.
  • 55. Review: Amazon Redshift Test Queries - Show all request paths with status "PAGE NOT FOUND".
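The lab's queries live in the Redshift SQL file; as a hedged illustration only (column names are assumptions about the weblogs table), the review queries are in the spirit of the following, which you can paste into pgweb:

```python
# Hedged examples in the spirit of the review queries; the lab's exact SQL is
# in the Redshift SQL file. Column names (response, request) are assumptions.
count_404 = "SELECT COUNT(*) FROM weblogs WHERE response = 404;"

not_found_paths = """
SELECT TOP 20 request, COUNT(*) AS hits
FROM weblogs
WHERE response = 404
GROUP BY request
ORDER BY hits DESC;
"""

print(count_404)
print(not_found_paths)
```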
  • 56. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Extract, Transform and Load (ETL) with AWS Glue
  • 57. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Fully managed ETL (extract, transform, and load) service AWS Glue • Categorize your data, clean it, enrich it and move it reliably between various data stores • Once catalogued, your data is immediately searchable and query-able across your data silos • Simple, flexible and cost-effective • Serverless; runs on a fully managed, scale-out Spark environment
  • 58. AWS Glue Components
• Data Catalog - discover and organize your data in various databases, data warehouses and data lakes
• Job Authoring - focus on writing transformations; generate code through a wizard or write your own code
• Job Execution - runs jobs in Spark containers with automatic scaling based on SLA; Glue is serverless, so you only pay for the resources you consume
  • 59. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Glue-How it works
  • 60. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 4: Transform Web logs to Parquet using AWS Glue
  • 61. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Your Application Architecture Amazon Kinesis Producer UI Amazon Kinesis Firehose AmazonKinesis Analytics AmazonKinesis Firehose Amazon EMR Amazon Redshift Amazon QuickSight Generate web logs Collect web logs and deliver to S3 Process & compute aggregate web log metrics Deliver processed web logs to Redshift Raw web logs from Firehose Run SQL queries on processed web logs Visualize web logs to discover insights Amazon S3 Bucket Interactive analysis of web logs Amazon Athena Interactive querying of web logs AWS Glue Extract metadata, create tables and transform Web logs from CSV to Parquet Amazon S3 Bucket Transformed web logs from Glue
  • 62. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity: Catalog and perform ETL on Weblogs Time: 40 minutes We are going to: A. Discover and catalog the web log data deposited into the S3 bucket using AWS Glue Crawler B. Transform Web logs to Parquet format with AWS Glue ETL Job Authoring tool
  • 63. Activity 4A: Discover dataset with AWS Glue - We use AWS Glue's crawler to extract data and metadata. From the AWS Management Console, select the AWS Glue service and click on "Get Started" on the next screen.
  • 64. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 4A: Add crawler using AWS Glue Select Crawlers section on the left and click on Add crawler
  • 65. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 4A: ETL with AWS Glue Specify a name for the crawler. Click on folder icon to choose the data store on S3. Click Next
  • 66. Activity 4A: ETL with AWS Glue - Provide the S3 path location where the web logs are deposited (navigate to the S3 path that contains the word "logs" and select the "raw" folder). Click Select.
  • 67. Activity 4A: ETL with AWS Glue - Click Next on the next screen to skip adding another data store.
  • 68. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 4A: ETL with AWS Glue In the IAM Role section, select “Create an IAM role”, add “default” to the IAM role and click “Next”
  • 69. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 4A: Add crawler in AWS Glue Choose Run on demand, to run the crawler now and click Next
  • 70. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 4A: Add crawler with AWS Glue On the next screen, click on Add database to add a database (use weblogdb for the database name)
  • 71. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 4A: Add crawler with AWS Glue Review and click Finish to create a crawler
  • 72. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 4A: Add crawler with AWS Glue Click on Run it now link to run the crawler
  • 73. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 4A: Add crawler with AWS Glue Crawler shows a Ready status when it is finished running.
  • 74. Activity 4A: Table creation in AWS Glue - Observe that the crawler has created a table for your dataset. The crawler automatically classified the dataset as the combined Apache log format. Click the table to take a look at its properties.
  • 75. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 4A: Table creation in AWS Glue Glue used the GrokSerDe (Serializer/Deserializer) to correctly interpret the web logs You can click on the View partitions link to look at the partitions in the dataset
  • 76. Activity 4A: Create ETL Job in AWS Glue - With the dataset cataloged and the table created, we are now ready to convert the web logs from the Apache combined log format to a more query-friendly Parquet format. Click on Add job to begin creating the ETL job.
  • 77. Activity 4B: ETL Job in AWS Glue • In the Job properties window, specify the Job Name • In this example, we will create a new ETL script • Glue automatically chooses the script file name and the path where the script will be persisted • For the "Temporary Directory", specify an S3 temporary directory in your lab account (use the s3://<stack>-logs-<account>-us-west-2/ bucket and append a folder named temp) • The path should look like: s3://<stack>-logs-<account>-us-west-2/temp/
  • 78. Activity 4B: ETL Job in AWS Glue • DO NOT CLICK NEXT YET • Copy the temporary path to a text editor, modify the path as follows and save it in a file (we will need this path for storing the Parquet files): s3://<stack>-logs-<account>-us-west-2/weblogs/processed/parquet For example: s3://qls-108881-f75177905b7b5b0e-logs-XXXXXXXX3638-us-west-2/weblogs/processed/parquet
  • 79. Activity 4B: ETL Job in AWS Glue • Expand the "Script libraries and job parameters" section, and increase the DPUs to 20 • Let's pass a job parameter to send the S3 path where the Parquet files will be deposited. Specify the following values for Key and Value • Key: --parquet_path (notice the 2 hyphens at the beginning) • Value: s3://<stack>-logs-<account>-us-west-2/weblogs/processed/parquet/ • Note: the Value is the S3 path we saved in the previous step
  • 80. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 4B: ETL Job in AWS Glue Click Next in the following screen and click Finish to complete the job creation
  • 81. Activity 4B: ETL Job in Glue • Close Script Editor tips window (if it appears) • In the Glue Script Editor, copy the ETL code by clicking on the “Open Glue ETL Code” link in Student Resources • Ensure that the database name (db_name) and table name reflect the database and table name created by the Glue Crawler
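For orientation, here is a hedged sketch of what a Glue job of this shape typically looks like in PySpark; the lab's actual script is the one behind "Open Glue ETL Code", and the database and table names below are assumptions that must match what your crawler created.

```python
# Hedged sketch, not the lab's script. Reads the crawled web-log table from
# the Glue Data Catalog and writes it back to S3 as Parquet at --parquet_path.
# Database/table names are assumptions; match them to your crawler's output.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME", "parquet_path"])

glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source: the table the crawler registered (names assumed).
weblogs = glue_context.create_dynamic_frame.from_catalog(
    database="weblogdb", table_name="raw"
)

# Sink: the same records written as Parquet for Athena and EMR to query.
glue_context.write_dynamic_frame.from_options(
    frame=weblogs,
    connection_type="s3",
    connection_options={"path": args["parquet_path"]},
    format="parquet",
)

job.commit()
```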
  • 82. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 4B: ETL Job in AWS Glue Click Save and then Run Job button to execute your ETL. Click on “Save and run job” in the next window.
  • 83. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 4B: ETL Job in AWS Glue Run job to continue. This might take a few minutes. When the job finishes, weblogs will be transformed to parquet format
  • 84. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Interactive Querying with Amazon Athena
  • 85. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Interactive Query Service • Query directly from Amazon S3 • Use ANSI SQL • Serverless • Multiple Data Formats • Cost Effective Amazon Athena
  • 86. Familiar Technologies Under the Covers
• Used for SQL queries: an in-memory distributed query engine, ANSI-SQL compatible with extensions
• Used for DDL functionality: complex data types, a multitude of formats, support for data partitioning
  • 87. Comparing performance and cost savings for compression and columnar format
• Data stored as text files: size on Amazon S3 1 TB, query run time 236 seconds, data scanned 1.15 TB, cost $5.75
• Data stored in Apache Parquet format*: size on Amazon S3 130 GB, query run time 6.78 seconds, data scanned 2.51 GB, cost $0.013
• Savings / speedup: 87% less storage with Parquet, 34x faster, 99% less data scanned, 99.7% cost savings
(*compressed using Snappy compression) https://aws.amazon.com/blogs/big-data/analyzing-data-in-s3-using-amazon-athena/
  • 88. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 5 Interactive Querying with Amazon Athena
  • 89. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Your Application Architecture Amazon Kinesis Producer UI Amazon Kinesis Firehose AmazonKinesis Analytics AmazonKinesis Firehose Amazon EMR Amazon Redshift Amazon QuickSight Generate web logs Collect web logs and deliver to S3 Process & compute aggregate web log metrics Deliver processed web logs to Amazon Redshift Raw web logs from Firehose Run SQL queries on processed web logs Visualize web logs to discover insights Amazon S3 Bucket Interactive analysis of web logs Amazon Athena Interactive querying of web logs AWS Glue Extract metadata, create tables and transform Web logs from CSV to Parquet Amazon S3 Bucket Transformed web logs from Glue
  • 90. Activity: Interactive querying with Amazon Athena (Time: 15 minutes). We are going to: A. Create a table over the processed web logs in S3; these are the Parquet files created by the AWS Glue ETL job in the previous section B. Run interactive queries on the Parquet-formatted web logs
  • 91. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 5A: Setup Amazon Athena 1. From the AWS Management Console, search for Athena and click on the service
  • 92. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 5A: Setup Amazon Athena 2. Select Amazon Athena from the Analytics section and click on Get Started on the next page
  • 93. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 5A: Setup Amazon Athena 3. Dismiss the window for running the Athena tutorial. 4. Dismiss any other tutorial window
  • 94. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 5A: Setup Amazon Athena 5. We are now ready to create a table in Athena. But before we do that, we need to get the S3 bucket location where AWS Glue job delivered parquet files. Click on Services, then choose S3 from the Storage Section.
  • 95. Activity 5A: Set Up Amazon Athena - 6. Locate the bucket that looks like 'qls-<stackname>-logs-#####-us-west-2'. Navigate to the parquet folder. Copy the name of the bucket into a text editor.
  • 96. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 5A: Setup Amazon Athena 7. Go back to the Athena Console 8. Let’s create a table in Athena on the Parquet dataset created by AWS Glue. 9. In the Athena console, choose the ‘weblogdb’ on the database dropdown 10. Enter the SQL command (found by clicking on the “Open Athena SQL” link in Student Resources section of qwiklabs) to create a table. 11. Make sure to replace the <your-parquet-path> with the bucket location for the parquet files you copied in the previous step. (Screen shot in next slide)
  • 97. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 5B: Working with Amazon Athena
  • 98. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 5B: Working with Amazon Athena 8. The SQL DDL in the previous step creates a table in Athena based on the parquet files we created with the Glue ETL Job 9. Select weblogdb from the database section and click on the three stacked dots icon to sample a few rows of the S3 data
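The exact DDL is in the Athena SQL file from Student Resources; as a hedged illustration (the table and column names are assumptions), it is typically of this form, with LOCATION pointing at the Parquet path you copied:

```python
# Hedged illustration of the kind of DDL the "Open Athena SQL" file contains.
# Table and column names are assumptions; replace <your-parquet-path> with the
# S3 location of the Parquet files written by the Glue job.
create_table_ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS weblogdb.weblogs_parquet (
    host          STRING,
    request_time  STRING,
    request       STRING,
    response      INT,
    bytes         INT,
    agent         STRING
)
STORED AS PARQUET
LOCATION 's3://<your-parquet-path>/';
"""
print(create_table_ddl)
```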
  • 99. Activity 5C: Interactive Querying with Amazon Athena • Run interactive queries (copy SQL queries from “Athena SQL” in Student Resources) and see the results on the console
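You can also submit the same kind of query through the Athena API; a minimal boto3 sketch (the query, table name, and results bucket are placeholders) looks like this:

```python
# Optional sketch: run an interactive query via the Athena API instead of the
# console. Table name and output bucket are placeholders/assumptions.
import boto3

athena = boto3.client("athena", region_name="us-west-2")

response = athena.start_query_execution(
    QueryString=("SELECT response, COUNT(*) AS hits "
                 "FROM weblogs_parquet GROUP BY response ORDER BY hits DESC"),
    QueryExecutionContext={"Database": "weblogdb"},
    ResultConfiguration={
        "OutputLocation": "s3://aws-athena-query-results-ACCOUNTID-us-west-2/"
    },
)
print(response["QueryExecutionId"])
```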
  • 100. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 5C: Interactive Querying with Athena 2. Optionally, you can save the results of a query to CSV by choosing the file icon on the Results pane. 3. You can also view the results of previous queries or queries that may take some time to complete. Choose History then either search for your query or choose View or Download to view or download the results of previous completed queries. This also displays the status of queries that are currently running.
  • 101. Review: Amazon Athena Interactive Queries - Query results are also stored in Amazon S3 in a bucket called aws-athena-query-results-ACCOUNTID-REGION. Where can you change the default location in the console?
  • 102. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data processing with Amazon EMR
  • 103. Amazon EMR release
• Storage: Amazon S3 (EMRFS), HDFS
• Cluster resource management: YARN
• Processing: MapReduce (batch), Tez (interactive), Spark (in memory), Flink (streaming)
• Applications: Hive, Pig, Spark SQL/Streaming/ML, Mahout, Sqoop, HBase/Phoenix, Presto
• Interfaces: Hue (SQL interface/metastore management), Zeppelin (interactive notebook), Ganglia (monitoring), HiveServer2/Spark Thriftserver (JDBC/ODBC)
• All managed by the Amazon EMR service
  • 104. On-cluster UIs - Manage applications, notebooks, a SQL editor, a workflow designer and a metastore browser; design and execute queries and workloads; and more using bootstrap actions!
  • 105. The Hadoop ecosystem can run in Amazon EMR
  • 106. Easy to use Spot Instances
• On-demand for core nodes: standard Amazon EC2 pricing for on-demand capacity; meet SLA at predictable cost
• Spot Instances for task nodes: up to 90% off Amazon EC2 on-demand pricing; exceed SLA at lower cost
  • 107. Amazon S3 as your persistent data store
• Separate compute and storage
• Resize and shut down Amazon EMR clusters with no data loss
• Point multiple Amazon EMR clusters at the same data in Amazon S3
  • 108. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. EMRFS makes it easier to leverage S3 Better performance and error handling options Transparent to applications – Use “s3://” Consistent view • For consistent list and read-after-write for new puts Support for Amazon S3 server-side and client-side encryption Faster listing using EMRFS metadata
  • 109. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Apache Spark • Fast, general-purpose engine for large- scale data processing • Write applications quickly in Java, Scala, or Python • Combine SQL, streaming, and complex analytics
  • 110. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Apache Zeppelin • Web-based notebook for interactive analytics • Multiple language back end • Apache Spark integration • Data visualization • Collaboration https://zeppelin.apache.org/
  • 111. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 6 Interactive analysis using Amazon EMR
  • 112. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Your Application Architecture Amazon Kinesis Producer UI Amazon Kinesis Firehose AmazonKinesis Analytics AmazonKinesis Firehose Amazon EMR Amazon Redshift Amazon QuickSight Generate web logs Collect web logs and deliver to S3 Process & compute aggregate web log metrics Deliver processed web logs to Amazon Redshift Raw web logs from Firehose Run SQL queries on processed web logs Visualize web logs to discover insights Amazon S3 Bucket Interactive analysis of web logs Amazon Athena Interactive querying of web logs AWS Glue Extract metadata, create tables and transform Web logs from CSV to Parquet Amazon S3 Bucket Transformed web logs from Glue
  • 113. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 6: Process and Query data with Amazon EMR Time: 20 minutes We are going to: A. Use a Zeppelin Notebook to interact with Amazon EMR Cluster B. Process the data in Amazon S3 using Apache Spark C. Query the data processed in the earlier stage and create simple charts
  • 114. Activity 6A: Open the Zeppelin interface - 1. Copy the Zeppelin endpoint from the Student Resources section in qwikLABS 2. Click on the "Open Zeppelin Notebook" link in the Student Resources section to open the Zeppelin link in a new window 3. Download the file (or copy and save it to a file with a .json extension) 4. Import the notebook using the Import Note link on the Zeppelin interface
  • 115. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 6A: Open the Zeppelin interface • Use s3://<stack>-logs-<account>-us-west-2/processed/parquet bucket where the processed parquet files were deposited. • Run the paragraph
  • 116. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 6B: Run the notebook Enter the S3 bucket name where the parquet files are stored. The bucket name begins with <stack>-*-logs-#####-region Execute Step 1 • Enter bucket name (<stack>-*-logs-##########-us-west-2) Execute Step 2 • Create a Dataframe with the parquet files from the Glue ETL job Execute Step 3 • Sample a few rows
  • 117. Activity 6B: Run the notebook - Execute Step 4 to process the data • Notice how the 'AGENT' field contains the 'BROWSER' at the beginning of the column value. Let's extract the browser from it. • Create a UDF that will extract the browser part and add it to the DataFrame • Print the new DataFrame
  • 118. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 6B: Run the notebook Execute Step 6 • Register the data frame as a temporary table • Now you can run SQL queries on the temporary tables. Execute the next 3 steps and observe the charts created • What did you learn about the dataset?
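A hedged PySpark sketch of the flow those notebook steps implement (the bucket path and column names are assumptions; in the lab, follow the imported notebook's own paragraphs):

```python
# Hedged sketch of the notebook's flow: read Parquet web logs, derive a
# BROWSER column from AGENT with a UDF, register a temp table and query it.
# The S3 path and column names are assumptions; use the notebook in the lab.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("weblogs-analysis").getOrCreate()

logs = spark.read.parquet(
    "s3://<stack>-logs-<account>-us-west-2/weblogs/processed/parquet/"  # placeholder
)

# Rough heuristic: take the token before the first "/" or space in AGENT.
extract_browser = udf(
    lambda agent: (agent or "").split("/")[0].split(" ")[0], StringType()
)
logs = logs.withColumn("BROWSER", extract_browser(logs["AGENT"]))

logs.createOrReplaceTempView("weblogs")
spark.sql(
    "SELECT BROWSER, COUNT(*) AS hits FROM weblogs "
    "GROUP BY BROWSER ORDER BY hits DESC"
).show()
```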
  • 119. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Review : Interactive analysis using Amazon EMR You just learned on how to process and query data using Amazon EMR with Apache Spark Amazon EMR has many other frameworks available for you to use • Hive, Presto, Flink, Pig, MapReduce • Hue, Oozie, HBase
  • 120. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Optional Exercise: Data Visualization with Amazon QuickSight
  • 121. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon QuickSight Fast, Easy Interactive Analytics for Anyone, Everywhere Ease of use targeted at business users. Blazing fast performance powered by SPICE. Broad connectivity with AWS data services, on-premises data, files and business applications. Cloud-native solution that scales automatically. 1/10th the cost of traditional BI solutions. Create, share and collaborate with anyone in your organization, on the web or on mobile.
  • 122. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Connect, SPICE, Analyze Amazon QuickSight allows you to connect to data from a wide variety of AWS, third-party, and on-premises sources and import it into SPICE or query it directly. Users can then easily explore, analyze, and share their insights with anyone. Amazon RDS Amazon S3 Amazon Redshift
  • 123. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 7 Visualize results in Amazon QuickSight
  • 124. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Your Application Architecture Amazon Kinesis Producer UI Amazon Kinesis Firehose Amazon Kinesis Analytics Amazon Kinesis Firehose Amazon EMR Amazon Redshift Amazon QuickSight Generate web logs Collect web logs and deliver to S3 Process & compute aggregate web log metrics Deliver processed web logs to Amazon Redshift Raw web logs from Firehose Run SQL queries on processed web logs Visualize web logs to discover insights Amazon S3 Bucket Interactive analysis of web logs Amazon Athena Interactive querying of web logs AWS Glue Extract metadata, create tables and transform web logs from CSV to Parquet Amazon S3 Bucket Transformed web logs from Glue
  • 125. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 7: Visualization with Amazon QuickSight We are going to: A. Register for an Amazon QuickSight account B. Connect to the Amazon Redshift cluster C. Create visualizations to answer questions such as: • What are the most common HTTP requests, and how successful (response code of 200) are they? • Which URIs are requested most often?
  • 126. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 7A: Amazon QuickSight Registration • Go to the AWS Console and click QuickSight in the Analytics section • Click Sign up in the next window • Make sure the subscription type is Standard and click Continue on the next screen
  • 127. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 7A: Amazon QuickSight Registration • On the Subscription Type page, enter the account name (see the note below) • Enter your email address • Select the US West region • Check the S3 (all buckets) box • Note: the Amazon QuickSight account name is the AWS account number shown on the qwikLABS console
  • 128. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 7A: Amazon QuickSight Registration • If a pop-up box to choose S3 buckets appears, click Select buckets • Click Go to Amazon QuickSight • Dismiss the welcome screen
  • 129. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 7B: Connect to the data source • Click Manage Data and then select New Data Set to create a new data set in Amazon QuickSight • Choose Redshift (Auto-discovered) as the data source. Amazon QuickSight auto-discovers databases associated with your AWS account (the Amazon Redshift database in this case)
  • 130. Activity 7B: Connect to Amazon Redshift Note: Use "dbadmin" as the username. You can get the Amazon Redshift database password from qwikLABS by navigating to the "Connection details" section (see below)
  • 131. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 7C: Choose your weblogs table in Amazon Redshift
  • 132. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 7D: Ingest data into SPICE SPICE is Amazon QuickSight's in-memory optimized calculation engine, designed specifically for fast, interactive data visualization. You can improve the performance of database data sets by importing the data into SPICE instead of using a direct query to the database.
  • 133. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Activity 7E: Creating your first analysis What are the most requested HTTP request types and their corresponding response codes for this site? Simply select request and response and let AutoGraph create the optimal visualization
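For reference, a rough sketch of the SQL behind this visual, run directly against the Amazon Redshift cluster with psycopg2 instead of through QuickSight. The host, database, password, table, and column names are assumptions -- take the real values from the qwikLABS connection details and from the weblogs table loaded by the Firehose delivery stream:

    import psycopg2

    conn = psycopg2.connect(
        host="<redshift-endpoint>",   # from the qwikLABS "Connection details" section
        port=5439,
        dbname="<database>",
        user="dbadmin",
        password="<password>",
    )
    with conn, conn.cursor() as cur:
        # Count requests per HTTP request type and response code (assumed columns).
        cur.execute("""
            SELECT request, response, COUNT(*) AS hits
            FROM weblogs
            GROUP BY request, response
            ORDER BY hits DESC
            LIMIT 20;
        """)
        for row in cur.fetchall():
            print(row)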
  • 134. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Review – Creating your Analysis Exercise: Add a visual to show which URIs are requested most often.
  • 135. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Please don't forget to fill out your evaluations. THANK YOU!