Harnessing Artificial Intelligence in your Applications - Level 300

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Adam Larter
Principal Solutions Architect, Developer
Specialist, Amazon Web Services
Alastair Cousins
Senior Solutions Architect, Amazon Web Services
Harnessing Artificial Intelligence
in Your Applications
Amazon Rekognition, Amazon Polly, and Amazon Lex
Level 300

Intelligent Multimodal Interfaces

What is Amazon Polly?
• A service that converts text into lifelike speech
• Offers 47 lifelike voices across 24 languages
• Low latency responses enable developers to build
real-time systems
• Developers can store, replay, and distribute
generated speech

Amazon Polly: Quality
Natural-sounding speech
A subjective measure of how close TTS output is to human speech.
Accurate text processing
Ability of the system to interpret common text formats such
as abbreviations, numerical sequences, homographs etc.
Today in Sydney Australia, it's 26°C
It’s nice to know, we’re going to Nice
Highly intelligible
A measure of how comprehensible speech is.
Peter Piper picked a peck of pickled peppers

Amazon Polly: SSML
Speech Synthesis Markup Language
A W3C recommendation: an XML-based markup language for speech
synthesis applications
<speak>
My name is Adam Larter. It is spelled
<prosody rate='x-slow'>
<say-as interpret-as="characters">Larter</say-as>
</prosody>
</speak>
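A minimal Python sketch of generating the SSML shown above and passing it to Amazon Polly. The helper names `spell_out_ssml` and `synthesize` are hypothetical, the `Russell` voice is an assumed choice, and `polly_client` is assumed to be a boto3 Polly client created by the caller (for example with `boto3.client("polly")`).

```python
from xml.sax.saxutils import escape


def spell_out_ssml(sentence: str, word: str, rate: str = "x-slow") -> str:
    """Build SSML that reads `sentence`, then spells out `word` slowly."""
    return (
        "<speak>"
        f"{escape(sentence)} "
        f'<prosody rate="{rate}">'
        f'<say-as interpret-as="characters">{escape(word)}</say-as>'
        "</prosody>"
        "</speak>"
    )


def synthesize(polly_client, ssml: str, voice_id: str = "Russell") -> bytes:
    """Send SSML to Polly and return the MP3 bytes (voice_id is an assumption)."""
    response = polly_client.synthesize_speech(
        Text=ssml,
        TextType="ssml",      # tell Polly the text is SSML, not plain text
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    return response["AudioStream"].read()
```

Escaping the text before embedding it keeps the markup well-formed when the input contains characters such as `&` or `<`.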

Example Use Case
Adding speech synthesis to any app

Polly Voice Synthesis Demo
Amazon Polly
Amazon API
Gateway
Lambda
function
Amazon
S3
Mobile App
IoT Device
Calling through API Gateway
allows us to implement caching
and use throttling and API
Keys via Usage Plans
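One way the Lambda function in this diagram could work, sketched in Python. It assumes that Amazon S3 is used to keep generated audio so that repeated requests for the same text do not call Polly again; the slide does not spell this out. The function names, the key layout, and the `Russell` voice are hypothetical, and `polly` and `s3` are assumed to be boto3 clients passed in by the caller.

```python
import hashlib


def audio_key(text: str, voice_id: str) -> str:
    """Derive a deterministic S3 key from the voice and text (hypothetical layout)."""
    digest = hashlib.sha256(f"{voice_id}:{text}".encode("utf-8")).hexdigest()
    return f"speech/{voice_id}/{digest}.mp3"


def get_or_create_speech(polly, s3, bucket: str, text: str,
                         voice_id: str = "Russell") -> str:
    """Return the S3 key of the audio for `text`, synthesizing it only if missing."""
    key = audio_key(text, voice_id)
    listing = s3.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    if listing.get("KeyCount", 0) == 0:
        # Not stored yet: synthesize once and keep the result for later requests.
        audio = polly.synthesize_speech(
            Text=text, OutputFormat="mp3", VoiceId=voice_id
        )["AudioStream"].read()
        s3.put_object(Bucket=bucket, Key=key, Body=audio,
                      ContentType="audio/mpeg")
    return key
```

This stored-audio lookup is separate from the API Gateway response cache mentioned above; the two can be combined.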

Images – Another Untapped Interface

Amazon Rekognition
Deep learning-based image recognition service
Search, verify, and organise millions of images
Object and Scene
Detection
Facial
Analysis
Face
Comparison
Facial
Recognition

Detecting Faces in a Crowd
IoT
Camera
Amazon
Rekognition
Lambda
function
Amazon API
Gateway
DetectFaces()
Image
with
Faces
"Emotions": [
{"Confidence": 99.1335220336914,
"Type": "HAPPY" },
{"Confidence": 3.3275485038757324,
"Type": "CALM"},
{"Confidence": 0.31517744064331055,
"Type": "SAD"}
],
"Eyeglasses": {"Confidence": 99.8050537109375,
"Value": false},
"EyesOpen": {"Confidence": 99.99979400634766,
"Value": true},

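A short Python sketch of reading the emotion data from one entry of a DetectFaces() response like the fragment on this slide. The helper name `dominant_emotion` and the 50% cut-off are assumptions. With boto3, the emotion attributes are only returned when the request asks for all attributes (`Attributes=["ALL"]`).

```python
def dominant_emotion(face_detail: dict, min_confidence: float = 50.0):
    """Return the highest-confidence emotion type, or None if none is convincing."""
    emotions = face_detail.get("Emotions", [])
    if not emotions:
        return None
    best = max(emotions, key=lambda emotion: emotion["Confidence"])
    # min_confidence is an assumed threshold; tune it for the application.
    return best["Type"] if best["Confidence"] >= min_confidence else None


face = {
    "Emotions": [
        {"Confidence": 99.1335220336914, "Type": "HAPPY"},
        {"Confidence": 3.3275485038757324, "Type": "CALM"},
        {"Confidence": 0.31517744064331055, "Type": "SAD"},
    ]
}
print(dominant_emotion(face))  # HAPPY
```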
Understanding Bounding Boxes
Turn Ratios into X/Y
co-ordinates:
multiply by the image
width/height
"BoundingBox": {
"Height": 0.3449999988079071,
"Left": 0.09666666388511658,
"Top": 0.27166667580604553,
"Width": 0.23000000417232513
},
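The conversion can be sketched in Python as follows. The function name `to_pixels` and the 600 x 600 image size are assumptions for illustration; the ratios are the ones from the slide.

```python
def to_pixels(box: dict, image_width: int, image_height: int):
    """Convert Rekognition ratio values to (left, top, width, height) in pixels."""
    left = round(box["Left"] * image_width)
    top = round(box["Top"] * image_height)
    width = round(box["Width"] * image_width)
    height = round(box["Height"] * image_height)
    return left, top, width, height


box = {
    "Height": 0.3449999988079071,
    "Left": 0.09666666388511658,
    "Top": 0.27166667580604553,
    "Width": 0.23000000417232513,
}
print(to_pixels(box, 600, 600))  # (58, 163, 138, 207)
```

Left and Width are scaled by the image width, Top and Height by the image height.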

Tip: Capture Additional Context
Introduce a coefficient to
capture additional image
context by inflating
the bounding box
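A possible way to apply such a coefficient, sketched in Python. The function name `inflate_box` and the default coefficient of 0.25 are assumptions; the box is grown around its centre and clamped so that it stays inside the image.

```python
def inflate_box(box: dict, coefficient: float = 0.25) -> dict:
    """Grow a ratio-based bounding box by `coefficient`, keeping it centred."""
    extra_w = box["Width"] * coefficient
    extra_h = box["Height"] * coefficient
    # Expand equally on both sides, then clamp to the image (ratios 0.0 to 1.0).
    left = max(0.0, box["Left"] - extra_w / 2)
    top = max(0.0, box["Top"] - extra_h / 2)
    right = min(1.0, box["Left"] + box["Width"] + extra_w / 2)
    bottom = min(1.0, box["Top"] + box["Height"] + extra_h / 2)
    return {"Left": left, "Top": top,
            "Width": right - left, "Height": bottom - top}
```

A coefficient of 0.5, for example, makes a 0.2-wide box 0.3 wide unless the image edge gets in the way.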

Scaling to Many Faces
Amazon
Rekognition
Lambda
function
Amazon
Elasticsearch
Amazon
SNS
Lambda
function
Amazon
S3
User’s Face
Image
Fan Out of Lambda Functions via SNS.
1 Notification per Face detected
Metadata from DetectFaces() +
S3 Object Ref to Face Image
Metadata +
Location +
Timestamp
User’s Face
Image
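The fan-out step could look roughly like this in Python: one SNS message per detected face, each carrying the DetectFaces() metadata and a reference to the face image in S3. The function names and the message field names (`faceIndex`, `metadata`, `s3Object`) are hypothetical, and `sns` is assumed to be a boto3 SNS client supplied by the caller.

```python
import json


def build_face_messages(face_details: list, bucket: str, image_key: str) -> list:
    """Build one JSON message per face found by DetectFaces()."""
    messages = []
    for index, face in enumerate(face_details):
        messages.append(json.dumps({
            "faceIndex": index,                      # hypothetical field names
            "metadata": face,                        # DetectFaces() output for this face
            "s3Object": {"bucket": bucket, "key": image_key},
        }))
    return messages


def fan_out(sns, topic_arn: str, messages: list) -> int:
    """Publish each message separately so one Lambda invocation handles one face."""
    for message in messages:
        sns.publish(TopicArn=topic_arn, Message=message)
    return len(messages)
```

Each subscribed Lambda function then only has to deal with a single face, which keeps the per-invocation work small.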

Example Use Case
Authentication using face image

Sign In Using Face
• Cognito User Pools (CUP) as System of Record for users
• Create a Developer-Authenticated Identity Provider (IdP)
to perform AuthN using Amazon Rekognition
• Federate CUP and Developer IdP through
Cognito Identity Federation
• CUP user names are unique – make use of the
ExternalImageId parameter (the External Id referred to here) in indexFaces()

CUP and Developer
Authenticated Identities
will be linked after this call
Linking Identities in Cognito Federation

Amazon
Cognito
User Pool
Username and password sent to
Cognito User Pools Identity Provider
Link Face to Cognito User
Mobile
App

Amazon
Cognito
User Pool
Cognito Identity Token returned
Mobile
App

Amazon Cognito
Identity Pool
Cognito Identity Token
Mobile
App
User’s Face
Image
Amazon API
Gateway
Lambda
function
User’s Face
Image
+
Cognito User Pool username
stored in the Rekognition
collection as the ExternalId for
the user’s face vector
Amazon
Rekognition
username
as ExternalId
Store in
Collection
Identities linked by call to
getOpenIdTokenForDeveloperIdentity()

Amazon Cognito
Identity Pool
Mobile
App
User’s Face
Image
Amazon API
Gateway
Lambda
function
User’s Face
Image
ExternalId used as the unique user identifier in call to
CognitoIdentity::getOpenIdTokenForDeveloperIdentity
Amazon
Rekognition
Sign In Using Face
FaceId +
ExternalId
AccessKeyId / SecretAccessKey / SessionToken

Sign In Using Face – Implementation
Linking face to Cognito User:
• Sign in first using Cognito User Pools via Cognito SDK
• Take user’s picture & send image with JWT
• Rekognition::indexFaces()
to store user’s face vector in collection and use
Cognito User Pools username as the External Id
• CognitoIdentity::getOpenIdTokenForDeveloperIdentity
to create a Cognito Token and link the identities together
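The server-side part of the linking steps above might be sketched like this in Python. It assumes boto3 clients for Rekognition and Cognito Identity are passed in, and that the External Id is sent in the `ExternalImageId` parameter of `index_faces`. The function name, the argument names, and the idea of putting both the User Pool JWT and the developer login into the `Logins` map to link them are assumptions about how the demo was built.

```python
def link_face_to_user(rekognition, cognito_identity, *, collection_id: str,
                      identity_pool_id: str, developer_provider: str,
                      user_pool_provider: str, id_token: str,
                      username: str, image_bytes: bytes) -> dict:
    """Index the user's face and link it to their Cognito identity."""
    indexed = rekognition.index_faces(
        CollectionId=collection_id,
        Image={"Bytes": image_bytes},
        ExternalImageId=username,   # Cognito User Pools username as External Id
    )
    if not indexed["FaceRecords"]:
        raise ValueError("No face was found in the supplied image")

    token = cognito_identity.get_open_id_token_for_developer_identity(
        IdentityPoolId=identity_pool_id,
        Logins={
            user_pool_provider: id_token,   # existing User Pool login (JWT)
            developer_provider: username,   # developer-authenticated login
        },
    )
    return {
        "FaceId": indexed["FaceRecords"][0]["Face"]["FaceId"],
        "IdentityId": token["IdentityId"],
        "Token": token["Token"],
    }
```

`user_pool_provider` is expected to have the form `cognito-idp.<region>.amazonaws.com/<user-pool-id>`, and `developer_provider` is the developer provider name configured on the identity pool.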

Sign In Using Face – Implementation
Sign in using face:
• Rekognition::searchFacesByImage()
to get External Id
• CognitoIdentity::getOpenIdTokenForDeveloperIdentity()
with retrieved External Id to generate the Token and Identity
Id the client app needs
• Client app then follows standard Cognito process using
CognitoCachingCredentialsProvider()
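A rough Python sketch of the server-side sign-in steps above, again assuming boto3 clients are passed in. The function name and the 90% similarity threshold are assumptions; a real deployment would choose its own threshold and would likely add further checks before trusting a face match.

```python
def sign_in_with_face(rekognition, cognito_identity, *, collection_id: str,
                      identity_pool_id: str, developer_provider: str,
                      image_bytes: bytes, threshold: float = 90.0):
    """Return Cognito IdentityId/Token for the best face match, or None."""
    # Note: Rekognition raises an error if the image contains no face at all.
    result = rekognition.search_faces_by_image(
        CollectionId=collection_id,
        Image={"Bytes": image_bytes},
        FaceMatchThreshold=threshold,   # assumed threshold
        MaxFaces=1,
    )
    matches = result.get("FaceMatches", [])
    if not matches:
        return None

    # The External Id stored at indexing time is the User Pools username.
    username = matches[0]["Face"].get("ExternalImageId")
    if not username:
        return None

    token = cognito_identity.get_open_id_token_for_developer_identity(
        IdentityPoolId=identity_pool_id,
        Logins={developer_provider: username},
    )
    return {"IdentityId": token["IdentityId"], "Token": token["Token"]}
```

The client app receives the IdentityId and Token and exchanges them for AWS credentials through the standard Cognito flow.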

Amazon Lex
AWS
Lambda
Polly
Amazon
CloudWatch
Monitoring
Text
Speech
Text
Amazon
DynamoDB
AWS IoT
Amazon API
Gateway
Conversational Interfaces
Applications

Walkthrough
Lex Bot Creation Process

Example Use Case
The Smart Assistant

Smart Assistant - Key Features
• Triggers using any type of input, not just speech
− This demo uses a camera, and on-device face detection with
OpenCV – http://opencv.org
• Hot word detection to get device’s attention
− Snowboy - https://snowboy.kitt.ai/
• Silence detection during live speech capture
− SoX - http://sox.sourceforge.net/
• NLU provided by Amazon Lex
− Speech input SDK not yet available
− Don’t let that stop you calling the API directly!
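Calling the Lex runtime API directly with captured audio might look like this in Python. The sketch assumes a boto3 `lex-runtime` client is passed in and that the audio is 16 kHz, 16-bit, mono PCM; the function and argument names are hypothetical.

```python
def send_audio_to_lex(lex_runtime, audio_bytes: bytes, *, bot_name: str,
                      bot_alias: str, user_id: str) -> dict:
    """Post recorded speech to a Lex bot and return the parts the device needs."""
    response = lex_runtime.post_content(
        botName=bot_name,
        botAlias=bot_alias,
        userId=user_id,
        contentType="audio/l16; rate=16000; channels=1",  # assumed capture format
        accept="audio/mpeg",                              # ask for spoken reply as MP3
        inputStream=audio_bytes,
    )
    audio_stream = response.get("audioStream")
    return {
        "dialogState": response.get("dialogState"),
        "intentName": response.get("intentName"),
        "message": response.get("message"),
        "audio": audio_stream.read() if audio_stream is not None else b"",
    }
```

The returned audio can be played back on the device as the bot's spoken response.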

Smart Assistant
Wait for Hot Word
(Snowboy)
Wait for Face to
appear in camera view
Listen for audio
command
START

Smart Assistant
Wait for Face to
appear in camera view
Capture image from
webcam
(fswebcam)
Recognise Face
(Amazon Rekognition)
Resize to improve
process efficiency
(ImageMagick)
Detect face on device
(OpenCV)
Known User State
Replay Audio
Is the face
in the
collection?
YES
NO
Run User Speech
Dialogue Interaction
and NLU

Smart Assistant
Process intent
(API Gateway/Lambda)
Listen for speech input
with silence detection
(SoX)
Play audio response &
loop back to listen for
speech input
Construct Lex payload
and submit to API
(HTTPS Request)
Parse response
headers
YES
Run User Speech
Dialogue Interaction
and NLU
Is the
interaction
Ready for
Fulfillment
?
NO
Listen for speech input
with silence detection
(SoX)
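The decision in this flow can be sketched in Python as a small function over the HTTP response headers. It assumes the dialog state is read from the `x-amz-lex-dialog-state` header of the Lex response; the function name and the returned step labels are hypothetical, and stopping on a `Failed` state is an added assumption that is not shown on the slide.

```python
def next_step(headers: dict) -> str:
    """Decide what the assistant does next from the Lex response headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    state = normalized.get("x-amz-lex-dialog-state", "")
    if state == "ReadyForFulfillment":
        return "process_intent"   # hand over to API Gateway/Lambda
    if state == "Failed":
        return "stop"             # assumption: give up on this conversation
    return "listen"               # play the reply and listen for more speech
```

For example, a response with `x-amz-lex-dialog-state: ElicitSlot` leads back to listening for speech input.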
