AWS offers a family of AI services that provide cloud-native machine learning and deep learning technologies, allowing developers to build an entirely new generation of apps that can see, hear, speak, understand, and interact with the real world. In this session we take a look at Amazon Rekognition, Amazon Polly, and Amazon Lex.
Speakers:
Adam Larter, Developer Solutions Architect, Amazon Web Services
Alastair Cousins, Solutions Architect, Amazon Web Services
3. What is Amazon Polly?
• A service that converts text into lifelike speech
• Offers 47 lifelike voices across 24 languages
• Low-latency responses enable developers to build real-time systems
• Developers can store, replay, and distribute generated speech
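This is a minimal sketch of the "store and replay" idea. It assumes the AWS SDK for Python (boto3) and its Polly `synthesize_speech` operation. The client is passed in rather than created inside the function, so the logic can be exercised without AWS credentials. The function name, the default voice `Joanna`, and the MP3 output format are illustrative choices and are not taken from the slides.

```python
from typing import Any


def synthesize_to_file(polly_client: Any, text: str, path: str,
                       voice_id: str = "Joanna") -> int:
    """Convert text to MP3 with Amazon Polly and save it so it can be replayed.

    polly_client is assumed to be a boto3 Polly client (or a stand-in with the
    same synthesize_speech method). Returns the number of bytes written.
    """
    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",   # Polly also supports "ogg_vorbis" and "pcm"
        VoiceId=voice_id,
    )
    # AudioStream is a streaming body: read it fully, then persist it to disk.
    audio = response["AudioStream"].read()
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```

In a real script you would pass `boto3.client("polly")` as `polly_client`. The saved file can then be replayed or distributed without calling Polly again.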
4. Amazon Polly: Quality
Natural-sounding speech
A subjective measure of how close TTS output is to human speech.
Accurate text processing
The ability of the system to interpret common text formats such as abbreviations, numerical sequences, and homographs.
Today in Sydney, Australia, it's 26°C
It’s nice to know, we’re going to Nice
Highly intelligible
A measure of how comprehensible speech is.
Peter Piper picked a peck of pickled peppers
5. Amazon Polly: SSML
Speech Synthesis Markup Language (SSML) is a W3C Recommendation: an XML-based markup language for speech synthesis applications.
<speak>
  My name is Adam Larter. It is spelled
  <prosody rate="x-slow">
    <say-as interpret-as="characters">Larter</say-as>
  </prosody>
</speak>
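The SSML above can also be generated in code. This sketch uses a hypothetical helper to build the same document. It escapes the user-supplied text so that characters such as `&` or `<` cannot break the XML. Assuming boto3, the result would be sent to Polly with `synthesize_speech(Text=ssml, TextType="ssml", ...)`.

```python
from xml.sax.saxutils import escape


def spell_name_ssml(full_name: str, word_to_spell: str, rate: str = "x-slow") -> str:
    """Hypothetical helper: SSML that says a name, then spells one word slowly."""
    # Escape &, < and > so arbitrary text cannot break the XML document.
    name = escape(full_name)
    word = escape(word_to_spell)
    # rate is expected to be an SSML prosody keyword such as "x-slow" or "slow".
    return (
        "<speak>"
        f"My name is {name}. It is spelled "
        f'<prosody rate="{rate}">'
        f'<say-as interpret-as="characters">{word}</say-as>'
        "</prosody>"
        "</speak>"
    )
```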
7. Polly Voice Synthesis Demo
[Diagram: Mobile App, IoT Device, Amazon API Gateway, Lambda function, Amazon Polly, Amazon S3]
Calling through API Gateway allows us to implement caching and to use throttling and API keys via Usage Plans.
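This sketches the Lambda function in this architecture. It assumes an API Gateway Lambda proxy integration, where the request JSON arrives as a string in `event["body"]`, and boto3 Polly and S3 clients passed in by the caller. The request fields `text` and `voice`, the bucket argument, and the hash-based key scheme are all hypothetical. Deriving the key from the voice and text means identical requests map to the same S3 object.

```python
import hashlib
import json
from typing import Any


def handle_speech_request(event: dict, polly: Any, s3: Any, bucket: str) -> dict:
    """Hypothetical Lambda logic: synthesize text with Polly, store the MP3 in S3."""
    body = json.loads(event["body"])   # proxy integration: payload is a JSON string
    text = body["text"]
    voice = body.get("voice", "Joanna")

    # Identical requests map to the same key, so they overwrite one object rather
    # than creating new ones. Checking for an existing object first (not shown)
    # would let you skip the Polly call entirely.
    key = hashlib.sha256(f"{voice}:{text}".encode("utf-8")).hexdigest() + ".mp3"

    audio = polly.synthesize_speech(Text=text, OutputFormat="mp3", VoiceId=voice)
    s3.put_object(Bucket=bucket, Key=key, Body=audio["AudioStream"].read(),
                  ContentType="audio/mpeg")

    # Proxy-integration response shape: a status code and a string body.
    return {"statusCode": 200, "body": json.dumps({"bucket": bucket, "key": key})}
```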
9. Amazon Rekognition
Deep learning-based image recognition service
Search, verify, and organise millions of images
• Object and Scene Detection
• Facial Analysis
• Face Comparison
• Facial Recognition
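For object and scene detection, here is a minimal sketch assuming boto3's Rekognition `detect_labels` operation. That operation takes `Image`, `MaxLabels` and `MinConfidence` and returns a `Labels` list with a `Name` and `Confidence` for each label. The helper name and the default thresholds are illustrative.

```python
from typing import Any


def labels_above(rekognition: Any, image_bytes: bytes,
                 min_confidence: float = 80.0, max_labels: int = 10) -> list:
    """Return (name, confidence) pairs for objects and scenes in an image."""
    response = rekognition.detect_labels(
        Image={"Bytes": image_bytes},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,  # labels below this score are not returned
    )
    # Highest-confidence labels first.
    return sorted(
        ((label["Name"], label["Confidence"]) for label in response["Labels"]),
        key=lambda pair: pair[1],
        reverse=True,
    )
```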
12. Detecting Faces in a Crowd
[Diagram: IoT Camera, Amazon API Gateway, Lambda function, Amazon Rekognition DetectFaces(), Image with Faces]
"Emotions": [
{"Confidence": 99.1335220336914,
"Type": "HAPPY" },
{"Confidence": 3.3275485038757324,
"Type": "CALM"},
{"Confidence": 0.31517744064331055,
"Type": "SAD"}
],
"Eyeglasses": {"Confidence": 99.8050537109375,
"Value": false},
"EyesOpen": {Confidence": 99.99979400634766,
"Value": true},
13. Understanding Bounding Boxes
Turn the ratios into X/Y coordinates by multiplying Left and Width by the image width, and Top and Height by the image height:
"BoundingBox": {
  "Height": 0.3449999988079071,
  "Left": 0.09666666388511658,
  "Top": 0.27166667580604553,
  "Width": 0.23000000417232513
}
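Here is the conversion described above in code. `Left` and `Width` scale with the image width, while `Top` and `Height` scale with the image height. The helper name is hypothetical.

```python
def to_pixels(box: dict, image_width: int, image_height: int) -> tuple:
    """Convert a Rekognition BoundingBox (ratios of the image size) to pixels.

    Returns (left, top, width, height), each rounded to the nearest pixel.
    """
    left = round(box["Left"] * image_width)
    top = round(box["Top"] * image_height)
    width = round(box["Width"] * image_width)
    height = round(box["Height"] * image_height)
    return left, top, width, height
```

For example, on a 600 × 600 pixel image (a size chosen only for illustration), the bounding box above becomes `(58, 163, 138, 207)`: left, top, width and height in pixels.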
14. Tip: Capture Additional Context
Introduce a coefficient to capture additional image context by inflating the bounding box.
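The slide does not say exactly how the coefficient is applied. This sketch assumes one common approach: scale the box's width and height by the coefficient around its centre, then clamp the result to the image (ratios between 0 and 1). The function name and the clamping behaviour are assumptions.

```python
def inflate_box(box: dict, coefficient: float) -> dict:
    """Grow a ratio-based bounding box about its centre, clamped to the image.

    A coefficient of 1.0 leaves the box unchanged; 1.5 makes it 50% larger in
    each dimension (before clamping).
    """
    new_w = box["Width"] * coefficient
    new_h = box["Height"] * coefficient
    # Keep the centre fixed.
    cx = box["Left"] + box["Width"] / 2
    cy = box["Top"] + box["Height"] / 2
    # Clamp each edge to the image, expressed as ratios in [0, 1].
    left = max(0.0, cx - new_w / 2)
    top = max(0.0, cy - new_h / 2)
    right = min(1.0, cx + new_w / 2)
    bottom = min(1.0, cy + new_h / 2)
    return {"Left": left, "Top": top, "Width": right - left, "Height": bottom - top}
```

A coefficient of 1.5, for instance, pulls in some hair, shoulders or background around the face. The inflated box can then be converted to pixels exactly as described on the previous slide before cropping.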
16. Scaling to Many Faces
[Diagram: User’s Face Image, Lambda functions, Amazon Rekognition, Amazon S3, Amazon SNS, Amazon Elasticsearch; labels "Metadata + Location + Timestamp" and "User’s Face Image"]
Fan out of Lambda functions via SNS, with 1 notification per face detected.
Each notification carries the metadata from DetectFaces() + an S3 object reference to the face image.
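This sketches the fan-out step, assuming boto3's SNS `publish(TopicArn=..., Message=...)`. It publishes one message per detected face. Each message carries that face's DetectFaces() metadata and an S3 reference to the cropped face image. The message layout, the function name and its parameters are hypothetical. Each message triggers a separate invocation of any Lambda function subscribed to the topic, which is what spreads the per-face work.

```python
import json
from typing import Any


def fan_out_faces(sns: Any, topic_arn: str, face_details: list,
                  bucket: str, face_keys: list) -> int:
    """Publish one SNS notification per detected face; return how many were sent.

    face_details: the FaceDetails list returned by DetectFaces().
    face_keys:    S3 keys of the cropped face images, in the same order.
    """
    if len(face_details) != len(face_keys):
        raise ValueError("need exactly one S3 key per detected face")
    for detail, key in zip(face_details, face_keys):
        message = {
            "metadata": detail,                    # DetectFaces() output for this face
            "s3": {"bucket": bucket, "key": key},  # reference to the face image
        }
        sns.publish(TopicArn=topic_arn, Message=json.dumps(message))
    return len(face_details)
```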
18. Sign In Using Face
• Cognito User Pools (CUP) as the system of record for users
• Create a Developer-Authenticated Identity Provider (IdP) to perform authentication (AuthN) using Amazon Rekognition
• Federate CUP and the Developer IdP through Cognito Identity Federation
• CUP user names are unique, so use them as the ExternalImageId ("ExternalId") when calling indexFaces()
22. Link Face to Cognito User
[Diagram: Mobile App, Amazon API Gateway, Lambda function, Amazon Rekognition (Store in Collection), Amazon Cognito Identity Pool (Cognito Identity Token)]
The app sends the user’s face image + Cognito User Pool username. The username is stored in the Rekognition collection as the ExternalId for the user’s face vector.
Identities are linked by a call to getOpenIdTokenForDeveloperIdentity().
23. Sign In Using Face
[Diagram: Mobile App, Amazon API Gateway, Lambda function, Amazon Rekognition, Amazon Cognito Identity Pool]
The app sends the user’s face image, and Amazon Rekognition returns the FaceId + ExternalId of the matching face.
The ExternalId is used as the unique user identifier in the call to CognitoIdentity::getOpenIdTokenForDeveloperIdentity.
Credentials: AccessKeyId / SecretAccessKey / SessionToken
24. Sign In Using Face – Implementation
Linking face to Cognito User:
• Sign in first using Cognito User Pools via the Cognito SDK
• Take the user’s picture and send the image with the JWT
• Rekognition::indexFaces() to store the user’s face vector in the collection, using the Cognito User Pools username as the ExternalId
• CognitoIdentity::getOpenIdTokenForDeveloperIdentity to create a Cognito token and link the identities together
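This sketches how the linking steps might run in the Lambda function. It assumes boto3's Rekognition `index_faces` and Cognito Identity `get_open_id_token_for_developer_identity`. The Rekognition parameter that holds the username is `ExternalImageId`. The provider names, collection ID and parameter names are hypothetical. The sketch also assumes that supplying both the User Pools token and the developer login in one `Logins` map links them to a single identity; passing the existing `IdentityId` is another way to attach the developer login. Verifying the JWT before trusting the username is omitted for brevity and should not be skipped in practice.

```python
from typing import Any


def link_face_to_user(rekognition: Any, cognito_identity: Any, *,
                      collection_id: str, image_bytes: bytes, username: str,
                      identity_pool_id: str, developer_provider: str,
                      user_pool_provider: str, id_token: str) -> dict:
    """Hypothetical: index the user's face under their username, then link identities."""
    # 1. Store the face vector, tagging it with the Cognito User Pools username.
    indexed = rekognition.index_faces(
        CollectionId=collection_id,
        Image={"Bytes": image_bytes},
        ExternalImageId=username,
        MaxFaces=1,  # index only the largest face in the enrolment picture
    )
    if not indexed["FaceRecords"]:
        raise ValueError("no face found in the enrolment image")

    # 2. Get a Cognito token for both logins so they resolve to one IdentityId.
    token = cognito_identity.get_open_id_token_for_developer_identity(
        IdentityPoolId=identity_pool_id,
        Logins={
            user_pool_provider: id_token,   # JWT from the User Pools sign-in
            developer_provider: username,   # developer provider -> user identifier
        },
    )
    return {
        "face_id": indexed["FaceRecords"][0]["Face"]["FaceId"],
        "identity_id": token["IdentityId"],
    }
```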
25. Sign In Using Face – Implementation
Sign in using face:
• Rekognition::searchFacesByImage() to get the ExternalId
• CognitoIdentity::getOpenIdTokenForDeveloperIdentity() with the retrieved ExternalId to generate the token and identity ID the client app needs
• The client app then follows the standard Cognito process using CognitoCachingCredentialsProvider()
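This sketches the sign-in half, under the same assumptions: boto3 clients passed in, and a hypothetical provider name and helper. `search_faces_by_image` searches the collection for the largest face in the supplied image. It returns only matches whose similarity meets `FaceMatchThreshold`, and the username comes back in each match's `ExternalImageId`. The 95% default threshold is an illustrative value.

```python
from typing import Any, Optional


def sign_in_with_face(rekognition: Any, cognito_identity: Any, *,
                      collection_id: str, image_bytes: bytes,
                      identity_pool_id: str, developer_provider: str,
                      threshold: float = 95.0) -> Optional[dict]:
    """Hypothetical: match a face in the collection, then get a Cognito token."""
    result = rekognition.search_faces_by_image(
        CollectionId=collection_id,
        Image={"Bytes": image_bytes},
        MaxFaces=1,                   # best match only
        FaceMatchThreshold=threshold,
    )
    matches = result["FaceMatches"]
    if not matches:
        return None                   # unknown face: fall back to another sign-in method
    external_id = matches[0]["Face"]["ExternalImageId"]  # the User Pools username

    token = cognito_identity.get_open_id_token_for_developer_identity(
        IdentityPoolId=identity_pool_id,
        Logins={developer_provider: external_id},
    )
    # The client app uses the identity ID and token to obtain AWS credentials.
    return {"username": external_id,
            "identity_id": token["IdentityId"],
            "token": token["Token"]}
```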
29. Smart Assistant - Key Features
• Can be triggered by any type of input, not just speech
− This demo uses a camera and on-device face detection with OpenCV – http://opencv.org
• Hot word detection to get the device’s attention
− Snowboy - https://snowboy.kitt.ai/
• Silence detection during live speech capture
− SoX - http://sox.sourceforge.net/
• NLU provided by Amazon Lex
− Speech input SDK not yet available
− Don’t let that stop you calling the API directly!
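SoX handles silence detection for the demo. Its `silence` effect can stop recording once the level stays below a threshold for a set duration. To show the idea itself, here is a simplified pure-Python stand-in; it is not SoX's algorithm. It computes the RMS level of each frame of 16-bit little-endian mono PCM and reports where speech ends once a run of quiet frames follows some speech. The thresholds, the frame handling and the function names are assumptions.

```python
import array
import math
import sys


def is_silent(frame: bytes, threshold: float = 500.0) -> bool:
    """True if a frame of 16-bit little-endian mono PCM has RMS below threshold."""
    samples = array.array("h", frame)   # signed 16-bit samples
    if sys.byteorder == "big":
        samples.byteswap()               # input PCM is little-endian
    if len(samples) == 0:
        return True
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms < threshold


def end_of_speech(frames: list, threshold: float = 500.0,
                  silent_frames_needed: int = 10) -> int:
    """Index of the frame at which speech is considered over, or -1.

    Leading silence is ignored: trailing silence only counts after at least
    one non-silent frame has been heard.
    """
    heard_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if is_silent(frame, threshold):
            if heard_speech:
                silent_run += 1
                if silent_run >= silent_frames_needed:
                    return i
        else:
            heard_speech = True
            silent_run = 0
    return -1
```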
30. Smart Assistant
[Flowchart: START; Wait for Hot Word (Snowboy); Wait for Face to appear in camera view; Listen for audio command]
31. Smart Assistant
[Flowchart: Wait for Face to appear in camera view; Detect face on device (OpenCV); Capture image from webcam (fswebcam); Resize to improve processing efficiency (ImageMagick); Recognise Face (Amazon Rekognition); decision "Is the face in the collection?" with YES / NO branches; Known User State; Replay Audio; Run User Speech Dialogue Interaction and NLU]
32. Smart Assistant
[Flowchart: Run User Speech Dialogue Interaction and NLU → Listen for speech input with silence detection (SoX) → Construct Lex payload and submit to API (HTTPS Request) → Parse response headers → Is the interaction Ready for Fulfillment? YES: Process intent (API Gateway/Lambda). NO: Play audio response & loop back to listen for speech input]
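This sketches the loop above using boto3's `lex-runtime` client instead of a hand-built HTTPS request. `post_content` sends the audio to Amazon Lex (V1). It returns the dialog state, intent name, slots and spoken reply as response fields; in a raw HTTPS call these arrive as response headers. The audio capture, playback and fulfilment callables, the bot name and alias, and `max_turns` are hypothetical. The content type assumes 16 kHz, 16-bit mono PCM of the kind SoX can record.

```python
from typing import Any, Callable

# Lex (V1) PostContent content type for raw 16 kHz, 16-bit, mono PCM.
PCM_CONTENT_TYPE = "audio/l16; rate=16000; channels=1"


def run_dialogue(lex: Any, listen: Callable[[], bytes],
                 play: Callable[[bytes], None],
                 fulfil: Callable[[str, dict], None], *,
                 bot_name: str, bot_alias: str, user_id: str,
                 max_turns: int = 10) -> str:
    """Send captured speech to Lex, looping until the intent is ready to fulfil."""
    for _ in range(max_turns):
        audio_in = listen()                  # speech captured with silence detection
        response = lex.post_content(
            botName=bot_name,
            botAlias=bot_alias,
            userId=user_id,                  # Lex keeps conversation state per user
            contentType=PCM_CONTENT_TYPE,
            accept="audio/mpeg",             # ask Lex to speak its replies as MP3
            inputStream=audio_in,
        )
        state = response["dialogState"]
        if state == "ReadyForFulfillment":
            # The client, not Lex, processes the intent (e.g. via API Gateway/Lambda).
            fulfil(response["intentName"], response.get("slots") or {})
            return state
        # Play whatever Lex said: a prompt, a confirmation, or a final message.
        stream = response.get("audioStream")
        if stream is not None:
            play(stream.read())
        if state in ("Fulfilled", "Failed"):
            return state
        # ElicitIntent / ElicitSlot / ConfirmIntent: loop back and listen again.
    return "MaxTurnsReached"
```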