The presentation introduced big data and machine learning on AWS. It defined big data and machine learning, discussed common use cases, and described key AWS services that support big data and machine learning workloads. It also provided an example of using machine learning on AWS to more intelligently detect events based on normalized event data from various sources. The presentation concluded with a discussion of demos and further learning resources.
Big Data and Machine Learning on AWS
1. Big Data and Machine Learning on AWS
AWS User Groups of Florida
April 2018
Patrick Hannah, VP of Engineering, CloudHesive
2. About Me
• Who am I?
• What’s my background?
• What do I hope to get out of the presentation?
• How am I using AWS?
3. About CloudHesive
• Professional Services
– Assessment (Current environment, datacenter or cloud footprint)
– Strategy (Getting to the future state)
– Migration (Environment-to-cloud, Datacenter-to-cloud)
– Implementation (Point solutions)
– Support (Break/fix and ongoing enhancement)
• DevOps Services
– Assessment
– Strategy
– Implementation (Point solutions)
– Management (Supporting infrastructure, solutions or ongoing enhancement)
– Support (Break/fix and ongoing enhancement)
• Managed Security Services (SecOps)
– Encryption as a Service (EaaS) – encryption at rest and in flight
– End Point Security as a Service
– Threat Management
– SOC 2 Type II Validated
• Next Generation Managed Services
– Leveraging our Professional, DevOps and Managed Security Services
– Single payer billing
– Intelligent operations and automation
– AWS Audited
4. What are we going to talk about?
• Big Data and Machine Learning
• Common Use Cases
• AWS Services in support of Big Data and Machine Learning
• Demos
• Conclusion
5. Let’s define Big Data and Machine Learning
• From Wikipedia:
– Big data is data sets that are so voluminous and complex that traditional data processing application software are inadequate to deal with them
7. Let’s talk about some of its applications
• Research
– Grid/HPC Computing (the original cloud)
– Initiator of open source projects
– Enabler and enabled by Public Cloud
– AWS Just Announced OpenData Registry: https://registry.opendata.aws/
• Business Operations
– ERP
– Data Warehouses
– Business Intelligence
– Business Systems
• Applied
– Every Major Industry
– {Dev|Sec|Ops}
– Products (b2b, b2c)
8. Let’s talk about its characteristics
• Lifecycle driven
– Collect
– Store
– Process/Analyze
– Consume
• Generation
– Batch
– Streaming
• Format
– Text
– Images
– Audio
– Video
15. Overview of Machine Learning
• What is Machine Learning?
– A subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence.
• What is AWS Machine Learning?
– A platform that allows software developers to build and train predictive applications and host those applications in a scalable AWS cloud solution.
16. Key Terms for AWS Machine Learning
• Datasources
– Contain metadata associated with data inputs to Amazon ML (your sample data)
• ML models
– Generate predictions using the patterns extracted from the input data
• Evaluations
– Measure the quality of ML models
• Batch predictions
– Asynchronously generate predictions for multiple input data observations
• Real-time predictions
– Synchronously generate predictions for individual data observations
17. What problem are we trying to solve?
• Alerting on event data (which we will describe on the next slide) is based on traditional mechanisms:
• Threshold crossed > alert
• Pattern matched > alert
• These mechanisms are consistent, until an outlier comes along.
• When an outlier comes along, we need to manually evaluate it
• When it comes along again, we add an exception for it
• Why not leverage Machine Learning to do this for us?
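The traditional mechanisms on this slide can be sketched in a few lines of Python. This is only an illustration: the threshold value, the log pattern, the event fields and the `KNOWN_EXCEPTIONS` set are all assumptions made up for the example, not something taken from the presentation.

```python
import re

# Hypothetical rule set: the threshold, pattern and field names are illustrative.
CPU_THRESHOLD = 90.0
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL)\b")

# Exceptions a human adds by hand each time a known outlier comes along again.
KNOWN_EXCEPTIONS = {"nightly-backup"}


def should_alert(event: dict) -> bool:
    """Return True when a static rule fires and no manual exception applies."""
    if event.get("source") in KNOWN_EXCEPTIONS:
        return False  # outlier that was manually evaluated earlier
    if event.get("cpu_percent", 0.0) > CPU_THRESHOLD:
        return True  # threshold crossed > alert
    # pattern matched > alert
    return bool(ERROR_PATTERN.search(event.get("message", "")))
```

Every new outlier means another hand-maintained entry in `KNOWN_EXCEPTIONS`, which is the manual work the following slides propose handing to a model.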
18. Get our event data in one place
• Collect from Disparate Systems
– Structured Data (Key/Value, Time/Series)
• CPU, Memory, Storage, IO, Bandwidth
– Unstructured Data (Logs)
• Windows Event Logs
• Linux /var/log
• E-Mail
• Third party systems
• Normalize it (into a common format)
• Push it (to a stream)
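A minimal sketch of the normalize step, assuming a JSON common format. The field names (`timestamp`, `host`, `kind`, `name`, `value`, `message`) and the function names are hypothetical; the presentation does not specify a schema. Pushing the resulting bytes to a stream (for example a Kinesis stream) is left out so the example runs without AWS credentials.

```python
import json
from datetime import datetime, timezone


def _iso(ts: float) -> str:
    """Render a Unix timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


def normalize_metric(host: str, name: str, value: float, ts: float) -> dict:
    """Map a structured sample (CPU, memory, IO, ...) to the common format."""
    return {"timestamp": _iso(ts), "host": host, "kind": "metric",
            "name": name, "value": value, "message": None}


def normalize_log(host: str, name: str, line: str, ts: float) -> dict:
    """Map an unstructured log line to the same common format."""
    return {"timestamp": _iso(ts), "host": host, "kind": "log",
            "name": name, "value": None, "message": line.strip()}


def to_stream_record(event: dict) -> bytes:
    """Serialize a normalized event as the payload for a stream record."""
    return json.dumps(event, sort_keys=True).encode("utf-8")
```

Because both kinds of event share one set of keys, everything downstream of the stream can treat them uniformly.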
19. Evaluate and Action on it with Machine Learning
• Once a threshold has been crossed, but before we take action on it, pass the event to AWS Machine Learning (via Kinesis)
• AWS Machine Learning uses the previously designated model to determine the likelihood of the event being a false positive
• If Machine Learning determines it’s a false positive, it gets logged in the event stream
• If Machine Learning determines it’s an actionable event, it is forwarded on to our alert system (via SNS)
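The routing decision described above might look like the sketch below. The model call, the event log and the alert channel are passed in as plain callables so the example runs without AWS; in a real deployment they would wrap the Amazon ML real-time prediction endpoint, the event stream and an SNS topic. The function name, the 0.5 cutoff and the return labels are assumptions for illustration.

```python
from typing import Callable

Event = dict


def route_event(
    event: Event,
    false_positive_likelihood: Callable[[Event], float],
    log_event: Callable[[Event], None],
    send_alert: Callable[[Event], None],
    cutoff: float = 0.5,  # assumed decision boundary, tune per model
) -> str:
    """Log likely false positives; forward everything else as an alert."""
    score = false_positive_likelihood(event)
    if score >= cutoff:
        log_event(event)  # false positive: keep it in the event stream only
        return "logged"
    send_alert(event)  # actionable: hand off to the alert system
    return "alerted"
```

Keeping the three dependencies injectable also makes the decision logic easy to test with stub functions before wiring it to real services.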
20. Why use Machine Learning to Solve This Problem?
• Consistency
– No longer is a human making a judgement call (which will vary from person to person)
– No longer is a human taking manual action to whitelist/blacklist/filter the event (which may be done inconsistently)
• Cost Savings
– The cost of a human making the judgement call (in distraction, time and errors) outweighs the cost of the service
– At $0.0001 per prediction, and assuming 1% of events are false positives, roughly 100 predictions are needed to catch one false positive, so automatically detecting a false positive costs about $0.01 (1 cent), versus the cost of paying a human to detect it manually
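The cost figure on this slide can be checked with a short calculation. The function name is hypothetical; the price and the false-positive rate are the ones quoted on the slide.

```python
from decimal import Decimal


def cost_per_false_positive(price_per_prediction: Decimal,
                            false_positive_rate: Decimal) -> Decimal:
    """Cost of the predictions needed, on average, to catch one false positive."""
    if false_positive_rate <= 0:
        raise ValueError("false_positive_rate must be positive")
    # With a 1% rate, about 100 events must be scored per false positive found.
    predictions_needed = Decimal(1) / false_positive_rate
    return price_per_prediction * predictions_needed


# $0.0001 per prediction at a 1% false-positive rate works out to $0.01.
slide_cost = cost_per_false_positive(Decimal("0.0001"), Decimal("0.01"))
```

`Decimal` is used instead of `float` so that small currency amounts are not distorted by binary rounding.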
21. Conclusion
• AWS provides a number of services to support your Big Data and Machine Learning needs
• Getting started on AWS is easy; with the free tier, you can experiment with a number of services without incurring significant cost.
• Adoption of AWS in your organization can be as easy or as hard as you want to make it; start simple and iterate.
23. Further Learning
• Getting Started: https://aws.amazon.com/getting-started
• General Reference: http://docs.aws.amazon.com/general/latest/gr
• Global Infrastructure: https://aws.amazon.com/about-aws/global-infrastructure/
• FAQs: https://aws.amazon.com/faqs
• Documentation: https://aws.amazon.com/documentation/
• Architecture: https://aws.amazon.com/architecture
• Whitepapers: https://aws.amazon.com/whitepapers
• Security: https://aws.amazon.com/security
• Blog: https://aws.amazon.com/blogs
• Service Specific Pages: https://aws.amazon.com/service
• AWS Answers: https://aws.amazon.com/answers/
• AWS Knowledge Center: https://aws.amazon.com/premiumsupport/knowledge-center/
• SlideShare: http://www.slideshare.net/AmazonWebServices
• Github: https://github.com/aws and https://github.com/awslabs
Editor's Notes
AMAZON DOT COM!!!
Agriculture, Forestry, Fishing and Hunting
Mining, Quarrying, and Oil and Gas Extraction
Utilities
Construction
Manufacturing
Wholesale Trade
Retail Trade
Transportation and Warehousing
Information
Finance and Insurance
Real Estate and Rental and Leasing
Professional, Scientific, and Technical Services
Management of Companies and Enterprises
Administrative and Support and Waste Management and Remediation Services
Educational Services
Health Care and Social Assistance
Arts, Entertainment, and Recreation
Accommodation and Food Services
Other Services (except Public Administration)
Public Administration
From re:Invent 2017
From re:Invent 2017+MQ+DMS+Kinesis Video
From re:Invent 2017+CloudSearch
From re:Invent 2017+Quicksight+Glue+Data Pipeline
Machine Learning:
Subset of Predictive Analytics
Various techniques/approaches that I won’t get into
Numerous software products available
Examples:
Good Example: Marketing, Fraud Detection, Risk
How Target Knew a High School Girl Was Pregnant Before Her Parents Did (http://techland.time.com/2012/02/17/how-target-knew-a-high-school-girl-was-pregnant-before-her-parents/)
Machine Learning is consistent and not subject to human error but garbage in = garbage out.
Like any piece of technology, you are giving up control for perceived benefits: I want to see every event and assess its validity, versus letting Machine Learning do it for me (cite example of me using Erlang for capacity planning)
AWS’ Machine Learning:
Similar characteristics to other AWS Services (Cloud: Managed, Abstracted)
Once key terms are understood, it’s easy to get started (I’m a great example of this)
Don’t need to pick software, stand up EC2 instances, install it, configure it, learn it
Our interest is in real-time predictions (real time here meaning roughly 100 ms)
Output (target) is binary (1/0), multiclass (a,b,c) or a numeric prediction (3.141)
Well suited for situations where manual effort or logic is too complex
Based on https://github.com/awslabs/machine-learning-samples/tree/master/social-media