Video is a "last-mile problem" for search technology. Unlike webpages, documents, and email, content in videos has traditionally been impossible to search. Recent advances in automated speech and text recognition, however, let businesses and universities search inside video assets as easily as inside textual content. In this session, you'll learn how Panopto is using AWS to solve the video-search problem at scale, while saving over 50% in operating costs by taking advantage of Spot instances. We discuss the cross-platform architecture that combines Windows and Linux to provide cost-effective video processing and search indexing. We also dive deep into scaling Spot elastically based on user demand, handling fallback situations when instances are revoked, and using the Spot bidding process to optimize cost structure. Finally, we discuss future plans to reduce operating costs even further through Spot fleets and grid processing.
2. What to Expect from the Session
Primer on inside-video search
Dive into how we use Spot to search video at scale
Overview of our cross-platform architecture
Best practices for scaling Spot Instances elastically
7. Title: An Introduction to Network Security
Description: A broad overview of network security as defined by today’s hybrid corporate WANs.
Tags: Network security, intrusion detection, corporate WAN, firewall, authentication
9. The network is the entry point to your application. It provides the first gatekeepers that
control access to the various servers in your environment. Servers are protected with
their own operating system gatekeepers, but it is important not to allow them to be
deluged with attacks from the network layer. It is equally important to ensure that network
gatekeepers cannot be replaced or reconfigured by imposters. In a nutshell, network
security involves protecting network devices and the data that they forward.
The basic components of a network, which act as the front-line gatekeepers, are the
router, the firewall, and the switch. An attacker looks for poorly configured network
gatekeepers to exploit. Common vulnerabilities include weak default installation settings,
wide-open access controls, and unpatched devices.
10. 5,625 words spoken
50% have no search value
2,813 words with search value
With 10 tags, you’ve only covered 0.3% of valuable content
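The arithmetic behind this slide can be checked with a short sketch. It assumes, purely for illustration, that each tag covers exactly one valuable word; the slide does not state how coverage was computed. Under that assumption the result is about 0.36%, which the slide reports as 0.3%.

```python
import math

# Figures taken from the slide.
words_spoken = 5625
share_with_search_value = 0.50
tags = 10

# Assumption: one tag covers one valuable word.
valuable_words = math.ceil(words_spoken * share_with_search_value)
coverage_pct = 100 * tags / valuable_words

print(valuable_words)           # words with search value
print(f"{coverage_pct:.2f}%")   # share of valuable content covered by tags
```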
12. Six Types of Video Content Indexing
1. Manually entered metadata
2. Transcription
3. Automatic Speech Recognition (ASR)
4. Optical Character Recognition (OCR)
5. Slide extraction
6. Viewer notes
15. Our Challenge
[Chart: growth over time, x-axis from 2013-01 to 2016-01]
Running on AWS since 2009
Growing exponentially
Need to index every video – quickly & cost-efficiently
15 years of video content (400 TB) uploaded monthly
Need to extract metadata out of 4PB of video
122M unique images have been indexed for OCR
>3TB SOLR index
* Numbers are inclusive of both enterprise and education accounts; numbers do not include on-premises customers
16. Option 1: On-Demand Amazon EC2 Instances
[Chart: cost ($) vs. hours of content, with markers for Budget, Today, and the cost to enable ASR/OCR]
Cost-prohibitive to offer to all customers
17. Option 2: Make Search an Upsell Capability
[Diagram: the Panopto platform, organized into Content Ingestion, Content Discovery, Content Management, Content Delivery, and Content Consumption]
Components shown: Windows and Mac Clients, Mobile Apps, Video Capture Appliance, Remote Capture Client, Other Ingestion, Transcoding, Editing, Search Indexing, Governance, Analytics, Access Control, Video CMS, Public Hosting, SmartSearch™, Email and Social Integrations, Search Federation, Panopto Streaming, CDN Integration, P2P Streaming, Panopto ECDN, WAN Op Solutions, Interactive Player, Panopto Mobile, Audio Podcast, Embedded Player, Quizzing and Polls
18. Option 3: Use Reserved Instances (RIs)
Theoretically would save costs
RIs work best for predictable workloads
A 30-second SLA to begin indexing produces a spiky demand curve rather than a flat line
c3.2xlarge (On-Demand hourly: $0.42)
Upfront    Monthly    Effective Hourly    Savings over On-Demand
$0         $213.16    $0.292              30%
$1304      $75.92     $0.253              40%
$2170      $0.00      $0.248              41%
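The effective hourly rates in the pricing figures above can be reproduced with a small calculation. This sketch assumes a 1-year term (8,760 hours, 12 monthly payments); the slide does not state the term, but the numbers match under that assumption.

```python
HOURS_PER_YEAR = 8760  # assumption: 1-year RI term, 365 days

def effective_hourly(upfront: float, monthly: float) -> float:
    """Spread the upfront fee and 12 monthly payments over every hour of the term."""
    return (upfront + 12 * monthly) / HOURS_PER_YEAR

def savings_pct(effective: float, on_demand_hourly: float) -> float:
    """Percentage saved relative to the On-Demand hourly price."""
    return 100 * (1 - effective / on_demand_hourly)

ON_DEMAND_HOURLY = 0.42  # c3.2xlarge On-Demand price from the slide

for upfront, monthly in [(0, 213.16), (1304, 75.92), (2170, 0.0)]:
    rate = effective_hourly(upfront, monthly)
    pct = savings_pct(rate, ON_DEMAND_HOURLY)
    print(f"${upfront} upfront, ${monthly:.2f}/month: ${rate:.3f}/hour, {pct:.0f}% savings")
```

Note that these savings only materialize if the instance is busy for the whole term, which is the problem the next slides illustrate.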
19. Option 3: Use Reserved Instances (RIs)
[Chart: # Instances over time (t) against a flat RI capacity line, with regions labeled Delayed Start and Waste]
20. Option 3: Use Reserved Instances (RIs)
[Chart: # Instances over time (t) against a flat RI capacity line, with regions labeled Overspend and Waste]
22. Option 5: Spot Instances
Excess EC2 capacity auctioned at steeply discounted prices
Spot Instances can be accessed on demand to meet our variable needs
[Diagram: On-Demand Instances, with Spot Instances added when bid ≥ market]
23. On-Demand vs. Spot Instances
On-Demand Instances:
Pre-configured or custom machine images
Configure security and network access
Choose from instance types and locations
Use static IP endpoints
Attach persistent block storage to instances
Pay a fixed price by the hour
Spot Instances:
Pre-configured or custom machine images
Configure security and network access
Choose from instance types and locations
Use static IP endpoints
Attach persistent block storage to instances
Pay a variable price by the hour
25. The Spot Auction
Set a bid price (for example, $0.27)
Instance runs while bid ≥ market price
Instances terminate when bid < market price
[Chart: market price over time against the bid, with regions labeled Instances run and Instances terminate]
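The bid rule on this slide can be sketched in a few lines. The hourly market prices below are hypothetical values chosen for illustration; only the $0.27 bid comes from the slide.

```python
def instance_state(bid: float, market_price: float) -> str:
    """Apply the rule from the slide: run while bid >= market price."""
    return "running" if bid >= market_price else "terminated"

def hours_until_termination(bid: float, hourly_market_prices: list[float]) -> int:
    """Count consecutive hours the instance runs before the market price exceeds the bid."""
    hours = 0
    for price in hourly_market_prices:
        if instance_state(bid, price) == "terminated":
            break
        hours += 1
    return hours

# Hypothetical price series; the instance is lost in the fourth hour.
prices = [0.20, 0.25, 0.27, 0.31, 0.22]
print(hours_until_termination(0.27, prices))
```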
26. Spot Considerations
Is your workload appropriate for potential volatility?
How to deal with a lack of capacity?
Can you run on a wide range of instance types (via Spot Fleet)?
Look at historical bid prices for your instance types and regions to estimate your savings.
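One way to turn a price history into a savings estimate is sketched below. The history values are hypothetical; in practice they could come from the EC2 Spot price history API. The function name and return shape are illustrative, not part of the talk.

```python
from statistics import mean

def estimate_spot_savings(history: list[float], on_demand_price: float, bid: float) -> dict:
    """Estimate availability and savings for a bid against historical Spot prices."""
    if not history:
        raise ValueError("history must contain at least one price")
    # Only periods where the bid would have held the instance count toward cost.
    affordable = [p for p in history if p <= bid]
    availability = len(affordable) / len(history)
    if not affordable:
        return {"availability": 0.0, "avg_price": None, "savings": 0.0}
    avg_price = mean(affordable)
    return {
        "availability": availability,
        "avg_price": avg_price,
        "savings": 1 - avg_price / on_demand_price,
    }

# Hypothetical hourly prices with one spike above the bid.
result = estimate_spot_savings([0.10, 0.12, 0.11, 0.50, 0.13], on_demand_price=0.42, bid=0.42)
print(result)
```

The availability figure matters as much as the savings figure: it indicates how often the workload would have needed a fallback.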
28. The Importance of Windows to our Architecture
Single codebase for cloud and on-premises
For on-prem customers, Windows is often a requirement
Windows is therefore critical to our cloud architecture as well
[Diagram: On-Prem and Cloud]
34. Bidding Strategy: Start Simple
Sealed-bid, second-price auction
Set your bid to the market price of an On-Demand Instance
[Chart: prices of $0.14, $0.24, and $0.34 shown against the On-Demand Instance price of $0.84]
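The point of bidding at the On-Demand price is that the bid is a ceiling, not the amount paid: each hour is billed at the market price. The sketch below treats the three prices on the slide as three consecutive hourly market prices, which is an assumption made for illustration.

```python
def spot_cost(bid: float, hourly_market_prices: list[float]) -> float:
    """Total paid while the bid holds; each hour is billed at the market price, not the bid."""
    total = 0.0
    for price in hourly_market_prices:
        if price > bid:
            break  # instance is revoked once the market price exceeds the bid
        total += price
    return total

ON_DEMAND_PRICE = 0.84              # from the slide
market_prices = [0.14, 0.24, 0.34]  # assumption: consecutive hourly market prices

spot_total = spot_cost(bid=ON_DEMAND_PRICE, hourly_market_prices=market_prices)
on_demand_total = ON_DEMAND_PRICE * len(market_prices)
print(f"Spot: ${spot_total:.2f}, On-Demand: ${on_demand_total:.2f}, "
      f"savings: {100 * (1 - spot_total / on_demand_total):.0f}%")
```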
35. The Challenge of Long-Running Jobs
The longer the job, the greater the chance of instance revocation
Short window to determine how best to fail over (2 minutes)
[Chart: Chance of Instance Revocation vs. Job Length]
36. Managing Jobs in the Face of Instance Revocation
Market price increase ($) → Spot → “Spotter” service
Wait until T-30s, then check: Is the job done?
Yes: No action
No:
1. Save State
2. Kill Job
3. Reallocate
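The decision flow on this slide can be sketched as follows. This is not Panopto's "Spotter" implementation; the `Job` class, the `requeue` callback, and the injected `wait` function are hypothetical stand-ins so the logic can run without any cloud dependency. It assumes the 2-minute notice mentioned on the previous slide.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Job:
    """Hypothetical job handle; records the actions taken on it."""
    name: str
    done: bool = False
    events: list = field(default_factory=list)

    def save_state(self) -> None:
        self.events.append("save_state")

    def kill(self) -> None:
        self.events.append("kill")

def handle_revocation(job: Job, requeue, wait=time.sleep,
                      notice_seconds: float = 120.0, margin_seconds: float = 30.0) -> str:
    """On a revocation notice, let the job run until T-30s, then fail over if unfinished."""
    wait(max(notice_seconds - margin_seconds, 0))  # wait until 30 s before termination
    if job.done:
        return "no_action"
    job.save_state()   # 1. Save State
    job.kill()         # 2. Kill Job
    requeue(job)       # 3. Reallocate to another worker
    return "reallocated"
```

Waiting until T-30s rather than acting immediately gives short jobs a chance to finish on the doomed instance, so only genuinely unfinished work is moved.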
37. Scaling Up with Predictive Job Modeling
Inputs:
1. Number of waiting jobs
2. Number of jobs currently processing
3. When current jobs expected to finish
4. Incoming jobs in the last <interval>
5. Number of jobs expected to arrive
6. Time to spin up new machine
7. SLA by job
[Diagram: Inputs → Data Scientists (?) → More processing capacity required?]
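The slide lists the inputs but not the model itself. The sketch below is one simple heuristic over those seven inputs, offered only as an illustration. It assumes one job per worker, and it counts forecast arrivals whenever a new machine takes longer to start than the SLA allows. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class QueueSnapshot:
    waiting_jobs: int        # input 1
    processing_jobs: int     # input 2
    finishing_soon: int      # input 3: in-flight jobs expected to finish within the spin-up window
    recent_arrivals: int     # input 4: jobs that arrived in the last interval
    expected_arrivals: int   # input 5: forecast for the next interval
    spin_up_seconds: float   # input 6
    sla_seconds: float       # input 7

def extra_workers_needed(s: QueueSnapshot, current_workers: int) -> int:
    """Return how many additional workers to launch (assumes one job per worker)."""
    idle_now = max(current_workers - s.processing_jobs, 0)
    free_soon = idle_now + s.finishing_soon
    demand = s.waiting_jobs
    # If a machine cannot start within the SLA, capacity must exist before jobs arrive,
    # so the forecast (or the recent arrival rate, if higher) counts as demand.
    if s.spin_up_seconds >= s.sla_seconds:
        demand += max(s.expected_arrivals, s.recent_arrivals)
    return max(demand - free_soon, 0)

snapshot = QueueSnapshot(4, 10, 3, 5, 6, spin_up_seconds=300, sla_seconds=30)
print(extra_workers_needed(snapshot, current_workers=10))
```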
39. Scaling Down
If the rate of incoming and in-process jobs is less than current processing capacity, then we’re in a scale-down state.
Identify instances that are not processing jobs. Then identify those within 15 minutes of a billing-hour boundary.
[Diagram: instances labeled Active, Hold, and Scale Down]
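The scale-down rule can be sketched as a classification over workers. The mapping of the diagram labels is an interpretation: busy instances are Active, idle instances near the end of their billing hour are Scale Down, and other idle instances are Hold. The `Worker` type and its fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    instance_id: str
    busy: bool
    minutes_into_billing_hour: float

def minutes_left_in_hour(w: Worker) -> float:
    """Minutes remaining before the next billing-hour boundary."""
    return 60 - (w.minutes_into_billing_hour % 60)

def classify(w: Worker, scaling_down: bool, window_minutes: float = 15) -> str:
    """Label a worker Active, Hold, or Scale Down under the rule on the slide."""
    if w.busy:
        return "Active"
    if scaling_down and minutes_left_in_hour(w) <= window_minutes:
        return "Scale Down"
    return "Hold"  # idle, but its current hour is already paid for

workers = [
    Worker("i-a", busy=True, minutes_into_billing_hour=50),
    Worker("i-b", busy=False, minutes_into_billing_hour=50),
    Worker("i-c", busy=False, minutes_into_billing_hour=20),
]
for w in workers:
    print(w.instance_id, classify(w, scaling_down=True))
```

Holding idle instances until late in the hour keeps already-paid capacity available in case demand spikes again.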
40. But what if there’s a deficit of Spot capacity?
Operate two Auto Scaling groups (ASGs) for each backend worker pool
One Spot ASG and one On-Demand ASG
When actual Spot capacity < desired capacity, offload to On-Demand
[Diagram: Automatic Speech Recognition worker pool backed by a Spot ASG and an On-Demand ASG]
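The offload rule amounts to filling any Spot shortfall from the On-Demand group. A minimal sketch follows; the function name is hypothetical, and the step of applying the result to the real Auto Scaling groups is left out.

```python
def split_capacity(desired: int, spot_in_service: int) -> dict:
    """Cover desired capacity with Spot first, then On-Demand for the shortfall."""
    if desired < 0 or spot_in_service < 0:
        raise ValueError("capacities must be non-negative")
    spot_used = min(spot_in_service, desired)
    return {"spot": spot_used, "on_demand": desired - spot_used}

# Spot can only supply 6 of the 10 workers wanted, so 4 come from On-Demand.
print(split_capacity(desired=10, spot_in_service=6))
```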
42. Move to Spot Fleet
Ability to launch the most cost-efficient instance type for any job
Lower prices with diversified resources
Ability to apply custom weighting (create capacity units based on our app needs)
Challenge: no accounting for the cost of EBS
Challenge: lacking ASG’s health checks
Challenge: lacking ASG’s tag propagation
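Custom weighting lets instance types be compared by price per capacity unit rather than price per instance. The sketch below illustrates that comparison; the Spot prices and weights are hypothetical, and a weight here stands for the number of capacity units (for example, parallel workers) a type provides.

```python
import math

def cheapest_per_unit(offers: dict) -> str:
    """Pick the instance type with the lowest price per capacity unit.

    offers maps instance type -> (hourly price, weight in capacity units).
    """
    return min(offers, key=lambda t: offers[t][0] / offers[t][1])

def instances_needed(target_units: int, weight: int) -> int:
    """Number of instances of a given weight required to reach the target capacity."""
    return math.ceil(target_units / weight)

# Hypothetical Spot prices and weights.
offers = {
    "c3.2xlarge": (0.10, 2),
    "c3.4xlarge": (0.18, 4),
    "c3.8xlarge": (0.40, 8),
}
best = cheapest_per_unit(offers)
print(best, instances_needed(10, offers[best][1]))
```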
43. From Immutable to Dynamic Instance Configuration
Today: Immutable
Pro: Spin up instances quickly
Con: Could be more cost-efficient
Future: Dynamic
Choose the best Availability Zone and instance type based on market price
Need to account for different processing capacity of different instance types
Will need to optimize the number of workers run in parallel on each VM
Substantial cost savings potential
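Sizing the number of parallel workers per VM could be as simple as dividing the instance's resources by what one worker needs. The per-worker requirements below (2 vCPUs, 4 GiB) are hypothetical; the instance sizes are the published c3 specifications.

```python
def workers_per_vm(vcpus: int, memory_gib: float,
                   cpu_per_worker: int = 2, mem_per_worker_gib: float = 4.0) -> int:
    """Workers that fit on a VM, limited by whichever resource runs out first."""
    by_cpu = vcpus // cpu_per_worker
    by_memory = int(memory_gib // mem_per_worker_gib)
    return max(min(by_cpu, by_memory), 0)

# c3.2xlarge: 8 vCPUs, 15 GiB; c3.4xlarge: 16 vCPUs, 30 GiB.
print(workers_per_vm(8, 15))
print(workers_per_vm(16, 30))
```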
44. Subdividing Jobs
Today: Painful to cancel a 90% complete, 30-minute OCR indexing job
Future: Subdivide job for grid processing
Grid processing minimizes impact of Spot Instance loss
Also allows greater parallelization for faster user-visible time to task completion
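The benefit of subdividing can be quantified with a small sketch. It assumes chunks run sequentially and that completed chunks are persisted, so only the in-progress chunk is lost on revocation. The 5-minute chunk size is a hypothetical choice.

```python
def split_job(total_seconds: int, chunk_seconds: int) -> list[tuple[int, int]]:
    """Divide a job into (start, end) ranges of at most chunk_seconds each."""
    return [(start, min(start + chunk_seconds, total_seconds))
            for start in range(0, total_seconds, chunk_seconds)]

def work_lost(chunk_seconds: int, revoked_at: int) -> int:
    """Seconds of processing to redo if the instance is revoked at revoked_at."""
    return revoked_at % chunk_seconds

TOTAL = 30 * 60                  # 30-minute OCR indexing job from the slide
revoked_at = int(TOTAL * 0.9)    # revoked when 90% complete

print(work_lost(TOTAL, revoked_at))  # monolithic job: all progress is lost
print(work_lost(300, revoked_at))    # 5-minute chunks: only the current chunk is lost
print(len(split_job(TOTAL, 300)))    # number of chunks available to run in parallel
```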
47. Scenarios Spot has Unlocked for Panopto
Scale our inside-video search technology across our entire customer base.
Accelerate business growth. The money saved with Spot is being reinvested in expanding our team.