How the Air Force leveraged 50+ years of accumulated data in its data lake to synthesize previously approved analyses rather than building new analysis from scratch.
https://www.tamr.com/air-force-data-assets
2. Ensuring Flight Safety
F-16 unloaded vs. F-16 with many mounted “stores”: missiles, bombs, fuel tanks, sensors, etc.
New store configurations lead to many possible problems, including:
Unsuccessful separation:
https://www.youtube.com/watch?v=fPTnmZ_HPAs&t=68s
Aeroelastic flutter: https://www.youtube.com/watch?v=qpJBvQXQC2M&t=44s
3. AF SEEK EAGLE Office
• Mission:
We deliver war-winning capability by efficiently evaluating the integration of the state-of-the-art weapons
on current and future generation aircraft, providing accurate combat weapon delivery software, while
serving as responsible stewards of our nation's resources.
• Vision:
Be the most agile, trusted, and responsive provider of innovative and cost-effective war-winning weapons
integration and mission planning solutions in the DoD.
8. Aligned with Air Force Chief Data Office
“Data is the future of our force,” Crider said. “Unlocking and unleashing the
power of our data is going to keep the Air Force at the forefront of
technological advancement. We must take advantage of today’s technology
so we can learn faster than our adversaries and ensure the maximum
effectiveness of our force.”
http://www.af.mil/News/Article-Display/Article/1448828/af-chief-data-officer-data-is-the-future-of-the-force/
AF CDO Core Goals
• Organizing and storing data
• Finding data
• Making it accessible
• Linking it
• Making it trustworthy
• Providing an environment to access, view, filter it
• Understanding it
AFSEO CDO Immediate Priorities
“Exposing engineering data as a strategic asset”
• Data accessibility
• Support appropriate heterogeneity
• Data governance
• Automation w/ machine learning
• Digitization
9. AFSEO Data Challenges
• Technology and processes management must go together
• Choosing the right technology for today and the future
• Supporting multigenerational office
• 200 TB of data from 50+ years
• Majority of data is unstructured in sundry formats
  - Includes a large video and simulations library
  - Final products are memos and reports in .doc and .pdf
• Infrastructure modernization ongoing
Background Issues
10. Data Lake Design
The data lake is an extensible workspace for data processing and application creation
Design Principles
• On premise, but cloud ready
• Open Architecture - “Best in Breed”
• Scalable - for the future
• Configurable & adaptable
• Interoperable with internal legacy AFSEO tools
• Secure, using existing role based access control (at the file level)
• Correct results without massive investment of time or money
Key Technologies
• App Layer: “Data unification powered by machine learning”
• Storage: “Scale out network-attached storage platform”
• Microservices: “Software platform for data engineering at scale”
11. Data Lake Products
INPUT
• New SEEK EAGLE request
AUTOMATED ANALYSIS
• Metadata extraction
• Discipline-specific logic
• Machine learning models for each discipline
AUTOMATED OUTPUTS
• Numerical predictions
• Recommendation with sources cited
PRODUCTIVITY TOOLS
• Document catalog powered by clean, consistent metadata
• Interactive antecedent browser
12. Data Lake Architecture
[Architecture diagram; labeled components:]
• Storage: Dell/EMC Isilon, exposed over HDFS and over CIFS (Windows File Browser)
• Linux VMs running Cloudera and microservices: Hue, Cloudera Manager, HBase, Solr, Spark, Banana, ...
• DBs
• App 1: File Catalog & Metadata Tagging
• App 2: Recommendation Engine
• App 3: Deep Predictor
• Tamr Entity Resolution
• App API (future)
• AD, LDAP and Kerberos integrated with Cloudera & Isilon, and by extension Tamr
• Connected to other AFSEO DBs and apps
13. Data Lake Architecture
An aside: two search stacks built on Lucene
• Search: ElasticSearch | Solr
• Viz: Kibana | Banana
• Stack: E(L)K Stack | Solr equiv.
15. • A Data Lake offers multi-protocol access, with tiering for workloads that have different performance requirements.
• Data written through any file-sharing protocol is accessible from all other protocols
• Multiprotocol access is controlled through AD, LDAP, NIS and other providers
PowerScale Supported Protocols
• SMB – Microsoft OS
• NFS – Linux / Unix
• HTTP
• REST
• SWIFT
• S3 – Object Storage – Cloud Option
• HDFS – Hadoop
• NDMP – Backup & Recovery
• FTP
Flexibility makes AI an integral part of IT
17. App 1: Searchable File Catalog
Outcomes
● Improve productivity by making 50 years of test data and SEEK EAGLE analyses easily searchable
● Surface key references and connections buried in documents
● Tagging >131 TB of data with 4.3 billion descriptive labels in 30+ tag types (aircraft, stores, author, file type, etc.)
● Analyst uses browser to search and filter by tag
Prior State
● Research wastes engineer time: digging through old files takes hours or days (“limited data accessibility”)
● Siloed data: if one department needs help from another, they have to ask a human (“limited data sharing”)
● Disorganization leads to repeated work: finding the right test from 10 years ago is so difficult that engineers repeat the experiment (“limited data usage”)
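The tagging step behind the catalog can be illustrated with a minimal sketch. It assumes a simple rule-based extractor; the slides do not describe how tags are actually produced. The patterns, the `normalize` helper, the file path and the example designations are all hypothetical and cover only three of the 30+ tag types mentioned (aircraft, stores, file type).

```python
import re
from pathlib import PurePosixPath

# Hypothetical vocabularies for illustration only; a real catalog would
# draw these from curated reference data, not hard-coded patterns.
TAG_PATTERNS = {
    "aircraft": re.compile(r"\bF-?(15|16|22|35)[A-Z]?\b", re.IGNORECASE),
    "store": re.compile(r"\b(AIM|AGM|GBU)-?\d+[A-Z]?\b", re.IGNORECASE),
}


def normalize(label):
    """Upper-case a matched label and insert the hyphen if it is missing."""
    label = label.upper()
    return re.sub(r"^([A-Z]+)-?(\d)", r"\1-\2", label)


def extract_tags(path, text):
    """Return a dict of tag type -> set of normalized tag values."""
    # The file type tag comes from the extension, lower-cased.
    tags = {"file_type": {PurePosixPath(path).suffix.lstrip(".").lower()}}
    for tag_type, pattern in TAG_PATTERNS.items():
        found = {normalize(m.group(0)) for m in pattern.finditer(text)}
        if found:
            tags[tag_type] = found
    return tags


tags = extract_tags(
    "reports/flutter_memo.PDF",
    "Flutter analysis of f16c carrying AIM-120C and gbu12.",
)
```

Normalizing spellings such as `f16c` and `gbu12` to one canonical label is what lets differently written documents land under the same tag.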
18. App 1: Searchable File Catalog
● Maintain this system alongside the users’ folders
● One click opens a file locally
● Metadata tags are comprehensive & ambitious
● Amazon-like search & filter mix
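The "search & filter" behavior can be sketched as faceted filtering over tagged records. This is an illustration under stated assumptions, not the catalog's implementation (the architecture slide lists Solr, which provides faceting natively). The `CATALOG` records, paths and tag names below are invented for the example.

```python
from collections import Counter

# Hypothetical catalog records: each file carries a set of values per tag type.
CATALOG = [
    {"path": "memos/flutter_1998.pdf",
     "tags": {"aircraft": {"F-16"}, "discipline": {"flutter"}, "file_type": {"pdf"}}},
    {"path": "reports/separation_2004.doc",
     "tags": {"aircraft": {"F-16"}, "discipline": {"separation"}, "file_type": {"doc"}}},
    {"path": "reports/loads_2010.pdf",
     "tags": {"aircraft": {"F-15"}, "discipline": {"loads"}, "file_type": {"pdf"}}},
    {"path": "memos/flutter_2012.pdf",
     "tags": {"aircraft": {"F-16"}, "discipline": {"flutter"}, "file_type": {"pdf"}}},
]


def search(catalog, **filters):
    """Return records carrying every requested tag value (AND across tag types)."""
    return [
        rec for rec in catalog
        if all(value in rec["tags"].get(tag_type, set())
               for tag_type, value in filters.items())
    ]


def facet_counts(records, tag_type):
    """Count how many of the given records carry each value of one tag type."""
    counts = Counter()
    for rec in records:
        counts.update(rec["tags"].get(tag_type, set()))
    return counts


f16_hits = search(CATALOG, aircraft="F-16")
f16_pdfs = search(CATALOG, aircraft="F-16", file_type="pdf")
disciplines = facet_counts(f16_hits, "discipline")
```

The facet counts are what a shopping-style sidebar would display next to each filter, so the analyst can see how many documents remain before narrowing further.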
19. App 2: Recommendation Engine
Prior State
● Engineers from each discipline touch every request
● 90% of requests will be “by analogy”
● Even planning is hard
Outcomes
● 80% of requests should be fully automated: engineer is just editor of automatically produced product
● If the judgement is “too close to call”, shows relevant historical document as a place to start
● Encoded some of the veterans’ instincts in the logic of the app
● Makes it easier for the program management office to plan
[Diagram labels: Requirement; Loads, EMI, Stability/Control, Flutter, Separations, Mission planning; Recommendation]
20. App 2: Recommendation Engine
INPUT
• New SEEK EAGLE request
REFERENCE DATA
• ACES, Standard Rationale, STAMP
HISTORICAL DATA
• Approved configs, Eng. data, Tech Order
DATA PROCESSING
• Entity Resolution, NLP, Transformations
SCORING
• Tolerance Check
• Configuration Comparator
Certify by Analogy?
✅ Produce Publishable Documents
• Flight Limits
• Engineering Rationale with sources cited
🆇 Human review and/or testing needed
• Historical data browser, sorted by similarity
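The scoring and decision steps in this flow can be illustrated with a minimal sketch. Everything specific here is an assumption made for the example: the `Config` fields, Jaccard similarity as the configuration comparator, a 5% mass tolerance, and the 0.8 similarity threshold. The slides name the steps but not their logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    stores: frozenset      # hypothetical store identifiers mounted on the aircraft
    total_mass_kg: float   # hypothetical stand-in for the engineering data compared


def similarity(a, b):
    """Configuration comparator: Jaccard similarity of the two store sets."""
    union = a.stores | b.stores
    return len(a.stores & b.stores) / len(union) if union else 1.0


def within_tolerance(new, old, rel_tol=0.05):
    """Tolerance check: total mass within rel_tol of the approved configuration."""
    return abs(new.total_mass_kg - old.total_mass_kg) <= rel_tol * old.total_mass_kg


def recommend(new, approved, min_similarity=0.8):
    """Certify by analogy if a close approved config exists, else ask for review."""
    # The ranked list doubles as the "historical data browser, sorted by similarity".
    ranked = sorted(approved, key=lambda old: similarity(new, old), reverse=True)
    names = [c.name for c in ranked]
    for old in ranked:
        if similarity(new, old) >= min_similarity and within_tolerance(new, old):
            return {"decision": "certify_by_analogy", "source": old.name, "ranked": names}
    return {"decision": "human_review", "source": None, "ranked": names}


approved = [
    Config("A-001", frozenset({"tank", "missile_a", "bomb_b"}), 1000.0),
    Config("A-002", frozenset({"tank", "missile_a"}), 700.0),
]
close_match = recommend(Config("N-1", frozenset({"tank", "missile_a", "bomb_b"}), 1020.0), approved)
no_match = recommend(Config("N-2", frozenset({"tank", "sensor_c"}), 500.0), approved)
```

Returning the cited source and the ranked antecedents in both branches mirrors the slide: an automated recommendation names its source, and a "too close to call" case still gives the engineer a place to start.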
22. App 3: ML Predictions (Flutter)
Prior State
● Time intensive: every request results in 10,000s of “download configurations”
● Manual: physics-based simulations used heavily, but non-trivial interpretation still needed
● Testing data is sparse: 30+ years of flight tests, but most configs aren’t tested due to high cost
● Expert-driven: good judgment requires years of training
Outcomes
● Tamr predicts amount of oscillation as accurately as possible at every possible flight condition
● The app automatically generates a flight envelope for each new SEEK EAGLE Request.
● Tamr shows most similar previous flight tests and confidence intervals for human check
● The Tamr output allows for fewer and more targeted flight tests.
23. App 3: ML Predictions (Flutter)
Data lake used to predict aeroelastic flutter from first principles
Automatically produced “flight envelope” for new configuration
DataRobot used to automate ML tuning process
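How per-condition predictions might be turned into a flight envelope can be sketched as follows. This is a hedged illustration, not the app's method. It assumes each flight condition is a (Mach, altitude) pair, that the model returns a predicted oscillation amplitude with a confidence half-width, and that a condition is acceptable when the upper bound stays below a limit. The grid values, units and the limit are invented.

```python
def flight_envelope(predictions, limit):
    """Return the set of conditions whose upper confidence bound is below limit.

    predictions maps (mach, altitude_ft) -> (predicted_amplitude, half_width).
    """
    return {
        cond for cond, (amp, half_width) in predictions.items()
        if amp + half_width < limit
    }


def max_safe_mach(predictions, limit):
    """Per altitude, the highest Mach reachable without leaving the envelope."""
    safe = flight_envelope(predictions, limit)
    machs_by_altitude = {}
    for mach, alt in sorted(predictions):          # ascending Mach within each altitude
        machs_by_altitude.setdefault(alt, []).append(mach)
    result = {}
    for alt, machs in machs_by_altitude.items():
        best = None
        for mach in machs:
            if (mach, alt) not in safe:
                break                              # stop at the first unsafe condition
            best = mach
        result[alt] = best
    return result


# Hypothetical model output on a small grid (amplitude in arbitrary units).
predictions = {
    (0.6, 10000): (0.2, 0.1),
    (0.8, 10000): (0.5, 0.2),
    (0.9, 10000): (0.9, 0.3),
    (0.6, 20000): (0.1, 0.1),
    (0.8, 20000): (0.3, 0.1),
    (0.9, 20000): (0.6, 0.1),
}
envelope = flight_envelope(predictions, limit=1.0)
limits = max_safe_mach(predictions, limit=1.0)
```

Using the upper bound rather than the point prediction is a conservative choice: wide confidence intervals shrink the envelope and mark where a targeted flight test would add the most information.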
24. Outcomes
Result: AFSEO processes more configurations/year, giving the pilot more options for a mission
Impact: Increased ROI and effective utilization of Air Force aircraft
Result: AFSEO cycle time is reduced
Impact: Innovations reach the front lines sooner
Result: Better data usage → fewer test flights, fewer experiments in the wind tunnel, etc.
Impact: Cost saving for AFSEO
Result: Data lake not only reveals and organizes data, it converts data → insights
Impact: SEEK EAGLE Office is more productive
Result: Data is easier to find; Greybeard “instincts” have been codified
Impact: Reduce the onboarding time for new engineers