A Journey from Too Much Data to Curated Insights - ABD211 - re:Invent 2017
1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:Invent
Sysco Corporation
Wesley Story
VP, Sales Technology and Enterprise Technology Services
Navin Advani
Sr. Director - Enterprise Information Management
November 30, 2017
A Journey from Too Much Data to Curated Insights
2.
Sysco Corporation
An Overview
Sysco is the global leader in selling, marketing and distributing food products to restaurants, healthcare and educational facilities, lodging establishments and other customers who prepare meals away from home.
Sysco operates 197 distribution facilities and serves about half a million customers in 13 countries.
For Fiscal Year 2017, which ended July 1, 2017, Sysco generated sales of more than $55 billion.
(Map label: Costa Rica)
3.
The Motivation
3-year plan
4.
Setting The Stage
A change was needed
Current State Challenges
Lack of Analytical Capabilities: Lack of business analytical capabilities to analyze large-volume data across category management, customer insights, price simulations, etc.
Reporting Inconsistencies and Long Lead Times: Reporting standards are not defined; most reports and transactions are tailored to individual requests. Multiple data sources and systems create spaghetti data scenarios that lead to inconsistencies.
Creeping Cost of Ownership: Aged and siloed BI solutions and processes are slowly increasing the total cost of ownership in storage, infrastructure, maintenance and administration.
Scalability & Stability Issues: The reporting team is currently above capacity, with several thousand custom reports running. There are performance issues and reporting delays due to data loads causing instability.
Future State Goals
Enable Revenue Growth - Better enable business decisions through data visibility and consistency
Improve Operational Efficiency - Increase the efficiency of business processes through data management best practices
Enhanced Customer Experience - Deliver more intuitive information to our internal and external customers through a self-serve reporting model
Enterprise View of Data - Consolidated view of customer, supplier and product data from Sysco SUS and SAP broadline and specialty companies (Canada, Sygma, etc.) in one physical location
Reduce Total Cost of Ownership and Deliver Value Faster - Faster time to market for insights at a lower price
Benefits of Transition
• Provide accuracy, timeliness and fidelity to the BI reporting process
• Next-generation architecture that fosters innovation and reduces costs
• Change the BI consumption pattern, i.e. move from hindsight-driven to insight-driven reporting
• Take manual workload off the team and enable them to become data analysts rather than report creators
• Enable decommissioning of triplicated business applications and processes
5.
The Enablement
How did we get to insights that matter and support the plan?
Due to competitive market pressures, there was a big push to streamline operating costs. The three key areas below helped unlock savings and drive top-line growth and market share.
The three-year plan was enabled by quick, actionable insights derived using tools like Tableau.
Merchandising
Initiative: Category Management
Targeted Insights:
• Broker Performance
• Category Attribute Analysis
• Category Conversion
• Category Compliance
• Innovation Items Scorecard
• Marketing associate compliance
Supply Chain
Initiative: Operational Data Insights
Targeted Insights:
• Inbound & Outbound Productivity
• Cost per Piece
• Service Level
• Warehouse Efficiency
• Driver/Delivery Scorecards
Sales & Margin Management
Initiative: Revenue Management, Opportunity Tracking and Cost to Serve
Targeted Insights:
• eCommerce Penetration and Adoption
• Opportunity Tracker
• Price Management Tool
• Deal Manager
6.
Insights That Matter
Some of the Descriptive Insights
• Cost Per Piece dashboard
• Summary view of comparison results
• Allows comparison to plan and prior year (PY)
• Provides the ability to drill down to department (Warehouse/Delivery/Maintenance)
Category Management
Price Optimization
Operational Productivity Measures
7.
A Transformation
Holistic approach across three pillars
The roadmap consisted of improvements across the three dimensions of people,
process and technology in order to achieve a successful transformation.
The data and analytic needs at Sysco have been morphing over the last few years driven by
our digital transformation. These needs have directly driven our analytic capabilities roadmap.
PEOPLE
- Centralization & restructuring of the BI org
- Strategic insourcing of key roles
- Training, re-tooling for individual and team growth
PROCESS
- Adoption of an Agile delivery model
- Data Governance
- Continuous process improvements
- Change management to help with adoption
TECHNOLOGY
- Additional capability at a lower cost
- Consolidate toolsets
- Easier access to non-USBL data
- Stabilize the existing platform
Business Value Derived from Data & Analytics
8.
The How
9.
Overview of reporting & analytics
Capability maturity
What happened
Why it happened
What will happen and actions we should take
Operational Management
Decision Support
Data Science
Formatted Reporting
Parameterized Reporting
Guided Exploration
Exploratory Analysis
Predictive & Prescriptive
AI/ML
10.
SEED
Sysco Ecosystem for Enterprise Data – An Overview
What is SEED (Sysco Ecosystem for Enterprise Data)?
• SEED is an AWS-based ecosystem that allows Sysco to unlock the value from our data and drive our analytics journey forward, while also modernizing our technology landscape to enable scalable, enterprise-wide data discovery and insights
• SEED is envisioned to scale with evolving business needs and provides a foundation for data governance and data security
• SEED, being cloud native, also helps drive our Data Science and Agile journeys forward with the ability to quickly stand up sandbox environments for experimentation
Why SEED?
• Demand-driven model with predictable and affordable costs
• Stabilization of environments reduces cost of delivery over time
• Broad and deep functionality to support various use cases within data and analytics
• Improved agility and quality with powerful tools for data manipulation and migration
11.
Ecosystem versus EDW
Consolidated focus on data ingestion, consumption, and need for new capabilities led us to the ecosystem approach.
Architecture simplification (ingestion, consumption, and new capabilities)
• Movement from a capacity-driven model to a demand-driven model for predictable costs
• Handle mixed loads by offloading processing (ETL) to a distributed environment
• Simplify and regulate data movement across systems
• Allow for addition of data types from transactions, interactions, and observations that are currently not in the EDW
• Usage-driven consumption design patterns
Cost optimization
• CAP-EX and OP-EX reduction
• Sustainable support solution that allows for reduction in MS costs
• Reduction in number of tools to deploy and manage
User value
• High-value BI capabilities drive development of the data warehouse
• Timely access to data: hours/minutes versus multiple days/months
• Enablement of advanced analytics
Enhanced reliability and accuracy
• Accurate data delivered via a repeatable process
• Errors identified and corrected before business use
12.
SEED
Sysco Ecosystem for Enterprise Data – A Fit for Purpose Architecture
[Architecture diagram; labels recovered from the slide]
Data sources: WMS, IDS, DPR, Sales, Inventory, Master Data; SWMS
Operating companies shown: Sygma, Freshpoint
Layers: Ingestion/Collection Layer; Storage Layer; ELT / Compute Layer; Analyze Layer; Auditing and Monitoring Layer
Amazon S3 data stages: Raw data, Transformed Data, Reportable Data
AWS services shown: Amazon S3, Amazon Glacier (archive), AWS Lambda, Amazon EMR, AWS Data Pipeline, AWS Glue (post Phase II), Amazon Redshift, Amazon Redshift Spectrum, Amazon RDS, Amazon Athena, Amazon CloudWatch, AWS CloudTrail
Other components: Metastore, Extracts
Consumers: Other BI apps, Internal, External, Data Science
13.
Data Consumption
Interactive queries, ad hoc queries and data extract queries within each use case were evaluated and optimized for SLA requirements.
14.
Data ingestion
The focus was to accelerate loads and decouple from mixed-load scenarios impacting SLAs
[Data ingestion flow diagram; labels recovered from the slide]
Ingestion/Collection Layer: Raw data storage (Opco-1, Opco-2, Opco-3, Opco-4, ..., Opco-n)
Job Submission Layer: Pipeline submitter, Tracker
Resource Management: Create EMR Cluster, Terminate EMR Cluster
Data Processing / Compute: OPCO 1-3, OPCO 4, OPCO 5-6
Status labels: done, done, done
Metastore: OPCO Tracking Table
Storage/Persistent: Amazon Redshift (Transformed and Reportable data)
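The labels in the ingestion diagram (a tracker, an OPCO tracking table, and create/terminate steps for the EMR cluster) suggest a transient-cluster pattern: start compute only when work is pending, process operating companies (OPCOs) in batches, record each as done, then shut the cluster down. The following is a minimal Python sketch of that idea, not Sysco's implementation. The class `OpcoTracker`, the function `run_ingestion`, the status strings and the batch size are all hypothetical names chosen for illustration, and the cluster steps are recorded as plain events rather than calling any AWS API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical status values for the tracking table (not taken from the slides).
PENDING, DONE = "pending", "done"


@dataclass
class OpcoTracker:
    """In-memory stand-in for the OPCO tracking table kept in the metastore."""
    status: Dict[str, str] = field(default_factory=dict)

    def register(self, opcos: List[str]) -> None:
        # New OPCOs start as pending; already-known OPCOs keep their status.
        for opco in opcos:
            self.status.setdefault(opco, PENDING)

    def pending(self) -> List[str]:
        return [o for o, s in self.status.items() if s == PENDING]

    def mark_done(self, opco: str) -> None:
        self.status[opco] = DONE


def run_ingestion(
    tracker: OpcoTracker,
    batch_size: int,
    process: Callable[[str], None],
    events: List[str],
) -> None:
    """Process pending OPCOs in batches on a (simulated) transient cluster."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if not tracker.pending():
        return  # nothing to do, so no cluster is created
    events.append("create_cluster")
    try:
        while True:
            batch = tracker.pending()[:batch_size]
            if not batch:
                break
            for opco in batch:
                process(opco)
                tracker.mark_done(opco)
            events.append("batch_done:" + ",".join(batch))
    finally:
        # Always release compute, even if processing fails part-way.
        events.append("terminate_cluster")


if __name__ == "__main__":
    tracker = OpcoTracker()
    tracker.register([f"opco-{i}" for i in range(1, 7)])
    log: List[str] = []
    run_ingestion(tracker, batch_size=3, process=lambda opco: None, events=log)
    print(log)
```

Because progress lives in the tracking table rather than in the cluster, a rerun after a failure only picks up OPCOs that are still pending, which is one plausible reason for keeping the tracker in a separate metastore.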
15.
Data consumption: data extracts
Data consumption using fit for purpose technology went a long way in optimizing performance
• App data extract:
  • User performs a scheduled or on-demand data extract through an application, for example SAP BOBJ
  • Application extracts data on a schedule to populate a local RDBMS
• User scheduled extract: Data extraction by a user, usually via a SQL client, on a scheduled weekly or monthly basis
• User ad hoc extract through SQL: User executes a large-volume data extract query
16.
SEED
Enabling Analytics Needs
Analytical Use Cases for the Business
Revenue Management
• Margins review by market
• Predictive pricing simulations with external economic data
• Pass-through predictive pricing analysis at all levels of the organization
• Descriptive model for customer segmentation
Merchandising and Supply Chain
• Assortment optimization at scale
• Track vendor cost components of items
• Lotting using decision trees
• Forecast vendor price changes
• Market basket analysis
• Warehouse performance analysis
Marketing
• Share of wallet
• Machine learning for future promotions
• Cross-sell opportunity feeder
• Churn analysis
The capabilities of SEED enable advanced analytics use cases already defined and requested by the various functional areas.
SEED
• Analytical sandboxes
• Quicker time to market
• R integration
• Better performing retrievals
• Large data sets
• Unstructured data
17.
Scale Out of Tableau
How we stayed just ahead of the curve
Slow Dashboard Rendering
Memory Utilization reaching limits
Storage Limitations
Needed improved IOPS (Input/Output Operations Per Second)
Needed High Availability
Top most used Sites
Workbooks by Site
Proactive Monitoring and Growth Projection
18.
Current System Specifications
Primary Node
• EC2 Instance Type: r3.4xlarge
• Operating System: Windows 2012 R2
• vCPU: 16 (High Frequency Intel Xeon E5-2670 v2 (Ivy Bridge) processors)
• # Cores: 8
• RAM: 122 GB
Worker Nodes
• EC2 Instance Type: c4.2xlarge
• Operating System: Windows 2012 R2
• vCPU: 8 (High frequency Intel Xeon E5-2666 v3 (Haswell) processors optimized specifically for EC2)
• # Cores: 4
• RAM: 15 GB
On Prem: 2 Nodes, 16 Cores, 128 GB RAM
AWS: 3 Nodes, 16 Cores, 244 GB, 3000 IOPS
AWS: 6 Nodes, 40 Cores, 610 GB, 3000 IOPS
19.
Tableau Utilization Growth
Metric | 2014 | 2015 | 2016 | 2017 | Scale Out
Total number of Server Users | 64 | 1,700 | 3,860 | 12,713 | 20,000
Total number of Active Users | 64 | 1,100 | 1,375 | 5,825 | 12,000
Dedicated Core / vCPU capacity | 16 | 40 | 80 vCPU | 80 vCPU | 192 vCPU
Concurrent Users | 11 | 55 | 120 | 350 | TBD
Max Concurrency | 16 | 60 | 150 | 400 | 960
Number of Workbooks | 8 | 110 | 206 | 671 | TBD
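A few ratios make the utilization table easier to read. The Python sketch below copies the figures from the table and derives the share of active users, the maximum concurrency per unit of compute, and the growth factor between columns. The helper names are hypothetical, and the code assumes the 2014 and 2015 capacity figures (listed without a unit) can be compared directly with the later vCPU counts, which the slide does not confirm.

```python
from typing import Dict, List

# Column labels and figures copied from the utilization table.
COLUMNS: List[str] = ["2014", "2015", "2016", "2017", "Scale Out"]
TABLE: Dict[str, List[int]] = {
    "server_users": [64, 1700, 3860, 12713, 20000],
    "active_users": [64, 1100, 1375, 5825, 12000],
    # Assumption: early "core" figures are treated like the later vCPU figures.
    "capacity": [16, 40, 80, 80, 192],
    "max_concurrency": [16, 60, 150, 400, 960],
}


def active_share(col: int) -> float:
    """Fraction of server users that are active in the given column."""
    return TABLE["active_users"][col] / TABLE["server_users"][col]


def concurrency_per_capacity(col: int) -> float:
    """Max concurrency divided by dedicated core / vCPU capacity."""
    return TABLE["max_concurrency"][col] / TABLE["capacity"][col]


def growth_factor(metric: str, col: int) -> float:
    """Ratio of a metric to its value in the previous column."""
    if col < 1:
        raise ValueError("the first column has no previous value")
    return TABLE[metric][col] / TABLE[metric][col - 1]


if __name__ == "__main__":
    for col, label in enumerate(COLUMNS):
        print(
            f"{label:>9}: active share {active_share(col):.0%}, "
            f"max concurrency per capacity unit {concurrency_per_capacity(col):.2f}"
        )
```

One pattern that falls out of the numbers: the 2017 column works out to 5 concurrent sessions per vCPU (400 / 80), and the scale-out target keeps the same ratio (960 / 192). Whether the target was actually derived that way is not stated on the slide.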
20.
Benefits of moving to SEED on AWS
Scalability & Availability to meet Business Needs
Better Cost Leverage
Improved Capability
Security
Testing before implementation
Governance
21.
Where are we headed
Roadmap over the next 1 year
1–3 Months
(Consolidation and stabilization)
• Evaluate reporting patterns in detail and optimize query monitoring rules to refine priorities for optimal user experience
• Implement CI/CD to enable infrastructure spin-up and spin-down based on the amount of data processed, and to automate code delivery and testing
• Run end-to-end tests against real-life query scenarios to further optimize cost savings
• Agile transformation for the team to leverage cloud enablement
3–6 Months
(Stabilization and optimization)
• Build a universal SEED metadata catalogue by merging the catalogues in Athena and Amazon RDS
• Build AWS Glue crawlers to crawl and catalog existing data residing in various places within Sysco
• Leverage AWS Glue ETL for all PySpark ETL jobs (alleviates the dependency on maintaining and sizing EMR clusters)
• Push cold data to Amazon Redshift Spectrum to realize further cost benefits
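Pushing cold data out to Redshift Spectrum first requires deciding which data counts as cold. The slides do not say how Sysco made that call; the Python sketch below assumes one simple rule, a cutoff on the date each partition was last queried. The function `split_hot_cold`, the partition names and the 90-day window are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def split_hot_cold(
    last_queried: Dict[str, date], today: date, hot_days: int
) -> Tuple[List[str], List[str]]:
    """Split partitions into hot and cold by when they were last queried.

    A partition queried within the last `hot_days` days stays hot (kept in the
    warehouse); anything older is a candidate for cheaper external storage.
    """
    cutoff = today - timedelta(days=hot_days)
    hot = sorted(p for p, last in last_queried.items() if last >= cutoff)
    cold = sorted(p for p, last in last_queried.items() if last < cutoff)
    return hot, cold


if __name__ == "__main__":
    # Hypothetical partitions and access dates, for illustration only.
    usage = {
        "sales_2015": date(2017, 1, 15),
        "sales_2016": date(2017, 9, 30),
        "sales_2017": date(2017, 11, 29),
    }
    hot, cold = split_hot_cold(usage, today=date(2017, 11, 30), hot_days=90)
    print("keep in warehouse:", hot)
    print("candidates for external storage:", cold)
```

In practice the access dates would come from query logs, and the window would be tuned against the SLA requirements of the reports that touch older data.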
6 Months–1 Year
(Optimization and acceleration)
• Migrate data collection to AWS Glue/Amazon EMR
• Migrate Redwood jobs to AWS Data Pipeline, in keeping with our use of AWS managed services
• Based on business priorities, convert key components of the solution from batch mode to streaming mode
• Continue to expand the data repository solution to include more data sources across Sysco for cross-functional analysis across various domains
22.
THANK YOU!