Raymond Fu gave a presentation on building an enterprise analytics platform at the SoCal Data Science Conference. He has 16 years of experience in big data, business intelligence, and enterprise architecture. He discussed how big data disrupts traditional architecture and demands new skill sets. Advanced analytics involves creating predictive models with machine learning and related tools to drive strategic and operational decisions. An enterprise analytics strategy covers information management, data platform modernization, and the operationalization of advanced analytics models. Fu outlined the key capabilities needed for data management, analytics creation, and analytics operationalization, and provided examples of reference architectures and services for building an enterprise analytics platform.
2. Raymond Fu
Practice Architect, Trace3
16 years of IT experience specializing in big data, business intelligence, and
enterprise architecture. 10-year corporate career with Bank of America,
highlighted by leading many data integration and warehousing initiatives arising
from mergers and acquisitions.
Founded his own technology company, Xceed Consulting Group, in 2012 to enable
data-driven solutions.
Joined California-based consulting company Trace3 in 2016 as a practice architect
for the Data Intelligence team.
Blog: Everything About Data
Twitter: @RaymondxFu
3. Big Data Disruption
• Traditionally, organizations had a firm grasp of the People, Process, and
Technology required to deliver capabilities, articulate an end-to-end roadmap,
and identify platforms and resources.
• Big Data disrupts the traditional architecture paradigm. Organizations may have
an idea or interest, but they don’t necessarily know what will come out of it.
• The answer to an initial question triggers the next set of questions. Pursuing
them requires a unique combination of skill sets that are new and scarce.
• The pursuit of the answer is advanced analytics.
4. Advanced Analytics Definition
• The process, tools, technology, and collaboration to create predictive
models that enable/drive strategic and operational decisions. The
predictive models (1) generate insights and hypotheses and (2) test/score
them through experiments, so organizations KNOW what works better.
• Predictive models are created using machine learning, deep learning,
advanced data management tools, and visualization tools.
• An integral part of Advanced Analytics is operationalizing the predictive
models so they can be rapidly scored and used for decisioning at scale.
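To make "test/score them through experiments" concrete, here is a minimal sketch of one common way to check whether a model-driven treatment beats a control: a two-proportion z-test on conversion counts. The function name, the sample counts, and the choice of test are illustrative assumptions, not part of the presentation; the test relies on the normal approximation, so it assumes reasonably large samples.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Compare conversion rates of a control (A) and a model-driven variant (B).

    Hypothetical helper. Returns (z statistic, two-sided p-value) using the
    pooled-variance normal approximation.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates of 0% or 100% cannot be tested this way")
    z = (p_b - p_a) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical experiment with 10,000 customers in each arm.
    z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=580, n_b=10_000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference is unlikely to be chance alone; the business still decides whether the lift is large enough to act on.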
6. Advanced Analytics Process
Data management (IT/DE, DS)
• Connectivity
• Landing
• Ingestion
• Knowledge
• Preparation
Analytics creation, i.e. business modeling (LoB, DS)
• Domain knowledge
• Hypothesis development
• Model architecture
• Algorithm selection and development
• Feature engineering
• Visualization
• Data mining
• Statistical data shaping
• Training
• Cross-validation testing
• Environment and libraries
• Cross-cutting: Collaboration, Reproducibility
Analytics operationalization, i.e. model production and deployment (DS, IT/DE, LoB)
• Production feature generation, modeling, and testing
• Deployment
• Parallel experiments
• Performance assessment
• Continuous integration and deployment
• Model iteration and redeployment
• Real-time and batch scoring
• Decisioning
Organization and business impact (LoB, DS, IT/DE)
• Business metric assessment
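Cross-validation testing, listed under analytics creation, can be sketched with k-fold splitting: each fold is held out once while the model is fit on the remaining data. This minimal example assumes a one-feature linear model fit by ordinary least squares and scores each fold by mean squared error; the helper names are hypothetical, and a mature ML library would normally provide this in practice.

```python
import random
from statistics import mean


def k_fold_indices(n: int, k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle row indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b * x with a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def cross_validate(xs: list[float], ys: list[float], k: int = 5, seed: int = 0) -> list[float]:
    """Return the held-out mean squared error for each of the k folds."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in test_idx]
        scores.append(mean(errors))
    return scores


if __name__ == "__main__":
    xs = [float(x) for x in range(20)]
    ys = [2 * x + 1 + random.Random(1).gauss(0, 0.5) for x in xs]  # toy noisy data
    fold_mse = cross_validate(xs, ys, k=5)
    print("per-fold MSE:", [round(s, 3) for s in fold_mse])
```

Reporting the spread of per-fold scores, not just their average, shows how stable a candidate model is before it moves toward production.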
7. Enterprise Big Data Strategy
• Information management
  • Data architecture, data governance, and metadata management
  • Address key issues such as data integration and data quality
• Data platform modernization
  • Enterprise data warehouse offload
  • Data lake platform assessment
• Advanced Analytics
  • Methodology
  • Tools recommendation
  • Operationalization
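Data quality is named as a key information management issue. As a minimal sketch, assuming records arrive as Python dictionaries, a profiling step can measure the completeness of required fields and flag duplicate business keys, two problems that often surface when sources are integrated. The function, field, and key names are hypothetical.

```python
from collections import Counter


def profile_quality(records: list[dict], key: str, required: list[str]) -> dict:
    """Hypothetical data quality profile for one batch of records.

    completeness: share of records with a non-empty value for each required field
    duplicate_keys: key values that occur more than once, in first-seen order
    """
    if not records:
        raise ValueError("cannot profile an empty batch")
    total = len(records)
    completeness = {
        name: sum(1 for r in records if r.get(name) not in (None, "")) / total
        for name in required
    }
    key_counts = Counter(r.get(key) for r in records)
    duplicate_keys = [k for k, count in key_counts.items() if count > 1]
    return {"rows": total, "completeness": completeness, "duplicate_keys": duplicate_keys}


if __name__ == "__main__":
    # Toy customer records merged from two hypothetical source systems.
    customers = [
        {"customer_id": 101, "email": "a@example.com"},
        {"customer_id": 102, "email": ""},
        {"customer_id": 101, "email": "a.alt@example.com"},
        {"customer_id": 103},
    ]
    print(profile_quality(customers, key="customer_id", required=["email"]))
```

Running such checks at ingestion time, and tracking the numbers per batch, gives data governance something measurable to act on.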
8. Enterprise Architecture Approach
• Step 1 – Establish Business Context and Scope (incubate ideas)
• Step 2 – Establish an Architecture Vision
• Step 3 – Assess the Current State
• Step 4 – Establish Future State and Economic Model
• Step 5 – Develop a Strategic Roadmap
• Step 6 – Establish Governance over the Architecture
9. Establishing an Architecture Vision
The architecture development process needs to be more fluid than a traditional,
SDLC-like architecture process. It must allow organizations to continuously assess
progress, correct course where needed, balance cost, and gain acceptance.
10. Advanced Analytics Capabilities
Organization and business impact
  Fast, informed decisions
    • Time from question to hypothesis to model implementation to informed decision
  Strategic and operational role
    • Degree of input into business/policy decisions
    • Perceived and quantified value of analytics
Analytics operationalization
  Model performance
    • Execution of experiments in parallel
    • Model performance for scoring and decisioning
  Model deployment
    • Continuous integration and deployment
Analytics creation
  Efficient model creation
    • Use of data mining and visualization tools
    • Rapidly provisioned environments, customized to individual data scientists, that can
      handle large data sets and computationally intensive algorithms
    • Collaboration among data scientists and between data scientists and lines of business;
      reuse of data sets and models
    • Model reproducibility (including versions, algorithms, data sets, parameters, notes, environment)
  Appropriate model selection
    • Understanding and appropriate use of model architecture and algorithms, feature engineering,
      hyperparameter tuning, statistical and mathematical concepts, training and validation,
      scoring, and decisioning
    • Use of ML and DL concepts, tools, and libraries
    • Use of graph systems
Data management
  Data capability
    • Infrastructure and tools to access and cleanse data
  Data knowledge and confidence
    • Understanding of, and confidence in, the data (e.g., what is available and how data sets relate)
  Data access
    • Access to internal and external data through infrastructure, logical associations, and tools
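Model reproducibility is listed as covering versions, algorithms, data sets, parameters, notes, and environment. Below is a minimal sketch of a run record that captures those fields, assuming the training data can be fingerprinted with a SHA-256 content hash and the record stored as JSON. The class and field names are hypothetical and do not represent any particular experiment-tracking tool.

```python
import hashlib
import json
import platform
from dataclasses import asdict, dataclass, field


def fingerprint(data: bytes) -> str:
    """SHA-256 content hash, so the exact training data can be verified later."""
    return hashlib.sha256(data).hexdigest()


def current_environment() -> dict:
    """Capture a minimal description of the runtime environment."""
    return {"python": platform.python_version(), "platform": platform.platform()}


@dataclass
class ModelRunRecord:
    """Hypothetical reproducibility record for one training run."""
    model_name: str
    model_version: str
    algorithm: str
    dataset_fingerprint: str
    parameters: dict
    notes: str = ""
    environment: dict = field(default_factory=current_environment)

    def to_json(self) -> str:
        # sort_keys gives a stable, diff-friendly serialization.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ModelRunRecord":
        return cls(**json.loads(text))


if __name__ == "__main__":
    training_bytes = b"customer_id,tenure,churned\n1,12,0\n2,3,1\n"  # toy data
    record = ModelRunRecord(
        model_name="churn_model",
        model_version="1.0.0",
        algorithm="logistic_regression",
        dataset_fingerprint=fingerprint(training_bytes),
        parameters={"C": 0.5, "max_iter": 200},
        notes="Baseline run; hypothetical example.",
    )
    print(record.to_json())
```

Storing one such record per run lets another data scientist confirm they are rebuilding the same model from the same data, which also supports the collaboration and reuse items above.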
13. (Reference architecture diagram)
Data sources
• Structured: Operation, CRM, ERP, Accounting
• Unstructured: Clickstream, Sensor Info, Images/Video, Event Logs, Social Media
Data integration and platforms
• ETL
• RDBMS, Teradata
• Big Data: HDFS, NoSQL, Cloud Storage
• Real-time Streaming
Consumption
• Business Intelligence / Data Visualization
• Advanced Analytics: Tools, Library (ML and DL, e.g., torch), Online ML
• Machine Learning API: Google Prediction, AWS, Azure, BigML, IBM Watson
• Cloud platforms: AWS, Azure
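The reference architecture pairs real-time streaming with online ML. As a hedged illustration of what "online" can mean here, the sketch below updates a logistic regression one event at a time with stochastic gradient descent, scoring each event before learning from it (test-then-train). It is plain Python with hypothetical names and is not tied to any of the libraries or cloud services named on the slide.

```python
import math


class OnlineLogisticRegression:
    """Hypothetical online learner: logistic regression updated per event via SGD."""

    def __init__(self, n_features: int, learning_rate: float = 0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict_proba(self, x: list[float]) -> float:
        """Probability of the positive class for one event."""
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        # Numerically stable sigmoid for large positive or negative z.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def learn_one(self, x: list[float], y: int) -> None:
        """One SGD step on log loss; its gradient with respect to z is (p - y)."""
        err = self.predict_proba(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def run_stream(model: OnlineLogisticRegression, events) -> float:
    """Test-then-train over a stream; returns accuracy of the pre-update scores."""
    correct = 0
    total = 0
    for x, y in events:
        predicted = 1 if model.predict_proba(x) >= 0.5 else 0  # score first
        correct += int(predicted == y)
        total += 1
        model.learn_one(x, y)  # then learn from the labeled event
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Toy stream: a single feature whose sign determines the label.
    stream = [([1.0], 1), ([-1.0], 0)] * 200
    model = OnlineLogisticRegression(n_features=1)
    print("prequential accuracy:", run_stream(model, stream))
```

Because the model never needs the full history in memory, the same loop can sit behind a streaming consumer; the trade-off is that each update sees only one event, so learning rates need care.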
14. Advanced Analytics Services
Overall assessment
• Advanced Analytics assessment
Architecture
• Architecture for data science
• Architecture for cloud analytics
ETL/ELT
• Data source identification and integration
• Data virtualization
• Data preparation
Data analysis and modeling (data science)
• Statistical / quantitative analysis
• Descriptive analysis
• Predictive modeling
• Machine learning
• Deep learning
• Graph systems
• Simulation and optimization
Visualization, insight presentation, and recommendations
• Data exploration / mining / advanced visualization to understand the data
• Insight presentation and recommendations
Tools recommendation
• Infrastructure
• Software tools
• Software environment, programming, libraries
Process improvement
• Analytics process improvement
• Data governance
• Model governance
• Continuous integration and deployment of models
Organizational capabilities
• Advanced analytics organization structure and roles
• Advanced analytics training
• Advanced analytics staff augmentation
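Continuous integration and deployment of models and model governance often meet in an automated promotion gate. The sketch below is one assumed form of such a gate, a champion/challenger check: a retrained model replaces the current one only if it improves a holdout metric by a minimum margin and stays within a latency budget. The metric (AUC), the thresholds, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    """Offline evaluation of one model version on a shared holdout set."""
    model_version: str
    holdout_auc: float
    p95_latency_ms: float


def should_promote(champion: Evaluation, challenger: Evaluation,
                   min_gain: float = 0.005, latency_budget_ms: float = 50.0) -> bool:
    """Hypothetical CI/CD gate: promote the challenger only if it clearly wins."""
    improves = challenger.holdout_auc >= champion.holdout_auc + min_gain
    fast_enough = challenger.p95_latency_ms <= latency_budget_ms
    return improves and fast_enough


if __name__ == "__main__":
    current = Evaluation("1.4.0", holdout_auc=0.80, p95_latency_ms=30.0)
    retrained = Evaluation("1.5.0", holdout_auc=0.82, p95_latency_ms=25.0)
    action = "deploy" if should_promote(current, retrained) else "keep current model"
    print(f"{retrained.model_version}: {action}")
```

Requiring a minimum gain keeps the pipeline from churning deployments on noise-level improvements, while the latency check protects real-time scoring.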
15. Best Practices
• Align Analytics with Specific Business Goals
• Ease Skills Shortage with Standards and Governance
• Optimize Knowledge Transfer with a Center of Excellence
• Top Payoff is Aligning Unstructured with Structured Data
• Plan Your Discovery Lab for Performance
• Align with the Cloud Operating Model