Big analytics best practices @ PARC
1. Confidential Think Big Analytics
Big Analytics Best Practices
An Executive Guide
September 26, 2012
Ron Bodkin
Founder and CEO
ron.bodkin@thinkbiganalytics.com
@ronbodkin
2. Confidential Think Big Analytics
Introduction
• One of Silicon Valley’s fastest-growing Big Data startups
• 100% Focus on Big Data consulting & Data Science solution services
• Management Background:
Cambridge Technology, C-bridge, Oracle, Sun Microsystems, Quantcast, Accenture
C-bridge Internet Solutions (CBIS) founder 1996 & executives, IPO 1999
• Clients: 40+
– Focuses: Technology, Financial Services, Retail, Advertising
• North America Locations
• US East: Boston, New York, Miami
• US Central: Chicago, Austin
• US West: HQ Mountain View, San Diego, Salt Lake City
Think Big is the leading professional services firm that’s purpose-built for Big Data.
8/17/2013
3. Confidential Think Big Analytics
Big Analytics, enabled by Big Data
Big Data invented to solve
web scale data challenges.
Opportunity and mandate
for enterprises to compete
with advanced analytics.
Now enabling new
businesses and products.
4. Confidential Think Big Analytics
The 7 Myths of Big Data
1. It’s just a new name for Business Intelligence.
2. The packaged applications are about to emerge.
3. The enterprise can wait.
4. Low cost, low skill staffing will work.
5. It’s simple to get results.
6. You can automate all the intelligence.
7. You can buy it all from a single vendor “stack.”
7. Confidential Think Big Analytics
360 Customer View Analytics
Trends
• Compute model scores faster
• Analyze full data sets
• Incorporate new data
• Build new services from data
Basic Reporting
Data Ingestion
Batch Processing
Fast Analytics
Data Enrichment
Data Science
8. Confidential Think Big Analytics
Social Media
“The digital transformation occurring at American Express cuts across many business units,
and it has to because of the breadth and depth of our business,” Leslie Berland, SVP of
Digital Partnerships and Development, explains. “From customer service to merchant
services to our entertainment and travel business units, to corporate affairs, as well as our
newly formed digital partnerships and development team, social media is a company-wide
initiative.”
Source: http://mashable.com/2012/03/28/american-express-social-media/
March 28, 2012
10. Confidential Think Big Analytics
Big Analytics Roadmap Methodology
Envision
Current State
Future State
Gap Analysis
Prioritized Initiatives
Key Decisions & Impact Analysis
Reference Architecture
Design Patterns
Technology Rankings
Organization & Training
Optimized Projects Selection
Big Analytics Roadmap & Recommendations
Analytic Platform Decision Tree
Data Strategy
Big Data Strategy Readiness Analysis
Technology Recommendations
Big Data Roadmap
11. Confidential Think Big Analytics
Data Strategy: Value from Integration
Ad Server
Mobile
Social
Web Site
Devices & Enterprise Applications
Outside Data (new)
13. Confidential Think Big Analytics
Organizing for Success
• Driven by collaboration between
data scientists, engineers and
business
• Leverages the manifest and latent
signal of multi-structured data
• Emphasizes exploratory analysis to
uncover novel topologies in the
data
• Boosts power with diverse
multivariate models and holistic
data sets
• Triangulates truth with multiple
approaches when problems are
intractable
14. Confidential Think Big Analytics
Need for New Skills
Database Administrator -> Big Data Administrator
Business Analyst -> Data Science Math Modeler
Data Architect -> Data Architect, Big Data Modeling
Developers -> Big Data Engineer
Invest and scale complementary skills to move to a data-centric organizational model.
• Include expert training, mentoring and joint solution development
16. Confidential Think Big Analytics
An Integrated Approach
Creating value with nimble, incremental innovation
Brainstorm
POC
Pilot
Deploy
Training
GTM
Partners
Clients
Industry
Analysts
Strategic
Technology
Business & Technology Requirements
Data Science & Analytics
Center of Excellence
Internal Solutions
External Solutions
QA Test Engineer
Risk Management
Big Data
Lab
Technology
Experts
Best in Class Analytics Sand Box
Monitoring
Open Source
Innovation
Business SMEs
Envision Education Engineering
Strategy Management, Development & Operations
Support & Performance Measurement
BUSINESS VELOCITY
Administration & Optimization
Big Data Strategy
Readiness Analysis
Technology
Recommendations
Big Data
Roadmap
17. Confidential Think Big Analytics
• Develop data and analytics
platforms that bridge the old
and new.
• Understand integration
patterns and use cases to
effectively guide new
initiatives.
• Partner with business on
opportunities for innovation.
• Build organizational maturity
along a number of dimensions
(platform, architecture, data
engineering, data science).
New IT Platforms
Data Mining
(R, Mahout)
Query
(Hive/Pig)
MapReduce
Parallel
Export
Parallel
Export
Messaging
Replication
Hadoop Cluster
Management, Monitoring, and Security
Landing Zone
External Data Sources
Event
Ingest
Realtime to Seconds Minutes and Up
Interactions
Analysis
Source: Think Big Analytics
MPP EDW:
structured
summary data
Fast Unstructured DB
Prod
Cycle
(Mins)
Science
Cycle
(Days)
Scheduler &
Dependency
Engine
DFS
Data Science
Tools
Traditional
BI Tools
Scale out
DB
Scale out
DB
Relational
DBMS
Serving
Engine
Secondary
Index
low vol
ACID
Read /
Write
Distributed
Search
DB Sync
18. Confidential Think Big Analytics
Data Science
A New Role Exists – the Data Scientist
Focused on data not models
Works with analysts to create business value
• One Part Scientist/Statistician
• One Part Sleuth
• One Part Artist
• One Part Programmer
19. Confidential Think Big Analytics
Conclusions
1. Big Analytics is a critical capability.
2. Your organization can create value now.
3. Get help to get off on the right foot.
4. Adopt incrementally.
Think Big. Start Smart. Scale Fast.
New Data Sources, Innovative Use Cases, Data Science & Predictive Analytics

A new class of big data technologies was invented to address data management challenges at Web scale. These technologies enabled new approaches to solve analytic questions that were too complex or did not fit into traditional systems:
• Reduce cycle time developing new analytic models
• Run analyses that were previously impossible
• Simpler modeling approaches by utilizing larger datasets
• Analysis conducted at a far lower cost
• Flexibility for future unknowns

+ Compute Processing $ & Time
• ex. 26 Days -> 2 min
• ex. 42 Hours -> 40 min
• ex. 18 Hours -> 16 min
= Business Innovation Velocity

Big Data is Changing the Game. Organizations need to get smarter, leveraging substantial untapped data assets for sustained competitive advantage.

Reduce cycle time ->
1. much lower effort to work with new datasets
2. parallel distributed infrastructure processes data much faster
3. compute approximate answers before investing in projects to automate

More detailed example on reduced cycle time: Hive allows you to define the underlying structure of the raw data just enough to let you run SQL-esque queries against it.
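The "define just enough structure at read time" idea can be sketched in a few lines of Python. This is only an illustration of the schema-on-read pattern, not Hive itself: the log format, the column names, and the helper functions `read_with_schema` and `clicks_per_day` are all hypothetical, and a real deployment would express the same thing as a Hive table definition over files in the cluster.

```python
import csv
import io
from collections import Counter

# Hypothetical raw, tab-delimited log lines, stored exactly as they arrived.
RAW_LOG = (
    "2012-09-01\tu1\tclick\t0.25\n"
    "2012-09-01\tu2\tview\t0.00\n"
    "2012-09-02\tu1\tclick\t0.40\n"
    "2012-09-02\tu3\tclick\t0.10\n"
)

# The "schema" is applied only when the data is read:
# a column name plus a type cast for each field (assumed layout).
SCHEMA = [("day", str), ("user_id", str), ("event", str), ("revenue", float)]


def read_with_schema(raw_text, schema):
    """Yield one dict per raw line, naming and casting fields per the schema."""
    reader = csv.reader(io.StringIO(raw_text), delimiter="\t")
    for fields in reader:
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}


def clicks_per_day(rows):
    """Rough equivalent of: SELECT day, COUNT(*) ... WHERE event = 'click' GROUP BY day."""
    counts = Counter()
    for row in rows:
        if row["event"] == "click":
            counts[row["day"]] += 1
    return dict(counts)


print(clicks_per_day(read_with_schema(RAW_LOG, SCHEMA)))
```

The raw data is never reshaped or loaded into a fixed model first; changing the question only means changing the schema list or the query function, which is where the shorter cycle time comes from.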
Run new types of analysis ->
1) model across complex datasets that did not match the relational database model
2) work with larger datasets and compute-intensive algorithms

Simpler modeling ->
1) Google whitepaper
2) fewer assumptions, simpler models required when looking at the entire customer db vs 3% and extrapolating

Lower cost ->
- shorter cycle times
- lower infrastructure costs for storage and processing, utilizing commodity hw and open source sw
- reduced processing time

Flexibility -> the promise that by storing everything, you have the source data to continue to generate and model new hypotheses, reduced cycle times for experiments to increase value, and now the ability to store 10 years of full data for self and suppliers

Innovations in commodity hardware, elastic, distributed, open source software platforms, such as Hadoop, and NoSQL database technologies are changing the game for advanced analytics at the core:
* leverages the manifest and latent signal of multistructured data
– we say "multistructured", not "unstructured", now
– most of the data in the world has latent signal; it's hidden in a messy tangle of noise
– BI tools are really designed to work with data where the relevant signal is overt (manifest variables), and this is true of the corresponding models
* emphasizes exploratory analysis to uncover novel topologies in the data
– so this is stuff like narrow strata and behavioral cohorts, just said all fancy and whatnot
* boosts power with diverse multivariate models and holistic data sets
– the world is multivariate
– integrating models designed for structured data with those designed for unstructured data gives new power
– it's not just more data, it's data from new sources, providing new lenses, new behaviors, etc., all in concert
– e.g. integrating online and offline, adding offline brand exposures to online ad efficacy assessments and attribution analysis
* triangulates truth with multiple approaches when problems are intractable
– stop trying to "prove" things, let validity and predictability testing guide you, focus on avoiding spurious relationships through theory

Playbook as talking points
• Rich data sets: URLs, social graphs, text feedback…
• New, rich visualization
• Exploration
• Automated detection: highlights, trends, anomalies
• Collaboration with data scientists…
DBA -> Big DBA
Prior experience:
• Diverse system environments
• Application performance mgt
• Systems appreciation
• Metrics-focused
New skills:
• Management & monitoring tools
• Metrics
• Automation for scale
• Lower-level workload tuning

DA -> DA BDM
Prior experience:
• Data-focused: digging into details
• Diverse database environments
• Deep domain knowledge
• Familiarity with unstructured data (XML)
• Hybrid dbs and non-db systems
New skills:
• Data modeling for unstructured data
• Alternative tools and documentation
• Languages and APIs (Hive, Pig, M/R)
• Process Models (M/R, Key/Value)
• Lower-level optimizations

BA -> DS MM
New skills:
• Introduction to Hadoop
• New tools for data manipulation
• Variety of new models
• Challenging top-down approaches
• Working with unstructured data
• Bottoms-up pattern discovery
• Efficient programming at scale
• Large scale Machine Learning

Dev -> Big Data Engineer
New skills:
• Processing models (MapReduce, Key/Value)
• Data modeling
• Schemas for unstructured
• Languages/APIs (Hive, Pig, M/R)
• Work process from small to full-scale
• Investigating approaches
• Manual optimization

Exploration: Learning; 1st Internal Data; Test Workloads; Process Limited
Production: Pilot Apps; Agile Data; Feedback Loop; Process Limited
Portfolio: Broad App Range; Intense Analytics; New Feeds, Derived Data; Space Limited
Data-Centric Org: Impacts Core Biz; New Products; Analytic Focus; Space Limited
Big Data Solution Factory
• Big Data Labs
• Asset partnering with Think Big
• Gather best practices
• Quarterly review of brainstorms
• Vendor briefings
• Gartner and industry analysts and research
• What are the criteria for tech
• Selection on POC vs Pilot
• Ramp adoption and share assets
• Collaboration tools