UX STRAT USA Presentation: Joe Lamantia, Bottomline Technologies

UX STRAT USA 2018 presentation slides by Joe Lamantia of Bottomline Technologies: "Pioneering a UX-Based Product Strategy Practice"

Published in: Design

  1. 1. FLYING BLIND ON A ROCKET CYCLE PIONEERING EXPERIENCE-CENTERED PRODUCT STRATEGY FOR EMERGING SPACES
  2. 2. JOE LAMANTIA Currently: VP Design & Development @ Bottomline Technologies Previous 20 years: end-to-end customer experience, all stages of product and service development, and digital / business transformation, focusing on emerging business and technology. Archetype(s): Sometime Entrepreneur / Proto-academic / Arm-chair Pro Cyclist https://www.linkedin.com/in/digitaljoelamantia/ @mojoe JoeLamantia.com [joelamantia.net]
  3. 3. Businesses around the world depend on Bottomline Technologies (NASDAQ: EPAY) solutions to help them make complex business payments simple, smart and secure, including some of the world’s largest banks, and private and publicly traded companies.
  4. 4. This case study describes building a learning-driven strategy capability to guide an adventurous product development group focused on the new domains of big data analytics and machine intelligence. I’ll share the outcomes of our efforts to launch new products chartered directly around customer experience value; outline the methods, tools, and perspectives that powered product discovery and strategic planning; share a framework and patterns for identifying and understanding emerging domains; and review the application of this toolkit to new situations.
  5. 5. EMERGING SPACES
  6. 6. ROADS?! WHERE WE’RE GOING, WE DON’T NEED ROADS…!
  7. 7. = $$
  8. 8. DATA SCIENCE MACHINE INTELLIGENCE
  9. 9. PRODUCT STRATEGY?
  10. 10. STRATEGY…
  11. 11. BUSINESS STRATEGY IS ABOUT IDENTIFYING YOUR BUSINESS OBJECTIVES AND DECIDING WHERE TO INVEST TO BEST ACHIEVE THOSE OBJECTIVES. Marty Cagan http://svpg.com/business-strategy-vs-product-strategy/
  12. 12. THE PRODUCT STRATEGY SPEAKS TO HOW YOU HOPE TO DELIVER ON THE BUSINESS STRATEGY. Marty Cagan http://svpg.com/business-strategy-vs-product-strategy/
  13. 13. http://rethinkingproductmanagement.blogspot.com/2012/06/product-strategy-what-does-it-mean-need.html
  14. 14. http://melissaperri.com/2016/07/14/what-is-good-product-strategy/ PRODUCT STRATEGY
  15. 15. http://mashable.com/2015/02/13/fifty-shades-of-grey-mad-libs/#7Te9vMnONqqF
  16. 16. OPPORTUNITY ASSESSMENT “I ASK PRODUCT MANAGERS TO ANSWER TEN FUNDAMENTAL QUESTIONS” 1. Exactly what problem will this solve? (value proposition) 2. For whom do we solve that problem? (target market) 3. How big is the opportunity? (market size) 4. What alternatives are out there? (competitive landscape) 5. Why are we best suited to pursue this? (our differentiator) 6. Why now? (market window) 7. How will we get this product to market? (go-to-market strategy) 8. How will we measure success/make money from this product? (metrics/revenue strategy) 9. What factors are critical to success? (solution requirements) 10. Given the above, what’s the recommendation? (go or no-go) http://svpg.com/assessing-product-opportunities/ Assessing Product Opportunities by Marty Cagan | Dec 13, 2006
  17. 17. PRODUCT DISCOVERY MODERN PRODUCT DISCOVERY • Introduction [:26] • Modern Product Discovery [:54] • The Evolution of Modern Product Discovery [4:15] • The Agile Manifesto [7:06] • The Rise of User Experience Design [8:47] • The Lean Startup: Eric Ries [9:49] • The Jobs-To-Be-Done Framework: Clayton Christensen and Anthony Ulwick [10:42] • OKRs and Design Sprints [12:12] • The Goal of Modern Product Discovery [14:27] • Putting Discovery Practices Into Context: The Opportunity Solution Tree [21:32] • The Future of Product Discovery [29:42] https://www.producttalk.org/2017/02/evolution-product-discovery/ The Evolution of Modern Product Discovery February 8, 2017 by Teresa Torres 9 Comments
  18. 18. http://rethinkingproductmanagement.blogspot.com/2012/06/product-strategy-what-does-it-mean-need.html
  19. 19. WHY ARE YOU HERE…?
  20. 20. WHERE ARE YOU GOING…?
  21. 21. PRODUCT STRATEGY CHARTS A DESIRED SET OF COURSES THROUGH THE SPACE OF POSSIBLE PRODUCTS FOR A DOMAIN Joe Lamantia http://svpg.com/business-strategy-vs-product-strategy/
  22. 22. Johnny Appleseed TEXT
  23. 23. OBSERVE ORIENT ACT DECIDE
  24. 24. OPPORTUNITY ASSESSMENT PRODUCT DISCOVERY INVEST…? PORTFOLIO PLANNING
  25. 25. WHAT AM I LOOKING FOR…?
  26. 26. DEEP STRUCTURE CHANGE VECTORS EARLY SIGNALS INFLECTION POINTS EMERGING SPACES HOLISTIC EXPERIENCES
  27. 27. EACH ASPECT = POTENTIAL LEVERAGE POINT FOR STRATEGIC ENGAGEMENT
  28. 28. DEEP STRUCTURE CHANGE VECTORS EARLY SIGNALS INFLECTION POINTS EMERGING SPACES HOLISTIC EXPERIENCES
  29. 29. DEEP STRUCTURE ENTERPRISE / B2B • Business process • Activity • Social structure: Organizational model • Boundaries • Regulation • IT / Systems architecture • Lifecycle • Flows: capital, information, people • Frame: shareholder value, social enterprise CONSUMER / B2C • Value scheme: wealth, love, knowledge, safety • Demographics • Boundaries • Mores • Culture • Social structure: community / group • Frame: active lifestyle, sustainability
  30. 30. ONCE UPON A TIME…
  31. 31. Information Visibility through Endeca: Discovery Applications, MDEX Engine. Rapidly changing data and content; large volumes of highly attributed records; structured and unstructured information. Discovery Applications: Intuitive user experience guides untrained users to discover relationships in data. Specialized Database: High performance database purpose built for data-driven search, navigation, and analytics. Flexible Data Integration: Consolidate structured and unstructured data to bridge whitespace between enterprise systems.
  32. 32. $$$$
  33. 33. ASSIMILATE!
  34. 34. …NOW WHAT…?
  35. 35. THE NEW GIG
  36. 36. 1. GET IN THE HEADS OF DATA SCIENTISTS 2. BE THE SPIRIT OF THE PRODUCT
  37. 37. BUT HOW…?
  38. 38. CONTINUOUS LEARNING LEAN STRATEGY
  39. 39. CONTINUOUS LEARNING
  40. 40. UNDERSTAND & EMPATHIZE WITH CUSTOMER PERSPECTIVES >>ARTICULATE CUSTOMER VALUE SOURCES
  41. 41. IDENTIFY BUSINESS IMPLICATIONS >> INFORM ALL STAGES OF PRODUCT & SERVICE DEVELOPMENT
  42. 42. INVESTIGATING CUSTOMERS EXPLORING HYPOTHESES ABOUT VALUE
  43. 43. INVESTIGATING CUSTOMERS: “WHAT DO AP MANAGERS NEED (TO BE MORE EFFECTIVE (AT IMPROVING RECONCILIATION PROCESSES))? WHY?”
  44. 44. OUTCOMES VALUE CHAINS MAP, CUSTOMER LANDSCAPE / SEGMENTS, PERSONAS, CAPABILITY MODELS, DOMAIN MODELS
  45. 45. EXPLORING HYPOTHESES ABOUT VALUE: “AUTOMATION OF RECONCILIATION ACTIVITIES WILL ENABLE ACCOUNTS PAYABLE GROUPS IN MID-MARKET COMPANIES TO HANDLE 30% MORE TRANSACTIONS.”
  46. 46. PRODUCT DEVELOPMENT IMPACT INNOVATION OPPORTUNITIES PRODUCT HYPOTHESES FOR VALIDATION PRODUCT CONCEPTS FOR PROTOTYPING PLANNING GUIDANCE (ROADMAP > EPIC > QA) DELIVERY GUIDANCE: FEATURES AND FUNCTIONS
  47. 47. INCREMENTAL EXPLORATORY PROGRESSIVE CUMULATIVE STRUCTURED ADAPTIVE
  48. 48. DUAL-TRACK AGILE 1. Hypothesis A “Lorem ipsum…” 2. Hypothesis B 3. Investigate A 4. Hypothesis C 5. Investigate B 6. Investigate C
  49. 49. INVESTIGATE
  50. 50. Data Scientist Square - San Francisco Bay Area Job Description Square is hiring a Data Scientist on our Risk team. The Risk team at Square is responsible for enabling growth while mitigating financial loss associated with transactions. We work closely with our Product and Growth teams to craft a fantastic experience for our buyers and sellers. Desired Skills & Experience As a Data Scientist on our Risk team, you will use machine learning and data mining techniques to assess and mitigate the risk of every entity and event in our network. You will sift through a growing stream of payments, settlements, and customer activities to identify suspicious behavior with high precision and recall. You will explore and understand our customer base deeply, become an expert in Risk, and contribute to a world-class underwriting system that helps Square provide delightful service to both buyers and sellers.
 
 To accomplish this, you are comfortable writing production code in Java and conducting exploratory data analysis in R and Python. You can take statistical and engineering ideas from prototype to production. You excel in a small team setting and you apply expert knowledge in engineering and statistics.
 
 Responsibilities 1. Investigate, prototype and productionize features and machine learning models to identify good and bad behavior. 2. Design, build, and maintain robust production machine learning systems. 3. Create visualizations that enable rapid detection of suspicious activity in our user base. 4. Become a domain expert in Risk. 5. Participate in the engineering life-cycle. 6. Work closely with analysts and engineers. Requirements 1. Ability to find a needle in the haystack. With data. 2. Extensive programming experience in Java and Python or R. 3. Knowledge of one or more of the following: classification techniques in machine learning, data mining, applied statistics, data visualization. 4. Concise verbal and written articulation of complex ideas. Even Better 1. Contagious passion for Square’s mission. 2. Data mining or machine learning competition experience. Company Description Square is a revolutionary service that enables anyone to accept credit cards anywhere. Square offers an easy to use, free credit card reader that plugs into a phone or iPad. It's simple to sign up. There is no extra equipment, complicated contracts, monthly fees or merchant account required.
 
 Co-founded by Jim McKelvey and Jack Dorsey in 2009, the company is headquartered in San Francisco.
  51. 51. The Conway Model The ‘Subway’ Model
  52. 52. WHAT SORT OF PERSON? ▸ They seem different than analysts: ▸ problem set ▸ relationship to discovery tools ▸ skills and professional profile ▸ discovery / analytical methods ▸ perspective ▸ workflow and collaboration ▸ Are they? How?
  53. 53. AREAS OF INVESTIGATION ▸ Workflow ▸ Environment ▸ Organizational model ▸ Pain points ▸ Tools ▸ Data landscape ▸ Analytical practices ▸ Project structure ▸ Unmet needs
  54. 54. TEXT
  55. 55. DISCUSSION GUIDE Can you please walk me through a recent or current project? a. How was the project initiated? b. How defined was the business problem in the beginning? Did the problem change? c. From where or whom did you obtain data sets? How did you make the decision? d. Describe the data you used: What did the data sets look like? How big were they? Were they structured or unstructured? e. What tools or techniques did you use to do the analyses? Did they map to the specific steps you mentioned just now? f. How did you decide these were the tools/techniques to use? To what extent were these decisions made by yourself and to what extent were they standardized by your group/team? g. How did you present the results of your analyses? What tools did you use? What do you like and dislike about your current tool set? h. Which stage of this project was the most challenging? To what extent did the tools satisfy what you intended to do? What features were lacking? i. How much collaboration was there during each stage of the project? i. Background and role of collaborators ii. Collaboration modes iii. Types of information shared Thinking about the projects you have worked on, is there a common approach you take to address these problems? How did you decide on this approach/tools?
  56. 56. NEEDS What are the most common and useful statistical techniques you use during discovery and analysis efforts? “(1) The most commonly used statistical techniques used to date (in our strategic planning work) are: dimensionality reduction (partition clustering, multiple correspondence analysis), factor analysis, partition clustering (k-means, k-medoids, fuzzy clustering), cluster validation techniques (silhouette, Dunn’s index, connectivity), multivariate outlier detection, linear regression, and logistic regression.” What statistical capabilities or functions would be very useful if provided within Endeca discovery applications, and where would they be useful? “(2) Techniques that would assist with identifying outliers or invalid data. Much of this work seems to be done by hand. I believe that we are also getting to the point where we could start using linear regression and splines (for showing trends).”
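The respondent above notes that identifying outliers or invalid data "seems to be done by hand". As a rough illustration of what automated support could look like, here is a minimal sketch using the modified z-score (based on the median and the median absolute deviation). It is not taken from the presentation or from Endeca; the function name `mad_outliers`, the sample values, and the conventional 3.5 threshold are assumptions chosen for the example.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold.

    The modified z-score is 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation. It is less distorted by extreme values than
    a mean/standard-deviation z-score.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values are identical; the score is undefined.
        return []
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical sample: ages recorded in a data set, with two suspicious entries.
ages = [31, 28, 35, 30, 29, 33, 2, 32, 140]
print(mad_outliers(ages))
```

A flagged value is only a candidate for review; whether it is dirty data or a valid extreme still needs domain judgment.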
  57. 57. NEEDS For example, would system-generated descriptive statistical visualizations be useful for whole data sets - or for smaller user-selected groups of attributes? “With regards to your last question on visualization, we have put in significant effort to use visualization in our Endeca installation. We have built visualizations such as tree maps, flow diagrams, sun burst diagrams, scatter plots showing clusters, and hierarchical edge bundling diagrams to explore our data sets. Would it be useful for the application to analyze and suggest possible distribution models it sees in the data; for the values of individual attributes, and/or for larger sets of data? Our data tends to be qualitative rather than quantitative so this drives much of our visualizations. So yes, interactive descriptive statistical visualization would be helpful – on the complete data set and individual attributes.”
  58. 58. Discovery/Information Needs Support longer term strategic planning: •How can we decrease the time-to-install service for new customers •How can we decrease the time it takes to restore service after a storm causes widespread outages •How can we decrease operational cost for each department/line of business •How many call center representatives do I need in my call center •How much offsite technician headcount do we need based on historical/seasonal trends balanced against current customer install base and ongoing sales/marketing efforts? Evaluate Success: •How effective was a particular marketing campaign •How effective is a new training program for call center representatives •How effective is a self-install approach Understanding variables that impact KPIs. KPIs include: •Call center volume •% successful resolution by support staff •Time-to-install •Sales volume •Sales revenue Understanding & Explaining Variance using Retrospective Analyses •Why does Connecticut have a shorter time-to-install than Rhode Island •Why did 2 identical marketing campaigns in 2 different markets have vastly different impact on sales •Is the variance significant, or does it represent random deviation? Ad-hoc Reporting •How many calls to the call center needed to be escalated to tier 2 support last month •How many new customers complained that a technician was late/didn't show up for the install appointment Analyst Profile: Scott – Operations Analyst Summary Education BA Information Systems (Connecticut State College) MBA Org Leadership (Johnson & Wales) Scott is a mid-level analyst with a background in Business Information Systems, and MBA in Organizational Leadership. He works in a 6-person team at Cox-New England (Telecommunications). His current role involves conducting data mining analysis to support operations research and organizational decision making/strategic planning.
Scott's work supports both sides of the profit equation: operations research/analysis to support internal cost-cutting and process innovation, and formative/summative evaluation to help drive effective sales/marketing efforts to increase revenue. His group is also given target cost savings goals that they need to help individual departments achieve to fulfill a cost reduction organizational mandate. His group accomplishes this by discovering inefficiencies in process through data mining, predictive modeling and retrospective data analysis. Cox has highly attributed enterprise data on customers, marketing campaigns, pricing variants and special offers, demographics, geography of the area, building and home types, school schedules, weather events, etc. that describe customer usage patterns, consumption of media bandwidth, etc. Each of their products (data, cable, phone, wireless) has different usage profiles that vary along many of the dimensions and variables listed above. His group is focused on residential customers; business customers are handled by a separate unit.
  59. 59. ‘FIVE THINGS ANALYSTS DO WITH DATA’ ▸ Clustering ▸ Dimension Reduction ▸ Anomaly Detection ▸ Characterization ▸ Testing probability model & validation Groupings: Structure of data, Profile of data, Validity of data Source: Frontiers in Massive Data Analysis http://www.nap.edu/openbook.php?record_id=18374
  60. 60. Findings
  61. 61. Skillz
  62. 62. Nature of sense making activity. Business Analytics: Intuitive, Manual, Gradual, Individual. Data Science: Empirical, Augmented, Accelerated, Cooperative*
  63. 63. Data Scientist: Profile
  64. 64. Sense Maker Segment Sense makers need to create and/or employ insights to accomplish their business goals and satisfy their responsibilities. These insights emerge from independent and collaborative discovery efforts that involve direct interaction with discovery applications, and participation in discovery environments. Insight Consumer Analyst Casual Analyst Data Scientist Analytics Manager Problem Solver
  65. 65. Creates data-driven insights, offerings, and resources to transform the organization Work Experience 10 Years Education Ph.D. Statistics, MS Bio-Informatics Job Title Senior Data Scientist Company LinkedIn Summarize & Communicate Review findings with colleagues; summarize, visualize, and communicate key findings to Insight Consumers/decision makers Prototype & Experiment with data driven feature: How can we prototype/evaluate this w/out disrupting the site? Gather Data & Analyze Results Use descriptive, inferential, and predictive statistics to evaluate results Analyze & Identify causal/predictive factors: Who are the best candidates to contact for a job based on recruiter needs and profile content? Dana Data Scientist • Defining and capturing useful measures of online attention • Getting all the data analytic tools to work together properly • No current workflow support or tools for data wrangling, analysis, experimentation, and prototyping • Effective tools to help experiment with and evaluate value/utility of features and activities for users • Ability to rapidly prototype data-driven features w/out risk of online service disruptions • Open source data manipulation, mining & analysis tools including R, Pig, Hadoop, Python, etc. • Statistical packages such as SAS, SPSS, etc. • Custom analytical tools built using open source components and languages • Leverage data to support the org mission • Enhance products & services with data-driven insights and features • Use data to identify new opportunities and prototype/drive new customer offerings • Create useful data sets/streams, measures, & resources (e.g., data models, algorithms, etc.) Key Goals Tools Pain Points Wish List Sample Workflow Dana is a Senior Data Scientist who has worked at LinkedIn for 5 years. Dana’s education includes a Ph.D. in Statistics and an MS in Bio Informatics.
Dana’s previous work includes positions in academic research groups as a doctoral candidate and post-doc, as well as software engineering roles in the Internet & technology industries. •Dana works with several other data scientists and her Analytics Manager on a centralized team •Dana and her colleagues aim to create data driven insights, features, resources, and offerings that deliver strategic value to LinkedIn •Dana works with Analysts on other teams to define and create discovery tools, data sets, and methods for use by their groups at LinkedIn. •Dana & team are visible & well established within LinkedIn, and have a voice in product strategy and operational context; they have a high degree of autonomy in defining data science projects •Dana works with Insight Consumers to suggest and determine potential new data driven offerings to prototype and evaluate. • How can we leverage data to increase online engagement with LinkedIn? •How should we measure engagement & what factors drive it? •What aspects of a personal profile are most likely to encourage / discourage new connections between people? •How can we increase people’s activity and contributions to topical discussion groups? • What factors drive the effectiveness of our marketing campaigns? •Why did one of our marketing campaigns work exceptionally well? • How can we leverage data to help recruiters identify and communicate effectively with qualified and potentially available candidates?
Typical Discovery Scenarios & Problems Background Work Context • Mines, analyzes, & experiments with data to identify patterns, trends, outliers, causal factors, predictive models, & opportunities • Defines and explains newly devised measurements, predictive models, & insights • Compares effectiveness of operations at achieving company goals for engagement, growth, data quality • Produces & explores new data sets • Collaborates with other data scientists to capture new data streams • Prototypes new data driven site features/offerings • Runs data based experiments to test/evaluate models, hypotheses & prototypes • Communicates & explains analyses to colleagues & Insight Consumers “I’ll do whatever it takes – wrangle, extract, manipulate, analyze, experiment, prototype – to use data to drive value & innovate” Activities
  66. 66. Perspectives Analytical The analytical perspective is the center of definition for all analytical roles. Contrast with engineers, who "make stuff". Analytical roles figure things out for some purpose: whether a model to inform a product prototype or provide insight. Empirical The empirical perspective is distinct from the analytical perspective, and marks 'true' data scientists. This revolves around framing and testing hypotheses formally and informally, often requires validation and interrogation of experimental methods and results by others, expects significant degree of transparency at (all) stages of the analytical effort.
  67. 67. Empirical Method Experiments Hypotheses Results Questions or beliefs Predictions Conclusions Insights Domain Production Models Data Sets Exploratory Validation Investigative Training Model Building Analytical Methods Insight Consumer Data Scientist Articulates Directs & applies Creates & refines Effected by Lead to Tested by Use / require Motivate Creates & refines Generate Achieves Informed by & shares Inform Understands Defines & evolves Inform Data Engineer Implements Determines Applied to validates Data Sources Used to define Applied to Development Corpus External Sources Production Corpus Mirrors Applied to Models Reference Initial Interim New Drawn from Analytical Tool Algorithm Script Test Implemented as Implements Inform What is the question? How will we answer the question? What data will we use? What analytical method will we use? What tools will we use? What are the results? What do the results mean? What did we learn / discover? Who should we inform? What is the next question? Manages Data Products Manages EMPIRICAL DISCOVERY “a hybrid, purposeful, applied, augmented, iterative and serendipitous method for realizing novel insights for business, through analysis of large and diverse data sets.” Data Science and Empirical Discovery: A New Discipline Pioneering a New Analytical Method https://blogs.oracle.com/serendipity/entry/data_science_and_empirical_discovery
  68. 68. Data Science Insight Model Insight Model Data Product Product Analysts Outcomes
  69. 69. Analysis Workflow & Activities • Empirical analysis of subsets of data –Understand topology of data, boundaries (sets / subsets, complete corpus, totality of data) • Outlier identification and profiling –How significant are outliers to overall topology »Comparative exclusion and profiling of resulting data subsets to understand their role, discover principal components • Find and analyze patterns, areas of interestingness / deserving attention • Find and analyze central actors / factors (in existing model that produced source data, in topology of working data, in patterns, etc.) –ID and understand their impact on local and global data topology and primary metrics if in several ways / more than one axis / at the same time • Discover and analyze relationships amongst central actors –Understand cycles, trends, changes (dynamic characteristics) for core actors, topology, patterns and structure –Understand causal factors • Codify / create new model reflecting insights & outcomes from experiments
  70. 70. Data Science Workflow • Frame problem / goal of effort • Identify and extract data to be used in effort from whole corpus / totality of available data –Exploratory identification and selection of working data for use in experiments • Define experiment(s): hypothesis / null hypothesis, methods, success criteria –Derive insight(s) –Wrangle, process, visualize, interpret • Codify / create new model reflecting insights & outcomes from experiments • Validate new model(s) • Provision training data • Train new model • Validation and outcome of training model • Hand-off for implementation on production systems / as production code
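The "define experiment(s)" step in the workflow above names three parts: a hypothesis and null hypothesis, a method, and success criteria. One way to make that concrete is a small permutation test, sketched below. This is only an illustration under stated assumptions, not a method described in the deck: the two samples, the function name `permutation_test`, and the 0.05 success threshold are all hypothetical.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: the group labels are interchangeable, so shuffling
    them should produce differences as large as the observed one fairly often.
    Returns (observed absolute difference, estimated p-value).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_resamples + 1)


# Hypothetical measurements (e.g. days to complete a task) for two groups.
control = [12, 15, 11, 14, 13, 16, 12, 15]
treated = [9, 10, 8, 11, 9, 10, 12, 9]

ALPHA = 0.05  # success criterion, fixed before looking at the data
observed, p_value = permutation_test(control, treated)
print(f"difference in means: {observed}, p-value: {p_value:.4f}")
print("reject null hypothesis" if p_value < ALPHA else "cannot reject null hypothesis")
```

Writing the threshold down before running the test is the point of the "success criteria" item: the experiment is defined first and interpreted second.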
  71. 71. THE ESSENCE ▸Empirical perspective ▸Business imperatives drive activities ▸Analytical approach ▸Recipe is always the same ▸Engineering always present ▸Data challenges are paramount ▸consume 60% - 80% of time and effort ▸Data volumes range huge to moderate (PB > MB) ▸Domain often drives analysis ▸Data scientists already have self-service ▸Some new problems, many the same ▸Use ‘advanced’ analytics, not conventional BA ▸Innovate by applying known analyses to new data ▸Current workflow fragmented across tools and data stores ▸Success can be a model, product, insight, infrastructure, tool
  72. 72. Model of Analytical Workflow Articulates common analytical activities “realistic” - represents wrangling, some iterative dynamics bounded - does not represent business perspective Originated by Ben Lorica - O’Reilly *consistent with our research*
  73. 73. UNDERSTAND & EMPATHIZE WITH CUSTOMER PERSPECTIVES >>ARTICULATE CUSTOMER VALUE SOURCES
  74. 74. OPPORTUNITY ASSESSMENT PRODUCT DISCOVERY INVEST…? PORTFOLIO PLANNING
  75. 75. THE ESSENCE ▸Empirical perspective ▸Business imperatives drive activities ▸Analytical approach ▸Recipe is always the same ▸Engineering always present ▸Data challenges are paramount ▸consume 60% - 80% of time and effort ▸Data volumes range huge to moderate (PB > MB) ▸Domain often drives analysis ▸Data scientists already have self-service ▸Some new problems, many the same ▸Use ‘advanced’ analytics, not conventional BA ▸Innovate by applying known analyses to new data ▸Current workflow fragmented across tools and data stores ▸Success can be a model, product, insight, infrastructure, tool
  76. 76. “…HOUSTON, WE'VE GOT A PROBLEM”
  77. 77. John is tasked with analyzing 30 years of crime data collected by three different authorities. Accordingly, the data arrive in three different formats: one source is a relational database, another is a comma-separated values (CSV) file, and the third file contains data copied from various tables within a portable document format (PDF) report. Knowing the structure required for his visualization tool, John first reviews the different data sets to identify potential problems (step 1 in Figure 1). The relational database allows him to specify a query and generate a file in an acceptable format. For the comma delimited data, the column headings associated with the data were unclear. Using spreadsheet software he adds a row of header information at the top to fit the format required by the visualization tool. While updating the header, John notices that the location of a given crime is encoded in one column (as ‘City, State’) in the CSV file and encoded in two columns (one ‘City’ column and one ‘State’ column) in the relational database. He decides to split the column in the CSV file into two separate columns. John then opens the text file in the spreadsheet but the spreadsheet does not parse the data as desired. After manually moving data fields to appropriate columns and some other manipulation (step 2), John finally has consistent columns and now combines the three files into one, but then notices that some columns have inconsistently formatted cells. The ‘Date’ column is formatted as ‘dd/mm/yy’ in some cells and as ‘mm/dd/yyyy’ in others. John returns to the original files, transforms all the dates to the same format, and recombines the files. John loads the merged data file in a visualization tool (step 3). The tool immediately gives the error message ‘Empty cells in column 3’; it cannot cope with missing data. John returns to the spreadsheet to fill in missing values using a few spreadsheet formulas (back to step 2). 
He edits the data by hand; sometimes he transforms the data (e.g. one state reports data only every other year so he uses an average for the missing years). At other times there is nothing he can do after diagnosing a new problem (i.e. return to step 1). For example, he finds out that survey question 24 did not exist before 2000, and the most recent year of data from Ohio has not been delivered yet, so he tries to pick the best possible value (e.g. 1) to indicate missing values. John detects other, more nuanced, problems; for example, some cells have a blank space instead of being empty. It took hours to notice that difference. John tries to follow a systematic approach when evaluating the data, but it is difficult to keep track of what he has inspected and how he has modified the data, especially because he discovers different issues across different files. Even after all of this work, he is not sure if he has examined all of the variables or overlooked any outliers. After a while, the data file seems good enough and he decides to move on. It took a few days so it is with a great sense of accomplishment that John finally loads the data for the second time into the visualization tool he wants to use (step 3 again). He constructs several views of the data, including a geospatial representation of the crimes and a scatterplot of age against crime. As soon as he sees the visualized data he realizes that, unfortunately, data quality issues still persist. Extreme outliers appear in the visualization. Some outliers seem to be valid data (e.g. data from the District of Columbia are very different from data from every other state). Others seem suspicious (criminals may vary in age from teenagers to older adults, but apparently babies are also committing crimes in certain states). John iteratively removes those outliers he believes to be dirty data (e.g. criminals under 7 and over 120 years old). 
Time series visualizations indicate that, in 1995, some causes of death disappear abruptly while new ones appear. Two days later, an email exchange with colleagues reveals that the classification of causes of death was changed that year. John writes a transformation script to merge the data so he can analyze distinct terms referring to the same (or at least similar) cause of death. Although the ‘real’ analysis is just about to start (step 4), John has made dozens of transformations, repeated the process several times, made important discoveries relating to the quality of the data, and made many decisions impacting the quality of the final ‘clean’ data. He also used visualization repeatedly while walking through the process, but still does not have results to show to his boss. Finally, he is able to work with the usable data, and useful insights come to the surface, but updated data sets arrive (step 5). Without proper documentation (step 6) of his transformations, John might be forced to repeat many of the tedious tasks. “Research directions in data wrangling: Visualizations and transformations for usable and credible data” “a process of iterative data exploration and transformation that enables analysis.” WRANGLING SCENARIO
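Several of the wrangling steps in the scenario above (splitting a combined ‘City, State’ column, reconciling ‘dd/mm/yy’ and ‘mm/dd/yyyy’ dates, handling empty and blank-space cells, dropping implausible ages, and documenting each change) can be sketched in a short script. This is a minimal illustration rather than the tooling from the quoted paper: the sample rows, the column names, and the helpers `parse_date`, `clean_age`, and `wrangle` are hypothetical, and the sketch assumes two-digit years are always day-first.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample standing in for the scenario's CSV source.
RAW_CSV = "\n".join([
    "Location,Date,Age",
    '"Boston, MA",13/02/95,34',
    '"Columbus, OH",02/14/1995,',      # empty age cell
    '"Washington, DC",03/01/1996,2',   # implausible age
    '"Austin, TX",25/12/96, ',         # blank space instead of empty
])


def parse_date(text):
    """Normalize 'dd/mm/yy' or 'mm/dd/yyyy' to ISO format, else None."""
    for fmt in ("%d/%m/%y", "%m/%d/%Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_age(text):
    """Return (age or None, note or None); ages under 7 or over 120 are dropped."""
    text = text.strip()  # a lone blank space counts as missing
    if not text:
        return None, "missing age"
    age = int(text)
    if age < 7 or age > 120:
        return None, f"age {age} treated as outlier"
    return age, None


def wrangle(raw_csv):
    """Clean every row and keep a log of each change (the documentation step)."""
    rows, log = [], []
    reader = csv.DictReader(io.StringIO(raw_csv))
    for line_no, row in enumerate(reader, start=2):
        # Assumes every location contains exactly one 'City, State' comma.
        city, state = (part.strip() for part in row["Location"].split(",", 1))
        date = parse_date(row["Date"])
        if date is None:
            log.append(f"line {line_no}: unparseable date {row['Date']!r}")
        age, note = clean_age(row["Age"])
        if note:
            log.append(f"line {line_no}: {note}")
        rows.append({"City": city, "State": state, "Date": date, "Age": age})
    return rows, log


rows, log = wrangle(RAW_CSV)
for row in rows:
    print(row)
for entry in log:
    print(entry)
```

Keeping the log alongside the cleaned rows addresses the scenario's closing concern: when updated data sets arrive, the same transformations can be rerun and reviewed instead of repeated by hand.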
  78. 78. Although the ‘real’ analysis is just about to start (step 4), John has made dozens of transformations, repeated the process several times, made important discoveries relating to the quality of the data, and made many decisions impacting the quality of the final ‘clean’ data. He also used visualization repeatedly while walking through the process, but still does not have results to show to his boss. Finally, he is able to work with the usable data, and useful insights come to the surface, but updated data sets arrive (step 5). Without proper documentation (step 6) of his transformations, John might be forced to repeat many of the tedious tasks. “Research directions in data wrangling: Visualizations and transformations for usable and credible data” “a process of iterative data exploration and transformation that enables analysis.” WRANGLING SCENARIO
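The cleanup steps in the scenario above (treating whitespace-only cells as missing, dropping implausible ages, and documenting each transformation) can be sketched in Python. This is a minimal illustration, not John's actual script: the function names, the `(state, age)` row layout, and the sample rows are all assumptions made for the example.

```python
from typing import Optional

MISSING = None  # explicit missing marker instead of a magic number


def clean_cell(raw: str) -> Optional[str]:
    """Treat empty cells and whitespace-only cells the same way."""
    text = raw.strip()
    return text if text else MISSING


def plausible_age(age: Optional[int], low: int = 7, high: int = 120) -> bool:
    """Keep ages inside the range used in the scenario; keep missing ages for later review."""
    return age is None or low <= age <= high


def wrangle(rows, log):
    """Clean (state, age) string pairs and record every step in `log`."""
    cleaned = []
    for state, age_raw in rows:
        age_text = clean_cell(age_raw)
        age = int(age_text) if age_text is not None else MISSING
        cleaned.append((state.strip(), age))
    log.append(f"normalized blank cells in {len(rows)} rows")

    kept = [row for row in cleaned if plausible_age(row[1])]
    log.append(f"removed {len(cleaned) - len(kept)} rows with implausible ages")
    return kept


# Hypothetical sample rows: one blank-space cell and two implausible ages.
rows = [("OH", "34"), ("DC", " "), ("TX", "2"), ("CA", "130"), ("NY", "19")]
log = []
result = wrangle(rows, log)
print(result)
print(log)
```

Keeping the log next to the data is one simple way to address step 6: when updated data sets arrive, the recorded steps can be replayed instead of rediscovered.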
  79. 79. One or more initial data sets may be used and new versions may come later. The wrangling and analysis phases overlap. While wrangling tools tend to be separated from the visual analysis tools, the ideal system would provide integrated tools (light yellow). The purple line illustrates a typical iterative process with multiple back and forth steps. Much wrangling may need to take place before the data can be loaded within visualization and analysis tools, which typically immediately reveals new problems with the data. Wrangling might take place at all the stages of analysis as users sort out interesting insights from dirty data, or new data become available or needed. At the bottom we illustrate how the data evolves from raw data to usable data that leads to new insights. “a process of iterative data exploration and transformation that enables analysis.” WRANGLING IN THE ANALYTICAL WORKFLOW
  80. 80. IT’S A CYCLE…
  81. 81. Discovery in the Analytical Workflow • Commonly recognizable cycle and focus for discovery activities (subset) • Explicitly iterative, ad-hoc, dynamic • Goal = incremental / directional advance in understanding • Core modes of engagement with data = Explore, Analyze • Modeling phase does not involve exploration Discovery
  82. 82. DEEP STRUCTURE CHANGE VECTORS EARLY SIGNALS INFLECTION POINTS EMERGING SPACES HOLISTIC EXPERIENCES
  83. 83. Activity Centered Design
  84. 84. designed many discovery solutions
  85. 85. scenario analysis
  86. 86. The Language of Discovery: A concrete descriptive language for human discovery activity in diverse contexts. A simple and consistent vocabulary that is independent of domain, role, information type, etc.
  87. 87. activity grammar
  88. 88. Enables understanding of discovery needs and context
  89. 89. Generative tool for discovery capability and experiences
  90. 90. DISCOVERY MODES
  91. 91. Discovery Modes “a broad, but identifiable discovery activity that is not tied exclusively to a particular context or domain.”
  92. 92. Locate Verify Monitor Compare Comprehend Explore Analyze Evaluate Synthesize 9 modes
  93. 93. Locate To find a specific (possibly known) thing e.g. I need to find a new part with particular technical attributes and then source it from the most qualified supplier - Engineering Verify ‘To confirm or substantiate that an item or set of items meets some specific criterion’ e.g. How can I determine if I am looking at the latest information for a part or supplier? - Supply Chain Specialist Monitor ‘To maintain awareness of the status of an item or data set for purposes of management or control’ e.g. I need to monitor at risk/failing customers/dealers so I can prompt my Account Reps to fix the problems - Sales Manager
  94. 94. Compare To examine two or more things to identify similarities & differences e.g. I need to compare our module set teardowns with competitive teardown information to see if we’re staying competitive for cost, quality and functionality - Engineering Comprehend To generate insight by understanding the nature or meaning of something e.g. I need to analyze and understand consumer-customer-market trends to inform brand strategy & communications plan – Director, Brand Image Explore To proactively investigate or examine something for the purpose of knowledge discovery e.g. I need to understand the cost drivers for this commodity so I can negotiate better terms with my suppliers and forecast business risk based on market indices - Procurement
  95. 95. Analyze To critically examine the detail of something to identify patterns & relationships e.g. I need to know the cost drivers for a part such as materials that impact cost. Is the relationship a correlation or step function for a part cost driver? - Engineering Evaluate To use judgement to determine the significance or value of something with respect to a specific benchmark or model e.g. I need to determine my current state in my prints so I can evaluate if I have price variation to negotiate a better price - Procurement Synthesize To generate or communicate insight by integrating diverse inputs to create a novel artifact or composite view e.g. I need to prepare a weekly report for my boss (sales mgr) of how things are going - Account Rep
  96. 96. HYPOTHESIS
  97. 97. “…FOUND ‘EM!”
  98. 98. Locate Verify Monitor Compare Comprehend Explore Analyze Evaluate Synthesize 9 modes
  99. 99. Discovery Modes and Activity Explore Wrangle Analyze Augment Sensemaking Transformation data quality computed / enriched data New data triggers new cycles Cumulative Change Direction & Momentum Begin Conclude Goal: Make data useful for analysis Goal: Understand the nature and usefulness of data for analysis. Goal: Accumulate insight through iterative analysis Goal: Achieve insights by analyzing data.
  100. 100. Working with data to effect outcomes Explore Wrangle Analyze Augment Sensemaking Transformation data quality computed / enriched data New data triggers new cycles Cumulative Change Direction & Momentum Begin Conclude Advancing insight Can’t do this… …Without these capabilities Apparent Mode and Activity Affinities
  101. 101. Explore Wrangle Analyze Augment Sensemaking Transformation source data source & enriched data New data triggers new cycles Cumulative incremental progress Focus of attention: Organization of the data and quality issues Focus of attention: Actual & potential insights Real wrangling Real analysis Actual Discovery Modes and Activity Affinities
  102. 102. CAPABILITIES FOR VISUAL DISCOVERY & ANALYSIS TOOLS ▸ Explore data corpus ▸via effectively characterized catalog ▸ Explore individual data sets ▸effective preview / sample / subset ▸ Analyze data ▸within ad-hoc data sets, across ad-hoc data sets ▸ Wrangle data ▸within ad-hoc data sets, across ad-hoc data sets ▸ Verify outcomes: insights, models, data products ▸ Synthesize outcomes ▸ distinct types = insights, model, data product (project) ▸ Publish outcomes ▸ distinct types = insight, data product, model (project) ▸ Integrate specialized / external analytical tools {augment} ▸ analysis tools (R, Python), reference models, validation tools ▸ Integrate external workflow tools {enhancing} ▸ e.g. figshare, model management, projects ▸ Support analytical workflow {enhancing}
  103. 103. Discovery Capabilities: Core Explore Wrangle Analyze Augment Sensemaking Transformation data quality computed / enriched data Core discovery capabilities
  104. 104. Discovery Capabilities: Enhancing Explore Wrangle Analyze Augment Sensemaking Transformation data quality computed / enriched data Publish & operationalize outcomes Workflow, provenance, versioning, accelerators, collaboration Acquire and access data Enhancing capabilities
  105. 105. DEEP STRUCTURE CHANGE VECTORS EARLY SIGNALS INFLECTION POINTS EMERGING SPACES HOLISTIC EXPERIENCES
  106. 106. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 118 Activity Cycles & Capabilities Core Capabilities activity specific progressive Influencer By-product Publish Import Precursor • Core capabilities are necessary & primary to complete a given cycle • Enhancing capabilities are secondary within a cycle • Enhancing capabilities are necessary to accumulate assets(?) • Enhancing capabilities are necessary to advance to next cycle(?) asset types Workflow Collaboration Publication Accelerators Enhancing Capabilities common random access Versioning Successor Provenance Metadata Publish Import Curation Governance Import
  107. 107. Capability Evolution Core Capabilities activity specific progressive Influencer By-product Publish Import Precursor • Core capabilities are necessary & primary to complete a given cycle • Enhancing capabilities are secondary within a cycle • Enhancing capabilities are necessary to accumulate assets(?) • Enhancing capabilities are necessary to advance to next cycle(?) asset types Workflow Collaboration Publication Accelerators Enhancing Capabilities common random access Versioning Successor Provenance Metadata Publish Import Curation Governance Import
  108. 108. HYPOTHESIS
  109. 109. Business Analytics (future) = Data Science (now)
  110. 110. OPPORTUNITY “IS THERE ANY THERE, THERE?”
  111. 111. PRODUCT STRATEGY CHARTS A DESIRED SET OF COURSES THROUGH THE SPACE OF POSSIBLE PRODUCTS FOR A DOMAIN Joe Lamantia PRODUCT STRATEGY
  112. 112. OPPORTUNITY ASSESSMENT PRODUCT DISCOVERY INVEST…? PORTFOLIO PLANNING
  113. 113. Tools on the Market Now Explore Wrangle Analyze Augment Sensemaking Transformation data quality computed / enriched data Cumulative Change Direction & Momentum Begin Conclude Paxata, Trifacta Beyond Core? OSS / hand rolled EID 3.x Wave 1 wrangling tools now in market No good exploration tool in market
  114. 114. Tools on the Market Now Explore Wrangle Analyze Augment Sensemaking Transformation data quality computed / enriched data Cumulative Change Direction & Momentum Begin Conclude Alteryx Datameer Modest exploration capabilities
  115. 115. Tools on the Market Now Explore Wrangle Analyze Augment Sensemaking Transformation data quality computed / enriched data Cumulative Change Direction & Momentum Begin Conclude Alteryx Modest exploration capabilities Qlik
  116. 116. Tools on the Market Now Explore Wrangle Analyze Augment Sensemaking Transformation data quality computed / enriched data Cumulative Change Direction & Momentum Begin Conclude Tableau, Platfora Wave 1 visual analysis tools now in market Modest wrangling capabilities
  117. 117. BDD 1.x? Explore Wrangle Analyze Augment Sensemaking Transformation data quality computed / enriched data Cumulative Change Direction & Momentum Begin Conclude
  118. 118. BDD Future 1.x? Explore Wrangle Analyze Augment Sensemaking Transformation data quality computed / enriched data Cumulative Change Direction & Momentum Begin Conclude ‘Pluggable’ external tools
  119. 119. BDD Future 2.x? Explore Wrangle Analyze Augment Sensemaking Transformation data quality computed / enriched data Cumulative Change Direction & Momentum Begin Conclude
  120. 120. VISUAL DISCOVERY AND ANALYSIS TOOLS: WAVE 1 • Definition: traditional discovery & analysis possible on Hadoop stores • Value prop = easy access to Hadoop stores for analysts without a data engineer • In / coming to market now: Platfora, Datameer, ClearStory, Sisense, etc. • Segment is viable (people understand the need & have the problem) • Tool maturity will increase incrementally, and in customary ways: alignment to workflow particulars; nuanced and compelling UX; broader footprint of supporting capabilities (provenance, publishing, collaboration); integration with ecosystem of related tools for activity • This class of tools competes with & may replace / displace existing non-Hadoop-native tools that are still rising with the general analytics wave: Qlik, Tableau, MicroStrategy • Firms making new investments (for new stacks) will try / buy this new generation • Firms extending existing investments are less likely to buy new • Long view = tools in this segment could ‘eat’ BI market share by adding reporting and other structured analytical capabilities that capture customers who do not have large BI stacks now, begin investing here, and subsequently need BI capability
  121. 121. OPPORTUNITY ASSESSMENT PRODUCT DISCOVERY INVEST…? PORTFOLIO PLANNING
  122. 122. DEEP STRUCTURE CHANGE VECTORS EARLY SIGNALS INFLECTION POINTS EMERGING SPACES HOLISTIC EXPERIENCES
  123. 123. DATA DISCOVERY PRODUCT AN EXAMPLE
  124. 124. Oracle Confidential – Internal Oracle Big Data Discovery Overview Richard Tomlinson Director, Product Management September 25, 2014
  125. 125. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |Oracle Confidential – Internal Hadoop Data Reservoir Concept Gaining Momentum 142 Data Warehouse Data Reservoir Emerging Sources Existing Sources Source: wikibon.org/wiki/v/Big_Data_Vendor_Revenue_and_Market_Forecast_2013-2017 Source: 451 Research – Total Data Warehousing: 2013-2018 Source: The Forrester WaveTM: Big Data Hadoop Solutions, Q1 2014
  126. 126. Not Easy to Get Analytic Value from Hadoop • Existing analytic tools fall short – Fail to expose potential of data up front – Rely on upstream ETL processes to cleanse and prepare data – Optimized for SQL not unstructured data – Not built for discovery (assume users know what questions to ask) • Only point solutions emerging – Leads to constant context switching – Need end-to-end capabilities • Early Hadoop tools complex – Pig, Oozie, Sqoop, Hive, Spark, etc • Specialized skills are scarce – Programming languages (e.g. MapReduce, Python, Scala) – Statistics and machine learning – Command line interfaces
  127. 127. Requires a Fundamentally New Approach A single intuitive, interactive and visual user interface for anyone to quickly find, explore, transform and analyze data in Hadoop, then share results for enterprise leverage. Explore Transform Discover Find
  128. 128. Oracle Big Data Discovery. The Visual Face of Hadoop Explore Transform Discover Find
  129. 129. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |Oracle Confidential – Internal • Navigate a rich catalog of all data in the Hadoop cluster • Familiar search and guided navigation for ease of use • Access data set summaries, annotation and recommendations • Provision your own data through self-service upload • Data is automatically enriched with extracted locations, terms, sentiment • Browse personal big data projects and those shared by the community 146 Easily Find Relevant Data Sets
  130. 130. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |Oracle Confidential – Internal • Understand shape of the data. Visualize attributes by type • Entropy based sorting by information potential • View attribute statistics, data quality and outliers • Use scratch pad to see statistical correlations between attribute combinations • Evaluate whether a data set is worthy of further investment 147 Explore the Data and Understand Potential
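The slide above mentions "entropy based sorting by information potential" without describing how it is computed. One plausible reading, offered purely as an assumption and not as the product's algorithm, is to rank attributes by the Shannon entropy of their value distributions. The helper names and the sample table below are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the observed value distribution."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_attributes(table):
    """Return attribute names sorted from highest to lowest entropy."""
    return sorted(table, key=lambda name: shannon_entropy(table[name]), reverse=True)


# Hypothetical data set: each key is an attribute, each list is a column.
table = {
    "country": ["US", "US", "US", "US"],   # constant column: 0 bits
    "segment": ["a", "a", "b", "b"],       # two equally likely values: 1 bit
    "product": ["p1", "p2", "p3", "p4"],   # four distinct values: 2 bits
}
ranking = rank_attributes(table)
print(ranking)
```

Raw entropy rewards columns of unique identifiers, so a real tool would likely need to normalize or cap the score; that refinement is left out of this sketch.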
  131. 131. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |Oracle Confidential – Internal • Intuitive user driven data wrangling • Library of data transformations to replace values, convert types, collapse, reshape, pivot, group, custom tag, merge and much more • Data enrichments for inferring location and language. Theme, entity and sentiment enrichments for text • Preview results, undo, commit and replay transforms • Run on sample data in memory or full data set in Hadoop 148 Transform and Enrich Data to Make it Ready
  132. 132. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |Oracle Confidential – Internal • Mash up different data sets for deeper perspectives • Drag and drop from a rich library of interactive visualizations to compose discovery dashboards • Filter through data with powerful search and intuitive guided navigation • Share projects, bookmarks and snapshots with team members for collaboration 149 Analyze the Data to Discover New Insights
  133. 133. Share Results and Publish for Enterprise Leverage • Share and collaborate with the team – Share projects, bookmarks and snapshots then collaborate and iterate • Publish back to Hadoop – Transforms and enrichments may be applied to original data sets in Hadoop – Publish blended data sets back to HDFS • Leverage results in other tools – Publish data to Hadoop in format optimized for advanced analytic tools (e.g. ORAAH) – Hadoop compliant BI tools (e.g. OBIFS) can burst out to the masses – Leverage any native Hadoop tooling (e.g. Pig, Hive, Impala, Python, etc) – Integrate BDD data sets with DWH to secure, govern and optimize for query performance (e.g. Oracle Big Data SQL) Oracle Big Data Discovery plays well with the big data ecosystem Explore Transform Discover Find Share & Collaborate raw data transformed data data reservoir (HDFS) Publish data warehouse business intelligence advanced analytics other hadoop tools Leverage
  134. 134. Oracle Big Data Discovery. Technical Innovation on Hadoop Oracle Big Data Discovery Workloads Hadoop Cluster (BDA or Commodity) data node data node data node data node data node name node Data Processing, Workflow & Monitoring • Profiling: catalog entry creation, data type & language detection, schema configuration • Sampling: dgraph (index) file creation • Transforms: >100 functions • Enrichments: location (geo), text (cleanup, sentiment, entity, key-phrase, whitelist tagging) Self-Service Provisioning & Data Transfer • Personal Data: Upload CSV, XLS and JSON to HDFS • Enterprise Data: Provision from RDBMS to HDFS In-Memory Discovery Indexes • DGraph: Search, Guided Navigation, Analytics Studio • Web UI: Catalog, Explore, Transform, Analyze, Share Hadoop 2.x Filesystem (HDFS) Workload Mgmt (YARN) Metadata (HCatalog) Other Hadoop Workloads MapReduce Spark Hive Pig Oracle Big Data SQL (BDA only)
  137. 137. DEEP STRUCTURE <> ANALYTICAL WORKFLOW CHANGE VECTORS <> BIG DATA TECHNOLOGIES EARLY SIGNALS <> RISE OF DATA SCIENCE INFLECTION POINTS <> DATA SCIENCE MOMENT EMERGING SPACES <> EMPIRICAL DISCOVERY HOLISTIC EXPERIENCES <> VISUAL DISCOVERY TOOL
  138. 138. WHAT NEXT…?
  139. 139. VISUAL DISCOVERY & ANALYSIS TOOLS: WAVE 2 • Definition: augmented discovery & analysis across the full business data corpus • Value prop = deeper insights from more diverse data and faster insights, effected via a mixed toolkit of (semi)automated analytical techniques (clustering, machine learning, regression / correlation, etc.) that enhances and directs analyst attention • Vectors of augmentation: data types, degree of automation (data = text / lingual, location / spatial, native graph, native stream; automation = which specific activities are augmented, to what degree) • Wave 2 is at the ‘pioneer’ stage: specifics of capability, value, implementation unknown • Limiting factors: • Domain specificity: value of general discovery analytics drops once domain boundaries are reached; need to align specifically to domain view of world • Expect verticalization of all analytics • Low / no tolerance for black boxes: deeper insights require transparency • Analytical literacy: level increasing, but orgs can’t benefit from advanced analytical techniques if not understood & trusted
  140. 140. OPPORTUNITY ASSESSMENT PRODUCT DISCOVERY INVEST…? PORTFOLIO PLANNING
  141. 141. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |Oracle Confidential – Internal Feature Selection Joe Lamantia Product Strategy(ist) Oracle Big Data Discovery November, 2014
  142. 142. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 159 Feature Selection In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features for use in model construction. The central assumption when using a feature selection technique is that the data contains many redundant or irrelevant features. Redundant features are those which provide no more information than the currently selected features, and irrelevant features provide no useful information in any context. Feature selection techniques are a subset of the more general field of feature extraction. Feature extraction creates new features from functions of the original features, whereas feature selection returns a subset of the features. Feature selection techniques are often used in domains where there are many features and comparatively few samples (or data points). Feature selection is also useful as part of the data analysis process, as it shows which features are important for prediction, and how these features are related.
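The definition above separates redundant features (no new information beyond those already selected) from irrelevant ones (no useful information). A minimal filter-style sketch of that idea follows. It is only one of many selection techniques and is not taken from the deck: the Pearson-correlation criterion, the thresholds, and the sample columns are all assumptions chosen for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_features(features, target, min_relevance=0.3, max_redundancy=0.95):
    """Greedy filter: skip irrelevant features, then skip redundant ones."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    for name in sorted(relevance, key=relevance.get, reverse=True):
        if relevance[name] < min_relevance:
            continue  # irrelevant: barely related to the target
        if any(abs(pearson(features[name], features[kept])) > max_redundancy
               for kept in selected):
            continue  # redundant: duplicates a feature already selected
        selected.append(name)
    return selected


# Hypothetical columns: two encodings of the same price, plus an unrelated id.
features = {
    "price_usd": [10, 20, 30, 40, 50, 60],
    "price_cents": [1000, 2000, 3000, 4000, 5000, 6000],
    "store_id": [3, 1, 2, 3, 1, 2],
}
target = [12, 19, 33, 41, 48, 62]
selected = select_features(features, target)
print(selected)
```

Only one of the two price columns survives, and `store_id` is dropped as irrelevant. A linear correlation filter misses non-linear relationships, which is one reason the later slides stress iterating between wrangling and modeling.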
  143. 143. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | BDD Feedback: Data Scientist Interviews “Analysts don’t generally analyze the catalog per se - they analyze line items, or actions, or histories, that kind of thing.” “It’s generally actions that people are interested in.” 160
  144. 144. Data Records Catalog Format Entities Product, location Connections Satisfaction Goals Acquire Transform Events Purchase Status change Structures & Systems User centric Data centric Networks Business unit Community Loyalty factors Themes Profit Efficiency Plans Balance budget Launch product Manage risks Business Perspective Progressive engagement Complexity & difficulty Value of outcome Activities Traffic logging Address change Processes Fulfillment Brand monitoring Analysis Perspective Data Perspective Domains Supply chain Industry / market Models Conversion Lifetime Customer Value (Decision tree) Measures Attrition rate Unit cost of materials Sensemaking Spectrum How analysts have to engage with data
  145. 145. Data Records Catalog Format Entities Product, location Connections Satisfaction Goals Acquire Transform Events Purchase Status change Structures & Systems User centric Data centric Networks Business unit Community Loyalty factors Themes Profit Efficiency Plans Balance budget Launch product Manage risks Business Perspective Progressive engagement Complexity & difficulty Value of outcome Activities Traffic logging Address change Processes Fulfillment Brand monitoring Analysis Perspective Data Perspective Domains Supply chain Industry / market Models Conversion Lifetime Customer Value (Decision tree) Measures Attrition rate Unit cost of materials Sensemaking Spectrum How analysts want to engage with data
  146. 146. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | BDD Feedback: Data Scientist Interviews “The transforms are for feature engineering, right?” “What other goals are there for the transforms?” “I would assume that’s the only reason for the transforms…” 163
  147. 147. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | BDD Feedback: Data Scientist Interviews “Getting the data right is the hard part. Once you get the data right…” 164
  148. 148. BDD Feedback: Data Scientist Interviews “…feature engineering needs to be an iterative process” “…this is an iterative process. Everything goes in a circle.” “You’re going to do some data cleaning, you’re going to build a model, you’re going to have to go back and look at what you’re missing and what you’re not missing.”
  149. 149. IT’S A CYCLE…
  150. 150. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 167 Analytical Activity Explore Wrangle Analyze Augment Sensemaking Transformation Features Goals Realize insights Generate Models Goals Understand data Make data useful Cumulative incremental progress Data quality & Features
  151. 151. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 168 Feature Extraction Engineering Generation Selection …
  152. 152. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | We can repurpose techniques used during the traditional feature selection stage of the analytical workflow to enhance other stages of the discovery and analysis workflow. A likely candidate is exploration as it is coupled with wrangling. …Allow analyst engagement and focus on more useful constructs like entities or business processes, instead of dealing only with raw values and attributes 169 Thesis
  153. 153. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | BDD 1.0 EID EID 170 BDD ? Acquire Ingest & Clean Store & Manage Featurize Wrangle Visual Analysis Interactive Queries Modeling Story-telling Build Deploy Monitor & Maintain Present Disseminate Insight cycle Modeling cycle
  154. 154. How? • Features are discovered and inferred • statistical & other domain-independent methods • Domain-based • Known features used to train system • Sources • artifacts (scripts, models, dictionaries) • analytical activities • direct indication
  155. 155. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Possible Manifestations • Feature-based operations • wrangling: transforms, joins, • exploration: search, visualization, • analysis • Feature recognition: known features identified in new data • Feature-based enrichment • Interest graphs - Individual and group • Modeling capabilities 172
  156. 156. What’s happening in this product space? • Movement toward user-centric engagement with data: • Entity-centric navigation & event linkage across data sets (Platfora) • Answerset (Paxata) • Semantic search & enrichments (BDD) • Thematic data lenses (Platfora) • Data harmonization and data stories (ClearStory) • Natural language interaction / cognitive computing (IBM) • Expert network (Tamr)
  157. 157. Capability Evolution Core Capabilities activity specific progressive Influencer By-product Publish Import Precursor • Core capabilities are necessary & primary to complete a given cycle • Enhancing capabilities are secondary within a cycle • Enhancing capabilities are necessary to accumulate assets(?) • Enhancing capabilities are necessary to advance to next cycle(?) asset types Workflow Collaboration Publication Accelerators Enhancing Capabilities common random access Versioning Successor Provenance Metadata Publish Import Curation Governance Import
  158. 158. WHAT NEXT…?
  159. 159. OPPORTUNITY ASSESSMENT PRODUCT DISCOVERY INVEST…? PORTFOLIO PLANNING
  160. 160. Data Science Insight Model Insight Model Data Product Product Analysts Outcomes
  161. 161. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | BDD Feedback: Data Scientist Interviews “How do you know what changes you want to make until you build your model?  Once you build your model, you know you want to take the square root of this, or the log of this.  That doesn’t happen until you start building a model…” 178
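The interviewee's point, that the need for a square root or log transform only becomes visible once a model is being fit, can be illustrated with a toy example. The sketch below assumes a hypothetical outcome that depends on the logarithm of an input and compares how well a straight line could fit before and after the transform. The data and the `pearson` helper are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Hypothetical raw feature spanning several orders of magnitude.
x = [1, 10, 100, 1000, 10000]
# Hypothetical outcome that is linear in log10(x), not in x itself.
y = [2 * math.log10(v) + 1 for v in x]

r_raw = pearson(x, y)                            # linear fit on the raw feature
r_log = pearson([math.log10(v) for v in x], y)   # linear fit after the log transform

print(f"raw: {r_raw:.3f}  log-transformed: {r_log:.3f}")
```

The weak fit on the raw feature is the signal that sends the analyst back to wrangling, which is the loop between the insight and modeling cycles that these slides describe.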
  162. 162. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | 179 Discovery & Analysis Workflow Acquire Ingest & Clean Store & Manage Featurize Wrangle Visual Analysis Interactive Queries Modeling Story-telling Build Deploy Monitor & Maintain Present Disseminate Insight cycle Modeling cycle Adapted from ‘Data Analysis Just One Component of the Data Science Workflow’ http://radar.oreilly.com/2013/09/data-analysis-just-one-component-of-the-data-science-workflow.html Features Insights
  163. 163. BDD 1.0 EID EID Analytical Workflow Acquire Ingest & Clean Store & Manage Featurize Wrangle Visual Analysis Interactive Queries Modeling Story-telling Build Deploy Monitor & Maintain Present Disseminate Insight cycle Modeling cycle Data Ingest cycle
  164. 164. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Joe Lamantia | Product Strategist: Oracle Endeca Big Data Discovery Modeling
  165. 165. Old-school Modeling • Compute is expensive • Good (relevant) data is scarce • All data is difficult to work with, requiring considerable time and attention just to get provisionally ready • Human attention is limited at all levels: engineer, analyst, insight consumer • ‘Experiments’ are small, planned, receive close attention • Rely first on a library of well known methods (carefully vetted by years of practice) • Don’t run the experiment unless you know you can evaluate the results • be sure you have the time • be sure you have the expertise • be confident the results will be meaningful / insightful • Automation is only feasible in limited circumstances • Humans interpret experimental results • Complete experiments before evaluating them • ‘Small’ infrastructure: data sets, compute source, evaluation tools, archiving • Modeling is best done by the knowledgeable • can have negative consequences when done by novices • Toolset aligned to: small / mid-sized data • requires a high quotient of human engagement, both directive / evaluative, and to enable execution
  166. 166. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Joe Lamantia | Product Strategist: Oracle Endeca Big Data Discovery New-school Modeling • Compute is cheap • Data is abundant | good (relevant) data is often available  • Data is still challenging to work with, but tooling allows engagement with much greater quantities, of many types • Run many experiments • Try many approaches, using new and old methods • Machines interpret experimental results, at least in part (batch eval for initial ranking of potential insight) • ‘Big’ infrastructure - data sets, compute source, evaluation tools, archiving • Automate where possible: selecting data, prepping data, choosing methods, setting parameters, executing experiments, evaluating results • Modeling is better done by those with knowledge, but it can have utility for non-experts • [forward-looking analogs: genomics, bioinformatics, computational neuroscience] • Toolset wants to be aligned to big data •   profile of human engagement varies over analytical lifecycle, seeking automation where possible in direction / evaluation, and execution 183
  167. 167. Practices • Combine old-school and new-school approaches at different stages of the analytical cycle • Starting points vary by practitioner maturity, understanding of problem, available resources • Experiments often alternate approaches • Use automation where possible
  169. 169. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Joe Lamantia | Product Strategist: Oracle Endeca Big Data Discovery Modeling 186 Exploratory Analysis Identify features Understand relations between features Create new features Characterize Dataset Build Baseline Model Build Complex Model Feature Engineering & Model Tuning New features Straight-forward & well-known modeling methods Explore & understand contents, distribution, quality, etc. Iterative experimentation with several classes of modeling methods Compare to baseline Comparative / reference model Iterative & experimental model & feature combination, tuning, evaluation Recursive feature elimination Modeling, Testing, Training, Evaluation data sets Initial Predictive Model Final Predictive Model Explanatory Model Explanatory Model Discovery cycle Modeling cycle
  170. 170. Modeling & BDD Exploratory Analysis Identify features Understand feature relationships Create new features Characterize Dataset Build Baseline Model Build Complex Model Feature Engineering & Model Tuning New features Straight-forward & well-known modeling methods Explore & understand contents, distribution, quality, etc. Iterative experimentation with several classes of modeling methods Compare to baseline Comparative / reference model Iterative & experimental model & feature combination, tuning, evaluation Recursive feature elimination Modeling, Testing, Training, Evaluation data sets Initial Predictive Model Final Predictive Model Explanatory Model Explanatory Model Initial capability…
  171. 171. Modeling & BDD Exploratory Analysis Identify features Understand feature relationships Create new features Characterize Dataset Build Baseline Model Build Complex Model Feature Engineering & Model Tuning New features Straight-forward & well-known modeling methods Explore & understand contents, distribution, quality, etc. Iterative experimentation with several classes of modeling methods Compare to baseline Comparative / reference model Iterative & experimental model & feature combination, tuning, evaluation Recursive feature elimination Modeling, Testing, Training, Evaluation data sets Initial Predictive Model Final Predictive Model Explanatory Model Explanatory Model Subsequent capability…
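The modeling cycle in the diagrams above (build a baseline model, build a more complex model, compare to baseline on held-out data) can be sketched roughly as follows. The data set, the train / evaluation split, and the choice of a one-feature least-squares line as the stand-in "complex" model are assumptions made for illustration only; the deck does not prescribe specific methods.

```python
from statistics import mean

# Hypothetical data set, split into training and evaluation rows.
TRAIN = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
EVAL = [(5.0, 11.0), (6.0, 13.0)]


def fit_baseline(rows):
    """Baseline / reference model: always predict the mean training target."""
    target_mean = mean(y for _, y in rows)
    return lambda x: target_mean


def fit_linear(rows):
    """Candidate model: one-feature ordinary least squares."""
    x_mean = mean(x for x, _ in rows)
    y_mean = mean(y for _, y in rows)
    sxx = sum((x - x_mean) ** 2 for x, _ in rows)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in rows)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return lambda x: slope * x + intercept


def mse(model, rows):
    """Mean squared error of a fitted model on the given rows."""
    return mean((y - model(x)) ** 2 for x, y in rows)


# Compare to baseline on the evaluation set, not the training set.
baseline_error = mse(fit_baseline(TRAIN), EVAL)
candidate_error = mse(fit_linear(TRAIN), EVAL)
keep_candidate = candidate_error < baseline_error
```

The baseline gives every later experiment a fixed reference point: a candidate that cannot beat it on the evaluation data is not worth further feature engineering or tuning.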
  172. 172. OPPORTUNITY ASSESSMENT PRODUCT DISCOVERY INVEST…? PORTFOLIO PLANNING
  173. 173. WHAT NEXT…?
  174. 174. Business Assets & Activity Cycles Adapted from ‘Data Analysis Just One Component of the Data Science Workflow’ http://radar.oreilly.com/2013/09/data-analysis-just-one-component-of-the-data-science-workflow.html Featurize Wrangle Visual Analysis Interactive Queries Discovery Modeling Features Data Application Vectors Enrichments Acquire Ingest & Clean Manage & Update Model Train Evaluate Update Build Monitor Store & Expose Insights Models Data Train Deploy corpus operational analytical archival insight stream awareness explanatory prescriptive intelligence machine human hybrid systems transactional engagement insight
  175. 175. Tool Archetypes Featurize Wrangle Trifacta Visual Analysis Platfora Interactive Queries Datameer Discovery Modeling Data Application Acquire Ingest & Clean Manage & Update Model Train Evaluate Update Build Train Deploy Monitor Store & Expose Data science workbenches Sense, yhat Application Foundries Azure ML, IBM Traditional app studios Java Discovery Workbenches BDD x Data Integrators Clover Analysis Workbenches Alteryx, Alpine Analytics Platforms Teradata, Pivotal ML services BigML, Wise.io, Skytree Business Intelligence Suite OBIEE, Cognos Python notebooks IPython, Jupyter
  176. 176. DEEP STRUCTURE CHANGE VECTORS EARLY SIGNALS INFLECTION POINTS EMERGING SPACES HOLISTIC EXPERIENCES
  177. 177. VALUE CHAIN MAP (WARDLEY MAPPING)
  178. 178. VALUE CHAIN MAP (WARDLEY MAPPING) ML
  179. 179. WORKING THE ECOSYSTEM • Oracle = an ecosystem • ML = commoditizing • Someone will ‘generate the electricity’ = provide ML capability within the Oracle ecosystem • Everyone’s going to need it…
  180. 180. OPPORTUNITY ASSESSMENT PRODUCT DISCOVERY INVEST…? PORTFOLIO PLANNING
  183. 183. Oracle Machine Learning Service
  185. 185. Genesis
  186. 186. Offering • Machine learning service exposed as • Stand-alone productized service (public cloud) • ‘Product’ integrated with relevant Oracle cloud offerings • enables machine learning / analytics pipelines for data spanning service boundaries • ‘White-label’ ML capability within cloud offerings (SaaS, IaaS, PaaS, DaaS, etc.) • enables localized ML / analytics pipelines within service boundaries • Collection of Oracle-specific ML accelerators • Data sets & streams, pipelines, algorithms, R / Python libs, project templates, etc.
  187. 187. Oracle Value Prop • Provides ML capability across cloud offerings for expanded data landscape • Big data • Big data + Traditional Enterprise in combination • Streaming Data • IoT • Reinforces ‘data gravity’ effect across Oracle cloud offerings • Entry point for ‘new stack’ (cloud-only) customers needing ML capability • ‘Missing link’ completes analytical pipelines across tool boundaries
  188. 188. Data Landscape Complexity Quantity Traditional Enterprise Big Data IoT Oracle Machine Learning Service Product-native ML Stream / Real-time
  189. 189. Customer Value Prop • Easy machine learning within the ecosystem of Oracle cloud offerings • Turnkey • Elasticity and adaptivity: resources, pricing • Portability across Oracle product / service boundaries • Manifests appropriately for product / service contexts • Application Developers • Analysts / Data Scientists • Business users • Machine consumers
  190. 190. SaaS ML For the Oracle Cloud Ecosystem Oracle Machine Learning Service DaaS Data Service IaaS Infrastructure Service PaaS Platform Service Data & Models Data & Models Data & Models ‘Public’ OML product Customer Applications & Data sources Data & Models
  191. 191. SaaS White Label ML Capability DaaS Data Service IaaS Infrastructure Service PaaS Platform Service Machine Learning ML Tools ML Tools ML Tools ML Tools ML Tools ML Tools ML Tools ML Tools ML Tools Customer Applications & Data sources Oracle Machine Learning Service ‘Public’ OML product Data & Models Data & Models Data & Models Data & Models
  192. 192. Oracle Ecosystem • All cloud services can be • data sources for ML service • consumers of published data & models from ML service • OML can publish augmented datasets (e.g. pre-scored matrices) as part of multistep & multi-tool analytical pipelines
  193. 193. Initial Capability • Core ML functions: • data upload (no transform - BDD integration) from Oracle sources • modeling / analysis via general-purpose, interpretable methods • model training • model evaluation • Model publication • Processed data publication
  194. 194. OPPORTUNITY ASSESSMENT PRODUCT DISCOVERY INVEST…? PORTFOLIO PLANNING
  195. 195. Featurize Wrangle Visual Analysis Interactive Queries Discovery Modeling Data Application Acquire Ingest & Clean Manage & Update Model Train Evaluate Update Build Train Deploy Monitor Store & Expose Discovery Workbenches BDD (now) ML services Oracle Machine Learning Discovery & Modeling Platform BDD & ML (combined analysis offering?)
  196. 196. WHAT NEXT…?
  197. 197. Automation Potential Featurize Wrangle Visual Analysis Interactive Queries Discovery Modeling Adapted from ‘Data Analysis Just One Component of the Data Science Workflow’ http://radar.oreilly.com/2013/09/data-analysis-just-one-component-of-the-data-science-workflow.html Features Data Application Vectors Enrichments Acquire Ingest & Clean Manage & Update Model Train Evaluate Update Build Train Deploy Monitor Store & Expose Insights Models Data
  198. 198. Machine Intelligence Value Chain Adapted from ‘Data Analysis Just One Component of the Data Science Workflow’ http://radar.oreilly.com/2013/09/data-analysis-just-one-component-of-the-data-science-workflow.html Featurize Wrangle Visual Analysis Interactive Queries Discovery Modeling Features Data Application Vectors Enrichments Acquire Ingest & Clean Manage & Update Model Train Evaluate Update Build Monitor Store & Expose Insights Models Data Train Deploy corpus operational analytical archival insight stream awareness intelligence machine human hybrid systems transactional engagement insight Process operations? transactional engagement insight Apps Metric Create Machine Intelligence Operationalize Machine Intelligence
  199. 199. DEEP STRUCTURE <> PRODUCT DEVELOPMENT CHANGE VECTORS <> ACQUISITION EARLY SIGNALS <> MARKET ACTIVITY INFLECTION POINTS <> INNOVATION MOMENTS EMERGING SPACES <> PRODUCT STRATEGY GIG HOLISTIC EXPERIENCES <> EXPERIENCE FOCUS
  200. 200. THANK YOU! I’M HIRING…
  201. 201. TOOLS & FRAMEWORKS AN EXAMPLE
  202. 202. The Language of Discovery Category: Primary Research, Design Systems Outcomes: Building on already-published original applied research into information retrieval and usage, the language of discovery posits a domain-independent framework describing the activity primitives of discovery in terms of ‘modes’. Succeeding professional and industry publications outline the application of this descriptive vocabulary in settings including product design and development, product strategy, and information management. References: • Russell-Rose, T., Lamantia, J. and Burrell, M. 2011. A Taxonomy of Enterprise Search and Discovery. Proceedings of EuroHCIR 2011, London, UK. http://ceur-ws.org/Vol-763/paper4.pdf • Russell-Rose, T., Lamantia, J. and Burrell, M. 2011. A Taxonomy of Enterprise Search and Discovery. Proceedings of HCIR 2011, California, USA. https://docs.google.com/a/kent.edu/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6NzdmYjc3OWY2ZjQ2Zjg4MQ • Russell-Rose, T. and Makri, S. 2012. A Model of Consumer Search Behavior. Proceedings of EuroHCIR 2012, Nijmegen, NL. • Designing the Search Experience: http://www.amazon.com/Designing-Search-Experience-Information-Architecture/dp/0123969816 • Presentation - Strata: http://conferences.oreilly.com/strata/stratany2012/public/schedule/detail/25411 • Presentation - UX Lisbon conference: http://www.joelamantia.com/user-experience-ux/slides-for-uxlx-talk-the-language-of-discovery-a-grammar-for-designing-big-data-interactions
  203. 203. Domain & Market Study: Data Science Outcomes: Comprehensive portrait of all major facets of a new analytical discipline, including its practices, roles, methodology, tools and technologies, workflows, organizational models, skillsets, alignment with business, areas of innovation, and relation to the landscape of business analytics. Research outcomes and synthesized insights guided product design, management, and strategy efforts including: opportunity identification and profiling, landscape / competitive modeling, technology lifecycle and evolution models, product discovery, concept creation and evaluation, prototyping. Notable aspects: Consistently delivered insights twelve or more months ahead of leading industry analysts pursuing similar agendas. Artifacts & Synthesis • Data Science Highlights: http://www.joelamantia.com/user-research/data-science-highlights-an-investigation-of-the-discipline • Empirical Discovery Concept and Workflow Model: https://blogs.oracle.com/serendipity/entry/empirical_discovery_concept_and_workflow • Empirical Discovery: A New Discipline https://blogs.oracle.com/serendipity/entry/data_science_and_empirical_discovery • Defining Discovery: Core Concepts: https://blogs.oracle.com/serendipity/entry/defining_discovery_core_concepts • Discovery and the Age of Insight http://www.joelamantia.com/language-of-discovery/discovery-and-the-age-of-insight • Big Data Is Not Enough http://www.joelamantia.com/user-experience-ux/big-data-is-not-the-insight-slides-from-enterprise-search-europe
  204. 204. DEEP STRUCTURE CHANGE VECTORS EARLY SIGNALS INFLECTION POINTS EMERGING SPACES HOLISTIC EXPERIENCES
  205. 205. DEEP STRUCTURES ENTERPRISE / B2B • Business process • Activity • Social structure: Organizational model • Boundaries • Regulation • IT / Systems architecture • Lifecycle • Flows: capital, information, people • Frame: shareholder value, social enterprise CONSUMER / B2C • Value scheme: wealth, love, knowledge, safety • Demographics • Boundaries • Mores • Culture • Social structure: community / group • Frame: active lifestyle, sustainability
  206. 206. OPPORTUNITY ASSESSMENT PRODUCT DISCOVERY INVEST…? PORTFOLIO PLANNING
  207. 207. CONTINUOUS LEARNING
  208. 208. UNDERSTAND & EMPATHIZE WITH CUSTOMER PERSPECTIVES >>ARTICULATE CUSTOMER VALUE SOURCES
  209. 209. IDENTIFY BUSINESS IMPLICATIONS >> INFORM ALL STAGES OF PRODUCT & SERVICE DEVELOPMENT
  210. 210. INVESTIGATING CUSTOMERS EXPLORING HYPOTHESES ABOUT VALUE
  211. 211. Activity Cycles [Structural View] Initial Activity Final Activity Cycle Successor Influencer By-product Outcome Input Precursor Interim Activity Interim Activity • Cycles are iterative • Activities are progressive • Can begin with any activity • Best to begin with initial activity • Impact of activity increases with ‘distance’ - can span cycles • Inputs are necessary • Precursors can be incomplete (?) • Influencers are ‘from the future’ • Influencers enhance the local cycle • By-products enhance the precursor • Assets are cumulative • Assets depend on precursor cycles • Assets communicate via cycles asset types
  212. 212. Business Assets & Activity Cycles Adapted from ‘Data Analysis Just One Component of the Data Science Workflow’ http://radar.oreilly.com/2013/09/data-analysis-just-one-component-of-the-data-science-workflow.html Featurize Wrangle Visual Analysis Interactive Queries Discovery Modeling Features Data Application Vectors Enrichments Acquire Ingest & Clean Manage & Update Model Train Evaluate Update Build Monitor Store & Expose Insights Models Data Train Deploy corpus operational analytical archival insight stream awareness explanatory prescriptive intelligence machine human hybrid systems transactional engagement insight
  213. 213. Activity Integration Points / Interfaces Initial Activity Final Activity Cycle Successor Influencer By-product Outcome Input Precursor Interim Activity Interim Activity • Integration necessary for individual activities to communicate with one another within a cycle • Gaps = demand for enhancing capabilities • Integration is made possible by enhancing capabilities • Cycles = accelerated by good integration • Cycles = slowed by poor integration • Activity speed is not affected by integration? asset types
  214. 214. Data Pipeline Featurize Wrangle Visual Analysis Interactive Queries Discovery Modeling Adapted from ‘Data Analysis Just One Component of the Data Science Workflow’ http://radar.oreilly.com/2013/09/data-analysis-just-one-component-of-the-data-science-workflow.html Features Data Application Vectors Enrichments Acquire Ingest & Clean Manage & Update Model Train Evaluate Update Build Train Deploy Monitor Store & Expose Insights Models Data
  215. 215. Machine Intelligence Value Chain Adapted from ‘Data Analysis Just One Component of the Data Science Workflow’ http://radar.oreilly.com/2013/09/data-analysis-just-one-component-of-the-data-science-workflow.html Featurize Wrangle Visual Analysis Interactive Queries Discovery Modeling Features Data Application Vectors Enrichments Acquire Ingest & Clean Manage & Update Model Train Evaluate Update Build Monitor Store & Expose Insights Models Data Train Deploy corpus operational analytical archival insight stream awareness intelligence machine human hybrid systems transactional engagement insight Process operations? transactional engagement insight Apps Metric Create Machine Intelligence Operationalize Machine Intelligence
  216. 216. Tool Archetypes Featurize Wrangle Trifacta Visual Analysis Platfora Interactive Queries Datameer Discovery Modeling Data Application Acquire Ingest & Clean Manage & Update Model Train Evaluate Update Build Train Deploy Monitor Store & Expose Data science workbenches Sense, yhat Application Foundries Azure ML, IBM Traditional app studios Java Discovery Workbenches BDD x Data Integrators Clover Analysis Workbenches Alteryx, Alpine Analytics Platforms Teradata, Pivotal ML services BigML, Wise.io, Skytree Business Intelligence Suite OBIEE, Cognos Python notebooks IPython, Jupyter
  217. 217. Activity Cycles & Capabilities Core Capabilities activity specific progressive Influencer By-product Publish Import Precursor • Core capabilities are necessary & primary to complete a given cycle • Enhancing capabilities are secondary within a cycle • Enhancing capabilities are necessary to accumulate assets(?) • Enhancing capabilities are necessary to advance to next cycle(?) asset types Workflow Collaboration Publication Accelerators Enhancing Capabilities common random access Versioning Successor Provenance Metadata Publish Import Curation Governance Import
  218. 218. Assets & Capabilities enhancing capabilities common core capabilities asset specific Workflow Collaboration Publication Accelerators Versioning Provenance Metadata Curation Governance Import
  219. 219. Asset Scope Enterprise Line of Business Enterprise Localized Line of Business Localized • Scope determines / implies boundaries, metrics • Distinct systems (IT) and processes (biz) for each asset, at each level of scope • Each distinct system and process = integration point, creates a barrier to flow, requires an interface
  220. 220. Asset Communication Enterprise Line of Business Localized • Scope determines / implies boundaries, metrics • Distinct systems (IT) and processes (biz) for each asset, at each level of scope • Each distinct system and process = integration point, creates a barrier to flow, requires an interface enhancing capabilities common enhancing capabilities common enhancing capabilities
  221. 221. Capability Evolution Core Capabilities activity specific progressive Influencer By-product Publish Import Precursor • Core capabilities are necessary & primary to complete a given cycle • Enhancing capabilities are secondary within a cycle • Enhancing capabilities are necessary to accumulate assets(?) • Enhancing capabilities are necessary to advance to next cycle(?) asset types Workflow Collaboration Publication Accelerators Enhancing Capabilities common random access Versioning Successor Provenance Metadata Publish Import Curation Governance Import
  222. 222. VALUE CHAIN MAP (WARDLEY MAPPING)
