New Insights from ‘Big Legacy Data’:
The Role of Text Analytics at Bundle.com
Jaime Fitzgerald, President, Fitzgerald Analytics, Inc.
Alex Hasha, Chief Data Scientist, Bundle.com

October 2011


                                                   Architects of Fact-Based Decisions™
Agenda for Today’s Talk




                          1.       Introduction to the Business Model


                          2.       The Role of Text Analytics


                          3.       A Key Challenge and How we Overcame It


                          4.       Takeaways


                          5.       Q&A




Introduction

        Jaime Fitzgerald, Founder @ Fitzgerald Analytics (@JaimeFitzgerald)

          Responsible for…
            • Transforming data into value for clients
            • Creating meaningful careers for employees
          At a company that…
            • Helps clients convert Data to Dollars™
            • Brings a strategic perspective to improve ROI on investments in
              technology, data, people, and processes
          Also working on…
            • A movement to Democratize Analytics by Reducing the “Barrier to
              Benefit” for non-profits, social entrepreneurs, and gov’t

        Alex Hasha, Data Scientist @ Bundle Corp (@AlexHasha)

          Responsible for…
            • Leading development of data products
            • Designing statistical methods / algorithms that transform data into
              insights for consumers
          At a company that…
            • Uses data to help consumers make better decisions with their money
            • Bends valuable legacy data to new purposes
            • Is growing and hiring!
          Also working on…
            • Learning about and implementing best practices for managing complex
              data pipelines



For Example, We Help You Decide Where to Spend…




We Do This with Billions of Real Spending Records
        Unlike other merchant listing sites, our content is based on real credit card
        spending by 20 million households.

        Example: Credit Card Statement Data (figure)

        Key Issues with this Data:
        1. Credit card data lacks a merchant identifier
        2. So we rely on text analytics to associate transactions with merchants
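
        A minimal sketch of what the first text-analytics step might look like: cleaning raw statement descriptions so that variants of the same merchant collapse to one key. The sample strings, the `normalize_description` helper, and the specific cleaning rules are hypothetical illustrations, not Bundle's actual rules.

```python
import re


def normalize_description(raw: str) -> str:
    """Reduce a raw statement description to a rough merchant key (hypothetical rules)."""
    text = raw.upper()
    text = re.sub(r"[#*]\s*\d+", " ", text)   # store numbers such as "#1234" or "* 1234"
    text = re.sub(r"\d{3,}", " ", text)       # long digit runs (phone or reference numbers)
    text = re.sub(r"[^A-Z0-9& ]", " ", text)  # remaining punctuation
    return " ".join(text.split())             # collapse repeated whitespace


# Made-up examples of how one merchant might appear on two statements.
samples = [
    "STARBUCKS #1234 NEW YORK NY",
    "Starbucks*  1234 New York, NY",
]
keys = {normalize_description(s) for s in samples}
print(keys)
```

        Both sample strings reduce to the same key, which is the property the later matching steps depend on.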




A Business Model “Built Out Of Data”

        Old Data
          • Card Transaction Data
          • Merchant Listings (e.g., Address, Phone Number, Business Type)
          • Other Data: Census, Bureau of Labor Statistics, User Feedback

        Transformed in New Ways
          • Normalization
          • Clustering
          • Linking
          • Aggregation

        To Create New Features Such As…
          • People Who Shop Here Also Like…
          • The Bundle Loyalty Score
          • Data-Driven Reviews From an Array of Customer Segments



The Benefit is to Provide More Accurate, Less Biased Content




Before the Fun Stuff Happens…
        Before we can generate insights about merchants for our users, we must associate
        each transaction in our database with a specific merchant from a master list.

        Credit Card Transactions (Billions – 10^9)
          • Highly variable text descriptions
          • Noisy geographic info
          • Noisy merchant category info

            -> Text Matching ->

        Comprehensive Listing of US Merchants (Tens of Millions – 10^7)

        Two main problems:
        1. Accurate Fuzzy Matching is Difficult
        2. Scale of Data is Enormous
        This case focuses on the second problem.

        Naïve item-by-item search takes O(10^16) expensive string comparisons: Too Slow!
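
        The 10^16 figure follows from multiplying the two counts on this slide. The short calculation below reproduces it and, purely as an assumption for scale, converts it to wall-clock time at one million string comparisons per second (that rate is not from the talk).

```python
transactions = 10**9   # "Billions" of card transactions (from the slide)
merchants = 10**7      # "Tens of Millions" of listed merchants (from the slide)

# Naive search: every transaction compared against every merchant.
naive_comparisons = transactions * merchants

# ASSUMPTION: a single machine doing one million string comparisons per second.
comparisons_per_second = 10**6
seconds_per_year = 365 * 24 * 3600

naive_years = naive_comparisons / comparisons_per_second / seconds_per_year
print(f"{naive_comparisons:.0e} comparisons, about {naive_years:.0f} years")
```

        Under that assumed rate the naive search would run for centuries, so the work has to shrink before any tuning of the matching method matters.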

A “Brute Force” Approach Would Never Work…


        Chart: Processing Time / Workload (vertical axis, 0 to 1) against # of Merchants
        in Comparison Set (horizontal axis), with three levels marked: Neighborhood
        (Hundreds), City (Hundreds of Thousands), and Nation (Tens of Millions).

        1. Matching within Hundreds of Millions of Merchants would require massive
           processing. Fortunately, we don’t need to match at this level.
        2. By batching at the local-area level, we process orders of magnitude faster.
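
        A small sketch of why batching by area helps: if each transaction is compared only against merchants in its own area, the comparison count drops by roughly the number of areas. The data here is synthetic and evenly distributed, and `count_comparisons` is a hypothetical helper written for this illustration.

```python
from collections import defaultdict


def count_comparisons(transactions, merchants, use_blocking):
    """Count the string comparisons needed to test each transaction against its candidates."""
    if not use_blocking:
        return len(transactions) * len(merchants)
    by_area = defaultdict(list)
    for area, name in merchants:
        by_area[area].append(name)
    # With blocking, a transaction is compared only with merchants in the same area.
    return sum(len(by_area[area]) for area, _ in transactions)


# Synthetic data: 100 areas, each with 50 merchants and 200 transactions.
merchants = [(area, f"MERCHANT {area}-{i}") for area in range(100) for i in range(50)]
transactions = [(area, f"TXN {area}-{i}") for area in range(100) for i in range(200)]

brute_force = count_comparisons(transactions, merchants, use_blocking=False)
blocked = count_comparisons(transactions, merchants, use_blocking=True)
print(brute_force, blocked, brute_force // blocked)
```

        With evenly sized areas the saving equals the number of areas. Real neighborhoods are uneven, so the actual gain would differ.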

Solution to Scaling Problem
        This is a “Cascade of Scale Reductions”, Parallelizing by Location

        1. Credit Card Transactions (Billions – 10^9)
        2. Batch Transactions by Geographic Neighborhood (batches 1, 2, … 10000)
        3. Dedupe Description Strings
        4. Text Clustering (Not Matching): Consolidate Strings Belonging to Same Merchant
        5. Preliminary Merchant Listing Generated Directly from Transactions
           (Tens of Millions – 10^7)
        6. Secondary Fuzzy Matching Process Reconciles Preliminary Listings with
           Merchant “Source of Truth”
        7. Final Merged Transaction Data Set

        Keys to solving the scaling problem:
        1. Scale Reduction / Parallelized Text Clustering
        2. Free Open Source Software

        Computational Efficiency Increased by a Factor of 10^8!
        Eons -> Days -> Minutes
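
        The batching, deduping, and clustering stages can be sketched in a few lines. The slides do not say which clustering algorithm or software was used, so this version substitutes a simple greedy clustering built on Python's standard-library `difflib.SequenceMatcher`. The 0.8 threshold, the sample transactions, and all function names are assumptions for illustration, and the secondary fuzzy match against the merchant “Source of Truth” is left out.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """True when two strings are at least `threshold` similar (assumed cutoff)."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def cluster_strings(strings, threshold=0.8):
    """Greedy clustering: each string joins the first cluster whose representative is similar."""
    clusters = []  # each cluster is a list; element 0 is its representative
    for s in sorted(strings):
        for cluster in clusters:
            if similar(s, cluster[0], threshold):
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters


def preliminary_listing(transactions, threshold=0.8):
    """transactions: iterable of (neighborhood, description) pairs."""
    batches = defaultdict(set)  # a set per neighborhood dedupes description strings
    for neighborhood, description in transactions:
        batches[neighborhood].add(description)
    # Batches are independent of one another, so this step could run in parallel.
    return {hood: cluster_strings(descs, threshold) for hood, descs in batches.items()}


# Made-up transactions keyed by a ZIP-like neighborhood code.
transactions = [
    ("10001", "JOES PIZZA NYC"),
    ("10001", "JOES PIZZA NYC"),
    ("10001", "JOE'S PIZZA NYC"),
    ("10001", "CITY HARDWARE"),
    ("94103", "JOES PIZZA SF"),
]
listing = preliminary_listing(transactions)
print(listing)
```

        Within a neighborhood, clustering compares strings only against cluster representatives, so the expensive comparisons stay inside small batches.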

Takeaways



           1. Tame your data before perfecting your methods.
           Efficiency enables experimentation, iteration, and improvement.



           2. Design your process to minimize unnecessary complexity
           (e.g. Parallel Processing at Scale, Normalization, Pre-Filtering)



            3. Tools: Take advantage of powerful (and inexpensive)
            open-source tools that enable your process...



Weitere ähnliche Inhalte

Was ist angesagt?

Big Data is Here for Financial Services White Paper
Big Data is Here for Financial Services White PaperBig Data is Here for Financial Services White Paper
Big Data is Here for Financial Services White PaperExperian
 
Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen...
Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen...Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen...
Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen...Dr. Cedric Alford
 
Business & Decision MDM Summit (english version)
Business & Decision MDM Summit (english version)Business & Decision MDM Summit (english version)
Business & Decision MDM Summit (english version)Jean-Michel Franco
 
Unleashing the Power of Customer Data
Unleashing the Power of Customer DataUnleashing the Power of Customer Data
Unleashing the Power of Customer DataReadWrite
 
Unleashing The Power Of Customer Data Wp091047
Unleashing The Power Of Customer Data Wp091047Unleashing The Power Of Customer Data Wp091047
Unleashing The Power Of Customer Data Wp091047Erik Ginalick
 
Analytical thinking 5 - May 2012
Analytical thinking 5 - May 2012Analytical thinking 5 - May 2012
Analytical thinking 5 - May 2012Charlotte Skornik
 
Thinking Small: Bringing the Power of Big Data to the Masses
Thinking Small:  Bringing the Power of Big Data to the MassesThinking Small:  Bringing the Power of Big Data to the Masses
Thinking Small: Bringing the Power of Big Data to the MassesFlutterbyBarb
 
Business analytics a certain something
Business analytics   a certain somethingBusiness analytics   a certain something
Business analytics a certain somethingLogicalis
 
Big Transaction Data - CMG Vegas 2012
Big Transaction Data - CMG Vegas 2012Big Transaction Data - CMG Vegas 2012
Big Transaction Data - CMG Vegas 2012nickychu
 
enterprise-data-everywhere
enterprise-data-everywhereenterprise-data-everywhere
enterprise-data-everywhereBill Peer
 
DarrelKammeyer Creating Value with BI in Hospitality
DarrelKammeyer Creating Value with BI in HospitalityDarrelKammeyer Creating Value with BI in Hospitality
DarrelKammeyer Creating Value with BI in HospitalityDarrel Kammeyer
 
Hadoop training in bangalore
Hadoop training in bangaloreHadoop training in bangalore
Hadoop training in bangaloreappaji intelhunt
 

Was ist angesagt? (17)

Big data assignment
Big data assignmentBig data assignment
Big data assignment
 
Big Data is Here for Financial Services White Paper
Big Data is Here for Financial Services White PaperBig Data is Here for Financial Services White Paper
Big Data is Here for Financial Services White Paper
 
Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen...
Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen...Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen...
Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen...
 
Wall streetjournal
Wall streetjournalWall streetjournal
Wall streetjournal
 
Business & Decision MDM Summit (english version)
Business & Decision MDM Summit (english version)Business & Decision MDM Summit (english version)
Business & Decision MDM Summit (english version)
 
Unleashing the Power of Customer Data
Unleashing the Power of Customer DataUnleashing the Power of Customer Data
Unleashing the Power of Customer Data
 
Unleashing The Power Of Customer Data Wp091047
Unleashing The Power Of Customer Data Wp091047Unleashing The Power Of Customer Data Wp091047
Unleashing The Power Of Customer Data Wp091047
 
Analytical thinking 5 - May 2012
Analytical thinking 5 - May 2012Analytical thinking 5 - May 2012
Analytical thinking 5 - May 2012
 
Big Data: Where is the Real Opportunity?
Big Data: Where is the Real Opportunity?Big Data: Where is the Real Opportunity?
Big Data: Where is the Real Opportunity?
 
Thinking Small: Bringing the Power of Big Data to the Masses
Thinking Small:  Bringing the Power of Big Data to the MassesThinking Small:  Bringing the Power of Big Data to the Masses
Thinking Small: Bringing the Power of Big Data to the Masses
 
Business analytics a certain something
Business analytics   a certain somethingBusiness analytics   a certain something
Business analytics a certain something
 
Big Transaction Data - CMG Vegas 2012
Big Transaction Data - CMG Vegas 2012Big Transaction Data - CMG Vegas 2012
Big Transaction Data - CMG Vegas 2012
 
18540515
1854051518540515
18540515
 
enterprise-data-everywhere
enterprise-data-everywhereenterprise-data-everywhere
enterprise-data-everywhere
 
The 25 Predictions About The Future Of Big Data
The 25 Predictions About The Future Of Big DataThe 25 Predictions About The Future Of Big Data
The 25 Predictions About The Future Of Big Data
 
DarrelKammeyer Creating Value with BI in Hospitality
DarrelKammeyer Creating Value with BI in HospitalityDarrelKammeyer Creating Value with BI in Hospitality
DarrelKammeyer Creating Value with BI in Hospitality
 
Hadoop training in bangalore
Hadoop training in bangaloreHadoop training in bangalore
Hadoop training in bangalore
 

Ähnlich wie New insights from big legacy data at bundle (Presented at Text Analytics World 2011)

Analytical thinking 8 - June 2012
Analytical thinking 8 - June 2012Analytical thinking 8 - June 2012
Analytical thinking 8 - June 2012Charlotte Skornik
 
Analytical thinking 16 - October 2012
Analytical thinking 16 - October 2012Analytical thinking 16 - October 2012
Analytical thinking 16 - October 2012Charlotte Skornik
 
The Big Deal About Big Data For Customer Engagement
The Big Deal About Big Data For Customer EngagementThe Big Deal About Big Data For Customer Engagement
The Big Deal About Big Data For Customer EngagementIBM India Smarter Computing
 
Oea big-data-guide-1522052
Oea big-data-guide-1522052Oea big-data-guide-1522052
Oea big-data-guide-1522052kavi172
 
Oea big-data-guide-1522052
Oea big-data-guide-1522052Oea big-data-guide-1522052
Oea big-data-guide-1522052Gilbert Rozario
 
Gartner eBook on Big Data
Gartner eBook on Big DataGartner eBook on Big Data
Gartner eBook on Big DataJyrki Määttä
 
Starting small with big data
Starting small with big data Starting small with big data
Starting small with big data WGroup
 
Data Analytics And Business Decision.pdf
Data Analytics And Business Decision.pdfData Analytics And Business Decision.pdf
Data Analytics And Business Decision.pdfCiente
 
Data Analytics And Business Decision.pdf
Data Analytics And Business Decision.pdfData Analytics And Business Decision.pdf
Data Analytics And Business Decision.pdfCiente
 
Malaysia Presentation
Malaysia PresentationMalaysia Presentation
Malaysia PresentationAlan Royal
 
Whitebook on Big Data
Whitebook on Big DataWhitebook on Big Data
Whitebook on Big DataViren Aul
 
Big Data: Beyond the Hype - Why Big Data Matters to You
Big Data: Beyond the Hype - Why Big Data Matters to YouBig Data: Beyond the Hype - Why Big Data Matters to You
Big Data: Beyond the Hype - Why Big Data Matters to YouDATAVERSITY
 
Big Data & Analytics perspectives in Banking
Big Data & Analytics perspectives in BankingBig Data & Analytics perspectives in Banking
Big Data & Analytics perspectives in BankingGianpaolo Zampol
 
2018 bi-trends-ebook
2018 bi-trends-ebook2018 bi-trends-ebook
2018 bi-trends-ebookSand
 

Ähnlich wie New insights from big legacy data at bundle (Presented at Text Analytics World 2011) (20)

Big Data analytics best practices
Big Data analytics best practicesBig Data analytics best practices
Big Data analytics best practices
 
Big Data - Harnessing a game changing asset
Big Data - Harnessing a game changing assetBig Data - Harnessing a game changing asset
Big Data - Harnessing a game changing asset
 
Analytical thinking 8 - June 2012
Analytical thinking 8 - June 2012Analytical thinking 8 - June 2012
Analytical thinking 8 - June 2012
 
Bigdata
BigdataBigdata
Bigdata
 
Analytical thinking 16 - October 2012
Analytical thinking 16 - October 2012Analytical thinking 16 - October 2012
Analytical thinking 16 - October 2012
 
The Big Deal About Big Data For Customer Engagement
The Big Deal About Big Data For Customer EngagementThe Big Deal About Big Data For Customer Engagement
The Big Deal About Big Data For Customer Engagement
 
Oea big-data-guide-1522052
Oea big-data-guide-1522052Oea big-data-guide-1522052
Oea big-data-guide-1522052
 
Oea big-data-guide-1522052
Oea big-data-guide-1522052Oea big-data-guide-1522052
Oea big-data-guide-1522052
 
Gartner eBook on Big Data
Gartner eBook on Big DataGartner eBook on Big Data
Gartner eBook on Big Data
 
Starting small with big data
Starting small with big data Starting small with big data
Starting small with big data
 
The state of the Big Data market
The state of the Big Data marketThe state of the Big Data market
The state of the Big Data market
 
Data Analytics And Business Decision.pdf
Data Analytics And Business Decision.pdfData Analytics And Business Decision.pdf
Data Analytics And Business Decision.pdf
 
Data Analytics And Business Decision.pdf
Data Analytics And Business Decision.pdfData Analytics And Business Decision.pdf
Data Analytics And Business Decision.pdf
 
Malaysia Presentation
Malaysia PresentationMalaysia Presentation
Malaysia Presentation
 
Whitebook on Big Data
Whitebook on Big DataWhitebook on Big Data
Whitebook on Big Data
 
Big Data: Beyond the Hype - Why Big Data Matters to You
Big Data: Beyond the Hype - Why Big Data Matters to YouBig Data: Beyond the Hype - Why Big Data Matters to You
Big Data: Beyond the Hype - Why Big Data Matters to You
 
Barak regev
Barak regevBarak regev
Barak regev
 
Big Data & Analytics perspectives in Banking
Big Data & Analytics perspectives in BankingBig Data & Analytics perspectives in Banking
Big Data & Analytics perspectives in Banking
 
2018 bi-trends-ebook
2018 bi-trends-ebook2018 bi-trends-ebook
2018 bi-trends-ebook
 
9sight operational analytics white paper
9sight   operational analytics white paper9sight   operational analytics white paper
9sight operational analytics white paper
 

Mehr von Fitzgerald Analytics, Inc.

Profiting from customer profitability + big data fitzgerald analytics
Profiting from customer profitability + big data fitzgerald analyticsProfiting from customer profitability + big data fitzgerald analytics
Profiting from customer profitability + big data fitzgerald analyticsFitzgerald Analytics, Inc.
 
2013 12-05 data-driven innovation - fitzgerald analytics workshop at gilbane ...
2013 12-05 data-driven innovation - fitzgerald analytics workshop at gilbane ...2013 12-05 data-driven innovation - fitzgerald analytics workshop at gilbane ...
2013 12-05 data-driven innovation - fitzgerald analytics workshop at gilbane ...Fitzgerald Analytics, Inc.
 
Analytics in Financial Services - Behavioral Finance Event - Data Visualizati...
Analytics in Financial Services - Behavioral Finance Event - Data Visualizati...Analytics in Financial Services - Behavioral Finance Event - Data Visualizati...
Analytics in Financial Services - Behavioral Finance Event - Data Visualizati...Fitzgerald Analytics, Inc.
 
Analytics in financial services prez behavioral finance + data visualizatio...
Analytics in financial services prez   behavioral finance + data visualizatio...Analytics in financial services prez   behavioral finance + data visualizatio...
Analytics in financial services prez behavioral finance + data visualizatio...Fitzgerald Analytics, Inc.
 
Jaime Fitzgerald on Data-Driven Customer Experience in Financial Services and...
Jaime Fitzgerald on Data-Driven Customer Experience in Financial Services and...Jaime Fitzgerald on Data-Driven Customer Experience in Financial Services and...
Jaime Fitzgerald on Data-Driven Customer Experience in Financial Services and...Fitzgerald Analytics, Inc.
 
Data Discovery for Big Big Insights - Tableau Webinar Slides
Data Discovery for Big Big Insights - Tableau Webinar SlidesData Discovery for Big Big Insights - Tableau Webinar Slides
Data Discovery for Big Big Insights - Tableau Webinar SlidesFitzgerald Analytics, Inc.
 
TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence
TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI ConvergenceTDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence
TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI ConvergenceFitzgerald Analytics, Inc.
 
Data visualization trends in Business Intelligence: Allison Sapka at Analytic...
Data visualization trends in Business Intelligence: Allison Sapka at Analytic...Data visualization trends in Business Intelligence: Allison Sapka at Analytic...
Data visualization trends in Business Intelligence: Allison Sapka at Analytic...Fitzgerald Analytics, Inc.
 
Governing the Data to Dollars Value Chain™ - Sept 2012 NYC Data Governance Co...
Governing the Data to Dollars Value Chain™ - Sept 2012 NYC Data Governance Co...Governing the Data to Dollars Value Chain™ - Sept 2012 NYC Data Governance Co...
Governing the Data to Dollars Value Chain™ - Sept 2012 NYC Data Governance Co...Fitzgerald Analytics, Inc.
 
Data to Dollars™ - Practical Analytics in the Big Data Era Jaime Fitzgerald A...
Data to Dollars™ - Practical Analytics in the Big Data Era Jaime Fitzgerald A...Data to Dollars™ - Practical Analytics in the Big Data Era Jaime Fitzgerald A...
Data to Dollars™ - Practical Analytics in the Big Data Era Jaime Fitzgerald A...Fitzgerald Analytics, Inc.
 
Big Data Meets Customer Profitability Analytics
Big Data Meets Customer Profitability AnalyticsBig Data Meets Customer Profitability Analytics
Big Data Meets Customer Profitability AnalyticsFitzgerald Analytics, Inc.
 
Keynote on Financial Services Analytics - Presented aug 2011
Keynote on Financial Services Analytics - Presented aug 2011Keynote on Financial Services Analytics - Presented aug 2011
Keynote on Financial Services Analytics - Presented aug 2011Fitzgerald Analytics, Inc.
 
Knowledge management for analytic teams jaime fitzgerald and alex hasha - p...
Knowledge management for analytic teams   jaime fitzgerald and alex hasha - p...Knowledge management for analytic teams   jaime fitzgerald and alex hasha - p...
Knowledge management for analytic teams jaime fitzgerald and alex hasha - p...Fitzgerald Analytics, Inc.
 
Analytics in Financial Services: Keynote Presentation for TDWI and NY Tech Co...
Analytics in Financial Services: Keynote Presentation for TDWI and NY Tech Co...Analytics in Financial Services: Keynote Presentation for TDWI and NY Tech Co...
Analytics in Financial Services: Keynote Presentation for TDWI and NY Tech Co...Fitzgerald Analytics, Inc.
 
Jaime Fitzgerald: A Master Data Management Road-Trip - Presented Enterprise D...
Jaime Fitzgerald: A Master Data Management Road-Trip - Presented Enterprise D...Jaime Fitzgerald: A Master Data Management Road-Trip - Presented Enterprise D...
Jaime Fitzgerald: A Master Data Management Road-Trip - Presented Enterprise D...Fitzgerald Analytics, Inc.
 

Mehr von Fitzgerald Analytics, Inc. (17)

Profiting from customer profitability + big data fitzgerald analytics
Profiting from customer profitability + big data fitzgerald analyticsProfiting from customer profitability + big data fitzgerald analytics
Profiting from customer profitability + big data fitzgerald analytics
 
2013 12-05 data-driven innovation - fitzgerald analytics workshop at gilbane ...
2013 12-05 data-driven innovation - fitzgerald analytics workshop at gilbane ...2013 12-05 data-driven innovation - fitzgerald analytics workshop at gilbane ...
2013 12-05 data-driven innovation - fitzgerald analytics workshop at gilbane ...
 
Analytics in Financial Services - Behavioral Finance Event - Data Visualizati...
Analytics in Financial Services - Behavioral Finance Event - Data Visualizati...Analytics in Financial Services - Behavioral Finance Event - Data Visualizati...
Analytics in Financial Services - Behavioral Finance Event - Data Visualizati...
 
Analytics in financial services prez behavioral finance + data visualizatio...
Analytics in financial services prez   behavioral finance + data visualizatio...Analytics in financial services prez   behavioral finance + data visualizatio...
Analytics in financial services prez behavioral finance + data visualizatio...
 
Jaime Fitzgerald on Data-Driven Customer Experience in Financial Services and...
Jaime Fitzgerald on Data-Driven Customer Experience in Financial Services and...Jaime Fitzgerald on Data-Driven Customer Experience in Financial Services and...
Jaime Fitzgerald on Data-Driven Customer Experience in Financial Services and...
 
Data Discovery for Big Big Insights - Tableau Webinar Slides
Data Discovery for Big Big Insights - Tableau Webinar SlidesData Discovery for Big Big Insights - Tableau Webinar Slides
Data Discovery for Big Big Insights - Tableau Webinar Slides
 
TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence
TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI ConvergenceTDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence
TDWI NYC Chapter - Tony Baer Ovum on Big data, Data quality, and BI Convergence
 
Text graph-visualization redux
Text graph-visualization reduxText graph-visualization redux
Text graph-visualization redux
 
Data visualization trends in Business Intelligence: Allison Sapka at Analytic...
Data visualization trends in Business Intelligence: Allison Sapka at Analytic...Data visualization trends in Business Intelligence: Allison Sapka at Analytic...
Data visualization trends in Business Intelligence: Allison Sapka at Analytic...
 
Governing the Data to Dollars Value Chain™ - Sept 2012 NYC Data Governance Co...
Governing the Data to Dollars Value Chain™ - Sept 2012 NYC Data Governance Co...Governing the Data to Dollars Value Chain™ - Sept 2012 NYC Data Governance Co...
Governing the Data to Dollars Value Chain™ - Sept 2012 NYC Data Governance Co...
 
Data to Dollars™ - Practical Analytics in the Big Data Era Jaime Fitzgerald A...
Data to Dollars™ - Practical Analytics in the Big Data Era Jaime Fitzgerald A...Data to Dollars™ - Practical Analytics in the Big Data Era Jaime Fitzgerald A...
Data to Dollars™ - Practical Analytics in the Big Data Era Jaime Fitzgerald A...
 
Big Data Meets Customer Profitability Analytics
Big Data Meets Customer Profitability AnalyticsBig Data Meets Customer Profitability Analytics
Big Data Meets Customer Profitability Analytics
 
Keynote on Financial Services Analytics - Presented aug 2011
Keynote on Financial Services Analytics - Presented aug 2011Keynote on Financial Services Analytics - Presented aug 2011
Keynote on Financial Services Analytics - Presented aug 2011
 
Knowledge management for analytic teams jaime fitzgerald and alex hasha - p...
Knowledge management for analytic teams   jaime fitzgerald and alex hasha - p...Knowledge management for analytic teams   jaime fitzgerald and alex hasha - p...
Knowledge management for analytic teams jaime fitzgerald and alex hasha - p...
 
Analytics in Financial Services: Keynote Presentation for TDWI and NY Tech Co...
Analytics in Financial Services: Keynote Presentation for TDWI and NY Tech Co...Analytics in Financial Services: Keynote Presentation for TDWI and NY Tech Co...
Analytics in Financial Services: Keynote Presentation for TDWI and NY Tech Co...
 
Fitzgerald Analytics 1-Page Overview
Fitzgerald Analytics 1-Page OverviewFitzgerald Analytics 1-Page Overview
Fitzgerald Analytics 1-Page Overview
 
Jaime Fitzgerald: A Master Data Management Road-Trip - Presented Enterprise D...
Jaime Fitzgerald: A Master Data Management Road-Trip - Presented Enterprise D...Jaime Fitzgerald: A Master Data Management Road-Trip - Presented Enterprise D...
Jaime Fitzgerald: A Master Data Management Road-Trip - Presented Enterprise D...
 


New insights from big legacy data at bundle (Presented at Text Analytics World 2011)

  • 1. New Insights from ‘Big Legacy Data’: The Role of Text Analytics at Bundle.com. Jaime Fitzgerald, President, Fitzgerald Analytics, Inc.; Alex Hasha, Chief Data Scientist, Bundle.com. October 2011. Architects of Fact-Based Decisions™
  • 2. Agenda for Today’s Talk: 1. Introduction to the Business Model; 2. The Role of Text Analytics; 3. A Key Challenge and How We Overcame It; 4. Takeaways; 5. Q&A
  • 3. Introduction
      Jaime Fitzgerald, Founder @ Fitzgerald Analytics (@JaimeFitzgerald)
        Responsible for: transforming data into value for clients; creating meaningful careers for employees
        At a company that: helps clients convert Data to Dollars™; brings a strategic perspective to improve ROI on investments in technology, data, people, and processes
        Also working on: a movement to Democratize Analytics by Reducing the “Barrier to Benefit” for non-profits, social entrepreneurs, and gov’t
      Alex Hasha, Data Scientist @ Bundle Corp (@AlexHasha)
        Responsible for: leading development of data products; designing statistical methods / algorithms that transform data into insights for consumers
        At a company that: uses data to help consumers make better decisions with their money; bends valuable legacy data to new purposes; is growing and hiring!
        Also working on: learning about and implementing best practices for managing complex data pipelines
  • 4. For Example, We Help You Decide Where to Spend…
  • 5. We Do This with Billions of Real Spending Records
      Unlike other merchant listing sites, our content is based on real credit card spending by 20 million households.
      Key issues with this data:
        1. Credit card data lacks a merchant identifier
        2. So we rely on text analytics to associate transactions with merchants
      [Figure: Example: Credit Card Statement Data]
  • 6. A Business Model “Built Out Of Data”
      Old data: Card Transaction Data; Merchant Listings (e.g., Address, Phone Number, Business Type); Other Data: Census, Bureau of Labor Statistics, User Feedback
      Transformed in new ways: Normalization; Clustering; Linking; Aggregation
      To create new features such as: People Who Shop Here Also Like…; The Bundle Loyalty Score; Data-Driven Reviews From an Array of Customer Segments
  • 7. The Benefit is to Provide More Accurate, Less Biased Content
  • 8. Before the Fun Stuff Happens…
      Before we can generate insights about merchants for our users, we must associate each transaction in our database with a specific merchant from a master list.
      Two main problems:
        1. Accurate Fuzzy Matching is Difficult
        2. Scale of Data is Enormous
      This case focuses on the second problem.
      [Diagram: Credit Card Transactions (Billions – 10^9), with highly variable text descriptions, noisy geographic info, and noisy merchant category info → Text Matching → Comprehensive Listing of US Merchants (Tens of Millions – 10^7)]
      Naïve item-by-item search takes O(10^16) expensive string comparisons: Too Slow!
  • 9. A “Brute Force” Approach Would Never Work…
        1. Matching w/in Hundreds of Millions of Merchants would require massive processing… Fortunately we don’t need to match at this level.
        2. Batching at the local-area level, we process orders of magnitude faster.
      [Chart: Processing Time / Workload (0 to 1) vs. # of Merchants in Comparison Set: Neighborhood (Hundreds), City (Hundreds of Thousands), Nation (Tens of Millions)]
  • 10. Solution to Scaling Problem
      This is a “Cascade of Scale Reductions”, Parallelizing by Location.
      Keys to solving the scaling problem:
        1. Scale Reduction / Parallelized Text Clustering
        2. Free Open Source Software
      [Diagram: Credit Card Transactions (Billions – 10^9) → Batch Transactions by Geographic Neighborhood (batches 1, 2, … 10000) → Dedupe Description Strings → Text Clustering (Not Matching): Consolidate Strings Belonging to Same Merchant → Preliminary Merchant Listing Generated Directly from Transactions → Secondary Fuzzy Matching Process Reconciles Preliminary Listings with Merchant “Source of Truth” → Final Merged Transaction Data Set (Tens of Millions – 10^7)]
      Computational Efficiency Increased by a Factor of 10^8! Eons -> Days -> Minutes
  • 11. Takeaways
        1. Tame your data before perfecting your methods. Efficiency enables experimentation, iteration, improvement.
        2. Design your process to minimize unnecessary complexity (e.g., Parallel Processing at Scale, Normalization, Pre-Filtering).
        3. Tools: Take advantage of powerful (and inexpensive) open-source tools that enable your process...
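
The scale argument in slides 8 to 10 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the slides do not name the tools or the clustering algorithm. The throughput figure, the use of a ZIP-style string as the neighborhood key, `difflib.SequenceMatcher` as the similarity measure, the 0.8 threshold, the greedy single-pass clustering, and all function names and sample records are assumptions made purely for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Orders of magnitude quoted on the slides.
N_TRANSACTIONS = 10**9   # billions of credit card transactions
N_MERCHANTS = 10**7      # tens of millions of US merchant listings

# Naive item-by-item search compares every transaction to every merchant.
naive_comparisons = N_TRANSACTIONS * N_MERCHANTS

# ASSUMPTION: one million string comparisons per second on one machine.
ASSUMED_COMPARISONS_PER_SECOND = 10**6
naive_years = naive_comparisons / ASSUMED_COMPARISONS_PER_SECOND / (365 * 24 * 3600)


def normalize(description):
    """Uppercase and collapse whitespace (a hypothetical, minimal normalization)."""
    return " ".join(description.upper().split())


def batch_by_neighborhood(transactions):
    """Group (neighborhood_key, description) pairs by key.

    Using a set per batch dedupes identical description strings.
    """
    batches = defaultdict(set)
    for key, description in transactions:
        batches[key].add(normalize(description))
    return batches


def cluster_strings(strings, threshold=0.8):
    """Greedy clustering: a string joins the first cluster whose first
    member is similar enough, otherwise it starts a new cluster."""
    clusters = []
    for s in sorted(strings):
        for cluster in clusters:
            if SequenceMatcher(None, cluster[0], s).ratio() >= threshold:
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters


# Made-up sample records: (neighborhood key, statement description).
transactions = [
    ("10001", "JOES PIZZA #12"),
    ("10001", "joes pizza #12"),
    ("10001", "JOES PIZZA 12 NY"),
    ("10001", "CITY HARDWARE"),
    ("94103", "JOES PIZZA #12"),
]

# Each batch is independent, so batches could be processed in parallel.
preliminary_listing = {
    key: cluster_strings(strings)
    for key, strings in batch_by_neighborhood(transactions).items()
}

if __name__ == "__main__":
    print(f"naive comparisons: {naive_comparisons:.0e}")
    print(f"naive run time at the assumed rate: about {naive_years:.0f} years")
    for key, clusters in sorted(preliminary_listing.items()):
        print(key, clusters)
```

Because strings are only compared within a neighborhood batch, the comparison set shrinks from the national listing to a few local strings, which is the effect slide 9 describes. The clusters produced here would still need the secondary fuzzy-matching step against the merchant "source of truth" shown on slide 10.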