Cataloging
The Art & Science of it...

Utkarsh
Principal Architect @ Flipkart.com



Sunday 3 March 13
Art vs Science
     • Art: Imaginative, Creative, Free Form
     • Science: Measurable, Formulative, Methodical, Set Patterns




What is Cataloging?
     • Catalog
       A list or itemized display, usually including descriptive information
       or illustrations.
     • Cataloging
       a. To list or include in a catalog
       b. To classify according to a categorical system

     We define it as:
     Cataloging is the process of managing the inventory of products
     through their entire lifecycle: creation, updates,
     de-provisioning/re-provisioning, and deletion.
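
The lifecycle in this definition can be sketched as a small state machine. This is only an illustration: the state names, the operation names, and the `CatalogItem` class are assumptions made for the example, not something the slides specify.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"                # created and visible in the catalog
    DEPROVISIONED = "deprovisioned"  # withdrawn, but may come back
    DELETED = "deleted"              # terminal state


# Allowed operations per state (assumed for illustration).
TRANSITIONS = {
    (State.ACTIVE, "update"): State.ACTIVE,
    (State.ACTIVE, "deprovision"): State.DEPROVISIONED,
    (State.DEPROVISIONED, "reprovision"): State.ACTIVE,
    (State.ACTIVE, "delete"): State.DELETED,
    (State.DEPROVISIONED, "delete"): State.DELETED,
}


class CatalogItem:
    """Hypothetical catalog entry; creating the object is the 'create' step."""

    def __init__(self, item_id, attributes):
        self.item_id = item_id
        self.attributes = dict(attributes)
        self.state = State.ACTIVE

    def apply(self, operation, changes=None):
        """Apply a lifecycle operation, rejecting ones the state disallows."""
        key = (self.state, operation)
        if key not in TRANSITIONS:
            raise ValueError(
                f"{operation!r} is not allowed in state {self.state.value!r}"
            )
        if operation == "update":
            self.attributes.update(changes or {})
        self.state = TRANSITIONS[key]
        return self.state
```

Keeping the transitions in one table makes it easy to see, and to change, which operations each stage of the lifecycle permits.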

Why is the problem interesting?
     • Ever-growing - “size”
     • Dynamic nature of the Metadata - “elasticity”
     • Association(s) between data elements - “flexibility”
     • Flux of changes - “variability”
     • De-coupled systems & Data Ownership - “data duplication”



How do we solve it?
     • Be Comprehensive & Imaginative
     • Be Methodical & Flexible
     • Work with Patterns & Create new Patterns
     • Be a Composer, be an Artist (blend where required)


What do we solve?
     • Identify Data Elements
     • Identify Relationships between Data Elements
     • Identify Data Usage patterns (Query patterns)
     • Create an ideal representation: Logical Model
     • Characterize the Data Store(s)
     • Architect the Catalog Data Cluster
     • Define Views/Interface(s)




Identify Data Elements
     • Product Biblio
     • Stock
     • Sellers
     • Product Variants
     • Category
     • Product SLAs
     • Supplier
     • Product Images
     • Taxation
     • Pricing
     • Contributors
     • ?

     Be Comprehensive; Be Imaginative !!



Identify Relationships
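     [Diagram: “Book” at the center, linked to the other data elements]
     • Book - is A - Physical Product
     • Book - has A - Compilation 1
     • Book - has A - Compilation 2
     • Book - belongs to - Year
     • Book - belongs to - Genre
     • Book - belongs to - Author
     • Book - ? (further relationships)

     Be Comprehensive; Be Imaginative !!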
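
One way to write down the relationships from the diagram is as typed edges between data elements. The sketch below is a minimal illustration. The triple layout, the relation names, and the helper functions are assumptions, and the direction of the "has A" edges (a compilation has books) is a guess, since the flattened diagram does not make it clear.

```python
from collections import defaultdict

# (subject, relation, object) triples; element names mirror the slide.
EDGES = [
    ("Book", "is_a", "PhysicalProduct"),
    ("Book", "belongs_to", "Year"),
    ("Book", "belongs_to", "Genre"),
    ("Book", "belongs_to", "Author"),
    # Assumed direction: a compilation "has a" book.
    ("Compilation1", "has_a", "Book"),
    ("Compilation2", "has_a", "Book"),
]


def build_index(edges):
    """Index edges in both directions so either end can be queried."""
    outgoing = defaultdict(list)
    incoming = defaultdict(list)
    for subject, relation, obj in edges:
        outgoing[subject].append((relation, obj))
        incoming[obj].append((relation, subject))
    return outgoing, incoming


def related(outgoing, entity, relation):
    """Return the targets that `entity` points to via `relation`, sorted."""
    return sorted(obj for rel, obj in outgoing[entity] if rel == relation)
```

Listing the edges explicitly also makes the open "?" on the slide concrete: a new relationship is just another triple.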


Identify Data Query Patterns
     •   Is the querying real-time or offline (from the customer's perspective)?
     •   Is the query “Id”-based, or does it use filters (ad hoc or pre-defined)?
     •   Does the query link multiple data elements?
     •   Understand: Query SLAs at ever-increasing scale
     •   Question: why is the client writing such a query?


         E.g.:
     a. Book with a specific title: Secret of the Nagas
     b. Books by Chetan Bhagat published in 2012
     c. Books which are Thrillers, published after 2005, written in Hindi and
        published by Rupa Publications
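
The three example queries differ mainly in how many filters they combine and whether a filter is an equality or a range. The sketch below expresses them as composable predicates over in-memory records. The field names and the helper functions are assumptions, and the record values (apart from the names taken from the slide) are made up for illustration, not real bibliographic data.

```python
# Illustrative records only; values are placeholders, not real catalog data.
BOOKS = [
    {"id": "b1", "title": "Secret of the Nagas", "author": "Author A",
     "year": 2011, "genre": "Fiction", "language": "English",
     "publisher": "Publisher X"},
    {"id": "b2", "title": "Sample Title", "author": "Chetan Bhagat",
     "year": 2012, "genre": "Fiction", "language": "English",
     "publisher": "Publisher Y"},
    {"id": "b3", "title": "Sample Thriller", "author": "Author C",
     "year": 2008, "genre": "Thriller", "language": "Hindi",
     "publisher": "Rupa Publications"},
]


def eq(field, value):
    """Equality filter on one field."""
    return lambda book: book.get(field) == value


def gt(field, value):
    """Range filter: the field must be present and greater than `value`."""
    return lambda book: book.get(field) is not None and book[field] > value


def find(books, *predicates):
    """Return ids of the books that satisfy every predicate."""
    return [b["id"] for b in books if all(p(b) for p in predicates)]


# a. single-attribute lookup
query_a = find(BOOKS, eq("title", "Secret of the Nagas"))
# b. two equality filters
query_b = find(BOOKS, eq("author", "Chetan Bhagat"), eq("year", 2012))
# c. three equality filters plus a range filter
query_c = find(BOOKS, eq("genre", "Thriller"), gt("year", 2005),
               eq("language", "Hindi"), eq("publisher", "Rupa Publications"))
```

A linear scan like this is fine for a sketch. At catalog scale, each added filter is a hint about which indexes the data store will need.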




Identification is Non Trivial
         Example “Book”


         Identification -->


         “Title”
         “Title” + “Publisher”
         “Title” + “Publisher” + “Edition”
         “Title” + “Publisher” + “Edition” + “Variant”
         “Title” + “Publisher” + “Edition” + “Variant” + ??

         Be Imaginative - an Artist’s brush stroke !!
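
The progression above can be checked mechanically: pick a candidate set of identifying fields and see whether any two records collide on it. This is a minimal sketch; the `BookIdentity` class, the `collisions` helper, and the meaning given to "variant" are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BookIdentity:
    title: str
    publisher: str
    edition: str
    variant: str  # assumed here to mean something like the binding


def collisions(records, fields):
    """Group records by the chosen fields; return keys shared by 2+ records."""
    groups = {}
    for record in records:
        key = tuple(getattr(record, name) for name in fields)
        groups.setdefault(key, []).append(record)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}
```

For example, three records that share a title collide on `("title",)` but may be told apart once publisher, edition, and variant are included. Whether four fields are enough is exactly the "??" the slide leaves open.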




Logical Model
     Schema
     • Entities as Tables
     • Relationships as Constraints
     • Queries supported through indexes and joins

     + Rich Query Support
     + Built-in support for Relationships
     + Indexes
     - Elasticity
         * Frequent addition/deletion of columns
         * Growing secondary indexes
     - Not optimized for some use-cases
         * Key-Values
         * Data Blobs/Graphs

     Relational Databases:
         * MySQL, Oracle, Postgres et al
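
A small sketch of the schema approach: entities as tables, a relationship as a foreign-key constraint, and a query served by a join and an index. It uses SQLite through Python's standard `sqlite3` module purely for convenience, not because the slides mention it. The table and column names and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE author (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE book (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    year      INTEGER NOT NULL,
    author_id INTEGER NOT NULL REFERENCES author(id)  -- relationship as constraint
);
CREATE INDEX idx_book_author_year ON book(author_id, year);
""")

# Placeholder rows for illustration.
conn.execute("INSERT INTO author (id, name) VALUES (1, 'Chetan Bhagat')")
conn.execute(
    "INSERT INTO book (id, title, year, author_id) VALUES (1, 'Sample Title', 2012, 1)"
)
conn.commit()


def books_by(connection, author_name, year):
    """Titles by an author in a given year, resolved with a join."""
    rows = connection.execute(
        "SELECT b.title FROM book b JOIN author a ON a.id = b.author_id "
        "WHERE a.name = ? AND b.year = ? ORDER BY b.title",
        (author_name, year),
    )
    return [row[0] for row in rows]
```

The "- Elasticity" point shows up as soon as a new attribute is needed: every new field means an `ALTER TABLE`, and every new filter tends to mean another secondary index.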


Logical Model
     Semi-Schema
     • Blobs (Documents) of Data
     • Linkages between Documents
     • Queries supported through document identifiers and document
       references

     + Flexibility: “Documents” are less rigid
     + Query Language to retrieve based on content of “Document”
     - Complex Relationships are non-trivial
     - “Linked” Document Queries may not be optimized

     Document Stores:
         * MongoDB, CouchBase et al
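
The semi-schema trade-off can be illustrated without any particular product. The sketch below models documents as plain Python dicts keyed by identifier and does not use the API of MongoDB or any other store. The document shapes, the `author_ref` field, and both helper functions are assumptions.

```python
# Documents of the same "type" need not share the same fields.
DOCS = {
    "author:1": {"type": "author", "name": "Chetan Bhagat"},
    "book:1": {"type": "book", "title": "Sample Title", "year": 2012,
               "author_ref": "author:1", "formats": ["paperback"]},
    "book:2": {"type": "book", "title": "Another Title", "year": 2010,
               "author_ref": "author:1"},  # no "formats" field
}


def find(docs, **criteria):
    """Content-based query: ids of documents whose fields match all criteria."""
    return sorted(
        doc_id for doc_id, doc in docs.items()
        if all(doc.get(field) == value for field, value in criteria.items())
    )


def books_by_author_name(docs, name):
    """Linked query: resolve the author first, then follow references."""
    author_ids = set(find(docs, type="author", name=name))
    return sorted(
        doc_id for doc_id, doc in docs.items()
        if doc.get("type") == "book" and doc.get("author_ref") in author_ids
    )
```

The content query is a single pass, while the linked query needs two. That extra hop is the kind of cost the "may not be optimized" bullet points at.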


Logical Model
     No Schema
     • Data Blobs
     • Rules/Relationship definitions
     • Queries supported through data “views”, indexes, search based on
       reverse indexing etc ...

     + Elasticity
         * Variability of data format
         * Secondary Indices
     + Tunable performance
     - Relational data is a force-fit (sub-optimal)
     +/- Querying models are specific to Stores

     Other NoSQL Stores:
         * HBase, RIAK, Cassandra, et al
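
A sketch of the no-schema approach: values are stored as opaque blobs under a key, and queries go through a reverse index maintained alongside them. The `BlobStore` class and its methods are hypothetical and do not correspond to the API of HBase, Riak, or Cassandra. JSON is used only as a convenient serialization.

```python
import json
from collections import defaultdict


class BlobStore:
    """Key -> opaque blob, plus a reverse index on selected fields."""

    def __init__(self, indexed_fields):
        self._blobs = {}                 # key -> serialized bytes
        self._index = defaultdict(set)   # (field, value) -> set of keys
        self._indexed_fields = tuple(indexed_fields)

    def put(self, key, record):
        """Store a record, replacing any old version and its index entries."""
        if key in self._blobs:
            self._unindex(key)
        self._blobs[key] = json.dumps(record).encode("utf-8")
        for field in self._indexed_fields:
            if field in record:
                self._index[(field, record[field])].add(key)

    def _unindex(self, key):
        old = json.loads(self._blobs[key])
        for field in self._indexed_fields:
            if field in old:
                self._index[(field, old[field])].discard(key)

    def get(self, key):
        return json.loads(self._blobs[key])

    def lookup(self, field, value):
        """Keys whose indexed `field` equals `value`, sorted."""
        return sorted(self._index.get((field, value), ()))
```

Records can carry any fields they like, but only the indexed ones are queryable, and the index has to be kept in step with every write.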


Catalog Data Cluster

     Data elements in the cluster:
     • Catalog Data
     • Biblio Data
     • Product Data
     • UGC on Products
     • Compliance Data
     • Pricing/Accounting
     • ?

     - “View”/“Data” Partitions
     - Blend multiple data stores
     - Interfaces provide a view to the underlying data
     - Scale uniformly for data elements



Data Store Characterization
     • Data characteristics:
           - Reliability (availability and redundancy)
           - Consistency
     • Querying capability
           - Support for indexes
           - Filters; secondary indexes
           - Linkages/relationships
     • Elasticity
           - Increase in scale
           - Evolving catalog definitions
     • SLAs
           - Volumes
           - Throughput
           - Latencies

          Be Comprehensive; be Methodical but be unbounded by
          choices - a Scientist who has a palette of colors in hand !!


Data Store Characterization
    • CAP (Consistency, Availability, Partition tolerance): which two do we
      pick? Can the data store be configured for any two?
    • Operational ease (monitoring, reporting, configuration
      management, ...)
    • Pluggability with Distributed Computing platforms


Define Views & Interfaces
      •   Cataloging has multiple use-cases which are business-centric
      •   Use-cases evolve, and so do the “views” of the data
      •   “Views” are multiple interpretations of the data
      •   De-coupled from the underlying data
      •   Underlying data form has to be elastic
      •   Overlaid views have to be adaptive

      Layers (top to bottom):
      •   View Layer: Precomputed View(s), Dynamic View(s)
      •   Data Access Interface
      •   Data 1, Data 2, Data 3, Data 4
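
The layering on this slide can be sketched as views that only ever talk to a data access interface, never to the stores directly. Everything named below (`DataAccess`, `listing_view`, `PrecomputedView`, and the "product" and "pricing" stores) is hypothetical, and plain dicts stand in for the underlying data stores.

```python
class DataAccess:
    """Single interface over several underlying stores (plain dicts here)."""

    def __init__(self, **stores):
        self._stores = stores

    def get(self, store, key, default=None):
        return self._stores[store].get(key, default)

    def keys(self, store):
        return list(self._stores[store])


def listing_view(data, product_id):
    """Dynamic view: assembled from the stores on every call."""
    product = data.get("product", product_id, {})
    return {
        "id": product_id,
        "title": product.get("title"),
        "price": data.get("pricing", product_id),
    }


class PrecomputedView:
    """Precomputed view: built once, served from memory until refresh()."""

    def __init__(self, data, build):
        self._data = data
        self._build = build
        self._rows = {}
        self.refresh()

    def refresh(self):
        self._rows = {
            key: self._build(self._data, key)
            for key in self._data.keys("product")
        }

    def get(self, key):
        return self._rows[key]
```

A dynamic view always reflects the current data at the cost of work per request. A precomputed view answers from memory but is only as fresh as its last `refresh()`. Because both sit behind the same interface, the stores underneath can change without touching the views.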



Architect for Scale & Performance
     • Identify Usage Patterns
     • Right Tools for Job
     • Right Abstractions
     • Pluggable Solution Stacks
     • Decoupled Data
     • Offline Processing




Measure, Monitor & Evolve
     • SLAs change; the system has to be adaptive
     • Start off with established goals; benchmark and
       meet the initial goals
     • Changes are gradual; plan at the first symptom
     • Watch for system(s) that are not coping
     • Always work towards incremental changes; an entire
       overhaul of the systems will be counterproductive

           Be Curious, have doubts, deeply introspect -
           be the ultimate Scientist !!
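
"Benchmark against goals" and "plan at the first symptom" can be made concrete with a small latency check. This is a sketch under assumptions: the nearest-rank percentile, the 95th-percentile default, and the warning threshold at 80% of the target are illustrative choices, not values from the slides.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; `pct` is an integer 1-100."""
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def check_sla(latencies_ms, target_ms, pct=95, warn_ratio=0.8):
    """Compare observed latency with the target; warn before it is breached."""
    observed = percentile(latencies_ms, pct)
    if observed > target_ms:
        status = "breach"
    elif observed > warn_ratio * target_ms:
        status = "warning"  # the "first symptom": still within SLA, but close
    else:
        status = "ok"
    return observed, status
```

The "warning" band is the useful part. It marks the point at which to start planning an incremental change, well before the SLA itself is missed.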



Change is constant ... adapt

     • Requirements evolve
     • Business introduces flux
     • Data interpretations grow

     • Be flexible, adaptive, imaginative...
       work as a Scientist who appreciates
       Art !!


Thank you !
                      My Co-ordinates:
                    utkarsh@flipkart.com





Weitere ähnliche Inhalte

Kürzlich hochgeladen

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

Empfohlen

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

Empfohlen (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Cataloging: The Art and Science of it

  • 1. Cataloging The Art & Science of it... Utkarsh Principal Architect @ Flipkart.com Sunday 3 March 13
  • 2. Art vs Science Imaginative Free Form Creative Measurable Formulative Methodical Set Patterns Sunday 3 March 13
  • 3. What is Cataloging? • Catalog A list or itemized display usually including descriptive information or illustrations. • Cataloging a. To list or include in a catalog b. To classify according to a categorical system We define it as: Cataloging is the process of managing the inventory of products through the entire lifecycle of creating, updating, de- provisioning/re-provisioning and deletion. 3 Sunday 3 March 13
  • 4. Why is the problem interesting? • Ever growing - “size” • Dynamic nature of the Metadata - “elasticity” • Association(s) between data elements - “flexibility” • Flux of changes - “variability” • De-coupled systems & Data Ownership - “data duplication” 4 Sunday 3 March 13
  • 5. How do we solve it? • Be Comprehensive & Imaginative • Be Methodical & Flexible • Work with Patterns & Create new Patterns • Be a Composer, be an artist (blend where required) 5 Sunday 3 March 13
  • 6. What do we solve? • Identify Data Elements • Identify Relationships b/w Data Elements • Identify Data Usage patterns (Query patterns) • Create an ideal representation: Logical Model • Characterize the Data Store(s) • Architect the Catalog Data Cluster • Define Views/Interface(s) 6 Sunday 3 March 13
  • 7. Identify Data Elements Product Stock Sellers Biblio Product Category Product Variants SLAs Supplier Product Taxation Images Pricing Contributors ? Be Comprehensive ; Be Imaginative !! 7 Sunday 3 March 13
  • 8. Identify Relationships
      [Diagram: entities Physical Product, Book, Compilation 1, Compilation 2, Year, Author, Genre and an open “?”, connected by edges labelled “is A”, “has A” and “belongs to”.]
      Be Comprehensive; Be Imaginative !!
  • 9. Identify Data Query Patterns
      • Is the querying real-time or offline (from the customer's perspective)?
      • Is the query “Id” based, or does it use filters (ad hoc or pre-defined)?
      • Does the query link multiple data elements?
      • Understand: Query SLAs at ever increasing scale
      • Question: why is the client writing such a query?
      E.g.:
      a. Book with a specific title: “Secret of the Nagas”
      b. Books by Chetan Bhagat published in 2012
      c. Books which are Thrillers, published post 2005, written in Hindi and published by Rupa Publications
  • 10. Identification is Non Trivial. Example “Book” Identification --> “Title”
  • 11. Identification is Non Trivial. Example “Book” Identification --> “Title”; “Title” + “Publisher”
  • 12. Identification is Non Trivial. Example “Book” Identification --> “Title”; “Title” + “Publisher”; “Title” + “Publisher” + “Edition”
  • 13. Identification is Non Trivial. Example “Book” Identification --> “Title”; “Title” + “Publisher”; “Title” + “Publisher” + “Edition”; “Title” + “Publisher” + “Edition” + “Variant”
  • 14. Identification is Non Trivial. Example “Book” Identification --> “Title”; “Title” + “Publisher”; “Title” + “Publisher” + “Edition”; “Title” + “Publisher” + “Edition” + “Variant”; “Title” + “Publisher” + “Edition” + “Variant” + ??
      Be Imaginative - an Artist’s brush stroke !!
  • 15. Logical Model: Schema
      • Model: Entities as Tables; Relationships as Constraints; Queries supported through indexes and joins
      • Relational Databases: MySQL, Oracle, Postgres et al
      • + Rich Query Support
      • + Built-in support for Relationships
      • + Indexes
      • - Elasticity
          * Frequent addition/deletion of columns
          * Growing secondary indexes
      • - Not optimized for some use-cases
          * Key-Values
          * Data Blobs/Graphs
  • 16. Logical Model: Semi-Schema
      • Model: Blobs (Documents) of Data; Linkages between Documents; Queries supported through document identifiers and document references
      • Document Stores: MongoDB, CouchBase et al
      • + Flexibility: “Documents” are less rigid
      • + Query Language to retrieve based on content of “Document”
      • - Complex Relationships are non-trivial
      • - “Linked” Document Queries may not be optimized
  • 17. Logical Model: No Schema
      • Model: Data Blobs; Rules/Relationship definitions; Queries supported through data “views”, indexes, search based on reverse indexing etc ...
      • Other NoSQL Stores: HBase, RIAK, Cassandra, et al
      • + Elasticity
          * Variability of data format
          * Secondary Indices
      • + Tunable performance
      • - Relational data is a force-fit (sub-optimal)
      • +/- Querying models are specific to Stores
  • 18. Catalog Data Cluster
      [Diagram: data partitions Catalog Data, Biblio Data, Product Data, UGC on Products, Compliance Data, Pricing/Accounting and an open “?”.]
      - “View”/“Data” Partitions
      - Blend multiple data stores
      - Interfaces provide view to the underlying data
      - Scale uniformly for data elements
  • 19. Data Store Characterization
      • Data characteristics:
          - Reliability (availability and redundancy)
          - Consistency
      • SLAs
          - Volumes
          - Throughput
          - Latencies
      • Elasticity
          - increase in scale
          - evolving catalog definitions
      • Querying capability
          - Support for indexes
          - Filters; secondary indexes
          - linkages/relationships
      Be Comprehensive; be Methodical but be unbounded by choices - a Scientist who has a palette of colors in hand !!
  • 20. Data Store Characterization
      • CAP (Consistency, Availability, Partition tolerance): which two do we pick? Can the data store help configure any two?
      • Operational ease (monitoring, reporting, config mgmt ..)
      • Pluggability with Distributed Computing platforms
  • 21. Define Views & Interfaces
      • Cataloging has multiple use-cases which are business centric
      • Use-cases evolve; and so do the “views” to the data
      • “Views” as multiple interpretations of the data
      • De-coupled from the underlying data
      • Underlying data form has to be elastic
      • Overlaid views have to be adaptive
      [Diagram: View Layer (Precomputed View(s), Dynamic View(s)) on top of a Data Access Interface, on top of Data 1, Data 2, Data 3, Data 4.]
  • 22. Architect for Scale & Performance
      • Identify Usage Patterns
      • Right Tools for Job
      • Right Abstractions
      • Pluggable Solution Stacks
      • Decoupled Data
      • Offline Processing
  • 23. Measure, Monitor & Evolve
      • SLAs change; system has to be adaptive
      • Start off with established goals; benchmark and meet the initial set of goals
      • Changes are gradual; plan at the first symptom
      • Watch for system(s) that are not coping
      • Always work towards incremental changes; an entire overhaul of the systems will be counterproductive
      Be Curious, have doubts, deeply introspect - be the ultimate Scientist !!
  • 24. Change is constant ... adapt
      • Requirements evolve
      • Business introduces flux
      • Data interpretations grow
      • Be flexible, adaptive, imaginative...... work as a Scientist who appreciates Art !!
  • 25. Thank you! My co-ordinates: utkarsh@flipkart.com
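
The composite “Book” identification built up on slides 10-14 and the three example queries on slide 9 can be sketched in a few lines of Python. This is only an illustrative sketch, not the talk's implementation: the `Book` fields, the `build_index` and `query` helpers, and all sample values ("Title A", "Author X", "Publisher P", ...) are hypothetical placeholders, and a real catalog would sit on one of the data stores discussed on slides 15-17 rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Book:
    # Field names are assumptions made for this sketch, not a schema from the talk.
    title: str
    publisher: str
    edition: int
    variant: str  # e.g. "paperback" or "hardcover"
    author: str
    genre: str
    language: str
    year: int

    def identity(self) -> Tuple[str, str, int, str]:
        """Composite identity: title + publisher + edition + variant."""
        return (self.title, self.publisher, self.edition, self.variant)


def build_index(books: Iterable[Book]) -> Dict[Tuple[str, str, int, str], Book]:
    """Index books by composite identity, rejecting duplicates."""
    index: Dict[Tuple[str, str, int, str], Book] = {}
    for book in books:
        key = book.identity()
        if key in index:
            raise ValueError(f"duplicate identity: {key}")
        index[key] = book
    return index


def query(books: Iterable[Book], *predicates: Callable[[Book], bool]) -> List[Book]:
    """Filter-style query: keep books that satisfy every predicate."""
    return [b for b in books if all(p(b) for p in predicates)]


# Made-up sample data; none of these values describe real books.
CATALOG = [
    Book("Title A", "Publisher P", 1, "paperback", "Author X", "Thriller", "Hindi", 2012),
    Book("Title A", "Publisher P", 1, "hardcover", "Author X", "Thriller", "Hindi", 2012),
    Book("Title A", "Publisher Q", 2, "paperback", "Author X", "Thriller", "English", 2004),
    Book("Title B", "Publisher P", 1, "paperback", "Author Y", "Romance", "Hindi", 2012),
]

INDEX = build_index(CATALOG)

# (a) Book with a specific title.
by_title = query(CATALOG, lambda b: b.title == "Title A")

# (b) Books by one author published in a given year.
by_author_year = query(
    CATALOG, lambda b: b.author == "Author X", lambda b: b.year == 2012
)

# (c) Thrillers published after 2005, written in Hindi, from one publisher.
multi_filter = query(
    CATALOG,
    lambda b: b.genre == "Thriller",
    lambda b: b.year > 2005,
    lambda b: b.language == "Hindi",
    lambda b: b.publisher == "Publisher P",
)
```

In this sample the title alone distinguishes only two of the four records, while the four-part key separates all of them, which is the point the identification slides make. The open “??” on slide 14 would correspond to adding further fields to `identity()` as new distinguishing attributes turn up.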