Arivoli Tirouvingadame
Principal Member of Technical Staff, Oracle America, Inc.
Acknowledgements

Sincere thanks to Keshava Rangarajan, Chief Architect, Halliburton Corporation,
for all the contribution and guidance, without which this research would not
have been possible.
What is Spend Classification ?
•Definition: Process of determining a purchase code for each spend record
(Requisitions, Purchase Orders, Receipts, Invoices, etc.) from a hierarchical
structure (Taxonomy).




Why classify spend ?
•Once all spend transactions are classified with a standard code from a
taxonomy, simple queries like these can be answered:

    •What are my top 10 spend categories ?
    •What is my travel spend ?
    •What is my spend for a given Supplier ?
    •What is my spend for a given Part ?
    •What is my spend for a given Business Unit ?



•If classification is done on data consolidated from all systems in your
organization, you get spend visibility across all of those systems.
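The queries listed above reduce to simple aggregations once every record carries a category code. The following is a minimal sketch, assuming a made-up record layout of (category, supplier, amount); the suppliers, categories and amounts are illustrative and not taken from any real data set.

```python
from collections import defaultdict

# Hypothetical classified spend records: (category, supplier, amount).
RECORDS = [
    ("IT.Hardware.Laptop", "Acme Corp.", 1200.0),
    ("Travel", "Example Air", 450.0),
    ("IT.Hardware.Laptop", "Acme Corp.", 800.0),
    ("Travel", "Example Hotels", 300.0),
    ("Office.Supplies", "Acme Corp.", 50.0),
]

CATEGORY, SUPPLIER, AMOUNT = 0, 1, 2


def spend_by(records, key_index):
    """Total spend grouped by the chosen column (category or supplier)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[AMOUNT]
    return dict(totals)


def top_categories(records, n=10):
    """The n categories with the highest total spend, largest first."""
    totals = spend_by(records, CATEGORY)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


top_two = top_categories(RECORDS, n=2)
travel_spend = spend_by(RECORDS, CATEGORY).get("Travel", 0.0)
acme_spend = spend_by(RECORDS, SUPPLIER).get("Acme Corp.", 0.0)
```

The same grouping function answers the per-part and per-business-unit questions if those columns are added to the record.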
What is Taxonomy ?
•A simple hierarchical coding structure used to classify spend at
different levels.

Segment > Family > Class > Commodity
What is the Spend Classification challenge ?
•Categorization at source
•Categorization itself is inconsistent or missing completely
•Multiple disparate Taxonomies may exist in a company
•Classifying into “MISCELLANEOUS” category
•No standardization of Taxonomies
What is the “Categorization at source” challenge ?

Exercise: Buying a work laptop and expensing via procurement



X Category: Facility.Building.Hardware
Category: IT.Hardware.Laptop




Characteristics:
•User entered, hence error-prone
•No standardization across the supply chain – business units, customers, or
suppliers.
What is the “inconsistent/missing Categorization” challenge ?
• Category: IT.Hardware.Laptop
• Category: IT.Hardware.Computers.Laptop
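One common way to reconcile such variants is an alias table that maps every known spelling to a single canonical path. The sketch below assumes that `IT.Hardware.Laptop` is chosen as the canonical form; that choice, the `ALIASES` table and the function name are illustrative assumptions, not something the slides prescribe.

```python
# Hypothetical alias table: variant category path -> canonical category path.
ALIASES = {
    "IT.Hardware.Computers.Laptop": "IT.Hardware.Laptop",
}


def canonical_category(raw):
    """Trim stray whitespace around each level, then apply the alias table."""
    cleaned = ".".join(part.strip() for part in raw.split(".") if part.strip())
    return ALIASES.get(cleaned, cleaned)


same = canonical_category("IT.Hardware.Computers.Laptop") == canonical_category("IT.Hardware.Laptop")
```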
What is the “multiple disparate Taxonomies” challenge ?
•Multiple (and disparate) taxonomies may also exist in the organization
where classification could be carried out business unit-wise without regard
to, or referring to, the taxonomies used in other business units.



[Figure: Business Units 1, 2 and 3, each classifying against its own taxonomy (Taxonomy 1, 2 and 3).]
What is the “MISCELLANEOUS category” challenge ?

•Spend transactions are classified into the 'Miscellaneous' category, making it
very difficult for business analysts to figure out which category the item
should actually belong to.
•Spend analytics data will then show a heavily weighted 'Miscellaneous'
category, which does not reflect a true picture of spend by category for the
organization.



•Similar popular categories: OTHERS, UNCATEGORIZED
Why is standardization of Taxonomies needed ?

•An enterprise may have multiple taxonomies at different levels – corporate,
strategic, business unit and regional center.

•Multiple taxonomies at various levels create a number of issues when
analyzing spend, so it is important to create or use standard
taxonomies across the enterprise.
What are the types of Spend Classification Taxonomies ?
•Standard
•Custom
Standard Taxonomies
•UNSPSC: United Nations Standard Products and Services Code. It is a four-level
hierarchy coded as an 8-digit number, with an optional fifth level (business
function) that adds two more digits.

Example:
•Segment 44. Office Equipment and Accessories and Supplies.
• Family 10. Office machines and their supplies and accessories.
•   Class 15. Duplicating machines.
•     Commodity 01. Photocopiers.
•       Business Function 14. Retail.
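The example above can be read as a code string: two digits per level, so Photocopiers is 44101501, or 4410150114 with the Retail business function appended. The sketch below splits such a code into its levels and derives the parent codes by zero-padding; the function names are illustrative, and the validation is deliberately minimal.

```python
LEVELS = ("segment", "family", "class", "commodity")


def parse_unspsc(code):
    """Split an 8-digit (or 10-digit, with business function) code into levels."""
    digits = str(code)
    if not digits.isdigit() or len(digits) not in (8, 10):
        raise ValueError("expected 8 or 10 digits, got %r" % (code,))
    parts = {name: digits[i * 2:i * 2 + 2] for i, name in enumerate(LEVELS)}
    if len(digits) == 10:
        parts["business_function"] = digits[8:10]
    return parts


def ancestors(code):
    """Codes of the enclosing segment, family and class, zero-padded to 8 digits."""
    digits = str(code)[:8]
    return [digits[:n].ljust(8, "0") for n in (2, 4, 6)]


photocopiers = parse_unspsc("44101501")
parents = ancestors("44101501")
```

Rolling a commodity-level code up to its family or segment this way is what makes spend reportable at different levels of the hierarchy.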
Custom Taxonomies
•Use a custom taxonomy if your own coding structure is strong enough for your
business, or if your business is better acquainted with your own structure.
[Figure, top to bottom: (1) Requisitions, (2) Purchase Orders, (3) Receipts and (4) Invoices, with their ERP Category, feed Procurement & Spend Analysis. Item/invoice, category and supplier descriptions and attributes, together with the ERP taxonomy, UNSPSC codes and custom taxonomies, go into Data Mining, which yields Spend Classification.]
What is Spend Analysis?
•Process of collecting, cleansing, classifying and analyzing expenditure data
with the purpose of reducing procurement costs.
•Process of aggregating, classifying, and leveraging spend data for the purpose
of gaining visibility into cost reduction, performance improvement, and contract
compliance opportunities.
•Helps answer the following questions:

    •Who is buying ?
    •What ?
    •From whom ?
    •When ?
    •(optionally) Where ?
    •At what price ?
Who needs Spend Analysis?
•It is the process of organizing a company’s spend in such a way that one
can understand it, slice it, dice it and uncover hidden savings opportunities.
•Impacts more than just the sourcing team
•Spend analysis/visibility serves three internal user communities:

    •Leadership and CxOs: who need up-to-date reports to drive strategic direction
    •Managers, accountants: who need to drill down into a spend data set to explore specific areas
    of interest or track down payment specifics
    •Sourcing power users: who need to locate, drive, and monitor the next set of savings initiatives
What is Spend Management?
•Process in which companies control and optimize the money they spend.
•Involves cutting operating and other costs associated with doing business.
    •Includes spend analysis, sourcing, procurement, receiving, payment settlement and
    management of accounts payable and general ledger accounts.




•In an enterprise, spend management is managing how to spend money to best
effect in order to build products and services.
    •Encompasses processes such as outsourcing, procurement, e-procurement, and supply chain
    management.
Benefits of Spend Management
•Decreasing "maverick" spend
•Increasing spend economies of scale
    •Strategic sourcing (also called "supplier rationalization")
          •Sourcing optimization
          •Co-operative sourcing

•Increasing process efficiencies
•Increasing procurement efficiency
Life cycle of a PO
1. Create PO
2. Add items to PO
3. Add PO to Cart *
4. Create Document for the PO in the Cart
5. Create Requisition for the Document

    Note: The PO needs to be classified before it hits the Cart. Once the
    Order hits the Cart, it is too late for classification.
Classifying Spend
•   We have a set of pre-defined fields chosen for classification from a Purchase
    Order. All these fields are concatenated to form one long string. (Note:
    this string may contain text in multiple languages.)

•   Lexers can be used for detecting languages (e.g., Auto lexers, World lexers).

•   An SVM (support vector machine) can be used for text mining.
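As a rough illustration of the two ideas on this slide, the sketch below concatenates a few PO fields into one string and trains a small linear SVM on bag-of-words counts. Everything specific here is an assumption: the field names, the sample POs and the "Travel" category are made up, language detection is skipped, and the trainer is a plain sub-gradient descent on the regularised hinge loss rather than whatever SVM engine a production system would use.

```python
from collections import defaultdict

FIELDS = ("item_description", "supplier", "notes")  # hypothetical field names


def concatenate_fields(po, fields=FIELDS):
    """Join the chosen PO fields into one lower-cased string."""
    return " ".join(str(po.get(f, "")) for f in fields).lower()


def featurize(text):
    """Bag-of-words term counts."""
    counts = defaultdict(float)
    for token in text.split():
        counts[token] += 1.0
    return dict(counts)


def score(model, x):
    w, b = model
    return sum(w.get(k, 0.0) * v for k, v in x.items()) + b


def train_binary_svm(samples, epochs=50, lr=0.1, lam=0.01):
    """Linear SVM: sub-gradient descent on the L2-regularised hinge loss.

    samples is a list of (features, y) pairs with y in {+1, -1}.
    """
    w, b = defaultdict(float), 0.0
    for _ in range(epochs):
        for x, y in samples:
            margin = y * score((w, b), x)
            for k in w:                       # L2 shrinkage on the weights
                w[k] *= 1.0 - lr * lam
            if margin < 1.0:                  # hinge loss is active
                for k, v in x.items():
                    w[k] += lr * y * v
                b += lr * y
    return dict(w), b


def train_one_vs_rest(labelled_texts):
    """One binary SVM per category; labelled_texts is [(text, category)]."""
    feats = [(featurize(text), cat) for text, cat in labelled_texts]
    categories = sorted({cat for _, cat in labelled_texts})
    return {cat: train_binary_svm([(x, 1 if c == cat else -1) for x, c in feats])
            for cat in categories}


def classify(models, text):
    x = featurize(text)
    return max(models, key=lambda cat: score(models[cat], x))


# Made-up historic POs with known categories.
HISTORY = [
    ({"item_description": "Dell laptop 14 inch", "supplier": "Acme Corp."}, "IT.Hardware.Laptop"),
    ({"item_description": "Lenovo laptop docking station", "supplier": "Acme Corp."}, "IT.Hardware.Laptop"),
    ({"item_description": "Flight to Denver", "supplier": "Example Air"}, "Travel"),
    ({"item_description": "Hotel stay Denver", "supplier": "Example Hotels"}, "Travel"),
]

models = train_one_vs_rest([(concatenate_fields(po), cat) for po, cat in HISTORY])
new_po = {"item_description": "Lenovo laptop", "supplier": "Acme Corp."}
predicted = classify(models, concatenate_fields(new_po))
```

One-vs-rest is only one way to stretch a binary SVM over a taxonomy; a hierarchical taxonomy could instead be classified level by level.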
Where does Machine Learning fit in? (Spend Auto-Classification)
[Figure: A spend transaction, an ontology (including spend descriptions and other textual attributes) and the taxonomies are fed to the Spend Auto-classifier (Linguistics (UIMA) + Neural Net Engine / Text SVM), which outputs auto-classified spend.]
Training data set
•   To begin with, customers provide a training data set drawn from their
    historic data: a well-known data set covering their most common use
    cases, which should be a good representation of their problem.



•   We run our logic against this training set and verify the results. We
    iterate over several cycles to tune the logic.



•   Repeat the same over other use cases.
Data Mining Model
[Figure: model life cycle. Create a Model, then Model created, then an Enrich/Re-train loop that cleanses incorrect classifications and supports new categories (if needed).]
What is Named Entity Recognition ?
•“Named-entity recognition (NER) (also known as entity identification and
entity extraction) is a subtask of information extraction that seeks to locate
and classify atomic elements in text into predefined categories such as the
names of persons, organizations, locations, expressions of times, quantities,
monetary values, percentages, etc.” -- Wikipedia
•Most research on NER systems has been structured as taking an
unannotated block of text, such as this one:

• Jim bought 300 shares of Acme Corp. in 2006.

•And producing an annotated block of text, such as this one:

• <ENAMEX TYPE="PERSON">Jim</ENAMEX> bought <NUMEX TYPE="QUANTITY">300</NUMEX> shares of <ENAMEX TYPE="ORGANIZATION">Acme Corp.</ENAMEX> in <TIMEX TYPE="DATE">2006</TIMEX>.
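Downstream stages usually want the entities as data rather than as inline markup. The sketch below pulls (text, type) pairs out of an annotated string in the ENAMEX/NUMEX/TIMEX style shown above using a regular expression; it assumes tags are not nested and is not a substitute for a real NER toolkit, which produces the annotations in the first place.

```python
import re

# Matches <ENAMEX TYPE="...">text</ENAMEX> and the NUMEX / TIMEX equivalents.
TAG_RE = re.compile(r'<(ENAMEX|NUMEX|TIMEX) TYPE="([^"]+)">(.*?)</\1>', re.DOTALL)


def extract_entities(annotated):
    """Return (entity text, entity type) pairs in document order."""
    return [(m.group(3).strip(), m.group(2)) for m in TAG_RE.finditer(annotated)]


def strip_tags(annotated):
    """Recover the plain text by dropping the annotation tags."""
    return TAG_RE.sub(lambda m: m.group(3), annotated)


ANNOTATED = (
    '<ENAMEX TYPE="PERSON">Jim</ENAMEX> bought '
    '<NUMEX TYPE="QUANTITY">300</NUMEX> shares of '
    '<ENAMEX TYPE="ORGANIZATION">Acme Corp.</ENAMEX> in '
    '<TIMEX TYPE="DATE">2006</TIMEX>.'
)

entities = extract_entities(ANNOTATED)
plain = strip_tags(ANNOTATED)
```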
Anatomy of a query …

Query = “Find Approved Status POs with High Amount”
Stemmed Entity Recognition & Linguistic Parsing yields…
•Search Verb: “Find”
•Target Entity: Attribute:Type=“PO”
•Having Attribute: Attribute:Status=“Approved”
•Having Attribute: Attribute:Amount=“High”

Find Approved Status POs with High Amount
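A toy version of this parse can be written with a small lexicon and one positional rule. The sketch below assumes a hypothetical lexicon (one search verb, one entity type, two attribute names), a crude plural-stripping stemmer, and the rule that an attribute's value is the word immediately before its name; a real linguistic parser would be far more general.

```python
# Hypothetical lexicon for the example query.
SEARCH_VERBS = {"find"}
ENTITY_TYPES = {"po": "PO"}
ATTRIBUTE_NAMES = {"status": "Status", "amount": "Amount"}


def stem(token):
    """Lower-case the token and strip a plural 's' when that yields a known entity."""
    word = token.lower()
    if word.endswith("s") and word[:-1] in ENTITY_TYPES:
        return word[:-1]
    return word


def parse_query(query):
    """Split a query into search verb, target entity and 'having' attributes."""
    tokens = query.split()
    result = {"verb": None, "entity": None, "attributes": {}}
    for i, token in enumerate(tokens):
        word = stem(token)
        if word in SEARCH_VERBS and result["verb"] is None:
            result["verb"] = token
        elif word in ENTITY_TYPES:
            result["entity"] = {"Type": ENTITY_TYPES[word]}
        elif word in ATTRIBUTE_NAMES and i > 0:
            # Assumed rule: the value precedes the attribute name ("Approved Status").
            result["attributes"][ATTRIBUTE_NAMES[word]] = tokens[i - 1]
    return result


parsed = parse_query("Find Approved Status POs with High Amount")
```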
Spend record with a Domain Ontology
[Figure: OWL domain ontology for a spend record. Classes (OWL:class): Transaction, Party, Role, Person, Corporation, Finance Corporation, Bank, Address, Account. Attributes (OWL:attribute): ID (number), Code (string), Name (string), First Name and Last Name (string), Door Number, Street Name, City, State, Zip and Country (string). Relationship labels: has a, has an, has many, is related to, plays, Is A, in.]
Transaction
    ID: 200911071234
    has Party
        has ID: SBK
        has Role: S? Bank Role
        played by Bank
            has Name: Bank Of Congo
            has many Address
                has Street Name: Afrique Au Congo
                has Country: RDC
Transaction
    ID: 200911071235
    has Party
        has ID: ORP
        has Role: Ordering Party Role
        played by Person
            has First Name: John
            has Last Name: Doe
            has many Address
                has City: Kinshasa
                has Country: CD
            Account
                has Account Id: 123456
                in Bank
                    has Name: Bank Of Congo
[Figure: transactions 200911071235 (Party ORP, Ordering Party Role, played by Person John Doe) and 200911071234 (Party SBK, S? Bank Role, played by a Bank) shown side by side and joined by an “is related to” link. John Doe's Account (Account Id: 123456) is in a Bank named Bank Of Congo.]
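Once records are expressed against the ontology, relationships like the one in this figure can be followed mechanically. The sketch below stores the two example transactions as subject-predicate-object triples and answers one question by pattern matching. The identifiers, predicate names and edge directions are my reading of the flattened diagram, and it is an assumption that both "Bank Of Congo" nodes denote the same bank; a real system would more likely load the data into an RDF store such as Jena and query it with SPARQL.

```python
# Triples as (subject, predicate, object); names are illustrative assumptions.
TRIPLES = [
    ("txn:200911071235", "has_party", "party:ORP"),
    ("party:ORP", "has_role", "Ordering Party Role"),
    ("party:ORP", "played_by", "person:john_doe"),
    ("person:john_doe", "first_name", "John"),
    ("person:john_doe", "last_name", "Doe"),
    ("person:john_doe", "has_account", "account:123456"),
    ("account:123456", "in_bank", "bank:bank_of_congo"),
    ("txn:200911071234", "has_party", "party:SBK"),
    ("party:SBK", "played_by", "bank:bank_of_congo"),
    ("bank:bank_of_congo", "name", "Bank Of Congo"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def transactions_via_bank_of(triples, account):
    """Transactions in which the bank holding this account plays a party."""
    banks = [o for _, _, o in match(triples, s=account, p="in_bank")]
    parties = [s for bank in banks
               for s, _, _ in match(triples, p="played_by", o=bank)]
    return [s for party in parties
            for s, _, _ in match(triples, p="has_party", o=party)]


related = transactions_via_bank_of(TRIPLES, "account:123456")
```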
A possible solution: Pipelining approach

•Flow 1:
    •Machine learning Pipeline: Input data is directly fed to the Machine Learning piece.


•Flow 2:
    •Domain Ontology Pipeline: Input data is fed to a Domain Ontology.
    •Standardize the output from the Domain Ontology.
    •Machine learning Pipeline: Feed it into the Machine Learning piece.

•Flow 3:
    •NER Pipeline: Input data is fed to an NER component.
    •Domain Ontology Pipeline: Output from the NER is fed to the Domain Ontology.
    •Standardize the output from the Domain Ontology.
    •Machine learning Pipeline: Feed it into the Machine Learning piece.

•Note:
    •Domain Ontology and NER Pipelines can be optionally turned on or off
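The three flows differ only in which optional stages run before the machine learning step, so they can share one pipeline builder with two switches. The sketch below uses trivial stand-in stages (capitalised-token "NER", a two-entry concept table, and a classifier that just picks the first concept); these stand-ins and all names are assumptions made to keep the example runnable, not the components used in the original work.

```python
def ner_stage(record):
    # Stand-in for a real NER component: treat capitalised tokens as entities.
    record["entities"] = [t for t in record["text"].split() if t[:1].isupper()]
    return record


def ontology_stage(record):
    # Stand-in for a domain ontology lookup (hypothetical concept table).
    concepts = {"laptop": "Hardware", "flight": "Travel"}
    record["concepts"] = [concepts[t] for t in record["text"].lower().split()
                          if t in concepts]
    return record


def standardize_stage(record):
    # Normalise case and whitespace of the ontology output.
    record["text"] = " ".join(record["text"].lower().split())
    return record


def ml_stage(record):
    # Stand-in for the machine learning classifier.
    concepts = record.get("concepts") or ["UNCATEGORIZED"]
    record["category"] = concepts[0]
    return record


def build_pipeline(use_ner=False, use_ontology=False):
    """Flow 1: no switches. Flow 2: ontology only. Flow 3: NER and ontology."""
    stages = []
    if use_ner:
        stages.append(ner_stage)
    if use_ontology:
        stages += [ontology_stage, standardize_stage]
    stages.append(ml_stage)
    return stages


def run(stages, record):
    record = dict(record)          # leave the caller's record untouched
    for stage in stages:
        record = stage(record)
    return record


flow1 = build_pipeline()
flow3 = build_pipeline(use_ner=True, use_ontology=True)
result = run(flow3, {"text": "Dell Laptop  14 inch"})
```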
Stanford NER Demo


http://nlp.stanford.edu:8080/ner/process


 Copyright © 2011, Stanford University, All Rights Reserved.
SVM Steps

1. Identify the taxonomy (hierarchical or flat) to classify against
2. Identify representative training data that has been classified to this taxonomy
3. Run the training data against a blank SVM model and the given taxonomy
4. Classify the training data as per the required taxonomy
5. Classify the data
6. Increase the training population and enrich the classification model
7. Recognize and realign the impact of the original model against fresh training data
8. Manually classify misclassifications into the proper taxonomy nodes
9. Repeat steps 6 through 8 until all the variations for a given domain have been recognized
10. Introduce live data
11. Repeat steps 4 and 5 for misclassifications
12. Store the result in a relational database
13. Insert the data in an ontology
14. Enable analysis using RQL or SPARQL
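The enrichment loop in steps 5 through 9 can be sketched independently of the classifier behind it. To keep the example short and deterministic, the code below swaps the SVM for a plain token-overlap classifier; that stand-in, the sample texts and the "Travel" category are assumptions, and the reviewed pairs play the part of the manual corrections in step 8.

```python
from collections import defaultdict


def train(labelled):
    """Stand-in model (not an SVM): the set of tokens seen per category."""
    vocab = defaultdict(set)
    for text, category in labelled:
        vocab[category].update(text.lower().split())
    return dict(vocab)


def classify(model, text):
    """Pick the category sharing the most tokens; None if nothing matches."""
    tokens = set(text.lower().split())
    best, best_overlap = None, 0
    for category in sorted(model):
        overlap = len(tokens & model[category])
        if overlap > best_overlap:
            best, best_overlap = category, overlap
    return best


def enrich_until_stable(training, reviewed, max_rounds=10):
    """Retrain until no reviewed record is misclassified.

    reviewed holds (text, correct_category) pairs from manual review.
    Returns the final model and the number of misclassifications per round.
    """
    training = list(training)
    history = []
    model = train(training)
    for _ in range(max_rounds):
        model = train(training)
        misses = [(t, c) for t, c in reviewed if classify(model, t) != c]
        history.append(len(misses))
        if not misses:
            break
        training.extend(misses)    # corrected records enrich the training set
    return model, history


SEED = [("dell laptop", "IT.Hardware.Laptop"), ("flight to denver", "Travel")]
REVIEWED = [
    ("lenovo notebook", "IT.Hardware.Laptop"),
    ("hotel stay denver", "Travel"),
    ("notebook docking station", "IT.Hardware.Laptop"),
]

model, misses_per_round = enrich_until_stable(SEED, REVIEWED)
```

Tracking the number of misclassifications per round gives a simple stopping signal before live data is introduced in step 10.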
Open source software

1.Jena
2.Pentaho http://www.pentaho.com/
3. Stanford NER, http://nlp.stanford.edu/software/CRF-NER.shtml
4.Annie NER
5.GATE
6.UIMA
7.SVM, http://en.wikipedia.org/wiki/Support_vector_machine
Acknowledgements


•Keshava Rangarajan, Chief Architect, Halliburton Corporation
•Gopalan Arun, Vice President, Software Development
•Ramesh Vasudevan, Senior Director, Software Development
•Nagaraj Srinivasan, VP R&D, Halliburton Corporation (Landmark Graphics)
•Ashish Pathak, Director, Product Management, Oracle America, Inc.
•Chandra Yeleshwarapu, Director, Information Management & Platform
Technologies, Halliburton Corporation
•Jayesh Shah, Vice President, Product Development, Oracle America, Inc.
•Rajesh Raheja, Senior Director, Applications Development, Oracle America, Inc.
•Stanford NER, http://nlp.stanford.edu:8080/ner/process
•Prashant Mendki, http://scn.sap.com/community/epm/blog/2012/03/03/your-basic-guide-to-spend-classification
References


•Jena
•Pentaho http://www.pentaho.com/
• Stanford NER, http://nlp.stanford.edu/software/CRF-NER.shtml
•Annie NER
•GATE
•UIMA
•SVM, http://en.wikipedia.org/wiki/Support_vector_machine
•Oracle Spend Classification Process Guide, Release 7.9.6
•Lexers

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel AraĂşjo
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 

KĂźrzlich hochgeladen (20)

Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

Semantic Spend Classification

  • 1. Arivoli Tirouvingadame Principal Member of Technical Staff, Oracle America, Inc.
  • 2. Acknowledgements Sincere thanks to Keshava Rangarajan, Chief Architect, Halliburton Corporation for all the contribution and guidance, without which this research would not have been possible.
  • 3. What is Spend Classification ? •Definition: Process of determining a purchase code for each spend record (Requisitions, Purchase Orders, Receipts, Invoices, etc.) from a hierarchical structure (Taxonomy). Requisitions, POs, Receipts, Invoices, etc.
  • 4. Why classify spend ? •Once all spend transactions are classified with a standard code from a taxonomy, simple queries can be answered, such as: •What are my top 10 spend categories ? •What is my travel spend ? •What is my spend for a given Supplier ? •What is my spend for a given Part ? •What is my spend for a given Business Unit ? •If classification is done on data consolidated across all systems in your organization, you get visibility across all of those systems.
  • 5. What is Taxonomy ? •A simple hierarchical coding structure used to classify spend at different levels: Segment > Family > Class > Commodity.
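As a minimal sketch of the queries on this slide: once every record carries a taxonomy code, each question reduces to a simple aggregation. The record layout, category names, supplier names, and amounts below are hypothetical and are not taken from the presentation.

```python
from collections import defaultdict

# Hypothetical classified spend records: (category, supplier, amount).
records = [
    ("IT.Hardware.Laptop", "Acme Corp.", 1200.0),
    ("Travel.Air", "FlyFast", 800.0),
    ("IT.Hardware.Laptop", "Globex", 900.0),
    ("Travel.Hotel", "StayInn", 300.0),
]

def spend_by(records, key_index):
    """Total spend grouped by one field (0 = category, 1 = supplier)."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key_index]] += rec[2]
    return dict(totals)

def top_categories(records, n=10):
    """The n categories with the highest total spend, largest first."""
    totals = spend_by(records, 0)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

def spend_under(records, prefix):
    """Total spend for a taxonomy node and everything below it."""
    return sum(
        amount
        for category, _, amount in records
        if category == prefix or category.startswith(prefix + ".")
    )

print(top_categories(records, 3))
print(spend_under(records, "Travel"))
```

The roll-up in `spend_under` only works because the categories come from one hierarchy, which is the point of classifying against a single taxonomy.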
  • 6. What is the Spend Classification challenge ? •Categorization at source •Categorization itself is inconsistent or missing completely •Multiple disparate Taxonomies may exist in a company •Classifying into “MISCELLANEOUS” category •No standardization of Taxonomies
  • 7. What is the “Categorization at source” challenge ? Exercise: Buying a work laptop and expensing it via procurement. Incorrect (marked X on the slide): Category: Facility.Building.Hardware. Intended: Category: IT.Hardware.Laptop. Characteristics: •User entered, hence error-prone •No standardization across the supply chain – business units, customers, or suppliers.
  • 8. What is the “inconsistent/missing Categorization” challenge ? • Category: IT.Hardware.Laptop • Category: IT.Hardware.Computers.Laptop
  • 9. What is the “multiple disparate Taxonomies” challenge ? •Multiple (and disparate) taxonomies may also exist in the organization, where each business unit classifies spend without regard to the taxonomies used in other business units. (Diagram: Business Units 1, 2 and 3, each with its own Taxonomy 1, 2 and 3.)
  • 10. What is the “MISCELLANEOUS category” challenge ? •Spend transactions are classified into the 'Miscellaneous' category, making it very difficult for business analysts to figure out which category the item should actually belong to. •Spend analytics data will then show an overweighted 'Miscellaneous' category, which does not reflect a true picture of the organization's spend by category. •Similar popular categories: OTHERS, UNCATEGORIZED
  • 11. Why is standardization of Taxonomies needed ? •An enterprise may have multiple taxonomies at different levels – corporate, strategic, business unit and regional center. •Multiple taxonomies at various levels create a number of issues when analyzing spend; therefore it is important to create or use standard taxonomies across the enterprise.
  • 12. What are the types of Spend Classification Taxonomies ? SPEND CLASSIFICATION TAXONOMY Standard Custom
  • 13. Standard Taxonomies •UNSPSC: United Nations Standard Products and Services Code. It is a four-level hierarchy coded as an 8-digit number, with an optional fifth level (Business Function) that adds two more digits. Example: •Segment 44. Office Equipment and Accessories and Supplies. • Family 10. Office machines and their supplies and accessories. • Class 15. Duplicating machines. • Commodity 01. Photocopiers. • Business Function 14. Retail.
  • 14. Custom Taxonomies •Use a custom taxonomy if your own coding structure is strong enough for your business, or if you think your business is more familiar with your own structure.
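The code structure on this slide can be sketched as a small parser. This is an illustrative helper, not part of any UNSPSC tooling; the function names are hypothetical, and the zero-padded roll-up codes (for example 44000000 for Segment 44) follow the usual UNSPSC convention as an assumption.

```python
from typing import List, NamedTuple, Optional

class UnspscCode(NamedTuple):
    segment: str
    family: str
    class_: str
    commodity: str
    business_function: Optional[str]  # optional fifth level

def parse_unspsc(code: str) -> UnspscCode:
    """Split an 8-digit (or 10-digit) code into two-digit levels."""
    digits = code.strip()
    if not digits.isdigit() or len(digits) not in (8, 10):
        raise ValueError("expected 8 or 10 digits, got %r" % code)
    pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    business_function = pairs[4] if len(pairs) == 5 else None
    return UnspscCode(pairs[0], pairs[1], pairs[2], pairs[3], business_function)

def rollup_codes(code: str) -> List[str]:
    """Codes of the segment, family, class and commodity nodes (assumed zero-padded)."""
    parsed = parse_unspsc(code)
    parts = [parsed.segment, parsed.family, parsed.class_, parsed.commodity]
    return ["".join(parts[:i]).ljust(8, "0") for i in range(1, 5)]

print(parse_unspsc("44101501"))   # Photocopiers, from the slide's example
print(rollup_codes("44101501"))
```

Rolling a commodity up to its class, family, or segment is what lets spend be reported at different levels of the hierarchy.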
  • 15. (Diagram: Procurement & Spend Analysis.) Sources: 1) Requisitions 2) Purchase Orders 3) Receipts 4) Invoices. Inputs: ERP Category; Item Description And Attribute; Invoice Description And Attribute; Supplier Description And Attribute. Data Mining Spend Classification outputs: ERP Taxonomy, UNSPSC Code, Custom Taxonomies.
  • 16. What is Spend Analysis? •Process of collecting, cleansing, classifying and analyzing expenditure data with the purpose of reducing procurement costs. •Process of aggregating, classifying, and leveraging spend data for the purpose of gaining visibility into cost reduction, performance improvement, and contract compliance opportunities. •Makes it possible to answer the following questions: •Who is buying ? •What ? •From whom ? •When ? •(optionally) Where ? •At what price ?
  • 17. Who needs Spend Analysis? •It is the process of organizing a company’s spend in such a way that one can understand it, slice it, dice it and uncover hidden savings opportunities. •Impacts more than just the sourcing team •Spend analysis/visibility serves three internal user groups: •Leadership and CxOs: who need up-to-date reports to drive strategic direction •Managers, accountants: who need to drill down into a spend data set to explore specific areas of interest or track down payment specifics •Sourcing power users: who need to locate, drive, and monitor the next set of savings initiatives
  • 18. What is Spend Management? •Process in which companies control and optimize the money they spend. •Involves cutting operating and other costs associated with doing business. •Includes spend analysis, sourcing, procurement, receiving, payment settlement and management of accounts payable and general ledger accounts. •In an enterprise, spend management is managing how to spend money to best effect in order to build products and services. •Encompasses processes such as outsourcing, procurement, e-procurement, and supply chain management.
  • 19. Benefits of Spend Management •Decreasing "maverick" spend •Increase of spend economies of scale •Strategic sourcing (also called "supplier rationalization") •Sourcing optimization •Co-operative sourcing •Increase process efficiencies •Increase procurement efficiency
  • 20. Life cycle of a PO 1. Create PO 2. Add items to PO 3. Add PO to Cart * 4. Create Document for the PO in the Cart 5. Create Requisition for the Document. * Note: The PO needs to be classified before it hits the Cart. Once the order is in the Cart, it is too late for classification.
  • 21. Classifying Spend • We have a set of pre-defined fields chosen for classification from a Purchase Order. All these fields are concatenated to form one long string. (Note: This string may contain text in several languages.) • Lexers can be used for detecting languages (e.g., Auto lexers, World lexers). • An SVM can be used for text mining.
  • 22. Where does Machine Learning fit in? (Spend Auto-Classification) (Diagram: the Ontology (including Spend Descriptions + other textual attributes), the Taxonomies and the Spend transaction feed the Spend Auto-classifier (Linguistics (UIMA) + Neural Net Engine / Text SVM), which outputs Auto-Classified Spend.)
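A minimal sketch of the concatenation step described on this slide, assuming hypothetical field names (`item_description`, `supplier_name`, `invoice_description`); the presentation does not list the actual fields. Language detection by a lexer is not shown; the tokenizer here relies only on Python's Unicode-aware `\w`, so non-English words are kept as tokens.

```python
import re

# Hypothetical names for the PO fields chosen for classification.
FIELDS = ("item_description", "supplier_name", "invoice_description")

def build_text(po, fields=FIELDS):
    """Concatenate the chosen, non-empty PO fields into one string."""
    parts = [str(po[f]).strip() for f in fields if po.get(f)]
    return " ".join(parts)

def tokenize(text):
    """Lower-case word tokens; \\w matches Unicode letters and digits."""
    return re.findall(r"\w+", text.lower())

def bag_of_words(tokens):
    """Token counts, the usual input representation for a text SVM."""
    counts = {}
    for token in tokens:
        counts[token] = counts.get(token, 0) + 1
    return counts

po = {
    "item_description": "Laptop 14 inch",
    "supplier_name": "Acme Corp.",
    "invoice_description": None,  # missing fields are skipped
}
text = build_text(po)
print(text)
print(bag_of_words(tokenize(text)))
```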
  • 23. Training data set • To begin with, customers provide a Training data set. This is from their historic data. They take some well known data set from their most common use cases. This would constitute a good representation of their problem. • We run our logic against this training set and get the results. The results are verified. We iterate this for some cycles to tune the logic. • Repeat the same over other use cases.
  • 24. Data Mining Model (Diagram: Create a Model > Model created > Enrich/Re-train: cleanse incorrect classifications; support new categories if needed.)
  • 25. What is Named Entity Recognition ? •“Named-entity recognition (NER) (also known as entity identification and entity extraction) is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.” -- Wikipedia •Most research on NER systems has been structured as taking an unannotated block of text, such as this one: • Jim bought 300 shares of Acme Corp. in 2006. •And producing an annotated block of text, such as this one: • <ENAMEX TYPE="PERSON">Jim</ENAMEX> bought <NUMEX TYPE="QUANTITY">300</NUMEX> shares of <ENAMEX TYPE="ORGANIZATION">Acme Corp.</ENAMEX> in <TIMEX TYPE="DATE">2006</TIMEX>.
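To make the annotation format concrete, the sketch below wraps already-known entities in the tags shown on the slide. It is not a recognizer: the entity list is supplied by hand as an assumption, whereas a real NER system (such as the Stanford NER listed later in the deck) would infer it from the text.

```python
def annotate(text, entities):
    """Wrap each known entity in its tag.

    entities: list of (surface form, tag name, TYPE value). Offsets are
    found in the original text first, then tags are inserted from right
    to left so earlier offsets stay valid.
    """
    spans = []
    for surface, tag, type_value in entities:
        start = text.find(surface)
        if start >= 0:
            spans.append((start, start + len(surface), tag, type_value))
    out = text
    for start, end, tag, type_value in sorted(spans, reverse=True):
        opening = '<%s TYPE="%s">' % (tag, type_value)
        closing = "</%s>" % tag
        out = out[:start] + opening + out[start:end] + closing + out[end:]
    return out

# Hand-supplied entities for the slide's example sentence.
entities = [
    ("Jim", "ENAMEX", "PERSON"),
    ("300", "NUMEX", "QUANTITY"),
    ("Acme Corp.", "ENAMEX", "ORGANIZATION"),
    ("2006", "TIMEX", "DATE"),
]
annotated = annotate("Jim bought 300 shares of Acme Corp. in 2006.", entities)
print(annotated)
```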
  • 26. Anatomy of a query … Query = “Find Approved Status POs with High Amount”
  • 27. Stemmed Entity Recognition & Linguistic Parsing yields… Search Verb: “Find” Find Approved Status POs with High Amount
  • 28. Stemmed Entity Recognition & Linguistic Parsing yields… Search Verb: “Find” Attribute:Status= “Approved” Find Approved Status POs with High Amount
  • 29. Stemmed Entity Recognition & Linguistic Parsing yields… Search Verb: Entity: “Find” Attribute:Type=“PO” Attribute:Status= “Approved” Find Approved Status POs with High Amount
  • 30. Stemmed Entity Recognition & Linguistic Parsing yields… Search Verb: Entity: “Find” Attribute:Type=“PO” Attribute:Amount= “High” Attribute:Status= “Approved” Find Approved Status POs with High Amount
  • 31. Stemmed Entity Recognition & Linguistic Parsing yields… Search Verb: Target Entity: “Find” Attribute:Type=“PO” Attribute:Amount= “High” Attribute:Status= “Approved” Find Approved Status POs with High Amount
  • 32. Stemmed Entity Recognition & Linguistic Parsing yields… Search Verb: Target Entity: “Find” Attribute:Type=“PO” Having Attribute Attribute:Amount= “High” Attribute:Status= “Approved” Find Approved Status POs with High Amount
  • 33. Stemmed Entity Recognition & Linguistic Parsing yields… Search Verb: “Find”. Target Entity: Attribute:Type=“PO”. Having Attribute: Attribute:Amount=“High”. Having Attribute: Attribute:Status=“Approved”. Query: Find Approved Status POs with High Amount
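The build-up across slides 26 to 33 can be approximated with a toy rule-based parser. This is only a sketch of the end result, assuming tiny hand-written vocabularies and the rule that an attribute name takes the word just before it as its value; the presentation's actual stemming and linguistic parsing (UIMA) is not reproduced here.

```python
# Hypothetical vocabularies; the plural "pos" stands in for stemming.
VERBS = {"find": "Find"}
ENTITY_TYPES = {"po": "PO", "pos": "PO"}
ATTRIBUTE_NAMES = {"status": "Status", "amount": "Amount"}

def parse_query(query):
    """Extract search verb, target entity and 'having' attributes."""
    result = {"verb": None, "entity": None, "attributes": {}}
    previous = None
    for token in query.split():
        low = token.lower()
        if low in VERBS and result["verb"] is None:
            result["verb"] = VERBS[low]
        elif low in ENTITY_TYPES:
            result["entity"] = ENTITY_TYPES[low]
        elif low in ATTRIBUTE_NAMES and previous is not None:
            # Assumed rule: the value is the word preceding the attribute name.
            result["attributes"][ATTRIBUTE_NAMES[low]] = previous
        previous = token
    return result

parsed = parse_query("Find Approved Status POs with High Amount")
print(parsed)
```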
  • 34. Spend record with a Domain Ontology
  • 35.
  • 36. OWL: attribute: string Transaction Party has a Code OWL:class has OWL:class many Role OWL:class has an plays OWL: Is A Bank attribute: string is related ID to Person Corporation OWL:class Is A OWL:class OWL:attribute: Finance number Corporation OWL:class has has First has Name ID has many many Name OWL:class OWL: Address attribute: string Last has an OWL:attribute: Account number Name OWL:attribute: has an in ID string OWL:attribute: has number Door Street City State Zip Country Number Name OWL: OWL: OWL: OWL: attribute: OWL: OWL: attribute:string attribute: string attribute: string attribute: string attribute:string string
  • 37. Transaction ID:200911071234 has Party has ID: SBK has Role: S? Bank Role played by Bank has Name: Bank Of Congo has many Address has Street Name: Afrique Au Congo has Country: RDC
  • 38. Transaction ID:200911071235 has Party has ID: ORP has Role: Ordering Party Role played by Person has First Name: John has Last Name: Doe has many Address Account has City: Kinshasa has Account Id: 123456 has Country: CD in Bank has Name: Bank Of Congo
  • 39. Transaction Transaction ID:200911071234 ID:200911071235 has is related Party to Party has has ID: ORP has ID: SBK has Role: Ordering Party Role has Role: S? Bank Role played by Person has First Name: John played by has Last Name: Doe Address has City: Kinshasa Address has Country: CD has Street Name: has Afrique Au Congo many has Country: RDC Account has Account Id: 123456 in Bank has has Name: Bank Of Congo many
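One way to read slides 37 to 39 is as a set of subject-predicate-object statements. The sketch below stores a few of them as plain tuples and derives the "is related to" link from a shared bank. The identifiers, predicate names, and the inference rule are assumptions made for illustration; the deck lists Jena for this kind of work, which is not used here.

```python
# Statements taken from the two transaction slides, as plain triples.
triples = [
    ("txn:200911071234", "hasParty", "party:SBK"),
    ("party:SBK", "playedBy", "bank:BankOfCongo"),
    ("bank:BankOfCongo", "hasName", "Bank Of Congo"),
    ("txn:200911071235", "hasParty", "party:ORP"),
    ("party:ORP", "hasRole", "Ordering Party"),
    ("party:ORP", "playedBy", "person:JohnDoe"),
    ("person:JohnDoe", "hasFirstName", "John"),
    ("person:JohnDoe", "hasLastName", "Doe"),
    ("person:JohnDoe", "hasAccount", "account:123456"),
    ("account:123456", "inBank", "bank:BankOfCongo"),
]

def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is stated."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def banks_for(transaction):
    """Banks reachable from a transaction, directly or via an account."""
    banks = set()
    for party in objects(transaction, "hasParty"):
        for player in objects(party, "playedBy"):
            if player.startswith("bank:"):
                banks.add(player)
            for account in objects(player, "hasAccount"):
                banks.update(objects(account, "inBank"))
    return banks

def related(txn_a, txn_b):
    """Assumed rule: two transactions are related if they share a bank."""
    return bool(banks_for(txn_a) & banks_for(txn_b))

print(related("txn:200911071234", "txn:200911071235"))
```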
  • 40. A possible solution: Pipelining approach
      • Flow 1:
          • Machine learning Pipeline: Input data is fed directly to the Machine Learning piece.
      • Flow 2:
          • Domain Ontology Pipeline: Input data is fed to a Domain Ontology.
          • Standardize the output from the Domain Ontology.
          • Machine learning Pipeline: Feed it into the Machine Learning piece.
      • Flow 3:
          • NER Pipeline: Input data is fed to a NER.
          • Domain Ontology Pipeline: Output from the NER is fed to the Domain Ontology.
          • Standardize the output from the Domain Ontology.
          • Machine learning Pipeline: Feed it into the Machine Learning piece.
      • Note:
          • Domain Ontology and NER Pipelines can be optionally turned on or off.
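The three flows differ only in which optional stages are switched on, which can be sketched as a configurable list of stages. The stage bodies below are placeholders that merely record their name; the function and flag names are hypothetical.

```python
def make_stage(name):
    """Placeholder stage: a real one would transform the record."""
    def stage(record):
        record["trace"].append(name)
        return record
    return stage

ner_stage = make_stage("ner")
ontology_stage = make_stage("ontology")
standardize_stage = make_stage("standardize")
ml_stage = make_stage("ml")

def build_pipeline(use_ner=False, use_ontology=False):
    """Assemble the stages; NER and ontology can each be turned on or off."""
    stages = []
    if use_ner:
        stages.append(ner_stage)
    if use_ontology:
        # Ontology output is standardized before it reaches the ML stage.
        stages.extend([ontology_stage, standardize_stage])
    stages.append(ml_stage)

    def run(text):
        record = {"text": text, "trace": []}
        for stage in stages:
            record = stage(record)
        return record

    return run

flow1 = build_pipeline()
flow2 = build_pipeline(use_ontology=True)
flow3 = build_pipeline(use_ner=True, use_ontology=True)
print(flow3("Laptop 14 inch Acme Corp.")["trace"])
```

Keeping the machine learning stage last in every flow means the classifier sees the same kind of input whether or not the optional stages ran, provided the standardization step normalizes what they produce.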
  • 41. (Diagram with numbered steps 1 to 9; no other text survives.)
  • 42. Stanford NER Demo http://nlp.stanford.edu:8080/ner/process Copyright © 2011, Stanford University, All Rights Reserved.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
  • 48.
  • 49. SVM Steps
      1. Identify the taxonomy (hierarchical or flat) to be classified against
      2. Identify representative training data that has been classified to this taxonomy
      3. Run training data against a blank SVM model and the given taxonomy
      4. Classify training data as per required taxonomy
      5. Classify the data
      6. Increase training population and enrich classification model
      7. Recognize and realign impact of original model against fresh training data
      8. Classify (manually) misclassifications into proper taxonomy nodes
      9. Repeat steps 6 through 8 until all the variations for a given domain have been recognized
      10. Introduce live data
      11. Repeat steps 4 and 5 for misclassifications
      12. Store the result in a relational database
      13. Insert data in an Ontology
      14. Enable analysis using RQL or SPARQL
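The train, check, and enrich cycle in these steps can be sketched with a very small linear SVM. This is an illustration only: it is a two-class, bag-of-words model trained by subgradient descent on the hinge loss with no bias term, and the training texts and hyperparameters are made up. The presentation does not say which SVM implementation was used.

```python
def tokenize(text):
    return text.lower().split()

def score(weights, tokens):
    return sum(weights.get(t, 0.0) for t in tokens)

def train_linear_svm(samples, epochs=20, lr=0.1, lam=0.01):
    """Hinge-loss subgradient descent with L2 shrinkage (no bias term)."""
    weights = {}
    for _ in range(epochs):
        for text, label in samples:
            tokens = tokenize(text)
            margin = label * score(weights, tokens)
            for t in weights:               # regularization step
                weights[t] *= 1.0 - lr * lam
            if margin < 1.0:                # sample is inside the margin
                for t in tokens:
                    weights[t] = weights.get(t, 0.0) + lr * label
    return weights

def predict(weights, text):
    return 1 if score(weights, tokenize(text)) > 0 else -1

def misclassified(weights, samples):
    """Samples to hand to a person for manual reclassification (step 8)."""
    return [(t, y) for t, y in samples if predict(weights, t) != y]

LAPTOP, TRAVEL = 1, -1  # two hypothetical taxonomy nodes
training = [
    ("dell laptop 14 inch", LAPTOP),
    ("thinkpad laptop charger", LAPTOP),
    ("flight ticket houston", TRAVEL),
    ("hotel stay houston", TRAVEL),
]
weights = train_linear_svm(training)          # steps 3 to 5
print(misclassified(weights, training))

# Steps 6 and 7: enlarge the training population and retrain.
enriched = training + [("taxi fare airport", TRAVEL)]
weights2 = train_linear_svm(enriched)
print(predict(weights2, "airport taxi"))
```

A real taxonomy has many nodes, so a production classifier would need a multi-class scheme (for example one model per node) rather than the single binary model shown here.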
  • 50. Open source software 1.Jena 2.Pentaho http://www.pentaho.com/ 3. Stanford NER, http://nlp.stanford.edu/software/CRF-NER.shtml 4.Annie NER 5.GATE 6.UIMA 7.SVM, http://en.wikipedia.org/wiki/Support_vector_machine
  • 51. Acknowledgements •Keshava Rangarajan, Chief Architect, Halliburton Corporation •Gopalan Arun, Vice President, Software Development •Ramesh Vasudevan, Senior Director, Software Development •Nagaraj Srinivasan, VP R&D, Halliburton Corporation (Landmark Graphics) •Ashish Pathak, Director, Product Management, Oracle America, Inc. •Chandra Yeleshwarapu, Director, Information Management & Platform Technologies, Halliburton Corporation •Jayesh Shah, Vice President, Product Development, Oracle America, Inc. •Rajesh Raheja, Senior Director, Applications Development, Oracle America, Inc. •Stanford NER, http://nlp.stanford.edu:8080/ner/process •Prashant Mendki, http://scn.sap.com/community/epm/blog/2012/03/03/your-basic-guide-to-spend-classification
  • 52. References •Jena •Pentaho http://www.pentaho.com/ • Stanford NER, http://nlp.stanford.edu/software/CRF-NER.shtml •Annie NER •GATE •UIMA •SVM, http://en.wikipedia.org/wiki/Support_vector_machine •Oracle Spend Classification Process Guide, Release 7.9.6 •Lexers

Editor's notes

  1. We will talk about auto-classification and the place of machine learning in it. When a spend transaction is added, the position of that spend within a formal taxonomy may have to change dynamically. That is not something a person can do manually in real time; we need an automated way of doing it. The spend transactions themselves have descriptions. When a tagging activity happens, or when a review is written up, there is textual information. We could use UIMA to pick out all the textual tokens, break them out into attributes, and do named entity recognition. A trained SVM engine, working from a model, can then pick up the spend descriptions and all their attributes from the classification model, tag the transaction, and position it appropriately in the taxonomy. Two flavors are available: a Neural Net engine and an SVM. Both have comparable performance. The bottom line is that we take in the spend taxonomy, the spend ontology that describes the entire spend model, and the description of the spend. You can run these into a Neural Net engine and tag things, so that whenever a new spend transaction is introduced, it is positioned appropriately in the taxonomy, dynamically.