2. Acknowledgements
Sincere thanks to
Keshava Rangarajan,
Chief Architect, Halliburton Corporation
for all the contributions and guidance, without which this research would not
have been possible.
3. What is Spend Classification?
• Definition: The process of assigning a purchase code from a hierarchical
structure (taxonomy) to each spend record (requisitions, purchase orders,
receipts, invoices, etc.).
4. Why classify spend?
• Once all spend transactions are classified with a standard code from a
taxonomy, simple queries can be answered, such as:
  • What are my top 10 spend categories?
  • What is my travel spend?
  • What is my spend for a given supplier?
  • What is my spend for a given part?
  • What is my spend for a given business unit?
• If classification is done on data consolidated across all systems in your
organization, you gain spend visibility across all of those systems.
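A minimal sketch of how such queries become simple aggregations once every record carries a category code. The records, supplier names, and dotted category paths below are made up for illustration; they are not data from this deck.

```python
from collections import defaultdict

# Hypothetical classified spend records: (category path, supplier, amount).
RECORDS = [
    ("IT.Hardware.Laptop", "Acme Corp.", 1200.0),
    ("Travel.Air", "FlyFast", 450.0),
    ("IT.Hardware.Laptop", "Globex", 900.0),
    ("Travel.Hotel", "StayInn", 300.0),
]

def spend_by(records, key_index):
    """Total amount grouped by one column (0 = category, 1 = supplier)."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key_index]] += rec[2]
    return dict(totals)

def top_categories(records, n=10):
    """The n categories with the highest total spend, largest first."""
    totals = spend_by(records, 0)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

def spend_under(records, prefix):
    """Total spend for every category at or below `prefix` in the taxonomy."""
    return sum(
        amount
        for category, _, amount in records
        if category == prefix or category.startswith(prefix + ".")
    )
```

For example, `spend_under(RECORDS, "Travel")` answers "What is my travel spend?" by rolling up every category under the Travel node.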
5. What is a Taxonomy?
• A hierarchical coding structure used to classify spend at different levels,
from broadest to most specific:
Segment
Family
Class
Commodity
6. What is the Spend Classification challenge?
• Categorization at source
• Categorization itself is inconsistent or missing completely
• Multiple disparate taxonomies may exist in a company
• Classifying into the “MISCELLANEOUS” category
• No standardization of taxonomies
7. What is the “Categorization at source” challenge?
Exercise: Buying a work laptop and expensing it via procurement
Incorrect category: Facility.Building.Hardware
Correct category: IT.Hardware.Laptop
Characteristics:
• User entered, hence error-prone
• No standardization across the supply chain: business units, customers, or
suppliers.
8. What is the “inconsistent/missing Categorization” challenge?
The same item can be coded in different ways:
• Category: IT.Hardware.Laptop
• Category: IT.Hardware.Computers.Laptop
9. What is the “multiple disparate Taxonomies” challenge?
• Multiple (and disparate) taxonomies may also exist in the organization,
where classification is carried out business unit by business unit without
regard to, or reference to, the taxonomies used in other business units.
[Figure: Business Unit 1, Business Unit 2, and Business Unit 3 alongside
Taxonomy 1, Taxonomy 2, and Taxonomy 3]
10. What is the “MISCELLANEOUS category” challenge?
• Spend transactions are classified into the 'Miscellaneous' category, making it
very difficult for business analysts to figure out which category the item
should actually belong to.
• Spend analytics data will then show an overweighted 'Miscellaneous' category,
which does not reflect a true picture of spend by category for the
organization.
• Similar popular categories: OTHERS, UNCATEGORIZED
11. Why is standardization of Taxonomies needed?
• An enterprise may have multiple taxonomies at different levels: corporate,
strategic, business unit, and regional center.
• Multiple taxonomies at various levels create a number of issues when
analyzing spend, so it is important to create or use standard
taxonomies across the enterprise.
12. What are the types of Spend Classification Taxonomies?
Spend classification taxonomies are of two types:
• Standard
• Custom
13. Standard Taxonomies
• UNSPSC: United Nations Standard Products and Services Code. It is a
four-level hierarchy coded as an 8-digit number, with an optional fifth level
(business function) that adds two more digits.
Example:
• Segment 44. Office Equipment and Accessories and Supplies.
  • Family 10. Office machines and their supplies and accessories.
    • Class 15. Duplicating machines.
      • Commodity 01. Photocopiers.
        • Business Function 14. Retail.
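The photocopier example above corresponds to the code 44101501 (4410150114 with the business function). A small sketch of splitting such a code into its levels follows; the function names are illustrative, and the zero-padded roll-up codes follow the usual UNSPSC convention of filling lower levels with zeros.

```python
LEVEL_NAMES = ["segment", "family", "class", "commodity", "business_function"]

def split_unspsc(code):
    """Split an 8-digit UNSPSC code (10 digits with business function) into 2-digit levels."""
    if not code.isdigit() or len(code) not in (8, 10):
        raise ValueError("expected an 8- or 10-digit UNSPSC code: %r" % code)
    pairs = [code[i:i + 2] for i in range(0, len(code), 2)]
    return dict(zip(LEVEL_NAMES, pairs))

def ancestors(code):
    """Roll-up codes from segment down to commodity, padded with zeros to 8 digits."""
    base = code[:8]
    return [base[:n].ljust(8, "0") for n in (2, 4, 6, 8)]
```

Rolling a commodity code up to its class, family, or segment is what lets spend be reported at any level of the taxonomy.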
14. Custom Taxonomies
• Use a custom taxonomy if your own coding structure is strong enough for your
business, or if your business is more familiar with your own structure.
15. [Figure: data flow from source documents to spend classification.
Inputs: 1) Requisitions, 2) Purchase Orders, 3) Receipts, 4) Invoices, with
their ERP category. Text used: item, invoice, and supplier descriptions and
attributes. Data Mining maps these to categories in the ERP taxonomy, UNSPSC
code, or custom taxonomies (Spend Classification), for use in Procurement &
Spend Analysis.]
16. What is Spend Analysis?
• Process of collecting, cleansing, classifying, and analyzing expenditure data
with the purpose of reducing procurement costs.
• Process of aggregating, classifying, and leveraging spend data for the purpose
of gaining visibility into cost reduction, performance improvement, and contract
compliance opportunities.
• Enables you to answer the following questions:
  • Who is buying?
  • What?
  • From whom?
  • When?
  • (optionally) Where?
  • At what price?
17. Who needs Spend Analysis?
• It is the process of organizing a company’s spend in such a way that one
can understand it, slice it, dice it, and uncover hidden savings opportunities.
• Impacts more than just the sourcing team
• Spend analysis/visibility serves three internal user communities:
  • Leadership and CxOs, who need up-to-date reports to drive strategic direction
  • Managers and accountants, who need to drill down into a spend data set to explore specific areas
  of interest or track down payment specifics
  • Sourcing power users, who need to locate, drive, and monitor the next set of savings initiatives
18. What is Spend Management?
• Process in which companies control and optimize the money they spend.
• Involves cutting operating and other costs associated with doing business.
• Includes spend analysis, sourcing, procurement, receiving, payment settlement, and
management of accounts payable and general ledger accounts.
• In an enterprise, spend management is managing how to spend money to best
effect in order to build products and services.
• Encompasses processes such as outsourcing, procurement, e-procurement, and supply chain
management.
19. Benefits of Spend Management
• Decreased "maverick" spend
• Increased spend economies of scale
• Strategic sourcing (also called "supplier rationalization")
• Sourcing optimization
• Co-operative sourcing
• Increased process efficiency
• Increased procurement efficiency
20. Life cycle of a PO
1. Create PO
2. Add items to PO
3. Add PO to Cart *
4. Create Document for the PO in the Cart
5. Create Requisition for the Document
* Note: The PO needs to be classified before it reaches the Cart. Once the
order is in the Cart, it is too late for classification.
21. Classifying Spend
• We have a set of pre-defined fields chosen for classification from a Purchase
Order. All these fields are concatenated to form one long string. (Note:
this string may contain text in multiple languages.)
• Lexers can be used for detecting languages (e.g., auto lexers, world lexers).
• A support vector machine (SVM) can be used for text mining.
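A rough sketch of the concatenation step only, assuming hypothetical field names (`item_description`, `supplier_name`, `unit_of_measure`); the deck does not list the actual fields. The resulting tokens would be the input to a text classifier such as an SVM, which is not shown here.

```python
import re

# Hypothetical field names; the real set of PO fields is not given in the deck.
CLASSIFICATION_FIELDS = ("item_description", "supplier_name", "unit_of_measure")

def build_text(po, fields=CLASSIFICATION_FIELDS):
    """Concatenate the chosen PO fields into one string, skipping empty ones."""
    parts = [str(po[f]).strip() for f in fields if po.get(f)]
    return " ".join(parts)

def tokenize(text):
    """Lowercase word tokens; \\w is Unicode-aware, so non-English text is kept."""
    return re.findall(r"\w+", text.lower())
```

Language detection is deliberately left out; a lexer would sit between `build_text` and `tokenize` when the text mixes languages.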
22. Where does Machine Learning fit in?
(Spend Auto-Classification)
[Figure: A spend transaction, the taxonomies, and the ontology (including
spend descriptions + other textual attributes) feed the Spend Auto-classifier
(Linguistics (UIMA) + Neural Net Engine / Text SVM), which outputs
auto-classified spend.]
23. Training data set
• To begin with, customers provide a training data set drawn from their
historical data. They pick well-understood data from their most common
use cases, so that it is a good representation of their problem.
• We run our logic against this training set and verify the results.
We iterate this for several cycles to tune the logic.
• We repeat the same for other use cases.
24. Data Mining Model
1. Create a model
2. Model created
3. Enrich/re-train
  • Cleanse incorrect classifications
  • Support new categories (if needed)
25. What is Named Entity Recognition?
• “Named-entity recognition (NER) (also known as entity identification and
entity extraction) is a subtask of information extraction that seeks to locate
and classify atomic elements in text into predefined categories such as the
names of persons, organizations, locations, expressions of times, quantities,
monetary values, percentages, etc.” -- Wikipedia
• Most research on NER systems has been structured as taking an
unannotated block of text, such as this one:
  • Jim bought 300 shares of Acme Corp. in 2006.
• And producing an annotated block of text, such as this one:
  • <ENAMEX TYPE="PERSON">Jim</ENAMEX> bought <NUMEX
TYPE="QUANTITY">300</NUMEX> shares of <ENAMEX TYPE="ORGANIZATION">Acme
Corp.</ENAMEX> in <TIMEX TYPE="DATE">2006</TIMEX>.
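As an illustration of consuming such annotated output, the sketch below pulls (text, type) pairs out of the ENAMEX/NUMEX/TIMEX markup with a regular expression. It only reads annotations that already exist; it is not an NER system, and the helper names are made up for this example.

```python
import re

# Matches one annotation such as <ENAMEX TYPE="PERSON">Jim</ENAMEX>.
TAG_RE = re.compile(r'<(ENAMEX|NUMEX|TIMEX) TYPE="([^"]+)">(.*?)</\1>')

SAMPLE = (
    '<ENAMEX TYPE="PERSON">Jim</ENAMEX> bought '
    '<NUMEX TYPE="QUANTITY">300</NUMEX> shares of '
    '<ENAMEX TYPE="ORGANIZATION">Acme Corp.</ENAMEX> in '
    '<TIMEX TYPE="DATE">2006</TIMEX>.'
)

def extract_entities(annotated):
    """Return (entity text, entity type) pairs in document order."""
    return [(m.group(3), m.group(2)) for m in TAG_RE.finditer(annotated)]

def strip_tags(annotated):
    """Recover the plain sentence by dropping the annotation markup."""
    return TAG_RE.sub(lambda m: m.group(3), annotated)
```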
26. Anatomy of a query…
Query = “Find Approved Status POs with High Amount”
27. Stemmed Entity Recognition & Linguistic Parsing yields…
Query: Find Approved Status POs with High Amount
• Search Verb: “Find”
• Target Entity: Attribute:Type=“PO”
• Having Attribute: Attribute:Status=“Approved”
• Having Attribute: Attribute:Amount=“High”
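A toy sketch of the structure this parse produces. It uses keyword lookup tables in place of real stemming and linguistic parsing, so it is only an approximation of the technique named on the slide; the vocabularies and the rule "the word before an attribute name is its value" are assumptions made for this example.

```python
# Hypothetical vocabularies standing in for stemming and a linguistic parser.
SEARCH_VERBS = {"find", "show", "list"}
ENTITY_TYPES = {"po": "PO", "pos": "PO", "invoice": "Invoice", "invoices": "Invoice"}
ATTRIBUTES = {"status", "amount"}

def parse_query(query):
    """Map a short query to a search verb, a target entity, and its attributes."""
    tokens = query.split()
    result = {"verb": None, "entity": None, "attributes": {}}
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if result["verb"] is None and low in SEARCH_VERBS:
            result["verb"] = tok
        elif low in ENTITY_TYPES:
            result["entity"] = ENTITY_TYPES[low]
        elif low in ATTRIBUTES and i > 0:
            # Assumption: the attribute value is the word just before its name.
            result["attributes"][tok] = tokens[i - 1]
    return result
```

Calling `parse_query("Find Approved Status POs with High Amount")` yields the verb "Find", the entity "PO", and the attributes Status=Approved and Amount=High.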
36. [Figure: OWL domain ontology. Classes (OWL:class): Transaction, Party,
Role, Person, Corporation, Finance Corporation, Bank, Name, Address, Account.
Attributes (OWL:attribute, string or number): ID, Code, First Name, Last Name,
Door Number, Street Name, City, State, Zip, Country. Relationship labels:
has, has a, has an, has many, plays, Is A, in, is related to.]
37. Transaction
  ID: 200911071234
  has Party
    has ID: SBK
    has Role: S? Bank Role
      played by Bank
        has Name: Bank Of Congo
        has many Address
          has Street Name: Afrique Au Congo
          has Country: RDC
38. Transaction
  ID: 200911071235
  has Party
    has ID: ORP
    has Role: Ordering Party Role
      played by Person
        has First Name: John
        has Last Name: Doe
        has many Address
          has City: Kinshasa
          has Country: CD
        has Account
          has Account Id: 123456
          in Bank
            has Name: Bank Of Congo
39. Transaction
Transaction ID:200911071235 and Transaction ID:200911071234 are linked by
"is related to".
Transaction
  ID: 200911071235
  has Party
    has ID: ORP
    has Role: Ordering Party Role
      played by Person
        has First Name: John
        has Last Name: Doe
        has many Address
          has City: Kinshasa
          has Country: CD
        has Account
          has Account Id: 123456
          in Bank
            has Name: Bank Of Congo
Transaction
  ID: 200911071234
  has Party
    has ID: SBK
    has Role: S? Bank Role
      played by Bank
        has Name: Bank Of Congo
        has many Address
          has Street Name: Afrique Au Congo
          has Country: RDC
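One way to see why the ontology helps: once both transactions are stored as subject-predicate-object triples, a question like "which transactions involve Bank Of Congo?" becomes a graph walk. The sketch below is a simplification under stated assumptions: the identifiers and predicate names are invented, the Role node is skipped, and the two slides are assumed to refer to the same bank. A real setup would use an RDF store queried with SPARQL.

```python
# Simplified triples for the two example transactions (identifiers are made up).
TRIPLES = [
    ("txn:200911071234", "hasParty", "party:SBK"),
    ("party:SBK", "playedBy", "bank:BankOfCongo"),
    ("txn:200911071235", "hasParty", "party:ORP"),
    ("party:ORP", "playedBy", "person:JohnDoe"),
    ("person:JohnDoe", "hasAccount", "account:123456"),
    ("account:123456", "inBank", "bank:BankOfCongo"),
    ("bank:BankOfCongo", "hasName", "Bank Of Congo"),
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching the given pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

def transactions_touching_bank(triples, bank):
    """Transactions whose party is the bank, or holds an account in that bank."""
    found = set()
    for txn, _, party in match(triples, p="hasParty"):
        for _, _, actor in match(triples, s=party, p="playedBy"):
            if actor == bank:
                found.add(txn)
            for _, _, account in match(triples, s=actor, p="hasAccount"):
                if match(triples, s=account, p="inBank", o=bank):
                    found.add(txn)
    return sorted(found)
```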
40. A possible solution: Pipelining approach
• Flow 1:
  • Machine learning pipeline: Input data is fed directly to the machine learning piece.
• Flow 2:
  • Domain ontology pipeline: Input data is fed to a domain ontology.
  • Standardize the output from the domain ontology.
  • Machine learning pipeline: Feed it into the machine learning piece.
• Flow 3:
  • NER pipeline: Input data is fed to an NER.
  • Domain ontology pipeline: Output from the NER is fed to the domain ontology.
  • Standardize the output from the domain ontology.
  • Machine learning pipeline: Feed it into the machine learning piece.
• Note:
  • The domain ontology and NER pipelines can optionally be turned on or off.
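The three flows can be sketched as one function with optional stages. Everything here is a stand-in: the `toy_*` functions are hypothetical placeholders for a real NER, domain ontology, standardizer, and trained classifier, and only the wiring is the point.

```python
SYNONYMS = {"notebook": "laptop"}   # hypothetical ontology knowledge
KNOWN_SUPPLIERS = {"dell"}          # hypothetical entity list

def toy_ner(text):
    """Stand-in NER: replace known supplier names with a SUPPLIER tag."""
    return " ".join("SUPPLIER" if w.lower() in KNOWN_SUPPLIERS else w for w in text.split())

def toy_ontology(text):
    """Stand-in ontology: map synonyms to one canonical concept."""
    return " ".join(SYNONYMS.get(w.lower(), w) for w in text.split())

def toy_standardize(text):
    """Stand-in standardizer: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())

def toy_classify(text):
    """Stand-in classifier; a trained model would go here."""
    return "IT.Hardware.Laptop" if "laptop" in text.lower().split() else "MISCELLANEOUS"

def run_pipeline(record, classify, ner=None, ontology=None, standardize=None):
    """Flow 1: classify only. Flow 2: add ontology. Flow 3: add NER as well."""
    data = record
    if ner is not None:
        data = ner(data)
    if ontology is not None:
        data = ontology(data)
        if standardize is not None:
            data = standardize(data)
    return classify(data)
```

Passing `None` for a stage is how the NER and domain ontology pipelines are switched off in this sketch.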
49. SVM Steps
1. Identify the taxonomy (hierarchical or flat) to be classified against
2. Identify representative training data that has been classified to this taxonomy
3. Run the training data against a blank SVM model and the given taxonomy
4. Classify the training data as per the required taxonomy
5. Classify the data
6. Increase the training population and enrich the classification model
7. Recognize and realign the impact of the original model against fresh training data
8. Classify (manually) misclassifications into the proper taxonomy nodes
9. Repeat steps 6 through 8 until all the variations for a given domain have been recognized
10. Introduce live data
11. Repeat steps 4 and 5 for misclassifications
12. Store the result in a relational database
13. Insert the data in an ontology
14. Enable analysis using RQL or SPARQL
50. Open source software
1. Jena
2. Pentaho, http://www.pentaho.com/
3. Stanford NER, http://nlp.stanford.edu/software/CRF-NER.shtml
4. ANNIE NER
5. GATE
6. UIMA
7. SVM, http://en.wikipedia.org/wiki/Support_vector_machine
We will talk about auto-classification and the place of machine learning. When a spend transaction is added, its position in a formal taxonomy may have to change dynamically. That is not something a person can do manually in real time; we need an automated way of doing it.

The spend transactions themselves have descriptions. When a tagging activity happens, or when a review is written up, there is textual information. We could use UIMA to pick out all the textual tokens, break them out into attributes, and do named entity recognition. We then bring in a trained SVM engine that works on a model: it picks up the spend descriptions and all their attributes from the classification model, tags each transaction, and positions it appropriately in the taxonomy.

Two flavors are available: a neural net engine and an SVM. They have comparable performance.

The bottom line: we take the spend taxonomy, the spend ontology that describes the entire spend model, and the description of the spend, run them through a neural net engine, and tag each transaction, so that whenever a new spend transaction is introduced, it is positioned appropriately in the taxonomy, dynamically.