Practical data science project groups energy supplier accounts using ML and NLP

20-minute talk given at PyData London 2014. A client in the energy sector wanted to build predictive behavioural models of its business customers at the company level, but the CRM data was messy: it often contained several sub-accounts for each business, with no identifiers grouping them, so aggregating to company level was impossible. In this talk I describe a short project in which we used text mining, a handful of unsupervised learning techniques and pragmatic use of human skill to identify the true company-level structures in the CRM data.

1. LIGHTNING EDITION: A practical data science project

2.
● A CRM dataset (100k business accounts) belonging to a national energy supplier
● A knotty problem: multiple accounts per company, without any grouping IDs
● How can we find groups of accounts (larger company structures) using just the CRM data?
● Machine Learning (ML) and Natural Language Processing (NLP) tools and techniques in Python
● Import: scikit-learn and TextBlob (NLTK & Pattern)

3.
| Company ID | Account name | Contact name | Premises address (lines 1-4) | Billing address (lines 1-4) |
|---|---|---|---|---|
| 1 | Bob’s Pizza | Big Bob | 5 High St, Wexford | 5 High St, Wexford |
| 1 | Bob’s Pizza | Big Bob | Temple Bar, D2 | 5 High St, Wexford |
| 1 | Mike’s Kebabs | Mad Mike | 3 Upper St, Dublin | 5 High St, Wexford |
| 2 | Mark’s Kebabs | Mild Mark | 8 Upper St, Dublin | Main St, Waterford |
| 3 | Fred’s Falafel | Fat Fred | 9 Henry St, Cork | 9 Henry st, cork |
| 3 | Fred Fallafell | Freddie | Bridges St, Galway | Henrys St, Cork |
| … | … | … | … | … |

This crucial bit of info, the Company ID, groups the separately recorded accounts into companies… and was missing from the dataset.
… x100,000
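
Before reaching for ML, it helps to see why exact matching is not enough. Below is a minimal stdlib sketch over the slide 3 toy rows: normalise each billing address (lowercase, punctuation stripped) and group accounts whose normalised addresses are identical. The helper names `normalise` and `exact_match_groups` are hypothetical, for illustration only. Exact matching recovers company 1, whose three accounts share "5 High St, Wexford", but leaves company 3 split in two, because its billing addresses differ by a typo ("9 Henry st, cork" vs "Henrys St, Cork").

```python
import re
from collections import defaultdict

# Toy rows copied from the slide 3 table: (true company ID, account name, billing address).
ROWS = [
    (1, "Bob’s Pizza", "5 High St, Wexford"),
    (1, "Bob’s Pizza", "5 High St, Wexford"),
    (1, "Mike’s Kebabs", "5 High St, Wexford"),
    (2, "Mark’s Kebabs", "Main St, Waterford"),
    (3, "Fred’s Falafel", "9 Henry st, cork"),
    (3, "Fred Fallafell", "Henrys St, Cork"),
]


def normalise(text: str) -> str:
    """Lowercase, turn every run of non-alphanumeric characters into one space, trim."""
    return re.sub(r"[^0-9a-z]+", " ", text.lower()).strip()


def exact_match_groups(rows):
    """Group row indices whose normalised billing addresses are identical."""
    groups = defaultdict(list)
    for idx, (_company_id, _name, billing) in enumerate(rows):
        groups[normalise(billing)].append(idx)
    return sorted(groups.values())


groups = exact_match_groups(ROWS)
print(groups)  # [[0, 1, 2], [3], [4], [5]]
```

Rows 4 and 5 belong to the same company but stay apart, which is the gap the fuzzier text-similarity pipeline is meant to close.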

4.
Account holder, Business name, Premises address, Billing address, ... (x100,000)
→ NLTK.PunktTokeniser → cleaned, parsed, tokenised text strings
→ sklearn.TfidfVectorizer → TF-IDF 2-D matrix
→ sklearn.TruncatedSVD → SVD N-D matrix
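
The slide 4 pipeline can be sketched with scikit-learn roughly as follows. This is a minimal sketch under stated assumptions: each account's fields are simply concatenated into one string; scikit-learn's own character n-gram analyser stands in for the NLTK tokenisation step (so no NLTK model download is needed, and n-grams tolerate misspellings such as "Fallafell"); and `n_components=3` is chosen only because the toy input has six rows, whereas a run over 100,000 accounts would use many more components. The final `Normalizer` rescales each row to unit length, so dot products between rows are cosine similarities.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Toy accounts from slide 3: account name, contact, premises and billing address joined.
ACCOUNTS = [
    "Bob’s Pizza Big Bob 5 High St, Wexford 5 High St, Wexford",
    "Bob’s Pizza Big Bob Temple Bar, D2 5 High St, Wexford",
    "Mike’s Kebabs Mad Mike 3 Upper St, Dublin 5 High St, Wexford",
    "Mark’s Kebabs Mild Mark 8 Upper St, Dublin Main St, Waterford",
    "Fred’s Falafel Fat Fred 9 Henry St, Cork 9 Henry st, cork",
    "Fred Fallafell Freddie Bridges St, Galway Henrys St, Cork",
]


def embed_accounts(texts, n_components=3, random_state=0):
    """Text -> sparse TF-IDF matrix (accounts x n-grams) -> dense SVD matrix (accounts x n_components)."""
    pipeline = make_pipeline(
        # Assumption: character n-grams inside word boundaries replace NLTK word tokens.
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4), lowercase=True, sublinear_tf=True),
        TruncatedSVD(n_components=n_components, random_state=random_state),
        # Rescale each row to unit L2 norm so that X @ X.T gives cosine similarities.
        Normalizer(),
    )
    return pipeline.fit_transform(texts)


X = embed_accounts(ACCOUNTS)
similarity = X @ X.T  # (6, 6) cosine-similarity matrix between accounts
print(X.shape)
print(np.round(similarity[0], 2))
```

TruncatedSVD is used rather than plain PCA because it works directly on the sparse TF-IDF matrix without centring it.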

5.
   1. Suggest similar accounts to be grouped
   2. Human validation & verification
   3. Incorporate & propagate valid groupings
   Tools: sklearn.MiniBatchKMeans, sklearn.AffinityPropagation, sklearn.RadiusNeighborsClassifier
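
A minimal sketch of steps 1 and 3, assuming (the slide does not say which tool serves which step) that the clustering estimators propose candidate groups and `RadiusNeighborsClassifier` spreads human-verified company IDs to nearby unreviewed accounts. The helper names, `radius=0.5`, the `-1` "no company found" label and the synthetic 2-D points are all illustrative choices, not values from the project; in practice `X` would be the SVD matrix from slide 4. Step 2, human validation, happens between the two calls.

```python
import warnings
from collections import defaultdict

import numpy as np
from sklearn.cluster import AffinityPropagation, MiniBatchKMeans
from sklearn.neighbors import RadiusNeighborsClassifier


def suggest_clusters(X, n_clusters, random_state=0):
    """Step 1: cluster account vectors; each cluster is only a *candidate* company."""
    km = MiniBatchKMeans(n_clusters=n_clusters, n_init=3, random_state=random_state)
    return km.fit_predict(X)


def suggest_clusters_ap(X, random_state=0):
    """Step 1 alternative: Affinity Propagation chooses the number of clusters itself."""
    return AffinityPropagation(random_state=random_state).fit_predict(X)


def review_batches(cluster_labels):
    """Turn cluster labels into lists of account indices for a human to check (step 2).
    Singleton clusters are skipped because there is nothing to group."""
    members = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        members[label].append(idx)
    return sorted(m for m in members.values() if len(m) > 1)


def propagate_company_ids(X_verified, company_ids, X_unreviewed, radius=0.5):
    """Step 3: give each unreviewed account the majority company ID among verified
    accounts within `radius`; accounts with no verified neighbour get -1."""
    clf = RadiusNeighborsClassifier(radius=radius, outlier_label=-1)
    clf.fit(X_verified, company_ids)
    with warnings.catch_warnings():
        # scikit-learn may warn that -1 is not a training class; here that is intended.
        warnings.simplefilter("ignore", UserWarning)
        return clf.predict(X_unreviewed)


# Synthetic 2-D stand-ins for SVD account vectors: two tight pairs.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = suggest_clusters(X, n_clusters=2)
print(review_batches(labels))  # [[0, 1], [2, 3]]

# Suppose a reviewer confirmed both batches, as companies 1 and 2.
X_new = np.array([[0.05, 0.05], [5.05, 5.1], [10.0, -10.0]])
new_ids = propagate_company_ids(X, np.array([1, 1, 2, 2]), X_new)
print(new_ids.tolist())  # [1, 2, -1]
```

Keeping propagation radius-based, rather than always taking the nearest verified account, means isolated accounts are left ungrouped instead of being forced into a company.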

6.
● A very quick turnaround from raw data to companies tagged with 93% accuracy
● ~40% of accounts found to belong to a company, with ~3.5 accounts per company
● NLP toolkits and scikit-learn allowed rapid development and testing of the solution
● Incorporated human identification at critical stages: no ML problem is an island
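
The two grouping figures above are simple counts over the final company labels. A minimal sketch, assuming "belong to a company" means an account shares its company ID with at least one other account; `grouping_summary` is a hypothetical helper, and the toy labels are the six rows of slide 3, so the output only illustrates the arithmetic and does not reproduce the project's ~40% and ~3.5 figures.

```python
from collections import Counter


def grouping_summary(company_ids):
    """Return (share of accounts in a multi-account company,
    mean number of accounts per multi-account company)."""
    sizes = Counter(company_ids)
    multi = [n for n in sizes.values() if n > 1]  # sizes of companies with 2+ accounts
    grouped = sum(multi)
    share = grouped / len(company_ids)
    mean_size = grouped / len(multi) if multi else 0.0
    return share, mean_size


# Company IDs of the six toy rows on slide 3: companies 1, 2 and 3.
share, mean_size = grouping_summary([1, 1, 1, 2, 3, 3])
print(f"{share:.0%} of accounts grouped, {mean_size:.1f} accounts per company")
# 83% of accounts grouped, 2.5 accounts per company
```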

7. Any questions?
