A department, somewhere in the EU, depends on a steady input of 3,000 new textual documents per day, 365 days a year. The documents come from 10 different sources, and each arrives pre-classified into a single category of a large taxonomy. The department is unhappy: the accuracy of the incoming classifications seems low. Even after putting in an additional 800% FTE throughout the year to manually repair or discard wrongly classified documents, the accuracy still lags behind their targets. NIRI was hired to conduct research and develop an accurate document classifier. The plan was to use NIRI's classifier to replace the unreliable classes that come with the documents, and thus solve the problem of low accuracy as well as reduce the high cost of the 800% FTE. In this talk we share our experiences: the classification approach used to meet the needs of our client, the challenges in demonstrating progress during the project, and the approach used for the acceptance validation of our classifier.
13. The flow
Business Context
The Challenge
The Solution
Effectiveness
  Laboratory measurements
  Impact estimation
Reality
Wrap up
14. The Solution: NIRI will build you a better classifier
(Pipeline diagram: Vacancy → Aggregator and Classifier → NIRI Classifier → Publish, 2000-4000 per day)
15. Really?
How accurate will it be?
How will it fit our process?
Really. We will (try to):
- Reduce manual effort
- Increase volume
- Improve final accuracy
16. But you need to give us training data
> 1M vacancies:
- Verified: 74%
- Not verified: 14%
- No class: 12%
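The split above (verified, not verified, no class) can be sketched as a simple partition of the corpus. This is a minimal sketch only; the `code` and `verified` fields are hypothetical names, not the client's actual schema.

```python
def partition_vacancies(vacancies):
    """Split a vacancy corpus by label status.

    Each vacancy is a dict with hypothetical fields:
      'code'     - the taxonomy class it arrived with (None if missing)
      'verified' - whether a human has confirmed that class
    Returns (verified, not_verified, no_class) lists.
    """
    verified = [v for v in vacancies if v.get("code") and v.get("verified")]
    not_verified = [v for v in vacancies if v.get("code") and not v.get("verified")]
    no_class = [v for v in vacancies if not v.get("code")]
    return verified, not_verified, no_class
```

Only the verified slice (74% of the corpus) carries labels trustworthy enough to train and test on, which is what drives the caveats discussed later.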
21. The flow (agenda recap)
22. Measuring accuracy in the laboratory
Corpus: Verified 74%, Not verified 14%, No class 12%
Vacancy → Classifier; each output is scored as Correct, Incorrect, or No class.
Protocol: 80% train / 20% test split, repeated 5 times (5-fold cross-validation).
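The 80/20 × 5 protocol above is plain 5-fold cross-validation. A minimal stdlib-only sketch, assuming `train_fn` and `predict_fn` are placeholders for whatever model the reader plugs in:

```python
import random

def five_fold_accuracy(examples, train_fn, predict_fn, k=5, seed=0):
    """Estimate classifier accuracy with k-fold cross-validation:
    each round holds out 1/k (here 20%) of the data for testing
    and trains on the remaining 80%, then the k scores are averaged.

    examples   - list of (input, label) pairs
    train_fn   - callable(train_pairs) -> model
    predict_fn - callable(model, input) -> predicted label
    """
    data = examples[:]
    random.Random(seed).shuffle(data)          # fixed seed: reproducible folds
    folds = [data[i::k] for i in range(k)]     # k roughly equal folds
    scores = []
    for i in range(k):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = train_fn(train)
        correct = sum(predict_fn(model, x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / k
```

For example, a classifier that always predicts the true label scores 1.0, and any real model slots in via `train_fn`/`predict_fn` without changing the evaluation loop.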
23. Measuring accuracy in the laboratory
ONE OF MANY LABORATORY MEASUREMENTS

                  Correct  Incorrect  No class
Corpus              74%      14%        12%
Classifier          78%      13%         9%
Classifier 100      80%      12%         8%
Classifier 1000     85%      10%         5%
24. Measuring accuracy in the laboratory
Input corpus (original labels): Verified 74%, Not verified 14%, No class 12%
Vacancy → Classifier output: Correct 78%, Incorrect 13%, No class 9%
But this is not reality:
- Biased train/test set
- Accuracy of the test set is unknown
- Inability to test against the remaining 26% (not verified + no class)
25. The flow (agenda recap)
34. How was it built?
Check & Repair, applying the 4-eye principle.
Pipeline: Vacancy → Classifier → Published
For each vacancy, reviewers saw the original code and the top 5 VC codes.
Every single classification was marked as either
Correct, Acceptable, or Wrong.
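Tallying the golden-set verdicts into the headline percentages is straightforward. A minimal sketch, assuming each vacancy ends up with one agreed verdict string (the example counts below are illustrative, not the project's actual figures):

```python
from collections import Counter

def golden_set_scores(verdicts):
    """Aggregate reviewer verdicts from the golden test set.

    verdicts - one 'correct' | 'acceptable' | 'wrong' string per
               classified vacancy, agreed under the 4-eye principle.
    Returns the strict accuracy and the relaxed (correct-or-acceptable)
    accuracy as fractions.
    """
    counts = Counter(verdicts)
    total = sum(counts.values())
    correct = counts["correct"] / total
    acceptable = counts["acceptable"] / total
    return {"correct": correct,
            "correct_or_acceptable": correct + acceptable}
```

Reporting both numbers matters: "Acceptable" classifications need no manual repair, so the relaxed score is the one that predicts saved effort.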
35. Results
GOLDEN TEST SET RESULTS

                      Correct   Correct + Acceptable
CURRENT                63.05%         65.98%
NIRI VC                73.91%         77.56%
CURRENT (HQ source)    72.06%         76.25%
NIRI VC (HQ source)    74.38%         78.69%

HQ source = the highest-quality source (the one used for training).
36. The flow (agenda recap)
37. Wrap up
Clean semantic data is, in real life, a myth. We are looking into
data cleansing approaches.
Measuring usefulness can be hard and expensive, but …
… it can, and must, be monitored after the system is deployed.
It changes over time. Continuous learning, where possible, is a great thing.
1) Implementing a state-of-the-art machine learning algorithm is one thing.
2) Making it useful is another.
3) Explaining that to the end user is the third.
NIRI is a very cool company to work with!
I hope you liked the story, and I thank you for your attention.