Text mining 
Fuzzy document classification 
Using Elasticsearch 
Lev Ozeryansky
Identity Card 
• Merger of Ankor and We! 
• Owned by Hilan (publicly traded on the Tel Aviv Stock Exchange) 
• Fast-growing IT integration company 
• Over 2000 systems installed and maintained 
• Over 1000 leading customers: hi-tech, industry, academia, banks, insurance 
• Strong technological team: over 45 engineers, professional services and project managers 
• Over 120 employees 
• Four main divisions: Infrastructure, Big Data, Cloud, Cyber
Technology Edge
What is classification 
• Document classification as document categorization. 
• Using classification. 
• Our classification data source. 
• What do we do with it? 
• Java programmer. 
• .NET programmer.
Data source
The mathematics 
• A set of classes 
• A set of documents 
• A classification function
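
The symbols on this slide did not survive extraction. One common way to write the setup, given here as an assumption rather than the slide's own notation, uses $C$ for the class set, $D$ for the document set, and a membership function with values in $[0, 1]$ for the fuzzy case:

```latex
C = \{c_1, \dots, c_m\}, \qquad
D = \{d_1, \dots, d_n\}, \qquad
f : D \times C \to [0, 1]
```

Under this reading, $f(d, c)$ is the degree to which document $d$ belongs to class $c$.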
Classification method 
• Cosine similarity 
• Function
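
The function on this slide was an image and is missing from the text. The standard definition of cosine similarity between two weight vectors $A$ and $B$ of length $n$ (the symbol names are chosen here for illustration) is:

```latex
\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}
             = \frac{\sum_{i=1}^{n} A_i B_i}
                    {\sqrt{\sum_{i=1}^{n} A_i^2} \; \sqrt{\sum_{i=1}^{n} B_i^2}}
```

With non-negative term weights the value lies between 0 (no shared terms) and 1 (same direction).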
Build document class vector 
• Java programmer 
• Java 
• 5 
• Hibernate 
• .NET programmer 
• C# 
• 5 
• Nhibernate
Let's index the classifiers 
• Add weights manually. 
• For Java programmer: 
• Java = 0.7 
• 5 = 0.5 
• Hibernate = 0.3 
• For .NET programmer: 
• C# = 0.7 
• 5 = 0.5 
• Nhibernate = 0.3
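
The manual weights above can be sketched as plain Python data. This is an illustration only: the names `CLASS_WEIGHTS`, `build_vocabulary` and `to_vector` are hypothetical, and the talk itself stores these weights in an Elasticsearch index whose layout is not shown on the slides.

```python
# Manually assigned term weights per class, copied from the slide.
CLASS_WEIGHTS = {
    "Java programmer": {"Java": 0.7, "5": 0.5, "Hibernate": 0.3},
    ".NET programmer": {"C#": 0.7, "5": 0.5, "Nhibernate": 0.3},
}


def build_vocabulary(class_weights):
    """Return a sorted list of every term used by any class."""
    return sorted({term for weights in class_weights.values() for term in weights})


def to_vector(weights, vocabulary):
    """Project a term -> weight mapping onto the vocabulary; missing terms get 0.0."""
    return [weights.get(term, 0.0) for term in vocabulary]


# Shared vocabulary and one dense vector per class (hypothetical helper names).
VOCABULARY = build_vocabulary(CLASS_WEIGHTS)
CLASS_VECTORS = {name: to_vector(w, VOCABULARY) for name, w in CLASS_WEIGHTS.items()}
```

Putting every class on one shared vocabulary keeps the vectors the same length, which is what a cosine comparison needs.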
DEMO
w-shingling 
• In natural language processing a w-shingling is a set of 
unique "shingles"—contiguous subsequences of tokens in 
a document. (Wikipedia) 
• Tokenization 
• Elasticsearch analysis mechanism
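
A minimal sketch of w-shingling in plain Python follows. The whitespace tokenizer is only a stand-in for an Elasticsearch analyzer, and the function names are hypothetical. Elasticsearch also ships a `shingle` token filter that can produce shingles at analysis time; its configuration is not shown on the slides.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def w_shingles(tokens, w):
    """Return the set of unique contiguous token subsequences of length w."""
    if w < 1:
        raise ValueError("w must be at least 1")
    # range() is empty when the document has fewer than w tokens.
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}
```

For the text "a rose is a rose is a rose" and w = 4, the eight tokens yield five windows but only three unique shingles.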
DEMO
Classification process 
• Tokens array. 
• Classification query. 
• Use terms query when terms array == tokens array 
• Two vectors 
• Vector of filtered tokens 
• Classification vector
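
The steps above can be sketched as follows, under stated assumptions: the field name passed to `build_terms_query` is hypothetical, and using raw term counts for the document vector is one possible choice, not necessarily the one used in the demo. The query body follows the shape of the Elasticsearch `terms` query.

```python
def build_terms_query(field, tokens):
    """Build a `terms` query body that matches any of the document's tokens."""
    return {"query": {"terms": {field: sorted(set(tokens))}}}


def build_vectors(tokens, class_weights):
    """Return (document vector, classification vector) aligned on the class terms.

    Tokens that are not class terms are filtered out. The document vector
    counts occurrences of each class term (an assumption); the classification
    vector holds the manually assigned weights.
    """
    terms = sorted(class_weights)
    doc_vector = [float(tokens.count(term)) for term in terms]
    class_vector = [class_weights[term] for term in terms]
    return doc_vector, class_vector
```

A call such as `build_vectors(["Java", "Hibernate", "Java", "Spring"], {"Java": 0.7, "5": 0.5, "Hibernate": 0.3})` drops "Spring" and returns two vectors of length three.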
DEMO
Classification process 
• Use SciPy to calculate the distance.
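
A hedged sketch of the distance step: `scipy.spatial.distance.cosine` returns the cosine distance, that is 1 minus the cosine similarity. The `classify` helper and the pure-Python fallback are illustrative additions, not code from the talk.

```python
import math

try:
    # SciPy's cosine() returns 1 - cosine similarity of two 1-D vectors.
    from scipy.spatial.distance import cosine as cosine_distance
except ImportError:
    def cosine_distance(u, v):
        """Pure-Python fallback with the same meaning; undefined for zero vectors."""
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return 1.0 - dot / (norm_u * norm_v)


def classify(doc_vector, class_vectors):
    """Return (closest class name, all distances) by smallest cosine distance."""
    distances = {
        name: float(cosine_distance(doc_vector, vector))
        for name, vector in class_vectors.items()
    }
    return min(distances, key=distances.get), distances
```

The class with the smallest distance is the best match. Keeping the full distance map lets a caller apply a threshold instead of always accepting the closest class.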
Q&A
