2
Generic Framework for Knowledge Classification
By Venkata Vineel
3
Agenda
•  Introduction
•  Problem at Hand
•  How is it solved?
•  Challenges
•  Skills and career alignment
•  Q & A
4
Introduction
•  Master's in Computer Science
University of Utah, Salt Lake City, UT
•  Systems Engineering Intern
Internal Tools team, Knowledge Management
•  Interests: scalability challenges, machine learning, and visualization.
5
Problem at Hand
•  Generic Framework for classifying knowledge
•  Classifying questions in Answer Hub
6
How Did I Solve It?
•  Developed a generic classification algorithm.
•  An Answer Hub knowledge base that learns.
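The slides do not say which algorithm sits behind the classifier, so the following is only an illustrative stand-in, not the author's method: a multinomial Naive Bayes classifier. It suits a knowledge base that learns, because each newly labelled question only updates a few counts and never forces a full retrain. The class and method names (`IncrementalNaiveBayes`, `learn`, `rank`, `classify`) and the sample questions are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class IncrementalNaiveBayes:
    """Multinomial Naive Bayes whose counts are updated one question at a time."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha                       # Laplace (add-alpha) smoothing
        self.doc_counts = Counter()              # labelled questions per category
        self.term_counts = defaultdict(Counter)  # token frequencies per category
        self.total_terms = Counter()             # total tokens per category
        self.vocab = set()                       # every token seen so far

    def learn(self, text: str, category: str) -> None:
        """Fold one labelled question into the model; no retraining pass needed."""
        tokens = tokenize(text)
        self.doc_counts[category] += 1
        self.term_counts[category].update(tokens)
        self.total_terms[category] += len(tokens)
        self.vocab.update(tokens)

    def rank(self, text: str) -> list[tuple[str, float]]:
        """Return (category, log-score) pairs sorted from most to least likely."""
        tokens = tokenize(text)
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = []
        for category, n in self.doc_counts.items():
            score = math.log(n / n_docs)  # log prior
            denom = self.total_terms[category] + self.alpha * vocab_size
            for token in tokens:
                score += math.log((self.term_counts[category][token] + self.alpha) / denom)
            scores.append((category, score))
        return sorted(scores, key=lambda pair: pair[1], reverse=True)

    def classify(self, text: str) -> str:
        """Return the single most likely category."""
        return self.rank(text)[0][0]


nb = IncrementalNaiveBayes()
nb.learn("Hadoop job fails with out of memory error", "Hadoop")
nb.learn("MapReduce job stuck on the Hadoop cluster", "Hadoop")
nb.learn("How to query a Teradata table", "Teradata")
nb.learn("Teradata SQL query is slow", "Teradata")
print(nb.classify("My Hadoop MapReduce job fails"))  # Hadoop
```

Because `rank` returns every category with a score rather than a single label, the same model can also feed rank-based evaluation.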
7
Project High Points
•  Achieved 72% accuracy.
[Chart: Rank Statistics. Axes: No. of Questions (0 to 18,000) and Rank (1 to 23); legend: Categories]
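The slide does not define its rank statistics. Assuming the chart counts, for each rank, how many questions had their correct category at that position in the classifier's ranked list, the histogram and the matching top-k accuracy could be computed as below. The evaluation results are made-up examples and the function names are hypothetical.

```python
from collections import Counter


def rank_of_truth(ranked_categories: list[str], true_category: str) -> int:
    """1-based position of the correct category in the classifier's ranking."""
    return ranked_categories.index(true_category) + 1


def rank_statistics(results: list[tuple[list[str], str]]) -> Counter:
    """Histogram: rank -> number of questions whose correct category sat at that rank."""
    return Counter(rank_of_truth(ranked, truth) for ranked, truth in results)


def top_k_accuracy(hist: Counter, k: int) -> float:
    """Fraction of questions whose correct category appears within the top k."""
    total = sum(hist.values())
    return sum(n for rank, n in hist.items() if rank <= k) / total


# Hypothetical evaluation output: (ranked predictions, correct category) per question.
results = [
    (["Raptor", "DAL", "V3"], "Raptor"),
    (["DAL", "Raptor", "V3"], "Raptor"),
    (["V3", "General", "DAL"], "V3"),
    (["General", "V3", "DAL"], "DAL"),
]
hist = rank_statistics(results)
print(sorted(hist.items()))     # [(1, 2), (2, 1), (3, 1)]
print(top_k_accuracy(hist, 1))  # 0.5
```

Under this reading, top-1 accuracy is simply the rank-1 bar divided by the total number of questions.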
8
Confusion matrix
   Category                1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20   Total
 1 V3                   1552     2     1     2     6   263   217     3    23   455     2    41   290     9     3     6     0     0     0     0    2875
 2 GBX                     1    68     0     0     0     6    37     0     1     9     1    26     4     8     0     0     0     1     0     0     162
 3 C3                      0     0   318     1     1    25    27    54     5    32     1     6     1     4     0     1     0     1     1     0     478
 4 Hadoop                  0     0     2   173     1    10     8     0     0    20     1     3     4     0     3     0     0     0     0     0     225
 5 BES                    11     0     0     0   300    59    39     1     0     5     0     1    22     0     0     0     0     0     0     0     438
 6 DAL                    67     0     1     0     3  2307    89     0     2    16     0    13    99     5     0     1     0     0     0     0    2603
 7 Raptor                 11    10     5     2    25   396  5352     3    62   212    26   184   337    25     6    17     0     0     1     0    6674
 8 Stratus                 1     0    82     2     1    40   188   435     4    40     0    13     6     0     2     1     0     1     0     0     816
 9 Security Platform       4     0     0     0     0    32    38     0   174    11     0     6   129     1     0     1     0     0     0     0     396
10 General               100     2    12    15     6   129   258    16    13  1200     3    88    64    29     4     3     0     0     5     0    1947
11 User Tracking           3     0     0     1     0    16    43     0     3     8   126    41    10     1     0     0     0     0     0     0     252
12 Experimentation         1     1     0     0     0    27    40     0     1     8     0   868    29     1     0     0     0     0     3     0     979
13 Service Frameworks    124     3     0     0     6    90   299     2    67    83     0    56  1977    38     5     3     0    11     0     0    2764
14 Search Services         0     1     1     0     1     5     9     1     2     8     0     4    32   163     0     0     0     0     0     0     227
15 Sherlock                2     0     0     4     0    67    31     2     0    17     0    29    19     0    85     0     0     0     0     0     256
16 Batch Framework        11     0     0     2     2   100    92     2     2    10     0     2    22     0     0    67     0     0     1     0     313
17 Trinity                 0     0     0     0     0     0     0     0     0     0     0     4     1     1     0     0     0     0     0     0       6
18 Commerce OS             0     0     0     0     0    10    48     0     4    15     0    14    15     8     0     0     0   103     0     0     217
19 Teradata                0     0     1     1     0    10     0     0     0     0     1    16     2     1     0     1     0     0    49     0      82
20 Analytics Platform      0     0     1     1     0     5     1     0     1    23     1    14     0     3     1     0     0     0     1    11      63
   Total                1888    87   424   204   352  3597  6816   519   364  2172   162  1429  3063   297   109   101     0   117    61    11   21773
   % correct            82.2  78.2  75.0  84.8  85.2  64.1  78.5  83.8  47.8  55.2  77.8  60.7  64.5  54.9  78.0  66.3   n/a  88.0  80.3 100.0
Columns use the same category numbering as the rows. % correct = diagonal entry / column total, rounded to one decimal place; n/a for Trinity (17), whose column total is 0.
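The percentage-correct values in the confusion matrix are each diagonal entry divided by its column total (for example, 1552 / 1888 ≈ 82.2% for V3). A small sketch that reproduces this calculation, returning `None` instead of a division-by-zero error for an empty column such as Trinity's; the function name and the 3×3 matrix are hypothetical.

```python
from typing import Optional


def percentage_correct(matrix: list[list[int]]) -> list[Optional[float]]:
    """Per category: 100 * diagonal entry / column total.

    A column whose total is 0 yields None rather than a division-by-zero error.
    """
    n = len(matrix)
    result: list[Optional[float]] = []
    for j in range(n):
        column_total = sum(matrix[i][j] for i in range(n))
        if column_total == 0:
            result.append(None)
        else:
            result.append(100.0 * matrix[j][j] / column_total)
    return result


# Hypothetical 3-category matrix; the third column is empty, like Trinity's.
toy = [
    [8, 1, 0],
    [2, 6, 0],
    [0, 1, 0],
]
print(percentage_correct(toy))  # [80.0, 75.0, None]
```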
9
Challenges and How We Overcame Them
•  Sparse data.
•  Large number of features.
•  A chi-square test for feature selection came to the rescue.
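The slide only says that a chi-square test helped. A common way to apply it to sparse, high-dimensional text is chi-square feature selection: score every term against every category with a 2×2 contingency table and keep only the highest-scoring terms. The sketch below assumes that usage; `select_features`, the max-over-categories aggregation, and the toy questions are illustrative assumptions.

```python
from collections import Counter


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 term/category contingency table.

    a: questions in the category that contain the term
    b: questions outside the category that contain the term
    c: questions in the category that lack the term
    d: questions outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def select_features(docs: list[tuple[set[str], str]], k: int) -> list[str]:
    """Keep the k terms whose best chi-square score over all categories is highest."""
    n = len(docs)
    cat_size = Counter(category for _, category in docs)
    vocab = set().union(*(terms for terms, _ in docs))
    best: dict[str, float] = {}
    for term in vocab:
        cats_with_term = [category for terms, category in docs if term in terms]
        df = len(cats_with_term)
        score = 0.0
        for category, size in cat_size.items():
            a = cats_with_term.count(category)
            b = df - a
            c = size - a
            d = n - a - b - c
            score = max(score, chi_square(a, b, c, d))
        best[term] = score
    # Highest score first; ties broken alphabetically so the output is stable.
    return sorted(vocab, key=lambda t: (-best[t], t))[:k]


docs = [
    ({"hadoop", "job", "error"}, "Hadoop"),
    ({"hadoop", "cluster", "error"}, "Hadoop"),
    ({"teradata", "query", "error"}, "Teradata"),
    ({"teradata", "table", "error"}, "Teradata"),
]
print(select_features(docs, 2))  # ['hadoop', 'teradata']
```

A term such as "error", which appears in every question, scores 0 and is dropped, which is exactly how this kind of selection shrinks a large, sparse feature space.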
10
Skills Obtained
•  Lucene
•  Literature survey of existing techniques
•  Machine Learning and NLP
•  Exposure to productizing research
11
Alignment With My Career Path
•  Interested in text mining and machine learning.
•  eBay has tonnes of data.
12
Future Scope for Improvement
•  User profiles
•  Support vector machines (SVM), TF-IDF weighting, and k-NN classification
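As a hypothetical sketch of two of the directions listed above (SVM is omitted), TF-IDF weighting combined with cosine-similarity k-NN might look like this. The training questions, labels, and function names are made up for illustration, and IDF here is the plain log(N / df) variant without smoothing.

```python
import math
from collections import Counter


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Inverse document frequency, log(N / df), for every term in the training set."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def vectorize(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse TF-IDF vector; terms never seen in training are dropped."""
    tf = Counter(tokens)
    return {term: tf[term] * idf[term] for term in tf if term in idf}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def knn_classify(query: list[str], train: list[tuple[list[str], str]], k: int = 3) -> str:
    """Majority vote among the k training questions most similar to the query."""
    idf = fit_idf([tokens for tokens, _ in train])
    q = vectorize(query, idf)
    scored = sorted(
        ((cosine(q, vectorize(tokens, idf)), label) for tokens, label in train),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


train = [
    ("hadoop job fails with memory error".split(), "Hadoop"),
    ("mapreduce job stuck on hadoop cluster".split(), "Hadoop"),
    ("hadoop cluster upgrade schedule".split(), "Hadoop"),
    ("teradata query is slow".split(), "Teradata"),
    ("teradata table permissions".split(), "Teradata"),
]
print(knn_classify("hadoop job memory error".split(), train))  # Hadoop
```

k-NN needs no training step beyond computing IDF, so new answered questions can join `train` immediately; the cost is a similarity computation against every stored question at query time.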
13
Q&A
