FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS

Nikolay Zagoruiko, Irina Borisova, Vladimir Dyubanov, Olga Kutnenko
Institute of Mathematics of the Siberian Division of the Russian Academy of Sciences, Pr. Koptyg 4, 630090 Novosibirsk, Russia

Specificity of Data Mining tasks

Some real DM tasks

Data Mining Cup 2009
http:www.prudsys.deServiceDownloadsbin
Task: prognosis of data on an absolute scale; 19344 cells are to be predicted.
[Figure: data layout. Rows 1…2418 (CONTROL) and …2394 (TRAINING); columns 1…8 and 1…1856; A = 0 - 2300; 84% of values = 0.]

DMC 2009
618 teams from 164 universities in 42 countries participated; 231 submitted solutions, and 49 were selected for the rating.

NN   Team                                   Errors
1    Uni Karlsruhe TH_ II                   17260
2    TU Dortmund                            17912
3    TU Dresden                             18163
4    Novosibirsk State University           18353
5    Uni Karlsruhe TH_ I                    18763
6    FH Brandenburg_I                       19814
7    FH Brandenburg_II                      20140
8    Hochschule Anhalt                      20767
9    Uni Hamburg_                           21064
10   KTH Royal Institute of Technology      21195
11   RWTH Aachen_I                          21780
14   Budapest University of Technology      23277
15   Isfahan University of Technology       23488
16   TU Graz                                23626
18   Uni Weimar_I                           23796
19   Zhejiang University of Sc. and Tech    23952
20   University Laval                       24884
24   University of Southampton              25694
25   Telkom Institute of Technology         25829
26   University of Central Florida          26254
32   Indian Institute of Technology         28517
34   Anna University Coimbatore             28670
38   Technical University of Kosice         32841
39   University of Edinburgh                45096
48   Warsaw School of Economics             77551
49   FH Hannover                            1938612

Comparison with 10 methods
9 tasks on microarray data. 10 feature selection methods. Independent attributes. Selection of the first n (best) attributes. Criterion: minimum number of errors on cross-validation (CV), 10 times by 50%.
Decision rules: Support Vector Machine (SVM), Between Group Analysis (BGA), Naive Bayes Classification (NBC), K-Nearest Neighbors (KNN).
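The criterion named on this slide (errors on cross-validation, 10 times by 50%) can be illustrated with a minimal sketch. It assumes that "10 times by 50%" means ten random half/half splits of the sample, which the slide does not spell out. The names `nearest_mean_rule` and `repeated_half_split_errors` are hypothetical, and the nearest-mean classifier is only a stand-in for the decision rules listed on the slide (SVM, BGA, NBC, KNN).

```python
import random


def nearest_mean_rule(train):
    """Hypothetical stand-in classifier: assign x to the class with the closest mean."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    means = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x):
        # Squared Euclidean distance to each class mean; smallest wins.
        return min(means, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, means[y])))

    return predict


def repeated_half_split_errors(data, fit=nearest_mean_rule, repeats=10, seed=0):
    """Average number of test errors over `repeats` random 50%/50% splits.

    `data` is a list of (feature_tuple, label) pairs with at least two items.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train, test = shuffled[:half], shuffled[half:]
        predict = fit(train)
        total += sum(predict(x) != y for x, y in test)
    return total / repeats
```

A feature subset would then be scored by restricting each feature tuple to the chosen attributes before calling `repeated_half_split_errors`; lower values are better.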

Methods of selection

Results of comparison

Recognition of two types of leukemia: ALL and AML

Pentium run time: T = 3 hours (I. Guyon, J. Weston, S. Barnhill, V. Vapnik); T = 13 sec (Zagoruiko N., Borisova I., Dyubanov V., Kutnenko O.)

Name of gene      Weight
2641/1, 4049/1    33
2641/1            32

On the first 27 rules, P = 34/34.

The 10 best rules
FRiS      Decision rule                         P
0.72656   537/1, 1833/1, 2641/2, 4049/2         34
0.71373   1454/1, 2641/1, 4049/1                34
0.71208   2641/1, 3264/1, 4049/1                34
0.71077   435/1, 2641/2, 4049/2, 6800/1         34
0.70993   2266/1, 2641/2, 4049/2                34
0.70973   2266/1, 2641/2, 2724/1, 4049/2        34
0.70711   2266/1, 2641/2, 3264/1, 4049/2        34
0.70574   2641/2, 3264/1, 4049/2, 4446/1        34
0.70532   435/1, 2641/2, 2895/1, 4049/2         34
0.70243   2641/2, 2724/1, 3862/1, 4049/2        34

Projection of the training set onto the 2-dimensional space of genes 2641 and 4049 (classes ALL and AML)

Type II diabetes: ordering of patients
M = 43 (17 + 8 + 18), N = 5520
[Figure: patients ordered along the F axis from F = +1 to F = -1; groups: healthy, patients, group of risk.]
The group of risk did not participate in training.
This is useful for early diagnosis of diseases and for monitoring the course of treatment.

Similarity is not an absolute but a relative category
Is object b similar to a, or is it not? Do objects a and b belong to one class?
To answer, we should first know the answer to the question: in competition with what?

Function of Concurrent (Rival) Similarity (FRiS)
[Figure: object z, patterns A and B, distances r1 and r2; F varies from -1 to +1.]
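The formula itself did not survive on this slide; only the symbols r1, r2, z, A, B and the range from -1 to +1 remain. A minimal sketch, assuming the form commonly given for this function in the authors' publications, F = (r2 - r1) / (r2 + r1), where r1 is the distance from z to the nearest object of pattern A and r2 is the distance from z to the nearest object of the rival pattern B. The function name `fris`, the Euclidean default metric, and the handling of the degenerate case r1 = r2 = 0 are assumptions made for illustration.

```python
import math


def fris(z, a, b, dist=math.dist):
    """Rival similarity of z to a in competition with b, in [-1, +1].

    Assumed form: F = (r2 - r1) / (r2 + r1), with r1 = dist(z, a), r2 = dist(z, b).
    """
    r1 = dist(z, a)
    r2 = dist(z, b)
    if r1 + r2 == 0:
        # Assumption: z coincides with both a and b, so neither rival wins.
        return 0.0
    return (r2 - r1) / (r2 + r1)
```

With this form, F = +1 when z coincides with a, F = -1 when z coincides with the rival b, and F = 0 when z is equally distant from both. For example, `fris((1, 0), (0, 0), (4, 0))` gives r1 = 1, r2 = 3 and F = 0.5.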

Compactness
All pattern recognition methods are based on the hypothesis of compactness (Braverman E.M., 1962).
Patterns are compact if:
- the number of boundary points is small in comparison with the total number of points;
- the patterns are separated from each other by borders that are not too elaborate.

Compactness
Similarity between objects of one pattern should be maximal.
Similarity between objects of different patterns should be minimal.

Compactness
Compact patterns should satisfy the condition of defensive capacity: maximal similarity between objects of the same pattern.

Compactness
Compact patterns should satisfy the condition of tolerance: maximal difference between these objects and the objects of other patterns.

Selection of the standards (stolps) Algorithm FRiS-Stolp

Censoring of the training set
H P = argmax |r|(H,P) = 1,2,…7
1. 0.8689 -90(90)-20
2. 0.8902 -90(90)-20
3. 0.9084 -90(90)-20
4. 0.9167 -90(90)-20
5. 0.8903 -90(90)-20
6. 0.7309 -88(90)-9
7. 0.2324 -86(90)-7

Criteria
Informativeness by Fisher for the normal distribution.
Compactness has the same meaning and can be used as a criterion of informativeness that is invariant to the law of distribution and to the relation between N and M.
Comparative studies have shown an appreciable advantage of this criterion over the commonly used number of errors on cross-validation.
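The Fisher formula shown on this slide was lost in extraction. As a point of reference only, a common single-attribute form of the Fisher criterion is (m1 - m2)^2 / (s1^2 + s2^2), where m and s^2 are the class means and variances. Whether the slide used exactly this variant is an assumption, and the name `fisher_informativeness` is hypothetical.

```python
from statistics import mean, pvariance


def fisher_informativeness(class_a, class_b):
    """Fisher-style score of one attribute for two classes.

    Assumed form: (m1 - m2)^2 / (s1^2 + s2^2) with population variances.
    """
    numerator = (mean(class_a) - mean(class_b)) ** 2
    denominator = pvariance(class_a) + pvariance(class_b)
    if denominator == 0:
        # Assumption: zero spread in both classes counts as perfectly
        # informative when the means differ, and uninformative otherwise.
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator
```

Well-separated classes score higher than overlapping ones: the values `[1, 2, 3]` against `[5, 6, 7]` score about 12, while `[1, 2, 3]` against `[2, 3, 4]` score about 0.75.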

Criteria
Comparison of the criteria (CV and FRiS)
N = 100, M = 2*100, m_t = 2*35, m_C = 2*65, + noise

Algorithm GRAD

Algorithm AdDel
R(AdDel) > R(DelAd) > R(Ad) > R(Del)
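The steps of AdDel were lost from this slide; only the ranking of the four strategies remains. A minimal sketch, assuming the usual reading of the name: a greedy Addition phase that adds `n_add` attributes one at a time, followed by a greedy Deletion phase that removes `n_del < n_add` attributes, repeated until the subset reaches the requested size. The function name `add_del`, its parameters, and the stopping rule are hypothetical. `quality` is any user-supplied criterion where larger is better, and attribute names are assumed to be unique.

```python
def add_del(features, quality, target_size, n_add=2, n_del=1):
    """Greedy AdDel-style attribute selection (illustrative sketch only)."""
    if not 0 <= n_del < n_add:
        raise ValueError("need 0 <= n_del < n_add")
    selected = []
    target_size = min(target_size, len(features))

    def drop_worst():
        # Remove the attribute whose deletion leaves the best remaining subset.
        worst = max(selected, key=lambda f: quality([g for g in selected if g != f]))
        selected.remove(worst)

    while len(selected) < target_size:
        # Ad phase: add the attribute that most improves the current subset.
        for _ in range(n_add):
            candidates = [f for f in features if f not in selected]
            if not candidates:
                break
            selected.append(max(candidates, key=lambda f: quality(selected + [f])))
        if len(selected) == len(features):
            break  # nothing left to add; trim below instead of cycling
        # Del phase: give back n_del attributes that turned out to be redundant.
        for _ in range(n_del):
            drop_worst()
    # Trim any overshoot down to the requested size.
    while len(selected) > target_size:
        drop_worst()
    return selected
```

The Del phase is what lets the search undo an early greedy choice: an attribute that looked best alone can be dropped once a better combination has been assembled, which pure addition (Ad) cannot do.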

Algorithm GRAD
Decision: orientation on the individual informativeness of attributes.
[Figure: dependence of the frequency f of hits in an informative subsystem on the serial number L by individual informativeness.]
This allows granulating only the most informative part of the attributes.

Algorithm GRAD (Granulated AdDel)

Value of FRiS for points on a plane

Classification (Algorithm FRiS-Class)
FRiS-Cluster divides the objects into clusters.
FRiS-Tax unites the clusters into classes (taxons).
Using the FRiS function allows:
- making taxons of any form;
- searching for the optimal number of taxons.

Examples of taxonomies produced by the FRiS-Class algorithm

Examples of taxonomies produced by the FRiS-Class algorithm

Comparison of FRiS-Class with other taxonomy algorithms
[Figure: comparison plot; axis K.]

Universal classification

New DM methods using the FRiS function

Unsettled problems

Conclusion
