9. Chi Square Automatic Interaction Detection (CHAID). CART and CHAID are decision tree techniques used for classification of a dataset.
- Rule induction – The extraction of useful if-then rules from data based on statistical significance.
- Nearest neighbor – Classify records based on the k most similar records.
- Data visualization – Visual interpretation of complex relationships in multidimensional data.
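The nearest-neighbor idea can be illustrated with a minimal sketch. It assumes numeric features, Euclidean distance as the similarity measure, and a simple majority vote among the k closest records; the function name `knn_classify` and the sample records are hypothetical and are not taken from RapidMiner.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Return the majority label among the k training records closest to query.

    train is a list of (features, label) pairs; distance is Euclidean (an assumption).
    """
    # Sort the training records by distance to the query and keep the k nearest.
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], query))[:k]
    # Count the labels of those neighbours and return the most frequent one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical records: two numeric features and a class label.
records = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"),
    ((5.2, 4.9), "high"),
]

print(knn_classify(records, (1.1, 1.0), k=3))
print(knn_classify(records, (5.1, 5.0), k=3))
```

In practice the choice of k and of the distance measure strongly affects the result, so both are usually tuned on held-out data.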
10. Applications
Can be divided into four major kinds:
- Classification
- Numerical prediction
- Association
- Clustering
Some examples:
- Automatic abstraction
- Financial forecasting
- Targeted marketing
- Medical diagnosis
- Credit card fraud detection
- Weather forecasting, etc.
12. Enterprise edition (Community Edition + More Features + Services + Guarantees) *YALE - Yet Another Learning Environment
13. Some properties of RapidMiner:
- Written in Java
- Knowledge discovery processes are modelled as operator trees
- Internal XML representation ensures a standardized interchange format for data mining experiments
- Scripting language allows for automating large-scale experiments
- Multi-layered data view concept ensures efficient and transparent data handling
- GUI, command-line mode (batch mode), and Java API for using RapidMiner from other programs
- Several plugins already exist
- A large set of high-dimensional visualization schemes for data and models, offered by its plotting facility
Applications: text mining, multimedia mining, feature engineering, data stream mining and tracking drifting concepts, development of ensemble methods, and distributed data mining.
15. The GUI can be used to design the XML description of the operator tree.
16. Break points can be used to check the intermediate results.
Use from a separate program: the command-line version and the Java API can be used to invoke RapidMiner from your own programs without using the GUI.
17. Download and Installation Steps
Download: The latest version of RapidMiner can be downloaded from http://rapid-i.com/content/blogsection/7/82/lang,en/ by selecting the appropriate version (Windows x86, x64, etc.) and RapidMiner edition.
Installation (Windows executable):
- Download the Windows executable (.exe) file
- Double-click the rapidminer-xxx-install.exe file to run it
- Follow the instructions
22. Supported File Formats
RapidMiner can read data files, and can read and write models, parameter sets and attribute sets. The most important files are those holding the examples (instances).
23. Data files & attribute description files
- ARFFEXAMPLESOURCE – reads the .arff format
- DATABASEEXAMPLESOURCE – reads from databases
- SPARSEFORMATEXAMPLESOURCE
- DENSEFORMATEXAMPLESOURCE
An attribute description file (.aml), written in XML, is used to retrieve metadata about the instances. Attributes that can be set:
- Name – unique name of the attribute
- Sourcefile – name of the file containing the data (a default is used if not specified)
- Sourcecol – column within the file (starting from 1)
- Sourcecol_end – if set, a run of attributes spanning sourcecol to sourcecol_end is generated with the same properties
- Valuetype – one of nominal, numeric, integer, real, ordered, binominal, polynominal and file_path
- Blocktype – one of single_value, value_series, value_series_start, value_series_end, interval, interval_start, interval_end
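To make the attribute settings concrete, the following is a rough sketch that builds an attribute description as XML. The settings `name`, `sourcefile`, `sourcecol`, `valuetype` and `blocktype` and their allowed values come from the slide; the element names `attributeset` and `attribute`, the helper `build_attribute_description`, and the file name `example_data.dat` are assumptions for illustration and should be checked against the RapidMiner documentation before use.

```python
import xml.etree.ElementTree as ET

# Allowed values as listed on the slide.
VALUE_TYPES = {"nominal", "numeric", "integer", "real", "ordered",
               "binominal", "polynominal", "file_path"}
BLOCK_TYPES = {"single_value", "value_series", "value_series_start",
               "value_series_end", "interval", "interval_start", "interval_end"}


def build_attribute_description(columns):
    """Build an XML tree describing columns (element names are assumed)."""
    root = ET.Element("attributeset")  # assumed root element name
    seen = set()
    for col in columns:
        if col["name"] in seen:
            raise ValueError(f"attribute name not unique: {col['name']}")
        if col["valuetype"] not in VALUE_TYPES:
            raise ValueError(f"unknown valuetype: {col['valuetype']}")
        blocktype = col.get("blocktype", "single_value")
        if blocktype not in BLOCK_TYPES:
            raise ValueError(f"unknown blocktype: {blocktype}")
        if col["sourcecol"] < 1:
            raise ValueError("sourcecol starts from 1")
        seen.add(col["name"])
        ET.SubElement(root, "attribute", {  # assumed element name
            "name": col["name"],
            "sourcefile": col["sourcefile"],
            "sourcecol": str(col["sourcecol"]),
            "valuetype": col["valuetype"],
            "blocktype": blocktype,
        })
    return root


# Hypothetical two-column data file.
columns = [
    {"name": "temperature", "sourcefile": "example_data.dat",
     "sourcecol": 1, "valuetype": "real"},
    {"name": "outlook", "sourcefile": "example_data.dat",
     "sourcecol": 2, "valuetype": "nominal"},
]
root = build_attribute_description(columns)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Validating the value and block types before writing the file catches typos early, since a misspelled type would otherwise only surface when the data is loaded.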
24. Model files (.mod files)
Contain the models generated by previous runs.
- MODELWRITER – writes model files
- MODELLOADER – reads model files
- MODELAPPLIER – applies model files
Attribute construction files (.att files)
- ATTRIBUTECONSTRUCTIONWRITER – writes an attribute set
- ATTRIBUTECONSTRUCTIONLOADER – reads an attribute set
Parameter set files (.par files)
- GRIDPARAMETEROPTIMIZATION – generates a set of optimal parameters for a particular task
- PARAMETERSETLOADER – loads the parameter files
Attribute weight files (.wgt files)
Attribute selection is treated as attribute weighting, which allows for more flexibility.
- ATTRIBUTEWEIGHTSWRITER – writes attribute weights to a file
- ATTRIBUTEWEIGHTSLOADER – reads the attribute weights
- ATTRIBUTEWEIGHTSAPPLIER – applies the weights to the example sets
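The idea behind a grid parameter optimization can be sketched generically: try every combination of candidate parameter values and keep the best-scoring one. This is not RapidMiner's operator, only an illustration of the technique; the function `grid_search`, the parameter names `C` and `gamma`, and the toy scoring function are all hypothetical.

```python
from itertools import product


def grid_search(param_grid, score):
    """Evaluate score on every parameter combination and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    # product() enumerates the full Cartesian grid of candidate values.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(params):
    """Stand-in for a real evaluation such as cross-validated accuracy."""
    return -((params["C"] - 10) ** 2) - (params["gamma"] - 0.1) ** 2


grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

The resulting parameter set is what would be stored in a parameter file and reloaded for later runs. The cost grows multiplicatively with each added parameter, so grids are usually kept coarse.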