2. Types of variables
Variables can be divided into two main types:
  - Categorical attributes: nominal, binary and ordinal variables
  - Continuous attributes: integer, interval-scaled and ratio-scaled variables
An optional third type, ignore attributes, marks variables that are of no significance.
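As a rough illustration of this taxonomy, the sketch below tags each column of a made-up dataset with a variable type and maps it to its main group. The column names, the `Kind` enum and the `main_type` helper are all hypothetical and exist only for this example.

```python
from enum import Enum


class Kind(Enum):
    NOMINAL = "nominal"
    BINARY = "binary"
    ORDINAL = "ordinal"
    INTEGER = "integer"
    INTERVAL = "interval-scaled"
    RATIO = "ratio-scaled"
    IGNORE = "ignore"


CATEGORICAL = {Kind.NOMINAL, Kind.BINARY, Kind.ORDINAL}
CONTINUOUS = {Kind.INTEGER, Kind.INTERVAL, Kind.RATIO}


def main_type(kind):
    """Map a variable kind to its main group."""
    if kind in CATEGORICAL:
        return "categorical"
    if kind in CONTINUOUS:
        return "continuous"
    return "ignore"


# Hypothetical schema: every column name here is invented for illustration.
schema = {
    "eye_colour": Kind.NOMINAL,            # unordered labels
    "smoker": Kind.BINARY,                 # exactly two possible values
    "exam_grade": Kind.ORDINAL,            # labels with a meaningful order
    "number_of_children": Kind.INTEGER,    # whole-number counts
    "temperature_celsius": Kind.INTERVAL,  # equal steps, arbitrary zero
    "weight_kg": Kind.RATIO,               # equal steps, true zero
    "record_id": Kind.IGNORE,              # no significance for mining
}

groups = {name: main_type(kind) for name, kind in schema.items()}
```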
3. Data Cleaning
Erroneous values can be divided into:
  - Noisy values: valid for the dataset, but incorrectly recorded
  - Invalid values: can be easily detected and removed or corrected
Noise detection:
  - Peaks in the dataset
  - Values outside the normal range; such values could even be genuine (these are called outliers)
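One possible way to act on this distinction is sketched below: values outside a known valid domain are treated as invalid, and the remaining values are screened for suspects that fall outside the normal range. The 0 to 120 age domain and the 1.5 × IQR rule are assumptions made for this example, not something the slides prescribe, and the sample data is invented.

```python
import statistics


def split_invalid(values, low, high):
    """Separate values outside the valid domain [low, high] from the rest."""
    valid = [v for v in values if low <= v <= high]
    invalid = [v for v in values if not (low <= v <= high)]
    return valid, invalid


def flag_suspects(values, k=1.5):
    """Return values lying more than k * IQR beyond the quartiles.

    The IQR rule is an assumed heuristic for "outside the normal range".
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Invented sample of recorded ages.
ages = [23, 25, 27, 29, 31, 33, 35, 98, -4, 250]

valid, invalid = split_invalid(ages, low=0, high=120)  # assumed valid domain
suspects = flag_suspects(valid)

print(invalid)   # [-4, 250]  cannot be real ages: remove or correct
print(suspects)  # [98]       unusual, but could be a genuine outlier
```

Flagged suspects are only reported here, not removed, because a value such as 98 may be genuine and deserves a manual check.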
4. Missing Values
Reasons for occurrence:
  - Equipment malfunction
  - Additional fields were added later
  - Non-availability of information
Strategies to deal with missing values:
  - Discard instances that have missing values
  - Replace missing values by the most frequent or average value
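The two strategies can be sketched as follows. This is a minimal example under stated assumptions: missing values are encoded as `None`, the most frequent value is used for categorical columns and the average for continuous ones, and the rows, column names and helper functions are invented for illustration.

```python
import statistics

MISSING = None  # assumption: missing values are encoded as None


def discard_incomplete(rows):
    """Strategy 1: drop every instance that has a missing value."""
    return [row for row in rows if MISSING not in row.values()]


def impute(rows, column, categorical):
    """Strategy 2: fill gaps with the most frequent or average value."""
    present = [row[column] for row in rows if row[column] is not MISSING]
    if categorical:
        fill = statistics.mode(present)   # most frequent value
    else:
        fill = statistics.mean(present)   # average value
    return [
        {**row, column: fill} if row[column] is MISSING else dict(row)
        for row in rows
    ]


# Invented sample data.
rows = [
    {"colour": "red", "weight": 10.0},
    {"colour": None, "weight": 14.0},
    {"colour": "red", "weight": None},
    {"colour": "blue", "weight": 12.0},
]

complete = discard_incomplete(rows)
filled = impute(impute(rows, "colour", categorical=True),
                "weight", categorical=False)

print(len(complete))        # 2
print(filled[1]["colour"])  # red
print(filled[2]["weight"])  # 12.0
```

Discarding is the simpler option but loses data (here half of the instances), while replacement keeps every instance at the cost of introducing estimated values.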