2. Issues In Data Integration
There are a number of issues to consider during data
integration:
Schema integration
Redundancy
Detection and resolution of data value conflicts
Schema Integration:
Integrate metadata from different sources.
Real-world entities from multiple sources must be
matched; this is referred to as the entity identification problem.
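As a rough illustration of the entity identification problem, the sketch below merges records from two sources that name the same key differently. The record contents, the attribute names (customer_id, cust_number) and the mapping table are all hypothetical; in practice the mapping would come from the metadata of each source.

```python
# Hypothetical records from two sources describing the same customers.
source_a = [{"customer_id": 101, "name": "Asha Rao"},
            {"customer_id": 102, "name": "Ben Lee"}]
source_b = [{"cust_number": 101, "city": "Pune"},
            {"cust_number": 103, "city": "Oslo"}]

# Assumed metadata mapping: attribute name in source B -> unified name.
b_to_unified = {"cust_number": "customer_id", "city": "city"}


def rename_attributes(record, mapping):
    """Return a copy of record with attribute names translated via mapping."""
    return {mapping.get(key, key): value for key, value in record.items()}


def integrate(records_a, records_b, mapping, key):
    """Merge records that refer to the same entity, identified by key."""
    merged = {rec[key]: dict(rec) for rec in records_a}
    for rec in records_b:
        unified = rename_attributes(rec, mapping)
        # Same key value means same real-world entity: combine attributes.
        merged.setdefault(unified[key], {}).update(unified)
    return merged


integrated = integrate(source_a, source_b, b_to_unified, "customer_id")
for entity_id in sorted(integrated):
    print(entity_id, integrated[entity_id])
```

Matching on a single shared key is the simplest case. When no reliable key exists, matching usually has to compare several attributes at once.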
3. Redundancy
Redundancy:
An attribute may be redundant if it can be
derived or obtained from another attribute or set of
attributes.
Inconsistencies in attribute naming can also cause
redundancies in the resulting data set.
Some redundancies can be detected by
correlation analysis.
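A minimal sketch of correlation analysis for numeric attributes, using the Pearson correlation coefficient. The attribute names, the sample values and the 0.95 threshold are assumptions chosen for illustration only.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical attributes: annual revenue is derivable from monthly revenue.
monthly_revenue = [10.0, 12.5, 9.0, 15.0, 11.0]
annual_revenue = [12 * m for m in monthly_revenue]
employee_count = [40, 12, 33, 25, 18]

THRESHOLD = 0.95  # assumed cut-off for flagging a pair as possibly redundant

r_redundant = pearson_correlation(monthly_revenue, annual_revenue)
r_unrelated = pearson_correlation(monthly_revenue, employee_count)
print("monthly vs annual:", round(r_redundant, 3), abs(r_redundant) > THRESHOLD)
print("monthly vs staff: ", round(r_unrelated, 3), abs(r_unrelated) > THRESHOLD)
```

A coefficient close to +1 or -1 suggests that one attribute carries little information beyond the other, so one of the pair is a candidate for removal. A high correlation is only a hint and does not prove that one attribute is derived from the other.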
4. Detection and resolution of data value
conflicts
Detection and resolution of data value conflicts:
This is the third important issue in data
integration.
Attribute values from different
sources may differ for the same real-world entity.
An attribute in one system may be
recorded at a lower level of abstraction than the “same”
attribute in another.
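The two kinds of conflict described above can be sketched as follows. The first part assumes the values differ only because of the unit used (kilograms versus pounds); the second assumes one system stores sales per branch while the other stores one total per region. All records, field names and the branch naming scheme are hypothetical.

```python
LB_PER_KG = 2.20462  # approximate conversion factor

# Hypothetical records for the same product from two systems.
system_a = {"product_id": "P1", "weight_kg": 2.0}
system_b = {"product_id": "P1", "weight_lb": 4.41}


def to_kg(record):
    """Return the weight of record in kilograms, whichever unit it used."""
    if "weight_kg" in record:
        return record["weight_kg"]
    return record["weight_lb"] / LB_PER_KG


def values_conflict(a, b, tolerance=0.01):
    """True if the two weights still disagree after unit conversion."""
    return abs(to_kg(a) - to_kg(b)) > tolerance


print("weights conflict:", values_conflict(system_a, system_b))

# Abstraction-level mismatch: branch-level sales versus region-level totals.
branch_sales = {"north-1": 120, "north-2": 80}
region_sales = {"north": 200}


def roll_up(per_branch):
    """Aggregate branch sales to region level (region = text before '-')."""
    totals = {}
    for branch, amount in per_branch.items():
        region = branch.split("-")[0]
        totals[region] = totals.get(region, 0) + amount
    return totals


print("totals agree:", roll_up(branch_sales) == region_sales)
```

In both cases the values are brought to a common representation before they are compared, so that only genuine disagreements are reported as conflicts.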
5. DATA PREPROCESSING IN DATA
MINING
Preprocessing in data mining:
Data preprocessing is a data mining
technique used to transform raw data into a
useful and efficient format.
Steps involved in data preprocessing:
1. Data cleaning:
The data can have many irrelevant and
missing parts. To handle these, data cleaning is
done.
6. Missing data
(a) Missing data:
This situation arises when some values are
missing from the data. It can be handled in various ways.
Some of them are:
1. Ignore the tuples:
This approach is suitable only when the
dataset is quite large and multiple values are missing within a tuple.
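One possible reading of the "ignore the tuples" approach, sketched under the assumption that a tuple is dropped when it has more than a chosen number of missing values. The sample rows, the use of None as the missing-value marker and the max_missing parameter are all illustrative.

```python
# Hypothetical tuples; None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "city": "Pune"},
    {"age": None, "income": None, "city": "Oslo"},
    {"age": 41, "income": None, "city": "Lima"},
    {"age": None, "income": None, "city": None},
]


def count_missing(row):
    """Number of attributes in row whose value is missing."""
    return sum(1 for value in row.values() if value is None)


def ignore_sparse_tuples(rows, max_missing=1):
    """Keep only the tuples with at most max_missing missing values."""
    return [row for row in rows if count_missing(row) <= max_missing]


kept = ignore_sparse_tuples(rows)
print(len(rows), "tuples before,", len(kept), "tuples after")
```

Dropping tuples discards the values that were present in them as well, which is why the approach is only reasonable when plenty of data remains afterwards.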
7. Missing data
2. Fill the missing values:
There are various ways to do this task.
You can choose to fill the missing values manually.
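A small sketch of filling missing values. The first function applies hand-chosen replacement values, matching the manual approach mentioned above. The second fills a numeric attribute with the mean of its known values; that is a common alternative and is an assumption here, not something these notes state. The rows and replacement values are hypothetical.

```python
# Hypothetical tuples; None marks a missing value.
rows = [
    {"age": 34, "city": "Pune"},
    {"age": None, "city": "Oslo"},
    {"age": 40, "city": None},
]


def fill_manually(rows, replacements):
    """Replace missing values using hand-chosen values per attribute."""
    return [
        {key: (replacements.get(key, value) if value is None else value)
         for key, value in row.items()}
        for row in rows
    ]


def fill_with_mean(rows, attribute):
    """Replace missing values of a numeric attribute with the mean of known values."""
    known = [row[attribute] for row in rows if row[attribute] is not None]
    mean = sum(known) / len(known)
    return [
        {**row, attribute: mean if row[attribute] is None else row[attribute]}
        for row in rows
    ]


manual = fill_manually(rows, {"age": 30, "city": "Unknown"})
automatic = fill_with_mean(rows, "age")
print(manual)
print(automatic)
```

Both functions return new lists and leave the original rows untouched, so the filled data can be compared against the raw data.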
(b) Noisy data:
Noisy data is meaningless data that can’t be
interpreted by machines. It can be generated due to
faulty data collection, data entry
errors, etc. It can be handled in the following ways: