Presentation1.1

1. Data Mining Engine for Enterprise GIS. Akash Dwivedi (09IT6001). Under the guidance of Prof. S.K. Ghosh, School of Information Technology, Indian Institute of Technology, Kharagpur

2. OUTLINE 4/2/2011 2

3. OBJECTIVES

5. traffic, bird habitats, global climate, logistics, ...

6. Object types:

8. Meteorology

9. Astronomy

10. Environmental studies, etc.

11. What is Special about Spatial Data?

12. Why Data Mining in Spatial Data?

13. Spatial Data + Web Services = OGC (Open Geospatial Consortium)

14. Proposed Architecture of Enterprise GIS. Fig. 1: Architecture of Enterprise GIS (components: Client; Semantic Resolution of Query; Query Broker (service composition); Map Overlay; Spatial Data Mining Engine; WFS, WMS and WPS services; data sources DB 1, DB 2, …, DB n).
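The broker's service composition can be illustrated with OGC key-value-pair (KVP) requests. In this minimal sketch the endpoint URLs, layer names and the mining process identifier are hypothetical, and only request URLs are built (no network calls): features are read from a WFS, passed by reference to a WPS process standing in for the spatial data mining engine, and the result is rendered through a WMS. The WPS DataInputs encoding is simplified and should be checked against the WPS 1.0.0 specification.

```python
from urllib.parse import urlencode

# Hypothetical endpoints (assumptions, not part of the original architecture).
WFS_URL = "http://example.org/geoserver/wfs"
WPS_URL = "http://example.org/geoserver/wps"
WMS_URL = "http://example.org/geoserver/wms"


def kvp(base, params):
    """Append KVP-encoded parameters to a service endpoint."""
    return f"{base}?{urlencode(params)}"


def compose_chain(layer, process, bbox, size=(800, 600)):
    """Return the (service, request URL) steps a broker could issue, in order.

    bbox is (min lat, min lon, max lat, max lon), the axis order EPSG:4326 uses
    in WMS 1.3.0 and WFS 2.0 (an assumption to verify per server).
    """
    bbox_str = ",".join(str(v) for v in bbox)
    # Step 1: fetch raw features from a WFS (version 2.0.0 uses 'typeNames').
    get_feature = kvp(WFS_URL, {
        "service": "WFS", "version": "2.0.0", "request": "GetFeature",
        "typeNames": layer, "bbox": bbox_str,
    })
    # Step 2: a hypothetical WPS mining process reads the WFS output by reference
    # (simplified WPS 1.0.0 KVP encoding).
    execute = kvp(WPS_URL, {
        "service": "WPS", "version": "1.0.0", "request": "Execute",
        "identifier": process,
        "DataInputs": f"features=@xlink:href={get_feature}",
    })
    # Step 3: render the mined layer, assumed to be published as '<layer>_result'.
    get_map = kvp(WMS_URL, {
        "service": "WMS", "version": "1.3.0", "request": "GetMap",
        "layers": f"{layer}_result", "styles": "", "crs": "EPSG:4326",
        "bbox": bbox_str, "width": size[0], "height": size[1],
        "format": "image/png",
    })
    return [("WFS", get_feature), ("WPS", execute), ("WMS", get_map)]
```

Keeping each step as a plain URL lets a broker log, cache or reorder the chain; a real broker would first read each service's GetCapabilities document to discover layers and processes.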

15. Data Mining Engine Framework. Fig. 2: Data mining engine framework

16. Spatial Outlier Detection

17. Spatial Outlier. Fig. 3: Palm Beach County as a spatial outlier (source: http://madison.hss.cmu.edu/buchanan-bush.gif)

18. Spatial Outlier Detection Problem
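One common formulation of the problem, from Shekhar, Lu and Zhang's unified approach to detecting spatial outliers, compares each location's attribute value with the average over its neighbourhood, standardizes that difference, and flags locations where |z| exceeds a threshold θ (θ = 2 corresponds to roughly 95% confidence under normality). The minimal sketch below assumes neighbourhoods are given explicitly as lists; whether the slides use exactly this test is not stated.

```python
from statistics import mean, pstdev


def spatial_outliers(values, neighbors, theta=2.0):
    """Flag locations whose standardized neighbourhood difference exceeds theta.

    values:    {location: attribute value f(x)}
    neighbors: {location: list of neighbouring locations N(x)}
    Returns {location: z-score} for the flagged locations only.
    """
    # S(x) = f(x) - average of f over the neighbourhood N(x)
    diff = {x: values[x] - mean(values[y] for y in neighbors[x]) for x in values}
    mu = mean(diff.values())
    sigma = pstdev(diff.values())  # population standard deviation (an assumption)
    z = {x: (s - mu) / sigma for x, s in diff.items()}
    return {x: zx for x, zx in z.items() if abs(zx) > theta}
```

For example, on a row of ten locations that all have value 10 except location 5 (value 50), with each location's neighbours being the cells to its left and right, only location 5 is flagged (z ≈ 2.58).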

19. Back to Our Motivating Example

20. Results (Classical Data Mining Algorithms)

21. Results for the Above Methods. Fig. 4: Outliers in red. Fig. 5: Outliers in brown.

22. Results (Spatial Data Mining Algorithms)

23. LAG-Based Approach

24. LAG-Based Approach Contd… Fig. 6: LAG-based box map
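A spatial lag is the weighted average of a variable over each unit's neighbours. With row-standardized weights it is simply the neighbourhood mean, so mapping the lag, or comparing each value with its lag, highlights units that differ from their surroundings. A minimal sketch, assuming a binary contiguity matrix (1 = neighbours, 0 = not) as input:

```python
def row_standardize(adjacency):
    """Turn a binary contiguity matrix into row-standardized weights w_ij."""
    weights = []
    for row in adjacency:
        total = sum(row)
        # Units with no neighbours (islands) keep an all-zero row.
        weights.append([a / total if total else 0.0 for a in row])
    return weights


def spatial_lag(weights, x):
    """Spatial lag Wx: for each unit, the weighted average of its neighbours' values."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]
```

For four units in a line with values [1, 2, 3, 10], the lag is [2.0, 2.0, 6.0, 3.0]; the last unit (10) sits well above its lag, which is the kind of contrast a lag-based box map makes visible.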

25. Using the Moran Scatter Plot. Fig. 7: Moran scatter plot; yellow points are spatial outliers
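A Moran scatter plot puts each unit's standardized value z on the x-axis and its spatial lag Wz on the y-axis. Points in the high-low (HL) and low-high (LH) quadrants are the usual spatial-outlier candidates, and with row-standardized weights the slope of the plot equals global Moran's I. The sketch below assumes row-standardized binary weights supplied as a neighbour list; function and parameter names are illustrative.

```python
from statistics import mean, pstdev


def moran_scatter(values, neighbors):
    """Return Moran scatter plot quadrants (HH, LL, HL, LH) and global Moran's I.

    values:    attribute values, one per location (index = location id)
    neighbors: {location index: list of neighbouring indices}
    Assumes row-standardized binary weights, so a location's lag is the plain
    average of its neighbours' standardized values.
    """
    mu, sd = mean(values), pstdev(values)
    z = [(v - mu) / sd for v in values]
    lag = [mean(z[j] for j in neighbors[i]) for i in range(len(z))]
    # First letter: own value vs. mean; second letter: neighbourhood vs. mean.
    # HL and LH points lie off the main diagonal: candidate spatial outliers.
    quadrants = [("H" if zi >= 0 else "L") + ("H" if li >= 0 else "L")
                 for zi, li in zip(z, lag)]
    # With row-standardized weights and centred z, Moran's I is the slope of
    # the regression of the lag on z.
    moran_i = sum(zi * li for zi, li in zip(z, lag)) / sum(zi * zi for zi in z)
    return quadrants, moran_i
```

On a row of ten locations valued 10 except for a spike of 50 at position 5, the spike lands in HL, its two neighbours in LH, and Moran's I is -1/9, reflecting the local dissimilarity.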

26. Verification. Fig. 8: LISA cluster map; outliers in red
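LISA cluster maps are built on Anselin's local Moran statistic, I_i = (z_i / m_2) Σ_j w_ij z_j, where z_i is the deviation from the mean and m_2 = Σ z_i² / n. A minimal sketch, assuming row-standardized binary weights: it computes I_i and the quadrant label for each location but omits the conditional permutation test that tools such as GeoDa use to decide which locations are significant.

```python
def local_moran(values, neighbors):
    """Local Moran's I_i with a quadrant label for each location.

    values:    attribute values, one per location (index = location id)
    neighbors: {location index: list of neighbouring indices}
    Uses row-standardized binary weights; no significance test is performed.
    """
    n = len(values)
    xbar = sum(values) / n
    z = [v - xbar for v in values]          # deviations from the mean
    m2 = sum(zi * zi for zi in z) / n       # second moment of the deviations
    result = []
    for i, zi in enumerate(z):
        nbrs = neighbors[i]
        lag = sum(z[j] for j in nbrs) / len(nbrs)   # sum_j w_ij z_j
        local_i = zi / m2 * lag
        # Negative I_i with an HL or LH label marks a potential spatial outlier.
        label = ("H" if zi > 0 else "L") + ("H" if lag > 0 else "L")
        result.append((local_i, label))
    return result
```

With row-standardized weights, the average of the I_i equals global Moran's I, which is a convenient consistency check between a LISA map and a Moran scatter plot.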

27. Verification Contd… Fig. 9: Relation between HR7984 and PE82

28. Any Reasons? Fig. 12: Scatter plot between RDAC80 and HR7984; outliers in yellow.

29. Spatial Cluster Analysis

31. Many factors have to be considered when choosing a clustering algorithm, such as:

32. Spatial Clustering Problem Definition. Given,

33. Problem Definition Contd…

34. Back to Our Motivating Example

35. Experimental Setup. Table 1: Experimental setup details

36. Analysis: Histogram. Figure 13: Histogram of house price data. The data can roughly be modeled as a mixture of components.
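A "mixture of components" for the house-price histogram can be written as below, assuming Gaussian components (the slide does not name the component family). K is the number of clusters, as in the K = 2 and K = 3 results, and c_ik is the posterior probability that house i belongs to component k; EM alternates computing c_ik with re-estimating π_k, μ_k and σ_k².

```latex
f(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(x \mid \mu_k, \sigma_k^2\right),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1

c_{ik} = \frac{\pi_k \, \mathcal{N}\!\left(x_i \mid \mu_k, \sigma_k^2\right)}
              {\sum_{l=1}^{K} \pi_l \, \mathcal{N}\!\left(x_i \mid \mu_l, \sigma_l^2\right)}
```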

37. Results for K = 2, Using NEM. Figure 14: Clustering results for K = 2; high-priced houses in brown

38. Results for K = 3, Using NEM. Figure 15: K = 3; high-priced buildings shown in red
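NEM (Neighborhood EM; see Ambroise and Govaert's work on EM-type algorithms for spatial clustering) modifies the E-step of EM so that each location's class memberships are pulled toward those of its spatial neighbours: c_ik ∝ π_k f_k(x_i) exp(β Σ_j v_ij c_jk). The sketch below is a simplified one-dimensional version, assuming Gaussian components, binary neighbour weights v_ij, a fixed number of iterations and an order-statistic initialisation; none of these settings come from the slides.

```python
import math
from statistics import pvariance


def nem(x, neighbors, k, beta=1.0, iters=30, inner=5):
    """Simplified Neighborhood EM for 1-D data with Gaussian components.

    x:         attribute values, one per location
    neighbors: {location index: list of neighbouring indices}
    beta:      spatial smoothing strength (beta = 0 gives plain EM)
    Returns (hard cluster labels, component means).
    """
    n = len(x)
    xs = sorted(x)
    # Assumed initialisation: means at evenly spaced order statistics.
    mu = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    var = [pvariance(x)] * k
    pi = [1.0 / k] * k
    c = [[1.0 / k] * k for _ in range(n)]
    for _ in range(iters):
        # E-step: fixed-point iterations adding the neighbourhood term
        # beta * sum_j v_ij c_jk to each log-likelihood.
        for _ in range(inner):
            new_c = []
            for i in range(n):
                logs = [
                    math.log(pi[j])
                    - 0.5 * math.log(2 * math.pi * var[j])
                    - (x[i] - mu[j]) ** 2 / (2 * var[j])
                    + beta * sum(c[nb][j] for nb in neighbors[i])
                    for j in range(k)
                ]
                top = max(logs)  # log-sum-exp trick to avoid underflow
                weights = [math.exp(v - top) for v in logs]
                total = sum(weights)
                new_c.append([v / total for v in weights])
            c = new_c
        # M-step: weighted proportions, means and variances.
        for j in range(k):
            w = [c[i][j] for i in range(n)]
            sw = max(sum(w), 1e-12)
            pi[j] = max(sw / n, 1e-12)
            mu[j] = sum(wi * xi for wi, xi in zip(w, x)) / sw
            var[j] = max(sum(wi * (xi - mu[j]) ** 2 for wi, xi in zip(w, x)) / sw, 1e-6)
    labels = [max(range(k), key=lambda j: c[i][j]) for i in range(n)]
    return labels, mu
```

Setting beta to 0 recovers ordinary EM, so the same function can be used to compare spatially smoothed and unsmoothed clusterings of the price data.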

39. Semantic Enrichment using spatial clustering

40. Problem Definition

41. Proposed Solution

42. Framework. Figure 16: Semantic enrichment of clusters

43. Framework Contd…

44. Reasoning over the Ontology for Implicit Knowledge
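The slides do not spell out the reasoning step. As one illustration, assume cluster memberships are asserted as rdf:type facts in the ABox and cluster classes are placed under domain classes with rdfs:subClassOf in the TBox. Implicit types can then be derived by forward-chaining two RDFS entailment rules: rdfs11 (subClassOf is transitive) and rdfs9 (instances of a subclass are instances of its superclass). The class and individual names are hypothetical; a production system would use an OWL reasoner or a triple store instead.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def materialize(triples):
    """Forward-chain rdfs9 and rdfs11 until no new (s, p, o) triples appear."""
    kb = set(triples)
    while True:
        new = set()
        for s, p, o in kb:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in kb:
                # rdfs11: (s subClassOf o) and (o subClassOf o2) => (s subClassOf o2)
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                # rdfs9: (s subClassOf o) and (s2 type s) => (s2 type o)
                if p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))
        if new <= kb:
            return kb
        kb |= new
```

For example, asserting that a house belongs to a cluster class, that the cluster class is a subclass of HighPricedHouse, and that HighPricedHouse is a subclass of House lets the reasoner derive that the house is a House, which a SPARQL query over the materialized graph can then retrieve.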

45. Results: Ontology. Figure 17: Data ontology for Baltimore house price data

46. Results Contd… Reasoning: ABox reasoning was performed on this ontology using SPARQL. Sample query: Figure 18: SPARQL query page

47. Results Contd… Figure 19: Result for the given query

48. Future Work

54. Box Map. Since box maps are based on the same methodology as box plots, they can be used to detect outliers in a stricter sense than is possible with percentile maps. Box maps group values such as counts or rates into six fixed categories: the four quartiles (1-25%, 25-50%, 50-75%, and 75-100%) plus two outlier categories at the low and high ends of the distribution. A value is classified as an outlier if it lies more than 1.5 times the interquartile range (IQR) below the 25th percentile or above the 75th percentile. The IQR is the difference between the 75th percentile (Q3) and the 25th percentile (Q1), i.e. Q3 - Q1. It spans the middle 50% of the distribution: 25% of values lie above Q3 and 25% lie below Q1.
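The six-category rule can be sketched as follows, assuming a hinge of 1.5 by default and Python's `statistics.quantiles` with the "inclusive" method for the quartiles. GeoDa and other GIS tools may compute quartiles slightly differently, and the handling of values exactly on a boundary is an assumption here.

```python
from statistics import quantiles


def box_map(values, hinge=1.5):
    """Assign each value to one of the six box-map categories (input order kept)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - hinge * iqr, q3 + hinge * iqr
    categories = []
    for v in values:
        if v < low_fence:
            categories.append("lower outlier")
        elif v <= q1:
            categories.append("< 25%")
        elif v <= q2:
            categories.append("25-50%")
        elif v <= q3:
            categories.append("50-75%")
        elif v <= high_fence:
            categories.append("> 75%")
        else:
            categories.append("upper outlier")
    return categories
```

For the values 1 to 9 plus 100, the quartiles are 3.25, 5.5 and 7.75, the upper fence is 14.5, and only 100 is classified as an upper outlier.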

55. Box Plot. Box plots are particularly useful for identifying outliers and getting an overview of the spread of a distribution. The box plot (sometimes called a box-and-whisker plot) is a non-parametric method. For normally distributed data, the median coincides with the mean, and the interquartile range is about 1.35 standard deviations. The box plot shows the median and the first and third quartiles of a distribution (the 50%, 25% and 75% points of the cumulative distribution), as well as outliers. An observation is classified as an outlier when it lies more than a given multiple of the interquartile range (the difference between the 75th and 25th percentile values) above the 75th percentile or below the 25th percentile. The standard multiples are 1.5 and 3 times the interquartile range. In the plot, the red bar in the middle marks the median and the dark part shows the interquartile range. Individual observations in the first and fourth quartiles are shown as blue dots. The thin line is the hinge, corresponding to the default criterion of 1.5.
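The outlier rule can be written compactly, with k the hinge multiple. The second line works through the normal case to show where the default hinge of 1.5 falls: about 2.7 standard deviations from the mean, so only about 0.7% of normally distributed observations are flagged.

```latex
\mathrm{IQR} = Q_3 - Q_1, \qquad
x \text{ is an outlier} \iff x < Q_1 - k\,\mathrm{IQR} \ \text{or}\ x > Q_3 + k\,\mathrm{IQR},
\qquad k \in \{1.5,\ 3\}

Q_1 \approx \mu - 0.6745\,\sigma,\quad Q_3 \approx \mu + 0.6745\,\sigma
\;\Rightarrow\; \mathrm{IQR} \approx 1.349\,\sigma,
\quad Q_3 + 1.5\,\mathrm{IQR} \approx \mu + 2.698\,\sigma
```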
