Geospatial Open Data and Urban Growth Modelling for
Evidence-based Decision Making in perspective of Smart
Cities
PIYUSH YADAV
17/01/2020 © Lero 2015
About me
❑ Researcher at the Insight Centre for Data Analytics and the Lero Software Research Centre at NUI Galway (NUIG)
❑ Researcher-CTO at Tata Research Development and Design Centre (TRDDC), which is part of TCS Innovation Lab; member of a project in collaboration with IIT Bombay
❑ M.Tech. (CSE) with specialization in information security at IIIT Delhi in 2013; Research Assistant at McGill University, Canada
❑ Research interests: Complex Event Processing, Video Analytics, Distributed Systems, Machine Learning, Smart Cities, GIS and Remote Sensing
❑ Publications: 17 conference papers, 1 journal article, 1 book chapter, 6 posters, 2 patents filed, 1 industry report (Dell)
Twitter
LinkedIn
Website
Contact
Outline
• Learning Outcomes
• Geospatial Data
• Classification for Satellite Images
• Case Study: Urban Growth Modelling
• Multi-source Open Data Management
• Quality Issues in Multi-source Open Data
• Techniques for data preparation and cleaning
• Assignments
Learning Outcomes
You will learn:
• The importance of geospatial data and Land Use Land Cover in the development of Smart Cities
• The fundamentals of satellite image classification
• How to model urban growth and predict the future growth of a city
• The importance of Open Data in Smart Cities
• The nature and types of data issues in (Open) Data
• Techniques for identifying data quality issues
• Data preparation and cleaning strategies (e.g., data clustering, filtering, etc.)
Copernicus Hackathon Ireland 2019
• Last year, 3 teams from this class participated.
• 2 teams won a prize.
Air Quality
Aftab Alam, Nikhil Nambiar, Vignesh Kamath
https://prezi.com/view/iZEygJaFnxqAJH7lR9TM/
Smart Agriculture
Geospatial Data
Geospatial data (sometimes called spatial data) is information that has a geographic aspect to it, such as:
➢ Coordinates: latitude and longitude
➢ Postal address
➢ Physical features
Types
Vector - This form uses points, lines, and polygons to represent spatial features such as cities, roads, and streams.
Raster - This form uses cells (computers often use dots or pixels) to represent spatial features (our focus in this lecture).
https://www.bolton-menk.com/books/lindsey/Lindsey.html
Satellite Imagery: Basics
Figures: How we see colour; Electromagnetic Spectrum
• The electromagnetic (EM) spectrum describes the continuous range of energy from high-energy gamma rays and X-rays to very low-energy microwaves and radio waves.
• Visible light, the light that our eyes can detect, is just a small portion of the EM spectrum.
Satellite Imaging
• Satellites collect data by passing the energy reflected from the Earth through filters that separate it into small windows of the EM spectrum, producing discrete spectral bands (a raster image).
https://landsat.usgs.gov/atmospheric-transmittance-information
Image Bands/Channels
An image consists of multiple bands from the electromagnetic spectrum.
Normal image (3 bands: Red, Green, Blue)
Multispectral (3-10 bands)
Hyperspectral (100-1000 bands (nm))
http://www.splibtarang.com/index.php
Stack of Bands ~ Tensor
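The "stack of bands" idea can be pictured as a rows x columns x bands array. The sketch below is illustrative only: it uses plain Python lists and made-up reflectance values for three hypothetical bands (`red`, `green`, `nir`), and the helper name `stack_bands` is an assumption. Real imagery would normally be read with a raster library.

```python
# Three tiny 2x2 single-band rasters with made-up reflectance values.
red = [[0.10, 0.12], [0.30, 0.31]]
green = [[0.20, 0.22], [0.28, 0.29]]
nir = [[0.60, 0.58], [0.25, 0.24]]


def stack_bands(bands):
    """Stack equally sized 2-D bands into a rows x cols x bands nested list."""
    rows, cols = len(bands[0]), len(bands[0][0])
    for band in bands:
        if len(band) != rows or any(len(row) != cols for row in band):
            raise ValueError("all bands must have the same shape")
    return [
        [[band[r][c] for band in bands] for c in range(cols)]
        for r in range(rows)
    ]


cube = stack_bands([red, green, nir])
shape = (len(cube), len(cube[0]), len(cube[0][0]))

# Each pixel is now a spectral vector with one value per band.
pixel_vector = cube[0][1]
print(shape, pixel_vector)
```

Indexing the stack by row and column returns the per-band values of one pixel, which is the feature vector a classifier works on.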
LANDSAT Satellite Images
• The Landsat program is NASA's longest-running enterprise for the acquisition of satellite imagery of Earth
• 8 satellites so far
• Landsat 1 launched in 1972, Landsat 7 in 1999, Landsat 8 in 2013
• Data can be downloaded from: https://earthexplorer.usgs.gov/
Figures: Landsat 7 Bands; Landsat 8 Bands; Scan Line Correction Issue in Landsat 7 (2003); Other Earth Observation Satellites
Pre-processing of Landsat Image
Atmospheric Correction
• Electromagnetic radiation captured by the satellite sensors is affected by atmospheric interference such as scattering, dispersion, etc.
• Subtract the digital number (DN) of water pixels in band 4 (infrared band), as it has very low water-leaving radiance (Cracknell 2007).
• DN values were then converted to spectral radiance (Kaufman 1989):
$L = L_{min} + \left( \frac{L_{max}}{254} - \frac{L_{min}}{255} \right) \times DN$
Solar Correction
• For clear Landsat images, solar correction of the images was done by converting spectral radiance to exoatmospheric reflectance (Kaufman 1989):
$\rho_p = \frac{\pi \cdot L_\lambda \cdot d^2}{ESUN_\lambda \cdot \cos\theta_s}$
References
Cracknell, A. (2007). Atmospheric Corrections to Passive Satellite Remote Sensing Data. In A. Cracknell, Introduction To Remote Sensing, Second Edition (p. 196). CRC Press. Retrieved September 1, 2015.
Kaufman, Y. J. (1989). The atmospheric effect on remote sensing and its correction. In Theory and applications of optical remote sensing (pp. 336-428).
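The two conversions on the pre-processing slide (DN to spectral radiance, then radiance to exoatmospheric reflectance) can be sketched as below. This is a minimal sketch, not the processing chain used in the study. The function names are hypothetical, the calibration numbers are invented, and it assumes the usual reading of the symbols, which the slide does not define: d is the Earth-Sun distance in astronomical units, ESUN is the mean solar exoatmospheric irradiance for the band, and theta_s is the solar zenith angle.

```python
import math


def dn_to_radiance(dn, l_min, l_max):
    """Spectral radiance: L = L_min + (L_max / 254 - L_min / 255) * DN."""
    return l_min + (l_max / 254.0 - l_min / 255.0) * dn


def radiance_to_reflectance(radiance, d_au, esun, sun_zenith_deg):
    """Exoatmospheric reflectance: rho = pi * L * d^2 / (ESUN * cos(theta_s))."""
    cos_theta = math.cos(math.radians(sun_zenith_deg))
    return (math.pi * radiance * d_au ** 2) / (esun * cos_theta)


# Invented calibration values, for illustration only.
radiance = dn_to_radiance(dn=120, l_min=-5.0, l_max=241.1)
reflectance = radiance_to_reflectance(
    radiance, d_au=1.0, esun=1040.0, sun_zenith_deg=35.0
)
print(radiance, reflectance)
```

In practice the calibration constants and sun angle come from the metadata file delivered with each scene.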
Pre-processing of Landsat Image
Band 1, Band 2, Band 3, ... (R, G, B, ...) converted to reflectance
View using KML on Google Earth. Download the file from the link below:
https://drive.google.com/drive/folders/1KGQmkZ7bN2M-ED31sDNWVtX29VntfzWs
Classify Landsat Image (Supervised Learning)
Create Training Data
Class ID | Class Name | Location (x, y)
1 | Vegetation |
2 | Impervious Surface (Built Up) |
3 | Soil |
4 | Water |
Train Model
• Maximum Likelihood
• SVM
• DNN
Spectral Signature for Different Classes
Classify
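The create-training-data, train, classify workflow can be illustrated with a deliberately simple stand-in. The sketch below is a nearest-centroid (minimum distance to mean) classifier, not one of the three models listed on the slide. It treats the mean spectral vector of each class's training pixels as that class's spectral signature. All pixel values are invented.

```python
import math

# Hypothetical training pixels: class name -> spectral vectors (3 bands each).
training = {
    "Vegetation": [[0.05, 0.08, 0.45], [0.06, 0.09, 0.50]],
    "Impervious Surface": [[0.25, 0.27, 0.30], [0.28, 0.30, 0.33]],
    "Soil": [[0.18, 0.22, 0.35], [0.20, 0.24, 0.38]],
    "Water": [[0.04, 0.03, 0.01], [0.05, 0.04, 0.02]],
}


def mean_signature(samples):
    """Per-band mean of the training pixels of one class."""
    count = len(samples)
    return [sum(s[i] for s in samples) / count for i in range(len(samples[0]))]


# "Training": one mean spectral signature per class.
signatures = {name: mean_signature(px) for name, px in training.items()}


def classify(pixel):
    """Assign the class whose mean signature is closest (Euclidean distance)."""
    return min(signatures, key=lambda name: math.dist(pixel, signatures[name]))


print(classify([0.05, 0.04, 0.015]))
print(classify([0.055, 0.085, 0.48]))
```

Maximum likelihood, SVM, or DNN classifiers replace the distance rule with a learned decision function, but they consume the same labelled spectral vectors.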
Classified Image
Case Study: Urban Growth Modelling
World population is growing
Increased economic activities
Increased urban growth rate
Urban growth
Change in Land Use Land Cover
Figure: An aerial view of urban growth in 2006 and 2014
Land Use Land Cover Change (LULCC)
A key aspect of urban growth is its effect on land use land cover change.
Land cover indicates the physical land type, such as forest or open water.
Land use documents how people are using the land, such as for agriculture.
Factors Affecting Land Use Land Cover
Spatial Factors
• Predominantly change over space but remain relatively static with respect to time.
• e.g., Digital Elevation Model (DEM)
Spatio-temporal Factors
• Change over both time and space.
• e.g., proximity to the primary roads
Temporal Factors
• Change over time but are spatially static for a given study area.
• e.g., national Gross Domestic Product (GDP)
Direct factors and indirect factors both drive Land Use Land Cover Change.
Urban Growth Models
Urban growth models are used to predict land use land cover (LULC) changes. LULC modelling is extremely difficult due to complex interactions between multi-scale factors.
Thus lattice-based spatio-temporal models, e.g. Cellular Automata (CA) and Logistic Regression (LR), are used to model spatial geographic processes.
In a Markov Chain (MC) model, LULC images from two distinct time instances are taken, and the frequency of change from one LULC class to another is used to compute a transition probability matrix.
Figure: Schematic of an integrated Markov Chain model
Limitation: Persistent Growth Rate
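The transition probability matrix described on this slide can be computed by counting, pixel by pixel, how often each class at the first date becomes each class at the second date. The following is a minimal sketch on two made-up 2x3 class maps using the class labels V, I, and S. The function name and data are assumptions for illustration.

```python
from collections import Counter

CLASSES = ["V", "I", "S"]  # Vegetation, Impervious Surface, Soil


def transition_matrix(before, after, classes=CLASSES):
    """Row-normalised frequency of change from each class to each class."""
    counts = Counter()
    for row_before, row_after in zip(before, after):
        for src, dst in zip(row_before, row_after):
            counts[(src, dst)] += 1
    matrix = {}
    for src in classes:
        total = sum(counts[(src, dst)] for dst in classes)
        matrix[src] = {
            dst: (counts[(src, dst)] / total if total else 0.0)
            for dst in classes
        }
    return matrix


# Made-up classified images of the same area at two dates.
lulc_t1 = [["V", "V", "S"], ["V", "S", "I"]]
lulc_t2 = [["V", "I", "S"], ["S", "I", "I"]]

P = transition_matrix(lulc_t1, lulc_t2)
for src in CLASSES:
    print(src, P[src])
```

Because the matrix is estimated once from a single pair of images, applying it repeatedly assumes the same rate of change in every later period, which is the persistent growth rate limitation noted on the slide.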
Our Contribution
Hidden Markov Model: Introduction of a Hidden Markov Model (HMM).
Temporal Factors: Incorporate temporal factors in LULC change modelling using the HMM. Model the underlying temporal factors as Gaussian distributions, conditioned on the hidden states, to learn land cover type transition probabilities.
Integrate: Integrate our model with other spatio-temporal models such as Logistic Regression (LR) to yield richer integrated models than the corresponding MC-based integrated models.
Figure: An urban growth model with multi-scale direct and indirect factors impacting LULC changes
Our Model
Figure: A Hidden Markov Model with hidden states (V, I, S) and sample emissions (GDP and Liquidity)
Figure: Proposed urban growth model: HMM integrated with a Logistic Regression model
Study Area: Pune
• Tier-A city situated in the state of Maharashtra, India.
• Located 560 m above sea level.
• Famous for its Information Technology and automobile industries and various research institutes.
• We considered 45 sq. km of the city area that has undergone rapid urbanisation.
Temporal Growth Factors
Gross Domestic Product (National): Amount of goods and services produced within the borders of a country in a specific time interval.
Interest Rate Cycle (National): Revised bimonthly. A tight monetary policy affects the overall investment policy, which leads to a slowdown, and vice versa.
Consumer Price Index (National): Low inflation creates an environment that favours developmental investment.
Gross Fixed Capital Formation (National): Amount that the government spends on capital formation (such as infrastructure building and land improvements) in the country. The greater the GFCF investment, the higher the rate of urbanization.
Urban Population Growth Rate (National): To accommodate a higher influx of people, cities expand along their outskirts, leading to growth of the urban agglomeration.
Electricity Consumption (Regional): Typically, regions with higher electricity demand grow faster than those with lower demand.
Road Length Added (Regional): Better connectivity of a region improves transportation and thus gives impetus to growth by allowing new industrial complexes and other infrastructure services to be set up.
Temporal Growth Factors Data
Charts: GDP growth rate (%); absolute average CPI inflation (%); gross fixed capital formation (% of GDP); urban population growth rate (%); bimonthly interest (repo) rate (%); per capita electricity consumption in kilowatt-hours
Land Use Land Cover (LULC) Data
LULC data is required as an input for the HMM hidden states and the LR models.
Landsat-7 Specifications
Satellite | Landsat 7
Time period | Yearly, 2001 to 2014 (between March and April)
Latitude | 18.38847838°N - 18.79279909°N
Longitude | 73.64552005°E - 74.07494971°E
Bands | 1 to 7
Resolution | 30 m
Pixels | 1500 × 1500
LULC Data Pre-processing
Scan Line Correction (SLC)
• In 2003 the SLC in Landsat-7's ETM+ instrument developed a fault, creating black lines in the captured images.
• Image smoothing using windowing.
Atmospheric Correction: explained earlier
Solar Correction: explained earlier
LULC Data Classification
Classes
• Classified into seven broad LULC classes on the basis of the nature of the landscape.
• Forest Canopy, Agriculture Area, Residential Area, Industrial Area, Common Open Area, Burnt Grass, Bright Soil, and Water Body.
SVM Classification
• For classification, a labelled set of pixels was collected for each class of interest (500 to 3000 samples per class). The feature vector for each pixel consisted of all seven band values.
• Support Vector Machines
• Manual correction (Concrete and Quarry)
VIS Classes
• Vegetation, Impervious Surface, and Soil
A Quick Recap
LULC Data
Spatio-Temporal Factors
Digital Elevation Model (DEM) and Slope
Proximity to primary roads
CARTOSAT 1
Mask: Water bodies were masked out from the LULC image
Figures: 3D View; DEM Image; Primary Road Layers
Results
HMM Experiments
• Used the Gaussian HMM library in Scikit Learn
• We designed an HMM with three hidden states (V, I, and S) and the temporal factors
• The HMM was initialized with the MC transition probabilities for 2001 to 2002
• A stable model was obtained empirically after 50000 iterations with a threshold of less than 0.01
Figures: Computed MC transition probabilities for 2001-2002; learned HMM transition probabilities for 2014; computed MC transition probabilities for 2014
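To make the idea of Gaussian emissions conditioned on hidden states concrete, the sketch below scores a short sequence of one temporal factor under a three-state HMM using the forward algorithm. It is not the authors' code and does not use the library named on the slide. It shows only the likelihood computation that any Gaussian-emission HMM trainer relies on, for a single one-dimensional factor. Every number (start probabilities, transition matrix, emission means and standard deviations) is hypothetical.

```python
import math

STATES = ["V", "I", "S"]  # hidden land cover states


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian emission."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def forward_likelihood(observations, start, trans, emission):
    """Likelihood of an observation sequence (forward algorithm)."""
    alpha = {
        s: start[s] * gaussian_pdf(observations[0], *emission[s])
        for s in STATES
    }
    for x in observations[1:]:
        alpha = {
            s: gaussian_pdf(x, *emission[s])
            * sum(alpha[r] * trans[r][s] for r in STATES)
            for s in STATES
        }
    return sum(alpha.values())


# Hypothetical parameters; a real model would learn these from data.
start = {"V": 0.5, "I": 0.2, "S": 0.3}
trans = {
    "V": {"V": 0.8, "I": 0.1, "S": 0.1},
    "I": {"V": 0.0, "I": 0.9, "S": 0.1},
    "S": {"V": 0.2, "I": 0.3, "S": 0.5},
}
# (mean, std) of a yearly GDP growth rate in %, one pair per hidden state.
emission = {"V": (4.0, 1.0), "I": (8.0, 1.0), "S": (6.0, 1.0)}

gdp_growth = [8.0, 8.1, 7.9]
print(forward_likelihood(gdp_growth, start, trans, emission))
```

Training adjusts the transition matrix and emission parameters to maximise this likelihood, which is how temporal factors can reshape transition probabilities that a plain Markov chain would keep fixed.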
Results
Land Change Modelling Experiments
• TerrSet's Land Change Modeler.
• Transition sub-models were defined for four LC change types, i.e., V to S, V to I, S to V, and S to I.
• Slope gradient and the primary roads layer were used as the primary driver variables.
$suitability = \frac{1}{(slope\ gradient)^{0.1}}$
• Suitability map: the greater the value, the higher the suitability, and vice versa.
• Suitability for urbanization is high in areas such as roads, the low-lying river basin, and around the urbanized areas where the slope gradient is small.
• Towards the south end the suitability drops significantly, as the area has hills and valleys.
• The four sub-models were built using Logistic Regression.
Results
Figure: Heat maps depicting transition probabilities from one state to another (panels: Soil to Impervious, Soil to Vegetation, Vegetation to Impervious, Vegetation to Soil)
Results
• The two models were then used to predict changes for the year 2014.
Figures: Actual land cover image of 2014 obtained from classification; predicted land cover image of 2014 (HMM-LR); predicted land cover image of 2014 (MC-LR)
• Visually, the HMM-based predicted image is clearly closer to the actual classified LC image than the MC-based predicted image.
Results
Precision and Recall for integrated models
Metric | HMM-LR V | HMM-LR I | HMM-LR S | MC-LR V | MC-LR I | MC-LR S
Precision | 0.48 | 0.49 | 0.60 | 0.54 | 0.38 | 0.34
Recall | 0.48 | 0.52 | 0.59 | 0.54 | 0.32 | 0.39
• Blob analysis of urban and non-urban regions. Blobs denote concentrated urban regions.
• Green blobs are true positives, blue blobs are false negatives, and red blobs are false positives.
• HMM-LR false positives are smaller and less dense than those of MC-LR. The HMM output is well balanced and resembles the actual output better.
• An 11 percentage-point increase in the precision of the persistence of Impervious Surface (I) is observed (0.38 to 0.49).
• Precision for the Soil (S) class type has jumped by 26 percentage points (0.34 to 0.60).
• Precision for the Vegetation (V) class type drops by a marginal 6 percentage points (0.54 to 0.48). This is because vegetation cover is the outcome of a relatively easy process compared with S and I.
Figure: Blob analysis of urban areas. Left to right: (i) Actual, (ii) MC-LR, (iii) HMM-LR
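Per-class precision and recall of a predicted land cover map are usually derived from a confusion matrix of actual versus predicted pixel counts. The sketch below shows that calculation on invented counts. These are not the study's numbers, and the function name is an assumption.

```python
def precision_recall(confusion, classes):
    """Return {class: (precision, recall)} from confusion[actual][predicted]."""
    scores = {}
    for c in classes:
        true_pos = confusion[c][c]
        predicted_as_c = sum(confusion[actual][c] for actual in classes)
        actually_c = sum(confusion[c][pred] for pred in classes)
        precision = true_pos / predicted_as_c if predicted_as_c else 0.0
        recall = true_pos / actually_c if actually_c else 0.0
        scores[c] = (precision, recall)
    return scores


classes = ["V", "I", "S"]
# Invented pixel counts: rows are actual classes, columns are predictions.
confusion = {
    "V": {"V": 45, "I": 5, "S": 50},
    "I": {"V": 15, "I": 60, "S": 25},
    "S": {"V": 30, "I": 15, "S": 55},
}

scores = precision_recall(confusion, classes)
for c in classes:
    print(c, scores[c])
```

Precision penalises false positives (pixels wrongly predicted as the class) and recall penalises false negatives, so the two can move in different directions for the same class.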
Conclusion
• Markov Chain (MC) models are limited in their urban prediction capabilities because they assume a constant rate of persistence of land cover class types and cannot model temporal factors.
• We have proposed a new temporal model using a Hidden Markov Model.
• We have demonstrated the usefulness of our model over MC by predicting urban growth for an upcoming city of India (Pune).
• Precision increases by 11 and 26 percentage points for the Impervious Surface and Soil classes respectively.
• We believe that this inquiry into HMM-based models provides yet another tool to equip urban modellers, planners and decision makers to better design sustainable urban environments.
https://www.researchgate.net/publication/327745849_Computational_Model_for_Urban_Growth_Using_Socioeconomic_Latent_Parameters
Open Data
10030 112
https://data.gov.ie/stats
How is Open Data being used?
Engagement/Innovation
https://www.mapalerter.com/
Data Modelling / Decision-Making
http://exceedence.com/monetising-metocean-data-an-open-data-project/
Monitoring / Planning
Quality and Qualifications Ireland
http://infographics.qqi.ie/
Sustainability / Mobility
https://citybik.es/
Open Data Management Challenge
From Data to Smart Data
Diagram: Data Sources; Open Data Management (Data Modeling, Collection, Aggregation, Enrichment, Linking, Classification, Cleaning, Integration, Storing, Querying); Smart Apps (Predictive Analytics, User Awareness, Recommendations)
Is this data good enough for creating accurate and reliable apps?
Open Data Management Challenge
Open Data quality can be very challenging when designing apps and decision support models.
Open Data can have multiple issues: missing values, different formats, irregular timestamps, abnormal values, etc.
Data preparation such as filtering and classification is an important step before further analysis.
Data is often incomplete and requires combining multiple data sources.
Case Specifics
Data Preparation for Building a Map of Playing Pitches around Dublin
The data is available on https://data.gov.ie
Different Formats
And even more challenges!
Different Formats
Different Attributes
Missing Values
Objective: Create a good quality dataset from these resources!
What is good quality data?
A Conventional Definition of Data Quality
Good quality data are: Accurate, Complete, Unique, Up-to-date, and Consistent; meaning …
Accurate means …
Are we storing correct values?
➔ Values in the data entries should be consistent: same form or value representation
Example: What issues can you identify from this table?
Sensor | Timestamp | Value | Location
M1n | 12/01/2018T10:03:59 | 12.3 | Galway
M3n | 1452592980000 | 9.5 | GA
M5n | 01/12/2018 10:03 | 1.55 | NUIG
Accurate means …
Possible solution
Create a Unified Data Model
Do you have access to the data source?
Yes: Adjust sources to send data using your model
No: Convert your data before further processing
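When the sources cannot be changed, converting before further processing can mean mapping every incoming timestamp onto one representation. The sketch below normalises the three timestamp styles from the example table to ISO 8601. The format guesses are assumptions: digits-only values are read as epoch milliseconds in UTC, and slash-separated dates are read day-first. The last row of the table is genuinely ambiguous and could equally be month-first.

```python
from datetime import datetime, timezone

# Assumed day-first formats for the two textual styles in the table.
TEXT_FORMATS = ("%d/%m/%YT%H:%M:%S", "%d/%m/%Y %H:%M")


def normalize_timestamp(raw):
    """Return an ISO 8601 string for the timestamp styles seen in the table."""
    raw = raw.strip()
    if raw.isdigit():
        # Assumption: epoch time in milliseconds, interpreted as UTC.
        moment = datetime.fromtimestamp(int(raw) / 1000, tz=timezone.utc)
        return moment.strftime("%Y-%m-%dT%H:%M:%S")
    for fmt in TEXT_FORMATS:
        try:
            return datetime.strptime(raw, fmt).strftime("%Y-%m-%dT%H:%M:%S")
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {raw!r}")


for value in ("12/01/2018T10:03:59", "1452592980000", "01/12/2018 10:03"):
    print(value, "->", normalize_timestamp(value))
```

The same approach applies to the Location column (Galway, GA, NUIG), where a lookup table of agreed names would play the role of the format list.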
Complete means …
Does the data contain everything it is supposed to contain?
Sensor | Timestamp | Value | Location
M1n | 08/01/2018T00:00:00 | 32.5 | NEB, NUIG
M1n | 09/01/2018T00:00:00 | 21.2 |
M1n | 10/01/2018T00:00:00 | 26.1 | NEB, NUIG
M1n | 12/01/2018T00:00:00 | 23.5 | NEB, NUIG
M1n | 13/01/2018T00:00:00 | | NEB, NUIG
M1n | 14/01/2018T00:00:00 | 26.1 | NEB, NUIG
Example: What issues can you identify from this table?
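Completeness problems of the kind shown in this table can be found mechanically. The sketch below checks for two things: empty fields and gaps in the expected sequence of readings. It assumes one reading per day and day-first dates, which the table suggests but does not state. The function names are hypothetical.

```python
from datetime import datetime, timedelta

FIELDS = ("Sensor", "Timestamp", "Value", "Location")
FMT = "%d/%m/%YT%H:%M:%S"  # assumed day-first

rows = [
    ("M1n", "08/01/2018T00:00:00", "32.5", "NEB, NUIG"),
    ("M1n", "09/01/2018T00:00:00", "21.2", ""),
    ("M1n", "10/01/2018T00:00:00", "26.1", "NEB, NUIG"),
    ("M1n", "12/01/2018T00:00:00", "23.5", "NEB, NUIG"),
    ("M1n", "13/01/2018T00:00:00", "", "NEB, NUIG"),
    ("M1n", "14/01/2018T00:00:00", "26.1", "NEB, NUIG"),
]


def find_empty_fields(rows):
    """List (date, field name) for every empty cell."""
    return [
        (row[1][:10], FIELDS[i])
        for row in rows
        for i, value in enumerate(row)
        if value.strip() == ""
    ]


def find_gaps(rows, step=timedelta(days=1)):
    """List dates missing from an otherwise regular sequence of readings."""
    stamps = sorted(datetime.strptime(row[1], FMT) for row in rows)
    missing = []
    for previous, following in zip(stamps, stamps[1:]):
        expected = previous + step
        while expected < following:
            missing.append(expected.strftime("%d/%m/%Y"))
            expected += step
    return missing


print(find_empty_fields(rows))
print(find_gaps(rows))
```

Whether a gap should be filled (for example by interpolation) or only flagged depends on the intended use of the data.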
Unique means …
Do the data entries appear only once?
➔ This issue generally appears when manual entries are allowed in the dataset
Surname | Firstname | DoB | Driving test passed
Smith | J. | 17/12/85 | 17/12/05
Smith | Jack | 17/12/85 | 17/12/2005
Smith | Jock | 17/12/95 | 17/12/2005
Example: What issues can you identify from this table?
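One way to surface likely duplicates in manually entered records is to compare them on a normalised key. The sketch below builds a hypothetical key from the lower-cased surname, the first initial, and both dates parsed into a single representation. The choice of key is an assumption made for illustration. It groups the first two rows but not the third, whose date of birth differs and which would need human review.

```python
from collections import defaultdict
from datetime import datetime

records = [
    ("Smith", "J.", "17/12/85", "17/12/05"),
    ("Smith", "Jack", "17/12/85", "17/12/2005"),
    ("Smith", "Jock", "17/12/95", "17/12/2005"),
]


def parse_date(text):
    """Parse day-first dates written with a 4-digit or 2-digit year."""
    for fmt in ("%d/%m/%Y", "%d/%m/%y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def duplicate_groups(records):
    """Return lists of row indexes that share the same normalised key."""
    groups = defaultdict(list)
    for index, (surname, firstname, dob, passed) in enumerate(records):
        key = (
            surname.strip().lower(),
            firstname.strip()[0].lower(),
            parse_date(dob),
            parse_date(passed),
        )
        groups[key].append(index)
    return [rows for rows in groups.values() if len(rows) > 1]


print(duplicate_groups(records))
```

A stricter or looser key changes which rows are merged, so candidate groups are best reviewed before any record is deleted.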
Consistent means …
Does the data contain any logical errors or impossibilities?
Example: What issues can you identify from this table?
Sensor | Timestamp | Value | Location
M1n | 08/01/2018T00:00:00 | 32.5 | NEB, NUIG
M1n | 09/01/2018T00:00:00 | 21.2 | NEB, NUIG
M1n | 10/01/2018T00:00:00 | 0 | NEB, NUIG
M1n | 11/01/2018T00:00:00 | 23.5 | NEB, NUIG
M1n | 12/01/2018T00:00:00 | -1.23 | NEB, NUIG
M1n | 13/01/2018T00:00:00 | 26.1 | NEB, NUIG
Are these errors? How can we identify them?
➔ Possible solutions: filtering and outlier detection.
Up-to-Date means …
Is the data updated regularly?
A sensor moved to a new location. What implications can this have?
Can you think of a case where it doesn't matter whether or not the data are kept up to date?
Techniques for Data Preparation
Minimal Data Preparation Pipeline
Observation: Understanding the format of the data and its elements
Modeling: Identify relevant attributes and representation format
Quality Enhancement: Classification, Aggregation, Filtering, Enrichment, etc.
Step 1: Observation
This step involves the descriptive analysis (auditing) of individual data resources.
Data observations can be:
– Highly structured: using a predefined checklist of observational attributes (e.g., format, attributes, frequency, volume, language, etc.)
– Semi-structured: using an ad hoc checklist of observational attributes
• Pros:
– Defines contextual information about the data
– Provides good and early insights into data quality issues
• Cons:
– Can be time consuming
Step 2: Modeling
This step involves the use of formal techniques for creating a data model.
Examples of techniques: Object-Relational mapping, Relational model, etc.
Methodologies:
– Top-down: starts from predefined information about the data
– Bottom-up: results from a reengineering effort
Step 3.1: Classification
Data classification is the process of organizing data by categories for refined and targeted analysis.
➔ Example: Water or energy consumption for working days vs. non-working days
➔ Categories depend on the intended use of the data
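The working-day example from the classification step can be sketched as follows. The readings are invented, and the rule is a simplification: Saturdays and Sundays count as non-working days and public holidays are ignored.

```python
from collections import defaultdict
from datetime import date


def day_category(day):
    """Classify a date by weekday only (Monday is 0, Sunday is 6)."""
    return "non-working day" if day.weekday() >= 5 else "working day"


# Invented daily consumption readings.
readings = [
    (date(2018, 1, 12), 23.5),  # Friday
    (date(2018, 1, 13), 11.0),  # Saturday
    (date(2018, 1, 14), 9.5),   # Sunday
    (date(2018, 1, 15), 26.1),  # Monday
]

by_category = defaultdict(list)
for day, value in readings:
    by_category[day_category(day)].append(value)

print(dict(by_category))
```

Once each reading carries a category, the groups can be analysed or aggregated separately.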
Step 3.2: Aggregation
Data aggregation is a data mining process that summarizes the data with respect to certain criteria/dimensions.
Data aggregation helps increase search performance and facilitates data reporting and analysis.
Types of aggregations: Sum, Count, Min/Max, Avg, etc.
Aggregation strategies and levels: temporal (hourly, daily, etc.), source-based (resource hierarchy), location-based (outlet, room, area, building), etc.
The level of aggregation depends on the available data and its intended use.
Examples of useful aggregations: hourly traffic congestion level per road; quarterly price inflation.
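A temporal aggregation such as the hourly congestion level per road can be sketched as below: readings are bucketed by road and by the hour they fall in, then averaged. The road names and congestion values are invented, and the function name is an assumption.

```python
from collections import defaultdict
from datetime import datetime

# Invented readings: (road, ISO timestamp, congestion level between 0 and 1).
readings = [
    ("N6", "2018-01-12T10:03:59", 0.4),
    ("N6", "2018-01-12T10:45:00", 0.6),
    ("N6", "2018-01-12T11:10:00", 0.9),
    ("R338", "2018-01-12T10:20:00", 0.2),
]


def hourly_average(readings):
    """Average the level per (road, hour) bucket."""
    buckets = defaultdict(list)
    for road, stamp, level in readings:
        moment = datetime.fromisoformat(stamp)
        hour = moment.replace(minute=0, second=0, microsecond=0)
        buckets[(road, hour.isoformat())].append(level)
    return {key: sum(levels) / len(levels) for key, levels in buckets.items()}


averages = hourly_average(readings)
for key in sorted(averages):
    print(key, averages[key])
```

Swapping the average for a sum, count, minimum, or maximum gives the other aggregation types listed on the slide.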
Step 3.3: Filtering
Data filtering is the process of refining data sets by removing data items that do not comply with certain criteria.
Example: Keep data with positive water consumption values.
Filters depend on the context of the observations (negative values may be meaningful in installations where water flows in both directions in a pipe).
Filtering Types
Content-based Filtering
– Selecting data items based on their values (e.g., keep only positive values)
Policy-based Filtering
– Filtering rules are defined as constraints similar to access control mechanisms (e.g., for security reasons)
Statistical Filtering
– Identify a baseline for content-based filtering
– Baselines are determined from historical data analysis
– Outlier detection
Hybrid Filtering
– Combination of filtering options
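Content-based, statistical, and hybrid filtering can be sketched in a few lines. The baseline rule used here, mean plus or minus k standard deviations of historical values, is one common choice and is an assumption, as are the data and the default k = 3.

```python
from statistics import mean, stdev


def content_filter(values):
    """Content-based: keep only positive values."""
    return [v for v in values if v > 0]


def statistical_filter(values, history, k=3.0):
    """Statistical: keep values within mean +/- k * std of historical data."""
    centre, spread = mean(history), stdev(history)
    low, high = centre - k * spread, centre + k * spread
    return [v for v in values if low <= v <= high]


# Invented historical readings used to derive the baseline.
history = [32.5, 21.2, 23.5, 26.1, 24.8, 27.3]
new_values = [25.0, 0, -1.23, 26.1, 95.0]

positive_only = content_filter(new_values)
# Hybrid: apply the content-based rule first, then the statistical one.
hybrid = statistical_filter(positive_only, history)
print(positive_only, hybrid)
```

The content-based rule alone lets the extreme reading through, which is why a baseline derived from historical data is a useful second stage.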
Step 3.3: Filtering - Outlier Detection
Global outlier: a value inconsistent with the rest of the dataset
Special outliers – local outlier:
• Observations inconsistent with their neighborhoods
• A local instability or discontinuity
Causes of Outliers
➢ Low-quality measurements: faulty collectors, manual errors, wrong calibration of devices
➢ Network issues: problems with data transmission from data sources to the data management platform
➢ Missing or redundant values: can create wrong aggregations
➢ Correct but exceptional data!
Outlier Detection Approaches
Deviation-based outlier detection
– Sequential exception
Distance-based outlier detection
– Index-based, nested-loop, cell-based, local outliers
Statistical-based outlier detection
– Distribution-based, depth-based
Distance-based Outlier Detection
• General idea:
– Judge a point based on the distance to its neighbors
– Several variants proposed
• Basic Assumption:
– Normal data objects have a dense neighborhood
– Outliers are far apart from their neighbors
• Basic Model:
– Given a radius ε and a percentage π
– A point p is considered an outlier if at most π percent of all other points have a distance to p less than ε
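The basic distance-based model can be sketched with a nested loop over all pairs of points. The sketch uses the standard reading of the model, consistent with the dense-neighborhood assumption: a point is flagged when at most a given percentage of the other points lie within the radius. The parameter names `eps` and `pi_percent` and the sample points are assumptions.

```python
import math


def db_outliers(points, eps, pi_percent):
    """Flag points with at most pi_percent of other points closer than eps."""
    outliers = []
    for i, point in enumerate(points):
        others = [q for j, q in enumerate(points) if j != i]
        close = sum(1 for q in others if math.dist(point, q) < eps)
        if others and 100.0 * close / len(others) <= pi_percent:
            outliers.append(point)
    return outliers


# Invented 2-D observations: a tight cluster plus one distant point.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (8.0, 8.0)]

print(db_outliers(points, eps=1.0, pi_percent=10.0))
```

The nested loop costs quadratic time in the number of points, which is why the index-based and cell-based variants exist for larger datasets.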
Step 3.4: Enrichment
This step adds supplementary information to the data.
Possible techniques:
– Additional information can be accessed from other resources
– Use of services such as translation, value conversion, adding a zip code, etc.
– [In the case of semantic linked data] Linking to other concepts through new predicates.
Summary
Discussed Land Use Land Cover
Discussed Satellite Imaging and Classification
Discussed a case study on Urban Growth Modelling
Discussed the challenges of developing decision support systems with Open Data (e.g., need for accurate, trusted information)
Explained the nature and types of data issues in (Open) Data: different formats, missing values, etc.
Discussed techniques for identifying data quality issues
Discussed data preparation and cleaning strategies (e.g., data clustering, filtering, etc.)
Identified a minimal data preparation pipeline
Assigned Reading
Rahm, Erhard, and Hong Hai Do. "Data cleaning: Problems and current approaches." IEEE Data Eng. Bull. 23.4 (2000): 3-13.
https://landsat.gsfc.nasa.gov/pdf_archive/How2make.pdf
Acknowledgments
I created this material from several resources:
– https://study.com/academy/lesson/geospatial-data-definition-example.html
– Data from USGS
– http://www.splibtarang.com/index.php
– Yadav, Piyush, Shamsuddin Ladha, Shailesh Deshpande, and Edward Curry. "Computational Model for Urban Growth Using Socioeconomic Latent Parameters", in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 65-78. Springer, Cham, 2018
– NASA, Landsat Website
– Data from https://data.gov.ie
– A ppt by David Corn, "Data Quality and Data Cleaning1"
– A ppt by Eric Poulin and Colin Yu, "Outlier Detection and Analysis"
– A paper by Erhard Rahm and Hong Hai Do, "Data Cleaning: Problems and Current Approaches"
– A ppt by Cameron Brooks, "Lets Build a Smarter Planet: IBM Smarter Water Management"
Further Reading
For further reading I recommend the following books
Book Link
Assignments
Group Assignment
Total 100 marks
Two Sections
Section 1 (30 marks)
– Objective: Classify the given Landsat images of a Dublin region from two years using QGIS software and find one major change that you can see between the two images
– Marking scheme: Report 100% (30 marks)
Section 2 (70 marks)
– Objective: Create a complete and clean dataset by merging three datasets
– Dataset: Real-world data from https://data.gov.ie
• Playing pitches around Dublin
• Multiple formats (minimum 2 are required)
• Data completion using other sources
– Tools: Python or Java
– Marking scheme:
• Report 50% (35 marks)
• Code/Analytics 50% (35 marks)
Guidelines For Group
Two people in each group.
Fill in the group information by 21st Jan, 5 pm (link given below).
Those who do not fill it in will be assigned to random groups.
For any questions you can mail me at piyush.yadav@insight-centre.org
Assignment due: Jan 30th, midnight
https://docs.google.com/spreadsheets/d/1eTwNF6-OqvSGKZtv0WWREgRjnKt8w_OATEH6b18unJQ/edit?usp=sharing
THANK YOU
QUESTIONS

Integration of GIS Based Survey procedure to update Road Network Geo-Database...Soumik Chakraborty
Ā 
Christian jensen advanced routing in spatial networks using big data
Christian jensen advanced routing in spatial networks using big dataChristian jensen advanced routing in spatial networks using big data
Christian jensen advanced routing in spatial networks using big datajins0618
Ā 
Urban Development Scenarios and Probability Mapping for Greater Dublin Region...
Urban Development Scenarios and Probability Mapping for Greater Dublin Region...Urban Development Scenarios and Probability Mapping for Greater Dublin Region...
Urban Development Scenarios and Probability Mapping for Greater Dublin Region...Beniamino Murgante
Ā 
IRJET- Technical Paper on Use of Smart Urban Simulation Software ā€“ā€˜Citysi...
IRJET-  	  Technical Paper on Use of Smart Urban Simulation Software ā€“ā€˜Citysi...IRJET-  	  Technical Paper on Use of Smart Urban Simulation Software ā€“ā€˜Citysi...
IRJET- Technical Paper on Use of Smart Urban Simulation Software ā€“ā€˜Citysi...IRJET Journal
Ā 
IRJET- Geological Boundary Detection for Satellite Images using AI Technique
IRJET- Geological Boundary Detection for Satellite Images using AI TechniqueIRJET- Geological Boundary Detection for Satellite Images using AI Technique
IRJET- Geological Boundary Detection for Satellite Images using AI TechniqueIRJET Journal
Ā 
Slum image detection and localization using transfer learning: a case study ...
Slum image detection and localization using transfer learning:  a case study ...Slum image detection and localization using transfer learning:  a case study ...
Slum image detection and localization using transfer learning: a case study ...IJECEIAES
Ā 
CCXG Forum, March 2022, Luca Lo Re and Federico de Lorenzo
CCXG Forum, March 2022, Luca Lo Re and Federico de LorenzoCCXG Forum, March 2022, Luca Lo Re and Federico de Lorenzo
CCXG Forum, March 2022, Luca Lo Re and Federico de LorenzoOECD Environment
Ā 
Use of Satellite Data for Feasibility Study And Preliminary Design Project Re...
Use of Satellite Data for Feasibility Study And Preliminary Design Project Re...Use of Satellite Data for Feasibility Study And Preliminary Design Project Re...
Use of Satellite Data for Feasibility Study And Preliminary Design Project Re...IJERDJOURNAL
Ā 
Gis in telecomm
Gis in telecommGis in telecomm
Gis in telecommAtiqa khan
Ā 
Building Climate Resilience: Translating Climate Data into Risk Assessments
Building Climate Resilience: Translating Climate Data into Risk Assessments Building Climate Resilience: Translating Climate Data into Risk Assessments
Building Climate Resilience: Translating Climate Data into Risk Assessments Safe Software
Ā 
SUNSHINE short overview of the project and its objectives
SUNSHINE short overview of the project and its objectives SUNSHINE short overview of the project and its objectives
SUNSHINE short overview of the project and its objectives Raffaele de Amicis
Ā 
Making Infrastructure Work: BIM Meets Geospatial (Rollo Home, Ordnance Survey)
Making Infrastructure Work: BIM Meets Geospatial (Rollo Home, Ordnance Survey)Making Infrastructure Work: BIM Meets Geospatial (Rollo Home, Ordnance Survey)
Making Infrastructure Work: BIM Meets Geospatial (Rollo Home, Ordnance Survey)Association for Geographic Information (AGI)
Ā 
Using GIS for effective flood management
Using GIS for effective flood managementUsing GIS for effective flood management
Using GIS for effective flood managementJames Thompson
Ā 
Building Smart Cities: The Data-Driven Way (Created For The Big 5 Construct 2...
Building Smart Cities: The Data-Driven Way (Created For The Big 5 Construct 2...Building Smart Cities: The Data-Driven Way (Created For The Big 5 Construct 2...
Building Smart Cities: The Data-Driven Way (Created For The Big 5 Construct 2...SocialCops
Ā 

Ƅhnlich wie Geospatial Open Data and Urban Growth Modelling for Evidence-based Decision Making in perspective of Smart Cities (20)

Computational Model for Urban Growth Using Socioeconomic Latent Parameters
Computational Model for Urban Growth Using Socioeconomic Latent ParametersComputational Model for Urban Growth Using Socioeconomic Latent Parameters
Computational Model for Urban Growth Using Socioeconomic Latent Parameters
Ā 
IRJET- Land Use & Land Cover Change Detection using G.I.S. & Remote Sensing
IRJET-  	  Land Use & Land Cover Change Detection using G.I.S. & Remote SensingIRJET-  	  Land Use & Land Cover Change Detection using G.I.S. & Remote Sensing
IRJET- Land Use & Land Cover Change Detection using G.I.S. & Remote Sensing
Ā 
Jo Parker: A New VISTA on Buried Assets
Jo Parker: A New VISTA on Buried AssetsJo Parker: A New VISTA on Buried Assets
Jo Parker: A New VISTA on Buried Assets
Ā 
Integration of GIS Based Survey procedure to update Road Network Geo-Database...
Integration of GIS Based Survey procedure to update Road Network Geo-Database...Integration of GIS Based Survey procedure to update Road Network Geo-Database...
Integration of GIS Based Survey procedure to update Road Network Geo-Database...
Ā 
Christian jensen advanced routing in spatial networks using big data
Christian jensen advanced routing in spatial networks using big dataChristian jensen advanced routing in spatial networks using big data
Christian jensen advanced routing in spatial networks using big data
Ā 
Urban Development Scenarios and Probability Mapping for Greater Dublin Region...
Urban Development Scenarios and Probability Mapping for Greater Dublin Region...Urban Development Scenarios and Probability Mapping for Greater Dublin Region...
Urban Development Scenarios and Probability Mapping for Greater Dublin Region...
Ā 
IRJET- Technical Paper on Use of Smart Urban Simulation Software ā€“ā€˜Citysi...
IRJET-  	  Technical Paper on Use of Smart Urban Simulation Software ā€“ā€˜Citysi...IRJET-  	  Technical Paper on Use of Smart Urban Simulation Software ā€“ā€˜Citysi...
IRJET- Technical Paper on Use of Smart Urban Simulation Software ā€“ā€˜Citysi...
Ā 
2nd review
2nd review2nd review
2nd review
Ā 
IRJET- Geological Boundary Detection for Satellite Images using AI Technique
IRJET- Geological Boundary Detection for Satellite Images using AI TechniqueIRJET- Geological Boundary Detection for Satellite Images using AI Technique
IRJET- Geological Boundary Detection for Satellite Images using AI Technique
Ā 
GIS.ppt
GIS.pptGIS.ppt
GIS.ppt
Ā 
Slum image detection and localization using transfer learning: a case study ...
Slum image detection and localization using transfer learning:  a case study ...Slum image detection and localization using transfer learning:  a case study ...
Slum image detection and localization using transfer learning: a case study ...
Ā 
CCXG Forum, March 2022, Luca Lo Re and Federico de Lorenzo
CCXG Forum, March 2022, Luca Lo Re and Federico de LorenzoCCXG Forum, March 2022, Luca Lo Re and Federico de Lorenzo
CCXG Forum, March 2022, Luca Lo Re and Federico de Lorenzo
Ā 
Use of Satellite Data for Feasibility Study And Preliminary Design Project Re...
Use of Satellite Data for Feasibility Study And Preliminary Design Project Re...Use of Satellite Data for Feasibility Study And Preliminary Design Project Re...
Use of Satellite Data for Feasibility Study And Preliminary Design Project Re...
Ā 
Gis in telecomm
Gis in telecommGis in telecomm
Gis in telecomm
Ā 
Building Climate Resilience: Translating Climate Data into Risk Assessments
Building Climate Resilience: Translating Climate Data into Risk Assessments Building Climate Resilience: Translating Climate Data into Risk Assessments
Building Climate Resilience: Translating Climate Data into Risk Assessments
Ā 
SUNSHINE short overview of the project and its objectives
SUNSHINE short overview of the project and its objectives SUNSHINE short overview of the project and its objectives
SUNSHINE short overview of the project and its objectives
Ā 
3 D Lidar Epfl Iccsa 08
3 D Lidar Epfl Iccsa 083 D Lidar Epfl Iccsa 08
3 D Lidar Epfl Iccsa 08
Ā 
Making Infrastructure Work: BIM Meets Geospatial (Rollo Home, Ordnance Survey)
Making Infrastructure Work: BIM Meets Geospatial (Rollo Home, Ordnance Survey)Making Infrastructure Work: BIM Meets Geospatial (Rollo Home, Ordnance Survey)
Making Infrastructure Work: BIM Meets Geospatial (Rollo Home, Ordnance Survey)
Ā 
Using GIS for effective flood management
Using GIS for effective flood managementUsing GIS for effective flood management
Using GIS for effective flood management
Ā 
Building Smart Cities: The Data-Driven Way (Created For The Big 5 Construct 2...
Building Smart Cities: The Data-Driven Way (Created For The Big 5 Construct 2...Building Smart Cities: The Data-Driven Way (Created For The Big 5 Construct 2...
Building Smart Cities: The Data-Driven Way (Created For The Big 5 Construct 2...
Ā 

KĆ¼rzlich hochgeladen

Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
Ā 
äø“äøšäø€ęƔäø€ē¾Žå›½äæ„äŗ„äæ„大学ęƕäøščÆęˆē»©å•pdfē”µå­ē‰ˆåˆ¶ä½œäæ®ę”¹
äø“äøšäø€ęƔäø€ē¾Žå›½äæ„äŗ„äæ„大学ęƕäøščÆęˆē»©å•pdfē”µå­ē‰ˆåˆ¶ä½œäæ®ę”¹äø“äøšäø€ęƔäø€ē¾Žå›½äæ„äŗ„äæ„大学ęƕäøščÆęˆē»©å•pdfē”µå­ē‰ˆåˆ¶ä½œäæ®ę”¹
äø“äøšäø€ęƔäø€ē¾Žå›½äæ„äŗ„äæ„大学ęƕäøščÆęˆē»©å•pdfē”µå­ē‰ˆåˆ¶ä½œäæ®ę”¹yuu sss
Ā 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
Ā 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
Ā 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
Ā 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
Ā 
9711147426āœØCall In girls Gurgaon Sector 31. SCO 25 escort service
9711147426āœØCall In girls Gurgaon Sector 31. SCO 25 escort service9711147426āœØCall In girls Gurgaon Sector 31. SCO 25 escort service
9711147426āœØCall In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
Ā 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
Ā 
Call Us āž„97111āˆš47426šŸ¤³Call Girls in Aerocity (Delhi NCR)
Call Us āž„97111āˆš47426šŸ¤³Call Girls in Aerocity (Delhi NCR)Call Us āž„97111āˆš47426šŸ¤³Call Girls in Aerocity (Delhi NCR)
Call Us āž„97111āˆš47426šŸ¤³Call Girls in Aerocity (Delhi NCR)jennyeacort
Ā 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
Ā 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
Ā 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
Ā 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
Ā 
RS 9000 Call In girls Dwarka Mor (DELHI)ā‡›9711147426šŸ”Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)ā‡›9711147426šŸ”DelhiRS 9000 Call In girls Dwarka Mor (DELHI)ā‡›9711147426šŸ”Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)ā‡›9711147426šŸ”Delhijennyeacort
Ā 
High Class Call Girls Noida Sector 39 Aarushi šŸ”8264348440šŸ” Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi šŸ”8264348440šŸ” Independent Escort...High Class Call Girls Noida Sector 39 Aarushi šŸ”8264348440šŸ” Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi šŸ”8264348440šŸ” Independent Escort...soniya singh
Ā 
办ē†å­¦ä½čƁēŗ½ēŗ¦å¤§å­¦ęƕäøščƁ(NYUęƕäøščƁ书ļ¼‰åŽŸē‰ˆäø€ęƔäø€
办ē†å­¦ä½čƁēŗ½ēŗ¦å¤§å­¦ęƕäøščƁ(NYUęƕäøščƁ书ļ¼‰åŽŸē‰ˆäø€ęƔäø€åŠžē†å­¦ä½čƁēŗ½ēŗ¦å¤§å­¦ęƕäøščƁ(NYUęƕäøščƁ书ļ¼‰åŽŸē‰ˆäø€ęƔäø€
办ē†å­¦ä½čƁēŗ½ēŗ¦å¤§å­¦ęƕäøščƁ(NYUęƕäøščƁ书ļ¼‰åŽŸē‰ˆäø€ęƔäø€fhwihughh
Ā 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
Ā 
ęƕäøšę–‡å‡­åˆ¶ä½œ#å›žå›½å…„čŒ#diploma#degreeę¾³ę“²äø­å¤®ę˜†å£«å…°å¤§å­¦ęƕäøščÆęˆē»©å•pdfē”µå­ē‰ˆåˆ¶ä½œäæ®ę”¹#ęƕäøšę–‡å‡­åˆ¶ä½œ#å›žå›½å…„čŒ#diploma#degree
ęƕäøšę–‡å‡­åˆ¶ä½œ#å›žå›½å…„čŒ#diploma#degreeę¾³ę“²äø­å¤®ę˜†å£«å…°å¤§å­¦ęƕäøščÆęˆē»©å•pdfē”µå­ē‰ˆåˆ¶ä½œäæ®ę”¹#ęƕäøšę–‡å‡­åˆ¶ä½œ#å›žå›½å…„čŒ#diploma#degreeęƕäøšę–‡å‡­åˆ¶ä½œ#å›žå›½å…„čŒ#diploma#degreeę¾³ę“²äø­å¤®ę˜†å£«å…°å¤§å­¦ęƕäøščÆęˆē»©å•pdfē”µå­ē‰ˆåˆ¶ä½œäæ®ę”¹#ęƕäøšę–‡å‡­åˆ¶ä½œ#å›žå›½å…„čŒ#diploma#degree
ęƕäøšę–‡å‡­åˆ¶ä½œ#å›žå›½å…„čŒ#diploma#degreeę¾³ę“²äø­å¤®ę˜†å£«å…°å¤§å­¦ęƕäøščÆęˆē»©å•pdfē”µå­ē‰ˆåˆ¶ä½œäæ®ę”¹#ęƕäøšę–‡å‡­åˆ¶ä½œ#å›žå›½å…„čŒ#diploma#degreeyuu sss
Ā 

KĆ¼rzlich hochgeladen (20)

Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
Ā 
äø“äøšäø€ęƔäø€ē¾Žå›½äæ„äŗ„äæ„大学ęƕäøščÆęˆē»©å•pdfē”µå­ē‰ˆåˆ¶ä½œäæ®ę”¹
äø“äøšäø€ęƔäø€ē¾Žå›½äæ„äŗ„äæ„大学ęƕäøščÆęˆē»©å•pdfē”µå­ē‰ˆåˆ¶ä½œäæ®ę”¹äø“äøšäø€ęƔäø€ē¾Žå›½äæ„äŗ„äæ„大学ęƕäøščÆęˆē»©å•pdfē”µå­ē‰ˆåˆ¶ä½œäæ®ę”¹
äø“äøšäø€ęƔäø€ē¾Žå›½äæ„äŗ„äæ„大学ęƕäøščÆęˆē»©å•pdfē”µå­ē‰ˆåˆ¶ä½œäæ®ę”¹
Ā 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
Ā 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Ā 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
Ā 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
Ā 
9711147426āœØCall In girls Gurgaon Sector 31. SCO 25 escort service
9711147426āœØCall In girls Gurgaon Sector 31. SCO 25 escort service9711147426āœØCall In girls Gurgaon Sector 31. SCO 25 escort service
9711147426āœØCall In girls Gurgaon Sector 31. SCO 25 escort service
Ā 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
Ā 
Call Us āž„97111āˆš47426šŸ¤³Call Girls in Aerocity (Delhi NCR)
Call Us āž„97111āˆš47426šŸ¤³Call Girls in Aerocity (Delhi NCR)Call Us āž„97111āˆš47426šŸ¤³Call Girls in Aerocity (Delhi NCR)
Call Us āž„97111āˆš47426šŸ¤³Call Girls in Aerocity (Delhi NCR)
Ā 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
Ā 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
Ā 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
Ā 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Ā 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Ā 
Call Girls in Saket 99530šŸ” 56974 Escort Service
Call Girls in Saket 99530šŸ” 56974 Escort ServiceCall Girls in Saket 99530šŸ” 56974 Escort Service
Call Girls in Saket 99530šŸ” 56974 Escort Service
Ā 
RS 9000 Call In girls Dwarka Mor (DELHI)ā‡›9711147426šŸ”Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)ā‡›9711147426šŸ”DelhiRS 9000 Call In girls Dwarka Mor (DELHI)ā‡›9711147426šŸ”Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)ā‡›9711147426šŸ”Delhi
Ā 
High Class Call Girls Noida Sector 39 Aarushi šŸ”8264348440šŸ” Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi šŸ”8264348440šŸ” Independent Escort...High Class Call Girls Noida Sector 39 Aarushi šŸ”8264348440šŸ” Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi šŸ”8264348440šŸ” Independent Escort...
Ā 
办ē†å­¦ä½čƁēŗ½ēŗ¦å¤§å­¦ęƕäøščƁ(NYUęƕäøščƁ书ļ¼‰åŽŸē‰ˆäø€ęƔäø€
办ē†å­¦ä½čƁēŗ½ēŗ¦å¤§å­¦ęƕäøščƁ(NYUęƕäøščƁ书ļ¼‰åŽŸē‰ˆäø€ęƔäø€åŠžē†å­¦ä½čƁēŗ½ēŗ¦å¤§å­¦ęƕäøščƁ(NYUęƕäøščƁ书ļ¼‰åŽŸē‰ˆäø€ęƔäø€
办ē†å­¦ä½čƁēŗ½ēŗ¦å¤§å­¦ęƕäøščƁ(NYUęƕäøščƁ书ļ¼‰åŽŸē‰ˆäø€ęƔäø€
Ā 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
Ā 
ęƕäøšę–‡å‡­åˆ¶ä½œ#å›žå›½å…„čŒ#diploma#degreeę¾³ę“²äø­å¤®ę˜†å£«å…°å¤§å­¦ęƕäøščÆęˆē»©å•pdfē”µå­ē‰ˆåˆ¶ä½œäæ®ę”¹#ęƕäøšę–‡å‡­åˆ¶ä½œ#å›žå›½å…„čŒ#diploma#degree
ęƕäøšę–‡å‡­åˆ¶ä½œ#å›žå›½å…„čŒ#diploma#degreeę¾³ę“²äø­å¤®ę˜†å£«å…°å¤§å­¦ęƕäøščÆęˆē»©å•pdfē”µå­ē‰ˆåˆ¶ä½œäæ®ę”¹#ęƕäøšę–‡å‡­åˆ¶ä½œ#å›žå›½å…„čŒ#diploma#degreeęƕäøšę–‡å‡­åˆ¶ä½œ#å›žå›½å…„čŒ#diploma#degreeę¾³ę“²äø­å¤®ę˜†å£«å…°å¤§å­¦ęƕäøščÆęˆē»©å•pdfē”µå­ē‰ˆåˆ¶ä½œäæ®ę”¹#ęƕäøšę–‡å‡­åˆ¶ä½œ#å›žå›½å…„čŒ#diploma#degree
ęƕäøšę–‡å‡­åˆ¶ä½œ#å›žå›½å…„čŒ#diploma#degreeę¾³ę“²äø­å¤®ę˜†å£«å…°å¤§å­¦ęƕäøščÆęˆē»©å•pdfē”µå­ē‰ˆåˆ¶ä½œäæ®ę”¹#ęƕäøšę–‡å‡­åˆ¶ä½œ#å›žå›½å…„čŒ#diploma#degree
Ā 

Geospatial Open Data and Urban Growth Modelling for Evidence-based Decision Making in perspective of Smart Cities

  • 1. Geospatial Open Data and Urban Growth Modelling for Evidence-based Decision Making in perspective of Smart Cities PIYUSH YADAV
  • 2. 17/01/2020 © Lero 2015. About me
      • Researcher at the Insight Centre for Data Analytics and the Lero Software Research Centre at NUI Galway (NUIG).
      • Researcher - CTO at Tata Research Development and Design Centre (TRDDC), which is part of TCS Innovation Lab; member of a project in collaboration with IIT Bombay.
      • M.Tech. (CSE) with specialization in information security at IIIT Delhi in 2013; Research Assistant at McGill University, Canada.
      • Research interests: Complex Event Processing, Video Analytics, Distributed Systems, Machine Learning, Smart Cities, GIS and Remote Sensing.
      • Publications: 17 conference papers, 1 journal paper, 1 book chapter, 6 posters, 2 patents filed, 1 industry report (Dell).
  • 3. Outline
      • Learning Outcomes
      • Geospatial Data
      • Classification for Satellite Images
      • Case Study: Urban Growth Modelling
      • Multi-source Open Data Management
      • Quality Issues in Multi-source Open Data
      • Techniques for data preparation and cleaning
      • Assignments
  • 4. Learning Outcomes. You will learn:
      • The importance of geospatial data and Land Use Land Cover in the development of Smart Cities.
      • The fundamentals of satellite image classification.
      • How to model urban growth and predict the future growth of a city.
      • The importance of Open Data in Smart Cities.
      • To explain the nature and types of data issues in (Open) Data.
      • To discuss techniques for identifying data quality issues.
      • To demonstrate data preparation and cleaning strategies (e.g., data clustering, filtering, etc.).
  • 5. Copernicus Hackathon Ireland 2019
      • Last year, 3 teams from this class participated.
      • 2 teams won a prize.
      • Air Quality: Aftab Alam, Nikhil Nambiar, Vignesh Kamath. https://prezi.com/view/iZEygJaFnxqAJH7lR9TM/
      • Smart Agriculture
  • 6. Geospatial Data
      Geospatial data, sometimes called spatial data, is information that has a geographic aspect to it:
      • Coordinates (latitude, longitude)
      • Postal address
      • Physical features
      Types:
      • Vector: uses points, lines, and polygons to represent spatial features such as cities, roads, and streams.
      • Raster: uses cells (computers often use dots or pixels) to represent spatial features. This is our focus in this lecture.
      Source: https://www.bolton-menk.com/books/lindsey/Lindsey.html
  • 7. Satellite Imagery: Basics
      How we see colour: the electromagnetic spectrum
      • The electromagnetic (EM) spectrum describes the continuous spectrum of energy, from high-energy gamma rays and X-rays to very low-energy microwaves and radio waves.
      • Visible light, the light that our eyes can detect, is just a small portion of the EM spectrum.
      Satellite imaging
      • Satellites collect data by passing the energy reflected from the Earth through filters that separate it into narrow windows of the EM spectrum, i.e. discrete spectral bands (a raster image).
      Source: https://landsat.usgs.gov/atmospheric-transmittance-information
  • 8. Image Bands/Channels
      An image consists of multiple bands from the electromagnetic spectrum.
      • Normal image: 3 bands (Red, Green, Blue)
      • Multispectral: 3-10 bands
      • Hyperspectral: 100-1000 bands (nm)
      A stack of bands can be treated as a tensor.
      Source: http://www.splibtarang.com/index.php
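To make the "stack of bands ~ tensor" idea concrete, here is a minimal sketch in plain Python. The 2x2 band values and the helper name `stack_bands` are made up for illustration; in practice a raster library would hold the bands as arrays.

```python
# Hypothetical 2x2 scene with three bands (values are made-up reflectances).
red = [[0.10, 0.12], [0.30, 0.32]]
green = [[0.20, 0.22], [0.25, 0.27]]
blue = [[0.05, 0.07], [0.15, 0.17]]


def stack_bands(bands):
    """Stack equally sized 2-D bands into a rows x cols x bands structure."""
    rows, cols = len(bands[0]), len(bands[0][0])
    for band in bands:
        if len(band) != rows or any(len(row) != cols for row in band):
            raise ValueError("all bands must share the same shape")
    # cube[r][c] is the spectral vector of the pixel at row r, column c.
    return [[[band[r][c] for band in bands] for c in range(cols)]
            for r in range(rows)]


cube = stack_bands([red, green, blue])
pixel = cube[1][0]  # one value per band for the pixel at row 1, column 0
print(pixel)  # [0.3, 0.25, 0.15]
```

Each pixel's band vector is what a classifier later uses as its feature vector.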
  • 9. LANDSAT Satellite Images
      • The Landsat program is the longest-running enterprise for acquisition of satellite imagery of Earth, run by NASA.
      • 8 satellites to date.
      • Landsat 1 launched in 1972, Landsat 7 in 1999, Landsat 8 in 2013.
      • Data can be downloaded from: https://earthexplorer.usgs.gov/
      Figures: Landsat 7 bands; Landsat 8 bands; scan line correction issue in Landsat 7 (2003); other Earth observation satellites.
  • 10. Pre-processing of Landsat Image
      Atmospheric Correction
      • Electromagnetic radiation captured by the satellite sensors is affected by atmospheric interference such as scattering, dispersion, etc.
      • Subtract the digital number (DN) of water pixels in band 4 (infrared band), because water has very low water-leaving radiance in that band (Cracknell 2007).
      • DN values were then converted to spectral radiance (Kaufman 1989): $L = L_{min} + \left(\frac{L_{max}}{254} - \frac{L_{min}}{255}\right) \times DN$
      Solar Correction
      • For clear Landsat images, solar correction was done by converting spectral radiance to exoatmospheric reflectance (Kaufman 1989): $\rho_p = \frac{\pi \cdot L_\lambda \cdot d^2}{ESUN_\lambda \cdot \cos\theta_s}$
      References
      • Cracknell, A. (2007). Atmospheric Corrections to Passive Satellite Remote Sensing Data. In A. Cracknell, Introduction To Remote Sensing, Second Edition (p. 196). CRC Press. Retrieved September 1, 2015.
      • Kaufman, Y. J. (1989). The atmospheric effect on remote sensing and its correction. In Theory and applications of optical remote sensing (pp. 336-428).
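The pre-processing steps on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: it reads the slide's radiance formula as L = Lmin + (Lmax/254 − Lmin/255) × DN and the reflectance formula as ρ = π·L·d² / (ESUN·cos θs), with θs taken to be the solar zenith angle. The function names and all numeric values are assumptions; real calibration constants come from the scene metadata.

```python
import math


def dark_object_subtract(band, dark_dn):
    """Subtract the DN of dark (water) pixels from every pixel, clamped at 0."""
    return [[max(dn - dark_dn, 0) for dn in row] for row in band]


def dn_to_radiance(dn, l_min, l_max):
    """Spectral radiance from an 8-bit DN, in the form given on the slide."""
    return l_min + (l_max / 254.0 - l_min / 255.0) * dn


def radiance_to_reflectance(radiance, esun, sun_zenith_deg, earth_sun_dist_au=1.0):
    """Exoatmospheric reflectance: pi * L * d^2 / (ESUN * cos(theta_s))."""
    cos_theta = math.cos(math.radians(sun_zenith_deg))
    if cos_theta <= 0:
        raise ValueError("sun must be above the horizon")
    return math.pi * radiance * earth_sun_dist_au ** 2 / (esun * cos_theta)


# Hypothetical values for illustration only.
band4 = [[40, 12], [90, 10]]
corrected = dark_object_subtract(band4, dark_dn=10)
radiance = [[dn_to_radiance(dn, l_min=-5.0, l_max=240.0) for dn in row]
            for row in corrected]
reflectance = [[radiance_to_reflectance(rad, esun=1040.0, sun_zenith_deg=30.0)
                for rad in row] for row in radiance]
```

Note that negative radiances can appear for very dark pixels when Lmin is negative; how to treat them is a processing choice the slide does not specify.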
  • 12. Pre-processing of Landsat Image
      • Band 1, Band 2, Band 3, … (R, G, B, …) converted to reflectance.
      • View using KML on Google Earth. Download the file from: https://drive.google.com/drive/folders/1KGQmkZ7bN2M-ED31sDNWVtX29VntfzWs
  • 13. Classify Landsat Image (Supervised Learning)
      1. Create training data (columns: Class ID, Class Name, Location (x, y)):
          • 1: Vegetation
          • 2: Impervious Surface (Built Up)
          • 3: Soil
          • 4: Water
      2. Train model: Maximum Likelihood, SVM, or DNN.
      3. Classify.
      Figure: spectral signature for different classes.
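As an illustration of the create-training-data, train, classify workflow, the sketch below implements a simplified maximum likelihood classifier in plain Python. It assumes each class follows a Gaussian with independent bands (a diagonal covariance), which is a simplification of the usual full-covariance method. The two-band training values and the names `fit`, `log_likelihood`, and `classify` are hypothetical.

```python
import math
from collections import defaultdict


def fit(samples):
    """Estimate per-class, per-band mean and variance from labelled pixels."""
    grouped = defaultdict(list)
    for class_id, features in samples:
        grouped[class_id].append(features)
    model = {}
    for class_id, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Floor the variance so a constant band cannot cause division by zero.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-6)
                     for col, m in zip(columns, means)]
        model[class_id] = (means, variances)
    return model


def log_likelihood(features, means, variances):
    """Log-density of a pixel under an independent-band Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
               for x, mu, var in zip(features, means, variances))


def classify(model, features):
    """Return the class whose Gaussian gives the pixel the highest likelihood."""
    return max(model, key=lambda c: log_likelihood(features, *model[c]))


# Hypothetical training pixels: (class ID, [red, near-infrared] reflectance).
training = [
    (1, [0.05, 0.50]), (1, [0.06, 0.45]), (1, [0.04, 0.55]),  # vegetation
    (3, [0.25, 0.30]), (3, [0.28, 0.33]), (3, [0.22, 0.27]),  # soil
    (4, [0.03, 0.02]), (4, [0.02, 0.01]), (4, [0.04, 0.03]),  # water
]
model = fit(training)
print(classify(model, [0.05, 0.48]))  # 1 (vegetation)
```

The per-class means computed in `fit` are a crude version of the spectral signatures shown on the slide; an SVM or DNN would replace `fit` and `classify` while the training data stays the same.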
  • 14. Classified Image
  • 15. Case Study: Urban Growth Modelling
      • World population is growing
      • Increased economic activities
      • Increased urban growth rate
      • Urban growth: change in Land Use Land Cover
      Figure: an aerial view of urban growth in 2006 and 2014.
  • 16. Land Use Land Cover Change (LULCC)
      • A key aspect of urban growth is its effect on land use and land cover change.
      • Land cover indicates the physical land type, such as forest or open water.
      • Land use documents how people are using the land, such as for agriculture.
  • 17. Factors Affecting Land Use Land Cover
      • Spatial factors: predominantly change over space but remain relatively static with respect to time. Example: Digital Elevation Model (DEM).
      • Spatio-temporal factors: change over both time and space. Example: proximity to the primary roads.
      • Temporal factors: change over time but are spatially static for a given study area. Example: national Gross Domestic Product (GDP).
      Diagram labels: Direct Factors, Indirect Factors, Land Use Land Cover Change.
  • 18. Urban Growth Models
      • Urban growth models are used to predict land use land cover (LULC) changes.
      • LULC modelling is extremely difficult due to complex interactions between multi-scale factors.
      • Lattice-based spatio-temporal models, e.g. Cellular Automata (CA) and Logistic Regression (LR), are therefore used to model the spatial geographic processes.
      • Markov Chain: LULC images from two distinct time instances are taken, and the frequency of change from one LULC class to another is used to compute a transition probability matrix.
      • Limitation: persistent growth rate (the same transition probabilities are applied at every step).
      Figure: schematic of an integrated Markov Chain model.
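The Markov Chain step described above can be sketched as follows: count class changes between two classified maps, normalise each row into transition probabilities, then apply the same matrix repeatedly to project class shares. The tiny maps, the class codes V/I/S, and the function names are hypothetical; the repeated use of one fixed matrix in `project` is exactly what the "persistent growth rate" limitation refers to.

```python
def transition_matrix(before, after, classes):
    """Row-normalised frequency of change between two classified maps."""
    counts = {a: {b: 0 for b in classes} for a in classes}
    for row_before, row_after in zip(before, after):
        for old, new in zip(row_before, row_after):
            counts[old][new] += 1
    matrix = {}
    for a in classes:
        total = sum(counts[a].values())
        # A class absent from the first map is assumed to stay unchanged.
        matrix[a] = {b: (counts[a][b] / total if total else float(a == b))
                     for b in classes}
    return matrix


def project(shares, matrix, steps):
    """Apply the same transition matrix `steps` times to the class shares."""
    for _ in range(steps):
        shares = {b: sum(shares[a] * matrix[a][b] for a in shares)
                  for b in shares}
    return shares


# Hypothetical 2x3 maps: V = vegetation, I = impervious surface, S = soil.
classes = ["V", "I", "S"]
map_t0 = [["V", "V", "S"], ["V", "S", "I"]]
map_t1 = [["V", "I", "S"], ["V", "I", "I"]]

matrix = transition_matrix(map_t0, map_t1, classes)
shares_t0 = {"V": 3 / 6, "I": 1 / 6, "S": 2 / 6}
shares_t1 = project(shares_t0, matrix, steps=1)
```

Projecting one step from the first map's class shares reproduces the second map's shares, which is a quick sanity check on the matrix.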
  • 19. Our Contribution
      • Hidden Markov Model: introduction of the Hidden Markov Model (HMM).
      • Temporal factors: incorporate temporal factors in LULC change modelling using the HMM. Model the underlying temporal factors as Gaussian distributions, conditioned on the hidden states, to learn land cover type transition probabilities.
      • Integrate: integrate our model with other spatio-temporal models such as Logistic Regression (LR) to yield richer integrated models than the corresponding MC-based integrated models.
      Figure: an urban growth model with multi-scale direct and indirect factors impacting LULC changes.
  • 20. Our Model
      • A Hidden Markov Model with hidden states (V, I, S) and sample emissions (GDP and Liquidity).
      • Proposed urban growth model: HMM integrated with a Logistic Regression model.
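To show how Gaussian emissions conditioned on hidden states enter the model, the sketch below evaluates the likelihood of an observed factor sequence with the scaled forward algorithm. It is a teaching sketch, not the authors' implementation: it uses a single emission (a GDP growth rate) instead of several factors, and every probability and Gaussian parameter is invented for illustration.

```python
import math

STATES = ["V", "I", "S"]  # vegetation, impervious surface, soil

# Hypothetical model parameters, for illustration only.
start = {"V": 0.5, "I": 0.2, "S": 0.3}
trans = {
    "V": {"V": 0.80, "I": 0.10, "S": 0.10},
    "I": {"V": 0.00, "I": 1.00, "S": 0.00},
    "S": {"V": 0.10, "I": 0.30, "S": 0.60},
}
# Emission per state: (mean, standard deviation) of GDP growth rate in %.
emis = {"V": (4.0, 1.5), "I": (8.0, 1.5), "S": (6.0, 1.5)}


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def forward_log_likelihood(obs, start, trans, emis):
    """Scaled forward algorithm: returns log P(obs | model)."""
    alpha = {s: start[s] * gaussian_pdf(obs[0], *emis[s]) for s in STATES}
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = {j: sum(alpha[i] * trans[i][j] for i in STATES)
                     * gaussian_pdf(obs[t], *emis[j]) for j in STATES}
        # Rescale at every step to avoid numerical underflow.
        scale = sum(alpha.values())
        log_lik += math.log(scale)
        alpha = {s: a / scale for s, a in alpha.items()}
    return log_lik


gdp_growth = [7.5, 4.0, 8.1]  # hypothetical yearly observations
score = forward_log_likelihood(gdp_growth, start, trans, emis)
```

Training an HMM means adjusting `trans` and `emis` to increase this likelihood; a library such as hmmlearn would normally do that step.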
  • 21. Study Area: Pune
      • Tier-A city situated in the state of Maharashtra, India.
      • Located 560 m above sea level.
      • Known for its Information Technology and automobile industries and various research institutes.
      • The study considered 45 sq. km of the city area that has undergone rapid urbanisation.
  • 22. Temporal Growth Factors
      • Gross Domestic Product (national): the amount of goods and services produced within the borders of a country in a specific time interval.
      • Interest Rate Cycle (national): revised bimonthly. A tight monetary policy affects the overall investment policy, which leads to a slowdown, and vice versa.
      • Consumer Price Index (national): low inflation creates an environment that favours developmental investment.
      • Gross Fixed Capital Formation (national): the amount that the government spends on capital formation (such as infrastructure building and land improvements) in the country. The greater the GFCF investment, the higher the rate of urbanization.
      • Urban Population Growth Rate (national): to accommodate a higher influx of people, cities expand along their outskirts, leading to growth of the urban agglomerate.
      • Electricity Consumption (regional): typically, regions with higher electricity demand grow faster than those with lower demand.
      • Road Length Added (regional): better connectivity improves transportation and thus gives impetus to growth by allowing new industrial complexes and other infrastructure services to be set up.
  • 23. Temporal Growth Factors Data
      Figure panels:
      • GDP growth rate (%)
      • Absolute average CPI inflation (%)
      • Gross fixed capital formation (% GDP)
      • Urban population growth rate (%)
      • Bimonthly interest (repo) rate (%)
      • Per capita electricity consumption in kilowatt-hours
  • 24. Land Use Land Cover (LULC) Data
      LULC data is required as an input for the HMM hidden states and the LR models.
      Landsat-7 specifications:
      • Satellite: Landsat 7
      • Time period: yearly, 2001 to 2014 (between March and April)
      • Latitude: 18.38847838°N - 18.79279909°N
      • Longitude: 73.64552005°E - 74.07494971°E
      • Bands: 1 to 7
      • Resolution: 30 m
      • Pixels: 1500 x 1500
      LULC data pre-processing:
      • Scan Line Correction (SLC): in 2003 the scan line corrector in the Landsat-7 ETM+ instrument developed a fault, creating black lines in the captured images. These were handled by image smoothing using windowing.
      • Atmospheric correction: explained earlier.
      • Solar correction: explained earlier.
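The slide does not say exactly how the windowed smoothing was done. One plausible reading, sketched below in plain Python, is to replace each missing (black) pixel with the mean of the valid pixels inside a small window around it. The function name, the 3x3 window, and the use of 0 as the no-data value are all assumptions.

```python
def fill_gaps(band, window=3, nodata=0):
    """Replace nodata pixels with the mean of valid neighbours in a window."""
    half = window // 2
    rows, cols = len(band), len(band[0])
    filled = [row[:] for row in band]  # copy, so the input stays untouched
    for r in range(rows):
        for c in range(cols):
            if band[r][c] != nodata:
                continue
            neighbours = [band[i][j]
                          for i in range(max(r - half, 0), min(r + half + 1, rows))
                          for j in range(max(c - half, 0), min(c + half + 1, cols))
                          if band[i][j] != nodata]
            if neighbours:  # leave the gap if the whole window is empty
                filled[r][c] = sum(neighbours) / len(neighbours)
    return filled


# Hypothetical band with one missing scan line in the middle row.
band = [[10, 12, 14],
        [0, 0, 0],
        [20, 22, 24]]
print(fill_gaps(band)[1])  # [16.0, 17.0, 18.0]
```

Real SLC-off gaps are wider than one pixel away from the scene centre, so a larger window or repeated passes would be needed there.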
  • 25. LULC Data Classification. Classes: • Classified into seven broad LULC classes on the basis of the nature of the landscape. • Forest Canopy, Agriculture Area, Residential Area, Industrial Area, Common Open Area, Burnt Grass, Bright Soil, and Water Body. SVM Classification: • For classification, a labeled set of pixels was collected for each class of interest (500 to 3000 samples per class). The feature vector for each pixel consisted of all seven band values. • Support Vector Machines • Manual Correction (Concrete and Quarry). VIS Classes: • Vegetation, Impervious Surface, and Soil.
  • 26. A Quick Recap: LULC Data
  • 27. Spatio-Temporal Factors. Digital Elevation Model (DEM) and Slope: CARTOSAT 1 (figures: 3D view, DEM image). Proximity to primary roads (figure: primary road layers). Mask: water bodies were masked out from the LULC image.
  • 28. Results: HMM Experiments. • Used the Gaussian HMM library in scikit-learn. • We designed an HMM with three hidden states (V, I, and S) and the temporal factors. • The HMM was initialized with the MC transition probabilities for 2001 to 2002. • A stable model was obtained empirically after 50000 iterations with a threshold of less than 0.01. Figures: computed MC transition probabilities for 2001-2002; learned HMM transition probabilities for 2014; computed MC transition probabilities for 2014.
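The MC transition probabilities used to initialize the HMM can be estimated by counting, pixel by pixel, how often each class in one year turns into each class in the next. The following is a minimal sketch under stated assumptions: the two classified images are flattened into equal-length lists of V/I/S labels, and the function name and toy labels are illustrative rather than taken from the study.

```python
from collections import Counter

CLASSES = ("V", "I", "S")  # Vegetation, Impervious Surface, Soil


def transition_matrix(before, after, classes=CLASSES):
    """Row-normalised class-to-class transition probabilities."""
    if len(before) != len(after):
        raise ValueError("both images must have the same number of pixels")
    counts = Counter(zip(before, after))  # (from_class, to_class) -> pixel count
    matrix = {}
    for src in classes:
        row_total = sum(counts[(src, dst)] for dst in classes)
        matrix[src] = {
            dst: (counts[(src, dst)] / row_total if row_total else 0.0)
            for dst in classes
        }
    return matrix


# Toy example: six pixels observed in two consecutive years (made-up labels).
year_a = ["V", "V", "V", "S", "S", "I"]
year_b = ["V", "S", "I", "S", "I", "I"]
probs = transition_matrix(year_a, year_b)
```

Each row of `probs` sums to 1 whenever the source class occurs in the first image; a plain Markov chain would then reuse this matrix unchanged for every future step.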
  • 29. Results: Land Change Modelling Experiments. • TerrSet's Land Change Modeler. • Transition sub-models were defined for four LC change types, i.e., V to S, V to I, S to V, and S to I. • Slope gradient and the primary roads layer were used as the primary driver variables. suitability = 1 / (slope gradient)^0.1 • Suitability map: the greater the value, the higher the suitability, and vice versa. • Suitability for urbanization is high along roads, in the low-lying river basin, and around already urbanized areas where the slope gradient is low. • Towards the south end, suitability drops significantly, as the area has hills and valleys. • The four sub-models were built using Logistic Regression.
  • 30. Results. Heat maps depicting transition probabilities from one state to another: Soil to Impervious, Soil to Vegetation, Vegetation to Impervious, Vegetation to Soil.
  • 31. Results. • The two models were then used to predict changes for the year 2014. Figures: actual land cover image of 2014 obtained from classification; predicted land cover image of 2014 (HMM-LR); predicted land cover image of 2014 (MC-LR). • Visually, the HMM-based predicted image is clearly more similar to the actual classified LC image than the MC-based predicted image.
  • 32. Results. • Blob analysis of urban and non-urban regions (figure, left to right: (i) Actual, (ii) MC-LR, (iii) HMM-LR). Blobs denote concentrated urban regions. • Green blobs are true positives, blue blobs are false negatives, and red blobs are false positives. • HMM-LR false positives are smaller in size and less dense than those of MC-LR. The HMM output is well balanced and resembles the actual output better. • Precision for the persistence of Impervious Surface (I) rises by 11 percentage points (0.38 to 0.49). • Precision for the Soil (S) class rises by 26 percentage points (0.34 to 0.60). • Precision for the Vegetation (V) class drops by a marginal 6 percentage points (0.54 to 0.48). This is because vegetation cover is the outcome of a relatively easy process compared with S and I. Precision and recall for integrated models:
      Metric | HMM-LR V | HMM-LR I | HMM-LR S | MC-LR V | MC-LR I | MC-LR S
      Precision | 0.48 | 0.49 | 0.60 | 0.54 | 0.38 | 0.34
      Recall | 0.48 | 0.52 | 0.59 | 0.54 | 0.32 | 0.39
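Per-class precision and recall of this kind can be computed by comparing the actual and predicted label of every pixel. A minimal sketch, assuming both images are flattened into equal-length label lists; the toy labels are made up and do not reproduce the figures reported on the slide.

```python
def precision_recall(actual, predicted, label):
    """Precision and recall of one class from paired pixel labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == label and p == label for a, p in pairs)  # correctly predicted
    fp = sum(a != label and p == label for a, p in pairs)  # predicted but wrong
    fn = sum(a == label and p != label for a, p in pairs)  # missed by the model
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy example with six pixels (illustrative labels only).
actual = ["V", "V", "I", "I", "S", "S"]
predicted = ["V", "I", "I", "I", "S", "V"]
scores = {c: precision_recall(actual, predicted, c) for c in ("V", "I", "S")}
```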
  • 33. Conclusion. • Markov Chain (MC) models are limited in their urban prediction capabilities because they assume a constant rate of persistence of land cover class types and cannot model the temporal factors. • We have proposed a new temporal model using a Hidden Markov Model. • We have demonstrated the usefulness of our model over MC by predicting urban growth for an upcoming city of India (Pune). • We believe that this inquiry into HMM-based models provides yet another tool that will equip urban modelers, planners, and decision makers to better design sustainable urban environments. • Precision improved by 11 and 26 percentage points for the Impervious Surface and Soil classes respectively. https://www.researchgate.net/publication/327745849_Computational_Model_for_Urban_Growth_Using_Socioeconomic_Latent_Parameters
  • 34. Open Data
  • 35. Open Data
  • 36. 10030 112 https://data.gov.ie/stats
  • 37. How is Open Data being used? Engagement / Innovation: https://www.mapalerter.com/ Data Modelling / Decision-Making: http://exceedence.com/monetising-metocean-data-an-open-data-project/
  • 38. Monitoring / Planning: Quality and Qualifications Ireland, http://infographics.qqi.ie/ Sustainability / Mobility: https://citybik.es/
  • 39. Open Data Management Challenge
  • 40. From Data to Smart Data. Diagram labels: Data Sources; Predictive Analytics; User Awareness; Recommendations; Smart Apps; Open Data Management; Data Modeling; Collection; Aggregation; Enrichment; Linking; Classification; Cleaning; Integration; Storing; Querying. Is this data good enough for creating accurate and reliable apps?
  • 41. Open Data Management Challenge. Open Data quality can be very challenging when designing apps and decision support models. Open Data can have multiple issues: missing values, different formats, irregular timestamps, abnormal values, etc. Data preparation, such as filtering and classification, is an important step before further analysis. Data is often incomplete and requires combining multiple data sources.
  • 42. Case Specifics
  • 43. Data Preparation for Building a Map of Playing Pitches around Dublin. The data is available on https://data.gov.ie Challenge: Different Formats.
  • 44. And even more challenges! Different Formats; Different Attributes; Missing Values.
  • 45. And even more challenges! Different Formats; Different Attributes; Missing Values. Objective: create a good-quality dataset from these resources!
  • 46. What is good quality data? A conventional definition of data quality: good quality data are Accurate, Complete, Unique, Up-to-date, and Consistent, meaning ...
  • 47. Accurate means ... Are we storing correct values? ➡ Values in the data entries should be consistent: same form or value representation. Example: What issues can you identify from this table?
      Sensor | Timestamp | Value | Location
      M1n | 12/01/2018T10:03:59 | 12.3 | Galway
      M3n | 1452592980000 | 9.5 | GA
      M5n | 01/12/2018 10:03 | 1.55 | NUIG
  • 48. Accurate means ... Possible solution: create a Unified Data Model. Do you have access to the data source? Yes: adjust the sources to send data using your model. No: convert your data before further processing.
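When the sources cannot be changed, the conversion step can be sketched as a small normaliser that maps every incoming timestamp onto one representation (here ISO 8601 in UTC). This is only an illustration: the per-sensor formats below, including the day/month order and the UTC assumption, are guesses based on the example table and would need to be confirmed with the data publisher.

```python
from datetime import datetime, timezone

# Assumed text formats per sensor (hypothetical; day/month order is a guess).
TEXT_FORMATS = {
    "M1n": "%d/%m/%YT%H:%M:%S",
    "M5n": "%m/%d/%Y %H:%M",
}


def to_iso_utc(sensor, raw):
    """Convert a raw timestamp to ISO 8601, treating all values as UTC."""
    if raw.isdigit():
        # All-digit strings are assumed to be epoch milliseconds.
        moment = datetime.fromtimestamp(int(raw) / 1000, tz=timezone.utc)
    else:
        parsed = datetime.strptime(raw, TEXT_FORMATS[sensor])
        moment = parsed.replace(tzinfo=timezone.utc)
    return moment.isoformat()


rows = [
    ("M1n", "12/01/2018T10:03:59"),
    ("M3n", "1452592980000"),
    ("M5n", "01/12/2018 10:03"),
]
unified = [to_iso_utc(sensor, raw) for sensor, raw in rows]
```

Once every row shares one format, differences that were hidden by the formatting become visible, for example the epoch value here falls in a different year from the other two rows.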
  • 49. Complete means ... Does the data contain everything it is supposed to contain? Example: What issues can you identify from this table?
      Sensor | Timestamp | Value | Location
      M1n | 08/01/2018T00:00:00 | 32.5 | NEB, NUIG
      M1n | 09/01/2018T00:00:00 | 21.2 |
      M1n | 10/01/2018T00:00:00 | 26.1 | NEB, NUIG
      M1n | 12/01/2018T00:00:00 | 23.5 | NEB, NUIG
      M1n | 13/01/2018T00:00:00 | | NEB, NUIG
      M1n | 14/01/2018T00:00:00 | 26.1 | NEB, NUIG
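A completeness check for a table like this can look for two things: days that are absent from an otherwise daily series, and rows with empty fields. A minimal sketch, assuming day-first dates and a one-reading-per-day schedule (neither is stated on the slide); `None` stands in for an empty cell.

```python
from datetime import date, timedelta

# date -> (value, location); None marks a missing field.
readings = {
    date(2018, 1, 8): (32.5, "NEB, NUIG"),
    date(2018, 1, 9): (21.2, None),
    date(2018, 1, 10): (26.1, "NEB, NUIG"),
    date(2018, 1, 12): (23.5, "NEB, NUIG"),
    date(2018, 1, 13): (None, "NEB, NUIG"),
    date(2018, 1, 14): (26.1, "NEB, NUIG"),
}


def completeness_report(data):
    """Return (days with no row at all, days whose row has an empty field)."""
    days = sorted(data)
    span = (days[-1] - days[0]).days + 1
    expected = [days[0] + timedelta(days=i) for i in range(span)]
    missing_days = [d for d in expected if d not in data]
    missing_fields = [d for d in days if None in data[d]]
    return missing_days, missing_fields


missing_days, missing_fields = completeness_report(readings)
```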
  • 50. Unique means ... Do the data entries appear only once? ➡ This issue generally appears when manual entries are allowed in the dataset. Example: What issues can you identify from this table?
      Surname | Firstname | DoB | Driving test passed
      Smith | J. | 17/12/85 | 17/12/05
      Smith | Jack | 17/12/85 | 17/12/2005
      Smith | Jock | 17/12/95 | 17/12/2005
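One way to surface candidate duplicates is to normalise each record to a comparison key and group records that share it. The sketch below is an assumption-laden illustration: the key (surname, first initial, both dates) is a hypothetical choice, dates are assumed to be day-first, and two-digit years follow Python's `%y` convention.

```python
from collections import defaultdict
from datetime import datetime

records = [
    ("Smith", "J.", "17/12/85", "17/12/05"),
    ("Smith", "Jack", "17/12/85", "17/12/2005"),
    ("Smith", "Jock", "17/12/95", "17/12/2005"),
]


def parse_date(text):
    """Accept day-first dates with either a four-digit or two-digit year."""
    for fmt in ("%d/%m/%Y", "%d/%m/%y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def candidate_duplicates(rows):
    """Group rows that share surname, first initial, and both dates."""
    groups = defaultdict(list)
    for surname, first, dob, passed in rows:
        key = (surname.lower(), first[0].lower(), parse_date(dob), parse_date(passed))
        groups[key].append((surname, first))
    return [members for members in groups.values() if len(members) > 1]


dupes = candidate_duplicates(records)
```

An exact key like this does not catch the third row, which may be a typo of the same person or a different driver; deciding that needs fuzzy matching or manual review.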
  • 51. Consistent means ... Does the data contain any logical errors or impossibilities? Example: What issues can you identify from this table? Are these errors? How can we identify them? ➡ Possible solutions: filtering and outlier detection.
      Sensor | Timestamp | Value | Location
      M1n | 08/01/2018T00:00:00 | 32.5 | NEB, NUIG
      M1n | 09/01/2018T00:00:00 | 21.2 | NEB, NUIG
      M1n | 10/01/2018T00:00:00 | 0 | NEB, NUIG
      M1n | 11/01/2018T00:00:00 | 23.5 | NEB, NUIG
      M1n | 12/01/2018T00:00:00 | -1.23 | NEB, NUIG
      M1n | 13/01/2018T00:00:00 | 26.1 | NEB, NUIG
  • 52. Up-to-Date means ... Is the data updated regularly? A sensor moved to a new location. What implications can this have? Can you think of a case where it doesn't matter whether or not the data are kept up to date?
  • 53. Techniques for Data Preparation
  • 54. Minimal Data Preparation Pipeline. Observation: understanding the format of the data and its elements. Modeling: identify relevant attributes and representation format. Quality Enhancement: classification, aggregation, filtering, enrichment, etc.
  • 55. Step 1: Observation. This step involves the descriptive analysis (auditing) of individual data resources. Data observations can be: – Highly structured: using a predefined checklist of observational attributes (e.g., format, attributes, frequency, volume, language, etc.) – Semi-structured: using an ad hoc checklist of observational attributes. • Pros: – Defines contextual information about the data – Provides good and early insights into data quality issues • Cons: – Can be time consuming
  • 56. Step 2: Modeling. This step involves the use of formal techniques for creating a data model. Examples of techniques: object-relational mapping, relational model, etc. Methodologies: – Top-down: predefined information about the data – Bottom-up: results from a reengineering effort
  • 57. Step 3.1: Classification. Data classification is the process of organizing data by categories for refined and targeted analysis. ➡ Example: water or energy consumption for working days vs. non-working days. ➡ Categories depend on the intended use of the data.
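The working-day example can be sketched as a function that assigns each observation to a category before any further analysis. This is an illustration only: the consumption figures are invented, and weekends are the sole criterion, so public holidays would need an extra lookup.

```python
from collections import defaultdict
from datetime import date


def day_category(day):
    """Saturday and Sunday count as non-working days (holidays are ignored)."""
    return "non-working" if day.weekday() >= 5 else "working"


# Made-up daily consumption values, Friday 12 to Monday 15 January 2018.
consumption = {
    date(2018, 1, 12): 310.0,
    date(2018, 1, 13): 180.0,
    date(2018, 1, 14): 170.0,
    date(2018, 1, 15): 305.0,
}

by_category = defaultdict(list)
for day, value in consumption.items():
    by_category[day_category(day)].append(value)
```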
  • 58. Step 3.2: Aggregation. Data aggregation is a data mining process that summarizes the data with respect to certain criteria/dimensions. Data aggregations help increase search performance and facilitate data reporting and analysis. Types of aggregations: Sum, Count, Min/Max, Avg, etc. Aggregation strategies and levels: temporal (hourly, daily, etc.), source-based (resource hierarchy), location-based (outlet, room, area, building), etc. The level of aggregation depends on the available data and its intended use. Examples of useful aggregations: hourly traffic congestion level per road; quarterly price inflation.
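A temporal aggregation from hourly samples to daily summaries might look like the sketch below, which computes the aggregation types listed above for each day. The sample values and the function name are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Made-up hourly samples: (timestamp, value).
hourly = [
    (datetime(2018, 1, 8, 9), 4.0),
    (datetime(2018, 1, 8, 10), 6.0),
    (datetime(2018, 1, 9, 9), 3.0),
    (datetime(2018, 1, 9, 10), 5.0),
    (datetime(2018, 1, 9, 11), 7.0),
]


def aggregate_daily(samples):
    """Summarise samples per calendar day with sum, count, min, max, and avg."""
    buckets = defaultdict(list)
    for stamp, value in samples:
        buckets[stamp.date()].append(value)
    return {
        day: {
            "sum": sum(values),
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "avg": mean(values),
        }
        for day, values in sorted(buckets.items())
    }


daily = aggregate_daily(hourly)
```

Changing the bucket key (for example to a room or building identifier) gives the location-based variant of the same idea.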
  • 59. Step 3.3: Filtering. Data filtering is the process of refining data sets by removing data items that do not comply with certain criteria. Example: keep data with positive water consumption values. Filters depend on the context of the observations (negative values may be meaningful in installations where water flows in both directions in a pipe). Filtering types: Content-based Filtering – Selecting data items based on their values (e.g., keep only positive values). Policy-based Filtering – Filtering rules are defined as constraints similar to access control mechanisms (e.g., for security reasons). Statistical Filtering – Identify a baseline for a content-based filter – Baselines are determined from historical data analysis – Outlier detection. Hybrid Filtering – Combination of filtering options.
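Content-based and statistical filtering can be contrasted on a short series of sensor values. This is a minimal sketch under stated assumptions: the historical values are invented, and the baseline of mean plus or minus three standard deviations is one common convention rather than something the slides prescribe.

```python
from statistics import mean, stdev


def content_filter(values):
    """Content-based: keep only strictly positive readings."""
    return [v for v in values if v > 0]


def statistical_filter(values, history, k=3.0):
    """Statistical: keep readings within k standard deviations of a baseline."""
    centre, spread = mean(history), stdev(history)
    low, high = centre - k * spread, centre + k * spread
    return [v for v in values if low <= v <= high]


observed = [32.5, 21.2, 0, 23.5, -1.23, 26.1]
history = [25.0, 24.0, 26.0, 23.0, 27.0]  # hypothetical past readings

positive_only = content_filter(observed)
within_baseline = statistical_filter(observed, history)
```

A hybrid filter would simply chain the two; whether a flagged value is an error or a correct but exceptional reading still depends on the context of the installation.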
  • 60. Step 3.3: Filtering. Outlier Detection. Value inconsistent with the rest of the dataset – Global Outlier. Special outliers – Local Outlier • Observations inconsistent with their neighborhoods • A local instability or discontinuity. Causes of Outliers: ➢ Low quality measurements: faulty collectors, manual errors, wrong calibration of devices ➢ Network issues: problems with data transmission from data sources to the data management platform ➢ Missing values or redundant values: can create wrong aggregations ➢ Correct but exceptional data!
  • 61. Outlier Detection Approaches. Deviation-based outlier detection – Sequential exception. Distance-based outlier detection – Index-based, nested-loop, cell-based, local outliers. Statistical-based outlier detection – Distribution-based, depth-based.
  • 62. Distance-based Outlier Detection. • General idea: – Judge a point based on the distance to its neighbors – Several variants proposed. • Basic assumption: – Normal data objects have a dense neighborhood – Outliers are far apart from their neighbors. • Basic model: – Given a radius ε and a percentage π – A point p is considered an outlier if at most π percent of all other points have a distance to p less than ε
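The basic model can be sketched with a nested loop in its usual formulation, where a point is flagged when only a small share of the other points (at most π percent) lies within radius ε of it. The parameter names `radius` and `max_percent` and the toy points are illustrative assumptions; the quadratic loop is fine for small data but would need an index-based or cell-based variant at scale.

```python
from math import dist


def distance_based_outliers(points, radius, max_percent):
    """Flag points with at most max_percent of other points closer than radius."""
    outliers = []
    for i, p in enumerate(points):
        others = [q for j, q in enumerate(points) if j != i]
        if not others:
            continue  # a single point has no neighbours to compare against
        near = sum(dist(p, q) < radius for q in others)
        if 100.0 * near / len(others) <= max_percent:
            outliers.append(p)
    return outliers


# Four clustered points and one isolated point (toy data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
found = distance_based_outliers(points, radius=2.0, max_percent=25.0)
```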
  • 63. Step 3.4: Enrichment. This step supplements the data with additional information. Possible techniques: – Additional information can be accessed from other resources – Use of services such as translation, value conversion, adding a zip code, etc. – [In case of semantic linked data] Linking to other concepts through new predicates.
  • 64. Summary. Discussed Land Use Land Cover. Discussed satellite imaging and classification. Discussed a case study on Urban Growth Modelling. Discussed the challenges of developing decision support systems with Open Data (e.g., need for accurate, trusted information). Explained the nature and types of data issues in (Open) Data: different formats, missing values, etc. Discussed techniques for identifying data quality issues. Discussed data preparation and cleaning strategies (e.g., data clustering, filtering, etc.). Identified a minimal data preparation pipeline.
  • 65. Assigned Reading. Rahm, Erhard, and Hong Hai Do. "Data cleaning: Problems and current approaches." IEEE Data Eng. Bull. 23.4 (2000): 3-13. https://landsat.gsfc.nasa.gov/pdf_archive/How2make.pdf
  • 66. Acknowledgments. I created this material from several resources: – https://study.com/academy/lesson/geospatial-data-definition-example.html – Data from USGS – http://www.splibtarang.com/index.php – Yadav, Piyush, Shamsuddin Ladha, Shailesh Deshpande, and Edward Curry. "Computational Model for Urban Growth Using Socioeconomic Latent Parameters", in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 65-78. Springer, Cham, 2018 – NASA, Landsat Website – Data from https://data.gov.ie – A ppt by David Corn, "Data Quality and Data Cleaning1" – A ppt by Eric Poulin and Colin Yu, "Outlier Detection and Analysis" – A paper by Erhard Rahm and Hong Hai Do, "Data Cleaning: Problems and Current Approaches" – A ppt by Cameron Brooks, "Lets Build a Smarter Planet: IBM Smarter Water Management"
  • 67. Further Reading. For further reading I recommend the following books: Book Link
  • 68. Assignments. Group assignment, 100 marks in total, two sections. Section 1 (30 marks) – Objective: classify the given Landsat images of a Dublin region for two different years using QGIS, and identify one major change between the two images – Marking scheme: Report 100% (30 marks). Section 2 (70 marks) – Objective: create a complete and clean dataset by merging three datasets – Dataset: real-world data from https://data.gov.ie • Playing pitches around Dublin • Multiple formats (a minimum of 2 is required) • Data completion using other sources – Tools: Python or Java – Marking scheme: • Report 50% (35 marks) • Code/Analytics 50% (35 marks)
  • 69. Guidelines for Groups. Two people in each group. Fill in the group information by 21st Jan, 5pm (link given below). Anyone who does not fill it in will be assigned to a random group. For any questions, email me at piyush.yadav@insight-centre.org Assignment due: Jan 30th, midnight. https://docs.google.com/spreadsheets/d/1eTwNF6-OqvSGKZtv0WWREgRjnKt8w_OATEH6b18unJQ/edit?usp=sharing