Information fusion methods for location data analysis
Candidate: Alket Cecaj
Supervisor: Prof. Marco Mamei
Doctorate School in Industrial Innovation Engineering
Thesis outline
• Introduction
• Data Fusion for Event Detection and Event Description Using Aggregated CDR
• Re-identification of Anonymized CDR Records Using Information Fusion
• Privacy issues
• Conclusions
Data Fusion and Location data
• Data Fusion
• Location Data types:
- CDR (Call Detail Records), aggregated or individual
- Geo-tagged social network data or location-based services (LBS) such as Foursquare
- Open location data, e.g. census data
Data fusion for event detection using aggregated CDR and geo-tagged social network data
• Detecting and describing events happening in urban areas by analysing spatio-temporal data
• Previous work: Laura Ferrari, Marco Mamei, Massimo Colonna (2012): “People get together on special events: Discovering happenings in the city via cell network analysis”. 2012 IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops).
• Publication: Alket Cecaj, Marco Mamei (2016): “Data Fusion for City Life Event Detection”. Journal of Ambient Intelligence and Humanized Computing, pp. 1–15.
The dataset: spatio-temporal aggregation
• Spatial aggregation
• Temporal aggregation
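The slide does not say how raw records are binned, so the following is only an illustrative sketch under stated assumptions. Each raw record is a hypothetical `(lat, lon, timestamp)` tuple. Space is divided into a square grid of `cell_deg` degrees and time into one-hour slots. The output is an activity count per (cell, hour), presumably the kind of aggregated series that the outlier detection step works on. `cell_of`, `hour_slot` and `aggregate` are hypothetical names.

```python
from collections import Counter
from datetime import datetime


def cell_of(lat, lon, cell_deg=0.01):
    """Index of the square grid cell containing (lat, lon).

    Hypothetical grid: real CDR data is usually aggregated by antenna
    coverage area or by the operator's own grid, not by a lat/lon grid.
    """
    return (int(lat // cell_deg), int(lon // cell_deg))


def hour_slot(ts):
    """Truncate a timestamp to the start of its hour (temporal aggregation)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def aggregate(records, cell_deg=0.01):
    """Count records per (cell, hour) pair (spatio-temporal aggregation)."""
    counts = Counter()
    for lat, lon, ts in records:
        counts[(cell_of(lat, lon, cell_deg), hour_slot(ts))] += 1
    return counts


if __name__ == "__main__":
    records = [
        (44.6471, 10.9252, datetime(2013, 5, 1, 20, 15)),
        (44.6473, 10.9255, datetime(2013, 5, 1, 20, 45)),
        (44.7000, 10.9300, datetime(2013, 5, 1, 21, 5)),
    ]
    for key, n in aggregate(records).items():
        print(key, n)
```

The first two sample records share a cell and an hour, so they collapse into one count of 2. Finer cells or shorter slots trade statistical stability for spatial and temporal resolution.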
Outlier detection method
Activity values falling outside [LB, UB] are flagged as outliers.
• IQR method: [LB, UB] = [Q25 - k·IQR, Q75 + k·IQR], where IQR = Q75 - Q25
• M method: [LB, UB] = [Q50 - k·Q50, Q50 + k·Q50]
• Q75 method: [LB, UB] = [Q25 - k·Q25, Q25 + k·Q75]
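Here is a minimal sketch of the IQR and M bounds. It assumes each cell and time slot has a history of past activity counts, and that a new count outside [LB, UB] is treated as an outlier. Quartiles come from Python's `statistics.quantiles(..., method="inclusive")`. The choice of quantile estimator and the value of `k` are assumptions, and `detection_bounds` and `is_outlier` are hypothetical names. The Q75 method can be added as a third branch in the same way.

```python
from statistics import quantiles


def detection_bounds(history, k, method="iqr"):
    """Return (LB, UB) computed from past activity counts of one cell/time slot."""
    # Q25, Q50 and Q75 of the historical counts
    q25, q50, q75 = quantiles(history, n=4, method="inclusive")
    if method == "iqr":
        iqr = q75 - q25
        return q25 - k * iqr, q75 + k * iqr
    if method == "m":
        return q50 - k * q50, q50 + k * q50
    raise ValueError(f"unknown method: {method!r}")


def is_outlier(value, history, k, method="iqr"):
    """Flag a new count that falls outside [LB, UB]."""
    lb, ub = detection_bounds(history, k, method)
    return value < lb or value > ub
```

For example, with the history `[1, 2, 3, 4, 5]` and `k = 1.5`, the IQR bounds are (-1.0, 7.0), so a new count of 8 would be flagged. A larger `k` produces fewer, more extreme detections.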
Ground-truth dataset
Events that happened during the period covered by the data:
• Football matches
• Fairs
• Protests
• Other events drawing large crowds
Measuring precision and recall of the system
• True positives (tp)
• False positives (fp)
• False negatives (fn)
Precision = tp / (tp + fp)
Recall = tp / (tp + fn)
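The two formulas can be computed from sets of detected and ground-truth events. This sketch assumes that a detection matches a ground-truth event only when their hashable keys are identical, for example the same (area, date) pair. The actual matching criterion may allow spatial or temporal tolerance. `precision_recall` is a hypothetical helper.

```python
def precision_recall(detected, ground_truth):
    """Precision and recall of detected events against a ground-truth set.

    Events are compared by exact equality of hashable keys, e.g. (area, date).
    """
    detected, ground_truth = set(detected), set(ground_truth)
    tp = len(detected & ground_truth)  # detected and real
    fp = len(detected - ground_truth)  # detected but not real
    fn = len(ground_truth - detected)  # real but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    detected = {("stadium", "2013-05-05"), ("fair", "2013-05-10"), ("center", "2013-05-12")}
    truth = {("stadium", "2013-05-05"), ("fair", "2013-05-10"),
             ("square", "2013-05-18"), ("stadium", "2013-05-19")}
    print(precision_recall(detected, truth))
```

In the hypothetical data above there are 2 true positives, 1 false positive and 2 false negatives. That gives a precision of 2/3 and a recall of 0.5.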
Precision-recall of the event detection system: CDR
Improving event detection results by data fusion
Combining the results from the two datasets:
• The precision-recall performance of the method improves.
• In the long run, the improvement is limited by the main dataset.
• The same improvement is also observed when joining the results of the other datasets.
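The slide does not state the exact combination rule. As an illustration only, two simple set rules for fusing the event lists detected from CDR and social data are sketched below. A union keeps an event found by either source. An intersection keeps only events confirmed by both. The names `fuse` and `precision_recall` and the toy data are hypothetical.

```python
def precision_recall(detected, ground_truth):
    """Precision = tp / (tp + fp), recall = tp / (tp + fn), with exact key matching."""
    detected, ground_truth = set(detected), set(ground_truth)
    tp = len(detected & ground_truth)
    precision = tp / len(detected) if detected else 0.0  # len(detected) = tp + fp
    recall = tp / len(ground_truth) if ground_truth else 0.0  # len(truth) = tp + fn
    return precision, recall


def fuse(cdr_events, social_events, rule="union"):
    """Combine two sets of detected events with a simple set rule."""
    a, b = set(cdr_events), set(social_events)
    if rule == "union":  # an event detected by either source
        return a | b
    if rule == "intersection":  # an event confirmed by both sources
        return a & b
    raise ValueError(f"unknown rule: {rule!r}")


if __name__ == "__main__":
    cdr, social, truth = {"A", "B", "X"}, {"A", "C", "Y"}, {"A", "B", "C", "D"}
    print(precision_recall(cdr, truth))                           # CDR alone
    print(precision_recall(fuse(cdr, social, "union"), truth))
    print(precision_recall(fuse(cdr, social, "intersection"), truth))
```

On this toy data, each source alone scores 2/3 precision and 0.5 recall. The union raises recall to 0.75, and the intersection raises precision to 1.0. Which trade-off suits the thesis setting is not stated on the slide.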
Data fusion for event description
Using CDR data, events can be detected but not described:
• By joining the results, the datasets complement and enrich each other.
• In this case, the social dataset can be used to describe the events semantically.
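One plausible way to let the social data describe CDR-detected events is sketched here under several assumptions. Geo-tagged posts are taken to be already mapped to the same (cell, hour) keys as the detections. Each post is reduced to a list of terms, such as hashtags or venue categories. A detected event is then described by the most frequent terms among posts sharing its key. `describe_events` and this data layout are hypothetical.

```python
from collections import Counter, defaultdict


def describe_events(events, posts, top_n=3):
    """Attach the most frequent social terms to each detected event.

    events: iterable of (cell, hour) keys detected from CDR data.
    posts:  iterable of (cell, hour, terms) from geo-tagged social data.
    """
    terms_by_key = defaultdict(Counter)
    for cell, hour, terms in posts:
        terms_by_key[(cell, hour)].update(terms)
    # Events with no matching posts get an empty description
    return {ev: [t for t, _ in terms_by_key[ev].most_common(top_n)] for ev in events}


if __name__ == "__main__":
    events = [("c1", 20), ("c9", 21)]
    posts = [("c1", 20, ["football", "stadium"]),
             ("c1", 20, ["football"]),
             ("c2", 20, ["concert"])]
    print(describe_events(events, posts))
```

The matching here is exact on (cell, hour). A real system would probably tolerate small spatial and temporal offsets between the two sources.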
Re-identification of CDR data using geo-tagged social network data
Information fusion for the de-anonymization of anonymized CDR data.
de Montjoye, Y.-A. et al. (2013). “Unique in the Crowd: The privacy bounds of human mobility”. Scientific Reports 3, 1376.
Cecaj, A., Mamei, M., Zambonelli, F. (2015). “Re-identification and Information Fusion Between Anonymized CDR and Social Network Data”. Journal of Ambient Intelligence and Humanized Computing, pp. 1–14.
CDR and Social: event distribution and R.G
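This sketch assumes that "R.G" on this slide denotes the radius of gyration. That is a common mobility measure of how far a user typically moves from the centre of mass of their visited locations: r_g = sqrt((1/n) Σ |r_i - r_cm|²). The sketch also assumes that points are already projected to planar coordinates in km. `radius_of_gyration` is a hypothetical helper.

```python
from math import sqrt


def radius_of_gyration(points):
    """Radius of gyration of a user's visited locations.

    `points` are (x, y) coordinates in km on a local planar projection
    (an assumption: raw lat/lon would first need projecting).
    """
    n = len(points)
    if n == 0:
        raise ValueError("at least one point is required")
    # Centre of mass of the visited locations
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Root mean squared distance from the centre of mass
    return sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)
```

For two locations 2 km apart the radius of gyration is 1 km. A user seen at a single place has a radius of 0.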
Mobility measures and uniqueness of users' mobility (“unique in the crowd”)
Knowledge extraction: uniqueness of traces
Knowledge extraction: uniqueness of mobility traces
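Following the "unique in the crowd" idea, a trace is unique given p points if no other user's trace contains all of those points. The sketch below estimates the fraction of users singled out by p points drawn at random from their own trace. It assumes traces are sets of hypothetical `(cell, hour)` points. `is_unique` and `uniqueness` are illustrative names, not the thesis code.

```python
import random


def is_unique(points, traces, owner):
    """True if `owner` is the only user whose trace contains every point."""
    matches = [user for user, trace in traces.items() if points <= trace]
    return matches == [owner]


def uniqueness(traces, p, seed=0):
    """Fraction of users (with at least p points) singled out by p of their own points."""
    rng = random.Random(seed)
    eligible = [user for user, trace in traces.items() if len(trace) >= p]
    unique = 0
    for user in eligible:
        # Draw p points at random from the user's own trace
        points = set(rng.sample(sorted(traces[user]), p))
        if is_unique(points, traces, user):
            unique += 1
    return unique / len(eligible) if eligible else 0.0


if __name__ == "__main__":
    traces = {"u1": {("c1", 8), ("c2", 9)},
              "u2": {("c1", 8), ("c3", 9)},
              "u3": {("c1", 8)}}
    print(uniqueness(traces, 1), uniqueness(traces, 2))
```

With one point, ("c1", 8) is shared by all three users and identifies nobody. Each two-point trace in the example identifies its owner. More points generally mean more unique traces.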
Re-identification: probabilistic approach
• Given that CDR user Ci has Ni events (points) in common with FTi, how likely is it that the two users are the same person?
• The question is both novel (no other work addresses it in this domain) and fundamental.
• Conditional probability
• Even if the percentage is low, in a dataset of millions of users a considerable number of them can be identified.
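One simple way to estimate the conditional probability P(same user | N common points) is as a relative frequency over labelled candidate pairs. This requires a sample in which it is known whether a CDR user and a social-network user are the same person. The estimator and the name `p_same_given_common` are assumptions for illustration, not necessarily the model used in the thesis.

```python
from collections import defaultdict


def p_same_given_common(pairs):
    """Estimate P(same user | n common points) from labelled candidate pairs.

    `pairs` yields (n_common, is_same): the number of points a CDR user and a
    social-network user share, and whether they are known to be the same person.
    """
    totals = defaultdict(int)
    same = defaultdict(int)
    for n_common, is_same in pairs:
        totals[n_common] += 1
        same[n_common] += int(is_same)
    # Relative frequency of true matches among pairs with n common points
    return {n: same[n] / totals[n] for n in sorted(totals)}


if __name__ == "__main__":
    pairs = [(1, False), (1, False), (1, True), (1, False),
             (3, True), (3, True), (3, False)]
    print(p_same_given_common(pairs))
```

In this toy sample, one common point gives a match probability of 0.25, while three give about 0.67. Applied to millions of users, even a low probability can still translate into many identified individuals.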
Conclusions
• Information fusion as an enabling process for novel applications
- Future work oriented towards the “structured data fusion” idea
• Privacy
- Anonymity vs. re-identification and the remaining utility of the data
- Variations of existing privacy-preserving techniques (e.g. differential privacy)
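Differential privacy appears here only as a direction for future work. As an illustration, the following sketch applies the Laplace mechanism to a single aggregated count, such as the activity in one cell and hour. It assumes each user changes that count by at most `sensitivity` (1 by default). `laplace_count` is a hypothetical helper, and the sketch ignores privacy-budget accounting across repeated releases.

```python
import random


def laplace_count(count, epsilon, sensitivity=1.0, rng=random):
    """Release `count` with Laplace noise calibrated to epsilon-differential privacy.

    Assumes one user can change the count by at most `sensitivity`.
    """
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale)
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return count + noise


if __name__ == "__main__":
    rng = random.Random(42)
    print([round(laplace_count(120, epsilon=0.5, rng=rng), 1) for _ in range(5)])
```

A smaller `epsilon` gives stronger privacy but noisier counts. This is the anonymity versus utility trade-off listed above.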
Publications
• Bicocchi, N., Cecaj, A., Fontana, D., Mamei, M., Sassi, A., Zambonelli, F. (2013). “Collective Awareness for Human-ICT Collaboration in Smart Cities”. IEEE WETICE, International Conference on State-of-the-Art Research in Enabling Technologies for Collaboration, 17–20 2013.
• Cecaj, A., Mamei, M., Bicocchi, N. (2014). “Re-identification of Anonymized CDR Datasets Using Social Network Data”. IEEE PerCom, International Conference on Pervasive Computing and Communications, Budapest, Hungary, 24–28 2014.
• Cecaj, A., Mamei, M. (2016). “Data Fusion for City Life Event Detection”. Journal of Ambient Intelligence and Humanized Computing, pp. 1–15.
• Bicocchi, N., Cecaj, A., Fontana, D., Mamei, M., Sassi, A., Zambonelli, F. (2014). “Social Collective Awareness in Socio-Technical Urban Superorganisms”. In: Social Collective Intelligence: Combining the Powers of Humans and Machines to Build a Smarter Society, Part III: Applications and Case Studies, p. 227.
• Cecaj, A., Mamei, M., Zambonelli, F. (2015). “Re-identification and Information Fusion Between Anonymized CDR and Social Network Data”. Journal of Ambient Intelligence and Humanized Computing, pp. 1–14.


Hinweis der Redaktion

  1. Lo scopo dell mio lavoro di tesi è quello di : 1- sviluppare delle tecniche di data fusion per dati geo-referenziati. Questo lavoro, se da una parte ha permesso di sviluppare applicazioni per arricchire i data set stessi dal altra ha fatto emergere problematiche di privacy che derivano dal processo di data fusion. 2- Questo lavoro,da un lato ha permesso di 2.1- sviluppare diverse applicazioni per arricchire i data set stessi e 2.2- dall’altro ha evidenziato alcuni problemi di privacy che derivano dal processo di data fusion.
  2. The thesis is structured as follows. After an introductory part, it presents a study on the automatic detection of large events in urban areas using aggregated mobile phone data and geo-referenced social data. It then moves from aggregated data to anonymized CDR data, which show individual mobility traces; this part shows in particular how data fusion with such data can affect privacy. Finally, together with the conclusions, it presents several open issues in both data fusion and privacy preservation.
  3. Data fusion is the process of combining and integrating multiple datasets: the datasets are analyzed together so that each one can interact with, inform and complement the others. As for geo-referenced data, the first type is CDRs (Call Description Records), which come in two formats: (a) activity levels (calls, SMS or data connections) in a given area, in aggregated form; (b) data showing individual mobility traces. Other sources of location data are geo-referenced social data and open data, for example census data.
  4. 1- I start with the first data fusion application, an event detection system that uses aggregated CDR data. City managers and local authorities often need to understand what is happening in a given area of the city (sometimes urgently, as in an emergency), or simply to understand the dynamics of an urban area in terms of traffic, air pollution, people's movements, etc., and act to improve them. 2- This study follows that direction: its goal is an application that automatically detects events in urban areas by analyzing aggregated CDR data together with geo-referenced social data. 3- Related work in this area: Ferrari, Mamei, Colonna (2012), presented at the PerCom 2012 workshops. 4- This work was published in the «Journal of Ambient Intelligence and Humanized Computing».
  5. 1- Aggregated CDRs (Call Description Records) show activity levels, in terms of incoming and outgoing calls and SMS, in a given area. 2- The data were provided during a Big Data Challenge organized by TIM (Telecom Italia) in 2014 and cover two cities, Milan and Trento. 3- The plot shows, over a two-month period, the activity levels of a grid cell near a stadium where sporting events typically take place at weekends.
  6. For our event detection approach we aggregated the data both spatially and temporally. Spatial aggregation helps in two ways: 1- if the area where an event takes place is split across several cells, aggregation lets us identify the event area with one or at most two cells; 2- it reduces computation, since with fewer cells events can be detected in less time. Temporal aggregation, in turn, lets us approximate the probability density of a cell's activity levels, which is bimodal as in (a), with a normal distribution as in (d), by aggregating the data on an hourly basis and distinguishing between weekdays and weekends.
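The two aggregation steps can be sketched in plain Python. This is a minimal sketch under assumptions that are not in the source: cells are addressed as (row, col) grid coordinates, spatial aggregation merges `factor` × `factor` blocks of neighbouring cells, and raw records are (timestamp, cell, activity) tuples at sub-hourly resolution. The function names `coarsen_cell` and `aggregate` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def coarsen_cell(cell, factor):
    """Spatial aggregation (assumed scheme): map a fine (row, col) cell to its coarse block."""
    row, col = cell
    return (row // factor, col // factor)

def aggregate(records, factor):
    """Sum activity per coarse cell and clock hour, then group the hourly totals
    by (coarse cell, day type, hour of day); each group is one sample to model."""
    hourly = defaultdict(float)
    for ts, cell, activity in records:
        slot = ts.replace(minute=0, second=0, microsecond=0)  # truncate to the hour
        hourly[(coarsen_cell(cell, factor), slot)] += activity
    groups = defaultdict(list)
    for (cell, slot), total in sorted(hourly.items()):
        day_type = "weekend" if slot.weekday() >= 5 else "weekday"
        groups[(cell, day_type, slot.hour)].append(total)
    return dict(groups)

# Hypothetical toy records: four fine cells that all fall into coarse block (0, 0).
records = [
    (datetime(2013, 11, 9, 15, 0), (0, 0), 10.0),   # Saturday
    (datetime(2013, 11, 9, 15, 10), (0, 1), 5.0),   # Saturday, same hour
    (datetime(2013, 11, 11, 15, 20), (1, 1), 7.0),  # Monday
    (datetime(2013, 11, 16, 15, 0), (1, 0), 12.0),  # Saturday
]
groups = aggregate(records, factor=2)
```

Separating weekdays from weekends before grouping by hour is what keeps each sample roughly unimodal; pooling all days would mix the two regimes that produce the bimodal shape.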
  7. Because the data are approximately normally distributed, the boxplot can be used effectively to represent cell activity levels over time. Modeling the data this way makes it possible to use an outlier detection method (outliers being candidate events), the boxplot rule: outliers are values above an upper bound UB, where UB = Q75 + k·IQR and IQR = Q75 - Q25. Taking a given activity level as a reference threshold, I evaluate, for each threshold, the number of events found. The coefficient k controls whether peaks found relative to different cell activity levels are counted as events. Variants of this method are also tested that use Q50 or Q75 in place of the IQR; these are referred to as the IQR, M and Q75 methods.
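The boxplot rule applied to one (cell, day type, hour) sample can be sketched with the standard library. This is a sketch, not the thesis code: the IQR bound is UB = Q75 + k·IQR as stated above, the M bound follows the deck's UB = Q50 + k·Q50, and the Q75 variant is left out. Treating the reference threshold as a minimum activity level that a peak must also reach is an assumption, as are the names `upper_bound` and `detect_events`.

```python
from statistics import quantiles

def upper_bound(values, k, method="IQR"):
    """Upper bound of the boxplot rule for one (cell, day type, hour) sample."""
    q25, q50, q75 = quantiles(values, n=4, method="inclusive")
    if method == "IQR":
        return q75 + k * (q75 - q25)  # UB = Q75 + k * IQR
    if method == "M":
        return q50 + k * q50          # UB = Q50 + k * Q50
    raise ValueError(f"unknown method: {method}")

def detect_events(values, k, threshold, method="IQR"):
    """Indices of values above UB that also reach the (assumed) activity threshold."""
    ub = upper_bound(values, k, method)
    return [i for i, v in enumerate(values) if v > ub and v >= threshold]

# Hypothetical hourly totals for one cell at one hour of the day; the last one is a surge.
sample = [100, 110, 105, 95, 102, 98, 400]
events = detect_events(sample, k=1.5, threshold=300)
```

Here Q25 = 99, Q75 = 107.5 and UB = 107.5 + 1.5 · 8.5 = 120.25, so only the value 400 is flagged. Raising k widens the band and suppresses smaller peaks, which is the trade-off the next slides explore.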
  8. 1- I compare the results of the event detection method against a ground-truth dataset. 2- This is a set of events that took place in the area during the period covered by the dataset, such as football matches, fairs, protests and other events involving large numbers of people.
  9. 1- We then evaluate the system's recall and precision by comparing its detections with the ground-truth data. 2- Here, recall is the ratio between the events correctly recognized as such and the events that actually took place in the area. 3- Precision measures how reliable the detections are: the number of detected events that match a ground-truth event, divided by the total number of events that my event detection method recognizes in the data.
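Precision and recall as defined in the deck (Precision = tp / (tp + fp), Recall = tp / (tp + fn)) can be computed with set operations. The sketch assumes, beyond the source, that a detection matches a ground-truth event when both are keyed by the same (cell, day) pair; the keys in the example are hypothetical placeholders.

```python
def precision_recall(detected, ground_truth):
    """Precision = tp / (tp + fp), Recall = tp / (tp + fn)."""
    detected, ground_truth = set(detected), set(ground_truth)
    tp = len(detected & ground_truth)  # detected events that really happened
    fp = len(detected - ground_truth)  # detections with no matching real event
    fn = len(ground_truth - detected)  # real events the system missed
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall

detected = {("cell_A", "day_1"), ("cell_B", "day_2"), ("cell_C", "day_3")}
ground_truth = {("cell_A", "day_1"), ("cell_B", "day_2"),
                ("cell_D", "day_4"), ("cell_A", "day_5")}
p, r = precision_recall(detected, ground_truth)
```

With two true positives, one false positive and two false negatives, precision is 2/3 and recall is 1/2.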
  10. The plot on the right shows precision and recall for the median (M) method with various values of k. Each curve was obtained with a single value of k while varying the reference threshold from 1000 to about 2500; as the threshold changes, so does the number of events found. Each threshold yields one precision and recall value, which is plotted on the curve. Moving from the top plot to the bottom one, the number of events found decreases, because lower-magnitude events are ignored in favour of the larger ones; as a result, recall decreases while precision increases. In particular, a low k (0.5, as in the top plot) gives higher recall but low precision, whereas a high k improves precision but recall starts from a lower initial value. Many other experiments were run on both cities and with different cell types.
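One curve of the kind described above can be sketched by sweeping the threshold for a fixed k. This sketch assumes (not stated in the source) that detection for that k has already produced candidate events with their peak activity, and that a threshold simply keeps the candidates whose peak reaches it. The function `pr_curve` and the toy numbers are hypothetical.

```python
def pr_curve(candidates, ground_truth, thresholds):
    """candidates: event key -> peak activity of an outlier found for a fixed k.
    Returns (threshold, precision, recall) triples, one point per threshold."""
    truth = set(ground_truth)
    points = []
    for t in thresholds:
        detected = {key for key, peak in candidates.items() if peak >= t}
        tp = len(detected & truth)
        precision = tp / len(detected) if detected else float("nan")
        recall = tp / len(truth) if truth else float("nan")  # |truth| = tp + fn
        points.append((t, precision, recall))
    return points

candidates = {"e1": 2600, "e2": 1800, "e3": 1200, "e4": 1100}
curve = pr_curve(candidates, {"e1", "e2", "e5"}, [1000, 1500, 2000, 2500])
```

On this toy input precision rises from 0.5 to 1.0 while recall falls from 2/3 to 1/3 as the threshold grows, the same qualitative trade-off the plots show.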
  11. To integrate the event detection results obtained from the CDR data and from the social data, we take the set union of the events detected in each of the two datasets, and then evaluate precision and recall on the integrated results. The red curve shows the final precision and recall values. In particular, at equal recall, precision improves, although the improvement is limited by the events obtained from the main dataset, the CDR one.
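The fusion step is a set union of the two detection results. The sketch assumes, beyond the source, that both datasets report events under the same keys (for example the same (cell, day) pairs); `fuse`, `precision_recall` and the example keys are hypothetical.

```python
def precision_recall(detected, ground_truth):
    """Precision = tp / |detected|, Recall = tp / |ground truth| (non-empty sets assumed)."""
    tp = len(detected & ground_truth)
    return tp / len(detected), tp / len(ground_truth)

def fuse(cdr_events, social_events):
    """Integrate the two detectors by taking the union of their events."""
    return set(cdr_events) | set(social_events)

cdr = {"A", "B", "X"}    # events found in the CDR data ("X" is a false alarm)
social = {"B", "C"}      # events found in the geo-tagged social data
truth = {"A", "B", "C", "D"}

cdr_only = precision_recall(cdr, truth)
fused = precision_recall(fuse(cdr, social), truth)
```

On these toy sets the CDR detector alone scores (2/3, 1/2) and the fused set scores (3/4, 3/4). A union can add correct events the CDR data missed, but it can never remove a CDR false alarm, which is one way to read the remark that the gain is limited by the main dataset.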
  12. Another advantage of data fusion comes from the fact that the two datasets are complementary for event description. They therefore enrich the final result, since the social dataset can describe the events detected in the CDR data: once the results are integrated, it is enough to analyze the topics and keywords that appear in the text of social users' status updates. A conclusion of this first part of the thesis, then, concerns the opportunities that data fusion methods offer to enrich and complement both the data of a dataset and the results of the analysis.
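Keyword-based event description can be sketched as counting the words of the posts that fall in the same place and time as a detected event. Matching posts by identical (cell, hour), the toy stop-word list and the name `describe_event` are assumptions for illustration, not the thesis pipeline.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "at", "is", "the", "what"}  # toy list (assumption)

def describe_event(event, posts, top_n=3):
    """Most frequent non-stop-words in posts sharing the event's (cell, hour)."""
    counts = Counter()
    for cell, hour, text in posts:
        if (cell, hour) == event:
            words = re.findall(r"[a-z]+", text.lower())
            counts.update(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]

posts = [
    ("c1", 20, "Great match tonight at the stadium"),
    ("c1", 20, "What a match! Stadium is full"),
    ("c2", 20, "coffee time"),   # different cell: ignored
    ("c1", 9, "morning run"),    # different hour: ignored
]
keywords = describe_event(("c1", 20), posts, top_n=2)
```

For the event in cell `c1` at hour 20, the two top keywords are "match" and "stadium", which is the kind of label the CDR data alone cannot provide.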
  13. The data used in the previous example are provided in aggregated form and therefore contain no references to individuals. In other cases, however, CDRs contain anonymized data in which the user id is replaced by a unique hash code. Even though such data are anonymized (anonymization alone is not enough, although it is very often considered safe), it remains possible to de-anonymize them by using other data for re-identification, such as geo-referenced social data. This is possible because each individual's mobility traces (like their digital traces) are unique. Building on this uniqueness of mobility traces, the following study shows how data fusion techniques can be used to re-identify anonymized CDR users.
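Why hashing the user id does not hide the trace can be shown in a few lines. The source does not say which hash function or salt the operator uses; salted SHA-256 and the names `pseudonymize`, `anonymize` and `traces` are assumptions for illustration.

```python
import hashlib
from collections import defaultdict

def pseudonymize(user_id, salt):
    """Hypothetical scheme: replace an identifier with a salted SHA-256 digest."""
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()

def anonymize(records, salt):
    """Replace the user id in (user_id, cell, time) records with its pseudonym."""
    return [(pseudonymize(uid, salt), cell, t) for uid, cell, t in records]

def traces(records):
    """Group anonymized records into one mobility trace per pseudonym."""
    by_user = defaultdict(list)
    for pseudo, cell, t in records:
        by_user[pseudo].append((cell, t))
    return dict(by_user)

raw = [("user_1", "cell_12", "08:00"),
       ("user_2", "cell_40", "08:05"),
       ("user_1", "cell_17", "12:30")]
anon_traces = traces(anonymize(raw, salt="s3cret"))
```

The identifier disappears, but every record of `user_1` still hangs off the same pseudonym, so the complete trace survives anonymization. That intact trace is what an external dataset can be matched against.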
  14. 1- Two kinds of datasets: two CDR datasets and two geo-referenced social datasets. The top-left plot shows the distribution of events (Call, SMS, Internet) per user in the first CDR dataset. Next to it is a mobility measure for these users, the «Radius of Gyration», which captures how far a user's recorded locations typically lie from the centre of mass of their trace, i.e. the spatial extent of their movements. The plot below shows the same measures (events per user and Radius of Gyration) for the social users.
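The radius of gyration is the root-mean-square distance of a user's locations from their centre of mass. The sketch assumes points already projected to planar (x, y) coordinates, for example in km; latitude/longitude pairs would need a projection or great-circle distances first.

```python
import math

def radius_of_gyration(points):
    """r_g = sqrt(mean of squared distances from the trace's centre of mass)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n  # centre of mass, x
    cy = sum(y for _, y in points) / n  # centre of mass, y
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)

# Four locations on the corners of a 2 km square: each is sqrt(2) km from the centre.
rg = radius_of_gyration([(0, 0), (2, 0), (0, 2), (2, 2)])
```

A user seen only near home and work a few blocks apart gets a small r_g; a commuter between distant towns gets a large one, regardless of how many records each produces.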
  15. In particular, the analysis of the uniqueness of mobility traces tells us two things: 1- the average number of points (events) needed to single out an individual as unique; 2- the percentage of CDR users whose trace is unique and can therefore be associated with a single mobility trace. (Give a concrete example here.)
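A common way to measure this kind of uniqueness is to draw p random points from each user's trace and check whether any other user's trace also contains all of them. The sketch below follows that idea. Representing points as (cell, hour) pairs and the name `fraction_unique` are assumptions, not the thesis's exact procedure.

```python
import random

def fraction_unique(traces, p, seed=0):
    """Share of users whose p randomly drawn points match no other user's trace."""
    rng = random.Random(seed)
    unique = 0
    for user, trace in traces.items():
        sample = rng.sample(sorted(trace), min(p, len(trace)))
        matches = [other for other, t in traces.items()
                   if all(point in t for point in sample)]
        if matches == [user]:  # only the user's own trace contains the sample
            unique += 1
    return unique / len(traces)

traces = {
    "u1": {("A", 1), ("B", 2), ("C", 3)},
    "u2": {("A", 1), ("B", 2), ("D", 4)},
    "u3": {("A", 1), ("E", 5), ("F", 6)},
    "u4": {("A", 1), ("B", 2), ("C", 3)},  # identical to u1
}
share = fraction_unique(traces, p=3)
```

With all three points, u2 and u3 are unique while u1 and u4 share a trace, so the share is 0.5. Repeating this for increasing p shows how many points are typically needed before most users become unique.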
  16. A first conclusion can be drawn from the plot on the left, which shows the number of points
  17. There are two main conclusions: 1- Data fusion is a process that enables a variety of applications, but the field still lacks a notion of structured data fusion. 2- The second concerns privacy, in particular that of individual CDR data, which, although anonymized, can
  18. The publications we have produced on these topics.