NoizCrowd: A Crowd-Based Data Gathering and Management System for Noise Level Data
1. NoizCrowd:
A Crowd-Based Data Gathering and
Management System for Noise Level Data
Mariusz Wisniewski, Gianluca
Demartini, Apostolos Malatras, and
Philippe Cudré-Mauroux
University of Fribourg, Switzerland
2. Motivation - Big Data
• Large datasets are necessary to enable analytics and support decision making
– E.g., meteorological stations, car traffic
• Setting up a large-scale sensing infrastructure is costly and time-consuming
• Create a large amount of valuable data
– Crowdsourcing
– Data generation models
– Smartphones as sensors
– Big Data analytics
Gianluca Demartini 2
3. NoizCrowd
• A crowd-sensing approach to big data generation using commodity sensors
• Crowdsource noise levels in a geographic region
• Noise propagation models to generate data
• Array data management techniques to scale
• Results accessible via a visual interface
• Support decisions (e.g., where to live)
4. Outline
• Related approaches
• NoizCrowd Architecture Overview
– Data Gathering
– Storage
– Modeling
– Export and Visualization
• Data Models
• Performance Evaluation
5. Related Work
• Participatory Sensing vs Sensor Networks
– Low cost / High cost
– Mobile phones / Sensors
– Distributed / Centralized management
– Privacy, data quality
• Applications: Environment, vehicle routing
6. Related Work
• Noise Mapping Apps
– NoiseTube: open source, widespread usage
– NoiseMap: control over data
– SoundSense: machine learning to classify sounds
• NoizCrowd
– Data in RDF, linkable to other datasets (linkeddata.org)
– Scalable storage: generate data by interpolation
8. Data Gathering
• By means of crowdsourcing
– GPS: location
– Microphone: noise level
– Internet connection: send data to server
• Microphone Calibration
– Calibrate against a sound level meter
– Share a conversion table for smartphone models
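A minimal sketch of how a shared per-model conversion table might be applied, assuming calibration reduces to one additive dB offset per phone model. The slides do not describe the table's actual form; the model names and offset values below are invented for illustration.

```python
# Hypothetical calibration table: phone model -> additive correction in dB,
# obtained by comparing the phone's readings against a sound level meter.
CALIBRATION_OFFSETS_DB = {
    "phone-model-a": -3.5,
    "phone-model-b": 2.0,
}


def calibrate(raw_db: float, model: str) -> float:
    """Apply the per-model offset; unknown models are left uncorrected."""
    return raw_db + CALIBRATION_OFFSETS_DB.get(model, 0.0)
```

Sharing the table means only one phone of each model has to be measured against the reference meter.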
9. Data Storage
• App sends median and peak dB values over a few seconds
• Spatio-temporal data: non-relational storage system (SciDB)
– Durable storage
– Retrieve data to build models
– Export data for visualization
• Multi-dimensional array (space and time)
• Distributed storage
10. Noise Modeling
• Data from the crowd is noisy and skewed/sparse
• Raw data is not shown to the end users
• Models to deal with
– Overlapping data
– Missing data
11. Data Export and Visualization
• Data from SciDB is
– Converted to RDF
– Stored in dipLODocus[RDF]
– Available via SPARQL
• Visualization
– Overlay noise level on a map
– Additional chart for time evolution
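To illustrate the conversion step, here is a sketch that serializes one array cell as N-Triples. The namespace and every property name are placeholders: the slides do not give the vocabulary NoizCrowd uses, nor how data is loaded into dipLODocus[RDF].

```python
# Placeholder namespace: the real vocabulary is not given in the slides,
# so every property name below is hypothetical.
EX = "http://example.org/noise#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def typed_literal(value, datatype):
    """Format a value as an N-Triples literal with an XSD datatype."""
    return f'"{value}"^^<{XSD}{datatype}>'


def measurement_to_ntriples(cell_id, lat, lon, timestamp, level_db):
    """Return the N-Triples lines describing one noise measurement."""
    subject = f"<{EX}measurement/{cell_id}>"
    return [
        f"{subject} <{EX}latitude> {typed_literal(lat, 'double')} .",
        f"{subject} <{EX}longitude> {typed_literal(lon, 'double')} .",
        f"{subject} <{EX}observedAt> {typed_literal(timestamp, 'dateTime')} .",
        f"{subject} <{EX}levelDb> {typed_literal(level_db, 'double')} .",
    ]
```

Once loaded into a triple store, such triples can be filtered by location and time with an ordinary SPARQL query.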
13. Data Models
• Spatial Interpolation
– In the same time interval, data from different locations
– Needs to be computationally simple (large volume)
– Two-dimensional range queries in space (SciDB)
– K-nearest neighbor interpolation
– Computed in parallel
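A minimal sketch of k-nearest-neighbor interpolation for one time interval. It assumes planar coordinates in meters and an unweighted arithmetic mean of the k nearest dB readings; the slides do not say whether NoizCrowd weights neighbors by distance or averages in the energy domain.

```python
import math


def knn_interpolate(query, measurements, k=3):
    """Estimate the level at `query` from the k closest measurements.

    measurements: list of ((x, y), level_db) pairs from one time interval.
    """
    if not measurements:
        raise ValueError("no measurements available")
    nearest = sorted(measurements, key=lambda m: math.dist(query, m[0]))[:k]
    return sum(level for _, level in nearest) / len(nearest)


readings = [((0.0, 0.0), 50.0), ((1.0, 0.0), 60.0), ((10.0, 10.0), 90.0)]
estimate = knn_interpolate((0.4, 0.0), readings, k=2)
```

Each query point depends only on its own neighbors, so estimates for different points can be computed in parallel.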
14. Data Models
• Temporal interpolation
– Short ranges (minutes): like spatial interpolation, in 3D
– Long ranges: look for patterns and infer
• E.g., every Monday at 11am we have 50 dB and we miss a Monday measurement
• E.g., same measurement (50 dB) in the same area 2h ago and now
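The long-range case (the Monday example) can be sketched as a lookup in a weekly pattern. This assumes the pattern is the median level per (weekday, hour) slot, which is one simple choice; the slides do not specify how patterns are detected.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def weekly_pattern(history):
    """Build {(weekday, hour): median level} from (datetime, level_db) pairs."""
    buckets = defaultdict(list)
    for ts, level in history:
        buckets[(ts.weekday(), ts.hour)].append(level)
    return {slot: median(levels) for slot, levels in buckets.items()}


def infer_missing(pattern, ts):
    """Return the pattern's level for the slot of `ts`, or None if unseen."""
    return pattern.get((ts.weekday(), ts.hour))


# Two Mondays around 11am at 50 dB; the third Monday has no measurement.
history = [
    (datetime(2013, 6, 3, 11, 5), 50.0),
    (datetime(2013, 6, 10, 11, 30), 50.0),
]
pattern = weekly_pattern(history)
guess = infer_missing(pattern, datetime(2013, 6, 17, 11, 0))
```

Slots that were never observed return `None` instead of a fabricated value, so a caller can fall back to another model.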
15. Noise Propagation Models
• We adopt an existing model that takes into account:
– Sound power
– Distance from source
– Directivity
– Atmospheric absorption
– Excess attenuation (we use meteo conditions)
• Difficult to measure with a smartphone
• Constant in a given region (and use GPS info)
16. Materialization of Models
• Data from models
– Is computationally expensive to generate
– May be voluminous, since we can cover any region
• We do late materialization
– At query time
– Only for the specific request
– Cached and indexed for future requests
– Incremental updates of views, if possible
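Late materialization with caching can be sketched as follows. The class and its interface are hypothetical, the in-memory dictionary stands in for whatever cache and index the system actually uses, and incremental view updates are reduced here to simple invalidation.

```python
class LazyModelCache:
    """Compute model output on first request, then serve it from a cache."""

    def __init__(self, compute):
        self._compute = compute      # expensive model, called as compute(region, slot)
        self._cache = {}             # (region, slot) -> materialized value
        self.computations = 0        # how many times the model actually ran

    def query(self, region, time_slot):
        key = (region, time_slot)
        if key not in self._cache:
            self._cache[key] = self._compute(region, time_slot)
            self.computations += 1
        return self._cache[key]

    def invalidate(self, region, time_slot):
        """Drop a cached value, e.g. after new measurements arrive."""
        self._cache.pop((region, time_slot), None)
```

Only regions that someone actually asks about are ever computed, which matters when the model could in principle cover any region.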
17. Performance Evaluation (1)
• 30 outdoor deployments
– 2, 3, or 4 smartphones
– Multiple noise sources
– Urban setting, flat area of 50x50 meters
• Professional-grade noise level meter as gold standard measurement
• 85% of interpolated data within ±6 dB error
• 63% of interpolated data within ±4 dB error
19. Performance Evaluation (3)
• Error on the sound level of the source
– 16% with 3 measurements
– 10% with 4 measurements
– 9% with 5 measurements
• Source location
– 3 m error on average
20. NoizCrowd - Conclusions
• Large-scale data is key for decision making
• Crowdsource noise level data using mobile phones
– Scale out using an array backend
– Generate missing data and visualize
• Next steps
– Android app
– Data recording as background feature
– Additional materialization strategies
http://exascale.info