Search, Discovery and Analysis of Sensory Data Streams
Centre for Vision, Speech and Signal Processing (CVSSP), University of
Care Technology & Research Centre, The UK Dementia Research
SAW2019: 1st International Workshop on Sensors and Actuators on
2. 46 years ago on the 5th of November (submission
• A 32-bit IP address was used, of which the first 8 bits signified the network and the remaining 24 bits designated the host on
• The assumption was that 256 networks would be sufficient for the
• Obviously this was before LANs (Ethernet was under development at Xerox PARC at that time).
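As a quick illustration of the fixed 8/24 split described above, the sketch below extracts the two fields from a 32-bit value and shows why the scheme allows only 256 networks. The dotted-quad notation, the example address and the helper name `split_address` are assumptions used only for readability; they are not part of the original specification.

```python
import ipaddress

NETWORK_BITS = 8                # first 8 bits: network (as on the slide)
HOST_BITS = 32 - NETWORK_BITS   # remaining 24 bits: host

def split_address(addr):
    """Split a dotted-quad IPv4 string into (network, host) fields
    using a fixed 8/24 split (hypothetical helper)."""
    value = int(ipaddress.IPv4Address(addr))   # 32-bit unsigned integer
    network = value >> HOST_BITS               # keep the top 8 bits
    host = value & ((1 << HOST_BITS) - 1)      # mask off the low 24 bits
    return network, host

print(split_address("10.1.2.3"))   # (10, 66051)
print(2 ** NETWORK_BITS)           # 256 possible networks
print(2 ** HOST_BITS)              # 16777216 hosts per network
```

With only 8 bits for the network field, the address space tops out at 256 networks, which is why the arrival of LANs made the split untenable.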
28. The Crawling Challenge
− Uniform policy: re-visiting all pages in the collection with
the same frequency, regardless of their rates of change.
− Proportional policy: re-visiting pages more often the more
frequently they change. The visiting frequency is directly
proportional to the (estimated) change frequency.
Cho, Junghoo; Garcia-Molina, Hector (2003). "Effective page refresh policies for Web
crawlers". ACM Transactions on Database Systems. 28 (4): 390–426.
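A minimal sketch of the two policies as ways of splitting a fixed crawl budget (total visits per day) across pages. The function names, the budget model and the numbers are hypothetical, chosen only to make the difference concrete.

```python
def uniform_allocation(change_rates, budget):
    """Uniform policy: every page gets the same revisit frequency,
    regardless of how often it changes."""
    n = len(change_rates)
    return [budget / n] * n

def proportional_allocation(change_rates, budget):
    """Proportional policy: revisit frequency is directly proportional
    to each page's (estimated) change rate."""
    total = sum(change_rates)
    return [budget * rate / total for rate in change_rates]

rates = [9.0, 1.0]   # hypothetical estimated changes per day for two pages
budget = 10.0        # hypothetical budget: 10 visits per day in total
print(uniform_allocation(rates, budget))       # [5.0, 5.0]
print(proportional_allocation(rates, budget))  # [9.0, 1.0]
```

Both policies spend the same total budget; they differ only in how it is distributed across pages.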
29. Web Crawling
− Cho and Garcia-Molina proved the surprising result that,
in terms of average freshness, the uniform policy
outperforms the proportional policy in both a simulated
Web and a real Web crawl.
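− With a limited crawl budget, the proportional policy allocates
too many new crawls to rapidly changing pages at the expense
of less frequently updating pages.
− A proportional policy allocates more resources to
crawling frequently updating pages, but gains less overall
freshness from them, because each refreshed copy of a
rapidly changing page goes stale again soon after the visit.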
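The result can be checked numerically with the freshness model used in Cho and Garcia-Molina's analysis, assuming each page changes according to a Poisson process with rate λ and is revisited at fixed intervals of length I = 1/f; its expected freshness is then (1 − e^(−λI)) / (λI). The sketch below uses a hypothetical two-page example (9 and 1 changes per day, 10 visits per day in total); the helper names are illustrative.

```python
import math

def expected_freshness(change_rate, visit_freq):
    """Time-averaged probability that the local copy is up to date, for a page
    whose changes follow a Poisson process with rate `change_rate` and which is
    revisited at fixed intervals of 1 / visit_freq (assumed model)."""
    if change_rate == 0:
        return 1.0   # a page that never changes is always fresh
    if visit_freq <= 0:
        return 0.0   # never revisited: stale in the long run
    x = change_rate / visit_freq   # expected changes per revisit interval
    return (1 - math.exp(-x)) / x

def average_freshness(change_rates, visit_freqs):
    scores = [expected_freshness(r, f) for r, f in zip(change_rates, visit_freqs)]
    return sum(scores) / len(scores)

rates = [9.0, 1.0]         # hypothetical change rates (changes per day)
uniform = [5.0, 5.0]       # 10 visits/day split evenly
proportional = [9.0, 1.0]  # 10 visits/day split in proportion to change rate
print(round(average_freshness(rates, uniform), 3))       # 0.685
print(round(average_freshness(rates, proportional), 3))  # 0.632
```

Under the proportional split the slow page (freshness about 0.63 instead of 0.91) loses far more than the fast page gains (about 0.63 instead of 0.46), so the uniform policy comes out ahead on average.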
30. Crawling and the Freshness Issue
− To improve freshness, the crawler should penalise the
elements that change too often.
− The optimal re-visiting policy is neither the uniform policy
nor the proportional policy.
− The optimal method for keeping average freshness high
includes ignoring the pages that change too often, and
the optimal method for keeping average age low is to use
access frequencies that monotonically (and sub-linearly)
increase with the rate of change of each page.
Cho, Junghoo; Garcia-Molina, Hector (2003). "Estimating frequency of change". ACM
Transactions on Internet Technology. 3 (3): 256–290.
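To see why rapidly changing pages end up ignored, the sketch below hands out a crawl budget in small increments, each time to the page whose expected freshness would rise the most, under the same assumed Poisson freshness model. This greedy scheme is a stand-in chosen for illustration, not the analytical solution derived by Cho and Garcia-Molina; because freshness is concave in the visit frequency, it approximates the freshness-optimal allocation. It does not address the age metric. The change rates, budget and step size are hypothetical.

```python
import math

def expected_freshness(change_rate, visit_freq):
    # Assumed Poisson change model, fixed revisit interval 1 / visit_freq
    if change_rate == 0:
        return 1.0
    if visit_freq <= 0:
        return 0.0
    x = change_rate / visit_freq
    return (1 - math.exp(-x)) / x

def greedy_allocation(change_rates, budget, step=0.01):
    """Give out the budget in increments of `step`, each time to the page whose
    expected freshness would increase the most (hypothetical helper). Freshness
    is concave in visit frequency, so this approximates the optimum."""
    freqs = [0.0] * len(change_rates)
    for _ in range(round(budget / step)):
        gains = [expected_freshness(r, f + step) - expected_freshness(r, f)
                 for r, f in zip(change_rates, freqs)]
        best = max(range(len(gains)), key=gains.__getitem__)
        freqs[best] += step
    return freqs

rates = [1.0, 2.0, 10.0]   # hypothetical change rates (changes per day)
freqs = greedy_allocation(rates, budget=3.0)
print([round(f, 2) for f in freqs])   # last entry is 0.0
```

With 3 visits per day available, the page that changes 10 times a day receives no visits at all: each visit to it buys very little freshness, so the budget goes to the two slower pages instead, which is the "penalise the elements that change too often" behaviour described above.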
39. Some of the Research Challenges
− Provenance monitoring and fact-checking algorithms
− Dealing with noisy, incomplete and dynamic data.
− Handling and processing large data streams, search and
identification of patterns.
− Crawling, search and query of changing data
− Multi-modal information analysis and continual and
adaptive learning algorithms
− Security, privacy, trust and accessibility
− Solutions to keep (and make) the Web a safe, open,
inclusive and collaborative environment.