Data Analytics Project proposal: Smart home based ambient assisted living - Data Mining
PROJECT PROPOSAL
Smart Home Based Ambient Assisted Living
A Comparative Study of Anomaly Detection Techniques
TARUN SWARUP
2988527
Submitted in partial fulfilment for the degree of
Master of Science in Big Data Management & Analytics
Griffith College Dublin
September 2018
Under the supervision of
DR. WASEEM AKTHAR
Abstract:
Smart home technologies have scaled up from everyday use to assisting healthcare at home. With the evolution of the Internet of Things (IoT), assistive informatics, real-time monitoring and forecasting, it is possible for elderly and physically impaired people to live independently in their own homes. When human activities are observed and analysed systematically, health statistics and behavioural patterns can be extracted. Several existing research works on improving ambient assisted living environments for the elderly population tested residents and collected real-time information on their activities. Considering the health issues and age-related conditions of the elderly population, ambient assisted living (AAL) technologies were developed to encourage and enhance independent living without sacrificing the safety and security that an unobtrusive healthcare monitoring system delivers. Anomaly detection and data clustering can be applied to real-time sensory data using various existing techniques. The prime goal of this proposal is to experiment with anomaly detection methods and clustering techniques such as k-means, local outlier factor (LOF), k-nearest neighbours (KNN), DBSCAN and CURE on such data and to determine which of them is the most efficient and accurate.
I. Introduction
The main purpose of this project is to enable the independent living of elderly and physically impaired people in their homes. Technological advances drive the development and evolution of smart homes and intelligent environments and improve their capabilities. Ambient intelligence is defined as a digital environment that is sensitive, adaptive and responsive to the presence of people [1]. Smart home technologies incorporate dedicated sensors that monitor the activities and behaviours of elderly people, which removes the need for a caretaker to accompany them all day long. One of the major concerns for healthcare assistants and caregivers is being available instantly in life-threatening situations. When activities and behavioural patterns are monitored, an individual’s well-being can be forecast in real time. Predicting events and detecting anomalies (unusual behavioural patterns) in such data sets is a complex task [2]. Anomaly detection is the identification of items or entities that do not conform to the usual or expected pattern present in the whole dataset; such items differ markedly from the rest of the data. Machine learning algorithms can be designed to learn the behavioural patterns and activities of the residents, and the resulting models can be used to detect anomalous behaviour. Because these patterns reflect the physical, mental and cognitive health of residents, deviations from them can reveal anomalies and help prevent unrecognised or potential health problems. Demographic studies show that the ageing population is increasing dramatically in many countries [3]. Hence, it is essential to help older adults and physically challenged people lead an independent and healthy lifestyle for as long as possible. The ultimate goal of this research project is to test real-time data accumulated from real users, with the aim of improving the comfort, security and safety of the target population of elderly and physically challenged people.
Data mining is the process of identifying and analysing patterns in data to derive meaningful insights that support better decision making and forecasting. Outliers, or anomalies, are one of the major challenges in data mining. Anomaly detection is closely related to novelty detection, which determines whether a new data entity should be classified as an outlier with respect to the data seen so far. Clustering is the process of grouping similar data points so that the data within each cluster share a certain degree of similarity. Depending on the clustering method used, a data instance can belong to a single cluster or to multiple clusters. Clustering algorithms have wide scope for outlier detection, and applying them to the data might provide more valuable results than traditional methods. Combining data mining, anomaly detection and clustering with smart home technology therefore has considerable potential for ambient assisted living. This project applies various clustering techniques to real-time data generated by smart home sensors and devices designed to monitor patients, elderly people or the physically challenged.
II. Rationale for research
Remote health monitoring is an emerging field of study with strong potential to enhance the everyday lives of the elderly population. Smart home technologies have scaled up from everyday use to assisting healthcare at home. Smart home systems advanced considerably when state-of-the-art technologies such as pervasive computing, cloud computing, machine learning and artificial intelligence were combined with the base technology, the Internet of Things. With age, a person tends to become more vulnerable to physical and mental impairments. Rising healthcare costs, an increasing prevalence of disease, a shortage of caregivers and busy schedules make it difficult for family members to stay with the elderly all the time. Although a number of smart sensors and interactive devices can detect a wide range of environmental and user-specific parameters, such as daily activities, regular periods of inactivity, usual behavioural patterns and other basic routines, the main challenge is to extract relevant and comprehensible information from the data. In Ambient Assisted Living (AAL) environments, monitoring these parameters can be useful for building models of regular patterns that reveal anomalies and significant changes in behaviour. In addition, increasing the autonomy of the elderly and assisting them in carrying out their daily activities (e.g., walking, eating, bathing, dressing, cooking) makes it possible to extend the time they live in their home environment and to reduce stress. Earlier work primarily focused either on detecting anomalies offline or on a predefined set of anomalies, which means that data from real-time monitoring of a person’s activities could be analysed only once an event was complete.
Hierarchical task analysis methods can be used to describe normative human behaviour by representing the various activities that users perform to reach their goals. These models are often hierarchical: tasks are decomposed into progressively more detailed subtasks until elementary tasks are reached, where behaviour is defined as any pattern in a sequence of observations and the models are generated from sensed data. However, such models struggle to distinguish between different degrees of anomaly. As mentioned earlier, real-time anomaly detection involves an immense volume of user-behaviour data whose unpredictable nature is poorly served by existing analytic methods. The purpose of this project is to categorise real-time user data, detect possible anomalies using a number of clustering techniques, and check whether the deployed techniques provide efficient and accurate solutions.
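The "regular periods of inactivity" mentioned above are one of the simplest parameters to derive from raw sensor events. The following Python sketch is only an illustration under stated assumptions: it assumes each sensor firing is available as a timestamp, learns a baseline from a reference period, and flags gaps longer than the baseline mean plus three standard deviations. The function names, the threshold rule and the synthetic timestamps are hypothetical and are not part of the proposal's method.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def inactivity_gaps(timestamps):
    """Return (start, end, gap) for each pair of consecutive sensor events."""
    ordered = sorted(timestamps)
    return [(a, b, b - a) for a, b in zip(ordered, ordered[1:])]


def flag_long_gaps(baseline_gaps, new_gaps, z=3.0):
    """Flag new gaps longer than mean + z * stdev of the baseline gaps.

    The mean + z * stdev rule is an assumed, hypothetical threshold.
    """
    seconds = [gap.total_seconds() for _, _, gap in baseline_gaps]
    threshold = mean(seconds) + z * stdev(seconds)
    return [(start, end) for start, end, gap in new_gaps
            if gap.total_seconds() > threshold]


# Synthetic baseline: a sensor fires roughly every 8 to 11 minutes.
day0 = datetime(2019, 5, 4, 8, 0)
baseline = inactivity_gaps(
    [day0 + timedelta(minutes=10 * i + i % 3) for i in range(50)])

# Synthetic new day with one unusually long silence between 09:20 and 12:30.
day1 = datetime(2019, 5, 5, 9, 0)
new_day = inactivity_gaps(
    [day1 + timedelta(minutes=m) for m in (0, 10, 20, 210, 220)])

print(flag_long_gaps(baseline, new_day))
```

A fixed statistical threshold is the simplest possible baseline; a real deployment would more likely learn separate baselines per time of day, since a long gap at night means something different from one at noon.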
III. Review of Literature
Ambient Assisted Living tools are supported by various computational techniques and algorithms, and the process is split into separate steps. Activity recognition has been implemented using mobile sensors such as accelerometers and gyroscopes, using a network of ambient sensors to capture complex activities, or using vision-based models with hierarchical and space-time approaches, in which the user’s actions are recognised and represented as unique time-series patterns or logical structures [4]. Context modelling records temporal and spatial information about the user and their surroundings. Anomaly detection recognises unusual activities or behaviour, taking medical compliance into account, and detects hazardous situations and wandering patterns using rule-based techniques, similarity-based techniques and spatiotemporal information [3]. The continuous assessment of physical and cognitive functions may lead to the early detection of potential health problems. TigerPlace, a supportive living environment designed to facilitate the study of the ageing process, observed a group of residents (aged 70 to 95) with chronic illnesses such as Alzheimer’s disease, heart disease or diabetes. The objective was to maximise independent living by performing holistic assessments to promote health and wellness. Data collected from physiological sensor networks were examined to correlate and visualise the residents’ activities, clinical events and medical readings. Most patients and elderly people would prefer an unobtrusive monitoring system that records their day-to-day activities and identifies or predicts anomalous behaviour, i.e., one that gathers data from everyday activities without disturbing normal behaviour, using devices such as motion sensors, predictors, smartphones, smart televisions and other smart sensors that observe the user’s behaviour. In [5], raw sensory data was accumulated from elderly occupants suffering from dementia and converted into an appropriate format to show the correlation between events and activities. The data was then visualised to detect possible anomalies or outliers and clustered to condense and summarise the information (large volumes of data representing normal behaviour and sparse points denoting anomalous behaviour) in order to predict future events for the patients. Temporal reasoning enhances data mining in smart environments by adding information about expected temporal interactions between resident activities [6]. That paper illustrates a machine learning algorithm for anomaly detection that automatically learns data models of residents to enable automated health monitoring. Spatiotemporal information was incorporated into the system to determine temporal relationships between the time intervals of the user’s activities. Following Allen’s observation that time intervals are more effective than single time points, the authors analyse smart environment data to detect anomalies: events occur over intervals of time rather than at instantaneous points [7]. In [8], experiments were carried out to recognise and predict user activities in a smart home environment using activity pattern clustering and activity type decision. Abnormal and anomalous behaviour can be revealed by constructing behavioural patterns from sensor data, so that health can be monitored remotely. That paper proposes a combination of pattern clustering (k-pattern clustering) and an activity decision algorithm based on an artificial neural network to detect anomalies or outliers in complex user activities. Such hybrid procedures outperform conventional methods in terms of scalability and handling the heterogeneity of the ambient infrastructure. After a thorough analysis of earlier works, we can conclude that healthcare can be maintained remotely and that a patient’s behaviour can be predicted to a certain extent.
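Allen's interval relations [7] are the kind of temporal relationship that the approach in [6] derives between activity intervals. The Python sketch below is a minimal illustration, not the algorithm of [6]: it classifies the relation between two activity intervals into one of Allen's thirteen relations and then flags activity pairs whose relation differs from an assumed "usual" relation. The activity names, the times and the flagging rule are hypothetical.

```python
def allen_relation(a, b):
    """Return the Allen interval relation of interval a to interval b.

    Intervals are (start, end) pairs with start < end; any comparable
    values work (minutes, datetimes, ...).
    """
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met-by"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Remaining case: the intervals partially overlap.
    return "overlaps" if s1 < s2 else "overlapped-by"


def unusual_relations(usual, observed):
    """Return activity pairs whose observed relation differs from the usual one.

    Both arguments map (activity_a, activity_b) -> relation name. Pairs
    without a usual relation are ignored (a hypothetical, simplistic rule).
    """
    return {pair: rel for pair, rel in observed.items()
            if pair in usual and usual[pair] != rel}


# Hypothetical evening, times in minutes after midnight.
cooking, eating = (1080, 1120), (1120, 1150)
television, medication = (1150, 1300), (1310, 1315)

usual = {("cooking", "eating"): "meets", ("medication", "television"): "during"}
observed = {
    ("cooking", "eating"): allen_relation(cooking, eating),
    ("medication", "television"): allen_relation(medication, television),
}
print(unusual_relations(usual, observed))
```

Here the medication interval falls after the television interval instead of during it, so that pair is reported; whether such a shift is clinically meaningful is a separate judgement the sketch does not make.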
IV. Anomaly Detection
Existing anomaly detection methods
Various techniques have been proposed for identifying outliers. They fall broadly into three categories: supervised, semi-supervised and unsupervised. If the dataset has clear labels that distinguish outliers, a supervised technique can be used. If labels are available for only part of the data (typically the normal class), the technique is semi-supervised. If no labels are available, the technique is unsupervised. Based on the number of attributes in the dataset that contribute to detecting outliers, techniques can also be classified as univariate or multivariate.
Statistics-based approach: Chaloner and Brant [9] show how Bayesian concepts can be applied to outlier detection. Statistical approaches typically assume that the data under investigation follows a known distribution, such as the normal distribution, and estimate outliers from the probability of each data point under that distribution. Depending on whether a specific distribution is assumed for the selected attributes, these methods are classified as parametric or non-parametric.
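A minimal parametric sketch of the statistics-based idea, assuming a single attribute that is approximately normally distributed. It is a plain frequentist 3-sigma rule, not the Bayesian method of [9]. The parameters are fitted on a reference period assumed to contain only normal behaviour; the attribute (night-time bathroom visits), the data and the threshold are all hypothetical.

```python
from statistics import mean, stdev


def fit_gaussian(reference):
    """Estimate mean and standard deviation from a period assumed to be normal."""
    return mean(reference), stdev(reference)


def gaussian_outliers(values, mu, sigma, threshold=3.0):
    """Return indices of values more than `threshold` standard deviations from mu."""
    return [i for i, v in enumerate(values) if abs(v - mu) > threshold * sigma]


# Hypothetical counts of night-time bathroom visits over a two-week reference period.
reference = [2, 3, 2, 2, 3, 1, 2, 3, 2, 2, 3, 2, 1, 2]
mu, sigma = fit_gaussian(reference)

# A new week of observations; the third night stands out.
new_week = [2, 3, 7, 2, 1, 2, 3]
print(gaussian_outliers(new_week, mu, sigma))  # → [2]
```

Fitting on a separate reference period, rather than on the data being scored, keeps a single extreme night from inflating the standard deviation and masking itself.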
Proximity-based approach: The relative distances between neighbouring data points and their local density are analysed to predict whether a data point is an anomaly. Knorr and Ng [10] proposed distance-based outliers, in which the distances between a data point and its neighbouring points collectively determine whether the point is an anomaly; nearest-neighbour (KNN) methods build on the same idea. Index-based and nested-loop algorithms were proposed as part of that research. Breunig et al. [11] developed a factor that estimates the degree to which a given data instance is an outlier, known as the Local Outlier Factor (LOF).
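Both proximity-based scores can be sketched in a few lines of plain Python. This is an illustrative sketch, assuming small, numeric, duplicate-free feature vectors: the distance to the k-th nearest neighbour serves as a simple KNN outlier score, and the LOF follows the reachability-distance definition of Breunig et al. [11], with ties broken by index as a simplification. Neither function is optimised (both are O(n² log n)), and the data points are hypothetical.

```python
from math import dist


def k_nearest(points, k):
    """For each point, return the indices of its k nearest neighbours (itself excluded).

    Ties are broken by index, a simplification of the LOF definition.
    """
    result = []
    for i, p in enumerate(points):
        ranked = sorted((dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in ranked[:k]])
    return result


def knn_distance_scores(points, k):
    """Distance to the k-th nearest neighbour: larger means more isolated."""
    neighbours = k_nearest(points, k)
    return [dist(points[i], points[nbrs[-1]]) for i, nbrs in enumerate(neighbours)]


def lof_scores(points, k):
    """Local Outlier Factor; assumes no duplicate points (else densities become infinite)."""
    neighbours = k_nearest(points, k)
    k_dist = knn_distance_scores(points, k)

    def reach_dist(i, o):
        # Reachability distance of point i from its neighbour o.
        return max(k_dist[o], dist(points[i], points[o]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reach_dist(i, o) for o in neighbours[i]) for i in range(len(points))]
    # LOF: average ratio of the neighbours' densities to the point's own density.
    return [sum(lrd[o] for o in neighbours[i]) / (k * lrd[i]) for i in range(len(points))]


# Hypothetical 2-D feature vectors: a tight 3 x 3 grid plus one isolated point.
points = [(x, y) for x in range(3) for y in range(3)] + [(10, 10)]
print([round(s, 2) for s in lof_scores(points, k=3)])
```

Points inside the grid score close to 1, while the isolated point scores far above 1; LOF values near 1 mean "as dense as the neighbours", which is what makes it usable when normal behaviour forms clusters of different densities.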
Density-based clustering: This approach forms clusters based on the density of the data. The clusters can be of any size and shape, and the data can be noisy. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is the most popular technique in this family; points that do not belong to any dense region are labelled as noise, which makes it a natural fit for outlier detection. Khan et al. [12] survey the DBSCAN variants currently available. One drawback of DBSCAN is that it performs poorly on high-dimensional data, where distance-based density estimates become less meaningful.
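A compact, unoptimised DBSCAN sketch in plain Python, assuming numeric feature vectors and Euclidean distance; the `eps` and `min_pts` values and the data are hypothetical. Points that end up labelled as noise are the outlier candidates. The neighbourhood query is a linear scan, so the sketch is O(n²) and only suitable for small examples.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or NOISE (-1) for outliers.

    `min_pts` counts the point itself, as in the original DBSCAN formulation.
    """
    labels = [None] * len(points)  # None means "not yet visited"

    def region(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(neighbours)
        cluster += 1
    return labels


# Hypothetical 2-D feature vectors: two dense routines and one isolated event.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # → [0, 0, 0, 0, 1, 1, 1, 1, -1]
print([i for i, label in enumerate(labels) if label == NOISE])  # → [8]
```

Unlike k-means, the number of clusters is not fixed in advance; it falls out of `eps` and `min_pts`, which in turn become the parameters that need tuning on real sensor data.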
Partitioning-based clustering: This is the most widely used clustering approach. It iteratively relocates data points between clusters before the final clusters are formed. K-means is an unsupervised clustering technique in which the cluster centroids are initialised randomly and iteratively relocated until they converge; points that lie far from their cluster centroid can then be treated as outliers. Fan and Sun [13] explain how k-means can be used to cluster college students’ data and how it can be optimised for better performance. One major drawback of k-means is its sensitivity: slight variations in the analysed data, or in the initial centroids, can cause significant changes in the resulting clusters and outlier estimates.
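A minimal Lloyd-style k-means sketch in plain Python, assuming numeric feature vectors, Euclidean distance and a fixed random seed. Using the distance from each point to its own centroid as the outlier score is one common, simple choice, assumed here for illustration; the data and the `seed` value are hypothetical.

```python
import random
from math import dist


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's k-means with centroids initialised from k random data points."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # move the centroid to the mean of its members
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:  # keep an empty cluster's centroid where it was
                new_centroids.append(centroids[c])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return assign(points, centroids), centroids


def centroid_distance_scores(points, labels, centroids):
    """Outlier score: distance from each point to its own cluster centroid."""
    return [dist(p, centroids[label]) for p, label in zip(points, labels)]


# Hypothetical 2-D feature vectors: two routines plus one unusual observation.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 4)]
labels, centroids = kmeans(points, k=2)
scores = centroid_distance_scores(points, labels, centroids)
print(max(range(len(points)), key=scores.__getitem__))
```

Note that the unusual point is still forced into a cluster and drags that centroid towards itself, which illustrates the sensitivity described above. In practice k-means is usually restarted from several random initialisations, keeping the run with the lowest within-cluster sum of squares.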
Hierarchical clustering: Agglomerative clustering treats each data instance as a cluster of its own and repeatedly merges the closest clusters to build a hierarchy. There are two types of hierarchical clustering: agglomerative clustering follows a bottom-up approach, whereas divisive clustering follows a top-down approach. BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) and CURE (Clustering Using Representatives) are widely known hierarchical clustering techniques that can be used to identify outliers.
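A sketch of the bottom-up (agglomerative) idea in plain Python, assuming Euclidean distance, single linkage and a hypothetical distance cut-off. It is neither BIRCH nor CURE: CURE, for example, represents each cluster by several well-scattered points shrunk towards the centroid, which this sketch does not do. Points left in very small clusters are treated as outliers; the `cut` and `min_size` values and the data are hypothetical.

```python
from math import dist


def single_linkage(points, cut):
    """Agglomerative clustering: repeatedly merge the two closest clusters
    (single linkage) until the closest pair is farther apart than `cut`."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the closest members of two clusters.
        return min(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        d, x, y = min((linkage(clusters[x], clusters[y]), x, y)
                      for x in range(len(clusters))
                      for y in range(x + 1, len(clusters)))
        if d > cut:
            break
        clusters[x].extend(clusters[y])
        del clusters[y]  # y > x, so index x is still valid
    return clusters


def small_cluster_outliers(clusters, min_size):
    """Points in clusters smaller than `min_size` are treated as outliers."""
    return sorted(i for c in clusters if len(c) < min_size for i in c)


# Hypothetical 2-D feature vectors: two routines plus one unusual observation.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 4)]
clusters = single_linkage(points, cut=1.5)
print(sorted(sorted(c) for c in clusters))  # → [[0, 1, 2, 3], [4, 5, 6, 7], [8]]
print(small_cluster_outliers(clusters, min_size=2))  # → [8]
```

Stopping the merges at a distance cut-off is equivalent to cutting the dendrogram at that height; BIRCH and CURE exist mainly to make this kind of hierarchy affordable and robust on large, noisy datasets.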
IV. Theme Analysis
A good way to assess a newly developed idea is to gather other people’s views and perspectives on it. A survey was conducted in which people of various age groups and professions participated. The questions were designed to gauge their interest in technology and automation, their preference for and affinity with smart home devices, their reactions to the project, and the factors they would consider if they were interested in purchasing smart home devices to assist healthcare. The participants were given brief descriptions of smart homes and ambient assisted living environments in plain terms. Analysis of the survey data produced the following findings:
• Almost half of the respondents were very interested in automated products.
• Only 30% would consider adopting smart home functionality.
• The number of people who preferred having caretakers or healthcare assistants was roughly the same as the number who did not.
• Almost 65% of respondents voted in favour of installing a smart real-time monitoring device to assist physically impaired or elderly people, and about 45% felt that the product would be highly effective.
• Most respondents said they would consider price and user-friendliness first before purchasing the product, followed by design and service.
A few people from the same group were invited to a brainstorming session, in which a list of ideas was gathered spontaneously to help reach a conclusion for the project. The top priority of a brainstorming session is quantity over quality: an effective way to start brainstorming and stay productive is to emphasise rapid ideation rather than the quality of individual ideas. The synopsis of the session is set out below:
• Preference for caretakers.
• Inclination towards automation and technology.
• Factors considered before purchasing a smart home AAL technology.
• Perspectives on the product and points of view.
• Practical constraints.
The proceedings of the session are given below in the form of a mind map branching out to the various perspectives of the participants. Over the course of the discussion, an intriguing range of themes emerged, reflecting the views and concerns of the participants.
V. Research Plan
Our plan is to explore, identify, check the feasibility of, implement and compare various basic and state-of-the-art outlier detection techniques on sensory data obtained by monitoring the everyday activities of elderly people, in order to find the outlier detection technique that achieves the best outlier scores and precision. The research is conducted on data collected from elderly and physically challenged people, who are the research participants. Their physical activities and instrumental activities are observed daily at regular intervals. Even a simple sensor can record an immense amount of spatiotemporal information, which can be challenging to analyse without purpose-built temporal data mining techniques. Data gathered in smart environments has natural temporal elements that are essential for anomaly detection. The ultimate goal is to enhance prediction, decision making and real-time forecasting of the resident’s behaviour and to detect anomalies in these patterns. The collected data will be raw and disorderly, as it arrives from various sensors and smart instruments placed throughout the home environment. The next step is to convert the data into a comprehensible format that can be explored and analysed in order to arrive at possible data models and behavioural patterns. The data is then studied further, and summary statistics such as central tendency, skewness and variance are determined. Possible relationships between the existing attributes are identified and visualised. In essence, the data is cleansed and explored to prepare it for analysis. Subsequently, the clustering techniques described above are applied to the clean data to group the usual and abnormal characteristics of the user: normal, routine activities form large clusters, whereas anomalous behaviour forms small clusters. The aim is to identify the best state-of-the-art clustering techniques and apply them to smart home user data to optimise prediction and forecasting for better healthcare in ambient assisted living spaces. The results from the various techniques are compared and contrasted to determine the most efficient technique for detecting anomalies and outliers.
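Comparing the techniques requires a common yardstick. One option, assuming that at least some anomalous episodes in the collected data can be labelled (for example by caregivers), is to compute the precision and recall of each method's flagged points against those labels and rank the methods. The sketch below shows only this bookkeeping. The method names are the techniques listed above, but the flagged indices are invented placeholders, not findings.

```python
def precision_recall(flagged, true_anomalies):
    """Precision and recall of a set of flagged indices against labelled anomalies."""
    flagged, truth = set(flagged), set(true_anomalies)
    hits = len(flagged & truth)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


def rank_methods(results, true_anomalies):
    """Rank methods by (precision, recall), best first."""
    scored = {name: precision_recall(flagged, true_anomalies)
              for name, flagged in results.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Made-up flag sets, purely to show the bookkeeping; these are NOT results.
results = {"k-means": [8, 3], "LOF": [8], "DBSCAN": [8, 5, 6]}
for name, (p, r) in rank_methods(results, true_anomalies=[8]):
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Ranking by precision first reflects the stated goal of high precision; if missed anomalies are judged more costly than false alarms in a care setting, the sort key could weight recall instead.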
The breakdown of the whole project is given below, along with the tentative dates and timeframes for each step.
DATA COLLECTION → REFACTORING → DATA REPRESENTATION → CLUSTER IDENTIFICATION
PLANNING
TASK 1 : Development of Methodology
TASK 2 : Project Draft
TASK 3 : Preliminary Evaluation
SPECIFICATION OF INTEGRATED SYSTEMS
TASK 4 : Checklist of Resources
TASK 5 : Inventory Management
TASK 6 : Review of Existing Facilities
TASK 7 : Technical Requirements Catalogue
TASK 8 : Propose Expected Scenarios
TASK 9 : Prepare Detailed Business Plans
PROCESS
TASK 10 : Data Collection
TASK 11 : Refactor Data and Prepare for Analysis
TASK 12 : Data Visualisation and Design
TASK 13 : Derive Relationships Among Data Samples
TASK 14 : Clustering Data Using Various Techniques
TASK 15 : Recognition of Behavioural Patterns
TASK 16 : Detecting Anomalous Behaviour
TASK 17 : Compare Results From Different Methods Employed
TASK 18 : Deliver Final Report
ACTIVITY PLANNER
Task Name   Start      End        Duration (days)
Task 1      04/05/19   10/05/19   7
Task 2      11/05/19   13/05/19   4
Task 3      14/05/19   18/05/19   5
Task 4      19/05/19   25/05/19   7
Task 5      26/05/19   01/06/19   7
Task 6      02/06/19   05/06/19   4
Task 7      07/06/19   10/06/19   4
Task 8      11/06/19   15/06/19   5
Task 9      16/06/19   25/06/19   10
Task 10     26/06/19   29/06/19   4
Task 11     01/07/19   04/07/19   4
Task 12     05/07/19   10/07/19   6
Task 13     11/07/19   15/07/19   5
Task 14     16/07/19   26/07/19   10
Task 15     27/07/19   30/07/19   4
Task 16     31/07/19   01/08/19   2
Task 17     01/08/19   02/08/19   2
Task 18     02/08/19   04/08/19   3
VI. BUSINESS PLAN
The purpose of the business plan is to present the basic ideas behind the project, explain what led to this research, and convince promoters and stakeholders. The stakeholders in this project are the family members, caretakers, healthcare assistants and well-wishers concerned about elderly and physically challenged people. This project is expected to have a positive social, economic and technological impact. It is also believed that new technologies promote solidarity between generations, increasing volunteering and the transmission of personal and professional experience. As life expectancy has increased, so has the number of older adults who depend entirely on caregivers for routine daily tasks. Several perspectives of the stakeholders on the technological aspects of ambient assisted living environments were identified. The survey and brainstorming sessions suggest that most participants are confident and willing to install a smart home product to monitor the health and activities of elderly residents. A SWOT analysis of this product shows the possible threats and opportunities given the resources of the current marketplace. Production costs and overheads are estimated at approximately £1000, including all the necessary sensors, smart devices, testing products, instruments, and data mining and clustering software. The business assets would come to roughly £3000, taking into account rent and expenses for the workspace, the residents’ premises and the necessary electronic devices.
Project Plan: GANTT CHART (start and end are project day numbers, days 1 to 104)
No.  Task                                             Start  End
1    Development of Methodology                       1      7
2    Project Draft                                    8      11
3    Preliminary Evaluation                           12     17
4    Checklist of Resources                           18     25
5    Inventory Management                             26     33
6    Review of Existing Facilities                    34     38
7    Technical Requirements Catalogue                 39     43
8    Propose Expected Scenarios                       44     49
9    Prepare Detailed Business Plans                  50     59
10   Data Collection                                  60     64
11   Refactor Data and Prepare for Analysis           65     68
12   Data Visualisation and Design                    69     75
13   Derive Relationships Among Data Samples          76     81
14   Clustering Data Using Various Techniques         82     92
15   Recognition of Behavioural Patterns              93     96
16   Detecting Anomalous Behaviour                    97     99
17   Compare Results From Different Methods Employed  99     101
18   Deliver Final Report                             101    104
VII. FUTURE VISION
In this proposal, we examined various anomaly detection methods and clustering techniques to determine which would be the most accurate and efficient for ambient assisted living smart home environments. We plan to perform a comparative evaluation of various outlier detection algorithms on the user data to assess their merits and demerits. The results obtained might help to indicate which outlier detection technique identifies the anomalies in the data with the greatest precision. Our future work will be to implement relevant machine learning algorithms to understand and perceive the cognitive actions of the residents, and then to apply the best-performing anomaly detection method to detect potential hazards to the user’s health through unobtrusive health monitoring and real-time forecasting.
VIII. BIBLIOGRAPHY
[1] E. Aarts and S. Marzano, The new everyday. Rotterdam: 010 Publishers, 2003.
[2] "Enhancing anomaly detection using temporal pattern discovery", in Advanced Intelligent Environments, Springer, 2009. Available: 10.1007/978-0-387-76485-6.
[3] "A smart home application to eldercare: Current status and lessons learned", Technology and Health Care, vol. 17, no. 3, pp. 183-201, 2009.
[4] P. Rashidi and A. Mihailidis, "A survey on ambient-assisted living tools for older adults", IEEE Journal of Biomedical and Health Informatics, vol. 17, no. 3, 2013.
[5] A. Lotfi, C. Langensiepen, S. Mahmoud and M. Akhlaghinia, "Smart homes for the elderly dementia sufferers: identification and prediction of abnormal behaviour", Journal of Ambient Intelligence and Humanized Computing, vol. 3, no. 3, pp. 205-218, 2011. Available: 10.1007/s12652-010-0043-x.
[6] V. Jakkula and D. Cook, "Anomaly Detection Using Temporal Data Mining in a Smart Home
Environment", Methods of Information in Medicine, vol. 47, no. 01, pp. 70-75, 2008. Available:
10.3414/me9103.
[7] J. F. Allen and G. Ferguson, "Actions and events in interval temporal logic", Journal of Logic and Computation, vol. 4, no. 5, pp. 531-579, 1994.
[8] S. Bourobou and Y. Yoo, "User Activity Recognition in Smart Homes Using Pattern Clustering
Applied to Temporal ANN Algorithm", Sensors, vol. 15, no. 5, pp. 11953-11971, 2015. Available:
10.3390/s150511953 [Accessed 7 January 2019].
[9] K. Chaloner and R. Brant, "A Bayesian approach to outlier detection and residual analysis", Biometrika, vol. 75, no. 4, pp. 651-659, 1988.
[10] E. M. Knorr and R. T. Ng, "Algorithms for mining distance-based outliers in large datasets", in Proceedings of the 24th International Conference on Very Large Data Bases (VLDB), pp. 392-403, 1998.
[11] M. M. Breunig, H.-P. Kriegel, R. T. Ng and J. Sander, "LOF: Identifying density-based local outliers", in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, TX, pp. 93-104, 2000.
[12] K. Khan, S. U. Rehman, K. Aziz, S. Fong and S. Sarasvady, "DBSCAN: Past, present and future", in Fifth International Conference on the Applications of Digital Information and Web Technologies (ICADIWT), IEEE, pp. 232-238, 2014.
[13] Z. Fan and Y. Sun, "Clustering of college students based on improved K-means algorithm", in International Computer Symposium (ICS), IEEE, pp. 676-679, 2016.
IX. APPENDIX
Survey Link: https://goo.gl/forms/1wIgDlRj4p4hi2ty2
Video Presentation Link: https://youtu.be/EDB46ixogM4