1. Utilizing Data Warehousing and Data Mining Algorithms on Information Gathered with IoT Sensors
Eric Matthews, Mohsen Tavakoli
Fall 2016
Emerging Non-Traditional Database Systems: Data Warehousing and Mining (03-60-539)
Dr. Ezeife
2. Contents
● Software and Hardware
● Data Warehouse
● Roll-Up Function
● WEKA Clustering
○ 3 Clusters
○ 6 Clusters
● Conclusion
3. Software and Hardware
● Arduino / Arduino IDE 1.6.13
● Ubuntu Linux Server
● Python 2.7 (Server & Client)
● MySQL Server v5.5.46
● WEKA 3.8
Sensors:
● Sound (RB-Wav-26)
● Ultrasonic Distance (SR04)
● Temperature (DHT11)
● Light (Photoresistor)
● Motion Sensor (HC-SR501)
4. Data Warehouse
Our data warehouse consists of the following fields:
● location_id (a unique ID for each location where the device is placed)
● Average, maximum, and minimum over 10 readings of:
○ Distance
○ Light
○ Sound
○ Temperature
○ Humidity
○ Motion count
● time_collected (time at which the client collected the data)
● srv_time_collected (time at which the server collected the data)
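The 10-reading aggregation could look roughly like the sketch below. It is an illustration only, written for Python 3: the field names, the `summarize` helper, and the choice to store motion as a simple count of triggered readings are assumptions, since the slides list the measured quantities but not the exact schema or client code.

```python
from statistics import mean

# Hypothetical field names; the slides name the quantities, not the columns.
NUMERIC_FIELDS = ("distance", "light", "sound", "temperature", "humidity")

def summarize(readings, location_id, time_collected):
    """Collapse one batch of raw readings into a single warehouse row."""
    row = {"location_id": location_id, "time_collected": time_collected}
    for field in NUMERIC_FIELDS:
        values = [r[field] for r in readings]
        row["avg_" + field] = mean(values)
        row["max_" + field] = max(values)
        row["min_" + field] = min(values)
    # Assumption: motion is a 0/1 trigger per reading, so it is counted.
    row["motion_count"] = sum(1 for r in readings if r["motion"])
    return row
```

A client would call `summarize` once per batch of 10 readings and send the resulting row to the server, which would add its own srv_time_collected.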
7. Roll-Up
We created a stored procedure in MySQL that rolls up our data by any time interval and location:
CALL database_project.rollup_time(time_interval_seconds, location_id)
This call aggregates our data into fact tables by any time interval (minute, hour, day, year, or any number of seconds) and location.
It does this using GROUP BY on the time_collected field in MySQL.
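The body of the stored procedure is not shown in the slides, so the following is only a sketch of the bucketing idea behind a time roll-up. It uses Python's built-in sqlite3 module instead of MySQL so that it runs anywhere, and the `readings` table, its columns, and the storage of time_collected as Unix seconds are all assumptions made for illustration.

```python
import sqlite3

def rollup_time(conn, interval_seconds, location_id):
    """Aggregate one location's rows into buckets of interval_seconds."""
    query = """
        SELECT (time_collected / ?) * ? AS bucket_start,  -- integer division
               AVG(avg_light)           AS avg_light,
               MAX(max_motion)          AS max_motion,
               COUNT(*)                 AS n_rows
        FROM readings
        WHERE location_id = ?
        GROUP BY bucket_start
        ORDER BY bucket_start
    """
    params = (interval_seconds, interval_seconds, location_id)
    return conn.execute(query, params).fetchall()

# Small in-memory demo with a hypothetical schema and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (location_id INTEGER, time_collected INTEGER,"
    " avg_light REAL, max_motion INTEGER)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [(5, 0, 0.25, 0), (5, 30, 0.75, 1), (5, 60, 0.5, 0), (5, 90, 1.0, 2),
     (3, 10, 0.9, 5)],
)
per_minute = rollup_time(conn, 60, 5)   # one row per minute
per_hour = rollup_time(conn, 3600, 5)   # one row per hour
```

Dividing the timestamp by the interval and grouping on the result is what lets a single parameter cover minutes, hours, days, or any number of seconds. In MySQL the same bucket could presumably be computed with UNIX_TIMESTAMP(time_collected) DIV time_interval_seconds.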
9. WEKA Clustering - Location 5 - 3 Clusters
Using EM clustering with a maximum of 3 clusters, we obtained per-minute clusters for location 5, which we call Not Home, Passively Home, and Actively Home.
● 53% being used (Passively Home 9% + Actively Home 44%)
● 47% not being used (Not Home)
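For readers unfamiliar with EM, the sketch below fits a Gaussian mixture to a single attribute in plain Python 3. It is not WEKA's implementation: the `em_1d` function, its starting values, and the fixed iteration count are assumptions chosen to keep the example short, and the project itself clustered several attributes at once.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_1d(data, k, iterations=50, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture with plain EM."""
    lo, hi = min(data), max(data)
    # Deterministic start: means spread evenly across the data range.
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    variances = [((hi - lo) / k) ** 2 or min_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean, and variance per component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue  # component received no mass; leave it unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data))
            variances[j] = max(min_var, spread / nj)
    return weights, means, variances

# Made-up normalized light values with a dark group and a bright group.
light = [0.0, 0.02, 0.05, 0.03, 0.65, 0.70, 0.68, 0.72]
weights, means, variances = em_1d(light, k=2)
```

Each component's weight plays the role of the cluster share reported on the slides, and its mean plays the role of the centroid.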
10. Location 5 - Cluster Centroids
Using 3 of our attributes (Light, Motion, and Sound), we calculated these centroids for our clusters in location 5. The data has been normalized.
                   Passively Home   Not Home    Actively Home
Avg Light          0.2533           0.7012      0.6758
Max Motion Count   0.1172           0           0.2433
Avg Sound          0.0431           0.0306      0.0819
# of Data Points   176 (9%)         899 (47%)   851 (44%)
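One simple way to label a new per-minute reading is to assign it to the closest centroid from the table above. The sketch below (Python 3.8+) does that with Euclidean distance. This is a simplification rather than what WEKA's EM does, since EM assigns points by probability and also uses each cluster's spread. The `min_max` and `nearest_cluster` helpers are hypothetical, and min-max scaling is only an assumed form of the normalization mentioned on the slide.

```python
import math

# Centroids copied from the table above (Avg Light, Max Motion Count, Avg Sound).
CENTROIDS = {
    "Passively Home": (0.2533, 0.1172, 0.0431),
    "Not Home": (0.7012, 0.0, 0.0306),
    "Actively Home": (0.6758, 0.2433, 0.0819),
}

def min_max(value, lo, hi):
    """Scale a raw value into [0, 1]; lo and hi come from the training data."""
    return (value - lo) / (hi - lo) if hi > lo else 0.0

def nearest_cluster(avg_light, max_motion, avg_sound):
    """Return the name of the centroid closest to a normalized reading."""
    point = (avg_light, max_motion, avg_sound)
    return min(CENTROIDS, key=lambda name: math.dist(point, CENTROIDS[name]))
```

Raw sensor values would first be passed through the same normalization that was applied before clustering, for example `nearest_cluster(min_max(raw_light, lo, hi), ...)` with the training-time bounds.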
11. WEKA Clustering - Location 5 - 6 Clusters
Running EM on location 5 with no maximum cluster parameter resulted in 6 clusters:
● Highly Active
● 2 Lights No Activity
● Quietly Active
● No Light No Activity
● Main Light No Activity
● 1 Light Quietly Active
Based on these clusters, we concluded that:
● 51%: location being used
● 49%: location not being used
12. Location 5 - Cluster Centroids
Cluster                  Avg Light   Max Motion Count   Avg Sound   # of Data Points
Highly Active            0.6901      0.2052             0.3034      896 (52%)
2 Lights No Activity     0.8603      0                  0.0194      519 (30%)
Quietly Active           0.6777      0.2438             0.0717      316 (18%)
No Light No Activity     0           0                  0.0317      73 (4%)
Main Light No Activity   0.6485      0                  0.0338      166 (9%)
1 Light Quietly Active   0.3654      0.1826             0.049       94 (5%)
13. Conclusion
● We can conclude that it is possible to define three different states of home presence: Not Home, Passively Home, and Actively Home
● Any new reading can be assigned to one of these clusters to determine whether the subject is home
● We can also gain finer detail about the state of a location by using more clusters:
○ Determine when lights or heating/cooling are turned on but nobody is using the location
○ Monitor sources of ambient or constant noise
○ Detect presence during periods when no activity is expected (a locked building or house)
14. Future Work
We hope to find out more information from our data by:
● Collecting more data
● Rolling up over larger time intervals
● Using different subsets of our data for different hypotheses
● Using different algorithms for clustering