We propose a method to automatically translate a preexisting activity recognition system, devised for a source sensor domain S, so that it can operate on a newly discovered target sensor domain T, possibly of a different modality. First, we use MIMO system identification techniques to obtain a function that maps the signals of S to those of T. This mapping is then used to translate the recognition system across the sensor domains. We demonstrate the approach on a 5-class gesture recognition problem, translating between a vision-based skeleton tracking system (Kinect) and inertial measurement units (IMUs). In this scenario, an adequate mapping can be learned from as little as a single gesture (~3 seconds). The accuracy after Kinect → IMU or IMU → Kinect translation is 4% below the baseline for the same limb; translating across modalities and to an adjacent limb yields an accuracy 8% below baseline. We discuss the sources of error and means for improvement. The approach is independent of the sensor modalities; it supports multimodal activity recognition and more flexible real-world deployments of activity recognition systems.
This presentation illustrates part of the work described in the following article:
* Baños, O., Calatroni, A., Damas, M., Pomares, H., Rojas, I., Tröster, G., Sagha, H., Millán, J. del R., Chavarriaga, R., Roggen, D.: Kinect=IMU? Learning MIMO Signal Mappings to Automatically Translate Activity Recognition Systems Across Sensor Modalities. In: Proceedings of the 16th annual International Symposium on Wearable Computers (ISWC 2012), Newcastle, United Kingdom, June 18–22 (2012)
1. Kinect=IMU? Learning MIMO Signal Mappings to Automatically Translate Activity Recognition Systems Across Sensor Modalities
ISWC 2012, Newcastle (UK)
Oresti Baños1, Alberto Calatroni2, Miguel Damas1, Héctor Pomares1,
Ignacio Rojas1, Hesam Sagha3, José del R. Millán3,
Gerhard Tröster2, Ricardo Chavarriaga3, and Daniel Roggen2
1Department of Computer Architecture and Computer Technology, CITIC-UGR, University of Granada, SPAIN
2Wearable Computing Laboratory, ETH Zurich, SWITZERLAND
3CNBI, Center for Neuroprosthetics, École Polytechnique Fédérale de Lausanne, SWITZERLAND
FET-Open Grant #225938
13. Transfer learning in AR
• Concept of transfer learning
– Origin in ML: “Need for lifelong machine learning methods that retain and reuse previously learned knowledge” (NIPS-95 workshop on “Learning to Learn”)
– A mechanism, ability, or means to recognize and apply knowledge and skills learned in previous tasks or domains to novel tasks or domains
• Intended for
– Continuity of context-awareness across different sensing environments
– Network topology redundancy
– Collective and individual knowledge enhancement
• Advantages
– Knowledge may be conserved
– Less labeled supervision is needed (ideally no additional recordings)
– ‘Online’ process
– Possibly heterogeneous
14. Transfer learning in AR: related work
• Selected contributions
– On-body sensors: Calatroni et al. (2011)
• Model parameters
• Labels
– Ambient sensors: van Kasteren et al. (2010)
• Common meta-feature space
• Limitations
– Operation over long time scales
– Possibly incomplete transfer
– Difficult transfer across modalities
A. Calatroni, D. Roggen, and G. Tröster, “Automatic transfer of activity recognition capabilities between body-worn motion sensors: Training newcomers to recognize locomotion,” in Proc. 8th Int. Conf. on Networked Sensing Systems, 2011.
T. van Kasteren, G. Englebienne, and B. Kröse, “Transferring knowledge of activity recognition across sensor networks,” in Proc. 8th Int. Conf. on Pervasive Computing, 2010, pp. 283–300.
15. Translation setup (Kinect ↔ IMU)
Skeleton Tracking System (Kinect)
Body-worn Inertial Measurement Unit (Xsens)
16. Translation setup (Kinect ↔ IMU)
Skeleton Tracking System (Kinect)
– RGB camera, IR LED, IR camera
– Depth map
– 15-joint skeleton
– 3D joint coordinates (POS, in mm)
– Tracking range: 1.2–3.5 m
Body-worn Inertial Measurement Unit (Xsens)
– Accurate 3D orientation
– Several modalities (ACC, GYR, MAG)
20. Translation setup (Kinect ↔ IMU)
[Figure: one gesture recorded by both systems. Left: Kinect position (m) vs. time (s), channels X/Y/Z. Right: IMU acceleration (G) vs. time (s), channels X/Y/Z.]
21. Translation method
• System identification (signal level); see the sketch after this list
• Translation architectures (classification level)
– Template translation
– Signal translation
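For the system identification step, here is a minimal sketch in Python, assuming the mapping is a linear MIMO FIR model with q delay taps fit by ordinary least squares (the evaluation slides mention a 10-tap MIMO mapping); function and variable names are illustrative, not from the authors' code:

```python
import numpy as np

def fit_mimo_mapping(x_src, x_tgt, q=10):
    """Identify a linear MIMO FIR mapping: the last q source samples
    predict one target sample. x_src: (N, d_src) source signals,
    x_tgt: (N, d_tgt) target signals, simultaneously recorded."""
    N = len(x_src)
    # Regressor matrix: the current source sample plus q-1 delayed copies.
    phi = np.hstack([x_src[q - 1 - k : N - k] for k in range(q)])
    # Least-squares fit of the (q * d_src, d_tgt) weight matrix.
    W, *_ = np.linalg.lstsq(phi, x_tgt[q - 1:], rcond=None)
    return W

def apply_mapping(x_src, W, q=10):
    """Translate source signals into an estimate of the target signals."""
    N = len(x_src)
    phi = np.hstack([x_src[q - 1 - k : N - k] for k in range(q)])
    return phi @ W
```

Both translation architectures reuse this pair: template translation pushes stored templates through the mapping once, while signal translation runs it continuously on the incoming stream.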
22. Translation: Kinect to IMU
System S (source domain): Kinect (position), X_S(t); System T (target domain): IMU (acceleration), X_T(t). Translation operates at the signal level and at the classification level.
[Figure: per-class Kinect position traces (m) vs. time (s), channels X/Y/Z, for gesture classes L1, L2, L3.]
23. Kinect to IMU (signal mapping)
System S (source domain): Kinect (position), X_S(t); System T (target domain): IMU (acceleration), X_T(t).
[Figure: the class templates from slide 22, plus a coexistence period in T: ~40 s of simultaneously recorded Kinect position (m) and IMU acceleration (G) vs. time (s).]
24. Kinect to IMU (signal mapping)
[Figure: as on slide 23; the mapping is identified from the coexistence recordings.]
Ψ_{S→T}(t) : X_S(t) → X̂_T(t) ≈ X_T(t)
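In code terms, reusing the sketch from slide 21, Ψ_{S→T} would be identified from the coexistence stretch; kinect_pos and imu_acc are hypothetical (N, 3) arrays of simultaneously recorded, time-aligned signals:

```python
# ~100 aligned samples (~3.3 s in the evaluation) suffice to identify the map.
W = fit_mimo_mapping(kinect_pos[:100], imu_acc[:100], q=10)
imu_acc_hat = apply_mapping(kinect_pos, W, q=10)  # X̂_T(t) ≈ X_T(t)
```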
26. Kinect to IMU (template translation)
System S (source domain): Kinect (position); System T (target domain): IMU (acceleration).
[Figure: per-class Kinect position templates (m) vs. time (s), channels X/Y/Z, for classes L1, L2, L3.]
Ψ_{S→T}(t) : X_S(t) → X̂_T(t) ≈ X_T(t)
27. Kinect to IMU (template translation)
[Figure: as on slide 26, with one Kinect position template (m) vs. time (s), channels X/Y/Z, selected for translation.]
Ψ_{S→T}(t) : X_S(t) → X̂_T(t) ≈ X_T(t)
28. Kinect to IMU (template translation)
[Figure: a Kinect position template (m) vs. time (s), channels X/Y/Z, and its translation through the mapping: an estimated IMU acceleration template (G), channels X̂/Ŷ/Ẑ.]
Ψ_{S→T}(t) : X_S(t) → X̂_T(t) ≈ X_T(t)
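Template translation then amounts to pushing each stored source-domain class template through the learned mapping once; source_templates below is a hypothetical dict of label → (N, 3) Kinect position arrays, and apply_mapping is the sketch from slide 21:

```python
# Translate every stored Kinect-domain template into the IMU domain.
imu_templates = {label: apply_mapping(tpl, W, q=10)
                 for label, tpl in source_templates.items()}
# New IMU data can now be matched against these translated templates,
# so recognition continues without re-recording labelled gestures.
```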
29. Translation method (Kinect → IMU)
[Figure: a translated IMU acceleration template (G) vs. time (s): measured channels X/Y/Z overlaid with estimates X̂/Ŷ/Ẑ, classes L1, L2, L3.]
Ψ_{S→T}(t) : X_S(t) → X̂_T(t) ≈ X_T(t)
30. Kinect to IMU (template translation)
[Figure: per-class Kinect position templates and the corresponding translated IMU acceleration templates (G) vs. time (s), channels X̂/Ŷ/Ẑ, for classes L1, L2, L3.]
Ψ_{S→T}(t) : X_S(t) → X̂_T(t) ≈ X_T(t)
31. Kinect to IMU (template translation)
System S (source domain): Kinect (position), X_S(t); System T (target domain): IMU (acceleration), X_T(t).
[Figure: translated IMU acceleration templates (G) vs. time (s), channels X̂/Ŷ/Ẑ, for classes L1, L2, L3, at the signal and classification levels.]
32. Kinect to IMU (template translation)
System S (source domain): Kinect (position), X_S(t); System T (target domain): IMU (acceleration), X_T(t); classes L1, L2, L3.
33. Translation: IMU to Kinect
System S (source domain): Kinect (position), X_S(t); System T (target domain): IMU (acceleration), X_T(t).
34. IMU to Kinect (signal mapping)
[Figure: coexistence period in T: ~40 s of simultaneously recorded Kinect position (m) and IMU acceleration (G) vs. time (s).]
35. IMU to Kinect (signal mapping)
Ψ_{T→S}(t) : X_T(t) → X̂_S(t) ≈ X_S(t)
36. IMU to Kinect (signal translation)
Ψ_{T→S}(t) : X_T(t) → X̂_S(t) ≈ X_S(t)
38. IMU to Kinect (signal translation)
Ψ_{T→S}(t) : X_T(t) → X̂_S(t) ≈ X_S(t)
[Figure: a gesture's Kinect position trace (m) vs. time (s), channels X/Y/Z.]
39. IMU to Kinect (signal translation)
Ψ_{T→S}(t) : X_T(t) → X̂_S(t) ≈ X_S(t)
[Figure: the translated stream vs. time (s), estimated channels X̂/Ŷ/Ẑ.]
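Signal translation works in the opposite direction: the inverse mapping Ψ_{T→S} is identified, the live IMU stream is translated into the Kinect domain, and the original Kinect-domain classifier runs unchanged. A sketch, again reusing the earlier functions; imu_acc, kinect_pos, imu_acc_live, features, and kinect_classifier are hypothetical placeholders:

```python
# Identify Ψ_{T→S} from the coexistence stretch (IMU → Kinect domain).
W_ts = fit_mimo_mapping(imu_acc[:100], kinect_pos[:100], q=10)
# Translate the live IMU stream and feed the unchanged source classifier.
x_s_hat = apply_mapping(imu_acc_live, W_ts, q=10)  # X̂_S(t) ≈ X_S(t)
label = kinect_classifier.predict(features(x_s_hat))
```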
42. Evaluation
• Analyzed transfers
– Kinect (position):
• HAND
– IMUs (acceleration):
• RIGHT LOWER ARM (RLA)
• RIGHT UPPER ARM (RUA)
• BACK
43. Evaluation
• Model
– MIMO mapping with a 10-tap delay
• Mapping domains
– Problem-domain mapping (PDM)
– Gesture-specific mapping (GSM)
– Unrelated-domain mapping (UDM)
• Evaluation protocol (sketched below)
– Mapping learning: 100 samples (~3.3 s)
– Mapping testing: the remaining, unused instances
– Selection randomly repeated 20 times in an outer cross-validation process
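A sketch of that protocol, under the assumption that the 100 training samples form one contiguous, randomly placed stretch; evaluate() is a placeholder for testing the translated recognizer on the held-out instances:

```python
rng = np.random.default_rng(0)
scores = []
for _ in range(20):  # 20 random repetitions of the outer CV selection
    start = rng.integers(0, len(kinect_pos) - 100)
    W = fit_mimo_mapping(kinect_pos[start:start + 100],
                         imu_acc[start:start + 100], q=10)
    scores.append(evaluate(W))  # placeholder: test on the unused instances
```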
44. Translation accuracy
• Model
– 3-NN classifier; features = per-window maximum and minimum (see the sketch below the figure)
– 5-fold cross-validation
– 100 repetitions
• Results
[Figure: translation accuracy (%) for transfers from Kinect (to RLA, to RUA, to BACK) and to Kinect (from RLA, from RUA, from BACK); bars compare BS, BT, PDM, GSM, and UDM.]
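The recognition model itself is small enough to sketch; assuming scikit-learn (my choice, not necessarily the authors'), with the per-window maximum and minimum of each channel as features and a 3-nearest-neighbour classifier under 5-fold cross-validation:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def max_min_features(windows):
    # windows: (n_instances, window_len, n_channels)
    return np.hstack([windows.max(axis=1), windows.min(axis=1)])

# X = max_min_features(gesture_windows); y = gesture_labels
# acc = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=5).mean()
```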
46. Encountered limitations
• General model challenges/limitations
– Not all mappings may be feasible (temperature → gyroscope?)
• Kinect ↔ IMU challenges/limitations
– Different frames of reference (IMU local vs. Kinect world)
– Occlusions
– Subject out of tracking range
– Torsions
47. Conclusions and future work
• Transfer system based on
– MIMO mapping model
– Template/signal translation
• MAPPING: learned from as little as a single gesture (~3 seconds)
• Successful translation across sensor modalities, Kinect ↔ IMU (4% and 8% below baseline)
• NEXT STEPS
– Analyze the effect of data loss (occlusions, anomalies, etc.)
– Further characterization of the MIMO model (e.g., the delay order q)
– Alternative mapping models: ARMA, TDNN, LS-SVM
– Combination of sensors (homogeneous/heterogeneous)
– Tests in more complex setups and real-world situations
48. Thank you for your attention.
Questions?
Oresti Baños Legrán
Dep. Computer Architecture & Computer Technology
Faculty of Computer & Electrical Engineering (ETSIIT)
University of Granada, Granada (SPAIN)
Email: oresti@atc.ugr.es
Phone: +34 958 241 516
Fax: +34 958 248 993
Work supported in part by the FP7 project OPPORTUNITY under FET-Open grant number 225938, the Spanish CICYT project TIN2007-60587, Junta de Andalucía projects P07-TIC-02768 and P07-TIC-02906, the CENIT project AmIVital, and the FPU Spanish grant AP2009-2244.