Personalized Privacy-Aware Image Classification

Eleftherios (Lefteris) Spyromitros-Xioufis (1), Symeon Papadopoulos (1),
Adrian Popescu (2), Yiannis Kompatsiaris (1)
(1) Center for Research and Technology Hellas – Information Technologies Institute (CERTH-ITI)
(2) CEA-LIST
ICMR 2016, June 6-9, 2016, New York

Personalized image privacy classification
 Photo sharing may compromise privacy
 Can we make photo sharing safer?
• Yes: build “private” image detectors
• Alerts whenever a “private” image is shared
• Personalization is needed because privacy is subjective!
- Would you share such an image?
- It depends:
• Teenager?
• Life insurance?

Previous work & limitations
1. Focus on generic (“community”) notion of privacy
• Models trained on PicAlert [1]
• Flickr images annotated according to a common privacy definition
• Consequences:
• Variability in user perceptions not captured
• Overoptimistic performance estimates
2. Justifications of predictions are hard to comprehend
[1] Zerr et al., I know what you did last summer!: Privacy-aware image classification and search, CIKM, 2012.

Our main contributions
 Study personalization in image privacy classification
• Compare personalized vs generic models
• Compare two types of personalized models
 Semantic visual features
• Better justifications and privacy insights
 YourAlert: more realistic than existing benchmarks

Personalization approaches
1. Full personalization:
• A different model for each user, relying only on that user's feedback
• Disadvantage: requires a lot of feedback
2. Partial personalization:
• Models rely on user feedback + feedback from other users
• Amount of personalization controlled via instance weighting
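
The instance weighting behind partial personalization can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names (`hybrid_training_set`, `train_weighted_logreg`, `predict_proba`), the one-feature toy data and the plain gradient-descent trainer are all assumptions made for the example. The slides only state that the amount of personalization is controlled via instance weighting.

```python
import math

def hybrid_training_set(user_X, user_y, generic_X, generic_y, user_weight):
    """Pool user and generic examples; user examples get a higher weight."""
    X = user_X + generic_X
    y = user_y + generic_y
    weights = [float(user_weight)] * len(user_X) + [1.0] * len(generic_X)
    return X, y, weights

def train_weighted_logreg(X, y, weights, lr=0.1, epochs=500, l2=0.01):
    """L2-regularized logistic regression fitted by batch gradient descent.

    Each example's gradient contribution is scaled by its instance weight.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    total = sum(weights)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi, wi in zip(X, y, weights):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            err = wi * (p - yi)
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (gw / total + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b

def predict_proba(model, x):
    """Probability that x belongs to the positive ('private') class."""
    w, b = model
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy data: the single feature fires on, say, "drinking" images.
# The generic annotators call such images private (1); this user calls them public (0).
generic_X = [[1.0]] * 4 + [[0.0]] * 4
generic_y = [1, 1, 1, 1, 0, 0, 0, 0]
user_X = [[1.0], [0.0]]
user_y = [0, 1]

for user_weight in (1, 100):
    X, y, weights = hybrid_training_set(user_X, user_y, generic_X, generic_y, user_weight)
    model = train_weighted_logreg(X, y, weights)
    print(user_weight, round(predict_proba(model, [1.0]), 2))
```

With a weight of 1 the pooled model follows the generic majority; with a large weight the two user examples dominate and the prediction flips towards the user's own notion of privacy.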

Visual and Semantic Features
 vlad [1]: aggregation of local image descriptors
 cnn [2]: deep visual features
 semfeat [3]: outputs of ~17K concept detectors
• Trained using cnn
• Top 100 concepts per image
[1] Spyromitros-Xioufis et al., A comprehensive study over vlad and product quantization in large-scale image retrieval, IEEE Transactions on Multimedia, 2014.
[2] Simonyan and Zisserman, Very deep convolutional networks for large-scale image recognition, ArXiv, 2014.
[3] Ginsca et al., Large-Scale Image Mining with Flickr Groups, MultiMedia Modeling, 2015.
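
One way to read "Top 100 concepts per image" is as a sparsification step that keeps only the highest-scoring detector outputs. The sketch below assumes that reading; the function name `top_k_concepts` and the example scores are hypothetical.

```python
import heapq

def top_k_concepts(scores, k=100):
    """Keep the k highest-scoring concepts of one image and drop the rest.

    scores: dict mapping concept name -> detector output (assumed format).
    """
    top = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return dict(top)

# Hypothetical detector outputs for a single image.
image_scores = {"drinker": 0.9, "smoker": 0.7, "knitwear": 0.2, "seaside": 0.05}
sparse = top_k_concepts(image_scores, k=2)
print(sparse)
```

The result is a sparse vector over the ~17K-concept vocabulary, in which every concept outside the top k is implicitly zero.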

Justifications via semfeat
[Example justification tag cloud: knitwear, young-back, hand-glass, cigar-smoker, smoker, drinker, Freudian]
 semfeat can be used to justify predictions
• A tag cloud of the most discriminative visual concepts
 Justifications can be noisy
• concept detectors are not perfect
• semfeat vocabulary is not privacy-oriented
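
A justification of this kind could be derived from a linear classifier by ranking concepts by their contribution to the "private" score. The slides do not say how the most discriminative concepts are chosen, so the rule below (coefficient times concept score, positive contributions only) is an assumption, as are the function name `justify` and the numbers.

```python
def justify(coefficients, image_features, n=5):
    """Return the n concepts that push this image most towards 'private'.

    coefficients: dict concept -> weight of a linear model (positive = private).
    image_features: dict concept -> detector score for one image.
    """
    contributions = {
        concept: coefficients.get(concept, 0.0) * score
        for concept, score in image_features.items()
    }
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    return [(concept, value) for concept, value in ranked[:n] if value > 0]

# Hypothetical model weights and detector scores.
weights = {"drinker": 2.0, "smoker": 1.5, "knitwear": 0.1, "lakefront": -1.2}
features = {"drinker": 0.8, "knitwear": 0.9, "lakefront": 0.3}
for concept, value in justify(weights, features, n=2):
    print(concept, round(value, 2))
```

The example also shows why such justifications can be noisy: "knitwear" makes the list only because its detector fired strongly, not because it is a strong privacy cue.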

semfeat-LDA: an improved semantic representation
 Solution: project semfeat to a latent space
• Images treated as text documents (top 10 concepts)
• A text corpus created from private images (PicAlert + YourAlert)
• LDA is applied to create a topic model (30 topics)
• 6 privacy-related topics are identified (manually)
 A 2nd level semantic representation: semfeat-LDA
Topic       Top-5 semfeat concepts assigned to each topic
children    dribbler, child, godson, wimp, niece
drinking    drinker, drunk, tipper, thinker, drunkard
erotic      slattern, erotic, cover-girl, maillot, back
relatives   great-aunt, second-cousin, grandfather, mother, great-grandchild
vacations   seaside, vacationer, surf-casting, casting, sandbank
wedding     groom, bride, celebrant, wedding, costume
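
The first step, treating each image as a text document made of its top-10 concepts, might look like the sketch below. It only prepares the bag-of-words input that a topic-model library would consume; fitting the 30-topic LDA model is not shown. The function names and the toy images are hypothetical.

```python
from collections import Counter

def image_to_document(concept_scores, k=10):
    """Turn one image into a 'document': the names of its k top concepts."""
    ranked = sorted(concept_scores.items(), key=lambda item: item[1], reverse=True)
    return [concept for concept, _ in ranked[:k]]

def build_corpus(images, k=10):
    """Build documents, a vocabulary and (concept id, count) bags for a topic model."""
    documents = [image_to_document(scores, k) for scores in images]
    vocabulary = sorted({concept for doc in documents for concept in doc})
    concept_id = {concept: i for i, concept in enumerate(vocabulary)}
    bags = [sorted(Counter(concept_id[c] for c in doc).items()) for doc in documents]
    return documents, vocabulary, bags

# Two hypothetical private images with made-up detector scores.
private_images = [
    {"bride": 0.9, "groom": 0.8, "seaside": 0.1},
    {"groom": 0.7, "drinker": 0.6, "child": 0.2},
]
documents, vocabulary, bags = build_corpus(private_images, k=2)
print(documents)
print(vocabulary)
print(bags)
```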

semfeat-LDA: more intuitive justifications
[Figure: justifications at two levels.
1st level semantic representation (semfeat concepts): knitwear, young-back, hand-glass, cigar-smoker, smoker, drinker, Freudian
2nd level semantic representation (semfeat-LDA topics): children, drinking, erotic, relatives, vacations, wedding]

YourAlert: a realistic benchmark
 User study
• Participants annotate their own photos
• Loose guidance allowed adoption of personal privacy notions
• Private: “would share only with close OSN friends or not at all”
• Public: “would share with all OSN friends or even make public”
• Automated extraction and annotation software
• Reduced privacy concerns: only features and annotations shared
• Users gave their informed consent to use their data
 The resulting dataset: YourAlert (publicly available at http://mklab.iti.gr/datasets/image-privacy/)
• Stats: 1.5K photos, 27 users, ~16/~40 private/public per user
• Main advantages:
• Facilitates realistic evaluation of privacy models
• Allows development of personalized models

Experimental evaluation
 Goals
• Compare different visual features
• Evaluate generic models in a realistic setting
• Evaluate personalized and partially personalized models
• Gain insights into privacy perceptions via semfeat
 Experimental setup
• Classifier: regularized logistic regression (LibLinear)
• Evaluation measure: Area under ROC (AUC)
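
For reference, AUC can be computed directly from its pairwise definition: the fraction of (private, public) image pairs in which the private image receives the higher score, with ties counting one half. The sketch below is a plain-Python illustration with made-up labels and scores, not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for private, 0 for public. scores: higher means 'more private'.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Three of the four (private, public) pairs are ordered correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

A classifier that scores every image identically gets 0.5, and a perfect ranking gets 1.0.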

Generic models on PicAlert vs YourAlert
[Figure: bar chart of AUC (y-axis, from 0.50 = random to 1.00 = perfect) for generic models evaluated on PicAlert and on YourAlert, one pair of bars per feature: edch, bow, vlad, cnn, semfeat.
Annotations: edch and bow are the best visual features in [Zerr et al., 2012]; vlad are visual features based on aggregation of local descriptors; cnn are deep visual features; semfeat are semantic visual features based on cnn; “+20%”; “Significantly worse on YourAlert!”]

Key findings on generic models
 Almost perfect performance on PicAlert with cnn
• semfeat performs similarly to cnn
 Significantly worse performance on YourAlert
• Similar performance for all features
 Additional findings
• Using more generic training examples does not help
• Large variability in performance across users

Personalized privacy models
 Evaluation carried out on YourAlert
• A modified k-fold cross-validation for unbiased estimates
 Personalized model types
• ‘user’: only user-specific examples from YourAlert
• ‘hybrid’: a mixture of user-specific examples from YourAlert and generic examples from PicAlert
• User-specific examples are weighted higher
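
The slides do not spell out how the k-fold procedure is modified. One plausible reading is that folds are drawn within each user's own photos, so that every test fold contains only that user's examples, while generic PicAlert examples (if any) are added to the training side only. The sketch below follows that assumption; the function names and the round-robin fold assignment are hypothetical.

```python
def user_folds(n_examples, k=3):
    """Assign example indices 0..n-1 to k folds in round-robin order."""
    return [list(range(fold, n_examples, k)) for fold in range(k)]

def personalized_splits(n_user_examples, n_generic_examples=0, k=3):
    """Yield (user_train_idx, generic_train_idx, user_test_idx) per fold.

    Generic examples are only ever used for training ('hybrid' models);
    pass n_generic_examples=0 to get the splits for a 'user' model.
    """
    folds = user_folds(n_user_examples, k)
    generic_idx = list(range(n_generic_examples))
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = sorted(
            i for f, fold in enumerate(folds) if f != held_out for i in fold
        )
        yield train_idx, generic_idx, test_idx

# One user with 6 annotated photos, plus 2 generic examples for a hybrid model.
for train_idx, generic_idx, test_idx in personalized_splits(6, n_generic_examples=2, k=3):
    print(train_idx, generic_idx, test_idx)
```

In practice the user's examples would typically be shuffled, and possibly stratified by class, before being assigned to folds.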

Evaluation of personalized models
[Diagram, built up over nine slides: PicAlert and YourAlert are shown side by side, with YourAlert split by user (u1, u2, u3). Each user's examples are divided by 3-fold cross-validation; fold k=1 is shown as the test set.
• Model type ‘user’: for each user u_i, a model h_user^i is trained on D_train^{u_i}, that user's own training folds, and evaluated on that user's test fold.
• Model types ‘hybrid w=1’ and ‘hybrid w=2’: for each user u_i, a model h_hybrid^i is trained on that user's training folds together with PicAlert, with user-specific examples given weight w, and evaluated on that user's test fold.]

Personalized privacy models
[Figure: three panels (vlad, semfeat, cnn) plotting AUC (y-axis, 0.55 to 0.90) against the number of user-specific examples (x-axis, 0 to 35). Curves: generic, user, hybrid-g w=1, hybrid-g w=10, hybrid-g w=100, hybrid-g w=1000.]

Key findings on personalized models
 ‘user’ catches up with ‘generic’ after only a few examples
 ‘hybrid’ is better than both ‘user’ and ‘generic’
• Even with very few user-specific examples
• ‘user’ is expected to outperform hybrid with more examples
 Weighting user-specific examples higher leads to significantly better performance!

Privacy insights via semfeat
 An exploratory analysis
• Get insights into the average perception of privacy
• Identify deviations from the average perception of privacy
 Setup
• Build 1 generic and 27 personalized (‘user’) models
• Identify 50 most positive and 50 most negative coefficients
 Results
• Generic: concepts leaning private include child, mate, son; concepts leaning public include uphill, lakefront, waterside
• Interesting deviations
• Alcoholic is private for generic and public for u12, u22
• Tourist is private for u11 and public for generic
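
The setup above can be sketched for linear models whose coefficients are indexed by semfeat concept. The code assumes that a positive coefficient pushes towards "private" and a negative one towards "public"; the function names and all coefficient values are made up for illustration.

```python
def extreme_coefficients(coefficients, n=50):
    """Return the n most positive and the n most negative coefficients."""
    descending = sorted(coefficients.items(), key=lambda item: item[1], reverse=True)
    ascending = sorted(coefficients.items(), key=lambda item: item[1])
    positives = [(c, w) for c, w in descending[:n] if w > 0]
    negatives = [(c, w) for c, w in ascending[:n] if w < 0]
    return positives, negatives

def deviations(generic, user):
    """Concepts whose sign differs between the generic and a personalized model."""
    return sorted(
        concept
        for concept in generic.keys() & user.keys()
        if generic[concept] * user[concept] < 0
    )

# Hypothetical coefficients (positive = private, negative = public).
generic_model = {
    "child": 1.2, "mate": 0.9, "son": 0.7, "alcoholic": 0.6,
    "uphill": -1.1, "lakefront": -0.8, "waterside": -0.5,
}
user_model = {"alcoholic": -0.4, "child": 1.0}

private_top, public_top = extreme_coefficients(generic_model, n=2)
print([c for c, _ in private_top], [c for c, _ in public_top])
print(deviations(generic_model, user_model))
```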

Identifying recurring privacy themes
 A prototype semfeat-LDA vector for each user
• The centroid of the semfeat-LDA vectors of that user's private images
 K-means (k=5) clustering on the prototype vectors
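[Figure: bar chart of factor loadings (y-axis, 0.00 to 0.20) on the six privacy topics (children, drinking, erotic, relatives, vacations, wedding) for each of the five clusters.
Cluster members (user IDs): c0: {2,3,19,23,25,26,27}; c1: {1,5,6,11,12,13,14,20,21}; c2: {8,10,17,24}; c3: {4,16}; c4: {7,9,15,18,22}]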
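
The two steps, a per-user prototype followed by k-means over the prototypes, can be sketched in plain Python. This is an illustration under simplifying assumptions: it uses Lloyd's algorithm with the first k points as initial centres (the slides do not state the initialization), two-dimensional toy vectors and k=2 rather than the k=5 used in the study. All names and numbers are hypothetical.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; the first k points serve as initial centres."""
    centres = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centres[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centres[c] = centroid(members)
    return assignment, centres

# Prototype of one user: centroid of the topic vectors of that user's private images.
user_private_vectors = [[1.0, 0.0], [0.0, 1.0]]
print(centroid(user_private_vectors))

# Hypothetical prototypes of four users over two topics.
prototypes = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
assignment, centres = kmeans(prototypes, k=2)
print(assignment)
```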

Future work
 Predict fine-grained privacy classes
• E.g. close-friends, all-friends, friends-of-friends, public
 More sophisticated instance sharing strategies
• E.g. taking inter-user similarities into account
 Adaptation of the semantic vocabulary towards privacy
 In a larger context
• Images are just one piece of the puzzle in users’ privacy preservation…
• Deal with data acquisition and sharing problems
• Collaboration with other groups to conduct a larger-scale study
• Cross-domain collaboration (e.g. legal, social sciences)
• The USEMP project (http://www.usemp-project.eu/) is a good example

Thank you!
 Resources
Datasets: http://mklab.iti.gr/datasets/image-privacy
Code: https://github.com/MKLab-ITI/image-privacy
 Contact us
@espyromi / espyromi@iti.gr
@sympap / papadop@iti.gr
@kompats / ikom@iti.gr
http://www.usemp-project.eu/