Multiple classifier systems are widely used in security applications like biometric personal authentication, spam filtering, and intrusion detection in computer networks. Several works experimentally showed their effectiveness in these tasks. However, their use in such applications is motivated only by intuitive and qualitative arguments. In this work we give a first possible formal explanation of why multiple classifier systems are harder to evade, and therefore more secure, than a system based on a single classifier. To this end, we exploit a theoretical framework recently proposed to model adversarial classification problems. A case study in spam filtering illustrates our theoretical findings.
1. Evade Hard Multiple Classifier Systems
Battista Biggio, Giorgio Fumera, Fabio Roli
Pattern Recognition and Applications Group (PRAG)
Department of Electrical and Electronic Engineering
University of Cagliari, Italy
ECAI / SUEMA 2008, Patras, Greece, July 21st - 25th
2. About me
• Pattern Recognition and Applications Group
http://prag.diee.unica.it
– DIEE, University of Cagliari, Italy.
• Contact
– Battista Biggio, Ph.D. student
battista.biggio@diee.unica.it
21-07-2008 Evade Hard MCSs SUEMA 2008 2
3. Pattern Recognition and Applications Group
• Research interests
– Methodological issues
• Multiple classifier systems
• Classification reliability
– Main applications
• Intrusion detection in computer networks
• Multimedia document categorization, Spam filtering
• Biometric authentication (fingerprint, face)
• Content-based image retrieval
4. Why are we working on this topic?
• MCSs are widely used in security applications,
but…
– Lack of theoretical motivations
• Only a few theoretical works exist on machine learning
for adversarial classification
• Goal of this (ongoing) work
– To give some theoretical background to the use of
MCSs in security applications
5. Outline
• Introducing the problem
– Adversarial Classification
• A study on MCSs for adversarial classification
– MCS hardening strategy: adding classifiers trained on
different features
– A case study in spam filtering: SpamAssassin
6. Adversarial Classification
Dalvi et al., Adversarial Classification, 10th ACM SIGKDD Int. Conf. 2004
• Adversarial classification
– An intelligent adaptive adversary modifies patterns to
defeat the classifier.
• e.g., spam filtering, intrusion detection systems (IDSs).
• Goals
– How to design adversary-aware classifiers?
– How to improve classifier hardness of evasion?
7. Definitions
Dalvi et al., 2004
• Two class problem:
– Positive/malicious patterns (+)
– Negative/innocent patterns (-)
• Instance space: X = {X1, ..., XN}, where each Xi is a feature; instances x ∈ X (e.g., emails)
• Classifier: C: X → {+, −}, c ∈ C, a concept class (e.g., a linear classifier)
• Adversarial cost function: W: X × X → ℝ, the cost for the adversary of turning a pattern x into x′ (e.g., more legible spam is better)
[Figure: three feature-space panels (X1, X2) illustrating the instance space, a classifier's decision boundary separating + from −, and the adversarial cost of moving a pattern x]
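The definitions above can be sketched as a toy example (hypothetical code, not from the paper): instances are binary feature vectors, C is a toy linear classifier, and the cost W counts modified features, so W(x, x′) = 0 if and only if x = x′.

```python
# Toy sketch of the Dalvi et al. definitions (assumed, illustrative only).
from typing import Tuple

Instance = Tuple[int, ...]  # x in X = {X1, ..., XN}, binary features

def C(x: Instance, weights=(1.0, 1.0), threshold=1.5) -> str:
    """A toy linear classifier C: X -> {+, -}."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return "+" if score >= threshold else "-"

def W(x: Instance, x2: Instance) -> float:
    """Adversarial cost of turning x into x': here, the number of
    modified features. Acts like a dissimilarity: 0 iff x == x2."""
    return float(sum(a != b for a, b in zip(x, x2)))

assert C((1, 1)) == "+"          # a spam-like instance is detected
assert W((1, 1), (1, 0)) == 1.0  # one feature changed
assert W((1, 1), (1, 1)) == 0.0  # identical patterns cost nothing
```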
8. Adversarial cost function
• Cost is related to
– Adversary efforts
• e.g., to use a different server for sending spam
– Attack effectiveness
• more legible spam is better!
Example
• Original spam message: BUY VIAGRA!
– Easily detected by the classifier
• Slightly modified spam message: BU-Y V1@GR4!
– It can evade the classifier and remain effective
• No longer legible spam (ineffective message): B--Y V…!
– It can evade several systems, but who will still buy viagra?
9. A framework for
adversarial classification
Dalvi et al., 2004
• Problem formulation
– Two player game: Classifier vs Adversary
• Utility and cost functions for each player
• Classifier chooses a decision function C(x) at each ply
• Adversary chooses a modification function A(x) to evade classifier
• Assumptions in Dalvi et al., 2004
– Perfect Information
• Adversary knows the classifier’s discriminant function C(x)
• Classifier knows adversary strategy A(x) for modifying patterns
– Actions
• Adversary can only modify malicious patterns at operation phase
(training process is untainted)
10. In a nutshell
Lowd & Meek, Adversarial Learning, 11th ACM SIGKDD Int. Conf. 2005
[Figure: two panels showing positive (+) and negative (−) patterns on either side of a decision boundary]
• Adversary's task: choose minimum-cost modifications to evade the classifier
• Classifier's task: choose a new decision function to minimise the expected risk
11. Adversary’s strategy
[Figure: feature space (x1, x2) divided into regions C(x) = − and C(x) = +. The original spam x ("BUY VIAGRA!") lies in the + region. A minimum-cost camouflage ("BUY VI@GRA!") moves it just across the boundary into the − region, while too-high-cost camouflages ("B--Y V…!") distort the message more than necessary]
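The adversary's strategy can be sketched as a brute-force search (a hypothetical toy example, not the paper's algorithm): among all instances the classifier labels −, pick the one of minimum cost, and give up if even that exceeds the adversary's budget.

```python
# Toy sketch (assumed): minimum-cost camouflage over binary feature vectors.
from itertools import product

def toy_classifier(x, threshold=2):
    """Toy linear classifier: '+' (spam) when enough features fire."""
    return "+" if sum(x) >= threshold else "-"

def hamming(x, x2):
    """Toy cost W(x, x'): number of modified features."""
    return sum(a != b for a, b in zip(x, x2))

def min_cost_camouflage(x, classify, cost, max_cost):
    """Among all x' labelled '-', return one minimising cost(x, x');
    return None if even the cheapest evasion exceeds max_cost."""
    candidates = [x2 for x2 in product((0, 1), repeat=len(x))
                  if classify(x2) == "-"]
    best = min(candidates, key=lambda x2: cost(x, x2))
    return best if cost(x, best) <= max_cost else None

x = (1, 1, 0)  # original spam, labelled '+'
x_prime = min_cost_camouflage(x, toy_classifier, hamming, max_cost=1)
assert toy_classifier(x) == "+" and toy_classifier(x_prime) == "-"
assert hamming(x, x_prime) == 1  # minimum-cost camouflage: one flip
```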
12. Classifier’s strategy
• The Classifier knows A(x) [perfect information]
– Adversary-aware classifier
– Dalvi et al. showed that an adversary-aware classifier can perform significantly better
[Figure: the adversary-aware classifier relocates its decision boundary so that the previous minimum-cost camouflage x′ is now detected, although some modified patterns may still evade]
13. Goals of this work
• Analysis of a widely used strategy for hardening
MCSs
– Using different sets of heterogeneous and redundant features [Giacinto et al. (2003), Perdisci et al. (2006)]
• Only heuristic and qualitative motivations have been given so far
• Using the described framework, we give a more formal explanation of the effectiveness of this strategy
14. An example of the
considered strategy
• Biometric verification system
[Figure: multimodal biometric verification. Fingerprint, face, voice, … matchers for a claimed identity feed a decision rule that outputs genuine or impostor]
15. Another example of the
considered strategy
• Spam filtering
[Figure: SpamAssassin architecture. Header Analysis, Black/White List, URL Filter, Signature Filter, Content Analysis, … each produce a score; the scores are summed (Σ) and the assigned class is spam or legitimate]
http://spamassassin.apache.org
16. Applying the framework
to the spam filtering case
• Cost for Adversary
[Figure: SpamAssassin pipeline on the message "BUY VIAGRA!", rewritten as "BUY VI@GR4!". Module scores: Header Analysis s1 = 0.2, Black/White List s2 = 0, Signature Filter s3 = 0, Text Classifier s4 = 2.5, Keyword Filters sN = 3 → 0 after the rewrite. The total score drops from s = 5.7 to s = 2.7; since s < 5 (the threshold), the modified message is labelled legitimate instead of spam]
• Working assumption: changing “VIAGRA” to “VI@GR4” costs 3!
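The slide's decision rule can be sketched in a few lines (a toy model with the numbers from the example above; the module names are illustrative, not SpamAssassin's actual test names): each module contributes a score, the filter sums them, and the message is flagged as spam when the total reaches the threshold of 5.

```python
# Toy sketch of the score-sum decision, using the slide's numbers.
def is_spam(scores, threshold=5.0):
    """Flag as spam when the summed module scores reach the threshold."""
    return sum(scores) >= threshold

original = {"header": 0.2, "bwlist": 0.0, "signature": 0.0,
            "text": 2.5, "keywords": 3.0}       # "BUY VIAGRA!"
modified = dict(original, keywords=0.0)         # "BUY VI@GR4!"

assert abs(sum(original.values()) - 5.7) < 1e-9 and is_spam(original.values())
assert abs(sum(modified.values()) - 2.7) < 1e-9  # evades: 2.7 < 5

# Cost at score level: Manhattan distance between the two score vectors.
cost = sum(abs(original[k] - modified[k]) for k in original)
assert cost == 3.0  # matches the working assumption in the slide
```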
17. Applying the framework
to the spam filtering case
[Figure: an image-spam message ("AFM Continues to Climb. Big News On Horizon | UP 50% This Week / Aerofoam Metals Inc. / Symbol: AFML / Price: $0.10 UP AGAIN / Status: Strong Buy") whose text is embedded into an image. Module scores: Header Analysis s1 = 3.2, Black/White List s2 = 0, Signature Filter s3 = 0, Text Classifier sN = 2.5 → 0 (evasion by embedding the text into an image costs 2.5), Image Analysis sN+1 = 3 (evasion costs 3.0). The total score drops from 5.7 to 3.2 once the text classifier is evaded, but rises to 6.2 after image analysis is added, so the message is again labelled spam]
• Now both the text and image classifiers must be evaded to evade the filter!
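A quick check with the slide's numbers (a toy sketch with illustrative module names) shows why the extra module raises the evasion cost: hiding the text in an image evades the original filter, but once an image analysis module is added, the total score climbs back above the threshold.

```python
# Toy sketch (slide's numbers): adding a module hardens the filter.
THRESHOLD = 5.0

plain = {"header": 3.2, "bwlist": 0.0, "signature": 0.0, "text": 2.5}
assert abs(sum(plain.values()) - 5.7) < 1e-9        # detected: 5.7 >= 5

embedded = dict(plain, text=0.0)                    # text hidden in an image
assert sum(embedded.values()) < THRESHOLD           # 3.2: evades the old filter

hardened = dict(embedded, image=3.0)                # add image analysis module
assert abs(sum(hardened.values()) - 6.2) < 1e-9     # 6.2 >= 5: detected again

# Evading the hardened filter now means defeating both modules,
# raising the adversary's total cost to 2.5 + 3.0 = 5.5.
```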
18. Forcing the adversary to surrender
• Hardening the system by adding modules can
make the evasion too costly for the adversary
– In the end, the optimal adversary strategy becomes
not fighting!
“The ultimate warrior is one who wins the war by forcing the
enemy to surrender without fighting any battles”
The Art of War, Sun Tzu, 500 BC
19. Experimental Setup
• SpamAssassin
– 619 tests
– includes a text classifier (naive Bayes)
• Data set: TREC 2007 spam track
– 75,419 e-mails (25,220 ham - 50,199 spam).
– We used the first 10K e-mails (taken in chronological
order) for training the SpamAssassin naive Bayes
classifier.
20. Experimental Setup
• Adversary
– Cost simulated at score level
• Manhattan distance between test scores
– Maximum cost fixed
• Rationale: higher cost modifications will make the spam
message no more effective/legible
• Classifier
– We did not take into account the computational cost
for adding tests
• Performance measure
– Expected utility
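The adversary model above can be sketched as follows (an assumed reading of the setup, with hypothetical function names): the modification cost is the Manhattan distance between original and modified score vectors, and any modification whose cost exceeds a fixed maximum is ruled out because the spam would no longer be effective or legible.

```python
# Hypothetical sketch of the simulated adversary under a cost budget.
def manhattan(s, s2):
    """Cost at score level: Manhattan distance between score vectors."""
    return sum(abs(a - b) for a, b in zip(s, s2))

def feasible_evasion(scores, modified, max_cost, threshold=5.0):
    """The modified message evades only if it falls below the filter's
    threshold AND its cost stays within the adversary's budget."""
    return sum(modified) < threshold and manhattan(scores, modified) <= max_cost

# Dropping one module's score (cost 3.0) fits a budget of 4.0 and evades:
assert feasible_evasion([0.2, 2.5, 3.0], [0.2, 2.5, 0.0], max_cost=4.0)
# Dropping two modules (cost 5.5) would evade, but exceeds the budget:
assert not feasible_evasion([0.2, 2.5, 3.0], [0.2, 0.0, 0.0], max_cost=4.0)
```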
23. Will spammers give up?
• Spammer economics
– Goal: beat enough of the filters temporarily to get a fraction of messages through and generate a quick profit
– As filter accuracy increases, spammers simply send larger quantities of spam so that the same amount still gets through
• the cost of sending spam is negligible with respect to the
achievable profit!
• Is it feasible to push the accuracy of spam filters
up to the point where only ineffective spam
messages can pass through the filters?
– Otherwise spammers won’t give up!
24. Future work
• Theory of Adversarial Classification
– Extend the model to more realistic situations
• Investigating other defence strategies
– We are expanding the framework to model
information hiding strategies [Barreno et al. (2006)]
• Possible implementation: randomising the placement of
the decision boundary
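One way to picture this defence (a hypothetical sketch, not the implementation studied here) is a spam threshold drawn at random around its nominal value, so the adversary cannot pin down the exact boundary placement:

```python
# Hypothetical sketch: randomising the decision boundary placement.
import random

def randomised_threshold(base=5.0, spread=0.5, rng=random):
    """Draw the spam threshold uniformly in [base - spread, base + spread],
    keeping its exact location hidden from the adversary."""
    return rng.uniform(base - spread, base + spread)

def classify(score, rng=random):
    """Label a summed score against a freshly randomised threshold."""
    return "spam" if score >= randomised_threshold(rng=rng) else "legitimate"

t = randomised_threshold()
assert 4.5 <= t <= 5.5
assert classify(10.0) == "spam" and classify(0.0) == "legitimate"
```

Scores far from the boundary are classified as before; only near-boundary camouflages become a gamble for the adversary.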
“Keep the adversary guessing. If your strategy is a mystery, it
cannot be counteracted. This gives you a significant advantage”
The Art of War, Sun Tzu, 500 BC
25. Thank you!
• Contacts
– roli@diee.unica.it
– fumera@diee.unica.it
– battista.biggio@diee.unica.it
P R A G
Editor's notes
Clarify what W(x, x′) is, i.e., the cost of adding words, etc., and that it is a sort of similarity measure between patterns, so it equals 0 if and only if x = x′.
Introduce biometrics, then draw the parallel with spam and IDSs. In many security systems, hardness of evasion can be improved by combining several experts trained on redundant and heterogeneous features. MCSs provide a very natural architecture to achieve this task. Our goal is to provide a more formal explanation of this phenomenon, using the framework previously described.
Specify how we simulated the game: the adversary plays its optimal strategy; the classifier adds modules.