Detecting Deception in Writing Style

Sadia Afroz, Michael Brennan and Rachel Greenstadt.
Privacy, Security and Automation Lab
Drexel University

Overview

•  Authorship recognition
•  Authorship recognition in adversarial environment
•  Deception detection
•  Experiments on different datasets

Authorship recognition

Who wrote the document?

Authorship recognition

Stylometry:
  –  An authorship recognition system based solely on writing style.
  –  Not handwriting
  –  Only linguistic style: word choice, sentence length, parts-of-speech usage, …
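
A minimal sketch of what "linguistic style" features can look like in code. The function name and the four features chosen here are illustrative assumptions, not the feature sets used in the talk; part-of-speech features are left out because they need a tagger.

```python
import re
from collections import Counter

def stylometric_features(text):
    """Return a few simple style features for a text (illustrative only)."""
    # Split into sentences on ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    # Lowercased word tokens; apostrophes are kept inside words.
    words = re.findall(r"[A-Za-z']+", text.lower())
    counts = Counter(words)
    n_words = len(words) or 1
    return {
        "avg_sentence_length": len(words) / (len(sentences) or 1),
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "unique_word_ratio": len(counts) / n_words,
        "short_word_ratio": sum(1 for w in words if len(w) <= 3) / n_words,
    }

# Made-up sample text, not taken from any corpus in the talk.
sample = "The road was dark. He walked on. Nothing moved in the ash."
features = stylometric_features(sample)
```

Each document becomes a small numeric vector; the content of the text plays no direct role, only how it is written.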

Why does it work?

•  Everybody has learned language differently

How regular authorship recognition works

[Diagram, two flows: (1) Extract features → Machine Learning System; (2) Document of unknown authorship → Extract features → Machine Learning System → Determine authorship]
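
The two flows in the diagram can be sketched as below. This is a toy under stated assumptions: a 1-nearest-neighbor rule stands in for the "machine learning system" and is not the classifier used in the talk, and the two-feature vector, the function names and the sample texts are all hypothetical.

```python
import math

def extract_features(text):
    """Toy feature vector: (average word length, average sentence length)."""
    words = text.split()
    normalized = text.replace("!", ".").replace("?", ".")
    sentences = [s for s in normalized.split(".") if s.strip()]
    avg_word = sum(len(w.strip(".,;!?")) for w in words) / max(len(words), 1)
    avg_sent = len(words) / max(len(sentences), 1)
    return (avg_word, avg_sent)

def train(labeled_docs):
    """Flow 1: store one feature vector per (author, text) training document."""
    return [(author, extract_features(text)) for author, text in labeled_docs]

def determine_authorship(model, unknown_text):
    """Flow 2: return the author of the nearest training document."""
    target = extract_features(unknown_text)
    return min(model, key=lambda item: math.dist(item[1], target))[0]

# Hypothetical training documents for two made-up authors.
training = [
    ("author_a", "I went out. It was cold. I came back."),
    ("author_b", "Notwithstanding considerable meteorological uncertainty, "
                 "everybody nevertheless continued travelling northwards."),
]
model = train(training)
guess = determine_authorship(model, "We sat down. We ate. It got dark.")
```

The unknown document has short words and short sentences, so it lands next to `author_a` in feature space.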

Assumptions

•  Writing style is invariant.
   –  It’s like a fingerprint, you can’t really change it.
   –  Authorship recognition can identify you if there are sufficient writing samples and a set of suspects.

Adversarial Attacks

•  Imitation or framing attack
    –  Where one author imitates another author
    –  Participants were asked to imitate Cormac McCarthy in writing about their day
•  Obfuscation attack
    –  Where an author hides his regular style
    –  Participants were asked to write about their neighborhood in a different style

M. Brennan and R. Greenstadt. Practical attacks against authorship recognition techniques. In Proceedings of the Twenty-First Conference on Innovative Applications of Artificial Intelligence (IAAI), Pasadena, CA, 2009.

Accuracy in detecting authorship of regular documents

[Plot: accuracy (0 to 1) vs. Number of Authors (5 to 40). Series: 9-Feature (NN), Synonym-Based, Writeprints Baseline (SVM), Random]

More than 80% accurate in detecting authorship with 40 authors in regular documents

Accuracy in detecting authorship of Obfuscated documents

[Plot: accuracy (0 to 1) vs. Number of Authors (5 to 40). Series: 9-Feature (NN), Synonym-Based, Writeprints Baseline (SVM), Random]

Accuracy is less than random chance in obfuscated documents

Accuracy in detecting authorship of Imitated documents

[Plot: accuracy (0 to 1) vs. Number of Authors (5 to 40). Series: 9-Feature (NN), Synonym-Based, Writeprints Baseline (SVM), Random]

Accuracy is less than random chance in imitated documents
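
The "Random" series in these plots is the baseline the classifiers are compared against. A minimal sketch, assuming the baseline is uniform guessing among the candidate authors (so it equals 1 / number of authors); the function names are illustrative.

```python
def random_chance_accuracy(num_authors):
    """Expected accuracy of guessing uniformly among num_authors candidates."""
    if num_authors < 1:
        raise ValueError("need at least one candidate author")
    return 1.0 / num_authors

def below_chance(accuracy, num_authors):
    """True when a classifier does worse than uniform guessing."""
    return accuracy < random_chance_accuracy(num_authors)

# Author-set sizes shown on the x-axis of the plots.
for n in (5, 10, 15, 20, 25, 30, 35, 40):
    print(f"{n:2d} authors: chance = {random_chance_accuracy(n):.3f}")
```

Under this assumption the baseline falls from 0.2 with 5 authors to 0.025 with 40, so "less than random chance" is a low bar that shrinks as the author set grows.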

Can we detect Stylistic Deception?

[Diagram labels: Regular, Imitated, Obfuscated]

Extended-Brennan-Greenstadt Corpus

•  56 authors
   –  12 of the participants are from Drexel University
   –  The rest are paid workers from Amazon Mechanical Turk
•  Three kinds of writing samples
   –  Regular writings (5000 words)
   –  Imitated writing
      •  A 500-word article describing a day
      •  Imitate Cormac McCarthy from ‘The Road’
   –  Obfuscated writing
      •  A 500-word article describing neighborhood
      •  Hide own writing style

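Detecting stylistic deception is possible

[Bar chart: scale 0 to 100, grouped by Regular, Imitation and Obfuscation. Series: Writeprint, SVM; Lying-detection, J48; 9-feature set, J48. Bar labels as extracted: 98, 94.5, 89.5, 95.7, 85, 75.3, 59.9, 48, 43. The extraction does not preserve which label belongs to which bar.]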

Feature Changes in Imitation and Obfuscation

[Bar chart: change in each feature (axis from -80 to 100) for Imitation and Obfuscation. Features shown: Personal pronoun, Sentence count, Particle, Short Words, Verb, Unique words, Adverb, Existential there, Average syllables per word, Average word length, Adjective, Cardinal number, Gunning-Fog readability index, Average sentence length]
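
A sketch of how a per-feature change like the one charted above could be computed. It assumes the axis is a signed percent change relative to the author's regular writing, which the extracted slide text does not state; the feature values below are hypothetical and are not data from the talk.

```python
def percent_change(regular_value, deceptive_value):
    """Signed percent change of a feature relative to its regular-writing value."""
    if regular_value == 0:
        raise ValueError("regular value must be non-zero")
    return 100.0 * (deceptive_value - regular_value) / regular_value

# Hypothetical feature averages for one author; not data from the talk.
regular = {"avg_sentence_length": 20.0, "personal_pronouns": 50.0}
imitation = {"avg_sentence_length": 12.0, "personal_pronouns": 80.0}

changes = {
    name: percent_change(regular[name], imitation[name]) for name in regular
}
```

With these made-up numbers, sentence length drops by 40% and personal pronoun use rises by 60%, which is the kind of signed bar the chart displays.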

Problem with the dataset: Topic Similarity

•  All the deceptive documents were of the same topic.
•  Non-content-specific features have the same effect as content-specific features.

[Chart: "Effect of different feature set in detecting adversarial authorship". F-measure (0 to 1) for Different Writing Samples: imitation, regular, obfuscation. Series: Syntactic, Lexical, Content]

Hemingway-Faulkner Imitation Corpus

•  Articles from the International Imitation Hemingway Contest (2000-2005)
•  Articles from the Faux Faulkner Contest (2001-2005)
•  Original excerpts of Ernest Hemingway and William Faulkner

Deception detection is possible even when the topic is not similar

•  81.2% accurate in detecting imitated documents.

Long term deception: A Gay Girl In Damascus

[Images: Thomas MacMaster. Fake picture of Amina Arraf.]

–  Original author was a 40-year-old American citizen, Thomas MacMaster.
–  Pretended to be a Syrian gay woman, Amina Arraf.
–  The author worked for at least 5 years to create a new style.

Long term deception is hard to detect

•  None of the blog posts were found to be deceptive.
•  But regular authorship recognition can help.
•  We tried to attribute authorship of the blog posts using Thomas (as himself), Thomas (as Amina), Britta (Thomas’s wife).

Long term deception
Authorship recognition of the blog posts

Thomas MacMaster    Amina Arraf    Britta (Thomas’s wife)
54%                 43%            3%
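
The percentages above read as the share of blog posts attributed to each candidate. A minimal sketch of that tally, assuming one predicted author per post; the candidate labels and predictions below are hypothetical and do not reproduce the talk's data.

```python
from collections import Counter

def attribution_shares(predicted_authors):
    """Percentage of documents attributed to each candidate author."""
    if not predicted_authors:
        raise ValueError("no predictions to tally")
    counts = Counter(predicted_authors)
    total = len(predicted_authors)
    return {author: 100.0 * n / total for author, n in counts.items()}

# Hypothetical per-post predictions from some authorship classifier.
predictions = ["candidate_1"] * 5 + ["candidate_2"] * 4 + ["candidate_3"] * 1
shares = attribution_shares(predictions)
```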

Future works

•  Intrusion detection
•  Social spam detection
•  Identifying quality discourse

Two Tools

•  JStylo: Authorship Recognition Analysis Tool.
•  Anonymouth: Authorship Recognition Evasion Tool.

•  Free, Open Source. (GNU GPL)
•  Alpha releases available today at https://psal.cs.drexel.edu
   –  Migrating to GitHub soon.

Privacy, Security and Automation Lab
(https://psal.cs.drexel.edu)

•  Faculty
   –  Dr. Rachel Greenstadt
•  Graduate Students
   –  Sadia Afroz (Deception Detection Lead)
   –  Diamond Bishop
   –  Michael Brennan
   –  Aylin Caliskan
   –  Ariel Stolerman (JStylo Lead Developer)
•  Undergraduate Students
   –  Pavan Kantharaju
   –  Andrew McDonald (Anonymouth Lead Developer)

Sadia Afroz: Detecting Hoaxes, Frauds, and Deception in Writing Style Online