Introduction                      Assembler as a native language          Anomalies detection




                 Detecting abnormal executable files using
                            binary code mining

                                       Rechkov Anton

                               TU Berlin Germany & TTI SFU Russia


                                      21st March 2012




               Rechkov Anton          Lomonosov Scholarship Report   21st March 2012




Malware evolution

       Ciphered
       The virus body is encrypted.


       Oligomorphic
       The decryptor is generated by randomly selecting each of its pieces
       from several predefined alternatives.


       Polymorphic
       On each replication, the malware body is re-encrypted and the decryptor
       is modified.


       Metamorphic
       The entire virus body is rewritten by an obfuscation engine.

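To illustrate why ciphered and polymorphic samples defeat fixed byte patterns, here is a minimal Python sketch (not taken from the report) that XOR-encrypts the same five-byte x86 function body with two different one-byte keys. The key values and the `xor_encrypt` helper are hypothetical; real polymorphic engines use richer ciphers and also mutate the decryptor itself.

```python
def xor_encrypt(data: bytes, key: int) -> bytes:
    """XOR each byte with a one-byte key; applying it twice with the same key restores the input."""
    return bytes(b ^ key for b in data)


# x86: push ebp; mov ebp, esp; pop ebp; ret
body = bytes([0x55, 0x89, 0xE5, 0x5D, 0xC3])

# Two "replications" with different (hypothetical) keys produce different bytes...
sample_a = xor_encrypt(body, 0x21)
sample_b = xor_encrypt(body, 0x7F)
print(sample_a.hex(), sample_b.hex())

# ...yet each one decrypts back to the identical body.
assert xor_encrypt(sample_a, 0x21) == body
assert xor_encrypt(sample_b, 0x7F) == body
```

Neither encrypted copy contains the original byte sequence, so a signature taken from the plain body would miss both.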




Modern detection techniques


       Signature analysis
       Searching the code for a predetermined pattern.


       Emulation
       Unpacking the malware by emulating its code, then continuing with
       signature analysis.


       Behavioral analysis
       Analysis of the function flow graph.

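In its simplest form, signature analysis slides a byte pattern over the file. The sketch below is a minimal illustration, not a production engine: the `??` wildcard syntax and the helper names are assumptions, and real scanners use multi-pattern algorithms such as Aho-Corasick instead of this naive scan.

```python
from typing import List, Optional


def parse_signature(sig: str) -> List[Optional[int]]:
    """Parse a hex signature such as '55 89 E5 ?? C3'; '??' is a wildcard byte."""
    return [None if tok == "??" else int(tok, 16) for tok in sig.split()]


def find_signature(data: bytes, sig: str) -> int:
    """Return the first offset where `sig` matches `data`, or -1 (naive O(N*M) scan)."""
    pattern = parse_signature(sig)
    for start in range(len(data) - len(pattern) + 1):
        if all(p is None or data[start + i] == p for i, p in enumerate(pattern)):
            return start
    return -1


# x86: nop; nop; push ebp; mov ebp, esp; sub esp, 0x10; ret
code = bytes.fromhex("9090 5589e5 83ec10 c3")
print(find_signature(code, "55 89 E5 83 EC ?? C3"))
print(find_signature(code, "55 8B EC"))
```

The second lookup fails because `8B EC` is an alternative encoding of the same `mov ebp, esp`, which shows how fragile exact byte patterns are.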







Code modification



       Obfuscation
       Transformation of executable program code that preserves functionality but
       complicates analysis and understanding of its algorithms.


       Deobfuscation
       Eliminating irrelevant code by means of
                  Algebraic models
                  Formal grammars

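A toy illustration of both directions, assuming code is represented as a list of assembly strings: the obfuscator inserts semantically neutral junk, and the deobfuscator deletes it with a single pattern-rewriting pass. This is only a very small stand-in for the grammar-based approach (it is not an algebraic model), and the junk list and function names are hypothetical. Note that it would also delete genuine `nop`s from the original code, which is harmless semantically but means the round trip is exact only for code without such sequences.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical junk sequences that leave registers and flags unchanged.
JUNK: List[Tuple[str, ...]] = [
    ("nop",),
    ("push eax", "pop eax"),
    ("xchg eax, eax",),
]
# Try longer patterns first so multi-instruction junk is matched as a unit.
JUNK_LONGEST_FIRST = sorted(JUNK, key=len, reverse=True)


def obfuscate(code: Sequence[str], rng: random.Random, rate: float = 0.5) -> List[str]:
    """After each instruction, insert a random junk sequence with probability `rate`."""
    out: List[str] = []
    for insn in code:
        out.append(insn)
        if rng.random() < rate:
            out.extend(rng.choice(JUNK))
    return out


def deobfuscate(code: Sequence[str]) -> List[str]:
    """Left-to-right rewriting pass that deletes every known junk sequence."""
    out: List[str] = []
    i = 0
    while i < len(code):
        for junk in JUNK_LONGEST_FIRST:
            if tuple(code[i:i + len(junk)]) == junk:
                i += len(junk)
                break
        else:
            out.append(code[i])
            i += 1
    return out


original = ["mov eax, 1", "add eax, ebx", "ret"]
mutated = obfuscate(original, random.Random(42))
print(mutated)
print(deobfuscate(mutated) == original)
```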







Outline



       1       Assembler as a native language
                 Binary code mining
                 Native language processing
                 Stochastic models

       2       Anomalies detection




Binary code mining


Structure of a compiler

   [Figure: Common compiler scheme]
   Code generator engine:
               Machine code generator,
               Optimizers:
                      interprocedural optimization (IPO),
                      profile-guided optimization (PGO),
                      high-level optimizations
               Mutation code generator / obfuscator.





Common code generator features


       High-level optimizations
                  A unique intermediate language
                  Pre-optimization in the intermediate representation


       Code generation
                  Code templates from the intermediate representation to the target
                  Number of instruction types used


       Machine-dependent optimizer
                  Instruction costs





Verifying the theory


       Experiment
                  Choose the instruction sequences to count
                  Compile the same source code with each compiler
                  Compare the resulting distributions


       Compilers
          ⇒ MSVC
          ⇒ LLVM
          ⇒ GCC
          ⇒ Intel C++ Compiler
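The experiment can be sketched as follows, assuming each compiled binary has already been disassembled into a list of mnemonics. The report does not state which distance was used to compare the distributions; the Jensen-Shannon divergence below is a swapped-in choice, and the two mnemonic lists are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

Ngram = Tuple[str, ...]


def ngram_distribution(mnemonics: Sequence[str], n: int = 2) -> Dict[Ngram, float]:
    """Relative frequency of every length-n instruction sequence (requires len >= n)."""
    counts = Counter(tuple(mnemonics[i:i + n]) for i in range(len(mnemonics) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


def js_divergence(p: Dict[Ngram, float], q: Dict[Ngram, float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical distributions, at most 1."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(a: Dict[Ngram, float]) -> float:
        # Kullback-Leibler divergence of `a` from the mixture m (m[k] > 0 wherever a[k] > 0).
        return sum(v * math.log2(v / m[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


# Hypothetical mnemonic streams of the same function built by two compilers
compiler_a = ["push", "mov", "sub", "mov", "add", "mov", "pop", "ret"]
compiler_b = ["sub", "mov", "lea", "add", "mov", "add", "ret"]

p = ngram_distribution(compiler_a)
q = ngram_distribution(compiler_b)
print(f"JS divergence: {js_divergence(p, q):.3f}")
```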





XTEA distribution test
                                 [Figure: Frequency of words in binary. Panels: (a) LLVM, (b) MSVC, (c) Intel C++, (d) GCC]



                               [Figure: Mean distribution of optimized binaries]




Native language processing


Text Mining


       Language detection


       Author detection


       Text classification


       Document clustering




Stochastic models


Neural networks


       Advantages
           + effective with a small number of training vectors
           + estimates the proximity of all samples


       Disadvantages
               - the model must be predetermined:
                         manual definition of words
                         manual analysis of redundant elements
                         limited retraining






Probability model


       Advantages
           + self-sufficient word definition (no manual selection)
           + training requires only positive vectors
           + uniform training (flexible retraining)


       Disadvantages
               - requires a large training sample set
               - errors in estimating the distribution
               - high computational complexity








Outline



       1       Assembler as a native language

       2       Anomalies detection
                 Preparation
                 Code generator lexemes
                 Anomalies detection by neural networks
                 Anomalies detection by probability model




Preparation


Collecting statistical samples



       Python
                  Detecting the list of maximal repeated sequences
                  Disassembling
                  Searching for strings


       Matlab
                  Stochastic models
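The string-search step of the Python preparation stage can be approximated the way the Unix `strings` utility works: report every run of printable ASCII bytes above a minimum length. This is a minimal sketch; the 4-byte minimum mirrors the `strings` default, and the report's actual tool may differ (for example, it may also look for UTF-16 strings).

```python
import re
from typing import List, Tuple


def find_strings(data: bytes, min_len: int = 4) -> List[Tuple[int, str]]:
    """Return (offset, text) for every run of at least `min_len` printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


blob = b"\x00\x01MZ\x90Hello world\x00ab\x00kernel32.dll\xff"
for offset, text in find_strings(blob):
    print(f"{offset:#06x} {text}")
```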




Code generator lexemes


From disassembly to lexemes




       Lexeme
                  sequences of 3 to 6 instructions
                  unknown bytes are ignored
                  maximal repeated sequences

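The lexeme definition above can be sketched as follows. Assumptions not stated in the slides: the disassembler output is a list of mnemonics with `None` marking undecodable bytes, and "maximal" is read as "not contained in a longer repeated lexeme that occurs equally often". The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Lexeme = Tuple[str, ...]


def split_on_unknown(insns: Sequence[Optional[str]]) -> List[List[str]]:
    """Split an instruction stream into runs, dropping undecodable positions (None)."""
    runs: List[List[str]] = []
    current: List[str] = []
    for insn in insns:
        if insn is None:
            if current:
                runs.append(current)
            current = []
        else:
            current.append(insn)
    if current:
        runs.append(current)
    return runs


def _contains(longer: Lexeme, shorter: Lexeme) -> bool:
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def maximal_repeated_lexemes(insns: Sequence[Optional[str]], min_len: int = 3,
                             max_len: int = 6, min_count: int = 2) -> Dict[Lexeme, int]:
    """Count 3..6-instruction sequences (never spanning unknown bytes), keep maximal repeats."""
    counts: Counter = Counter()
    for run in split_on_unknown(insns):
        for n in range(min_len, max_len + 1):
            for i in range(len(run) - n + 1):
                counts[tuple(run[i:i + n])] += 1
    repeated = {g: c for g, c in counts.items() if c >= min_count}
    # Drop a lexeme if a longer repeated lexeme contains it with the same count.
    return {
        g: c for g, c in repeated.items()
        if not any(len(h) > len(g) and ch == c and _contains(h, g) for h, ch in repeated.items())
    }


stream = ["push", "mov", "sub", "call", None, "push", "mov", "sub", "call", "ret"]
print(maximal_repeated_lexemes(stream))
```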





Lexeme analysis


                                                                   [Figure: Suffix tree example]


Suffix tree:
       memory-efficient,
       string search faster than O(N^2),
       fast detection of maximal repeats in strings

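The slides use a suffix tree; as a simpler stand-in (a swapped-in technique, not the report's code), a suffix array plus an LCP array computed with Kasai's algorithm finds the same longest repeat, because the longest repeated subsequence equals the largest common prefix of two lexicographically adjacent suffixes. The suffix array here is built by naive sorting, so this sketch does not achieve the faster-than-O(N^2) behaviour claimed for suffix trees.

```python
from typing import List, Sequence


def suffix_array(seq: Sequence) -> List[int]:
    """Start indices of all suffixes in lexicographic order (naive sort-based build)."""
    return sorted(range(len(seq)), key=lambda i: seq[i:])


def lcp_array(seq: Sequence, sa: List[int]) -> List[int]:
    """Kasai's algorithm: lcp[r] = longest common prefix of suffixes sa[r-1] and sa[r]."""
    n = len(seq)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and seq[i + h] == seq[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def longest_repeat(seq: Sequence) -> Sequence:
    """Longest contiguous subsequence that occurs at least twice (occurrences may overlap)."""
    if not seq:
        return seq[:0]
    sa = suffix_array(seq)
    lcp = lcp_array(seq, sa)
    best = max(range(len(lcp)), key=lcp.__getitem__)
    return seq[sa[best]:sa[best] + lcp[best]]


mnemonics = ["push", "mov", "call", "pop", "push", "mov", "call", "ret"]
print(longest_repeat(mnemonics))
```

The same functions work on plain strings, so they can be checked against a textbook case such as `"banana"`, whose longest repeat is `"ana"`.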



Anomalies detection by neural networks


Radial basis function networks



                                                                            [Figure: Neural net architecture]

      no need to choose the number of hidden layers
      no convergence pathologies
      fast convergence through a combination of learning algorithms

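A minimal NumPy sketch of a Gaussian radial basis function network illustrating the "combination of learning algorithms": an unsupervised stage (k-means) places the single hidden layer's centres, then a supervised stage fits the linear output weights by least squares. The class name, the width heuristic sigma = d_max / sqrt(2M) and the toy two-cluster data are assumptions for illustration, not the report's configuration.

```python
import numpy as np


class RBFNetwork:
    """Gaussian RBF network: k-means centres, then least-squares output weights."""

    def __init__(self, n_centers: int, n_iter: int = 20, seed: int = 0):
        self.n_centers = n_centers
        self.n_iter = n_iter
        self.rng = np.random.default_rng(seed)

    def _kmeans(self, X: np.ndarray) -> np.ndarray:
        # Stage 1 (unsupervised): Lloyd's k-means places the hidden-unit centres.
        centers = X[self.rng.choice(len(X), self.n_centers, replace=False)].copy()
        for _ in range(self.n_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
            labels = d2.argmin(axis=1)
            for k in range(self.n_centers):
                if np.any(labels == k):
                    centers[k] = X[labels == k].mean(axis=0)
        return centers

    def _design(self, X: np.ndarray) -> np.ndarray:
        # Gaussian activation of every hidden unit plus a constant bias column.
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=-1)
        H = np.exp(-d2 / (2.0 * self.width ** 2))
        return np.hstack([H, np.ones((len(X), 1))])

    def fit(self, X, Y) -> "RBFNetwork":
        X = np.asarray(X, dtype=float)
        self.centers = self._kmeans(X)
        # Width heuristic: sigma = d_max / sqrt(2 * n_centers).
        diffs = self.centers[:, None, :] - self.centers[None, :, :]
        d_max = np.sqrt((diffs ** 2).sum(axis=-1)).max()
        self.width = d_max / np.sqrt(2 * self.n_centers) if d_max > 0 else 1.0
        # Stage 2 (supervised): linear least squares for the output layer.
        self.weights, *_ = np.linalg.lstsq(self._design(X), np.asarray(Y, dtype=float), rcond=None)
        return self

    def predict(self, X) -> np.ndarray:
        return self._design(np.asarray(X, dtype=float)) @ self.weights


# Toy data: two clusters standing in for feature vectors of two compilers.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
net = RBFNetwork(n_centers=4).fit(X, np.eye(2)[y])
accuracy = (net.predict(X).argmax(axis=1) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Because the hidden layer is fixed once the centres are chosen, the output stage is a linear problem with a closed-form solution, which is what makes training fast and free of the local-minimum issues of backpropagation through many layers.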





Compiler detection

                                         [Figure: Compiler detection testing]




Anomalies detection by probability model


Multivariate Gamma

   Using a set of bivariate and trivariate Gamma distributions:
               Assumed Gamma distribution
               Sample proximity
               Fast training

   [Figure: Empirical and theoretical PDF of an element. Curves: Gamma PDF, Empirical PDF. Axes: X vs. PDF]

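The figure compares an element's empirical PDF with a fitted Gamma PDF. A minimal univariate sketch of that fit follows, using the method of moments (shape k = m^2/v, scale theta = v/m); the slides do not say which estimator was used, they use bi- and trivariate models, and the frequency values below are invented. The log-density can then serve as a proximity score for a new sample.

```python
import math
from statistics import fmean, pvariance
from typing import Sequence, Tuple


def fit_gamma_moments(xs: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments estimate of Gamma (shape k, scale theta): k = m^2/v, theta = v/m."""
    m = fmean(xs)
    v = pvariance(xs, mu=m)
    if m <= 0 or v <= 0:
        raise ValueError("Gamma fit needs a positive mean and a positive variance")
    return m * m / v, v / m


def gamma_logpdf(x: float, k: float, theta: float) -> float:
    """Log density of Gamma(k, theta); points x <= 0 are treated as outside the support."""
    if x <= 0:
        return float("-inf")
    return (k - 1) * math.log(x) - x / theta - math.lgamma(k) - k * math.log(theta)


# Hypothetical relative frequencies of one lexeme across training binaries
freqs = [0.021, 0.034, 0.018, 0.045, 0.029, 0.038, 0.025, 0.031]
k, theta = fit_gamma_moments(freqs)
print(f"k = {k:.2f}, theta = {theta:.4f}")
print(f"log-density at 0.03: {gamma_logpdf(0.03, k, theta):.2f}")
```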





Probability model testing

                Error graphs of compiler probabilities as a function of the coefficient for the
                minimal value, $P_i^{p} = P_i^{\min} \cdot 10^{\mathrm{coef}}$

                [Figure, left panel: error vs. coeff for min value; curves: false positive GCC O0, false negative Clang, false negative Intel, false negative GCC O2, false negative MS]
                [Figure, right panel: error vs. coeff for min value; curves: false positive MS, false negative LLVM]

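The slides do not define how $P_i^{p}$ is used, so the following sketch rests on stated assumptions: $P_i^{\min}$ is taken as the smallest probability the model assigned to a genuine sample of compiler i, a sample is accepted as compiler i when its probability is at least $P_i^{p} = P_i^{\min} \cdot 10^{\mathrm{coef}}$, and scores are kept as log10 values to avoid underflow. The scores and function names are hypothetical; the sweep produces false-negative and false-positive rates per coefficient, like the curves in the figure.

```python
from typing import List, Sequence, Tuple


def error_rates(pos_log10: Sequence[float], neg_log10: Sequence[float],
                log10_threshold: float) -> Tuple[float, float]:
    """(false-negative rate, false-positive rate) when accepting samples with log10 P >= threshold."""
    fn = sum(s < log10_threshold for s in pos_log10) / len(pos_log10)
    fp = sum(s >= log10_threshold for s in neg_log10) / len(neg_log10)
    return fn, fp


def sweep_coef(pos_log10: Sequence[float], neg_log10: Sequence[float],
               coefs: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Rows (coef, FN rate, FP rate) for thresholds log10 P_p = log10 P_min + coef."""
    log10_p_min = min(pos_log10)  # assumed: smallest score of a genuine sample
    return [(c, *error_rates(pos_log10, neg_log10, log10_p_min + c)) for c in coefs]


# Hypothetical log10 scores of samples from compiler i (pos) and from other compilers (neg)
pos = [-3.0, -2.5, -2.0, -1.0]
neg = [-9.0, -6.0, -4.0, -2.2]
for coef, fn, fp in sweep_coef(pos, neg, [0, 1, 2]):
    print(f"coef={coef}: false negative {fn:.2f}, false positive {fp:.2f}")
```

Raising the coefficient tightens the threshold, trading false positives for false negatives.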





Probability model testing


                                        The problem of zero-valued elements

                [Figure, two panels: error vs. coeff for min value; curves in each: false positive GCC O2, false negative Clang, false negative Intel, false negative GCC O0, false negative MS]

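Why zero elements are a problem: if a sample's score is a product of per-element probabilities, a single zero makes the whole product zero, regardless of how well every other element matches. The slides do not say how this was resolved; a common remedy, shown here purely as an assumption, is to clamp each probability to a small floor and sum log probabilities. The floor value, lexeme names and model values are hypothetical.

```python
import math
from typing import Dict, Sequence


def log10_likelihood(lexemes: Sequence[str], probs: Dict[str, float], floor: float = 1e-6) -> float:
    """Sum of log10 probabilities; zero or unseen lexemes are clamped to `floor` instead of 0."""
    return sum(math.log10(max(probs.get(lex, 0.0), floor)) for lex in lexemes)


# Hypothetical per-lexeme probabilities for one compiler model
model = {"push-mov-sub": 0.5, "mov-add-ret": 0.3, "xor-xor-call": 0.0}
sample = ["push-mov-sub", "mov-add-ret", "xor-xor-call"]

naive = math.prod(model.get(lex, 0.0) for lex in sample)
print("naive product:", naive)  # a single zero element zeroes the whole score
print("floored log10 score:", round(log10_likelihood(sample, model), 3))
```

The floor acts much like the minimal-value coefficient above: it bounds how strongly one unseen element can penalize a sample.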





Conclusion


                  Proposed a connection between native language and assembler
                  Developed algorithms for lexical analysis of assembler
                  language
                  Developed experimental stochastic models:
                         based on neural networks
                         based on a probability model
                  Implemented lexical analysis of assembler language
                  Approximate false positive errors of compiler detection:
                         27%
                         10-15%






                                           Questions?




 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxannathomasp01
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 

Kürzlich hochgeladen (20)

80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptx
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & Systems
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Tatlong Kwento ni Lola basyang-1.pdf arts
Tatlong Kwento ni Lola basyang-1.pdf artsTatlong Kwento ni Lola basyang-1.pdf arts
Tatlong Kwento ni Lola basyang-1.pdf arts
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 

Rechkov. Lomonosov Report

  • 1. Introduction | Assembler as a native language | Anomalies detection
    Detecting abnormal executable files using binary code mining
    Rechkov Anton, TU Berlin (Germany) & TTI SFU (Russia), 21st March 2012
    Rechkov Anton, Lomonosov Scholarship Report, 21st March 2012, 1 / 31
  • 2. Malware evolution
    Ciphered: the virus code is encrypted.
    Oligomorphic: the decryptor is generated by randomly selecting each of its pieces from several predefined alternatives.
    Polymorphic: each replication produces a new sample by encrypting the malware body and modifying the decryptor.
    Metamorphic: the entire virus body is rewritten by an obfuscation engine.
  • 3. Modern detection techniques
    Signature analysis: searching the code for a known pattern.
    Emulation: unpacking and analyzing malware by emulating its code, then continuing with signature analysis.
    Behavioral analysis: analysis of the function flow graph.
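In its simplest form, signature analysis is a byte-pattern search. The sketch below is a minimal illustration, not a production scanner. The signature names and byte patterns are hypothetical. Real engines also use wildcards and multi-pattern automata such as Aho-Corasick, rather than repeated `bytes.find` calls.

```python
# Hypothetical signature database: name -> exact byte pattern to look for.
SIGNATURES = {
    "demo.pattern_a": bytes.fromhex("deadbeef"),
    "demo.marker": b"EVIL",
}


def scan(data: bytes, signatures=SIGNATURES):
    """Return (name, offset) for every occurrence of every signature, sorted by offset."""
    hits = []
    for name, pattern in signatures.items():
        start = data.find(pattern)
        while start != -1:
            hits.append((name, start))
            start = data.find(pattern, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    sample = b"\x90EVIL\x00" + bytes.fromhex("deadbeef")
    print(scan(sample))
```

An encrypted or packed body hides such patterns from a plain search. That is why emulation, which unpacks the code first, is followed by signature analysis again.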
  • 4. Code modification
    Obfuscation: a transformation of executable program code that preserves its functionality but makes the code and its algorithms harder to analyze and understand.
    Deobfuscation: resolving irrelevant code by means of algebraic models or formal grammars.
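The slide names algebraic models and formal grammars for deobfuscation. The sketch below uses neither. It is a much simpler pattern-based stand-in that removes a few classic junk constructs obfuscators insert without changing behavior:

- `nop`
- self-moves such as `mov eax, eax`
- a `push r` that is undone by a matching `pop r`

Instructions are modeled as hypothetical `(mnemonic, operands)` tuples rather than real disassembler output. The code assumes 32-bit x86: in 64-bit mode `mov eax, eax` zero-extends into `rax` and is not a no-op.

```python
# Instructions are hypothetical (mnemonic, operands) tuples, e.g. ("mov", ("eax", "ebx")).
# Assumes 32-bit x86, where "mov r, r" with identical registers has no effect.


def is_junk(insn):
    """True for single instructions that do nothing: nop and self-moves."""
    mnemonic, ops = insn
    if mnemonic == "nop":
        return True
    return mnemonic == "mov" and len(ops) == 2 and ops[0] == ops[1]


def remove_junk(code):
    """Drop junk instructions and push r / pop r pairs that restore the same register.

    The output list works as a stack, so nested pairs such as
    push eax; push ebx; pop ebx; pop eax all cancel in a single pass.
    The stack slot written by the push is assumed to be dead.
    """
    out = []
    for insn in code:
        if is_junk(insn):
            continue
        mnemonic, ops = insn
        if mnemonic == "pop" and out and out[-1][0] == "push" and out[-1][1] == ops:
            out.pop()  # cancel the matching push
            continue
        out.append(insn)
    return out


if __name__ == "__main__":
    code = [("push", ("eax",)), ("nop", ()), ("pop", ("eax",)),
            ("mov", ("ebx", "ebx")), ("add", ("eax", "1")), ("ret", ())]
    print(remove_junk(code))
```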
  • 6. Outline
    1 Assembler as a native language: Binary code mining; Native language processing; Stochastic models
    2 Anomalies detection
  • 7. Table of Contents (current section: Binary code mining)
    1 Assembler as a native language: Binary code mining; Native language processing; Stochastic models
    2 Anomalies detection: Preparation; Code generator lexemes; Anomalies detection by neural networks; Anomalies detection by probability model
  • 8. Structure of a compiler
    Common compiler scheme (figure).
    Code generator engine: machine code generator; optimizers (interprocedural optimization (IPO), profile-guided optimization (PGO), high-level optimizations); mutation code generator / obfuscator.
  • 9. Common code generator features
    High-level optimizations: a unique intermediate language; pre-optimization in the intermediate representation.
    Code generation: code templates from the intermediate representation to the target; the number of instruction types used.
    Machine-dependent optimizer: instruction costs.
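The point of this slide is that each code generator maps intermediate operations to target code through its own templates. The same source therefore leaves compiler-specific instruction patterns. The toy sketch below makes that concrete with two hypothetical compilers, `cc_a` and `cc_b`. Their templates for a tiny IR are invented for illustration and are not taken from any real compiler.

```python
from collections import Counter

# A toy intermediate representation: a list of IR operation names.
IR = ["load", "add", "mul2", "store"]  # mul2 = multiply by two

# Invented templates for two hypothetical compilers; each maps an IR op
# to the target mnemonics that compiler would emit for it.
TEMPLATES = {
    "cc_a": {"load": ["mov"], "add": ["add"], "mul2": ["shl"], "store": ["mov"]},
    "cc_b": {"load": ["mov"], "add": ["lea"], "mul2": ["add"], "store": ["mov"]},
}


def generate(ir, compiler):
    """Expand every IR operation with the chosen compiler's template."""
    code = []
    for op in ir:
        code.extend(TEMPLATES[compiler][op])
    return code


if __name__ == "__main__":
    for cc in TEMPLATES:
        code = generate(IR, cc)
        print(cc, code, Counter(code))
```

Both outputs implement the same computation, but their mnemonic histograms differ. This is the kind of signal the report's compiler distribution experiments look for.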
  • 12. Validating the hypothesis
    Experiment: determine instruction sequences; compile the source code with each compiler; compare the distributions.
    Compilers: MSVC, LLVM, GCC, Intel C++ Compiler.
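One way to carry out the "compare distributions" step works in two stages. First, turn each binary's instruction sequences into a normalized frequency distribution. Then measure the distance between the distributions. The sketch below uses the Jensen-Shannon divergence. That choice, the bigram length `n=2` and the mnemonic lists are assumptions for illustration; the slide does not say which measure was used.

```python
import math
from collections import Counter


def ngram_distribution(mnemonics, n=2):
    """Relative frequency of each length-n instruction sequence."""
    grams = Counter(tuple(mnemonics[i:i + n]) for i in range(len(mnemonics) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(a):
        # Kullback-Leibler divergence KL(a || m); every key of a has a[k] > 0
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


if __name__ == "__main__":
    a = ngram_distribution(["push", "mov", "add", "mov", "add", "ret"])
    b = ngram_distribution(["push", "lea", "add", "lea", "add", "ret"])
    print(round(js_divergence(a, b), 3))
```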
  • 14. XTEA distribution test
    [Figure: frequency of words in the binary; panels (a) LLVM, (b) MSVC, (c) Intel C++, (d) GCC]
  • 15. Mean distribution of optimized binaries (figure)
  • 16. Table of Contents (current section: Native language processing)
  • 17. Text mining
    Language detection; author detection; text classification; document clustering.
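In text mining, language and author detection are often done by comparing n-gram frequency profiles. The classic "out-of-place" rank distance of Cavnar and Trenkle is one such method. The sketch below applies that idea to instruction mnemonics to pick the closest compiler profile. It is an analogy under stated assumptions: the profile names, the `top` cutoff and the training sequences are hypothetical. It is not claimed to be the method used in this report.

```python
from collections import Counter


def profile(tokens, n=2, top=300):
    """Rank list of the most frequent length-n token sequences."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [g for g, _ in grams.most_common(top)]


def out_of_place(doc_profile, cat_profile):
    """Sum of rank differences; n-grams missing from the category get the maximum penalty."""
    rank = {g: r for r, g in enumerate(cat_profile)}
    penalty = len(cat_profile)
    return sum(abs(r - rank[g]) if g in rank else penalty
               for r, g in enumerate(doc_profile))


def classify(tokens, category_profiles):
    """Name of the category whose profile is closest to the tokens' profile."""
    doc = profile(tokens)
    return min(category_profiles, key=lambda name: out_of_place(doc, category_profiles[name]))


if __name__ == "__main__":
    cats = {
        "cc_a": profile(["push", "mov", "add", "pop"] * 5),  # hypothetical training data
        "cc_b": profile(["lea", "shl", "lea", "ret"] * 5),
    }
    print(classify(["push", "mov", "add", "pop", "push"], cats))
```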
  • 18. Table of Contents (current section: Stochastic models)
  • 19. Neural networks
    Advantages: + effective with a small number of training vectors; + assessment of the proximity of all samples.
    Disadvantages: - the model must be predetermined; manual definition of words; manual analysis of excessive elements; limited retraining.
  • 20. Probability model
    Advantages: + self-sufficient word definition; + training on positive vectors only; + unified training (flexible retraining).
    Disadvantages: - a large sample set is needed for training; - errors in determining the distribution; - computational complexity.
  • 21. Outline
    1 Assembler as a native language
    2 Anomalies detection: Preparation; Code generator lexemes; Anomalies detection by neural networks; Anomalies detection by probability model
  • 22. Table of Contents (current section: Preparation)
  • 23. Preparation: collecting statistical samples
    Python: detecting the list of maximal repeated sequences; disassembly; string search.
    Matlab: stochastic models.
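The string-search step can be approximated in a few lines of Python, in the spirit of the Unix `strings` utility: find runs of printable ASCII bytes of at least some minimum length. The default minimum length of 4 and the printable range 0x20 to 0x7e are assumptions. Wide (UTF-16) strings are not handled.

```python
import re


def find_strings(data: bytes, min_len: int = 4):
    """Return (offset, text) for every run of printable ASCII of length >= min_len."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


if __name__ == "__main__":
    blob = b"\x00\x01MZ\x90kernel32.dll\x00ab\x00GetProcAddress\x00"
    for offset, text in find_strings(blob):
        print(offset, text)
```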
  • 26. Table of Contents (current section: Code generator lexemes)
  • 27. From disassembly to lexemes
    Lexeme: a sequence 3 to 6 instructions long; unknown bytes are ignored; lexemes are taken from the maximal repeated sequences.
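The lexeme definition on this slide suggests a simple counting loop. In the sketch below, the disassembly is assumed to be a list of mnemonics with `None` marking bytes the disassembler could not decode. A sequence is never allowed to span such a gap. The `min_count` threshold for "repeated" is hypothetical. Reducing the result to maximal repeats would be a further filtering step not shown here.

```python
from collections import Counter


def extract_lexemes(mnemonics, min_len=3, max_len=6, min_count=2):
    """Count instruction sequences of length min_len..max_len that do not cross unknown bytes."""
    counts = Counter()
    run = []
    for m in list(mnemonics) + [None]:  # trailing sentinel flushes the last run
        if m is not None:
            run.append(m)
            continue
        # unknown byte (or end of input): count every window inside the finished run
        for n in range(min_len, max_len + 1):
            for i in range(len(run) - n + 1):
                counts[tuple(run[i:i + n])] += 1
        run = []
    return {lex: c for lex, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    seq = ["push", "mov", "sub", None, "push", "mov", "sub", "ret"]
    print(extract_lexemes(seq))
```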
  • 28. Lexeme analysis
    Suffix tree example (figure).
    Suffix tree: economical memory use; string search faster than O(N^2); fast assessment of maximal repeats in strings.
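A suffix tree efficiently answers the question "which instruction sequences repeat, and how long is the longest repeat?". Building one (for example with Ukkonen's algorithm) takes more code than fits here. The sketch below swaps in a closely related structure: a suffix array with longest-common-prefix values, built naively by sorting suffixes. It finds the longest contiguous token sequence that occurs at least twice. It is for illustration only and is much slower than a linear-time suffix tree construction.

```python
def suffix_array(tokens):
    """Start indices of all suffixes in lexicographic order (naive construction)."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def lcp(a, b):
    """Length of the common prefix of two token sequences."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def longest_repeat(tokens):
    """Longest contiguous token sequence occurring at least twice (occurrences may overlap)."""
    sa = suffix_array(tokens)
    best_len, best_start = 0, 0
    # the longest repeat is the longest common prefix of two neighbouring suffixes
    for x, y in zip(sa, sa[1:]):
        length = lcp(tokens[x:], tokens[y:])
        if length > best_len:
            best_len, best_start = length, x
    return tokens[best_start:best_start + best_len]


if __name__ == "__main__":
    print(longest_repeat(["push", "mov", "add", "push", "mov", "add", "ret"]))
```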
  • 29. Table of Contents (current section: Anomalies detection by neural networks)
  • 30. Radial basis networks
    Neural net architecture (figure).
    No need to choose the number of hidden layers; no pathological convergence behavior; fast convergence through a combination of learning algorithms.
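A radial basis function network has one hidden layer of Gaussian units and a linear output layer. This is why there is no hidden-layer count to choose. It also means training can combine two stages: pick the centers, then solve for the output weights in one least-squares step. The NumPy sketch below makes several assumptions:

- one center is placed on every training vector (exact interpolation);
- all units share a single `width`;
- the XOR data is a toy example.

Practical networks usually pick fewer centers, for example by clustering, before the linear solve. Feature vectors such as lexeme frequencies would take the place of the toy inputs.

```python
import numpy as np


def gaussian_design(X, centers, width):
    """Matrix of Gaussian activations: one row per sample, one column per center."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))


class RBFNet:
    """Radial basis function network: Gaussian hidden layer + linear output layer."""

    def __init__(self, width=1.0):
        self.width = width

    def _features(self, X):
        phi = gaussian_design(X, self.centers, self.width)
        return np.hstack([phi, np.ones((len(X), 1))])  # append a bias column

    def fit(self, X, y):
        # Stage 1: choose centers (here simply every training vector).
        self.centers = np.array(X, dtype=float)
        # Stage 2: output weights by linear least squares, no iterative descent.
        self.weights, *_ = np.linalg.lstsq(self._features(self.centers), y, rcond=None)
        return self

    def predict(self, X):
        return self._features(np.asarray(X, dtype=float)) @ self.weights


if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0], dtype=float)  # XOR: not linearly separable
    net = RBFNet(width=0.5).fit(X, y)
    print(np.round(net.predict(X), 3))
```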
  • 31. Compiler detection
    [Figure: compiler detection testing]
  • 32. Table of Contents (current section: Anomalies detection by probability model)
  • 33. Multivariate Gamma
    Using a set of bivariate and trivariate Gamma distributions: assume a Gamma distribution; sample proximity; fast training.
    [Figure: empirical and theoretical PDF of an element; curves: Gamma PDF, empirical PDF; axes: X, PDF]
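This slide fits Gamma distributions to lexeme statistics. The bivariate and trivariate Gamma model is beyond a short example. The sketch below shows only the univariate building block: a method-of-moments fit of shape $k = \bar{x}^2 / s^2$ and scale $\theta = s^2 / \bar{x}$ for one element, plus the fitted density for comparison with the empirical histogram. Using the method of moments rather than maximum likelihood is an assumption made for brevity. The simulated data is hypothetical.

```python
import math
import random
import statistics


def fit_gamma_moments(samples):
    """Method-of-moments estimates (shape k, scale theta) of a Gamma distribution."""
    mean = statistics.fmean(samples)
    var = statistics.pvariance(samples, mu=mean)
    return mean * mean / var, var / mean


def gamma_pdf(x, k, theta):
    """Gamma density with shape k and scale theta, for x > 0."""
    return math.exp((k - 1) * math.log(x) - x / theta - math.lgamma(k) - k * math.log(theta))


if __name__ == "__main__":
    random.seed(0)
    data = [random.gammavariate(2.0, 0.01) for _ in range(20000)]  # simulated element values
    k, theta = fit_gamma_moments(data)
    print(f"k = {k:.3f}, theta = {theta:.5f}, pdf(0.02) = {gamma_pdf(0.02, k, theta):.2f}")
```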
  • 34. Probability model testing
    Error graphs of compiler probabilities, based on the coefficient for the minimal value: $P^{p}_{i} = P^{\min}_{i} \cdot 10^{\mathrm{coef}}$
    [Figure: two panels of error versus coeff for min value (left x-axis 0 to 10, right x-axis 0 to 20); curves: false positive GCC O0, false positive MS, false negative Clang, false negative LLVM, false negative Intel, false negative GCC O2, false negative MS]
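Error curves like these come from sweeping a decision parameter and counting errors at each setting. The sketch below reads the slide's formula as defining an acceptance threshold of P_min scaled by 10^coef on a sample's probability score. That reading, the score lists and the acceptance rule (score >= threshold) are assumptions for illustration. Under this reading:

- a false positive is another compiler's sample that gets accepted;
- a false negative is a target-compiler sample that gets rejected.

```python
def error_rates(pos_scores, neg_scores, threshold):
    """(false-positive rate, false-negative rate) for one acceptance threshold."""
    false_neg = sum(s < threshold for s in pos_scores) / len(pos_scores)
    false_pos = sum(s >= threshold for s in neg_scores) / len(neg_scores)
    return false_pos, false_neg


def sweep(pos_scores, neg_scores, p_min, coefs):
    """(coef, fp, fn) for thresholds p_min * 10**coef (hypothetical reading of the slide)."""
    return [(c,) + error_rates(pos_scores, neg_scores, p_min * 10 ** c) for c in coefs]


if __name__ == "__main__":
    target = [0.5, 0.05, 0.2]   # hypothetical scores of the target compiler's samples
    others = [0.001, 0.02, 0.3]  # hypothetical scores of other compilers' samples
    for coef, fp, fn in sweep(target, others, p_min=0.001, coefs=range(4)):
        print(coef, round(fp, 2), round(fn, 2))
```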
  • 35. Probability model testing
    The problem of zero-valued elements.
    [Figure: two panels of error versus coeff for min value (x-axis 0 to 10); curves in each panel: false positive GCC O2, false negative Clang, false negative Intel, false negative GCC O0, false negative MS]
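The zero-element problem arises when a lexeme never seen in training gets probability 0. A log-likelihood score then collapses to minus infinity, so a single unseen lexeme decides the verdict. A common remedy, sketched below, is to give unseen lexemes a small floor probability derived from the smallest observed one. The `shrink` parameter and this flooring rule are assumptions, not necessarily what the report does.

```python
import math
from collections import Counter


def train_model(lexemes):
    """Relative frequency of every lexeme seen in the (positive) training data."""
    counts = Counter(lexemes)
    total = sum(counts.values())
    return {lex: c / total for lex, c in counts.items()}


def log_score(model, lexemes, shrink=10.0):
    """Log-likelihood of a lexeme sequence with a floor for unseen lexemes.

    Without the floor an unseen lexeme has probability 0 and log(0) is -inf.
    The floor is (smallest observed probability) / shrink, a hypothetical rule;
    probabilities are not renormalized, which is acceptable for ranking samples.
    """
    floor = min(model.values()) / shrink
    return sum(math.log(model.get(lex, floor)) for lex in lexemes)


if __name__ == "__main__":
    model = train_model(["a", "a", "b", "c"])
    print(log_score(model, ["a", "b"]), log_score(model, ["a", "unseen"]))
```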
  • 36. Conclusion
    Proposed a connection between native language and assembler.
    Developed algorithms for lexical analysis of assembler language.
    Developed experimental stochastic models: one based on neural networks, one based on a probability model.
    Implemented lexical analysis of assembler language.
    Approximate false-positive errors of compiler detection: 27% and 10-15%.
  • 37. Questions?