In recent years Machine Learning (ML) and especially Deep Learning (DL) have achieved great success in many areas such as visual recognition, NLP or even aiding in medical research.
https://www.bigdataspain.org/2017/talk/attacking-machine-learning-used-in-antivirus-with-reinforcement
Big Data Spain 2017
16th - 17th Kinépolis Madrid
5. # cat Static_Malware_Analysis
• Definition
Static Malware Analysis is usually performed by dissecting the different
resources of the binary file without executing it and studying each component.
The binary file can also be disassembled (or reverse engineered) using a
disassembler such as IDA or radare. (Wikipedia)
Search for signatures in the executable.
6. # cat Static_Malware_Analysis
• Portable Executable (PE)
The Portable Executable (PE) file format is a data structure that contains the
information necessary for the Windows OS loader to manage the wrapped
executable code.
PE File Format by Saurabh & Chinmaya
7. # cat Reinforcement_Learning
• Definitions
A Reinforcement Learning model consists of an angent and an environment.
For each turn, an agent receives a state and may choose one from a set of
actions .
The policy is the agent’s behavior, i.e., a mapping from states to actions .
The agent receives the next state and a scalar reward .
http://www.ausy.tu-darmstadt.de/Research/Research
Α
8. # cat Reinforcement_Learning
• Definitions
Immediate rewards are generally not very helpful while learning a game. So,
what we should aim for is long term rewards.
The long term reward of step t will be:
The agent aims to maximize the expectation of such long term return from
each state.
The parameter is the discount factor that defines the weight of distant
rewards in relation to those obtained sooner.
The discounting by ensured that this sum is finite.
9. # cat Reinforcement_Learning
• Q value
The optimal action-value function: Q value
A Neural Network will be used to approximate this function.
Next we can define the policy to choose an action.
The Loss function to update the Network:
http://web.stanford.edu/class/cs20si/lectures/slides_14.pdf
10. # cat Reinforcement_Learning
• Actor-Critic Algorithms
The actor produces an action given the current state of the environment.
The critic produces a TD (Temporal-Difference) error signal given the state and
resultant reward.
If the critic is estimating the action-value function Q(s,a), it will also need the
output of the actor.
The output of the critic drives learning in both the actor and the critic.
In Deep Reinforcement Learning, neural networks can be used to represent the
actor and critic structures.
11. # cat Antivirus_Evasion_Using_RL
• Overview
The environment → the malware sample.
The environment emits the state in the form of a 2350-dimensional feature
vector:
PE header metadata.
Section metadata: section name, size and characteristics.
Import & Export Table metadata.
Counts of human readable strings.
Byte histogram.
12. # cat Antivirus_Evasion_Using_RL
• Overview
The agent → the algorithm used to change the environment.
The agent sends actions to the environment, and the environment replies with
observations and rewards (that is, a score).
There will be an anti-malware engine (the attack target).
Each step will provide:
Reward: value of reward scored by the previous action. 10.0 (pass), 0.0 (fail).
Observation space (object): feature vector summarizing the composition of
the malware sample.
Done(bool): Determines whether environment needs to be reset; True means
episode was successful.
13. # cat Antivirus_Evasion_Using_RL
• Overview
The actions that can be performed on a malware sample in our environment
consist of the following binary manipulations:
* append_zero
* append_random_ascii
* append_random_bytes
* remove_signature
* upx_pack
* upx_unpack
* change_section_names_from_list
* change_section_names_to random
* modify_export
* remove_debug
* break_optional_header_checksum
Over time, the agent learns which combinations lead to the highest rewards, or
learns a policy.