[cb22] Who is the Mal-Gopher? - Implementation and Evaluation of “gimpfuzzy” for Go Malware Classification by Sawabe, Amakasu and Nomura
1. Who is the Mal-Gopher?
- Implementation and Evaluation of "gimpfuzzy" for Go Malware Classification
Yuta Sawabe / Nobuyuki Amakasu / Kazuya Nomura
NTT Security Holdings
2. © NTT Security Holdings All Rights Reserved
About Us
• Yuta Sawabe
• Mainly engaged in log analysis and malware analysis
• Recipient of a 2019 JIP special paper award from the Information Processing Society of Japan; speaker at JSAC 2022 and Botconf 2021/2022
• Nobuyuki Amakasu
• Mainly engaged in EDR log analysis and malware analysis
• Previously worked as a systems engineer (SE); in his current position since 2018
• Kazuya Nomura
• Mainly responsible for IPS/IDS/EDR alert monitoring and log analysis
• MWS2020 Thesis Award winner, SecHack2020 Outstanding Alumni
2022
3. Motivation & Goal
Go malware is on the rise
• The unique structure of Go binaries requires appropriate analysis methods
• Increased demand for fast classification
Proposal: "gimpfuzzy", a method to classify Go malware efficiently
• Methodology
• Evaluation of classification accuracy on a dataset
• Examples of classification using gimpfuzzy
4. Go Malware
5. Go Malware
• Go / Golang
• An open-source programming language whose design began at Google in 2007
• Built-in concurrency, fast execution, easy to write
• Cross-platform: one code base can be built for multiple operating environments
• Malware written in Go is on the rise
• Ransomware, RATs, and botnets that draw on Go's extensive libraries
• MaaS (Malware as a Service) / RaaS (Ransomware as a Service) offerings that leverage cross-platform builds
• Go is also used for downloaders and droppers of existing malware
The Go gopher was designed by Renee French (http://reneefrench.blogspot.com/).
The design is licensed under the Creative Commons Attribution 3.0 license.
Read this article for more details: https://blog.golang.org/gopher
6. Go Malware
[Figure: timeline of Go malware families from 2012 to 2022, grouped as RAT / Ransomware / Others. Families shown: Encriyoko, AthenaGo, Linux.Lady, Linux.Rex, Zebrocy (APT28), WellMess (APT29), RobbinHood, Snatch, IPStorm, Blackrota, SNAKE / EKANS, ElectroRAT, PlugX Loader (Mustang Panda / TA416), ChaChi, Epsilon Red, Babuk, YamaBot / Kaos, Chaos, Nerbian, Arid Gopher, Party Ticket, Agenda]
7. Go Malware from the Attacker's Perspective
• Samples targeting multiple platforms can be generated from one code base
• Runs fast and makes it easy to write robust code
• The large file size helps evade detection by AV and sandboxes
https://github.com/tiagorlampert/CHAOS
https://github.com/redcode-labs/Coldfire
8. Go Malware from the Analyst's Perspective
Relatively difficult to reverse engineer
• Libraries are statically linked, greatly increasing the amount of code to analyze
• Few language-aware analysis tools
• Analysts are often unfamiliar with behavior specific to Go binaries
[Figure: function counts of a hello-world binary. C: 71 functions; Go: 1443 functions]
A fast classification method is required.
9. gimpfuzzy
10. Indicators to identify binaries
• ssdeep
• TLSH
• imphash
• impfuzzy
• gimphash
https://www.virustotal.com/
https://bazaar.abuse.ch/
11. gimphash
• The Golang-binary counterpart of imphash
• Golang binaries have a platform-independent structure called pclntab
• Dependent package names, function names, etc. can be recovered from it
• gimphash is the SHA256 hash of a subset of the recovered package/function names, with generic ones removed
• It uniquely represents the functionality the malware depends on, but two different hashes cannot be compared for similarity
[Figure: names recovered from pclntab (github.com/example/ExampleFunction, github.com/example/Malicious, os.Getenv, crypto/sha256.New, ...) are filtered by function name and hashed with sha256 to give the gimphash E3B0C44298...]
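The filter-then-hash pipeline described above can be sketched roughly as follows. This is only an illustration: the filter uses the three rules named later in the deck (contains "internal/", lower-case first letter, prefixes such as "go.", "type.", "runtime"), the function names `passes_filter` and `gimphash_like` are hypothetical, and joining the names without a separator is an assumption. The reference implementation at https://github.com/NextronSystems/gimphash applies more detailed rules.

```python
import hashlib

# Simplified stand-ins for the filter rules named on the slides; the
# reference gimphash implementation applies more detailed rules.
SKIP_PREFIXES = ("go.", "type.", "runtime")


def passes_filter(name: str) -> bool:
    """Return True if a recovered symbol name survives the simplified filter."""
    if "internal/" in name:
        return False
    if name.startswith(SKIP_PREFIXES):
        return False
    # Look at the identifier after the last dot, e.g. "Exit" in "os.Exit".
    identifier = name.rsplit(".", 1)[-1]
    return identifier[:1].isupper()


def gimphash_like(names) -> str:
    """SHA256 over the surviving names (joined without a separator: an assumption)."""
    kept = [n for n in names if passes_filter(n)]
    return hashlib.sha256("".join(kept).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    v1 = ["runtime.memhash0", "main.main", "os.Getenv", "crypto/sha256.New"]
    v2 = v1 + ["os.Exit"]  # one extra dependency
    print(gimphash_like(v1))
    print(gimphash_like(v2))  # a different digest
```

Because the digest is a cryptographic hash, adding a single dependency changes it completely, which is the limitation the next slide addresses.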
12. gimpfuzzy
• gimphash converted to a fuzzy hash
• SHA256 output changes completely if the input differs by even 1 bit, so "similar things" such as different versions cannot be compared
• A fuzzy hash makes it possible to measure the similarity between samples
[Figure: two function-name lists recovered from pclntab, each converted to a gimpfuzzy value]
List 1: github.com/example/ExampleFunction, github.com/example/Malicious, os.Getenv, crypto/sha256.New, ...
List 2: github.com/example/AnotherFunction, github.com/example/Malicious, os.Getenv, crypto/sha256.New, ...
gimpfuzzy values of the two lists:
3:j4dwGIVWYvgxN4dwGIV3MNWzYKKKvTKrs:j4mGggximGCMcTjKrs
3:j4dwGIV++RSXzM9xN4dwGIV3MNWzYKKKvTKrs:j4mGduSXg9ximGCMcTjKrs
Similarity can be compared
13. Verification method
• Used the Go malware classification list published by Palo Alto Networks
• SHA256 of samples
• Family classification results
• Approximately 7,900 samples from 53 families were collected from the list
• Verification of classification accuracy
https://unit42.paloaltonetworks.jp/the-gopher-in-the-room-analysis-of-golang-malware-in-the-wild/
14. Verification method: gimphash
Illustration of gimphash's identification probability: pairwise scores of the samples in one family
• Exact match or not (score = 100 or 0)
• A sample always matches its own hash, so the diagonal is not counted
     a    b    c    d    e    f
a  100  100    0    0    0    0
b  100  100    0    0    0    0
c    0    0  100    0    0    0
d    0    0    0  100    0    0
e    0    0    0    0  100  100
f    0    0    0    0  100  100
$({}_{2}C_{2} + {}_{2}C_{2}) / {}_{6}C_{2} = 2/15 \approx 0.133$
(the matching groups {a, b} and {e, f} each contribute ${}_{2}C_{2} = 1$ pair)
15. Verification method: gimphash
Identification probability: the probability that two samples from the same family have the same gimphash
Example: number of samples in the family: 45
$({}_{21}C_{2} + {}_{14}C_{2}) / {}_{45}C_{2}$
count gimphash
-------------------------------------------------------------------------------------------------------------------
21 a2d8e06ce26e22f787fd9a9b44651f9493a3eae26847aa5cdf82a40c8bbc0f28
14 a307414aea2e294e69b425c65409774ed1552351fa8af099c3a643c5320f4411
1 f7c31d1ed7aad92f725d48827f9764abac9edd3f7e0cf6dcdfb93aea5e832ebd
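The identification probability for gimphash can be computed directly from the hash counts, as in the sketch below. The function name `identification_probability` is hypothetical. For the 45-sample example, the listing shows only three rows, so the sketch assumes the ten samples not covered by the counts 21 and 14 each have a unique gimphash, which is what the slide's formula implies.

```python
from collections import Counter
from math import comb


def identification_probability(hashes) -> float:
    """Fraction of sample pairs in one family that share the same hash."""
    n = len(hashes)
    if n < 2:
        return 0.0
    # Each group of c identical hashes contributes C(c, 2) matching pairs.
    matching = sum(comb(c, 2) for c in Counter(hashes).values())
    return matching / comb(n, 2)


if __name__ == "__main__":
    # Six-sample illustration: {a, b} and {e, f} match, c and d are unique.
    six = ["h1", "h1", "h2", "h3", "h4", "h4"]
    print(round(identification_probability(six), 3))

    # 45-sample example: groups of 21 and 14, remaining 10 assumed unique.
    family = ["A"] * 21 + ["B"] * 14 + [f"u{i}" for i in range(10)]
    print(round(identification_probability(family), 3))
```

Under that assumption the 45-sample family gives (210 + 91) / 990, roughly 0.304.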
16. Verification method: gimpfuzzy
Identification probability: the probability that two samples of the same family have a similarity score higher than the threshold
How the gimpfuzzy similarity score is calculated:
$ cat sample1.bin.gimpfuzzy
ssdeep,1.1--blocksize:hash:hash,filename
1536:R1IUKvE0I5WsS4KmdZo9uihOQGMbH6gWx:R1KvIoj4KmdZo9ucGMbH6gWx, "sample1.bin"
$ cat sample2.bin.gimpfuzzy
ssdeep,1.1--blocksize:hash:hash,filename
1536:x/KKvOm0I5WsS4cm5Zo9uihOQGMbH6gW7:5KKQIoj4cm5Zo9ucGMbH6gW7, "sample2.bin"
$ ssdeep -a -k sample1.bin.gimpfuzzy sample2.bin.gimpfuzzy
sample2.bin.gimpfuzzy:sample2.bin matches sample1.bin.gimpfuzzy:sample1.bin (77)
The number in parentheses (77) is the similarity score.
17. Verification method: gimpfuzzy
Identification probability for gimpfuzzy (score threshold > 60): pairwise scores of the samples in one family
     a    b    c    d    e    f
a  100  100   70   80    0    0
b  100  100   65   90    0    0
c   70   65  100   70    0    0
d   80   90   70  100    0    0
e    0    0    0    0  100  100
f    0    0    0    0  100  100
$({}_{4}C_{2} + {}_{2}C_{2}) / {}_{6}C_{2} = 7/15 \approx 0.467$
(the group {a, b, c, d} contributes ${}_{4}C_{2} = 6$ pairs and {e, f} contributes ${}_{2}C_{2} = 1$ pair)
Evaluation can now take the similarity of the inputs into account.
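With a fuzzy hash, the same probability becomes the fraction of pairs whose score exceeds the threshold. The sketch below uses the six-sample scores from the table above; the function name and the dictionary layout are hypothetical, and pairs that are not listed are treated as score 0.

```python
from itertools import combinations


def fuzzy_identification_probability(samples, scores, threshold) -> float:
    """Fraction of sample pairs whose similarity score is above the threshold."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs if scores.get(frozenset((a, b)), 0) > threshold)
    return hits / len(pairs)


# Pairwise scores from the six-sample illustration (unlisted pairs score 0).
SLIDE_SCORES = {
    frozenset("ab"): 100, frozenset("ac"): 70, frozenset("ad"): 80,
    frozenset("bc"): 65, frozenset("bd"): 90, frozenset("cd"): 70,
    frozenset("ef"): 100,
}

if __name__ == "__main__":
    p = fuzzy_identification_probability("abcdef", SLIDE_SCORES, threshold=60)
    print(round(p, 3))
```

Raising the threshold drops borderline pairs: at 68, the b-c pair (score 65) no longer counts.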
18. Verification result: Identification
Mean identification rate (%)
                 gimphash   gimpfuzzy (>=80)   gimpfuzzy (>=70)   gimpfuzzy (>=60)
All families         7.32              47.85              67.37              79.98
veil excluded       17.64              37.89              46.13              55.94
[Figure: bar chart of identification rate (0 to 100) per family, families A to W]
19. Verification results
• Comparison of scores between samples within a family
• Compare the scores of each sample of a family with all other samples of the same family (duplicates removed)
• Investigate the distribution of scores for the combinations within each family
[Figure: distribution of within-family scores; color indicates family]
• Within the same family, many samples have similar hashes (score >= 68)
• Within the same family, there are also samples that don't match at all (score = 0)
20. Verification results
• Comparison of scores with out-of-family samples
• Compare the scores of the samples of one family against all samples of the other families (duplicates removed)
• 75% of the combinations have completely different hashes (score 0), but the remaining 25% show some similarity
[Figure: distribution of out-of-family scores, with score = 0 omitted; color indicates family]
• Most samples have a score < 69
21. Verification results
• How to determine the threshold for grouping
• Find the threshold that classifies families with neither over- nor under-classification
• Combinations above the threshold are treated as within the family, and the outcome is examined
• (TP) True Positives → combinations that are above the threshold and actually in the family
• (TN) True Negatives → combinations that are below the threshold and actually outside the family
• (FP) False Positives → combinations that are above the threshold and actually outside the family
• (FN) False Negatives → combinations that are below the threshold and actually in the family
• Difficult to evaluate by a simple percentage of correct answers
• Number of samples in the family is greater than the number of samples outside the family
• TP, FP, and FN need to be weighed equally
                     within the family   outside the family
above the threshold  TP (correct)        FP (wrong)
below the threshold  FN (wrong)          TN (correct)
Accuracy is verified by moving the threshold to vary what is considered within the family
22. Verification results
• Rated by F-measure
• Precision: percentage of combinations judged in-family that truly were within the family, $TP / (TP + FP)$
• Recall: percentage of in-family combinations that were not missed, $TP / (TP + FN)$
• F-measure: harmonic mean of precision and recall
• A threshold of 68 is the most reasonable
• This is also consistent with the distribution seen in the graphs
[Figure: evaluation by threshold, with threshold = 68 marked]
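The threshold search can be sketched as a sweep that computes precision, recall, and F-measure at each candidate threshold and keeps the best one. The helper names and the toy pair list are hypothetical, and "above the threshold" is implemented here as `score >= threshold`, which is an assumption.

```python
def f_measure(tp: int, fp: int, fn: int):
    """Return (precision, recall, F-measure) for the given counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def best_threshold(scored_pairs, thresholds):
    """scored_pairs: iterable of (score, same_family) tuples.

    Returns (threshold, F-measure) for the threshold with the highest F-measure.
    """
    pairs = list(scored_pairs)
    best = None
    for t in thresholds:
        tp = sum(1 for s, same in pairs if s >= t and same)
        fp = sum(1 for s, same in pairs if s >= t and not same)
        fn = sum(1 for s, same in pairs if s < t and same)
        f = f_measure(tp, fp, fn)[2]
        if best is None or f > best[1]:
            best = (t, f)
    return best


if __name__ == "__main__":
    # Hypothetical pairs: (similarity score, whether both samples share a family).
    toy = [(100, True), (90, True), (70, True), (0, True),
           (65, False), (50, False), (0, False)]
    print(best_threshold(toy, [50, 60, 68, 80]))
```

True negatives do not enter the F-measure, which is why it suits a setting where one class of combinations dominates.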
23. Case Study
24. Case Study 1: Cross Platform
IPStorm
• P2P botnet first observed in May 2019
• Check the similarity of samples from different platforms by using gimpfuzzy
[Figure: samples a and b compared by gimpfuzzy and gimphash]
gimpfuzzy (similarity 90):
3072:qZosQIop4rzLYr62Xb+iwzh7RXnZWCy7IxXkzAEmZMpg1DtNg+rhHWPvXvAYiMsP:bA/KT7439mV/Wrz
3072:lZosCIop4szLYr62Xb+iSzh7RXnZWCy7IxXyzAKmZMpg1DtNg+rhHWPvXvAYiMs0:1AQT7439mV/WrOt
gimphash (mismatch):
4e92f61bb61e08947f457e73cbe72348c1dce312323a652397c85731144a8088
37da5b52a7c1577b7e0f3cfc99d059ffefe4def03ee840a8b9becd192ca79291
25. Case Study 1: Comparison of main functions
Both samples show IPStorm's characteristic behavior of calling the function "starter" of the package "storm".
[Figure: main functions of A (Linux) and B (Mac) side by side]
26. Case Study 1: Extracted Functions
Classification is possible regardless of platform differences
[Figure: Venn diagram of extracted functions. A (Linux) only: 194; common: 5744; B (Mac) only: 817]
27. Case Study 2: Families for which gimpfuzzy is effective
• Family created with the malware creation tool "veil"
• Supports Metasploit payloads
• Various features, such as backdoors, can be specified and added to the malware
• An AV detection evasion function is available
• Compared to gimphash, gimpfuzzy's accuracy is very good
[Figure: bar chart on a 0 to 100 scale. gimphash: 0.1; gimpfuzzy (threshold 80): 54.81; two further bars at 82.22 and 96.77]
28. Case Study 2: Families for which gimpfuzzy is effective
• A fuzzy hash is more effective than a normal hash on many samples
• With a normal hash, an input that differs by even a single character produces a completely different hash
Passed Function: 426
main.JeDKBXmMzyjP
main.TUsJXmXgumrHRF
bytes.Equal
bytes.IndexByte
strings.IndexByte
io/ioutil.ReadAll
io/ioutil.NopCloser
NewSource
math/rand.New
math/rand.(*Rand).Seed
math/rand.(*Rand).Int63
...
Passed Function: 427
main.PdbjIQTndnFwQIr
main.UEeaLthPqHAvk
main.Redfkoe
bytes.Equal
bytes.IndexByte
strings.IndexByte
io/ioutil.ReadAll
io/ioutil.NopCloser
NewSource
math/rand.New
math/rand.(*Rand).Seed
...
• Some function names are obfuscated and vary from sample to sample
• Most of the other function names match
• Fuzzy hashing is extremely effective
29. Case Study 3: Clustering with gimpfuzzy
WellMess
• First observed by JPCERT and LAC in 2018
• Used by APT29 in 2020 in campaigns related to COVID-19 vaccine development
• Calculated gimphash / gimpfuzzy for 15 collected samples
• Classified into 7 types by gimphash
30. Case Study 3: WellMess
[Figure: timeline of the WellMess sample types a to g, 2018 to 2020]
31. Case Study 3: Classification Results
Use gimpfuzzy to further improve the identification probability
Clustering of all 15 samples
Method                      Number of clusters
gimphash                    7
gimpfuzzy (threshold 90)    5
gimpfuzzy (threshold 85)    4
gimpfuzzy (threshold 80)    2
gimpfuzzy (threshold 70)    1
32. Case Study 3: Classification Results
[Figure: sample types a to g linked by gimpfuzzy similarity scores of 90, 88, 93, 82, 82, and 75; legend: Microsoft Windows / Linux]
The samples are linked in chronological order, each to the sample nearest to it in time.
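Counting clusters at a given threshold can be sketched as single-linkage grouping with a union-find structure: two sample types fall into the same cluster when a link between them scores at or above the threshold. The six scores below are the ones shown on the slide, but which pair each score belongs to is not recoverable from the text, so the pairings in `HYPOTHETICAL_LINKS` are placeholders chosen only to form a tree over the seven types.

```python
def count_clusters(nodes, links, threshold) -> int:
    """Number of clusters when links scoring >= threshold merge their endpoints."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in links:
        if score >= threshold:
            parent[find(a)] = find(b)
    return len({find(n) for n in nodes})


# Scores from the slide; the endpoints are hypothetical.
HYPOTHETICAL_LINKS = [
    ("a", "c", 90), ("c", "e", 88), ("e", "g", 93),
    ("b", "d", 82), ("d", "f", 82), ("a", "b", 75),
]

if __name__ == "__main__":
    for t in (90, 85, 80, 70):
        print(t, count_clusters("abcdefg", HYPOTHETICAL_LINKS, t))
```

Because the links form a tree, the cluster count at a threshold is seven minus the number of links at or above it, independent of the chosen endpoints.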
33. Case Study 3: Functions in the main package (Windows)
Confirmed that the attacker is extending functionality
[Figure: main-package functions of samples a, c, e, and g]
34. Case Study 3: Functions in the main package (Linux)
The function "getIP" exists only in samples targeting Linux
[Figure: main-package functions of samples b, d, and f]
35. Case Study 3: Extracted Functions
[Figure: breakdown of extracted functions. Common: 574; Windows: 3; Linux: 24; unknown: 32; Y + Z: 58; Z: 54; G: 45; Others: 22; groups labeled X, Y, Z]
• 70% of functions are common
• Differences in sample functionality matter more than differences in platform
36. Challenges
37. Challenge 1: Filtered function names
Many function names are removed by filters during the calculation
$ ~/enum_gimp_function 947b273069...
ALL Function: 1532
go.buildid
LfoJVuVXGK
main.main
main.init
type..hash.[100]string
type..eq.[100]string
runtime.memhash0
runtime.memhash8
runtime.memhash16
runtime.memhash32
runtime.memhash64
...
...
Passed Function: 6
LfoJVuVXGK
os.(*PathError).Error
os.(*File).Name
os.NewFile
os.Exit
errors.New
Filter: a function name is removed if it
• contains "internal/"
• starts with a lower-case letter
• begins with "go.", "type.", "runtime", etc.
Sample-specific features may be removed as a result
38. Challenge 1: Filtered function names
https://github.com/NextronSystems/gimphash
39. Challenge 2: Fuzzy hashing is not always possible
Too little information is left for the fuzzy hash calculation
$ ~/enum_gimp_function 947b273069105ff6a78e533cd6b8b0d9e8a35f4c9534fdf9d6...
All Function: 1532
...
Passed Function: 6
LfoJVuVXGK
os.(*PathError).Error
os.(*File).Name
os.NewFile
os.Exit
errors.New
If the input to the fuzzy hash is smaller than 4096 bytes, an error occurs.
In this sample, the concatenated function-name string is too short.
40. Challenge 3: Interference with analysis
Function names cannot be recovered
$ file 0581c4953fda52c40f5d0911acdcfcb4dfdcbb9b64e3b0213a94477b991a7ec3
0581c4953fda52c40f5d0911acdcfcb4dfdcbb9b64e3b0213a94477b991a7ec3: PE32 executable (GUI) Intel
80386 (stripped to external PDB), for MS Windows, UPX compressed
$ upx -d 0581c4953fda52c40f5d0911acdcfcb4dfdcbb9b64e3b0213a94477b991a7ec3
Ultimate Packer for eXecutables
Copyright (C) 1996 - 2020
UPX 3.96 Markus Oberhumer, Laszlo Molnar & John Reiser Jan 23rd 2020
File size Ratio Format Name
-------------------- ------ -----------
upx: 0581c4953fda52c40f5d0911acdcfcb4dfdcbb9b64e3b0213a94477b991a7ec3: NotPackedException:
not packed by UPX
Unpacked 0 files.
The binary contains a section name that indicates UPX packing, but it cannot be unpacked with upx
41. Challenge 3: Interference with analysis
Function names are random strings
$ ~/enum_gimp_function 3ff830b8a9a1eb5cdf60d98c969e5d18a64c5a2fa6d430098...
...
Passed Function: 1026
main.NewCDITPPPADBCZMNQ
main.(*CDITPPADBCZMNQ).MBMKXLZCVOHE
main.(*CDITPPADBCZMNQ).ERGJXOCYDQNL
main.(*CDITPPPADBCZMNQ).DTSMCFPNHOPV
main.(*CDITPPPADBCZMNQ).PDUPAPZYBDWJ
main.(*CDITPPPADBCZMNQ).SKEKRXUKNVRZ
main.(*CDITPPADBCZMNQ).SXIFTMHNQIUP
...
Random strings that differ from sample to sample lower the fuzzy hash similarity score
42. Challenge 3: Interference with analysis
Obfuscators
https://github.com/unixpickle/gobfuscate
43. Summary
44. Summary
Proposed "gimpfuzzy", a new metric for fast classification of Go malware
• Measures similarity between samples by using a fuzzy hash
• Validation on a dataset confirms an accuracy improvement of more than 2.6x
Presented examples of classification using gimpfuzzy
• Samples with added functionality are judged to belong to the same family
• Cross-platform samples can also be classified
• Also discussed the challenges of gimpfuzzy
45. References
• https://github.com/NextronSystems/gimphash
• https://github.com/JPCERTCC/impfuzzy
• https://unit42.paloaltonetworks.jp/the-gopher-in-the-room-analysis-of-golang-malware-in-the-wild/
• https://www.intezer.com/blog/malware-analysis/year-of-the-gopher-2020-go-malware-round-up/
• https://www.paloaltonetworks.jp/company/in-the-news/2019/the-gopher-in-the-room-analysis-of-golang-malware-in-the-wild
• https://www.darkreading.com/threat-intelligence/attackers-use-of-uncommon-programming-languages-continues-to-grow
• https://www.crowdstrike.com/blog/financial-motivation-drives-golang-malware-adoption/
• https://www.intezer.com/blog/research/operation-ElectroRAT-attacker-creates-fake-companies-to-drain-your-crypto-wallets/
• https://blogs.jpcert.or.jp/ja/2018/06/wellmess.html
• https://www.ncsc.gov.uk/news/advisory-apt29-targets-covid-19-vaccine-development
46. Thank you!
For questions / comments:
ntts.nsj-so-info@global.ntt
47. Appendix
48. Fuzzy Hashing (ssdeep)
• ssdeep's main idea: CTPH (Context Triggered Piecewise Hashing)
• Piecewise hashing: the input is partitioned and each piece is hashed separately
• Rolling hash: a hash over a fixed-length window of the data
• When the rolling hash reaches a trigger value, the input is split at that point and the piece is hashed
• The trigger value is calculated from the length of the input data
[Figure: rolling hash and piecewise hashing applied to two similar inputs, "abcdefg indecent IJKL" and "abcdefg inde IJKL"]
The similarity of the inputs is reflected well.
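The CTPH idea can be illustrated with a toy version. This is not ssdeep's algorithm: the rolling hash here is simply the sum of the last 7 bytes, the piece hash is 32-bit FNV-1a, the block size is fixed rather than derived from the input length, and the name `toy_ctph` is hypothetical. It only shows why a local edit changes a few signature characters while the rest stay aligned.

```python
from collections import deque

B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7                  # rolling-hash window length in bytes
FNV_OFFSET = 0x811C9DC5     # 32-bit FNV-1a offset basis
FNV_PRIME = 0x01000193      # 32-bit FNV prime


def toy_ctph(data: bytes, block_size: int = 16) -> str:
    """Toy context-triggered piecewise hash: one character per piece."""
    signature = []
    window = deque(maxlen=WINDOW)   # keeps only the most recent WINDOW bytes
    piece = FNV_OFFSET
    for byte in data:
        # Piece hash: accumulates every byte since the last trigger.
        piece = ((piece ^ byte) * FNV_PRIME) & 0xFFFFFFFF
        # Rolling hash: depends only on the last WINDOW bytes.
        window.append(byte)
        rolling = sum(window)
        if rolling % block_size == block_size - 1:
            # Trigger: close the current piece and start a new one.
            signature.append(B64[piece % 64])
            piece = FNV_OFFSET
    signature.append(B64[piece % 64])  # final (possibly partial) piece
    return "".join(signature)


if __name__ == "__main__":
    base = bytes(range(256)) * 8
    edited = base[:1000] + b"XYZ" + base[1000:]   # small insertion in the middle
    print(toy_ctph(base))
    print(toy_ctph(edited))
```

Because the trigger depends only on the last few bytes, piece boundaries fall back into step shortly after the inserted bytes, so the two signatures share a long prefix and a long suffix.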
49. Combination
• Combination: a selection in which order is not taken into account
• ${}_{n}C_{r} = \frac{n!}{(n-r)!\,r!}$
• Example: taking 2 items out of 6 gives ${}_{6}C_{2} = 15$ combinations (A-B, A-C, ...)
     a    b    c    d    e    f
a   A-A  A-B  A-C  A-D  A-E  A-F
b   B-A  B-B  B-C  B-D  B-E  B-F
c   C-A  C-B  C-C  C-D  C-E  C-F
d   D-A  D-B  D-C  D-D  D-E  D-F
e   E-A  E-B  E-C  E-D  E-E  E-F
f   F-A  F-B  F-C  F-D  F-E  F-F
• A combination of an item with itself (the diagonal) does not exist
• Pairs that differ only in order (such as A-B and B-A) are counted once
50. Hashes
Page Description SHA256
p.24 IPStorm (Linux) 4f0add8eadb24a134b5cab6052920f576eec1bb39232c9548286a66883dcab82
p.24 IPStorm (mac) 087f2ec8bbcee4091241e5ad30d449a1aecd0b9879338d072638c7d0ed6b30da
p.28 veil 00cf1c62ffdd727bfe8514003c6c881a4c53820a010e6e5a4757123165481864
p.28 veil 06b5cb2bc4fb1f08a223fb0bf5cc065adb6074b27a24e063bbc8a5ce7459ff5c
p.30 WellMess (A) bec1981e422c1e01c14511d384a33c9bcc66456c1274bbbac073da825a3f537d
p.30 WellMess (B) 0b8e6a11adaa3df120ec15846bb966d674724b6b92eae34d63b665e0698e0193
p.30 WellMess (C) d7e7182f498440945fc8351f0e82ad2d5844530ebdba39051d2205b730400381
p.30 WellMess (D) 7c39841ba409bce4c2c35437ecf043f22910984325c70b9530edf15d826147ee
p.30 WellMess (E) 8749c1495af4fd73ccfc84b32f56f5e78549d81feefb0c1d1c3475a74345f6a8
p.30 WellMess (F) 5ca4a9f6553fea64ad2c724bf71d0fac2b372f9e7ce2200814c98aac647172fb
p.30 WellMess (G) 4c8671411da91eb5967f408c2a6ff6baf25ff7c40c65ff45ee33b352a711bf9c