Patch management for 3rd-party software can be a significant challenge. The raw data for effective vulnerability management is available in Microsoft’s SCCM (software inventory) and NIST’s NVD (vulnerability database). However, extracting the relevant information from complex, sometimes undocumented data structures poses significant challenges.
The stage is set with a brief overview of the SCCM / NVD data structures as well as a look at a (non-typical but interesting!) production environment. Then we’ll take a quick dive into data wrangling / Machine Learning fundamentals applied to this problem: feature extraction, choice of approach, algorithm selection and tuning.
Once the technical challenges are resolved, the path to “Data Nirvana” can still be strewn with significant non-technical hurdles. We will discuss some practical “been there, done that” examples.
2. Overview
• “Pleased to meet you”
• The Playground
• Challenge #1: Complex Data Structures
• Challenge #2: “Dirty” unstructured data
• Challenge #3: People issues
• Lessons learned + Demo
3. Who I am
• Technical Security Architect at Ubisoft
• Previous: 2 large financial institutions, a major retailer, a world-class telco, service bureaus
• Generalist with a passion for all things “technical security”
4. Disclaimer
“Opinions expressed as well
as the content of this
presentation are the
responsibility of the author.
They do not represent Ubisoft
company policy or views.”
5. The Playground: “Find the panda”
• 10K+ team members
• 26 studios in 18 countries
• Windows-centric
• Creativity Rules!
Where is the vulnerable non-Microsoft software installed?
7. The Great Idea
Microsoft’s SCCM: Reliable production software inventory
NIST’s NVD database: Up-to-date vulnerability data
Effective Patch Management
8. The Great Idea: Why?
• Avoids expen$ive licensing by using free public software
• Vuln data can become a JSON feed into SIEM or DFIR “big
data” mining app
• Do the “impossible” with leading-edge technologies
10. MS’ System Center Configuration Mgr
• “The application people love to hate”
• Indispensable for management of enterprise-scale
Windows-centric environment
• Back-end MS SQL database: 1600+ tables, 6200+ views
• Distributed component design leveraging WMI
• On-premises deployment: complex architecture
11. SCCM Components
• 50+ components!!!
• DLLs running (mostly) as threads, also
some separate services
• Communication:
• In-core queues
• Flat files stored in inboxes / outboxes
12. SCCM and WMI
SMS was the original WMI client
“Everything” is architected using WMI:
• Client-side
• Internal control of agent operations
• Discovery of hardware inventory
• Server-side
• SMS Provider is a WMI provider
• Exposes important database objects as WMI objects
• ConfigMgr Console, SCCM auxiliary applications and tools are
implemented as WMI Mgmt Applications.
13. SCCM Discovery - I
• Populates inventory data in SCCM database
• 6 different methods
• Which are enabled depends on site configuration
• 4 methods target AD
• 1 searches the surrounding network
• 1 interacts with the SCCM client
14. SCCM Discovery - II
• AD Forest Discovery: IP subnets, AD sites
• AD Group discovery: AD groups and memberships
• AD User discovery: User accounts,AD attributes
• AD System discovery: Computer discovery
• Heartbeat discovery:
• Enabled by default, and must remain enabled. Are clients healthy and reachable?
• “creates discovery data records (DDRs) containing information about the client
including network location, NetBIOS name, and operational status.”
• Runs every 7 days by default.
• Network discovery: Searches domains, SNMP services, DHCP servers.
Disabled by default.
16. SCCM Discovery - IV
“Make friends with your
SCCM administrator”
• Methods enabled?
• Polling interval?
17. SCCM Data – “Getting to know you”
“Hands-on” Exploring
• MS Sql Studio
Use AD to augment host inventory data
• E.g. OU in Distinguished Name
“Google is Your Friend”
• Also the Safari Technical Library
18. SCCM Data - I
Use Views, not Tables
• More stable interface
• Better documentation
• Permissions already in place
• Performance – avoid locking tables
• MS has done the “heavy lifting” e.g.
joins, stored procedure definitions
• More Community experience
• This is what MS MVPs say to do
Query SQL, not WMI
• More direct, simpler, better performance
19. SCCM Data II – WMI Underpinnings
• WMI Class Name “SMS_xxx” → SQL View Name “v_xxx”
• WMI Property Names → Column names in SQL Views
• View names > 30 chars are truncated
• Column names have “0” appended to avoid conflicts with
SQL reserved words
20. SCCM Data III – View types
• Inventory data:
• Current: v_GS_< group name >
• History: v_HS_< group name >
• Discovery data:
• WMI scalar properties: v_R_< resource type name >
• WMI array properties: v_RA_< architecture name >_< group name >
21. SCCM Data III – View types
v_SchemaViews lists and categorizes ConfigMgr views
22. SCCM Data IV – Inventory groups / views
• v_GroupMap view lists inventory groups and views
• Each one represents a WMI class configured for
inventory collection in client agent settings
DisplayName       InvClassName               InvHistoryClassName        MIFClass
System            v_GS_System                v_HS_System                SYSTEM
Add Remove Pgms   v_GS_ADD_REMOVE_PROGRAMS   v_HS_ADD_REMOVE_PROGRAMS   MICROSOFT|ADD_REMOVE_PROGRAMS|1.0
Memory            v_GS_X86_PC_MEMORY         v_HS_X86_PC_MEMORY         MICROSOFT|X86_PC_MEMORY|1.0
23. SCCM Data V - Collections
• A Collection is “a logical
group of resources in
ConfigMgr”
• v_Collection view:
Collection meta-data
• “All…” columns –
system-wide collections
Name                    Members
All Systems             25106
All Users               22903
All Unknown Computers   8
All Windows Clients     20630
All Windows Servers     3610
24. SCCM Data VI – Which view to use?
• v_R_System
• From AD / Network / Heartbeat Discovery
• Resource_ID
• NetBIOS name, OS, AD domain
• 60+ fields
• v_GS_System
• Updated when Hardware Inventory runs
• Less accurate – host must have an active agent and be scheduled for
hardware inventory
• 10 fields
25. SCCM Data : TL;DR
In most production contexts, the relevant views are:
• v_R_System
• Host / user data
• v_GS_ADD_REMOVE_PROGRAMS
• v_GS_ADD_REMOVE_PROGRAMS_64
• Updated when Hardware Inventory runs
• Installed software registry data
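The join between host data and installed-software data can be sketched as follows. This is illustrative only: the real queries run against the SCCM site database on MS SQL, while an in-memory SQLite DB stands in here, with a handful of assumed column names (ResourceID, Netbios_Name0, DisplayName0, Publisher0, Version0) drawn from the views above.

```python
# Sketch: join v_R_System with v_GS_ADD_REMOVE_PROGRAMS.
# SQLite stands in for the SCCM MS SQL back end; column names are
# based on the views discussed above, trimmed for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE v_R_System (ResourceID INTEGER, Netbios_Name0 TEXT)")
cur.execute("""CREATE TABLE v_GS_ADD_REMOVE_PROGRAMS
               (ResourceID INTEGER, DisplayName0 TEXT,
                Publisher0 TEXT, Version0 TEXT)""")
cur.execute("INSERT INTO v_R_System VALUES (1, 'WKS-001')")
cur.execute("""INSERT INTO v_GS_ADD_REMOVE_PROGRAMS
               VALUES (1, 'Wireshark 1.12.6', 'Wireshark', '1.12.6')""")

# Host inventory joined to installed-software registry data,
# discarding Microsoft entries up front ("3rd-party software only").
rows = cur.execute("""
    SELECT s.Netbios_Name0, a.DisplayName0, a.Publisher0, a.Version0
    FROM v_R_System s
    JOIN v_GS_ADD_REMOVE_PROGRAMS a ON a.ResourceID = s.ResourceID
    WHERE a.Publisher0 NOT LIKE 'Microsoft%'
""").fetchall()
print(rows)
```

In production the same statement (plus the _64 view) would run through a regular MS SQL client; the shape of the join is the point here.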
26. NIST Data
• Two main NIST data sets:
• CPE: Vendor / product dictionary
• CVE: List of vulnerabilities by year
• Formalized, structured format (== XML)
27. NIST’s CPE
CPE == “Common Platform Enumeration”
“Common Platform Enumeration (CPE) is a standardized method of
describing and identifying classes of applications, operating systems, and
hardware devices present among an enterprise's computing assets.”
A master list of all vendors and all their products.
40. NIST’s NVD
“The National Vulnerability Database is the U.S. government
repository of standards-based vulnerability management data
…This data enables automation of vulnerability management,
security measurement, and compliance.” (Wikipedia)
41. NIST NVD Components
A typical NIST NVD entry has the following components:
Component   Name                                    Description
CVE         Common Vulnerabilities and Exposures    The basic vulnerability listing, including CPE vendor / product
CVSS        Common Vulnerability Scoring System     Standardized vulnerability impact
CWE         Common Weakness Enumeration             Augmented, standardized description of the vulnerability
51. NIST NVD Feeds
NVD CVE data available as a daily Feed:
• XML or (new) JSON format
• Compressed gzip or zip archive
• Delta file or full download by year
• Meta file with file sizes / SHA256 hash to determine if feed file has
changed
https://nvd.nist.gov/vuln/data-feeds
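Pulling the fields we need out of one feed entry looks roughly like this. The dict below is a hand-trimmed stand-in shaped like an NVD JSON feed item (the real entries are far richer), so treat the exact nesting as an assumption to check against the live feed schema.

```python
# Sketch: extract CVE id, CPE vendor and CVSS base score from one
# NVD-feed-shaped entry. The structure below is a simplified stand-in
# for the real JSON feed format, not a complete schema.
import json

entry_json = """
{
  "cve": {"CVE_data_meta": {"ID": "CVE-2016-0000"},
          "affects": {"vendor": {"vendor_data": [
              {"vendor_name": "wireshark",
               "product": {"product_data": [{"product_name": "wireshark"}]}}]}}},
  "impact": {"baseMetricV2": {"cvssV2": {"baseScore": 7.5}}}
}
"""
entry = json.loads(entry_json)

cve_id = entry["cve"]["CVE_data_meta"]["ID"]
vendor = entry["cve"]["affects"]["vendor"]["vendor_data"][0]["vendor_name"]
score = entry["impact"]["baseMetricV2"]["cvssV2"]["baseScore"]
print(cve_id, vendor, score)
```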
52. NIST Data : TL;DR
• CPE: Vendor / product dictionary
• CVE: List of vulnerabilities by year
• CVSS: Vuln impact (contained in CVE)
• XML standardized format
• Daily feeds available
53. Complex Data: The solution
The challenge:
• How to extract the unstructured vendor registry data from SCCM?
• How to match this data with the NIST vulnerability data?
The solution:
• Wise choice of Tools
• “Divide and conquer”
54. Make Good Technology choices
python: Good “data science” language
• fuzzywuzzy: Fuzzy matching
• xmltodict: XML parsing
pandas: Data will fit in computer memory. Great python-based data analysis tool.
scikit-learn: Reliable Artificial Intelligence / Machine Learning
algorithms
Docker: Move “skunkworks” project around as required
ansible: Automate provisioning
55. Basic Approach
• Keep it native
• Use Windows to talk to Windows (AD, SCCM)
• Use Linux for Docker / python / pandas / scikit-learn
• Keep it simple
• 3rd-party software only, not Microsoft
• “Divide and conquer”
• Match vendors first
• Then match products for a given vendor
56. Basic Approach cont’d
Use Machine Learning
• Treat this as two separate classification problems.
• Manually label data (especially vendors) since data sets are
small
• Extract features from data using fuzzy matching
57. Sample Vendor Data – Potential Matches
SCCM                          CPE
The GnuPG Project             gnupg
DigitalVolcano Software Ltd   digitalvolcano
NETGEAR Powerline             netgear
MIT Media Lab                 mit
Cisco Systems, Inc.           cisco
DameWare Development, LLC.    dameware
Bump Technologies, Inc.       bump_project
Open Source                   open_source_development_team
58. Sample Vendor Data – SCCM Vendor names
Will the real vendor please stand up?
Cisco                         Oracle
Cisco Consumer Products LLC   Oracle
Cisco Systems                 Oracle and/or its affiliates
Cisco Systems, Inc            Oracle Corporation
Cisco Systems, Inc.           Oracle Corporation.
Cisco WebEx LLC               Oracle USA
                              Oracle, Inc.
59. ML – Feature Extraction
ML Classification Algorithm needs data “features”
Basic approach:
• Tokenization
• Stop words
• Fuzzy matching statistics
• String length
60. ML – Tokenization
• Convert name string into a set of tokens:
• Shift to lower case
• Split string into tokens using separators: _ . , ( ) + !
• Remove “Stop” words
• Tokens that appear often e.g. “Ltd.” “Inc.” “Project” “Software”
• Add little “value” in determining whether there is a match
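A minimal tokenizer along those lines might look like this; the stop-word list and separator set here are illustrative guesses, not the exact production ones.

```python
# Sketch of the tokenization step: lower-case, split on separators,
# drop stop words. Stop-word list is an illustrative subset.
import re

STOP_WORDS = {"ltd", "inc", "llc", "corporation", "project", "software", "systems"}

def tokenize(name):
    tokens = re.split(r"[\s_.,()+!]+", name.lower())
    return [t for t in tokens if t and t not in STOP_WORDS]

print(tokenize("Cisco Systems, Inc."))        # the distinctive token survives
print(tokenize("DameWare Development, LLC."))
```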
61. ML – Fuzzy Matching I
Levenshtein or “edit” distance:
“The Levenshtein distance between two words is the minimum number
of single-character edits (insertions, deletions or substitutions) required
to change one word into the other.” (Wikipedia)
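The definition translates directly into a small dynamic-programming routine (fuzzywuzzy uses an optimized C implementation of the same idea):

```python
# Straightforward DP implementation of the edit distance defined above.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))   # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # → 3
```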
62. ML – Fuzzy Matching II
python FuzzyWuzzy package
https://github.com/seatgeek/fuzzywuzzy
Method             1st string                 2nd string                 Ratio
Simple Ratio       "this is a test"           "this is a test!"          97
Partial Ratio      "this is a test"           "this is a test!"          100
Token Sort Ratio   "fuzzy wuzzy was a bear"   "wuzzy fuzzy was a bear"   100
Token Set Ratio    "fuzzy was a bear"         "fuzzy fuzzy was a bear"   100
63. ML – Feature Extraction
To extract data “features”:
• Use the fuzzywuzzy pkg to calculate match ratios
• Also use string length
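A per-candidate-pair feature vector might be assembled like this. The talk uses fuzzywuzzy's ratios; difflib's SequenceMatcher is substituted below to stay dependency-free, and the exact feature set is an assumption.

```python
# Sketch: build a numeric feature vector for one (SCCM name, CPE name)
# pair. SequenceMatcher.ratio() stands in for fuzzywuzzy's match ratios;
# string lengths are included per the slide above.
from difflib import SequenceMatcher

def features(sccm_name, cpe_name):
    a, b = sccm_name.lower(), cpe_name.lower()
    ratio = SequenceMatcher(None, a, b).ratio()
    token_sort = SequenceMatcher(None,
                                 " ".join(sorted(a.split())),
                                 " ".join(sorted(b.split()))).ratio()
    return [ratio, token_sort, len(a), len(b), abs(len(a) - len(b))]

print(features("Cisco Systems, Inc.", "cisco"))
```

Each vector then becomes one row of the matrix handed to the classifier.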
64. ML – Label the input data sets
Observations:
• Accurately matching vendor data is crucial
• Data set size is small: ~10K vendors
Approach:
• Manually label data taking care to target important vendors
• Use the manually labelled data to train the ML algorithm
• Use ML-classified data + labelled data for final match processing!!
66. ML – Algorithm Selection II
Use simple K-Folds cross-validation
• Split labelled data into k consecutive folds
• Each fold is used once for validation while remaining k – 1 folds
form the training set
• Repeat for each algorithm being tested
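The split itself is simple enough to write by hand (in practice scikit-learn's KFold does this for you); a sketch of the consecutive-folds logic:

```python
# Hand-rolled k-fold split: k consecutive folds, each used once for
# validation while the remaining k-1 folds form the training set.
def k_folds(n_samples, k):
    indices = list(range(n_samples))
    fold_size, extra = divmod(n_samples, k)
    folds, start = [], 0
    for f in range(k):
        size = fold_size + (1 if f < extra else 0)   # spread the remainder
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, val))
        start += size
    return folds

for train, val in k_folds(10, 3):
    print(len(train), len(val))
```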
67. ML – Algorithm Selection III
Random Forest Classifier was the best.
• “Forest” of decision trees
• Diverse set of classifiers built by introducing randomness in
classifier construction
• Prediction of the ensemble is the averaged prediction of
the individual classifiers.
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
68. ML – Algorithm Tuning
“This algorithm has many parameters. How to tune for
maximum accuracy?”
Use Randomized Grid Search with Cross-Validation
• Define initial parameter bounds / possible values
• Randomized search over the parameter space
• Use cross-validation to evaluate estimator accuracy
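With scikit-learn the tuning step can be sketched as below, assuming the labelled feature vectors are already built (synthetic data stands in here, and the parameter bounds are illustrative, not the ones used in production):

```python
# Sketch: Randomized Grid Search with Cross-Validation over a
# Random Forest Classifier. Data and parameter bounds are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

param_dist = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}
search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                            param_dist, n_iter=5, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```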
69. ML – Software match sample results
Just how good is the matching?
CPE SCCM DisplayName0
cpe:/a:wireshark:wireshark:1.4.3 Wireshark 1.4.3
cpe:/a:videolan:vlc_media_player:1.1.6 VLC media player 1.1.6
cpe:/a:hp:headless_server_registry_update:1.0.0.0 Headless Server Registry Update
cpe:/a:hp:insight_management_agents:8.70.0.0 HP Insight Management Agents
cpe:/a:wireshark:wireshark:1.12.6 Wireshark 1.12.6 (64-bit)
cpe:/a:adobe:indesign_cs4_common_base_files:6.0 Adobe InDesign CS4 Application Feature Set Fil..
cpe:/a:hp:smart_web_printing:4.60 HP Smart Web Printing 4.60
cpe:/a:mozilla:firefox:45.0.1 Mozilla Firefox 45.0.1 (x64 en-US)
cpe:/a:watchguard:watchguard_system_manager:- WatchGuard System Manager 11.5.1
70. Complex Data : TL;DR
• Choose powerful technology: python / pandas / scikit-learn
• Split into 2 separate simple classification problems
• K-Folds Cross-validation picked Random Forest Classifier
• Randomized Grid Search with Cross-validation to tune
72. Then Everything Blew Up!
Discovery: Real-life production data is full of anomalies!
• AD
• 80K extraneous hosts
• SCCM
• Did not manage “everything”
• Some hosts were “missing in action” e.g. laptops
• CPE
• Vendor product naming / versioning varied wildly from vendor to
vendor
• Vendor buyouts / mergers impacted product naming e.g. Java
• Foreign language data / Unicode
73. “Dirty” data solutions I
• Spend hands-on time with the data
• Manual labelling → several code rewrites
• Use Defensive Coding
• Validate all input
• Use python try / except blocks
• Handle Missing data
• The “bane” of pandas
• Either discard or initialize to a known value
74. “Dirty” Data solutions - II
Discard extraneous data as quickly as possible, e.g.:
• Microsoft software data
• Deprecated NVD data
• Unmanaged SCCM hosts
• CVE listings for hardware / OS vulnerabilities
75. “Dirty” Data Solutions - III
Use heuristics to speed up matching
• Vendor:
• Ignore CPE vendors that are 1-2 characters long
• 1st word of the CPE Vendor string has to be in the tokenized WMI SCCM
Publisher0 string somewhere
• The condensed CPE name has to be shorter than the full WMI
“Publisher0”
• Products:
• Release #’s should at least partially match
• At least one word in the CPE product name should be found in the
SCCM equivalent
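The vendor-side heuristics above translate into a quick pre-filter like this; the function name and thresholds mirror the bullets, but the exact production rules are assumptions.

```python
# Sketch of the vendor pre-filter heuristics described above.
def plausible_vendor_match(cpe_vendor, sccm_publisher_tokens, sccm_publisher):
    if len(cpe_vendor) <= 2:                       # ignore 1-2 char CPE vendors
        return False
    first_word = cpe_vendor.split("_")[0]
    if first_word not in sccm_publisher_tokens:    # 1st CPE word must appear
        return False
    # condensed CPE name must be shorter than the full Publisher0 string
    return len(cpe_vendor) < len(sccm_publisher)

print(plausible_vendor_match("cisco", ["cisco", "systems"], "Cisco Systems, Inc."))
print(plausible_vendor_match("hp", ["hp"], "HP"))
```

Rejecting obvious non-matches before the fuzzy-ratio computations keeps the candidate-pair space, and hence runtime, manageable.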
76. “Dirty” Data Solutions – IV
When all else fails, develop code for the “problem” data
e.g. Java product versioning
77. “Dirty” Data Solutions: TL;DR
• Get “intimate” with the data
• “Shields up”: validate, “try”
• “Shoot from the hip”: Kill the “missing” data before it gets
you
• “Take out the garbage” (data)
• Cheat if you have to: Heuristics
• “Plan B”: code around obstacles
79. Present the idea to Ops to get support
• Took my “great idea”
to the SCCM
Production Ops team
• They were kind enough
to meet with me.
• On-site meeting with
SCCM architect on
conference call.
80. Production Ops reaction: Oops! Disaster!
• Talked “technology”
instead of presenting
from Ops viewpoint
• SCCM architect
• “The” key player
• 6 time zones away, end of
his day
• Local meeting was not in
his native language
• The “man in the wall”
81. Blessed by the King! (… Sort of)
• VP came to town
• Heard the prez
• Wanted “his” dashboard:
• For “yesterday”
• Budget: $0 / 0 hr
82. Ops reaction: We are Worried!!!!
• Ops people rapidly
became concerned
about visibility of VP
Dashboard
• Started making noises
about “SCCM DB
Performance”
• Totally understandable
reaction
83. Ops Proposition: “Take our nice siding here”
• Instead of direct
production access, use
a secondary non-prod
DB employed for
reporting / query
• Turned out that this
DB underwent
arbitrary “black box”
ETL of SCCM data
depending on Ops
reporting needs and
visibility requirements!
84. “People” Solutions I
• Operate in “pirate” mode: Budget of 0 hr $0 means:
• Run under the radar
• Be focused and efficient – refactor prototype code into prod-ready batch
classes
• Be flexible, be creative:
• Docker-based project bounced from Ubuntu to Windows to CentOS to
save $
• Run on lab PCs, on scrapped PCs, on laptops, anything that is available
• Make deals
• “Sell your grandmother to the highest bidder” to get that precious direct
production access
85. “People” Solutions II
• Deliver quietly, slowly, and “down-sell” to ease viz concerns
• “Uh, Mr. VP, your dashboard is not quite ready yet …”
• “This is a new app and new technology. Data reliability is still to be
proven …”
• Provide targeted Ops training
• Help the “dump truck” people understand the new-fangled
“airplane” paradigm
• Give Ops control and help them find ways to leverage the new
technology
86. Lessons learned
“What I didn’t do but should have”
• Data wrangling requires time and effort to do well
• Set management and user expectations at the outset
• Think “big”, think “production” to start
• “Take baby steps”: Always runnable continuous
development
• Write test cases before writing code
• Write code in small reusable modules with clean
interfaces
• Document and delegate
87. “People” Solutions: TL;DR
• Operate in “pirate” mode
• Be flexible, be creative
• When necessary:
• “Sell your grandmother to the highest bidder”
• Deliver quietly, slowly, and “down-sell”
• Provide targeted Ops training
• “Lessons learned”