Rsyslog log normalization

  1. Log Message Processing, Formatting and Normalizing with Rsyslog. Rainer Gerhards
  2. Rainer Gerhards, http://blog.gerhards.net What's in this talk? • Some Logging Basics • A practical Usage Scenario • Logging APIs • Background information on rsyslog processing
  3. Why Logging? • Troubleshooting • Security Alerting (e.g. SIEM) • Legal Requirements (e.g. banks) • Evidence in Court • Billing (e.g. Telecom Industry)
  4. Logging is simple, isn't it? • Just generate a log record when something interesting happens • BUT ▫ What is “interesting”? ▫ What is required to describe the event? ▫ How do we know what the actual data item means? ▫ What does a log record look like? • So... making sense of logs, especially in a heterogeneous environment, is far from simple...
  5. The Logging Dilemma • There is no universally accepted format • Logs that look very much the same describe different events • The same event is described in very different-looking log records • Often, pseudo-free-form text is used • For consumers, it is very hard to digest even a decent subset of important logging formats
  6. It's a real-world problem! One day in my mailbox... “I am working with a customer who is deploying a large rsyslog environment for central logging. Basically they want a cluster of boxes to act as the "log of record". They would also like to have the logs fed to a couple security products for analysis. The customer has a limited budget so having each vendor write parsers is cost prohibitive.”
  7. Log Producers & Consumers [Diagram: producers (Linux Boxes, Windows, Other *nix, Firewalls, Apps) feed into "?", which feeds the consumers (Security Analyzer I, Log Storage, Security Analyzer n, Capacity Planning, Billing)]
  8. Some important log sources • Free-form text formats ▫ Traditional syslog messages ▫ Application text log files • Structured formats ▫ Windows Event Log ▫ Linux Journal (today mostly text messages) ▫ Application text log files (XML, CSV, WELF, Apache CLF, whatever) ▫ SNMP traps ▫ New-style syslog
  9. How to solve that dilemma? • Several efforts try very hard to solve this ▫ For many years ▫ With limited success • Resulted in an approach named “Common Event Expression” (CEE) ▫ Cross-vendor team (both OSS & commercial) ▫ Driven by US MITRE ▫ Builds on existing infrastructure
  10. (no text on this slide)
  11. CEE's core ideas • Keep it simple & extensible • Support existing technology • As far as the format is concerned ▫ name/value pairs ▫ Keep the structure as flat as possible, but permit some hierarchy ▫ Keep dictionaries of field names, syntax and semantics ▫ Profiles specify what needs to be present in specific event types
  12. Project Lumberjack • Born at last year's Fedora DevConf, right here! • Intends to ▫ Build on CEE and drive the ideas further ▫ Provide an open source implementation of core functionality ▫ Deliver something that actually works • Driven by logging professionals from Red Hat, Balabit (syslog-ng) and Adiscon (rsyslog), open to everyone else
  13. What did we do the past year? • Agreed on the log format • Made rsyslog fully lumberjack-aware • Made Adiscon's Windows products fully lumberjack-aware • Made syslog-ng fully lumberjack-aware • Created a new syslog API --> libumberlog
  14. Back to my mailbox... “I am working with a customer who is deploying a large rsyslog environment for central logging. Basically they want a cluster of boxes to act as the "log of record". They would also like to have the logs fed to a couple security products for analysis. The customer has a limited budget so having each vendor write parsers is cost prohibitive. A commonality for each of the additional destinations is the ability to ingest logs in <some common format>. I believe rsyslog has the capability to alter the output...”
  15. Rsyslog as converter [Diagram: producers (Linux Boxes, Windows, Other *nix, Firewalls, Apps) feed into rsyslogd, which feeds the consumers (Security Analyzer I, Log Storage, Security Analyzer n, Capacity Planning, Billing)]
  16. Some rsyslog basics • Ruleset ▫ Like a function in a programming language ▫ Consists of (conditional) statements and actions ▫ Can be called from another ruleset or bound to a listener • Variables ▫ Message Variables (e.g. $msg, $rawmsg) ▫ System Variables (e.g. $$now) ▫ Structured Variables: form a tree-like structure, e.g. $!usr!somevar
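
The tree-like structured variables can be pictured as nested dictionaries. The following is only a rough Python sketch of that idea, not rsyslog code: `set_var` and `get_var` are hypothetical helper names, and returning an empty string for an unset variable is an assumption made here for illustration.

```python
def _keys(path):
    """Split a path such as '$!usr!type' into ['usr', 'type']."""
    if path.startswith("$!"):
        path = path[2:]
    return [key for key in path.split("!") if key]


def set_var(tree, path, value):
    """Store value at the given '!'-separated path, creating branches."""
    keys = _keys(path)
    node = tree
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def get_var(tree, path):
    """Return the value or whole subtree at path.

    Assumption for this sketch: a missing variable yields ''.
    """
    node = tree
    for key in _keys(path):
        if not isinstance(node, dict) or key not in node:
            return ""
        node = node[key]
    return node


variables = {}
set_var(variables, "$!usr!type", "logon")
set_var(variables, "$!usr!user", "root")
print(get_var(variables, "$!usr"))
```

Reading a whole branch such as `$!usr` returns the complete subtree, which is the property the output stage of this talk relies on.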
  17. Let's look at a practical case • Goal: Unified log files with logon/logoff report ▫ For processing by backend tools (not shown) ▫ Concentrate on just four fields: host system, reception time, username, logon/logoff status • Inputs ▫ Linux: traditional text log messages ▫ Windows: different agents • Output ▫ Lumberjack JSON style ▫ CSV
  18. Have rsyslog gather the data
        module(load="imtcp")
        /* We assume all logging arrives via TCP (for simplicity).
         * Note that we use different ports to point different sources
         * to the right rule sets for normalization. While there are
         * other methods (e.g. based on tag or source), using multiple
         * ports is both the easiest as well as the fastest.
         */
        input(type="imtcp" port="13514" Ruleset="WindowsRsyslog")
        input(type="imtcp" port="13515" Ruleset="LinuxPlainText")
        input(type="imtcp" port="13516" Ruleset="WindowsSnare")
  19. The Linux Input Data sample • Free-text format
        Jan 16 09:28:33 rger-virtual-machine sudo: pam_unix(sudo:session): session opened for user root by rger(uid=1000)
        Jan 16 09:28:33 rger-virtual-machine sudo: pam_unix(sudo:session): session closed for user root
        Jan 24 02:38:49 rger-virtual-machine sshd[2414]: pam_unix(sshd:session): session opened for user rger by (uid=0)
        Jan 24 02:41:22 rger-virtual-machine sshd[2414]: pam_unix(sshd:session): session closed for user rger
  20. Parsing Free-Text Messages: mmnormalize • Uses a “sample rule base” ▫ One sample for each expected message type ▫ Sample contains text (for matching) and property descriptions (like IPv4 address, char-matches, …) ▫ If a sample matches, the corresponding properties are extracted ▫ Special parser for iptables • Also implemented as an action • Very fast algorithm (much faster than regex) • Based on liblognorm (which you can use in your own programs to gain this functionality!)
  21. Needs to be normalized • Job for rsyslog's mmnormalize • rulebase:
        # SSH and sudo logins
        prefix=%rcvdat:date-rfc3164% %rcvdfrom:word%
        rule=: sshd[%-:number%]: pam_unix(sshd:session): session %type:word% for user %user:word% by (uid=%-:number%)
        rule=: sshd[%-:number%]: pam_unix(sshd:session): session %type:word% for user %user:word%
        rule=: sudo: pam_unix(sudo:session): session %type:word% for user root by %user:char-to:(%(uid=%-:number%)
        rule=: sudo: pam_unix(sudo:session): session %type:word% for user %user:word%
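
To make the idea of sample-based extraction concrete, here is a small Python sketch that turns the samples from the rulebase into regular expressions. This is explicitly not how liblognorm works (the slides stress that its algorithm is much faster than regex); regex is swapped in only because it is short to show. The helper names, the simplified field types, and the assumption that the text after `rule=:` (including its leading space) is appended to the prefix are all illustrative.

```python
import re

# Matches %name:type% or %name:type:extra% field descriptors.
FIELD = re.compile(r"%([^:%]+):([^:%]+)(?::([^%]*))?%")


def field_regex(ftype, extra):
    """Return a simplified regex for a field type used in this rulebase."""
    if ftype == "word":
        return r"\S+"
    if ftype == "number":
        return r"\d+"
    if ftype == "date-rfc3164":
        return r"[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}"
    if ftype == "char-to":
        return "[^" + re.escape(extra) + "]+"
    raise ValueError("unsupported field type: " + ftype)


def compile_sample(sample):
    """Translate one sample into a compiled regex with named groups."""
    parts, pos = [], 0
    for match in FIELD.finditer(sample):
        parts.append(re.escape(sample[pos:match.start()]))
        name, ftype, extra = match.groups()
        pattern = field_regex(ftype, extra)
        # Fields named '-' are matched but not extracted.
        parts.append(pattern if name == "-" else "(?P<%s>%s)" % (name, pattern))
        pos = match.end()
    parts.append(re.escape(sample[pos:]))
    return re.compile("".join(parts))


PREFIX = "%rcvdat:date-rfc3164% %rcvdfrom:word%"
SAMPLES = [
    " sshd[%-:number%]: pam_unix(sshd:session): session %type:word% for user %user:word% by (uid=%-:number%)",
    " sshd[%-:number%]: pam_unix(sshd:session): session %type:word% for user %user:word%",
    " sudo: pam_unix(sudo:session): session %type:word% for user root by %user:char-to:(%(uid=%-:number%)",
    " sudo: pam_unix(sudo:session): session %type:word% for user %user:word%",
]
COMPILED = [compile_sample(PREFIX + sample) for sample in SAMPLES]


def normalize(line):
    """Return the extracted fields of the first matching sample, else None."""
    for regex in COMPILED:
        match = regex.fullmatch(line)
        if match:
            return match.groupdict()
    return None


print(normalize(
    "Jan 24 02:38:49 rger-virtual-machine sshd[2414]: "
    "pam_unix(sshd:session): session opened for user rger by (uid=0)"
))
```

The sketch tries the samples in order and requires a full-line match; liblognorm's actual matching strategy may resolve overlapping samples differently.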
  22. Putting it all together:
        /* plain Linux log messages (here: ssh and sudo) need to be
         * parsed - we use mmnormalize for fast and efficient parsing
         * here.
         */
        ruleset(name="LinuxPlainText") {
            action(type="mmnormalize"
                   rulebase="/home/rger/proj/rsyslog/linux.rb" userawmsg="on")
            if $parsesuccess == "OK" and $!user != "" then {
                if $!type == "opened" then
                    set $!usr!type = "logon";
                else if $!type == "closed" then
                    set $!usr!type = "logoff";
                set $!usr!rcvdfrom = $!rcvdfrom;
                set $!usr!rcvdat = $!rcvdat;
                set $!usr!user = $!user;
                call outwriter
            }
        }
  23. Windows Horrors: SNARE • Tab-delimited mess:
        <131>Feb 10 15:48:12 Win2008StdR2x64_vm MSWinEventLog#0111#011Security#0114#011Tue Feb 05 16:39:27 2013#0114624#011Microsoft-Windows-Security-Auditing#011WIN2008STDR2X64\Administrator#011N/A#011Success Audit#011Win2008StdR2x64_vm#011Anmelden#011#011Ein Konto wurde erfolgreich angemeldet. Antragsteller: Sicherheits-ID: S-1-5-18 Kontoname: WIN2008STDR2X64$ Kontodomäne: WORKGROUP Anmelde-ID: 0x3e7 Anmeldetyp: 2 Neue Anmeldung: Sicherheits-ID: S-1-5-21-3148105976-3029560809-1855765213-500 Kontoname: Administrator Kontodomäne: WIN2008STDR2X64 Anmelde-ID: 0x1d1feb Anmelde-GUID: {00000000-0000-0000-0000-000000000000} Prozessinformationen: Prozess-ID: 0xc40 Prozessname: C:\Windows\System32\winlogon.exe Netzwerkinformationen: Arbeitsstationsname: WIN2008STDR2X64 Quellnetzwerkadresse: 127.0.0.1 Quellport: 0 Detaillierte Authentifizierungsinformationen: Anmeldeprozess: User32 Authentifizierungspaket: Negotiate Übertragene Dienste: - Paketname (nur NTLM): - Schlüssellänge: 0 Dieses Ereignis wird beim Erstellen einer Anmeldesitzung generiert. Es wird auf dem Computer
  24. Anyhow... digest by position:
        ruleset(name="WindowsSnare") {
            set $!usr!type = field($rawmsg, "#011", 6);
            if $!usr!type == 4634 then {
                set $!usr!type = "logoff";
                set $!doProces = 1;
            } else if $!usr!type == 4624 then {
                set $!usr!type = "logon";
                set $!doProces = 1;
            } else
                set $!doProces = 0;
            if $!doProces == 1 then {
                set $!usr!rcvdfrom = field($rawmsg, 32, 4);
                set $!usr!rcvdat = field($rawmsg, "#011", 5);
                /* we need to fix up the snare date */
                set $!usr!rcvdat = field($!usr!rcvdat, 32, 2) & " " &
                                   field($!usr!rcvdat, 32, 3) & " " &
                                   field($!usr!rcvdat, 32, 4);
                set $!usr!user = field($rawmsg, "#011", 8);
                call outwriter
            }
        }
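
The positional extraction above can be traced step by step with a small Python emulation. This is a sketch for following the indices, not rsyslog's implementation: the `field` helper below only mimics the 1-based "n-th field after splitting on a delimiter" behaviour used on the slide, returning `None` for a missing field is a choice made here, and `RAW` is a shortened copy of the SNARE sample.

```python
def field(text, delim, n):
    """Return the n-th (1-based) field of text split on delim.

    delim is either a string or an integer character code (32 = space).
    A missing field yields None in this sketch.
    """
    separator = chr(delim) if isinstance(delim, int) else delim
    parts = text.split(separator)
    if n < 1 or n > len(parts):
        return None
    return parts[n - 1]


# Shortened SNARE message; "#011" is the escaped tab used as delimiter.
RAW = (
    "<131>Feb 10 15:48:12 Win2008StdR2x64_vm MSWinEventLog#0111#011Security"
    "#0114#011Tue Feb 05 16:39:27 2013#0114624"
    "#011Microsoft-Windows-Security-Auditing"
    "#011WIN2008STDR2X64\\Administrator#011N/A#011Success Audit"
)

EVENT_TYPES = {"4624": "logon", "4634": "logoff"}


def snare_to_usr(raw):
    """Build the four normalized fields, or None for other event IDs."""
    event_id = field(raw, "#011", 6)
    if event_id not in EVENT_TYPES:
        return None
    snare_date = field(raw, "#011", 5)  # e.g. "Tue Feb 05 16:39:27 2013"
    # Drop weekday and year so the date looks like the syslog timestamps.
    rcvdat = " ".join(field(snare_date, 32, i) for i in (2, 3, 4))
    return {
        "type": EVENT_TYPES[event_id],
        "rcvdfrom": field(raw, 32, 4),
        "rcvdat": rcvdat,
        "user": field(raw, "#011", 8),
    }


print(snare_to_usr(RAW))
```

Counting fields by hand against the sample shows why the ruleset uses positions 5, 6 and 8: the syslog header and `MSWinEventLog` together form field 1.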
  25. Windows: rsyslog Agent • Native Lumberjack format with Windows field names • A structured mess ;-)
        <133>Feb 05 11:15:56 win7fr.intern.adiscon.com EvntSLog: @cee: {"source": "win7fr.intern.adiscon.com", "nteventlogtype": "Security", "sourceproc": "Microsoft-Windows-Security-Auditing", "id": "4634", "categoryid": "12545", "category": "12545", "keywordid": "0x8020000000000000", "user": "NA", "TargetUserSid": "S-1-5-21-803433813-209592097-1264475144-8733", "TargetUserName": "fr", "TargetDomainName": "ADISCON", "TargetLogonId": "0xb8c7aed", "LogonType": "7", "catname": "Logoff", "keyword": "Audit Success", "level": "Information", "msg": "An account was logged off.\r\n\r\nSubject:\r\n\tSecurity ID:\t\tS-1-5-21-803433813-209592097-1264475144-8733\r\n\tAccount Name:\t\tfr\r\n\tAccount Domain:\t\tADISCON\r\n\tLogon ID:\t\t0xb8c7aed\r\n\r\nLogon Type:\t\t\t7\r\n\r\nThis event is generated when a logon session is destroyed. It may be positively correlated with a logon event using the Logon ID value. Logon IDs are only unique between reboots on the same computer."}
  26. Parsing Lumberjack Data: mmjsonparse • Checks if message contains Lumberjack structured data ▫ If so: parse out fields and use field names directly from the message ▫ If not: populate Lumberjack msg field • Implemented via action interface ▫ Can be called based on rules, thus only for specific events
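
The check-then-parse behaviour described for mmjsonparse can be sketched in a few lines of Python. This only illustrates the decision logic from the slide; `parse_lumberjack` is a hypothetical name, the `@cee:` marker is taken from the sample message, and details such as whitespace handling are assumptions.

```python
import json

COOKIE = "@cee:"  # marker seen in front of the JSON in the sample message


def parse_lumberjack(msg):
    """Return (fields, ok) for the message part of a log record.

    If msg carries the cookie followed by a JSON object, the object's
    keys become the field names. Otherwise the text is kept in a
    single 'msg' field.
    """
    text = msg.lstrip()
    if text.startswith(COOKIE):
        try:
            fields = json.loads(text[len(COOKIE):])
        except json.JSONDecodeError:
            fields = None
        if isinstance(fields, dict):
            return fields, True
    return {"msg": msg}, False


fields, ok = parse_lumberjack(' @cee: {"id": "4634", "TargetUserName": "fr"}')
print(ok, fields)
```

The boolean plays the role that `$parsesuccess` plays in the rulesets of this talk: later rules can branch on whether structured data was found.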
  27. Reading the Lumberjack Data:
        /* the rsyslog Windows Agent uses native Lumberjack format
         * (better said: is configured to use it)
         */
        ruleset(name="WindowsRsyslog") {
            action(type="mmjsonparse")
            if $parsesuccess == "OK" then {
                if $!id == 4634 then
                    set $!usr!type = "logoff";
                else if $!id == 4624 then
                    set $!usr!type = "logon";
                set $!usr!rcvdfrom = $!source;
                set $!usr!rcvdat = $timereported;
                set $!usr!user = $!TargetDomainName & "\\" & $!TargetUserName;
                call outwriter
            }
        }
  28. What did we do so far? • We accepted input from three different sources ▫ Free-form text ▫ Tab-delimited semi-structured ▫ Native Lumberjack • We extracted the same information items from these messages • And stored them in variables under the $!usr branch
  29. So we now need to write the normalized output!
        /* this ruleset simulates forwarding to the final destination */
        ruleset(name="outwriter") {
            action(type="omfile"
                   file="/home/rger/proj/rsyslog/logfile.csv" template="csv")
            action(type="omfile"
                   file="/home/rger/proj/rsyslog/logfile.cee" template="cee")
        }
  30. Templates do the actual work
        template(name="csv" type="list") {
            property(name="$!usr!rcvdat" format="csv")
            constant(value=",")
            property(name="$!usr!rcvdfrom" format="csv")
            constant(value=",")
            property(name="$!usr!user" format="csv")
            constant(value=",")
            property(name="$!usr!type" format="csv")
            constant(value="\n")
        }
        template(name="cee" type="string" string="@cee: %$!usr%\n")
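
What the two templates produce can be approximated in Python as follows. This is a sketch of the output shape only: `render_csv` and `render_cee` are hypothetical names, the field order follows the `csv` template, and the exact whitespace of rsyslog's JSON rendering is not reproduced.

```python
import csv
import io
import json


def render_csv(usr):
    """One CSV line: rcvdat, rcvdfrom, user, type, every field quoted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow([usr["rcvdat"], usr["rcvdfrom"], usr["user"], usr["type"]])
    return buffer.getvalue()


def render_cee(usr):
    """The whole usr subtree as JSON behind the @cee: marker."""
    return "@cee: " + json.dumps(usr) + "\n"


usr = {
    "type": "logon",
    "rcvdfrom": "rger-virtual-machine",
    "rcvdat": "Jan 24 02:38:49",
    "user": "rger",
}
print(render_csv(usr), end="")
print(render_cee(usr), end="")
```

The CSV variant has to name each field explicitly, while the CEE variant serializes whatever is stored under the branch, so adding a fifth field would change only one of the two renderers.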
  31. And this is a combined CEE output file:
        @cee: { "type": "logon", "rcvdfrom": "rger-virtual-machine", "rcvdat": "Jan 16 09:28:33", "user": "root" }
        @cee: { "type": "logoff", "rcvdfrom": "rger-virtual-machine", "rcvdat": "Jan 16 09:28:33", "user": "root" }
        @cee: { "type": "logon", "rcvdfrom": "Win2008StdR2x64_vm", "rcvdat": "Feb 05 16:39:27", "user": "WIN2008STDR2X64\\Administrator" }
        @cee: { "type": "logoff", "rcvdfrom": "WIN-VSBQP2NOITT", "rcvdat": "Jan 25 15:44:35", "user": "WIN-VSBQP2NOITT\\te" }
        @cee: { "type": "logoff", "rcvdfrom": "win7fr.intern.adiscon.com", "rcvdat": "Feb 5 11:15:56", "user": "ADISCON\\fr" }
        @cee: { "type": "logon", "rcvdfrom": "win7fr.intern.adiscon.com", "rcvdat": "Feb 5 13:41:28", "user": "NT AUTHORITY\\SYSTEM" }
  32. And the same in CSV:
        "Jan 16 09:28:33","rger-virtual-machine","root","logon"
        "Jan 16 09:28:33","rger-virtual-machine","root","logoff"
        "Jan 24 02:38:49","rger-virtual-machine","rger","logon"
        "Feb 05 16:39:27","Win2008StdR2x64_vm","WIN2008STDR2X64\Administrator","logon"
        "Jan 25 15:44:35","WIN-VSBQP2NOITT","WIN-VSBQP2NOITT\te","logoff"
        "Feb 5 11:15:56","win7fr.intern.adiscon.com","ADISCON\fr","logoff"
        "Feb 5 13:41:28","win7fr.intern.adiscon.com","NT AUTHORITY\SYSTEM","logon"
  33. Of course, this is just a small example, but • It shows how all the pieces can be put together • mmnormalize is a very important building block to integrate free-form text logs, no matter what the source is • The output format is highly flexible • Of course, structured outputs like MongoDB or Elasticsearch are also supported • We can emit almost all output formats; new ones require relatively little work in rsyslog's engine
  34. Bottom line • Rsyslog can act today as a universal log format translator • We hope that consumer tools will make use of the simple-to-process lumberjack format • HOWEVER, we can already convert into what today's real-world analysis tools can digest
  35. Once again back to my inbox... • “I know this is asking a lot since rsyslog would have to do a bunch of processing. I also understand there may be a delay in log delivery due to the processing.” • Well … actually it's far from being as bad as described: ▫ Structured logs are ingested very quickly ▫ Liblognorm/mmnormalize is extremely fast in converting classical text logs ▫ Reformatting is always done anyway, so... ;-)
  36. Long-Term Vision • There NEVER will be a single format ▫ Political reasons (vendors, projects, history, ...) ▫ Need for new features/functionality • BUT: use as few as possible ▫ Less hassle for producer and consumer devs ▫ Forces closed source vendors to support these standards, making it easier for the OSS guys ▫ Big win for Enterprise folks who get plug&play • We hope that Lumberjack will be dominant ▫ Stack already in place ▫ Good & simple solution ▫ Rsyslog converts everything running on Linux
  37. Questions? • Please direct them to the rsyslog mailing list • Listinfo: http://lists.adiscon.net/mailman/listinfo/rsyslog