2. About Me
Spent 12 years doing Networking, Linux and High Performance
Computing in Finance, Bio-Technology and other sectors
Left IT in 2007 to focus on product development
Did a 1-week contract fixing Avionics Networking code, and haven’t
left Aviation since.
Now responsible for Product & Services Development at Satcom
Direct
3. About My Company
Satcom Direct provides connectivity and communications for
Aviation, Maritime and Land Mobile customers. Built around a
core focus of support and service, we now serve thousands of
customers worldwide, including the Fortune 500, NATO & Allied
Forces, and various Heads of State.
4. Agenda
Splunk – not really on a plane (yet)
Data Sources
How we use Splunk
– Support – Monitoring & Alerting
– Business Analytics
Tracking Planes
– The technician’s flight tracker
Splunk Tips
6. Data Sources
We feed Splunk pretty much anything we can get our hands
on, both standard IT data, and some more esoteric data
– CDRs for Phone Calls (AudioCodes, Asterisk)
– Syslogs from network appliances & servers
– Radius accounting data
– Logs from Satcom Systems (via email, or mobile apps)
– Aircraft Position + Status Reports
We normalize Aircraft Position reports before feeding them to
Splunk
– Fields are extremely complex, often missing, sometimes delayed, and come
from at least 5 different sources. And they are all totally inconsistent.
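A minimal sketch of what normalizing position reports before indexing could look like. The field names, aliases and output schema here are hypothetical (the slides do not describe the real feeds); the point is only that every source is mapped onto one consistent record, with missing fields left as None.

```python
from datetime import datetime, timezone

# Hypothetical aliases: each source may name the same field differently.
FIELD_ALIASES = {
    "tail": ("tail", "tail_number", "registration"),
    "lat": ("lat", "latitude"),
    "lon": ("lon", "lng", "longitude"),
    "timestamp": ("timestamp", "time", "reported_at"),
}


def first_present(report, names):
    """Return the first non-empty value found under any of the given names."""
    for name in names:
        value = report.get(name)
        if value not in (None, ""):
            return value
    return None


def to_float(value):
    """Convert to float, keeping missing values as None."""
    return None if value is None else float(value)


def to_utc_iso(value):
    """Accept epoch seconds or an ISO 8601 string; return an ISO string in UTC."""
    if value is None:
        return None
    if isinstance(value, (int, float)):
        moment = datetime.fromtimestamp(value, tz=timezone.utc)
    else:
        moment = datetime.fromisoformat(str(value))
        if moment.tzinfo is None:
            # Assumption: timestamps without an offset are already UTC.
            moment = moment.replace(tzinfo=timezone.utc)
    return moment.astimezone(timezone.utc).isoformat()


def normalize_report(report, source):
    """Map one raw report onto a single consistent schema."""
    tail = first_present(report, FIELD_ALIASES["tail"])
    return {
        "source": source,
        "tail": None if tail is None else str(tail).strip().upper(),
        "lat": to_float(first_present(report, FIELD_ALIASES["lat"])),
        "lon": to_float(first_present(report, FIELD_ALIASES["lon"])),
        "timestamp": to_utc_iso(first_present(report, FIELD_ALIASES["timestamp"])),
    }
```

Records in this shape can be written out as key=value or JSON lines, so the searches only ever deal with one set of field names.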
8. Data Sources - Expand
Sep 14 15:53:07 63.###.###.210 accelerator[4142]: Link ID 115 was Updated
Sep 14 15:53:07 63.###.###.210 accelerator[4142]: Link 10.###.###.66 status changed from negotiating to accelerating
Sep 14 15:53:07 63.###.###.210 accelerator[4142]: Acceleration enable to peer-10.###.###.66:0, with decore size - 4194304
Sep 14 15:53:07 63.###.###.210 accelerator[4142]: Acceleration enable to peer- 10.###.###.66:0, with core size - 4194304
Sep 14 15:53:07 63.###.###.210 accelerator[4142]: Link ID 115 was Updated
Sep 14 15:53:00 63.###.###.210 accelerator[4142]: Link ID 115 was Updated
Sep 14 15:53:00 63.###.###.210 accelerator[4142]: Link 10.###.###.66 status changed from drop to negotiating
Sep 14 15:53:00 63.###.###.210 accelerator[4142]: Link ID 115 was Updated
Sep 14 15:53:00 63.###.###.210 accelerator[4142]: Subnets for Remote link CP Id 115 changed
Sep 14 15:53:00 63.###.###.210 accelerator[4142]: Link ID 115 was Updated
Sep 14 15:52:34 63.###.###.210 accelerator[4142]: Link ID 115 was Updated
Sep 14 15:52:34 63.###.###.210 accelerator[4142]: Link 10.###.###.66 status changed from accelerating to drop
Sep 14 15:52:34 63.###.###.210 accelerator[4142]: Update peer failed with code 22.
Sep 14 15:52:34 63.###.###.210 accelerator[4142]: Link ID 103 was Updated
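As an illustration of the structure hiding in these lines, the sketch below pulls the link address and the state transition out of a "status changed" line with a regular expression. The pattern is written against the sample lines above only; the function and group names are assumptions for illustration, not part of any product.

```python
import re

# Matches only the "status changed" lines from the sample above.
STATUS_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) accelerator\[\d+\]: "
    r"Link (?P<link_ip>\S+) status changed from (?P<old>\w+) to (?P<new>\w+)$"
)


def parse_status_change(line):
    """Return the extracted fields as a dict, or None for any other line."""
    match = STATUS_RE.match(line.strip())
    return match.groupdict() if match else None
```

Lines such as "Link ID 115 was Updated" do not match and come back as None, which mirrors filtering on "status changed" before extracting fields.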
10. Support: Monitoring and Alerting
• Splunk provides a real-time dashboard in our NOC showing the status
of several key services
• Previously, support techs needed to log in to 3-5 different
systems to look for faults or errors. Each system had a different
UI, different formats and different data. Techs learned them, but only
slowly, because errors were often infrequent and obscure
• Now data is in one system, one interface, with intelligence ‘coded
in’ by our senior techs
11. Support: Monitoring and Alerting
• We merge log data with our Configuration Management database
so we can display aircraft Tail Numbers, Phone Numbers and
relevant data directly on the dashboard.
– Allows our support team to see customers as their aircraft log on to the
satellites and move data or make voice calls
– Support techs can verify while still on the phone with the customer (data is
~60-90 seconds delayed)
[Diagram: CSA Data Entry, CM Servers, "Feed Splunk CSV tables for Lookups", indexer]
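The lookup idea above can be sketched outside Splunk as a plain CSV join. The column names (ip, tail, phone) and the sample row are hypothetical stand-ins for whatever the Configuration Management export actually contains.

```python
import csv
import io


def load_lookup(csv_text, key="ip"):
    """Index the CSV rows by the chosen key column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def enrich(event, lookup):
    """Add lookup columns to an event; fields already on the event win."""
    extra = lookup.get(event.get("ip"), {})
    return {**extra, **event}
```

An event whose address is not in the table passes through unchanged, so a stale export degrades the dashboard rather than breaking it.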
12. Support: Monitoring and Alerting
• We can be proactive – Splunk alerting lets us catch issues
immediately, such as customers unable to connect (incorrect
passwords or invalid settings). We know we’ll get a call, or we can
call the customer directly.
13. Support: Monitoring & Alerting
• .conf 2013 Stump the Experts Report – counting in-flight (Literally!)
transactions over time to gauge volumes
14. Support: Monitoring and Alerting
• Alerts help capture out-of-the-ordinary situations
• “More than N occurrences in a given timespan” alerts take 60 seconds
to set up – use them
• Now when something spirals out of control, you’ll know!
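The logic behind a "more than N occurrences in a timespan" alert is small enough to sketch directly. This is an illustration of the idea, not how Splunk implements alerting; the class name and parameters are made up, and events are assumed to arrive in time order.

```python
from collections import deque


class FrequencyAlert:
    """Fire when more than `threshold` events fall inside a sliding window."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._times = deque()

    def observe(self, timestamp):
        """Record one event (epoch seconds); return True if the alert fires."""
        self._times.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop events that have aged out of the window.
        while self._times and self._times[0] <= cutoff:
            self._times.popleft()
        return len(self._times) > self.threshold
```

With a threshold of 3 and a 60 second window, the fourth failed login inside a minute fires the alert, and it clears again once the burst has aged out.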
16. Business Analytics
• We’ve always been a data driven organization – we focus heavily on
configuration management for customer avionics
• Using Splunk to analyze the data helps us make smart decisions
• Each time we deep dive into the data, we learn new things
17. Business Analytics
• We used Splunk to determine how to size our new DNS
infrastructure
• Fed DNS stats (Bind + script + syslog) into Splunk for a few weeks,
visualized the results and then were able to do capacity planning
18. Business Analytics – VoIP Call Rates
• We can monitor the Country Codes dialed for our Satellite Voice
calls in aggregate, so we know what countries our customers call
most often. We then push our telecom & VoIP providers to
negotiate better rates.
• Splunk tells us what countries we need to focus on, so we ignore
the long rate cards and get right down to the ones we care about.
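A rough sketch of the aggregation described above: take the dialed numbers from the CDRs, extract the country calling code, and count. It assumes numbers are stored in international format, and the code table is a tiny illustrative subset rather than a full list.

```python
from collections import Counter

# Small illustrative subset of country calling codes.
COUNTRY_CODES = {
    "1": "NANP (US, Canada and others)",
    "44": "United Kingdom",
    "49": "Germany",
    "971": "United Arab Emirates",
}


def country_code(dialed):
    """Return the country calling code of a number in international format."""
    digits = "".join(ch for ch in dialed if ch.isdigit())
    # Country codes are 1 to 3 digits long; try the longest prefix first.
    for length in (3, 2, 1):
        prefix = digits[:length]
        if prefix in COUNTRY_CODES:
            return prefix
    return None


def top_destinations(dialed_numbers):
    """Count calls per country code, most frequently dialed first."""
    codes = (country_code(number) for number in dialed_numbers)
    return Counter(code for code in codes if code is not None).most_common()
```

The top few rows of that count are the only lines of the rate card worth negotiating.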
19. Business Analytics – VoIP Call Rates
• We can then route outbound calls based on destination country
code to a different provider, reducing our direct cost per second for
call terminations
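Routing by destination can be pictured as picking the cheapest provider per country code. The provider names and per-second rates below are invented for illustration; real routing would live in the VoIP platform, not in a script like this.

```python
def cheapest_provider(rates, code):
    """Pick the provider with the lowest per-second rate for a country code.

    `rates` maps provider name -> {country code -> cost per second}.
    Returns None when no provider lists that code.
    """
    offers = {
        provider: table[code]
        for provider, table in rates.items()
        if code in table
    }
    if not offers:
        return None
    return min(offers, key=offers.get)
```

For example, with hypothetical rates of 0.02 from one provider and 0.015 from another for code 44, calls to 44 go to the second provider.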
21. Flight Tracking
Where the plane is coming from or going to isn’t what is important
Common problems with Satellite communications are handovers –
where you change which satellite you are talking to while in flight
Historically it has been hard to correlate events with location visually
Google Earth/Google Maps were a major leap, but not automated
Enter Splunk w/Google Maps plugin – now we can put all the data
in a consistent visual format.
22. Flight Tracking Data
[Diagram: FAA ASDI, Sat. Provider 1, Sat. Provider 2, Satcom Terminal and Other Apps; FT Server (Process & Normalize All Data); forwarder; indexer; users]
27. Transactions
Insanely powerful for gathering statistics.
tag="Expand" "status changed" | rex "\s.*?Link\s(?<AircraftIP>\S+)" | transaction AircraftIP State
startswith="negotiating to accelerating" endswith="accelerating to drop"
Sep 14 15:53:07 63.###.###.210 accelerator[4142]: Link ID 115 was Updated
Sep 14 15:53:07 63.###.###.210 accelerator[4142]: Link 10.###.###.66 status changed from negotiating to accelerating
Sep 14 15:53:07 63.###.###.210 accelerator[4142]: Acceleration enable to peer-10.###.###.66:0, with decore size - 4194304
Sep 14 15:53:07 63.###.###.210 accelerator[4142]: Acceleration enable to peer- 10.###.###.66:0, with core size - 4194304
Sep 14 15:53:07 63.###.###.210 accelerator[4142]: Link ID 115 was Updated
Sep 14 15:53:00 63.###.###.210 accelerator[4142]: Link ID 115 was Updated
Sep 14 15:53:00 63.###.###.210 accelerator[4142]: Link 10.###.###.66 status changed from drop to negotiating
Sep 14 15:53:00 63.###.###.210 accelerator[4142]: Link ID 115 was Updated
Sep 14 15:53:00 63.###.###.210 accelerator[4142]: Subnets for Remote link CP Id 115 changed
Sep 14 15:53:00 63.###.###.210 accelerator[4142]: Link ID 115 was Updated
Sep 14 15:52:34 63.###.###.210 accelerator[4142]: Link ID 115 was Updated
Sep 14 15:52:34 63.###.###.210 accelerator[4142]: Link 10.###.###.66 status changed from accelerating to drop
Sep 14 15:52:34 63.###.###.210 accelerator[4142]: Update peer failed with code 22.
Sep 14 15:52:34 63.###.###.210 accelerator[4142]: Link ID 103 was Updated
28. Transactions
Run against a few hours of data, and we see lots of transactions
occurring. So we know how long each Aircraft is ‘in session’ for.
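What the transaction search is doing can be sketched in a few lines: pair each "negotiating to accelerating" event with the next "accelerating to drop" for the same aircraft address and take the difference. This is a simplified illustration, not a reimplementation of Splunk's transaction command; the tuple layout is an assumption.

```python
def sessions(events):
    """Pair session start and end events per link address.

    `events` is an iterable of (epoch_seconds, link_ip, old_state, new_state)
    tuples in time order. Returns one dict per completed session.
    """
    open_since = {}
    closed = []
    for timestamp, link_ip, old_state, new_state in events:
        if (old_state, new_state) == ("negotiating", "accelerating"):
            open_since[link_ip] = timestamp
        elif (old_state, new_state) == ("accelerating", "drop") and link_ip in open_since:
            start = open_since.pop(link_ip)
            closed.append({"ip": link_ip, "start": start, "duration": timestamp - start})
    return closed
```

Sessions that are still open at the end of the data are simply left out, so the durations only cover completed connections.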
29. Transactions
Now what? Let’s do some math and get some stats!
tag="Expand" "status changed" | rex "\s.*?Link\s(?<AircraftIP>\S+)"
| transaction AircraftIP State startswith="negotiating to accelerating" endswith="accelerating to drop"
| eval ConnectedFor(Mins)=round(duration/60)
| lookup taillookup ip as AircraftIP OUTPUT subnet_name as Tail
| stats sum(ConnectedFor(Mins)) as TimeOnline by Tail
| sort TimeOnline
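The same arithmetic, sketched in Python for readers who want to see the steps spelled out: round each session to minutes, map the address to a tail number, sum per tail, then sort ascending. The input shapes are assumptions, sessions with no matching tail are skipped here, and Python's round() may differ from Splunk's on exact half values.

```python
from collections import defaultdict


def time_online_by_tail(sessions, tail_by_ip):
    """Sum connected minutes per tail number, smallest total first.

    `sessions` is a list of dicts with "ip" and "duration" (seconds);
    `tail_by_ip` maps an aircraft address to its tail number.
    """
    totals = defaultdict(int)
    for session in sessions:
        tail = tail_by_ip.get(session["ip"])
        if tail is None:
            continue  # no lookup match: leave the session out of the totals
        totals[tail] += round(session["duration"] / 60)
    return sorted(totals.items(), key=lambda item: item[1])
```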
30. Transaction - Visualizations
Once you have the data, visualizations on the dashboard allow us
to know at a glance if a service is performing within limits
We adjust the gauge colors – in this case, higher is better
31. Don’t Fear CSV
KISS – and CSV is certainly that
Great for mapping things like IP/Subnets to Customers
Easier to manipulate text files to clean them up
Great for things that don’t change too often
# Sort by IP address so searches are easier
sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n ip-customers.in > ip-customers.csv
cp ip-customers.csv /opt/splunk/etc/system/lookups/ip-customers.csv
CIDR Lookup Scripts: http://answers.splunk.com/answers/5916/using-cidr-in-a-lookup-table
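For a feel of what a CIDR lookup has to do, here is a small sketch using Python's standard ipaddress module: load subnet-to-customer rows from CSV and return the most specific subnet containing an address. The column names (subnet, customer) and sample rows are hypothetical, and this is not the script from the linked answer.

```python
import csv
import io
import ipaddress


def load_networks(csv_text):
    """Parse rows of (subnet, customer) into network objects."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [(ipaddress.ip_network(row["subnet"]), row["customer"]) for row in rows]


def customer_for(ip, networks):
    """Return the customer whose most specific subnet contains the address."""
    address = ipaddress.ip_address(ip)
    matches = [(net, customer) for net, customer in networks if address in net]
    if not matches:
        return None
    # The longest prefix is the most specific match.
    return max(matches, key=lambda match: match[0].prefixlen)[1]
```

A linear scan like this is fine for a table that changes rarely and has a few thousand rows.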
32. Summary
Alerting based on frequency of events within a timeframe can be
extremely powerful to detect anomalies
Sometimes you need to clean up your data before you send it into
Splunk – Garbage in, garbage out
Adding external lookups can be as simple as CSV files – don’t
overthink it
‘transaction’ helps make sense of time & duration based data
Use Splunk to guide your choices with real data – embrace
Empiricism to make good business decisions