Visualizing and Processing Weather Satellite Telemetry:
A Solution Using Big Data Methodologies
Kevin M. Grimes, II
Mentor: Amalaye Oyake
Jet Propulsion Laboratory
20 August 2015
Abstract
The Ocean Surface Topography Mission’s Jason satellites have placed about a terabyte of
weather data into a MySQL database for analysis. Because the OSTM’s set of data is so
enormous, the processing time required to perform a query using its current setup is quite long.
Visualizing the data using the Cyclone tool is inefficient and does not allow for much interaction
with the graph. Using the “Big Data” tool Elasticsearch, we have proposed a Cyclone
replacement that quickens the process of querying and graphing Jason data by a factor of five. In
our proposed tool, the data is ingested into Elasticsearch via Logstash, an Elastic product. Once
the data is ingested into Elasticsearch, our visualizer queries it in a way that maximizes
efficiency and minimizes time. A specially designed UI allows for quick and easy access to the
satellite telemetry. Users may choose to have the data plotted in an interactive graph or printed in
a variety of formats. Our tool provides the user with a quick, easy, and efficient experience and can serve as a suitable Cyclone replacement.
Keywords: Jason, telemetry, Elasticsearch, Cyclone
Acknowledgments
I would like to preface this report by thanking the multiple people whose efforts led to
my project’s completion. Without the support of these people I would not have been able to
accomplish what I have this summer.
My mentor, Amalaye Oyake - for choosing to bring me on board. It has been exciting and
challenging, and I cannot thank you enough for letting me be a part of it.
My partner, Daniele Bellutta - for working as hard as, if not harder than, I did. Thanks for
working long days with me and seeing this project through. Also, thanks for helping me by
answering questions I had; without your help, I would have spent even more time bugging
people on Stack Overflow.
Various JPL employees, including Dan Isla, Philip Southam, Stefan Eng, and David
Mittman - for giving me solutions to problems I was unable to solve.
Table of Contents
Abstract
Acknowledgments
Table of Contents
List of Figures and Examples
List of Abbreviations
I. Background and Motivation
II. Methods
   A. Initial Processing
   B. Ingestion
   C. Visualization
   D. Benchmarking
   E. Data Dumping
III. Conclusions
IV. Future Work
References
List of Figures and Examples
Figure 1. Cyclone, the tool currently used to process and visualize Jason telemetry
Figure 2. Apache Spark process flowchart
Figure 3. Our visualizer prototype
Figures 4-5. Benchmarking our visualizer with Cyclone
Example 1. Making an SQL query to dump Elasticsearch data to user’s system while running the script
Example 2. Making an SQL query via the Perl data dump script’s command line parameters
Example 3. Running the Perl data dump script with several options from the command line
List of Abbreviations
CNES: Centre national d'études spatiales
CPAN: Comprehensive Perl Archive Network
CSV: Comma-separated format
DSN: Deep Space Network
ECSV: Encapsulated comma-separated format
FTP: File transfer protocol
GDS: Ground data system
GHE: GitHub Enterprise
JPL: Jet Propulsion Laboratory
KB: Kilobyte(s)
Mb: Megabit(s)
MB: Megabyte(s)
NASA: National Aeronautics and Space Administration
OSTM: Ocean Surface Topography Mission
PP: Perl Packager
TOPEX: Ocean Topography Experiment
UI: User interface
Visualizing and Processing Weather Satellite Telemetry:
A Solution Using Big Data Methodologies
I. Background and Motivation
The Ocean Surface Topography Mission (OSTM) at Jet Propulsion Laboratory (JPL)
collects and analyzes data from our planet’s oceans. Their first mission was a collaborative effort
with the Centre national d'études spatiales (CNES), the French center for space research.
CNES’s Ocean Topography Experiment (TOPEX) merged with JPL’s Poseidon project and
launched the TOPEX/Poseidon satellite, commencing OSTM’s first mission. The satellite collected data such as oceanic temperatures and ocean levels and sent it via “space packets” to a ground data system (GDS) on Earth, where it was processed and stored in a large database.
Although the TOPEX/Poseidon mission ended in January 2006, several JPL satellites, including
Jason-2, continue to collect, process, and transmit oceanic data for scientific use.
The data collected by the OSTM has been extremely valuable in the field of meteorology.
Over four hundred scientists from thirty different nations use this data to perform climate
research, forecast hurricanes, route ships, and research coral reefs; in particular, the data gathered
by the Jason series of satellites is used to monitor changes in oceanic levels [1]. Over the last couple of decades, the OSTM has accumulated multiple gigabytes of data useful to the scientific
community. With such a large set of data, however, comes long processing time. To process and visualize Jason-2 telemetry, users must complete a long and complicated
request via Cyclone, the visualization tool currently in place. Once the request has been
submitted, the user must wait an extended amount of time for the graph to be created. For these
reasons and more, the OSTM felt that Cyclone needed to either be updated or replaced.
Our mentor, Amalaye Oyake, recommended that my partner, Daniele Bellutta, and I
experiment with “Big Data” tools such as Elasticsearch [2] and Apache Spark [3]. After some research into the features of these utilities, we devised a configuration that allowed for quick and easy data
ingestion and interactive visualization. We downloaded Logstash [4],
an Elastic ingestion tool, and
ingested data directly into Elasticsearch. Once we ingested the data, we designed a visualizer
using JavaScript and D3.js [5],
an external library. Finally, we designed a script using the Perl
programming language that would download data to the user’s computer directly from
Elasticsearch. Once these tools were implemented and our project was completed, we
benchmarked our visualizer against Cyclone and observed the results.
After several tests, we demonstrated that our tool queried and visualized Jason-2 telemetry much more quickly than Cyclone. We ran multiple tests comparing our tool with Cyclone and concluded that our visualizer runs over five times faster than Cyclone. Additionally, our tool has a much cleaner user interface (UI) that allows users to customize their queries more easily than they could in Cyclone. Despite our success in this regard, however, ingestion is still a
time-consuming part of the visualization process. Clustering tools such as Apache Spark have the
capability to send data to several “workers,” sort it, and send it to Elasticsearch. Both Daniele
and I believe that ingestion time would be reduced dramatically with the help of Spark but
were, due to time restrictions, unable to implement it ourselves. Despite slow ingestion, however,
the visualizer is a quick and easy-to-use tool that will, alongside our Perl data dump script, be
able to serve as a Cyclone replacement.

Figure 1: Cyclone, the tool currently used to process and visualize Jason telemetry
II. Methods
A. Initial Processing
Jason-2 collects data from Earth’s oceans and processes it on board. This data is sent via
binary telemetry packets to JPL’s GDS and is relayed to the OSTM. Once this information is
received, it must be processed into some sort of format recognizable by ingestion tools. In order
to ingest this data into Elasticsearch, Daniele and I needed a method to first convert the binary
telemetry packets into encapsulated comma-separated format (ECSV).
We were given access to the OSTM’s file-transfer protocol (FTP) system and were able
to pull a sampling of data to use in development. Our mentor, Amalaye Oyake, gave us a few
Perl scripts that had been used to process binary telemetry of various types and either display it
in the Terminal or pipeline it into MySQL. We were able to modify these scripts in such a way
that they exported the data to the user’s system in ECSV format. In order to automate the
process, we wrote a shell script that would run the telemetry export script repeatedly until all the
binaries in the specified directory had been converted. Once we had converted all of the
telemetry we were given, we began to explore methods of ingestion into Elasticsearch.
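The shell wrapper mentioned above was only a few lines long; a minimal sketch follows, with the Perl script name, directory layout, and file extension as illustrative assumptions rather than the actual OSTM names.

#!/bin/sh
# Hypothetical sketch of the batch-conversion wrapper: invoke the (assumed)
# Perl export script once per binary file until the directory is exhausted.
for f in /data/jason2/binaries/*.dat; do
    perl telemetry-to-ecsv.pl "$f" > "ecsv/$(basename "$f" .dat).ecsv"
done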
B. Ingestion
We thoroughly explored two methods of ingestion into Elasticsearch: Apache Spark and
Elastic’s Logstash. Spark, a clustering tool used by companies such as Amazon, eBay, and JPL’s
own Deep Space Network (DSN) [6],
takes data parsed by an external parser and sends it to
multiple “workers” which process it and return it to the user. Logstash, on the other hand, does
not require much parsing prior to ingestion and is much easier to use; however, it is not as
powerful as Apache Spark. For the sake of ease, we chose to ingest our data with Logstash but
hope that future programmers could incorporate Spark’s clustering capabilities into the
visualization process.
Although extremely powerful, Spark is particularly picky concerning how it reads in data
and how it returns it to the user. Daniele and I spent the first week of our summer program
developing a parser in Scala that would pipeline the ECSV files generated by our Perl scripts into
Spark. Once we had the data ingested into Spark, we attempted to process and sort it with the
help of “workers.” The final step in the Apache Spark ingestion was sending the processed and
sorted data from Spark into Elasticsearch. We experimented with Spark functions that would pipeline the data into Elasticsearch but met several complications. Rather than invest another week into configuring Spark, we decided to turn our focus to an easier-to-use tool, Logstash.

Figure 2: Apache Spark process flowchart
Logstash, an Elastic product, proved to be much simpler than Spark. While we spent
nearly a week developing a parser that would send data to Spark, we spent only one day
implementing Logstash. We developed a configuration file that would tell Logstash how to
ingest the data. This file included the names of the various telemetry fields, the index under which the
ingested data would be stored, and the path to the files to be ingested. After a few minutes
passed, our data was successfully ingested into Elasticsearch in a neat and organized manner.
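For illustration, a configuration file of the shape described above might look like the following sketch; the path, field names, and index are placeholders (the field names are borrowed from the data dump examples later in this report), not our actual production values.

input {
  file {
    # Path to the converted ECSV telemetry files (placeholder).
    path => "/data/jason2/ecsv/*.ecsv"
    start_position => "beginning"
  }
}
filter {
  csv {
    # Names of the telemetry fields in each row (illustrative).
    columns => ["dt", "apid", "lraTemp"]
  }
}
output {
  elasticsearch {
    # Index under which the ingested data is stored (illustrative).
    index => "jason-2"
  }
}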
Although Logstash is easier to use than Spark, it takes much more time to ingest data.
When we ingested a relatively small data set of ~50 megabytes (MB), Logstash needed a
few minutes to process and send the data to Elasticsearch. If we were to try to ingest terabytes of
data, Logstash would need to run for hours. In this regard, our project still has room to grow.
Despite slow ingestion, however, we found that Logstash was able to meet our needs for our
small sampling of data and began to explore visualization techniques.
C. Visualization
Once the telemetry was ingested into Elasticsearch, we began development on our
visualizer. We considered a few external libraries that seemed to be reliable and efficient and
settled on coding the visualizer in JavaScript with the help of the D3.js external library. The
D3.js library provided dozens of graphing functions that were invaluable to us during
development. Two thousand lines of code later, we had produced a tool that was quick and easy
to use.
The visualizer queries the Elasticsearch database multiple times. Initially, it queries for
the range of dates, fields, and a few other pieces of data. This data is used in several drop-down
menus and boxes to allow the user to customize his/her results. The user may decide to either use
our UI with drop-down boxes or he/she may format a request in SQL via a SQL plug-in we
installed [7].
Once the user submits the request, it is sent to Elasticsearch and processed in an initial
query followed by a series of scroll queries. The initial query asks Elasticsearch for a fixed
amount of data and returns a “scroll ID” that can be used to pick up where that query left off. A
scroll query is then made using the scroll ID returned by the initial query and retrieves the rest of
the data.	
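The pattern can be sketched with curl as follows; the index, field, and request bodies are illustrative placeholders rather than the visualizer’s actual JavaScript, and the exact scroll endpoints have varied across Elasticsearch versions.

# Initial query: fetch one batch of results and a scroll ID that remains
# valid for one minute (index name and query are placeholders).
curl -s "http://localhost:9200/jason-3/_search?scroll=1m" -d '{
  "size": 10000,
  "query": { "match_all": {} }
}'

# Scroll query: pass the _scroll_id from the previous response to pick up
# where the initial query left off; repeat until no hits remain.
curl -s "http://localhost:9200/_search/scroll?scroll=1m" -d 'SCROLL_ID_FROM_PREVIOUS_RESPONSE'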
  
As Elasticsearch returns the final packets of data, the visualizer begins to create a graph.
This graph—depending upon the user’s settings—may contain all of the points returned by the
Elasticsearch queries or it may be only an averaging of the data. If the user desires, the visualizer
can count the number of points in each vertical column of pixels and calculate an average using
Elasticsearch’s “aggregation” feature. This significantly reduces the number of points being
plotted while still keeping the general trend of the larger data set. In addition to averaging, the
data may also be scaled by a scaling factor. Once the graph has been plotted, the user may hover his/her cursor over the points, and a “tooltip” will draw a vertical line through the nearest point
and show on the side which point is being studied and the corresponding value of that point.
Additionally, the user may zoom in and out using a slider at the bottom of the screen. As the user
pans from side to side or zooms, Elasticsearch is queried for the points needed. This “dynamic
querying” feature of the visualizer, combined with the various settings, provides for a quick and
easy user experience.

Figure 3: Our visualizer prototype
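The pixel-column averaging described above maps naturally onto Elasticsearch’s aggregation API. A hedged curl sketch follows, with placeholder index and field names and an interval standing in for “time span divided by graph width in pixels.”

# One bucket per column of pixels: bucket the time axis, then average the
# plotted field within each bucket (names and interval are illustrative).
curl -s "http://localhost:9200/jason-3/_search" -d '{
  "size": 0,
  "aggs": {
    "per_pixel_column": {
      "date_histogram": { "field": "dt", "interval": "10s" },
      "aggs": { "bucket_avg": { "avg": { "field": "lraTemp" } } }
    }
  }
}'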
Once the visualizer was developed, we used Wireshark [8],
a network protocol analyzer, to
determine how much bandwidth our visualizer used while performing queries. When we queried
three hours’ worth of data without using aggregation, for example, Elasticsearch returned about
13.5 megabits (Mb) of data over twenty-six 65 kilobyte (KB) packets. When the aggregation
feature was applied, 0.5 Mb of bandwidth was used over one 65 KB packet. The 65 KB packet
size remains constant as long as the graph size remains constant, as one point of data is loaded
per column of pixels. Because Elasticsearch returns data in these small packets, there is little potential for heavy network traffic.
D. Benchmarking
In order to determine whether or not our visualizer performed better than Cyclone (the
visualization tool currently being used by the OSTM), Daniele and I performed a series of tests.
We took great care to ensure that both tools plotted the same amount of data in these tests.
To accomplish this, we found that the “bucket size” setting in our visualizer functioned similarly
to the “fidelity” setting in Cyclone. The two were set to produce nearly the same amount of data
and then a total of twenty-four tests were run.
In our visualizer’s JavaScript, the number of data points collected for each averaging instance can
be set. The smaller this number is, the more data points will be plotted. In
Cyclone, the “fidelity” setting determines how true to the original data set the resulting graph
should be. Just as with our visualizer, the smaller this number is, the more points will be
graphed.
Once the two tools were calibrated to produce the same amount of points, we began
testing. Each tool was tested a total of twelve times: three trials for each of two time intervals across two different fields. From these results, we concluded that our visualizer ran over five
times faster than Cyclone.
Our Visualizer

Interval            13 July 2014 09:00 – 15 July 2014 06:00    19 July 2014 15:00 – 21 July 2014 09:00
Field               GPS A Current    AMR V Current             GPS A Current    AMR V Current
Trial #1            5.57 s           6.62 s                    4.17 s           4.98 s
Trial #2            5.99 s           5.94 s                    4.14 s           4.83 s
Trial #3            6.61 s           6.16 s                    3.90 s           5.90 s
Field Average       6.06 s           6.24 s                    4.07 s           5.24 s
Interval Average    6.15 s                                     4.65 s
Overall Average     5.40 s

Cyclone

Interval            13 July 2014 09:00 – 15 July 2014 06:00    19 July 2014 15:00 – 21 July 2014 09:00
Field               GPS A Current    AMR V Current             GPS A Current    AMR V Current
Trial #1            30.55 s          32.01 s                   28.42 s          24.91 s
Trial #2            27.96 s          27.53 s                   28.06 s          23.93 s
Trial #3            31.52 s          30.69 s                   27.10 s          25.15 s
Field Average       30.01 s          30.08 s                   27.86 s          24.66 s
Interval Average    30.04 s                                    26.26 s
Overall Average     28.15 s

Figures 4-5: Benchmarking our visualizer (top) with Cyclone (bottom)
E. Data Dumping
In an attempt to make our project as developer-friendly as possible, we developed a separate tool, roughly 1,700 lines of code, that pulls data stored in Elasticsearch to the user’s system. This data can be used for statistical analysis and in the development of other
visualization tools. If the script is run from the command line without any parameters, it displays
a welcome message and the main menu. At this menu, the user is asked whether he/she would
like to query via a format similar to that of the visualizer’s “drop-down menu” syntax or if they
would like to query via a SQL plug-in. If they choose to query via the SQL plug-in, they type
their query just as they would type any other SQL query. Consider the following example.

SELECT lraTemp,dt FROM jason-3/260 WHERE lraTemp BETWEEN 280 AND 295 LIMIT 10050

Example 1: Making an SQL query to dump Elasticsearch data to user’s system while running the script
In Example 1, the index “jason-3” and APID type “260” are chosen. The query will
return values from the “lraTemp” and “dt” fields where “lraTemp” is between 280 and 295.
The “LIMIT” part of this query specifies how many results can be returned by the query. Since
the user in Example 1 entered a limit of 10,050, no more than 10,050 results will be exported.
Unfortunately, if too large of a query size is entered, the SQL plug-in will crash. Because of this,
a default query limit of 100,000 has been set in the script’s code. After the user enters his/her
query, they will be asked in which file format they would like to have their results exported. If no
results are found, the script will say so. Otherwise, an output file will be created with their results
in their specified format.
If the user decides to use the script’s syntax rather than the SQL plug-in, they will be
prompted to answer a series of questions concerning their query. They will first be asked whether
they would like to query and export the entire data set or a section of it. If they would like to
query specific dates, the script will prompt them for those dates. Otherwise, the script will continue to
the next prompt: whether or not the user would like to enter a scaling factor. If entered, this
number will be multiplied by every numeric value being printed. The next prompt asks the user
to specify which fields he/she would like to export. If the user does not know which fields are
available, he/she may type “list” and see the list appear on the screen. Otherwise, they may type
the individual fields they would like to see printed or “all” if they would like all of them. Finally,
the script prompts the user for the format in which they would like their data. After some time,
a file is generated containing the user’s requested data.
Another feature of the data dumping script is its ability to run entirely from the command
line. For example, if the user would like to query via a SQL query, he/she can type “--sql”
followed by their query. If they would like to choose a specific APID or ingestion version, they
can specify them with the “--s” flag. Several extra settings can be specified with the “--e” flag.
Consider the following example.

user@mycomp:$ perl data-dumper.pl --sql csv SELECT lraTemp,dt FROM jason-3/260 WHERE lraTemp BETWEEN 280 AND 295 LIMIT 10050

Example 2: Making an SQL query via the Perl data dump script’s command line parameters

Example 2 accomplishes everything Example 1 accomplished, but from the command line. No interaction with the script’s interface is needed. Consider another example.

user@mycomp:$ perl data-dumper.pl --e 2014-12-11T17:21:30.000Z 2014-12-11T17:21:45.000Z csv 100 apid lraTemp --s http://localhost:9200/ jason-3 260 10000 1 0.100

Example 3: Running the Perl data dump script with several options from the command line
Example 3, although seemingly complicated, accomplishes a lot. The two dates following the “--e” flag tell the script to query all of the data between them. The “csv” parameter tells the
script to export the results in comma-separated format (CSV). The “100” immediately following
is the scaling factor—that is, by how much to multiply every numeric value. The parameters that
follow (up until the “--s” flag) are the fields that will be queried. After the “--s” flag are a few
more parameters: the domain on which Elasticsearch is hosted, the ingestion version (index),
the telemetry APID type, the query size (the maximum number of packets each query to
Elasticsearch will produce), whether or not to round the results (1 for yes and 0 for no), and by
how much to round the results (in this case, all results will be rounded to the nearest thousandth).
All of these settings could be set via script prompts, but having the ability to set them from the
command line can be useful in situations where the data dumper script needs to be called from
another script.
Once the script was completed, we compiled it into an image to be shipped via Docker [9].
Docker is an open-source tool that automates the deployment of applications inside software
containers that can be run from any system. Using a configuration file, we included all of the
different CPAN (Comprehensive Perl Archive Network) modules inside the image so that the
user does not need to install anything other than Docker in order to run the script. Compiling
everything needed to run the script in an image significantly reduces the amount of work
required on the user’s part; however, they may also download the Perl files from a JPL GHE
repository.
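A minimal sketch of such an image is shown below; the base image and CPAN module list are assumptions for illustration, not the actual OSTM Dockerfile.

# Hypothetical Dockerfile sketch; base image and module names are assumed.
FROM perl:5.20
# Install the CPAN modules the script depends on (illustrative list).
RUN cpanm --notest JSON LWP::UserAgent Text::CSV
COPY data-dumper.pl /app/data-dumper.pl
WORKDIR /app
ENTRYPOINT ["perl", "data-dumper.pl"]

With an image like this, the command line options shown in Examples 2 and 3 could be passed directly to docker run.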
III. Conclusions
At the beginning of the summer we were asked by our mentor to develop a quick and
easy-to-use tool that would ingest and visualize OSTM telemetry. The OSTM already had a
system in place, but they wanted a solution that would implement “Big Data” tools and therefore
reduce the time needed to receive a graph. In this regard, we have been more than successful.
Concerning ingestion, we have implemented a simple (albeit slightly slow) solution: we
installed and calibrated Logstash in such a way that it dynamically ingests data into
Elasticsearch. ECSV files can be “dragged-and-dropped” into the ingestion folder and be
automatically ingested into Elasticsearch. Although this method of ingestion is slow, it is easy to
use and can be replaced by other ingestion methods if the desire to do so arises.
We have created a visualizer that is extremely efficient and easy to use. The tool is set up
in such a way that new data is not queried until it is needed. Points are filtered using an
averaging technique that processes points on a pixel-by-pixel basis. As a result of the
Elasticsearch implementation, we have shown that our tool runs more than five times faster than Cyclone,
the visualizer currently in place.
Finally, we have taken large steps towards abstracting our tool. Comments appear
throughout every file of code. A data dumping utility has been created that will help future
programmers develop visualization tools of their own and help statisticians derive conclusions
about the satellite’s observations. Additionally, we installed a SQL plug-in into the visualizer so
that those familiar with SQL can easily visualize the data they desire. For easy access to our
summer project, all of our work has been pushed into JPL GitHub Enterprise repositories. The
OSTM may access these files to use with Jason-2 data, or they may modify them slightly to work
with Jason-3, SWOT, or Jason-CS data.
The goal of this summer was to make an enormously complicated process seem simple to
the user. We spent a large portion of our ten-week internship learning about the various
pieces of software and languages with which we needed to code; as we learned more and more,
we coded with the hope that users would not have to spend the same amount of time we did
learning. With the tools we have coded and Elasticsearch in place, we feel that everyone in the
OSTM will be able to access Jason telemetry easily and quickly with only the most basic
understanding of OSTM telemetry.
IV. Future Work
Although Daniele and I were able to accomplish much during our internship and produce
significant results, there is much more work that can be done to improve our visualizer and the
visualization process as a whole. We were able to speed up the front-end side of the visualizer by
including features such as dynamic querying but were not able to speed up the ingestion process.
In order to speed up the process, the OSTM should consider converting the telemetry binaries on
the JPL Cloud and ingesting the ECSVs on a cluster.
At the moment, Jason telemetry packets are being processed by one Perl script one at a
time. While we were working with ~50 MB of data, we found that it took the Perl conversion
script about twenty minutes to process all the data. If the OSTM were to try to convert the entire
terabyte of Jason telemetry this way, the time required would be enormous. In order to escape
this, the OSTM can “containerize” the Perl conversion scripts and run them on the JPL Cloud.
By placing the conversion scripts into individual containers, dozens of them can be run on the
Cloud at a time. Rather than having one script process all n data files one after another, n
files could be sent to n containers running on the Cloud. This way, each container would only
have to process one file. The only real downside to sending each telemetry file to its own container
is that it could be very expensive and demanding on the JPL Cloud. To avoid this, the OSTM
could perhaps send twenty or so binary files to each container. Doing so would take twenty times
as long as sending each file to its own container, but it would also place one twentieth the load
on the JPL Cloud.
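Such a batching scheme could be driven by a few lines of shell; in this sketch the image name and paths are entirely hypothetical, and file names are assumed to contain no spaces.

# Hypothetical sketch: group the binaries twenty per batch and hand each
# batch to its own container for conversion.
ls /data/jason2/binaries/*.dat | xargs -n 20 echo | while read -r batch; do
    docker run -d ostm/telemetry-converter $batch
done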
Sending telemetry to the Cloud will save much time and effort on the part of the OSTM. Dan
Isla and Philip Southam, two developers at JPL, have offered to help the OSTM “containerize”
the Perl conversion scripts. They have made significant progress towards this end. Once they
have finished, the OSTM should be able to convert their binaries at a very quick rate.
A second enhancement that could be made to our summer project is clustering the
ingestion process. Currently, Logstash ingests the ECSVs into Elasticsearch. Although
Logstash is very easy to use, it is a bit slow. In order to speed up the ingestion process, the
OSTM could look into ingesting via Apache Spark. Spark is a utility designed to work on a
cluster: if a parser were to be developed that would send the telemetry to Spark, the data could
then be ingested into Elasticsearch directly.
References
[1] Jet Propulsion Laboratory (n.d.). Mission Basics. Accessed from http://sealevel.jpl.nasa.gov/overview/missionbasics/
[2] Elasticsearch is a free, open-source tool available to download from https://www.elastic.co/products/elasticsearch/
[3] Apache Spark is a free, open-source tool available to download from http://spark.apache.org/
[4] Logstash is a free tool available to download from https://www.elastic.co/products/logstash/
[5] D3.js is a free JavaScript library available to download from http://d3js.org/
[6] Konwinski, Andy (2015, August 14). Powered by Spark. Accessed from https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark/
[7] User NLPchina’s elasticsearch-sql is an open-source plug-in available from https://github.com/NLPchina/elasticsearch-sql/
[8] Wireshark is a free tool available to download from https://www.wireshark.org/
[9] Docker is a free tool available to download from https://www.docker.com/

Weitere ähnliche Inhalte

Was ist angesagt?

Task Resource Consumption Prediction for Scientific Applications and Workflows
Task Resource Consumption Prediction for Scientific Applications and WorkflowsTask Resource Consumption Prediction for Scientific Applications and Workflows
Task Resource Consumption Prediction for Scientific Applications and Workflows
Rafael Ferreira da Silva
 
Hpc, grid and cloud computing - the past, present, and future challenge
Hpc, grid and cloud computing - the past, present, and future challengeHpc, grid and cloud computing - the past, present, and future challenge
Hpc, grid and cloud computing - the past, present, and future challenge
Jason Shih
 

Was ist angesagt? (20)

Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
 
20190314 cern register v3
20190314 cern register v320190314 cern register v3
20190314 cern register v3
 
Data-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstData-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud Burst
 
Report to the NAC
Report to the NACReport to the NAC
Report to the NAC
 
OpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspectiveOpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspective
 
Master's Thesis - climateprediction.net: A Cloudy Approach
Master's Thesis - climateprediction.net: A Cloudy ApproachMaster's Thesis - climateprediction.net: A Cloudy Approach
Master's Thesis - climateprediction.net: A Cloudy Approach
 
Blue Waters and Resource Management - Now and in the Future
 Blue Waters and Resource Management - Now and in the Future Blue Waters and Resource Management - Now and in the Future
Blue Waters and Resource Management - Now and in the Future
 
Pegasus - automate, recover, and debug scientific computations
Pegasus - automate, recover, and debug scientific computationsPegasus - automate, recover, and debug scientific computations
Pegasus - automate, recover, and debug scientific computations
 
Task Resource Consumption Prediction for Scientific Applications and Workflows
Task Resource Consumption Prediction for Scientific Applications and WorkflowsTask Resource Consumption Prediction for Scientific Applications and Workflows
Task Resource Consumption Prediction for Scientific Applications and Workflows
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3
 
NASA's Movement Towards Cloud Computing
NASA's Movement Towards Cloud ComputingNASA's Movement Towards Cloud Computing
NASA's Movement Towards Cloud Computing
 
The Schema Editor of OpenIoT for Semantic Sensor Networks
The Schema Editor of OpenIoT for Semantic Sensor NetworksThe Schema Editor of OpenIoT for Semantic Sensor Networks
The Schema Editor of OpenIoT for Semantic Sensor Networks
 
XGSN: An Open-source Semantic Sensing Middleware for the Web of Things
XGSN: An Open-source Semantic Sensing Middleware for the Web of ThingsXGSN: An Open-source Semantic Sensing Middleware for the Web of Things
XGSN: An Open-source Semantic Sensing Middleware for the Web of Things
 
Using A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputUsing A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific Output
 
GSN Global Sensor Networks for Environmental Data Management
GSN Global Sensor Networks for Environmental Data ManagementGSN Global Sensor Networks for Environmental Data Management
GSN Global Sensor Networks for Environmental Data Management
 
NPOESS Program Overview
NPOESS Program OverviewNPOESS Program Overview
NPOESS Program Overview
 
Hpc, grid and cloud computing - the past, present, and future challenge
Hpc, grid and cloud computing - the past, present, and future challengeHpc, grid and cloud computing - the past, present, and future challenge
Hpc, grid and cloud computing - the past, present, and future challenge
 
X-GSN in OpenIoT SummerSchool
X-GSN in OpenIoT SummerSchoolX-GSN in OpenIoT SummerSchool
X-GSN in OpenIoT SummerSchool
 
Towards Exascale Simulations of Stellar Explosions with FLASH
Towards Exascale  Simulations of Stellar  Explosions with FLASHTowards Exascale  Simulations of Stellar  Explosions with FLASH
Towards Exascale Simulations of Stellar Explosions with FLASH
 
Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...
Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...
Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...
 

Ähnlich wie GRIMES_Visualizing_Telemetry

Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)
Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)
Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)
Riley Waite
 
Virtual Science in the Cloud
Virtual Science in the CloudVirtual Science in the Cloud
Virtual Science in the Cloud
thetfoot
 
Slide 1
Slide 1Slide 1
Slide 1
butest
 
Referal-Kevin-Grimes
Referal-Kevin-GrimesReferal-Kevin-Grimes
Referal-Kevin-Grimes
Kevin Grimes
 

Ähnlich wie GRIMES_Visualizing_Telemetry (20)

Cognitive Engine: Boosting Scientific Discovery
Cognitive Engine:  Boosting Scientific DiscoveryCognitive Engine:  Boosting Scientific Discovery
Cognitive Engine: Boosting Scientific Discovery
 
Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)
Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)
Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)
 
Virtual Science in the Cloud
Virtual Science in the CloudVirtual Science in the Cloud
Virtual Science in the Cloud
 
The Emerging Cyberinfrastructure for Earth and Ocean Sciences
The Emerging Cyberinfrastructure for Earth and Ocean SciencesThe Emerging Cyberinfrastructure for Earth and Ocean Sciences
The Emerging Cyberinfrastructure for Earth and Ocean Sciences
 
Bruce Damer's presentation of Digital Spaces, an open source 3D simulation pl...
Bruce Damer's presentation of Digital Spaces, an open source 3D simulation pl...Bruce Damer's presentation of Digital Spaces, an open source 3D simulation pl...
Bruce Damer's presentation of Digital Spaces, an open source 3D simulation pl...
 
Larry Smarr - NRP Application Drivers
Larry Smarr - NRP Application DriversLarry Smarr - NRP Application Drivers
Larry Smarr - NRP Application Drivers
 
Cyberinfrastructure to Support Ocean Observatories
Cyberinfrastructure to Support Ocean ObservatoriesCyberinfrastructure to Support Ocean Observatories
Cyberinfrastructure to Support Ocean Observatories
 
Processing Geospatial Data At Scale @locationtech
Processing Geospatial Data At Scale @locationtechProcessing Geospatial Data At Scale @locationtech
Processing Geospatial Data At Scale @locationtech
 
LarKC Tutorial at ISWC 2009 - Introduction
LarKC Tutorial at ISWC 2009 - IntroductionLarKC Tutorial at ISWC 2009 - Introduction
LarKC Tutorial at ISWC 2009 - Introduction
 
The Next Decade of ISS and Beyond
The Next Decade of ISS and BeyondThe Next Decade of ISS and Beyond
The Next Decade of ISS and Beyond
 
Toward a Global Interactive Earth Observing Cyberinfrastructure
Toward a Global Interactive Earth Observing CyberinfrastructureToward a Global Interactive Earth Observing Cyberinfrastructure
Toward a Global Interactive Earth Observing Cyberinfrastructure
 
The Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningThe Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource Provisioning
 
Slide 1
Slide 1Slide 1
Slide 1
 
How to use NCI's national repository of big spatial data collections
How to use NCI's national repository of big spatial data collectionsHow to use NCI's national repository of big spatial data collections
How to use NCI's national repository of big spatial data collections
 
Applying Photonics to User Needs: The Application Challenge
Applying Photonics to User Needs: The Application ChallengeApplying Photonics to User Needs: The Application Challenge
Applying Photonics to User Needs: The Application Challenge
 
Adoption of Software By A User Community: The Montage Image Mosaic Engine Exa...
Adoption of Software By A User Community: The Montage Image Mosaic Engine Exa...Adoption of Software By A User Community: The Montage Image Mosaic Engine Exa...
Adoption of Software By A User Community: The Montage Image Mosaic Engine Exa...
 
Metadata syncronisation with GeoNetwork - a users perspective
Metadata syncronisation with GeoNetwork - a users perspectiveMetadata syncronisation with GeoNetwork - a users perspective
Metadata syncronisation with GeoNetwork - a users perspective
 
Referal-Kevin-Grimes
Referal-Kevin-GrimesReferal-Kevin-Grimes
Referal-Kevin-Grimes
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
 
Godiva2 Overview
Godiva2 OverviewGodiva2 Overview
Godiva2 Overview
 

GRIMES_Visualizing_Telemetry

  • 1. Running head: VISUALIZING AND PROCESSING WEATHER TELEMETRY   i Visualizing and Processing Weather Satellite Telemetry: A Solution Using Big Data Methodologies Kevin M. Grimes, II Mentor: Amalaye Oyake Jet Propulsion Laboratory 20 August 2015
  • 2. VISUALIZING AND PROCESSING WEATHER TELEMETRY ii Abstract The Ocean Surface Topography Mission’s Jason satellites have placed about a terabyte of weather data into a MySQL database for analysis. Because the OSTM’s set of data is so enormous, the processing time required to perform a query using its current setup is quite long. Visualizing the data using the Cyclone tool is inefficient and does not allow for much interaction with the graph. Using the “Big Data” tool Elasticsearch, we have proposed a Cyclone replacement that quickens the process of querying and graphing Jason data by a factor of five. In our proposed tool, the data is ingested into Elasticsearch via Logstash, an Elastic product. Once the data is ingested into Elasticsearch, our visualizer queries it in a way that maximizes efficiency and minimizes time. A specially designed UI allows for quick and easy access to the satellite telemetry. Users may choose to have the data plotted in an interactive graph or printed in a variety of formats. Our project’s functionality provides the user with a quick, easy, and efficient experience that can be implemented as a suitable Cyclone replacement. Keywords: Jason, telemetry, Elasticsearch, Cyclone
  • 3. VISUALIZING AND PROCESSING WEATHER TELEMETRY iii Acknowledgments I would like to preface this report by thanking the multiple people whose efforts led to my project’s completion. Without the support of these people I would not have been able to accomplish what I have this summer. My mentor, Amalaye Oyake - for choosing to bring me on board. It has been exciting and challenging, and I cannot thank you enough for letting me be a part of it. My partner, Daniele Bellutta - for working as hard as, if not harder than, I did. Thanks for working long days with me and seeing this project through. Also, thanks for helping me by answering questions I had; without your help, I would have spent even more time bugging people on Stack Overflow. Various JPL employees, including Dan Isla, Philip Southam, Stefan Eng, and David Mittman - for giving me solutions to problems I was unable to solve.
  • 4. VISUALIZING AND PROCESSING WEATHER TELEMETRY iv Table of Contents Abstract … … ii Acknowledgments … … iii Table of Contents … … iv List of Figures and Examples … … v List of Abbreviations … … vi I. Background and Motivation … … 1 II. Methods … 3 A. Initial Processing … … 3 B. Ingestion … … 4 C. Visualization … … 5 D. Benchmarking … … 7 E. Data Dumping … … 8 III. Conclusions … … 11 IV. Future Work … … 12 References … … 15
  • 5. VISUALIZING AND PROCESSING WEATHER TELEMETRY v List of Figures and Examples Figure 1. Cyclone, the tool currently used to process and visualize Jason telemetry … 2 Figure 2. Apache Spark process flowchart … … 4 Figure 3. Our visualizer prototype … … 6 Figures 4-5. Benchmarking our visualizer with Cyclone … … 8 Example 1. Making an SQL query to dump Elasticsearch data to user’s system while running the script … … 9 Example 2. Making an SQL query via the Perl data dump script’s command line parameters … … 10 Example 3. Running the Perl data dump script with several options from the command line … … 10
  • 6. VISUALIZING AND PROCESSING WEATHER TELEMETRY vi List of Abbreviations CNES … … Centre national d'études spatiales CPAN … … Comprehensive Perl Archive Network CSV … … Comma-separated format DSN … … Deep Space Network ECSV … … Encapsulated comma-separated format FTP … … File transfer protocol GDS … … Ground data system GHE … … GitHub Enterprise JPL … … Jet Propulsion Laboratory KB … … Kilobyte(s) Mb … … Megabit(s) MB … … Megabyte(s) NASA … … National Aeronautics and Space Administration OSTM … … Ocean Surface Topography Mission PP … … Perl Packager TOPEX … … Ocean Topography Experiment UI … … User interface
  • 7. VISUALIZING AND PROCESSING WEATHER TELEMETRY 1 Visualizing and Processing Weather Satellite Telemetry: A Solution Using Big Data Methodologies I. Background and Motivation The Ocean Surface Topography Mission (OSTM) at Jet Propulsion Laboratory (JPL) collects and analyzes data from our planet’s oceans. Their first mission was a collaborative effort with the Centre national d'études spatiales (CNES), the French center for space research. CNES’s Ocean Topography Experiment (TOPEX) merged with JPL’s Poseidon project and launched the TOPEX/Poseidon satellite, commencing OSTM’s first mission. The satellite received data such as oceanic temperatures and ocean levels and sent it via “space packets” to a ground data system (GDS) on Earth where it was processed and stored into a large database. Although the TOPEX/Poseidon mission ended in January 2006, several JPL satellites, including Jason-2, continue to collect, process, and transmit oceanic data for scientific use. The data collected by the OSTM has been extremely valuable in the field of meteorology. Over four hundred scientists from thirty different nations use this data to perform climate research, forecast hurricanes, route ships, and research coral reefs; in particular, the data gathered by the Jason series of satellites is used to monitor changes in oceanic levels.1 Over the last couple decades, the OSTM has accumulated multiple gigabytes of data useful to the scientific community. With such a large set of data, however, comes large processing time. In order for users to process and visualize Jason-2 telemetry, users must complete a long and complicated request via Cyclone, the visualization tool currently in place. Once the request has been submitted, the user must wait an extended amount of time for the graph to be created. For these reasons and more, the OSTM felt that Cyclone needed to either be updated or replaced. Our mentor, Amalaye Oyake, recommended that my partner, Daniele Bellutta, and I
  • 8. VISUALIZING AND PROCESSING WEATHER TELEMETRY 2 experiment with “Big Data” tools such as Elasticsearch2 and Apache Spark.3 After some research into the features of these utilities we devised a configuration that allowed for quick and easy data ingestion and interactive visualization. We downloaded Logstash,4 an Elastic ingestion tool, and ingested data directly into Elasticsearch. Once we ingested the data, we designed a visualizer using JavaScript and D3.js,5 an external library. Finally, we designed a script using the Perl programming language that would download data to the user’s computer directly from Elasticsearch. Once these tools were implemented and our project was completed, we benchmarked our visualizer with Cyclone and observed our results.   After several tests, we proved that our tool queried and visualized Jason-2 telemetry much quicker than Cyclone. We ran multiple tests comparing our tool with Cyclone and concluded that our visualizer works at a rate over five times quicker than Cyclone. Additionally, our tool has a much nicer user-interface (UI) that allows users to customize their query much easier than they could on Cyclone. Despite our success in this regard, however, ingestion is still a Figure 1: Cyclone, the tool currently used to process and visualize Jason telemetry  
  • 9. VISUALIZING AND PROCESSING WEATHER TELEMETRY 3 time-consuming part of the visualization process. Clustering tools such as Apache Spark have the capability to send data to several “workers,” sort it, and send it to Elasticsearch. Both Daniele and I believe that ingestion speed would be reduced dramatically with the help of Spark but were, due to time restrictions, unable to implement it ourselves. Despite slow ingestion, however, the visualizer is a quick and easy-to-use tool that will, alongside our Perl data dump script, be able to serve as a Cyclone replacement. II. Methods   A. Initial Processing Jason-2 collects data from Earth’s oceans and processes it on-site. This data is sent via binary telemetry packets to JPL’s GDS and is relayed to the OSTM. Once this information is received, it must be processed into some sort of format recognizable by ingestion tools. In order to ingest this data into Elasticsearch, Daniele and I needed a method to first covert the binary telemetry packets into encapsulated comma-separated format (ECSV). We were given access to the OSTM’s file-transfer protocol (FTP) system and were able to pull a sampling of data to use in development. Our mentor, Amalaye Oyake, gave us a few Perl scripts that had been used to process binary telemetry of various types and either display it in the Terminal or pipeline it into MySQL. We were able to modify these scripts in such a way that they exported the data to the user’s system in ECSV format. In order to automate the process, we wrote a shell script that would run the telemetry export script repeatedly until all the binaries in the specified directory had been converted. Once we had converted all of the telemetry we were given, we began to explore methods of ingestion into Elasticsearch.
  • 10. VISUALIZING AND PROCESSING WEATHER TELEMETRY 4 B. Ingestion We explored thoroughly two methods of ingestion into Elasticsearch: Apache Spark and Elastic’s Logstash. Spark, a clustering tool used by companies such as Amazon, eBay, and JPL’s own Deep Space Network (DSN),6 takes data parsed by an external parser and sends it to multiple “workers” which process it and return it to the user. Logstash, on the other hand, does not require much parsing prior to ingestion and is much easier to use; however, it is not as powerful as Apache Spark. For the sake of ease, we chose to ingest our data with Logstash but hope that future programmers could incorporate Spark’s clustering capabilities into the visualization process. Although extremely powerful, Spark is particularly picky concerning how it reads in data and how it returns it to the user. Daniele and I spent the first week of our summer program developing a parser in Scala that would pipeline the ECSV files generated by our Perl scripts into Spark. Once we had the data ingested into Spark, we attempted to process and sort it with the help of “workers.” The final step in the Apache Spark ingestion was sending the processed and sorted data from Spark into Elasticsearch. We experimented with Spark functions that would Figure 2: Apache Spark process flowchart
  • 11. VISUALIZING AND PROCESSING WEATHER TELEMETRY 5 pipeline the data into Elasticsearch but met several complications. Rather than invest another week into configuring Spark, we decided to turn our focus to an easier-to-use tool, Logstash. Logstash, an Elastic product, proved to be much simpler than Spark. While we spent nearly a week developing a parser that would send data to Spark, we spent only one day implementing Logstash. We developed a configuration file that would tell Logstash how to ingest the data. This file included the names of the various telemetry, the index under which the ingested data would be stored, and the path to the files to be ingested. After a few minutes passed, our data was successfully ingested into Elasticsearch in a neat and organized manner. Although Logstash is easier to use than Spark, it takes much more time to ingest data. When we ingested a relatively small amount of data of ~50 megabytes (MB), Logstash needed a few minutes to process and send the data to Elasticsearch. If we were to try to ingest terabytes of data, Logstash would need to run for hours. In this regard, our project still has room to grow. Despite slow ingestion, however, we found that Logstash was able to meet our needs for our small sampling of data and began to explore visualization techniques. C. Visualization Once the telemetry was ingested into Elasticsearch, we began development on our visualizer. We considered a few external libraries that seemed to be reliable and efficient and settled on coding the visualizer in JavaScript with the help of the D3.js external library. The D3.js library provided dozens of graphing functions that were invaluable to us during development. Two thousand lines of code later, we had produced a tool that was quick and easy to use. The visualizer queries the Elasticsearch database multiple times. Initially, it queries for the range of dates, fields, and a few other pieces of data. This data is used in several drop-down
  • 12. VISUALIZING AND PROCESSING WEATHER TELEMETRY 6 menus and boxes to allow the user to customize his/her results. The user may decide to either use our UI with drop-down boxes or he/she may format a request in SQL via a SQL plug-in we installed.7 Once the user submits the request, it is sent to Elasticsearch and processed in an initial query followed by a series of scroll queries. The initial query queries Elasticsearch for a fixed amount of data and returns a “scroll ID” that can be used to pick up where that query left off. A scroll query is then made using the scroll ID returned by the initial query and retrieves the rest of the data.   As Elasticsearch returns the final packets of data, the visualizer begins to create a graph. This graph—depending upon the user’s settings—may contain all of the points returned by the Elasticsearch queries or it may be only an averaging of the data. If the user desires, the visualizer can count the number of points in each vertical column of pixels and calculate an average using Elasticsearch’s “aggregation” feature. This significantly reduces the number of points being plotted while still keeping the general trend of the larger data set. In addition to averaging, the data may also be scaled by a scaling factor. Once the graph has been plotted, the user may hover Figure 3: Our visualizer prototype
  • 13. VISUALIZING AND PROCESSING WEATHER TELEMETRY 7 his/her cursor over the points and a “tooltip” will draw a vertical line through the nearest point and show on the side which point is being studied and the corresponding value of that point. Additionally, the user may zoom in and out using a slider at the bottom of the screen. As the user pans from side to side or zooms, Elasticsearch is queried for the points needed. This “dynamic querying” feature of the visualizer, combined with the various settings, provides for a quick and easy user experience. Once the visualizer was developed, we used Wireshark,8 a network protocol analyzer, to determine how much bandwidth our visualizer used while performing queries. When we queried three hours’ worth of data without using aggregation, for example, Elasticsearch returned about 13.5 megabits (Mb) of data over twenty-six 65 kilobyte (KB) packets. When the aggregation feature was applied, 0.5 Mb of bandwidth was used over one 65 KB packet. The 65 KB packet size remains constant as long as the graph size remains constant, as one point of data is loaded per column of pixels. Because Elasticsearch returns data in these small packets, there is no potential for heavy network traffic. D. Benchmarking In order to determine whether or not our visualizer performed better than Cyclone (the visualization tool currently being used by the OSTM), Daniele and I performed a series of tests. We took great precaution to ensure that both tools plotted the same amount of data in these tests. To accomplish this, we found that the “bucket size” setting in our visualizer functioned similarly to the “fidelity” setting in Cyclone. The two were set to produce nearly the same amount of data and then a total of twenty-four tests were run. Setting the two visualizers in such a way that they both produced the same amount of data was accomplished via our visualizer’s “bucket size” setting and Cyclone’s “fidelity” setting.
  • 14. VISUALIZING AND PROCESSING WEATHER TELEMETRY 8 In our visualizer’s JavaScript the amount of data points collected for each averaging instance can be set. The smaller the number is, the larger the amount of data points that will be plotted. In Cyclone, the “fidelity” setting determines how true to the original data set the resulting graph should be. Just as with our visualizer, the smaller this number is, the more points will be graphed. Once the two tools were calibrated to produce the same amount of points, we began testing. Each tool was tested a total of twelve times: three times we tested over two time intervals across two different fields. We were able to conclusively state that our visualizer ran over five times faster than Cyclone. Our Visualizer Interval 13 July 2014 09:00 – 15 July 2014 06:00 19 July 2014 15:00 – 21 July 2014 09:00 Field GPS A Current AMR V Current GPS A Current AMR V Current Trial #1 5.57 s 6.62 s 4.17 s 4.98 s Trial #2 5.99 s 5.94 s 4.14 s 4.83 s Trial #3 6.61 s 6.16 s 3.90 s 5.90 s Field Average 6.06 s 6.24 s 4.07 s 5.24 s Interval Average 6.15 s 4.65 s Overall Average 5.40 s     Cyclone   Interval   13 July 2014 09:00 – 15 July 2014 06:00   19 July 2014 15:00 – 21 July 2014 09:00   Field   GPS A Current   AMR V Current   GPS A Current   AMR V Current   Trial #1   30.55 s   32.01 s   28.42 s   24.91 s   Trial #2   27.96 s   27.53 s   28.06 s   23.93 s   Trial #3   31.52 s   30.69 s   27.10 s   25.15 s   Field Average   30.01 s   30.08 s   27.86 s   24.66 s   Interval Average 30.04 s 26.26 s Overall Average 28.15 s   Figures 4-5: Benchmarking our visualizer (top) with Cyclone (bottom)
  • 15. VISUALIZING AND PROCESSING WEATHER TELEMETRY 9 E. Data Dumping In an attempt to make our project as developer-friendly as possible, we developed a tool, 1700 lines of code long, separate from the visualizer that would pull data stored in Elasticsearch to the user’s system. This data can be used for statistical analysis and in development of other visualization tools. If the script is run from the command line without any parameters, it displays a welcome message and the main menu. At this menu, the user is asked whether he/she would like to query via a format similar to that of the visualizer’s “drop-down menu” syntax or if they would like to query via a SQL plug-in. If they choose to query via the SQL plug-in, they type their query just as they would type any other SQL query. Consider the following example. In Example 1, the index “jason-3” and APID type “260” are chosen. The query will return values from the “lraTemp” and “dt” fields where “lraTemp” is in-between 280 and 295. The “LIMIT” part of this query specifies how many results can be returned by the query. Since the user in Example 1 entered a limit of 10,050 no more than 10,050 results will be exported. Unfortunately, if too large of a query size is entered, the SQL plug-in will crash. Because of this, a default query limit of 100,000 has been set in the script’s code. After the user enters his/her query, they will be asked in which file format they would like to have their results exported. If no results are found, the script will say so. Otherwise, an output file will be created with their results in their specified format. If the user decided to use the script’s syntax rather than use the SQL plug-in, they will be SELECT lraTemp,dt FROM jason-3/260 WHERE lraTemp BETWEEN 280 AND 295 LIMIT 10050 Example 1: Making an SQL query to dump Elasticsearch data to user’s system while running the script
If the user decides to use the script's own syntax rather than the SQL plug-in, he or she is prompted to answer a series of questions about the query. The first asks whether to query and export the entire data set or only a section of it; if specific dates are desired, the script prompts for them. The next prompt asks whether to apply a scaling factor; if one is entered, every numeric value printed is multiplied by it. The user is then asked which fields to export. A user who does not know which fields are available may type "list" to see them on screen; otherwise, the user may type the individual fields desired, or "all" for all of them. Finally, the script asks in which format the data should be delivered. After some time, a file is generated containing the requested data.

Another feature of the data dumping script is its ability to run entirely from the command line. For example, a user who wants to issue a SQL query can type "--sql" followed by the query. A specific APID or ingestion version can be chosen with the "--s" flag, and several extra settings can be specified with the "--e" flag. Consider the following example.

user@mycomp:$ perl data-dumper.pl --sql csv SELECT lraTemp,dt FROM jason-3/260 WHERE lraTemp BETWEEN 280 AND 295 LIMIT 10050

Example 2: Making an SQL query via the Perl data dump script's command line parameters

Example 2 accomplishes everything Example 1 accomplished, but from the command line; no interaction with the script's interface is needed. Consider another example.
user@mycomp:$ perl data-dumper.pl --e 2014-12-11T17:21:30.000Z 2014-12-11T17:21:45.000Z csv 100 apid lraTemp --s http://localhost:9200/ jason-3 260 10000 1 0.100

Example 3: Running the Perl data dump script with several options from the command line

Example 3, although seemingly complicated, accomplishes a great deal. The two dates following the "--e" flag tell the script to query all of the data between them. The "csv" parameter tells the script to export the results in comma-separated-value (CSV) format, and the "100" immediately following is the scaling factor, that is, the number by which every numeric value is multiplied. The parameters that follow (up to the "--s" flag) are the fields to be queried. After the "--s" flag come a few more parameters: the domain on which Elasticsearch is hosted, the ingestion version (index), the telemetry APID type, the query size (the maximum number of packets each query to Elasticsearch may produce), whether to round the results (1 for yes, 0 for no), and by how much to round them (in this case, all results are rounded to the nearest thousandth). All of these settings can also be set via the script's prompts, but setting them from the command line is useful when the data dumper needs to be called from another script.

Once the script was completed, we compiled it into an image to be shipped via Docker.9 Docker is an open-source tool that automates the deployment of applications inside software containers that can be run from any system. Using a configuration file, we included all of the required CPAN (Comprehensive Perl Archive Network) modules inside the image so that the user does not need to install anything other than Docker itself in order to run the script. Packaging everything needed to run the script into an image significantly reduces the work required on the user's part; alternatively, users may download the Perl files directly from a JPL GHE repository.
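As an illustration of this packaging, an image definition along the following lines would suffice. This is a sketch only: the real Dockerfile lives in the JPL GHE repository, and the base image and CPAN module names shown here (Search::Elasticsearch, JSON) are representative assumptions, not the script's actual dependency list.

# Sketch of a Dockerfile for the data dump script; the module list is
# illustrative, not the script's actual dependency list.
FROM perl:5.20

# Bake the CPAN dependencies into the image so that the user needs
# nothing beyond Docker itself.
RUN cpanm Search::Elasticsearch JSON

# Copy the script in and make it the container's entry point, so flags
# like those in Examples 2 and 3 can be appended straight to `docker run`.
COPY data-dumper.pl /usr/src/app/
WORKDIR /usr/src/app
ENTRYPOINT ["perl", "data-dumper.pl"]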
III. Conclusions

At the beginning of the summer, our mentor asked us to develop a quick, easy-to-use tool that would ingest and visualize OSTM telemetry. The OSTM already had a system in place, but they wanted a solution built on "Big Data" tools that would reduce the time needed to produce a graph. In this regard, we have been more than successful.

Concerning ingestion, we implemented a simple (albeit somewhat slow) solution: we installed and calibrated Logstash so that it dynamically ingests data into Elasticsearch. ECSV files can be dragged and dropped into the ingestion folder, from which they are automatically ingested into Elasticsearch. Although this method of ingestion is slow, it is easy to use and can be replaced by other ingestion methods should the need arise.

We have created a visualizer that is efficient and easy to use. The tool is set up so that new data is not queried until it is needed, and points are filtered using an averaging technique that processes them on a pixel-by-pixel basis. As a result of the Elasticsearch implementation, our tool runs five times faster than Cyclone, the visualizer currently in place.

Finally, we have taken large steps toward generalizing our tool. Comments appear throughout every file of code. The data dumping utility will help future programmers develop visualization tools of their own and help statisticians draw conclusions about the satellite's observations. Additionally, we installed a SQL plug-in into the visualizer so that those familiar with SQL can easily visualize the data they desire. For easy access to our summer project, all of our work has been pushed to JPL GitHub Enterprise repositories; the OSTM may use these files as-is with Jason-2 data, or modify them slightly to work with Jason-3, SWOT, or Jason-CS data.
The goal of this summer was to make an enormously complicated process seem simple to the user. We spent a large portion of our ten-week internship learning the various pieces of software and languages with which we needed to code; as we learned more and more, we coded with the hope that users would not have to spend the same amount of time learning that we did. With the tools we have coded and Elasticsearch in place, we feel that everyone in the OSTM will be able to access Jason telemetry easily and quickly with only the most basic understanding of OSTM telemetry.

IV. Future Work

Although Daniele and I accomplished much during our internship and produced significant results, there is more work that can be done to improve our visualizer and the visualization process as a whole. We were able to speed up the front end of the visualizer with features such as dynamic querying, but we were not able to speed up the ingestion process. To do so, the OSTM should consider converting the telemetry binaries on the JPL Cloud and ingesting the ECSVs on a cluster.

At the moment, Jason telemetry packets are processed by a single Perl script, one file at a time. Working with roughly 50 MB of data, we found that the Perl conversion script took about twenty minutes to process it all. At that rate (about 2.5 MB per minute), converting the entire terabyte of Jason telemetry would take on the order of 400,000 minutes of processing, the better part of a year. To escape this, the OSTM can "containerize" the Perl conversion scripts and run them on the JPL Cloud. By placing the conversion scripts into individual containers, dozens of them can be run on the Cloud at a time: rather than one script processing all n data files one after another, the n files could be sent to n containers, each of which would have to process only a single file.
The only real downside to sending each telemetry file to its own container is that doing so could be very expensive and demanding on the JPL Cloud. To avoid this, the OSTM could instead send twenty or so binary files to each container. Doing so would take twenty times as long as sending each file to its own container, but it would also place one twentieth of the load on the JPL Cloud; a sketch of this batching scheme appears at the end of this section. Moving telemetry conversion to the Cloud will save the OSTM much time and effort. Dan Isla and Philip Southam, two developers at JPL, have offered to help the OSTM containerize the Perl conversion scripts and have already made significant progress toward this end; once they have finished, the OSTM should be able to convert its binaries at a very quick rate.

A second enhancement that could be made to our summer project is clustering the ingestion process. Currently, Logstash ingests the ECSVs into Elasticsearch; although Logstash is very easy to use, it is a bit slow. To speed up ingestion, the OSTM could look into ingesting via Apache Spark. Spark is a utility designed to work on a cluster: if a parser were developed to hand the telemetry to Spark, the data could then be ingested into Elasticsearch directly.
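As a sketch of the batching scheme described above, the following snippet splits a list of telemetry binaries into batches of twenty and launches one conversion container per batch. The image name "ostm/converter" and the raw use of docker run are placeholders; an actual deployment would go through the JPL Cloud's own scheduling tools.

// Sketch: fan n telemetry binaries out to containers in batches of 20.
// "ostm/converter" is a placeholder image name, not a real JPL image.
var execFile = require("child_process").execFile;

function launchBatches(files, batchSize) {
  for (var i = 0; i < files.length; i += batchSize) {
    var batch = files.slice(i, i + batchSize);
    // Each container receives one batch of file paths as its arguments.
    execFile("docker", ["run", "--rm", "ostm/converter"].concat(batch),
      function (err) {
        if (err) { console.error("Batch failed:", err); }
      });
  }
}

// File paths are supplied on the command line.
launchBatches(process.argv.slice(2), 20);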
References

1. Jet Propulsion Laboratory (n.d.). Mission Basics. Accessed from http://sealevel.jpl.nasa.gov/overview/missionbasics/
2. Elasticsearch is a free, open-source tool available to download from https://www.elastic.co/products/elasticsearch/
3. Apache Spark is a free, open-source tool available to download from http://spark.apache.org/
4. Logstash is a free tool available to download from https://www.elastic.co/products/logstash/
5. D3 is a free JavaScript library available to download from http://d3js.org/
6. Konwinski, Andy (2015, August 14). Powered by Spark. Accessed from https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark/
7. User NLPchina's elasticsearch-sql is an open-source plug-in available from https://github.com/NLPchina/elasticsearch-sql/
8. Wireshark is a free tool available to download from https://www.wireshark.org/
9. Docker is a free tool available to download from https://www.docker.com/