Visualizing and Processing Weather Satellite Telemetry:
A Solution Using Big Data Methodologies
Kevin M. Grimes, II
Mentor: Amalaye Oyake
Jet Propulsion Laboratory
20 August 2015
Abstract
The Ocean Surface Topography Mission’s Jason satellites have placed about a terabyte of
weather data into a MySQL database for analysis. Because the OSTM’s set of data is so
enormous, the processing time required to perform a query using its current setup is quite long.
Visualizing the data using the Cyclone tool is inefficient and does not allow for much interaction
with the graph. Using the “Big Data” tool Elasticsearch, we have proposed a Cyclone
replacement that quickens the process of querying and graphing Jason data by a factor of five. In
our proposed tool, the data is ingested into Elasticsearch via Logstash, an Elastic product. Once
the data is ingested into Elasticsearch, our visualizer queries it in a way that maximizes
efficiency and minimizes time. A specially designed UI allows for quick and easy access to the
satellite telemetry. Users may choose to have the data plotted in an interactive graph or printed in
a variety of formats. Our tool provides the user with a quick, easy, and efficient experience and can serve as a suitable Cyclone replacement.
Keywords: Jason, telemetry, Elasticsearch, Cyclone
Acknowledgments
I would like to preface this report by thanking the multiple people whose efforts led to
my project’s completion. Without the support of these people I would not have been able to
accomplish what I have this summer.
My mentor, Amalaye Oyake - for choosing to bring me on board. It has been exciting and
challenging, and I cannot thank you enough for letting me be a part of it.
My partner, Daniele Bellutta - for working as hard as, if not harder than, I did. Thanks for
working long days with me and seeing this project through. Also, thanks for helping me by
answering questions I had; without your help, I would have spent even more time bugging
people on Stack Overflow.
Various JPL employees, including Dan Isla, Philip Southam, Stefan Eng, and David
Mittman - for giving me solutions to problems I was unable to solve.
Table of Contents
Abstract
Acknowledgments
Table of Contents
List of Figures and Examples
List of Abbreviations
I. Background and Motivation
II. Methods
   A. Initial Processing
   B. Ingestion
   C. Visualization
   D. Benchmarking
   E. Data Dumping
III. Conclusions
IV. Future Work
References
List of Figures and Examples
Figure 1. Cyclone, the tool currently used to process and visualize Jason telemetry
Figure 2. Apache Spark process flowchart
Figure 3. Our visualizer prototype
Figures 4-5. Benchmarking our visualizer with Cyclone
Example 1. Making an SQL query to dump Elasticsearch data to user’s system while running the script
Example 2. Making an SQL query via the Perl data dump script’s command line parameters
Example 3. Running the Perl data dump script with several options from the command line
List of Abbreviations
CNES: Centre national d'études spatiales
CPAN: Comprehensive Perl Archive Network
CSV: Comma-separated format
DSN: Deep Space Network
ECSV: Encapsulated comma-separated format
FTP: File transfer protocol
GDS: Ground data system
GHE: GitHub Enterprise
JPL: Jet Propulsion Laboratory
KB: Kilobyte(s)
Mb: Megabit(s)
MB: Megabyte(s)
NASA: National Aeronautics and Space Administration
OSTM: Ocean Surface Topography Mission
PP: Perl Packager
TOPEX: Ocean Topography Experiment
UI: User interface
Visualizing and Processing Weather Satellite Telemetry:
A Solution Using Big Data Methodologies
I. Background and Motivation
The Ocean Surface Topography Mission (OSTM) at Jet Propulsion Laboratory (JPL)
collects and analyzes data from our planet’s oceans. Their first mission was a collaborative effort
with the Centre national d'études spatiales (CNES), the French center for space research.
CNES’s Ocean Topography Experiment (TOPEX) merged with JPL’s Poseidon project and
launched the TOPEX/Poseidon satellite, commencing OSTM’s first mission. The satellite collected data such as oceanic temperatures and ocean levels and sent it via “space packets” to a ground data system (GDS) on Earth, where it was processed and stored in a large database.
Although the TOPEX/Poseidon mission ended in January 2006, several JPL satellites, including
Jason-2, continue to collect, process, and transmit oceanic data for scientific use.
The data collected by the OSTM has been extremely valuable in the field of meteorology.
Over four hundred scientists from thirty different nations use this data to perform climate
research, forecast hurricanes, route ships, and research coral reefs; in particular, the data gathered
by the Jason series of satellites is used to monitor changes in oceanic levels [1]. Over the last couple of decades, the OSTM has accumulated multiple gigabytes of data useful to the scientific
community. With such a large set of data, however, comes long processing time. To process and visualize Jason-2 telemetry, users must complete a long and complicated
request via Cyclone, the visualization tool currently in place. Once the request has been
submitted, the user must wait an extended amount of time for the graph to be created. For these
reasons and more, the OSTM felt that Cyclone needed to either be updated or replaced.
Our mentor, Amalaye Oyake, recommended that my partner, Daniele Bellutta, and I
experiment with “Big Data” tools such as Elasticsearch [2] and Apache Spark [3]. After some research into the features of these utilities, we devised a configuration that allowed for quick and easy data
ingestion and interactive visualization. We downloaded Logstash [4],
an Elastic ingestion tool, and
ingested data directly into Elasticsearch. Once we ingested the data, we designed a visualizer
using JavaScript and D3.js [5],
an external library. Finally, we designed a script using the Perl
programming language that would download data to the user’s computer directly from
Elasticsearch. Once these tools were implemented and our project was completed, we
benchmarked our visualizer against Cyclone and observed the results.
After several tests, we demonstrated that our tool queried and visualized Jason-2 telemetry much more quickly than Cyclone. We ran multiple tests comparing our tool with Cyclone and concluded that our visualizer runs over five times faster than Cyclone. Additionally, our tool has a much cleaner user interface (UI) that allows users to customize their queries more easily than they could in Cyclone. Despite our success in this regard, however, ingestion is still a
time-consuming part of the visualization process. Clustering tools such as Apache Spark have the
capability to send data to several “workers,” sort it, and send it to Elasticsearch. Both Daniele
and I believe that ingestion time would be reduced dramatically with the help of Spark but
were, due to time restrictions, unable to implement it ourselves. Despite slow ingestion, however,
the visualizer is a quick and easy-to-use tool that will, alongside our Perl data dump script, be
able to serve as a Cyclone replacement.

Figure 1: Cyclone, the tool currently used to process and visualize Jason telemetry
II. Methods
A. Initial Processing
Jason-2 collects data from Earth’s oceans and processes it on board. This data is sent via
binary telemetry packets to JPL’s GDS and is relayed to the OSTM. Once this information is
received, it must be processed into some sort of format recognizable by ingestion tools. In order
to ingest this data into Elasticsearch, Daniele and I needed a method to first convert the binary
telemetry packets into encapsulated comma-separated format (ECSV).
We were given access to the OSTM’s file-transfer protocol (FTP) system and were able
to pull a sampling of data to use in development. Our mentor, Amalaye Oyake, gave us a few
Perl scripts that had been used to process binary telemetry of various types and either display it
in the Terminal or pipeline it into MySQL. We were able to modify these scripts in such a way
that they exported the data to the user’s system in ECSV format. In order to automate the
process, we wrote a shell script that would run the telemetry export script repeatedly until all the
binaries in the specified directory had been converted. Once we had converted all of the
telemetry we were given, we began to explore methods of ingestion into Elasticsearch.
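The shell wrapper mentioned above was only a few lines long; a minimal sketch follows, with the Perl script name, directory layout, and file extension as illustrative assumptions rather than the actual OSTM names.

#!/bin/sh
# Hypothetical sketch of the batch-conversion wrapper: invoke the (assumed)
# Perl export script once per binary file until the directory is exhausted.
for f in /data/jason2/binaries/*.dat; do
    perl telemetry-to-ecsv.pl "$f" > "ecsv/$(basename "$f" .dat).ecsv"
done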
B. Ingestion
We thoroughly explored two methods of ingestion into Elasticsearch: Apache Spark and
Elastic’s Logstash. Spark, a clustering tool used by companies such as Amazon, eBay, and JPL’s
own Deep Space Network (DSN) [6],
takes data parsed by an external parser and sends it to
multiple “workers” which process it and return it to the user. Logstash, on the other hand, does
not require much parsing prior to ingestion and is much easier to use; however, it is not as
powerful as Apache Spark. For the sake of ease, we chose to ingest our data with Logstash but
hope that future programmers could incorporate Spark’s clustering capabilities into the
visualization process.
Although extremely powerful, Spark is particularly picky concerning how it reads in data
and how it returns it to the user. Daniele and I spent the first week of our summer program
developing a parser in Scala that would pipeline the ECSV files generated by our Perl scripts into
Spark. Once we had the data ingested into Spark, we attempted to process and sort it with the
help of “workers.” The final step in the Apache Spark ingestion was sending the processed and
sorted data from Spark into Elasticsearch. We experimented with Spark functions that would pipeline the data into Elasticsearch but met several complications. Rather than invest another week into configuring Spark, we decided to turn our focus to an easier-to-use tool, Logstash.

Figure 2: Apache Spark process flowchart
Logstash, an Elastic product, proved to be much simpler than Spark. While we spent
nearly a week developing a parser that would send data to Spark, we spent only one day
implementing Logstash. We developed a configuration file that would tell Logstash how to
ingest the data. This file included the names of the various telemetry fields, the index under which the
ingested data would be stored, and the path to the files to be ingested. After a few minutes
passed, our data was successfully ingested into Elasticsearch in a neat and organized manner.
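For illustration, a configuration file of the shape described above might look like the following sketch; the path, field names, and index are placeholders (the field names are borrowed from the data dump examples later in this report), not our actual production values.

input {
  file {
    # Path to the converted ECSV telemetry files (placeholder).
    path => "/data/jason2/ecsv/*.ecsv"
    start_position => "beginning"
  }
}
filter {
  csv {
    # Names of the telemetry fields in each row (illustrative).
    columns => ["dt", "apid", "lraTemp"]
  }
}
output {
  elasticsearch {
    # Index under which the ingested data is stored (illustrative).
    index => "jason-2"
  }
}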
Although Logstash is easier to use than Spark, it takes much more time to ingest data.
When we ingested a relatively small data set of ~50 megabytes (MB), Logstash needed a
few minutes to process and send the data to Elasticsearch. If we were to try to ingest terabytes of
data, Logstash would need to run for hours. In this regard, our project still has room to grow.
Despite slow ingestion, however, we found that Logstash was able to meet our needs for our
small sampling of data and began to explore visualization techniques.
C. Visualization
Once the telemetry was ingested into Elasticsearch, we began development on our
visualizer. We considered a few external libraries that seemed to be reliable and efficient and
settled on coding the visualizer in JavaScript with the help of the D3.js external library. The
D3.js library provided dozens of graphing functions that were invaluable to us during
development. Two thousand lines of code later, we had produced a tool that was quick and easy
to use.
The visualizer queries the Elasticsearch database multiple times. Initially, it queries for
the range of dates, fields, and a few other pieces of data. This data is used in several drop-down
menus and boxes to allow the user to customize his/her results. The user may decide to either use
our UI with drop-down boxes or he/she may format a request in SQL via a SQL plug-in we
installed [7].
Once the user submits the request, it is sent to Elasticsearch and processed in an initial
query followed by a series of scroll queries. The initial query asks Elasticsearch for a fixed
amount of data and returns a “scroll ID” that can be used to pick up where that query left off. A
scroll query is then made using the scroll ID returned by the initial query and retrieves the rest of
the data.	
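The pattern can be sketched with curl as follows; the index, field, and request bodies are illustrative placeholders rather than the visualizer’s actual JavaScript, and the exact scroll endpoints have varied across Elasticsearch versions.

# Initial query: fetch one batch of results and a scroll ID that remains
# valid for one minute (index name and query are placeholders).
curl -s "http://localhost:9200/jason-3/_search?scroll=1m" -d '{
  "size": 10000,
  "query": { "match_all": {} }
}'

# Scroll query: pass the _scroll_id from the previous response to pick up
# where the initial query left off; repeat until no hits remain.
curl -s "http://localhost:9200/_search/scroll?scroll=1m" -d 'SCROLL_ID_FROM_PREVIOUS_RESPONSE'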
  
As Elasticsearch returns the final packets of data, the visualizer begins to create a graph.
This graph—depending upon the user’s settings—may contain all of the points returned by the
Elasticsearch queries or it may be only an averaging of the data. If the user desires, the visualizer
can count the number of points in each vertical column of pixels and calculate an average using
Elasticsearch’s “aggregation” feature. This significantly reduces the number of points being
plotted while still keeping the general trend of the larger data set. In addition to averaging, the
data may also be scaled by a scaling factor. Once the graph has been plotted, the user may hover his/her cursor over the points, and a “tooltip” will draw a vertical line through the nearest point
and show on the side which point is being studied and the corresponding value of that point.
Additionally, the user may zoom in and out using a slider at the bottom of the screen. As the user
pans from side to side or zooms, Elasticsearch is queried for the points needed. This “dynamic
querying” feature of the visualizer, combined with the various settings, provides for a quick and
easy user experience.

Figure 3: Our visualizer prototype
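The pixel-column averaging described above maps naturally onto Elasticsearch’s aggregation API. A hedged curl sketch follows, with placeholder index and field names and an interval standing in for “time span divided by graph width in pixels.”

# One bucket per column of pixels: bucket the time axis, then average the
# plotted field within each bucket (names and interval are illustrative).
curl -s "http://localhost:9200/jason-3/_search" -d '{
  "size": 0,
  "aggs": {
    "per_pixel_column": {
      "date_histogram": { "field": "dt", "interval": "10s" },
      "aggs": { "bucket_avg": { "avg": { "field": "lraTemp" } } }
    }
  }
}'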
Once the visualizer was developed, we used Wireshark [8],
a network protocol analyzer, to
determine how much bandwidth our visualizer used while performing queries. When we queried
three hours’ worth of data without using aggregation, for example, Elasticsearch returned about
13.5 megabits (Mb) of data over twenty-six 65 kilobyte (KB) packets. When the aggregation
feature was applied, 0.5 Mb of bandwidth was used over one 65 KB packet. The 65 KB packet
size remains constant as long as the graph size remains constant, as one point of data is loaded
per column of pixels. Because Elasticsearch returns data in these small packets, there is little potential for heavy network traffic.
D. Benchmarking
In order to determine whether or not our visualizer performed better than Cyclone (the
visualization tool currently being used by the OSTM), Daniele and I performed a series of tests.
We took great care to ensure that both tools plotted the same amount of data in these tests.
To accomplish this, we found that the “bucket size” setting in our visualizer functioned similarly
to the “fidelity” setting in Cyclone. The two were set to produce nearly the same amount of data
and then a total of twenty-four tests were run.
In our visualizer’s JavaScript, the number of data points collected for each averaging instance can
be set. The smaller this number is, the more data points will be plotted. In
Cyclone, the “fidelity” setting determines how true to the original data set the resulting graph
should be. Just as with our visualizer, the smaller this number is, the more points will be
graphed.
Once the two tools were calibrated to produce the same amount of points, we began
testing. Each tool was tested a total of twelve times: three trials for each of two time intervals across two different fields. From these results, we concluded that our visualizer ran over five
times faster than Cyclone.
Our Visualizer

Interval            13 July 2014 09:00 – 15 July 2014 06:00    19 July 2014 15:00 – 21 July 2014 09:00
Field               GPS A Current    AMR V Current             GPS A Current    AMR V Current
Trial #1            5.57 s           6.62 s                    4.17 s           4.98 s
Trial #2            5.99 s           5.94 s                    4.14 s           4.83 s
Trial #3            6.61 s           6.16 s                    3.90 s           5.90 s
Field Average       6.06 s           6.24 s                    4.07 s           5.24 s
Interval Average    6.15 s                                     4.65 s
Overall Average     5.40 s

Cyclone

Interval            13 July 2014 09:00 – 15 July 2014 06:00    19 July 2014 15:00 – 21 July 2014 09:00
Field               GPS A Current    AMR V Current             GPS A Current    AMR V Current
Trial #1            30.55 s          32.01 s                   28.42 s          24.91 s
Trial #2            27.96 s          27.53 s                   28.06 s          23.93 s
Trial #3            31.52 s          30.69 s                   27.10 s          25.15 s
Field Average       30.01 s          30.08 s                   27.86 s          24.66 s
Interval Average    30.04 s                                    26.26 s
Overall Average     28.15 s

Figures 4-5: Benchmarking our visualizer (top) with Cyclone (bottom)
E. Data Dumping
In an attempt to make our project as developer-friendly as possible, we developed a separate tool, roughly 1,700 lines of code, that pulls data stored in Elasticsearch to the user’s system. This data can be used for statistical analysis and in the development of other
visualization tools. If the script is run from the command line without any parameters, it displays
a welcome message and the main menu. At this menu, the user is asked whether he/she would
like to query via a format similar to that of the visualizer’s “drop-down menu” syntax or if they
would like to query via a SQL plug-in. If they choose to query via the SQL plug-in, they type
their query just as they would type any other SQL query. Consider the following example.

SELECT lraTemp,dt FROM jason-3/260 WHERE lraTemp BETWEEN 280 AND 295 LIMIT 10050

Example 1: Making an SQL query to dump Elasticsearch data to user’s system while running the script
In Example 1, the index “jason-3” and APID type “260” are chosen. The query will
return values from the “lraTemp” and “dt” fields where “lraTemp” is between 280 and 295.
The “LIMIT” part of this query specifies how many results can be returned by the query. Since
the user in Example 1 entered a limit of 10,050, no more than 10,050 results will be exported.
Unfortunately, if too large of a query size is entered, the SQL plug-in will crash. Because of this,
a default query limit of 100,000 has been set in the script’s code. After the user enters his/her
query, they will be asked in which file format they would like to have their results exported. If no
results are found, the script will say so. Otherwise, an output file will be created with their results
in their specified format.
If the user decides to use the script’s syntax rather than the SQL plug-in, they will be
prompted to answer a series of questions concerning their query. They will first be asked whether
they would like to query and export the entire data set or a section of it. If they would like to
query specific dates, the script will prompt them for those dates. Otherwise, the script will continue to
the next prompt: whether or not the user would like to enter a scaling factor. If entered, this
number will be multiplied by every numeric value being printed. The next prompt asks the user
to specify which fields he/she would like to export. If the user does not know which fields are
available, he/she may type “list” and see the list appear on the screen. Otherwise, they may type
the individual fields they would like to see printed or “all” if they would like all of them. Finally,
the script prompts the user for the format in which they would like their data. After some time,
a file is generated containing the user’s requested data.
Another feature of the data dumping script is its ability to run entirely from the command
line. For example, if the user would like to query via a SQL query, he/she can type “--sql”
followed by their query. If they would like to choose a specific APID or ingestion version, they
can specify them with the “--s” flag. Several extra settings can be specified with the “--e” flag.
Consider the following example.

user@mycomp:$ perl data-dumper.pl --sql csv SELECT lraTemp,dt FROM jason-3/260 WHERE lraTemp BETWEEN 280 AND 295 LIMIT 10050

Example 2: Making an SQL query via the Perl data dump script’s command line parameters

Example 2 accomplishes everything Example 1 accomplished, but from the command line. No interaction with the script’s interface is needed. Consider another example.

user@mycomp:$ perl data-dumper.pl --e 2014-12-11T17:21:30.000Z 2014-12-11T17:21:45.000Z csv 100 apid lraTemp --s http://localhost:9200/ jason-3 260 10000 1 0.100

Example 3: Running the Perl data dump script with several options from the command line
Example 3, although seemingly complicated, accomplishes a lot. The two dates following the “--e” flag tell the script to query all of the data between them. The “csv” parameter tells the
script to export the results in comma-separated format (CSV). The “100” immediately following
is the scaling factor—that is, by how much to multiply every numeric value. The parameters that
follow (up until the “--s” flag) are the fields that will be queried. After the “--s” flag are a few
more parameters: the domain on which Elasticsearch is hosted, the ingestion version (index),
the telemetry APID type, the query size (the maximum number of packets each query to
Elasticsearch will produce), whether or not to round the results (1 for yes and 0 for no), and by
how much to round the results (in this case, all results will be rounded to the nearest thousandth).
All of these settings could be set via script prompts, but having the ability to set them from the
command line can be useful in situations where the data dumper script needs to be called from
another script.
Once the script was completed, we compiled it into an image to be shipped via Docker [9].
Docker is an open-source tool that automates the deployment of applications inside software
containers that can be run from any system. Using a configuration file, we included all of the
different CPAN (Comprehensive Perl Archive Network) modules inside the image so that the
user does not need to install anything other than Docker in order to run the script. Compiling
everything needed to run the script in an image significantly reduces the amount of work
required on the user’s part; however, they may also download the Perl files from a JPL GHE
repository.
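A minimal sketch of such an image is shown below; the base image and CPAN module list are assumptions for illustration, not the actual OSTM Dockerfile.

# Hypothetical Dockerfile sketch; base image and module names are assumed.
FROM perl:5.20
# Install the CPAN modules the script depends on (illustrative list).
RUN cpanm --notest JSON LWP::UserAgent Text::CSV
COPY data-dumper.pl /app/data-dumper.pl
WORKDIR /app
ENTRYPOINT ["perl", "data-dumper.pl"]

With an image like this, the command line options shown in Examples 2 and 3 could be passed directly to docker run.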
III. Conclusions
At the beginning of the summer we were asked by our mentor to develop a quick and
easy-to-use tool that would ingest and visualize OSTM telemetry. The OSTM already had a
system in place, but they wanted a solution that would implement “Big Data” tools and therefore
reduce the time needed to receive a graph. In this regard, we have been more than successful.
Concerning ingestion, we have implemented a simple (albeit slightly slow) solution: we
installed and calibrated Logstash in such a way that it dynamically ingests data into
Elasticsearch. ECSV files can be “dragged-and-dropped” into the ingestion folder and be
automatically ingested into Elasticsearch. Although this method of ingestion is slow, it is easy to
use and can be replaced by other ingestion methods if the desire to do so arises.
We have created a visualizer that is extremely efficient and easy to use. The tool is set up
in such a way that new data is not queried until it is needed. Points are filtered using an
averaging technique that processes points on a pixel-by-pixel basis. As a result of the
Elasticsearch implementation, we have shown that our tool runs more than five times faster than Cyclone,
the visualizer currently in place.
Finally, we have taken large steps towards abstracting our tool. Comments appear
throughout every file of code. A data dumping utility has been created that will help future
programmers develop visualization tools of their own and help statisticians derive conclusions
about the satellite’s observations. Additionally, we installed a SQL plug-in into the visualizer so
that those familiar with SQL can easily visualize the data they desire. For easy access to our
summer project, all of our work has been pushed into JPL GitHub Enterprise repositories. The
OSTM may access these files to use with Jason-2 data, or they may modify them slightly to work
with Jason-3, SWOT, or Jason-CS data.
The goal of this summer was to make an enormously complicated process seem simple to
the user. We spent a large portion of our ten-week internship learning about the various
pieces of software and languages with which we needed to code; as we learned more and more,
we coded with the hope that users would not have to spend the same amount of time we did
learning. With the tools we have coded and Elasticsearch in place, we feel that everyone in the
OSTM will be able to access Jason telemetry easily and quickly with only the most basic
understanding of OSTM telemetry.
IV. Future Work
Although Daniele and I were able to accomplish much during our internship and produce
significant results, there is much more work that can be done to improve our visualizer and the
visualization process as a whole. We were able to speed up the front-end side of the visualizer by
including features such as dynamic querying but were not able to speed up the ingestion process.
In order to speed up the process, the OSTM should consider converting the telemetry binaries on
the JPL Cloud and ingesting the ECSVs on a cluster.
At the moment, Jason telemetry packets are being processed by one Perl script one at a
time. While we were working with ~50 MB of data, we found that it took the Perl conversion
script about twenty minutes to process all the data. If the OSTM were to try to convert the entire
terabyte of Jason telemetry this way, the time required would be enormous. In order to escape
this, the OSTM can “containerize” the Perl conversion scripts and run them on the JPL Cloud.
By placing the conversion scripts into individual containers, dozens of them can be run on the
Cloud at a time. Rather than having one script process all n data files one after another, n
files could be sent to n containers running on the Cloud. This way, each container would only
have to process one file. The only real downside to sending each telemetry file to its own container
is that it could be very expensive and demanding on the JPL Cloud. To avoid this, the OSTM
could perhaps send twenty or so binary files to each container. Doing so would take twenty times
as long as sending each file to its own container, but it would also place one twentieth the load
on the JPL Cloud.
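Such a batching scheme could be driven by a few lines of shell; in this sketch the image name and paths are entirely hypothetical, and file names are assumed to contain no spaces.

# Hypothetical sketch: group the binaries twenty per batch and hand each
# batch to its own container for conversion.
ls /data/jason2/binaries/*.dat | xargs -n 20 echo | while read -r batch; do
    docker run -d ostm/telemetry-converter $batch
done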
Sending telemetry to the Cloud will save much time and effort on the part of the OSTM. Dan
Isla and Philip Southam, two developers at JPL, have offered to help the OSTM “containerize”
the Perl conversion scripts. They have made significant progress towards this end. Once they
have finished, the OSTM should be able to convert their binaries at a very quick rate.
A second enhancement that could be made to our summer project is clustering the
ingestion process. Currently, Logstash ingests the ECSVs into Elasticsearch. Although
Logstash is very easy to use, it is a bit slow. In order to speed up the ingestion process, the
OSTM could look into ingesting via Apache Spark. Spark is a utility designed to work on a
cluster: if a parser were to be developed that would send the telemetry to Spark, the data could
then be ingested into Elasticsearch directly.
References
[1] Jet Propulsion Laboratory (n.d.). Mission Basics. Accessed from http://sealevel.jpl.nasa.gov/overview/missionbasics/
[2] Elasticsearch is a free, open-source tool available to download from https://www.elastic.co/products/elasticsearch/
[3] Apache Spark is a free, open-source tool available to download from http://spark.apache.org/
[4] Logstash is a free tool available to download from https://www.elastic.co/products/logstash/
[5] D3.js is a free JavaScript library available to download from http://d3js.org/
[6] Konwinski, Andy (2015, August 14). Powered by Spark. Accessed from https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark/
[7] User NLPchina’s elasticsearch-sql is an open-source plug-in available from https://github.com/NLPchina/elasticsearch-sql/
[8] Wireshark is a free tool available to download from https://www.wireshark.org/
[9] Docker is a free tool available to download from https://www.docker.com/

Weitere ähnliche Inhalte

Was ist angesagt?

Task Resource Consumption Prediction for Scientific Applications and Workflows
Task Resource Consumption Prediction for Scientific Applications and WorkflowsTask Resource Consumption Prediction for Scientific Applications and Workflows
Task Resource Consumption Prediction for Scientific Applications and Workflows
Rafael Ferreira da Silva
 
Hpc, grid and cloud computing - the past, present, and future challenge
Hpc, grid and cloud computing - the past, present, and future challengeHpc, grid and cloud computing - the past, present, and future challenge
Hpc, grid and cloud computing - the past, present, and future challenge
Jason Shih
 

Was ist angesagt? (20)

Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
 
20190314 cern register v3
20190314 cern register v320190314 cern register v3
20190314 cern register v3
 
Data-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstData-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud Burst
 
Report to the NAC
Report to the NACReport to the NAC
Report to the NAC
 
OpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspectiveOpenStack at CERN : A 5 year perspective
OpenStack at CERN : A 5 year perspective
 
Master's Thesis - climateprediction.net: A Cloudy Approach
Master's Thesis - climateprediction.net: A Cloudy ApproachMaster's Thesis - climateprediction.net: A Cloudy Approach
Master's Thesis - climateprediction.net: A Cloudy Approach
 
Blue Waters and Resource Management - Now and in the Future
 Blue Waters and Resource Management - Now and in the Future Blue Waters and Resource Management - Now and in the Future
Blue Waters and Resource Management - Now and in the Future
 
Pegasus - automate, recover, and debug scientific computations
Pegasus - automate, recover, and debug scientific computationsPegasus - automate, recover, and debug scientific computations
Pegasus - automate, recover, and debug scientific computations
 
Task Resource Consumption Prediction for Scientific Applications and Workflows
Task Resource Consumption Prediction for Scientific Applications and WorkflowsTask Resource Consumption Prediction for Scientific Applications and Workflows
Task Resource Consumption Prediction for Scientific Applications and Workflows
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3
 
NASA's Movement Towards Cloud Computing
NASA's Movement Towards Cloud ComputingNASA's Movement Towards Cloud Computing
NASA's Movement Towards Cloud Computing
 
The Schema Editor of OpenIoT for Semantic Sensor Networks
The Schema Editor of OpenIoT for Semantic Sensor NetworksThe Schema Editor of OpenIoT for Semantic Sensor Networks
The Schema Editor of OpenIoT for Semantic Sensor Networks
 
XGSN: An Open-source Semantic Sensing Middleware for the Web of Things
XGSN: An Open-source Semantic Sensing Middleware for the Web of ThingsXGSN: An Open-source Semantic Sensing Middleware for the Web of Things
XGSN: An Open-source Semantic Sensing Middleware for the Web of Things
 
Using A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific OutputUsing A100 MIG to Scale Astronomy Scientific Output
Using A100 MIG to Scale Astronomy Scientific Output
 
GSN Global Sensor Networks for Environmental Data Management
GSN Global Sensor Networks for Environmental Data ManagementGSN Global Sensor Networks for Environmental Data Management
GSN Global Sensor Networks for Environmental Data Management
 
NPOESS Program Overview
NPOESS Program OverviewNPOESS Program Overview
NPOESS Program Overview
 
Hpc, grid and cloud computing - the past, present, and future challenge
Hpc, grid and cloud computing - the past, present, and future challengeHpc, grid and cloud computing - the past, present, and future challenge
Hpc, grid and cloud computing - the past, present, and future challenge
 
X-GSN in OpenIoT SummerSchool
X-GSN in OpenIoT SummerSchoolX-GSN in OpenIoT SummerSchool
X-GSN in OpenIoT SummerSchool
 
Towards Exascale Simulations of Stellar Explosions with FLASH
Towards Exascale  Simulations of Stellar  Explosions with FLASHTowards Exascale  Simulations of Stellar  Explosions with FLASH
Towards Exascale Simulations of Stellar Explosions with FLASH
 
Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...
Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...
Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...
 

Ähnlich wie GRIMES_Visualizing_Telemetry

Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)
Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)
Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)
Riley Waite
 
Virtual Science in the Cloud
Virtual Science in the CloudVirtual Science in the Cloud
Virtual Science in the Cloud
thetfoot
 
Slide 1
Slide 1Slide 1
Slide 1
butest
 
Referal-Kevin-Grimes
Referal-Kevin-GrimesReferal-Kevin-Grimes
Referal-Kevin-Grimes
Kevin Grimes
 

Ähnlich wie GRIMES_Visualizing_Telemetry (20)

Cognitive Engine: Boosting Scientific Discovery
Cognitive Engine:  Boosting Scientific DiscoveryCognitive Engine:  Boosting Scientific Discovery
Cognitive Engine: Boosting Scientific Discovery
 
Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)
Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)
Real-Time Hardware Simulation with Portable Hardware-in-the-Loop (PHIL-Rebooted)
 
Virtual Science in the Cloud
Virtual Science in the CloudVirtual Science in the Cloud
Virtual Science in the Cloud
 
The Emerging Cyberinfrastructure for Earth and Ocean Sciences
The Emerging Cyberinfrastructure for Earth and Ocean SciencesThe Emerging Cyberinfrastructure for Earth and Ocean Sciences
The Emerging Cyberinfrastructure for Earth and Ocean Sciences
 
Bruce Damer's presentation of Digital Spaces, an open source 3D simulation pl...
Bruce Damer's presentation of Digital Spaces, an open source 3D simulation pl...Bruce Damer's presentation of Digital Spaces, an open source 3D simulation pl...
Bruce Damer's presentation of Digital Spaces, an open source 3D simulation pl...
 
Larry Smarr - NRP Application Drivers
Larry Smarr - NRP Application DriversLarry Smarr - NRP Application Drivers
Larry Smarr - NRP Application Drivers
 
Cyberinfrastructure to Support Ocean Observatories
Cyberinfrastructure to Support Ocean ObservatoriesCyberinfrastructure to Support Ocean Observatories
Cyberinfrastructure to Support Ocean Observatories
 
Processing Geospatial Data At Scale @locationtech
Processing Geospatial Data At Scale @locationtechProcessing Geospatial Data At Scale @locationtech
Processing Geospatial Data At Scale @locationtech
 
LarKC Tutorial at ISWC 2009 - Introduction
LarKC Tutorial at ISWC 2009 - IntroductionLarKC Tutorial at ISWC 2009 - Introduction
LarKC Tutorial at ISWC 2009 - Introduction
 
The Next Decade of ISS and Beyond
The Next Decade of ISS and BeyondThe Next Decade of ISS and Beyond
The Next Decade of ISS and Beyond
 
Toward a Global Interactive Earth Observing Cyberinfrastructure
Toward a Global Interactive Earth Observing CyberinfrastructureToward a Global Interactive Earth Observing Cyberinfrastructure
Toward a Global Interactive Earth Observing Cyberinfrastructure
 
The Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningThe Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource Provisioning
 
Slide 1
Slide 1Slide 1
Slide 1
 
How to use NCI's national repository of big spatial data collections
How to use NCI's national repository of big spatial data collectionsHow to use NCI's national repository of big spatial data collections
How to use NCI's national repository of big spatial data collections
 
Applying Photonics to User Needs: The Application Challenge
Applying Photonics to User Needs: The Application ChallengeApplying Photonics to User Needs: The Application Challenge
Applying Photonics to User Needs: The Application Challenge
 
Adoption of Software By A User Community: The Montage Image Mosaic Engine Exa...
Adoption of Software By A User Community: The Montage Image Mosaic Engine Exa...Adoption of Software By A User Community: The Montage Image Mosaic Engine Exa...
Adoption of Software By A User Community: The Montage Image Mosaic Engine Exa...
 
Metadata syncronisation with GeoNetwork - a users perspective
Metadata syncronisation with GeoNetwork - a users perspectiveMetadata syncronisation with GeoNetwork - a users perspective
Metadata syncronisation with GeoNetwork - a users perspective
 
Referal-Kevin-Grimes
Referal-Kevin-GrimesReferal-Kevin-Grimes
Referal-Kevin-Grimes
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
 
Godiva2 Overview
Godiva2 OverviewGodiva2 Overview
Godiva2 Overview
 

GRIMES_Visualizing_Telemetry

  • 1. Running head: VISUALIZING AND PROCESSING WEATHER TELEMETRY   i Visualizing and Processing Weather Satellite Telemetry: A Solution Using Big Data Methodologies Kevin M. Grimes, II Mentor: Amalaye Oyake Jet Propulsion Laboratory 20 August 2015
  • 2. VISUALIZING AND PROCESSING WEATHER TELEMETRY ii Abstract The Ocean Surface Topography Mission’s Jason satellites have placed about a terabyte of weather data into a MySQL database for analysis. Because the OSTM’s set of data is so enormous, the processing time required to perform a query using its current setup is quite long. Visualizing the data using the Cyclone tool is inefficient and does not allow for much interaction with the graph. Using the “Big Data” tool Elasticsearch, we have proposed a Cyclone replacement that quickens the process of querying and graphing Jason data by a factor of five. In our proposed tool, the data is ingested into Elasticsearch via Logstash, an Elastic product. Once the data is ingested into Elasticsearch, our visualizer queries it in a way that maximizes efficiency and minimizes time. A specially designed UI allows for quick and easy access to the satellite telemetry. Users may choose to have the data plotted in an interactive graph or printed in a variety of formats. Our project’s functionality provides the user with a quick, easy, and efficient experience that can be implemented as a suitable Cyclone replacement. Keywords: Jason, telemetry, Elasticsearch, Cyclone
  • 3. VISUALIZING AND PROCESSING WEATHER TELEMETRY iii Acknowledgments I would like to preface this report by thanking the multiple people whose efforts led to my project’s completion. Without the support of these people I would not have been able to accomplish what I have this summer. My mentor, Amalaye Oyake - for choosing to bring me on board. It has been exciting and challenging, and I cannot thank you enough for letting me be a part of it. My partner, Daniele Bellutta - for working as hard as, if not harder than, I did. Thanks for working long days with me and seeing this project through. Also, thanks for helping me by answering questions I had; without your help, I would have spent even more time bugging people on Stack Overflow. Various JPL employees, including Dan Isla, Philip Southam, Stefan Eng, and David Mittman - for giving me solutions to problems I was unable to solve.
  • 4. VISUALIZING AND PROCESSING WEATHER TELEMETRY iv Table of Contents Abstract … … ii Acknowledgments … … iii Table of Contents … … iv List of Figures and Examples … … v List of Abbreviations … … vi I. Background and Motivation … … 1 II. Methods … 3 A. Initial Processing … … 3 B. Ingestion … … 4 C. Visualization … … 5 D. Benchmarking … … 7 E. Data Dumping … … 8 III. Conclusions … … 11 IV. Future Work … … 12 References … … 15
  • 5. VISUALIZING AND PROCESSING WEATHER TELEMETRY v List of Figures and Examples Figure 1. Cyclone, the tool currently used to process and visualize Jason telemetry … 2 Figure 2. Apache Spark process flowchart … … 4 Figure 3. Our visualizer prototype … … 6 Figures 4-5. Benchmarking our visualizer with Cyclone … … 8 Example 1. Making an SQL query to dump Elasticsearch data to user’s system while running the script … … 9 Example 2. Making an SQL query via the Perl data dump script’s command line parameters … … 10 Example 3. Running the Perl data dump script with several options from the command line … … 10
  • 6. VISUALIZING AND PROCESSING WEATHER TELEMETRY vi List of Abbreviations CNES … … Centre national d'études spatiales CPAN … … Comprehensive Perl Archive Network CSV … … Comma-separated format DSN … … Deep Space Network ECSV … … Encapsulated comma-separated format FTP … … File transfer protocol GDS … … Ground data system GHE … … GitHub Enterprise JPL … … Jet Propulsion Laboratory KB … … Kilobyte(s) Mb … … Megabit(s) MB … … Megabyte(s) NASA … … National Aeronautics and Space Administration OSTM … … Ocean Surface Topography Mission PP … … Perl Packager TOPEX … … Ocean Topography Experiment UI … … User interface
  • 7. VISUALIZING AND PROCESSING WEATHER TELEMETRY 1 Visualizing and Processing Weather Satellite Telemetry: A Solution Using Big Data Methodologies I. Background and Motivation The Ocean Surface Topography Mission (OSTM) at Jet Propulsion Laboratory (JPL) collects and analyzes data from our planet’s oceans. Their first mission was a collaborative effort with the Centre national d'études spatiales (CNES), the French center for space research. CNES’s Ocean Topography Experiment (TOPEX) merged with JPL’s Poseidon project and launched the TOPEX/Poseidon satellite, commencing OSTM’s first mission. The satellite received data such as oceanic temperatures and ocean levels and sent it via “space packets” to a ground data system (GDS) on Earth where it was processed and stored into a large database. Although the TOPEX/Poseidon mission ended in January 2006, several JPL satellites, including Jason-2, continue to collect, process, and transmit oceanic data for scientific use. The data collected by the OSTM has been extremely valuable in the field of meteorology. Over four hundred scientists from thirty different nations use this data to perform climate research, forecast hurricanes, route ships, and research coral reefs; in particular, the data gathered by the Jason series of satellites is used to monitor changes in oceanic levels.1 Over the last couple decades, the OSTM has accumulated multiple gigabytes of data useful to the scientific community. With such a large set of data, however, comes large processing time. In order for users to process and visualize Jason-2 telemetry, users must complete a long and complicated request via Cyclone, the visualization tool currently in place. Once the request has been submitted, the user must wait an extended amount of time for the graph to be created. For these reasons and more, the OSTM felt that Cyclone needed to either be updated or replaced. Our mentor, Amalaye Oyake, recommended that my partner, Daniele Bellutta, and I
  • 8. VISUALIZING AND PROCESSING WEATHER TELEMETRY 2 experiment with “Big Data” tools such as Elasticsearch2 and Apache Spark.3 After some research into the features of these utilities we devised a configuration that allowed for quick and easy data ingestion and interactive visualization. We downloaded Logstash,4 an Elastic ingestion tool, and ingested data directly into Elasticsearch. Once we ingested the data, we designed a visualizer using JavaScript and D3.js,5 an external library. Finally, we designed a script using the Perl programming language that would download data to the user’s computer directly from Elasticsearch. Once these tools were implemented and our project was completed, we benchmarked our visualizer with Cyclone and observed our results.   After several tests, we proved that our tool queried and visualized Jason-2 telemetry much quicker than Cyclone. We ran multiple tests comparing our tool with Cyclone and concluded that our visualizer works at a rate over five times quicker than Cyclone. Additionally, our tool has a much nicer user-interface (UI) that allows users to customize their query much easier than they could on Cyclone. Despite our success in this regard, however, ingestion is still a Figure 1: Cyclone, the tool currently used to process and visualize Jason telemetry  
  • 9. VISUALIZING AND PROCESSING WEATHER TELEMETRY 3 time-consuming part of the visualization process. Clustering tools such as Apache Spark have the capability to send data to several “workers,” sort it, and send it to Elasticsearch. Both Daniele and I believe that ingestion speed would be reduced dramatically with the help of Spark but were, due to time restrictions, unable to implement it ourselves. Despite slow ingestion, however, the visualizer is a quick and easy-to-use tool that will, alongside our Perl data dump script, be able to serve as a Cyclone replacement. II. Methods   A. Initial Processing Jason-2 collects data from Earth’s oceans and processes it on-site. This data is sent via binary telemetry packets to JPL’s GDS and is relayed to the OSTM. Once this information is received, it must be processed into some sort of format recognizable by ingestion tools. In order to ingest this data into Elasticsearch, Daniele and I needed a method to first covert the binary telemetry packets into encapsulated comma-separated format (ECSV). We were given access to the OSTM’s file-transfer protocol (FTP) system and were able to pull a sampling of data to use in development. Our mentor, Amalaye Oyake, gave us a few Perl scripts that had been used to process binary telemetry of various types and either display it in the Terminal or pipeline it into MySQL. We were able to modify these scripts in such a way that they exported the data to the user’s system in ECSV format. In order to automate the process, we wrote a shell script that would run the telemetry export script repeatedly until all the binaries in the specified directory had been converted. Once we had converted all of the telemetry we were given, we began to explore methods of ingestion into Elasticsearch.
  • 10. VISUALIZING AND PROCESSING WEATHER TELEMETRY 4 B. Ingestion We explored thoroughly two methods of ingestion into Elasticsearch: Apache Spark and Elastic’s Logstash. Spark, a clustering tool used by companies such as Amazon, eBay, and JPL’s own Deep Space Network (DSN),6 takes data parsed by an external parser and sends it to multiple “workers” which process it and return it to the user. Logstash, on the other hand, does not require much parsing prior to ingestion and is much easier to use; however, it is not as powerful as Apache Spark. For the sake of ease, we chose to ingest our data with Logstash but hope that future programmers could incorporate Spark’s clustering capabilities into the visualization process. Although extremely powerful, Spark is particularly picky concerning how it reads in data and how it returns it to the user. Daniele and I spent the first week of our summer program developing a parser in Scala that would pipeline the ECSV files generated by our Perl scripts into Spark. Once we had the data ingested into Spark, we attempted to process and sort it with the help of “workers.” The final step in the Apache Spark ingestion was sending the processed and sorted data from Spark into Elasticsearch. We experimented with Spark functions that would Figure 2: Apache Spark process flowchart
  • 11. VISUALIZING AND PROCESSING WEATHER TELEMETRY 5 pipeline the data into Elasticsearch but met several complications. Rather than invest another week into configuring Spark, we decided to turn our focus to an easier-to-use tool, Logstash. Logstash, an Elastic product, proved to be much simpler than Spark. While we spent nearly a week developing a parser that would send data to Spark, we spent only one day implementing Logstash. We developed a configuration file that would tell Logstash how to ingest the data. This file included the names of the various telemetry, the index under which the ingested data would be stored, and the path to the files to be ingested. After a few minutes passed, our data was successfully ingested into Elasticsearch in a neat and organized manner. Although Logstash is easier to use than Spark, it takes much more time to ingest data. When we ingested a relatively small amount of data of ~50 megabytes (MB), Logstash needed a few minutes to process and send the data to Elasticsearch. If we were to try to ingest terabytes of data, Logstash would need to run for hours. In this regard, our project still has room to grow. Despite slow ingestion, however, we found that Logstash was able to meet our needs for our small sampling of data and began to explore visualization techniques. C. Visualization Once the telemetry was ingested into Elasticsearch, we began development on our visualizer. We considered a few external libraries that seemed to be reliable and efficient and settled on coding the visualizer in JavaScript with the help of the D3.js external library. The D3.js library provided dozens of graphing functions that were invaluable to us during development. Two thousand lines of code later, we had produced a tool that was quick and easy to use. The visualizer queries the Elasticsearch database multiple times. Initially, it queries for the range of dates, fields, and a few other pieces of data. This data is used in several drop-down
  • 12. VISUALIZING AND PROCESSING WEATHER TELEMETRY 6 menus and boxes to allow the user to customize his/her results. The user may decide to either use our UI with drop-down boxes or he/she may format a request in SQL via a SQL plug-in we installed.7 Once the user submits the request, it is sent to Elasticsearch and processed in an initial query followed by a series of scroll queries. The initial query queries Elasticsearch for a fixed amount of data and returns a “scroll ID” that can be used to pick up where that query left off. A scroll query is then made using the scroll ID returned by the initial query and retrieves the rest of the data.   As Elasticsearch returns the final packets of data, the visualizer begins to create a graph. This graph—depending upon the user’s settings—may contain all of the points returned by the Elasticsearch queries or it may be only an averaging of the data. If the user desires, the visualizer can count the number of points in each vertical column of pixels and calculate an average using Elasticsearch’s “aggregation” feature. This significantly reduces the number of points being plotted while still keeping the general trend of the larger data set. In addition to averaging, the data may also be scaled by a scaling factor. Once the graph has been plotted, the user may hover Figure 3: Our visualizer prototype
  • 13. VISUALIZING AND PROCESSING WEATHER TELEMETRY 7 his/her cursor over the points and a “tooltip” will draw a vertical line through the nearest point and show on the side which point is being studied and the corresponding value of that point. Additionally, the user may zoom in and out using a slider at the bottom of the screen. As the user pans from side to side or zooms, Elasticsearch is queried for the points needed. This “dynamic querying” feature of the visualizer, combined with the various settings, provides for a quick and easy user experience. Once the visualizer was developed, we used Wireshark,8 a network protocol analyzer, to determine how much bandwidth our visualizer used while performing queries. When we queried three hours’ worth of data without using aggregation, for example, Elasticsearch returned about 13.5 megabits (Mb) of data over twenty-six 65 kilobyte (KB) packets. When the aggregation feature was applied, 0.5 Mb of bandwidth was used over one 65 KB packet. The 65 KB packet size remains constant as long as the graph size remains constant, as one point of data is loaded per column of pixels. Because Elasticsearch returns data in these small packets, there is no potential for heavy network traffic. D. Benchmarking In order to determine whether or not our visualizer performed better than Cyclone (the visualization tool currently being used by the OSTM), Daniele and I performed a series of tests. We took great precaution to ensure that both tools plotted the same amount of data in these tests. To accomplish this, we found that the “bucket size” setting in our visualizer functioned similarly to the “fidelity” setting in Cyclone. The two were set to produce nearly the same amount of data and then a total of twenty-four tests were run. Setting the two visualizers in such a way that they both produced the same amount of data was accomplished via our visualizer’s “bucket size” setting and Cyclone’s “fidelity” setting.
  • 14. VISUALIZING AND PROCESSING WEATHER TELEMETRY 8 In our visualizer’s JavaScript the amount of data points collected for each averaging instance can be set. The smaller the number is, the larger the amount of data points that will be plotted. In Cyclone, the “fidelity” setting determines how true to the original data set the resulting graph should be. Just as with our visualizer, the smaller this number is, the more points will be graphed. Once the two tools were calibrated to produce the same amount of points, we began testing. Each tool was tested a total of twelve times: three times we tested over two time intervals across two different fields. We were able to conclusively state that our visualizer ran over five times faster than Cyclone. Our Visualizer Interval 13 July 2014 09:00 – 15 July 2014 06:00 19 July 2014 15:00 – 21 July 2014 09:00 Field GPS A Current AMR V Current GPS A Current AMR V Current Trial #1 5.57 s 6.62 s 4.17 s 4.98 s Trial #2 5.99 s 5.94 s 4.14 s 4.83 s Trial #3 6.61 s 6.16 s 3.90 s 5.90 s Field Average 6.06 s 6.24 s 4.07 s 5.24 s Interval Average 6.15 s 4.65 s Overall Average 5.40 s     Cyclone   Interval   13 July 2014 09:00 – 15 July 2014 06:00   19 July 2014 15:00 – 21 July 2014 09:00   Field   GPS A Current   AMR V Current   GPS A Current   AMR V Current   Trial #1   30.55 s   32.01 s   28.42 s   24.91 s   Trial #2   27.96 s   27.53 s   28.06 s   23.93 s   Trial #3   31.52 s   30.69 s   27.10 s   25.15 s   Field Average   30.01 s   30.08 s   27.86 s   24.66 s   Interval Average 30.04 s 26.26 s Overall Average 28.15 s   Figures 4-5: Benchmarking our visualizer (top) with Cyclone (bottom)
  • 15. VISUALIZING AND PROCESSING WEATHER TELEMETRY 9 E. Data Dumping In an attempt to make our project as developer-friendly as possible, we developed a tool, 1700 lines of code long, separate from the visualizer that would pull data stored in Elasticsearch to the user’s system. This data can be used for statistical analysis and in development of other visualization tools. If the script is run from the command line without any parameters, it displays a welcome message and the main menu. At this menu, the user is asked whether he/she would like to query via a format similar to that of the visualizer’s “drop-down menu” syntax or if they would like to query via a SQL plug-in. If they choose to query via the SQL plug-in, they type their query just as they would type any other SQL query. Consider the following example. In Example 1, the index “jason-3” and APID type “260” are chosen. The query will return values from the “lraTemp” and “dt” fields where “lraTemp” is in-between 280 and 295. The “LIMIT” part of this query specifies how many results can be returned by the query. Since the user in Example 1 entered a limit of 10,050 no more than 10,050 results will be exported. Unfortunately, if too large of a query size is entered, the SQL plug-in will crash. Because of this, a default query limit of 100,000 has been set in the script’s code. After the user enters his/her query, they will be asked in which file format they would like to have their results exported. If no results are found, the script will say so. Otherwise, an output file will be created with their results in their specified format. If the user decided to use the script’s syntax rather than use the SQL plug-in, they will be SELECT lraTemp,dt FROM jason-3/260 WHERE lraTemp BETWEEN 280 AND 295 LIMIT 10050 Example 1: Making an SQL query to dump Elasticsearch data to user’s system while running the script
If the user decides to use the script's own syntax rather than the SQL plug-in, he or she is prompted to answer a series of questions about the query. The first asks whether to query and export the entire data set or only a section of it; if specific dates are desired, the script prompts for them. The next prompt asks whether to apply a scaling factor; if one is entered, every numeric value printed is multiplied by it. The user is then asked which fields to export. A user who does not know which fields are available may type "list" to see them on screen; otherwise, the user may type the individual fields desired, or "all" for all of them. Finally, the script asks in which format the data should be delivered. After some time, a file is generated containing the requested data.

Another feature of the data dumping script is its ability to run entirely from the command line. For example, a user who wants to issue a SQL query can type "--sql" followed by the query. A specific APID or ingestion version can be chosen with the "--s" flag, and several extra settings can be specified with the "--e" flag. Consider the following example.

user@mycomp:$ perl data-dumper.pl --sql csv SELECT lraTemp,dt FROM jason-3/260 WHERE lraTemp BETWEEN 280 AND 295 LIMIT 10050

Example 2: Making an SQL query via the Perl data dump script's command line parameters

Example 2 accomplishes everything Example 1 accomplished, but from the command line; no interaction with the script's interface is needed. Consider another example.
user@mycomp:$ perl data-dumper.pl --e 2014-12-11T17:21:30.000Z 2014-12-11T17:21:45.000Z csv 100 apid lraTemp --s http://localhost:9200/ jason-3 260 10000 1 0.100

Example 3: Running the Perl data dump script with several options from the command line

Example 3, although seemingly complicated, accomplishes a great deal. The two dates following the "--e" flag tell the script to query all of the data between them. The "csv" parameter tells the script to export the results in comma-separated-value (CSV) format, and the "100" immediately following is the scaling factor, that is, the number by which every numeric value is multiplied. The parameters that follow (up to the "--s" flag) are the fields to be queried. After the "--s" flag come a few more parameters: the domain on which Elasticsearch is hosted, the ingestion version (index), the telemetry APID type, the query size (the maximum number of packets each query to Elasticsearch may produce), whether to round the results (1 for yes, 0 for no), and by how much to round them (in this case, all results are rounded to the nearest thousandth). All of these settings can also be set via the script's prompts, but setting them from the command line is useful when the data dumper needs to be called from another script.

Once the script was completed, we compiled it into an image to be shipped via Docker.9 Docker is an open-source tool that automates the deployment of applications inside software containers that can be run from any system. Using a configuration file, we included all of the required CPAN (Comprehensive Perl Archive Network) modules inside the image so that the user does not need to install anything other than Docker itself in order to run the script. Packaging everything needed to run the script into an image significantly reduces the work required on the user's part; alternatively, users may download the Perl files directly from a JPL GHE repository.
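As an illustration of this packaging, an image definition along the following lines would suffice. This is a sketch only: the real Dockerfile lives in the JPL GHE repository, and the base image and CPAN module names shown here (Search::Elasticsearch, JSON) are representative assumptions, not the script's actual dependency list.

# Sketch of a Dockerfile for the data dump script; the module list is
# illustrative, not the script's actual dependency list.
FROM perl:5.20

# Bake the CPAN dependencies into the image so that the user needs
# nothing beyond Docker itself.
RUN cpanm Search::Elasticsearch JSON

# Copy the script in and make it the container's entry point, so flags
# like those in Examples 2 and 3 can be appended straight to `docker run`.
COPY data-dumper.pl /usr/src/app/
WORKDIR /usr/src/app
ENTRYPOINT ["perl", "data-dumper.pl"]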
III. Conclusions

At the beginning of the summer, our mentor asked us to develop a quick, easy-to-use tool that would ingest and visualize OSTM telemetry. The OSTM already had a system in place, but they wanted a solution built on "Big Data" tools that would reduce the time needed to produce a graph. In this regard, we have been more than successful.

Concerning ingestion, we implemented a simple (albeit somewhat slow) solution: we installed and calibrated Logstash so that it dynamically ingests data into Elasticsearch. ECSV files can be dragged and dropped into the ingestion folder, from which they are automatically ingested into Elasticsearch. Although this method of ingestion is slow, it is easy to use and can be replaced by other ingestion methods should the need arise.

We have created a visualizer that is efficient and easy to use. The tool is set up so that new data is not queried until it is needed, and points are filtered using an averaging technique that processes them on a pixel-by-pixel basis. As a result of the Elasticsearch implementation, our tool runs five times faster than Cyclone, the visualizer currently in place.

Finally, we have taken large steps toward generalizing our tool. Comments appear throughout every file of code. The data dumping utility will help future programmers develop visualization tools of their own and help statisticians draw conclusions about the satellite's observations. Additionally, we installed a SQL plug-in into the visualizer so that those familiar with SQL can easily visualize the data they desire. For easy access to our summer project, all of our work has been pushed to JPL GitHub Enterprise repositories; the OSTM may use these files as-is with Jason-2 data, or modify them slightly to work with Jason-3, SWOT, or Jason-CS data.
The goal of this summer was to make an enormously complicated process seem simple to the user. We spent a large portion of our ten-week internship learning the various pieces of software and languages with which we needed to code; as we learned more and more, we coded with the hope that users would not have to spend the same amount of time learning that we did. With the tools we have coded and Elasticsearch in place, we feel that everyone in the OSTM will be able to access Jason telemetry easily and quickly with only the most basic understanding of OSTM telemetry.

IV. Future Work

Although Daniele and I accomplished much during our internship and produced significant results, there is more work that can be done to improve our visualizer and the visualization process as a whole. We were able to speed up the front end of the visualizer with features such as dynamic querying, but we were not able to speed up the ingestion process. To do so, the OSTM should consider converting the telemetry binaries on the JPL Cloud and ingesting the ECSVs on a cluster.

At the moment, Jason telemetry packets are processed by a single Perl script, one file at a time. Working with roughly 50 MB of data, we found that the Perl conversion script took about twenty minutes to process it all. At that rate (about 2.5 MB per minute), converting the entire terabyte of Jason telemetry would take on the order of 400,000 minutes of processing, the better part of a year. To escape this, the OSTM can "containerize" the Perl conversion scripts and run them on the JPL Cloud. By placing the conversion scripts into individual containers, dozens of them can be run on the Cloud at a time: rather than one script processing all n data files one after another, the n files could be sent to n containers, each of which would have to process only a single file.
The only real downside to sending each telemetry file to its own container is that doing so could be very expensive and demanding on the JPL Cloud. To avoid this, the OSTM could instead send twenty or so binary files to each container. Doing so would take twenty times as long as sending each file to its own container, but it would also place one twentieth of the load on the JPL Cloud; a sketch of this batching scheme appears at the end of this section. Moving telemetry conversion to the Cloud will save the OSTM much time and effort. Dan Isla and Philip Southam, two developers at JPL, have offered to help the OSTM containerize the Perl conversion scripts and have already made significant progress toward this end; once they have finished, the OSTM should be able to convert its binaries at a very quick rate.

A second enhancement that could be made to our summer project is clustering the ingestion process. Currently, Logstash ingests the ECSVs into Elasticsearch; although Logstash is very easy to use, it is a bit slow. To speed up ingestion, the OSTM could look into ingesting via Apache Spark. Spark is a utility designed to work on a cluster: if a parser were developed to hand the telemetry to Spark, the data could then be ingested into Elasticsearch directly.
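As a sketch of the batching scheme described above, the following snippet splits a list of telemetry binaries into batches of twenty and launches one conversion container per batch. The image name "ostm/converter" and the raw use of docker run are placeholders; an actual deployment would go through the JPL Cloud's own scheduling tools.

// Sketch: fan n telemetry binaries out to containers in batches of 20.
// "ostm/converter" is a placeholder image name, not a real JPL image.
var execFile = require("child_process").execFile;

function launchBatches(files, batchSize) {
  for (var i = 0; i < files.length; i += batchSize) {
    var batch = files.slice(i, i + batchSize);
    // Each container receives one batch of file paths as its arguments.
    execFile("docker", ["run", "--rm", "ostm/converter"].concat(batch),
      function (err) {
        if (err) { console.error("Batch failed:", err); }
      });
  }
}

// File paths are supplied on the command line.
launchBatches(process.argv.slice(2), 20);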
References

1. Jet Propulsion Laboratory (n.d.). Mission Basics. Accessed from http://sealevel.jpl.nasa.gov/overview/missionbasics/
2. Elasticsearch is a free, open-source tool available to download from https://www.elastic.co/products/elasticsearch/
3. Apache Spark is a free, open-source tool available to download from http://spark.apache.org/
4. Logstash is a free tool available to download from https://www.elastic.co/products/logstash/
5. D3 is a free JavaScript library available to download from http://d3js.org/
6. Konwinski, Andy (2015, August 14). Powered by Spark. Accessed from https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark/
7. User NLPchina's elasticsearch-sql is an open-source plug-in available from https://github.com/NLPchina/elasticsearch-sql/
8. Wireshark is a free tool available to download from https://www.wireshark.org/
9. Docker is a free tool available to download from https://www.docker.com/