Demos
Architecture
Bonus
PCIC Data Portal 2.0
Staff Meeting
James Hiebert
February 18, 2014
James Hiebert PCIC Data Portal 2.0
Outline
1 Demos
2 Architecture
Metadata Database
Python Backend
Pydap
ncWMS
Basemaps
Front-end
3 Bonus
Automated Testing
1. Last week we deployed our fourth and hopefully final release candidate
for version 2.0 of the PCIC Data Portal. It's been a four-month
beta period over which we have received and responded to
feedback, both from inside PCIC and from some external beta
testers. Many of you have seen these at the various theme meetings
that we had throughout the fall, but I'd like to take this opportunity
both to introduce the rest of you to the data portal and to
elaborate on what is running behind the scenes and all of
the work that has gone into producing it.
2. Typically in these presentations, I hold you captive with all of the
technical details first and save the demo for the end. But in this
case, I'll start with the demo, and then if you don't care about how
we did it, you can just check out after that.
Raster Portal(s)
Coming soon!
1. The software that we have written is a set of components that
handle the organization and presentation of raster data,
that is, gridded fields of spatiotemporal data. There are several sets
of high-value data, and for each of them we have written a "raster portal"
which can serve that data up.
BCSD Downscale Canada
1. You'll see that the feature set is intentionally fairly sparse. The
application's purpose is to allow the users to get the data they
want, and only the data they want, and then to send them on their
way. The main section of screen real estate is the map. The map
displays the areas for which data exists and allows the
user to select an area to download.
2. In the top right, there is a tree selection which controls the dataset
that is displayed and that will be downloaded. And finally
there are a couple of options for selecting a time range and data
format.
3. We only support formats which support multidimensional data,
and there aren't very many of those right now. We'll be adding Arc ASCII Grid by
the end of the fiscal year, which isn't technically multidimensional,
but we'll probably send a zip file of individual grids, one per
timestep.
BC PRISM
1. The BC PRISM portal is very similar to the BCSD Downscaling
portal, with a few minor differences. First of all, the map projection
is specific to BC. We've used the BC Albers projection, which is a
little more visually appealing (though it does present some
challenges). Secondly, because the PRISM data consists only of
monthly climatologies, the data volume in the temporal dimension is
very small. For that reason, we eliminated the time subset controls
and chose just to give the user the entire time range.
VIC (Generation 1)
Software Components
1. One thing that you'll notice from this diagram is that the data itself
is at the foundation of this software stack. Without the data in
place beforehand, essentially nothing else can exist. Even
the metadata in the database comes from the NetCDF files. This is
why we have been somewhat militant about wanting your data to
be finalized before we begin to work on the portal to publish it.
2. The NetCDF box here is the only thing that is just data sitting on
disk. These four boxes (PostgreSQL, ncWMS, pydap, pdp) are all
different pieces of software running on the server which respond to
incoming web requests. PostgreSQL organizes all of the metadata
about the available data, ncWMS provides the climate visualization
layers, pydap responds to requests for the actual data, and pdp
responds to all of the requests that build up the user interface. [Do
a page load showing the network tools]
Metadata Database
1. This might be a bit too much detail, but try to bear with me. This
database stores the full relationship structure between all of the
data files that we store and want to publish. It tracks all of the files
on disk that we have, all of the different variables that they contain,
and the full range of each variable so that we can quickly set color scales
and such for the visualization layers. It stores all of the metadata
about the files, such as the timesteps that they contain, what their
grid parameters are, what models they are from and how they relate
to other driving models (for example in the case of an RCM forced
by a GCM). All of these can be grouped into "ensembles"; an ensemble is
a group of rasters that we are publishing together on a single portal
page.
2. The data contained in the schema allows the web application to
function quickly, because everything is quickly searchable without
opening up a bunch of files and having to read terabytes of data
just to determine a few key attributes.
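To make the idea of a searchable metadata index concrete, here is a minimal sketch. It is not the actual PCIC schema: the table name, column names, file names and range values are all invented for illustration, and it uses an in-memory SQLite database in place of PostgreSQL so that it runs on its own.

```python
import sqlite3

# Hypothetical, heavily simplified schema: one row per (file, variable).
# The real database tracks far more (timesteps, grids, model relationships).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE data_file_variables (
           filename TEXT, variable TEXT, ensemble TEXT,
           range_min REAL, range_max REAL)"""
)
# Made-up example rows, not real PCIC data.
conn.executemany(
    "INSERT INTO data_file_variables VALUES (?, ?, ?, ?, ?)",
    [
        ("pr_monthly.nc", "pr", "bc_prism", 0.0, 1200.0),
        ("tmax_monthly.nc", "tmax", "bc_prism", -30.0, 35.0),
        ("tasmin_day.nc", "tasmin", "downscaled", -55.0, 30.0),
    ],
)


def layers_for_ensemble(conn, ensemble):
    """Return (filename, variable, min, max) rows for one ensemble."""
    cur = conn.execute(
        "SELECT filename, variable, range_min, range_max "
        "FROM data_file_variables WHERE ensemble = ? ORDER BY variable",
        (ensemble,),
    )
    return cur.fetchall()


# One indexed query answers what would otherwise need every file opened.
prism_layers = layers_for_ensemble(conn, "bc_prism")
```

The point of the sketch is only that the color-scale ranges come back from a single query, with no NetCDF file being read.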
Python Backend
1. We have written a full web application backend in Python which
does all of the file format translation and all of the database
communication, and passes all of the metadata on to the web UI to
be interpreted by the user. The application consists of about 2800
lines of Python code plus 1500 lines of testing code that we have
written outright. There are about another 3000 lines of code which
make up PyDAP, which we have heavily modified.
Python Backend
ensemble_name = 'bc_prism'

# Page-level settings for this portal
portal_config = {
    'title': 'High-Resolution Climatology',
    'ensemble_name': ensemble_name,
    'js_files': wrap_mini([
        'js/prism_demo_map.js',
        'js/prism_demo_controls.js',
        'js/prism_demo_app.js'],
        basename='bc_prism', debug=True)
}
portal_config = updateConfig(global_config, portal_config)
map_app = wrap_auth(MapApp(**portal_config), required=False)

# Tag the database connection with an application name
dsn = dsn + '?application_name=pdp_prism'
with session_scope(dsn) as sesh:
    conf = db_raster_configurator(sesh, "Download Data", 0.1, 0, ensemble_name,
        root_url=global_config['app_root'].rstrip('/') + '/' +
        ensemble_name + '/data/'
    )
    data_server = wrap_auth(RasterServer(dsn, conf))
    catalog_server = RasterCatalog(dsn, conf)  # No Auth

menu = PrismEnsembleLister(dsn)

# Route each URL pattern to the application that serves it
portal = PathDispatcher([
    ('^/map/?.*$', map_app),
    ('^/catalog/.*$', catalog_server),
    ('^/data/.*$', data_server),
    ('^/menu.json.*$', menu)
])
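The last few lines of that slide route URL patterns to separate applications. As a rough sketch of how such a dispatcher can work (this is not the portal's own PathDispatcher; the class name SimplePathDispatcher and the helpers make_app and call are hypothetical), a WSGI callable can try each regular expression in order and hand the request to the first match:

```python
import re


class SimplePathDispatcher:
    """Hypothetical stand-in for a regex-based WSGI path dispatcher."""

    def __init__(self, routes):
        # Compile each (pattern, app) pair once, preserving order.
        self.routes = [(re.compile(pattern), app) for pattern, app in routes]

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        for pattern, app in self.routes:
            if pattern.match(path):
                # Delegate the whole request to the first matching app.
                return app(environ, start_response)
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]


def make_app(body):
    """Build a trivial WSGI app that always answers with `body`."""
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [body]
    return app


portal = SimplePathDispatcher([
    (r"^/map/?.*$", make_app(b"map page")),
    (r"^/data/.*$", make_app(b"data response")),
])


def call(app, path):
    """Invoke a WSGI app for `path` and return (status, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app({"PATH_INFO": path}, start_response))
    return captured["status"], body
```

For example, call(portal, "/data/pr.nc") is answered by the data app, while an unmatched path falls through to the 404 response.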
OPeNDAP and PyDAP
Designed to be a:
"discipline-neutral means of requesting and providing data across
the [web]"
Data Access Protocol (DAP)
Open source
Machine-to-machine transfer of scientific data
Mostly supported by US scientific agencies (NOAA, NASA, NSF)
1. PyDAP is the component of the data portal that actually provides
the data download services. It's an implementation of the
OPeNDAP protocol, which is designed to be a discipline-neutral
means of transferring data across the web. This protocol is open
source and is designed to be OS and application independent, so that
you can get data into whatever software you want to use to do your
data analysis. It's supported mostly by US scientific agencies such
as NOAA, NASA and the National Science Foundation.
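To give a feel for what a DAP request looks like on the wire, here is a small sketch that builds a DAP2-style URL. In DAP2 a subset is requested by appending a response suffix (such as dods, dds or das) and a constraint expression of the form variable[start:stride:stop] to the dataset URL. The host, file name and variable name below are placeholders, and the helper names are hypothetical.

```python
def hyperslab(start, stop, stride=1):
    """Format one dimension of a DAP2 constraint as [start:stride:stop]."""
    return "[{}:{}:{}]".format(start, stride, stop)


def dap_url(dataset_url, response, variable, slices):
    """Build a DAP2 request URL for a subset of one variable.

    `slices` is a list of (start, stop) index pairs, one per dimension;
    DAP2 treats `stop` as inclusive.
    """
    constraint = variable + "".join(hyperslab(a, b) for a, b in slices)
    return "{}.{}?{}".format(dataset_url, response, constraint)


# Placeholder dataset: 12 timesteps and a 100 x 100 cell window.
url = dap_url(
    "http://example.org/data/tmax.nc",
    "dods",
    "tmax",
    [(0, 11), (100, 199), (50, 149)],
)
```

Because the request is just a URL, any client that speaks HTTP can ask for exactly the slice it needs.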
OPeNDAP and PyDAP
1. There are a number of different OPeNDAP servers out there, but
PyDAP is the one that we use to serve all of the data itself. Its
architecture is quite a bit more flexible than that of some of the other
OPeNDAP servers. This is a rough layout of the
architecture. It has a number of "handlers" which are written to
interpret different data formats and translate them to the DAP
structure. Then on the top end, there are numerous "responders"
that translate the DAP structure into the output formats that the user
wants.
2. [describe more specifically which parts are ours and which we use]
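The handler and responder split can be pictured with a toy example. This is only a sketch of the pattern and does not use PyDAP's real classes or API: a plain dict stands in for the DAP structure, and every function name here is made up.

```python
import csv
import io
import json


def csv_handler(text):
    """Hypothetical handler: parse CSV text into a dict of named columns."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    return {name: [float(r[i]) for r in body] for i, name in enumerate(header)}


def json_response(dataset):
    """Hypothetical responder: serialize the common structure as JSON."""
    return json.dumps(dataset, sort_keys=True)


def ascii_response(dataset):
    """Hypothetical responder: one 'name: v1, v2, ...' line per variable."""
    lines = []
    for name in sorted(dataset):
        lines.append(name + ": " + ", ".join(str(v) for v in dataset[name]))
    return "\n".join(lines)


# Input formats and output formats are registered independently.
HANDLERS = {"csv": csv_handler}
RESPONSES = {"json": json_response, "ascii": ascii_response}


def serve(text, input_format, output_format):
    """Translate input -> common structure -> requested output format."""
    dataset = HANDLERS[input_format](text)
    return RESPONSES[output_format](dataset)


result = serve("time,tmax\n0,1.5\n1,2.5\n", "csv", "ascii")
```

Supporting a new input format means adding one handler, and a new output format means adding one responder; neither side needs to know about the other.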
How much of Pydap is our code? (Source: hg churn)
pydap.handlers.pcic: 100%
pydap.handlers.hdf5: 68.0%
pydap.responses.netcdf: 61.5%
pydap.handlers.sql: 12.3%
pydap.handlers.csv: 3.7%
pydap: 2.3%
pydap.responses.xls: 1.3%
pydap.responses.html: ?
1. To give you a bit of an idea of the degree to which Pydap was
"off-the-shelf", I ran the command "hg churn" on all of the pydap
repositories, which measures the changes in the repository by lines
of code. The fractions shown are the churn of PCIC staff divided by
the total churn of all committers. You can see that we wrote one
handler by ourselves, the hdf and netcdf work is mostly ours, and for
the rest of the modules we only had to make minimal changes.
Big data, big RAM, BadRequest, Oh My!
1. One of the technical problems that we ran up against was that all of
the available OPeNDAP data servers load their responses entirely
into RAM before sending them out. So if you want to serve up large
data sets, the size of your response is limited by your available RAM
divided by the number of concurrent responses that you are
prepared to serve. If you try to make a request to, say, a THREDDS
OPeNDAP server that's larger than the JVM's allocated memory, the
user will just get back a BadRequest error.
2. For some applications this may be fine, or even desirable, but for
the purposes of serving large data sets, the network pipe is usually
the bottleneck. Rather than annoy and frustrate the user by forcing
them to carve up their data requests to be arbitrarily small, we
wanted to allow as large a request as the users were prepared to
accept.
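The RAM limit described above is simple arithmetic. As a quick worked example, with figures that are purely illustrative and not measurements from any PCIC server:

```python
def max_response_bytes(available_ram_bytes, concurrent_responses):
    """Largest response a buffer-everything server can build per request."""
    return available_ram_bytes // concurrent_responses


GIB = 1024 ** 3

# Illustrative figures only: 16 GiB of RAM shared by 8 concurrent responses.
limit = max_response_bytes(16 * GIB, 8)
```

Under those assumed numbers, each response is capped at 2 GiB no matter how much the user is willing to download.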
Generators: 70's tech that works today!
a function which yields execution rather than returning
yields values one at a time, on-demand
low memory footprint
faster; no calling overhead
elegant!
1. Enter generators and coroutines. Generators are a programming
control structure where a function, rather than returning, can yield execution
and return values one at a time, on demand. They have the
performance advantage of maintaining a low memory footprint: if
you want to return something large, you don't have to do so all at
once. They also tend to be slightly faster, because you avoid a lot of the
calling overhead of stack manipulation.
2. Generators have been around for a good thirty-five years, but have
been experiencing a bit of a renaissance lately. If one programs in
Python, they are extremely easy to use, and with the advent of big
data applications, they have a lot of utility.
Generator Example
from itertools import islice

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# print the first 10 values of the fibonacci sequence
for x in islice(fibonacci(), 10):
    print(x)
1. For those who aren't familiar, here's a quick example to understand
generators. Generating a Fibonacci sequence is kind of the
quintessential toy example. The generator function, fibonacci(), is
defined at the top. You'll notice that it's an infinite loop, because
the sequence is, by definition, infinite. But rather than building up
the values in memory, it just has a simple and elegant "yield"
statement right inside the loop. The calling loop down below
actually pulls items from the function, one at a time, and then does
whatever it needs to do with them. It's fast, efficient, and actually
fairly elegant, readable code, too.
2. So you can see, for something like a web application serving big
datasets, this is perfect, because we can provide a very low latency
response, and then stream the data to the user as our high-latency
operations like disk reads take place.
3. None of the OPeNDAP servers out there supported streaming, so
many of the modifications that we made to PyDAP were for it to
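To connect generators to streaming web responses, here is a minimal sketch. It is not the modified PyDAP code; it only relies on the fact that a WSGI application may return any iterable of byte strings, which the server sends as it iterates. The names read_chunks and make_streaming_app are hypothetical, and an in-memory buffer stands in for a large file on disk.

```python
import io


def read_chunks(fileobj, chunk_size=8192):
    """Yield successive chunks from a file-like object until it is exhausted."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


def make_streaming_app(open_source, chunk_size=8192):
    """Build a WSGI app that streams whatever `open_source()` returns."""
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "application/octet-stream")])
        # Returning the generator means only one chunk is in memory at a time.
        return read_chunks(open_source(), chunk_size)
    return app


# Stand-in for a large file on disk: 20000 bytes held in a BytesIO buffer.
app = make_streaming_app(lambda: io.BytesIO(b"x" * 20000))
statuses = []
chunks = list(app({}, lambda status, headers: statuses.append(status)))
```

The headers go out before any data has been read, and memory use is bounded by the chunk size rather than by the size of the response.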
ncWMS
Off-the-shelf
Visualization of NetCDF rasters
Full featured WMS server
Limitations
File-based layer configurations (tedious and error-prone!)
Loads layers serially on startup (slow!)
Scans layers for ranges (really slow!)
1. We're using a modified version of ncWMS to provide visualization of
the climate rasters. It gives us a lot of stuff for free. It's a
full-featured Web Map Service server that converts NetCDF files into
tiled images usable on the web. [demo]
2. Unfortunately it has a few limitations that make it non-ideal for use
with big data. To configure a layer, you have to go through the
files one by one, add them to the list and configure 5-10
different attributes. Additionally, whenever you start or restart the
server, it goes through every single file, in order, and scans it to
determine its ranges, so that it can assign a colorbar. This can
take many minutes, possibly hours, and it only gets slower the more
layers you add.
3. David Bronaugh has done some great work making modifications to
ncWMS to run it off of our metadata database, so that it gets its
list of layers, all of the variable ranges and
everything else from the database. This has made it possible to scale our deployment up
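For reference, a map image is requested from a WMS server with a GetMap call, which is an ordinary URL with a fixed set of query parameters. The sketch below builds one using only the standard WMS 1.1.1 parameters; the server address and the layer name are placeholders, not the portal's real endpoints.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def getmap_url(base_url, layer, bbox, width=256, height=256, srs="EPSG:4326"):
    """Build a WMS 1.1.1 GetMap URL for one layer and bounding box.

    `bbox` is (min_x, min_y, max_x, max_y) in the units of `srs`.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TRANSPARENT": "true",
    }
    return base_url + "?" + urlencode(params)


# Placeholder server and layer name, roughly covering British Columbia.
url = getmap_url("http://example.org/ncWMS/wms", "bc_prism/tmax", (-140, 48, -114, 60))
query = parse_qs(urlsplit(url).query, keep_blank_values=True)
```

A web map client issues one such request per tile, which is why the server needs each layer's value range ready in advance to color the tiles consistently.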
Mapnik and Basemaps
Create our own basemaps from OpenStreetMap
Maximum flexibility in domain and projection
1. A flat image of the climate rasters isn't that useful, especially if
you want to look at details in a particular locality. So thanks to
some great work by Basil, we have our own web basemaps based on
data from the OpenStreetMap project. We have the ability to
generate our own basemaps in any projection that we want and for
any domain. And we have control over the tile service so we can
tweak it for maximum performance.
JavaScript Front-end
2600 lines of JavaScript
Responsible for tying everything together for the web user
Does little to no processing itself; just makes requests to various servers
1. Finally, the last piece of the software stack is the JavaScript
front-end that ties everything else together for the user. This is
probably the most finicky and possibly most complex piece of the
code base even though it doesn't actually provide any functionality
in and of itself. It has to be aware of all of the various services that
are provided, and it has to asynchronously make the requests, process
the responses, and display things to the user. Often the results of one
request affect other things on the page.
2. [Show dataset selection, and how it is a request. Show how dataset
selection triggers a layer change and the loading of layer attributes]. If any
of these things fails, badness ensues.
Automated Testing
1. In our two main repositories, we have about 1500 lines of code
specifically for automated testing of the functionality of both the
PCDS data portal and the raster portals. This test suite covers a
large swath of the code base, but is also compact, so we can run the
full test suite in less than 5 seconds. This is fast enough that it can
be integrated directly into your development workflow, and you can
ensure that any changes you make to the code have not negatively
and unintentionally affected any previously programmed functionality.
Automated Testing
Why?
There's a lot of code and many code paths. Manual testing is
insane, takes days, and isn't complete.
Provides an "executable specification" for what the software
should do
Provides a way to ensure that code changes don't affect
existing functionality (a.k.a. regression testing)
1. So with a system that provides this much functionality, there are a
lot of different code paths through it, any of which could be taken
for different user requests. It's important to test as many of these
as possible, every time you make changes in the system. To
manually go through all of these (and we did with the release of the
PCDS portal a year ago) is meticulous, time consuming and error
prone. Automating this process pays off very quickly both in time
and in code quality.
2. Additionally, the tests provide a sort of "executable specification",
declaring what the various pieces of the code are supposed to do. If
a test fails, your code doesn't meet the spec.
3. Finally, the test suite provides a baseline against which further
development cannot regress. It ensures that future changes will not
negatively impact the functionality that we have previously
developed.
4. [demo of pytest]
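As a small illustration of tests acting as an "executable specification", here is a sketch in the style that pytest collects (plain functions named test_* containing bare assert statements). The function under test, clamp_time_range, is a hypothetical helper invented for this example and is not taken from the portal code.

```python
def clamp_time_range(requested, available):
    """Hypothetical helper: intersect a requested (start, end) index range
    with the available one, returning None when they do not overlap."""
    start = max(requested[0], available[0])
    end = min(requested[1], available[1])
    return (start, end) if start <= end else None


# Each test states one rule that the helper is expected to obey.
def test_request_inside_available_range_is_unchanged():
    assert clamp_time_range((10, 20), (0, 100)) == (10, 20)


def test_request_is_clipped_to_available_range():
    assert clamp_time_range((-5, 150), (0, 100)) == (0, 100)


def test_disjoint_request_yields_none():
    assert clamp_time_range((200, 300), (0, 100)) is None
```

Saved in a file whose name starts with test_, these would be picked up by running pytest in that directory; if a later change breaks one of the three rules, the corresponding test fails and names the rule that was violated.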
Questions
and hopefully answers
 

Andere mochten auch (18)

ŚŚ™ŚšŚ™ŚĄŚ™Ś Ś•ŚŠŚ‘ŚąŚ•Ś Ś™Ś Ś‘Ś’ŚœŚ‘Ś•Śą 15.3.13
ŚŚ™ŚšŚ™ŚĄŚ™Ś Ś•ŚŠŚ‘ŚąŚ•Ś Ś™Ś Ś‘Ś’ŚœŚ‘Ś•Śą 15.3.13ŚŚ™ŚšŚ™ŚĄŚ™Ś Ś•ŚŠŚ‘ŚąŚ•Ś Ś™Ś Ś‘Ś’ŚœŚ‘Ś•Śą 15.3.13
ŚŚ™ŚšŚ™ŚĄŚ™Ś Ś•ŚŠŚ‘ŚąŚ•Ś Ś™Ś Ś‘Ś’ŚœŚ‘Ś•Śą 15.3.13
 
Presentación en power point Inglés
Presentación en power point InglésPresentación en power point Inglés
Presentación en power point Inglés
 
Customer readiness signoff v1b cmmaao pmi pmp
Customer readiness signoff v1b cmmaao pmi pmpCustomer readiness signoff v1b cmmaao pmi pmp
Customer readiness signoff v1b cmmaao pmi pmp
 
ОгражЎающОД ĐșĐŸĐœŃŃ‚Ń€ŃƒĐșцоо Simplex system
ОгражЎающОД ĐșĐŸĐœŃŃ‚Ń€ŃƒĐșцоо Simplex systemОгражЎающОД ĐșĐŸĐœŃŃ‚Ń€ŃƒĐșцоо Simplex system
ОгражЎающОД ĐșĐŸĐœŃŃ‚Ń€ŃƒĐșцоо Simplex system
 
сĐČĐŸĐ±ĐŸĐŽĐ° ŃĐŸĐ±Ń€Đ°ĐœĐžĐč 18.05.13
сĐČĐŸĐ±ĐŸĐŽĐ° ŃĐŸĐ±Ń€Đ°ĐœĐžĐč 18.05.13сĐČĐŸĐ±ĐŸĐŽĐ° ŃĐŸĐ±Ń€Đ°ĐœĐžĐč 18.05.13
сĐČĐŸĐ±ĐŸĐŽĐ° ŃĐŸĐ±Ń€Đ°ĐœĐžĐč 18.05.13
 
Present de cuenca pa l prof carlmalav
Present de cuenca pa l prof carlmalavPresent de cuenca pa l prof carlmalav
Present de cuenca pa l prof carlmalav
 
WordPress Web Design in Birmingham (Infographic)
WordPress Web Design in Birmingham (Infographic)WordPress Web Design in Birmingham (Infographic)
WordPress Web Design in Birmingham (Infographic)
 
EskiÌ‡ĆŸehi̇r
EskiÌ‡ĆŸehi̇rEskiÌ‡ĆŸehi̇r
EskiÌ‡ĆŸehi̇r
 
The Social Media Resume
The Social Media ResumeThe Social Media Resume
The Social Media Resume
 
Isma, carlos & nel
Isma, carlos & nelIsma, carlos & nel
Isma, carlos & nel
 
Proyecto desarrollo del pensamiento
Proyecto desarrollo del pensamientoProyecto desarrollo del pensamiento
Proyecto desarrollo del pensamiento
 
~~ Newer ( connecitivity 02 ) ancient time class info. related to me ~~
~~ Newer ( connecitivity 02  )  ancient time class info. related to me ~~~~ Newer ( connecitivity 02  )  ancient time class info. related to me ~~
~~ Newer ( connecitivity 02 ) ancient time class info. related to me ~~
 
Yourprezi julian david siyo
Yourprezi julian david siyoYourprezi julian david siyo
Yourprezi julian david siyo
 
American cell phone dependency
American cell phone dependencyAmerican cell phone dependency
American cell phone dependency
 
May 2011 highlights slideshow
May 2011 highlights slideshowMay 2011 highlights slideshow
May 2011 highlights slideshow
 
P-SellingYourHouseSummer2016
P-SellingYourHouseSummer2016P-SellingYourHouseSummer2016
P-SellingYourHouseSummer2016
 
Nieznajomoƛć prawa szkodzi fb1
Nieznajomoƛć prawa szkodzi fb1Nieznajomoƛć prawa szkodzi fb1
Nieznajomoƛć prawa szkodzi fb1
 
9 3 multiplying polynomials by monomials lesson
9 3  multiplying polynomials by monomials lesson9 3  multiplying polynomials by monomials lesson
9 3 multiplying polynomials by monomials lesson
 

Ähnlich wie PCIC Data Portal 2.0

Dataservices: Processing Big Data the Microservice Way
Dataservices: Processing Big Data the Microservice WayDataservices: Processing Big Data the Microservice Way
Dataservices: Processing Big Data the Microservice WayQAware GmbH
 
Creating a scalable & cost efficient BI infrastructure for a startup in the A...
Creating a scalable & cost efficient BI infrastructure for a startup in the A...Creating a scalable & cost efficient BI infrastructure for a startup in the A...
Creating a scalable & cost efficient BI infrastructure for a startup in the A...vcrisan
 
Off-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataOff-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataHostedbyConfluent
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangDatabricks
 
Time series data monitoring at 99acres.com
Time series data monitoring at 99acres.comTime series data monitoring at 99acres.com
Time series data monitoring at 99acres.comRavi Raj
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQLKohei KaiGai
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataRobert Grossman
 
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
Hw09   Hadoop Based Data Mining Platform For The Telecom IndustryHw09   Hadoop Based Data Mining Platform For The Telecom Industry
Hw09 Hadoop Based Data Mining Platform For The Telecom IndustryCloudera, Inc.
 
IBM THINK 2018 - IBM Cloud SQL Query Introduction
IBM THINK 2018 - IBM Cloud SQL Query IntroductionIBM THINK 2018 - IBM Cloud SQL Query Introduction
IBM THINK 2018 - IBM Cloud SQL Query IntroductionTorsten Steinbach
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analyticskgshukla
 
P4_tutorial.pdf
P4_tutorial.pdfP4_tutorial.pdf
P4_tutorial.pdfPramodhN3
 
Python and trending_data_ops
Python and trending_data_opsPython and trending_data_ops
Python and trending_data_opschase pettet
 
What's New in Cytoscape
What's New in CytoscapeWhat's New in Cytoscape
What's New in CytoscapeKeiichiro Ono
 
Big Data, Mob Scale.
Big Data, Mob Scale.Big Data, Mob Scale.
Big Data, Mob Scale.darach
 
Big Events, Mob Scale - Darach Ennis (Push Technology)
Big Events, Mob Scale - Darach Ennis (Push Technology)Big Events, Mob Scale - Darach Ennis (Push Technology)
Big Events, Mob Scale - Darach Ennis (Push Technology)jaxLondonConference
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshIanFurlong4
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshSion Smith
 
What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017 What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017 Databricks
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkDatabricks
 
Data exchange alternatives, SBIS conference in Stockholm (2008)
Data exchange alternatives, SBIS conference in Stockholm (2008)Data exchange alternatives, SBIS conference in Stockholm (2008)
Data exchange alternatives, SBIS conference in Stockholm (2008)Dag Endresen
 

Ähnlich wie PCIC Data Portal 2.0 (20)

Dataservices: Processing Big Data the Microservice Way
Dataservices: Processing Big Data the Microservice WayDataservices: Processing Big Data the Microservice Way
Dataservices: Processing Big Data the Microservice Way
 
Creating a scalable & cost efficient BI infrastructure for a startup in the A...
Creating a scalable & cost efficient BI infrastructure for a startup in the A...Creating a scalable & cost efficient BI infrastructure for a startup in the A...
Creating a scalable & cost efficient BI infrastructure for a startup in the A...
 
Off-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataOff-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier Data
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
Time series data monitoring at 99acres.com
Time series data monitoring at 99acres.comTime series data monitoring at 99acres.com
Time series data monitoring at 99acres.com
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
 
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
Hw09   Hadoop Based Data Mining Platform For The Telecom IndustryHw09   Hadoop Based Data Mining Platform For The Telecom Industry
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
 
IBM THINK 2018 - IBM Cloud SQL Query Introduction
IBM THINK 2018 - IBM Cloud SQL Query IntroductionIBM THINK 2018 - IBM Cloud SQL Query Introduction
IBM THINK 2018 - IBM Cloud SQL Query Introduction
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
 
P4_tutorial.pdf
P4_tutorial.pdfP4_tutorial.pdf
P4_tutorial.pdf
 
Python and trending_data_ops
Python and trending_data_opsPython and trending_data_ops
Python and trending_data_ops
 
What's New in Cytoscape
What's New in CytoscapeWhat's New in Cytoscape
What's New in Cytoscape
 
Big Data, Mob Scale.
Big Data, Mob Scale.Big Data, Mob Scale.
Big Data, Mob Scale.
 
Big Events, Mob Scale - Darach Ennis (Push Technology)
Big Events, Mob Scale - Darach Ennis (Push Technology)Big Events, Mob Scale - Darach Ennis (Push Technology)
Big Events, Mob Scale - Darach Ennis (Push Technology)
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data Mesh
 
What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017 What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017
 
Best Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache SparkBest Practices for Building and Deploying Data Pipelines in Apache Spark
Best Practices for Building and Deploying Data Pipelines in Apache Spark
 
Data exchange alternatives, SBIS conference in Stockholm (2008)
Data exchange alternatives, SBIS conference in Stockholm (2008)Data exchange alternatives, SBIS conference in Stockholm (2008)
Data exchange alternatives, SBIS conference in Stockholm (2008)
 

KĂŒrzlich hochgeladen

SHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions PresentationSHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions PresentationShrmpro
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyviewmasabamasaba
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...masabamasaba
 
call girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïž
call girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïžcall girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïž
call girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïžDelhi Call girls
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrandmasabamasaba
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durbanmasabamasaba
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfproinshot.com
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 

KĂŒrzlich hochgeladen (20)

SHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions PresentationSHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions Presentation
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
call girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïž
call girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïžcall girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïž
call girls in Vaishali (Ghaziabad) 🔝 >àŒ’8448380779 🔝 genuine Escort Service đŸ”âœ”ïžâœ”ïž
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 

PCIC Data Portal 2.0

  • 1. PCIC Data Portal 2.0. Staff Meeting. James Hiebert, February 18, 2014.
  • 2. Outline: 1 Demos; 2 Architecture (Metadata Database, Python Backend, Pydap, ncWMS, Basemaps, Front-end); 3 Bonus (Automated Testing).
  • 3. Outline
      1. Last week we deployed our fourth and hopefully final release candidate for version 2.0 of the PCIC Data Portal. It's been a four-month beta period over which we have received and responded to feedback, both from inside PCIC and from some external beta testers. Many of you have seen these at the various theme meetings that we had throughout the fall, but I'd like to take this opportunity both to introduce the rest of you to the data portal and to elaborate more on what is running behind the scenes and all of the work that has gone into producing it.
      2. Typically in these presentations, I hold you captive with all of the technical details first and save the demo for the end. But in this case, I'll start with the demo, and then if you don't care about how we did it, you can just check out after that.
  • 5. Raster Portal(s): Coming soon!
      1. The software that we have written is a collection of components that handle the organization and presentation of raster data, that is, gridded fields of spatiotemporal data. There are several sets of high-value data for which we have written a "raster portal" that can serve that data up.
  • 7. BCSD Downscale Canada
      1. You'll see that the feature set is intentionally fairly sparse. The application's purpose is to allow users to get the data they want, and only the data they want, and then to send them on their way. The main section of screen real estate is the map. The map displays the areas for which data exists and allows the user to select an area to download.
      2. In the top right, there is a tree selection which controls the dataset that is displayed and that will be downloaded. And finally there are a couple of options for selecting a time range and data format.
      3. We only support formats which support multidimensional data, and right now there aren't very many of those. We'll be adding Arc ASCII Grid by the end of the fiscal year. It isn't technically multidimensional, so we'll probably send a zip file of individual grids, one per timestep.
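The "zip file of individual grids, one per timestep" idea in note 3 can be sketched as follows. This is only an illustration of the approach, not the portal's implementation: the function names, the file-naming scheme and the nested-list grid representation are all assumptions, and the header fields follow the commonly documented Arc ASCII Grid layout.

```python
import io
import zipfile


def ascii_grid(grid, xllcorner, yllcorner, cellsize, nodata=-9999):
    """Render one 2-D grid (a list of rows) as Arc ASCII Grid text."""
    header = [
        f"ncols {len(grid[0])}",
        f"nrows {len(grid)}",
        f"xllcorner {xllcorner}",
        f"yllcorner {yllcorner}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    rows = [" ".join(str(value) for value in row) for row in grid]
    return "\n".join(header + rows) + "\n"


def zip_timesteps(timesteps, xllcorner, yllcorner, cellsize):
    """Pack one .asc file per timestep into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, grid in enumerate(timesteps):
            # Hypothetical naming scheme: timestep_0000.asc, timestep_0001.asc, ...
            name = f"timestep_{index:04d}.asc"
            archive.writestr(name, ascii_grid(grid, xllcorner, yllcorner, cellsize))
    return buffer.getvalue()
```

Each 2-D slice of the time dimension becomes its own file, which is how a two-dimensional format can carry a three-dimensional subset.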
  • 9. BC PRISM
      1. The BC PRISM portal is very similar to the BCSD Downscaling portal, with a few minor differences. First of all, the map projection is specific to BC. We've used the BC Albers projection, which is a little more visually appealing (though it does present some challenges). Secondly, because the PRISM data consists only of monthly climatologies, the data volume in the temporal dimension is very small. For that reason, we eliminated the time subset controls and chose just to give the user the entire time range.
  • 12. Software Components
      1. One thing that you'll notice from this diagram is that the data itself is at the foundation of this software stack. Without the data in place beforehand, essentially nothing else can exist. Even the metadata in the database comes from the NetCDF files. This is why we have been somewhat militant about wanting your data to be finalized before we begin to work on the portal to publish it.
      2. The NetCDF box here is the only thing that is just data sitting on disk. These four boxes (PostgreSQL, ncWMS, pydap, pdp) are all different pieces of software running on the server which respond to incoming web requests. PostgreSQL organizes all of the metadata about the available data, ncWMS provides the climate visualization layers, pydap responds to requests for the actual data, and pdp responds to all of the requests that build up the user interface. [Do a page load showing the network tools]
  • 14. Metadata Database
      1. This might be a bit too much detail, but try to bear with me. This database stores the full relationship structure between all of the data files that we store and want to publish. It tracks all of the files that we have on disk, all of the different variables that they contain, and the full range of each variable so that we can quickly set color scales and such for the visualization layers. It stores all of the metadata about the files, such as the timesteps that they contain, what their grid parameters are, what models they are from and how they relate to other driving models (for example, in the case of an RCM forced by a GCM). All of these can be grouped into "ensembles", each of which is a group of rasters that we are publishing together on a single portal page.
      2. The data contained in the schema allows the web application to function quickly, because everything is quickly searchable without opening up a bunch of files and having to read terabytes of data just to determine a few key attributes.
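To make the point of note 2 concrete, here is a minimal sketch of looking up a colour-scale range from stored metadata rather than from the files themselves. The table and column names are invented for illustration and are not PCIC's actual schema; SQLite stands in for PostgreSQL so that the example is self-contained.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a toy, hypothetical metadata schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE data_files (
            file_id INTEGER PRIMARY KEY,
            filename TEXT NOT NULL,
            model TEXT,
            driving_model TEXT          -- e.g. the GCM forcing an RCM
        );
        CREATE TABLE variables (
            var_id INTEGER PRIMARY KEY,
            file_id INTEGER REFERENCES data_files(file_id),
            name TEXT NOT NULL,
            range_min REAL,             -- precomputed once, at indexing time
            range_max REAL
        );
        CREATE TABLE ensemble_members (
            ensemble_name TEXT NOT NULL,
            var_id INTEGER REFERENCES variables(var_id)
        );
    """)
    conn.executemany("INSERT INTO data_files VALUES (?, ?, ?, ?)", [
        (1, "/data/a.nc", "RCM_A", "GCM_X"),
        (2, "/data/b.nc", "RCM_B", "GCM_Y"),
    ])
    conn.executemany("INSERT INTO variables VALUES (?, ?, ?, ?, ?)", [
        (1, 1, "tasmax", -40.0, 45.0),
        (2, 1, "pr", 0.0, 300.0),
        (3, 2, "tasmax", -35.0, 42.0),
    ])
    conn.executemany("INSERT INTO ensemble_members VALUES (?, ?)", [
        ("demo_ensemble", 1),
        ("demo_ensemble", 3),
    ])
    return conn


def colour_scale_range(conn, ensemble_name, var_name):
    """Return (min, max) for a variable across an ensemble without opening any file."""
    return conn.execute(
        """
        SELECT MIN(v.range_min), MAX(v.range_max)
        FROM variables v
        JOIN ensemble_members e ON e.var_id = v.var_id
        WHERE e.ensemble_name = ? AND v.name = ?
        """,
        (ensemble_name, var_name),
    ).fetchone()
```

The query touches a handful of rows, whereas computing the same range from the rasters would mean scanning every value in every file.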
  • 16. Python Backend
      1. We have written a full web application backend in Python which does all of the file format translation and all of the database communication, and passes all of the metadata on to the web UI to be presented to the user. The application consists of about 2800 lines of Python code, plus 1500 lines of testing code, that we have written outright. There are about another 3000 lines of code making up PyDAP, which we have heavily modified.
  • 17. Python Backend

      ensemble_name = 'bc_prism'
      portal_config = {
          'title': 'High-Resolution Climatology',
          'ensemble_name': ensemble_name,
          'js_files': wrap_mini([
              'js/prism_demo_map.js',
              'js/prism_demo_controls.js',
              'js/prism_demo_app.js'],
              basename='bc_prism', debug=True)
      }
      portal_config = updateConfig(global_config, portal_config)
      map_app = wrap_auth(MapApp(**portal_config), required=False)

      dsn = dsn + '?application_name=pdp_prism'
      with session_scope(dsn) as sesh:
          conf = db_raster_configurator(sesh, "Download Data", 0.1, 0, ensemble_name,
              root_url=global_config['app_root'].rstrip('/') + '/' + ensemble_name + '/data/'
          )
      data_server = wrap_auth(RasterServer(dsn, conf))
      catalog_server = RasterCatalog(dsn, conf)  # No Auth
      menu = PrismEnsembleLister(dsn)

      portal = PathDispatcher([
          ('^/map/?.*$', map_app),
          ('^/catalog/.*$', catalog_server),
          ('^/data/.*$', data_server),
          ('^/menu.json.*$', menu)
      ])
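The slide's last statement routes URL paths to sub-applications by regular expression. The portal's own `PathDispatcher` is not shown in the talk, so the following is only a guess at the general pattern, written as a plain WSGI callable. `RegexDispatcher`, `make_text_app` and `call_app` are hypothetical names introduced for this sketch.

```python
import re


class RegexDispatcher:
    """WSGI app that forwards a request to the first sub-app whose pattern matches."""

    def __init__(self, routes):
        # routes: list of (regex string, WSGI app) pairs, checked in order
        self.routes = [(re.compile(pattern), app) for pattern, app in routes]

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        for pattern, app in self.routes:
            if pattern.match(path):
                return app(environ, start_response)
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]


def make_text_app(body):
    """Build a trivial WSGI app standing in for the map, data or catalog servers."""
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [body.encode("utf-8")]
    return app


def call_app(app, path):
    """Invoke a WSGI app directly and return (status, body) for inspection."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app({"PATH_INFO": path}, start_response))
    return captured["status"], body
```

For example, `call_app(RegexDispatcher([(r"^/data/.*$", make_text_app("data"))]), "/data/x.nc")` returns the status and body produced by the data stand-in. Because every component is just a WSGI callable, wrappers such as an authentication layer can be applied to some routes and not to others, as the slide does for the catalog.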
  • 18. OPeNDAP and PyDAP
      - Designed to be a "discipline-neutral means of requesting and providing data across the [web]"
      - Data Access Protocol (DAP)
      - Open source
      - Machine-to-machine transfer of scientific data
      - Mostly supported by US scientific agencies (NOAA, NASA, NSF)
  • 19. OPeNDAP and PyDAP
      1. PyDAP is the component of the data portal that actually provides the data download services. It's an implementation of the OPeNDAP protocol, which is designed to be a discipline-neutral means of transferring data across the web. This protocol is open source and is designed to be OS and application independent, so that you can get data into whatever software you want to use for your data analysis. It's supported mostly by US scientific agencies such as NOAA, NASA and the National Science Foundation.
  • 21. OPeNDAP and PyDAP
      1. There are a number of different OPeNDAP servers out there, but PyDAP is the one that we use to serve all of the data itself. Its architecture is quite a bit more flexible than that of some of the other OPeNDAP servers. This is a rough layout of the architecture. It has a number of "handlers", which are written to interpret different data formats and translate them to the DAP structure. Then on the top end, there are numerous "responders" that translate the DAP structure into the output formats that the user wants.
      2. [describe more specifically which parts are ours and which we use]
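The handler/responder split described in note 1 can be illustrated with a toy pipeline: any input format is parsed into one common in-memory structure, and any output format is rendered from that structure. This is a sketch of the idea only. It does not use or reproduce PyDAP's API; every name below is hypothetical, and a plain dict of lists stands in for the DAP data model.

```python
import csv
import io
import json


def csv_handler(text):
    """Handler: parse CSV text into the common structure {name: [values]}."""
    rows = list(csv.reader(io.StringIO(text)))
    header, records = rows[0], rows[1:]
    return {name: [float(r[i]) for r in records] for i, name in enumerate(header)}


def json_handler(text):
    """Handler: parse a JSON object of arrays into the same common structure."""
    return {key: [float(x) for x in values] for key, values in json.loads(text).items()}


def json_response(dataset):
    """Responder: serialize the common structure as JSON."""
    return json.dumps(dataset, sort_keys=True)


def ascii_response(dataset):
    """Responder: serialize the common structure as comma-separated text."""
    names = sorted(dataset)
    lines = [",".join(names)]
    for values in zip(*(dataset[name] for name in names)):
        lines.append(",".join(str(value) for value in values))
    return "\n".join(lines)


HANDLERS = {".csv": csv_handler, ".json": json_handler}
RESPONSES = {"json": json_response, "ascii": ascii_response}


def serve(source_text, extension, output_format):
    """Pick a handler by input extension and a responder by requested format."""
    dataset = HANDLERS[extension](source_text)
    return RESPONSES[output_format](dataset)
```

With this layout, supporting a new input format means writing one handler and supporting a new output format means writing one responder; neither side needs to know about the other.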
  • 22. How much of Pydap is our code? (Source: hg churn)
      pydap.handlers.pcic       100%
      pydap.handlers.hdf5       68.0%
      pydap.responses.netcdf    61.5%
      pydap.handlers.sql        12.3%
      pydap.handlers.csv        3.7%
      pydap                     2.3%
      pydap.responses.xls       1.3%
      pydap.responses.html      ?
  • 23. How much of Pydap is our code? (Source: hg churn)
      1. To give you an idea of the degree to which Pydap was “off-the-shelf”, I ran the command “hg churn” on all of the pydap repositories, which measures the changes in a repository by lines of code. The fractions shown are the churn of PCIC staff divided by the total churn of all committers. You can see that we wrote one handler entirely ourselves, the hdf5 and netcdf work is mostly ours, and for the rest of the modules we only had to make minimal changes.
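The fraction described above is a simple ratio of changed lines. As a small worked example, assuming made-up committer names and line counts (not the real repository figures), it can be computed like this:

```python
def churn_share(churn_by_author, staff):
    """Return the percentage of total churn contributed by `staff`."""
    total = sum(churn_by_author.values())
    ours = sum(
        lines for author, lines in churn_by_author.items() if author in staff
    )
    return 100.0 * ours / total


# Made-up committers and changed-line counts, for illustration only.
churn = {"alice": 300, "bob": 150, "carol": 50}
share = churn_share(churn, staff={"alice", "bob"})
print("{:.1f}%".format(share))
```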
  • 24. Big data, big RAM, BadRequest, Oh My!
  • 25. Big data, big RAM, BadRequest, Oh My!
      1. One of the technical problems that we ran up against was that all of the available OPeNDAP data servers load their responses entirely into RAM before sending them out. So if you want to serve up large data sets, the size of your response is limited by your available RAM divided by the number of concurrent responses that you are prepared to serve. If you try to make a request to, say, a THREDDS OPeNDAP server that’s larger than the JVM’s allocated memory, the user will just get back a BadRequest error.
      2. For some applications this may be fine, or even desirable, but for the purposes of serving large data sets, the network pipe is usually the bottleneck. Rather than annoy and frustrate users by forcing them to carve up their data requests into arbitrarily small pieces, we wanted to allow as large a request as the users were prepared to accept.
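The limit described in the first note is plain division. As a worked example with hypothetical numbers (16 GiB of RAM and 32 concurrent responses are assumptions chosen for illustration, not the portal's actual settings):

```python
GIB = 1024 ** 3


def max_response_bytes(available_ram_bytes, concurrent_responses):
    """Largest response a buffer-everything server can promise to each client."""
    return available_ram_bytes // concurrent_responses


# Hypothetical server: 16 GiB of RAM shared by 32 concurrent responses.
limit = max_response_bytes(16 * GIB, 32)
print(limit / GIB)
```

Under these assumptions each request is capped at half a gibibyte, no matter how patient the user is or how fast the network is.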
  • 26. Generators: 70’s tech that works today!
      - a function which yields execution rather than returning
      - yields values one at a time, on-demand
      - low memory footprint
      - faster; no calling overhead
      - elegant!
  • 27. Generators: 70’s tech that works today!
      1. Enter generators and coroutines. A generator is a control-flow construct in which a function, rather than returning, yields execution and hands back values one at a time, on demand. This keeps the memory footprint low: if you want to return something large, you don’t have to do so all at once. Generators also tend to be slightly faster, because you avoid much of the calling overhead of stack manipulation.
      2. Generators have been around for a good thirty-five years, but have been experiencing a bit of a renaissance lately. In Python they are extremely easy to use, and with the advent of big data applications, they have a lot of utility.
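Applied to the RAM problem, the idea is that a web application can return a generator as its response body, so only one chunk is in memory at a time. The following is a minimal sketch using the WSGI calling convention; it is not the code that was added to PyDAP. The function names are hypothetical, and an in-memory BytesIO stands in for a large file on disk.

```python
import io


def read_chunks(fileobj, chunk_size=64 * 1024):
    """Yield successive chunks from a file-like object until it is exhausted."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


def streaming_app(environ, start_response):
    """WSGI app whose body is a generator rather than one large bytes object."""
    # Stand-in for a large file on disk; a real server would open the dataset here.
    source = io.BytesIO(b"x" * 1000)
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return read_chunks(source, chunk_size=256)


# Drive the app by hand, the way a WSGI server would.
statuses = []
body = streaming_app({}, lambda status, headers: statuses.append(status))
sizes = [len(chunk) for chunk in body]
print(statuses, sizes)
```

The status line and headers go out immediately, and each chunk is read only when the server asks for the next piece of the body, so the memory needed per response is bounded by the chunk size rather than by the size of the data set.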
  • 28. Generator Example

        from itertools import islice

        def fibonacci():
            a, b = 0, 1
            while True:
                yield a
                a, b = b, a + b

        # print the first 10 values of the fibonacci sequence
        for x in islice(fibonacci(), 10):
            print(x)
  • 29. Generator Example
      1. For those who aren’t familiar, here’s a quick example to understand generators. Generating a Fibonacci sequence is kind of the quintessential toy example. The generator function, fibonacci(), is defined at the top. You’ll notice that it’s an infinite loop, because the sequence is, by definition, infinite. But rather than building up the values in memory, it just has a simple and elegant “yield” statement right inside the loop. The calling loop down below actually pulls items from the function, one at a time, and then does whatever it needs to do with them. It’s fast, efficient, and actually fairly elegant, readable code, too.
      2. So you can see that, for something like a web application serving big datasets, this is perfect, because we can provide a very low latency response and then stream the data to the user as our high-latency operations like disk reads take place.
      3. None of the OPeNDAP servers out there supported streaming, so many of the modifications that we made to PyDAP were for it to
  • 30. ncWMS
      Off-the-shelf
      - Visualization of NetCDF rasters
      - Full featured WMS server
      Limitations
      - File-based layer configurations (tedious and error-prone!)
      - Loads layers serially on startup (slow!)
      - Scans layers for ranges (really slow!)
  • 31. ncWMS
      1. We’re using a modified version of ncWMS to provide visualization of the climate rasters. It gives us a lot of stuff for free. It’s a full featured Web Map Service (WMS) server that converts NetCDF files into tiled images usable on the web. [demo]
      2. Unfortunately it has a few limitations that make it non-ideal for use with big data. To configure a layer, you have to go through the files one by one, add them to the list and configure 5-10 different attributes. Additionally, whenever you start or restart the server, it goes through every single file, in order, and scans each one to determine its range so that it can assign a colorbar. This can take many minutes, possibly hours, and it only gets slower the more layers you add.
      3. David Bronaugh has done some great work modifying ncWMS to run it off of our metadata database, so that it gets its list of layers, all of the variable ranges and everything else from the database. This has made it possible to scale our deployment up
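The gain from moving layer configuration into a database can be illustrated with a small sketch. ncWMS itself is a Java application and the real metadata database schema is not shown in these slides, so everything below is an assumption: an in-memory SQLite table stands in for the metadata database, and the table layout, function names, layer names and paths are hypothetical. The idea is that each layer's range is computed once, when the layer is registered, and startup becomes a single query instead of a scan of every file.

```python
import sqlite3


def create_metadata_db():
    """Create an in-memory stand-in for a metadata database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE layers (name TEXT PRIMARY KEY, path TEXT, vmin REAL, vmax REAL)"
    )
    return conn


def register_layer(conn, name, path, values):
    """Scan a layer's values once, at ingest time, and store the range."""
    conn.execute(
        "INSERT INTO layers VALUES (?, ?, ?, ?)",
        (name, path, min(values), max(values)),
    )


def load_layers(conn):
    """At server startup, read every layer and its range with one query."""
    rows = conn.execute("SELECT name, path, vmin, vmax FROM layers ORDER BY name")
    return {
        name: {"path": path, "range": (vmin, vmax)}
        for name, path, vmin, vmax in rows
    }


conn = create_metadata_db()
register_layer(conn, "pr", "/data/pr.nc", [0.0, 3.2, 12.5])
register_layer(conn, "tasmax", "/data/tasmax.nc", [-30.0, 4.5, 35.0])
layers = load_layers(conn)
print(layers["tasmax"]["range"])
```

With this arrangement the startup cost grows with the number of rows in one table, not with the total volume of raster data behind the layers.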
  • 32. Mapnik and Basemaps
      - Create our own basemaps from OpenStreetMap
      - Maximum flexibility in domain and projection
  • 33. Mapnik and Basemaps
      1. Flat images of the climate rasters aren’t that useful, especially if you want to look at details in a particular locality. So thanks to some great work by Basil, we have our own web basemaps built from OpenStreetMap project data. We have the ability to generate our own basemaps in any projection that we want and for any domain. And we have control over the tile service, so we can tweak it for maximum performance.
  • 34. JavaScript Front-end
      - 2600 lines of JavaScript
      - Responsible for tying everything together for the web user
      - Does little to no processing itself / just makes requests to various servers
  • 35. JavaScript Front-end
      1. Finally, the last piece of the software stack is the JavaScript front-end that ties everything else together for the user. This is probably the most finicky and possibly the most complex piece of the code base, even though it doesn’t actually provide any functionality in and of itself. It has to be aware of all of the various services that are provided, and it has to asynchronously make the requests, process them and display things to the user. Often the results of one request affect other things on the page.
      2. [Show dataset selection, and how it is a request. Show how dataset selection triggers a layer change and the loading of layer attributes.] If any of these things fails, badness ensues.
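The chain of dependent requests described above can be modelled in a few lines. The real front-end is JavaScript; this sketch uses Python's asyncio only to stay in the same language as the other examples in this talk. The function names, the dataset names and the units are hypothetical stand-ins for the real services.

```python
import asyncio


async def fetch_menu():
    """Stand-in for requesting the dataset menu (hypothetical data)."""
    await asyncio.sleep(0)
    return ["tasmax", "pr"]


async def fetch_layer_attributes(dataset):
    """Stand-in for requesting a layer's attributes from the map server."""
    await asyncio.sleep(0)
    return {"dataset": dataset, "units": {"tasmax": "degC", "pr": "mm"}[dataset]}


async def select_dataset(dataset, page):
    """Selecting a dataset changes the layer, then loads that layer's attributes."""
    page["layer"] = dataset
    try:
        page["attributes"] = await fetch_layer_attributes(dataset)
    except KeyError:
        # A failed request must not leave stale attributes on the page.
        page["attributes"] = None
    return page


async def main():
    page = {}
    menu = await fetch_menu()
    return await select_dataset(menu[0], page)


page = asyncio.run(main())
print(page)
```

Each step can only start once the previous response has arrived, and each one can fail independently, which is why this layer needs explicit handling for every request rather than just the happy path.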
  • 37. Automated Testing
      1. In our two main repositories, we have about 1500 lines of code specifically for automated testing of the functionality of both the PCDS data portal and the raster portals. This test suite covers a large swath of the code base, but is also compact, so we can run the full test suite in less than 5 seconds. This is fast enough that it can be integrated directly into your development workflow, and you can ensure that any changes you make to the code have not negatively and unintentionally affected any previously programmed functionality.
  • 38. Automated Testing
      Why?
      - There’s a lot of code and many code paths. Manual testing is insane, takes days, and isn’t complete.
      - Provides an “executable specification” for what the software should do
      - Provides a way to ensure that code changes don’t affect existing functionality (a.k.a. regression testing)
  • 39. Automated Testing
      1. So with a system that provides this much functionality, there are a lot of different code paths through it, any of which could be taken for different user requests. It’s important to test as many of these as possible, every time you make changes in the system. To manually go through all of these (and we did with the release of the PCDS portal a year ago) is meticulous, time consuming and error prone. Automating this process pays off very quickly, both in time and in code quality.
      2. Additionally, the tests provide a sort of “executable specification”, declaring what the various pieces of the code are supposed to do. If a test fails, your code doesn’t meet the spec.
      3. Finally, the test suite provides a baseline against which further development cannot regress. It ensures that future changes will not negatively impact the functionality that we have previously developed.
      4. [demo of pytest]
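For anyone who has not seen pytest, here is a minimal sketch of what an “executable specification” looks like. The function under test, bounding_box_overlaps, is hypothetical and is not part of the portal code. pytest collects functions whose names start with test_ from files named test_*.py, and a bare assert is all that is needed to state an expectation.

```python
def bounding_box_overlaps(a, b):
    """Return True if two (xmin, ymin, xmax, ymax) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def test_overlapping_boxes():
    assert bounding_box_overlaps((0, 0, 2, 2), (1, 1, 3, 3))


def test_disjoint_boxes():
    assert not bounding_box_overlaps((0, 0, 1, 1), (2, 2, 3, 3))


def test_touching_edges_count_as_overlap():
    # Encodes a design decision as a specification: shared edges overlap.
    assert bounding_box_overlaps((0, 0, 1, 1), (1, 0, 2, 1))
```

If someone later changes the comparison from <= to <, the third test fails at once, which is exactly the regression guard described in note 3.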
  • 40. Questions and hopefully answers