1. FishNet 2
A Network of Ichthyology Collections Offering
Realtime Analysis and Visualization
2. Overview
The original FishNet and related or subsequent projects (e.g. MaNIS,
ORNIS, OBIS, GBIF) demonstrated the value of combining many
collections into a global "virtual collection"
Open issues: data availability, quality, and completeness (georeferencing,
scientific names), and the technical performance and sustainability
of networks
NSF sponsored FishNet2 to evaluate enhancements and produce
software tools to generally improve and further enable data sharing
practices for natural history (and similar) collections
Ichthyology collections used for prototypes
Recent focus on performance and scalability of data portal enabling
real time search and analysis by many simultaneous users
ASIH - 2009 Vieglais, University of Kansas NSF-0415600
3. Sum of Sources
[Figure: collectively, more specimens and more collections give more data
and broader taxonomic, spatial, and temporal coverage]
Combining data is beneficial to all participants, leads to increased overall data
availability (repatriation), reliability and quality, and can improve the
efficiency of future investigations (gap analysis).
Sharing combined data is beneficial to all stakeholders (investigators, funding
agencies, scientific community, public) and provides a more effective, efficient
use of resources (reducing duplication of effort)
4. Many Sources
[Figure: many data sources connected to a single web portal]
Network / infrastructure issues:
Query broadcast
Data service maintenance
Network vulnerabilities
Multi-user scalability
A fully distributed model enables up-to-date information,
but suffers from severe reliability and scalability issues since
each query is broadcast to all data sources and the
responses collated.
Maintenance costs become significant as more data sources
are added, especially when fundamental changes are made
to the protocol or data model.
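The reliability and latency penalty of broadcasting every query can be illustrated with a little arithmetic. The sketch below is not from the slides: it assumes sources fail independently, and the availability and latency figures in the usage lines are hypothetical.

```python
def broadcast_success_probability(per_source_availability: float, n_sources: int) -> float:
    """Chance that every source answers one broadcast query.

    Assumes each source is up independently with the same probability
    (an illustrative simplification, not a measured property of FishNet).
    """
    return per_source_availability ** n_sources


def broadcast_latency(source_latencies_sec):
    """A parallel broadcast can only be collated once the slowest source replies."""
    return max(source_latencies_sec)


# Hypothetical numbers: 50 sources, each available 99% of the time.
complete_answer_chance = broadcast_success_probability(0.99, 50)
# Hypothetical per-source response times in seconds.
slowest = broadcast_latency([0.2, 1.5, 0.4])
```

With these assumed figures, only about 6 queries in 10 would receive a complete answer, and every query waits for the slowest source.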
5. Addressing Social Sustainability
Operation Cost ≈ portal + n(source)
                  Portal   Source
Hardware             6        1
Maintenance          3        1
Setup / Change       4        1
Bandwidth            5        1
Total               18        4
Operation Cost ≈ 18 + n*4
Beyond 4-5 sources, the data sources become the most expensive element
(n*4 exceeds 18 once n reaches 5). Hence one goal should be to minimize
cost to data sources.
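The cost model on this slide can be worked through in a few lines. This is a minimal sketch using the slide's relative cost figures; the function names are illustrative and the units are whatever relative units the slide intends.

```python
# Relative costs taken from the slide: hardware, maintenance, setup/change, bandwidth.
PORTAL_COST = 6 + 3 + 4 + 5   # 18
SOURCE_COST = 1 + 1 + 1 + 1   # 4 per data source


def operation_cost(n_sources: int) -> int:
    """Total relative cost of one portal plus n data sources."""
    return PORTAL_COST + n_sources * SOURCE_COST


def sources_share(n_sources: int) -> float:
    """Fraction of the total cost carried by the data sources."""
    return n_sources * SOURCE_COST / operation_cost(n_sources)


def break_even_sources() -> int:
    """Smallest n at which the sources together cost more than the portal."""
    n = 1
    while n * SOURCE_COST <= PORTAL_COST:
        n += 1
    return n
```

At 4 sources the sources cost 16 against the portal's 18; at 5 they cost 20, so the crossover falls between 4 and 5 sources.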
6. Portal Design Goals
High performance. Query response < 1sec
Scalable. > 10e7 records, multiple users
Arbitrary content (focus on Darwin Core)
Integration with related data
Simple but effective subset and export
Programmatic interfaces for extensibility
Reporting of record use
Provide value for participation
7. Components
[Architecture diagram. Labels: Source (Web Server; DiGIR, TAPIR; Direct
Upload, CSV) -> Darwin Core -> Portal (Manage UI, Search UI, Data Portal
Services, Index, Record Cache) -> related data (Georef, Env Data,
Morphology, ...)]
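The index and record cache named in the diagram can be sketched as follows. This is a toy illustration under stated assumptions, not the FishNet2 implementation: the `PortalIndex` class is hypothetical, the sample rows are invented, and only the column names (institutionCode, catalogNumber, scientificName, country) are real Darwin Core terms.

```python
import csv
import io
from collections import defaultdict


class PortalIndex:
    """Hypothetical portal-side store: a record cache plus an inverted index."""

    def __init__(self, indexed_fields):
        self.indexed_fields = list(indexed_fields)
        self.cache = {}                # record id -> full record
        self.index = defaultdict(set)  # (field, lowercased value) -> record ids

    def harvest_csv(self, text):
        """Load Darwin Core rows from CSV text (e.g. a direct upload)."""
        for row in csv.DictReader(io.StringIO(text)):
            record_id = (row["institutionCode"], row["catalogNumber"])
            self.cache[record_id] = row
            for field in self.indexed_fields:
                self.index[(field, row[field].lower())].add(record_id)

    def search(self, **criteria):
        """Answer a query from the local index; no source is contacted."""
        id_sets = [self.index.get((f, v.lower()), set()) for f, v in criteria.items()]
        if not id_sets:
            return []
        matches = set.intersection(*id_sets)
        return [self.cache[record_id] for record_id in sorted(matches)]


# Invented sample rows, for illustration only.
SAMPLE = (
    "institutionCode,catalogNumber,scientificName,country\n"
    "KU,1,Lepomis macrochirus,USA\n"
    "KU,2,Micropterus salmoides,USA\n"
    "XX,7,Lepomis macrochirus,Mexico\n"
)

portal = PortalIndex(["scientificName", "country"])
portal.harvest_csv(SAMPLE)
hits = portal.search(scientificName="Lepomis macrochirus", country="USA")
```

Because searches run against the harvested copy, response time does not depend on the slowest source. The sketch ignores re-harvesting of changed records, which would need stale index entries removed.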
11. The Future
Current solution forms a good basis for ongoing infrastructure.
Extensible to new data, scalable to very large data sets.
Technical issues are not the major problem. Viable solutions are
available, and issues of data translation and transfer are for the
most part well defined
There are real costs associated with data sharing, so return on
investment must be satisfying to participants
The biggest problem is long-term sustainability of the technical
infrastructure, especially the data sources, which make up the bulk
of the costs.
Attract ongoing support from the community and direct fiscal and
in-kind assistance to appropriate targets (e.g. maintenance of
portal(s) versus deployment of new data sources)