The Neuroscience Information Framework (NIF) indexes more than 100 large databases, allowing us to ask questions about the big data landscape. Anita Bandrowski presents an overview of the NIF system and provides insights into the addiction data landscape to JAX laboratories.
2. Overview
• Brief overview of NIF philosophy
• Examples of data about addiction
• Why you should never use Google to answer any scientific question
• How can we make Google better?
3. Power!
• How many subjects/patients do we need to be relatively certain that we are correct?
• More than you can afford?
• If YFGM gave each of you 1B dollars, would that solve the problem?
• But, what if:
– Big data from small data?
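The sample-size question on this slide can be made concrete with a standard power calculation. The sketch below is not from the talk. It assumes a two-sided comparison of means between two equally sized groups and uses the normal approximation n = 2((z_(1-alpha/2) + z_(power)) / d)^2, where d is the standardized effect size. The function name and default values are illustrative assumptions.

```python
import math
from statistics import NormalDist


def subjects_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group for a two-sided, two-sample
    comparison of means (normal approximation, equal group sizes)."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Smaller effects need disproportionately more subjects.
    for d in (0.8, 0.5, 0.2):
        print(f"effect size {d}: {subjects_per_group(d)} subjects per group")
```

Under these assumptions, halving the effect size roughly quadruples the required sample, which is one reason pooling many small datasets is attractive.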
7. NIF is an initiative of the NIH Blueprint consortium of institutes
– What types of resources (data, tools, materials, services) are available to the neuroscience community?
– How many are there?
– What domains do they cover? What domains do they not cover?
– Where are they?
  • Web sites
  • Databases
  • Literature
  • Supplementary material
  • PDF files
  • Desk drawers
– Who uses them?
– Who creates them?
– How can we find them?
– How can we make them better in the future?
http://neuinfo.org
8. NIF: A New Type of Entity for New Modes of Scientific Dissemination
• NIF’s mission is to maximize the awareness of, access to, and utility of digital resources produced worldwide to enable better science and promote efficient use
– NIF unites neuroscience information without respect to domain, funding agency, institute or community
– NIF is a library for scholarly output that is a web-enabled resource and not a paper
– Aggregates all the different databases, tools and resources now produced by the scientific community
– Makes them searchable from a single interface
– A practical approach to the data deluge
– Educates neuroscientists and students about effective data sharing
9. Surveying the resource landscape
NIF resource registry: listing of > 6000 databases, tools, materials, services, websites (> 2500 databases)
10. NIF data federation: PubMed Central for data
NIF was designed to accommodate the multiplicity of heterogeneous and distributed data resources, providing deep query of the contents and unified views
200 sources
> 360 M records
11. NIF Semantic Framework: NIFSTD ontology
• NIF covers multiple structural scales and domains of relevance to neuroscience
• Aggregate of community ontologies with some extensions for neuroscience, e.g., Gene Ontology, ChEBI, Protein Ontology
[Figure: NIFSTD modules: Organism, NS Function, Molecule, Investigation, Subcellular Structure, Macromolecule, Gene, Molecule Descriptors, Techniques, Reagent, Protocols, Cell, Resource, Instrument, Dysfunction, Quality, Anatomical Structure]
Ontologies provide the universals for integrating across disparate data by linking them to human knowledge models
12. Neurolex: Machine-processable concepts for neuroscience
• Machine-processable lexical units
• Connected via relationships
• Identified by a unique identifier (URL)
• Computable index for neuroscience
• Framework for linking knowledge, claims and data
Built using a semantic wiki
13. NIF Analytics: The Neuroscience Landscape
Ontologies provide a semantic framework for understanding the data/resource landscape
Where are the data?
[Figure: data sources plotted against brain regions (Brain, Cerebral cortex, Olfactory bulb, Hypothalamus, Striatum). Credit: Vadim Astakhov, Kepler Workflow Engine]
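One way to picture how an ontology supports this kind of landscape view: annotate each record with a brain-region term, then roll the counts up a part-of hierarchy so that a record about the striatum also counts toward the brain. The sketch below is a toy illustration and not NIF's implementation; the hierarchy, the source names, and the records are all made-up assumptions.

```python
from collections import Counter

# Hypothetical part-of hierarchy (child -> parent); not taken from NIFSTD.
PART_OF = {
    "Striatum": "Brain",
    "Hypothalamus": "Brain",
    "Olfactory bulb": "Brain",
    "Cerebral cortex": "Brain",
}

# Hypothetical (data source, annotated brain region) records.
RECORDS = [
    ("source_a", "Striatum"),
    ("source_a", "Cerebral cortex"),
    ("source_b", "Striatum"),
    ("source_b", "Brain"),
]


def ancestors_and_self(term, part_of):
    """Yield the term followed by each of its ancestors."""
    while term is not None:
        yield term
        term = part_of.get(term)


def landscape(records, part_of):
    """Count records per (region, source), rolling counts up to parents."""
    counts = Counter()
    for source, region in records:
        for term in ancestors_and_self(region, part_of):
            counts[(term, source)] += 1
    return counts


if __name__ == "__main__":
    for (region, source), n in sorted(landscape(RECORDS, PART_OF).items()):
        print(f"{region:16} {source:10} {n}")
```

Region and source pairs with a count of zero are where the gaps in the landscape show up.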
15. Genetics of addiction?
Gene
Protein
Subcellular components
Cells
Cell microcircuits
Cell macrocircuits
Networks
Brain regions
PNS
Whole organism
Behaving organism (environment)
Networks of organisms
Populations
17. Genetics of addiction?
• Addiction is a disease of subpopulations of humans who take sociologically undesirable drugs, or sociologically desirable drugs at undesirable concentrations
• A drug is a molecule that does not exist in the body: an environmental factor
• Drugs are metabolized by the digestive system and act after crossing the blood-brain barrier (BBB)
• Drugs modify the activity of existing proteins on vastly different time scales
• Drugs modify behaviors that depend on the actions of an orchestra of neurons acting within circuits, all of which have a purpose other than taking drugs
18. The ecosystem is diverse and messy (and that’s OK)
NIF favors a hybrid, tiered, federated system
• Domain knowledge
– Ontologies
• Claims and observations
– Virtuoso RDF triples
• Data
– Data federation
– Spatial data
– Workflows
• Narrative
– Full text access
[Figure: a spectrum from data to knowledge. Entity types: Neuron, Brain part, Disease, Organism, Gene. Example claims: "Caudate projects to SNpc"; "Grm1 is upregulated in chronic cocaine"; "Betz cells degenerate in ALS"]
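The "claims and observations" tier can be pictured as subject-predicate-object triples. The sketch below stores the three example claims from this slide as plain Python tuples and matches them against a pattern. It is an illustration only, not NIF's Virtuoso store, and the predicate names are assumptions made for the example.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Claims from the slide; predicate names are illustrative assumptions.
CLAIMS = [
    Triple("Caudate", "projects_to", "SNpc"),
    Triple("Grm1", "is_upregulated_in", "chronic cocaine"),
    Triple("Betz cell", "degenerates_in", "ALS"),
]


def match(claims, subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None):
    """Return claims matching the pattern; None acts as a wildcard."""
    return [
        c for c in claims
        if (subject is None or c.subject == subject)
        and (predicate is None or c.predicate == predicate)
        and (obj is None or c.obj == obj)
    ]


if __name__ == "__main__":
    for claim in match(CLAIMS, obj="chronic cocaine"):
        print(claim.subject, claim.predicate, claim.obj)
```

In a real triple store the subjects, predicates, and objects would be ontology identifiers rather than free-text strings, which is what lets claims from different sources be joined.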
19. Wish list: Cooperative science
• A mission that will engage the entire neuroscience community and beyond
• An active community contribution model where everyone is expected to contribute their outputs, not just a selected few
– Diverse contributions are tracked and recognized
– Spatial-semantic-genetic-temporal frameworks make data discoverable-usable-integratable and help fill in the gaps
• A platform that moves neuroscience into the web
– Networking data, knowledge, tools, models, efforts, people, compute resources, simulation
– Supports digital research objects as first-order contributions, not just narrative
– Works through and with existing platforms to improve them where possible
Cooperative system: “...individual components that appear to be ‘selfish’ and independent work together to create a highly complex, greater-than-the-sum-of-its-parts system.”
20. neurolex.org
• INCF Community encyclopedia
• Standardize vocabulary
• Define all vocabulary, terms, protocols, brain structures, diseases, etc.
• Living review articles
• Build and maintain working ontologies
• Links to data, models and literature
• Semantic organization, search, analysis and integration
• Global directory of all shared vocabularies, CDEs, etc.
Slide courtesy of Sean Hill