BIODIVERSITY OCCURRENCES AND
PATTERNS FROM THE ANGLE OF
SYSTEMATICS
Julien TROUDET
Thesis supervisors: Régine Vignes-Lebbe, Frédéric Legendre
Institut de Systématique, Evolution, Biodiversité
ISYEB - UMR 7205 - CNRS, MNHN, UPMC, EPHE, Sorbonne Université
Equipe Evolution fonctionnelle et Systématique (EVOFONC)
Laboratoire Informatique & Systématique (LIS)
Jury:
■ ANTONELLI Alexandre, University of Gothenburg (Reviewer)
■ LESSARD Jean-Philippe, Concordia University (Reviewer)
■ ARCHAMBEAU Anne-Sophie, GBIF France (Examiner)
■ PAGE Rod, University of Glasgow (Examiner)
■ VIGNES-LEBBE Régine, UMR 7205 ISYEB MNHN (Thesis supervisor)
■ LEGENDRE Frédéric, UMR 7205 ISYEB MNHN (Thesis supervisor)
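1992: World Scientists' Warning to Humanity
25 years later: World Scientists' Warning to Humanity: A Second Notice (Ripple et al. 2017)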
Introduction
Biodiversity is eroding at an accelerating pace.
[Figure: remaining populations of indigenous species as a percentage of their original populations (Newbold et al. 2016)]
Ecologists and conservationists have taken up this sense of urgency:
▪ "Has land use pushed terrestrial biodiversity beyond the planetary boundary? A global assessment" (Newbold et al. 2016)
▪ "Biodiversity loss and its impact on humanity" (Cardinale et al. 2012)
▪ "Biodiversity hotspots for conservation priorities" (Myers et al. 2000)
Systematics also takes up the challenge and gives itself the means to respond to this urgency:
▪ "Post-molecular systematics and the future of phylogenetics" (Pyron 2015)
▪ "How Many Kinds of Birds Are There and Why Does It Matter?" (Barrowclough et al. 2016)
▪ "Assessing data quality in citizen science" (Kosmala et al. 2016)
Systematics produces data essential to the biodiversity sciences: ecology, phylogenetics, conservation, taxonomy and pest control.
The angle of systematics,
a complementary perspective
Systematists are the largest producers of biodiversity occurrences, with 1.2 to 2.1 billion specimens in museum collections (Ariño 2010).
Systematists have a unique point of view on biodiversity: they give all taxa the same significance and add a historical context to the study of biodiversity.
[Figure: tree of life (Ciccarelli et al. 2006)]
Data producers and users: as data producers, systematists are in a special position to characterize biodiversity data before using them.
▪ How is the practice of biodiversity data gathering evolving?
▪ Can biological diversity be investigated in its entirety?
Considering the largest possible taxonomic scale to produce generalizable outputs:
▪ Global biodiversity patterns at large taxonomic scale: which factors shape them?
Plan
▪ Biodiversity occurrences
▪ Methods for big data
▪ Fewer specimens, more observations
▪ Taxonomic bias and societal preferences
▪ Productivity shapes the latitudinal diversity gradient
▪ Conclusion
1.
Biodiversity occurrences
The raw material of biodiversity sciences
Occurrences = Primary Biodiversity Data
What? Where? When?
Scientists, especially systematists, are the leading producers of biodiversity data.
Citizen science projects produce biodiversity data for specific needs and rely on scientific supervision.
Networks of amateur naturalists are important producers of data, especially for birds and other vertebrate taxa.
Integration of occurrences in databases
Primary biodiversity data are created by many producers; most of them come either from digitizing existing data (collection digitization) or from producing new data (data production).
[Figure: a non-exhaustive map of the global and European biodiversity informatics landscape (Bingham et al. 2017)]
From occurrences to databases
What is GBIF?
GBIF, the Global Biodiversity Information Facility, is an open-data research infrastructure funded by the world's governments and aimed at providing anyone, anywhere, access to data about all types of life on Earth.
■ 1,118 publishers
■ 36,825 datasets
■ 856,055,455 occurrences
[Figure: the Global Biodiversity Information Facility (GBIF) connections (Bingham et al. 2017)]
2.
Methods for big-data
Processing millions of records in a reasonable time
A dataset of 626 million occurrences
[Figure: accumulation of occurrences in the GBIF]
Exponential growth
The number of occurrences mediated by the GBIF is growing exponentially. 57 million occurrences were recorded in 2014, more than 5 times the amount recorded in 2004 (11 million).
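As a rough worked example, the two yearly figures quoted here can be turned into an implied growth rate. The sketch below assumes a constant yearly growth rate between 2004 and 2014, which is a simplification not stated on the slide; the function names are illustrative.

```python
import math

def annual_growth_rate(start_count, end_count, years):
    """Constant yearly growth rate that turns start_count into end_count."""
    return (end_count / start_count) ** (1.0 / years) - 1.0

def doubling_time(rate):
    """Number of years needed to double at a constant yearly growth rate."""
    return math.log(2.0) / math.log(1.0 + rate)

# Figures quoted on the slide: 11 million records in 2004, 57 million in 2014.
rate = annual_growth_rate(11e6, 57e6, 2014 - 2004)
print(f"growth factor over the decade: {57e6 / 11e6:.2f}")
print(f"implied yearly growth rate: {rate:.1%}")
print(f"implied doubling time: {doubling_time(rate):.1f} years")
```

Under that assumption, the yearly input of occurrences would double roughly every four years.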
The uncompressed volume of the GBIF data is approximately 500 gigabytes, 400 times smaller than the volume of data that the Gaia mission will produce (200 terabytes).
Handling so many occurrences is a methodological challenge.
A custom software tool
DwCSP: a fast biodiversity occurrence curator (in preparation for Bioinformatics)
Darwin Core Spatial Processor (DwCSP): a tool to manipulate large amounts of primary biodiversity data
▪ Data enrichment using spatial files (shapefiles and raster files)
▪ Spatial outlier detection
▪ Environmental outlier detection
Manipulating the GBIF data required setting up multiple systems, scripts and databases to clean and filter the data, compute statistics, visualize results on a map, etc.
Some tools used during the PhD: Java, R, PostgreSQL, QGIS
620 million occurrences from more than 60,000 species were processed.
Working with the GBIF data
For each species (example: Lacerta bilineata):
1. Put the occurrences on a grid
2. Keep species with 20 or more occurrences
3. Detect spatial outliers
4. Detect climatic outliers
5. Extrapolate the species distribution using niche modelling
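The first two steps of this workflow can be sketched as follows. This is a minimal illustration and not the DwCSP implementation: the 0.5° cell size is a hypothetical value, and counting one occurrence per occupied cell after gridding is an assumption, since the slides do not say how the threshold of 20 is applied.

```python
import math
from collections import defaultdict

CELL_SIZE_DEG = 0.5    # hypothetical grid resolution, not taken from the slides
MIN_OCCURRENCES = 20   # threshold quoted on the slide

def to_cell(lat, lon, cell_size=CELL_SIZE_DEG):
    """Return the (row, col) index of the grid cell containing a point."""
    return (math.floor((lat + 90.0) / cell_size),
            math.floor((lon + 180.0) / cell_size))

def grid_and_filter(records, min_occurrences=MIN_OCCURRENCES):
    """Map each well-sampled species to the set of grid cells it occupies.

    records: iterable of (species, lat, lon) tuples.
    Assumption: after gridding, one occurrence is counted per occupied cell.
    """
    cells = defaultdict(set)
    for species, lat, lon in records:
        cells[species].add(to_cell(lat, lon))
    return {sp: c for sp, c in cells.items() if len(c) >= min_occurrences}

# Toy data: 25 records in 25 different cells, and 30 records in a single cell.
records = [("Lacerta bilineata", 43.0 + i, 1.0) for i in range(25)]
records += [("Rare species", 10.0, 10.0)] * 30
kept = grid_and_filter(records)
print(sorted(kept))
```

In this toy run the second species is dropped despite its 30 raw records, because they all fall in one cell; counting raw records instead would be an equally plausible reading of the slide.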
2.
Fewer specimens, more observations
A change in the primary biodiversity
data paradigm
A large and heterogeneous data set
The amount of GBIF-mediated data is increasing exponentially.
GBIF-mediated data are very heterogeneous because of the numerous data producers.
[Figure: accumulation of occurrences in the GBIF]
How is the practice of biodiversity data gathering evolving?
▪ How do recent and old biodiversity occurrences differ?
▪ Does the increase in data quantity come with an increase in data quality?
Two types of occurrences
Specimen-based and observation-based occurrences are not identical.
The possible uses of an observation-based occurrence are limited by the ancillary data collected during the observation, whereas a specimen can be analyzed in various ways at a later stage.
The number of observation-based occurrences added to the GBIF is growing at an exponential rate, while the number of specimen-based occurrences stays stable.
The increasing disconnection of primary biodiversity data from specimens: How does it happen and how to handle it? (Systematic Biology, under review)
A general shift
In proportion, a clear shift is visible for the 24 taxonomic classes studied.
[Figure: proportion of occurrences per year; panels: Actinopterygii, Aves, Insecta, Magnoliopsida, Reptilia]
Ancillary data as the solution?
Observation-based occurrences can be complemented with ancillary data.
Ancillary data, such as DNA sequences or multimedia files (photos, videos, recordings…), are more and more affordable to collect.
Still, very few GBIF-mediated occurrences are linked to digital or molecular data, and most ancillary data are linked to specimen-based occurrences.
Yet ancillary data would be most useful to check or update observation-based occurrences.
Recent data are of better quality
▪ Spatial precision is improving (spatial issues decrease).
▪ Taxonomic precision is improving.
Overall, there is an improvement in the quality of biodiversity occurrences.
Conclusion & recommendations
More and more primary biodiversity data are observations, i.e. not linked to voucher specimens. In addition, a very small proportion of these observations have ancillary data. This situation weakens the possibilities for future biodiversity studies relying on these data.
Data producers have adopted the habit of providing precise GPS coordinates and taxonomic identifications.
In the age of smartphones and global internet access, it should be a priority to encourage data producers to link pictures and other additional data to any occurrence they create.
We recommend prioritising the production of ancillary data in the following order:
1. Specimens
2. Material samples (DNA)
3. Multimedia files
4. Detailed observations
3.
Taxonomic bias and
societal preferences
The public could influence which taxa
are the most studied.
Biodiversity occurrences: a biased dataset
Biodiversity occurrences are not collected evenly. A well-known bias is the spatial bias (Meyer et al. 2015): some areas of the world are far better sampled than others. Similarly, some taxa are more studied than others.
This taxonomic bias has been studied at small scales:
▪ for a single field such as conservation (Di Marco et al. 2017)
▪ for specific taxa (Ford et al. 2017)
Biodiversity occurrences: a biased dataset
First recommendation of Faith et al. (2013): biases must be recognised in the biodiversity sciences, and efforts made to address them.
Why do only 17% of bird species in the GBIF have fewer than 20 occurrences, while 79% of insect species are in the same situation?
We used 24 classes (large taxonomic scale) and analyzed 626 million occurrences.
We tested the ‘societal preferences’ and ‘taxonomic research’ hypotheses:
▪ Societal preferences: public preferences influence and bias the choice of study organisms (Stahlschmidt 2011). Proxy: number of web pages.
▪ Taxonomic research: scientific reasons and limitations guide biodiversity data gathering. Proxy: number of scientific publications.
Taxonomic bias in biodiversity data and societal preferences (published in Scientific Reports, 2017)
A bias affecting data quantity
In the GBIF, some groups have far more occurrences than others, even if those groups are less speciose.

Class               Occurrences in the GBIF   Species in the GBIF   Median number of
                    (millions)                (thousands)           occurrences per species
Aves                345.11                    12.82                 371
Magnoliopsida       118.21                    261.01                19
Insecta             46.78                     352.78                3
Mammalia            10.78                     11.53                 15
Reptilia            4.98                      11.30                 24
Lecanoromycetes     4.97                      17.79                 8
Amphibia            3.94                      5.89                  54
Total in the GBIF   649.79                    1200.38               6
A bias that increases over time
Looking at the quantity of occurrences produced through time, the taxonomic bias is only worsening.
[Figure: millions of occurrences through time]
A bias affecting data quality
The GBIF-mediated data also exhibit a
taxonomic bias in data quality.
The proportion of occurrences identified at the
species level varies across classes.
[Figure: proportion of occurrences identified at the species level in four example classes: 99%, 92%, 77% and 69%]
Societal preferences influence
Analysing more than 40,000 species, 39 of 47 generalized linear models showed a positive correlation between the quantity of data per species and the number of web pages for the species (societal influence).
▪ Vanessa atalanta: 462,000 Google results, 528,227 occurrences
▪ Misumena vatia: 137,000 Google results, 5,556 occurrences
Conclusion
▪ Public preferences influence the choice of study organisms.
▪ Scientific reasons and limitations guide biodiversity data gathering.
These results should encourage biodiversity researchers to communicate even more with the public. If taxonomic bias is linked to societal preferences, the rise of citizen science could further exacerbate this bias.
Hypotheses related to specific characteristics, such as species size or range, could also explain this bias and should be explored.
4.
The Latitudinal
Diversity Gradient
Exploring a global biodiversity pattern
with GBIF-mediated data
"Thus, the nearer we approach the tropics, the greater the increase in the variety of structure, grace of form, and mixture of colours, as also in perpetual youth and vigour of organic life."
Alexander von Humboldt, 1807
Latitudinal Diversity Gradient (LDG)
Organism diversity tends to be highest near the equator and diminishes towards the poles. This pattern is called the latitudinal diversity gradient (LDG).
Here biodiversity is quantified using species richness.
The following results focus on terrestrial taxa.
Latitudinal Diversity Gradient: Geometric hypotheses revisited using massive biodiversity occurrences in plants and animals of the New World (in preparation)
A multitude of hypotheses
More than 30 hypotheses have been formulated to explain the formation of the LDG (Willig et al. 2003).
Historical hypotheses propose diversification mechanisms; they were not tackled in this study.
In situ hypotheses propose that environmental factors shape the LDG.
Geometric hypotheses see the LDG as a geometrical artifact caused by the random placement of species ranges.
Examples of in situ hypotheses:
Productivity hypothesis
There is a positive correlation between actual evapotranspiration (productivity) and the species richness of terrestrial birds (Hawkins et al. 2003).
Ambient energy hypothesis
The species richness of terrestrial birds in western Palearctic areas is related to the annual temperature (Hawkins et al. 2003).
The geometric hypotheses
The first geometric hypothesis, also called the mid-domain effect: Colwell & Hurtt (1994), Colwell & Lees (2000).
An updated and untested geometric hypothesis: Gross & Snyder-Beattie (2016). Given that species have different latitudinal range sizes, the random location of those ranges on the globe would result in more species near the equator.
The dataset structure
Because of taxonomic and spatial biases (some areas are better sampled than others), we tested the LDG on 8 taxonomic classes in the New World:
▪ Amphibia
▪ Aves
▪ Liliopsida
▪ Magnoliopsida
▪ Mammalia
▪ Pinopsida
▪ Polypodiopsida
▪ Reptilia
208 million occurrences
62,099 species
For each class, we computed a list of geographic cells covering the New World, with the species richness and the explanatory variable values of each cell.
Characterizing the LDG
A ‘classic’ LDG pattern was found for 7 of the 8 tested classes. For these 7 classes, species richness is higher between latitudes -30° and 30° (dotted lines).
Pinopsida showed an atypical pattern.
Testing the hypotheses
The supported hypotheses were either both the productivity and ambient energy hypotheses
(Amphibia and Mammalia) or only the productivity hypothesis (Aves, Liliopsida, Magnoliopsida,
Polypodiopsida, Reptilia). The Gross & Snyder-Beattie hypothesis was rejected.
Analysis pipeline:
1. Non-spatial stepwise regression (selection of significant variables)
2. Moran's I test for spatial autocorrelation
3. Spatial regression (spatial lag and spatial error models)
4. Best model selection
Selected models:
▪ Productivity and ambient energy: Species richness ~ Evapotranspiration + Annual mean temperature
▪ Productivity: Species richness ~ Evapotranspiration
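To make the Moran's I step concrete, here is a small sketch of the statistic, I = (n / W) · Σᵢⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where W is the sum of all weights. The binary neighbour weights on a one-dimensional row of cells are an assumption made for the example; the slides do not say which spatial weights were used, nor how significance was assessed.

```python
def morans_i(values, weights):
    """Moran's I of values[i] given spatial weights weights[i][j]."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / w_total) * cross / sum(d * d for d in dev)

def neighbour_weights_1d(n):
    """Binary weights for n cells in a row: 1 for adjacent cells, 0 otherwise."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
            for i in range(n)]

w = neighbour_weights_1d(6)
gradient = [1, 2, 3, 4, 5, 6]      # neighbouring cells hold similar values
alternating = [1, 6, 1, 6, 1, 6]   # neighbouring cells hold opposite values
print(morans_i(gradient, w), morans_i(alternating, w))
```

A positive value, as for the gradient, indicates that nearby cells resemble each other. In a regression setting the statistic would typically be applied to the residuals, and a significant result is what motivates moving to spatial lag or spatial error models.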
Conclusion on the LDG
▪ There is a clear LDG for 7 of the 8 classes studied.
▪ The relative contribution of the productivity hypothesis seems to be the highest.
▪ The null hypothesis proposed by Gross & Snyder-Beattie does not fit our model once spatial autocorrelation is taken into account.
▪ More factors should be tested, especially those included in historical hypotheses.
5.
Conclusion
How is the practice of biodiversity data gathering evolving?
In the GBIF dataset, the proportion of specimen-based occurrences is rapidly falling behind the proportion of observation-based occurrences.
This situation is worrying because it jeopardizes the feasibility and reliability of some studies based on biodiversity occurrences.
We recommend encouraging the production of ancillary data (samples, photos…) along with biodiversity occurrences.
The role of citizen science in the evolution of this new paradigm should also be investigated.
[Figure: proportion of occurrences per year]
Can biological diversity be investigated in its entirety?
Some taxa are less known and studied than others, and this situation worsens with time. This taxonomic bias seems to be linked to societal preferences.
What can be said when looking at the species level? Do the most sampled species in the GBIF also suggest a link with societal preferences? Changing the study scale might reveal additional influences (species characteristics, data providers, etc.).
The 1,000 best-sampled species in the GBIF, by class:

Class             Number of species
Aves              611
Magnoliopsida     228
Liliopsida        70
Insecta           28
Actinopterygii    18
Mammalia          12
Polypodiopsida    7
14 next classes   26
Global biodiversity patterns at large taxonomic scale: which factors shape them?
The Latitudinal Diversity Gradient pattern is clear for 7 classes.
The geometric hypothesis proposed by Gross & Snyder-Beattie has been rejected, while the productivity hypothesis seems to be confirmed.
▪ More studies should be done, this time including the historical hypotheses.
▪ Other geographic regions and biodiversity patterns could be studied.
▪ The spatial non-stationarity of the models should be explored.
THANKS!
To the jury members: Alexandre ANTONELLI, Jean-Philippe LESSARD, Anne-Sophie
ARCHAMBEAU and Rod PAGE
To the members of my PhD committee: Roseli Pellens, Philippe Grandcolas, Wilfried
Thuiller, Samy Gaiji and Jérôme Sueur.
To my thesis supervisors: Régine VIGNES-LEBBE and Frédéric LEGENDRE
Thanks to the people at GBIF France and the International GBIF Node in Copenhagen for their help and advice.
Many thanks to all the amazing people I got to interact with during this PhD, in particular all the people from the Institut de Systématique, Évolution, Biodiversité who welcomed me during those three years.
Many thanks again to all the colleagues, friends and family who supported me and helped me! Words can't express all my gratitude!
References
▪ Ariño, Arturo H. "Approaches to Estimating the Universe of Natural History Collections Data." Biodiversity Informatics 7, no. 2 (2010). bi.v7i2.3991.
▪ Barrowclough, George F., Joel Cracraft, John Klicka, and Robert M. Zink. "How Many Kinds of Birds Are There and Why Does It Matter?" PLOS ONE 11, no. 11 (2016): e0166307.
▪ Bingham, Heather, Lauren Weatherdon, Katherine Despot-Belmonte, Florian Wetzel, and Corinne Martin. "The Biodiversity Informatics Landscape: Elements, Connections and Opportunities." Research Ideas and Outcomes 3 (2017): e14059.
▪ Cardinale, Bradley J., J. Emmett Duffy, Andrew Gonzalez, David U. Hooper, Charles Perrings, Patrick Venail, Anita Narwani, et al. "Biodiversity Loss and Its Impact on Humanity." Nature 486, no. 7401 (2012): 59–67.
▪ Ciccarelli, Francesca D., Tobias Doerks, Christian von Mering, Christopher J. Creevey, Berend Snel, and Peer Bork. "Toward Automatic Reconstruction of a Highly Resolved Tree of Life." Science (New York, N.Y.) 311, no. 5765 (2006): 1283–87.
▪ Colwell, Robert K., and George C. Hurtt. "Nonbiological Gradients in Species Richness and a Spurious Rapoport Effect." The American Naturalist 144, no. 4 (1994): 570–95.
▪ Colwell, Robert K., and David C. Lees. "The mid-domain effect: geometric constraints on the geography of species richness." Trends in Ecology & Evolution 15, no. 2 (2000): 70–76.
▪ Di Marco, Moreno, Sarah Chapman, Glenn Althor, Stephen Kearney, Charles Besancon, Nathalie Butt, Joseph M. Maina, et al. "Changing trends and persisting biases in three decades of conservation science." Global Ecology and Conservation 10 (April 2017): 32–42.
▪ Faith, Dan, Ben Collen, Arturo Ariño, Patricia Koleff, John Guinotte, Jeremy Kerr, and Vishwas Chavan. "Bridging the Biodiversity Data Gaps: Recommendations to Meet Users' Data Needs." Biodiversity Informatics 8, no. 2 (2013).
▪ Ford, Adam T., Steven J. Cooke, Jacob R. Goheen, and Truman P. Young. "Conserving Megafauna or Sacrificing Biodiversity?" BioScience 67, no. 3 (2017): 193–96.
▪ Gross, Kevin, and Andrew Snyder-Beattie. "A General, Synthetic Model for Predicting Biodiversity Gradients from Environmental Geometry." The American Naturalist 188, no. 4 (2016): E85–97.
▪ Hawkins, Bradford A., Eric E. Porter, and José Alexandre Felizola Diniz-Filho. "Productivity and History as Predictors of the Latitudinal Diversity Gradient of Terrestrial Birds." Ecology 84, no. 6 (2003): 1608–23.
▪ Humboldt, Alexander. "Views of nature, or, Contemplations on the sublime phenomena of creation." (1807).
▪ Kosmala, Margaret, Andrea Wiggins, Alexandra Swanson, and Brooke Simmons. "Assessing Data Quality in Citizen Science." Frontiers in Ecology and the Environment 14, no. 10 (2016): 551–60.
▪ Meyer, Carsten, Holger Kreft, Robert Guralnick, and Walter Jetz. "Global Priorities for an Effective Information Basis of Biodiversity Distributions." Nature Communications 6 (2015): 8221.
▪ Myers, Norman, Russell A. Mittermeier, Cristina G. Mittermeier, Gustavo A. B. da Fonseca, and Jennifer Kent. "Biodiversity Hotspots for Conservation Priorities." Nature 403, no. 6772 (2000): 853–58.
▪ Newbold, Tim, Lawrence N. Hudson, Andrew P. Arnell, Sara Contu, Adriana De Palma, Simon Ferrier, Samantha L. L. Hill, et al. "Has Land Use Pushed Terrestrial Biodiversity beyond the Planetary Boundary? A Global Assessment." Science 353, no. 6296 (2016): 288–91.
▪ Pyron, R. Alexander. "Post-Molecular Systematics and the Future of Phylogenetics." Trends in Ecology & Evolution 30, no. 7 (2015): 384–89.
▪ Ripple, William J., Christopher Wolf, Thomas M. Newsome, Mauro Galetti, Mohammed Alamgir, Eileen Crist, Mahmoud I. Mahmoud, and William F. Laurance. "World Scientists' Warning to Humanity: A Second Notice." BioScience (2017).
▪ Stahlschmidt, Zachary R. "Taxonomic Chauvinism Revisited: Insight from Parental Care Research." PLOS ONE 6, no. 8 (2011): e24192. journal.pone.0024192.
▪ Willig, M.R., D.M. Kaufman, and R.D. Stevens. "Latitudinal Gradients of Biodiversity: Pattern, Process, Scale, and Synthesis." Annual Review of Ecology, Evolution, and Systematics 34, no. 1 (2003): 273–309.
▪ World Scientists' Warning to Humanity (1992).
Image Credits
▪ Collections: naturalhistory.si.edu
▪ Coleoptera: mnhn.fr
▪ Butterfly box
▪ Smartphone: harperpark.ca
▪ Herbaria: mnhn.fr
▪ Coleoptera box: museedesconfluences.fr
▪ Alcohol collection: mnhn.fr
▪ Ornithologist: nature.org
▪ Metabarcoding: transmittingscience.org
▪ Gaia mission: sci.esa.int
▪ Lacerta bilineata: Wikimedia Commons
▪ Entomologist: Wikimedia Commons
▪ Butterfly boxes: http://nmnh.typepad.com
▪ Birds checklist: Sasol Checklist Birds Of Southern Africa
▪ GPS: dir.indiamart.com
▪ Vanessa atalanta: Wikimedia Commons
▪ Misumena vatia: Flickr (Wayne Davies)
▪ LDG schema: news.illinois.edu
Jump in the number of occurrences
▪ Magnoliopsida, 1945: 890,000 occurrences from "Occurrence Data of Vascular Plants collected or compiled for the Flora of Bavaria"
▪ Reptilia, 2012: 470,000 occurrences from "Geographically tagged INSDC sequences", published by the European Molecular Biology Laboratory (EMBL)
A bias affecting data quality
A Multiple Correspondence Analysis shows that old occurrences tend to have more issues.
The proportions of spatial and temporal issues differ between classes, showing once again a bias in quality.
A bias affecting data quality
Some classes have far more specimen-based occurrences than others.
The geometric hypotheses
The first geometric hypothesis, also called the mid-domain effect: Colwell & Hurtt (1994), Colwell & Lees (2000).
Rejected by Currie & Kerr (2007, 2008); Zapata et al. (2003).
[Figure] Thirty species with a uniform frequency distribution of range sizes are placed randomly within a finite domain (bottom panels). Horizontal lines indicate range sizes and midpoints indicate the position of ranges along the 1-D domain. The overlap among ranges in the bottom panels produces the pattern of species richness observed in the upper panels (dotted line). The solid line indicates the mean species richness obtained by repositioning the ranges randomly within the domain 50 times.
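The simulation described in this caption can be reproduced in a few lines. The sketch below is only an illustration of the mid-domain effect, not the code behind the original figure: the domain of 100 cells and the seed are assumptions, while the 30 species and 50 repetitions follow the caption.

```python
import random

def simulate_mid_domain(n_species=30, n_cells=100, n_runs=50, seed=0):
    """Mean species richness per cell when ranges are placed at random.

    Range sizes are drawn uniformly from 1..n_cells, and each range is placed
    uniformly among the positions where it fits entirely inside the domain.
    """
    rng = random.Random(seed)
    totals = [0] * n_cells
    for _ in range(n_runs):
        for _ in range(n_species):
            size = rng.randint(1, n_cells)
            start = rng.randint(0, n_cells - size)
            for cell in range(start, start + size):
                totals[cell] += 1
    return [t / n_runs for t in totals]

richness = simulate_mid_domain()
edge = (richness[0] + richness[-1]) / 2
middle = richness[len(richness) // 2]
print(f"mean richness at the edges: {edge:.1f}, in the middle: {middle:.1f}")
```

Because large ranges can only fit if they overlap the centre of the domain, richness peaks in the middle without any environmental gradient being involved.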
The complex relationship between the
public and the scientists
[Figure] Conceptual model representing the primary connections among scientific information, scientific activity and policy, nongovernmental organizations (NGOs), society and environmental administrations (or departments) at different governmental scales that establish and allocate funds for species conservation (Martín-López et al. 2009).
Other hypotheses behind taxonomic bias

                      Detectability                                 Ease of
                      Encounter probability   Ease of observation   identification
Body size increase    -                       +                     +
Reachable habitat     +                       NA                    NA
High abundance        +                       NA                    NA
Discreet behaviour    NA                      -                     NA
Large range           +                       NA                    NA
Diurnal taxon         +                       +                     NA
Species similarity    NA                      -                     -
High speciosity       NA                      NA                    -
[Figure] Quadratic relation between the attitudes toward species (preference score) and the willingness to pay (WTP) for the conservation of the species (Martín-López et al. 2007).
Testing the model spatial stationarity
Geographically Weighted Regression (GWR) allows us to test the models we kept across space. It serves as a test of the spatial stationarity of a model (i.e. the assumption that the relationship between variables does not vary across space).
The GWR shows that there is no spatial stationarity in the model we kept: the explanatory power as well as the influence of the covariates vary across space.
[Figure: GWR results for Mammalia]
About spatial stationarity (Hawkins et al. 2003)
Detecting spatial outliers
[Figure: global distribution of the common black ant (Lasius niger)]
To detect spatial outliers:
• for each point, find the 5 nearest points (neighbours)
• compute the orthodromic (great-circle) distance to each neighbour
• sum those distances
• flag the 1% of points with the highest sums as outliers
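The procedure above can be sketched as follows. This is a brute-force illustration and not the DwCSP code: the haversine formula and the mean Earth radius are choices made for the example, and comparing every pair of points would not scale to millions of occurrences without a spatial index.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed for the example

def great_circle_km(p, q):
    """Orthodromic distance between two (lat, lon) points, haversine formula."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def flag_spatial_outliers(points, k=5, fraction=0.01):
    """Indices of the points with the largest summed distance to their k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(great_circle_km(p, q)
                       for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]))
    n_flagged = round(fraction * len(points))
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:n_flagged])

# Toy data: 99 points clustered in south-eastern France and one point far away.
cluster = [(45.0 + 0.01 * i, 5.0 + 0.01 * (i % 10)) for i in range(99)]
points = cluster + [(-30.0, 150.0)]
print(flag_spatial_outliers(points))
```

Note that a fixed 1% is always flagged, whether or not the data contain genuine errors, so the flagged points are candidates for review rather than confirmed mistakes.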
Detecting environmental outliers

Class     Order       Species                    Annual mean    Min temperature     Annual
                                                 temperature    of coldest month    precipitation
Insecta   Blattodea   Tryonicus parvus           15.49          7.34                1370
Insecta   Blattodea   Arenivaga floridensis      22.35          9.91                1225
Insecta   Blattodea   Ectobius lapponicus        6.89           -6.94               772
Insecta   Blattodea   Phyllodromica subaptera    13.83          -0.74               425
Insecta   Blattodea   Periplaneta americana      17.63          6.61                440
Insecta   Blattodea   Tryonicus parvus           15.49          7.34                1370

To detect environmental outliers:
• for each point, compute the Mahalanobis distance using the environmental variables
• flag the 1% of points with the highest Mahalanobis distance as outliers
The Mahalanobis distance is a measure of the distance between a point P and a distribution D.
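A minimal sketch of this step, using only the standard library, is given below. For a point x, the distance to a distribution with mean μ and covariance matrix S is sqrt((x − μ)ᵀ S⁻¹ (x − μ)). Estimating μ and S from the points themselves, outliers included, is an assumption of the example; the slides do not say how the reference distribution was built. The code fails with a division by zero if the covariance matrix is singular.

```python
import math

def _solve(matrix, vector):
    """Solve matrix @ x = vector by Gauss-Jordan elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def mahalanobis_distances(rows):
    """Mahalanobis distance of each row to the distribution of all rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    centred = [[r[k] - mean[k] for k in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    return [math.sqrt(max(0.0, sum(x * y for x, y in zip(c, _solve(cov, c)))))
            for c in centred]

def flag_environmental_outliers(rows, fraction=0.01):
    """Indices of the rows with the largest Mahalanobis distances."""
    scores = mahalanobis_distances(rows)
    n_flagged = round(fraction * len(rows))
    ranked = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:n_flagged])

# Annual mean temperature, min temperature of coldest month, annual precipitation
# (the five distinct rows of the table above).
rows = [[15.49, 7.34, 1370], [22.35, 9.91, 1225], [6.89, -6.94, 772],
        [13.83, -0.74, 425], [17.63, 6.61, 440]]
print([round(dist, 2) for dist in mahalanobis_distances(rows)])
```

Unlike a per-variable threshold, this distance accounts for the correlation between variables, so a point can be flagged for an unusual combination of temperature and precipitation even when each value taken alone looks ordinary.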

PhD defense Julien Troudet (29/11/2017)

  • 1. BIODIVERSITY OCCURRENCES AND PATTERNS FROM THE ANGLE OF SYSTEMATICS. Julien TROUDET. Thesis supervisors: Régine Vignes-Lebbe, Frédéric Legendre. Institut de Systématique, Evolution, Biodiversité (ISYEB - UMR 7205 - CNRS, MNHN, UPMC, EPHE, Sorbonne Université); Equipe Evolution fonctionnelle et Systématique (EVOFONC); Laboratoire Informatique & Systématique (LIS). Jury: ANTONELLI Alexandre (University of Gothenburg, reviewer); LESSARD Jean-Philippe (Concordia University, reviewer); ARCHAMBEAU Anne-Sophie (GBIF France, examiner); PAGE Rod (University of Glasgow, examiner); VIGNES-LEBBE Régine (UMR 7205 ISYEB MNHN, thesis supervisor); LEGENDRE Frédéric (UMR 7205 ISYEB MNHN, thesis supervisor).
  • 3. Introduction. Biodiversity is eroding at an accelerating pace. Figure: remaining populations of indigenous species as a percentage of their original populations (Newbold et al. 2016). This sense of urgency has been seized by ecologists and conservationists: "Has land use pushed terrestrial biodiversity beyond the planetary boundary? A global assessment" (Newbold et al. 2016); "Biodiversity loss and its impact on humanity" (Cardinale et al. 2012); "Biodiversity hotspots for conservation priorities" (Myers et al. 2000).
  • 4. Introduction. Systematics also takes up the challenge and gives itself the means to respond to this urgency: "Post-molecular systematics and the future of phylogenetics" (Pyron 2015); "How Many Kinds of Birds Are There and Why Does It Matter?" (Barrowclough et al. 2016); "Assessing data quality in citizen science" (Kosmala et al. 2016). Systematics produces data essential to biodiversity sciences: ecology, phylogenetics, conservation, taxonomy, pest control.
  • 5. Introduction. The angle of systematics, a complementary perspective. Systematists are the largest producers of biodiversity occurrences, with 1.2 to 2.1 billion specimens in museum collections (Ariño 2010). Systematists have a unique point of view on biodiversity, giving all taxa the same significance and adding a historical context to the study of biodiversity (Ciccarelli et al. 2006).
  • 6. Introduction. The angle of systematics, a complementary perspective. Data producers and users. Global biodiversity patterns at large taxonomic scale: which factors shape them? Considering the largest taxonomic scale possible to produce generalizable outputs. How is the practice of biodiversity data gathering evolving? Can biological diversity be investigated in its entirety? As data producers, systematists are in a special position to characterize biodiversity data before using them.
  • 8. Plan ▪ Biodiversity occurrences ▪ Methods for big data ▪ Fewer specimens, more observations ▪ Taxonomic bias and societal preferences ▪ Productivity shapes the latitudinal diversity gradient ▪ Conclusion
  • 9. 1. Biodiversity occurrences: the raw material of biodiversity sciences
  • 10. Occurrences = primary biodiversity data: What? Where? When? Scientists, especially systematists, are the leading producers of biodiversity data. Citizen science projects produce biodiversity data for specific needs, under scientific supervision. Networks of amateur naturalists are important producers of data, especially for birds and other vertebrate taxa.
  • 11. Integration of occurrences in databases. Primary biodiversity data are created by many producers; most of them, however, come either from digitizing existing data (collection digitization) or from producing new data (data production). Figure: a non-exhaustive map of the global and European biodiversity informatics landscape (Bingham et al. 2017).
  • 14. From occurrences to databases. What is GBIF? GBIF, the Global Biodiversity Information Facility, is an open-data research infrastructure funded by the world’s governments and aimed at providing anyone, anywhere access to data about all types of life on Earth. ■ 1,118 publishers ■ 36,825 datasets ■ 856,055,455 occurrences. Figure: the Global Biodiversity Information Facility (GBIF) connections (Bingham et al. 2017).
  • 16. 2. Methods for big data: processing millions of records in reasonable time
  • 17. A dataset of 626 million occurrences. Exponential growth: the number of occurrences mediated by GBIF is growing exponentially. 57 million occurrences were recorded in 2014, more than 5 times the number recorded in 2004 (11 million). The uncompressed volume of the GBIF data is approximately 500 gigabytes, which is 400 times smaller than the volume of data that the Gaia mission will produce (200 terabytes). Handling so many occurrences is a methodological challenge. Figure: accumulation of occurrences in GBIF.
  • 18. A custom software. Manipulating the GBIF data required setting up multiple systems, scripts and databases to clean and filter the data, compute statistics, visualize results on a map, etc. Some tools used during the PhD: Java, R, PostgreSQL, QGIS. 620 million occurrences from more than 60,000 species were processed. Darwin Core Spatial Processor (DwCSP), a tool to manipulate large amounts of primary biodiversity data: ▪ Data enrichment using spatial files (shapefiles and raster files) ▪ Spatial outlier detection ▪ Environmental outlier detection. Reference: "DwCSP a fast biodiversity occurrence curator" (in preparation for Bioinformatics).
  • 23. Working with the GBIF data. For each species (example: Lacerta bilineata): 1. Put the occurrences on a grid. 2. Keep species with 20 or more occurrences. 3. Detect spatial outliers. 4. Detect climatic outliers. 5. Extrapolate the species distribution using niche modelling.
  • 24. 2. Fewer specimens, more observations: a change in the primary biodiversity data paradigm
  • 25. A large and heterogeneous data set. The amount of GBIF-mediated data is increasing exponentially. GBIF-mediated data are very heterogeneous because of the numerous data producers. How is the practice of biodiversity data gathering evolving? How do recent and old biodiversity occurrences differ? Does the increase in data quantity come with an increase in data quality? Figure: accumulation of occurrences in GBIF.
  • 26. Two types of occurrences. Specimen-based and observation-based occurrences are not identical. The possible uses for an observational occurrence are limited by the ancillary data collected during the observation, whereas a specimen can be analyzed in various ways at a later stage.
  • 27. A general shift. The number of observation-based occurrences added to GBIF is growing at an exponential rate, while the number of specimen-based occurrences stays stable. In proportion, a clear shift is visible for the 24 taxonomic classes studied. Figure (axes: year, proportion), with panels for Actinopterygii, Aves, Insecta, Magnoliopsida and Reptilia. Reference: "The increasing disconnection of primary biodiversity data from specimens: How does it happen and how to handle it?" (Systematic Biology, under review).
  • 28. Ancillary data as the solution? Observation-based occurrences can be complemented with ancillary data. Ancillary data, such as DNA sequences or multimedia files (photo, video, recordings…), are more and more affordable to collect. Still, very few GBIF-mediated occurrences are linked to digital or molecular data, and most ancillary data are linked to specimen-based occurrences. Yet ancillary data would be most useful to check or update observation-based occurrences.
  • 29. Recent data are of better quality. Spatial precision is improving (spatial issues decrease). Taxonomic precision is improving. Overall, there is an improvement in the quality of biodiversity occurrences.
  • 30. Conclusion & recommendations. More and more primary biodiversity data are observations, i.e. they are not linked to voucher specimens. In addition, a very small proportion of these observations have ancillary data. This situation weakens the possibilities for future biodiversity studies relying on these data. Data producers have got into the habit of providing precise GPS coordinates and taxonomic identifications. In the age of smartphones and global internet access, it should be a priority to encourage data producers to link pictures and other additional data to any occurrence they create. We recommend prioritising the production of ancillary data in the following order: 1. Specimens; 2. Material samples (DNA); 3. Multimedia files; 4. Detailed observations.
  • 31. 3. Taxonomic bias and societal preferences The public could influence which taxa are the most studied.
  • 32. Biodiversity occurrences: a biased dataset. Biodiversity occurrences are not collected evenly. A well-known bias is the spatial bias (Meyer et al. 2015): some areas of the world are far more sampled than others. Similarly, some taxa are more studied than others. This taxonomic bias has so far been studied only at small scales: ▪ for a single field such as conservation (Di Marco et al. 2017) ▪ for specific taxa (Ford et al. 2017)
  • 33. Biodiversity occurrences: a biased dataset. First recommendation of Faith et al. (2013): biases must be recognised in biodiversity sciences, and efforts made to bridge them. Why do only 17% of bird species in the GBIF have fewer than 20 occurrences, while 79% of insect species are in that situation? We used 24 classes (large taxonomic scale) and analysed 626 million occurrences. We tested the 'societal preferences' and 'taxonomic research' hypotheses: ▪ Societal preferences: public preferences influence and bias the choice of study organisms (Stahlschmidt 2011). Proxy: number of web pages. ▪ Taxonomic research: scientific reasons and limitations guide biodiversity data gathering. Proxy: number of scientific publications.
  • 34. A bias affecting data quantity. Source: "Taxonomic bias in biodiversity data and societal preferences" (published in Scientific Reports, 2017). In the GBIF, some groups have far more occurrences than others, even when those groups are less speciose.
    Class | Occurrences in the GBIF (millions) | Species in the GBIF (thousands) | Median number of occurrences per species
    Aves | 345.11 | 12.82 | 371
    Magnoliopsida | 118.21 | 261.01 | 19
    Insecta | 46.78 | 352.78 | 3
    Mammalia | 10.78 | 11.53 | 15
    Reptilia | 4.98 | 11.30 | 24
    Lecanoromycetes | 4.97 | 17.79 | 8
    Amphibia | 3.94 | 5.89 | 54
    Total in the GBIF | 649.79 | 1200.38 | 6
  • 35. A bias that increases over time. Looking at the quantity of occurrences produced through time, the taxonomic bias is only worsening. (Figure: millions of occurrences through time.)
  • 36. A bias affecting data quality. The GBIF-mediated data also exhibit a taxonomic bias in data quality. The proportion of occurrences identified at the species level varies across classes. (Figure: values shown are 99%, 92%, 77% and 69%.)
  • 37. Societal preferences influence. Analysing more than 40,000 species, 39 of 47 generalized linear models showed a positive correlation between the quantity of data per species and the number of web pages for the species (societal influence). Examples: Vanessa atalanta, 462,000 Google results and 528,227 occurrences; Misumena vatia, 137,000 Google results and 5,556 occurrences.
  • 38. Conclusion. ▪ Public preferences influence the choice of study organisms. ▪ Scientific reasons and limitations guide biodiversity data gathering. These results should encourage biodiversity researchers to communicate even more with the public. If taxonomic bias is linked to societal preferences, the rise of citizen science could further exacerbate this bias. Hypotheses related to specific characteristics, such as species size or range, could also explain this bias and should be explored.
  • 39. 4. The Latitudinal Diversity Gradient Exploring a global biodiversity pattern with GBIF-mediated data
  • 40. “Thus, the nearer we approach the tropics, the greater the increase in the variety of structure, grace of form, and mixture of colours, as also in perpetual youth and vigour of organic life.” Alexander von Humboldt, 1807
  • 41. Latitudinal Diversity Gradient (LDG). The diversity of organisms tends to be highest near the equator and diminishes towards the poles. This pattern is called the latitudinal diversity gradient. Here, biodiversity is quantified using species richness. The following results focus on terrestrial taxa.
  • 42. A multitude of hypotheses. Source: "Latitudinal Diversity Gradient: Geometric hypotheses revisited using massive biodiversity occurrences in plants and animals of the New World" (in preparation). More than 30 hypotheses have been formulated to explain the formation of the LDG (Willig et al. 2003). Historical hypotheses propose diversification mechanisms; they were not tackled in this study. Geometric hypotheses see the LDG as a geometrical artifact caused by the random placement of species ranges. In situ hypotheses propose that environmental factors shape the LDG. Examples of in situ hypotheses: ▪ Productivity hypothesis: there is a positive correlation between actual evapotranspiration (productivity) and the species richness of terrestrial birds (Hawkins et al. 2003). ▪ Ambient energy hypothesis: the species richness of terrestrial birds in western Palearctic areas is related to the annual temperature (Hawkins et al. 2003).
  • 43. The geometric hypotheses. The first geometric hypothesis: Colwell & Hurtt (1994), Colwell & Lees (2000). This hypothesis is also called the mid-domain effect. An updated and untested geometric hypothesis: Gross & Snyder-Beattie (2016). Given that species have different latitudinal range sizes, the random location of those ranges on the globe would result in more species near the equator.
  • 44. The dataset structure. Because of taxonomic and spatial biases (some areas are better sampled than others), we tested the LDG on 8 taxonomic classes in the New World: ▪ Amphibia ▪ Aves ▪ Liliopsida ▪ Magnoliopsida ▪ Mammalia ▪ Pinopsida ▪ Polypodiopsida ▪ Reptilia. In total: 208 million occurrences and 62,099 species. For each class, we computed a list of geographic cells covering the New World, each with its species richness and the values of the explanatory variables.
  • 45. Characterizing the LDG. A 'classic' LDG pattern was found for 7 out of the 8 tested classes. For these 7 classes, species richness is higher between latitudes -30° and 30° (dotted lines). Pinopsida showed an atypical pattern.
  • 46. Testing the hypotheses. Analysis steps shown on the slide: non-spatial stepwise regression; Moran's I test for spatial autocorrelation; spatial regression (spatial lag and error models); best model selection; selection of significant variables. Resulting models: ▪ Species richness ~ Evapotranspiration + Annual mean temperature (productivity and ambient energy) ▪ Species richness ~ Evapotranspiration (productivity). The supported hypotheses were either both the productivity and ambient energy hypotheses (Amphibia and Mammalia) or only the productivity hypothesis (Aves, Liliopsida, Magnoliopsida, Polypodiopsida, Reptilia). The Gross & Snyder-Beattie hypothesis was rejected.
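The Moran's I step named on this slide can be illustrated with a minimal sketch. It assumes a regular grid of cells and binary rook-contiguity weights (cells sharing an edge are neighbours); the slides do not say which weights or software were used, so the function names `rook_neighbours` and `morans_i` are hypothetical and this is not the thesis implementation.

```python
def rook_neighbours(n_rows, n_cols):
    """Map each cell index to the cells sharing an edge with it (rook contiguity)."""
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cell = r * n_cols + c
            neighbours[cell] = [
                rr * n_cols + cc
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
    return neighbours


def morans_i(values, neighbours):
    """Moran's I with binary weights: (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]  # deviations z_i from the mean
    total_weight = sum(len(nb) for nb in neighbours.values())  # W, directed pairs
    cross = sum(dev[i] * dev[j] for i, nb in neighbours.items() for j in nb)
    return (n / total_weight) * cross / sum(d * d for d in dev)


# Toy 4 x 4 grids (assumed data): a smooth north-south gradient and a checkerboard.
nb = rook_neighbours(4, 4)
gradient = [float(r) for r in range(4) for _ in range(4)]
checker = [1.0 if (r + c) % 2 == 0 else -1.0 for r in range(4) for c in range(4)]
print(morans_i(gradient, nb), morans_i(checker, nb))
```

On these toy grids the gradient gives a positive value (2/3) and the checkerboard gives -1. In a regression workflow the statistic would typically be computed on model residuals: a clearly positive value signals spatial autocorrelation and motivates a spatial lag or error model.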
  • 47. Conclusion on the LDG. ▪ There is a clear LDG for 7 out of the 8 classes studied. ▪ The relative contribution of the productivity hypothesis seems to be the highest. ▪ The null hypothesis proposed by Gross & Snyder-Beattie does not fit our model once spatial autocorrelation is taken into account. ▪ More factors should be tested, especially those included in historical hypotheses.
  • 49. How is the practice of biodiversity data gathering evolving? In the GBIF dataset, the proportion of specimen-based occurrences is rapidly falling behind the proportion of observation-based occurrences. This situation is worrying because it jeopardizes the feasibility and reliability of some studies based on biodiversity occurrences. We recommend encouraging the production of ancillary data (samples, photos…) along with biodiversity occurrences. The role of citizen science in the evolution of this new paradigm should also be investigated. (Figure: proportion of occurrences by year.)
  • 50. Can biological diversity be investigated in its entirety? Some taxa are less known and studied than others, and this situation worsens with time. This taxonomic bias seems to be linked to societal preferences. What can be said when looking at the species level? Do the most sampled species in the GBIF also suggest a link with societal preferences? Changing the study scale might reveal additional influences (species characteristics, data providers, etc.). The 1,000 best-sampled species in the GBIF, by class:
    Class | Number of species
    Aves | 611
    Magnoliopsida | 228
    Liliopsida | 70
    Insecta | 28
    Actinopterygii | 18
    Mammalia | 12
    Polypodiopsida | 7
    14 next classes | 26
  • 51. Global biodiversity patterns at large taxonomic scale: which factors shape them? The Latitudinal Diversity Gradient pattern is clear for 7 classes. The geometric hypothesis proposed by Gross & Snyder-Beattie has been rejected, while the productivity hypothesis seems to be confirmed. More studies should be done, this time including the historical hypotheses. Other geographic regions and biodiversity patterns could be studied. The spatial non-stationarity of the models should be explored.
  • 52. THANKS! To the jury members: Alexandre ANTONELLI, Jean-Philippe LESSARD, Anne-Sophie ARCHAMBEAU and Rod PAGE. To the members of my PhD committee: Roseli Pellens, Philippe Grandcolas, Wilfried Thuiller, Samy Gaiji and Jérôme Sueur. To my thesis supervisors: Régine VIGNES-LEBBE and Frédéric LEGENDRE. Thanks to the people at GBIF France and the international GBIF node in Copenhagen for their help and advice. Many thanks to all the amazing people I got to interact with during this PhD, in particular all the people from the Institut de Systématique, Évolution, Biodiversité who welcomed me during those three years. Many thanks again to all the colleagues, friends and family who supported and helped me! Words can't express all my gratitude!
  • 53. References
    ▪ Ariño, Arturo H. "Approaches to Estimating the Universe of Natural History Collections Data". Biodiversity Informatics 7, no. 2 (2010). bi.v7i2.3991.
    ▪ Barrowclough, George F., Joel Cracraft, John Klicka, and Robert M. Zink. "How Many Kinds of Birds Are There and Why Does It Matter?" PLOS ONE 11, no. 11 (2016): e0166307.
    ▪ Bingham, Heather, Lauren Weatherdon, Katherine Despot-Belmonte, Florian Wetzel, and Corinne Martin. "The Biodiversity Informatics Landscape: Elements, Connections and Opportunities". Research Ideas and Outcomes 3 (2017): e14059.
    ▪ Cardinale, Bradley J., J. Emmett Duffy, Andrew Gonzalez, David U. Hooper, Charles Perrings, Patrick Venail, Anita Narwani, et al. "Biodiversity Loss and Its Impact on Humanity". Nature 486, no. 7401 (2012): 59-67.
    ▪ Ciccarelli, Francesca D., Tobias Doerks, Christian von Mering, Christopher J. Creevey, Berend Snel, and Peer Bork. "Toward Automatic Reconstruction of a Highly Resolved Tree of Life". Science (New York, N.Y.) 311, no. 5765 (2006): 1283-87.
    ▪ Colwell, Robert K., and George C. Hurtt. "Nonbiological Gradients in Species Richness and a Spurious Rapoport Effect". The American Naturalist 144, no. 4 (1994): 570-95.
    ▪ Colwell, Robert K., and David C. Lees. "The mid-domain effect: geometric constraints on the geography of species richness". Trends in Ecology & Evolution 15, no. 2 (2000): 70-76.
    ▪ Di Marco, Moreno, Sarah Chapman, Glenn Althor, Stephen Kearney, Charles Besancon, Nathalie Butt, Joseph M. Maina, et al. "Changing trends and persisting biases in three decades of conservation science". Global Ecology and Conservation 10 (April 2017): 32-42.
    ▪ Faith, Dan, Ben Collen, Arturo Ariño, Patricia Koleff, John Guinotte, Jeremy Kerr, and Vishwas Chavan. "Bridging the Biodiversity Data Gaps: Recommendations to Meet Users' Data Needs". Biodiversity Informatics 8, no. 2 (2013).
    ▪ Ford, Adam T., Steven J. Cooke, Jacob R. Goheen, and Truman P. Young. "Conserving Megafauna or Sacrificing Biodiversity?" BioScience 67, no. 3 (2017): 193-96.
    ▪ Gross, Kevin, and Andrew Snyder-Beattie. "A General, Synthetic Model for Predicting Biodiversity Gradients from Environmental Geometry". The American Naturalist 188, no. 4 (2016): E85-97.
    ▪ Hawkins, Bradford A., Eric E. Porter, and José Alexandre Felizola Diniz-Filho. "Productivity and History as Predictors of the Latitudinal Diversity Gradient of Terrestrial Birds". Ecology 84, no. 6 (2003): 1608-23.
    ▪ Humboldt, Alexander. "Views of nature, or, Contemplations on the sublime phenomena of creation". (1807).
    ▪ Kosmala, Margaret, Andrea Wiggins, Alexandra Swanson, and Brooke Simmons. "Assessing Data Quality in Citizen Science". Frontiers in Ecology and the Environment 14, no. 10 (2016): 551-60.
    ▪ Meyer, Carsten, Holger Kreft, Robert Guralnick, and Walter Jetz. "Global Priorities for an Effective Information Basis of Biodiversity Distributions". Nature Communications 6 (2015): 8221.
    ▪ Myers, Norman, Russell A. Mittermeier, Cristina G. Mittermeier, Gustavo A. B. da Fonseca, and Jennifer Kent. "Biodiversity Hotspots for Conservation Priorities". Nature 403, no. 6772 (2000): 853-58.
    ▪ Newbold, Tim, Lawrence N. Hudson, Andrew P. Arnell, Sara Contu, Adriana De Palma, Simon Ferrier, Samantha L. L. Hill, et al. "Has Land Use Pushed Terrestrial Biodiversity beyond the Planetary Boundary? A Global Assessment". Science 353, no. 6296 (2016): 288-91.
    ▪ Pyron, R. Alexander. "Post-Molecular Systematics and the Future of Phylogenetics". Trends in Ecology & Evolution 30, no. 7 (2015): 384-89.
    ▪ Ripple, William J., Christopher Wolf, Thomas M. Newsome, Mauro Galetti, Mohammed Alamgir, Eileen Crist, Mahmoud I. Mahmoud, and William F. Laurance. "World Scientists' Warning to Humanity: A Second Notice". BioScience (2017).
    ▪ Stahlschmidt, Zachary R. "Taxonomic Chauvinism Revisited: Insight from Parental Care Research". PLOS ONE 6, no. 8 (2011): e24192. journal.pone.0024192.
    ▪ Willig, M.R., D.M. Kaufman, and R.D. Stevens. "Latitudinal Gradients of Biodiversity: Pattern, Process, Scale, and Synthesis". Annual Review of Ecology, Evolution, and Systematics 34, no. 1 (2003): 273-309.
    ▪ World Scientists' Warning to Humanity (1992).
  • 54. Image credits ▪ Collections: naturalhistory.si.edu ▪ Coleoptera: mnhn.fr ▪ Butterfly box ▪ Smartphone: harperpark.ca ▪ Herbaria: mnhn.fr ▪ Coleoptera box: museedesconfluences.fr ▪ Alcohol collection: mnhn.fr ▪ Ornithologist: nature.org ▪ Metabarcoding: transmittingscience.org ▪ Gaia mission: sci.esa.int ▪ Lacerta bilineata: Wikimedia Commons ▪ Entomologist: Wikimedia Commons ▪ Butterfly boxes: http://nmnh.typepad.com ▪ Birds checklist: Sasol Checklist Birds Of Southern Africa ▪ GPS: dir.indiamart.com ▪ Vanessa atalanta: Wikimedia Commons ▪ Misumena vatia: Flickr (Wayne Davies) ▪ LDG schema: news.illinois.edu
  • 55. Jump in the number of occurrences. ▪ Magnoliopsida, 1945: 890,000 occurrences from "Occurrence Data of Vascular Plants collected or compiled for the Flora of Bavaria". ▪ Reptilia, 2012: 470,000 occurrences from "Geographically tagged INSDC sequences", published by the European Molecular Biology Laboratory (EMBL).
  • 56. A bias affecting data quality. A Multiple Correspondence Analysis shows that old occurrences tend to have more issues. The proportions of spatial and temporal issues differ between classes, showing once again a bias in quality.
  • 57. A bias affecting data quality. Some classes have far more specimen-based occurrences than others.
  • 58. The geometric hypotheses. The first geometric hypothesis: Colwell & Hurtt (1994), Colwell & Lees (2000). This hypothesis is also called the mid-domain effect. It was rejected by Currie & Kerr (2007, 2008) and Zapata et al. (2003). Figure caption: thirty species with a uniform frequency distribution of range sizes are placed randomly within a finite domain (bottom panels). Horizontal lines indicate range sizes, and midpoints indicate the position of ranges along the 1-D domain. The overlap among ranges in the bottom panels produces the pattern of species richness shown in the upper panels (dotted line). The solid line indicates the mean species richness obtained by repositioning the ranges randomly within the domain 50 times.
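The simulation described in this figure caption can be reproduced in a few lines. The sketch below is an illustration under stated assumptions, not the original authors' code: the domain is rescaled to [0, 1], range sizes are drawn uniformly from [0, 1], each range is placed uniformly among the positions where it fits entirely inside the domain, and richness is sampled at 101 evenly spaced points. The function name `simulate_mid_domain` and its parameters are hypothetical.

```python
import random


def simulate_mid_domain(n_species=30, n_points=101, n_replicates=50, seed=0):
    """Mean species richness along a 1-D domain under random range placement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    totals = [0.0] * n_points
    for _ in range(n_replicates):
        for _ in range(n_species):
            size = rng.random()                   # range size, uniform on [0, 1]
            start = rng.uniform(0.0, 1.0 - size)  # range must fit inside the domain
            end = start + size
            for p in range(n_points):
                x = p / (n_points - 1)            # sampling position on [0, 1]
                if start <= x <= end:
                    totals[p] += 1                # this species occurs at x
    return [t / n_replicates for t in totals]


curve = simulate_mid_domain()
print("near the edge:", curve[5], "mid-domain:", curve[50])
```

Because large ranges can only fit if they overlap the centre of the domain, the mean richness curve peaks in the middle and drops towards both edges, which is the geometric effect the hypothesis invokes.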
  • 59. The geometric hypotheses. An updated and untested geometric hypothesis: Gross & Snyder-Beattie (2016)
  • 60. The complex relationship between the public and the scientists. Martín-López et al. 2009: conceptual model representing the primary connections among scientific information, scientific activity and policy, nongovernmental organizations (NGOs), society and environmental administrations (or departments) at different governmental scales that establish and allocate funds for species conservation.
  • 61. Other hypotheses behind taxonomic bias. Headings on the slide: Detectability, Ease of Identification, Encounter probability, Ease of Observation. Effects listed for each factor (+, - or NA): Body size increase: -, +, +; Reachable habitat: +, NA, NA; High abundance: +, NA, NA; Discreet behaviour: NA, -, NA; Large range: +, NA, NA; Diurnal taxon: +, +, NA; Species similarity: NA, -, -; High speciosity: NA, NA, -.
  • 62. Other hypotheses behind taxonomic bias. Headings on the slide: Detectability, Ease of Identification, Encounter probability, Ease of Observation. Effects listed for each factor (+, - or NA): Body size increase: -, +, +; Reachable habitat: +, NA, NA; High abundance: +, NA, NA; Discreet behaviour: NA, -, NA; Large range: +, NA, NA; Diurnal taxon: +, +, NA; Species similarity: NA, -, -; High speciosity: NA, NA, -. Martín-López et al. 2007: quadratic relation between the attitudes toward species (preference score) and the willingness to pay (WTP) for conservation of the species.
  • 63. Testing the model's spatial stationarity. Geographically Weighted Regression (GWR) allows us to test the models we kept across space. It serves as a test of the model's spatial stationarity (i.e. the assumption that the relationship between variables does not vary across space). The GWR results show that the model we kept is not spatially stationary: both its explanatory power and the influence of the covariates vary across space. (Figure: Mammalia GWR results.)
  • 65. Detecting spatial outliers. (Figure: global distribution of the common black ant, Lasius niger.) To detect spatial outliers, for each point:
    ▪ find the 5 nearest points (neighbours)
    ▪ compute the orthodromic (great-circle) distance to each neighbour
    ▪ sum those distances
    Then flag the 1% of points with the highest sums as outliers.
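The steps on this slide can be sketched as follows. This is a minimal illustration, assuming points are (latitude, longitude) pairs in degrees, that the orthodromic distance is computed with the haversine formula on a sphere of radius 6371 km, and that a brute-force neighbour search is acceptable; the function names are hypothetical and the slides do not describe the actual implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def orthodromic_km(p, q):
    """Great-circle (haversine) distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def flag_spatial_outliers(points, k=5, fraction=0.01):
    """Return indices of the points whose summed distance to their k nearest
    neighbours is among the highest `fraction` of all points."""
    scores = []
    for i, p in enumerate(points):
        # Brute-force search: distances from p to every other point, sorted.
        dists = sorted(orthodromic_km(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]))
    n_flag = max(1, math.ceil(fraction * len(points)))
    ranked = sorted(range(len(points)), key=scores.__getitem__, reverse=True)
    return sorted(ranked[:n_flag])


# Toy data (assumed): 99 clustered points plus one isolated record.
cluster = [(48.0 + 0.01 * i, 2.0 + 0.01 * j) for i in range(9) for j in range(11)]
print(flag_spatial_outliers(cluster + [(-33.9, 151.2)]))
```

The brute-force search costs O(n²) distance computations, which is fine for a sketch; for data at GBIF scale a spatial index such as a ball tree would be the usual choice.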
  • 66. Detecting environmental outliers.
    class | order | species | annual mean temperature | min temperature of coldest month | annual precipitation
    Insecta | Blattodea | Tryonicus parvus | 15.49 | 7.34 | 1370
    Insecta | Blattodea | Arenivaga floridensis | 22.35 | 9.91 | 1225
    Insecta | Blattodea | Ectobius lapponicus | 6.89 | -6.94 | 772
    Insecta | Blattodea | Phyllodromica subaptera | 13.83 | -0.74 | 425
    Insecta | Blattodea | Periplaneta americana | 17.63 | 6.61 | 440
    Insecta | Blattodea | Tryonicus parvus | 15.49 | 7.34 | 1370
    To detect environmental outliers:
    ▪ for each point, compute its Mahalanobis distance using the environmental variables
    ▪ flag the 1% of points with the highest Mahalanobis distance as outliers
    The Mahalanobis distance is a measure of the distance between a point P and a distribution D.
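The Mahalanobis-based flagging can be sketched in plain Python. The sketch assumes each occurrence is a row of numeric environmental variables, that the reference distribution D is summarised by the sample mean and sample covariance of all rows, and that the covariance matrix is non-singular; the function names are hypothetical and this is not the thesis code.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [v] for row, v in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def mahalanobis_distances(rows):
    """Distance of each row to the sample mean, scaled by the sample covariance."""
    n, d = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    diffs = [[x - m for x, m in zip(row, mean)] for row in rows]
    cov = [[sum(df[a] * df[b] for df in diffs) / (n - 1) for b in range(d)]
           for a in range(d)]
    out = []
    for df in diffs:
        z = solve(cov, df)  # z = cov^-1 @ diff, without forming the inverse
        out.append(math.sqrt(max(0.0, sum(a * b for a, b in zip(df, z)))))
    return out


def flag_environmental_outliers(rows, fraction=0.01):
    """Indices of the rows with the highest Mahalanobis distances."""
    dist = mahalanobis_distances(rows)
    n_flag = max(1, math.ceil(fraction * len(rows)))
    ranked = sorted(range(len(rows)), key=dist.__getitem__, reverse=True)
    return sorted(ranked[:n_flag])


# Toy data (assumed): a 3 x 3 grid of points plus one distant record.
rows = [(float(i), float(j)) for i in range(3) for j in range(3)] + [(10.0, 10.0)]
print(flag_environmental_outliers(rows))
```

Unlike a plain Euclidean distance, this accounts for the different scales and correlations of the variables, for example temperature in degrees versus precipitation in much larger units.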