CMSY workshop - Gianpaolo Coro (ISTI-CNR)

BlueBRIDGE receives funding from the European Union’s Horizon 2020
research and innovation programme under grant agreement No. 675680 www.bluebridge-vres.eu
CMSY Workshop
Gianpaolo Coro
ISTI-CNR
gianpaolo.coro@isti.cnr.it

Verhulst (1844) Model of Population Growth

The Schaefer Model (1954)
Fmsy = ½ rmax
Bmsy = ½ k
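As a short worked derivation (a sketch that assumes the standard Schaefer surplus production form, with r standing for rmax and P(B) a symbol introduced here for surplus production), the two reference points follow from maximising the surplus production:

$$
\begin{aligned}
P(B) &= r B \left(1 - \frac{B}{k}\right), \qquad \frac{dP}{dB} = r\left(1 - \frac{2B}{k}\right) = 0 \;\Rightarrow\; B_{msy} = \frac{k}{2} \\
MSY &= P\!\left(\frac{k}{2}\right) = \frac{r k}{4}, \qquad F_{msy} = \frac{MSY}{B_{msy}} = \frac{r}{2}
\end{aligned}
$$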

http://onlinelibrary.wiley.com/doi/10.1111/faf.12190/full
CMSY
An open-source software for data-limited stock assessment
https://github.com/SISTA16/cmsy

From Catch-MSY to CMSY
• Catch-MSY gave robust estimates of MSY, but biased estimates of r (too low) and k (too high).
• Catch-MSY could not reliably predict biomass.
• CMSY overcomes the bias and gives reasonable estimates of Fmsy and Bmsy.
• CMSY gives reasonable estimates of biomass.

Input
https://github.com/SISTA16/cmsy
https://github.com/SISTA16/cmsy/blob/master/CMSY_UserGuide_24Oct16.docx
Resilience    Prior r range
High          0.6 – 1.5
Medium        0.2 – 0.8
Low           0.05 – 0.5
Very low      0.015 – 0.1
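A minimal sketch of how the resilience categories above could be turned into prior bounds for r. The dictionary values are copied from the table; the names `RESILIENCE_R_RANGE` and `prior_r_range` are hypothetical and are not taken from the CMSY code.

```python
# Prior ranges for r by resilience category, copied from the table above.
RESILIENCE_R_RANGE = {
    "High": (0.6, 1.5),
    "Medium": (0.2, 0.8),
    "Low": (0.05, 0.5),
    "Very low": (0.015, 0.1),
}


def prior_r_range(resilience):
    """Return the (lower, upper) prior bounds of r for a resilience label."""
    key = resilience.strip().capitalize()
    if key not in RESILIENCE_R_RANGE:
        raise ValueError(f"unknown resilience category: {resilience!r}")
    return RESILIENCE_R_RANGE[key]
```

For the herring example in the ID file, `prior_r_range("Medium")` would give the bounds used to sample r.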
ID File:
Field                      Value
stock                      her-47d3
Name                       Herring in Sub-area IV, Divisions VIId & IIIa (autumn-spawners)
English Name               Atlantic herring
Scientific Name            Clupea harengus
Source                     www.ices.dk
Resilience                 Medium
StartYear                  1947
EndYear                    2013
Biomass status beginning   Good/Bad
Biomass status end         Good/Bad
Type                       Biomass/CPUE/none
Possible Crash             No
The two biomass status fields give the estimated status of the biomass at the beginning and the end of the time series.

Time Series File:
Stock ID   Year   Catch    Biomass/CPUE
her-47d3   1947   581760   7053257
her-47d3   1948   502100   6362933
her-47d3   1949   508500   6070794
her-47d3   1950   491700   6119555
her-47d3   1951   600400   6199629
her-47d3   1952   664400   6058665
her-47d3   1953   698500   5950584
her-47d3   1954   762900   5809471
…          …      …        …

Output
[Figure: CMSY output for Illex coindetii (broadtail shortfin squid), with analysis charts and management charts]

CMSY - Approach
• Given a catch trend, estimate the best pair of values for the intrinsic rate of increase (r) and the carrying capacity (k) that generated the trend
• Goal: estimate r and k.
• Constraint: the Schaefer function
$$b_{t+1} = b_t - c_t + r\, b_t \left(1 - \frac{b_t}{k}\right) v_s$$
CMSY has a double approach: Monte Carlo Analysis and Bayesian Schaefer Model
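The Schaefer constraint above can be sketched as a simple recurrence. This is an illustrative sketch, not the CMSY implementation: the function names are hypothetical, and the factor v_s, which the slides do not define, is exposed as a parameter that defaults to 1.

```python
def schaefer_step(b, c, r, k, v_s=1.0):
    """One step of the Schaefer function: b_next = b - c + r*b*(1 - b/k)*v_s."""
    return b - c + r * b * (1.0 - b / k) * v_s


def simulate_biomass(catches, r, k, b0):
    """Return the biomass trajectory [b_0, ..., b_T] implied by a catch series."""
    biomass = [b0]
    for c in catches:
        biomass.append(schaefer_step(biomass[-1], c, r, k))
    return biomass
```

A quick sanity check of the sketch: starting at b = k/2 and removing a catch of r·k/4 every year leaves the biomass unchanged, which is the MSY equilibrium of the model.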

Monte Carlo Analysis
$$b_{t+1} = b_t - c_t + r\, b_t \left(1 - \frac{b_t}{k}\right) v_s$$
Step 1: sample all possible r and k pairs compliant with the Schaefer function and the priors
Step 2: resample in the lower tip. We search for the mean of maximum viable r-values
Step 3: divide the tip into 25 ranges
Step 4: take the median of the non-empty ranges
$$MSY = \frac{r_{best}\, k_{best}}{4}$$
[Figure: Monte Carlo approach, comparing the result by CMSY analysis with the true value]
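Step 1 of the Monte Carlo analysis can be sketched as a rejection filter over randomly drawn r-k pairs. The sketch below makes several assumptions that are not in the slides: r, k and the starting relative biomass are drawn uniformly, the factor v_s is set to 1, and `start_rel` and `end_rel` are hypothetical names for the prior ranges of relative biomass (b/k) at the beginning and end of the series. Steps 2 to 4 (resampling the tip of the r-k triangle) are not reproduced.

```python
import random


def viable_pairs(catches, r_range, k_range, start_rel, end_rel, n=5000, seed=0):
    """Keep the sampled (r, k) pairs whose Schaefer trajectory never collapses
    and ends inside the prior range of relative biomass."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        r = rng.uniform(*r_range)
        k = rng.uniform(*k_range)
        b = rng.uniform(*start_rel) * k  # starting biomass as a fraction of k
        ok = True
        for c in catches:
            b = b - c + r * b * (1.0 - b / k)  # Schaefer function with v_s = 1
            if b <= 0:
                ok = False  # the stock would have collapsed: pair is not viable
                break
        if ok and end_rel[0] <= b / k <= end_rel[1]:
            kept.append((r, k))
    return kept
```

Each retained pair gives a candidate MSY as r·k/4; CMSY then narrows the viable set further as described in Steps 2 to 4.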

Bayesian Schaefer Analysis
• When Biomass or CPUE trends are available, CMSY increases the precision of the estimation:
• Goal: estimate r and k.
• Constraint: the Schaefer function
$$b_{t+1} = b_t - c_t + r\, b_t \left(1 - \frac{b_t}{k}\right) v_s$$

Issues
Simple curve fitting does not work
[Figure: estimate after curve fitting]

Other unpromising approaches
1. Clustering Analysis (DBScan)
2. Trapezoidal density over the best fit r-k line
3. Gaussian Mixtures
4. Viable pairs densities
[Figure panel labels: Gm of the largest cluster; Simulation of r density; Search in the tip of the r-k triangle]

Difficulty of the problem
At each step of the sampling process:
• The biomass values are strongly correlated with one another
• An iterative fitting model should:
  • approximate the complete biomass curve using progressively better r and k values
  • produce a new biomass curve correlated to the previous biomass curve
  • account for time dependency between the samples of one curve

Promising approach: Markov Chain Monte Carlo methods
[Figure: fields labelled on the slide: Brain signals, Robotics, Biology, Statistics, Speech processing, Mathematics]

MCMC and the Schaefer function
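• The Schaefer formula is used as likelihood(s)
• Priors are required for α, k and r
$$b_0 = \alpha k$$
$$b_{t+1} = b_t - c_t + r\, b_t \left(1 - \frac{b_t}{k}\right) v_s$$
At each step, the MCMC produces samples for these parameters:
$$\theta = \{\alpha, k, r, b_0, b_1, b_2, \ldots, b_T\}$$
where T is the maximum time of the biomass trend
$$\theta^{0} = \{\alpha^{0}, k^{0}, r^{0}, b_0^{0}, b_1^{0}, b_2^{0}, \ldots, b_T^{0}\}$$
$$\theta^{1},\ \theta^{2},\ \theta^{3},\ \theta^{4},\ \ldots$$
After M steps:
$$\theta^{M} = \{\alpha^{M}, k^{M}, r^{M}, b_0^{M}, b_1^{M}, b_2^{M}, \ldots, b_T^{M}\}$$
[Figure: hierarchical model for the variables, with nodes r, k, α and b_0, b_1, …, b_T]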
Details in Coro G. Gibbs Sampling with JAGS: Behind the Scenes. Technical report, 2017, CNR PUMA, cnr.isti/2017-B5-001
http://puma.isti.cnr.it/dfdownload.php?ident=/cnr.isti/2017-B5-001&langver=it&scelta=Metadata
https://www.researchgate.net/publication/313905185_Gibbs_Sampling_with_JAGS_Behind_the_Scenes

MCMC and the Schaefer function
• Simulating a biomass trend by means of an MCMC requires the model to produce, at each step of the sampling process, a new biomass time series by means of new values assigned to model variables
• At each step the MCMC tries to simulate the whole biomass time series using new values for r and k
• The newly picked values are constrained by the Schaefer function and by the prior probability distributions that we assume for the r and k variables
• MCMC accounts for these constraints during the fitting phase. After several sampling and adjustment steps, the model finds the variable values that produce the best approximation of the target biomass trend
$$\theta^{0} = \{\alpha^{0}, k^{0}, r^{0}, b_0^{0}, b_1^{0}, b_2^{0}, \ldots, b_T^{0}\}$$
$$\theta^{1} = \{\alpha^{1}, k^{1}, r^{1}, b_0^{1}, b_1^{1}, b_2^{1}, \ldots, b_T^{1}\}$$
$$\ldots$$
$$\theta^{M} = \{\alpha^{M}, k^{M}, r^{M}, b_0^{M}, b_1^{M}, b_2^{M}, \ldots, b_T^{M}\}$$
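The idea of re-simulating the whole biomass series at every sampling step can be illustrated with a small sampler. This is a sketch under loudly stated assumptions, not the CMSY/JAGS model: it swaps in a random-walk Metropolis sampler instead of Gibbs sampling, uses uniform priors inside `bounds`, a lognormal observation error with a fixed `sigma`, and sets v_s to 1. All function and parameter names are hypothetical.

```python
import math
import random


def log_posterior(theta, catches, observed, bounds, sigma=0.1):
    """Log of (uniform priors x lognormal likelihood), up to a constant."""
    for value, (lo, hi) in zip(theta, bounds):
        if not lo <= value <= hi:
            return -math.inf  # outside the priors
    r, k, alpha = theta
    b = alpha * k  # b_0 = alpha * k, as on the slide
    total = 0.0
    for t, obs in enumerate(observed):
        if b <= 0:
            return -math.inf  # collapsed trajectory cannot explain the data
        total -= (math.log(obs) - math.log(b)) ** 2 / (2 * sigma ** 2)
        if t < len(catches):
            b = b - catches[t] + r * b * (1.0 - b / k)  # Schaefer step, v_s = 1
    return total


def metropolis(catches, observed, bounds, start, steps=5000, scale=0.02, seed=0):
    """Random-walk Metropolis over theta = (r, k, alpha); returns the chain."""
    rng = random.Random(seed)
    theta = list(start)
    logp = log_posterior(theta, catches, observed, bounds)
    chain = []
    for _ in range(steps):
        # Symmetric Gaussian proposal, scaled to the width of each prior range.
        proposal = [v + rng.gauss(0.0, scale * (hi - lo))
                    for v, (lo, hi) in zip(theta, bounds)]
        logp_new = log_posterior(proposal, catches, observed, bounds)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < logp_new - logp:
            theta, logp = proposal, logp_new
        chain.append(tuple(theta))
    return chain
```

Each element of the returned chain plays the role of one θ sample on the slide; the biomass values b_0 … b_T are recomputed from (r, k, α) inside `log_posterior` rather than sampled separately, which is a simplification of the hierarchical model.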

MCMC using Gibbs Sampling
• The user takes the model variables and designs a graph of the constraints between the variables
• The system writes a posterior probability density in terms of priors, likelihoods and conditionals
• The model samples variable values from each factor, using approximate or analytical forms of these factors
• At each variable sampling step, the model fixes the values of the other variables
• After several steps the values are likely to converge to the best estimate
[Figure: Markov Chain leading to the best estimate set θ*]

Bayesian Schaefer Model (BSM)
$$b_{t+1} = b_t - c_t + r\, b_t \left(1 - \frac{b_t}{k}\right) v_s$$
Step 1: consider the complete r,k space. Use the CMSY points as background reference only
Step 2: produce iteratively points that are compliant with the observed Schaefer function and the priors
Step 3: concentrate the search in the accumulation area
Step 4: take the geometric mean in the accumulation area
[Figure labels: estimate; proxies]
Key aspects of CMSY
1. Defining the form of the distributions of the priors was crucial! This was done using 50 simulated stocks for which r and k were known.
2. Defining the initial ranges of the parameters is important. This is done by the stock “expert” when indicating the prior knowledge in the ID file.
3. A good balance was found between prior knowledge and knowledge from the data. This was done by testing the model for several years in workshops and in focus groups.

CMSY on simulated data
• CMSY was tested against 50 simulated stocks where true r, k, MSY and biomass were known
• Monte Carlo analysis included the true r-k in 100% of the cases. BSM was used as a coherence check

CMSY applications
ICES:
WKLife IV meeting (27-31 Oct. 2014): CMSY was applied to all the data-limited stocks proposed by ICES.
http://ices.dk/sites/pub/Publication%20Reports/Expert%20Group%20Report/acom/2014/WKLIFE4/wklifeIV_2014.pdf
WKLife V meeting (5-9 Oct. 2015): CMSY was applied to all the data-limited stocks proposed by ICES.
http://ices.dk/sites/pub/Publication%20Reports/Expert%20Group%20Report/acom/2015/WKLIFEV/wklifeV_2015.pdf
FAO:
Assessed CMSY as one of the best-performing models for data-limited stocks
http://www.fao.org/docrep/019/i3491e/i3491e.pdf
Is building a Web interface to produce fisheries management reports using CMSY
http://data.d4science.org/UHZhM2pVWW1IOXRjZk9qTytQTndqaUpjamJScDg0VVVHbWJQNStIS0N6Yz0
Oceana:
Based on CMSY, an Oceana study (on 400 stocks) found that fish catches in European waters could increase by 57% if stocks were managed sustainably
http://oceana.org/press-center/press-releases/oceana-study-finds-fish-catches-european-waters-could-increase-57-if

Full Oceana report and status of EU stocks
R. Froese, C. Garilao, H. Winker, G. Coro, N. Demirel, A. Tsikliras, D. Dimarchopoulou, G. Scarcella, A. Sampang-Reyes (2016)
http://eu.oceana.org/sites/default/files/stockstatusreport_newversion_0.pdf

European Stocks in 2013-2015
[Figure axis labels: Management Decision; F & Reproduction & Growth]
Analysis of 397 stocks in European Seas and adjacent waters. Froese et al. 2016.

Exploitation of 397 stocks in European Seas in 2013-2015. Note that the different types of overexploitation overlap, so the numbers do not add up to 100%. Froese et al. 2016

Status of 397 stocks in European Seas 2013-2015. Froese et al. 2016

Compliance with the Common Fisheries Policy of the European Union (CFP 2013) by Ecoregion 2013-2015. Froese et al. 2016

Producing multi-species future fisheries scenarios
1. Take the estimated biomass of the stocks in a certain region
2. Evolve the relative biomasses in time starting from values in the neighbourhoods of B/Bmsy, F and Fmsy considering different F scenarios
3. For each evolution, cluster the B/Bmsy values and then average the values
4. Average the averages of each evolved variable, and estimate the confidence intervals
5. Plot the averaged evolutions
$$\frac{B_{t+1}}{B_{msy}} = \frac{B_t}{B_{msy}} + 2 F_{msy} \frac{B_t}{B_{msy}} \left(1 - \frac{B_t}{2 B_{msy}}\right) - \frac{B_t}{B_{msy}} F_t$$
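The projection equation for relative biomass can be sketched as a short loop. This is an illustrative sketch of the single-stock update only (step 2); the clustering and averaging across stocks in steps 3 to 5 are not reproduced, and the function name is hypothetical.

```python
def project_relative_biomass(b_rel, f_msy, f_series):
    """Project B/Bmsy forward in time.

    b_rel    -- starting relative biomass B_t / Bmsy
    f_msy    -- Fmsy of the stock
    f_series -- fishing mortality F_t applied in each future year
    """
    trajectory = [b_rel]
    for f_t in f_series:
        b_rel = b_rel + 2.0 * f_msy * b_rel * (1.0 - b_rel / 2.0) - b_rel * f_t
        trajectory.append(b_rel)
    return trajectory
```

For example, a scenario such as 0.5 Fmsy corresponds to passing `f_series = [0.5 * f_msy] * years`; a stock fished at exactly Fmsy from B = Bmsy stays at B/Bmsy = 1.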

Percentage of Stocks at or above Bmsy
Best rebuilding under the 0.5 Fmsy scenario, worst under the 0.95 Fmsy scenario
Rainer Froese – Presentation at the EU Parliament 27/02/2017

Percentage of Depleted Stocks
Best rebuilding under the 0.5 Fmsy scenario, worst under the 0.95 Fmsy scenario

Profitability
Good profits for the 0.5 – 0.8 Fmsy scenarios
Low profit for the 0.95 Fmsy scenario
[Equation: profitability π_t as a function of F_t/F_msy, B_t/B_msy, (1 − μ_mean/100), (C/MSY)_mean and (F/F_msy)_mean]

Analysis of current (2013 -2015) and potential catches for 397 stocks in European Seas. Because
of trophic interactions, all stocks cannot support maximum yields simultaneously. Froese et al. 2016.

Comments on the multi-species
application of CMSY (1/2)
Species interactions and environmental impact are implicitly considered in surplus
production models by the rate of net productivity (r), which summarizes natural
mortality such as caused by predation by other species, somatic growth such as
modulated by available food sources, and recruitment such as impacted by
environmental conditions and by parental egg production.
CMSY accounts explicitly for reduced recruitment at small stock sizes*.
*R. Froese, N. Demirel, G. Coro, K. Kleisner, H. Winker, Estimating fisheries reference points from catch and resilience. Fish Fish., (in press) 10.1111/faf.12190,
J.T. Schnute, L.J. Richards, “Surplus production models” in Handbook of Fish Biology and Fisheries, P.J.B. Hart, J.D. Reynolds, Eds. (Blackwell, 2002), vol. 2, pp. 105–126.
T.J. Quinn, R.B. Deriso, Quantitative fish dynamics (Oxford University Press, NY, 1999)

Comments on the multi-species application of CMSY (2/2)
Compared with age-structured models, where exploitation is typically reported for a narrow range of fully selected age classes, surplus production models estimate exploitation as the ratio of total catch to biomass.
This is similar to using the mean exploitation rate across all age classes weighted by their respective contribution to the catch. If the catch consists largely of juveniles that are only partly selected by the gear, then the overall rate of fishing mortality strongly underestimates the fishing mortality of the fully selected older year classes.
• To address the underestimation of fishing mortality in fully selected age classes, CMSY reduces the estimate of Fmsy as a linear function of biomass below 0.5 Bmsy:
$$F_{reduced} = 2\,\frac{B_t}{B_{msy}}\,F \quad \text{if } \frac{B_t}{B_{msy}} < 0.5$$
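The linear reduction below 0.5 Bmsy can be written as a one-line rule. A minimal sketch: the function name is hypothetical, and the argument `f` is assumed here to be the Fmsy estimate that the text says is reduced.

```python
def reduced_f(f, b_rel):
    """Scale f linearly with B/Bmsy when B/Bmsy < 0.5; leave it unchanged otherwise."""
    return 2.0 * b_rel * f if b_rel < 0.5 else f
```

The factor 2·B/Bmsy equals 1 at B = 0.5 Bmsy, so the reduced value joins the unreduced one continuously at the threshold and falls to zero as the biomass approaches zero.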

A collaborative approach to CMSY

Big Data
1. Large volume
2. High generation velocity
3. Large variety
4. Untrustworthiness (veracity)
5. High complexity (variability)
6. Understandable value
Big Data: a dataset with large volume, variety and generation velocity, containing complex and untrustworthy information, that requires nonconventional methods to extract, manage and process information within a reasonable time.

New Science Paradigms
• Open Science: make scientific research, data and dissemination accessible to all levels of an inquiring society, amateur or professional.
Keywords: Open Access, Open research, Open Notebook Science
• E-Science: computationally intensive science is carried out in highly distributed network environments that use large data sets and require distributed computing and collaborative tools.
Keywords: Provenance of the scientific process, Scientific workflows
• Science 2.0: process and publish large data sets using a collaborative approach. Share from raw data to experimental results and processes. Support collaborative experiments and Reproducibility-Repeatability-Reusability (R-R-R) of Science.
Keywords: collaborative and repeatable Science

Requirements for IT systems
• Support collaborative research and experimentation
• Implement Reproducibility-Repeatability-Reusability of Science
• Allow sharing data, processes and findings
• Grant free access to the produced scientific knowledge
• Tackle Big Data challenges
• Sustainability: low operational costs, low maintenance costs
• Manage heterogeneous access policies for data and processes
• Meet industrial process requirements

Distributed e-Infrastructures
• e-Infrastructures enable researchers at different locations across the world to collaborate in the context of their home institutions or in national or multinational scientific initiatives.
• People can work together having shared access to unique or distributed scientific facilities (including data, instruments, computing and communications).
Examples:
Belief, http://www.beliefproject.org/
OpenAire, http://www.openaire.eu/
i-Marine, http://www.i-marine.eu/
EU-Brazil OpenBio, http://www.eubrazilopenbio.eu/

D4Science.org – Hybrid Data Infrastructure
A system of systems
1. External Systems:
• Storage
• Computations
• Data services
2. Integration services:
• Manage external systems
• Harmonise data
• Host data and processes
• Support adaptability
3. Infrastructure resources:
• Manage security
• Expose Integration services
• Support information exchange between services
[Figure: the D4Science.org Infrastructure, powered by gCube, integrates external data and computational infrastructures and computational services, and enables a Unified Resource Space (WPS). Labelled Big Data dimensions: Variety/Veracity, Volume, Velocity/Variability]

Virtual Research Environments
• Define sub-communities
• Allow temporary dedicated assignment of computational, storage, and data resources
• Manage policies
• Support data and information sharing
[Figure: VREs on top of the Unified Resource Space enabled by the D4Science.org Infrastructure, powered by gCube (WPS)]

Virtual Research Environments
Innovative, web-based, community-oriented, comprehensive, flexible, and
secure working environments.
• Communities are provided with applications to interact with the VRE services
• Client services are provided both with APIs (Java, R) and simple HTTP-REST interfaces

D4Science.org Services
[Figure: D4Science.org service architecture]
Layers: Infrastructures and Service Providers; Mediators / Adapters (Standard based (e.g. CWS), Ad-hoc mediators); Core Services; Collaborative Services; Data Space Services; Data Analytics Services
Components: Resources Mgr, Catalogue, HN, AAA, VRE Mgr, Social Networking, Workspace, Users Mgmt, Search, Access, Storage, Dashboard, Algorithms, Workflows, Browse, Publish, Curation

Researchers
D4Science supports scientists in several domains
1. More than 25 000 taxonomic studies per month (www.i-marine.eu)
2. More than 60 000 species distribution maps produced and hosted (www.d4science.eu)
3. Used to build a pan-European geothermal energy map (www.egip.d4science.org)
4. Processing and management of heterogeneous environmental and Earth system data (www.envriplus.eu)
5. Enhances communication and exchange in Linguistic Studies, Humanities, Cultural Heritage, History and Archaeology (www.parthenos-project.eu)

Society and citizens
1. CNR Smart Campus - PISA: a Smart City experiment to optimise the use of resources and reduce the environmental impact, whilst increasing the quality of life and work. www.smart-applications.area.pi.cnr.it
2. SoBigData EU Prj.: create the Social Mining & Big Data Ecosystem, a research infrastructure for ethic-sensitive scientific discoveries and advanced applications of social data mining. www.sobigdata.eu
• data storage and mining of the large data information flow on parking, buildings and mobility
• computational platform and cloud storage to integrate data mining processes and host data and results, VA enabler

Policy Makers
1. D4Science hosts and runs the CMSY model to assess the health status of fisheries stocks
http://www.cnr.it/news/index/news/id/5987
2. D4Science supports the identification of Marine Protected Areas to reduce the adverse impact of human activities (e.g. fishing, aquaculture, tourism) on ecosystems, and to ensure these activities are properly embedded in policy frameworks.
http://www.bluebridge-vres.eu/services/protected-area-impact-maps

Companies
1. Predict aquaculture revenue and business development (www.bluebridge-vres.eu)
2. Host and process satellite data from Copernicus
3. Collect logs from experts and centralize the network of information
4. Self-service integration of algorithms to enable Cloud computation (services.d4science.org)

Education
Lecture-style: the emphasis on course topics varies with the audience
Interactive: after each explained topic, students do experiments
Experimental: students reproduce the experiment shown by the teacher and possibly repeat it on their own data
Social: students communicate via messaging or VRE discussion panel
• 1 course/year in Pisa
• 1 course/year in Paris
• 12 courses in Copenhagen (International Council for the Exploration of the Sea)
• 38 courses all over the world, +1000 attendees
www.bluebridge-vres.eu

Numbers
• +2000 scientists in 44 countries,
• integrating +50 heterogeneous data providers,
• executing +25,000 processes/month,
• providing access to over a billion quality records in repositories worldwide,
• 99.7% service availability,
• +50 VREs hosted

[Figure: the Statistical Manager computing platform on the D4Science computational facilities: setup and execution, sharing]
Coro, G., Candela, L., Pagano, P., Italiano, A., & Liccardo, L. (2015). Parallelizing the execution of native data mining algorithms for computational biology. Concurrency and Computation: Practice and Experience, 27(17), 4630-4644.

Collaborative experiments
[Figure: a workspace (WS) of shared online folders holds inputs, outputs and results; the computational system is used in the e-Infrastructure or through third party software]

Interfaces
Web Processing Service
Process description:
http://dataminer-d-d4s.d4science.org/wps/WebProcessingService?Request=DescribeProcess&Service=WPS&Version=1.0.0&gcube-token=d7a4076c-e8c1-42fe-81e0-bdecb1e8074a&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.generators.CMSY
Process execution:
http://dataminer-d-d4s.d4science.org/wps/WebProcessingService?request=Execute&service=WPS&Version=1.0.0&gcube-token=d7a4076c-e8c1-42fe-81e0-bdecb1e8074a&lang=en-US&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.generators.CMSY&DataInputs=IDsFile=http://goo.gl/9rg3qK;StocksFile=http://goo.gl/Mp2ZLY;SelectedStock=HLH_M07
R/JAVA Client
Guide:
https://wiki.gcube-system.org/gcube/How_to_Interact_with_the_Statistical_Manager_by_client#WPS_Client
Web Interfaces
QGIS
[Figure labels: WPS; REST; I.S.; Infrastructure; Infrastructure resources; Geospatial data; External infra.]

Advantages of integration
• The process is available as-a-Service
• Invoked via communication standards
• Higher computational capabilities
• Automatic creation of a Web interface
• Provenance management
• Storage of results on a high-availability system
• Collaboration and sharing
• Re-usability, e.g. from other software (e.g. QGIS)

Innovation through integration
Vision: integration, sharing, and remote hosting help to inform people and support decision making

Using CMSY
https://i-marine.d4science.org/group/drumfish/drumfish
