Deep time Paleo-environmental
and Bio-diversity data mining
and deep learning classification
Abdullah Khan Zehady
PhD student @
Earth, Atmospheric & Planetary Science,
Purdue University
Research Projects
1. Macro- and micro-scale evolution of planktonic foraminifera and the potential drivers during the Cenozoic era (since 66.04 Ma).
Hypothesis:
“Rates of evolution are correlated with rates of geochemical and sea-level change.”
- Can we find long-term (2-5 myr) astronomical cycles?
2. Periodicities and other cause-effect relationships among pulses of evolution since the Cambrian period (541 Ma).
Hypothesis:
(a) “The Earth has had semi-periodic episodes of unusual surface/biological change.”
- Abundance of events over time; verification of catastrophe models (20 – 60 myr periods); causation by impact and other disastrous events.
(b) “Pulses of biological evolution occur simultaneously with global changes in sediment facies.”
3. Automated fossil image recognition by feature extraction using a deep neural network.
4. Effect of climate change on cultural turnover over the last 2000 years of human civilization.
Hypothesis:
“A major factor in the rise and fall of human civilization on different continents is climate cooling.”
My papers on an evolutionary tree visualization algorithm
To be submitted (under review) to BMC Evolutionary Biology
To be submitted to Nature Methods
Morphospecies and Lineage/Species evolution
(Fordham, Zehady et al 2018)
Species Phenon Integrated Tree
(Zehady et al 2019)
Mutual Learning using Integrated Tree
Comparison and learning between integrated tree and molecular tree
PaleoEnvironmental & Bio-Diversity Data
Dataset / Significance
1. Marine genera ranges and subsequent turnover time-series data for the whole Phanerozoic / Marine biodiversity, turnover
2. Cenozoic planktonic foraminifera evolution and turnover time series (Zehady, Fordham 2018; Aze et al 2011) / Foraminifer diversity, turnover
3. Oxygen-18 (δ18O) curves and events (Cramer 2009) / Long-term cooling of the ocean interior
4. Carbon-13 (δ13C) curves and events (Cramer 2009) / Terrestrial climate proxy record (gradual sinking of organic matter)
5. Strontium isotope record (87Sr/86Sr) / Tectonic evolution, continental spreading, origin and evolution of igneous rocks (magmatic style)
6. Sulphur isotope record (δ34S of sulphate) / Link with LIPs
7. Ages and volume extent of LIPs (Large Igneous Provinces) / Large-volume gas release into the ocean-atmosphere system, link with magmatism, mass extinction, extinction cyclicity (Melott 2012)
8. Sea-level synthesis curve (Haq et al 2014) / Plate motion, changes in continental mass distribution
9. Passive margin
Grand Cycles in the Paleozoic-Mesozoic-Cenozoic Eras
Orbital forcing - Milankovitch Cycles
Amplitude modulation, 2.4 myr eccentricity cycle
Marine genera turnover data
Panels: Entire Phanerozoic (probability) / Cenozoic only (number of turnovers)
Raw turnover probability = raw speciation probability + raw extinction probability
To reduce stochastic noise, fit a Hidden Markov Model (HMM).
Markov property: each state depends only on the previous state.
Parameter estimation: Baum-Welch algorithm.
AIC (Akaike Information Criterion) is used to select the speciation and extinction states.
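A minimal sketch of this denoising step, assuming Gaussian emissions for the turnover series (the slides do not state the emission model, the initialization, or the parameter count used). It runs scaled forward-backward (Baum-Welch) re-estimation and scores each fitted model with AIC = 2m - 2 log L, where m is the number of free parameters, so that models with different numbers of hidden states can be compared. All function names are illustrative, not the author's code.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def forward_backward(obs, pi, trans, means, variances):
    """Scaled forward-backward pass.

    Returns the log-likelihood, state posteriors gamma[t][j], and expected
    transition counts xi_sum[i][j] summed over time.
    """
    n, k = len(obs), len(pi)
    emit = [[gaussian_pdf(x, means[j], variances[j]) for j in range(k)] for x in obs]
    alpha, scale = [], []
    for t in range(n):
        if t == 0:
            a = [pi[j] * emit[0][j] for j in range(k)]
        else:
            a = [sum(alpha[-1][i] * trans[i][j] for i in range(k)) * emit[t][j]
                 for j in range(k)]
        c = sum(a)                      # scaling factor keeps alpha from underflowing
        scale.append(c)
        alpha.append([v / c for v in a])
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [sum(trans[i][j] * emit[t + 1][j] * beta[t + 1][j] for j in range(k))
                   / scale[t + 1] for i in range(k)]
    gamma = [[alpha[t][j] * beta[t][j] for j in range(k)] for t in range(n)]
    xi_sum = [[0.0] * k for _ in range(k)]
    for t in range(n - 1):
        for i in range(k):
            for j in range(k):
                xi_sum[i][j] += (alpha[t][i] * trans[i][j] * emit[t + 1][j]
                                 * beta[t + 1][j] / scale[t + 1])
    return sum(math.log(c) for c in scale), gamma, xi_sum


def baum_welch(obs, k=2, n_iter=50):
    """EM (Baum-Welch) for a k-state HMM with Gaussian emissions."""
    n = len(obs)
    mu = sum(obs) / n
    var0 = sum((x - mu) ** 2 for x in obs) / n
    ranked = sorted(obs)
    means = [ranked[int((j + 0.5) * n / k)] for j in range(k)]   # spread over quantiles
    variances = [var0] * k
    pi = [1.0 / k] * k
    if k == 1:
        trans = [[1.0]]
    else:
        trans = [[0.9 if i == j else 0.1 / (k - 1) for j in range(k)] for i in range(k)]
    history = []
    for _ in range(n_iter):
        loglik, gamma, xi_sum = forward_backward(obs, pi, trans, means, variances)
        history.append(loglik)
        # M-step: re-estimate parameters from the expected counts.
        pi = gamma[0][:]
        trans = [[xi_sum[i][j] / sum(xi_sum[i]) for j in range(k)] for i in range(k)]
        for j in range(k):
            w = max(sum(g[j] for g in gamma), 1e-300)
            means[j] = sum(g[j] * x for g, x in zip(gamma, obs)) / w
            variances[j] = max(sum(g[j] * (x - means[j]) ** 2
                                   for g, x in zip(gamma, obs)) / w, 1e-6)
    return history, (pi, trans, means, variances)


def hmm_aic(loglik, k):
    """AIC = 2 * (free parameters) - 2 * log-likelihood."""
    n_params = (k - 1) + k * (k - 1) + 2 * k   # initial probs, transitions, means + variances
    return 2 * n_params - 2 * loglik


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical series: a low-turnover regime followed by a high-turnover regime.
    series = ([rng.gauss(0.2, 0.05) for _ in range(60)]
              + [rng.gauss(0.6, 0.05) for _ in range(40)])
    for k in (1, 2):
        history, _ = baum_welch(series, k=k)
        print(f"k={k}  logL={history[-1]:.2f}  AIC={hmm_aic(history[-1], k):.2f}")
```

Because EM only guarantees a local optimum, a practical run would restart from several initializations and keep the fit with the highest log-likelihood before comparing AIC values.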
Multi-taper spectrum – Whole Phanerozoic
* Number of significant F-test peaks identified = 9
ID / Frequency (cycles/myr) / Period (myr) / Harmonic_CL (%) / Rednoise_CL (%)
1 0.1504065 6.648649 98.00066 69.56518
2 0.2524021 3.961933 96.46992 98.53005
3 0.27051 3.696721 93.19279 84.04573
4 0.2775314 3.603196 94.98532 97.86011
5 0.2852919 3.505181 93.26527 59.50705
6 0.2908352 3.438374 90.65819 99.49677
7 0.3510717 2.848421 98.56435 80.32858
8 0.4035477 2.478022 97.29846 68.29225
9 0.4903917 2.039186 95.83189 93.04155
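In these tables the Period column is the reciprocal of the Frequency column (for example, 1/0.1504065 ≈ 6.65 myr). The sketch below illustrates the general multitaper idea, averaging periodograms computed with several orthogonal tapers, in plain Python. It substitutes sine tapers (Riedel-Sidorenko) for the Slepian (DPSS) tapers of Thomson's method and omits the harmonic F-test and red-noise confidence levels, so it is an illustration only; the sampling interval `dt` (in myr) and the synthetic test signal are assumptions.

```python
import cmath
import math
import random


def sine_tapers(n, k):
    """First k orthonormal sine tapers of length n (Riedel-Sidorenko)."""
    norm = math.sqrt(2.0 / (n + 1))
    return [[norm * math.sin(math.pi * (j + 1) * (t + 1) / (n + 1)) for t in range(n)]
            for j in range(k)]


def multitaper_psd(x, dt=1.0, k=5):
    """Average of k tapered periodograms on the grid f_m = m / (n * dt), m = 0..n//2.

    With dt in myr, frequencies are in cycles/myr and periods (1 / f) in myr.
    """
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]                     # remove the mean before tapering
    tapers = sine_tapers(n, k)
    freqs = [m / (n * dt) for m in range(n // 2 + 1)]
    psd = []
    for f in freqs:
        phase = [cmath.exp(-2j * math.pi * f * t * dt) for t in range(n)]
        power = 0.0
        for h in tapers:                          # one eigenspectrum per taper
            y = sum(h[t] * x[t] * phase[t] for t in range(n))
            power += abs(y) ** 2
        psd.append(dt * power / k)
    return freqs, psd


def dominant_period(freqs, psd):
    """Period (1 / frequency) of the largest peak, ignoring f = 0."""
    m = max(range(1, len(freqs)), key=lambda i: psd[i])
    return 1.0 / freqs[m]


if __name__ == "__main__":
    rng = random.Random(0)
    dt = 0.5                                      # hypothetical sampling interval (myr)
    series = [math.sin(2 * math.pi * t * dt / 6.6) + rng.gauss(0, 0.1) for t in range(400)]
    freqs, psd = multitaper_psd(series, dt=dt, k=5)
    print(f"dominant period ~ {dominant_period(freqs, psd):.2f} myr")
```

Averaging over tapers trades frequency resolution for lower variance; the number of tapers `k` controls that trade-off.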
Multi-taper spectrum – Only Cenozoic (0-67 Ma)
* Number of significant F-test peaks identified = 1
ID / Frequency (cycles/myr) / Period (myr) / Harmonic_CL (%) / Rednoise_CL (%)
1 0.2787879 3.586957 99.95627 92.50397
Multi-taper spectrum – Only Mesozoic (67-252 Ma)
* Number of significant F-test peaks identified = 3
ID / Frequency (cycles/myr) / Period (myr) / Harmonic_CL (%) / Rednoise_CL (%)
1 0.01079914 92.6 99.45956 67.96132
2 0.3401728 2.939683 99.03912 56.54744
3 0.3704104 2.699708 94.96196 88.81204
Multi-taper spectrum – Only Paleozoic (252-541 Ma)
* Number of significant F-test peaks identified = 7
ID / Frequency (cycles/myr) / Period (myr) / Harmonic_CL (%) / Rednoise_CL (%)
1 0.004149378 241 93.49593 78.74621
2 0.01521438 65.72727 91.12898 93.60384
3 0.1500692 6.663594 91.39851 88.02085
4 0.2524205 3.961644 94.03579 98.97888
5 0.2904564 3.442857 93.06378 89.31391
6 0.3506224 2.852071 96.671 84.24726
7 0.4439834 2.252336 90.65441 92.18804
Ocean-Atmosphere System
How are marine and terrestrial environments connected?
MaGIC model – geochemistry-based modeling
(Arvidson et al 2006)
f[1]: Organic phosphorus (P)
f[21]: Terrestrial organic matter (C) burial flux
f[50]: Sulfate (S) reduction
f[54]: Organic carbon (C) sedimentation
f[71], f[72]: Carbonate precipitation
PaleoEnvironmental Data – Oxygen & Carbon
Global warming/cooling
Organic carbon abundance
PaleoEnvironmental Data – Strontium + Passive Margin
Continental/sea-floor spreading
Transition between oceanic and continental lithosphere via sedimentation on passive margins
Large Igneous Province
Number of Large Igneous Provinces and their volume since 140 Ma
PaleoEnvironmental Data – Impact Events & Sulphur Isotopes
Regional impact events (crater > 50 km, 5-10 km)
Sulphur isotope data
Biodiversity Data
Speciation and extinction of 18,000 marine genera
Number of marine genera
PaleoEnvironmental & Bio-Diversity Data
What am I looking for?
Synchronous anomalies
Long-term oscillations/Periodicities
Changes in rates
Correlation matrix of timeseries data
genera_ts genera_prokoph oxy_18 carbon_13 sr87_86 s34 LIP LIP_volume1 LIP_volume2 impact passive_margin sea_level
genera_ts 1 0.018355819 -0.0340282 0.083035629 0.0297944 -0.0373417 0.04421 0.344736229 0.344635849 -0.04627802 0.104789606 -0.08349
genera_prokoph 0.018355819 1 -0.530246 -0.465936957 0.48878332 0.742836318 -0.083082 -0.43421673 -0.437467491 0.14855874 -0.101226221 -0.41527
oxy_18 -0.034028212 -0.530246 1 0.237833917 -0.8404705 -0.62891252 -0.0204 0.281461527 0.285603444 -0.15390199 0.118727266 0.849575
carbon_13 0.083035629 -0.465936957 0.23783392 1 -0.1983475 -0.37694399 0.1964381 0.17408195 0.181388307 -0.11451248 -0.0444354 0.178451
sr87_86 0.029794405 0.48878332 -0.8404705 -0.19834753 1 0.453619955 -0.08259 -0.19061375 -0.194846829 0.17851698 -0.218495446 -0.9094
s34 -0.037341696 0.742836318 -0.6289125 -0.376943992 0.45361995 1 -0.106022 -0.45775826 -0.464132734 0.10039699 -0.170888249 -0.46339
LIP 0.04420999 -0.083082314 -0.0204002 0.196438071 -0.0825905 -0.10602187 1 0.118074692 0.121020407 -0.12452535 -0.042236671 0.072872
LIP_volume1 0.344736229 -0.434216728 0.28146153 0.17408195 -0.1906138 -0.45775826 0.1180747 1 0.999783534 0.04981638 -0.005744232 0.202309
LIP_volume2 0.344635849 -0.437467491 0.28560344 0.181388307 -0.1948468 -0.46413273 0.1210204 0.999783534 1 0.04848648 -0.00962115 0.207276
impact -0.046278015 0.148558742 -0.153902 -0.114512478 0.17851698 0.100396987 -0.124525 0.049816378 0.04848648 1 0.061153414 -0.25373
passive_margin 0.104789606 -0.101226221 0.11872727 -0.0444354 -0.2184954 -0.17088825 -0.042237 -0.00574423 -0.00962115 0.06115341 1 0.117477
sea_level -0.083488878 -0.415267589 0.84957453 0.178450982 -0.9093977 -0.46338895 0.0728716 0.202309075 0.207276424 -0.25373116 0.1174771 1
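Each entry in a matrix like the one above is a pairwise Pearson correlation coefficient, which assumes the time series have first been put onto a common time grid (the slides do not say how the series were aligned). A minimal plain-Python sketch; the column names and values in the usage block are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """columns: dict mapping series name -> values on a shared time grid."""
    names = list(columns)
    return names, [[pearson(columns[a], columns[b]) for b in names] for a in names]


if __name__ == "__main__":
    # Hypothetical, made-up values on a common time grid.
    data = {
        "oxy_18": [1.0, 1.4, 2.1, 2.8, 3.5],
        "sea_level": [180.0, 150.0, 90.0, 40.0, 0.0],
        "sr87_86": [0.7078, 0.7080, 0.7083, 0.7087, 0.7091],
    }
    names, r = correlation_matrix(data)
    for name, row in zip(names, r):
        print(f"{name:10s}", " ".join(f"{v:6.3f}" for v in row))
```

A constant series has zero variance and makes the coefficient undefined, so such columns need to be dropped before building the matrix.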
Principal Component Analysis of Cenozoic data
PCA transforms correlated data into a new coordinate system in which the new variables are uncorrelated.
The goal of PCA is to find components
Z = [Z_1, Z_2, …, Z_p],
each a linear combination, with weight vector u_j = [u_1j, u_2j, …, u_pj]', of the
original variables
X = [X_1, X_2, …, X_p], chosen so that each successive component achieves the maximum remaining variance.
For the Cenozoic, there are no missing values in any of the 12
variables/parameters.
X: data matrix with dimension 69 x 12 (n = 69 observations, p = 12 variables)
Y: normalized (column-centered and scaled) matrix of X
C: covariance matrix of Y, C = t(Y) %*% Y / (n - 1)
Eigenvalue decomposition in R: E = eigen(C); the eigenvector matrix Ev = E$vectors satisfies t(Ev) %*% Ev = I
EOF (Empirical Orthogonal Function): orthogonal basis functions, i.e., the eigenvectors
Ev: eigenvector matrix
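The steps above can be sketched in plain Python. This is a minimal illustration, not the author's R code: it standardizes each column (one reading of "normalized"), forms C = t(Y) Y / (n - 1), and diagonalizes C with the cyclic Jacobi eigenvalue method, which stands in here for R's `eigen()`. Component scores are Z = Y Ev. Function names and the toy matrix are hypothetical.

```python
import math


def standardize(X):
    """Center each column and scale it to unit sample variance (columns must not be constant)."""
    n, p = len(X), len(X[0])
    Y = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in X]
        m = sum(col) / n
        s = math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
        for i in range(n):
            Y[i][j] = (X[i][j] - m) / s
    return Y


def covariance(Y):
    """C = t(Y) Y / (n - 1) for column-centered Y."""
    n, p = len(Y), len(Y[0])
    return [[sum(Y[i][a] * Y[i][b] for i in range(n)) / (n - 1) for b in range(p)]
            for a in range(p)]


def jacobi_eigen(A, tol=1e-20, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of V) of a symmetric matrix."""
    p = len(A)
    A = [row[:] for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for r in range(p - 1):
            for q in range(r + 1, p):
                if A[r][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new A[r][q] is zero.
                theta = (A[q][q] - A[r][r]) / (2.0 * A[r][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(p):      # A <- A J (columns r, q)
                    akr, akq = A[k][r], A[k][q]
                    A[k][r], A[k][q] = c * akr - s * akq, s * akr + c * akq
                for k in range(p):      # A <- t(J) A (rows r, q)
                    ark, aqk = A[r][k], A[q][k]
                    A[r][k], A[q][k] = c * ark - s * aqk, s * ark + c * aqk
                for k in range(p):      # V <- V J accumulates the eigenvectors
                    vkr, vkq = V[k][r], V[k][q]
                    V[k][r], V[k][q] = c * vkr - s * vkq, s * vkr + c * vkq
    return [A[i][i] for i in range(p)], V


def pca(X):
    """Eigenvalues (descending), loadings Ev (columns), and scores Z = Y Ev."""
    Y = standardize(X)
    vals, V = jacobi_eigen(covariance(Y))
    order = sorted(range(len(vals)), key=lambda i: vals[i], reverse=True)
    vals = [vals[i] for i in order]
    Ev = [[row[i] for i in order] for row in V]
    Z = [[sum(y[a] * Ev[a][j] for a in range(len(y))) for j in range(len(order))] for y in Y]
    return vals, Ev, Z


if __name__ == "__main__":
    # Made-up 5 x 3 matrix standing in for the 69 x 12 Cenozoic matrix.
    X = [[1.0, 2.0, 0.5], [2.0, 4.1, 0.1], [3.0, 5.9, 0.9], [4.0, 8.2, 0.3], [5.0, 9.8, 0.7]]
    vals, Ev, Z = pca(X)
    print([round(v / sum(vals), 3) for v in vals])   # fraction of variance per component
```

Because Y is standardized, C here is the correlation matrix of X, so the eigenvalues sum to p; with only centering, C would be the covariance matrix and variables with large units would dominate the leading components.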
Principal Component Analysis - loadings
The loadings table is composed of the principal component vectors (the eigenvectors in Ev).
Principal Component Analysis of Cenozoic data
Principal Component Vector 1 Principal Component Vector 2
Principal Component Analysis of Cenozoic data
Principal Component Vector 3 Principal Component Vector 4
Principal Component Analysis of Cenozoic data
Principal Component Vector 5 Principal Component Vector 6
Automated Fossil Genus Classification
Hedbergella sigali, Globigerinoides altiaperturus
Binomial nomenclature: Genus species
Can we extract features from images of a species to identify its genus?
Automated Fossil Genus Classification
Image data / Number of images / Accuracy (best model so far)
Training (transformed images) / 1947 / ~74%
Validation / 649 / ~55%
Test (previously seen species) / 236 / ~90%
Unseen test (entirely new species) / 37 / ~43%
Multi-class classification
How many genus classes do we have? 79
Model comparison of 3 CNN (VGG19) models: categorical cross-entropy loss minimized with the Adam optimizer
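For reference, the loss and optimizer named above can be written out in a few lines. The sketch below trains a tiny linear softmax classifier (not a VGG19 network) by minimizing categorical cross-entropy with hand-written Adam updates, using the default hyperparameters from Kingma and Ba (lr = 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8). The toy features, the flat parameter layout, and the larger learning rate in `train_softmax` are assumptions for illustration.

```python
import math


def softmax(z):
    m = max(z)                          # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def cross_entropy(probs, label):
    """Categorical cross-entropy for one sample with an integer class label."""
    return -math.log(max(probs[label], 1e-12))


class Adam:
    def __init__(self, size, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m, self.v, self.t = [0.0] * size, [0.0] * size, 0

    def step(self, params, grads):
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g          # 1st moment
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g      # 2nd moment
            m_hat = self.m[i] / (1 - self.b1 ** self.t)                  # bias correction
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def logits_for(params, x, n_classes):
    """logit_c = W[c] . x + b[c], with W then b stored flat in params."""
    d = len(x)
    return [sum(params[c * d + j] * x[j] for j in range(d)) + params[n_classes * d + c]
            for c in range(n_classes)]


def train_softmax(X, y, n_classes, epochs=200, lr=0.05):
    """Full-batch training; returns (params, mean loss per epoch)."""
    d = len(X[0])
    params = [0.0] * (n_classes * (d + 1))
    opt = Adam(len(params), lr=lr)
    history = []
    for _ in range(epochs):
        grads, loss = [0.0] * len(params), 0.0
        for x, label in zip(X, y):
            probs = softmax(logits_for(params, x, n_classes))
            loss += cross_entropy(probs, label)
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == label else 0.0)   # dLoss / dlogit_c
                for j in range(d):
                    grads[c * d + j] += err * x[j]
                grads[n_classes * d + c] += err
        history.append(loss / len(X))
        opt.step(params, [g / len(X) for g in grads])
    return params, history


def predict(params, x, n_classes):
    logits = logits_for(params, x, n_classes)
    return max(range(n_classes), key=lambda c: logits[c])


if __name__ == "__main__":
    # Toy 2-D features for three hypothetical classes (not fossil images).
    X = [[0.0, 0.0], [0.1, 0.2], [3.0, 3.0], [3.1, 2.9], [0.0, 3.0], [0.2, 3.1]]
    y = [0, 0, 1, 1, 2, 2]
    params, history = train_softmax(X, y, n_classes=3, epochs=300)
    print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

In the actual project the logits would come from the VGG19 network and a framework would compute the gradients; only the loss and the update rule carry over.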
Automated Fossil Genus Classification
Paragloborotalia predicted as Turborotalia
Most likely cause of misclassification
Turborotalia Training Images
Misclassification Analysis – 1
Globigerina predicted as Globigerinoides
Misclassification Analysis – 2
Globigerinoides predicted as Globorotalia
Misclassification Analysis – 3 (Reverse of 2)
Globorotalia predicted as Globigerinoides
The black hole in the center!
Misclassification Analysis – 4
Globorotalia predicted as Globoturborotalita
Misclassification Analysis – 5
Globoturborotalita predicted as Globigerina
Misclassification Analysis – 6
Globuligerina predicted as Pseudohastigerina
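Misclassification pairs like these are what a simple tally of (true genus, predicted genus) pairs surfaces. A minimal standard-library sketch; the label lists in the usage block are hypothetical placeholders, not the project's predictions.

```python
from collections import Counter


def top_confusions(true_labels, predicted_labels, n=5):
    """Most frequent (true, predicted) pairs where the prediction is wrong."""
    pairs = Counter((t, p) for t, p in zip(true_labels, predicted_labels) if t != p)
    return pairs.most_common(n)


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of correctly predicted samples for each true class."""
    totals, hits = Counter(true_labels), Counter()
    for t, p in zip(true_labels, predicted_labels):
        if t == p:
            hits[t] += 1
    return {c: hits[c] / totals[c] for c in totals}


if __name__ == "__main__":
    # Hypothetical labels for illustration only.
    true = ["Globorotalia", "Globorotalia", "Globigerinoides", "Globigerina", "Globigerina"]
    pred = ["Globigerinoides", "Globorotalia", "Globorotalia", "Globigerinoides", "Globigerina"]
    print(top_confusions(true, pred))
    print(per_class_accuracy(true, pred))
```

Ranking pairs by count points directly at the genus pairs worth inspecting visually, including asymmetric cases where A is mistaken for B more often than the reverse.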
What is each CNN layer learning?
Layer 2: Block2_Conv2 (mostly directional)
Layer 3: Block3_Conv2
10 random filters in each layer
How does the VGG19 CNN model distinguish between genera?
What are the final abstractions (class-specific outputs) for each of the 79 classes?
Unique features used for classification with categorical cross-entropy
Globigerina Globigerinoides Globorotalia Sigalia
Timeline
Month/Year / Prioritized Item
Jan 2019 ~ April 2019 • Submission of evolutionary tree visualization paper
• Submission of Species-Phenon Integrated Tree paper
• Completion of culture paper
Feb 2019 • NSF proposal resubmission
Jan 2019 ~ May 2019 • Phanerozoic data extraction and visualization
• Correlation/causality and spectral analysis of Phanerozoic (500 myr) data
• Learning cross-phase, phase-lag, and causality analysis
• Automatic fossil image detection project
May 2019 ~ Aug 2019 • Internship at Cisco Systems in their data center
Nov 2019 • Poster/paper presentation on machine-learning and artificial-intelligence applications in the geosciences (GSA Annual Meeting)
• Other machine-learning journals
Suggestions, recommendations
Extra Slides
Multi-taper spectrum – first 300 myr of the Phanerozoic
* Number of significant F-test peaks identified = 9
ID / Frequency (cycles/myr) / Period (myr) / Harmonic_CL (%) / Rednoise_CL (%)
1 0.01136364 88 96.39839 95.99728
2 0.1557487 6.420601 94.82795 86.59931
3 0.1657754 6.032258 98.66033 94.32486
4 0.2159091 4.631579 97.50714 82.3273
5 0.2332888 4.286533 97.22287 68.10389
6 0.3709893 2.695495 98.40121 86.48012
7 0.3923797 2.548552 93.06953 89.62598
8 0.3977273 2.514286 93.58799 91.69948
9 0.4946524 2.021622 96.30716 82.51252
Multi-taper spectrum – 300-600 myr of the Phanerozoic
* Number of significant F-test peaks identified = 8
ID / Frequency (cycles/myr) / Period (myr) / Harmonic_CL (%) / Rednoise_CL (%)
1 0.01570248 63.68421 90.13648 92.73254
2 0.1429752 6.99422 91.96099 97.77526
3 0.246281 4.060403 91.6426 95.21544
4 0.2520661 3.967213 93.09128 97.57965
5 0.2917355 3.427762 90.01668 84.54075
6 0.3504132 2.853774 95.51884 82.62293
7 0.3628099 2.756264 97.1726 71.84603
8 0.4438017 2.253259 97.93826 87.35393
Principal Component Analysis of Cenozoic data
Principal Component Vector 7 Principal Component Vector 8
Principal Component Analysis of Cenozoic data
Principal Component Vector 9 Principal Component Vector 10
Principal Component Analysis of Cenozoic data
Principal Component Vector 11 Principal Component Vector 12

More Related Content

Similar to Paleo environmental bio-diversity macro-evolutionary data mining and deep learning

1_Buck - Wavemil Steps IGARSS-11.ppt
1_Buck - Wavemil Steps IGARSS-11.ppt1_Buck - Wavemil Steps IGARSS-11.ppt
1_Buck - Wavemil Steps IGARSS-11.pptgrssieee
 
Detection of an atmosphere around the super earth 55 cancri e
Detection of an atmosphere around the super earth 55 cancri eDetection of an atmosphere around the super earth 55 cancri e
Detection of an atmosphere around the super earth 55 cancri eSérgio Sacani
 
Data mining and_visualization_of_earth_history_datasets_to_find_cause_effect_...
Data mining and_visualization_of_earth_history_datasets_to_find_cause_effect_...Data mining and_visualization_of_earth_history_datasets_to_find_cause_effect_...
Data mining and_visualization_of_earth_history_datasets_to_find_cause_effect_...Abdullah Khan Zehady
 
Abundance and isotopic_composition_of_gases_in_the_martian_atmosphere_from_th...
Abundance and isotopic_composition_of_gases_in_the_martian_atmosphere_from_th...Abundance and isotopic_composition_of_gases_in_the_martian_atmosphere_from_th...
Abundance and isotopic_composition_of_gases_in_the_martian_atmosphere_from_th...Sérgio Sacani
 
FR01_01_GlezetalIGARSS2011.ppt
FR01_01_GlezetalIGARSS2011.pptFR01_01_GlezetalIGARSS2011.ppt
FR01_01_GlezetalIGARSS2011.pptgrssieee
 
FR01_01_GlezetalIGARSS2011.ppt
FR01_01_GlezetalIGARSS2011.pptFR01_01_GlezetalIGARSS2011.ppt
FR01_01_GlezetalIGARSS2011.pptgrssieee
 
Volatile isotopes ang_organic_analysis_of_martian_fines_with_teh_nars_curiosi...
Volatile isotopes ang_organic_analysis_of_martian_fines_with_teh_nars_curiosi...Volatile isotopes ang_organic_analysis_of_martian_fines_with_teh_nars_curiosi...
Volatile isotopes ang_organic_analysis_of_martian_fines_with_teh_nars_curiosi...Sérgio Sacani
 
Understanding Stellar Nucleosynthesis via Multi-isotopic NanoSIMS analyses of...
Understanding Stellar Nucleosynthesis via Multi-isotopic NanoSIMS analyses of...Understanding Stellar Nucleosynthesis via Multi-isotopic NanoSIMS analyses of...
Understanding Stellar Nucleosynthesis via Multi-isotopic NanoSIMS analyses of...Lalit Shukla
 
A rock composition_for_earth_sized_exoplanets
A rock composition_for_earth_sized_exoplanetsA rock composition_for_earth_sized_exoplanets
A rock composition_for_earth_sized_exoplanetsSérgio Sacani
 
The gravity field_and_interior_structure_of_enceladus
The gravity field_and_interior_structure_of_enceladusThe gravity field_and_interior_structure_of_enceladus
The gravity field_and_interior_structure_of_enceladusSérgio Sacani
 
The gravity fieldandinteriorstructureofenceladus
The gravity fieldandinteriorstructureofenceladusThe gravity fieldandinteriorstructureofenceladus
The gravity fieldandinteriorstructureofenceladusGOASA
 
Small scatter and_nearly_isothermal_mass_profiles_to_four_half_light_radii_fr...
Small scatter and_nearly_isothermal_mass_profiles_to_four_half_light_radii_fr...Small scatter and_nearly_isothermal_mass_profiles_to_four_half_light_radii_fr...
Small scatter and_nearly_isothermal_mass_profiles_to_four_half_light_radii_fr...Sérgio Sacani
 
gc_molecularmotorscourse_97
gc_molecularmotorscourse_97gc_molecularmotorscourse_97
gc_molecularmotorscourse_97Gregory Carroll
 
Far infrared dust_temperatures_and_column_densities_of_the_malt90_molecular_c...
Far infrared dust_temperatures_and_column_densities_of_the_malt90_molecular_c...Far infrared dust_temperatures_and_column_densities_of_the_malt90_molecular_c...
Far infrared dust_temperatures_and_column_densities_of_the_malt90_molecular_c...Sérgio Sacani
 
Radio Astronomy and radio telescopes
Radio Astronomy and radio telescopesRadio Astronomy and radio telescopes
Radio Astronomy and radio telescopesFlavio Falcinelli
 
ÖNCEL AKADEMİ: İSTANBUL DEPREMİ
ÖNCEL AKADEMİ: İSTANBUL DEPREMİÖNCEL AKADEMİ: İSTANBUL DEPREMİ
ÖNCEL AKADEMİ: İSTANBUL DEPREMİAli Osman Öncel
 
Geophysical methods in Hydrocarbon Exploration
Geophysical methods in Hydrocarbon ExplorationGeophysical methods in Hydrocarbon Exploration
Geophysical methods in Hydrocarbon ExplorationRaboon Redar
 

Similar to Paleo environmental bio-diversity macro-evolutionary data mining and deep learning (20)

1_Buck - Wavemil Steps IGARSS-11.ppt
1_Buck - Wavemil Steps IGARSS-11.ppt1_Buck - Wavemil Steps IGARSS-11.ppt
1_Buck - Wavemil Steps IGARSS-11.ppt
 
Sareic mauger
Sareic maugerSareic mauger
Sareic mauger
 
Detection of an atmosphere around the super earth 55 cancri e
Detection of an atmosphere around the super earth 55 cancri eDetection of an atmosphere around the super earth 55 cancri e
Detection of an atmosphere around the super earth 55 cancri e
 
Data mining and_visualization_of_earth_history_datasets_to_find_cause_effect_...
Data mining and_visualization_of_earth_history_datasets_to_find_cause_effect_...Data mining and_visualization_of_earth_history_datasets_to_find_cause_effect_...
Data mining and_visualization_of_earth_history_datasets_to_find_cause_effect_...
 
Abundance and isotopic_composition_of_gases_in_the_martian_atmosphere_from_th...
Abundance and isotopic_composition_of_gases_in_the_martian_atmosphere_from_th...Abundance and isotopic_composition_of_gases_in_the_martian_atmosphere_from_th...
Abundance and isotopic_composition_of_gases_in_the_martian_atmosphere_from_th...
 
FR01_01_GlezetalIGARSS2011.ppt
FR01_01_GlezetalIGARSS2011.pptFR01_01_GlezetalIGARSS2011.ppt
FR01_01_GlezetalIGARSS2011.ppt
 
FR01_01_GlezetalIGARSS2011.ppt
FR01_01_GlezetalIGARSS2011.pptFR01_01_GlezetalIGARSS2011.ppt
FR01_01_GlezetalIGARSS2011.ppt
 
Volatile isotopes ang_organic_analysis_of_martian_fines_with_teh_nars_curiosi...
Volatile isotopes ang_organic_analysis_of_martian_fines_with_teh_nars_curiosi...Volatile isotopes ang_organic_analysis_of_martian_fines_with_teh_nars_curiosi...
Volatile isotopes ang_organic_analysis_of_martian_fines_with_teh_nars_curiosi...
 
Understanding Stellar Nucleosynthesis via Multi-isotopic NanoSIMS analyses of...
Understanding Stellar Nucleosynthesis via Multi-isotopic NanoSIMS analyses of...Understanding Stellar Nucleosynthesis via Multi-isotopic NanoSIMS analyses of...
Understanding Stellar Nucleosynthesis via Multi-isotopic NanoSIMS analyses of...
 
A rock composition_for_earth_sized_exoplanets
A rock composition_for_earth_sized_exoplanetsA rock composition_for_earth_sized_exoplanets
A rock composition_for_earth_sized_exoplanets
 
The gravity field_and_interior_structure_of_enceladus
The gravity field_and_interior_structure_of_enceladusThe gravity field_and_interior_structure_of_enceladus
The gravity field_and_interior_structure_of_enceladus
 
The gravity fieldandinteriorstructureofenceladus
The gravity fieldandinteriorstructureofenceladusThe gravity fieldandinteriorstructureofenceladus
The gravity fieldandinteriorstructureofenceladus
 
Small scatter and_nearly_isothermal_mass_profiles_to_four_half_light_radii_fr...
Small scatter and_nearly_isothermal_mass_profiles_to_four_half_light_radii_fr...Small scatter and_nearly_isothermal_mass_profiles_to_four_half_light_radii_fr...
Small scatter and_nearly_isothermal_mass_profiles_to_four_half_light_radii_fr...
 
gc_molecularmotorscourse_97
gc_molecularmotorscourse_97gc_molecularmotorscourse_97
gc_molecularmotorscourse_97
 
Far infrared dust_temperatures_and_column_densities_of_the_malt90_molecular_c...
Far infrared dust_temperatures_and_column_densities_of_the_malt90_molecular_c...Far infrared dust_temperatures_and_column_densities_of_the_malt90_molecular_c...
Far infrared dust_temperatures_and_column_densities_of_the_malt90_molecular_c...
 
dissertation
dissertationdissertation
dissertation
 
Radio Astronomy and radio telescopes
Radio Astronomy and radio telescopesRadio Astronomy and radio telescopes
Radio Astronomy and radio telescopes
 
ÖNCEL AKADEMİ: İSTANBUL DEPREMİ
ÖNCEL AKADEMİ: İSTANBUL DEPREMİÖNCEL AKADEMİ: İSTANBUL DEPREMİ
ÖNCEL AKADEMİ: İSTANBUL DEPREMİ
 
Geophysical methods in Hydrocarbon Exploration
Geophysical methods in Hydrocarbon ExplorationGeophysical methods in Hydrocarbon Exploration
Geophysical methods in Hydrocarbon Exploration
 
Mercator Ocean newsletter 36
Mercator Ocean newsletter 36Mercator Ocean newsletter 36
Mercator Ocean newsletter 36
 

More from Abdullah Khan Zehady

Change of Dynasty correlated with Climate across the world
Change of Dynasty correlated with Climate across the worldChange of Dynasty correlated with Climate across the world
Change of Dynasty correlated with Climate across the worldAbdullah Khan Zehady
 
Parallel convolutional neural network
Parallel  convolutional neural networkParallel  convolutional neural network
Parallel convolutional neural networkAbdullah Khan Zehady
 
Distributed representation of sentences and documents
Distributed representation of sentences and documentsDistributed representation of sentences and documents
Distributed representation of sentences and documentsAbdullah Khan Zehady
 
How to Create AltCoin(Alternative Cryptocurrency)?
How to Create AltCoin(Alternative Cryptocurrency)?How to Create AltCoin(Alternative Cryptocurrency)?
How to Create AltCoin(Alternative Cryptocurrency)?Abdullah Khan Zehady
 
Applying word vectors sentiment analysis
Applying word vectors sentiment analysisApplying word vectors sentiment analysis
Applying word vectors sentiment analysisAbdullah Khan Zehady
 
Word representations in vector space
Word representations in vector spaceWord representations in vector space
Word representations in vector spaceAbdullah Khan Zehady
 
Masurca genome assembly with super reads
Masurca  genome assembly with super readsMasurca  genome assembly with super reads
Masurca genome assembly with super readsAbdullah Khan Zehady
 
Rudimentary bitcoin network analysis
Rudimentary bitcoin network analysisRudimentary bitcoin network analysis
Rudimentary bitcoin network analysisAbdullah Khan Zehady
 
Bitcoin tech talk @Purdue Bitcoin Club
Bitcoin tech talk @Purdue Bitcoin ClubBitcoin tech talk @Purdue Bitcoin Club
Bitcoin tech talk @Purdue Bitcoin ClubAbdullah Khan Zehady
 

More from Abdullah Khan Zehady (17)

Change of Dynasty correlated with Climate across the world
Change of Dynasty correlated with Climate across the worldChange of Dynasty correlated with Climate across the world
Change of Dynasty correlated with Climate across the world
 
Parallel convolutional neural network
Parallel  convolutional neural networkParallel  convolutional neural network
Parallel convolutional neural network
 
Distributed representation of sentences and documents
Distributed representation of sentences and documentsDistributed representation of sentences and documents
Distributed representation of sentences and documents
 
Tribeflow on bitcoin data
Tribeflow on bitcoin dataTribeflow on bitcoin data
Tribeflow on bitcoin data
 
How to Create AltCoin(Alternative Cryptocurrency)?
How to Create AltCoin(Alternative Cryptocurrency)?How to Create AltCoin(Alternative Cryptocurrency)?
How to Create AltCoin(Alternative Cryptocurrency)?
 
Applying word vectors sentiment analysis
Applying word vectors sentiment analysisApplying word vectors sentiment analysis
Applying word vectors sentiment analysis
 
Word representations in vector space
Word representations in vector spaceWord representations in vector space
Word representations in vector space
 
Masurca genome assembly with super reads
Masurca  genome assembly with super readsMasurca  genome assembly with super reads
Masurca genome assembly with super reads
 
Bitcoin Multisig Transaction
Bitcoin Multisig TransactionBitcoin Multisig Transaction
Bitcoin Multisig Transaction
 
Bitcoin ideas
Bitcoin ideasBitcoin ideas
Bitcoin ideas
 
Bitcoin investments
Bitcoin investmentsBitcoin investments
Bitcoin investments
 
Rudimentary bitcoin network analysis
Rudimentary bitcoin network analysisRudimentary bitcoin network analysis
Rudimentary bitcoin network analysis
 
Rich gets richer-Bitcoin Network
Rich gets richer-Bitcoin NetworkRich gets richer-Bitcoin Network
Rich gets richer-Bitcoin Network
 
Bitcoin tech talk @Purdue Bitcoin Club
Bitcoin tech talk @Purdue Bitcoin ClubBitcoin tech talk @Purdue Bitcoin Club
Bitcoin tech talk @Purdue Bitcoin Club
 
Bitcoin Network Analysis
Bitcoin Network AnalysisBitcoin Network Analysis
Bitcoin Network Analysis
 
Bitcoin & Bitcoin Mining
Bitcoin & Bitcoin MiningBitcoin & Bitcoin Mining
Bitcoin & Bitcoin Mining
 
The true measure of success
The true measure of successThe true measure of success
The true measure of success
 

Recently uploaded

Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Joonhun Lee
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLkantirani197
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...ssuser79fe74
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...Lokesh Kothari
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Monika Rani
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...chandars293
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxFarihaAbdulRasheed
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 

Recently uploaded (20)

Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL

Paleo-environmental, biodiversity, and macro-evolutionary data mining and deep learning

  • 1. Deep time Paleo-environmental and Bio-diversity data mining and deep learning classification
    Abdullah Khan Zehady, PhD student @ Earth, Atmospheric & Planetary Science, Purdue University
  • 2. Research Projects
    1. Macro- and micro-scale evolution of planktonic foraminifera and their potential drivers during the Cenozoic Era (since 66.04 Ma).
       Hypothesis: “Rates of evolution are correlated with rates of geochemical and sea-level change.”
       - Can we find long-term (2-5 myr) astronomical cycles?
    2. Periodicities and other cause-effect relationships among pulses of evolution since the Cambrian Period (541 Ma).
       Hypotheses:
       (a) “The Earth has had semi-periodic episodes of unusual surface/biological change.”
       - Abundance of events over time; verification of catastrophe models (20-60 myr periods); causation by impacts and other catastrophic events.
       (b) “Pulses of biological evolution occur simultaneously with global changes in sediment facies.”
    3. Automated fossil image recognition by feature extraction using a deep neural network.
    4. Effect of climate change on cultural turnover over the last 2000 years of human civilization.
       Hypothesis: “A major factor in the rise and fall of human civilization in different continents is climate cooling.”
  • 3. My papers on the evolutionary tree visualization algorithm
    To be submitted (under review) at BMC Evolutionary Biology
    To be submitted at Nature Methods
  • 4. Morphospecies and Lineage/Species evolution (Fordham, Zehady et al 2018)
  • 5. Species Phenon Integrated Tree (Zehady et al 2019)
  • 6. Mutual Learning using the Integrated Tree: comparison and learning between the integrated tree and the molecular tree
  • 7. PaleoEnvironmental & Bio-Diversity Data
    Dataset / Significance
    1. Marine genera ranges and the derived turnover time series for the whole Phanerozoic / Marine biodiversity, turnover
    2. Cenozoic planktonic foraminifera evolution and turnover time series (Zehady, Fordham 2018; Aze et al 2011) / Foraminifer diversity, turnover
    3. Oxygen-18 (δ18O) curves and events (Cramer 2009) / Long-term cooling of the ocean interior
    4. Carbon-13 (δ13C) curves and events (Cramer 2009) / Terrestrial climate proxy record (gradual sinking of organic matter)
    5. Strontium isotope record (87Sr/86Sr) / Tectonic evolution, continental spreading, origin and evolution of igneous rocks (magmatic style)
    6. Sulphur isotope record (δ34S sulphate) / Link with LIPs
    7. Ages and volume extent of LIPs (Large Igneous Provinces) / Large-volume gas release into the ocean-atmosphere system, link with magmatism, mass extinction, extinction cyclicity (Melott 2012)
    8. Sea-level synthesis curve (Haq et al 2014) / Plate motion, changes in continental mass distribution
    9. Passive margin
  • 8. Grand Cycles in the Paleozoic, Mesozoic, and Cenozoic Eras
  • 9. Orbital forcing: Milankovitch cycles. Amplitude modulation, 2.4 myr eccentricity
  • 10. Marine genera turnover data
    Entire Phanerozoic (probability); Cenozoic only (number of turnover events)
    Raw turnover probability = raw speciation probability + raw extinction probability
    To reduce stochastic noise, fit a Hidden Markov Model (HMM). Markov property: each state depends only on the previous state.
    Parameters are estimated with the Baum-Welch algorithm; AIC is used to select the number of speciation and extinction states.
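The HMM smoothing on slide 10 can be sketched as follows. This is a minimal sketch, not the author's code: it assumes Gaussian emissions (the slide does not say which emission model was used) and shows only the scaled forward algorithm, which returns the log-likelihood that each Baum-Welch iteration increases, plus an AIC helper for comparing models with different numbers of hidden states k. The backward pass and the Baum-Welch re-estimation updates are omitted, and all function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Normal density; the Gaussian emission model is an assumption."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def forward_log_likelihood(obs, start, trans, means, variances):
    """Scaled forward algorithm: returns log P(obs | HMM parameters).

    start[i]    initial probability of hidden state i
    trans[i][j] probability of moving from state i to state j
    means, variances: Gaussian emission parameters of each state
    """
    k = len(start)
    alpha = [start[i] * gaussian_pdf(obs[0], means[i], variances[i]) for i in range(k)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # alpha_t(j) = sum_i alpha_{t-1}(i) * a_ij * b_j(o_t)
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(k))
                * gaussian_pdf(obs[t], means[j], variances[j])
                for j in range(k)
            ]
        scale = sum(alpha)  # rescale at each step to avoid numerical underflow
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)  # the log-likelihood is the sum of the log scales
    return log_lik


def n_free_parameters(k):
    """(k-1) initial probs + k(k-1) transition probs + k means + k variances."""
    return (k - 1) + k * (k - 1) + 2 * k


def aic(log_lik, k):
    """Akaike information criterion: 2 * (number of parameters) - 2 * log-likelihood."""
    return 2 * n_free_parameters(k) - 2 * log_lik
```

To choose k, one would fit models with k = 1, 2, 3, … states to the raw turnover probability series (for example with an HMM library's Baum-Welch fit) and keep the fit with the lowest AIC.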
  • 11. Multi-taper spectrum – whole Phanerozoic
    Number of significant F-test peaks identified = 9
    ID / Frequency (cycles/myr) / Period (myr) / Harmonic CL (%) / Red-noise CL (%)
    1 / 0.1504065 / 6.648649 / 98.00066 / 69.56518
    2 / 0.2524021 / 3.961933 / 96.46992 / 98.53005
    3 / 0.27051 / 3.696721 / 93.19279 / 84.04573
    4 / 0.2775314 / 3.603196 / 94.98532 / 97.86011
    5 / 0.2852919 / 3.505181 / 93.26527 / 59.50705
    6 / 0.2908352 / 3.438374 / 90.65819 / 99.49677
    7 / 0.3510717 / 2.848421 / 98.56435 / 80.32858
    8 / 0.4035477 / 2.478022 / 97.29846 / 68.29225
    9 / 0.4903917 / 2.039186 / 95.83189 / 93.04155
  • 12. Multi-taper spectrum – Cenozoic only (0-67 Ma)
    Number of significant F-test peaks identified = 1
    ID / Frequency (cycles/myr) / Period (myr) / Harmonic CL (%) / Red-noise CL (%)
    1 / 0.2787879 / 3.586957 / 99.95627 / 92.50397
  • 13. Multi-taper spectrum – Mesozoic only (67-252 Ma)
    Number of significant F-test peaks identified = 3
    ID / Frequency (cycles/myr) / Period (myr) / Harmonic CL (%) / Red-noise CL (%)
    1 / 0.01079914 / 92.6 / 99.45956 / 67.96132
    2 / 0.3401728 / 2.939683 / 99.03912 / 56.54744
    3 / 0.3704104 / 2.699708 / 94.96196 / 88.81204
  • 14. Multi-taper spectrum – Paleozoic only (252-541 Ma)
    Number of significant F-test peaks identified = 7
    ID / Frequency (cycles/myr) / Period (myr) / Harmonic CL (%) / Red-noise CL (%)
    1 / 0.004149378 / 241 / 93.49593 / 78.74621
    2 / 0.01521438 / 65.72727 / 91.12898 / 93.60384
    3 / 0.1500692 / 6.663594 / 91.39851 / 88.02085
    4 / 0.2524205 / 3.961644 / 94.03579 / 98.97888
    5 / 0.2904564 / 3.442857 / 93.06378 / 89.31391
    6 / 0.3506224 / 2.852071 / 96.671 / 84.24726
    7 / 0.4439834 / 2.252336 / 90.65441 / 92.18804
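The spectra on slides 11-14 do not say which tapers were used. Thomson's multi-taper method normally uses Slepian (DPSS) tapers and adds the harmonic F-test. As a stand-in that needs only the standard library, the sketch below uses sine tapers (Riedel and Sidorenko) and averages K tapered periodograms on the Fourier grid. It does not compute the F-test or the red-noise confidence levels, and the function names are hypothetical. The Period column is simply 1 / Frequency (for example 1 / 0.1504065 ≈ 6.65 myr).

```python
import cmath
import math


def sine_tapers(n, k):
    """First k sine tapers (Riedel & Sidorenko), a stand-in for Slepian tapers."""
    norm = math.sqrt(2.0 / (n + 1))
    return [
        [norm * math.sin(math.pi * (j + 1) * (t + 1) / (n + 1)) for t in range(n)]
        for j in range(k)
    ]


def multitaper_spectrum(x, k=3, dt=1.0):
    """Average of k tapered periodograms at the Fourier frequencies m / (n * dt).

    dt is the sampling interval (e.g. 1 myr), so frequencies come out in cycles
    per unit of dt and the corresponding period is 1 / frequency.
    """
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]  # remove the mean so the zero-frequency bin does not dominate
    tapers = sine_tapers(n, k)
    freqs, power = [], []
    for m in range(n // 2 + 1):
        f = m / n  # cycles per sample
        total = 0.0
        for taper in tapers:
            z = sum(taper[t] * xc[t] * cmath.exp(-2j * math.pi * f * t) for t in range(n))
            total += abs(z) ** 2
        freqs.append(f / dt)
        power.append(total / k)
    return freqs, power


if __name__ == "__main__":
    # Synthetic series sampled every 1 myr with a 4-myr cycle.
    series = [math.cos(2 * math.pi * t / 4.0) for t in range(200)]
    freqs, power = multitaper_spectrum(series, k=3, dt=1.0)
    peak = freqs[power.index(max(power))]
    print(f"peak frequency {peak:.3f} cycles/myr, period {1 / peak:.2f} myr")
```

More tapers reduce the variance of the estimate at the cost of a wider frequency band around each peak, which matters when peaks are as close together as IDs 3-6 on slide 11.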
  • 15. Ocean-Atmosphere System
    How are the marine and terrestrial environments connected?
    MaGIC model: geochemistry-based modeling (Arvidson et al 2006)
    f[1]: organic phosphorus (P)
    f[21]: terrestrial organic matter (C) burial flux
    f[50]: sulfate (S) reduction
    f[54]: organic carbon (C) sedimentation
    f[71], f[72]: carbonate precipitation
  • 16. PaleoEnvironmental Data – Oxygen & Carbon: global warming/cooling; organic carbon abundance
  • 17. PaleoEnvironmental Data – Strontium + Passive Margin: continental/sea-floor spreading; transition between oceanic and continental lithosphere via sedimentation on passive margins
  • 18. Large Igneous Provinces: number of Large Igneous Provinces and their volume since 140 Ma
  • 19. PaleoEnvironmental Data – Oxygen & Carbon: regional impact events (crater > 50 km, 5-10 km); sulphur isotope data
  • 20. Biodiversity Data: speciation and extinction of 18,000 marine genera; number of marine genera
  • 21. PaleoEnvironmental & Bio-Diversity Data
    What am I looking for?
    - Synchronous anomalies
    - Long-term oscillations/periodicities
    - Changes in rates
  • 22. Correlation matrix of time-series data
    Variable / genera_ts / genera_prokoph / oxy_18 / carbon_13 / sr87_86 / s34 / LIP / LIP_volume1 / LIP_volume2 / impact / passive_margin / sea level
    genera_ts / 1 / 0.018355819 / -0.0340282 / 0.083035629 / 0.0297944 / -0.0373417 / 0.04421 / 0.344736229 / 0.344635849 / -0.04627802 / 0.104789606 / -0.08349
    genera_prokoph / 0.018355819 / 1 / -0.530246 / -0.465936957 / 0.48878332 / 0.742836318 / -0.083082 / -0.43421673 / -0.437467491 / 0.14855874 / -0.101226221 / -0.41527
    oxy_18 / -0.034028212 / -0.530246 / 1 / 0.237833917 / -0.8404705 / -0.62891252 / -0.0204 / 0.281461527 / 0.285603444 / -0.15390199 / 0.118727266 / 0.849575
    carbon_13 / 0.083035629 / -0.465936957 / 0.23783392 / 1 / -0.1983475 / -0.37694399 / 0.1964381 / 0.17408195 / 0.181388307 / -0.11451248 / -0.0444354 / 0.178451
    sr87_86 / 0.029794405 / 0.48878332 / -0.8404705 / -0.19834753 / 1 / 0.453619955 / -0.08259 / -0.19061375 / -0.194846829 / 0.17851698 / -0.218495446 / -0.9094
    s34 / -0.037341696 / 0.742836318 / -0.6289125 / -0.376943992 / 0.45361995 / 1 / -0.106022 / -0.45775826 / -0.464132734 / 0.10039699 / -0.170888249 / -0.46339
    LIP / 0.04420999 / -0.083082314 / -0.0204002 / 0.196438071 / -0.0825905 / -0.10602187 / 1 / 0.118074692 / 0.121020407 / -0.12452535 / -0.042236671 / 0.072872
    LIP_volume1 / 0.344736229 / -0.434216728 / 0.28146153 / 0.17408195 / -0.1906138 / -0.45775826 / 0.1180747 / 1 / 0.999783534 / 0.04981638 / -0.005744232 / 0.202309
    LIP_volume2 / 0.344635849 / -0.437467491 / 0.28560344 / 0.181388307 / -0.1948468 / -0.46413273 / 0.1210204 / 0.999783534 / 1 / 0.04848648 / -0.00962115 / 0.207276
    impact / -0.046278015 / 0.148558742 / -0.153902 / -0.114512478 / 0.17851698 / 0.100396987 / -0.124525 / 0.049816378 / 0.04848648 / 1 / 0.061153414 / -0.25373
    passive_margin / 0.104789606 / -0.101226221 / 0.11872727 / -0.0444354 / -0.2184954 / -0.17088825 / -0.042237 / -0.00574423 / -0.00962115 / 0.06115341 / 1 / 0.117477
    sea level / -0.083488878 / -0.415267589 / 0.84957453 / 0.178450982 / -0.9093977 / -0.46338895 / 0.0728716 / 0.202309075 / 0.207276424 / -0.25373116 / 0.1174771 / 1
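A matrix like the one on slide 22 can be computed with Pearson's r. The sketch below assumes every series has already been resampled onto the same time bins (the slides do not describe how the series were aligned). `strong_pairs` is a hypothetical helper that lists pairs above a threshold. With |r| ≥ 0.8 on the slide's values it would flag LIP_volume1/LIP_volume2 (0.9998), sr87_86/sea level (-0.909), oxy_18/sea level (0.850), and oxy_18/sr87_86 (-0.840). This suggests the two LIP volume series are nearly redundant.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(series):
    """series: dict name -> list of values on a common time grid (assumed aligned)."""
    names = list(series)
    matrix = [[pearson(series[a], series[b]) for b in names] for a in names]
    return names, matrix


def strong_pairs(names, matrix, threshold=0.8):
    """Hypothetical helper: list (name_i, name_j, r) with |r| >= threshold and i < j."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(matrix[i][j]) >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs
```

Pearson's r only measures linear, zero-lag association. Strongly autocorrelated time series can show large r by chance, so lagged or detrended comparisons are a natural follow-up.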
  • 23. Principal Component Analysis of Cenozoic data
    PCA transforms correlated data into a new coordinate system in which the new variables are uncorrelated. The goal of PCA is to find components Z = [Z_1, Z_2, …, Z_p], each a linear combination Z_k = u_k' X of the original variables X = [X_1, X_2, …, X_p], with weight vectors u_k chosen so that the components achieve maximum variance.
    For the Cenozoic, none of the 12 variables/parameters has missing values.
    X: data matrix of dimension 69 x 12 (n = 69 observations, p = 12 variables)
    Y: normalized (standardized) version of X
    C: covariance matrix of Y, C = t(Y) %*% Y / (n - 1)
    Eigenvalue decomposition in R: E = eigen(C); its eigenvector matrix Ev satisfies t(Ev) %*% Ev = I
    EOF (Empirical Orthogonal Function): orthogonal basis functions, i.e. the eigenvectors
    Ev: eigenvector matrix
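Here is a minimal, dependency-free sketch of the PCA steps on slide 23, written in Python rather than R. It standardizes X to Y, forms C = Y'Y / (n - 1), eigen-decomposes C (here with the cyclic Jacobi method, standing in for R's `eigen()`), and projects Y onto the eigenvectors to get the component scores. The function names are illustrative, and the Jacobi routine is meant for small symmetric matrices such as a 12 x 12 correlation matrix. For real work, `numpy.linalg.eigh` or R's `prcomp()` would be the usual choice.

```python
import math


def jacobi_eigen(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues/vectors of a small symmetric matrix by cyclic Jacobi rotations.

    Returns (values, vectors): values in decreasing order and the matching unit
    eigenvectors stored as the columns of `vectors`.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q of A and V: X <- X P
                    for mat in (a, v):
                        xp, xq = mat[k][p], mat[k][q]
                        mat[k][p] = c * xp - s * xq
                        mat[k][q] = s * xp + c * xq
                for k in range(n):  # rows p, q of A: A <- P^T A
                    ap, aq = a[p][k], a[q][k]
                    a[p][k] = c * ap - s * aq
                    a[q][k] = s * ap + c * aq
                a[p][q] = a[q][p] = 0.0  # this rotation annihilates a_pq analytically
    values = [a[i][i] for i in range(n)]
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    vectors = [[v[r][i] for i in order] for r in range(n)]
    return [values[i] for i in order], vectors


def standardize(x):
    """Centre each column and scale it to unit sample standard deviation (X -> Y)."""
    n, p = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in x) / (n - 1)) for j in range(p)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in x]


def pca(x):
    """Eigenvalues, eigenvectors (loadings as columns), scores and explained-variance fractions."""
    y = standardize(x)
    n, p = len(y), len(y[0])
    c = [[sum(y[i][r] * y[i][s] for i in range(n)) / (n - 1) for s in range(p)] for r in range(p)]
    values, vectors = jacobi_eigen(c)
    scores = [[sum(y[i][j] * vectors[j][k] for j in range(p)) for k in range(p)] for i in range(n)]
    explained = [val / sum(values) for val in values]
    return values, vectors, scores, explained
```

Because Y is standardized, C is the correlation matrix, so variables measured in different units (per mil, ratios, counts) contribute on an equal footing.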
  • 24. Principal Component Analysis – loadings: the loadings table is composed of the principal component vectors
  • 25. Principal Component Analysis of Cenozoic data: Principal Component Vectors 1 and 2
  • 26. Principal Component Analysis of Cenozoic data: Principal Component Vectors 3 and 4
  • 27. Principal Component Analysis of Cenozoic data: Principal Component Vectors 5 and 6
  • 28. Automated Fossil Genus Classification
    Examples: Hedbergella sigali, Globigerinoides altiaperturus
    Binomial nomenclature system: Genus species
    Can we extract features from a specimen image to identify its genus?
  • 29. Automated Fossil Genus Classification
    Image data / Number of images / Accuracy with best model so far
    Training (transformed images) / 1947 / ~74%
    Validation / 649 / ~55%
    Test (previously seen species) / 236 / ~90%
    Unseen test (totally new species) / 37 / ~43%
    Multi-class classification: 79 distinct genus classes
    Comparison of three CNN (VGG19) models; cross-entropy loss minimized with the Adam optimizer
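Slide 29 names the training objective but not the code. The sketch below (plain Python, hypothetical function names) shows what categorical cross-entropy and a single Adam update compute. It is not the VGG19 training pipeline. The hyperparameters are the commonly used Adam defaults (lr = 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8), which the slides do not report. With 79 equally likely classes the loss is ln 79 ≈ 4.37, a useful baseline when reading training curves.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(logits, true_index):
    """Cross-entropy against a one-hot target: -log p(true class)."""
    return -math.log(softmax(logits)[true_index])


def cross_entropy_grad(logits, true_index):
    """Gradient of the cross-entropy with respect to the logits: softmax - one_hot."""
    probs = softmax(logits)
    return [p - (1.0 if i == true_index else 0.0) for i, p in enumerate(probs)]


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


if __name__ == "__main__":
    logits = [0.0] * 79  # 79 genus classes, all equally likely
    print(categorical_cross_entropy(logits, 0))  # ln(79), about 4.37
```

Because Adam normalizes each step by the running gradient magnitude, the first update moves every parameter by roughly lr regardless of the gradient's scale.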
  • 30. Automated Fossil Genus Classification: Paragloborotalia predicted as Turborotalia. Most likely cause of misclassification: the Turborotalia training images
  • 31. Misclassification Analysis - 1 Globigerina as Globigerinoides
  • 32. Misclassification Analysis – 2 Globigerinoides as Globorotalia
  • 33. Misclassification Analysis – 3 (Reverse of 2) Globorotalia as Globigerinoides The Black hole in the center!!
  • 34. Misclassification Analysis – 4 Globorotalia as Globoturborotalita
  • 35. Misclassification Analysis – 5 Globoturborotalita as Globigerina
  • 36. Misclassification Analysis – 6 Globuligerina as Pseudohastigerina
  • 37. What is each CNN layer learning? Layer 2 (Block2_Conv2): mostly directional. Layer 3 (Block3_Conv2). 10 random filters shown for each layer
  • 38. What is each CNN layer learning? Layer 2 (Block2_Conv2): mostly directional. Layer 3 (Block3_Conv2). 10 random filters shown for each layer
  • 39. How does the VGG19 CNN model discriminate between genera? What are the final abstractions (class-specific outputs) for each of the 79 classes? Unique features used for classification, with categorical cross-entropy: Globigerina, Globigerinoides, Globorotalia, Sigalia
  • 40. Timeline (Month/Year: prioritized items)
    Jan 2019 – Apr 2019: submission of the evolutionary tree visualization paper; submission of the Species-Phenon Integrated Tree paper; completion of the culture paper
    Feb 2019: NSF proposal resubmission
    Jan 2019 – May 2019: Phanerozoic data extraction and visualization; correlation/causality and spectral analysis of Phanerozoic (500 myr) data; learning cross-phase, phase-lag, and causality analysis; automatic fossil image detection project
    May 2019 – Aug 2019: internship at Cisco Systems in their Data Center
    Nov 2019: poster and paper presentation on machine-learning and artificial-intelligence applications in the geosciences (GSA Annual Meeting); other machine learning journals
  • 43. Multi-taper spectrum – first 300 myr of the Phanerozoic
    Number of significant F-test peaks identified = 9
    ID / Frequency (cycles/myr) / Period (myr) / Harmonic CL (%) / Red-noise CL (%)
    1 / 0.01136364 / 88 / 96.39839 / 95.99728
    2 / 0.1557487 / 6.420601 / 94.82795 / 86.59931
    3 / 0.1657754 / 6.032258 / 98.66033 / 94.32486
    4 / 0.2159091 / 4.631579 / 97.50714 / 82.3273
    5 / 0.2332888 / 4.286533 / 97.22287 / 68.10389
    6 / 0.3709893 / 2.695495 / 98.40121 / 86.48012
    7 / 0.3923797 / 2.548552 / 93.06953 / 89.62598
    8 / 0.3977273 / 2.514286 / 93.58799 / 91.69948
    9 / 0.4946524 / 2.021622 / 96.30716 / 82.51252
  • 44. Multi-taper spectrum – 300-600 myr of the Phanerozoic
    Number of significant F-test peaks identified = 8
    ID / Frequency (cycles/myr) / Period (myr) / Harmonic CL (%) / Red-noise CL (%)
    1 / 0.01570248 / 63.68421 / 90.13648 / 92.73254
    2 / 0.1429752 / 6.99422 / 91.96099 / 97.77526
    3 / 0.246281 / 4.060403 / 91.6426 / 95.21544
    4 / 0.2520661 / 3.967213 / 93.09128 / 97.57965
    5 / 0.2917355 / 3.427762 / 90.01668 / 84.54075
    6 / 0.3504132 / 2.853774 / 95.51884 / 82.62293
    7 / 0.3628099 / 2.756264 / 97.1726 / 71.84603
    8 / 0.4438017 / 2.253259 / 97.93826 / 87.35393
  • 45. Principal Component Analysis of Cenozoic data: Principal Component Vectors 7 and 8
  • 46. Principal Component Analysis of Cenozoic data: Principal Component Vectors 9 and 10
  • 47. Principal Component Analysis of Cenozoic data: Principal Component Vectors 11 and 12