Josh Bloom
UC Berkeley Astronomy
@profjsb
Autoencoding RNN for inference on
unevenly sampled time-series data
Data Driven Discovery Investigator
Workshop on Applying Advanced AI Workflows
In Astronomy and Microscopy
11 Sept 2018 (UCSC, Santa Clara)
(Ever) Increasing need for ML methods
in Time-Domain Astronomy
Discovery in images:
Real or spurious sources?
Bloom+12, Goldstein+16, …
Inference: What is
this event and is it
worth following up?
Levitan+14
Surrogate modelling &
parameter estimation
Supernova (Thomas/Nugent);
Exoplanets (Ford+11)
Supernova Discovery in the Pinwheel Galaxy
11 hr after explosion
nearest SN Ia in >3 decades
ML-assisted discovery
©Peter Nugent
Nugent+11, Li, Bloom+12, Bloom+12…
Probabilistic Classification of
50k+ Variable Stars
Shivvers, JSB, Richards, MNRAS 2014
106 “DEB” candidates
12 new
mass-radii
15 “RCB/DYP” candidates
8 new discoveries
Triple # of
Galactic
DY Per Stars
Miller, Richards, JSB, … ApJ 2012
5400
Spectroscopic
Targets
Miller, JSB, Richards, … ApJ 2015
Turn synoptic
imagers into
~spectrographs
Challenges with Traditional ("Hand-Crafted Featurization")
Approaches
• Feature engineering is expensive (people/compute), needs
a lot of domain knowledge
• "Small data" domain with only 1000s of labelled training
examples
• Traditional ML techniques don't account for feature
uncertainty
• Ideally would like to learn on one survey and apply that
knowledge to another (e.g., ASAS→ZTF→LSST)
https://github.com/cesium-ml/cesium
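For contrast with the learned approach, here is a minimal sketch of what hand-crafted featurization can look like. The feature choices (amplitude, error-weighted mean, skew, median cadence) and the function name `handcrafted_features` are illustrative assumptions; they are not the feature set used by cesium or in the talk.

```python
import numpy as np

def handcrafted_features(t, m, e):
    """Compute a few illustrative summary features of one light curve.

    t, m, e: 1-D arrays of observation times, magnitudes, and 1-sigma errors.
    """
    t, m, e = (np.asarray(a, dtype=float) for a in (t, m, e))
    w = 1.0 / e**2                       # inverse-variance weights
    std = np.std(m)
    return {
        "n_obs": int(m.size),
        "amplitude": float(0.5 * (m.max() - m.min())),
        "weighted_mean": float(np.sum(w * m) / np.sum(w)),
        "std": float(std),
        "median_abs_dev": float(np.median(np.abs(m - np.median(m)))),
        "skew": float(np.mean((m - m.mean()) ** 3) / std**3) if std > 0 else 0.0,
        "median_cadence": float(np.median(np.diff(np.sort(t)))),
    }

# Synthetic, unevenly sampled sinusoidal variable (made-up numbers).
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 100.0, size=200))
e = np.full(200, 0.05)
m = 15.0 + 0.3 * np.sin(2 * np.pi * t / 3.7) + rng.normal(0.0, e)
features = handcrafted_features(t, m, e)
```

Each feature here is a design decision made by a person, and only the weighted mean uses the measurement errors, which illustrates the cost and uncertainty issues listed on the slide.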
1. Build an autoencoder network to
learn to reproduce irregularly sampled
light curves using an information
bottleneck (B)
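[Diagram: encoder E maps a light curve to the bottleneck B; decoder D maps B back to an approximation of that light curve]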
2. Use B as features and learn a
traditional classifier (random forest)
len(B) = 64
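To make the fixed-length bottleneck concrete, the sketch below runs the forward pass of a small GRU encoder in plain NumPy. It is a simplification under stated assumptions: a single unidirectional GRU layer with random, untrained weights (the network in the talk uses two bidirectional GRU layers), inputs given as (Δt, magnitude) pairs, and a linear projection of the final hidden state to 64 values. The class name `TinyGRUEncoder` and all sizes other than 64 are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyGRUEncoder:
    """Untrained single-layer GRU mapping a light curve to a 64-vector B."""

    def __init__(self, n_in=2, n_hidden=32, n_bottleneck=64, seed=0):
        rng = np.random.default_rng(seed)
        # Index 0: update gate z, 1: reset gate r, 2: candidate state.
        self.W = rng.normal(0.0, 0.1, (3, n_hidden, n_in))
        self.U = rng.normal(0.0, 0.1, (3, n_hidden, n_hidden))
        self.b = np.zeros((3, n_hidden))
        self.W_out = rng.normal(0.0, 0.1, (n_bottleneck, n_hidden))

    def encode(self, t, m):
        t = np.asarray(t, dtype=float)
        m = np.asarray(m, dtype=float)
        # Assumed input encoding: time since the previous observation
        # (0 for the first point) plus the mean-subtracted magnitude.
        dt = np.diff(t, prepend=t[0])
        x_seq = np.stack([dt, m - m.mean()], axis=1)
        h = np.zeros(self.U.shape[1])
        for x in x_seq:
            z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
            r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
            h_new = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
            h = (1.0 - z) * h + z * h_new
        return self.W_out @ h            # bottleneck vector B

# Two light curves with different lengths and sampling give same-size B.
rng = np.random.default_rng(1)
enc = TinyGRUEncoder()
t_a = np.sort(rng.uniform(0.0, 50.0, 30))
m_a = rng.normal(15.0, 0.2, 30)
t_b = np.sort(rng.uniform(0.0, 50.0, 120))
m_b = rng.normal(15.0, 0.2, 120)
B_a = enc.encode(t_a, m_a)
B_b = enc.encode(t_b, m_b)
```

Because every light curve yields a vector of the same length, the vectors can be stacked into a feature matrix for a conventional classifier such as the random forest named in step 2.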
Example Reconstructions
of the Autoencoder
Bottleneck clearly learns
important features
underlying the "physics"
that generates the data
Results rival best-in-class approaches
Code/Data: https://github.com/bnaul/IrregularTimeSeriesAutoencoderPaper
Novelties & Improvements
Figure 1: Diagram of an RNN encoder/decoder architecture for irregularly sampled time ser
data. This network uses two RNN layers (specifically, bidirectional gated recurrent units (GRU) [6, 2
• Natively handles irregular sampling
• Learning loss accounts for uncertainty
• Natural data augmentation with bootstrap resampling
• Unsupervised feature learning → leverage large corpus of unlabelled light curves
• Transfer learning appears to work
• Learning scales linearly in the number of training examples
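The uncertainty-aware loss and the bootstrap augmentation listed among the novelties can be sketched as follows. Both functions are assumptions about one reasonable form: the loss is a reconstruction error weighted by inverse variance, and the resampling draws observations with replacement, drops repeated draws, and keeps time order. The talk does not spell out either formula, and the names `weighted_mse` and `bootstrap_resample` are hypothetical.

```python
import numpy as np

def weighted_mse(m_true, m_pred, sigma):
    """Reconstruction error in which noisy points count less."""
    m_true, m_pred, sigma = (
        np.asarray(a, dtype=float) for a in (m_true, m_pred, sigma)
    )
    w = 1.0 / sigma**2                   # inverse-variance weights
    return float(np.sum(w * (m_true - m_pred) ** 2) / np.sum(w))

def bootstrap_resample(t, m, sigma, rng):
    """Return a time-ordered random subset of one light curve.

    Indices are drawn with replacement; np.unique then removes repeated
    draws and sorts, so no observation time appears twice.
    """
    n = len(t)
    idx = np.unique(rng.integers(0, n, size=n))
    return t[idx], m[idx], sigma[idx]

# A large miss on a noisy point (sigma = 1.0) is strongly down-weighted.
loss = weighted_mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], [0.1, 0.1, 1.0])
plain = float(np.mean((np.array([1.0, 2.0, 3.0]) - np.array([1.0, 2.0, 5.0])) ** 2))

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, 50))
m = rng.normal(size=50)
sigma = np.full(50, 0.1)
t_bs, m_bs, sigma_bs = bootstrap_resample(t, m, sigma, rng)
```

Because the network takes whatever sampling it is given, each resampled light curve is a valid new training example, which is why the augmentation comes naturally.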
Extensions/Active Research
• Anomaly detection (on the bottleneck features)
• Hyperspectral topology
UMAP applied to
L2-normed autoencoder
for MNIST
Ellie Schwab Abrahams
Also, with Sara Jamal
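One simple way to flag anomalies from bottleneck features is to score each source by its Mahalanobis distance from the population in feature space. This is a sketch of a generic technique, not the method used in this research; the feature matrix below is random stand-in data with one planted outlier, and `mahalanobis_scores` is a hypothetical helper.

```python
import numpy as np

def mahalanobis_scores(B, ridge=1e-6):
    """Distance of each row of B from the sample mean, in covariance units."""
    B = np.asarray(B, dtype=float)
    diff = B - B.mean(axis=0)
    # A small ridge term keeps the covariance matrix invertible.
    cov = np.cov(B, rowvar=False) + ridge * np.eye(B.shape[1])
    solved = np.linalg.solve(cov, diff.T).T
    return np.sqrt(np.sum(diff * solved, axis=1))

rng = np.random.default_rng(1)
B = rng.normal(size=(500, 64))           # stand-in for 500 bottleneck vectors
B[42] += 6.0                             # plant one unusual source
scores = mahalanobis_scores(B)
most_anomalous = int(np.argmax(scores))
```

Sources with the highest scores would be the first candidates for visual inspection or follow-up.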
• New layer types: explore temporal convolutional networks (TCNs)
• Co-training across surveys
• Semi-supervised topology + metadata
Loss ~ L_ts + λ L_class
[Diagram labels: Source Metadata; Source Time series; LSTM; Bottleneck; LSTM; Time series Reconstruction (Unsupervised); FC; Classification (Supervised)]
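The combined objective Loss ~ L_ts + λ L_class can be sketched per source as below. The specific terms are assumptions: L_ts is taken to be a reconstruction error weighted by inverse variance, L_class a cross-entropy on the predicted class probabilities, and unlabelled sources are assumed to contribute only the reconstruction term. The function names are hypothetical.

```python
import numpy as np

def weighted_mse(m_true, m_pred, sigma):
    """Assumed form of L_ts: inverse-variance weighted squared error."""
    m_true, m_pred, sigma = (
        np.asarray(a, dtype=float) for a in (m_true, m_pred, sigma)
    )
    w = 1.0 / sigma**2
    return float(np.sum(w * (m_true - m_pred) ** 2) / np.sum(w))

def cross_entropy(class_probs, label):
    """Assumed form of L_class: negative log-probability of the true class."""
    return float(-np.log(np.asarray(class_probs, dtype=float)[label]))

def total_loss(m_true, m_pred, sigma, class_probs, label=None, lam=1.0):
    """L_ts + lam * L_class; label=None marks an unlabelled source."""
    l_ts = weighted_mse(m_true, m_pred, sigma)
    l_class = 0.0 if label is None else cross_entropy(class_probs, label)
    return l_ts + lam * l_class

m_true, m_pred, sigma = [1.0, 2.0], [1.0, 3.0], [1.0, 1.0]
probs = [0.25, 0.75]
loss_unlabelled = total_loss(m_true, m_pred, sigma, probs, label=None, lam=2.0)
loss_labelled = total_loss(m_true, m_pred, sigma, probs, label=1, lam=2.0)
```

The weight λ sets how strongly the labelled examples shape the bottleneck relative to reconstruction; with λ = 0 the objective reduces to the purely unsupervised autoencoder.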
Thanks!
50k variables, 810 with known labels (timeseries, colors)
Challenge: classification on large sets
Richards+11, 12
