Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection of Exoplanets
1. The World’s Fastest Time-Series Database
Esperanza López Aguilera
29 March 2019
Using a Bayesian Neural Network in the Detection of Exoplanets
2.
What’s kdb+?
● The World’s Fastest Time-Series Database
○ In-memory computing
○ Streaming analytics
● Q Language
○ Functional
○ Array-based
○ Primitive temporal datatypes
○ Tables are a first-class datatype
○ qSQL
○ Lambda architecture out of the box
○ Scarily efficient
○ Syntax highlighting for q
3.
What’s kdb+?
Extremely fast
Elegant and concise
4.
Applications
What do they have in common?
Data: time series
Machine learning solutions
5.
NASA Frontier Development Lab
● Applied AI research accelerator
● Hosted by:
○ SETI Institute
○ NASA Ames Research Center
● 2-month programme
● 7 challenges
● Involved in 2 challenges:
○ Space weather:
■ How solar activity impacts Earth
■ Paper: https://code.kx.com/q/wp/space-weather/
○ Exoplanets:
■ Find new planet candidates
■ Paper: https://code.kx.com/q/wp/exoplanets/
6.
Exoplanets Challenge - TESS
● Launched in April 2018
● 2-year mission
● 26 sectors:
○ 27 days per sector
● Objective: discovering new exoplanets in orbit around the brightest stars in the solar neighborhood
7.
Exoplanets challenge - Transits
How do we detect exoplanets?
Transits
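A transit is the small, periodic dip in a star's brightness when a planet crosses in front of it; ignoring limb darkening, the fractional depth is roughly (Rp/R*)², the squared ratio of planet to stellar radius. The sketch below is an idealised box-shaped transit model (a simplification, not the pipeline used in the challenge); the function name `box_transit` and the planet parameters are hypothetical, chosen only to show how epoch, period and duration define the signal.

```python
import numpy as np

def box_transit(t, period, epoch, duration, depth):
    """Idealised box-shaped transit: relative flux is 1 out of transit
    and 1 - depth while the planet crosses the stellar disc."""
    # Time from the nearest transit midpoint, in [-period/2, period/2)
    dt = (t - epoch + 0.5 * period) % period - 0.5 * period
    in_transit = np.abs(dt) < 0.5 * duration
    return np.where(in_transit, 1.0 - depth, 1.0)

# Hypothetical planet with radius ratio Rp/R* = 0.1, so depth ~ 0.1**2 = 1%
t = np.linspace(0.0, 27.0, 1297)  # one 27-day sector sampled every 30 minutes
flux = box_transit(t, period=5.0, epoch=1.0, duration=0.2, depth=0.1 ** 2)
```

Real transits have rounded ingress and egress, which is one reason simple thresholding struggles against the look-alike signals on the next slide.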
8.
But it’s not so easy...
● Background eclipsing binaries
● Eclipsing binaries
● Stellar activity
9.
Data
Images taken at a given frequency
Target stars
Optimal set of pixels representing each star
Aggregate brightness extracted
Noise, trends and other factors removed
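The extraction steps above can be sketched in a few lines: sum the pixels in each star's aperture for every image, then divide out slow trends. This is a minimal illustration under stated assumptions: the aperture mask is taken as already chosen (the "optimal set of pixels" step is not reproduced), and a running median is used as one simple detrending choice, not necessarily the one the mission pipeline applies. `aperture_flux` and `detrend` are hypothetical names.

```python
import numpy as np

def aperture_flux(frames, mask):
    """Sum the pixels inside the aperture for every frame.
    frames: (n_frames, ny, nx) array; mask: boolean (ny, nx) array."""
    return frames[:, mask].sum(axis=1)

def detrend(flux, window=49):
    """Divide by a running median (odd window) so slow trends are removed
    and the light curve is normalised around 1."""
    half = window // 2
    padded = np.pad(flux, half, mode="edge")  # repeat end values at the edges
    trend = np.array([np.median(padded[i:i + window]) for i in range(len(flux))])
    return flux / trend

# Toy usage: 200 frames of a 5x5 cutout with a 3x3 aperture around the star
rng = np.random.default_rng(1)
frames = 100.0 + rng.normal(0.0, 1.0, size=(200, 5, 5))
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True
light_curve = detrend(aperture_flux(frames, mask))
```

A median is used rather than a mean so that a short transit dip barely moves the estimated trend.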
● Simulated data
○ 4 sectors
● 64,000 target stars
● Strong signal found in 9,139 stars
● 19,577 TCEs (Threshold Crossing Events), or planet candidates
● Optimal parameters inferred:
○ Epoch
○ Period
○ Duration
○ ...
● Issue: many false positives
Corrected flux (light curve)
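Once a TCE's period and epoch are known, a common way to inspect it is to phase-fold the light curve so every transit lines up at phase 0, then average into fixed bins to get a fixed-length "view". The sketch assumes this is how candidates are turned into model inputs; the talk does not spell out its exact preprocessing, and `fold`, `binned` and the toy parameters are hypothetical.

```python
import numpy as np

def fold(t, flux, period, epoch):
    """Phase-fold a light curve on a TCE's period and epoch.
    Returns phase in [-0.5, 0.5) (transit centred at 0) and flux, sorted by phase."""
    phase = ((t - epoch) / period + 0.5) % 1.0 - 0.5
    order = np.argsort(phase)
    return phase[order], flux[order]

def binned(phase, flux, n_bins=201):
    """Average folded flux in equal-width phase bins (NaN for empty bins)."""
    edges = np.linspace(-0.5, 0.5, n_bins + 1)
    idx = np.clip(np.digitize(phase, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=flux, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts

# Toy usage with made-up TCE parameters (period 5 d, epoch 1 d)
t = np.arange(0.0, 27.0, 0.02)
flux = np.where(np.abs(((t - 1.0) / 5.0 + 0.5) % 1.0 - 0.5) < 0.01, 0.99, 1.0)
phase, folded = fold(t, flux, period=5.0, epoch=1.0)
view = binned(phase, folded)
```

The duration parameter could additionally be used to zoom in on a narrow window around phase 0; that step is left out here.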
14.
Classification techniques
● Humans looking at light curves
○ Statistical methods used
○ Too many hours
● Complex models
○ Several inputs
○ Time-consuming
○ Intensive preprocessing
15.
Benchmark model: linear classifier
● 77% accurate
● Very low precision
○ 47%
● Uncertainty?
● Confidence?
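Accuracy and precision can diverge sharply when real planets are a small minority of TCEs: a classifier can label most false positives correctly (high accuracy) while many of its "planet" calls are still wrong (low precision). The helper below computes the three metrics reported in this talk from binary labels; the counts in the example are made up to illustrate the effect and are not the benchmark's data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision and sensitivity (recall) for binary labels,
    where 1 = planet and 0 = false positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0    # share of "planet" calls that are right
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # share of real planets found
    return accuracy, precision, sensitivity

# Made-up, imbalanced example: 10 planets among 100 TCEs
y_true = [1] * 10 + [0] * 90
y_pred = [1] * 6 + [0] * 4 + [1] * 9 + [0] * 81
accuracy, precision, sensitivity = classification_metrics(y_true, y_pred)
# accuracy = 87/100 = 0.87, precision = 6/15 = 0.4, sensitivity = 6/10 = 0.6
```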
16.
Bayesian Neural Network
Stochastic model + Neural Network
Probabilistic confidence in predictions
● Weights follow a distribution
● Train the distribution parameters instead of the weights
● Result: a distribution of probabilities
● Several criteria for decision making
○ Standard deviation
○ Mean
○ ...
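To make "weights follow a distribution" concrete, here is a minimal NumPy sketch of prediction with a tiny Bayesian network: each weight is Gaussian with a learnable mean `mu` and a spread `softplus(rho)`, and every forward pass draws fresh weights, so one input yields a sample of probabilities rather than a single value. The mean-field Gaussian with softplus parameterisation is an assumption (a common choice in variational approaches), training is not shown, and all names and parameter values are hypothetical rather than the talk's actual network.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps standard deviations positive
    return np.logaddexp(0.0, x)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_weights(mu, rho, rng):
    """Draw one weight array from Normal(mu, softplus(rho)**2)."""
    return mu + softplus(rho) * rng.standard_normal(mu.shape)

def predict_samples(x, params, n_samples=500, seed=0):
    """Monte Carlo predictions of a one-hidden-layer Bayesian network.
    Each forward pass uses freshly sampled weights."""
    rng = np.random.default_rng(seed)
    probs = np.empty(n_samples)
    for s in range(n_samples):
        w1 = sample_weights(params["mu1"], params["rho1"], rng)  # (hidden, features)
        w2 = sample_weights(params["mu2"], params["rho2"], rng)  # (hidden,)
        probs[s] = sigmoid(w2 @ np.tanh(w1 @ x))
    return probs

# Hypothetical "trained" parameters: 4 input features, 8 hidden units
rng = np.random.default_rng(42)
params = {
    "mu1": rng.normal(size=(8, 4)), "rho1": np.full((8, 4), -3.0),
    "mu2": rng.normal(size=8), "rho2": np.full(8, -3.0),
}
x = np.array([0.2, -1.0, 0.5, 0.1])  # made-up features for one TCE
probs = predict_samples(x, params)
mean, std = probs.mean(), probs.std()  # two possible decision criteria
```

Shrinking `rho` towards large negative values collapses the spread, and the model behaves like an ordinary network with weights `mu`.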
17.
Bayesian Neural Network
Oversampling: random sample of the positive class
Build network: define architecture and parameters
EmbedPy
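Random oversampling of the positive class can be sketched as resampling planet examples with replacement until both classes are the same size. This assumes planets are the minority class and that balancing to a 1:1 ratio is the goal; the talk does not state the target ratio, and `oversample_positive` is a hypothetical name.

```python
import numpy as np

def oversample_positive(X, y, seed=0):
    """Randomly resample (with replacement) the positive class until it
    matches the size of the negative class, then shuffle.
    Assumes label 1 is the minority class."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    extra = rng.choice(pos, size=len(neg) - len(pos), replace=True)
    idx = rng.permutation(np.concatenate([neg, pos, extra]))
    return X[idx], y[idx]
```

Oversampling is normally applied to the training split only, so that duplicated planets do not leak into the data used to measure accuracy and precision.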
18.
Results - Performance
● Outputs of the BNN:
○ Probability of being a planet
○ Sample of 500 probabilities per input
● Decision based on:
○ Average probability
○ P > 0.5 ⇒ Planet
○ Flexibility
● Metrics:
○ Accuracy: 91%
○ Precision: 83%
○ Sensitivity: 68%
Probability sample of one TCE
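The decision rule on this slide (average of the sampled probabilities, planet if above 0.5) takes only a few lines, and the same sample allows the flexibility mentioned: for example, withholding a verdict when the spread is too wide. The `max_std` option below is a hypothetical illustration of such an extra criterion, not a rule from the talk.

```python
import numpy as np

def decide(samples, threshold=0.5, max_std=None):
    """Turn one TCE's sampled planet probabilities into a label.
    Returns (label, mean, std); label is None if the spread exceeds max_std."""
    mean = float(np.mean(samples))
    std = float(np.std(samples))
    if max_std is not None and std > max_std:
        return None, mean, std  # too uncertain: leave for human vetting
    return mean > threshold, mean, std
```

Raising `threshold` trades sensitivity for precision, which is the lever the sample-based output makes easy to tune.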