3. [Diagram: data sources feeding the Transients Classification Pipeline]
• Data sources:
  • SDSS stripe-82 archived data
  • PTF / LBL subtraction pipeline
  • SASIR (future)
  • LSST (future)
  • Survey X (real-time survey telescope)
  • Survey Y (static survey repository)
• Pipeline outputs (labels as they appear in the diagram):
  • Database containing “sources”: features for a source; data epochs associated with a source
  • Classify “sources”
  • Broadcast: interesting or transient source; include classifications
  • Database: include features, context
4. SDSS Stripe 82
• A deep field from the Sloan Digital Sky Survey
• 750 million observation epochs
• ~20 million “sources” clustered from epochs
• 5 colors / filters, 4 years of observations
• We used Stripe-82 for testing and development
[Diagram: SDSS stripe-82 archived data -> Transients Classification Pipeline -> Database containing “sources” (features for a source; data epochs associated with a source)]
5. Palomar Transient Factory
• Palomar 48” telescope
• 100 Mpix, 7.8 sq-deg detector
• ~120 s cadence, ~200 MB per exposure, <100 GB/night
• Post subtraction: ~1M difference objects / night
• Post filtering: ~10k difference objects / night
• Of these: hundreds (~100s) of transients and variable stars
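The quoted rates can be checked with a little arithmetic. The sketch below assumes roughly 10 hours of usable observing per night, which is not stated on the slide, and uses decimal units (1 GB = 1000 MB).

```python
# Back-of-the-envelope check of the PTF numbers quoted on this slide.
EXPOSURE_CADENCE_S = 120   # ~120 s between exposures (from the slide)
EXPOSURE_SIZE_MB = 200     # ~200 MB per exposure (from the slide)
NIGHT_LENGTH_H = 10        # ASSUMPTION: usable observing time per night

exposures_per_night = NIGHT_LENGTH_H * 3600 // EXPOSURE_CADENCE_S
nightly_volume_gb = exposures_per_night * EXPOSURE_SIZE_MB / 1000  # decimal GB

# Candidate reduction quoted on the slide.
after_subtraction = 1_000_000  # ~1M difference objects / night
after_filtering = 10_000       # ~10k difference objects / night
filter_rejection = 1 - after_filtering / after_subtraction

print("exposures per night:", exposures_per_night)
print("nightly volume (GB):", nightly_volume_gb)
print("fraction rejected by filtering:", filter_rejection)
```

Under that assumption the nightly volume stays below the quoted 100 GB limit, and filtering discards about 99% of the difference objects before classification.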
[Diagram: LBL subtraction pipeline -> Transients Classification Pipeline (TCP) -> PTF consortium; PAIRITEL 1.3m; Palomar 60”; MDM 1.3m & 2.4m]
6. Next Generation Survey: LSST
Large Synoptic Survey Telescope (LSST):
• 1 Gb every 2 seconds
• 10^6 supernovae/yr
• 10^5 eclipsing systems
• 10^7 asteroids...
• light curves of 800 million sources every 3 days
7. Transients Classification Pipeline
[Diagram: “Object” datastream -> Transients Classification Pipeline (source generation -> feature generation -> source classification) -> Database; Follow-up telescope observations; Broadcast]
8. Parallelized source correlation and classification
• Retrieve difference objects
• Each difference-object is passed to an IPython client
• Each parallel IPython client performs:
  • Source creation or correlation with existing sources
  • “Feature” generation (or re-generation) for that source
  • Classification of that source
[Diagram: source generation -> feature generation -> source classification]
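The three per-object steps can be sketched as follows. This is a minimal illustration, not the TCP code: a standard-library thread pool stands in for the parallel IPython clients, and the `SourceStore` class, the 1 arcsec match box, the feature set, and the threshold rule in `classify` are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev
from threading import Lock

MATCH_BOX_DEG = 1.0 / 3600.0  # ASSUMPTION: 1 arcsec association box


class SourceStore:
    """In-memory stand-in for the source database (hypothetical)."""

    def __init__(self):
        self._lock = Lock()
        self.sources = []  # each: {"ra", "dec", "epochs": [(t, mag), ...]}

    def correlate(self, obj):
        """Add obj to the first source inside the match box, else create one.

        Returns (source_index, snapshot_of_that_source's_epochs).
        """
        with self._lock:
            for idx, src in enumerate(self.sources):
                # Flat-sky comparison: good enough for a sketch.
                if (abs(src["ra"] - obj["ra"]) <= MATCH_BOX_DEG
                        and abs(src["dec"] - obj["dec"]) <= MATCH_BOX_DEG):
                    src["epochs"].append((obj["t"], obj["mag"]))
                    return idx, list(src["epochs"])
            self.sources.append({"ra": obj["ra"], "dec": obj["dec"],
                                 "epochs": [(obj["t"], obj["mag"])]})
            return len(self.sources) - 1, list(self.sources[-1]["epochs"])


def generate_features(epochs):
    """Compute a few toy light-curve features from (time, mag) pairs."""
    mags = [m for _, m in epochs]
    return {"n_epochs": len(mags), "mean_mag": mean(mags),
            "amplitude": max(mags) - min(mags), "scatter": pstdev(mags)}


def classify(features):
    """Placeholder threshold rule, NOT a trained classifier."""
    if features["n_epochs"] < 3:
        return "undersampled"
    return "variable_candidate" if features["amplitude"] > 0.3 else "constant"


def process(store, obj):
    """One worker's job: correlate, (re)generate features, classify."""
    idx, epochs = store.correlate(obj)
    return idx, classify(generate_features(epochs))


def run_pipeline(diff_objects, workers=4):
    """Process every difference object on a pool of workers."""
    store = SourceStore()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda o: process(store, o), diff_objects))
    return store, results
```

Only the correlation step touches shared state, so it is the only part guarded by a lock. Feature generation and classification work on a snapshot of the source's epochs and can run concurrently.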
10. Warehouse of light-curves
• Need representative light-curves for all science classes
• With these we can model each science class
• We’ve built a warehouse of example light-curves
• TCP-TUTOR: internal interface
• DotAstro.org: public interface
13. “Noisifying to the Survey”
• Well-sampled light-curves:
  • Can make good classifiers for well-sampled data.
  • Don’t immediately make good classifiers for noisy, sparse data.
• We need classifiers which are trained using:
  • sampling cadence of our survey
  • sparseness of our survey data
  • noise and sensitivity limitations of our instrument
• We need “Noisification” software which:
  • Resamples well-sampled light-curves
  • Outputs noisified sources which are used for generating classifiers
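A minimal sketch of the noisification idea, under stated assumptions: the template light-curve is resampled at the survey's epochs by linear interpolation, Gaussian magnitude noise is added, and points fainter than a limiting magnitude are dropped. The function names, the noise model, and the default values are hypothetical and are not taken from the TCP code.

```python
import bisect
import math
import random


def interpolate(times, mags, t):
    """Linearly interpolate a well-sampled light-curve at time t."""
    i = bisect.bisect_left(times, t)
    if i == 0:
        return mags[0]
    if i == len(times):
        return mags[-1]
    t0, t1 = times[i - 1], times[i]
    frac = (t - t0) / (t1 - t0)
    return mags[i - 1] + frac * (mags[i] - mags[i - 1])


def noisify(times, mags, survey_epochs, mag_err=0.1, limiting_mag=21.0,
            rng=None):
    """Resample a template at the survey's epochs and degrade it.

    times, mags   : well-sampled template (times strictly increasing)
    survey_epochs : observation times drawn from the survey cadence
    mag_err       : ASSUMED Gaussian magnitude scatter of the instrument
    limiting_mag  : ASSUMED detection limit (larger magnitude = fainter)
    """
    rng = rng or random.Random()
    out = []
    for t in survey_epochs:
        if t < times[0] or t > times[-1]:
            continue  # the template does not cover this epoch
        m = interpolate(times, mags, t) + rng.gauss(0.0, mag_err)
        if m <= limiting_mag:  # keep only detections brighter than the limit
            out.append((t, m))
    return out


# Usage: a 0.5-day sinusoid sampled every 0.01 day, observed every 3 days.
template_t = [0.01 * k for k in range(3001)]
template_m = [18.0 + 0.4 * math.sin(2 * math.pi * t / 0.5) for t in template_t]
sparse = noisify(template_t, template_m, [3.0 * k for k in range(10)],
                 rng=random.Random(42))
```

Running this over every light-curve in a training set, once per survey cadence, would give the noisified sources from which classifiers are generated.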
15. “Noisifying to the Survey”
• For PTF:
  • Code uses PTF pointing and survey observing plans
  • Occasionally PTF observes using a faster cadence:
    • 7.5 minutes between revisiting an RA, Dec
    • Faster cadence requires a separate set of noisified light-curves and classifiers.
• Other surveys:
  • Other pointing and observing plans could be used.
  • Can generate noisified light-curves for other surveys.
  • Then we can generate science classifiers for these surveys.
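As an illustration of how a pointing plan could supply the resampling epochs, the sketch below picks out the times at which a sky position falls inside a pointing and checks whether any revisit is fast. The plan format, the square-field approximation of the 7.8 sq-deg detector, and the 10-minute threshold are assumptions made for this example.

```python
import math

# ASSUMPTION: treat the 7.8 sq-deg field as a square, about 1.4 deg half-width.
FIELD_HALF_WIDTH_DEG = math.sqrt(7.8) / 2.0


def epochs_for_position(plan, ra, dec, half_width=FIELD_HALF_WIDTH_DEG):
    """Return the sorted times at which (ra, dec) lies inside a pointing.

    plan: iterable of (time_days, field_ra_deg, field_dec_deg) tuples,
          a hypothetical stand-in for a survey pointing log or observing plan.
    """
    times = []
    for t, f_ra, f_dec in plan:
        d_ra = (ra - f_ra + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
        d_ra *= math.cos(math.radians(dec))         # RA offsets shrink off-equator
        if abs(d_ra) <= half_width and abs(dec - f_dec) <= half_width:
            times.append(t)
    return sorted(times)


def has_fast_cadence(times, max_gap_days=10.0 / (24 * 60)):
    """True if any consecutive revisit gap is at most max_gap_days.

    The 10-minute default is an ASSUMED threshold chosen to catch the
    7.5-minute revisits mentioned on the slide.
    """
    return any(b - a <= max_gap_days for a, b in zip(times, times[1:]))


# Usage with a made-up four-pointing plan.
plan = [(0.0, 150.0, 2.0), (0.0052, 150.5, 2.2),
        (1.0, 200.0, 2.0), (3.0, 150.0, 2.0)]
epochs = epochs_for_position(plan, ra=150.2, dec=2.1)
fast = has_fast_cadence(epochs)
```

Sources flagged as fast-cadence would be routed to the separate set of noisified light-curves and classifiers. Swapping in another survey's plan changes only the `plan` input.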
16. Classifiers
• General Classifier
  • Identify:
    • well sampled (periodic & nonperiodic) sources
    • interesting sources near known galaxies
    • periodic variable science class when confidence is high
  • Filter out:
    • poorly subtracted sources
    • minor planets / rocks
    • cosmic rays
    • detector defects
• Timeseries Classifiers
  • Weighted combination of WEKA classifiers
    • bagged Random Forest classifier using a cost-matrix
    • Each classifier trained on noisified data of a different cadence
  • Astronomer-crafted classifiers for specific science types
    • Microlensing, Supernova
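The slide does not say how the cost-matrix is applied. One common use is to pick the class with the lowest expected misclassification cost instead of the most probable class. The sketch below shows that rule with made-up class names, probabilities, and costs.

```python
def min_expected_cost_class(probs, cost):
    """Choose the prediction with the lowest expected cost.

    probs : {true_class: probability} from some classifier
    cost  : cost[true_class][predicted_class], zero on the diagonal
    Returns (chosen_class, {predicted_class: expected_cost}).
    """
    classes = list(probs)
    expected = {
        pred: sum(probs[true] * cost[true][pred] for true in classes)
        for pred in classes
    }
    return min(expected, key=expected.get), expected


# Hypothetical numbers: missing a real RR Lyrae costs 5x a false alarm.
probs = {"rr_lyrae": 0.4, "junk": 0.6}
cost = {
    "rr_lyrae": {"rr_lyrae": 0.0, "junk": 5.0},
    "junk": {"rr_lyrae": 1.0, "junk": 0.0},
}
label, expected = min_expected_cost_class(probs, cost)
```

With these numbers the rule selects `rr_lyrae` even though `junk` is more probable, because discarding a real source is weighted as the more expensive mistake.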
17. Interesting near-galaxy PTF sources
• Identified by the TCP at the end of August 2009
• Classification is triggered by the latest epoch added to the source
18. Periodic variable classifiers
• Currently, the science class of a source is determined by combining the weighted probabilities generated by different classification models.
• Each machine-learned classification model is trained using “noisified” lightcurves which were generated using different parameters.
[Figure: Clicking on a class for one of dozens of ML models shows the highest classification probability sources for that model::class.]
[Panel labels: ~0.4 day period; ~0.14 day period; RR Lyrae using 10 epoch noisification; RR Lyrae using 20 epoch noisification; 0.1 - 0.17 day period RR Lyrae using 15 epoch noisification]
[Annotations: Overplotting of period-folded model probably failed here; period-fold plotting still needs work]
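Combining weighted probabilities from several models can be sketched as a weighted average of each model's per-class probabilities. The model names, weights, and probabilities below are invented for illustration, and the TCP's actual weighting scheme is not described on the slide.

```python
def combine_weighted(model_outputs, weights):
    """Weighted average of per-class probabilities across models.

    model_outputs : {model_name: {class_name: probability}}
    weights       : {model_name: weight}; models without a weight get 0
    """
    combined = {}
    total_weight = 0.0
    for name, probs in model_outputs.items():
        w = weights.get(name, 0.0)
        total_weight += w
        for cls, p in probs.items():
            combined[cls] = combined.get(cls, 0.0) + w * p
    if total_weight == 0.0:
        raise ValueError("no model has a non-zero weight")
    return {cls: s / total_weight for cls, s in combined.items()}


# Hypothetical models trained on 10- and 20-epoch noisified light-curves.
outputs = {
    "10epoch": {"rr_lyrae": 0.6, "eclipsing": 0.4},
    "20epoch": {"rr_lyrae": 0.8, "eclipsing": 0.2},
}
weights = {"10epoch": 1.0, "20epoch": 3.0}
final = combine_weighted(outputs, weights)
best_class = max(final, key=final.get)
```

If each model's probabilities sum to one, the combined values do too, so they can still be read as class probabilities.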
19. Evaluating and Combining Classifiers
• Issues when using multiple classifiers:
  • How to combine classifiers when using:
    • weighted classifiers
    • tree-hierarchy of sub-classifiers
  • How to generate final classification “probabilities” when using:
    • Widely varying types of classifiers
    • Classifiers which contain sub-classifications & probabilities
• Evaluate the final combination of classifiers
  • Classify PTF09xxx user-classified sources, determine efficiencies
  • Classify noisified sources, determine efficiencies
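The slide does not define "efficiency". The sketch below assumes it means the fraction of sources of a given true class that the combined classifier labels correctly, and also reports purity (the fraction of sources given a label that truly belong to it). The labels are made up.

```python
from collections import Counter


def class_efficiencies(true_labels, predicted_labels):
    """Per-class efficiency and purity from paired label lists.

    efficiency = correctly classified / all sources truly in the class
    purity     = correctly classified / all sources assigned to the class
                 (None if nothing was assigned to the class)
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    n_true = Counter(true_labels)
    n_pred = Counter(predicted_labels)
    n_correct = Counter(t for t, p in zip(true_labels, predicted_labels)
                        if t == p)
    return {
        cls: {
            "efficiency": n_correct[cls] / n_true[cls],
            "purity": n_correct[cls] / n_pred[cls] if n_pred[cls] else None,
        }
        for cls in n_true
    }


# Hypothetical user classifications versus pipeline output.
truth = ["sn", "sn", "rr", "rr", "rr", "junk"]
predicted = ["sn", "rr", "rr", "rr", "junk", "junk"]
report = class_efficiencies(truth, predicted)
```

The same function would apply to both test sets named on the slide: the user-classified sources, where the truth labels come from people, and the noisified sources, where they come from the original light-curves.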