Talk about T.C.P. for CDI inter-departmental workshop at UC Berkeley. 20090911.

Josh Bloom (PI), Justin Higgins, Adam Morgan

[Diagram: “Object” datastream -> Transients Classification Pipeline (TCP) -> Classify -> Database -> Broadcast]

Input surveys:
• SDSS stripe-82 archived data
• PTF / LBL subtraction pipeline
• SASIR (future)
• LSST (future)
• Survey X (real-time survey telescope)
• Survey Y (static survey repository)

Transients Classification Pipeline

Database containing “sources”
• features for a source
• data epochs associated with a source

Broadcast “sources”
• interesting or transient source
• include classifications
• include features, context

[Diagram labels: Classify, Database, Broadcast]

SDSS Stripe 82
• A deep field from the Sloan Digital Sky Survey

• 750 million observation epochs

• ~20 million “sources” clustered from epochs

• 5 colors / filters, 4 years of observations

• We used Stripe-82 for testing and development

[Diagram: SDSS stripe-82 archived data -> Transients Classification Pipeline -> database containing “sources” (features for a source; data epochs associated with a source)]

Palomar Transient Factory
• Palomar 48” telescope

• 100 Mpix, 7.8 sq-deg detector

• ~120s cadence : ~200MB : <100GB/night

• Post subtraction: ~1M difference objects / night

• Post filtering: ~10k difference objects / night

• ~100s of transient and variable stars
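
As a rough consistency check on the numbers above, the nightly data volume and the filtering ratio can be worked out directly. This is a back-of-envelope sketch only: it reads "~200MB" as the size of one exposure, and the 10-hour observing night is an assumption that does not come from the slides.

```python
# Back-of-envelope check of the PTF nightly data volume.
EXPOSURE_CADENCE_S = 120   # from the slide: ~120 s cadence
MB_PER_EXPOSURE = 200      # from the slide, read here as ~200 MB per exposure
NIGHT_HOURS = 10           # ASSUMPTION: usable observing hours per night


def nightly_volume_gb(night_hours=NIGHT_HOURS):
    """Estimated data volume per night in GB (1 GB = 1000 MB)."""
    exposures = night_hours * 3600 // EXPOSURE_CADENCE_S
    return exposures * MB_PER_EXPOSURE / 1000.0


def filter_fraction(before=1_000_000, after=10_000):
    """Fraction of difference objects that survive filtering."""
    return after / before


if __name__ == "__main__":
    print(nightly_volume_gb())   # 300 exposures * 200 MB = 60.0 GB
    print(filter_fraction())     # ~1M -> ~10k objects, i.e. 1 in 100
```

Under these assumptions the estimate lands at about 60 GB, consistent with the quoted upper bound of <100GB/night.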

[Diagram: LBL subtraction pipeline -> TCP -> PTF consortium, PAIRITEL 1.3m, Palomar 60”, MDM 1.3m & 2.4m]

Next Generation Survey: LSST

Large Synoptic Survey Telescope (LSST):
1 Gb every 2 seconds

10^6 supernovae/yr
10^5 eclipsing systems
10^7 asteroids...

light curves of 800 million sources every 3 days

Transients Classification Pipeline

[Diagram: “Object” datastream -> TCP (source generation -> feature generation -> source classification) -> Database and Broadcast -> Follow-up telescope observations]

Parallelized source correlation and classification

• Retrieve difference objects

• Each difference-object is passed to an IPython client

• Each parallel IPython client performs:
• Source creation or correlation with existing sources

• “Feature” generation (or re-generation) for that source

• Classification of that source

[Diagram: source generation -> feature generation -> source classification]
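
The per-object steps above can be sketched as a small parallel loop. This is an illustration only, not the TCP code: it swaps the IPython clients for the standard-library `ThreadPoolExecutor`, and the 1 arcsec match radius, the `amplitude` feature, and the threshold rule in `classify` are all hypothetical placeholders.

```python
import math
import threading
from concurrent.futures import ThreadPoolExecutor

MATCH_RADIUS_DEG = 1.0 / 3600.0  # hypothetical 1 arcsec association radius


class SourceStore:
    """In-memory stand-in for the source database."""

    def __init__(self):
        self.sources = []
        self.lock = threading.Lock()

    def correlate(self, obj):
        """Attach a difference object to a nearby source, or create a new one."""
        with self.lock:
            for src in self.sources:
                # Flat-sky separation; adequate only for this small-angle toy.
                dra = (obj["ra"] - src["ra"]) * math.cos(math.radians(src["dec"]))
                ddec = obj["dec"] - src["dec"]
                if math.hypot(dra, ddec) <= MATCH_RADIUS_DEG:
                    src["epochs"].append(obj)
                    return src
            src = {"ra": obj["ra"], "dec": obj["dec"], "epochs": [obj]}
            self.sources.append(src)
            return src


def generate_features(src):
    """Toy features: epoch count and magnitude amplitude."""
    mags = [e["mag"] for e in src["epochs"]]
    return {"n_epochs": len(mags), "amplitude": max(mags) - min(mags)}


def classify(features):
    """Placeholder rule, not a trained classifier."""
    return "variable" if features["amplitude"] > 0.5 else "unclassified"


def run_pipeline(diff_objects, workers=4):
    """Process each difference object in a worker: correlate, featurize, classify."""
    store = SourceStore()

    def process(obj):
        src = store.correlate(obj)
        with store.lock:  # features are re-generated whenever an epoch is added
            src["features"] = generate_features(src)
            src["class"] = classify(src["features"])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(process, diff_objects))
    return store.sources
```

The shared store is guarded by a lock because two difference objects arriving at the same time may belong to the same source; a real deployment would more likely rely on database transactions for this.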

Parallelized source correlation and classification

• Realtime TCP runs on 22 dedicated cores

• LCOGT’s 96-core Beowulf cluster
• non run-time tasks

• Classifier generation

• Additional resources: (for future classification work)
• Yahoo! M45 cluster
• Amazon EC2 cluster

Warehouse of light-curves

• Need representative light-curves for all science classes

• With these we can model each science class

• We’ve built a warehouse of example light-curves

TCP-TUTOR: internal interface
DotAstro.org: public interface

“Noisifying to the Survey”

• Well sampled light-curves
• Can make good classifiers for well-sampled data.

• Don’t immediately make good classifiers for noisy, sparse data.

• We need classifiers which are trained using:
• sampling cadence of our survey

• sparseness of our survey data

• noise and sensitivity limitations of our instrument

• We need “Noisification” software which:
• Resamples well-sampled light-curves

• Outputs noisified sources which are used for generating classifiers
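
A minimal sketch of what such a noisification step might look like, assuming linear interpolation between well-sampled points, Gaussian magnitude errors, and a hard limiting magnitude. The function names, the noise model, and the parameters `mag_err` and `limiting_mag` are illustrative assumptions, not the TCP implementation.

```python
import bisect
import random


def interpolate_mag(times, mags, t):
    """Linearly interpolate a well-sampled light curve at time t.

    `times` must be sorted and strictly increasing.
    """
    i = bisect.bisect_left(times, t)
    if i == 0:
        return mags[0]
    if i == len(times):
        return mags[-1]
    t0, t1 = times[i - 1], times[i]
    frac = (t - t0) / (t1 - t0)
    return mags[i - 1] + frac * (mags[i] - mags[i - 1])


def noisify(times, mags, obs_times, mag_err, limiting_mag, rng=None):
    """Resample a light curve at survey epochs, add noise, apply a depth cut.

    obs_times    -- epochs taken from a survey observing plan (assumed input)
    mag_err      -- Gaussian sigma in magnitudes (assumed noise model)
    limiting_mag -- points fainter (numerically larger) than this are dropped
    """
    rng = rng or random.Random()
    noisified = []
    for t in obs_times:
        if t < times[0] or t > times[-1]:
            continue  # survey epoch falls outside the template light curve
        m = interpolate_mag(times, mags, t) + rng.gauss(0.0, mag_err)
        if m <= limiting_mag:
            noisified.append((t, m))
    return noisified
```

Feeding the same template through different `obs_times` lists (for example a regular cadence versus a faster one) would yield the separate training sets that each cadence-specific classifier needs.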


• For PTF:
• Code uses PTF pointing and survey observing plans

• Occasionally PTF observes using a faster cadence:

• 7.5 minutes between revisiting an RA, Dec

• Faster cadence requires a separate set of noisified light-curves and classifiers.

• Other surveys:
• Other pointing and observing plans could be used.

• Can generate noisified light-curves for other surveys.

• Then we can generate science classifiers for these surveys.

Classifiers
• General Classifier

Identify:
• well sampled (periodic & nonperiodic)
• interesting sources near known galaxies
• periodic variable science class when confidence is high

Filter out:
• poorly subtracted sources
• minor planets / rocks
• cosmic rays
• detector defects

• Timeseries Classifiers
• Weighted combination of WEKA classifiers

• bagged Random Forest classifier using a cost-matrix

• Each classifier trained on different cadenced noisified data

• Astronomer crafted classifiers for specific science types

• Microlensing, supernova
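
To illustrate how a cost-matrix can change a decision, here is a generic minimum-expected-cost rule. It is a sketch of the general technique, not of the WEKA configuration used by TCP; the class names and cost values in the usage note are made up.

```python
def min_expected_cost_class(probs, cost):
    """Pick the class with the lowest expected misclassification cost.

    probs -- {class: probability} from a classifier
    cost  -- cost[true_class][predicted_class]; diagonal entries are usually 0
    Returns (best_class, {predicted_class: expected_cost}).
    """
    expected = {}
    for predicted in probs:
        expected[predicted] = sum(
            p * cost[true][predicted] for true, p in probs.items()
        )
    best = min(expected, key=expected.get)
    return best, expected
```

With hypothetical costs where missing a real RR Lyrae costs 5 and a false alarm costs 1, a source with probabilities `{"RR Lyrae": 0.3, "junk": 0.7}` is still labeled "RR Lyrae", because the expected cost of that choice (0.7) is lower than the expected cost of calling it junk (1.5).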

Interesting near-galaxy PTF sources

• Identified by TCP at the end of Aug ’09
• Classification triggered by latest epoch added to the source

Periodic variable classifiers
• Currently, science classes are determined by combining the weighted probabilities generated by different classification models, for a source.
• Each machine-learned classification model is trained using “noisified” lightcurves which were generated using different parameters.

[Figure: period labels: ~0.4 day period, ~0.14 day period; model labels: RR Lyrae using 10 epoch noisification, RR Lyrae using 20 epoch noisification]
[Figure annotation: Clicking on a class for one of dozens of ML models... ...shows highest classification probability sources for that model::class]
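
One simple way to combine weighted probabilities from several models is a normalized weighted average, sketched below. The model names and weights are hypothetical, and the slides do not say that TCP uses exactly this formula.

```python
def combine_model_probs(model_probs, weights):
    """Weighted average of per-class probabilities across models.

    model_probs -- {model_name: {class_name: probability}}
    weights     -- {model_name: non-negative weight}
    Returns {class_name: combined probability}; sums to 1 when every
    model's probabilities sum to 1.
    """
    total_weight = sum(weights[m] for m in model_probs)
    combined = {}
    for model, probs in model_probs.items():
        for cls, p in probs.items():
            combined[cls] = combined.get(cls, 0.0) + weights[model] * p / total_weight
    return combined


if __name__ == "__main__":
    probs = {
        "10_epoch_model": {"RR Lyrae": 0.8, "Eclipsing": 0.2},
        "20_epoch_model": {"RR Lyrae": 0.4, "Eclipsing": 0.6},
    }
    print(combine_model_probs(probs, {"10_epoch_model": 3.0, "20_epoch_model": 1.0}))
```

A natural choice of weight, under this sketch, would be how closely a model's noisification parameters match the number of epochs actually observed for the source.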

[Figure annotation: Overplotting of period-folded model probably failed here]
[Figure annotation: period-fold plotting still needs work]

0.1 - 0.17 day period RR Lyrae using 15 epoch noisification
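
For reference, period folding maps each observation time to a phase in [0, 1) for a trial period. The sketch below shows the standard formula; the reference epoch `t0` is an assumed parameter, and this is not the plotting code the annotations refer to.

```python
def fold(times, mags, period, t0=0.0):
    """Fold a light curve on a trial period.

    phase = ((t - t0) / period) mod 1
    Returns (phase, mag) pairs sorted by phase, ready for plotting.
    """
    phases = [((t - t0) / period) % 1.0 for t in times]
    return sorted(zip(phases, mags))
```

Observations separated by a whole number of periods land on the same phase, which is why a correctly chosen period makes a sparse periodic light curve trace out a single clean cycle.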

Evaluating and Combining Classifiers

• Issues when using multiple classifiers:
• How to combine classifiers when using:

• weighted classifiers

• tree-hierarchy of sub-classifiers

• How to generate final classification “probabilities” when using:

• Widely varying types of classifiers
• Classifiers which contain sub-classifications & probabilities
• Evaluate the final combination of classifiers
• Classify PTF09xxx user classified sources, determine efficiencies

• Classify noisified sources, determine efficiencies
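
A sketch of how per-class efficiencies could be tallied once a labeled test set (user-classified or noisified sources) has been run through the classifiers. "Efficiency" is taken here to mean the fraction of true members of a class that are recovered, and purity is added as a companion measure; both definitions are assumptions about what the slide intends.

```python
from collections import Counter


def efficiency_and_purity(true_labels, predicted_labels):
    """Per-class (efficiency, purity) from paired true and predicted labels.

    efficiency = correctly classified members / all true members of the class
    purity     = correctly classified members / all sources assigned the class
                 (None if the class was never predicted)
    """
    n_true = Counter(true_labels)
    n_pred = Counter(predicted_labels)
    n_correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    result = {}
    for cls in n_true:
        eff = n_correct[cls] / n_true[cls]
        pur = n_correct[cls] / n_pred[cls] if n_pred[cls] else None
        result[cls] = (eff, pur)
    return result
```

Running this separately for each cadence-specific set of noisified sources would show which classifier combinations hold up as the data become sparser.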
