January 7 - March 2, 1610
Classification of Astronomical
Time-Series Data in the
Synoptic Survey Era


  Josh Bloom
  Joseph Richards

  University of California, Berkeley




          Berkeley Streaming Workshop; 7 May 2012
Center for Time-Domain Informatics

UC Berkeley (UCB):
Faculty/Staff
JSB, Dan Starr (Astro), John Rice, Noureddine El Karoui (Stats), Martin
Wainwright, Masoud Nikravesh (CS)
Postdocs
Joey Richards (stat/astro), Berian James, Damian Eads, Dovi Poznanski
(→Tel Aviv), Brad Cenko, Nat Butler, Nino Cucchiara, Damian Eads
(→Cambridge)
Grad Students
Dan Perley (→Caltech), Adam Miller, Adam Morgan, Chris Klein, James
Long, Tamara Broderick (stats), Sahand Negahban (EECS), John Brewer (→Yale),
Henrik Brink (←Copenhagen)
Undergrads
Anthony Paredes, Tatyana Gavrilchenko, Stuart Gegenheimer, Maxime Rischard,
Justin Higgins, Rachel Kennedy, Arien Crellin-Quick, Michelle Kislak (→UCLA),
Allison Merritt (→Yale)
Lawrence Berkeley National Laboratory (LBNL):
Peter Nugent, David Schlegel, Nic Ross, Horst Simon
                           Visit our website: http://cftd.info/
Understanding & Exploiting the Dynamic Universe


  •Twinkle, twinkle...
   Everything changes at some level (brightness, color, position, ...)

  • Stars die...and blow up
   supernovae, gamma-ray bursts, new phenomena ...


  • Discovery is only the start
         Greatest insights require follow-up (imaging,
   spectroscopy, archive introspection)
         Follow-up is EXPENSIVE
              (i.e., people, time, telescope, resources, $)
Gamma-Ray Burst Transients
• Short-lived blasts of high energy light (γ-rays & X-rays)
• random & rare - found by specialized satellites
• also: brightest optical events in universe (transient “afterglow”)
• two origins: exploding massive stars & colliding compact objects

[Figures: the “static” γ-ray sky; a plot with a "power" axis running from 0.01 to 10^6]

Challenge: how can we maximize our science return on discovery with
optimized follow up?
Follow-Up-Resource-Aware Classification

[Diagram: collect burst data from satellite feed → from the "immediately" available data, predict in real-time which events are "high redshift" → unclassified events are split into “high redshift” GRBs and less interesting ones]
Follow-Up-Resource-Aware Classification

“59% (86%) of high-z GRBs can be captured from following up the
top 20% (40%) of the ranked candidates”

[Figure: Efficiency vs α. x-axis: Fraction of GRBs Followed Up: α (0.0 to 1.0); y-axis: Fraction of high (z>4) GRBs observed (0.0 to 1.0); curves: predicted improvement (90% c.l.) and random.]

Morgan+11
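The efficiency-vs-α idea can be sketched in a few lines: rank events by a classifier score, follow up the top fraction α, and count how many of the true high-z events were captured. This is a minimal illustration, not the Morgan+11 pipeline; the function name `capture_curve` and the toy scores and labels are assumptions made up for the example.

```python
def capture_curve(scores, is_high_z, alphas):
    """Fraction of true high-z events captured when only the top-alpha
    fraction of the ranked candidates is followed up."""
    # Rank candidate indices from highest to lowest classifier score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_high_z = sum(is_high_z)
    curve = []
    for alpha in alphas:
        n_follow = round(alpha * len(scores))
        captured = sum(is_high_z[i] for i in order[:n_follow])
        curve.append(captured / total_high_z)
    return curve


# Hypothetical toy data: 10 bursts, 3 of which are truly high redshift.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
is_high_z = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]

curve = capture_curve(scores, is_high_z, alphas=[0.2, 0.4, 1.0])
for alpha, frac in zip([0.2, 0.4, 1.0], curve):
    print(f"follow up top {alpha:.0%}: capture {frac:.0%} of high-z events")
```

A random ranking would capture a fraction α on average, which is the "random" reference line in the figure.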
Extragalactic Transient Universe: Explosive Systems

[Figure: log(brightness), M_H from -10 to -22, vs. Days Since Explosion (50 to 200). Labeled populations: Pair Production Supernovae, Type Ia, Type IIp, IMBH + WD Collision, NS + RSG Collision, NS + NS Mergers. Distance markers: z=0.45 and 200 Mpc. Credit: E. Ramirez-Ruiz (UCSC)]
Data Deluge Challenge

 Large Synoptic Survey Telescope (LSST) - 2018
     Light curves for 800M sources every 3 days
     10^6 supernovae/yr, 10^5 eclipsing binaries
     3.2 gigapixel camera, 20 TB/night

 LOFAR & SKA
     150 Gps (27 Tflops) → 20 Pps (~100 Pflops)

 Gaia space astrometry mission - 2013
     1 billion stars observed ∼70 times over 5 years
     Will observe 20K supernovae

 Many other astronomical surveys are already producing data:
 SDSS, PTF, CRTS, Pan-STARRS, Hipparcos, OGLE, ASAS,
 Kepler, LINEAR, DES (soon) etc., etc.

 How do we do discovery, follow-up, and inference when the data rates
 (& requisite timescales) preclude human involvement?
Machine Learning As Surrogate
- trained to quickly make concrete, deterministic, &
repeatable statements about abstract concepts

Discovery: “Is this varying source astrophysical in nature or spurious?”

  PTF: 1.5M candidates/night
  1:1000 are astrophysical
  machine has opined on 800M candidates
  Bloom+11; Poznanski, Brink, this workshop
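The PTF numbers above imply a severe class imbalance, which a little arithmetic makes concrete. The sketch below reads "1:1000" as one real source per 1000 candidates; the true-positive and false-positive rates are hypothetical values chosen only for illustration and are not taken from Bloom+11.

```python
candidates_per_night = 1_500_000   # from the slide
real_fraction = 1 / 1000           # "1:1000 are astrophysical", read as 1 in 1000

real_per_night = candidates_per_night * real_fraction


def purity(n_candidates, real_frac, tpr, fpr):
    """Fraction of machine-accepted candidates that are truly astrophysical."""
    n_real = n_candidates * real_frac
    n_bogus = n_candidates - n_real
    true_pos = tpr * n_real
    false_pos = fpr * n_bogus
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier: keeps 90% of real sources, passes 1% of bogus ones.
p = purity(candidates_per_night, real_fraction, tpr=0.9, fpr=0.01)
print(f"real sources per night: {real_per_night:.0f}")
print(f"purity of accepted sample: {p:.1%}")
```

Even a 1% false-positive rate leaves the accepted sample dominated by spurious detections under these assumptions, which is why the real/bogus step has to be very selective.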
[Figure: Reference, New, and Difference image cutouts for 11kly and 11kx]
                                   also, cf., Bailey+07
2011fe identified w/ Machine-Learned Discovery Algorithms

                                    Discovery image was
                                    ~11 hours after
                                    explosion

                                    Within a few hours, a
                                    spectrum confirmed it
                                    to be a SN Ia

                                    Nearest SN Ia in more
                                    than 3 decades

                                    5th brightest supernova
                                    in 100 years
Machine Learning As Surrogate
- trained to quickly make concrete, deterministic, &
repeatable statements about abstract concepts

Classification: “What is the nature (origin/reason...) of the variability?”
Pulsating Stars
[Figure: class taxonomy tree; node labels listed by level]
  Classes: Alpha Cygni (ACYG), Beta Cephei (BCEP), Cepheids (CEP),
    W Virginis (CW), Delta Cep (DCEP), Delta Scuti (DSCT), Slow Irregular (L),
    Mira (M), PV Telescopii (PVTEL), RR Lyrae (RR), RV Tauri (RV),
    Semiregular (SR), Pulsating Subdwarfs (SXPHE), ZZ Ceti (ZZ)
  Subclasses: Short Period (BCEPS), Anomalous (BLBOO), Multiple Modes (CEPB),
    Long Period (CWA), Short Period (CWB), Symmetrical (DCEPS),
    Low Amplitude (DSCTC), Late Spectral Type (K, M, C, S) (LB),
    Supergiants (LC), Dual Mode (RRB), Asymmetric (RRAB), Near Symmetric (RRC),
    Constant Mean Magnitude (RVA), Variable Mean Magnitude (RVB),
    Persistent Periodicity (SRA), Poorly Defined Periodicity (SRB),
    Supergiants (SRC), F, G, or K (SRD), Only H Absorption (ZZA),
    Only He Absorption (ZZB), HeII Absorption (ZZO)

Cataclysmic Variables
[Figure: class taxonomy tree; node labels listed by level]
  Classes: U Geminorum (UG), Supernovae (SN), Novae (N),
    Gamma-ray Bursts (GRB), Symbiotic Variables (ZAND)
  Subclasses: SS Cygni, SU Ursae Majoris, Z Camelopardalis,
    Type I Supernovae (SNI), Type II Supernovae (SNII), Fast Novae (NA),
    Slow Novae (NB), Very Slow Novae (NC), Novalike Variables (NL),
    Recurrent Novae (NR), Long Gamma-ray Burst (LSB),
    Soft Gamma-ray Repeater (SGR), Short Gamma-ray Burst (SHB)
  Supernova subtypes: SNIa, SNIb, SNIc, SNIIL, SNIIN, SNIIP

Eclipsing Systems
[Figure: class taxonomy tree; node labels listed by level]
  Classes: Systems with White Dwarfs (WD), Semidetached (SD),
    RS Canum Venaticorum (RS), Planetary Nebulae (PN), Contact Systems (K),
    Systems with Supergiant(s) (GS), Eclipsing Binary Systems (E),
    Detached (D), Detached - AR Lacertae (AR), Wolf-Rayet Stars (WR)
  Subclasses: Early (O-A) (KE), W Ursae Majoris (KW),
    Algol (Beta Persei) (EA), Beta Lyrae (EB), W Ursae Majoris (EW),
    Main Sequence (DM), With Subgiant (DS), W Ursae Majoris (DW)
Considerable Complications with Time-Series Data

  • noisy, irregularly sampled
  • spurious data
  • telltale signature event may not have happened yet
Machine-Learning Approach to Classification

        Features: homogenize the data; real-number metrics that
        describe the time-domain characteristics & context of a source

        ~100 features computed in < 1 sec (including periodogram
        analysis)

        variability metrics: e.g. Stetson indices, χ²/dof (constant hypothesis)
        periodic metrics: e.g. dominant frequencies in Lomb-Scargle,
          phase offsets between periods
        shape analysis: e.g. skewness, kurtosis, Gaussianity
        context metrics: e.g. distance to nearest galaxy, type of nearest
          galaxy, location in the ecliptic plane

Woźniak et al. 2004; Protopapas+06; Willemsen & Eyer 2007; Debosscher et al. 2007; Mahabal et al.
2008; Sarro et al. 2009; Blomme et al. 2010; Kim+11; Richards+11
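A few of the feature families named above can be computed directly from an irregularly sampled light curve. The sketch below is a standard-library illustration, not the feature code used in Richards+11: it computes χ²/dof against a constant model, skewness and excess kurtosis, and the dominant frequency of a classical Lomb-Scargle periodogram. The function names, the simulated light curve, and the frequency grid are all assumptions for the example.

```python
import math
import random


def chi2_per_dof(mags, errs):
    """Reduced chi-square of the light curve against a constant (weighted-mean) model."""
    weights = [1.0 / e**2 for e in errs]
    mean = sum(w * m for w, m in zip(weights, mags)) / sum(weights)
    chi2 = sum(((m - mean) / e) ** 2 for m, e in zip(mags, errs))
    return chi2 / (len(mags) - 1)


def skewness_kurtosis(values):
    """Skewness and excess kurtosis from central moments."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    return m3 / m2**1.5, m4 / m2**2 - 3.0


def lomb_scargle(times, mags, freqs):
    """Classical (unnormalized) Lomb-Scargle power at each trial frequency."""
    mu = sum(mags) / len(mags)
    resid = [m - mu for m in mags]
    powers = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # Time offset tau makes the sine and cosine terms orthogonal.
        tau = math.atan2(sum(math.sin(2 * w * t) for t in times),
                         sum(math.cos(2 * w * t) for t in times)) / (2 * w)
        c = [math.cos(w * (t - tau)) for t in times]
        s = [math.sin(w * (t - tau)) for t in times]
        yc = sum(r * ci for r, ci in zip(resid, c))
        ys = sum(r * si for r, si in zip(resid, s))
        powers.append(0.5 * (yc**2 / sum(ci**2 for ci in c)
                             + ys**2 / sum(si**2 for si in s)))
    return powers


# Hypothetical light curve: 200 irregular epochs, true frequency 0.7 cycles/day.
random.seed(0)
times = sorted(random.uniform(0.0, 50.0) for _ in range(200))
errs = [0.1] * len(times)
mags = [math.sin(2 * math.pi * 0.7 * t) + random.gauss(0.0, 0.1) for t in times]

freqs = [0.01 * k for k in range(1, 201)]          # trial grid, 0.01 to 2.0
powers = lomb_scargle(times, mags, freqs)
skew, kurt = skewness_kurtosis(mags)
features = {
    "chi2_per_dof": chi2_per_dof(mags, errs),
    "skewness": skew,
    "excess_kurtosis": kurt,
    "dominant_freq": freqs[max(range(len(freqs)), key=powers.__getitem__)],
}
print(features)
```

Each source is reduced to the same fixed-length vector of real numbers regardless of how many epochs it has, which is what lets one classifier handle heterogeneous light curves.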
Variable Star Classification Confusion

[Figure: confusion matrix with rows labeled by True Class; classes grouped as pulsating, eruptive, and multi-star]

- global classification errors on well-observed sources approaching 15%
- random forest with missing data imputation emerging as superior
  e.g., Dubath+11, Richards+11

Richards+11
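The global error rate quoted on this slide is the off-diagonal fraction of a confusion matrix. A minimal sketch of that bookkeeping follows; the class names and the true/predicted label lists are hypothetical toy values, not results from Richards+11.

```python
from collections import Counter


def confusion_matrix(true_labels, pred_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(true_labels, pred_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]


def global_error_rate(true_labels, pred_labels):
    """Fraction of sources whose predicted class differs from the true class."""
    wrong = sum(t != p for t, p in zip(true_labels, pred_labels))
    return wrong / len(true_labels)


# Hypothetical labels for eight sources in three classes.
classes = ["Mira", "RR Lyrae", "Beta Lyrae"]
true_labels = ["Mira", "Mira", "RR Lyrae", "RR Lyrae",
               "RR Lyrae", "Beta Lyrae", "Beta Lyrae", "Beta Lyrae"]
pred_labels = ["Mira", "Mira", "RR Lyrae", "Beta Lyrae",
               "RR Lyrae", "Beta Lyrae", "Beta Lyrae", "RR Lyrae"]

matrix = confusion_matrix(true_labels, pred_labels, classes)
error = global_error_rate(true_labels, pred_labels)
for name, row in zip(classes, matrix):
    print(f"{name:>10}: {row}")
print(f"global error rate: {error:.1%}")
```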
                        
                                                                                                                    
Structured Classification

    Structured Classification: Let class taxonomy guide classifier.

    5% gross misclassification rate!

  HSC: Hierarchical single-label classification.
     - Fit separate classifier at each non-terminal node.
  HMC: Hierarchical multi-label classification.
     - Fit one classifier, where the loss L(y, f(x)) is weighted by w0^depth.
Richards+11
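One plausible reading of the depth-weighted loss is that a mistake at a node of the taxonomy costs w0 raised to that node's depth, so confusing two sibling leaves is cheap and confusing two top-level branches is expensive. The sketch below illustrates that reading only; the toy taxonomy, the convention that top-level classes have depth 0, and the value w0 = 0.5 are assumptions, not the formulation in Richards+11.

```python
# Hypothetical toy taxonomy: child -> parent (None marks a top-level class).
PARENT = {
    "pulsating": None,
    "RR Lyrae": "pulsating",
    "RRAB": "RR Lyrae",
    "RRC": "RR Lyrae",
    "eclipsing": None,
    "Beta Lyrae": "eclipsing",
}


def path_to_root(label):
    """All nodes from the label up to its top-level ancestor."""
    nodes = []
    while label is not None:
        nodes.append(label)
        label = PARENT[label]
    return set(nodes)


def depth(label):
    """Number of ancestors; top-level classes have depth 0 (an assumption)."""
    return len(path_to_root(label)) - 1


def depth_weighted_loss(true_leaf, pred_leaf, w0=0.5):
    """Sum of w0**depth over taxonomy nodes on which truth and prediction disagree."""
    disagree = path_to_root(true_leaf) ^ path_to_root(pred_leaf)
    return sum(w0 ** depth(node) for node in disagree)


sibling_mistake = depth_weighted_loss("RRAB", "RRC")
gross_mistake = depth_weighted_loss("RRAB", "Beta Lyrae")
print(f"RRAB predicted as RRC:        loss = {sibling_mistake}")
print(f"RRAB predicted as Beta Lyrae: loss = {gross_mistake}")
```

With w0 < 1 the loss penalizes gross misclassifications most heavily, which matches the slide's emphasis on the gross misclassification rate.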
Decision Boundaries are Survey Specific
  How do we transfer learning from one survey to the next?

[Figure: panels (a) Hipparcos and (b) OGLE-III, each plotting feature #2 vs. feature #1]

Fig. 1.— (a) The grey lines represent the CART classifier constructed using Hipparcos data.
The points are Hipparcos sources. This classifier separates Hipparcos sources well (0.6%
error as measured by cross-validation). (b) Here the OGLE sources are plotted over the
same decision boundaries. There is now significant class overlap (30% error rate). This is ...
                                                              Long+12; Richards+11
Decision Boundaries are Survey Specific

[Figure labels: “Expert”; ASAS (testing); OGLE+Hip (training)]

Fig. 8.— Active learning samples on a single iteration of the algorithm. Yellow circles
signify points that at least 65% of users were able to classify. These points are included ...
                                                              Long+12; Richards+11
Decision Boundaries are Survey Specific

[Figure: two panels plotted against AL Iteration. Left: percent agreement of the Random Forest classifier with the ACVS labels. Right: percent of ASAS data with confident RF classification (axis "Percent of Confident ASAS RF Labels", ticks from 0.15 to 0.40).]
                                                              Long+12; Richards+11
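The active-learning loop behind these plots can be sketched as two steps: pick the sources the current classifier is least sure about, then accept a crowd label only when enough users agree. The code below uses a simple least-confidence rule and reads the 65% figure as an agreement threshold; both are assumptions for illustration and not necessarily the criteria used in Long+12 or Richards+11. The source IDs, probabilities, and votes are hypothetical.

```python
from collections import Counter


def select_queries(class_probs, n_queries):
    """Return the IDs of the sources with the lowest maximum class probability."""
    return sorted(class_probs, key=lambda sid: max(class_probs[sid]))[:n_queries]


def consensus_label(votes, min_agreement=0.65):
    """Majority label if at least min_agreement of users chose it, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None


# Hypothetical classifier output for four unlabeled sources (three classes).
class_probs = {
    "src-1": [0.90, 0.05, 0.05],
    "src-2": [0.40, 0.35, 0.25],
    "src-3": [0.55, 0.30, 0.15],
    "src-4": [0.34, 0.33, 0.33],
}
queries = select_queries(class_probs, n_queries=2)

# Hypothetical user votes for the queried sources.
votes = {
    "src-4": ["RR Lyrae", "RR Lyrae", "RR Lyrae", "Beta Lyrae"],
    "src-2": ["Mira", "Beta Lyrae"],
}
new_training = {sid: consensus_label(votes[sid]) for sid in queries}
print(queries)
print(new_training)
```

Sources that reach consensus would be added to the training set before the classifier is refit for the next iteration.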
Classification Statements are Inherently Fuzzy

- classification probabilities should reflect
uncertainty in the data & training
- higher confidence with greater proximity to training data
- calibration of classification probability vector
            E.g.: 20% of transients classified as
            supernova of type “Ib” with P=0.2
              should be supernova of type “Ib”
           Catalogs of Transients and
           Variable Stars Must Become
                  Probabilistic
http://bigmacc.info
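The calibration requirement above can be checked with a reliability table: group predictions by reported probability and compare the mean reported probability in each group with the fraction that turned out to be correct. This is a minimal sketch with hypothetical probabilities and outcomes; the function name `reliability_table` and the choice of five equal-width bins are assumptions.

```python
def reliability_table(probs, outcomes, n_bins=5):
    """Per-bin (mean predicted probability, observed fraction, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, hit in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)   # p = 1.0 goes in the last bin
        bins[index].append((p, hit))
    rows = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            observed = sum(hit for _, hit in members) / len(members)
            rows.append((mean_p, observed, len(members)))
    return rows


# Hypothetical catalog: 5 transients given P(Ib) = 0.2 and 10 given P(Ib) = 0.9.
probs = [0.2] * 5 + [0.9] * 10
outcomes = [1, 0, 0, 0, 0] + [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

table = reliability_table(probs, outcomes)
for mean_p, observed, count in table:
    print(f"predicted {mean_p:.2f}  observed {observed:.2f}  (n={count})")
```

A calibrated classifier gives rows where the predicted and observed columns agree, as in this toy case.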
Doing Science with Probabilistic Catalogs


Demographics (with little followup):
 high purity at the cost of lower efficiency
 e.g., using RRL to find new Galactic structure


Novelty Discovery (with lots of followup):
 high efficiency at the cost of lower purity
 e.g., discovering new instances of rare classes
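With a probabilistic catalog, the two use cases differ only in where the probability cut is placed. The sketch below computes purity (fraction of selected sources that are true members) and efficiency (fraction of true members selected) at a strict and a loose threshold; the probabilities, membership flags, and threshold values are hypothetical.

```python
def purity_efficiency(probs, is_member, threshold):
    """Purity and efficiency of the sample selected with prob >= threshold."""
    selected = [m for p, m in zip(probs, is_member) if p >= threshold]
    true_pos = sum(selected)
    purity = true_pos / len(selected) if selected else float("nan")
    efficiency = true_pos / sum(is_member)
    return purity, efficiency


# Hypothetical catalog entries: class probability and true membership.
probs = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20]
is_member = [1, 1, 1, 0, 1, 0, 1, 0]

strict = purity_efficiency(probs, is_member, threshold=0.75)  # demographics
loose = purity_efficiency(probs, is_member, threshold=0.25)   # novelty discovery
print(f"strict cut: purity {strict[0]:.2f}, efficiency {strict[1]:.2f}")
print(f"loose cut:  purity {loose[0]:.2f}, efficiency {loose[1]:.2f}")
```

The strict cut suits population studies where contamination is costly and follow-up is scarce; the loose cut suits searches for rare classes where follow-up can weed out the contaminants.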
Discovery of Bright Galactic R Coronae Borealis and DY Persei Variables: Rare Gems Mined from ASAS
A. A. Miller1,*, J. W. Richards1,2, J. S. Bloom1, S. B. Cenko1, J. M. Silverman1, D. L. Starr1, and K. G. Stassun3,4
arXiv:1204.4181v1 [astro-ph.SR] 18 Apr 2012

ABSTRACT
We present the results of a machine-learning (ML) based search for new R Coronae Borealis (RCB) stars and DY Persei-like stars (DYPers) in the Galaxy using cataloged light curves obtained by the All-Sky Automated Survey (ASAS). RCB stars, a rare class of hydrogen-deficient carbon-rich supergiants, are of great interest owing to the insights they can provide on the late stages of stellar evolution. DYPers are possibly the low-temperature, low-luminosity analogs to the RCB phenomenon, though additional examples are needed to fully establish this connection. While RCB stars and DYPers are traditionally identified by epochs of extreme dimming that occur without regularity, the ML search framework more fully captures the richness and diversity of their photometric behavior. We demonstrate that our ML method recovers ASAS candidates that would have been missed by traditional search methods employing hard cuts on amplitude and periodicity. Our search yields 13 candidates that we consider likely RCB stars/DYPers: new and archival spectroscopic observations confirm that four of these candidates are RCB stars and four are DYPers. Our discovery of four new DYPers increases the number of known Galactic DYPers from two to six; noteworthy is that one of the new DYPers has a measured parallax and is m ≈ 7 mag, making it the brightest known DYPer to date. Future observations of these new DYPers should prove instrumental in establishing the RCB connection. We consider these results, derived from a machine-learned probabilistic ...

Fig. 2. ASAS V-band light curves of newly discovered RCB stars and DYPers. Note the differing magnitude ranges shown for each light curve. Spectroscopic observations confirm the top four candidates to be RCB stars, while the bottom four are DYPers.

1 Department of Astronomy, University of California, Berkeley, CA 94720-3411, USA
2 Statistics Department, University of California, Berkeley, CA, 94720-7450, USA

17 known Galactic RCB/DY Per
Variety of Open Questions
1. How do we bootstrap learning from one survey to the next, given inherent differences?
   “active learning” (e.g., Richards+11b)
2. How do we detect and quantify real outliers?
   e.g., clustering, semi-supervised learning (e.g., Protopapas+06, Rebbapragada+09, Bhattacharyya+, in prep)
3. How do we imbue domain knowledge into classifiers?
   hybridization, metalearning
4. How do we weigh classification value against computational cost?
   resource allocation
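
As one illustration of the active-learning idea in question 1, the sketch below picks which unlabeled sources from a new survey to send for manual labeling or followup under a fixed budget. It uses generic margin-based uncertainty sampling, which is an assumption for illustration and not necessarily the selection criterion of Richards+11b. The function name `select_queries` is hypothetical.

```python
import numpy as np

def select_queries(class_probs, n_queries):
    """Indices of the n_queries sources with the smallest top-two margin.

    class_probs : array of shape (n_sources, n_classes); each row is a
                  classifier's probability vector for one unlabeled source
    n_queries   : labeling / followup budget for this iteration
    """
    probs = np.asarray(class_probs, dtype=float)
    # Sort each row so the two largest probabilities sit in the last two columns.
    ordered = np.sort(probs, axis=1)
    margin = ordered[:, -1] - ordered[:, -2]
    # Smallest margin = classifier is least decided between its two best guesses.
    return np.argsort(margin, kind="stable")[:n_queries]
```

After the selected sources are labeled, they would be added to the training set and the classifier refit. The budget `n_queries` is where the resource-allocation concern of question 4 enters.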
Summary
- science maximization with synoptic surveys demands a more distant human role than before
- machine learning in time-domain astrophysics is not just talk...it’s working and enabling novel science
- yet, real-time discovery & classification is far from solved
- helpful to view endeavor as a resource-limited problem
See you tomorrow!


    food starting 8am
    talks starting 9am
group picture before lunch

Joshua Bloom: Machine Learning and Classification in the Synoptic Survey Era

  • 1. January 7 - March 2, 1610
  • 2. Classification of Astronomical Time-Series Data in the Synoptic Survey Era Josh Bloom Joseph Richards University of California, Berkeley Berkeley Streaming Workshop; 7 May 2012
  • 3. Center for Time-Domain Informatics UC Berkeley (UCB): Faculty/Staff JSB, Dan Starr (Astro), John Rice, Noureddine El Karoui (Stats), Martin Wainwright, Masoud Nikravesh (CS) Postdocs Joey Richards (stat/astro), Berian James, Damian Eads, Dovi Poznanski (→Tel Aviv), Brad Cenko, Nat Butler, Nino Cucchiara, Damian Eads (→Cambridge) Grad Students Dan Perley (→Caltech), Adam Miller, Adam Morgan, Chris Klein, James Long, Tamara Broderick (stats), Sahand Negahban (EECS), John Brewer (→Yale), Henrik Brink (←Copenhagen) Undergrads Anthony Paredes, Tatyana Gavrilchenko, Stuart Gegenheimer, Maxime Rischard, Justin Higgins, Rachel Kennedy, Arien Crellin-Quick, Michelle Kislak (→UCLA), Allison Merritt (→Yale) Lawrence Berkeley National Laboratory (LBNL): Peter Nugent, David Schlegel, Nic Ross, Horst Simon Visit our website: http://cftd.info/
  • 4. Text Understanding & Exploiting the Dynamic Universe
  • 5. Text Understanding & Exploiting the Dynamic Universe •Twinkle, twinkle... Everything changes at some level (brightness, color, position, ...)
  • 6. Text Understanding & Exploiting the Dynamic Universe •Twinkle, twinkle... Everything changes at some level (brightness, color, position, ...) • Stars die...and blow up supernovae, gamma-ray bursts, new phenomena ...
  • 7. Text Understanding & Exploiting the Dynamic Universe •Twinkle, twinkle... Everything changes at some level (brightness, color, position, ...) • Stars die...and blow up supernovae, gamma-ray bursts, new phenomena ... • Discovery is only the start
  • 8. Text Understanding & Exploiting the Dynamic Universe •Twinkle, twinkle... Everything changes at some level (brightness, color, position, ...) • Stars die...and blow up supernovae, gamma-ray bursts, new phenomena ... • Discovery is only the start Greatest insights require follow-up (imaging, spectroscopy, archive introspection)
  • 9. Text Understanding & Exploiting the Dynamic Universe •Twinkle, twinkle... Everything changes at some level (brightness, color, position, ...) • Stars die...and blow up supernovae, gamma-ray bursts, new phenomena ... • Discovery is only the start Greatest insights require follow-up (imaging, spectroscopy, archive introspection) Follow-up is EXPENSIVE (ie., people, time, telescope, resources, $)
  • 10. Gamma-Ray Burst Transients “static” γ-ray sky
  • 11. Gamma-Ray Burst Transients • Short-lived blasts of high energy light (γ-rays & X-rays)
  • 12. Gamma-Ray Burst Transients • Short-lived blasts of high energy light (γ-rays & X-rays) “static” γ-ray sky
  • 13. Gamma-Ray Burst Transients • Short-lived blasts of high energy light (γ-rays & X-rays) • random & rare - found by specialized satellites “static” γ-ray “static” γ-ray sky
  • 14. Gamma-Ray Burst Transients • Short-lived blasts of 106 high energy light (γ-rays & X-rays) 10,000 • random & rare - found by specialized satellites 100 • also: brightest optical events in universe power 1 (transient “afterglow”) 0.01
  • 15. Gamma-Ray Burst Transients • Short-lived blasts of 106 high energy light (γ-rays & X-rays) 10,000 • random & rare - found by specialized satellites 100 • also: brightest optical events in universe power 1 (transient “afterglow”) • two origins: exploding 0.01 massive stars & colliding compact objects
  • 16. Gamma-Ray Burst Transients • Short-lived blasts of 106 high energy light (γ-rays & X-rays) 10,000 • random & rare - Challenge: how can we found by specialized satellites maximize our science 100 return power • also: brightest optical on discovery with events in universe optimized follow up? 1 (transient “afterglow”) • two origins: exploding 0.01 massive stars & colliding compact objects
  • 17. Follow-Up-Resource-Aware Classification collect burst data from satellite feed predict which events are “high redshift” "high redshift" GRBs in real-time less interesting unclassified "immediately" available data
  • 18. Follow-Up-Resource-Aware Classification Efficiency vs α 1.0 predicted fraction of high-redshift GRBs improvement (90% c.l.) 0.8 Fraction of high (z>4) GRBs observed “59% (86%) of high-z GRBs can 0.6 be captured from om following up the nd top 20% (40%) of 0.4 ra the ranked candidates” 0.2 0.0 0.0 0.2 0.4 0.6 0.8 1.0 followed-up fraction Fraction of GRBs Followed Up: α Morgan+11 reduced
  • 19. Extragalactic Transient Universe: Explosive Systems -22 -20 Pair Production Supernovae z=0.45 log(brightness) -18 Type Ia MH -16 Type IIp -14 IMBH + WD Collision 200Mpc -12 NS + RSG Collision NS + NS Mergers -10 50 100 150 200 Days Since Explosion E. Ramirez-Ruiz (UCSC)
  • 20. Text Data Deluge Challenge Large Synoptic Survey Telescope (LSST) - 2018 ! Light curves for 800M sources every 3 days 106 supernovae/yr, 105 eclipsing binaries 3.2 gigapixel camera, 20 TB/night LOFAR & SKA 150 Gps (27 Tflops) → 20 Pps (~100 Pflops) Gaia space astrometry mission - 2013 1 billion stars observed ∼70 times over 5 years Will observe 20K supernovae Many other astronomical surveys are already producing data: SDSS, PTF, CRTS, Pan-STARRS, Hipparcos, OGLE, ASAS, Kepler, LINEAR, DES (soon) etc., etc.
  • 21. Text Data Deluge Challenge Large Synoptic Survey Telescope (LSST) - 2018 ! Light curves for 800M sources every 3 days 106 supernovae/yr, 105 eclipsing binaries 3.2 gigapixel camera, 20 TB/night LOFAR & SKA 150 Gps (27 Tflops) → 20 Pps (~100 Pflops) Gaia space astrometry mission - 2013 1 billion stars observed ∼70 times over 5 years Will observe 20K supernovae Many other astronomical surveys are already producing data: SDSS, PTF, CRTS, Pan-STARRS, Hipparcos, OGLE, ASAS, Kepler, LINEAR, DES (soon) etc., etc.
  • 22. Text Data Deluge Challenge Large Synoptic Survey Telescope (LSST) - 2018 ! Light curves for 800M sources every 3 days 106 supernovae/yr, 105 eclipsing binaries 3.2 gigapixel camera, 20 TB/night LOFAR & SKAHow do we do discovery, follow-up, and inference when 150 Gps (27 Tflops) → 20 Pps (~100 Pflops) the data rates (& requisite Gaia space astrometry mission - 2013 1 billiontimescales) precludeyears stars observed ∼70 times over 5 human involvement? Will observe 20K supernovae Many other astronomical surveys are already producing data: SDSS, PTF, CRTS, Pan-STARRS, Hipparcos, OGLE, ASAS, Kepler, LINEAR, DES (soon) etc., etc.
  • 23. Machine Learning As Surrogate - trained to quickly make concrete, deterministic, & repeatable statements about abstract concepts “Is this varying source astrophysical in nature or spurious?”
  • 24. Machine Learning As Surrogate - trained to quickly make concrete, deterministic, & repeatable statements about abstract concepts “Is this varying source astrophysical Discovery in nature or spurious?”
  • 25. Machine Learning As Surrogate - trained to quickly make concrete, deterministic, & repeatable statements about abstract concepts PTF: 1.5M candidate/night “Is this varying 1:1000 are astrophysical source astrophysical machine has opined on in nature or 800M candidates Bloom+11 spurious?” Poznanski, Brink, this workshop Discovery
  • 26. Reference New Difference 11kly 11kx also, cf., Bailey+07
  • 27. 2011fe identified w/ Machine-Learned Discovery Algorithms Discovery image was ~11 hours after explosion Within a few hours, a spectrum confirmed it to be a SN Ia Nearest SN Ia in more than 3 decades 5th brightest supernova in 100 years
  • 28. Machine Learning As Surrogate - trained to quickly make concrete, deterministic, & repeatable statements about abstract concepts “What is the nature (origin/reason...) of the variability?”
  • 29. Machine Learning As Surrogate - trained to quickly make concrete, deterministic, & repeatable statements about abstract concepts “What is the nature Classification (origin/reason...) of the variability?”
  • 30. Pulsating Alpha Cygni (ACYG) Short Period (BCEPS) Beta Cephei (BCEP) Anomalous (BLBOO) Pulsating Stars Multiple Modes (CEPB) Cepheids (CEP) Long Period (CWA) W Virginis (CW) Short Period (CWB) Delta Cep (DCEP) Symmetrical (DCEPS) Delta Scuti (DSCT) Low Amplitude (DSCTC) Slow Irregular (L) Late Spectral Type (K, M, C, S) (LB) Mira (M) Supergiants (LC) Dual Mode (RRB) PV Telescopii (PVTEL) Asymmetric (RRAB) RR Lyrae (RR) Near Symmetric (RRC) Constant Mean Magnitude (RVA) RV Tauri (RV) Variable Mean Magnitude (RVB) Persistent Periodicity (SRA) Semiregular (SR) Poorly Defined Periodicity (SRB) Pulsating Subdwarfs (SXPHE) Supergiants (SRC) F, G, or K (SRD) Only H Absorption (ZZA) ZZ Ceti (ZZ) Only He Absorption (ZZB) HeII Absoption (ZZO) Cataclysmic Variables Cataclysmic Variables SS Cygni U Geminorum (UG) SU Ursae Majoris SNIa Z Camelopardalis SNIb Type I Supernovae (SNI) SNIc Supernovae (SN) SNIIL Type II Supernovae (SNII) SNIIN Fast Novae (NA) Slow Novae (NB) SNIIP Novae (N) Very Slow Novae (NC) Novalike Variables (NL) Recurrent Novae (NR) Gamma-ray Bursts (GRB) Long Gamma-ray Burst (LSB) Soft Gamma-ray Repeater (SGR) Symbiotic Variables (ZAND) Short Gamma-ray Burst (SHB) Eclipsing Eclipsing Systems Systems with White Dwarfs (WD) Semidetached (SD) Early (O-A) (KE) RS Canum Venaticorum (RS) W Ursa Majoris (KW) Planetary Nebulae (PN) Contact Systems (K) Algol (Beta Persei) (EA) Systems with Supergiant(s) (GS) Eclipsing Binary Systems (E) Beta Lyrae (EB) W Ursae Majoris (EW) Main Sequence (DM) Detached (D) With Subgiant (DS) Detached - AR Lacertae (AR) W Ursa Majoris (DW) Wolf-Rayet Stars (WR)
  • 31. Considerable Complications with Time-Series Data • noisy, irregularly sampled
  • 32. Considerable Complications with Time-Series Data • noisy, irregularly sampled • spurious data
  • 33. Considerable Complications with Time-Series Data • noisy, irregularly sampled • spurious data • telltale signature event may not have happened yet
  • 34. Machine-Learning Approach to Classification Features: homogenize the data; real-number metrics that describe the time-domain characteristics & context of a source ~100 features computed in < 1 sec (including periodogram analysis) Wózniak et al. 2004; Protopapas+06, Willemsen & Eyer 2007; Debosscher et al. 2007; Mahabal et al. 2008; Sarro et al. 2009; Blomme et al. 2010; Kim+11, Richards+11
  • 35. Machine-Learning Approach to Classification Features: homogenize the data; real-number metrics that describe the time-domain characteristics & context of a source ~100 features computed in < 1 sec (including periodogram analysis) periodic variability metrics: e.g. domi metrics nant freq : e.g. Stetson indices, χ 2/dof Lomb- uencies in Scargle, p hase offs (constant hypothesis) between ets periods shape analysis wness, kurtosis, context metrics e.g. ske Gaussianity e.g. distance to nearest galaxy, type of nearest galaxy, locatio n in the ecliptic plane Wózniak et al. 2004; Protopapas+06, Willemsen & Eyer 2007; Debosscher et al. 2007; Mahabal et al. 2008; Sarro et al. 2009; Blomme et al. 2010; Kim+11, Richards+11
  • 36. Variable Star Classification Confusion [Figure: confusion matrix; axis: True Class] Richards+11
  • 37. Variable Star Classification Confusion [Figure: confusion matrix; axis: True Class; class groups: pulsating, eruptive, multi-star] Richards+11
  • 38. Variable Star Classification Confusion [Figure: confusion matrix; axis: True Class; class groups: pulsating, eruptive, multi-star] - global classification errors on well-observed sources approaching 15% - random forest with missing-data imputation emerging as superior (e.g., Dubath+11, Richards+11) Richards+11
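The two quantities behind this slide, a confusion matrix and a global error rate, are simple to compute once true and predicted labels are in hand. The sketch below also includes a column-median fill for missing feature values. That imputation rule is an assumption chosen for brevity (the slide does not say which imputation scheme was used), and all function names are hypothetical.

```python
import numpy as np


def impute_median(X):
    """Replace NaN entries with the median of the observed values in each column."""
    X = np.array(X, dtype=float)
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows index the true class, columns the predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    counts = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t], index[p]] += 1
    return counts


def global_error_rate(counts):
    """Fraction of sources that fall off the diagonal of the confusion matrix."""
    return 1.0 - np.trace(counts) / counts.sum()
```

The imputed feature matrix would then be passed to whatever classifier is in use; the confusion matrix is built from its held-out predictions.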
  • 39. Structured Classification: let class taxonomy guide classifier. 5% gross misclassification rate! HSC: hierarchical single-label classification; fit separate classifier at each non-terminal node. HMC: hierarchical multi-label classification; fit one classifier, where L(y, f(x)) … w0 … depth. Richards+11
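The HSC idea (one classifier per non-terminal node of the taxonomy, applied top-down) can be sketched as follows. This is a minimal illustration, not the method of Richards+11: the nearest-centroid classifier is a stand-in chosen to keep the example dependency-free, and the names `NearestCentroid`, `leaves_under`, `fit_hsc`, and `predict_hsc` are hypothetical. The taxonomy is assumed to be a dictionary mapping each non-terminal node to its list of children, with the top node named "root".

```python
import numpy as np


class NearestCentroid:
    """Toy classifier: assign the class whose training centroid is closest."""

    def fit(self, X, labels):
        labels = np.asarray(labels)
        self.classes_ = sorted(set(labels.tolist()))
        self.centroids_ = np.array([X[labels == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_one(self, x):
        distances = np.linalg.norm(self.centroids_ - x, axis=1)
        return self.classes_[int(np.argmin(distances))]


def leaves_under(taxonomy, node):
    """Set of terminal classes in the subtree rooted at node."""
    children = taxonomy.get(node, [])
    if not children:
        return {node}
    return set().union(*(leaves_under(taxonomy, c) for c in children))


def fit_hsc(taxonomy, X, y):
    """Fit one classifier per non-terminal node, trained to choose among its children."""
    y = np.asarray(y)
    models = {}
    for node, children in taxonomy.items():
        child_of = {leaf: c for c in children for leaf in leaves_under(taxonomy, c)}
        mask = np.array([label in child_of for label in y])
        targets = [child_of[label] for label in y[mask]]
        models[node] = NearestCentroid().fit(X[mask], targets)
    return models


def predict_hsc(taxonomy, models, x, root="root"):
    """Walk down the taxonomy from the root until a terminal class is reached."""
    node = root
    while node in taxonomy:
        node = models[node].predict_one(x)
    return node
```

A mistake made near the root cannot be undone further down, which is one reason to compare this scheme against a single classifier with a taxonomy-aware loss.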
  • 40. Decision Boundaries are Survey Specific. How do we transfer learning from one survey to the next? [Figure: panels (a) Hipparcos and (b) OGLE-III; axes: feature #1 vs. feature #2] Fig. 1: (a) The grey lines represent the CART classifier constructed using Hipparcos data. The points are Hipparcos sources. This classifier separates Hipparcos sources well (0.6% error as measured by cross-validation). (b) Here the OGLE sources are plotted over the same decision boundaries. There is now significant class overlap (30% error rate). This is ... Long+12; Richards+11
  • 41. Decision Boundaries are Survey Specific. How do we transfer learning from one survey to the next? [Figure labels: “Expert”; ASAS (testing); OGLE+Hip (training)] Fig. 8: Active learning samples on a single iteration of the algorithm. Yellow circles signify points that at least 65% of users were able to classify. These points are included ... Long+12; Richards+11
  • 42. Decision Boundaries are Survey Specific. How do we transfer learning from one survey to the next? [Figure: two panels plotted against AL Iteration; right-panel y-axis: Percent of Confident ASAS RF Labels] Left: Percent agreement of the Random Forest classifier with the ACVS labels, as a function of AL iteration. Right: Percent of ASAS data with confident RF classification. Long+12; Richards+11
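One iteration of an active-learning loop needs two pieces: a rule for choosing which unlabeled sources to show to people, and a rule for deciding which of the returned labels to trust. The sketch below is an illustration only. The smallest-margin query rule is a common choice but is an assumption here, not necessarily the criterion used in Long+12 or Richards+11, and `consensus_label` is one possible reading of "at least 65% of users were able to classify"; both function names are hypothetical.

```python
from collections import Counter

import numpy as np


def select_queries(class_probs, n_queries):
    """Indices of the n_queries sources with the smallest top-two probability margin."""
    ordered = np.sort(np.asarray(class_probs, dtype=float), axis=1)
    margin = ordered[:, -1] - ordered[:, -2]
    return np.argsort(margin)[:n_queries]


def consensus_label(votes, min_fraction=0.65):
    """Majority label if at least min_fraction of users gave a label, else None.

    A vote of None means the user could not classify the source.
    """
    given = [v for v in votes if v is not None]
    if not votes or len(given) / len(votes) < min_fraction:
        return None
    return Counter(given).most_common(1)[0][0]
```

Sources that receive a consensus label would be added to the training set before the classifier is refit for the next iteration.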
  • 43. Classification Statements are Inherently Fuzzy - classification probabilities should reflect uncertainty in the data & training - higher confidence with greater proximity to training data - calibration of classification probability vector. E.g.: 20% of transients classified as supernovae of type “Ib” with P=0.2 should be supernovae of type “Ib”
  • 44. Classification Statements are Inherently Fuzzy - classification probabilities should reflect uncertainty in the data & training - higher confidence with greater proximity to training data - calibration of classification probability vector. E.g.: 20% of transients classified as supernovae of type “Ib” with P=0.2 should be supernovae of type “Ib”. Catalogs of Transients and Variable Stars Must Become Probabilistic
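The calibration statement on this slide can be checked directly on a labeled validation set: bin the sources by their predicted probability for a class and compare the mean prediction in each bin with the fraction that truly belong to the class. The sketch below assumes equal-width bins; the function name `reliability_table` and the bin count are hypothetical choices.

```python
import numpy as np


def reliability_table(prob, is_member, n_bins=10):
    """Return (mean predicted probability, observed fraction, count) per occupied bin."""
    prob = np.asarray(prob, dtype=float)
    is_member = np.asarray(is_member, dtype=bool)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # digitize returns 1-based bin numbers; clip so prob == 1.0 lands in the last bin.
    bin_index = np.clip(np.digitize(prob, edges) - 1, 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        selected = bin_index == b
        if selected.any():
            rows.append(
                (float(prob[selected].mean()),
                 float(is_member[selected].mean()),
                 int(selected.sum()))
            )
    return rows
```

For a well-calibrated classifier the first two entries of every row agree to within sampling noise, so sources assigned P near 0.2 for type “Ib” turn out to be type “Ib” about 20% of the time.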
  • 46. Doing Science with Probabilistic Catalogs. Demographics (with little followup): accepting lower efficiency in exchange for high purity, e.g., using RRL to find new Galactic structure. Novelty Discovery (with lots of followup): accepting lower purity in exchange for high efficiency, e.g., discovering new instances of rare classes
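With a probabilistic catalog, the purity/efficiency trade described here comes down to where the probability cut is placed. A minimal sketch follows, with purity defined as the fraction of selected sources that truly belong to the class and efficiency as the fraction of true members that are selected; the function name and the threshold values in the usage note are assumptions for illustration.

```python
import numpy as np


def purity_efficiency(prob, is_member, threshold):
    """Purity and efficiency of the sample selected by prob >= threshold."""
    prob = np.asarray(prob, dtype=float)
    is_member = np.asarray(is_member, dtype=bool)
    selected = prob >= threshold
    true_positives = np.sum(selected & is_member)
    purity = true_positives / selected.sum() if selected.any() else float("nan")
    efficiency = true_positives / is_member.sum() if is_member.any() else float("nan")
    return float(purity), float(efficiency)
```

A demographics study would choose a high threshold (few contaminants, some true members lost), while a novelty search with ample followup would choose a low one (most true members kept, more contaminants to weed out).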
  • 47. Discovery of Bright Galactic R Coronae Borealis and DY Persei Variables: Rare Gems Mined from ASAS. A. A. Miller(1,*), J. W. Richards(1,2), J. S. Bloom(1), S. B. Cenko(1), J. M. Silverman(1), D. L. Starr(1), and K. G. Stassun(3,4). arXiv:1204.4181v1 [astro-ph.SR] 18 Apr 2012. ABSTRACT: We present the results of a machine-learning (ML) based search for new R Coronae Borealis (RCB) stars and DY Persei-like stars (DYPers) in the Galaxy using cataloged light curves obtained by the All-Sky Automated Survey (ASAS). RCB stars, a rare class of hydrogen-deficient carbon-rich supergiants, are of great interest owing to the insights they can provide on the late stages of stellar evolution. DYPers are possibly the low-temperature, low-luminosity analogs to the RCB phenomenon, though additional examples are needed to fully establish this connection. While RCB stars and DYPers are traditionally identified by epochs of extreme dimming that occur without regularity, the ML search framework more fully captures the richness and diversity of their photometric behavior. We demonstrate that our ML method recovers ASAS candidates that would have been missed by traditional search methods employing hard cuts on amplitude and periodicity. Our search yields 13 candidates that we consider likely RCB stars/DYPers: new and archival spectroscopic observations confirm that four of these candidates are RCB stars and four are DYPers. Our discovery of four new DYPers increases the number of known Galactic DYPers from two to six; noteworthy is that one of the new DYPers has a measured parallax and is m ≈ 7 mag, making it the brightest known DYPer to date. Future observations of these new DYPers should prove instrumental in establishing the RCB connection. We consider these results, derived from a machine-learned probabilistic ... Affiliations: (1) Department of Astronomy, University of California, Berkeley, CA 94720-3411, USA; (2) Statistics Department, University of California, Berkeley, CA, 94720-7450, USA. Fig. 2: ASAS V-band light curves of newly discovered RCB stars and DYPers. Note the differing magnitude ranges shown for each light curve. Spectroscopic observations confirm the top four candidates to be RCB stars, while the bottom four are DYPers. ... 17 known Galactic RCB/DY Per ...
  • 48. Discovery of Bright Galactic R Coronae Borealis and DY Persei Variables: Rare Gems Mined from ASAS. A. A. Miller(1,*), J. W. Richards(1,2), J. S. Bloom(1), S. B. Cenko(1), J. M. Silverman(1), D. L. Starr(1), and K. G. Stassun(3,4). arXiv:1204.4181v1 [astro-ph.SR] 18 Apr 2012. ABSTRACT: We present the results of a machine-learning (ML) based search for new R Coronae Borealis (RCB) stars and DY Persei-like stars (DYPers) in the Galaxy using cataloged light curves obtained by the All-Sky Automated Survey (ASAS). RCB stars, a rare class of hydrogen-deficient carbon-rich supergiants, are of great interest owing to the insights they can provide on the late stages of stellar evolution. DYPers are possibly the low-temperature, low-luminosity analogs to the RCB phenomenon, though additional examples are needed to fully establish this connection. While RCB stars and DYPers are traditionally identified by epochs of extreme dimming that occur without regularity, the ML search framework more fully captures the richness and diversity of their photometric behavior. We demonstrate that our ML method recovers ASAS candidates that would have been missed by traditional search methods employing hard cuts on amplitude and periodicity. Our search yields 13 candidates that we consider likely RCB stars/DYPers: new and archival spectroscopic observations confirm that four of these candidates are RCB stars and four are DYPers. Our discovery of four new DYPers increases the number of known Galactic DYPers from two to six; noteworthy is that one of the new DYPers has a measured parallax and is m ≈ 7 mag, making it the brightest known DYPer to date. Future observations of these new DYPers should prove instrumental in establishing the RCB connection. We consider these results, derived from a machine-learned probabilistic ... Affiliations: (1) Department of Astronomy, University of California, Berkeley, CA 94720-3411, USA; (2) Statistics Department, University of California, Berkeley, CA, 94720-7450, USA. ... 17 known Galactic RCB/DY Per ...
  • 49. Variety of Open Questions. 1. How do we bootstrap learning from one survey to the next, given inherent differences? “active learning” (e.g., Richards+11b) 2. How do we detect and quantify real outliers? e.g. clustering, semi-supervised learning (e.g., Protopapas+06, Rebbapragada+09, Bhattacharyya+, in prep) 3. How do we imbue domain knowledge into classifiers? hybridization, metalearning 4. How do we weigh classification value against computational cost? resource allocation
  • 50. Summary: science maximization with synoptic surveys demands a more distant human role than before; machine learning in time-domain astrophysics is not just talk...it’s working and enabling novel science; yet, real-time discovery & classification is far from solved; helpful to view endeavor as a resource-limited problem
  • 51. See you tomorrow! Food starting 8am; talks starting 9am; group picture before lunch

Editor's Notes

  1. * Time domain in astronomy. * Crucial new discoveries: looking at the sky with new tools (new eyes). Ptolemaic order: planets were supposed to be fixed spherical orbs, not with their own moons; that didn’t fit the world view. Opportunistic tools. * Emphasizes the crucial roles of humans in data collection, data analysis, and inference.
  3. Happy to acknowledge. Big effort. Industry support.
  4. Needle in the haystack.
  20. Constrained in time. Decisions with incomplete information. Extreme rarities: maybe a few a year of interest. Imbalance and robustness.
  22. Teeming with things we know and don't know about. Exploration of the knowns and the unknowns. Rumsfeldian. Short timescales.
  24. Simply must understand that our roles must change.
  26. Identification is different from discovery: Galileo. Galileo's drawings show that he first observed Neptune on December 28, 1612, and again on January 27, 1613. On both occasions, Galileo mistook Neptune for a fixed star when it appeared very close (in conjunction) to Jupiter in the night sky; hence, he is not credited with Neptune's discovery. 234 years later: Johann Gottfried Galle, 23 September 1846.
  30. 1.5 M per night
  32. It should be easy: there’s a bunch of classes of objects which vary, we measure their light curves and that’s it. Even remarkably homogeneous classes such as Ia and RRL exhibit huge variations.
  34. It should be easy: there’s a bunch of classes of objects which vary, we measure their light curves and that’s it.
  35. However, in practice
  41. Dynamic time warping. Hundreds of features: n log n, n^2, etc. Some of these are results of external queries. Same things we, as experts, look at in a light curve and ancillary data.
  45. Discovery of physical intuition, like what Alex talked about.
  48. How you observed the data impacts what you think it is. This is obvious. Approach is to craft ground truth from one survey to look like another, either in light curve space or in feature space.
  52. Say goodbye to black-and-white catalogs.
  53. Posterior probabilities, not likelihoods: convolved with the priors. Prescription for adaptation.
  54. Best way to find needles in the haystack is to get really good at finding and identifying hay.
  55. 8 of 13. (∆mV up to ∼8 mag), aperiodic declines in brightness. At maximum light RCB stars are bright supergiants. Merrill-Sanford bands of SiC2 in three of our candidates: ASAS 162232−5349.2, ASAS 065113+0222.1, and ASAS 182658+0109.0. To our knowledge this is the first identification of SiC2 in a DYPer spectrum.
  56. Last one not so important if we can wait for the answer.
  57. Richer.