Josh Bloom (PI), Justin Higgins, Adam Morgan
Transients Classification Pipeline
[Diagram: “Object” datastream -> Transients Classification Pipeline (classify) -> database, broadcast]
[Diagram: inputs and outputs of the Transients Classification Pipeline]

Inputs:
•   SDSS Stripe-82 archived data
•   PTF / LBL subtraction pipeline
•   SASIR (future)
•   LSST (future)
•   Survey X (real-time survey telescope)
•   Survey Y (static survey repository)

Pipeline stages: Classify, Database, Broadcast

Outputs:
•   Database containing “sources”
     •   features for a source
     •   data epochs associated with a source
•   Broadcast “sources”
     •   interesting or transient source
     •   include classifications
     •   include features, context
SDSS Stripe 82

•   A deep field from the Sloan Digital Sky Survey

•   750 million observation epochs

•   ~20 million “sources” clustered from epochs

•   5 colors / filters, 4 years of observations

•   We used Stripe-82 for testing and development

[Diagram: SDSS Stripe-82 archived data -> Transients Classification Pipeline -> database containing “sources” (features for a source; data epochs associated with a source)]
Palomar Transient Factory

•   Palomar 48” telescope

•   100 Mpix, 7.8 sq-deg detector

•   ~120s cadence : ~200MB : <100GB/night

•   Post subtraction: ~1M difference objects / night

•   Post filtering: ~10k difference objects / night
     •   ~100s transient and variable stars

[Diagram: LBL subtraction pipeline -> TCP -> PTF consortium and follow-up telescopes: PAIRITEL 1.3m, Palomar 60”, MDM 1.3m & 2.4m]
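The data-volume and filtering figures on the Palomar Transient Factory slide can be cross-checked with a little arithmetic. The sketch below reads "~200MB" as the size of one exposure and uses a 10-hour observing night; both are assumptions made for illustration, not statements from the talk.

```python
# Figures quoted on the slide.
CADENCE_S = 120              # ~120 s between exposures
EXPOSURE_BYTES = 200e6       # ~200 MB, read here as one exposure (assumption)
NIGHT_CAP_BYTES = 100e9      # the slide's "<100GB/night" bound
POST_SUBTRACTION = 1_000_000 # ~1M difference objects / night
POST_FILTERING = 10_000      # ~10k difference objects / night


def exposures_per_night(night_hours):
    """Number of exposures that fit in a night at the quoted cadence."""
    return int(night_hours * 3600 // CADENCE_S)


def nightly_bytes(night_hours):
    """Raw data volume for one night, in bytes."""
    return exposures_per_night(night_hours) * EXPOSURE_BYTES


ASSUMED_NIGHT_HOURS = 10     # hypothetical night length

volume_gb = nightly_bytes(ASSUMED_NIGHT_HOURS) / 1e9
filter_reduction = POST_SUBTRACTION / POST_FILTERING

print(f"{exposures_per_night(ASSUMED_NIGHT_HOURS)} exposures, {volume_gb:.0f} GB")
print(f"filtering keeps 1 in {filter_reduction:.0f} difference objects")
```

Under these assumptions a night yields 300 exposures and about 60 GB, which sits below the quoted 100 GB bound, and filtering removes roughly 99% of the difference objects.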
Next Generation Survey: LSST

Large Synoptic Survey Telescope (LSST):

     1 Gb every 2 seconds

     10^6 supernovae/yr
     10^5 eclipsing systems
     10^7 asteroids...

     light curves of 800 million sources every 3 days
Transients Classification Pipeline

[Diagram: “Object” datastream -> TCP (source generation -> feature generation -> source classification) -> Database, Broadcast; follow-up telescope observations]
Parallelized source correlation and classification

•   Retrieve difference objects

•   Each difference-object is passed to an IPython client

•   Each parallel IPython client performs:
     •   Source creation or correlation with existing sources

     •   “Feature” generation (or re-generation) for that source

     •   Classification of that source

[Diagram: source generation -> feature generation -> source classification]
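The per-object work described on this slide can be sketched as follows. This is a minimal illustration, not the TCP code: a standard-library thread pool stands in for the parallel IPython clients, and the match radius, the toy features, the threshold rule, and every function name are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

MATCH_RADIUS_DEG = 1.0 / 3600.0  # hypothetical 1-arcsecond association radius


def correlate(obj, known_sources):
    """Return the id of the nearest known source within the radius, else None."""
    best_id, best_sep = None, MATCH_RADIUS_DEG
    for source_id, (ra, dec) in known_sources.items():
        # Flat-sky separation; adequate for a sketch away from the poles.
        d_ra = (obj["ra"] - ra) * math.cos(math.radians(dec))
        sep = math.hypot(d_ra, obj["dec"] - dec)
        if sep <= best_sep:
            best_id, best_sep = source_id, sep
    return best_id


def generate_features(mags):
    """Toy features: epoch count, mean magnitude, peak-to-peak amplitude."""
    return {
        "n_epochs": len(mags),
        "mean_mag": sum(mags) / len(mags),
        "amplitude": max(mags) - min(mags),
    }


def classify(features):
    """Placeholder threshold rule, not a trained classifier."""
    return "variable" if features["amplitude"] > 0.5 else "constant"


def process_difference_object(obj, known_sources):
    """The three per-object steps: correlate, generate features, classify."""
    source_id = correlate(obj, known_sources)
    features = generate_features(obj["mags"])
    return {
        "object_id": obj["id"],
        "source_id": source_id,
        "is_new_source": source_id is None,
        "features": features,
        "classification": classify(features),
    }


def run_pipeline(objects, known_sources, n_workers=4):
    """Hand each difference object to a worker; results keep input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(
            lambda o: process_difference_object(o, known_sources), objects))


# Made-up example data.
known = {"src-1": (150.0, 2.0)}
objects = [
    {"id": "obj-a", "ra": 150.0001, "dec": 2.0, "mags": [18.0, 18.9, 18.2]},
    {"id": "obj-b", "ra": 151.0, "dec": 2.5, "mags": [19.0, 19.1]},
]
results = run_pipeline(objects, known)
```

The known-source table is treated as read-only inside the workers, so the sketch sidesteps the question of how newly created sources are written back; a real system would need to handle concurrent updates.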
Parallelized source correlation and classification

•   Realtime TCP runs on 22 dedicated cores

•   LCOGT’s 96 core Beowulf
     •   non run-time tasks

     •   Classifier generation

•   Additional resources: (for future classification work)
     •   Yahoo! M45 cluster

     •   Amazon EC2 cluster
Warehouse of light-curves

•   Need representative light-curves for all science classes

•   With these we can model each science class

•   We’ve built a warehouse of example light-curves




     TCP-TUTOR: internal interface
     DotAstro.org: public interface
“Noisifying to the Survey”

•   Well sampled light-curves
     •   Can make good classifiers for well-sampled data.

     •   Don’t immediately make good classifiers for noisy, sparse data.


•   We need classifiers which are trained using:
     •   sampling cadence of our survey

     •   sparseness of our survey data

     •   noise and sensitivity limitations of our instrument


•   We need “Noisification” software which:
     •   Resamples well-sampled light-curves

     •   Outputs noisified sources which are used for generating classifiers
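The noisification idea above can be sketched in a few lines. This is an illustrative stand-in, not the talk's software: it resamples a well-sampled light-curve at a list of survey observation times by linear interpolation, adds Gaussian magnitude noise, and drops epochs fainter than a limiting magnitude. The function names, the noise model, and all numbers are assumptions.

```python
import bisect
import math
import random


def interpolate(times, mags, t):
    """Linearly interpolate a light-curve (times strictly increasing) at time t."""
    i = bisect.bisect_left(times, t)
    if i == 0:
        return mags[0]
    if i == len(times):
        return mags[-1]
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)
    return mags[i - 1] + w * (mags[i] - mags[i - 1])


def noisify(times, mags, survey_times, mag_err, limiting_mag, rng):
    """Resample at the survey's epochs, add noise, apply the sensitivity limit."""
    noisy = []
    for t in survey_times:
        if not (times[0] <= t <= times[-1]):
            continue  # the survey epoch falls outside the template's coverage
        m = interpolate(times, mags, t) + rng.gauss(0.0, mag_err)
        if m <= limiting_mag:  # larger magnitude means fainter: non-detection
            noisy.append((t, m))
    return noisy


# Made-up well-sampled template: a 0.5-day sinusoid observed for 10 days.
template_times = [i * 0.01 for i in range(1001)]
template_mags = [18.0 + 0.5 * math.sin(2 * math.pi * t / 0.5)
                 for t in template_times]

# Made-up sparse survey epochs (days); the last one lies outside the template.
survey_times = [0.3, 1.3, 4.3, 9.3, 12.0]

noisy_curve = noisify(template_times, template_mags, survey_times,
                      mag_err=0.05, limiting_mag=21.0, rng=random.Random(0))
```

Running the same template through several sets of survey epochs and noise levels gives one noisified training set per observing mode, which is the role the slide assigns to the noisified sources.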
“Noisifying to the Survey”

•   For PTF:
     •   Code uses PTF pointing and survey observing plans

     •   Occasionally PTF observes using a faster cadence:

           •    7.5 minutes between revisiting an RA, Dec

           •    Faster cadence requires a separate set of noisified light-curves
                and classifiers.


•   Other surveys:
     •   Other pointing and observing plans could be used.

     •   Can generate noisified light-curves for other surveys.

     •   Then we can generate science classifiers for these surveys.
Classifiers

•   General Classifier
     •   Identify:
          •   well sampled (periodic & nonperiodic)
          •   interesting sources near known galaxies
          •   periodic variable science class when confidence is high

     •   Filter out:
          •   poorly subtracted sources
          •   minor planets / rocks
          •   cosmic rays
          •   detector defects

•   Timeseries Classifiers
     •   Weighted combination of WEKA classifiers
          •   bagged Random Forest classifier using a cost-matrix
          •   Each classifier trained on different cadenced noisified data

     •   Astronomer crafted classifiers for specific science types
          •   Microlens, Supernova
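One common way a cost-matrix enters classification is a minimum-expected-cost decision: given class probabilities, pick the label whose expected misclassification cost is lowest. The sketch below shows that general technique only; it is not claimed to be how the WEKA setup in the talk applies its cost-matrix, and the class names and costs are made up.

```python
def min_expected_cost_class(probs, cost):
    """Pick the label with the lowest expected cost.

    probs: {class: probability that the source truly belongs to class}
    cost:  cost[true_class][predicted_class], zero on the diagonal
    """
    classes = list(probs)
    expected = {
        pred: sum(probs[true] * cost[true][pred] for true in classes)
        for pred in classes
    }
    return min(expected, key=expected.get), expected


# Hypothetical costs: calling junk an RR Lyrae is five times worse than a miss.
probs = {"rr_lyrae": 0.55, "junk": 0.45}
cost = {
    "rr_lyrae": {"rr_lyrae": 0.0, "junk": 1.0},
    "junk": {"rr_lyrae": 5.0, "junk": 0.0},
}
label, expected = min_expected_cost_class(probs, cost)
print(label, expected)
```

Here the most probable class is "rr_lyrae", yet the cost-weighted decision is "junk", which illustrates how a cost-matrix can suppress expensive false positives.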
Interesting near-galaxy PTF sources

 • Identified by TCP at the end of August 2009
 • Classification triggered by the latest epoch added to the source
Periodic variable classifiers

•   Currently, science classes are determined by combining the weighted probabilities generated by different classification models, for a source.

•   Each machine-learned classification model is trained using “noisified” light-curves which were generated using different parameters.

[Figure: ~0.14 day period RR Lyrae using 20 epoch noisification]
[Figure: ~0.4 day period RR Lyrae using 10 epoch noisification]

[Screenshot: Clicking on a class for one of dozens of ML models shows highest classification probability sources for that model::class]

[Figure: 0.1 - 0.17 day period RR Lyrae using 15 epoch noisification. Annotations: overplotting of period-folded model still needs work; period-fold plotting probably failed here]
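Combining weighted probabilities from several classification models, as described on this slide, can be sketched as a normalized weighted sum. The talk does not give the actual weighting scheme, so the model names, weights, and probabilities below are hypothetical.

```python
def combine_weighted(model_probs, weights):
    """Weighted sum of per-model class probabilities, renormalized to sum to 1.

    model_probs: {model_name: {class_name: probability}}
    weights:     {model_name: non-negative weight}
    """
    combined = {}
    for model, probs in model_probs.items():
        w = weights[model]
        for cls, p in probs.items():
            combined[cls] = combined.get(cls, 0.0) + w * p
    total = sum(combined.values())
    return {cls: value / total for cls, value in combined.items()}


# Two hypothetical models trained on differently noisified light-curves.
model_probs = {
    "noisified_10_epoch": {"rr_lyrae": 0.6, "eclipsing": 0.4},
    "noisified_20_epoch": {"rr_lyrae": 0.9, "eclipsing": 0.1},
}
weights = {"noisified_10_epoch": 1.0, "noisified_20_epoch": 3.0}

combined = combine_weighted(model_probs, weights)
best_class = max(combined, key=combined.get)
print(best_class, combined)
```

A weighted sum is only one option; the next slide's questions about hierarchies of sub-classifiers and widely varying classifier types are exactly the cases where such a simple rule may not be enough.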
Evaluating and Combining Classifiers


•   Issues when using multiple classifiers:
      •    How to combine classifiers when using:

            •    weighted classifiers

            •    tree-hierarchy of sub-classifiers

      •    How to generate final classification “probabilities” when using:

            •    Widely varying types of classifiers

            •    Classifiers which contain sub-classifications & probabilities

•   Evaluate the final combination of classifiers
      •    Classify PTF09xxx user classified sources, determine efficiencies

      •    Classify noisified sources, determine efficiencies
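Determining efficiencies from a set of already-labeled sources can be sketched as below. The slide does not define "efficiency"; the sketch takes it to mean, per class, the fraction of sources with that true label that the combined classifier also assigns to that class. The labels are made up.

```python
from collections import Counter


def class_efficiencies(true_labels, predicted_labels):
    """Per-class efficiency: correctly classified / total with that true label."""
    totals = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {cls: correct[cls] / n for cls, n in totals.items()}


# Hypothetical user-classified sources and the classifier's output for each.
true_labels = ["sn", "sn", "rr_lyrae", "rr_lyrae", "rr_lyrae", "junk"]
predicted_labels = ["sn", "junk", "rr_lyrae", "rr_lyrae", "sn", "junk"]

efficiencies = class_efficiencies(true_labels, predicted_labels)
print(efficiencies)
```

The same function applies unchanged to noisified sources, since their true class is known from the light-curve they were generated from.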
Talk about T.C.P. for CDI inter-departmental workshop at UC Berkeley. 20090911.

 

Talk about the Transients Classification Pipeline (TCP) for the CDI inter-departmental workshop at UC Berkeley, 2009-09-11.

  • 1. Josh Bloom (PI), Justin Higgins, Adam Morgan
  • 3. Transients Classification Pipeline: Classify, Database, Broadcast
      • Inputs: SDSS stripe-82 archived data; PTF / LBL subtraction pipeline; SASIR (future); LSST (future); Survey X (real-time survey telescope); Survey Y (static survey repository)
      • Database containing “sources”
          • features for a source
          • data epochs associated with a source
      • Broadcast “sources”
          • interesting or transient source
          • include classifications
          • include features, context
  • 4. SDSS Stripe 82
      • A deep field from the Sloan Digital Sky Survey
      • 750 million observation epochs
      • ~20 million “sources” clustered from epochs
      • 5 colors / filters, 4 years of observations
      • We used Stripe-82 for testing and development
      • Diagram: SDSS stripe-82 archived data; Transients Classification Pipeline; database containing “sources” (features for a source, data epochs associated with a source)
  • 5. Palomar Transient Factory
      • Palomar 48” telescope
      • 100 Mpix, 7.8 sq-deg detector
      • ~120s cadence : ~200MB : <100GB/night
      • Post subtraction: ~1M difference objects / night
      • Post filtering: ~10k difference objects / night
      • ~100s transient and variable stars
      • Diagram: LBL subtraction pipeline; TCP; PTF consortium; PAIRITEL 1.3m; Palomar 60”; MDM 1.3m & 2.4m
  • 6. Next Generation Survey: LSST
      • Large Synoptic Survey Telescope (LSST): 1 Gb every 2 seconds
      • 10^6 supernovae/yr
      • 10^5 eclipsing systems
      • 10^7 asteroids...
      • light curves of 800 million sources every 3 days
  • 7. Transients Classification Pipeline
      • “Object” datastream
      • TCP: source generation, feature generation, source classification
      • Database
      • Broadcast
      • Follow-up telescope observations
  • 8. Parallelized source correlation and classification
      • Retrieve difference objects
      • Each difference-object is passed to an IPython client
      • Each parallel IPython client performs:
          • Source creation or correlation with existing sources
          • “Feature” generation (or re-generation) for that source
          • Classification of that source
      • Diagram: source generation; feature generation; source classification
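The per-object steps on slide 8 can be sketched roughly as follows. This is only an illustration, not the TCP code: it uses Python's standard `concurrent.futures` thread pool as a stand-in for the parallel IPython clients, does the position matching serially before fanning out, and every name in it (`correlate`, `generate_features`, `classify`, `MATCH_RADIUS_DEG`, the feature names, and the 0.5 mag threshold) is a hypothetical placeholder.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical match radius of one arcsecond, expressed in degrees.
MATCH_RADIUS_DEG = 1.0 / 3600.0


def correlate(diff_objects, radius=MATCH_RADIUS_DEG):
    """Attach each difference object to an existing source, or create a new one.

    Uses a simple box match in RA/Dec, which only approximates a true
    angular-separation test.
    """
    sources = []
    for obj in diff_objects:
        for src in sources:
            if (abs(src["ra"] - obj["ra"]) <= radius
                    and abs(src["dec"] - obj["dec"]) <= radius):
                src["epochs"].append(obj)
                break
        else:
            # No existing source matched, so this object starts a new source.
            sources.append({"ra": obj["ra"], "dec": obj["dec"], "epochs": [obj]})
    return sources


def generate_features(source):
    """Compute a few toy features from the epochs of one source."""
    mags = [epoch["mag"] for epoch in source["epochs"]]
    return {
        "n_epochs": len(mags),
        "mean_mag": mean(mags),
        "amplitude": max(mags) - min(mags),
    }


def classify(features):
    """Toy rule standing in for a trained classifier."""
    return "variable" if features["amplitude"] > 0.5 else "constant"


def process_source(source):
    """Feature generation followed by classification for one source."""
    features = generate_features(source)
    return {"ra": source["ra"], "dec": source["dec"],
            "features": features, "class": classify(features)}


def run_pipeline(diff_objects, workers=4):
    """Correlate objects into sources, then process the sources in parallel."""
    sources = correlate(diff_objects)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input sources.
        return list(pool.map(process_source, sources))
```

Calling `run_pipeline` on a list of dicts with `ra`, `dec`, and `mag` keys returns one result per source. In a real deployment the correlation step would need a spatial index and shared state handled by the database rather than an in-memory list.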
  • 9. Parallelized source correlation and classification
      • Realtime TCP runs on 22 dedicated cores
      • LCOGT’s 96-core Beowulf cluster
          • non-run-time tasks
          • Classifier generation
      • Additional resources (for future classification work):
          • Yahoo! M45 cluster
          • Amazon EC2 cluster
      • Diagram: source generation; feature generation; source classification
  • 10. Warehouse of light-curves
      • Need representative light-curves for all science
      • With these we can model each science class
      • We’ve built a warehouse of example light-curves
      • TCP-TUTOR: internal interface
      • DotAstro.org: public interface
  • 11.
  • 12.
  • 13. “Noisifying to the Survey”
      • Well sampled light-curves
          • Can make good classifiers for well-sampled data.
          • Don’t immediately make good classifiers for noisy, sparse data.
      • We need classifiers which are trained using:
          • sampling cadence of our survey
          • sparseness of our survey data
          • noise and sensitivity limitations of our instrument
      • We need “Noisification” software which:
          • Resamples well-sampled light-curves
          • Outputs noisified sources which are used for generating classifiers
  • 14. “Noisifying to the Survey”
  • 15. “Noisifying to the Survey”
      • For PTF:
          • Code uses PTF pointing and survey observing plans
          • Occasionally PTF observes using a faster cadence:
              • 7.5 minutes between revisiting an RA, Dec
              • Faster cadence requires a separate set of noisified light-curves and classifiers.
      • Other surveys:
          • Other pointing and observing plans could be used.
          • Can generate noisified light-curves for other surveys.
          • Then we can generate science classifiers for these surveys.
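A minimal sketch of the "noisification" idea from slides 13 to 15, under stated assumptions: a well-sampled light curve is linearly interpolated at the survey's observation times, Gaussian magnitude noise is added, and epochs fainter than a limiting magnitude are dropped. The function names, the Gaussian noise model, and the parameters `mag_sigma` and `limiting_mag` are all hypothetical; the actual software draws its cadence from PTF pointing and observing plans, which are not reproduced here.

```python
import bisect
import random


def interpolate(times, mags, t):
    """Linearly interpolate a light curve at time t.

    Assumes `times` is strictly increasing. Outside the sampled range the
    nearest end value is returned.
    """
    if t <= times[0]:
        return mags[0]
    if t >= times[-1]:
        return mags[-1]
    i = bisect.bisect_right(times, t)
    t0, t1 = times[i - 1], times[i]
    frac = (t - t0) / (t1 - t0)
    return mags[i - 1] + frac * (mags[i] - mags[i - 1])


def noisify(times, mags, survey_times, mag_sigma, limiting_mag, seed=0):
    """Resample a well-sampled light curve onto a survey cadence.

    Returns a list of (time, magnitude) pairs. Epochs whose noisy magnitude
    is fainter (numerically larger) than `limiting_mag` are treated as
    non-detections and omitted.
    """
    rng = random.Random(seed)  # seeded so that a given call is reproducible
    noisified = []
    for t in survey_times:
        mag = interpolate(times, mags, t) + rng.gauss(0.0, mag_sigma)
        if mag <= limiting_mag:
            noisified.append((t, mag))
    return noisified
```

Running `noisify` once per cadence, for example a nominal plan and a 7.5-minute plan, would yield the separate training sets that the slides say each cadence requires.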
  • 16. Classifiers
      • General Classifier
          • Identify:
              • well sampled (periodic & nonperiodic)
              • interesting sources near known galaxies
              • periodic variable science class when confidence is high
          • Filter out:
              • poorly subtracted sources
              • minor planets / rocks
              • cosmic rays
              • detector defects
      • Timeseries Classifiers
          • Weighted combination of WEKA classifiers
          • bagged Random Forest classifier using a cost-matrix
          • Each classifier trained on different cadenced noisified data
      • Astronomer-crafted classifiers for specific science types
          • Microlens, Supernova
  • 17. Interesting near-galaxy PTF sources
      • Identified by TCP during end of Aug ’09
      • Classification triggered by latest epoch added to the source
  • 18. Periodic variable classifiers
      • Currently, science classes are determined by combining the weighted probabilities generated by different classification models, for a source.
      • Each machine-learned classification model is trained using “noisified” lightcurves which were generated using different parameters.
      • Clicking on a class for one of dozens of ML models... ...shows highest classification probability sources for that model::class
      • Figure annotations:
          • ~0.4 day period: RR Lyrae using 10 epoch noisification
          • ~0.14 day period: RR Lyrae using 20 epoch noisification
          • 0.1 - 0.17 day period: RR Lyrae using 15 epoch noisification
          • Overplotting of period-folded model still needs work
          • period-fold plotting probably failed here
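One way to read "combining the weighted probabilities generated by different classification models" is a normalized weighted average of per-class probabilities. The sketch below assumes exactly that; the slides do not say how TCP weights or normalizes its models, and the function name, model names, class names, and weights used here are hypothetical.

```python
def combine_probabilities(model_outputs, weights):
    """Combine per-model class probabilities into one weighted average.

    model_outputs maps a model name to a dict of {class name: probability}.
    weights maps a model name to a non-negative weight. Weights are
    normalized over the models present, so if every model's probabilities
    sum to 1 the combined probabilities do as well.
    """
    total_weight = sum(weights[name] for name in model_outputs)
    if total_weight <= 0:
        raise ValueError("at least one model needs a positive weight")
    combined = {}
    for name, probabilities in model_outputs.items():
        share = weights[name] / total_weight
        for cls, p in probabilities.items():
            combined[cls] = combined.get(cls, 0.0) + share * p
    return combined


def best_class(combined):
    """Return the class with the highest combined probability."""
    return max(combined, key=combined.get)
```

For example, two models trained on 10-epoch and 20-epoch noisified data could each report a probability for "RR Lyrae", and the more trusted model would pull the combined value toward its own estimate.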
  • 19. Evaluating and Combining Classifiers
      • Issues when using multiple classifiers:
          • How to combine classifiers when using:
              • weighted classifiers
              • tree-hierarchy of sub-classifiers
          • How to generate final classification “probabilities” when using:
              • Widely varying types of classifiers
              • Classifiers which contain sub-classifications & probabilities
      • Evaluate the final combination of classifiers
          • Classify PTF09xxx user classified sources, determine efficiencies
          • Classify noisified sources, determine efficiencies
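The "determine efficiencies" step could be sketched as below, assuming that efficiency means the fraction of sources of a given true class that the combined classifier labels correctly (per-class recall). The slides do not define the term, so both that definition and the function name are assumptions.

```python
from collections import Counter


def class_efficiencies(true_labels, predicted_labels):
    """Return {class: fraction of its true members classified correctly}."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    totals = Counter(true_labels)
    correct = Counter(
        truth for truth, predicted in zip(true_labels, predicted_labels)
        if truth == predicted
    )
    # Counter returns 0 for classes that were never predicted correctly.
    return {cls: correct[cls] / count for cls, count in totals.items()}
```

The same function applies to both evaluation sets named on the slide: user-classified PTF sources, where the true labels come from astronomers, and noisified sources, where the true label is the class of the original well-sampled light-curve.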