Using Principal Component Analysis to Remove Correlated Signal from Astronomical Images
Kim Scott
National Radio Astronomy Observatory
Data Science Meet-up
February 18, 2014
Galaxy Evolution in One Slide...
Galaxy Surveys – What Are We Missing?
• Optical surveys miss ~50% of star formation in galaxies
• Optical surveys are biased
• Dust re-emits stellar radiation at infrared to millimeter wavelengths (λ ~ 20–2000 μm)
Galaxy Surveys at (Sub)mm Wavelengths
• Atmospheric emission is 1000× stronger than the signal from galaxies
[Figure: extragalactic emission, with transmitted and absorbed portions labeled]
Removing the Atmosphere by
Modulating the Signal in Time
[Figure: detector array and galaxy, shown at time samples i = 1, 2, 3]
• $x_{ij}$: power measured for time sample $i$ on detector $j$
Surveys at λ=1.1mm with AzTEC
ASTE Telescope
AzTEC Dewar
AzTEC Array
(117 detectors)
Raw Time-stream Data

Sample rate = 1∕(15.625 ms)
(20 s = 1280 samples)
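As a quick check of the numbers on this slide, the 15.625 ms sample interval corresponds to a 64 Hz sample rate, so 20 s of data holds 1280 samples per detector. A minimal sketch (the variable names are illustrative and not from the talk):

```python
# Sample interval quoted on the slide, in seconds.
sample_interval_s = 15.625e-3

# Sample rate is the reciprocal of the interval.
sample_rate_hz = 1.0 / sample_interval_s

# Number of samples each detector records in 20 s.
samples_in_20s = round(20 * sample_rate_hz)

print(sample_rate_hz, samples_in_20s)
```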
Principal Component Analysis (PCA)

[Used in supervised learning to compress data: fit to fewer features]
• $x_{ij}$: power measured for time sample $i$ on detector $j$
• $n$ = number of detectors; $m$ = number of time samples
• $X = [\,x_1\ x_2\ \dots\ x_m\,]$ → $n \times m$ matrix

*Only input needed for PCA*
Principal Component Analysis (PCA)
Step 1: Mean normalization (and feature scaling)
• Compute $\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_{ij}$ for each detector
• Compute $\sigma_j^2 = \frac{1}{m-1} \sum_{i=1}^{m} (x_{ij} - \mu_j)^2$ for each detector
• Set $x_{ij} \leftarrow (x_{ij} - \mu_j) / \sigma_j$
• $X = [\,x_1\ x_2\ \dots\ x_m\,]$ → $n \times m$ matrix

[Figure: detector time streams; scale bar 1 mV]

*PCA can identify lower-level correlations among subsets of the detectors*
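Step 1 can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the code used in the talk: the array layout (one row per detector, one column per time sample), the function name `standardize`, and the synthetic data are all hypothetical.

```python
import numpy as np

def standardize(X):
    """Mean-normalize and scale each detector (row) of an n x m array."""
    mu = X.mean(axis=1, keepdims=True)            # mu_j: one mean per detector
    sigma = X.std(axis=1, ddof=1, keepdims=True)  # sigma_j, using the 1/(m-1) form
    return (X - mu) / sigma

# Synthetic stand-in for raw time streams: 5 detectors, 1280 samples (20 s).
rng = np.random.default_rng(0)
n, m = 5, 1280
X = 3.0 + 2.0 * rng.standard_normal((n, m))

Xs = standardize(X)
print(Xs.shape)
```

After this step every detector's time stream has zero mean and unit sample variance, so no single detector dominates the covariance matrix because of its gain.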
Principal Component Analysis (PCA)
Step 2: Calculate covariance matrix
• $C = \frac{1}{m} X X^{T}$
(recall $m$ = # time samples)
• $C$ → $n \times n$ symmetric matrix
(recall $n$ = 117 detectors)
Step 3: Eigendecomposition
• $C = Q \Lambda Q^{-1}$ (*solve using SVD*)
• $Q = [\,q_1\ q_2\ \dots\ q_n\,]$ → $n \times n$ matrix containing eigenvectors $q_i$
• $\Lambda$ → $n \times n$ diagonal matrix containing eigenvalues $\lambda_i = \Lambda_{ii}$
• Principal components = uncorrelated variables
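Steps 2 and 3 can be sketched as below. One way to "solve using SVD" is to decompose X itself: if X = U S Vᵀ, then C = U (S²/m) Uᵀ, so the eigenvectors are the columns of U and the eigenvalues are s²/m. Whether the talk used exactly this route is an assumption, and the synthetic data (one signal shared by all detectors plus independent noise) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 2000                      # hypothetical: 6 detectors, 2000 time samples

# Synthetic data: one signal common to all detectors plus small independent noise.
common = rng.standard_normal(m)
X = common + 0.1 * rng.standard_normal((n, m))

# Step 1: mean normalization and feature scaling, per detector (row).
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)

# Step 2: n x n covariance matrix.
C = (X @ X.T) / m

# Step 3: eigenvectors and eigenvalues of C from the SVD of X.
Q, s, _ = np.linalg.svd(X, full_matrices=False)
lam = s**2 / m                      # eigenvalues, largest first

print(lam / lam.sum())              # fraction of variance in each component
```

In this toy case nearly all of the variance lands in the first component, which is the behavior a strongly correlated atmospheric signal would be expected to show.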
Principal Component Analysis (PCA)
Step 4: Choose number of components to remove
• Goal: choose the fewest components ($k$) to REMOVE most of the observed variance in the data
• $Q_R = [\,q_{k+1}\ q_{k+2}\ \dots\ q_n\,]$ → $n \times (n-k)$ matrix, $k < n$
• $Z = [\,z_1\ z_2\ \dots\ z_m\,] = Q_R^{T} X$ → $(n-k) \times m$ matrix
• To derive model of galaxy intensities on sky, use $Z$ instead of $X$ (but...)
Choosing $k$:
(Variance after PCA, given $k$) / (Variance with average subtraction only) < 0.05
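The criterion for choosing k can be sketched as follows. Several details are assumptions rather than facts from the talk: "average subtraction" is taken to mean subtracting the array average at each time sample, variance is taken as the mean square over all detectors and samples, and the function name `choose_k` and the synthetic data are hypothetical.

```python
import numpy as np

def choose_k(X, threshold=0.05):
    """Smallest k whose residual variance ratio falls below threshold."""
    n, m = X.shape
    s = np.linalg.svd(X, compute_uv=False)     # singular values, largest first
    lam = s**2 / m                             # eigenvalues of C = X X^T / m

    # Baseline (assumed): subtract the array average at each time sample.
    X_avg = X - X.mean(axis=0, keepdims=True)
    var_avg = np.mean(X_avg**2)

    for k in range(1, n):
        # Mean square of the data once the first k components are removed.
        var_pca = lam[k:].sum() / n
        if var_pca / var_avg < threshold:
            return k
    return None                                # criterion never met

# Synthetic data: one signal on all detectors, a second on half of them, plus noise.
rng = np.random.default_rng(3)
n, m = 8, 4000
common = rng.standard_normal(m)
partial = rng.standard_normal(m)
in_subset = (np.arange(n) < 4).astype(float)[:, None]
X = common + 0.5 * in_subset * partial + 0.02 * rng.standard_normal((n, m))
X -= X.mean(axis=1, keepdims=True)             # zero mean per detector

k = choose_k(X)
print(k)
```

The second signal, seen by only a subset of detectors, survives average subtraction but is captured by a second principal component, which is why the toy example needs more than one component.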
Principal Component Analysis (PCA)
Step 5: Reconstruct data without correlated signal
• Know RA/Dec for each detector: need to reconstruct approximation for data to make image
• $X_R = Q_R Z$ → $n \times m$ matrix with correlated signal removed!

[Figure: detector time streams; scale bar changes from 1 mV to 20 μV]

*Variance reduced by factor of 50*
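The projection and reconstruction in Steps 4 and 5 can be sketched as below. This is a minimal illustration on synthetic data, assuming the eigenvectors are taken from the SVD of X; the function name `pca_clean` and the amplitudes are hypothetical and do not reproduce the factor of 50 quoted on the slide.

```python
import numpy as np

def pca_clean(X, k):
    """Remove the k largest principal components from an n x m array X."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)  # columns of U: eigenvectors of X X^T
    Q_R = U[:, k:]     # n x (n - k): eigenvectors kept after dropping the first k
    Z = Q_R.T @ X      # (n - k) x m: data expressed in the reduced basis
    return Q_R @ Z     # n x m: reconstructed in detector space

# Synthetic data: a strong signal common to all detectors plus unit noise.
rng = np.random.default_rng(2)
n, m = 10, 5000
atmosphere = 50.0 * rng.standard_normal(m)
X = atmosphere + rng.standard_normal((n, m))
X -= X.mean(axis=1, keepdims=True)                   # zero mean per detector

X_R = pca_clean(X, k=1)
reduction = X.var() / X_R.var()
print(X_R.shape, reduction)
```

The output keeps the original n × m shape, so each cleaned sample can still be paired with the sky position of its detector when the map is made.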
Image of PKS J1127-1857
Make the map:
• Use information on sky position for each detector at each time sample ($\mathrm{RA}_{ij}$, $\mathrm{Dec}_{ij}$) and bin data onto image grid
• Set the intensity of each image pixel to the average of the $x^{R}_{ij}$ values that fall into that bin
• Smooth image by telescope point-spread response function (Gaussian with FWHM = 30″)
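The binning step can be sketched with `numpy.histogram2d`. This is a simplified illustration: it treats RA and Dec as flat Cartesian coordinates, the function name `bin_map` and the tiny input arrays are hypothetical, and the smoothing is only indicated by converting the 30″ FWHM to a Gaussian sigma.

```python
import numpy as np

def bin_map(ra, dec, values, ra_edges, dec_edges):
    """Average the samples that fall into each pixel of a regular grid."""
    bins = [ra_edges, dec_edges]
    # Sum of sample values per pixel, and number of samples per pixel.
    total, _, _ = np.histogram2d(ra.ravel(), dec.ravel(), bins=bins,
                                 weights=values.ravel())
    hits, _, _ = np.histogram2d(ra.ravel(), dec.ravel(), bins=bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        image = np.where(hits > 0, total / hits, np.nan)  # NaN where no data
    return image, hits

# Tiny example: three samples on a 2 x 2 pixel grid.
ra = np.array([0.5, 0.5, 1.5])
dec = np.array([0.5, 0.5, 0.5])
values = np.array([1.0, 3.0, 5.0])
edges = np.array([0.0, 1.0, 2.0])
image, hits = bin_map(ra, dec, values, edges, edges)

# Gaussian sigma corresponding to the 30 arcsec FWHM quoted on the slide.
fwhm_arcsec = 30.0
sigma_arcsec = fwhm_arcsec / (2.0 * np.sqrt(2.0 * np.log(2.0)))
print(image, sigma_arcsec)
```

Pixels with no samples are left as NaN rather than zero so that unobserved regions are not mistaken for blank sky.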

[Figure panels: Average Subtraction; PCA Cleaned]

• raw data = 30 MB
• $t_{\mathrm{tot}}$ = 4 min
• 16640 samples/detector
An Extragalactic Survey at λ=1.1 mm
• Most galaxies are 100× fainter than PKS J1127-1857
• raw data ~ 25 GB
• $t_{\mathrm{tot}}$ ~ 80 hrs
• ~ $2 \times 10^{7}$ samples/detector
• AzTEC/COSMOS survey
• 0.7 deg²
• 500× area of HUDF
• 160 hrs versus 11 days (~270 hrs) for HUDF
• 130 mm-bright galaxies

Aretxaga et al. 2011
An Extragalactic Survey at λ=1.1 mm
• AzTEC-3
• Observed 1 Gyr after Big Bang
• Starburst galaxy (SFR~1000 Msun/yr)

Capak et al. 2011

