Astronomical Data Compression: Algorithms & Architectures

Rob Seaman – National Optical Astronomy Observatory
William Pence – NASA / Goddard Space Flight Center
Richard White – Space Telescope Science Institute
Séverin Gaudet – National Research Council Canada

See also poster 57, “Optimal Compression Methods for Floating-point Format Images”, Pence, et al.

Astronomical Data Analysis Software & Systems XIX – Sapporo, Japan
  
Agenda

• Overview – Rob
• Tile compression and CFITSIO – Bill
• Experiences with FITS compression in a large astronomical archive – Séverin
• Lossy compression – Rick
• Open discussion
• Door prize!

Thanks to Pete Marenfeld & Koji Mukai

5 October 2009
  
Overview

• FITS tile compression
• Rice algorithm
• CFITSIO / FPACK
• IRAF and community software
• The ubiquity of noise: optimal DN encoding
• The role of sparsity: compressive sensing
• An information theory example
  
  
References

• Too many ADASS presentations to list
• See references within:
    “Lossless Astronomical Image Compression and the Effects of Noise”, Pence, Seaman & White, PASP v121 n878 2009, http://arxiv.org/abs/0903.2140v1
  
FITS tile compression

• ADASS 1999 (Pence, White, Greenfield, Tody)
• FITS Convention v2.1, 2009
• Images mapped onto FITS binary tables
• Headers remain readable
• Tiling permits rapid RW access
• Supports multiple compression algorithms
• First & every copy can be compressed
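
The tiling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it treats each image row as one tile, uses the standard-library `zlib` in place of the convention's algorithms, and keeps the tiles in a plain list rather than a FITS binary table. The function names `compress_tiles` and `read_tile` and the test image are hypothetical.

```python
import struct
import zlib

def compress_tiles(image):
    """Compress each row of a 2-D list of 16-bit integers as its own tile."""
    tiles = []
    for row in image:
        # Big-endian signed 16-bit values, the byte order FITS uses for integers.
        raw = struct.pack(f">{len(row)}h", *row)
        tiles.append(zlib.compress(raw))
    return tiles

def read_tile(tiles, index, width):
    """Decompress a single tile without touching any of the others."""
    raw = zlib.decompress(tiles[index])
    return list(struct.unpack(f">{width}h", raw))

# Hypothetical 4 x 8 test image: a smooth ramp.
image = [[100 * y + x for x in range(8)] for y in range(4)]
tiles = compress_tiles(image)
row2 = read_tile(tiles, 2, width=8)
```

Because every tile is compressed independently, reading or rewriting one row never requires decompressing the whole image, which is the property the "rapid RW access" bullet refers to.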
  
Rice algorithm

• Fast (difference coding)
    – near-optimum compression ratio
    – throughput is key, not just storage volume
• Numerical, not character-based like gzip
• Depends on the pixel values, so BITPIX = 32 data compress to the same size as BITPIX = 16
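
A minimal sketch of difference coding followed by a Rice code, assuming a single fixed parameter `k` and a bit string held as text for readability. It is not the CFITSIO implementation: production coders typically choose `k` adaptively per block of pixels and pack real bits. Storing the first pixel in a fixed 32-bit field is also an assumption made here for simplicity, and the input must be a non-empty list of non-negative integers.

```python
def zigzag(d):
    """Map signed differences 0, -1, 1, -2, ... to 0, 1, 2, 3, ..."""
    return 2 * d if d >= 0 else -2 * d - 1

def unzigzag(u):
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2

def rice_encode(pixels, k):
    """Encode pixel differences with Rice parameter k; returns a '0'/'1' string."""
    bits = [format(pixels[0], "032b")]        # first pixel stored verbatim
    prev = pixels[0]
    for p in pixels[1:]:
        u = zigzag(p - prev)
        prev = p
        q, r = u >> k, u & ((1 << k) - 1)
        bits.append("1" * q + "0")            # quotient in unary
        if k:
            bits.append(format(r, f"0{k}b"))  # remainder in k bits
    return "".join(bits)

def rice_decode(bits, k, count):
    """Recover `count` pixels from a string produced by rice_encode."""
    prev = int(bits[:32], 2)
    pixels, pos = [prev], 32
    for _ in range(count - 1):
        q = 0
        while bits[pos] == "1":
            q += 1
            pos += 1
        pos += 1                              # skip the terminating '0'
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        prev += unzigzag((q << k) | r)
        pixels.append(prev)
    return pixels

pixels = [1000, 1002, 1001, 1005, 1003, 1004]
encoded = rice_encode(pixels, k=2)
decoded = rice_decode(encoded, k=2, count=len(pixels))
```

The encoder only ever sees the numerical differences between neighbouring pixels, so the same values held in a 16-bit or a 32-bit array produce the same code lengths for every difference.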
  
CFITSIO / FPACK

• fpack can be swapped in for gzip & funpack for gunzip
• Library support (e.g., CFITSIO) allows jpeg-like access – compression built into the format
• More options means more parameters – setting appropriate defaults is key
  
IRAF and community software

• Tile compression can & should be supported by all software that reads FITS
• Instrument and pipeline software may benefit strongly from writing compressed FITS
• Transport & storage always benefit
• IRAF fitsutil package in beta testing
• Work on a new IRAF FITS kernel pending
• VO applications and services
  
The ubiquity of noise

• Noise is incompressible
• Signals are correlated
    – physically
    – instrumentally
• Shannon entropy: H = – Σ p log p
    – depends only on the probabilities of the states
    – measures “irreducible complexity” of the data
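
As a small worked example of the entropy formula, the sketch below estimates H in bits per sample from observed frequencies. The function name `shannon_entropy` and the two toy data sets are illustrative choices, not part of the talk.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Entropy in bits per sample, H = -sum(p * log2(p)), from observed frequencies."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

flat = shannon_entropy([7] * 16)             # a single state: nothing to encode
uniform = shannon_entropy([0, 1, 2, 3] * 4)  # four equally likely states
```

Only the probabilities of the states enter the result; the actual pixel values (7, or 0 to 3) play no role.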
  
Optimal DN encoding

• CCD “square-rooting”
• Variance stabilization, more generally
    – many statistical methods assume homoscedasticity
    – generalized Anscombe transform
• Foundations of the empirical world view:
    – ergodicity (statistical homogeneity)
    – Markov processes (memoryless systems)
• http://www.aspbooks.org/publications/411/101.pdf
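
Variance stabilization can be illustrated with the plain Anscombe transform, 2·sqrt(x + 3/8), applied to simulated Poisson counts. This is a sketch under assumptions: it uses the simple Poisson-only form rather than the generalized transform named on the slide (which also accounts for gain and read noise), and the Poisson generator, the two mean levels, and the sample size are arbitrary choices for the demonstration.

```python
import math
import random
import statistics

def poisson(lam, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def anscombe(x):
    """Anscombe transform: roughly unit variance for Poisson counts."""
    return 2.0 * math.sqrt(x + 3.0 / 8.0)

rng = random.Random(0)
variances = {}
for lam in (10, 40):
    counts = [poisson(lam, rng) for _ in range(20000)]
    variances[lam] = (
        statistics.pvariance(counts),                         # grows with the signal
        statistics.pvariance([anscombe(c) for c in counts]),  # stays near 1
    )
```

The raw variance tracks the mean level, while the transformed variance stays close to one at both levels, so a single quantization step suits the whole dynamic range.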
  
The role of sparsity

• For most astronomical data, compression ratio depends only on the background noise
    – Sparse signals are negligible (in whatever axes)
    – Noise is incompressible

    R = BITPIX / (Nbits + K)
    K is about 1.2 for Rice
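
The ratio formula is simple enough to evaluate directly. The sketch below assumes the slide's approximate K = 1.2 for Rice; the function name `compression_ratio` and the sample noise levels are illustrative.

```python
def compression_ratio(bitpix, nbits, k=1.2):
    """R = BITPIX / (Nbits + K); K = 1.2 is the approximate Rice value quoted above."""
    return bitpix / (nbits + k)

# Predicted ratios for 16-bit images at a few noise levels.
ratios = {nbits: compression_ratio(16, nbits) for nbits in (2, 4, 8)}

# Same noise in a 32-bit container: the ratio doubles, the compressed size does not change.
same_size = compression_ratio(32, 4) == 2 * compression_ratio(16, 4)
```

The denominator Nbits + K is the compressed size per pixel, and it does not contain BITPIX, which is why a wider container changes the ratio but not the number of bytes written.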
  
Compression ratio

[Figure: Mosaic II data, compression ratio R versus noise bits; 16 Dec 2008]

• Compression correlates closely with noise
• Distinctive functional behavior
• For three very different compression algorithms
• For flat-field and bias exposures as well as for science data
• That is, for pictures of:
    – the sky
    – a lamp in the dome
    – no exposure at all
• Signal doesn’t matter!




A better compression diagram

[Figure: compressed bits per pixel (left axis) and compression ratio R (right axis) versus Nbits of noise; curves for GZIP, Rice and Hcompress; region labeled “Margin for improvement”]

• R = BITPIX / (Nbits + K)
• Lossy algorithms shift left to move down
• Effective BITPIX:
    BITEFF = BITPIX / R
    BITEFF = Nbits + K
• Line with:
    Slope = 1
    Intercept = K

Fig. 1. Plot of compressed bits per pixel versus the number of noise bits in 16-bit synthetic images. The solid lines are generated from the images that have N bits of pure noise, and the

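
A rough way to reproduce the shape of such a diagram is to compress synthetic 16-bit images whose low-order bits are pure noise. The sketch below is an assumption-laden stand-in: it uses the standard-library `zlib` (the DEFLATE method behind gzip) rather than Rice or Hcompress, a small pixel count, and uniform random bits, so its numbers are not those of the figure. The function name `bits_per_pixel` is hypothetical.

```python
import random
import struct
import zlib

def bits_per_pixel(noise_bits, npix=4096, seed=0):
    """Compress npix 16-bit pixels whose low `noise_bits` bits are uniform random."""
    rng = random.Random(seed)
    pixels = [rng.getrandbits(noise_bits) for _ in range(npix)]
    raw = struct.pack(f">{npix}H", *pixels)   # big-endian unsigned 16-bit
    return 8 * len(zlib.compress(raw, 9)) / npix

results = {n: bits_per_pixel(n) for n in (2, 6, 10)}
for n, biteff in results.items():
    # BITEFF = BITPIX / R = Nbits + K, so K is the overhead above the noise floor.
    print(f"Nbits={n:2d}  BITEFF={biteff:5.2f}  R={16 / biteff:4.2f}  K={biteff - n:4.2f}")
```

The compressed size per pixel rises with the number of noise bits and never drops below it; the gap between the two is the K of the formula for this particular coder.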

Compressive sensing

• Real world data are often sparse (correlated)
• Nyquist/Shannon sampling applies broadly
• But we can do even better if we sample against purpose-specific axes:
    http://www.dsp.ece.rice.edu/cs
    http://nuit-blanche.blogspot.com
• Herschel proof of concept, Starck, et al.
• CS is about the sampling theorem
• Optimal encoding is about quantization
  
An information theory example

http://www.mapsofconsciousness.com/12coins
  
Compression = optimal representation

A. 11 coins all the same
   + 1 coin, identical except for weight
B. Scale to weigh groups of coins
C. In only 3 steps, must identify:
   the coin that is different and
   whether it is light or heavy

“The 12-balls Problem as an Illustration of the Application of Information Theory” – R.H. Thouless, 1970, Math. Gazette, v54n389.
  
How to solve a problem

• First, define the problem
    – second, entertain solutions
    – third, iterate (don’t give up)
• More basic yet, what is the goal?
    – to solve the problem?
    – or to understand how to solve it?
• Stating a problem constrains its solutions
  
What do we know?

• One bit discriminates two equally likely alternatives
    To select between N equal choices, Nbits = log2 N
• For 12-coin problem, Nbits = log2 (12) + 1 = log2 24
    (must also distinguish light vs. heavy)
• Information provided in each measurement is log2 3
    (3 positions for scale: left, right, balanced)
• For three weighings, Wbits = log2 3^3 = log2 27
    Meets necessary condition that Wbits >= Nbits
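
The bit counting above can be checked numerically. This short sketch just evaluates the two logarithms; the variable names are illustrative.

```python
import math

coins = 12
hypotheses = coins * 2           # which coin, and whether it is light or heavy
nbits = math.log2(hypotheses)    # information needed: log2 24
wbits = 3 * math.log2(3)         # three weighings, three outcomes each: log2 27

print(f"Nbits = {nbits:.3f}, Wbits = {wbits:.3f}, necessary condition met: {wbits >= nbits}")
```

The margin is small: 27 possible outcome sequences for 24 hypotheses, so almost none of the scale's information can be wasted.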
  
Necessary, but not sufficient

• A strategy is also necessary such that
    Wbits >= Nbits (remaining)
  is satisfied at each step to the solution
• Nbits is the same thing as the entropy H
    H = – Σ p log p, where p = 1/N
      = – Σ (1/N) log (1/N) = (Σ (1/N)) log N = log N
    H = log2 N (in bits)
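
The per-step condition already constrains the first weighing. The sketch below counts the hypotheses left after placing k coins on each pan and compares them with what two further weighings can separate (3^2 = 9 cases). The function name and the framing as a first-weighing check are illustrative assumptions, not taken from the slides.

```python
def outcomes_after_first_weighing(k, coins=12):
    """Hypotheses left for each scale result when k coins are placed on each pan."""
    tilt = 2 * k                    # a heavy coin on the low pan or a light coin on the high pan
    balanced = 2 * (coins - 2 * k)  # the odd coin is off the scale, light or heavy
    return {"left": tilt, "right": tilt, "balanced": balanced}

remaining_capacity = 3 ** 2         # two weighings left can separate at most 9 cases
feasible = [k for k in range(1, 7)
            if max(outcomes_after_first_weighing(k).values()) <= remaining_capacity]
```

Only four coins per pan leaves 8, 8 and 8 hypotheses, all within the capacity of the remaining two weighings; every other split leaves at least one branch with more than 9.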
  
What else do we know?

• Physical priors!
    – only one coin is fake
    – astronomical data occupy sparse phase space
• FITS arrays = images (physical priors)
    – of astrophysical sources
    – taken through physical optics
    – recorded by physical electronics
    – digitization is restricted by information theory
    – possessing a distinctive noise model
  
Observations about observations

• The sequence of three measurements can occur in any order
• The systematization of the solution occurs during its definition, not at run time
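
One way to see both observations is to fix all three weighings in advance and check that the 24 possible cases give 24 different sets of readings. The particular plan below is one such fixed plan chosen for illustration; it is an assumption of this sketch and is not taken from the slides or the linked game.

```python
from itertools import product

# A fixed (non-adaptive) plan: (left pan, right pan) for each weighing. Illustrative choice.
PLAN = [
    ((1, 2, 3, 10), (4, 5, 6, 11)),
    ((1, 2, 3, 11), (7, 8, 9, 10)),
    ((1, 4, 7, 10), (2, 5, 8, 12)),
]

def weigh(left, right, weights):
    """Return 'L' if the left pan is heavier, 'R' if the right pan is, else 'B'."""
    diff = sum(weights[c] for c in left) - sum(weights[c] for c in right)
    return "L" if diff > 0 else "R" if diff < 0 else "B"

def signature(odd_coin, delta):
    """Readings of the three weighings when `odd_coin` is off by `delta`."""
    weights = {c: 10 for c in range(1, 13)}
    weights[odd_coin] += delta
    return "".join(weigh(left, right, weights) for left, right in PLAN)

# Lookup table built when the plan is defined, before any weighing is made.
table = {signature(c, d): (c, "heavy" if d > 0 else "light")
         for c, d in product(range(1, 13), (+1, -1))}
```

Since the answer is a table lookup on the three readings, the weighings can be performed in any order, and all of the reasoning happens when the plan is written down rather than while it is carried out.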
  
Try it yourself

http://heasarc.gsfc.nasa.gov/fitsio/fpack
http://www.mapsofconsciousness.com/12coins
  
