Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.

Histogram Equalized Heat Maps from Log Data via Apache Spark with Arvind Rao

598 Aufrufe

Veröffentlicht am

Reverse geocoding is one of HERE Technologies most heavily used services. From its access logs geocodes can be extracted and then counted with respect to some cellulation of the earth–-creating a sparse heat map. For our Place & Address Search products we use such a heat map to define a notion of relative place importance to rank and index addresses and places. However, the large data size, sparsity, and variations in traffic from which different global heat maps may be derived, makes faithful visualization and comparison a challenge. Additionally, common implementations of spatial image processing techniques that can help address the aforementioned challenges don’t map directly onto Spark’s computing engine.

In this talk Arvind Rao will describe implementations of histogram equalization and kernel-based sparse image processing methods on Spark. Histogram equalization, which is best known as a method of contrast enhancement, automatically normalizes images, facilitating comparison. Along the way, Arvind will talk about how HERE uses heat maps as a feature in their autocompletion service, and say just enough about perception of contrast to put histogram equalization in context.

Veröffentlicht in: Daten & Analysen
  • Als Erste(r) kommentieren

Histogram Equalized Heat Maps from Log Data via Apache Spark with Arvind Rao

  1. 1. Arvind Rao, HERE Technologies @cwcomplex Spatial Processing of Global Heat Maps with Apache Spark #EUds13
  2. 2. Agenda • Definition of the heat map – How its built from access logs • Contrast enhancement – Description of histogram equalization • Kernel based image processing – Gaussian blur. Motivated by noise removal 2@cwcomplex #EUds13
  3. 3. 3@cwcomplex #EUds13
  4. 4. 4@cwcomplex #EUds13
  5. 5. Heat Map is a Spatial Histogram • The earth is partitioned into cells – Defined by fixed latitude & longitude increments • Reverse geocodes within each cell are summed 5@cwcomplex #EUds13
  6. 6. Aggregation of Request Coordinates 6@cwcomplex #EUds13
  7. 7. Caveat: Cell Areas Vary by Latitude 7@cwcomplex #EUds13 India Europe
  8. 8. Definition & Data • Not a conventional image – Not acquired by a camera or sensor – ~ 6.5 trillion tiles • High Resolution – For perspective a 4k display has ~ 8 million pixels – By comparison the heatmap is ~ 1 million times more detailed 8@cwcomplex #EUds13
  9. 9. How is it used in HERE Search 9@cwcomplex #EUds13
  10. 10. The Histogram is SUPER Skewed 10@cwcomplex #EUds13
  11. 11. The Histogram is SUPER Skewed 11@cwcomplex #EUds13
  12. 12. Loose Description of Histogram Equalization (HE) @cwcomplex #EUds13 12
  13. 13. Loose Description of Histogram Equalization (HE) Literally mapping data to percentile rank. @cwcomplex #EUds13 13
  14. 14. Loose Description of HE 𝑙 ↦ 𝑃(𝐿 < 𝑙)×𝑁 Max Pixel Value @cwcomplex #EUds13 14
  15. 15. Loose Description of HE T @cwcomplex #EUds13 15
  16. 16. Log Scaled vs. Histogram Normalized @cwcomplex #EUds13 16
  17. 17. HE – Well- Defined Comparison @cwcomplex #EUds13 17
  18. 18. HE Implementation 18@cwcomplex #EUds13 Heat Map DF Row(latitude, longitude, count) Map Row(latitude, longitude, density) Row(latitude, longitude, CDF(density) × N) Map
  19. 19. Cumulative Distribution Function 19@cwcomplex #EUds13 Row(latitude, longitude, density) Histogram Row(density, count) group by on density collect locally & accumulate CDF Array of (density, count) CDF Broadcast of CDF Array
  20. 20. HE Implementation Binary search finds closest left bin to given density in CDF. Then linear interpolate between left and right bins 20@cwcomplex #EUds13
  21. 21. Some Things To Know • May increase the contrast of background noise, while decreasing the usable signal • HE often best in scientific images, like thermal, satellite, or X-rays @cwcomplex #EUds13 21
  22. 22. Spatial Processing • Transform image pixel by a function of neighborhood pixel values • Examples of kernel based methods: – Mean, median, Gaussian, etc. – Also useful in edge-detection & segmentation 22@cwcomplex #EUds13
  23. 23. Spatial Processing 23@cwcomplex #EUds13 neighborhood pixel of interest Imperative solution: Slide kernel window over image • Iterate pixel by pixel • Each pixel knows its neighborhood.
  24. 24. Spatial Processing – Kernel for Blur Gaussian kernel with 𝜎 = 1, 𝐻 𝑥, 𝑦 = 1 273 1 4 7 4 16 26 4 1 16 4 7 26 41 4 1 16 4 26 7 26 7 16 4 4 1 Transformed image, 𝐼7(𝑥, 𝑦) = 8 𝐻(Δ𝑥,Δ𝑦)𝐼(𝑥 + Δ𝑥, 𝑦 + Δ𝑦) ;<,;= 24@cwcomplex #EUds13 Implemented with a GroupBy kernel application Implemented with a GroupBy kernel application
  25. 25. Spatial Processing Functional approach: process each pixel independently. Sketch of Algorithm: 1. Explode heat map to DataFrame of pointers from new pixel indices to all non-null neighbors in the original image 2. Map over exploded DataFrame to apply kernel 3. GroupBy on pixel indices (latitude & longitude) 25@cwcomplex #EUds13
  26. 26. Spatial Processing 26@cwcomplex #EUds13 Exploding heat map DataFrame gives all indices of pixels in the transformed image.
  27. 27. Spatial Processing 27@cwcomplex #EUds13 The smoothed image is larger and less sparse.
  28. 28. Spatial Processing 28@cwcomplex #EUds13 Being a neighbor is reciprocal.
  29. 29. Spatial Processing 29@cwcomplex #EUds13 Should think of rows in the exploded DataFrame as a pointer from any pixel index in the image domain to its non-null neighbor from input image. Sparsity guarantees that a pixel index exists in exploded DataFrame iff it has a non-null neighbor.
  30. 30. Spatial Processing 30@cwcomplex #EUds13 𝚫x 𝚫𝐲 nb-x nb-y originating value 0 0 34.23 120.56 120 1 -1 34.24 120.55 120 2 -2 34.25 120.54 120
  31. 31. Spatial Processing Sketch of Algorithm 1. Explode heat map to DataFrame of pointers from new pixel indices to all non-null neighbors in the original image. 2. Map over exploded DataFrame to apply kernel. 3. GroupBy on pixel indices (latitude & longitude) 31@cwcomplex #EUds13
  32. 32. Frankfurt – Without & With Blurring 32@cwcomplex #EUds13
  33. 33. Berlin – Without & With Blurring 33@cwcomplex #EUds13
  34. 34. Arvind Rao, HERE Technologies @cwcomplex Take Care! #EUds13