Talk at the EarthCube End-User Domain Workshop for Rock Deformation and Mineral Physics Research.
By Martin Kunz, Lawrence Berkeley National Laboratory
1. Towards real-time analysis of large data volumes for synchrotron experiments
Martin Kunz, Nobumichi Tamura
Advanced Light Source, Lawrence Berkeley National Lab
2. Towards real-time analysis of large data volumes for synchrotron experiments
Acknowledgements
- Jack Deslippe, David Skinner (NERSC)
- Abdelilah Essiari, Craig E. Tull (LBNL-CRD)
- Eli Dart (ESNET)
- Dula Parkinson (LBNL – ALS)
3. Towards real-time analysis of large data volumes for synchrotron experiments
X-rays and the Earth sciences - the story of a moving bottleneck:
1960s / 1970s
X-ray Source
X-ray Detectors
Henry Levy with Picker 5-circle and PDP-5
Data Analysis
Publication
4. Towards real-time analysis of large data volumes for synchrotron experiments
X-rays and the Earth sciences - the story of a moving bottleneck:
1980s / 1990s
X-ray Source
X-ray Detectors
1995: “MD Storm”: Readout time: 45 minutes
Data Analysis
Publication
5. Towards real-time analysis of large data volumes for synchrotron experiments
X-rays and the Earth sciences - the story of a moving bottleneck:
2000s / 2010s
X-ray Source
X-ray Detectors
Data Analysis
Publication
6. Towards real-time analysis of large data volumes for synchrotron experiments
X-rays and the Earth sciences - the story of a moving bottleneck:
Future:
X-ray Source
X-ray Detectors
Interactive access to supercomputers
Data Analysis
Publication
7. Towards real-time analysis of large data volumes for synchrotron experiments
Examples of mineral physics related experiments with high data rates:
1) In situ powder diffraction with automated P-T stepping:
ALS BL 12.2.2 with Perkin Elmer detector (~0 read-out delay)
http://www.ltp-oldenburg.de
Data rates on the order of thousands of frames per day (i.e. tens of GB/day)
8. Towards real-time analysis of large data volumes for synchrotron experiments
Examples of mineral physics related experiments with high data rates:
2) Micro-diffraction / phase/orientation/strain-mapping at high spatial resolution
Micro-diffraction set-up at ALS beamline 12.3.2 with
Pilatus-1M detector.
Left: Distribution of Re3N (black) and Re (blue) grown in a laser-heated DAC
Right: Relative orientation of Re3N grains.
Source: Friedrich et al. (2010), PRL (105), 085504.
Data rates on the order of tens of thousands of frames per day (i.e. hundreds of GB/day)
9. Towards real-time analysis of large data volumes for synchrotron experiments
Examples of mineral physics related experiments with high data rates:
3) Tomography: 3D mapping of geo-materials:
Tomography set-up at ALS beamline 8.3.2
Supercritical CO2 penetrating sandstone on ALS BL 8.3.2 (courtesy J. Ajo-Franklin)
Distribution of Fe-alloy melt prepared at 64 GPa, measured at SSRL. Shi et al. (2013), Nature Geoscience. DOI: 10.1038/NGEO1956
Data rates on the order of hundreds of thousands of frames per day (i.e. TBs/day)
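The three regimes above can be turned into a rough back-of-envelope byte count. The frame sizes and daily frame counts below are illustrative assumptions (16- or 32-bit detectors of the quoted pixel counts), not measured beamline values:

```python
# Back-of-envelope conversion of frames/day into GB/day for the three
# experiment types above. Frame sizes and daily frame counts are rough
# illustrative assumptions, not measured beamline values.
experiments = {
    # name: (bytes per frame, frames per day)
    "powder diffraction (Perkin Elmer, 2048x2048, 16-bit)": (2048 * 2048 * 2, 2_000),
    "micro-diffraction (Pilatus 1M, ~1M pixels, 32-bit)": (981 * 1043 * 4, 50_000),
    "tomography (2048x2048 camera, 16-bit)": (2048 * 2048 * 2, 500_000),
}

for name, (frame_bytes, frames_per_day) in experiments.items():
    gb_per_day = frame_bytes * frames_per_day / 1e9
    print(f"{name}: ~{gb_per_day:,.0f} GB/day")
```

With these assumptions the three experiment types land at roughly tens of GB, hundreds of GB, and several TB per day, matching the orders of magnitude quoted on the slides.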
10. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
1) Not-quite-real-time - local cluster for micro-diffraction analysis
- 24 dual-socket AMD Opteron 248 2.2 GHz processor nodes (48 CPUs)
- 48 GB aggregate memory
- 14 TB shared disk storage
- Gigabit Ethernet interconnect
- 212 GFLOPS (theoretical peak)
11. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
1) Not-quite-real-time - local cluster for micro-diffraction analysis
1) User tunes parameters manually on some ‘typical’ patterns
12. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
1) Not-quite-real-time - local cluster for micro-diffraction analysis
1) Analysis parameters are written into an instruction file
14. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
1) Not-quite-real-time - local cluster for micro-diffraction analysis
2) Launch parsing script:
-> reads the instruction file and distributes the data files across the available CPUs
-> writes batch files that manage the individual CPUs
-> launches the analysis software on each node
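The steps above can be sketched as a small dispatch script. Everything here - the file names, the `analyze` command, launching via ssh - is a hypothetical illustration of the workflow, not the actual ALS cluster scripts:

```python
# Hypothetical sketch of the dispatch script described above: read the
# instruction file with the user's analysis parameters, split the frame
# list across the nodes, write one batch file per node, and launch the
# analysis software on each. All names are illustrative.
import subprocess

def dispatch(instruction_file, frames, nodes, dry_run=True):
    # 1) read the analysis parameters the user tuned manually
    with open(instruction_file) as f:
        params = f.read()
    # 2) parse the data files onto the available CPUs (round-robin split)
    chunks = [frames[i::len(nodes)] for i in range(len(nodes))]
    batch_files = []
    for node, chunk in zip(nodes, chunks):
        # 3) write a batch file that manages this node's share of frames
        batch = f"batch_{node}.txt"
        with open(batch, "w") as f:
            f.write(params + "\n" + "\n".join(chunk) + "\n")
        batch_files.append(batch)
        # 4) launch the analysis software on the node (hypothetical command)
        if not dry_run:
            subprocess.Popen(["ssh", node, "analyze", batch])
    return batch_files
```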
15. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
1) Not-quite-real-time - local cluster for micro-diffraction analysis
3) Results are written to a single file which can be viewed, further analyzed, and published:
Relative lattice orientation: gives the domain structure. The full color range (blue to red) corresponds to 4 degrees of rotation.
Average intensity: gives the high-resolution fine structure of the grain.
16. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
2) Real time – collaboration with National Energy Research Scientific Computing Center (NERSC)
(in development)
1) Data are sent directly to NERSC for analysis and storage during data collection.
Data are packaged:
- After every n images, a 'trigger file' is deposited in a directory that is monitored by NERSC.
- A SPADE web app wraps the data (512 files at a time) in HDF5 (Hierarchical Data Format) and ships them to NERSC over a Gigabit line (to be upgraded to a 10 G line).
- At NERSC, the data are received by a SPADE instance, which places them in the target folder and on tape and sends an acknowledgment.
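The trigger-file mechanism above can be sketched as a simple polling watcher. SPADE's actual interface, the HDF5 wrapping, and the transfer/acknowledgment protocol are abstracted into a `ship()` stand-in; all names and the batch size of 4 (512 in production) are illustrative:

```python
# Hypothetical sketch of the trigger-file mechanism described above:
# after every n images the beamline drops a trigger file; a watcher
# notices it, bundles the accumulated frames (512 at a time in the real
# system, wrapped in HDF5 by SPADE), and ships the bundle. SPADE and
# the HDF5 wrapping are abstracted into ship(); names are illustrative.
import os

BATCH_SIZE = 4          # 512 in the production system
TRIGGER = "transfer.trigger"

def ship(bundle):
    # Stand-in for SPADE: wrap `bundle` in HDF5, transfer it to NERSC,
    # and wait for the receiving SPADE instance to acknowledge.
    print(f"shipping {len(bundle)} frames")

def watch_once(directory):
    """One polling pass: if the trigger file exists, ship complete batches."""
    if not os.path.exists(os.path.join(directory, TRIGGER)):
        return 0
    frames = sorted(f for f in os.listdir(directory) if f.endswith(".tif"))
    shipped = 0
    while len(frames) >= BATCH_SIZE:
        batch, frames = frames[:BATCH_SIZE], frames[BATCH_SIZE:]
        ship(batch)
        for f in batch:
            os.remove(os.path.join(directory, f))   # frames now live at NERSC
        shipped += len(batch)
    os.remove(os.path.join(directory, TRIGGER))     # trigger has been consumed
    return shipped
```

A real deployment would run this in a loop (or use filesystem notifications) and only delete local frames after the remote acknowledgment arrives.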
17. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
2) Real time – collaboration with National Energy Research Scientific Computing Center (NERSC)
(in development)
1) Data are sent directly to NERSC for analysis and storage during data collection: up and running
Transfer control is web-based
21. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
2) Real time – collaboration with National Energy Research Scientific Computing Center (NERSC)
(in development)
2) Analysis parameters are set up with a web app - under development
Jobs are launched manually by the user via the same web page.
Test runs indicate an analysis time on the order of the data collection time;
the analysis can in principle run synchronously with data collection.
22. Towards real-time analysis of large data volumes for synchrotron experiments
How do we tackle this at the ALS?
2) Real time – collaboration with National Energy Research Scientific Computing Center (NERSC)
(in development)
3) Analysis jobs are executed on Carver - under development
Carver is an IBM iDataPlex cluster
- 1202 nodes with a total of 9984 processor cores
- 106 Tflop/sec peak performance
- largest allocated parallel job is 512 cores
23. Towards real-time analysis of large data volumes for synchrotron experiments
Summary:
- Data analysis is the new bottleneck limiting progress in many aspects of experimental mineral physics.
- Real-time analysis with immediate feedback is increasingly important in experimental mineral physics.
- These challenges cannot always be met with traditional desktop machines: software has to be automated and parallelized, and collaborations with supercomputing centers are becoming important for experimental scientists as well (at least for a few more iterations of Moore's law).
- Data analysis on supercomputers, remotely controlled through web applications, is a very promising avenue that allows big-data methods to enter mineral physics.
- Future developments may (must?) evolve away from supercomputers toward highly parallelized (GPU-based) local computers and/or cloud computing.
Editor's notes
I would like to start off by giving a brief, slightly personalized historical perspective on the application of X-rays in mineral physics research:
X-rays have been applied in the Earth sciences on a routine basis for about 50 years; this story thus pretty much parallels my life. In the 1960s and 1970s, when I was just learning how to spell "X-ray", the first automated diffractometers replaced fully manual film techniques. The brightness of the X-rays available in those days limited a powder or single-crystal data collection to days and weeks.
This changed most dramatically with the advent of dedicated light sources, in particular high-energy third-generation sources such as the ESRF in Grenoble, where the first dedicated mineral physics beamline, ID30, was built. I had meanwhile managed to spell "X-rays" and thus was fortunate enough to be involved in the early days of that beamline. The brilliance of the ID30 undulator enabled experiments through a diamond anvil cell to be performed in a matter of seconds. However, each data point required the physical transport of a 1 x 1 ft image plate to the one and only IP reader on the floor, plus a read-out time of about 45 minutes. Sadly, the tremendous increase in brightness and flux of the X-ray sources could therefore only be utilized in a limited way.
Another twenty years later - the age-appropriate number of light sources meanwhile no longer fits on my birthday cake - we hail the advent of ultra-fast, ultra-low-noise direct-detection X-ray detectors such as the Perkin-Elmer or the Pilatus, which in principle allow data-point rates of up to 30 Hz. This opens the possibility of very large data rates. However, our ability to deal with these data is largely still at the level of high-end desktops and serial workflow software. The opportunity offered by the combination of ever brighter light sources and fast detectors, i.e. to apply big-data methods to mineral physics research, can therefore not be fully harnessed.
The way out of this bottleneck is to automate and parallelize the analysis workflow using - at least for the time being - massively parallel supercomputers. This is the approach we are presently taking at the Advanced Light Source in collaboration with the National Energy Research Scientific Computing Center.
Let me quickly give you three examples of the orders of magnitude of the data rates we have to deal with:
Intense X-rays and fast detectors, coupled with programmable temperature and pressure changes, allow a much denser coverage of the P-V-T surface and thus a much better description of the thermo-elastic properties of Earth materials and their phase transitions.
Mineral physics experiments involving very high temperatures and pressures invariably force us to deal with large spatial and temporal gradients of pressure, temperature, and chemical composition. High spatial or temporal resolution is therefore needed to explore these inhomogeneities. Fast detectors and bright X-rays thus allow us to collect spatially and/or temporally highly resolved maps of our samples.
Going beyond diffraction, various flavors of tomographic techniques now allow us to create three-dimensional images of samples in and ex situ, if needed even with chemical or phase selectivity. Such experiments …
This solution works fairly well with medium-sized datasets of up to 10,000 frames. With larger data volumes and/or tricky data, analysis even on a 48-CPU cluster can take much longer than the data collection itself.