Supervised Learning models to analyze X-Ray Diffraction data
Location: Empa, Thun and University of Bern
If you are interested in this topic or have further questions, do not hesitate to contact kaspar.riesen@unibe.ch.
Background / Context
The Laboratory for Mechanics of Materials and Nanostructures at Empa (Thun) is seeking a creative student to apply Supervised Machine Learning to X-Ray Diffraction (XRD) data analysis.
To characterize new materials, we analyze their structural “fingerprints” using XRD. While crystalline materials with periodic long-range order produce sharp, distinct peaks, amorphous materials (like metallic glasses) with no periodic long-range order produce broad, diffuse signals. However, we often encounter mixed XRD signals: materials that are partly crystalline and partly amorphous, or (more difficult to discern) partly poorly crystalline and partly fully amorphous. Distinguishing the crystalline vs. amorphous nature of materials is critical for our understanding of structure-property relationships.
Our high-throughput experimental equipment generates hundreds of patterns, but manual labeling is slow and does not scale. Furthermore, training robust Machine Learning models requires thousands of diverse examples, and collecting and labeling such a dataset experimentally is tedious and introduces subjective bias. Instead of relying solely on limited experimental data, you will therefore generate synthetic training data.
Your primary goal is to develop a Python-based generator that uses physical laws to create millions of synthetic XRD patterns. You will then use this massive synthetic dataset to train a Supervised Learning model that is robust enough to analyze real-world experimental data.
Objectives and Approach
Synthetic Data Generator
- You will write a Python pipeline to mathematically simulate realistic XRD patterns.
- Crystalline Phase: Generate Bragg peaks using crystallographic rules, varying peak positions, widths (crystallite size), and intensities.
- Amorphous Phase: Model the “amorphous hump” (diffuse scattering) using radial distribution functions or broad-peak approximations.
- Mixed phases: Implement advanced artifacts to make the data realistic, such as:
- Texture: Simulating preferred orientation (e.g., enhancing specific peaks while suppressing others).
- Poor Crystallinity: Bridging the gap between sharp peaks and amorphous humps (broadened diffraction lines).
- Signal Noise: Adding noise, background fluorescence, and detector artifacts.
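To give a feeling for the kind of generator meant here, the following is a minimal sketch only, not a prescribed design. It assumes Cu K-alpha radiation, Gaussian peak shapes, Bragg's law for the peak positions, the Scherrer equation for size broadening, and a single broad Gaussian as a stand-in for the amorphous hump. All function names, parameters, and default values are hypothetical; texture, background fluorescence, and detector artifacts are left out.

```python
import numpy as np

WAVELENGTH = 1.5406  # angstrom; Cu K-alpha is an assumption, not given in the posting


def bragg_two_theta(d_spacings, wavelength=WAVELENGTH):
    """Peak positions in degrees 2-theta from Bragg's law, lambda = 2 d sin(theta).

    Assumes every d-spacing is larger than wavelength / 2.
    """
    d = np.asarray(d_spacings, dtype=float)
    return 2.0 * np.degrees(np.arcsin(wavelength / (2.0 * d)))


def scherrer_fwhm(two_theta_deg, size_nm, wavelength=WAVELENGTH, k=0.9):
    """Peak FWHM in degrees 2-theta from the Scherrer equation."""
    theta = np.radians(np.asarray(two_theta_deg, dtype=float)) / 2.0
    size_angstrom = size_nm * 10.0
    return np.degrees(k * wavelength / (size_angstrom * np.cos(theta)))


def gaussian(x, center, fwhm):
    """Unit-height Gaussian with the given full width at half maximum."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)


def simulate_pattern(two_theta, d_spacings, intensities, size_nm,
                     crystalline_fraction, hump_center=40.0, hump_fwhm=10.0,
                     noise_level=0.01, rng=None):
    """Mix Bragg peaks and one broad hump, add Gaussian noise, scale to max 1."""
    rng = np.random.default_rng() if rng is None else rng
    centers = bragg_two_theta(d_spacings)
    widths = scherrer_fwhm(centers, size_nm)

    crystalline = np.zeros_like(two_theta, dtype=float)
    for center, width, intensity in zip(centers, widths, intensities):
        crystalline += intensity * gaussian(two_theta, center, width)
    amorphous = gaussian(two_theta, hump_center, hump_fwhm)

    # Normalize each part by its summed intensity so that the mixing
    # ratio refers to integrated intensity rather than peak height.
    if crystalline.sum() > 0:
        crystalline /= crystalline.sum()
    amorphous /= amorphous.sum()

    signal = (crystalline_fraction * crystalline
              + (1.0 - crystalline_fraction) * amorphous)
    noisy = signal + noise_level * signal.max() * rng.normal(size=signal.shape)
    return noisy / noisy.max()


# Usage: three reflections with d-spacings close to those of fcc aluminium.
two_theta = np.linspace(10.0, 90.0, 2048)
pattern = simulate_pattern(two_theta, [2.338, 2.025, 1.432], [1.0, 0.5, 0.3],
                           size_nm=20.0, crystalline_fraction=0.6,
                           rng=np.random.default_rng(0))
```

Lowering `size_nm` broadens the peaks toward the poorly crystalline regime, and `crystalline_fraction` moves the pattern between the purely amorphous and purely crystalline limits. A real generator would likely replace the Gaussian profile with a more realistic one (e.g. pseudo-Voigt) and derive peak lists from crystal structures.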
Supervised Learning on Synthetic Data: Once your generator is running, you will create a balanced dataset covering many possible mixing ratios of crystalline and amorphous phases.
- Train Deep Learning models on this synthetic data.
- The model must learn to classify the nature of the phase (Crystalline vs. Amorphous vs. Mixed).
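As an illustration of what a balanced, labeled dataset could look like before any model is trained, here is a small sketch. It is an assumption-laden toy: `toy_pattern` is a hypothetical stand-in for the physics-based generator (random sharp peaks plus one broad hump), the class encoding in `LABELS` is arbitrary, and the range of mixing ratios used for the "mixed" class (0.05 to 0.95) is chosen only for the example.

```python
import numpy as np

# Hypothetical class encoding, chosen for this example only.
LABELS = {0: "amorphous", 1: "mixed", 2: "crystalline"}


def toy_pattern(two_theta, crystalline_fraction, rng):
    """Stand-in generator: random sharp peaks plus one broad hump plus noise."""
    peaks = np.zeros_like(two_theta)
    n_peaks = rng.integers(3, 8)  # between 3 and 7 peaks
    for center in rng.uniform(20.0, 80.0, size=n_peaks):
        sigma = rng.uniform(0.1, 0.5)  # peak width in degrees 2-theta
        height = rng.uniform(0.2, 1.0)
        peaks += height * np.exp(-0.5 * ((two_theta - center) / sigma) ** 2)

    hump_center = rng.uniform(35.0, 45.0)
    hump_sigma = rng.uniform(4.0, 8.0)
    hump = np.exp(-0.5 * ((two_theta - hump_center) / hump_sigma) ** 2)

    # Mixing ratio is defined on summed (integrated) intensity.
    signal = (crystalline_fraction * peaks / peaks.sum()
              + (1.0 - crystalline_fraction) * hump / hump.sum())
    signal = signal + 0.01 * signal.max() * rng.normal(size=two_theta.size)
    return signal / signal.max()


def build_balanced_dataset(n_per_class, n_points=1024, seed=0):
    """Return shuffled arrays X (patterns) and y (labels), equal count per class."""
    rng = np.random.default_rng(seed)
    two_theta = np.linspace(10.0, 90.0, n_points)
    patterns, labels = [], []
    for label in LABELS:
        for _ in range(n_per_class):
            if label == 0:
                fraction = 0.0
            elif label == 2:
                fraction = 1.0
            else:
                fraction = rng.uniform(0.05, 0.95)
            patterns.append(toy_pattern(two_theta, fraction, rng))
            labels.append(label)

    X = np.asarray(patterns, dtype=np.float32)
    y = np.asarray(labels, dtype=np.int64)
    order = rng.permutation(len(y))  # shuffle so classes are interleaved
    return X[order], y[order]


X, y = build_balanced_dataset(n_per_class=100)
```

The resulting `X` and `y` arrays could be passed to a deep learning framework of your choice, for example as input to a 1D convolutional network. Whether three discrete classes or a continuous crystalline fraction is the better training target is an open design question for the project.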
You will test your synthetic-trained model on our real-world datasets.
- Does a model trained purely on physics equations generalize to real experiments?
