Data Exploration & Visualization for High-Dimensional Material Property Maps
Location: Empa, Thun and University of Bern
If you are interested in this topic or have further questions, do not hesitate to contact kaspar.riesen@unibe.ch.
Background / Context
The Laboratory for Mechanics of Materials and Nanostructures at Empa (Thun) is seeking a creative student to develop new data exploration and visualization approaches.
Our lab generates high-quality multi-modal datasets derived from real-world physical experiments. A typical single dataset consists of hundreds of data points, acquired across chemically graded sample surfaces on a single regular grid. Each data point is a complex feature vector, encoding chemical composition and crystal structure alongside mechanical, electrical, magnetic, and optical properties. When properly interpolated, these grids reveal continuous property maps, effectively simulating thousands of distinct materials in a single experimental run.
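As a rough illustration of how a coarse measurement grid can become a continuous property map, the sketch below upsamples a single property with bilinear interpolation. The function name upsample_map, the grid coordinates, and the synthetic "hardness" values are assumptions made for this example; they do not describe the lab's actual data format or interpolation method.

```python
import numpy as np


def upsample_map(values, x, y, factor=4):
    """Bilinearly interpolate a property measured on a regular grid.

    values has shape (len(x), len(y)); x and y are increasing 1-D coordinates.
    Returns the finer coordinates and the interpolated map.
    """
    x_fine = np.linspace(x[0], x[-1], (len(x) - 1) * factor + 1)
    y_fine = np.linspace(y[0], y[-1], (len(y) - 1) * factor + 1)

    # Bilinear interpolation is separable: interpolate along x first ...
    along_x = np.stack(
        [np.interp(x_fine, x, values[:, j]) for j in range(len(y))], axis=1
    )
    # ... then along y for every row of the intermediate result.
    fine = np.stack(
        [np.interp(y_fine, y, along_x[i, :]) for i in range(len(x_fine))], axis=0
    )
    return x_fine, y_fine, fine


# Hypothetical 5 x 4 grid of a single property (synthetic values).
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 2.0, 4)
hardness = 2.0 * x[:, None] + 3.0 * y[None, :]

x_fine, y_fine, hardness_map = upsample_map(hardness, x, y, factor=4)
print(hardness_map.shape)
```

Bilinear interpolation is only one simple choice. Smoother schemes such as splines or kriging may suit real property maps better, depending on the measurement noise and grid spacing.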
We have automated the generation of such high-dimensional raw data and are working hard on efficient and automated feature extraction. However, for the last step towards scientific advances, we lack the capacity to fully exploit and interpret this high-dimensional information. We need you to implement strategies for complexity reduction and pattern recognition, enabling the discovery of critical correlations within the mapped material properties.
In this project, you will bridge the gap between computer science and physical data. You will learn how experimental measurements translate into our understanding of the materials world, and how data anomalies can signal engineering progress. You will develop your own pipeline to analyze, correlate, and interpret this rich dataset, collaborating closely with material scientists and your computer science supervisor. Building on your new skills, you will finally explore visualization, dimensionality reduction, and interactive clustering techniques that turn raw numbers into scientific insight beyond human intuition.
Objectives and Approach
- Data Ingestion & Preprocessing: Familiarize yourself with the data types and structure.
- Develop robust scripts to ingest, clean, and normalize the multi-modal data.
- Dimensionality Reduction & Clustering: Implement and compare dimensionality reduction techniques to project high-dimensional feature vectors into interpretable low-dimensional spaces.
- Apply unsupervised clustering algorithms to automatically identify distinct materials based on their properties.
- Correlation Engine Development: Build algorithms to systematically detect non-linear correlations.
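The objectives above can be sketched end to end on synthetic data. This is a minimal NumPy-only illustration under stated assumptions, not the project's prescribed method: the function names (zscore, pca, kmeans, spearman_matrix), the farthest-point initialization, and the synthetic two-material dataset are all hypothetical. Spearman rank correlation is used here as one simple detector of monotonic non-linear relationships; it will miss non-monotonic ones.

```python
import numpy as np


def zscore(X):
    """Normalize each feature (column) to zero mean and unit variance."""
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # leave constant features unscaled
    return (X - X.mean(axis=0)) / sigma


def pca(X, n_components=2):
    """Project X onto its leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=2)
        centers.append(X[d.min(axis=1).argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def spearman_matrix(X):
    """Spearman rank correlation between columns (ties are not averaged)."""
    ranks = X.argsort(axis=0).argsort(axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)


# Synthetic stand-in for two distinct "materials" with three properties each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 3)), rng.normal(5.0, 0.1, (20, 3))])

Xn = zscore(X)
scores, explained = pca(Xn, n_components=2)
labels, _ = kmeans(Xn, k=2)
rho = spearman_matrix(Xn)
print(scores.shape, explained, np.unique(labels), rho.shape)
```

In practice, well-tested library implementations such as scikit-learn's StandardScaler, PCA, and KMeans, or scipy.stats.spearmanr, would likely be preferable to these hand-written versions, and non-linear projection methods could be compared against PCA.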
Expected Contributions / Outcomes
Throughout the project, you will act as the bridge between two worlds.
- You will collaborate closely with material scientists and your computer science supervisor, who guides the methodology.
- You will participate in project meetings, translating abstract data patterns into concrete scientific observations.
- This project will strengthen your expertise in Applied Data Science and Scientific Computing.
Required Skills / Prerequisites
- Proficiency in Python, data visualization libraries, and Git/version control, together with a willingness to master the soft skill of communicating complex technical concepts to non-computer-scientist colleagues.
Possible Extensions
Depending on your interests and rate of progress, you can also dive deeper into:
- User Experience (UX) for Science: We are also actively developing end-user software in a modern Python-React stack. You can refine your scripts for interactive use by non-experts to support intuitive scientific workflows.
- Physics-Informed ML: Incorporate known physical constraints and models into your machine learning models to improve prediction accuracy on sparse data.
