Machine Learning Methods for Materials Science

Key Points

Virtual Environments
  • Virtual environments isolate dependencies.

  • Creating environments with Conda or venv.

  • Activating and deactivating virtual environments.
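As a minimal sketch, the standard-library `venv` module can create an isolated environment from Python itself. The environment name `demo-env` is a hypothetical choice, and pip is skipped only to keep the example fast. Activation happens in the shell rather than in Python (`source demo-env/bin/activate` on Linux/macOS, `demo-env\Scripts\activate` on Windows, `deactivate` to leave), so the script only checks whether the running interpreter belongs to a virtual environment.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create a bare virtual environment and return the path of its config file."""
    # with_pip=False skips bootstrapping pip; use True for an environment you will install into.
    venv.create(env_dir, with_pip=False)
    return env_dir / "pyvenv.cfg"


def in_virtualenv() -> bool:
    """When Python runs from a venv, sys.prefix points at the env, sys.base_prefix at the base install."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(Path(tmp) / "demo-env")  # hypothetical environment name
        print("created:", cfg.exists())
    print("running inside a virtual environment:", in_virtualenv())
```

With Conda the equivalent steps are `conda create -n <name> python`, `conda activate <name>`, and `conda deactivate`.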

Data Sources
  • Essential libraries for accessing online data sources

  • Data retrieval from the Materials Project API

  • Augmenting datasets with data from multiple sources
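Combining data from several sources usually means aligning records on a shared key and deciding which value wins when sources disagree. The sketch below uses only the standard library; the dictionaries stand in for results already retrieved (for example from the Materials Project API), and the source names, material IDs, properties, and every numeric value are hypothetical placeholders rather than real database entries. "Earlier source in the priority list wins" is just one possible conflict policy.

```python
from typing import Dict, List

Record = Dict[str, float]


def merge_sources(sources: Dict[str, Dict[str, Record]], priority: List[str]) -> Dict[str, dict]:
    """Merge per-material records from several sources.

    sources:  {source_name: {material_id: {property: value}}}
    priority: source names in order of trust; earlier sources win conflicts.
    Each merged property records which source supplied it (provenance).
    """
    merged: Dict[str, dict] = {}
    # Walk sources from most to least trusted so the first value seen is kept.
    for name in priority:
        for material_id, props in sources.get(name, {}).items():
            entry = merged.setdefault(material_id, {"values": {}, "provenance": {}})
            for prop, value in props.items():
                if prop not in entry["values"]:
                    entry["values"][prop] = value
                    entry["provenance"][prop] = name
    return merged


# Hypothetical toy data: placeholders only, not real measurements or DFT results.
source_a = {"mat-1": {"band_gap_eV": 1.1}, "mat-2": {"band_gap_eV": 3.0}}
source_b = {"mat-2": {"band_gap_eV": 3.2, "density_g_cm3": 5.2}, "mat-3": {"band_gap_eV": 0.0}}

merged = merge_sources({"A": source_a, "B": source_b}, priority=["A", "B"])
print(merged["mat-2"])
```

Keeping provenance alongside each value makes it possible to audit later which source a training label came from, which matters when sources use different methods or levels of theory.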

Machine Learning Fundamentals
  • Data representations are fundamental in materials science machine learning: they encode material structures and properties in a form suitable for predictive modeling.

  • Algorithms such as linear regression, k-nearest neighbors, support vector machines, XGBoost, and random forests are widely used.

  • Supervised learning is a popular ML approach, with decision trees, random forests, and neural networks among the most widely used models.

  • Data engineering fundamentals, including data storage, processing, and serving, are crucial for building robust ML pipelines.
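To make the points on data representations and algorithms concrete, here is a minimal sketch that turns a chemical formula into a fixed-length element-fraction vector (one simple composition-based representation) and feeds it to a k-nearest-neighbors regressor written from scratch. The element vocabulary, training formulas, and target values are hypothetical placeholders; real work would typically rely on a library such as scikit-learn and richer descriptors.

```python
import math
import re
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical, deliberately tiny element vocabulary that fixes the vector layout.
ELEMENTS = ["Li", "Fe", "Co", "Ni", "O", "P"]

# An element symbol followed by an optional (possibly fractional) count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*\.?\d*)")


def featurize(formula: str) -> List[float]:
    """Return the atomic fraction of each element in ELEMENTS (parentheses not supported)."""
    counts: Counter = Counter()
    for symbol, amount in TOKEN.findall(formula):
        counts[symbol] += float(amount) if amount else 1.0
    unknown = set(counts) - set(ELEMENTS)
    if unknown:
        raise ValueError(f"elements not in vocabulary: {sorted(unknown)}")
    total = sum(counts.values())
    return [counts[el] / total for el in ELEMENTS]


def knn_predict(train: Sequence[Tuple[List[float], float]], x: List[float], k: int = 3) -> float:
    """Average the targets of the k training points closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical training set: formulas paired with made-up target values.
raw = [("LiCoO2", 1.0), ("LiNiO2", 2.0), ("LiFePO4", 3.0), ("Fe2O3", 4.0)]
train = [(featurize(f), y) for f, y in raw]

print(featurize("Fe2O3"))
print(knn_predict(train, featurize("LiCo0.5Ni0.5O2"), k=2))
```

The query composition sits exactly between LiCoO2 and LiNiO2 in this feature space, so with k=2 the prediction is the mean of their two targets. Composition-only vectors ignore structure entirely, which is one reason graph-based representations are used later in the course.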

Deep Learning Fundamentals
  • Deep learning models are commonly represented as computational graphs.

  • Non-linear activation functions (e.g., ReLU, sigmoid, tanh) allow neural networks to learn non-linear relationships in the data.
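These two points connect: a network's forward pass builds a computational graph of simple operations, and the gradients needed for training are obtained by walking that graph backwards (reverse-mode automatic differentiation). The sketch below is a toy scalar version written from scratch for illustration; the `Value` class is hypothetical and not part of any library, and ReLU and tanh serve as the non-linear activations. Frameworks such as PyTorch or JAX apply the same idea to tensors.

```python
import math
from typing import Callable, List, Set, Tuple


class Value:
    """A scalar node in a computational graph that remembers the nodes it came from."""

    def __init__(self, data: float, parents: Tuple["Value", ...] = ()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward: Callable[[], None] = lambda: None

    def __add__(self, other: "Value") -> "Value":
        out = Value(self.data + other.data, (self, other))

        def backward() -> None:
            # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = backward
        return out

    def __mul__(self, other: "Value") -> "Value":
        out = Value(self.data * other.data, (self, other))

        def backward() -> None:
            # d(a*b)/da = b and d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = backward
        return out

    def relu(self) -> "Value":
        out = Value(max(0.0, self.data), (self,))

        def backward() -> None:
            # ReLU passes the gradient through only where its input was positive.
            self.grad += (1.0 if self.data > 0 else 0.0) * out.grad

        out._backward = backward
        return out

    def tanh(self) -> "Value":
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def backward() -> None:
            # d tanh(x)/dx = 1 - tanh(x)^2
            self.grad += (1.0 - t * t) * out.grad

        out._backward = backward
        return out

    def backward(self) -> None:
        """Propagate gradients from this node to every node it depends on."""
        order: List["Value"] = []
        seen: Set[int] = set()

        def visit(node: "Value") -> None:
            # Depth-first topological sort: parents are appended before children.
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


# One neuron with two inputs: y = tanh(relu(w1*x1 + w2*x2 + b)).
x1, x2 = Value(2.0), Value(-1.0)
w1, w2, b = Value(0.5), Value(0.25), Value(0.1)
y = ((w1 * x1 + w2 * x2) + b).relu().tanh()
y.backward()
print(y.data, w1.grad, w2.grad)
```

Without the non-linear activations, stacking such neurons would only ever produce a linear function of the inputs, however deep the graph.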

GNNs for Materials
Adsorption Energies
  • Oxide electrocatalysts are critical for the oxygen evolution reaction (OER), but training data for machine learning (ML) models of these materials has been scarce.

  • Existing datasets like OC20 focus primarily on metallic catalysts and adsorption energies.

  • OC22 fills this gap by providing a large-scale dataset for oxide materials.

  • Several graph neural networks (GNNs) were tested, including GemNet-OC, GemNet-dT, SpinConv, and Equiformer.

  • The OC22 dataset is a major step toward ML-driven discovery of oxide electrocatalysts.

  • By shifting from adsorption-energy-specific tasks to generalized total energy prediction, it enables broader scientific applications and better model generalization, especially when combined with prior datasets like OC20.
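One practical consequence of predicting total energies is that adsorption energies can still be recovered by differencing. A common convention writes E_ads = E(slab+adsorbate) - E(slab) - sum_i n_i * E_ref,i, where the per-element references come from gas-phase molecules. The sketch below assumes references of H from 1/2 H2 and O from H2O - H2; that scheme, the function names, and all energies are assumptions and hypothetical placeholders, not DFT results or the OC22 pipeline.

```python
from typing import Dict


def gas_references(e_h2: float, e_h2o: float) -> Dict[str, float]:
    """Per-atom reference energies under an assumed scheme: H from 1/2 H2, O from H2O - H2."""
    return {"H": 0.5 * e_h2, "O": e_h2o - e_h2}


def adsorption_energy(e_system: float, e_slab: float,
                      adsorbate: Dict[str, int], refs: Dict[str, float]) -> float:
    """E_ads = E(slab+adsorbate) - E(slab) - sum_i n_i * E_ref,i, all energies in eV."""
    e_gas = sum(n * refs[element] for element, n in adsorbate.items())
    return e_system - e_slab - e_gas


# Hypothetical total energies in eV; placeholders, not DFT results.
refs = gas_references(e_h2=-6.8, e_h2o=-14.0)
e_ads_oh = adsorption_energy(e_system=-107.0, e_slab=-95.0, adsorbate={"O": 1, "H": 1}, refs=refs)
print(round(e_ads_oh, 3))
```

A model that predicts E(slab+adsorbate) and E(slab) separately can therefore serve adsorption-energy tasks as well as any other task that needs total energies.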

UMA Models
  • UMA is an equivariant GNN that uses a novel technique called Mixture of Linear Experts (MoLE), which gives it the capacity to learn from the largest multi-modal dataset of DFT calculations to date.

  • UMA is trained on five DFT datasets computed at different levels of theory. A UMA task refers to the specific level of theory associated with one of these datasets.
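The idea behind a mixture of linear experts can be sketched in a few lines: several expert weight matrices are combined, with per-input coefficients, into a single linear layer before it is applied. Because every expert is linear, merging the weights first gives the same output as running all experts and mixing their outputs. This is a conceptual toy, not UMA's implementation: the expert matrices and routing logits are hypothetical, and using a softmax to turn logits into coefficients is an assumption of this sketch.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn routing logits into non-negative coefficients that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def merge_experts(experts: Sequence[Matrix], coeffs: Sequence[float]) -> Matrix:
    """Weighted sum of expert weight matrices: W = sum_k c_k * W_k."""
    rows, cols = len(experts[0]), len(experts[0][0])
    return [[sum(c * W[i][j] for c, W in zip(coeffs, experts)) for j in range(cols)]
            for i in range(rows)]


def matvec(W: Matrix, x: Sequence[float]) -> List[float]:
    """Apply a linear layer (no bias) to a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


# Hypothetical 2x2 experts and routing logits for one input system.
experts = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [1.0, 0.0]],
    [[0.5, 0.5], [0.5, -0.5]],
]
coeffs = softmax([0.2, 1.0, -0.5])
x = [1.0, 2.0]

# Path 1: merge the weights once, then apply a single linear layer.
merged_out = matvec(merge_experts(experts, coeffs), x)

# Path 2: run every expert separately, then mix the outputs with the same coefficients.
expert_outputs = [matvec(W, x) for W in experts]
mixed_out = [sum(c * y[i] for c, y in zip(coeffs, expert_outputs))
             for i in range(len(expert_outputs[0]))]

print(merged_out, mixed_out)
```

The design payoff is that, once the coefficients are fixed for a given input, the merged layer costs no more to apply than a single expert, while the model as a whole holds the capacity of all experts.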

Active Learning
Materials Graph Library