Machine Learning Methods For Material Sciences

Key Points

Virtual Environments	Virtual environments isolate a project's dependencies. Use Conda or venv to create environments. Activate an environment before working in it and deactivate it when done.
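As a minimal sketch of the points above, Python's standard-library `venv` module can create an isolated environment programmatically. The directory name `demo-env` is a hypothetical choice, and `with_pip=False` is used here only to keep the example fast. Activation is a shell step rather than a Python call (for example `source demo-env/bin/activate` on Linux or macOS, then `deactivate` when done).

```python
import tempfile
import venv
from pathlib import Path


def create_env(parent: Path, name: str = "demo-env") -> Path:
    """Create an isolated virtual environment under `parent` and return its path."""
    env_dir = parent / name
    # with_pip=False skips bootstrapping pip, which keeps this sketch quick.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Use a throwaway directory so the example does not touch the current project.
workdir = Path(tempfile.mkdtemp())
env_dir = create_env(workdir)

# Every venv-created environment contains a pyvenv.cfg marker file.
has_config = (env_dir / "pyvenv.cfg").exists()
print(env_dir.name, has_config)
```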
Data Sources	Essential libraries for accessing online data sources. Data retrieval from the Materials Project API. Working with data augmentation from multiple sources.
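Combining data from multiple sources usually comes down to joining records on a shared key. The sketch below assumes each source returns a list of dictionaries keyed by a `formula` field; the formulas, field names, and numbers are hypothetical placeholders, standing in for whatever an API client would actually return.

```python
def merge_sources(primary, secondary, key="formula"):
    """Merge two lists of records on `key`; fields from `primary` win on conflict."""
    merged = {}
    # Apply the secondary source first so the primary source overwrites it.
    for source in (secondary, primary):
        for record in source:
            merged.setdefault(record[key], {}).update(record)
    return merged


# Hypothetical records from two different sources (values are placeholders).
source_a = [
    {"formula": "A2B", "band_gap": 1.0},
    {"formula": "AB", "band_gap": 2.0},
]
source_b = [
    {"formula": "A2B", "density": 3.0, "band_gap": 9.9},
    {"formula": "C", "density": 4.0},
]

merged = merge_sources(source_a, source_b)
for formula in sorted(merged):
    print(formula, merged[formula])
```

Letting one source take precedence is only one possible policy; averaging conflicting values or keeping both with a provenance tag are equally valid design choices.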
Machine Learning Fundamentals	Data representations are fundamental to ML in materials science: they encode material structures and properties so that predictive models can use them. Commonly used algorithms include linear regression, k-nearest neighbors, support vector machines, XGBoost, and random forests. Supervised learning is a popular ML approach, with decision trees, random forests, and neural networks being widely used. Fundamentals of data engineering, including data storage, processing, and serving, are crucial for building robust ML pipelines.
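To make the link between representation and algorithm concrete, here is a minimal sketch that encodes a chemical formula as a vector of element fractions and feeds it to a hand-written k-nearest-neighbors regressor. The parser handles only simple formulas without parentheses, and the target values are hypothetical placeholders rather than measured properties.

```python
import math
import re


def composition_vector(formula, elements):
    """Encode a simple formula (no parentheses) as element fractions."""
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    total = sum(counts.values())
    return [counts.get(el, 0.0) / total for el in elements]


def knn_predict(train_x, train_y, query, k=2):
    """Predict by averaging the targets of the k closest training points."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


elements = ["Fe", "O", "Ti"]
formulas = ["FeO", "Fe2O3", "TiO2"]
targets = [1.0, 2.0, 3.0]  # hypothetical property values, for illustration only

train_x = [composition_vector(f, elements) for f in formulas]
query = composition_vector("Fe3O4", elements)
prediction = knn_predict(train_x, targets, query, k=2)
print(prediction)
```

The two iron oxides are closest to the query in composition space, so the prediction is the mean of their two targets.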
Deep Learning Fundamentals	Deep learning algorithms are often represented as computation graphs. Different non-linear activation functions help a neural network learn different relationships and handle non-linearity in the problem.
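The sketch below shows three common activation functions applied to a single neuron, which is about the smallest computation graph possible: a weighted sum node followed by an activation node. The inputs and weights are arbitrary illustrative values.

```python
import math


def relu(z):
    """Rectified linear unit: passes positive values, zeroes out negatives."""
    return max(0.0, z)


def sigmoid(z):
    """Squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Squashes any real number into the interval (-1, 1)."""
    return math.tanh(z)


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


inputs = [1.0, -2.0]
weights = [0.5, 0.5]  # pre-activation value is 0.5 - 1.0 = -0.5

outputs = {f.__name__: neuron(inputs, weights, 0.0, f) for f in (relu, sigmoid, tanh)}
print(outputs)
```

The same pre-activation value gives three different outputs, which is why the choice of activation changes what a network can represent.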
GNNs for Materials
Adsorption Energies	Oxide electrocatalysts are critical for the oxygen evolution reaction (OER), but sufficient training data for machine learning (ML) models has been lacking. Existing datasets like OC20 focus primarily on metallic catalysts and adsorption energies. OC22 fills this gap by providing a large-scale dataset for oxide materials. Several graph neural networks (GNNs) were tested, including GemNet-OC, GemNet-dT, SpinConv, and Equiformer. The OC22 dataset is a major step toward ML-driven discovery of oxide electrocatalysts. By shifting from adsorption-energy-specific tasks to generalized total energy prediction, it enables broader scientific applications and better model generalization, especially when combined with prior datasets like OC20.
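One way to see why total-energy prediction is the more general task: an adsorption energy can be computed from total energies, but not the other way around. The sketch below assumes the usual definition, E_ads = E(slab + adsorbate) - E(slab) - E(gas-phase reference). The function name and the energies are hypothetical placeholders, not values from OC20 or OC22.

```python
def adsorption_energy(e_slab_with_adsorbate, e_clean_slab, e_gas_reference):
    """Adsorption energy (eV) derived from three total energies (eV)."""
    return e_slab_with_adsorbate - e_clean_slab - e_gas_reference


# Hypothetical total energies in eV, chosen only to make the arithmetic clear.
e_total = -105.0   # relaxed slab with the adsorbate bound
e_slab = -100.0    # relaxed clean slab
e_gas = -4.0       # gas-phase reference for the adsorbate

e_ads = adsorption_energy(e_total, e_slab, e_gas)
print(e_ads)  # negative means binding is favorable under this convention
```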
UMA Models	UMA is an equivariant GNN that leverages a novel technique called Mixture of Linear Experts (MoLE) to give it the capacity to learn from the largest multi-modal dataset. UMA is trained on 5 different DFT datasets with different levels of theory. A UMA task refers to the specific level of theory associated with one of those DFT datasets.
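The general idea behind mixing linear experts can be shown with a toy example: because each expert is a linear map, a weighted combination of the experts' outputs equals the output of a single merged weight matrix. This is a sketch of that property only, not UMA's actual implementation; the expert matrices and the `routing_scores` are hypothetical, and how UMA derives its mixing coefficients is not covered here.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative coefficients that sum to 1."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def merge_experts(experts, coefficients):
    """Collapse several expert matrices into one weighted-sum matrix."""
    rows, cols = len(experts[0]), len(experts[0][0])
    return [
        [sum(a * W[i][j] for a, W in zip(coefficients, experts)) for j in range(cols)]
        for i in range(rows)
    ]


experts = [
    [[1.0, 0.0], [0.0, 1.0]],  # expert 0: identity
    [[0.0, 1.0], [1.0, 0.0]],  # expert 1: swaps the two inputs
]
routing_scores = [0.0, 0.0]    # hypothetical; equal scores give equal weights
coefficients = softmax(routing_scores)

x = [2.0, 4.0]
merged_output = matvec(merge_experts(experts, coefficients), x)

# Reference: run every expert separately and mix the outputs instead.
expert_outputs = [matvec(W, x) for W in experts]
mixed_output = [
    sum(a * out[i] for a, out in zip(coefficients, expert_outputs))
    for i in range(len(x))
]
print(merged_output, mixed_output)
```

Merging the weights once means the forward pass costs the same as a single linear layer, regardless of how many experts there are.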
Active Learning
Materials Graph Library