|
Virtual Environments
|
Virtual environments isolate a project's dependencies from the system Python and from other projects.
Creating environments with Conda or venv.
Activating an environment before working in it and deactivating it afterwards.
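As a minimal sketch of the venv route, the standard-library `venv` module can create an environment from Python itself. The helper name `create_env` and the directory name `demo-env` are illustrative assumptions, not part of any course material.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create a virtual environment and return the expected interpreter path."""
    # with_pip=False keeps the sketch fast; use with_pip=True to bootstrap pip.
    venv.create(env_dir, with_pip=False)
    # The layout differs by platform: Scripts\python.exe on Windows, bin/python elsewhere.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return env_dir / bin_dir / exe


if __name__ == "__main__":
    # A temporary directory is used so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        env_python = create_env(Path(tmp) / "demo-env")
        print("interpreter:", env_python)
```

From a shell, the equivalent steps are `python -m venv demo-env`, then `source demo-env/bin/activate` (on Windows: `demo-env\Scripts\activate`), and `deactivate` to leave. With Conda the corresponding commands are `conda create -n demo-env python`, `conda activate demo-env`, and `conda deactivate`.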
|
|
Data Sources
|
Essential libraries for accessing online data sources
Data retrieval from the Materials Project API
Working with data augmentation from multiple sources
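Combining sources usually comes down to joining records on a shared key. The sketch below assumes each source has already been downloaded into a list of dictionaries (for the Materials Project this would normally go through its official `mp-api` client, which needs an API key and is not shown here). The function name `merge_sources`, the field names, and every value are hypothetical placeholders, not real data.

```python
def merge_sources(primary, secondary, key="formula"):
    """Combine records from two sources, keyed on a shared identifier.

    Values from `primary` win on conflict; `secondary` only fills gaps.
    """
    merged = {}
    for record in secondary:
        merged[record[key]] = dict(record)
    for record in primary:
        combined = merged.get(record[key], {})
        # Skip missing (None) values so they do not overwrite known ones.
        combined.update({k: v for k, v in record.items() if v is not None})
        merged[record[key]] = combined
    return sorted(merged.values(), key=lambda r: r[key])


# Hypothetical placeholder records; the numbers are not real measurements.
source_a = [
    {"formula": "A2B", "band_gap": 1.0, "density": None},
    {"formula": "AB", "band_gap": 2.0, "density": 5.0},
]
source_b = [
    {"formula": "A2B", "density": 4.0},
    {"formula": "AB3", "density": 3.0},
]

for row in merge_sources(source_a, source_b):
    print(row)
```

Letting one source take priority is a design choice; in practice the sources may disagree because of different methods or units, so conflicts are worth logging rather than silently resolving.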
|
|
Machine Learning Fundamentals
|
Data representations are fundamental to ML in materials science: they encode material structures and properties in a form suitable for predictive modeling.
Common ML algorithms include linear regression, k-nearest neighbors, support vector machines, XGBoost, and random forests.
Supervised learning is a popular ML approach; decision trees, random forests, and neural networks are widely used.
Data engineering fundamentals (data storage, processing, and serving) are crucial for building robust ML pipelines.
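To make "data representation" concrete, the sketch below encodes a chemical formula as a vector of atomic fractions and feeds it to a tiny k-nearest-neighbors regressor. It is only an illustration under simplifying assumptions: the parser handles plain formulas without parentheses, the function names are made up for this example, and the training targets are arbitrary placeholder labels rather than real material properties.

```python
import math
import re


def composition_vector(formula, elements):
    """Encode a formula such as 'Fe2O3' as atomic fractions over `elements`."""
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    total = sum(counts.values())
    return [counts.get(el, 0.0) / total for el in elements]


def knn_predict(query, train_x, train_y, k=2):
    """Average the targets of the k training points closest to `query`."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(query, pair[0]))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


elements = ["Fe", "Ti", "O"]
train_formulas = ["FeO", "Fe2O3", "TiO2", "Ti2O3"]
train_targets = [1.0, 2.0, 3.0, 4.0]  # placeholder labels, not real properties

train_x = [composition_vector(f, elements) for f in train_formulas]
query = composition_vector("Fe3O4", elements)
print(knn_predict(query, train_x, train_targets, k=2))
```

Here the two nearest neighbors of Fe3O4 in composition space are Fe2O3 and FeO, so the prediction is the mean of their placeholder labels. Libraries such as scikit-learn provide production-grade versions of the same algorithm.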
|
|
Deep Learning Fundamentals
|
Deep learning models are often represented as computation graphs.
Non-linear activation functions let neural networks learn non-linear relationships; different functions suit different problems.
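The role of the activation function can be seen in a small sketch: two stacked affine layers without an activation collapse into a single linear function, while inserting ReLU between them produces a piecewise-linear one. The weights and biases below are arbitrary values chosen for illustration.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(x, w, b, activation=None):
    """One scalar 'layer': an affine map followed by an optional activation."""
    z = w * x + b
    return activation(z) if activation else z


def two_layer(x, activation=None):
    """Two stacked layers; only the hidden layer applies the activation."""
    hidden = dense(x, w=2.0, b=-1.0, activation=activation)
    return dense(hidden, w=-3.0, b=0.5)


for x in (0.0, 0.5, 1.0):
    # Without an activation the network is just -6x + 3.5.
    print(x, two_layer(x), two_layer(x, relu), two_layer(x, math.tanh))
```

With `relu`, the output is constant for inputs below 0.5 and linear above it, a shape that no single affine map can reproduce.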
|
|
GNNs for Materials
|
|
|
Adsorption Energies
|
Oxide electrocatalysts are critical for the oxygen evolution reaction (OER), but lack sufficient training data for machine learning (ML) models.
Existing datasets like OC20 focus primarily on metallic catalysts and adsorption energies.
OC22 fills this gap by providing a large-scale dataset for oxide materials.
Several graph neural networks (GNNs) were tested, including GemNet-OC, GemNet-dT, SpinConv, and Equiformer.
The OC22 dataset is a major step toward ML-driven discovery of oxide electrocatalysts.
By shifting from adsorption-energy-specific tasks to generalized total energy prediction, it enables broader scientific applications and better model generalization, especially when combined with prior datasets like OC20.
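One way to see why total energy prediction is the more general task: an adsorption energy can be recovered from total energies by the usual difference E_ads = E(slab + adsorbate) - E(clean slab) - E(gas-phase reference). The sketch below applies that difference; the function name and the energies are hypothetical placeholders, not values from OC20 or OC22.

```python
def adsorption_energy(e_system, e_slab, e_gas_reference):
    """Return E_ads in eV from three total energies given in eV.

    e_system:        total energy of the slab with the adsorbate
    e_slab:          total energy of the clean slab
    e_gas_reference: reference energy of the adsorbate in the gas phase
    """
    return e_system - e_slab - e_gas_reference


# Hypothetical placeholder energies in eV.
e_ads = adsorption_energy(e_system=-105.0, e_slab=-100.0, e_gas_reference=-4.5)
print(f"E_ads = {e_ads:.2f} eV")
```

Because the result is a difference of large numbers, errors in the individual total energy predictions can add up, which is one reason total energy models need to be accurate on both the clean and the adsorbate-covered surface.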
|
|
UMA Models
|
UMA is an equivariant GNN that uses a technique called Mixture of Linear Experts (MoLE) to increase its capacity, allowing it to learn from a very large multi-modal dataset.
UMA is trained on five DFT datasets computed at different levels of theory. A UMA task refers to the specific level of theory associated with one of those datasets.
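The general idea behind a mixture of linear experts can be sketched in a few lines: because each expert is a linear map, a weighted mixture of expert outputs equals the output of a single merged weight matrix. The code below is a conceptual illustration only, not UMA's implementation; the expert matrices and mixing coefficients are fixed by hand, and how UMA actually computes its coefficients is not covered here.

```python
def matvec(w, x):
    """Multiply matrix `w` (list of rows) by vector `x`."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def merge_experts(experts, coeffs):
    """Return W = sum_k coeffs[k] * experts[k] as one matrix."""
    rows, cols = len(experts[0]), len(experts[0][0])
    return [
        [sum(c * e[i][j] for c, e in zip(coeffs, experts)) for j in range(cols)]
        for i in range(rows)
    ]


def mole_layer(x, experts, coeffs):
    """Apply the merged expert matrix to the input vector."""
    return matvec(merge_experts(experts, coeffs), x)


# Two hand-picked 2x2 experts and mixing coefficients (illustrative values).
experts = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
coeffs = [0.25, 0.75]
x = [1.0, 2.0]

merged_output = mole_layer(x, experts, coeffs)
# Mixing the per-expert outputs gives the same result, by linearity.
mixed_output = [
    sum(c * y[i] for c, y in zip(coeffs, (matvec(e, x) for e in experts)))
    for i in range(len(x))
]
print(merged_output, mixed_output)
```

The practical appeal of merging first is that the experts add parameters without adding a per-expert forward pass: once the coefficients are known, only one matrix multiplication is needed.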
|
|
Active Learning
|
|
|
Materials Graph Library
|
|