OHPSO Module

Particle Swarm Optimization (PSO) for molecular design in the JTVAE latent space.

Overview

The OHPSO module implements Particle Swarm Optimization for molecular design in the latent space of the Junction Tree VAE (OHVAE). This enables:

  • Multi-objective optimization of molecular properties
  • Exploration of chemical space around seed molecules
  • Property-guided molecular generation
  • Integration with hydroxide exchange membrane (HEM) property predictors (EC, EWU, ESR)

Module Structure

OHMind/OHPSO/
├── optimizer.py         # PSO optimizer classes
├── swarm.py             # Swarm class
├── inference.py         # VAE inference wrapper
├── EC.py                # Effective Conductivity predictor
├── EWU.py               # Effective Water Uptake predictor
├── ESR.py               # Effective Swelling Ratio predictor
├── alkaline.py          # Alkaline stability predictor
├── objectives/
│   ├── scoring.py       # ScoringFunction class
│   ├── mol_functions.py # Molecular scoring functions
│   └── config.yaml      # Scoring configuration
└── data/                # Model weights and data

Architecture

graph TD
    subgraph "PSO Optimization"
        Init[Initialize Swarm] --> Encode[Encode SMILES]
        Encode --> Particles[Particle Positions]
        Particles --> Update[Update Velocities]
        Update --> Move[Move Particles]
        Move --> Decode[Decode to SMILES]
        Decode --> Score[Score Molecules]
        Score --> Fitness[Update Fitness]
        Fitness --> Best[Update Best Positions]
        Best --> Update
    end
    
    subgraph "Scoring"
        Score --> EC[EC Score]
        Score --> EWU[EWU Score]
        Score --> ESR[ESR Score]
        Score --> Alkaline[Alkaline Score]
        Score --> QED[QED Score]
        EC --> Desirability[Desirability Scaling]
        EWU --> Desirability
        ESR --> Desirability
        Alkaline --> Desirability
        QED --> Desirability
        Desirability --> Weighted[Weighted Average]
    end
    
    subgraph "Components"
        JTVAE[JTVAE Model]
        Predictors[Property Predictors]
    end

PSO Algorithm

The optimizer uses a modified PSO whose velocity update combines an inertia term with three attraction terms:

v_new = w * v_old + φ1 * r1 * (p_best - x) + φ2 * r2 * (g_best - x) + φ3 * r3 * (h_best - x)

Where:

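  • w: Inertia weight (default: 0.9)
  • φ1, φ2, φ3: Acceleration coefficients (default: 2.0 each)
  • r1, r2, r3: Random factors (in standard PSO, drawn uniformly from [0, 1])
  • x: Particle’s current position
  • p_best: Particle’s personal best position
  • g_best: Swarm’s global best position
  • h_best: Historical best position (randomly selected)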
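
The update rule above can be sketched in NumPy. This is a minimal illustration rather than the module's implementation: the function name `update_velocity`, the array shapes, and the choice to draw r1, r2, r3 uniformly from [0, 1) once per particle and dimension are all assumptions.

```python
import numpy as np

def update_velocity(x, v, p_best, g_best, h_best,
                    w=0.9, phi1=2.0, phi2=2.0, phi3=2.0, rng=None):
    """One velocity update for all particles; x and v have shape (num_part, dim)."""
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: r1, r2, r3 are uniform in [0, 1), drawn per particle and dimension.
    r1, r2, r3 = (rng.random(x.shape) for _ in range(3))
    return (w * v
            + phi1 * r1 * (p_best - x)
            + phi2 * r2 * (g_best - x)
            + phi3 * r3 * (h_best - x))

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(4, 56))
v = np.zeros_like(x)
# When every attractor coincides with the current position, only the
# inertia term w * v remains, and here v is zero.
v_new = update_velocity(x, v, p_best=x, g_best=x, h_best=x, rng=rng)
```

With all three coefficients at 2.0 and an inertia weight of 0.9, velocities can grow quickly, which is presumably why positions are bounded by x_min and x_max.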

Key Classes

BasePSOptimizer

Main optimizer class for PSO-based molecular optimization.

from OHMind.OHPSO.optimizer import BasePSOptimizer

class BasePSOptimizer:
    def __init__(self, swarms, inference_model, scoring_functions=None):
        """
        Initialize PSO optimizer.
        
        Parameters
        ----------
        swarms : list[Swarm]
            List of Swarm objects to optimize
        inference_model : InferenceModel
            VAE model for encoding/decoding
        scoring_functions : list[ScoringFunction], optional
            Scoring functions for fitness evaluation
        """

Class Methods

| Method | Description |
|--------|-------------|
| from_query(init_smiles, num_part, num_swarms, ...) | Create optimizer from single SMILES |
| from_query_list(init_smiles, num_part, num_swarms, ...) | Create optimizer from SMILES list |
| from_swarm_dicts(swarm_dicts, ...) | Create optimizer from saved swarm dictionaries |

Instance Methods

| Method | Description | Returns |
|--------|-------------|---------|
| run(num_steps, num_track) | Run optimization loop | list[Swarm] |
| update_fitness(swarm) | Calculate fitness for swarm | Swarm |

Example Usage

from OHMind.OHPSO.optimizer import BasePSOptimizer
from OHMind.OHPSO.inference import InferenceModel
from OHMind.OHPSO.objectives.scoring import ScoringFunction
from OHMind.OHPSO.objectives.mol_functions import ec_score, ewu_score

# Load inference model
model = InferenceModel(
    model_path="path/to/model.pt",
    vocab_path="path/to/vocab.txt"
)

# Define scoring functions
scoring_funcs = [
    ScoringFunction(
        func=ec_score,
        name="EC",
        desirability=[{"x": 0, "y": 0}, {"x": 100, "y": 1}],
        weight=100,
        is_mol_func=True
    ),
    ScoringFunction(
        func=ewu_score,
        name="EWU",
        desirability=[{"x": 0, "y": 1}, {"x": 100, "y": 0}],
        weight=50,
        is_mol_func=True
    )
]

# Create optimizer
optimizer = BasePSOptimizer.from_query(
    init_smiles="C[N+]1(C)CCCCC1",
    num_part=200,
    num_swarms=1,
    inference_model=model,
    scoring_functions=scoring_funcs
)

# Run optimization
swarms = optimizer.run(num_steps=10, num_track=50)

# Get best solutions
print(optimizer.best_solutions.head())

Swarm

Represents a particle swarm for optimization.

from OHMind.OHPSO.swarm import Swarm

class Swarm:
    def __init__(self, smiles, x, v, x_min=-1., x_max=1.,
                 inertia_weight=0.9, phi1=2., phi2=2., phi3=2.):
        """
        Initialize a particle swarm.
        
        Parameters
        ----------
        smiles : list[str]
            SMILES for each particle
        x : np.ndarray
            Particle positions (num_part, latent_dim)
        v : np.ndarray
            Particle velocities (num_part, latent_dim)
        x_min, x_max : float
            Position bounds
        inertia_weight : float
            Velocity inertia
        phi1, phi2, phi3 : float
            Acceleration coefficients
        """

Attributes

| Attribute | Type | Description |
|-----------|------|-------------|
| smiles | list[str] | Current SMILES for particles |
| x | np.ndarray | Particle positions |
| v | np.ndarray | Particle velocities |
| fitness | np.ndarray | Current fitness values |
| swarm_best_x | np.ndarray | Best position found by swarm |
| swarm_best_fitness | float | Best fitness found |
| particle_best_x | np.ndarray | Personal best positions |
| particle_best_fitness | np.ndarray | Personal best fitness values |
| best_smiles | str | SMILES of best solution |

Methods

| Method | Description |
|--------|-------------|
| next_step() | Update particle positions |
| update_fitness(fitness) | Update fitness and best positions |
| to_dict() | Export swarm to dictionary |
| from_dict(dictionary, ...) | Create swarm from dictionary |
| from_query(init_sml, init_emb, num_part, ...) | Create swarm from query |
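
The bookkeeping behind update_fitness(fitness) can be illustrated as follows. This is a sketch under stated assumptions, not the Swarm class's code: it assumes higher fitness is better, and the helper name `update_bests` and its argument order are hypothetical. The variable names mirror the attributes listed above.

```python
import numpy as np

def update_bests(fitness, x, particle_best_fitness, particle_best_x,
                 swarm_best_fitness, swarm_best_x):
    """Return updated personal and swarm bests (assumes higher fitness is better)."""
    improved = fitness > particle_best_fitness  # one boolean per particle
    particle_best_fitness = np.where(improved, fitness, particle_best_fitness)
    # Broadcast the mask over the latent dimension to pick whole position rows.
    particle_best_x = np.where(improved[:, None], x, particle_best_x)
    best_idx = int(np.argmax(fitness))
    if fitness[best_idx] > swarm_best_fitness:
        swarm_best_fitness = float(fitness[best_idx])
        swarm_best_x = x[best_idx].copy()
    return particle_best_fitness, particle_best_x, swarm_best_fitness, swarm_best_x

x = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
fitness = np.array([0.2, 0.9, 0.4])
pb_fit, pb_x, sb_fit, sb_x = update_bests(
    fitness, x,
    particle_best_fitness=np.array([0.5, 0.5, 0.5]),
    particle_best_x=np.zeros_like(x),
    swarm_best_fitness=0.5,
    swarm_best_x=np.zeros(2),
)
# Only the second particle beat its previous personal best of 0.5,
# so only its row and the swarm best change.
```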

ScoringFunction

Wraps scoring functions with desirability scaling.

from OHMind.OHPSO.objectives.scoring import ScoringFunction

class ScoringFunction:
    def __init__(self, func, name, description=None, desirability=None,
                 truncate_left=True, truncate_right=True, weight=100,
                 is_mol_func=False):
        """
        Create a scoring function with desirability scaling.
        
        Parameters
        ----------
        func : callable
            Scoring function (mol -> score or embedding -> score)
        name : str
            Unique name for bookkeeping
        description : str, optional
            Function description
        desirability : list[dict], optional
            Points defining desirability curve [{"x": x, "y": y}, ...]
        truncate_left : bool
            Truncate desirability at left boundary
        truncate_right : bool
            Truncate desirability at right boundary
        weight : float
            Weight in multi-objective optimization
        is_mol_func : bool
            True if function takes RDKit mol, False for embeddings
        """

Desirability Curves

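Desirability curves map raw scores to the [0, 1] range. Each curve is given as a list of x/y points, where x is the raw score and y is the desirability: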

# Linear desirability (higher is better)
desirability = [{"x": 0, "y": 0}, {"x": 100, "y": 1}]

# Inverse desirability (lower is better)
desirability = [{"x": 0, "y": 1}, {"x": 100, "y": 0}]

# Target range desirability
desirability = [
    {"x": 0, "y": 0},
    {"x": 50, "y": 1},
    {"x": 100, "y": 1},
    {"x": 150, "y": 0}
]
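
One plausible reading of these point lists is piecewise-linear interpolation between neighbouring points, with the curve held constant beyond the first and last point. The sketch below makes exactly that assumption; the helper `desirability_value` is hypothetical, and ScoringFunction may interpolate or truncate differently.

```python
import numpy as np

def desirability_value(raw_score, points):
    """Map a raw score to [0, 1] by piecewise-linear interpolation between points."""
    xs = [p["x"] for p in points]  # must be in increasing order for np.interp
    ys = [p["y"] for p in points]
    # np.interp holds the first/last y value outside the x range, which is how
    # this sketch models truncation at the left and right boundaries.
    return float(np.interp(raw_score, xs, ys))

target_range = [
    {"x": 0, "y": 0},
    {"x": 50, "y": 1},
    {"x": 100, "y": 1},
    {"x": 150, "y": 0},
]
print(desirability_value(25, target_range))   # halfway up the rising edge
print(desirability_value(75, target_range))   # inside the plateau
print(desirability_value(200, target_range))  # beyond the last point
```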

InferenceModel

Wrapper for JTVAE encoding/decoding.

from OHMind.OHPSO.inference import InferenceModel

class InferenceModel:
    def __init__(self, model_path, vocab_path, use_gpu=False,
                 batch_size=100, latent_size=56, hidden_size=450,
                 depthT=20, depthG=3):
        """
        Initialize inference model.
        
        Parameters
        ----------
        model_path : str
            Path to trained JTVAE model
        vocab_path : str
            Path to vocabulary file
        use_gpu : bool
            Use GPU for inference
        batch_size : int
            Batch size for encoding
        latent_size : int
            Latent space dimension
        hidden_size : int
            Hidden layer dimension
        depthT, depthG : int
            Tree and graph encoding depths
        """

Methods

| Method | Description | Returns |
|--------|-------------|---------|
| smi_to_emb(smiles) | Encode SMILES to latent | np.ndarray |
| emb_to_smi(embedding) | Decode latent to SMILES | list[str] |

Scoring Functions

Property Predictors

HEM-specific property prediction models:

MLPEC (Effective Conductivity)

from OHMind.OHPSO.EC import MLPEC

model = MLPEC(input_size=570, n_hidden1=120, n_hidden2=40, n_output=1)

Predicts OH⁻ conductivity (mS/cm) from:

  • Cation latent vector (56 dim)
  • Backbone fingerprint (512 dim)
  • Temperature and thickness (2 dim)
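
The three feature groups add up to the predictor's input_size of 570 (56 + 512 + 2). A minimal sketch of assembling such a vector is shown below; the helper name `build_predictor_input`, the concatenation order, and the absence of any feature scaling are assumptions, since the actual preprocessing is not described here.

```python
import numpy as np

LATENT_DIM, FP_DIM = 56, 512

def build_predictor_input(cation_latent, backbone_fp, temperature, thickness):
    """Concatenate the three feature groups into one 570-dimensional vector."""
    cation_latent = np.asarray(cation_latent, dtype=np.float32)
    backbone_fp = np.asarray(backbone_fp, dtype=np.float32)
    if cation_latent.shape != (LATENT_DIM,):
        raise ValueError(f"expected latent of shape ({LATENT_DIM},)")
    if backbone_fp.shape != (FP_DIM,):
        raise ValueError(f"expected fingerprint of shape ({FP_DIM},)")
    conditions = np.array([temperature, thickness], dtype=np.float32)
    # Assumption: features are concatenated in the order listed above.
    return np.concatenate([cation_latent, backbone_fp, conditions])

features = build_predictor_input(
    cation_latent=np.zeros(LATENT_DIM),
    backbone_fp=np.zeros(FP_DIM),
    temperature=80,  # °C
    thickness=50,    # μm
)
print(features.shape)
```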

MLPEWU (Effective Water Uptake)

from OHMind.OHPSO.EWU import MLPEWU

model = MLPEWU(input_size=570, n_hidden1=120, n_hidden2=40, n_output=1)

Predicts water uptake (wt%) from the same input features as MLPEC.

MLPESR (Effective Swelling Ratio)

from OHMind.OHPSO.ESR import MLPESR

model = MLPESR(input_size=570, n_hidden1=120, n_hidden2=40, n_output=1)

Predicts swelling ratio from the same input features as MLPEC.

MLPAlkaline (Alkaline Stability)

from OHMind.OHPSO.alkaline import MLPAlkaline

model = MLPAlkaline(input_size=112, n_hidden1=200, n_hidden2=50, n_output=1)

Predicts alkaline stability from the latent vectors of a cation pair (2 × 56 = 112 dimensions).

Molecular Functions

Built-in scoring functions in objectives/mol_functions.py:

| Function | Description | Input |
|----------|-------------|-------|
| ec_score(mol) | Effective conductivity | RDKit mol |
| ewu_score(mol) | Effective water uptake | RDKit mol |
| esr_score(mol) | Effective swelling ratio | RDKit mol |
| alkaline_score(mol) | Alkaline stability | RDKit mol |
| qed_score(mol) | Quantitative estimate of drug-likeness (QED) | RDKit mol |
| sa_score(mol) | Synthetic accessibility | RDKit mol |
| logp_score(mol) | Crippen LogP | RDKit mol |
| tan_sim(mol, ref_smiles) | Tanimoto similarity to a reference molecule | RDKit mol, reference SMILES |
| heavy_atom_count(mol) | Heavy atom count | RDKit mol |
| molecular_weight(mol) | Molecular weight | RDKit mol |
| tox_alert(mol) | Toxicity alert check | RDKit mol |

Desirability Curves

Example desirability configurations:

# High conductivity is desirable
ec_desirability = [
    {"x": 0, "y": 0},
    {"x": 50, "y": 0.5},
    {"x": 100, "y": 0.9},
    {"x": 150, "y": 1.0}
]

# Low water uptake is desirable
ewu_desirability = [
    {"x": 0, "y": 1.0},
    {"x": 20, "y": 0.8},
    {"x": 50, "y": 0.3},
    {"x": 100, "y": 0}
]

# Target swelling ratio range
esr_desirability = [
    {"x": 0, "y": 0},
    {"x": 10, "y": 0.8},
    {"x": 20, "y": 1.0},
    {"x": 30, "y": 1.0},
    {"x": 50, "y": 0.5},
    {"x": 100, "y": 0}
]
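
After desirability scaling, the architecture diagram combines the per-objective values into a weighted average. The sketch below shows that arithmetic for one molecule. The weights and the helper `weighted_fitness` are illustrative assumptions; the desirability values are read off the three example curves above at raw scores of EC = 100, EWU = 50, and ESR between 20 and 30.

```python
def weighted_fitness(desirabilities, weights):
    """Weighted average of per-objective desirability values (each in [0, 1])."""
    total_weight = sum(weights[name] for name in desirabilities)
    weighted_sum = sum(desirabilities[name] * weights[name] for name in desirabilities)
    return weighted_sum / total_weight

# Hypothetical weights; the desirability values sit on knots of the curves above.
weights = {"EC": 100, "EWU": 50, "ESR": 50}
desirabilities = {"EC": 0.9, "EWU": 0.3, "ESR": 1.0}

# (0.9 * 100 + 0.3 * 50 + 1.0 * 50) / 200
fitness = weighted_fitness(desirabilities, weights)
print(round(fitness, 3))
```

Because the result is normalized by the total weight, only the ratios between weights matter: weights of 100 and 50 behave the same as 2 and 1.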

Usage Examples

Basic Optimization

from OHMind.OHPSO.optimizer import BasePSOptimizer
from OHMind.OHPSO.inference import InferenceModel
from OHMind.OHPSO.objectives.scoring import ScoringFunction
from OHMind.OHPSO.objectives.mol_functions import qed_score, sa_score

# Setup
model = InferenceModel(
    model_path="data/model.epoch-29",
    vocab_path="data/vocab.txt"
)

# Define objectives
scoring_funcs = [
    ScoringFunction(
        func=qed_score,
        name="QED",
        desirability=[{"x": 0, "y": 0}, {"x": 1, "y": 1}],
        weight=100,
        is_mol_func=True
    ),
    ScoringFunction(
        func=sa_score,
        name="SA",
        desirability=[{"x": 1, "y": 1}, {"x": 5, "y": 0.5}, {"x": 10, "y": 0}],
        weight=50,
        is_mol_func=True
    )
]

# Create and run optimizer
optimizer = BasePSOptimizer.from_query(
    init_smiles="c1ccccc1",
    num_part=100,
    num_swarms=1,
    inference_model=model,
    scoring_functions=scoring_funcs,
    phi1=2.0,
    phi2=2.0,
    phi3=2.0
)

swarms = optimizer.run(num_steps=20, num_track=10)

# Results
print("Best solutions:")
print(optimizer.best_solutions)

HEM Cation Optimization

from OHMind.OHPSO.optimizer import BasePSOptimizer
from OHMind.OHPSO.inference import InferenceModel
from OHMind.OHPSO.objectives.scoring import ScoringFunction
from OHMind.OHPSO.objectives.mol_functions import ec_score, ewu_score, alkaline_score

# Load model
model = InferenceModel(
    model_path="data/model.epoch-29",
    vocab_path="data/vocab.txt"
)

# Multi-objective scoring for HEM
scoring_funcs = [
    ScoringFunction(
        func=ec_score,
        name="Conductivity",
        desirability=[{"x": 0, "y": 0}, {"x": 100, "y": 1}],
        weight=100,
        is_mol_func=True
    ),
    ScoringFunction(
        func=ewu_score,
        name="WaterUptake",
        desirability=[{"x": 0, "y": 1}, {"x": 50, "y": 0.5}, {"x": 100, "y": 0}],
        weight=50,
        is_mol_func=True
    ),
    ScoringFunction(
        func=alkaline_score,
        name="Stability",
        desirability=[{"x": 0, "y": 0}, {"x": 1, "y": 1}],
        weight=80,
        is_mol_func=True
    )
]

# Start from known cation
optimizer = BasePSOptimizer.from_query(
    init_smiles="C[N+]1(C)CCCCC1",  # N,N-dimethylpiperidinium
    num_part=200,
    num_swarms=1,
    inference_model=model,
    scoring_functions=scoring_funcs
)

# Run optimization
swarms = optimizer.run(num_steps=15, num_track=50)

# Export results
optimizer.best_solutions.to_csv("optimized_cations.csv", index=False)

Multiple Swarms

# Reuses model and scoring_funcs from the previous example.
# Initialize multiple swarms from different starting points
init_smiles_list = [
    "C[N+]1(C)CCCCC1",      # Piperidinium
    "C[N+](C)(C)C",          # Tetramethylammonium
    "c1cc[nH+]cc1"           # Pyridinium
]

optimizer = BasePSOptimizer.from_query_list(
    init_smiles=init_smiles_list,
    num_part=100,
    num_swarms=3,
    inference_model=model,
    scoring_functions=scoring_funcs
)

swarms = optimizer.run(num_steps=20, num_track=30)

Resume Optimization

# Save swarm state
swarm_dicts = [swarm.to_dict() for swarm in optimizer.swarms]

# Later, resume from saved state
optimizer = BasePSOptimizer.from_swarm_dicts(
    swarm_dicts=swarm_dicts,
    inference_model=model,
    scoring_functions=scoring_funcs
)

swarms = optimizer.run(num_steps=10, num_track=50)
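
To resume in a later session, the swarm dictionaries need to be written to disk in between. The following sketch uses pickle, which handles NumPy arrays without conversion. The dictionary keys shown are a stand-in and may not match what Swarm.to_dict() actually returns, and the file name is arbitrary.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np

# Stand-in for [swarm.to_dict() for swarm in optimizer.swarms]; these keys are
# illustrative and may differ from the real to_dict() output.
swarm_dicts = [{
    "smiles": ["C[N+]1(C)CCCCC1"],
    "x": np.zeros((1, 56)),
    "v": np.zeros((1, 56)),
}]

checkpoint = Path(tempfile.mkdtemp()) / "swarms.pkl"
with checkpoint.open("wb") as fh:
    pickle.dump(swarm_dicts, fh)

# Later session: load the saved state before calling from_swarm_dicts.
with checkpoint.open("rb") as fh:
    restored = pickle.load(fh)
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.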

Configuration

Scoring Configuration (config.yaml)

# Backbone polymer configuration
phi: "c1ccc(C(C)(C)c2ccccc2)cc1"  # Hydrophilic monomer
pho: "c1ccccc1"                    # Hydrophobic monomer
frac: 0.5                          # Hydrophilic fraction
temperature: 80                    # Temperature (°C)
thickness: 50                      # Membrane thickness (μm)

PSO Hyperparameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| num_part | 200 | Particles per swarm |
| num_swarms | 1 | Number of swarms |
| phi1 | 2.0 | Personal best acceleration |
| phi2 | 2.0 | Global best acceleration |
| phi3 | 2.0 | Historical best acceleration |
| inertia_weight | 0.9 | Velocity inertia |
| x_min | -1.0 | Position lower bound |
| x_max | 1.0 | Position upper bound |
| v_min | -0.6 | Initial velocity lower bound |
| v_max | 0.6 | Initial velocity upper bound |
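
To show how the bounds in this table might interact, the sketch below starts every particle at the seed embedding, draws initial velocities uniformly from [v_min, v_max), and clips positions to [x_min, x_max] after a move. All three behaviours are assumptions made for illustration; the helper names `init_swarm_state` and `move` are hypothetical and the module may initialize and bound particles differently.

```python
import numpy as np

def init_swarm_state(seed_embedding, num_part=200, v_min=-0.6, v_max=0.6, rng=None):
    """Place all particles at the seed and give them random initial velocities."""
    rng = np.random.default_rng() if rng is None else rng
    seed_embedding = np.asarray(seed_embedding, dtype=float)
    x = np.tile(seed_embedding, (num_part, 1))    # shape (num_part, latent_dim)
    v = rng.uniform(v_min, v_max, size=x.shape)   # uniform in [v_min, v_max)
    return x, v

def move(x, v, x_min=-1.0, x_max=1.0):
    """Advance positions by one step and clip them to the position bounds."""
    return np.clip(x + v, x_min, x_max)

# A seed close to the upper bound, so that some coordinates get clipped.
x, v = init_swarm_state(np.full(56, 0.8), num_part=200, rng=np.random.default_rng(0))
x_next = move(x, v)
```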

API Reference

BasePSOptimizer.from_query

@classmethod
def from_query(cls, init_smiles, num_part, num_swarms, inference_model,
               scoring_functions=None, phi1=2., phi2=2., phi3=2.,
               x_min=-1., x_max=1., v_min=-0.6, v_max=0.6, **kwargs):
    """
    Create optimizer from query SMILES.
    
    Parameters
    ----------
    init_smiles : str or list[str]
        Starting SMILES (if list, randomly sample num_part)
    num_part : int
        Number of particles per swarm
    num_swarms : int
        Number of swarms
    inference_model : InferenceModel
        VAE model for encoding/decoding
    scoring_functions : list[ScoringFunction]
        Scoring functions for evaluation
    phi1, phi2, phi3 : float
        PSO acceleration coefficients
    x_min, x_max : float
        Position bounds
    v_min, v_max : float
        Initial velocity bounds
        
    Returns
    -------
    BasePSOptimizer
        Initialized optimizer
    """

BasePSOptimizer.run

def run(self, num_steps, num_track=10):
    """
    Run optimization loop.
    
    Parameters
    ----------
    num_steps : int
        Number of optimization steps
    num_track : int
        Number of best solutions to track
        
    Returns
    -------
    list[Swarm]
        Optimized swarms
    """
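
The role of num_track can be pictured as keeping a running list of the best unique molecules seen across steps. The sketch below uses plain Python lists for that; the helper `track_best` is hypothetical, and the optimizer itself appears to expose its tracked results as a DataFrame (best_solutions), so the real bookkeeping likely differs.

```python
def track_best(tracked, smiles, fitness, num_track=10):
    """Merge new (smiles, fitness) pairs into tracked and keep the num_track best."""
    best = dict(tracked)  # SMILES -> best fitness seen so far
    for smi, fit in zip(smiles, fitness):
        if smi not in best or fit > best[smi]:
            best[smi] = fit
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:num_track]

tracked = []
# Step 1: three candidates, keep the two best.
tracked = track_best(tracked, ["CCO", "CCN", "CCC"], [0.2, 0.7, 0.5], num_track=2)
# Step 2: a repeated molecule with a lower score and one new, better molecule.
tracked = track_best(tracked, ["CCN", "CCCl"], [0.6, 0.9], num_track=2)
print(tracked)
```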

Last updated: 2025-12-22 | OHMind v1.0.0


PolyAI Team
Copyright © 2009-2025 Changchun Institute of Applied Chemistry, Chinese Academy of Sciences
Address: No. 5625, Renmin Street, Changchun, Jilin, China. Postal Code: 130022