Article: Load-balanced diffusion Monte Carlo method with lattice regularization

Kousuke Nakano (National Institute for Materials Science); Sandro Sorella; Michele Casula

Citation
Kousuke Nakano, Sandro Sorella, Michele Casula. Load-balanced diffusion Monte Carlo method with lattice regularization. The Journal of Chemical Physics. 2025, 163 (19), 194117. https://doi.org/10.1063/5.0296986

Description:

(abstract)

Ab initio quantum Monte Carlo (QMC) is a stochastic approach for solving the many-body Schrödinger equation without resorting to one-body approximations. QMC algorithms are readily parallelizable via ensembles of Nw walkers, making them well suited to large-scale high-performance computing. Among the QMC techniques, diffusion Monte Carlo (DMC) is widely regarded as the most reliable since it provides the projection onto the ground state of a given Hamiltonian under the fixed-node approximation. One practical realization of DMC is the lattice regularized diffusion Monte Carlo (LRDMC) method, which discretizes the Hamiltonian within the Green’s function Monte Carlo framework. DMC methods, including LRDMC, employ the so-called branching technique to stabilize walker weights and populations. At the branching step, walkers must be synchronized globally; any imbalance in per-walker workload can leave central processing unit (CPU) or graphics processing unit (GPU) cores idle, thereby degrading overall hardware utilization. The conventional LRDMC algorithm intrinsically suffers from such load imbalance, which grows as log(Nw), rendering it less efficient on modern parallel architectures. In this work, we present an LRDMC algorithm that inherently addresses the load imbalance issue and achieves significantly improved weak-scaling parallel efficiency. Using the binding energy calculation of a water–methane complex as a test case, we demonstrated that the conventional and load-balanced LRDMC algorithms yield consistent results. Furthermore, by utilizing the Leonardo supercomputer equipped with NVIDIA A100 GPUs, we demonstrated that the load-balanced LRDMC algorithm can maintain extremely high parallel efficiency (∼98%) up to 512 GPUs (corresponding to Nw = 51 200), together with a speedup of ×1.24 if directly compared with the conventional LRDMC algorithm with the same number of walkers. The speedup stays sizable, i.e., ×1.18, even if the number of walkers is reduced to Nw = 400.
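
To illustrate why a globally synchronized step can produce an imbalance that grows logarithmically with the number of walkers, the following is a minimal toy sketch and not the algorithm described in the article. It assumes, purely for illustration, that each walker's work per step is an independent exponential random variable with unit mean and that a step lasts as long as its slowest walker. Under that assumption the mean step time is the harmonic number H_n, which behaves like ln(n) + 0.5772 for large n. The function names `harmonic_number` and `barrier_efficiency` are hypothetical.

```python
import math
import random


def harmonic_number(n: int) -> float:
    """Return H_n = 1 + 1/2 + ... + 1/n, the exact mean of the maximum
    of n independent Exp(1) random variables."""
    return sum(1.0 / k for k in range(1, n + 1))


def barrier_efficiency(n_walkers: int, n_steps: int, seed: int = 0) -> float:
    """Estimate the useful-work fraction when all walkers wait at a barrier.

    Toy assumption (not taken from the article): each walker's work per step
    is an independent Exp(1) draw, and a step takes as long as its slowest
    walker because every walker must reach the synchronization point.
    """
    rng = random.Random(seed)
    total_work = 0.0  # summed useful work over all walkers and steps
    total_wall = 0.0  # summed wall time, set by the slowest walker per step
    for _ in range(n_steps):
        work = [rng.expovariate(1.0) for _ in range(n_walkers)]
        total_work += sum(work)
        total_wall += max(work)
    # Every one of the n_walkers lanes is occupied for total_wall time units.
    return total_work / (n_walkers * total_wall)


if __name__ == "__main__":
    # Columns: walkers, simulated efficiency, 1/H_n, ln(n) for comparison.
    for n in (4, 16, 64, 256):
        print(n, barrier_efficiency(n, 2000), 1.0 / harmonic_number(n), math.log(n))
```

In this toy model all of the work is random, so the efficiency loss is exaggerated relative to any real calculation; the point is only that the waiting time at the barrier scales with the logarithm of the walker count when per-walker workloads fluctuate independently.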

Keyword: Quantum Monte Carlo, Variational Monte Carlo, Diffusion Monte Carlo, Diffusion Monte Carlo method with lattice regularization

Date published: 2025-11-21

Publisher: AIP Publishing

Journal:

  • The Journal of Chemical Physics (ISSN: 0021-9606), vol. 163, issue 19, 194117

Funding:

  • Ministry of Education, Culture, Sports, Science and Technology JPMXS0320220025
  • Japan Science and Technology Agency JPMJPR24J9
  • European High Performance Computing Joint Undertaking EHPC-EXT-2024E01-064
  • European High Performance Computing Joint Undertaking HANAMI project (Hpc AlliaNce for Applications and supercoMputing Innovation)

Manuscript type: Publisher's version (Version of record)

First published URL: https://doi.org/10.1063/5.0296986

Updated at: 2025-12-02 08:30:15 +0900

Published on MDR: 2025-12-02 08:23:28 +0900

Filename: 194117_1_5.0296986.pdf (application/pdf)
Size: 5.76 MB