# Fileset

[README.md](https://mdr.nims.go.jp/filesets/3b228561-0177-462e-8024-762060f5880d/download)

## Creator

[Kosuke Nakano](https://orcid.org/0000-0001-7756-4355), Stefano Battaglia, Jürg Hutter

## Rights



## Other metadata

[Benchmark Dataset used in the paper "Fast Evaluation of Unbiased Atomic Forces in ab initio Variational Monte Carlo via the Lagrangian Technique"](https://mdr.nims.go.jp/datasets/24b8ea43-5201-447e-b341-7f4d66f989fe)

## Fulltext

# Benchmark Dataset used in the paper **Fast Evaluation of Unbiased Atomic Forces in ab initio Variational Monte Carlo via the Lagrangian Technique**

This repository provides the **structures**, **input files**, **output files**, and a **Jupyter notebook** used for the paper [arXiv:2511.05222](https://doi.org/10.48550/arXiv.2511.05222).

`Archive.zip` (SHA256: `ac3f4832c268b54fb21df9a690d432e8831367790231c2798240074050e5576f`) contains four directories:

- `structures_org/`
- `structures_shifted/`
- `results/`
- `analysis/`

The structures are taken from the [rMD17 dataset](https://doi.org/10.6084/m9.figshare.12672038).

---

## Overview

### 1) `structures_org/` and `structures_shifted/`

These directories contain the structures (XYZ format) for three molecules used for the rMD17 benchmark test in the above paper.

- `ethanol/`
- `malonaldehyde/`
- `benzene/`

Each molecule directory contains **100 structures** extracted from the rMD17 dataset.

#### Naming convention

Each structure file is named using three indices:

- `{molecule}_0_idx0_old8.xyz`

where:

- The first integer (before `_idx`) is the **consecutive index in this benchmark set**.
- The second integer (after `idx`) is the **consecutive index in the rMD17 dataset**.
- The third integer (after `old`) is the **index of the conformation in the original MD17 dataset**.

#### Difference between `_org/` and `_shifted/`:

- `structures_org/` contains the structures **exactly as extracted** from the distributed rMD17 dataset.
- `structures_shifted/` contains the same structures after shifting the **molecular centroid** to the following coordinates:
  - ethanol: `[9.63857306, 9.63857306, 9.63857306]`
  - malonaldehyde: `[9.87397841, 9.87397841, 9.87397841]`
  - benzene: `[10.07516, 10.07516, 10.07516]`

where the unit is angstrom. The shift is applied because one should place a molecule at the center of the simulation cell in CP2K calculations.

---

### 2) `results/`

This directory contains the input and output files used for/obtained by **Psi4** and **TurboRVB**. **CP2K** was used to generate trial wavefunctions for **TurboRVB**.

---

#### A) `psi4-xxx/`

All-electron DFT and CC calculations performed by Psi4.

- `{molecule}.xyz`: Structure.
- `{molecule}_E_F.xyz`: Energies and forces stored in the extended XYZ (extxyz) format (units are eV and eV/angstrom, respectively).
- `run_psi4.py`: Psi4 running script.
- `psi.out`: Psi4 output file.

**Basis sets**:

- `def2-QZVPPD` for all DFT calculations
- `cc-pVQZ` for HF, MP2, CCSD, and CCSD(T)

---

#### B) `cp2k-xxx/`

CP2K DFT calculations with effective core potential (ECP) used to generate trial wavefunctions for TurboRVB.

- `{molecule}.xyz`: Structure.
- `{molecule}.inp`: CP2K input file.
- `basis.cp2k`: Basis set in the CP2K format.
- `ecp.cp2k`: Effective core potential (ECP) in the CP2K format.
- `{molecule}.out`: CP2K output file.
- `{molecule}-TREXIO.h5`: Generated TREX-IO file.

---

#### C) `cp2k-xxx-lr/`

Linear-response calculations with effective core potential (ECP) using the VMC parameter derivatives obtained from wavefunctions stored in `turborvb-vmc-JSD/`.

- `{molecule}.xyz`: Structure used for the calculation.
- `{molecule}.inp`: CP2K input file.
- `basis.cp2k`: Basis set in the CP2K format.
- `ecp.cp2k`: Effective core potential (ECP) in the CP2K format.
- `{molecule}.out`: CP2K output file.
- `{molecule}-TREXIO.h5`: TREXIO file.
- `{molecule}-TREXIO.dEdP.dat`: Parameter derivatives generated by TurboRVB.
- `{molecule}-resp.frc`: Force corrections obtained from the linear-response calculation (unit is Ha/bohr).

---

#### D) `turborvb-vmc-JSD/`

VMC calculations with effective core potential (ECP) using the Jastrow-Slater determinant (JSD) wavefunction with the frozen DFT orbitals. The parameter derivatives used by `cp2k-xxx-lr/` are obtained with the wavefunction stored here.

- `{molecule}.xyz`: Structure used for the calculation.
- `{molecule}_E_bF.xyz`: Energies and **biased** forces in extxyz format (units are eV and eV/angstrom, respectively).
- `{molecule}_E_cF.xyz`: Energies and **unbiased** forces in extxyz format (units are eV and eV/angstrom, respectively).
- `vmc_0.input`, `vmc_1.input`: TurboRVB input files.
- `wavefunction.dat`: Manybody wavefunction in the TurboRVB format.
- `pseudo.dat`: Effective core potential (ECP) in the TurboRVB format.
- `vmc_0.output`, `vmc_1.output`: TurboRVB output files.
- `energy.dat`: Energy (unit: Hartree).
- `forces.dat`: Biased forces (unit: Hartree/bohr).

---

#### E) `turborvb-vmc-JSDopt/`

VMC calculations with effective core potential (ECP) using the Jastrow-Slater determinant (JSD) wavefunction, where **all variational parameters** (both in Jastrow and determinant parts) are optimized.

- `{molecule}.xyz`: Structure.
- `{molecule}_E_F.xyz`: Energies and forces in the extxyz format (units are eV and eV/angstrom, respectively).
- `vmc_0.input`, `vmc_1.input`: TurboRVB input files.
- `wavefunction.dat`: Manybody wavefunction in the TurboRVB format.
- `pseudo.dat`: Effective core potential in the TurboRVB format.
- `vmc_0.output`, `vmc_1.output`: TurboRVB output files.
- `energy.dat`: Energy (unit: Hartree).
- `forces.dat`: Forces (unit: Hartree/bohr).

---

#### JSON summary files

Each setting directory also contains a JSON summary file:

- `XXXX-E-F-summary.json`

This JSON file contains, for all structures computed using the given setting:

- structure index,
- atomic positions (unit is angstrom),
- energies and forces (units are eV and eV/angstrom, respectively),
- and (for VMC calculations) error bars.

---

### 3) `analysis/`

This directory contains the Jupyter notebook used to analyze the benchmark results:

- `plot_tables_and_graphs.ipynb`

---

## LICENSE

- This data is distributed under the CC0 LICENSE.
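
## Example: parsing structure file names

The naming convention for the structure files (`{molecule}_0_idx0_old8.xyz`) encodes three indices, and it can be convenient to recover them programmatically when matching structures against rMD17 or MD17. The following is a minimal sketch, not part of the distributed archive. The helper `parse_structure_name`, the `StructureName` type, and the sample file name `ethanol_0_idx0_old8.xyz` are illustrative assumptions based only on the convention described in this README.

```python
import re
from typing import NamedTuple


class StructureName(NamedTuple):
    """Indices encoded in a structure file name (hypothetical helper type)."""
    molecule: str
    benchmark_index: int  # consecutive index in this benchmark set
    rmd17_index: int      # consecutive index in the rMD17 dataset
    md17_index: int       # index of the conformation in the original MD17 dataset


# Assumed pattern: {molecule}_{benchmark}_idx{rMD17}_old{MD17}.xyz
_NAME_PATTERN = re.compile(
    r"^(?P<molecule>.+)_(?P<bench>\d+)_idx(?P<rmd17>\d+)_old(?P<md17>\d+)\.xyz$"
)


def parse_structure_name(filename: str) -> StructureName:
    """Split a structure file name into its molecule name and three indices."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unexpected structure file name: {filename!r}")
    return StructureName(
        molecule=match.group("molecule"),
        benchmark_index=int(match.group("bench")),
        rmd17_index=int(match.group("rmd17")),
        md17_index=int(match.group("md17")),
    )


if __name__ == "__main__":
    # Sample name constructed from the convention above (an assumption).
    print(parse_structure_name("ethanol_0_idx0_old8.xyz"))
```

Pass only the base name of the file (for example via `pathlib.Path(path).name`), since the pattern is anchored at both ends and does not strip directory components.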