Running Qbics

Qbics should be run from the Windows command prompt or a Linux/macOS terminal.

To run Qbics, you only need to provide an input file name. As an example, prepare an input file called water.inp:

water.inp
# A B3LYP/cc-pvdz calculation for water.
basis
    cc-pvdz
end

scf
    charge  0 # Total charge.
    spin2p1 1 # Spin multiplicity (2S+1).
end

mol
    O   0.00000000000000      0.05011194954430      0.05011194954224
    H   0.00000000000000     -0.06080277603381      1.01069082652926
    H   0.00000000000000      1.01069082648951     -0.06080277607149
end

task
  energy b3lyp
end

Command Line Arguments

The usage of Qbics is:

qbics-linux-cpu <name> [-n <number>] [-s <path>] [-m <size>] [-d <size>] [--gpu <ids>]

You can use this command to run Qbics:

$ qbics-linux-cpu water.inp > water.out

The optional arguments are explained below:

-n

Value:   The number of OpenMP threads for each MPI process.
Default: 1

The value should be less than the number of physical CPU cores of the node it runs on.
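
For example, on Linux you can check the number of physical cores with lscpu and then choose a matching thread count (the 16 threads below assume a node with 16 physical cores):

$ lscpu | grep -E "^(Socket|Core)"     # physical cores = sockets x cores per socket
$ qbics-linux-cpu water.inp -n 16 > water.out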

-s

Value:   The scratch path where temporary computational files are saved.
Default: ./

Qbics will use this path to write temporary computational files. It should be on a local, fast, and large disk, not on a remote one such as an NFS-shared path. For Windows users, the scratch path should be given in Linux format: for example, if the scratch path is D:\Jobs\Scratch (Windows format), you should pass -s D:/Jobs/Scratch to Qbics.
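
For example, on Linux you could create a local scratch directory and point Qbics at it (the directory name below is only an illustration):

$ mkdir -p /tmp/qbics_scratch
$ qbics-linux-cpu water.inp -s /tmp/qbics_scratch > water.out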

-m

Value:   The maximum memory size, in GB, that an MPI process can use.
Default: Unlimited

For example, -m 5.5 means that each MPI process will use up to 5.5 GB of memory, no matter how many OpenMP threads it has. Of course, the value should not exceed the total memory size of the node.
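
As a rough sizing sketch for a hypothetical 64 GB node running the serial (single-process) version, leave some memory for the operating system and cap Qbics below the total:

$ free -g                                  # check the total memory of the node
$ qbics-linux-cpu water.inp -n 8 -m 56 > water.out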

-d

Value:   The maximum disk space, in GB, that an MPI process can use in the scratch path.
Default: Unlimited

For example, -d 900 means that each MPI process will use up to 900 GB of disk space, no matter how many OpenMP threads it has. Of course, the value should not exceed the total disk space available in the scratch path.
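
For example, you can check the free space of the scratch disk first and then set a limit below it (the scratch path here is hypothetical):

$ df -h /scratch                           # check the free space of the scratch disk
$ qbics-linux-cpu water.inp -s /scratch/$USER -d 400 > water.out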

--gpu

Value:   The GPU device IDs to be used.
Default: 0

For example, --gpu 0,2,3 means that Qbics will use the GPU devices with IDs 0, 2, and 3 for the calculations.
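
If you are unsure of the device IDs, nvidia-smi lists them; for example, to use only the first two GPUs with the GPU build described below:

$ nvidia-smi -L                            # list the available GPUs and their IDs
$ qbics-linux-gpu water.inp --gpu 0,1 > water.out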

Here is an example of running Qbics:

$ qbics-linux-cpu water.inp -n 8 -m 30 -d 500 -s /scratch/zhang > water.out

This command runs Qbics with the input file water.inp, using 8 OpenMP threads, a maximum memory size of 30 GB, a maximum disk size of 500 GB, and the scratch path /scratch/zhang.
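
If you run such jobs repeatedly, you can wrap the command in a small shell script; the sketch below uses placeholder paths and resource limits that you should adapt to your machine:

#!/bin/bash
# run_water.sh -- hypothetical wrapper around the Qbics command line.
job=water
scratch=/scratch/$USER/$job
mkdir -p "$scratch"
qbics-linux-cpu "$job.inp" -n 8 -m 30 -d 500 -s "$scratch" > "$job.out"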

Run Qbics on a Single Node with GPU

If GPU devices are available, you can run the GPU version of Qbics just as before, and Qbics will automatically use a GPU where possible:

$ qbics-linux-gpu water.inp -n 8 > water.out

In water.out, Qbics lists the GPUs it has found and, by default, uses only device 0:

water.out
 1MPI is disabled in this version.
 2# Nodes: 1
 3   ID                       Hostname          Memory (GB)   #Cores  #OpenMP
 4    0                  ubuntu-server                  251       96        1
 5CUDA Device to be used: 0
 6CUDA Device:
 7On node 0, ubuntu-server:
 8  4 CUDA device is available:
 9    0: NVIDIA GeForce RTX 4080
10       Computational ability: 8.9
11       Global memory:         16079 MB
12       Block-shared memory:   48 KB = 6144 double
13       Constant memory:       64 KB = 8192 double
14       Maximum threads per block: 1024
15       Maximum thread dimension:  1024, 1024, 64
16       Maximum grid dimension:    2147483647, 65535, 65535
17    1: NVIDIA GeForce RTX 4080
18       Computational ability: 8.9
19       Global memory:         16077 MB
20       Block-shared memory:   48 KB = 6144 double
21       Constant memory:       64 KB = 8192 double
22       Maximum threads per block: 1024
23       Maximum thread dimension:  1024, 1024, 64
24       Maximum grid dimension:    2147483647, 65535, 65535

In Line 8, Qbics reports that it has found 4 CUDA devices. In Line 5, it reports that only device 0 will be used, i.e., the device listed in Line 9.

If you want to use all 4 GPUs, run with the --gpu argument:

$ qbics-linux-gpu water.inp -n 8 --gpu 0,1,2,3 > water.out

Read water.out to confirm that all 4 GPUs are used (Line 5):

water.out
 1MPI is disabled in this version.
 2# Nodes: 1
 3   ID                       Hostname          Memory (GB)   #Cores  #OpenMP
 4    0                  ubuntu-server                  251       96        1
 5CUDA Device to be used: 0 1 2 3
 6CUDA Device:
 7On node 0, ubuntu-server:
 8  4 CUDA device is available:
 9    0: NVIDIA GeForce RTX 4080
10       Computational ability: 8.9
11       Global memory:         16079 MB
12       Block-shared memory:   48 KB = 6144 double
13       Constant memory:       64 KB = 8192 double
14       Maximum threads per block: 1024
15       Maximum thread dimension:  1024, 1024, 64
16       Maximum grid dimension:    2147483647, 65535, 65535

Run Qbics on Multiple Nodes

To run the MPI version of Qbics, make sure that your MPI implementation is the same version as the one used to compile Qbics. To check this, first run the MPI version in serial mode:

$ qbics-linux-cpu-mpi water.inp -n 8 > water.out

In water.out, you can find this:

water.out
1C++ compiler:   g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
2C++ options:    -O2 --std=c++17 -fopenmp -ffast-math -fno-finite-math-only -fexpensive-optimizations -Wall -mavx2 -mfma
3MPI compiler:   mpirun (Open MPI) 4.1.2

Line 3 says that the MPI compiler is mpirun (Open MPI) 4.1.2. Then, in the shell, check the version of your mpirun:

$ mpirun -V
mpirun (Open MPI) 4.1.2

Report bugs to http://www.open-mpi.org/community/help/

Thus, this mpirun is exactly the version that Qbics needs.
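
A quick way to compare the two versions side by side, using only standard shell tools, is:

$ grep "MPI compiler" water.out
$ mpirun -V | head -n 1
$ which mpirun                             # make sure this is the mpirun you expect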

Run MPI Version of Qbics from Shell

The following command launches the MPI version of Qbics:

$ mpirun -np 4 --bind-to none qbics-linux-cpu-mpi water.inp -n 8 > water.out

Here, -np is the number of MPI processes. Note that you can still use -n to set up the OpenMP parallelization; in this case, we have 4 MPI processes, each with 8 OpenMP threads. The option --bind-to none controls the CPU binding mode: if you do not give --bind-to none, the MPI processes may be bound to only part of the node's cores and the number of OpenMP threads may be incorrect.
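
To span several nodes from the shell, Open MPI can also take a hostfile; here is a minimal sketch with hypothetical node names, running 2 MPI processes on each of 2 nodes:

$ cat hostfile
node01 slots=2
node02 slots=2
$ mpirun -np 4 --hostfile hostfile --bind-to none qbics-linux-cpu-mpi water.inp -n 8 > water.out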

Run MPI Version of Qbics from Slurm

In most cases, you will run Qbics through a queueing system. The Qbics distribution includes an example Slurm script, tools/run_qbics.slurm, for running Qbics:

tools/run_qbics.slurm
#!/bin/bash
#SBATCH --job-name=water
#SBATCH --nodes=4          # Total number of physical nodes.
#SBATCH --ntasks=8         # Total number of MPI processes.
#SBATCH --cpus-per-task=8  # Number of OpenMP threads for each MPI process.
#SBATCH --partition=your_partition

# Load the appropriate modules if needed.
# module load openmpi/4.1.1

inp=water.inp
out=water.out
mpirun qbics-linux-cpu-mpi $inp -n $SLURM_CPUS_PER_TASK > $out

In this script, we request 4 physical nodes (--nodes) and 8 MPI processes in total (--ntasks), and each MPI process has 8 OpenMP threads (--cpus-per-task). Thus, each node will run 2 MPI processes. You can change these parameters according to your needs.

--partition is the queue you want to use, which should be arranged by your cluster administrator. In a Slurm script, mpirun does not need the -np option, since Slurm automatically sets the number of MPI processes according to --ntasks.

Submit this task:

$ sbatch run_qbics.slurm
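
Standard Slurm commands can then be used to monitor or cancel the job:

$ squeue -u $USER      # check the status of your jobs
$ scancel <jobid>      # cancel the job if needed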

After the job finishes, you can find these lines in water.out (on our cluster):

water.out
User: junz
# Physical nodes: 4
Physical node names: cu295 cu296 cu297 cu298
MPI version:     3.1
# MPI processes: 8
 Rank                       Hostname          Memory (GB)   #Cores  #OpenMP
    0                          cu295                  187       32        8
    1                          cu295                  187       32        8
    2                          cu296                  187       32        8
    3                          cu296                  187       32        8
    4                          cu297                  187       32        8
    5                          cu297                  187       32        8
    6                          cu298                  187       32        8
    7                          cu298                  187       32        8
CUDA is disabled in this version.

Indeed, we have 4 physical nodes, each running 2 MPI processes, and each MPI process has 8 OpenMP threads. We also see that each node has 32 cores and 187 GB of memory, so the 2 × 8 = 16 threads on each node use half of its cores.

Attention

On different clusters, the Slurm script may need some modifications. Please consult the administrator of your cluster.