Running Qbics

Qbics should be run from the Windows command prompt or a Linux/macOS terminal.

To run Qbics, you only need to provide an input file name. As an example, prepare an input file called water.inp:

water.inp
# A B3LYP/cc-pvdz calculation for water.
basis
    cc-pvdz
end

scf
    charge  0 # Total charge.
    spin2p1 1 # Spin multiplicity (2S+1).
end

mol
    O   0.00000000000000      0.05011194954430      0.05011194954224
    H   0.00000000000000     -0.06080277603381      1.01069082652926
    H   0.00000000000000      1.01069082648951     -0.06080277607149
end

task
  energy b3lyp
end

Command Line Arguments

The usage of Qbics is:

qbics-linux-cpu <name> [-n <number>] [-s <path>] [-m <size>] [-d <size>] [--gpu <ids>]

You can use this command to run Qbics:

$ qbics-linux-cpu water.inp > water.out

The optional arguments are explained below:

-n

Value:   The number of OpenMP threads for each MPI process.
Default: 1

The value should be less than the number of physical CPU cores of the node it runs on.
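
For example, on Linux you can check the number of physical cores with lscpu and then choose a matching thread count (the 16 threads below assume a node with 16 physical cores):

$ lscpu | grep -E "^(Socket|Core)"     # physical cores = sockets x cores per socket
$ qbics-linux-cpu water.inp -n 16 > water.out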

-s

Value:   The scratch path where temporary computational files are saved.
Default: ./

Qbics will use this path to write temporary computational files. It should be on a local, fast, and large disk, not on a remote one such as an NFS-shared path. For Windows users, the scratch path should be given in Linux format: for example, if the scratch path is D:\Jobs\Scratch (Windows format), you should pass -s D:/Jobs/Scratch to Qbics.
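
For example, on Linux you could create a local scratch directory and point Qbics at it (the directory name below is only an illustration):

$ mkdir -p /tmp/qbics_scratch
$ qbics-linux-cpu water.inp -s /tmp/qbics_scratch > water.out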

-m

Value:   The maximum memory size, in GB, that an MPI process can use.
Default: Unlimited

For example, -m 5.5 means that each MPI process will use up to 5.5 GB of memory, no matter how many OpenMP threads it has. Of course, the value should not exceed the total memory size of the node.
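
As a rough sizing sketch for a hypothetical 64 GB node running the serial (single-process) version, leave some memory for the operating system and cap Qbics below the total:

$ free -g                                  # check the total memory of the node
$ qbics-linux-cpu water.inp -n 8 -m 56 > water.out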

-d

Value:   The maximum disk space, in GB, that an MPI process can use in the scratch path.
Default: Unlimited

For example, -d 900 means that each MPI process will use up to 900 GB of disk space, no matter how many OpenMP threads it has. Of course, the value should not exceed the total disk space available in the scratch path.
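
For example, you can check the free space of the scratch disk first and then set a limit below it (the scratch path here is hypothetical):

$ df -h /scratch                           # check the free space of the scratch disk
$ qbics-linux-cpu water.inp -s /scratch/$USER -d 400 > water.out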

--gpu

Value:   The GPU device IDs to be used.
Default: 0

For example, --gpu 0,2,3 means that Qbics will use the GPU devices with IDs 0, 2, and 3 for the calculations.
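
If you are unsure of the device IDs, nvidia-smi lists them; for example, to use only the first two GPUs with the GPU build described below:

$ nvidia-smi -L                            # list the available GPUs and their IDs
$ qbics-linux-gpu water.inp --gpu 0,1 > water.out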

Here is an example of running Qbics:

$ qbics-linux-cpu water.inp -n 8 -m 30 -d 500 -s /scratch/zhang > water.out

This command runs Qbics with the input file water.inp, using 8 OpenMP threads, a maximum memory size of 30 GB, a maximum disk size of 500 GB, and the scratch path /scratch/zhang.
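
If you run such jobs repeatedly, you can wrap the command in a small shell script; the sketch below uses placeholder paths and resource limits that you should adapt to your machine:

#!/bin/bash
# run_water.sh -- hypothetical wrapper around the Qbics command line.
job=water
scratch=/scratch/$USER/$job
mkdir -p "$scratch"
qbics-linux-cpu "$job.inp" -n 8 -m 30 -d 500 -s "$scratch" > "$job.out"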

Run Qbics on a Single Node with GPU

If GPU devices are available, you can run the GPU version of Qbics just as before, and Qbics will automatically use a GPU where possible:

$ qbics-linux-gpu water.inp -n 8 > water.out

In water.out, Qbics lists the GPUs it has found and, by default, uses only device 0:

water.out
 1MPI is disabled in this version.
 2# Nodes: 1
 3   ID                       Hostname          Memory (GB)   #Cores  #OpenMP
 4    0                  ubuntu-server                  251       96        1
 5CUDA Device to be used: 0
 6CUDA Device:
 7On node 0, ubuntu-server:
 8  4 CUDA device is available:
 9    0: NVIDIA GeForce RTX 4080
10       Computational ability: 8.9
11       Global memory:         16079 MB
12       Block-shared memory:   48 KB = 6144 double
13       Constant memory:       64 KB = 8192 double
14       Maximum threads per block: 1024
15       Maximum thread dimension:  1024, 1024, 64
16       Maximum grid dimension:    2147483647, 65535, 65535
17    1: NVIDIA GeForce RTX 4080
18       Computational ability: 8.9
19       Global memory:         16077 MB
20       Block-shared memory:   48 KB = 6144 double
21       Constant memory:       64 KB = 8192 double
22       Maximum threads per block: 1024
23       Maximum thread dimension:  1024, 1024, 64
24       Maximum grid dimension:    2147483647, 65535, 65535

In Line 8, Qbics reports that it has found 4 CUDA devices. In Line 5, it reports that only device 0 will be used, i.e., the device listed in Line 9.

If you want to use all 4 GPUs, run with the --gpu argument:

$ qbics-linux-gpu water.inp -n 8 --gpu 0,1,2,3 > water.out

Read water.out to confirm that all 4 GPUs are used (Line 5):

water.out
 1MPI is disabled in this version.
 2# Nodes: 1
 3   ID                       Hostname          Memory (GB)   #Cores  #OpenMP
 4    0                  ubuntu-server                  251       96        1
 5CUDA Device to be used: 0 1 2 3
 6CUDA Device:
 7On node 0, ubuntu-server:
 8  4 CUDA device is available:
 9    0: NVIDIA GeForce RTX 4080
10       Computational ability: 8.9
11       Global memory:         16079 MB
12       Block-shared memory:   48 KB = 6144 double
13       Constant memory:       64 KB = 8192 double
14       Maximum threads per block: 1024
15       Maximum thread dimension:  1024, 1024, 64
16       Maximum grid dimension:    2147483647, 65535, 65535

Run Qbics on Multiple Nodes

To run the MPI version of Qbics, make sure that your MPI implementation is the same version as the one used to compile Qbics. To check this, first run the MPI version in serial mode:

$ qbics-linux-cpu-mpi water.inp -n 8 > water.out

In water.out, you can find this:

water.out
1C++ compiler:   g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
2C++ options:    -O2 --std=c++17 -fopenmp -ffast-math -fno-finite-math-only -fexpensive-optimizations -Wall -mavx2 -mfma
3MPI compiler:   mpirun (Open MPI) 4.1.2

Line 3 says that the MPI compiler is mpirun (Open MPI) 4.1.2. Then, in the shell, check the version of your mpirun:

$ mpirun -V
mpirun (Open MPI) 4.1.2

Report bugs to http://www.open-mpi.org/community/help/

Thus, this mpirun is exactly the version that Qbics needs.
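
A quick way to compare the two versions side by side, using only standard shell tools, is:

$ grep "MPI compiler" water.out
$ mpirun -V | head -n 1
$ which mpirun                             # make sure this is the mpirun you expect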

Run MPI Version of Qbics from Shell

The following command launches the MPI version of Qbics:

$ mpirun -np 4 --bind-to none qbics-linux-cpu-mpi water.inp -n 8 > water.out

Here, -np is the number of MPI processes. Note that you can still use -n to set up the OpenMP parallelization; in this case, we have 4 MPI processes, each with 8 OpenMP threads. The option --bind-to none controls the CPU binding mode: if you do not give --bind-to none, the MPI processes may be bound to only part of the node's cores and the number of OpenMP threads may be incorrect.
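
To span several nodes from the shell, Open MPI can also take a hostfile; here is a minimal sketch with hypothetical node names, running 2 MPI processes on each of 2 nodes:

$ cat hostfile
node01 slots=2
node02 slots=2
$ mpirun -np 4 --hostfile hostfile --bind-to none qbics-linux-cpu-mpi water.inp -n 8 > water.out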

Run MPI Version of Qbics from Slurm

In most cases, you will run Qbics through a queueing system. The Qbics distribution includes an example Slurm script, tools/run_qbics.slurm, for running Qbics:

tools/run_qbics.slurm
#!/bin/bash
#SBATCH --job-name=water
#SBATCH --nodes=4          # Total number of physical nodes.
#SBATCH --ntasks=8         # Total number of MPI processes.
#SBATCH --cpus-per-task=8  # Number of OpenMP threads for each MPI process.
#SBATCH --partition=your_partition

# Load the appropriate modules if needed.
# module load openmpi/4.1.1

inp=water.inp
out=water.out
mpirun qbics-linux-cpu-mpi $inp -n $SLURM_CPUS_PER_TASK > $out

In this script, we request 4 physical nodes (--nodes) and 8 MPI processes in total (--ntasks), and each MPI process has 8 OpenMP threads (--cpus-per-task). Thus, each node will run 2 MPI processes. You can change these parameters according to your needs.

--partition is the queue you want to use, which should be arranged by your cluster administrator. In a Slurm script, mpirun does not need the -np option, since Slurm automatically sets the number of MPI processes according to --ntasks.

Submit this task:

$ sbatch run_qbics.slurm
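
Standard Slurm commands can then be used to monitor or cancel the job:

$ squeue -u $USER      # check the status of your jobs
$ scancel <jobid>      # cancel the job if needed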

After the job finishes, you can find these lines in water.out (on our cluster):

water.out
User: junz
# Physical nodes: 4
Physical node names: cu295 cu296 cu297 cu298
MPI version:     3.1
# MPI processes: 8
 Rank                       Hostname          Memory (GB)   #Cores  #OpenMP
    0                          cu295                  187       32        8
    1                          cu295                  187       32        8
    2                          cu296                  187       32        8
    3                          cu296                  187       32        8
    4                          cu297                  187       32        8
    5                          cu297                  187       32        8
    6                          cu298                  187       32        8
    7                          cu298                  187       32        8
CUDA is disabled in this version.

Indeed, we have 4 physical nodes, each running 2 MPI processes, and each MPI process has 8 OpenMP threads. We also see that each node has 32 cores and 187 GB of memory, so the 2 × 8 = 16 threads on each node use half of its cores.

Attention

On different clusters, the Slurm script may need some modifications. Please consult the administrator of your cluster.