Running Qbics
Qbics should be run from the Windows Command Prompt or a Linux/macOS terminal.
To run Qbics, you only need to give an input file name. Suppose we have prepared an input file called water.inp:
# A B3LYP/cc-pvdz calculation for water.
basis
  cc-pvdz
end

scf
  charge 0    # Total charge.
  spin2p1 1
end

mol
  O   0.00000000000000   0.05011194954430   0.05011194954224
  H   0.00000000000000  -0.06080277603381   1.01069082652926
  H   0.00000000000000   1.01069082648951  -0.06080277607149
end

task
  energy b3lyp
end
Command Line Arguments
The usage of Qbics is:
qbics-linux-cpu <name> [-n <number>] [-s <path>] [-m <size>] [-d <size>] [--gpu <ids>]
You can use this command to run Qbics:
$ qbics-linux-cpu water.inp > water.out
The optional arguments are explained below:
- -n
  Value: Define the number of OpenMP threads for each MPI process.
  Default: 1
  The value should be less than the number of physical CPU cores of the node it is run on.
- -s
  Value: Define the scratch path where computational temporary files are saved.
  Default: ./
  Qbics will use this path to write computational temporary files. It should be on a local, fast, and large disk, not a remote one such as an NFS-shared path. For Windows users, the scratch path should be given in Linux format: for example, if the scratch path is D:\Jobs\Scratch (Windows format), then for Qbics you should give -s D:/Jobs/Scratch.
- -m
  Value: Define the maximum memory size in GB that an MPI process can use.
  Default: Unlimited
  For example, -m 5.5 means that each MPI process will use up to 5.5 GB of memory, no matter how many OpenMP threads there are. Of course, it should not exceed the total memory size of the node.
- -d
  Value: Define the maximum disk size in GB that an MPI process can use in the scratch path.
  Default: Unlimited
  For example, -d 900 means that each MPI process will use up to 900 GB of disk, no matter how many OpenMP threads there are. Of course, it should not exceed the total disk size of the scratch path.
- --gpu
  Value: Define the GPU device IDs to be used.
  Default: 0
  For example, --gpu 0,2,3 means that Qbics will use the GPU devices with IDs 0, 2, and 3 for calculations.
Here is an example of running Qbics:
$ qbics-linux-cpu water.inp -n 8 -m 30 -d 500 -s /scratch/zhang > water.out
This command runs Qbics with the input file water.inp. The number of OpenMP threads is 8, the maximum memory and disk sizes are 30 GB and 500 GB, respectively, and the scratch path is /scratch/zhang.
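As a quick worked illustration of how these per-process limits add up (the node sizes below, 128 GB of RAM and a 2 TB scratch disk, are assumptions made only for this example):
# One MPI process with -m 30 -d 500, regardless of the -n value:
#   memory cap  = 1 x 30 GB  = 30 GB    (must fit in the node's RAM, e.g. 128 GB)
#   scratch cap = 1 x 500 GB = 500 GB   (must fit on the scratch disk, e.g. 2 TB)
# With the MPI version and 2 processes per node, the caps double to 60 GB and 1000 GB per node.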
Run Qbics on a Single Node with GPU
If GPU devices are available, you can simply run the GPU version of Qbics as before, and Qbics will automatically use the GPU if possible:
$ qbics-linux-gpu water.inp -n 8 > water.out
In water.out, Qbics will list the GPUs found and report that only device 0 is used:
 1  MPI is disabled in this version.
 2  # Nodes: 1
 3      ID  Hostname       Memory (GB)  #Cores  #OpenMP
 4       0  ubuntu-server          251      96        1
 5  CUDA Device to be used: 0
 6  CUDA Device:
 7  On node 0, ubuntu-server:
 8    4 CUDA device is available:
 9    0: NVIDIA GeForce RTX 4080
10       Computational ability: 8.9
11       Global memory: 16079 MB
12       Block-shared memory: 48 KB = 6144 double
13       Constant memory: 64 KB = 8192 double
14       Maximum threads per block: 1024
15       Maximum thread dimension: 1024, 1024, 64
16       Maximum grid dimension: 2147483647, 65535, 65535
17    1: NVIDIA GeForce RTX 4080
18       Computational ability: 8.9
19       Global memory: 16077 MB
20       Block-shared memory: 48 KB = 6144 double
21       Constant memory: 64 KB = 8192 double
22       Maximum threads per block: 1024
23       Maximum thread dimension: 1024, 1024, 64
24       Maximum grid dimension: 2147483647, 65535, 65535
In Line 8, Qbics reports that it has found 4 CUDA devices. In Line 5, it reports that only device 0 will be used, i.e., the device listed in Line 9.
If you want to use all 4 GPUs, just run with the --gpu argument:
$ qbics-linux-gpu water.inp -n 8 --gpu 0,1,2,3 > water.out
Read water.out to confirm that all 4 GPUs are used (Line 5):
 1  MPI is disabled in this version.
 2  # Nodes: 1
 3      ID  Hostname       Memory (GB)  #Cores  #OpenMP
 4       0  ubuntu-server          251      96        1
 5  CUDA Device to be used: 0 1 2 3
 6  CUDA Device:
 7  On node 0, ubuntu-server:
 8    4 CUDA device is available:
 9    0: NVIDIA GeForce RTX 4080
10       Computational ability: 8.9
11       Global memory: 16079 MB
12       Block-shared memory: 48 KB = 6144 double
13       Constant memory: 64 KB = 8192 double
14       Maximum threads per block: 1024
15       Maximum thread dimension: 1024, 1024, 64
16       Maximum grid dimension: 2147483647, 65535, 65535
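If you are not sure which device IDs are present on a node, you can list the GPUs with nvidia-smi before choosing the --gpu value (this assumes the NVIDIA driver is installed; the IDs printed by nvidia-smi usually, though not always, match the CUDA device IDs that Qbics reports):
$ nvidia-smi -L    # prints one line per GPU, e.g. "GPU 0: NVIDIA GeForce RTX 4080 (UUID: ...)"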
Run Qbics on Multiple Nodes
To run the MPI version of Qbics, make sure that your MPI implementation is the same version as the one used to compile Qbics. To check this, first run the MPI version in serial mode:
$ qbics-linux-cpu-mpi water.inp -n 8 > water.out
In water.out, you can find these lines:
1  C++ compiler: g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
2  C++ options: -O2 --std=c++17 -fopenmp -ffast-math -fno-finite-math-only -fexpensive-optimizations -Wall -mavx2 -mfma
3  MPI compiler: mpirun (Open MPI) 4.1.2
Line 3 says that the MPI compiler is mpirun (Open MPI) 4.1.2. Then, in the shell:
$ mpirun -V
mpirun (Open MPI) 4.1.2
Report bugs to http://www.open-mpi.org/community/help/
Thus, this mpirun is exactly the version that Qbics needs.
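If the versions do not match, adjust your environment so that the matching MPI implementation is found first before launching Qbics. On clusters that use environment modules, this is typically done with module load (the module name openmpi/4.1.2 below is only an assumption; the available names differ between clusters):
$ module load openmpi/4.1.2
$ mpirun -V    # check again that the reported version matches the one printed in water.out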
Run MPI Version of Qbics from Shell
To run the MPI version from the shell, use a command like the following:
$ mpirun -np 4 --bind-to none qbics-linux-cpu-mpi water.inp -n 8 > water.out
Here, -np is the number of MPI processes. Note that you can still use -n to set up OpenMP parallelization; in this case, we have 4 MPI processes, each with 8 OpenMP threads. The option --bind-to none controls the CPU binding mode; if you do not give --bind-to none, the number of OpenMP threads may be incorrect.
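As a sketch of how these MPI options combine with the options from the previous sections (the 32-core node and the scratch directory /scratch/$USER are assumptions, not requirements):
# 4 MPI processes x 8 OpenMP threads = 32 threads, matching a node with 32 physical cores;
# each process is capped at 40 GB of memory and writes its temporary files under /scratch/$USER.
$ mpirun -np 4 --bind-to none qbics-linux-cpu-mpi water.inp -n 8 -m 40 -s /scratch/$USER > water.out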
Run MPI Version of Qbics from Slurm
In most cases, you will run Qbics through a queueing system. The Qbics distribution provides an example Slurm script, tools/run_qbics.slurm, for running Qbics:
#!/bin/bash
#SBATCH --job-name=water
#SBATCH --nodes=4           # Total number of physical nodes.
#SBATCH --ntasks=8          # Total number of MPI processes.
#SBATCH --cpus-per-task=8   # Number of OpenMP threads for each MPI process.
#SBATCH --partition=your_partition

# Load the appropriate modules if needed.
# module load openmpi/4.1.1

inp=water.inp
out=water.out
mpirun qbics-linux-cpu-mpi $inp -n $SLURM_CPUS_PER_TASK > $out
In this script, we request 4 physical nodes (--nodes) and 8 MPI processes in total (--ntasks), and each MPI process has 8 OpenMP threads (--cpus-per-task). Thus, we expect each node to run 2 MPI processes. You can change these parameters according to your needs. --partition is the queue you want to use, which should be arranged by your cluster administrator. In a Slurm script, mpirun does not need the -np option, since Slurm automatically sets the number of MPI processes according to --ntasks.
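If you prefer to state the process placement explicitly rather than letting Slurm distribute the 8 tasks, an equivalent request pins 2 MPI processes on each node with --ntasks-per-node (a sketch, assuming your Slurm installation supports this option; the rest of the script is unchanged):
#SBATCH --nodes=4             # 4 physical nodes.
#SBATCH --ntasks-per-node=2   # 2 MPI processes per node (4 x 2 = 8 in total).
#SBATCH --cpus-per-task=8     # 8 OpenMP threads per MPI process.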
Submit this task:
$ sbatch run_qbics.slurm
After the job finishes, you can find these lines in water.out (on my cluster):
User: junz
# Physical nodes: 4
Physical node names: cu295 cu296 cu297 cu298
MPI version: 3.1
# MPI processes: 8
  Rank  Hostname  Memory (GB)  #Cores  #OpenMP
     0  cu295             187      32        8
     1  cu295             187      32        8
     2  cu296             187      32        8
     3  cu296             187      32        8
     4  cu297             187      32        8
     5  cu297             187      32        8
     6  cu298             187      32        8
     7  cu298             187      32        8
CUDA is disabled in this version.
Indeed, we have 4 physical nodes, each running 2 MPI processes, and each MPI process has 8 OpenMP threads. We also see that each node has 32 cores and 187 GB of memory.
Attention
On different clusters, the Slurm script may need some modifications. Please consult your cluster administrator.