Running Qbics
======================

.. contents::
   :local:

Qbics should be run from the Windows command prompt or a Linux/macOS terminal.
To run Qbics, you only need to give an input file name. We prepare an input
file called ``water.inp``:

.. code-block:: bash
   :caption: water.inp
   :linenos:

   # A B3LYP/cc-pvdz calculation for water.

   basis
     cc-pvdz
   end

   scf
     charge 0      # Total charge.
     spin2p1 1
   end

   mol
     O   0.00000000000000   0.05011194954430   0.05011194954224
     H   0.00000000000000  -0.06080277603381   1.01069082652926
     H   0.00000000000000   1.01069082648951  -0.06080277607149
   end

   task
     energy b3lyp
   end

Command Line Arguments
-------------------------------------------

The usage of Qbics is:

.. code-block:: bash

   qbics-linux-cpu input_file [-n nthreads] [-s scratch_path] [-m max_memory] [-d max_disk] [--gpu gpu_ids]

You can use this command to run Qbics:

.. code-block:: bash

   $ qbics-linux-cpu water.inp > water.out

The optional arguments are explained below:

.. option:: -n

   .. list-table::
      :stub-columns: 1
      :widths: 5 20

      * - Value
        - Define the number of OpenMP threads for each MPI process.
      * - Default
        - ``1``

   The value should be **less than the number of physical CPU cores** of the
   node it is run on.

.. option:: -s

   .. list-table::
      :stub-columns: 1
      :widths: 5 20

      * - Value
        - Define the scratch path where computational temporary files are saved.
      * - Default
        - ``./``

   Qbics will use this path to write some computational temporary files. It
   should be on a **local, fast, and large** disk, and **not** a remote one,
   such as an NFS shared path.

   For Windows users, the scratch path should be given in **Linux format**.
   For example, if the scratch path is ``D:\Jobs\Scratch`` (Windows format),
   then for Qbics you should give ``-s D:/Jobs/Scratch``.

.. option:: -m

   .. list-table::
      :stub-columns: 1
      :widths: 5 20

      * - Value
        - Define the maximum memory size in GB that an MPI process can use.
      * - Default
        - Unlimited

   For example, ``-m 5.5`` means that each MPI process will use up to 5.5 GB
   of memory, no matter how many OpenMP threads it has. Of course, it should
   not exceed the total memory size of the node.

.. option:: -d

   .. list-table::
      :stub-columns: 1
      :widths: 5 20

      * - Value
        - Define the maximum disk size in GB that an MPI process can use in the scratch path.
      * - Default
        - Unlimited

   For example, ``-d 900`` means that each MPI process will use up to 900 GB
   of disk space, no matter how many OpenMP threads it has. Of course, it
   should not exceed the total disk size of the scratch path.

.. option:: --gpu

   .. list-table::
      :stub-columns: 1
      :widths: 5 20

      * - Value
        - Define the GPU device IDs to be used.
      * - Default
        - ``0``

   For example, ``--gpu 0,2,3`` means that Qbics will use the GPU devices
   with IDs ``0``, ``2``, and ``3`` for the calculations.

Here is an example of running Qbics:

.. code-block:: bash

   $ qbics-linux-cpu water.inp -n 8 -m 30 -d 500 -s /scratch/zhang > water.out

This command runs Qbics with the input file ``water.inp``. The number of
OpenMP threads is 8, the maximum memory and disk sizes are 30 GB and 500 GB,
respectively, and the scratch path is ``/scratch/zhang``.
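Before choosing values for ``-n``, ``-m``, ``-d``, and ``-s``, it can help to
check the resources of the node. The snippet below is only a sketch using
standard Linux tools (it is not part of Qbics); ``/scratch/zhang`` is just the
example scratch path used above:

.. code-block:: bash

   # Physical cores: multiply "Socket(s)" by "Core(s) per socket" in the
   # output, and keep -n below that number.
   $ lscpu

   # Total memory in GB, to guide the choice of -m.
   $ free -g

   # Free space on the disk holding the scratch path, to guide -s and -d.
   $ df -h /scratch/zhang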
Run Qbics on a Single Node with GPU
--------------------------------------------

If GPU devices are available, you can simply run the GPU version of Qbics as
before, and Qbics will automatically use a GPU if possible:

.. code-block:: bash

   $ qbics-linux-gpu water.inp -n 8 > water.out

In ``water.out``, Qbics lists the GPUs it has found and reports that only
device ``0`` will be used:

.. code-block:: bash
   :caption: water.out
   :linenos:

   MPI is disabled in this version.
   # Nodes:  1
     ID  Hostname         Memory (GB)  #Cores  #OpenMP
      0  ubuntu-server            251      96        1
   CUDA Device to be used: 0

   CUDA Device:
     On node 0, ubuntu-server: 4 CUDA device is available:
     0: NVIDIA GeForce RTX 4080
        Computational ability: 8.9
        Global memory: 16079 MB
        Block-shared memory: 48 KB = 6144 double
        Constant memory: 64 KB = 8192 double
        Maximum threads per block: 1024
        Maximum thread dimension: 1024, 1024, 64
        Maximum grid dimension: 2147483647, 65535, 65535
     1: NVIDIA GeForce RTX 4080
        Computational ability: 8.9
        Global memory: 16077 MB
        Block-shared memory: 48 KB = 6144 double
        Constant memory: 64 KB = 8192 double
        Maximum threads per block: 1024
        Maximum thread dimension: 1024, 1024, 64
        Maximum grid dimension: 2147483647, 65535, 65535

In Line 8, Qbics has found 4 CUDA devices. In Line 5, Qbics reports that only
device ``0`` will be used, i.e., the device described in Line 9. If you want
to use all 4 GPUs, run with the ``--gpu`` argument:

.. code-block:: bash

   $ qbics-linux-gpu water.inp -n 8 --gpu 0,1,2,3 > water.out

Read ``water.out`` to confirm that all 4 GPUs are used (Line 5):

.. code-block:: bash
   :caption: water.out
   :linenos:

   MPI is disabled in this version.
   # Nodes:  1
     ID  Hostname         Memory (GB)  #Cores  #OpenMP
      0  ubuntu-server            251      96        1
   CUDA Device to be used: 0 1 2 3

   CUDA Device:
     On node 0, ubuntu-server: 4 CUDA device is available:
     0: NVIDIA GeForce RTX 4080
        Computational ability: 8.9
        Global memory: 16079 MB
        Block-shared memory: 48 KB = 6144 double
        Constant memory: 64 KB = 8192 double
        Maximum threads per block: 1024
        Maximum thread dimension: 1024, 1024, 64
        Maximum grid dimension: 2147483647, 65535, 65535
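If you are not sure which device IDs to pass to ``--gpu``, you can first list
the GPUs visible on the node with the standard ``nvidia-smi`` tool. This is
only a sketch (``nvidia-smi`` is not part of Qbics); the IDs it prints should
normally match the CUDA device IDs in ``water.out``, but it is safest to
cross-check against the ``CUDA Device`` listing shown above:

.. code-block:: bash

   # List the GPUs visible on this node and their device IDs.
   $ nvidia-smi -L

   # Then select a subset of devices for Qbics, e.g. only 0 and 1:
   $ qbics-linux-gpu water.inp -n 8 --gpu 0,1 > water.out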
Run Qbics on Multiple Nodes
--------------------------------------------

To run the MPI version of Qbics, make sure that the MPI implementation is
**the same version** as the one used to compile Qbics. To check this, first
run the MPI version in serial mode:

.. code-block:: bash

   $ qbics-linux-cpu-mpi water.inp -n 8 > water.out

In ``water.out``, you can find these lines:

.. code-block:: bash
   :caption: water.out
   :linenos:

   C++ compiler:  g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
   C++ options:   -O2 --std=c++17 -fopenmp -ffast-math -fno-finite-math-only -fexpensive-optimizations -Wall -mavx2 -mfma
   MPI compiler:  mpirun (Open MPI) 4.1.2

Line 3 says that the MPI compiler is ``mpirun (Open MPI) 4.1.2``. Then, in
the shell:

.. code-block:: bash

   $ mpirun -V
   mpirun (Open MPI) 4.1.2

   Report bugs to http://www.open-mpi.org/community/help/

Thus, this ``mpirun`` is exactly the version that Qbics needs.

Run MPI Version of Qbics from Shell
+++++++++++++++++++++++++++++++++++++++++

Run the MPI version of Qbics with ``mpirun``:

.. code-block:: bash

   $ mpirun -np 4 --bind-to none qbics-linux-cpu-mpi water.inp -n 8 > water.out

Here, ``-np`` is the number of MPI processes. Note that you can still use
``-n`` to set up the OpenMP parallelization. In this case, we have 4 MPI
processes, each having 8 OpenMP threads. ``--bind-to none`` sets the CPU
binding mode; if you do not give ``--bind-to none``, the number of OpenMP
threads may be incorrect.

Run MPI Version of Qbics from Slurm
+++++++++++++++++++++++++++++++++++++++++

In most cases, you will run Qbics through a queueing system. In the Qbics
distribution, we provide an example Slurm script, ``tools/run_qbics.slurm``,
to run Qbics:

.. code-block:: bash
   :caption: tools/run_qbics.slurm
   :linenos:

   #!/bin/bash
   #SBATCH --job-name=water
   #SBATCH --nodes=4              # Total number of physical nodes.
   #SBATCH --ntasks=8             # Total number of MPI processes.
   #SBATCH --cpus-per-task=8      # Number of OpenMP threads for each MPI process.
   #SBATCH --partition=your_partition

   # Load the appropriate modules if needed.
   # module load openmpi/4.1.1

   inp=water.inp
   out=water.out

   mpirun qbics-linux-cpu-mpi $inp -n $SLURM_CPUS_PER_TASK > $out

In this script, we request 4 physical nodes (``--nodes``) and 8 MPI processes
in total (``--ntasks``), and each MPI process has 8 OpenMP threads
(``--cpus-per-task``). Thus, we expect each node to run 2 MPI processes. You
can change these parameters according to your needs. ``--partition`` is the
queue you want to use, which should be assigned by your cluster
administrator. In a Slurm script, ``mpirun`` does not need the ``-np``
option, since Slurm automatically sets the number of MPI processes according
to ``--ntasks``.

Submit this task:

.. code-block:: bash

   $ sbatch run_qbics.slurm

After running, you can find these lines in ``water.out`` (on my cluster):

.. code-block:: bash
   :caption: water.out
   :linenos:

   User: junz
   # Physical nodes:  4
   Physical node names: cu295 cu296 cu297 cu298
   MPI version: 3.1
   # MPI processes:  8
     Rank  Hostname    Memory (GB)  #Cores  #OpenMP
        0  cu295               187      32        8
        1  cu295               187      32        8
        2  cu296               187      32        8
        3  cu296               187      32        8
        4  cu297               187      32        8
        5  cu297               187      32        8
        6  cu298               187      32        8
        7  cu298               187      32        8
   CUDA is disabled in this version.

Indeed, we have 4 physical nodes, each running 2 MPI processes, and each MPI
process has 8 OpenMP threads. We also see that each node has 32 cores and
187 GB of memory.

.. attention::

   On different clusters, the Slurm script may need some modifications.
   Please consult the administrator of your cluster.
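After submitting, you can monitor the job with standard Slurm and shell
commands. The snippet below is only a sketch and is not part of Qbics;
``water.out`` is the output file defined in the example script above:

.. code-block:: bash

   # Show your jobs in the queue (pending or running).
   $ squeue -u $USER

   # Follow the Qbics output while the job is running.
   $ tail -f water.out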