5. Thermo_pw on the GPU

Thermo_pw uses the QUANTUM ESPRESSO routines, which are GPU aware, so if you compile with the flag -D__CUDA the GPU is also active in thermo_pw. Starting from version 1.9.0, THERMO_PW also contains an experimental GPU version that improves on the QUANTUM ESPRESSO routines when the system is small and requires many k-points. This version, which is activated by setting many_k=.TRUE. in the INPUT_THERMO namelist, uses several experimental routines presently available only in thermo_pw.

The many_k version has many limitations, so it should be used only in particular cases. It is implemented with Davidson diagonalization for LDA and GGA functionals. It works with norm-conserving, ultrasoft, and PAW pseudopotentials, both scalar and fully relativistic, but not with hybrid functionals or LDA+U schemes. When thermo_pw calculates the response to a phonon perturbation with many_k=.TRUE., the new routines are used, while the old ones are used for the other perturbations. The method has been found useful only when the FFT sizes are smaller than approximately 48 x 48 x 48.

The many_k version is not compatible with G-vector parallelization, but it can be used with pool parallelization. It must be run with a number of MPI processes equal to the number of GPUs and with a number of pools equal to the number of MPI processes (one pool per GPU). It uses CUDA Fortran, so it is active only on NVIDIA GPUs. It has been tested on marconi100 at CINECA, loading the modules hpc-sdk/2022-binary or hpc-sdk/2023-binary. The many_k routines are also active in the CPU version of the code for testing purposes, but there is no advantage in using them on the CPU.

many_k: If .TRUE., the part of the code that loads 
        several wavefunctions and Hamiltonians on the GPU 
        is used. 
        Default: logical .FALSE.
memgpu: The free memory available on one GPU, in GBytes.
        (If the code crashes due to memory allocation problems, 
        decrease this value). 
        Default: real 10.
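
As an illustration of the two variables above, a minimal sketch of an INPUT_THERMO namelist that enables the many_k routines could look as follows. The value of what (a phonon dispersion task), the memgpu value, the number of GPUs, and the file name si.scf.in are assumptions chosen only for this example; they are not prescribed by this section. The launch line in the final comment is also only indicative: it assumes 4 GPUs and uses the standard QUANTUM ESPRESSO -nk option to request one pool per MPI process.

```fortran
&INPUT_THERMO
  what   = 'scf_disp'   ! assumed task for this sketch: scf plus phonon dispersion
  many_k = .TRUE.       ! use the experimental many-k GPU routines (default .FALSE.)
  memgpu = 8.0          ! free memory on one GPU in GBytes (default 10.);
                        ! decrease it if the run crashes on memory allocation
/
! Indicative launch on a node with 4 GPUs (hypothetical file names):
!   mpirun -np 4 thermo_pw.x -nk 4 < si.scf.in > si.out
! 4 MPI processes = 4 GPUs = 4 pools, i.e. one pool per GPU.
```

With this setup no G-vector parallelization is used, since every MPI process owns a whole pool, which matches the constraint stated above.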

