Topic: problem in ridft

surajit

problem in ridft
« on: November 17, 2014, 05:00:30 am »
Hi,

I have Turbomole 6.5 installed on a server. The following lines are set in ~/.bashrc:

export PARA_ARCH=MPI
export PARNODES=12
export MPIRUN_OPTIONS=-TCP
ulimit -s unlimited
export TURBODIR=/usr/local/chem/turbomole650
export PATH=$TURBODIR/scripts/:$PATH
export PATH=$TURBODIR/bin/`sysname`:$PATH
source $TURBODIR/Config_turbo_env

In any geometry optimization that uses the b-p functional with RI, the ridft module
takes a higher CPU percentage than it should, which increases the load.

The output of the top command during one optimization is given in the attached file.
The output of uname -a is:

Linux server 2.6.32-220.el6.x86_64 #1 SMP Tue Dec 6 19:48:22 GMT 2011 x86_64 x86_64 x86_64 GNU/Linux

This problem occurs for both single- and multi-processor jobs. Please help.

uwe

Re: problem in ridft
« Reply #1 on: November 18, 2014, 10:50:05 am »
Hi surajit,

It seems that the job was running on one node only. In that case I'd try the SMP version instead of the MPI one; nevertheless, the behavior will be similar:

- during the steps that compute contributions to the Fock (or KS) matrix, such as Coulomb (RI-J) and the DFT quadrature, each individual ridft process runs at approx. 100% CPU time

- after each SCF iteration, the resulting matrix is diagonalized by one process only, which uses multi-threaded linear algebra routines, since this is the most efficient way to do it. Note that the diagonalization procedure depends on the size of the matrix; in other cases, other algorithms such as ScaLAPACK might be used. Meanwhile, the other processes are sleeping or waiting for the next task. That is why your first ridft_mpi process shows approx. 1200% CPU time: it uses all 12 cores.
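For trying the SMP version suggested above, a minimal sketch of the corresponding ~/.bashrc block might look like this. It assumes that in Turbomole 6.5 setting PARA_ARCH=SMP makes the `sysname` script return the SMP variant of the platform name, so that the PATH line picks up the SMP binaries; please check this against the manual of your installation. Dropping MPIRUN_OPTIONS=-TCP is also an assumption, on the grounds that it is passed to mpirun only.

```shell
# ~/.bashrc: same setup as above, but selecting the SMP build instead of MPI.
# Assumption: PARA_ARCH=SMP selects the SMP binaries via `sysname`.
export PARA_ARCH=SMP
export PARNODES=12
# MPIRUN_OPTIONS=-TCP omitted (assumed to matter for MPI runs only)
ulimit -s unlimited
export TURBODIR=/usr/local/chem/turbomole650
export PATH=$TURBODIR/scripts/:$PATH
# PARA_ARCH must already be set here, because `sysname` is evaluated now
export PATH=$TURBODIR/bin/`sysname`:$PATH
source $TURBODIR/Config_turbo_env
```

After opening a new shell, running `sysname` should show whether the SMP platform directory is now selected, under the assumption stated above.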

The 'top' command reports CPU usage in a way that may hide the switch between the two modes, especially if the individual steps finish quickly. The top man page says:

The task's share of the elapsed CPU time since the last screen update, expressed as a percentage of total CPU time.
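Because top averages over its refresh interval, a process that alternates between the two modes shows a time-weighted mix of them. The small sketch below does that arithmetic; the phase durations are hypothetical numbers chosen for illustration, not measurements, and it assumes top's default Irix mode, where 100% means one fully busy core.

```shell
#!/usr/bin/env bash
# time_weighted_cpu: %CPU that top would show for one process over a refresh
# interval, given the phases as "seconds:percent" pairs (integer seconds).
time_weighted_cpu() {
  local total_time=0 weighted=0 phase secs pct
  for phase in "$@"; do
    secs=${phase%%:*}
    pct=${phase##*:}
    total_time=$(( total_time + secs ))
    weighted=$(( weighted + secs * pct ))
  done
  # guard against an empty phase list
  (( total_time > 0 )) || { echo 0; return; }
  # integer division: the result is truncated to whole percent
  echo $(( weighted / total_time ))
}

# Hypothetical 3 s interval for the master process: 2 s building the
# KS matrix on one core, then 1 s diagonalizing with 12 threads.
time_weighted_cpu 2:100 1:1200   # prints 466
# A worker that is busy for 2 s and then sleeps for 1 s.
time_weighted_cpu 2:100 1:0      # prints 66
```

With these made-up numbers top would display about 466% and 66%, values that neither process actually runs at during any single phase.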

Regards,

Uwe

surajit

Re: problem in ridft
« Reply #2 on: November 20, 2014, 12:27:49 pm »
Thanks for the response.

The load shown by the queue scheduler also increases (e.g., it goes up to 26 for a 16-processor job on a single SMP machine).

The extra load shown by top and the scheduler (SGE) make it difficult to run our job in a common shared facility.
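On Linux the load average counts runnable (and uninterruptible) threads rather than processes, so a multi-threaded diagonalization step, or waiting processes that poll instead of sleeping, could push the load above the number of requested slots. Whether that is what happens here is an assumption; the hypothetical helper below is one way to check it. It samples the CPU time of each process with a given name from /proc over an interval (the same quantity top uses), prints its thread count, and shows the 1-minute load average for comparison. The default process name `ridft_mpi` and the script name are assumptions; use the name that top shows for your binaries.

```shell
#!/usr/bin/env bash
# Hypothetical helper (Linux only, reads /proc): per-process %CPU and thread
# count for processes with a given name, plus the 1-minute load average.

# cpu_ticks PID: utime+stime of the whole process in clock ticks
# (fields 14 and 15 of /proc/PID/stat). Field 2 (the command name) may
# contain spaces, so everything up to the last ") " is stripped first.
cpu_ticks() {
  local stat rest
  stat=$(< "/proc/$1/stat") || return 1
  rest=${stat##*") "}
  set -- $rest              # $1 is now field 3 (state)
  echo $(( ${12} + ${13} ))
}

# cpu_percent DELTA_TICKS CLK_TCK INTERVAL_SECONDS: %CPU, 100 = one core
cpu_percent() {
  echo $(( $1 * 100 / ($2 * $3) ))
}

# thread_count PID: value of the "Threads:" line in /proc/PID/status
thread_count() {
  awk '/^Threads:/ {print $2}' "/proc/$1/status"
}

# sample NAME INTERVAL: INTERVAL must be a whole number of seconds
sample() {
  local name=${1:-ridft_mpi} interval=${2:-5} hz pid t now
  local -A before
  hz=$(getconf CLK_TCK)
  for pid in $(pgrep -x "$name"); do
    t=$(cpu_ticks "$pid") && before[$pid]=$t
  done
  sleep "$interval"
  for pid in "${!before[@]}"; do
    now=$(cpu_ticks "$pid") || continue   # process may have exited
    printf '%s pid=%s cpu=%s%% threads=%s\n' "$name" "$pid" \
      "$(cpu_percent $(( now - ${before[$pid]} )) "$hz" "$interval")" \
      "$(thread_count "$pid")"
  done
  printf 'load average (1 min): %s\n' "$(cut -d' ' -f1 /proc/loadavg)"
}

# Run e.g. as: bash ridft_cpu.sh ridft_mpi 5   (script name hypothetical)
if (( $# > 0 )); then
  sample "$@"
fi
```

Running it a few times during an SCF cycle would show whether the extra load lines up with the thread count of the diagonalizing process or with the waiting processes.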

The problem did not occur with Turbomole 6.4 and earlier versions.

How can I solve this?

Thanks in advance,

Kind regards,

surajit
« Last Edit: December 01, 2014, 07:24:01 am by surajit »