Author Topic: TM 6.3: problems with ridft (parallel)

t.kerber

  • Newbie
  • *
  • Posts: 2
  • Karma: +0/-0
TM 6.3: problems with ridft (parallel)
« on: January 20, 2014, 10:27:04 am »
Hello,

I have a problem with the parallel version of TM 6.3. The system is a nanocluster with approx. 200 metal atoms.
Method: PBE/def-SVP

I am trying to run a single-point calculation with ridft (MPI; 12, 16, or 32 CPUs) on different architectures. The program stops before a single SCF cycle is completed.
The error message is:
get density: illegal message tag or diagonalizer

Any suggestions?

Best regards,

T. KERBER

antti_karttunen

  • Sr. Member
  • ****
  • Posts: 200
  • Karma: +1/-0
Re: TM 6.3: problems with ridft (parallel)
« Reply #1 on: January 23, 2014, 08:13:44 pm »
Hi,

Can you run smaller jobs successfully? Something like benzene with two CPUs? What kind of machines are you running on? You could try to run on just a single machine to see whether this has anything to do with the internode communications. It would also be helpful to see the full ridft output file (slave1.output) and the file called "master" to see what could be going on.
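
A minimal sketch of such a two-CPU test, assuming the MPI build is selected through the `PARA_ARCH` and `PARNODES` environment variables and that the matching parallel binaries are already on `PATH`. The helper names `parallel_env` and `run_small_test` are made up for illustration, and the working directory is expected to contain a prepared small job (for example benzene).

```python
import os
import shutil
import subprocess


def parallel_env(n_cpus, base=None):
    """Copy an environment and set the Turbomole MPI variables for n_cpus processes."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    env = dict(os.environ if base is None else base)
    env["PARA_ARCH"] = "MPI"        # select the MPI-parallel binaries
    env["PARNODES"] = str(n_cpus)   # number of parallel processes
    return env


def run_small_test(workdir, n_cpus=2):
    """Run ridft in workdir; return its exit code, or None if ridft is not on PATH."""
    env = parallel_env(n_cpus)
    exe = shutil.which("ridft", path=env.get("PATH"))
    if exe is None:
        return None  # assumption not met: no ridft executable found
    with open(os.path.join(workdir, "ridft.out"), "w") as out:
        result = subprocess.run([exe], cwd=workdir, env=env,
                                stdout=out, stderr=subprocess.STDOUT)
    return result.returncode


if __name__ == "__main__":
    env = parallel_env(2, base={})
    print(env["PARA_ARCH"], env["PARNODES"])
```

If the two-CPU job on a single machine finishes cleanly, the next thing to look at is the multi-node setup rather than the installation itself.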

Antti

t.kerber

  • Newbie
  • *
  • Posts: 2
  • Karma: +0/-0
Re: TM 6.3: problems with ridft (parallel)
« Reply #2 on: January 31, 2014, 04:25:36 pm »
Thank you for your suggestions.

ridft works properly for smaller clusters (up to 150 atoms).
For more than 200 atoms, it crashes. Here is part of the master file:



TURBOMOLE V6.3 8 Mar 2011 at 10:43:40
Copyright (C) 2011 TURBOMOLE GmbH, Karlsruhe

   ***************************************************************************
$operating system unix                                                         
   ***************************************************************************
 n_nodes =            8
                                  8  tasks spawned
           0           1           2           3           4           5
           6           7           8
           0           1           2           3           4           5
           6           7           8

 nshell = 2541, fockdim =  19462212, nfock =  19462212
time elapsed since starting is : cpu   0.004 sec
                                  wall    0.008 sec

 get density: illegal message tag or diagonalizer
  tag=          99  should be 5
  node_id=           1   should be            1

antti_karttunen

  • Sr. Member
  • ****
  • Posts: 200
  • Karma: +1/-0
Re: TM 6.3: problems with ridft (parallel)
« Reply #3 on: January 31, 2014, 08:37:26 pm »
Hmm, I don't see any obvious reason why 150 atoms would work and 200 would not (that is nowhere near the static limit on the number of atoms in the basic version). Maybe this is some old technical issue that just happens to surface for your system (bad luck...). I recommend updating to the most recent version, 6.5, to see if that gets rid of the problem. If this is not possible for you, I suggest trying the other parallelization scheme that is available for ridft (Global Arrays, see manual).

Antti

uwe

  • Global Moderator
  • Sr. Member
  • *****
  • Posts: 405
  • Karma: +0/-0
Re: TM 6.3: problems with ridft (parallel)
« Reply #4 on: February 03, 2014, 03:13:38 pm »
Hi,

It seems that you are using version 6.3 and not the bugfix release 6.3.1.
The header of the job output should print "TURBOMOLE V6.3.1" and not "TURBOMOLE V6.3". Contact Turbomole support to get an updated version.

Like Antti, I'd recommend using V6.4 or V6.5. The error message you got is related to the way the tasks are distributed to the clients. Newer versions use a different algorithm.
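
For checking which release actually produced an output file, here is a small sketch that reads the version from the header line. It assumes the header has the form shown in the master file above ("TURBOMOLE V6.3 8 Mar 2011 ..."); the function names `turbomole_version` and `is_at_least` are made up for illustration.

```python
import re

# Matches header lines such as "TURBOMOLE V6.3 8 Mar 2011 at 10:43:40".
HEADER_RE = re.compile(r"TURBOMOLE\s+V(\d+(?:\.\d+)*)")


def turbomole_version(text):
    """Return the version of the first TURBOMOLE header as a tuple of ints, or None."""
    match = HEADER_RE.search(text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_at_least(version, minimum):
    """Compare version tuples after zero-padding, so that (6, 3) < (6, 3, 1)."""
    width = max(len(version), len(minimum))
    padded_version = version + (0,) * (width - len(version))
    padded_minimum = minimum + (0,) * (width - len(minimum))
    return padded_version >= padded_minimum


if __name__ == "__main__":
    header = "TURBOMOLE V6.3 8 Mar 2011 at 10:43:40"
    version = turbomole_version(header)
    # For the header quoted in this thread the bugfix check fails.
    print(version, is_at_least(version, (6, 3, 1)))
```

Running the same check on the first lines of the master file or slave1.output shows quickly whether the job was started with the bugfix binaries.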

Regards,

Uwe