NSTX Software Group Documentation
     VMS Data Acquisition and Analysis Startup Checks
Updated August 18, 2004

 

 

SUMMARY

• check that RUNNSTX is running
• check IPCS:
     - check that GRANDPA and NETMAILER are running on KEES
     - check that IPCS is running on EPICS and the VMS jobs are "connected"
• check that NFS is running on BIRCH
• check that the NSTX_ACQREMOTE queue has the proper jobs running
• check that the NSTX_DB queue has the proper jobs running
• check that the NSTX_VGDS queue has the proper jobs running
• run a test shot (ask COE or, if not a run day, use Clock control page from EPICS console)

To get the definition of "shw", execute @DAS:DASLOGIN, or define
     $ shw :== $DAS$:[UTIL]show$world.exe

To define NFS as a symbol for NFScontrol if you don't already have it:
     $ NFS*_control :== @ops:[nfs]nfs_control.com

SYSPRV is necessary to run NFS ($ SET PROC/PRIV=SYSPRV)

To change the Facility Clock time at which the MDSplus INIT phase starts (T-60 seconds by default), redefine the SYSTEM logical name on KEES:
    $ define/system NSTX_CREATE_INIT_TIME -60

I. Check that RUNNSTX is running

   KEES$ shw runnstx

Pid      Prcname Image   State Pri CPU         ppgcnt/wspeak Faults
21205AFF RUNNSTX RUNNSTX HIB   7/4 00:02:34.23 4880/6816     801

When RUNNSTX is hung, one indication can be that Image shows up as "--" and State is LEF. In this case, stop the existing process with STOP/ID (WORLD privilege is necessary to stop a process you did not start):
    KEES$ STOP/ID=21205AFF     ! the PID from SHW RUNNSTX
and then restart RUNNSTX as described below.

If RUNNSTX isn't running after a reboot, a whole lot of other things are probably wrong as well. After verifying that, re-run KEESSYSRTM.COM from the KEES system manager (SYSMGR) directory. On KEES:
     
KEES$ @SYS$SPECIFIC:[SYSMGR]KEESSYSRTM.COM

If just RUNNSTX is missing, restart it:
    
KEES$ @NSTX$:[NSTX.SOURCE.RUNNSTX]START_RUNNSTX.COM

In summary, to stop RUNNSTX and restart it:
     KEES$ set process/priv=world
     KEES$ @das$:[com]daslogin.com	! if not in login.com
     KEES$ shw *runnstx*		! pid will be in left-hand column
     KEES$ stop/id=<pid for RUNNSTX>
     KEES$ set default nstx$:[nstx.source.runnstx]
     KEES$ @START_RUNNSTX.COM
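
The hung-process symptom described above (Image shown as "--" with State LEF) can be checked mechanically against captured SHW output. The following is only a sketch in Python, meant for output saved to a file or pasted from a terminal; the names ProcessLine, parse_shw_line and looks_hung are hypothetical, and it assumes the column order shown above and process names without embedded spaces.

```python
from typing import NamedTuple, Optional


class ProcessLine(NamedTuple):
    """The first four columns of one SHW process row."""
    pid: str
    prcname: str
    image: str
    state: str


def parse_shw_line(line: str) -> Optional[ProcessLine]:
    """Parse one process row of SHW output; return None for headers or blanks.

    Assumes whitespace-separated columns in the order Pid, Prcname, Image,
    State, and that the process name contains no spaces.
    """
    fields = line.split()
    # A process row starts with an 8-digit hexadecimal PID.
    if len(fields) < 4 or len(fields[0]) != 8:
        return None
    try:
        int(fields[0], 16)
    except ValueError:
        return None
    return ProcessLine(fields[0], fields[1], fields[2], fields[3])


def looks_hung(proc: ProcessLine) -> bool:
    """Apply the symptom from the text: Image is "--" and State is LEF."""
    return proc.image == "--" and proc.state == "LEF"
```

For the healthy example row above, parse_shw_line returns a row whose state is HIB and looks_hung returns False. This is one indication only; a process can be unhealthy without matching it.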

II. Check IPCS

KEES$ shw/node=kees *ipcs*

Pid      Prcname      Image     State Pri
21200226 IPCS_GRANDPA GRANDPA   HIB   10/7
21200228 IPCSMAILER   NETMAILER HIB   10/7

To verify that RUNNSTX is "seeing" IPCS events, check the end of NSTX$:[LOGS.RUNNSTX]RUNNSTX.LOG and of the dated log RUNNSTX_yymmdd.LOG

[e.g., RUNNSTX_010208 for 08-Feb-2001]
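
As an illustration of the yymmdd naming convention, the dated log name for a given day can be built as follows. This is a small Python sketch; the function name runnstx_log_name is hypothetical, and the ".LOG" extension is taken from the file names quoted above.

```python
from datetime import date


def runnstx_log_name(day: date) -> str:
    """Return the dated RUNNSTX log file name (RUNNSTX_yymmdd.LOG) for a day."""
    return "RUNNSTX_" + day.strftime("%y%m%d") + ".LOG"


# The example from the text: 08-Feb-2001
print(runnstx_log_name(date(2001, 2, 8)))  # prints RUNNSTX_010208.LOG
```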

Inspect the last IPCS message whose receipt has been recorded. If RUNNSTX "knows" it has lost IPCS, a long list of "trying to connect" messages will be present in RUNNSTX.LOG, but this only happens if RUNNSTX is restarted after IPCS has been lost:

xxx: evt_wait4ack: err from ipcs
     sts=no_net_receiver
xxx: evtHello: err from evt_wait4ack, connecting to EVTMGR_NSTX
    sts=Returned message
...wait a bit before trying to re-connect ...

If RUNNSTX is happy and no shots have occurred recently, "lamCheck" messages will be at the end of the log file:

KEES$ type/tail=5 nstx$:[logs.runnstx]runnstx.log

runnstx: lamCheck: 26-Oct-2000 14:36:00
runnstx: lamCheck: 26-Oct-2000 14:38:00
runnstx: lamCheck: 26-Oct-2000 14:40:00
runnstx: lamCheck: 26-Oct-2000 14:42:00
runnstx: lamCheck: 26-Oct-2000 14:44:00
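
To judge whether the last lamCheck entry is recent, the timestamps can be parsed from a captured log tail. The sketch below is in Python and assumes the exact line format shown above; the function names and the 5-minute threshold are assumptions chosen for illustration (the sample shows entries two minutes apart), not values taken from RUNNSTX.

```python
from datetime import datetime, timedelta
from typing import List

LAMCHECK_PREFIX = "runnstx: lamCheck: "
MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}


def parse_vms_time(stamp: str) -> datetime:
    """Parse a timestamp such as '26-Oct-2000 14:36:00'."""
    day_part, time_part = stamp.split()
    day, month, year = day_part.split("-")
    hour, minute, second = time_part.split(":")
    # Ignore any fractional seconds; the sample log has none.
    return datetime(int(year), MONTHS[month.capitalize()], int(day),
                    int(hour), int(minute), int(second.split(".")[0]))


def lamcheck_times(lines: List[str]) -> List[datetime]:
    """Collect the timestamps of all lamCheck lines, in file order."""
    times = []
    for line in lines:
        text = line.strip()
        if text.startswith(LAMCHECK_PREFIX):
            times.append(parse_vms_time(text[len(LAMCHECK_PREFIX):]))
    return times


def last_lamcheck_is_recent(lines: List[str], now: datetime,
                            max_age: timedelta = timedelta(minutes=5)) -> bool:
    """True if the newest lamCheck entry is no older than max_age (assumed)."""
    times = lamcheck_times(lines)
    return bool(times) and now - times[-1] <= max_age
```

A log tail with no lamCheck lines at all yields False, which is also what one would expect while a shot is in progress, so the result should be read together with the rest of the log.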

Restarting IPCS from the EPICS console

If it appears that IPCS is no longer connected, it must be restarted on the EPICS side.

To do this it is usually necessary to "unlock" the EPICS console next to the KEES console. "One must be a privileged operator" is the hint for the password. (This is the CICOPR password.)

From control page CH01, in the Operations Startup panel, there is a large gray button marked with "!". This is normally set up and visible on the "CI&C Operations" window of the Common Desktop interface.

The "!" button is a drop-down menu of actions; choose Start/stop evtMgr tasks, and watch on the Xterm output that the shot-cycle events are subsequently subscribed by RUNNSTX and by the ClockSync_NSTX program, which will probably be identified as "<PID>_kees". Sometimes this takes more than one restart. RUNNSTX waits between attempts to reconnect, so the subscribe messages may not come immediately. Infrequently, it has also been necessary to choose Start Clock Event Tasks from the same menu. [Note: from 2002 until mid-2004, restarting IPCS from the EPICS console was very rarely needed.]

If there is a problem at this stage, Tom Gibney or Paul Sichta should be consulted.

III. Check that NFS is running on BIRCH

NFS server should be running on BIRCH:

KEES$ shw nfs_server*/node=BIRCH

Pid      Prcname    Image      State Pri
216094B0 NFS_SERVER NFS_SERVER HIB   11/9

Several processes such as the following are likely to appear as well:

21607CB1 NFS_SERVERIO_1 ASYNC_IO_ASS LEF 12/9
216090C5 NFS_SERVERIO_2 ASYNC_IO_ASS LEF 12/9
 

If the NFS server task is not present, or if users of NFS (at present: Fast Camera PC, USXR PC, UCLA MMWR PC) report that they don't see any change to their shotnumber files, it may be necessary to restart NFS (from BIRCH) by running NFScontrol (see definition setup at top of document):

$ NFS

Choose "2" to stop NFS and then "1" to restart it.

Seeing the status of the NFS server process from inside NFScontrol requires more than SYSPRV, but checking it with SHW does not.

IV. Check NSTX_ batch queues

To view all the jobs in a batch queue:

$ SHOW QUE/ALL NSTX_<specific queue name>

To also see the number of jobs allowed to execute at once and the command files that started each job (to locate log files, for instance, or to be able to restart a job that must be temporarily stopped):

$ SHOW QUE/ALL/FULL NSTX_<specific queue name>

Our convention is that there should be a single startup file for each shot-cycle queue. These startup files are invoked whenever KEES is rebooted, via the KEESSYSRTM.COM file, which in turn points to our "master" files:

Queue name       "Master" submit file

NSTX_ACQREMOTE   NSTX$:[ACQREMOTE]MASTER_ACQREMOTE_SUBMIT.COM
NSTX_DB          NSTX$:[DB]MASTER_DB_SUBMIT.COM
NSTX_VGDS        NSTX$:[VGDS.BATCHJOBS]MASTER_VGDSSUBMIT.COM
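
For anyone scripting these checks from another machine, the queue table above can be kept as a simple lookup. The following Python sketch is illustrative only: MASTER_SUBMIT_FILES and show_queue_command are hypothetical names, the command strings simply mirror the SHOW QUE forms given earlier in this section, and nothing here talks to VMS.

```python
# Shot-cycle queues and their "master" submit files, copied from the table.
MASTER_SUBMIT_FILES = {
    "NSTX_ACQREMOTE": "NSTX$:[ACQREMOTE]MASTER_ACQREMOTE_SUBMIT.COM",
    "NSTX_DB": "NSTX$:[DB]MASTER_DB_SUBMIT.COM",
    "NSTX_VGDS": "NSTX$:[VGDS.BATCHJOBS]MASTER_VGDSSUBMIT.COM",
}


def show_queue_command(queue: str, full: bool = False) -> str:
    """Build the SHOW QUE command text for one of the known queues."""
    if queue not in MASTER_SUBMIT_FILES:
        raise KeyError("unknown shot-cycle queue: " + queue)
    qualifiers = "/ALL/FULL" if full else "/ALL"
    return "$ SHOW QUE" + qualifiers + " " + queue


# Print the checks for every queue along with its master submit file.
for name, master in MASTER_SUBMIT_FILES.items():
    print(show_queue_command(name, full=True), "   ! master file:", master)
```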

Whenever a job must run under a specific owner (because a QCS queue is owned by that account or because a database table on Eagle is owned by that account, or ...) make sure that the "/USER_" switch is set on the submit command. Submitting a job under a specific username requires SYSPRV and CMKRNL.

 

       
         
Troubleshooting:        

Log files

 

Log files for all our shot-cycle jobs are meant to be placed in subdirectories of NSTX$:[LOGS]. Currently:

Subdirectories of NSTX$:[LOGS]

ACQREMOTE   CAMAC   DB   EVENTS   MPTS   RF   RUNNSTX   SQLSERVER   TC_MON   VGDS

The NSTX$:[LOGS.RUNNSTX] directory contains log files for RUNNSTX, DISPATCHER, GKB2_SERVER and GKC2_SERVER, each suffixed with the date in yymmdd format.

The most common checking techniques are running the SEARCH command on these logs and using TYPE/TAIL=nn to see the most recent entries.

 

NSTXACTMON and SHOTCAMACMON

 

$ nstxACTMON :== "spawn/not/now/input=nl: mcr actmon -monitor kees::mon_server"

NSTXACTMON invokes the program that receives status messages from the dispatcher and various mdsservers, relayed by a monitor_server task; there is only one monitor_server, automatically installed, but there can be many users watching proceedings using NSTXACTMON without incurring a performance penalty (or so we are assured). NSTXACTMON can be run on any Alpha node.

Note: MPTS (Thomson Scattering) runs its own INIT sequence, although the STORE cycle is run via MDSplus; if MPTS is not running, it is normal to get STORE cycle errors for various TS digraph modules, since they will not have been initialized.

To discover what errors occurred in earlier shots, the PPPL utility SHOTCAMACMON.PRO can be used. (DMASTROVITO$:[CAMAC_ERRORS]SHOTCAMACMON.PRO at the moment)

 

Remote acquisition:

EPICS, Rich (the plasma control computer) and most PCs use MDSplus to write directly into the tree.

File transfer via NFS is only done for Fast Camera.

File transfer via SAMBA:

USXR
- MDSplus tree writes initiated via MDSplus event sent from PC

RGA for trending files
- MDSplus tree writes initiated by polling for new data file
  (arrives about every two hours)

 
 

From NSTX$:[nstx.source.runnstx]runnstx.c
* RUNNSTX.C --
*
* History:
*   20-Sep-2002 TRG Make all activities ast-driven, so recognition of
*   events is not delayed while waiting for some
*   action to complete (e.g., create-init).
*   Logical name NSTX_CREATE_INIT_TIME checked to allow
*   user-specified create-init time. May start any
*   time after SOS.

    Default for logical name NSTX_CREATE_INIT_TIME is, effectively:
       $ define/system NSTX_CREATE_INIT_TIME -60

 

Recently we have wanted longer INIT times and have set NSTX_CREATE_INIT_TIME to -75 or even -90; this value does not survive a reboot, so the INIT time reverts to starting at -60 seconds.
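
The default-and-override behavior described above can be pictured with a short sketch. This is not the RUNNSTX code (which is C); it is a Python illustration with hypothetical names, and the fallback to -60 on an unreadable value is an assumption, since the text only states what the default is.

```python
from typing import Optional

DEFAULT_CREATE_INIT_TIME = -60  # seconds relative to T=0, per the text


def create_init_time(translation: Optional[str]) -> int:
    """Return the INIT start time given the logical name's translation.

    `translation` is the string value of NSTX_CREATE_INIT_TIME, or None
    when the logical name is not defined (for example, after a reboot).
    Falling back to the default on a non-numeric value is an assumption.
    """
    if translation is None:
        return DEFAULT_CREATE_INIT_TIME
    try:
        return int(translation.strip())
    except ValueError:
        return DEFAULT_CREATE_INIT_TIME


print(create_init_time("-75"))  # prints -75
print(create_init_time(None))   # prints -60
```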

  Edited 11-Oct-2007