SUMMARY
check that RUNNSTX is running
check IPCS:
  - check that GRANDPA and NETMAILER are running on KEES
  - check that IPCS is running on EPICS and the VMS jobs are "connected"
check that NFS is running on BIRCH
check that the NSTX_ACQREMOTE queue has the proper jobs running
check that the NSTX_DB queue has the proper jobs running
check that the NSTX_VGDS queue has the proper jobs running
run a test shot (ask COE or, if not a run day, use the Clock control page
from the EPICS console)

To get the definition of "shw", execute @DAS:DASLOGIN, or define
$ shw :== $DAS$:[UTIL]show$world.exe

To define NFS as a symbol for NFScontrol if you don't already have it:
$ NFS*_control :== @ops:[nfs]nfs_control.com
SYSPRV is necessary to run NFS ($ SET PROC/PRIV=SYSPRV)

To change the Facility Clock time used for starting the MDSplus INIT
phase (T-60 seconds by default), change the SYSTEM logical name on KEES:
$ define/system NSTX_CREATE_INIT_TIME -60
|
I. Check that RUNNSTX is running
KEES$ shw runnstx
| Pid      | Prcname | Image   | State | Pri | CPU         | ppgcnt/wspeak | Faults |
| 21205AFF | RUNNSTX | RUNNSTX | HIB   | 7/4 | 00:02:34.23 | 4880/6816     | 801    |
When RUNNSTX is hung, one indication can be that Image shows up as "--"
and State is LEF. In this case, stop the existing process. WORLD privilege
is necessary to stop a process you did not start ($ STOP PROC/ID=pid).
KEES$ STOP/ID=21205AFF ! the PID from SHW RUNNSTX
and then restart RUNNSTX as below.
If RUNNSTX isn't running after a reboot, many other things are probably
wrong as well. After verifying that, re-run KEESSYSRTM.COM from the KEES
system manager directory. On KEES:
KEES$ @SYS$SPECIFIC:[SYSMGR]KEESSYSRTM.COM
If just RUNNSTX is missing, restart it:
KEES$ @NSTX$:[NSTX.SOURCE.RUNNSTX]START_RUNNSTX.COM
|
In summary, to stop RUNNSTX and restart it:
KEES$ set process/priv=world
KEES$ @das$:[com]daslogin.com ! if not in login.com
KEES$ shw *runnstx* ! pid will be in left-hand column
KEES$ stop/id=<pid for RUNNSTX>
KEES$ set default nstx$:[nstx.source.runnstx]
KEES$ @START_RUNNSTX.COM
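The "hung" rule of thumb above (Image shown as "--" and State LEF) can be sketched as a small check on a pasted shw row. This is an illustrative helper, not part of the NSTX software: the function names are hypothetical, and it assumes the column order Pid, Prcname, Image, State, Pri shown in the shw output above.

```python
def parse_shw_row(line):
    """Split one shw data row into named fields.

    Assumption: columns appear in the order Pid, Prcname, Image, State, Pri,
    separated by whitespace and/or '|' characters, as in the listings above.
    """
    fields = line.replace("|", " ").split()
    if len(fields) < 5:
        raise ValueError("expected at least 5 columns: %r" % line)
    pid, prcname, image, state, pri = fields[:5]
    return {"pid": pid, "prcname": prcname, "image": image,
            "state": state, "pri": pri}


def looks_hung(row):
    """Rule of thumb from the text: Image is "--" AND State is LEF.

    LEF alone is not enough; some healthy processes normally sit in LEF.
    """
    return row["image"] == "--" and row["state"].upper() == "LEF"


def stop_command(row):
    """Build the DCL command text used above to stop the process by PID."""
    return "STOP/ID=" + row["pid"]


if __name__ == "__main__":
    row = parse_shw_row("21205AFF RUNNSTX -- LEF 7/4")
    if looks_hung(row):
        print(stop_command(row))
```

The helper only builds the command text; an operator still has to decide whether to issue it, and "one indication" in the text means the check is a hint rather than proof.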
|
II. Check IPCS
KEES$ shw/node=kees *ipcs*
Pid       Prcname       Image      State  Pri
21200226  IPCS_GRANDPA  GRANDPA    HIB    10/7
21200228  IPCSMAILER    NETMAILER  HIB    10/7
To verify that RUNNSTX is "seeing" IPCS events, check the end of
NSTX$:[LOGS.RUNNSTX]RUNNSTX.LOG and RUNNSTX_yymmdd.LOG [e.g.,
RUNNSTX_010208 for 08-Feb-2001]. Inspect the last IPCS message whose
receipt has been recorded. If RUNNSTX "knows" it has lost IPCS, a long
list of "trying to connect" messages will be present in RUNNSTX.LOG, but
this only happens if RUNNSTX is restarted after IPCS has been lost:
xxx: evt_wait4ack: err from ipcs
     sts=no_net_receiver
xxx: evtHello: err from evt_wait4ack, connecting to EVTMGR_NSTX
     sts=Returned message
...wait a bit before trying to re-connect ...
If RUNNSTX is happy and no shots have occurred recently, "lamCheck"
messages will be at the end of the log file:
KEES$ type/tail=5 nstx$:[logs.runnstx]runnstx.log
runnstx: lamCheck: 26-Oct-2000 14:36:00
runnstx: lamCheck: 26-Oct-2000 14:38:00
runnstx: lamCheck: 26-Oct-2000 14:40:00
runnstx: lamCheck: 26-Oct-2000 14:42:00
runnstx: lamCheck: 26-Oct-2000 14:44:00
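As a rough aid for reading such a tail, the sketch below finds the most recent lamCheck timestamp and reports whether it is older than a chosen limit. The function names and the 10-minute default are assumptions made for illustration (the sample above shows entries two minutes apart); the line format is taken from the listing above. Per the text, this is only meaningful when no shots have occurred recently.

```python
import re
from datetime import datetime, timedelta

# Month abbreviations as they appear in VMS-style dates (e.g. 26-Oct-2000).
_MONTHS = {name: number for number, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}

_LAMCHECK = re.compile(
    r"lamCheck:\s+(\d{1,2})-([A-Za-z]{3})-(\d{4})\s+(\d{2}):(\d{2}):(\d{2})")


def last_lamcheck(lines):
    """Return the datetime of the last lamCheck line, or None if none found."""
    latest = None
    for line in lines:
        match = _LAMCHECK.search(line)
        if not match:
            continue
        day, mon, year, hh, mm, ss = match.groups()
        month = _MONTHS.get(mon.upper())
        if month is None:
            continue  # unrecognized month text; skip the line
        latest = datetime(int(year), month, int(day),
                          int(hh), int(mm), int(ss))
    return latest


def lamcheck_is_stale(lines, now, max_age=timedelta(minutes=10)):
    """True if there is no lamCheck entry, or the last one is older than max_age.

    max_age is an assumed threshold, not a value from the RUNNSTX source.
    """
    last = last_lamcheck(lines)
    return last is None or (now - last) > max_age
```

The input would typically be the lines captured from the TYPE/TAIL command shown above, with `now` taken from the same clock that stamps the log.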
Restarting IPCS from the EPICS console
If it appears that IPCS is no longer connected, it must be restarted on
the EPICS side. To do this, it is usually necessary to "unlock" the EPICS
console next to the KEES console. "One must be a privileged operator" is
the hint for the password. (This is the CICOPR password.)
From control page CH01, in the Operations Startup panel, there is a large
gray button marked with "!". This is normally set up and visible in the
"CI&C Operations" window of the Common Desktop interface.
The "!" button is a drop-down menu of actions; choose Start/stop evtMgr
tasks, and watch the Xterm output to confirm that the shot-cycle events
are subsequently subscribed to by RUNNSTX and by the ClockSync_NSTX
program, which will probably be identified as "<PID>_kees". Sometimes
this takes more than one restart. RUNNSTX waits between attempts to
reconnect, so the subscribe messages may not come immediately.
Infrequently, it has also been necessary to choose Start Clock Event
Tasks from the same menu. [Note: from 2002 until mid-2004, restarting
IPCS from the EPICS console was very rarely needed.]
If there is a problem at this stage, Tom Gibney or Paul Sichta should be
consulted.
|
III. Check that NFS is running on BIRCH
The NFS server should be running on BIRCH:
KEES$ shw nfs_server*/node=BIRCH
| Pid      | Prcname    | Image      | State | Pri  |
| 216094B0 | NFS_SERVER | NFS_SERVER | HIB   | 11/9 |
Several processes such as the following are likely to appear as well:
21607CB1  NFS_SERVERIO_1  ASYNC_IO_ASS  LEF  12/9
216090C5  NFS_SERVERIO_2  ASYNC_IO_ASS  LEF  12/9
If the NFS server task is not present, or if users of NFS (at present:
Fast Camera PC, USXR PC, UCLA MMWR PC) report that they don't see any
change to their shotnumber files, it may be necessary to restart NFS
(from BIRCH) by running NFScontrol (see the symbol definition at the top
of this document):
$ NFS
Choose "2" to stop NFS and then "1" to restart it. Seeing the status of
the NFS server process from inside NFScontrol requires more than SYSPRV,
but using SHW to check it does not.
|
IV. Check NSTX_ batch queues
To view all the jobs in a batch queue:
$ SHOW QUE/ALL NSTX_<specific queue name>
To also see the number of jobs allowed to be executing at once and the
command files that started the jobs (to locate log files, for instance,
or to be able to restart a job that must be temporarily stopped):
$ SHOW QUE/ALL/FULL NSTX_<specific queue name>
Our convention is that there should be a single startup file for each
shot-cycle queue. These startup files are invoked whenever KEES is
rebooted, via the KEESSYSRTM.COM file, which in turn points to our
"master" files:
| Queue name     | "Master" submit file                         |
| NSTX_ACQREMOTE | NSTX$:[ACQREMOTE]MASTER_ACQREMOTE_SUBMIT.COM |
| NSTX_DB        | NSTX$:[DB]MASTER_DB_SUBMIT.COM               |
| NSTX_VGDS      | NSTX$:[VGDS.BATCHJOBS]MASTER_VGDSSUBMIT.COM  |
Whenever a job must run under a specific owner (because a QCS queue is
owned by that account, or because a database table on Eagle is owned by
that account, or ...), make sure that the "/USER=" qualifier is set on
the submit command. Submitting a job under a specific username requires
SYSPRV and CMKRNL.
|
Troubleshooting:
|
Log files
Log files for all our shot-cycle jobs are meant to be placed in
subdirectories of NSTX$:[LOGS]. Currently:
Subdirectories of NSTX$:[LOGS]
ACQREMOTE  CAMAC  DB  EVENTS  MPTS
RF  RUNNSTX  SQLSERVER  TC_MON  VGDS
The NSTX$:[LOGS.RUNNSTX] directory contains log files for RUNNSTX,
DISPATCHER, GKB2_SERVER and GKC2_SERVER, each suffixed with the date in
yymmdd format.
Using the SEARCH command on these logs, or using TYPE/TAIL=nn to see the
most recent entries, is the most common checking technique.
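To find the right dated log for a given day, the yymmdd suffix can be computed rather than typed by hand. The sketch below follows the RUNNSTX_yymmdd.LOG pattern given earlier (RUNNSTX_010208 for 08-Feb-2001); applying the same "<NAME>_yymmdd.LOG" pattern to DISPATCHER, GKB2_SERVER and GKC2_SERVER is an assumption, and the function names are hypothetical.

```python
from datetime import date


def dated_log_name(prefix, day):
    """Return '<PREFIX>_yymmdd.LOG' for the given date.

    Only the RUNNSTX form is confirmed by this document; the same pattern
    for other prefixes is assumed.
    """
    return "%s_%s.LOG" % (prefix.upper(), day.strftime("%y%m%d"))


def dated_log_path(prefix, day):
    """Prepend the NSTX$:[LOGS.RUNNSTX] directory named in the text."""
    return "NSTX$:[LOGS.RUNNSTX]" + dated_log_name(prefix, day)


if __name__ == "__main__":
    # Build the file specification to hand to TYPE/TAIL or SEARCH.
    print(dated_log_path("runnstx", date(2001, 2, 8)))
```

The resulting file specification can then be passed to SEARCH or TYPE/TAIL=nn as described above.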
|
| |
NSTXACTMON and SHOTCAMACMON
$ nstxACTMON :== "spawn/not/now/input=nl: mcr actmon -monitor kees::mon_server"
NSTXACTMON invokes the program that receives status messages from the
dispatcher and various mdsservers, relayed by a monitor_server task.
There is only one monitor_server, automatically installed, but many users
can watch proceedings using NSTXACTMON without incurring a performance
penalty (or so we are assured). NSTXACTMON can be run on any Alpha node.
Note: MPTS (Thomson Scattering) runs its own INIT sequence, although the
STORE cycle is run via MDSplus; if MPTS is not running, it is normal to
get STORE cycle errors for various TS digraph modules, since they will
not have been initialized.
To discover what errors occurred in earlier shots, the PPPL utility
SHOTCAMACMON.PRO can be used (DMASTROVITO$:[CAMAC_ERRORS]SHOTCAMACMON.PRO
at the moment).
|
Remote acquisition:
EPICS, Rich (the plasma control computer) and most PCs use MDSplus to
write directly into the tree.
File transfer via NFS is only done for Fast Camera.
File transfer via SAMBA:
  USXR - MDSplus tree writes initiated via MDSplus event sent from PC
  RGA for trending files - MDSplus tree writes initiated by polling for
  new data file (arrives about every two hours)
|
|
| |
From NSTX$:[nstx.source.runnstx]runnstx.c
* RUNNSTX.C --
*
* History:
* 20-Sep-2002 TRG Make all activities ast-driven, so recognition of
* events is not delayed while waiting for some
* action to complete (e.g., create-init).
* Logical name NSTX_CREATE_INIT_TIME checked to allow
* user-specified create-init time. May start any
* time after SOS.
Default for logical name NSTX_CREATE_INIT_TIME is, effectively:
$ define/system NSTX_CREATE_INIT_TIME -60
Recently we have wanted longer INIT times and have set
NSTX_CREATE_INIT_TIME to -75 or even -90; this value does not survive a
reboot, so the INIT time reverts to starting at -60 seconds.