a***@floatingbear.ca
2006-11-23 15:32:14 UTC
We have been running OpenVMS V7.1 on an Alpha for many years. We
occasionally have problems with job synchronization where the job we are
waiting for ends and the synchronize job just keeps waiting. The process
that is running is QUEMAN.
However, last night, after adding "just a couple" more jobs to our
overnight processing, all of the queues stopped working. Quite a
number of jobs completed normally and about a dozen jobs were waiting
to synchronize with a job that had already finished. HOWEVER, all of
the rest of the jobs that should have been executing were sitting in a
"Starting" status. Trying to delete existing jobs had them go to an
"Aborting" status and hang. Stopping the queues also did not complete.
We ended up rebooting the system and rebuilding the queue manager
files and then manually resubmitting the jobs, which are now chugging
along. About two years ago, we also started a practice of rebooting
the system once a month and rebuilding the queue manager files about
once a quarter. We last did that about a week ago.
Is anyone familiar with what might have caused our problems last night
with the jobs sitting as "Starting"? Is there something that we should
be doing that could resolve this problem? My task for today is to try
to reduce the number of jobs that are in the queue at any one time.
Thanks
Andrew Butchart
***@floatingbear.ca