Discussion:
RMS File statistics and Hein's RMS_STATS program, all zeroes at times
Richard Jordan
2024-11-18 20:45:27 UTC
Still working on determining the cause of the sporadic severe slowdown
issues with a particular batch job and have run into another issue.

We're trying to get RMS stats from one of the files. We have a monitor
batch job that snapshots info about the problem batch job; that monitor
batch job uses Hein's RMS_STATS program to dump them twice, once just
before the start of the problem batch job, and once right after it
completes. I don't want to risk using the option to zero the counters
on the production system and file so we need before and after.


So the issue is, the output of RMS_STATS in the batch job looks correct
but all the counters are always zero (0).

I can do

RMS_STATS -c -o=a DKA2:[DIR1.DIR2]FILE.DAT

interactively and I'll get the expected output, with counters climbing
on subsequent runs.

If I temporarily disable stats I'll get the warning from RMS_STATS as
expected; it is looking at the same correct file either way.

The problem batch job is run under a normal user account. The monitor
batch that does the system analyzer snapshots (to watch for 'busy' file
channels) and tries to use RMS_STATS is currently running as SYSTEM, but
we've tested it under our priv'd maintenance account also. No
difference in behavior.

And I can run the interactive command under our maintenance account OR
as SYSTEM and in both cases get real and incrementing counters back, not
just zeroes.

Any thoughts?
Richard Jordan
2024-11-18 21:08:40 UTC
Forgot to add the last test. I trimmed the monitor batch procedure
that ran RMS_STATS down to just the symbol definitions and those two
runs, adding only a 5 second wait in between them, and it's returning
real numbers instead of zeroes. So something is different in the
1:55AM - 3:30AM timeframe when the full monitor job runs. I tested the
same accounts (SYSTEM and our maintenance account) on two separate batch
queues; all returns are good.
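
For reference, the stripped-down test amounts to something like the
sketch below. The image location (SYS$TOOLS:), the log file names, and
the use of DEFINE/USER to capture the output are assumptions made for
illustration; only the RMS_STATS options and the file specification are
taken from the original post.

```dcl
$ ! Minimal sketch of the trimmed monitor procedure (not the production job).
$ ! Assumption: RMS_STATS.EXE lives in a hypothetical SYS$TOOLS: directory.
$ rms_stats :== $SYS$TOOLS:RMS_STATS.EXE
$ !
$ ! First snapshot. DEFINE/USER redirects SYS$OUTPUT for the next image only.
$ ! RMS_BEFORE.LOG is a hypothetical file name.
$ define/user sys$output rms_before.log
$ rms_stats -c -o=a DKA2:[DIR1.DIR2]FILE.DAT
$ !
$ ! 5-second gap between the two runs.
$ wait 00:00:05
$ !
$ ! Second snapshot. RMS_AFTER.LOG is a hypothetical file name.
$ define/user sys$output rms_after.log
$ rms_stats -c -o=a DKA2:[DIR1.DIR2]FILE.DAT
$ exit
```

Comparing the two log files then gives the before/after difference
without ever zeroing the counters on the production file.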
abrsvc
2024-11-19 11:42:21 UTC
File statistics are gathered only if enabled. Please be sure that you
have enabled them by using the SET FILE/STATISTICS command for each of
the files of interest.

Dan
Richard Jordan
2024-11-19 16:24:48 UTC
Post by abrsvc
File statistics are gathered only if enabled. Please be sure that you
have enabled them by using the set file/statistics command for each of
the files of interest.
Dan
They are enabled on the one single data file as noted. If they weren't
then RMS_STATS would return an error. As it is it returns zero values
in every field when run in the early morning batch jobs, but returns
real data when RMS_STATS is run interactively or in batch (during the
day). It doesn't seem to make sense.

The 1:55AM batch job originally enabled stats so we would not have them
running all day, then snapshotted the stats to get starting numbers
(except all it ever shows are zeros). The actual monitor batch would
turn stats off on completion (and after trying to take a snapshot) so
they were not enabled all day (I wasn't sure of overhead imposed).

Now they are enabled full time, so we still need a 'start' and 'end'
snapshot for stats during only the problem batch run time.

Same thing happened this morning. Both the 1:55AM batch log and the
problem job log have output with zero for all values.

I define the symbol RMS_STATS on my interactive process and run it
(copying the lines from the batch jobs so they're identical) and I get
real values back. If I turn stats off on the file I get the expected
error back about RMS statistics NOT enabled on the file. If I wait a
while and turn them back on, I get values showing the incremental usage,
so turning off stats doesn't seem to actually stop them from being
retained and updated.
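
The interactive check described above can be sketched roughly as
follows. The SET FILE qualifiers are the ones discussed in this thread
(/STATISTICS, plus /SHARE because the file may be open); the SYS$TOOLS:
location of the image is an assumption.

```dcl
$ ! Sketch of the interactive check. The image location is hypothetical.
$ rms_stats :== $SYS$TOOLS:RMS_STATS.EXE
$ !
$ ! Statistics enabled: counters are reported.
$ rms_stats -c -o=a DKA2:[DIR1.DIR2]FILE.DAT
$ !
$ ! Turn statistics off. The next run is expected to report that
$ ! RMS statistics are not enabled on the file.
$ set file/nostatistics/share DKA2:[DIR1.DIR2]FILE.DAT
$ rms_stats -c -o=a DKA2:[DIR1.DIR2]FILE.DAT
$ !
$ ! Turn statistics back on and look again. In the test described above
$ ! the counters still showed the usage accumulated in the meantime.
$ set file/statistics/share DKA2:[DIR1.DIR2]FILE.DAT
$ rms_stats -c -o=a DKA2:[DIR1.DIR2]FILE.DAT
```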

And I ran my abbreviated batch test again (the stripped-down monitor
batch) on the same queue it normally runs on, and I'm getting valid numbers.
Volker Halle
2024-11-19 16:55:26 UTC
NOTE this from good old ITRC forum back in 2006:

https://community.hpe.com/t5/operating-system-openvms/monitor-rms-problem/td-p/3707104

...
Because rms stats are kept in memory and will reset to 0 if, for a
moment, no process has the file open.

Volker.
abrsvc
2024-11-19 16:57:09 UTC
I believe that the /share option will make a difference to the
/statistics behavior. That might explain the differences here.

Dan
Richard Jordan
2024-11-19 23:15:10 UTC
Post by abrsvc
I believe that the /share option will make a difference to the
/statistics behavior.  That might explain the differences here.
Dan
The docs say the /share option allows you to enable or disable stats
while the file is open. That file is always open by many users during
business hours and periodically at night for batch operations. I
default to using the /share option as a precaution.

It is highly likely that when the problem batch job finishes, the file
we're looking at is not open by anyone. The job calls multiple images, and
only on completion of the last image is the RMS_STATS program called.

So if the word from Volker about stats being cleared when the file is
not opened by anyone is the case, that would explain things. The file
is quiesced around 11PM; after that it is not opened until the problem
batch starts at 2AM.


So maybe we can create another job that opens the file with minimal
access at 1:59AM and closes it when the monitor batch has completed its
snapshots. Won't be able to test that tonight but tomorrow we'll see.
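
One way such a holder job might look is sketched below. Everything in
it is hypothetical: the procedure name HOLD_OPEN.COM, the MONITOR_DONE
system logical name that the monitor job would define after its last
snapshot, and the give-up time passed as P2. Whether a plain DCL OPEN
is enough to keep the counters alive is also an assumption that still
needs testing.

```dcl
$ ! HOLD_OPEN.COM (hypothetical): hold a file open until told to stop.
$ ! P1 = file to hold open
$ ! P2 = time of day at which to give up regardless (default 04:00)
$ if p2 .eqs. "" then p2 = "04:00"
$ on error then goto done
$ on control_y then goto done
$ !
$ ! Read-only access, allowing other processes to read and write.
$ open/read/share=write held 'p1'
$ !
$ loop:
$   ! Stop once the monitor job has defined the (hypothetical) flag.
$   if f$trnlnm("MONITOR_DONE","LNM$SYSTEM_TABLE") .nes. "" then goto done
$   ! Stop at the give-up time so the file is not held forever.
$   if f$cvtime(,,"TIME") .ges. f$cvtime(p2,,"TIME") then goto done
$   wait 00:00:30
$   goto loop
$ !
$ done:
$ ! Close only if the OPEN actually succeeded.
$ if f$trnlnm("HELD") .nes. "" then close held
$ exit
```

It could be queued for the next night with something like:

```dcl
$ submit/after="TOMORROW+01:59" -
        /parameters=("DKA2:[DIR1.DIR2]FILE.DAT","04:00") HOLD_OPEN.COM
```

In this sketch the monitor job would deassign MONITOR_DONE when it
starts and define it with DEFINE/SYSTEM after its final snapshot, so a
stale flag from the previous night cannot release the holder early.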

Thanks for the input!
abrsvc
2024-11-19 23:47:16 UTC
Volker is right and I didn't think of it at the time. There are sites
that have a simple program that does nothing but open files and
hibernate, to keep both the statistics and the global buffers active,
since the global buffers are reset as well.

Dan
Arne Vajhøj
2024-11-20 00:44:41 UTC
Post by abrsvc
Volker is right and I didn't think of it at the time. There are sites
that have a simple program that does nothing but open files and
hibernate to both keep the statistics active but also the global buffers
which are reset as well.
Any special requirements needed or is:

$ open/read/share f 'p1'
$ wait 'p2'
$ close f
$ exit

and:

$ @keepopen foobar.dat 03:00:00

good enough?

Arne
Volker Halle
2024-11-20 07:46:24 UTC
Richard,

If you are already using T4 on the production system, here is an
article that explains how to add RMS file information to T4:

https://h41379.www4.hpe.com/openvms/journal/v11/t4_and_rms_collector.html

Volker.
