Discussion:
Expanding Maximum record size in a variable length RMS Indexed file
Byron Boyczuk
2021-04-15 19:17:28 UTC
We have a few hundred different RMS files as part of a mission-critical COBOL application running on OpenVMS 8.3 on Alpha servers. We need to expand the maximum record length of an RMS indexed file with variable-length records from 4500 to 4600 bytes. The 100 bytes are being added to the end of the existing records. This is a critical file and is our only indexed file with variable-length records, so we have not done this before.

When we increase the record size of a file with fixed-length records, we use the "/PAD" qualifier on the CONVERT command to expand short records for the new file. This option is not allowed when converting an indexed file with variable-length records. Therefore the new file will allow 4600-byte records, but will contain only 4500-byte records immediately after the CONVERT.
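For the fixed-length case, the usual sequence is roughly the following (an untested sketch; the file names are examples only):

$ ANALYZE/RMS_FILE/FDL/OUTPUT=OLDFILE.FDL OLDFILE.DAT
$ ! edit OLDFILE.FDL so the RECORD section shows the new SIZE (e.g. 4600)
$ CONVERT/FDL=OLDFILE.FDL/PAD=%X20 OLDFILE.DAT NEWFILE.DAT  ! pad short records with spaces
$ ! CONVERT rejects /PAD when the output is an indexed file with variable-length records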

Is there any issue doing it this way or anything to watch out for?

We are especially concerned about performance, as this file is central to the application. For example, when the application updates an existing record in the new file, it will READ a 4500-byte record and REWRITE it as a 4600-byte record.
Hein RMS van den Heuvel
2021-04-15 21:05:05 UTC
Post by Byron Boyczuk
We need to expand the maximum record length of a RMS indexed file with variable length records from 4500 to 4600 bytes.
Ok, SET FILE/ATTR=MRS=4600 will do that. Magically, RMS will then allow 4600-byte records to be written.
HOWEVER, what is the BUCKETSIZE? Is it 9, perchance?
- $ write sys$output f$file("tmp.idx","bks") or DIR/FULL or ANALYZE/RMS/FDL - check AREA 0
If the file was poorly designed, with the data bucket only just fitting, then a 4500-byte record would require a 9-block bucket (4608 bytes), which would NOT allow a 4600-byte record to be written due to BUCKET OVERHEAD (15 bytes) and RECORD OVERHEAD (11 bytes).
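For example, something along these lines shows the relevant attributes and raises the MRS in place (untested; "TMP.IDX" is a placeholder name):

$ WRITE SYS$OUTPUT F$FILE_ATTRIBUTES("TMP.IDX","BKS")  ! bucket size in blocks
$ WRITE SYS$OUTPUT F$FILE_ATTRIBUTES("TMP.IDX","MRS")  ! current maximum record size
$ SET FILE/ATTRIBUTE=(MRS:4600) TMP.IDX                ! raise the MRS without a CONVERT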

It seems it is not really a variable-length record file then, is it? Maybe you should make it fixed?
What are you going to 'fill' the extra bytes with, and how will programs know whether the data is real or just grown padding?
One way is to NOT forcefully grow all records, and have the record size tell the program how much data is valid.

You have to ask yourself - will I be able to update all programs, or will some programs be allowed to keep writing 4500-byte records?

You could just convert the file to a fixed-length record sequential file first, with padding, and next convert that to the desired indexed file, no padding.
If the file is very large, then you may consider an indexed file with bucket size 63, and all compression, for the intermediate file - otherwise it may 'explode', since sequential files are not compressed.
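Roughly, that two-step route would look like this (an untested sketch; the file and FDL names are placeholders):

$ ! Step 1: unload to a fixed-length sequential file, padding every record to 4600 bytes
$ CONVERT/FDL=FIXED_SEQ.FDL/PAD=%X20 OLD.IDX TEMP.SEQ
$ !   (FIXED_SEQ.FDL: ORGANIZATION sequential; RECORD FORMAT fixed, SIZE 4600)
$ ! Step 2: load the new indexed file, no padding
$ CONVERT/FDL=NEW.FDL/STATISTICS TEMP.SEQ NEW.IDX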
Post by Byron Boyczuk
Is there any issue doing it this way or anything to watch out for?
yes, that bucket size.
Post by Byron Boyczuk
Especially performance related as this file is central to the application.
The record size increase will not make a noticeable difference if reasonable tuning is in place already.
Many other factors would influence performance much more.

For mere money I'll be happy to explain further based on the actual file in question.

Cheers,
Hein
abrsvc
2021-04-15 21:07:06 UTC
Given that the file contains "variable" length records, it is the program more than the file that requires changes, apart from the maximum record size value, which the CONVERT should handle just fine. Since the added length is at the end, the old records don't need to change, assuming the COBOL code handles the now-"short" records. Based on the way the question was phrased, it sounds like the records are all 4500 bytes long now and will grow to 4600. If that is the case, why the need for variable length?

If the records are all 4500 bytes now, convert to a fixed-length record file and use CONVERT to pad to the new size.

More info is required here about the records to provide additional recommendations.
Byron Boyczuk
2021-04-16 03:17:45 UTC
Post by abrsvc
Given that the file contains "variable" length records, it is the program more than the file that requires changes except for the Max recordsize value that the convert should handle just fine. Since the added length is at the end for new records, the old don't need to change assuming that the Cobol code handles the now "short" records. Based upon the way the question was phrased, it sounds like the records are all 4500 bytes long now and will grow to 4600. If this is the case, why the need for variable length?
If the records are all 4500 bytes now, convert to a fixed record file and use convert to pad to the new size.
More info is required here about the records to provide additional recommendations.
Thank you for your responses. My apologies, I left out a couple of facts that may help clarify. The file contains two different records: one that is 4500 bytes, which is a header, and one that is 700 bytes, which is the detail associated with the header. It is a one-to-many relationship. The application needs to add some more fields to the header record (due to government regulation changes). The App Team's approach is to expand the record first, without changing the application logic, by adding "Filler" to the end of the record layout for the header. All the COBOL programs accessing the file will be re-compiled using the expanded record layout. After the CONVERT to change the file, it will contain 4500-byte records, with many being REWRITTEN at 4600 bytes by the application. Of course, new records will be WRITTEN at 4600 bytes.

The current file contains almost 20 million records, and the current bucket size in Area 0 is 9. All DATA and INDEX COMPRESSION is "YES" and FILL is "100".

I used the output of $ANALYZE/RMS/FDL in $EDIT/FDL to change the record size to 4600, then used INVOKE-OPTIMIZE to generate a new FDL. The new bucket size in Area 0 is 10.
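For reference, that pass was roughly the following (the intermediate FDL names here are just examples):

$ ANALYZE/RMS_FILE/FDL/OUTPUT=CBWRKCCI_ANA.FDL CBWRKCCI.FIL
$ EDIT/FDL/ANALYSIS=CBWRKCCI_ANA.FDL CBWRKCCI_NEW.FDL
$ ! in the editor: change RECORD SIZE to 4600, INVOKE-OPTIMIZE, then EXIT
$ CONVERT/FDL=CBWRKCCI_NEW.FDL/STATISTICS CBWRKCCI.FIL CBWRKCCI_NEW.FIL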

If I understand correctly, this new bucket size will accommodate the expanded record size - correct?

I also assume performance should be similar to the current file.
Tad Winters
2021-04-16 03:46:09 UTC
Post by Byron Boyczuk
Given that the file contains "variable" length records, it is the program more than the file that requires changes except for the Max recordsize value that the convert should handle just fine. Since the added length is at the end for new records, the old don't need to change assuming that the Cobol code handles the now "short" records. Based upon the way the question was phrased, it sounds like the records are all 4500 bytes long now and will grow to 4600. If this is the case, why the need for variable length?
If the records are all 4500 bytes now, convert to a fixed record file and use convert to pad to the new size.
More info is required here about the records to provide additional recommendations.
Thank you for your responses. My apologies, I left out a couple of facts that may help clarify. The file contains two different records, one that is 4500 bytes which is a header and one that is 700 bytes which is the details associated with the header. It is a one to many relationship. The application needs to add some more fields to the Header record (due to government regulation changes). The App Teams approach is to expand the record first without changing the application logic by adding "Filler" to the end of the record layout for the Header. All the COBOL programs accessing the file will be re-complied using the expanded record layout. After the CONVERT to change the file, it will contain 4500 bytes records with many being REWRITTEN at 4600 bytes by the application. Of course, new records will be WRITTEN at 4600 bytes.
Ugh!
Redesign the program to use separate files for header and detail, and
use fixed record sizes for each. This will allow simpler changes in
the future. You'll be able to tune each file separately and I think
you'll get much better performance.
Post by Byron Boyczuk
Current file contains almost 20 Million records and Current Bucket Size in Area 0 is 9. All DATA and INDEX COMPRESSION is "YES" and FILL is "100".
Used output of $ANAL/RMS/FDL in $EDIT/FDL to change record size to 4600, then used INVOKE-OPTIMIZE to generate new FDL. New Bucket Size in Area 0 is 10.
If I understand correctly, this new bucket size will accommodate the expanded record size - correct?
I also assume performance should be similar to the current file.
Byron Boyczuk
2021-04-16 04:19:32 UTC
Post by Tad Winters
Post by Byron Boyczuk
Given that the file contains "variable" length records, it is the program more than the file that requires changes except for the Max recordsize value that the convert should handle just fine. Since the added length is at the end for new records, the old don't need to change assuming that the Cobol code handles the now "short" records. Based upon the way the question was phrased, it sounds like the records are all 4500 bytes long now and will grow to 4600. If this is the case, why the need for variable length?
If the records are all 4500 bytes now, convert to a fixed record file and use convert to pad to the new size.
More info is required here about the records to provide additional recommendations.
Thank you for your responses. My apologies, I left out a couple of facts that may help clarify. The file contains two different records, one that is 4500 bytes which is a header and one that is 700 bytes which is the details associated with the header. It is a one to many relationship. The application needs to add some more fields to the Header record (due to government regulation changes). The App Teams approach is to expand the record first without changing the application logic by adding "Filler" to the end of the record layout for the Header. All the COBOL programs accessing the file will be re-complied using the expanded record layout. After the CONVERT to change the file, it will contain 4500 bytes records with many being REWRITTEN at 4600 bytes by the application. Of course, new records will be WRITTEN at 4600 bytes.
Ugh!
Redesign the program to use separate files for header and detail, and
used fixed record sizes for each. This will allow simpler changes in
the future. You'll be able to tune each file separately and I think
you'll get much better performance.
I wish I could redesign the file, but it is not possible to make all the necessary program changes in the timeframe required to add the new fields. We need to make this work within that constraint.
Dave Froble
2021-04-16 15:05:53 UTC
Post by Tad Winters
Post by Byron Boyczuk
Post by abrsvc
Given that the file contains "variable" length records, it is the
program more than the file that requires changes except for the Max
recordsize value that the convert should handle just fine. Since the
added length is at the end for new records, the old don't need to
change assuming that the Cobol code handles the now "short" records.
Based upon the way the question was phrased, it sounds like the
records are all 4500 bytes long now and will grow to 4600. If this is
the case, why the need for variable length?
If the records are all 4500 bytes now, convert to a fixed record file
and use convert to pad to the new size.
More info is required here about the records to provide additional recommendations.
Thank you for your responses. My apologies, I left out a couple of
facts that may help clarify. The file contains two different records,
one that is 4500 bytes which is a header and one that is 700 bytes
which is the details associated with the header. It is a one to many
relationship. The application needs to add some more fields to the
Header record (due to government regulation changes). The App Teams
approach is to expand the record first without changing the
application logic by adding "Filler" to the end of the record layout
for the Header. All the COBOL programs accessing the file will be
re-complied using the expanded record layout. After the CONVERT to
change the file, it will contain 4500 bytes records with many being
REWRITTEN at 4600 bytes by the application. Of course, new records
will be WRITTEN at 4600 bytes.
Ugh!
Redesign the program to use separate files for header and detail, and
used fixed record sizes for each. This will allow simpler changes in
the future. You'll be able to tune each file separately and I think
you'll get much better performance.
Thank you, I didn't want to be the one to point the finger at this design.

I've seen some really bad designs through the years, but this one is
high on the list. I have no idea what someone was thinking when coming
up with this design.

Regardless, the OP has inherited this design, and must work within his
constraints. Too bad.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Chris Townley
2021-04-16 16:43:58 UTC
Post by Dave Froble
I've seen some really bad designs through the years, but this one is
high on the list.  I have no idea what someone was thinking when coming
up with this design.
Regardless, the OP has inherited this design, and must work within his
constraints.  Too bad.
You should have seen some of the file structures I inherited! Mind you, the
app was originally written in BASIC on a PDP - pre-VAX.

I did manage to change some of them, but it was a nightmare to look at
some of the more commonly accessed files


Chris
Arne Vajhøj
2021-04-16 16:52:03 UTC
Post by Chris Townley
Post by Dave Froble
I've seen some really bad designs through the years, but this one is
high on the list.  I have no idea what someone was thinking when
coming up with this design.
Regardless, the OP has inherited this design, and must work within his
constraints.  Too bad.
You should have seen some of the file structures I inherited! Mind the
app was originally written in Basic on PDP - pre-Vax
I did manage to change some of them, but it was a nightmare to look at
some of the more commonly accessed files
And if changing is an option, then the question would also be
how much difference, work-wise, there is between:
- understand how the current code works
- rewrite to a better data model using index-sequential files
- test
and:
- understand how the current code works
- rewrite to a better data model using SQLite
- test
and whether the second option would be a more desirable state.

Arne
Chris Townley
2021-04-16 17:06:23 UTC
Post by Arne Vajhøj
Post by Chris Townley
Post by Dave Froble
I've seen some really bad designs through the years, but this one is
high on the list.  I have no idea what someone was thinking when
coming up with this design.
Regardless, the OP has inherited this design, and must work within
his constraints.  Too bad.
You should have seen some of the file structures I inherited! Mind the
app was originally written in Basic on PDP - pre-Vax
I did manage to change some of them, but it was a nightmare to look at
some of the more commonly accessed files
And if changing is an option, then the question would also be
- understand how the current code works
- rewrite to better data model using index-sequential files
- test
- understand how the current code works
- rewrite to better data model using SQLite
- test
and whether the second option would be a more desirable state.
Arne
The system has been gone for a few years now. It was all RMS-file based, so a
database was not a sensible option.

Understanding the code was generally not a problem, but as a 'full'
system, changing was usually a huge task, with massive testing required.

Chris
Bill Gunshannon
2021-04-16 16:52:10 UTC
Post by Chris Townley
Post by Dave Froble
I've seen some really bad designs through the years, but this one is
high on the list.  I have no idea what someone was thinking when
coming up with this design.
Regardless, the OP has inherited this design, and must work within his
constraints.  Too bad.
You should have seen some of the file structures I inherited! Mind the
app was originally written in Basic on PDP - pre-Vax
I did manage to change some of them, but it was a nightmare to look at
some of the more commonly accessed files
Ha!!

I am sure all of us have seen some really bad design by some of
our predecessors. Like the COBOL I worked on once that was the
result of a contract to convert a system from file access to
database access. The contractor added code to the COBOL program
to access the database and write the results into a flat file,
and then used the old program's logic to process the flat file.
And he got paid for it, too!!

bill
Hein RMS van den Heuvel
2021-04-16 04:42:48 UTC
On Thursday, April 15, 2021 at 11:17:46 PM UTC-4, ***@gmail.com wrote:

contains two different records, one that is 4500 bytes which is a header and one that is 700 bytes which is the

Ah! Now the file being variable-length records makes sense - but the original thought of wanting to do a /PAD makes less sense - no sense - as that would also pad the detail records.

Indeed, separate files are cleaner and easier, notably when non-COBOL access, for example SQL, is desired.
However, from a performance perspective the mixed files can be winners.
1) When you read a header record, you are likely to have already read the associated detail records too, notably when a good (over)sized bucket is selected.
2) Primary key compression is at its most effective. The detail records probably have just 2 or 3 unique key bytes, using most of the header key bytes as a base.
Post by Byron Boyczuk
All the COBOL programs accessing the file will be re-complied using the expanded record layout.
After the ? CONVERT to change the file, it will contain 4500 bytes records with many being REWRITTEN at 4600 bytes by the application. Of course, new records will be WRITTEN at 4600 bytes.
Fine.
Post by Byron Boyczuk
Current file contains almost 20 Million records
No big deal.
Post by Byron Boyczuk
and Current Bucket Size in Area 0 is 9. All DATA and INDEX COMPRESSION is "YES" and FILL is "100".
Ah! My guess was correct. You would be in trouble without a convert. Penny wise, pound FOOLISH.
Post by Byron Boyczuk
Used output of $ANAL/RMS/FDL in $EDIT/FDL to change record size to 4600, then used INVOKE-OPTIMIZE to generate new FDL. New Bucket Size in Area 0 is 10.
Yeah well, that's unlikely to be truly optimized. It barely fits a record!

Ok, with compression you can probably fit one header and one related detail record, but not much more.
Without further details I'd recommend going straight to 16 or 24 or so, and catching all the detail records for a header?!

The ANALYZE/FDL/OPTIMIZE rules were made 40 years ago, when a megabyte was a lot of memory, and have never changed; it does pretty much the bare minimum.
Post by Byron Boyczuk
If I understand correctly, this new bucket size will accommodate the expanded record size - correct?
Yes, but only barely.
Post by Byron Boyczuk
I also assume performance should be similar to the current file.
Yes, equally poor. But you have a good opportunity to be better!
Index compression may hurt some, unless the PK is really large ( > 20 bytes) as compression disables binary searches in index buckets.
Key compression is pretty much always desirable.
Why not remove the maximum record size limit altogether - leaving records limited only by the bucket size - by setting it to 0 (zero!), or to 6000 or some other arbitrarily larger number, to allow for more changes in the future?
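One way to do that (untested; the file name is a placeholder, and the same effect can be had by putting SIZE 0 in the RECORD section of the FDL before the CONVERT):

$ SET FILE/ATTRIBUTE=(MRS:0) NEWFILE.IDX  ! 0 = no MRS limit; records bounded only by bucket size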

Good luck.
Hein
Byron Boyczuk
2021-04-16 05:26:28 UTC
Post by Hein RMS van den Heuvel
Post by Byron Boyczuk
contains two different records, one that is 4500 bytes which is a header and one that is 700 bytes which is the
Ah! Now the file being variable length records makes sense - but the original thought of wanting to do a /PAD makes less sense - no sense - as that would also from the detail records.
Indeed separate files are clean and easier, notably when non-Cobol, for example SQL access is desired.
However, from a performance perspective the mixed files can be winners.
1) When you read a header record, you are likely to also already have read the associated detail records notably when a good (over) sized bucket is selected.
2) Primary Key compression is at it's most effective. The details records probably/possibly just have 2 or 3 unique bytes, using most of the header key bytes as base.
Post by Byron Boyczuk
All the COBOL programs accessing the file will be re-complied using the expanded record layout.
After the ? CONVERT to change the file, it will contain 4500 bytes records with many being REWRITTEN at 4600 bytes by the application. Of course, new records will be WRITTEN at 4600 bytes.
Fine.
Post by Byron Boyczuk
Current file contains almost 20 Million records
No big deal.
Post by Byron Boyczuk
and Current Bucket Size in Area 0 is 9. All DATA and INDEX COMPRESSION is "YES" and FILL is "100".
Ah! My guess was correct. You would be in trouble without convert. Penny wise pound FOOLISH.
Post by Byron Boyczuk
Used output of $ANAL/RMS/FDL in $EDIT/FDL to change record size to 4600, then used INVOKE-OPTIMIZE to generate new FDL. New Bucket Size in Area 0 is 10.
Yeah well, that's unlikely to be truly optimized. it barely fits a record!
Ok, with compression you can probably fit one header and once related detail, but not much more.
Without further details I'd recommend going straight to 16 or 24 or so and catch all details for a header ?!
The ANALYZE/FDL/OPTIMIZE rules were made 40 years ago when a megabyte was a lot of memory and never changed, it just does the bare minimum pretty much.
Post by Byron Boyczuk
If I understand correctly, this new bucket size will accommodate the expanded record size - correct?
Yes, but only barely.
Post by Byron Boyczuk
I also assume performance should be similar to the current file.
Yes, equally poor. But you have a good opportunity to be better!
Index compression may hurt some, unless the PK is really large ( > 20 bytes) as compression disables binary searches in index buckets.
Key compression is pretty much always desirable.
Why not disable a maximum record size, only limited by bucket size, by setting it to 0 (zero!) or 6000 or some other arbitrary larger number for more changes in the future?
Good luck.
Hein
Thank you very much. This has been a great help; I will use 16 for Areas 0 and 1. Btw, the PK is 44 bytes. Here is the full FDL that was generated.

IDENT FDL_VERSION 02 "15-APR-2021 22:40:25 OpenVMS FDL Editor"

SYSTEM
SOURCE "OpenVMS"

FILE
CONTIGUOUS no
FILE_MONITORING no
NAME "DGA3820:[MONTHLY.SATURDAYS]CBWRKCCI.FIL;1"
ORGANIZATION indexed
OWNER [LICA,LOCUSOPER]
PROTECTION (system:RWED, owner:RWED, group:RWED, world:RWE)
GLOBAL_BUFFER_COUNT 0
GLBUFF_CNT_V83 0
GLBUFF_FLAGS_V83 none

RECORD
BLOCK_SPAN yes
CARRIAGE_CONTROL carriage_return
FORMAT variable
SIZE 4600

AREA 0
ALLOCATION 11491536
BEST_TRY_CONTIGUOUS yes
BUCKET_SIZE 10
EXTENSION 65520

AREA 1
ALLOCATION 75200
BEST_TRY_CONTIGUOUS yes
BUCKET_SIZE 10
EXTENSION 18800

AREA 2
ALLOCATION 43600
BEST_TRY_CONTIGUOUS yes
BUCKET_SIZE 8
EXTENSION 10912

KEY 0
CHANGES no
DATA_AREA 0
DATA_FILL 100
DATA_KEY_COMPRESSION yes
DATA_RECORD_COMPRESSION yes
DUPLICATES no
INDEX_AREA 1
INDEX_COMPRESSION yes
INDEX_FILL 100
LEVEL1_INDEX_AREA 1
NAME ""
NULL_KEY no
PROLOG 3
SEG0_LENGTH 44
SEG0_POSITION 0
TYPE string

KEY 1
CHANGES yes
DATA_AREA 2
DATA_FILL 100
DATA_KEY_COMPRESSION yes
DUPLICATES yes
INDEX_AREA 2
INDEX_COMPRESSION yes
INDEX_FILL 100
LEVEL1_INDEX_AREA 2
NAME ""
NULL_KEY yes
NULL_VALUE '0'
SEG0_LENGTH 33
SEG0_POSITION 44
TYPE string
Simon Clubley
2021-04-16 12:21:06 UTC
Post by Byron Boyczuk
Thank you very much. This has been a great help, I will use 16 for Area 0 & 1. Btw, PK is 44 bytes. Here is the full FDL that was generated.
If you are worried about performance, then how badly fragmented is
the volume this file lives on ?

Little point in making the indexed file itself optimised if you are
also making the XQP weep every time you ask it to find a free
set of blocks on the volume. :-)
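For a rough look, something like this (untested; the device and file names are taken from the FDL posted earlier):

$ DUMP/HEADER/BLOCK=COUNT=0 DGA3820:[MONTHLY.SATURDAYS]CBWRKCCI.FIL  ! many retrieval pointers = a fragmented file
$ SHOW DEVICE/FULL DGA3820:                                          ! free blocks on the volume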

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Hein RMS van den Heuvel
2021-04-16 18:29:54 UTC
Post by Byron Boyczuk
Is there any issue doing it this way or anything to watch out for?
it will Read in 4500 byte records and REWRITE out 4600 byte records.
In the FDL you published it shows DATA_FILL 100.
You may want to select 90% or 95% instead, to allow those header records to grow without causing bucket splits.
There will quite likely be a few bytes to spare after the initial convert anyway, but why not give it a little fudge factor to be on the safe side.
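For example, in the FDL used for the CONVERT (90 is just an illustrative value):

KEY 0
DATA_FILL 90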
Post by Byron Boyczuk
Especially performance related as this file is central to the application.
The FDL does NOT show any global buffers in use.
If global buffers are not used, then one forfeits the right to talk about performance. Period.
Clearly the powers that be do not really care about performance, even for a 'central' file.

Next, do a SHOW RMS ... let me guess. All defaults?
Like I said, they don't really care about performance enough to do something about it.
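For example (untested; the buffer counts are illustrative, not tuned values):

$ SET FILE/GLOBAL_BUFFERS=500 CBWRKCCI.FIL  ! let all processes share cached buckets
$ SHOW RMS_DEFAULT                          ! check the current process/system RMS defaults
$ SET RMS_DEFAULT/INDEXED/BUFFER_COUNT=8    ! more per-process local buffers for indexed files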

Cheers,
Hein.
Hein RMS van den Heuvel
2021-04-17 00:07:42 UTC
We have a few hundred different RMS files as part of a mission critical COBOL ...
FYI, this topic was cross posted as: https://forum.vmssoftware.com/viewtopic.php?f=13&t=178

Hein.
Dirk Munk
2021-04-22 10:35:26 UTC
Post by Byron Boyczuk
We have a few hundred different RMS files as part of a mission critical COBOL application running on OpenVMS 8.3 on ALPHA Servers. We need to expand the maximum record length of a RMS indexed file with variable length records from 4500 to 4600 bytes. The 100 bytes is being added to the end of the existing records. This file is a critical file and is the only Indexed file with variable length records so have not done this before.
When we increase the record size of a file with fixed length records we use the "/PAD" qualifier on the CONVERT command to expand short records for the new file. This option is not allowed when converting an indexed file with variable length records. Therefore the new file will allow 4600 byte records but will only have 4500 bytes records immediately after the CONVERT.
Is there any issue doing it this way or anything to watch out for?
Especially performance related as this file is central to the application. For example, when the application updates an existing record in the new file it will Read in 4500 byte records and REWRITE out 4600 byte records.
Why not write a simple COBOL program that reads the present
indexed file, extends the records from 4500 to 4600 bytes, and writes a
sequential output file?

You can then design the new indexed file (bucket sizes etc.), and build
it. This way you will have an optimized indexed file.
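For example (untested; the names are placeholders for whatever the COBOL program writes):

$ CONVERT/FDL=NEW_INDEXED.FDL/NOSORT/STATISTICS EXTENDED.SEQ NEW_INDEXED.IDX
$ ! /NOSORT is safe only if the COBOL program wrote the records in primary-key
$ ! order (i.e. it read the old indexed file sequentially); otherwise omit it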
