Discussion:
VMS Basic strings class D vs class S
Arne Vajhøj
2024-02-25 21:33:33 UTC
If I have understood it correctly then VMS Basic
strings use class D descriptors.

That is very nice.

But what happens if non-Basic code calls Basic
code with a string using a class S descriptor?
For input/read? For output/write?

Does the Basic runtime call some STR$ function that
understands the difference between S and D and handles
it properly? Or will I get a runtime error due to an invalid
string?

Arne
Dave Froble
2024-02-25 23:55:25 UTC
Post by Arne Vajhøj
If I have understood it correctly then VMS Basic
strings use class D descriptors.
That is very nice.
But what happens if non-Basic code calls Basic
code with a string using a class S descriptor?
For input/read? For output/write?
Does the Basic runtime call some STR$ function that
understands the difference between S and D and handles
it properly? Or will I get a runtime error due to an invalid
string?
Arne
I don't know the correct answer, but, at a guess, whatever is called to handle
the string quite likely will evaluate the descriptor and do "the right thing".
That would be my bet. Otherwise, why have descriptors?
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Arne Vajhøj
2024-02-26 00:06:42 UTC
Post by Dave Froble
Post by Arne Vajhøj
If I have understood it correctly then VMS Basic
strings use class D descriptors.
That is very nice.
But what happens if non-Basic code calls Basic
code with a string using a class S descriptor?
For input/read? For output/write?
Does the Basic runtime call some STR$ function that
understands the difference between S and D and handles
it properly? Or will I get a runtime error due to an invalid
string?
I don't know the correct answer, but, at a guess, whatever is called to
handle the string quite likely will evaluate the descriptor and do "the
right thing". That would be my bet.  Otherwise, why have descriptors?
I am pretty sure that Basic will not corrupt anything, but handling it
and causing an exception would both be sort of acceptable behavior.

A trivial example works:

$ type f.for
      program f
      character*80 s
      call b1('ABC')
      call b2(s)
      write(*,*) '|'//trim(s)//'|'
      end
$ type b.bas
sub b1(string s)
print "|" + s + "|"
end sub
!
sub b2(string s)
s = "ABC"
end sub
$ for f
$ bas b
$ link f + b
$ run f
|ABC|
|ABC|

But I can't help but wonder whether it always works.

Arne
Lawrence D'Oliveiro
2024-02-26 03:38:29 UTC
If I have understood it correctly then VMS Basic strings use class D
descriptors.
The “VAX BASIC User Manual” mentions both dynamic and fixed-length
strings. Chapter 13 explains that strings are fixed-length when part of
COMMON, MAP or RECORD statements, otherwise they are dynamic. Fixed-length
strings obviously cannot have their storage reallocated.

In chapter 21, it mentions that, if you pass strings by descriptor from a
language that doesn’t understand dynamic strings (e.g. Fortran), then they
are passed as fixed-length strings.
Arne Vajhøj
2024-02-28 14:18:54 UTC
Post by Lawrence D'Oliveiro
If I have understood it correctly then VMS Basic strings use class D
descriptors.
The “VAX BASIC User Manual” mentions both dynamic and fixed-length
strings. Chapter 13 explains that strings are fixed-length when part of
COMMON, MAP or RECORD statements, otherwise they are dynamic. Fixed-length
strings obviously cannot have their storage reallocated.
And I guess that more or less forces Basic to handle it
transparently.

$ type f.for
      program f
      character*80 s
      call print_class('ABC')
      call b1('ABC')
      call print_class(s)
      call b2(s)
      write(*,*) '|'//trim(s)//'|'
      end
$ type b.bas
sub b1(string s)
print "|" + s + "|"
end sub
!
sub b2(string s)
s = "ABC"
end sub
$ type x.c
#include <stdio.h>

#include <descrip.h>

void print_class(struct dsc$descriptor *s)
{
    switch(s->dsc$b_class) {
        case DSC$K_CLASS_S:
            printf("Class S\n");
            break;
        case DSC$K_CLASS_VS:
            printf("Class VS\n");
            break;
        case DSC$K_CLASS_D:
            printf("Class D\n");
            break;
        default:
            printf("Unknown class\n");
            break;
    }
}
$ for f
$ bas b
$ link f + b + x
$ run f
Class S
|ABC|
Class S
|ABC|

That has to work, because the all-Basic case below also passes fixed-length
MAP strings with class S descriptors:

$ type f.bas
program f
map (blk1) string s1 = 3
map (blk2) string s2 = 80
external sub b1(string)
external sub b2(string)
external sub print_class(string)
external string function trim(string)
s1 = "ABC"
call print_class(s1)
call b1(s1)
call print_class(s2)
call b2(s2)
print "|" + trim(s2) + "|"
end
!
function string trim(string s)
declare integer ix
ix = len(s)
while ix > 1 and mid$(s, ix, 1) = " "
ix = ix - 1
next
trim = mid$(s, 1, ix)
end function
$ type b.bas
sub b1(string s)
print "|" + s + "|"
end sub
!
sub b2(string s)
s = "ABC"
end sub
$ type x.c
#include <stdio.h>

#include <descrip.h>

void print_class(struct dsc$descriptor *s)
{
    switch(s->dsc$b_class) {
        case DSC$K_CLASS_S:
            printf("Class S\n");
            break;
        case DSC$K_CLASS_VS:
            printf("Class VS\n");
            break;
        case DSC$K_CLASS_D:
            printf("Class D\n");
            break;
        default:
            printf("Unknown class\n");
            break;
    }
}
$ bas f
$ bas b
$ link f + b + x
$ run f
Class S
|ABC|
Class S
|ABC|

Arne
Stephen Hoffman
2024-02-26 21:17:00 UTC
If I have understood it correctly then VMS Basic strings use class D
descriptors.
That is very nice.
But what happens if non-Basic code calls Basic code with a string using
a class S descriptor? For input/read? For output/write?
Does the Basic runtime call some STR$ function that understands the
difference between S and D and handles it properly? Or will I get a
runtime error due to an invalid string?
"It depends."

Most everything in most of the traditional languages and in the RTLs
does the right thing with both dynamic and static text strings, though
the app code involved might not. BASIC app code works pretty well here,
absent "heroic" efforts by the app developer.

If the app code assumes a dynamic arriving and gets handed static, the
RTL will either copy it, or space-pad the results into the static, or
the RTL will return a string truncation error. BASIC space-pads into
static string buffers if and as needed. Or truncates with an error.

Apps written in C, C++, BLISS, MACRO32, and probably some others may or
may not do the right thing with descriptors, as app code in these
languages often has to handle descriptors itself due to the (~lack of)
descriptor support in those languages. (Yes, I well know about dscdef.h,
descrip.h, et al., thanks.) Some devs will use RTL calls, and some use
explicit code.

As for home-grown descriptor code, few apps (none?) implement all of
the different sorts of descriptors available. Not outside of the RTL
itself, that is.

Apps expecting to work with dynamic descriptors might fail with the
truncation error as mentioned, and apps expecting to massage static
descriptors directly and not coded sufficiently cautiously around any
arriving dynamic strings can fail with heap and other errors.

Of the common languages, Pascal utilizes a wide variety of descriptors.
Calling into or getting called from Pascal tends to teach much about
descriptors and descriptor usage.

It wouldn't surprise me to learn that BASIC will fail to work correctly
with 64-bit string descriptors, though. Lots of home-grown app code
also won't. I also wouldn't expect the RTLs to work with encodings
other than ASCII and DEC MCS, either. And UTF-8 will fail in the
expected places, and most searching and sorting tends not to be
sensitive to the (written) language used within the text string.
--
Pure Personal Opinion | HoffmanLabs LLC
Arne Vajhøj
2024-02-28 14:38:03 UTC
Post by Stephen Hoffman
If I have understood it correctly then VMS Basic strings use class D
descriptors.
That is very nice.
But what happens if non-Basic code calls Basic code with a string using
a class S descriptor? For input/read? For output/write?
Does the Basic runtime call some STR$ function that understands the
difference between S and D and handles it properly? Or will I get a
runtime error due to an invalid string?
"It depends."
Most everything in most of the traditional languages and in the RTLs
does the right thing with both dynamic and static text strings, though
the app code involved might not. BASIC app code works pretty well here,
absent "heroic" efforts by the app developer.
If the app code assumes a dynamic arriving and gets handed static, the
RTL will either copy it, or space-pad the results into the static, or
the RTL will return a string truncation error. BASIC space-pads into
static string buffers if and as needed. Or truncates with an error.
Apps expecting to work with dynamic descriptors might fail with the
truncation error as mentioned, and apps expecting to massage static
descriptors directly and not coded sufficiently cautiously around any
arriving dynamic strings can fail with heap and other errors.
Basic cannot stuff 200 bytes into a 100-byte fixed-length string. That
is fair.
Post by Stephen Hoffman
It wouldn't surprise me to learn that BASIC will fail to work correctly
with 64-bit string descriptors, though. Lots of home-grown app code also
won't.
I sort of get the impression that using 64 bit descriptors is like
doing a bungee jump.

:-)
Post by Stephen Hoffman
I also wouldn't expect the RTLs to work with encodings other than
ASCII and DEC MCS, either. And UTF-8 will fail in the expected places,
and most searching and sorting tends not to be sensitive to the
(written) language used within the text string.
I would assume that it works as long as the string is considered
a sequence of bytes not a sequence of characters.

Arne
Scott Dorsey
2024-02-28 14:56:52 UTC
Post by Arne Vajhøj
Post by Stephen Hoffman
If I have understood it correctly then VMS Basic strings use class D
descriptors.
That is very nice.
But what happens if non-Basic code calls Basic code with a string using
a class S descriptor? For input/read? For output/write?
Does the Basic runtime call some STR$ function that understands the
difference between S and D and handles it properly? Or will I get a
runtime error due to an invalid string?
"It depends."
Most everything in most of the traditional languages and in the RTLs
does the right thing with both dynamic and static text strings, though
the app code involved might not. BASIC app code works pretty well here,
absent "heroic" efforts by the app developer.
If the app code assumes a dynamic arriving and gets handed static, the
RTL will either copy it, or space-pad the results into the static, or
the RTL will return a string truncation error. BASIC space-pads into
static string buffers if and as needed. Or truncates with an error.
Apps expecting to work with dynamic descriptors might fail with the
truncation error as mentioned, and apps expecting to massage static
descriptors directly and not coded sufficiently cautiously around any
arriving dynamic strings can fail with heap and other errors.
Basic cannot stuff 200 bytes into a 100 bytes fixed length string. That
is fair.
Fortran can! And you likely won't notice that you have damaged some other
memory until you get a SIGSEGV in some totally unrelated part of your code.
--scott
--
"C'est un Nagra. C'est suisse, et tres, tres precis."
Arne Vajhøj
2024-02-28 15:26:12 UTC
Post by Scott Dorsey
Post by Arne Vajhøj
Post by Stephen Hoffman
If the app code assumes a dynamic arriving and gets handed static, the
RTL will either copy it, or space-pad the results into the static, or
the RTL will return a string truncation error. BASIC space-pads into
static string buffers if and as needed. Or truncates with an error.
Apps expecting to work with dynamic descriptors might fail with the
truncation error as mentioned, and apps expecting to massage static
descriptors directly and not coded sufficiently cautiously around any
arriving dynamic strings can fail with heap and other errors.
Basic cannot stuff 200 bytes into a 100 bytes fixed length string. That
is fair.
Fortran can! And you likely won't notice that you have damaged some other
memory until you get a SIGSEGV in some totally unrelated part of your code.
:-)

Note that VMS Fortran needs to be actively misled to do this.

$ type bufovr.for
      program bufovr
      character*4 s1, s2
      common /b/s1,s2
      write(*,*) %loc(s1), %loc(s2)
      s2 = 'XXXX'
      call subbo1(s1)
      write(*,*) s1//s2
      call subbo2(s1)
      write(*,*) s1//s2
      end
c
      subroutine subbo1(s)
      character*(*) s
      s = 'ABCDEFGH'
      end
c
      subroutine subbo2(s)
      character*8 s
      s = '12345678'
      end
$ for bufovr
$ link bufovr
$ run bufovr
196608 196612
ABCDXXXX
12345678

Arne
Stephen Hoffman
2024-03-06 23:44:29 UTC
I also wouldn't expect the RTLs to work with encodings other than ASCII
and DEC MCS, either. And UTF-8 will fail in the expected places, and
most searching and sorting tends not to be sensitive to the (written)
language used within the text string.
I would assume that it works as long as the string is considered a
sequence of bytes not a sequence of characters.
The assumption that one byte is one character is embedded deeply in
OpenVMS system and app code and APIs.

I would assume that such code will break in various ways when presented
with UTF-8.

Anything assuming a correspondence between string length and displayed
width is going to fail, for instance.

That's before discussing sorting and searching and language
differences, as was mentioned. And normalization.

OpenVMS has (had) support for some of those differences with NCS and with
ICU, though those APIs aren't (weren't) widely used by apps.
--
Pure Personal Opinion | HoffmanLabs LLC
Lawrence D'Oliveiro
2024-03-08 03:33:58 UTC
Post by Stephen Hoffman
I would assume that such code will break in various ways when presented
with UTF-8.
Could be worse. Imagine if you had adopted Unicode at exactly that period
in the early 1990s, like Windows NT and Java did, when it was still
supposed to be a fixed-length 16-bit code. Then you would be saddled with
that albatross known as UTF-16.
