Discussion:
64-bit file sizes, was: Re: scp or sftp: file is "raw", needs to be parsed - possible to work around that?
Simon Clubley
2021-05-20 17:59:33 UTC
And now I'm pondering what the eventual advent of 64-bit file sizes
will do to existing apps and APIs...
That's a good question.

It's a pity that on VMS, when using the VMS APIs instead of the C ones,
you can't just use a 64-bit version of stat() with appropriate 64-bit
integer variables, compile your code, and then call it a day. :-(

I wonder if sizes will be locked at the current maximum size for the
older APIs (with the value varying depending on whether you are working
in blocks or bytes) with yet _another_ set of APIs to get and manipulate
the 64-bit file size.

It's also possible that when using higher-level languages such as C
with the C APIs they may just do what Linux does and just call
different versions in the glibc stat() call depending on whether you
are working with 32-bit or 64-bit variables.

That might not work, however, because you can have both
32-bit and 64-bit pointers in the same image on VMS, unlike Linux
binaries, which use either 32-bit or 64-bit pointers but not both.

Also, will there need to be another version of RMS indexed files to
handle the larger file sizes?

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Craig A. Berry
2021-05-20 18:27:50 UTC
Post by Simon Clubley
And now I'm pondering what the eventual advent of 64-bit file sizes
will do to existing apps and APIs...
That's a good question.
It's a pity that on VMS, when using the VMS APIs instead of the C ones,
you can't just use a 64-bit version of stat() with appropriate 64-bit
integer variables, compile your code, and then call it a day. :-(
I wonder if sizes will be locked at the current maximum size for the
older APIs (with the value varying depending on whether you are working
in blocks or bytes) with yet _another_ set of APIs to get and manipulate
the 64-bit file size.
It's also possible that when using higher-level languages such as C
with the C APIs they may just do what Linux does and just call
different versions in the glibc stat() call depending on whether you
are working with 32-bit or 64-bit variables.
Large file support has been available in the CRTL for quite a few years.
If you define either _LARGEFILE or _USE_STD_STAT, then decc$types.h
defines __USE_OFF64_T, and you can see what that does to the type of off_t:

# if __USE_OFF64_T
typedef __int64 __off_t;
# else
typedef int __off_t;
# endif

so the file size in the stat struct (which has type off_t) can be a
64-bit integer if you want it to be. I can't remember when this came
along, but I believe it was before 8.x, possibly in the 7.3-2 era.
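A minimal sketch of what that looks like from the application side, assuming a C compiler on either VMS or Linux. _LARGEFILE is the VMS switch named above; _FILE_OFFSET_BITS=64 is the glibc equivalent, added here on the assumption that each platform ignores the other's macro. The function names off_t_bytes() and file_size_bytes() are made up for the example.

```c
/* Ask for a 64-bit off_t before any system header is seen.
   _LARGEFILE: VMS C RTL switch; _FILE_OFFSET_BITS=64: glibc spelling
   (assumed harmless on the platform that does not use it). */
#define _LARGEFILE 1
#define _FILE_OFFSET_BITS 64

#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Width of off_t in this compilation unit: 8 when large-file support is on. */
size_t off_t_bytes(void)
{
    return sizeof(off_t);
}

/* Size of 'path' in bytes, or -1 if stat() fails.
   long long is wide enough for either off_t variant. */
long long file_size_bytes(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}
```

Compiled without the two defines on a platform whose default off_t is 32 bits, off_t_bytes() would report 4 and st_size could not describe files of 2 GiB or more.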
Arne Vajhøj
2021-05-20 19:12:58 UTC
Post by Simon Clubley
And now I'm pondering what the eventual advent of 64-bit file sizes
will do to existing apps and APIs...
That's a good question.
It's a pity that on VMS, when using the VMS APIs instead of the C ones,
you can't just use a 64-bit version of stat() with appropriate 64-bit
integer variables, compile your code, and then call it a day. :-(
I wonder if sizes will be locked at the current maximum size for the
older APIs (with the value varying depending on whether you are working
in blocks or bytes) with yet _another_ set of APIs to get and manipulate
the 64-bit file size.
It's also possible that when using higher-level languages such as C
with the C APIs they may just do what Linux does and just call
different versions in the glibc stat() call depending on whether you
are working with 32-bit or 64-bit variables.
That might not work, however, because you can have both
32-bit and 64-bit pointers in the same image on VMS, unlike Linux
binaries, which use either 32-bit or 64-bit pointers but not both.
VMS C stat can switch between 32-bit and 64-bit st_size just by recompiling
with _LARGEFILE defined.

Mixed 32-bit and 64-bit pointers are not a problem, as stat calls
different functions depending on the define
(DECC$__LONG_GID_STAT and DECC$__OFF64_LONG_GID_STAT).
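To illustrate why mixed pointer sizes don't matter here, a sketch of the general technique: a header selects one entry point and one structure layout at compile time, so each call site is bound to exactly one routine and nothing is decided at run time. All names below (stat_impl32, stat_impl64, demo_stat, DEMO_LARGEFILE) are hypothetical stand-ins for the CRTL routines mentioned above; the real decc$ headers may do the mapping differently.

```c
#include <stdint.h>

/* Two result layouts: one with a 32-bit size field, one with a 64-bit one. */
typedef struct { int32_t size; } stat_result32;
typedef struct { int64_t size; } stat_result64;

/* Hypothetical stand-ins for the two real entry points; they just fill in a size. */
int stat_impl32(const char *path, stat_result32 *st) { (void)path; st->size = 512; return 0; }
int stat_impl64(const char *path, stat_result64 *st) { (void)path; st->size = 512; return 0; }

/* The "header" part: pick the routine and its matching struct at compile time. */
#ifdef DEMO_LARGEFILE
typedef stat_result64 demo_stat_result;
#define demo_stat stat_impl64
#else
typedef stat_result32 demo_stat_result;
#define demo_stat stat_impl32
#endif
```

Defining DEMO_LARGEFILE before the selection block binds the same source line, demo_stat(path, &result), to stat_impl64 and an 8-byte size field instead.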

The VMS APIs do not have a 2 GB or 4 GB problem, as they
count in 512-byte blocks, not bytes.

So a 32-bit field limits the size to 1 TB (signed) or 2 TB (unsigned).

That is fundamentally the same problem, but in practice a 2 GB
limit is much more problematic than a 1 TB limit.

And since ODS-5 is limited to 1 TB anyway, no bigger files
exist on VMS today.
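The arithmetic behind those figures, as a small C helper (the function name is hypothetical). A 32-bit byte counter tops out just under 2 GiB signed or 4 GiB unsigned; the same counter measured in 512-byte blocks tops out just under 1 TiB or 2 TiB.

```c
#include <stdint.h>

#define VMS_BLOCK_BYTES 512u   /* one disk block as counted by the VMS APIs */

/* Largest byte size a 'bits'-wide counter can describe when each unit is
   'unit_bytes' long. A signed counter loses its top bit to the sign.
   'bits' must be 1..63 so the shift below stays well defined. */
uint64_t max_size_bytes(unsigned bits, int is_signed, uint64_t unit_bytes)
{
    unsigned value_bits;
    uint64_t max_count;

    if (bits == 0 || bits > 63)
        return 0;                                  /* unsupported width */
    value_bits = is_signed ? bits - 1 : bits;
    max_count = (UINT64_C(1) << value_bits) - 1;   /* e.g. 2^31 - 1 */
    return max_count * unit_bytes;
}
```

For example, max_size_bytes(32, 1, 1) is the 2 GB byte-count limit, and max_size_bytes(32, 0, VMS_BLOCK_BYTES) is the 2 TB block-count limit.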

When we get ODS-n (n > 5), based on GFS-2 and allowing PB- and EB-sized
files, some changes will be needed.

Maybe we will have XABFHCL with XAB$Q_EBK and RAB64L with
more RAB$W_RFA.

:-)
Post by Simon Clubley
Also, will there need to be another version of RMS indexed files to
handle the larger file sizes?
Or maybe they will limit indexed-sequential files to 1 TB and
say that for >1 TB people have to use SQLite (which,
funnily enough, is currently limited to 256 TB due to 32-bit
page indexes and 64 KB pages).

Arne
Stephen Hoffman
2021-05-20 19:23:43 UTC
Post by Simon Clubley
It's also possible that when using higher-level languages such as C
with the C APIs they may just do what Linux does and just call
different versions in the glibc stat() call depending on whether you
are working with 32-bit or 64-bit variables.
I'd expect we'll see 32-bit values exposed in existing apps,
discussions of C large-file support aside. Very few folks use
large-file support.
Post by Simon Clubley
That might not work, however, because you can have both
32-bit and 64-bit pointers in the same image on VMS, unlike Linux
binaries, which use either 32-bit or 64-bit pointers but not both.
There's 32-bit virtual addressing for memory, and there's 32-bit storage
addressing; it's the latter that's centrally involved here.

Apple deprecated the 32-bit virtual addressing APIs some years back,
and removed those APIs with the previous macOS release.

macOS has had large-file support for many years, though. Linux too has
had large-file support for a number of years.

It's OpenVMS that has been limited to 32-bit (2 TiB) storage
addressing for ~twenty years.
Post by Simon Clubley
Also, will there need to be another version of RMS indexed files to
handle the larger file sizes?
There are 32-bit LBNs in various parts of RMS and its APIs, and in the
IO$_ACPCONTROL interface, and in the $qio interface, yes.

I'd expect to find 32-bit LBNs in the BASIC RTL, given BASIC wasn't
updated for 64-bit addressing. And in some (most?) storage-related
add-on device drivers, too.

VSI will probably put together some 64-bit LBN migration documentation,
but work on that is probably at least a year away and probably further.
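As a rough illustration of the kind of check such migration work involves, a sketch of two hypothetical helpers (not VSI or RMS APIs): one refuses to narrow a 64-bit LBN that a 32-bit interface such as $QIO could not carry, the other asks whether a volume is small enough for 32-bit LBNs at all.

```c
#include <stdint.h>

/* Hypothetical helper: pass a 64-bit LBN to a 32-bit interface only if it fits.
   Returns 0 and stores the value on success, -1 if it would be truncated. */
int lbn_to_32(uint64_t lbn, uint32_t *out)
{
    if (lbn > UINT32_MAX)
        return -1;              /* caller needs a 64-bit LBN interface */
    *out = (uint32_t)lbn;
    return 0;
}

/* Hypothetical helper: can every block of a volume be addressed with 32-bit LBNs?
   LBNs run from 0 to total_blocks - 1, so 2^32 blocks (2 TiB) is the ceiling. */
int volume_fits_32bit_lbns(uint64_t total_blocks)
{
    return total_blocks <= (UINT64_C(1) << 32);
}
```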
--
Pure Personal Opinion | HoffmanLabs LLC
Simon Clubley
2021-05-20 20:10:12 UTC
Post by Stephen Hoffman
Post by Simon Clubley
It's also possible that when using higher-level languages such as C
with the C APIs they may just do what Linux does and just call
different versions in the glibc stat() call depending on whether you
are working with 32-bit or 64-bit variables.
I'd expect we'll see 32-bit values exposed in existing apps,
discussions of C large-file support aside. Very few folks use
large-file support.
As Craig and Arne have just reminded me :-), there is already some
64-bit support in the C RTL (at least until you reach block numbers
greater than 32 bits).

In my defence, I was thinking about the native VMS APIs and tacked on
the C RTL comments as an afterthought...

I wonder how difficult it would be to use the 64-bit stuff (when it
arrives) in Macro-32 code.
Post by Stephen Hoffman
Post by Simon Clubley
Also, will there need to be another version of RMS indexed files to
handle the larger file sizes?
There are 32-bit LBNs in various parts of RMS and its APIs, and in the
IO$_ACPCONTROL interface, and in the $qio interface, yes.
I was thinking more about the internal RMS indexed file structure and
if the internal block pointers to other parts of the indexed file are
32-bit values.

I was wondering if we would need a Prolog-4 version of RMS indexed
files with 64-bit internal block numbers. There are also RFAs to
consider and how you would move them around the rest of VMS and
the user's applications.
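For scale: an RMS RFA (the 6-byte RAB$W_RFA field) is a 32-bit VBN followed by a 16-bit record ID (indexed files) or byte offset (sequential files), so a 64-bit block number has nowhere to go. A sketch with hypothetical names and an illustrative little-endian packing; only the 4-byte/2-byte split reflects RMS.

```c
#include <stdint.h>

/* RFA as RMS lays it out today: 32-bit VBN + 16-bit ID = 6 bytes. */
typedef struct {
    uint32_t vbn;  /* virtual block number */
    uint16_t id;   /* record ID (indexed) or byte offset (sequential) */
} rfa48_t;

/* Serialise as a little-endian longword followed by a little-endian word
   (an illustrative choice for this sketch). */
void rfa48_pack(rfa48_t rfa, unsigned char out[6])
{
    int i;
    for (i = 0; i < 4; i++)
        out[i] = (unsigned char)(rfa.vbn >> (8 * i));
    out[4] = (unsigned char)(rfa.id & 0xFF);
    out[5] = (unsigned char)(rfa.id >> 8);
}

rfa48_t rfa48_unpack(const unsigned char in[6])
{
    rfa48_t rfa;
    int i;
    rfa.vbn = 0;
    for (i = 0; i < 4; i++)
        rfa.vbn |= (uint32_t)in[i] << (8 * i);
    rfa.id = (uint16_t)(in[4] | (in[5] << 8));
    return rfa;
}

/* Would a block number from a 64-bit-LBN file system still fit in today's RFA? */
int rfa48_fits(uint64_t vbn)
{
    return vbn <= UINT32_MAX;
}
```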

Simon.
Stephen Hoffman
2021-05-20 20:35:48 UTC
Post by Simon Clubley
I was thinking more about the internal RMS indexed file structure and
if the internal block pointers to other parts of the indexed file are
32-bit values.
I was wondering if we would need a Prolog-4 version of RMS indexed
files with 64-bit internal block numbers. There are also RFAs to
consider and how you would move them around the rest of VMS and the
user's applications.
SQLite is 64-bit. 🤷🏼 🦆🦆

RFAs, FIDs, file sizes, UTF-8, the fun here with increased storage
sizes and storage capacities and storage addressing and storage
features is ~endless.

ODS-5 / EFS support has not yet landed in many apps, and I've yet to
grok whatever doc is available for the UTF-8 support that snuck out.

Per VSI comments, the storage device drivers have been updated, while
the replacement file system is arriving later.
Arne Vajhøj
2021-05-20 23:59:21 UTC
Post by Simon Clubley
I was thinking more about the internal RMS indexed file structure and
if the internal block pointers to other parts of the indexed file are
32-bit values.
I was wondering if we would need a Prolog-4 version of RMS indexed
files with 64-bit internal block numbers. There are also RFAs to
consider and how you would move them around the rest of VMS and the
user's applications.
SQLite is 64-bit. 🤷🏼 🦆🦆
It may compile with 64-bit pointers, but
https://www.sqlite.org/limits.html says:

<quote>
14. Maximum Database Size

Every database consists of one or more "pages". Within a single
database, every page is the same size, but different database can have
page sizes that are powers of two between 512 and 65536, inclusive. The
maximum size of a database file is 4294967294 pages. At the maximum page
size of 65536 bytes, this translates into a maximum database size of
approximately 1.4e+14 bytes (281 terabytes, or 256 tebibytes, or 281474
gigabytes or 256,000 gibibytes).
</quote>

4294967294 smells very 32bitish.

:-)
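Multiplying the two quoted limits out, sketched in C with names local to the example: the product is about 2.8e+14 bytes, consistent with the "281 terabytes" and "256 tebibytes" in the quote, while the "1.4e+14" figure is half of it.

```c
#include <stdint.h>

/* Limits quoted from the SQLite limits page (macro names are local to this sketch). */
#define QUOTED_MAX_PAGES      4294967294ULL   /* 2^32 - 2 pages */
#define QUOTED_MAX_PAGE_BYTES 65536ULL        /* 2^16 bytes per page */

/* Largest database, in bytes, that the two quoted limits allow together. */
uint64_t sqlite_max_db_bytes(void)
{
    return QUOTED_MAX_PAGES * QUOTED_MAX_PAGE_BYTES;
}

/* The same figure in tebibytes (2^40 bytes), as a fraction. */
double sqlite_max_db_tib(void)
{
    return (double)sqlite_max_db_bytes() / (double)(UINT64_C(1) << 40);
}
```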

Arne
David Jones
2021-05-21 00:58:56 UTC
Post by Arne Vajhøj
SQLite is 64-bit. 🤷🏼 🦆🦆
It may compile with 64 bit pointers,
I'm not sure what the emoji sequence is supposed to convey. SQLite does indeed compile
and run with 64-bit pointers.
Post by Arne Vajhøj
<quote>
14. Maximum Database Size
Every database consists of one or more "pages". Within a single
database, every page is the same size, but different database can have
page sizes that are powers of two between 512 and 65536, inclusive. The
maximum size of a database file is 4294967294 pages. At the maximum page
size of 65536 bytes, this translates into a maximum database size of
approximately 1.4e+14 bytes (281 terabytes, or 256 tebibytes, or 281474
gigabytes or 256,000 gibibytes).
</quote>
4294967294 smells very 32bitish.
The file format (https://www.sqlite.org/fileformat2.html) has a fixed header with
2-byte and 4-byte fields for the page size and total page count, respectively.
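A sketch of reading those two fields, assuming the layout in the file-format document: a 16-byte "SQLite format 3" magic string, a big-endian 2-byte page size at offset 16 (where the value 1 stands for 65536), and a big-endian 4-byte page count at offset 28. The function name is hypothetical, and the sketch skips the conditions the format document attaches to the in-header page count (it must be non-zero, and the change counter at offset 24 must match the version-valid-for number at offset 92).

```c
#include <stdint.h>
#include <string.h>

/* SQLite stores header integers most-significant byte first. */
static uint32_t be16(const unsigned char *p)
{
    return ((uint32_t)p[0] << 8) | p[1];
}

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* Parse the first 100 bytes of a database file. Returns 0 on success,
   -1 if the magic string does not match. */
int sqlite_header_sizes(const unsigned char hdr[100],
                        uint32_t *page_size, uint32_t *page_count)
{
    static const char magic[16] = "SQLite format 3";   /* 15 chars + NUL */
    uint32_t ps;

    if (memcmp(hdr, magic, 16) != 0)
        return -1;
    ps = be16(hdr + 16);                       /* offset 16: 2-byte page size */
    *page_size = (ps == 1) ? 65536u : ps;      /* 65536 does not fit in 16 bits */
    *page_count = be32(hdr + 28);              /* offset 28: 4-byte page count */
    return 0;
}
```

The 4-byte page count field is where the 4294967294-page ceiling comes from.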
Arne Vajhøj
2021-05-21 01:18:48 UTC
Post by David Jones
Post by Arne Vajhøj
SQLite is 64-bit. 🤷🏼 🦆🦆
It may compile with 64 bit pointers,
I'm not sure what the emoji sequence is supposed to convey. SQLite does indeed compile
and run with 64-bit pointers.
Post by Arne Vajhøj
<quote>
14. Maximum Database Size
Every database consists of one or more "pages". Within a single
database, every page is the same size, but different database can have
page sizes that are powers of two between 512 and 65536, inclusive. The
maximum size of a database file is 4294967294 pages. At the maximum page
size of 65536 bytes, this translates into a maximum database size of
approximately 1.4e+14 bytes (281 terabytes, or 256 tebibytes, or 281474
gigabytes or 256,000 gibibytes).
</quote>
4294967294 smells very 32bitish.
The file format (https://www.sqlite.org/fileformat2.html) has a fixed header with
2-byte and 4-byte fields for the page size and total page count, respectively.
So SQLite has the same fundamental 32-bit problem as ODS-5, but because
its 64 KB maximum page size is far bigger than a 512-byte block, it is not
currently a real limitation.
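Putting numbers on that, rounding SQLite's maximum page count up to 2^32 and taking the unsigned 2 TiB figure for a 32-bit count of 512-byte blocks:

```latex
\[
\frac{2^{32}\ \text{pages} \times 2^{16}\ \text{bytes/page}}
     {2^{32}\ \text{blocks} \times 2^{9}\ \text{bytes/block}}
= \frac{2^{48}\ \text{bytes}\ (256\ \text{TiB})}{2^{41}\ \text{bytes}\ (2\ \text{TiB})}
= 2^{7} = 128
\]
```

Same 32-bit count, 128 times the headroom, purely from the larger unit.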

Arne
