Discussion:
in-memory editing with EDT or EVE
Arne Vajhøj
2024-11-23 00:59:07 UTC
Over on the VSI forum there have been a couple of threads
about how to make GIT use a specific editor (VIM/JED/EDT/EVE)
for editing commit messages.

GIT does not come from the VMS world, so it does things
in a more *nix/portable way with an external editor,
an input file and an output file.

But it made me remember that 30 years ago I had code
for having callable EDT and callable TPU edit data
in memory rather than in a disk file.

The code was written in Macro-32 and it was expecting
lines in a Fortran CHARACTER array. Not so 2020'ish.
And besides there was a bug in the TPU version so it
frequently crashed.

But this is what a source control system really should
be using for such functionality. No need for temporary
disk files.

So on to making it work again. And this time in C.
Which for better or worse probably is the language that
will be used for code wanting to use callable EDT
or TPU to edit in-memory-only data.

And not that easy. The full callable interface to
TPU may be flexible, but it is also rather obscure.

Result: two wrapper functions memedt and memtpu
with a very simple API.

From my test programs:

char buf[10000];
memedt("A\nBB\nCCC", buf, sizeof(buf));
puts(buf);

and:

char buf[10000];
memtpu("A\nBB\nCCC", buf, sizeof(buf));
puts(buf);

Basically both input and output are C strings with
embedded \n's.

If anyone wants a copy then it is here:

https://www.vajhoej.dk/arne/vmsstuff/memedit/

It seems to work, but with just one test case for each,
there are likely cases that do not work.

Arne
Lawrence D'Oliveiro
2024-11-23 02:02:38 UTC
But this is what a source control system really should be using for such
functionality. No need for temporary disk files.
On Linux systems at least, temporary files are usually created in /tmp,
and distros commonly mount an instance of tmpfs on that, so no actual disk
files need be created.

<https://manpages.debian.org/5/tmpfs.5.en.html>
Craig A. Berry
2024-11-23 13:42:42 UTC
Post by Lawrence D'Oliveiro
But this is what a source control system really should be using for such
functionality. No need for temporary disk files.
On Linux systems at least, temporary files are usually created in /tmp,
and distros commonly mount an instance of tmpfs on that, so no actual disk
files need be created.
But that is not what git does when staging a commit.
Dan Cross
2024-11-23 14:30:03 UTC
Post by Craig A. Berry
Post by Lawrence D'Oliveiro
But this is what a source control system really should be using for such
functionality. No need for temporary disk files.
On Linux systems at least, temporary files are usually created in /tmp,
and distros commonly mount an instance of tmpfs on that, so no actual disk
files need be created.
But that is not what git does when staging a commit.
Regardless of that, it is what git does when composing a commit
message.

- Dan C.
Craig A. Berry
2024-11-23 18:10:34 UTC
Post by Dan Cross
Post by Craig A. Berry
Post by Lawrence D'Oliveiro
But this is what a source control system really should be using for such
functionality. No need for temporary disk files.
"should" seems awfully strong there and I don't understand why temporary
disk files pose a problem. To compute the commit ID, git has to
calculate the SHA1 of the actual content changes, the metadata (who,
when, etc.), and the commit message. While that could theoretically all
be done in memory, how can you be sure it would all fit in memory? Plus
debugging and recovery from failed operations would surely be much
easier with some kind of persistence of intermediate steps. So I think
the actual design of git is much better than this hypothetical one that
tries to avoid saving anything to disk until the last step.
Post by Dan Cross
Post by Craig A. Berry
Post by Lawrence D'Oliveiro
On Linux systems at least, temporary files are usually created in /tmp,
and distros commonly mount an instance of tmpfs on that, so no actual disk
files need be created.
But that is not what git does when staging a commit.
Regardless of that, it is what git does when composing a commit
message.
I simply meant staging in the sense of preparing, not staging in the
git-specific sense of adding to the index (which happens before the
commit operation), so you've made a distinction without a meaningful
difference. I was simply trying to make clear that Lawrence's comments
about /tmp and tmpfs have nothing to do with the matter at hand. At
least by default the commit message goes in .git/COMMIT_EDITMSG in the
current repository, so /tmp and how it's implemented are irrelevant.
Arne Vajhøj
2024-11-23 18:29:55 UTC
Post by Craig A. Berry
But this is what a source control system really should be using for such
functionality. No need for temporary disk files.
"should" seems awfully strong there and I don't understand why temporary
disk files pose a problem.
It is likely not a problem with any measurable impact.

But for the task at hand - having the user write a
commit message that is to be sent to a server over the
network - the use of a temporary file seems like
an unnecessary detour to me.
Post by Craig A. Berry
  To compute the commit ID, git has to
calculate the SHA1 of the actual content changes, the metadata (who,
when, etc.), and the commit message. While that could theoretically all
be done in memory, how can be you sure it would all fit in memory?
The files being committed are on disk, so Git will be doing disk IO.

But I don't see that as an argument that the commit message needs to
pass through a file.
Post by Craig A. Berry
  Plus
debugging and recovery from failed operations would surely be much
easier with some kind of persistence of intermediate steps.
Maybe. But it is not obvious to me that having the commit message
on disk in a temporary file will help troubleshooting.
Post by Craig A. Berry
  So I think
the actual design of git is much better than this hypothetical one that
tries to avoid saving anything to disk until the last step.
The commit message should not be saved on disk client side at all.
The message gets created and gets sent to the server over the network.

Arne
Craig A. Berry
2024-11-23 20:16:31 UTC
Post by Arne Vajhøj
Post by Craig A. Berry
But this is what a source control system really should be using for such
functionality. No need for temporary disk files.
"should" seems awfully strong there and I don't understand why temporary
disk files pose a problem.
It is likely not a problem with any measurable impact.
But for the task as hand - having the user write a
commit message that is to be send to a server over the
network - then the use of a temporary files seems like
an unnecessary detour to me.
Post by Craig A. Berry
To compute the commit ID, git has to
calculate the SHA1 of the actual content changes, the metadata (who,
when, etc.), and the commit message. While that could theoretically all
be done in memory, how can be you sure it would all fit in memory?
The files being committed are on disk, so Git will be doing disk IO.
But I don't see that as an argument for that the commit message need to
pass through a file.
Post by Craig A. Berry
Plus
debugging and recovery from failed operations would surely be much
easier with some kind of persistence of intermediate steps.
Maybe. But It is not obvious to me that having commit message
on disk in a temporary file will help troubleshooting.
Post by Craig A. Berry
So I think
the actual design of git is much better than this hypothetical one that
tries to avoid saving anything to disk until the last step.
The commit message should not be saved on disk client side at all.
The message get created and get sent to the server over the network.
There is no "client." In a DVCS like git, when you commit a change,
everything is written locally. Pushing to a server is an optional
separate operation and what you push is the version history that has
been written locally first. There is never a point where the commit
message is sent over the network to another machine before being stored
as one component of a commit.
Arne Vajhøj
2024-11-24 00:53:57 UTC
Post by Arne Vajhøj
Post by Craig A. Berry
To compute the commit ID, git has to
calculate the SHA1 of the actual content changes, the metadata (who,
when, etc.), and the commit message. While that could theoretically all
be done in memory, how can be you sure it would all fit in memory?
The files being committed are on disk, so Git will be doing disk IO.
But I don't see that as an argument for that the commit message need to
pass through a file.
Post by Craig A. Berry
Plus
debugging and recovery from failed operations would surely be much
easier with some kind of persistence of intermediate steps.
Maybe. But It is not obvious to me that having commit message
on disk in a temporary file will help troubleshooting.
Post by Craig A. Berry
So I think
the actual design of git is much better than this hypothetical one that
tries to avoid saving anything to disk until the last step.
The commit message should not be saved on disk client side at all.
The message get created and get sent to the server over the network.
There is no "client."  In a DVCS like git, when you commit a change,
everything is written locally.  Pushing to a server is an optional
separate operation and what you push is the version history that has
been written locally first.  There is never a point where the commit
message is sent over the network to another machine before being stored
as one component of a commit.
OK. I am still thinking SVNish. Sorry.

But does it matter?

edit disk file--read disk file--write to local repo
vs
edit in memory--write to local repo

still seems like a difference to me.

Or is git's external editor actually editing the final file
inside the repo?

Arne
Craig A. Berry
2024-11-24 02:54:06 UTC
Post by Arne Vajhøj
There is no "client."  In a DVCS like git, when you commit a change,
everything is written locally.  Pushing to a server is an optional
separate operation and what you push is the version history that has
been written locally first.  There is never a point where the commit
message is sent over the network to another machine before being stored
as one component of a commit.
OK. I am still thinking SVNish. Sorry.
But does it matter?
edit disk file--read disk file--write to local repo
vs
edit in memory--write to local repo
still seem like a difference to me.
Or is git external editor actual editing the final file
inside the repo?
As I tried to explain before, a git commit consists of the metadata
(author, timestamp, etc.), the commit message, the branch, and the
actual diff content of the changeset. All of the other pieces are
stored on-disk, so it's hard to see a reason to keep the commit message
in memory when it needs to be combined with the other pieces in order to
produce the commit.
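
As a rough illustration of the commit ID computation discussed here: git hashes a small object (a header plus a body) with SHA-1, and the commit body names a tree object by ID rather than embedding file contents. The sketch below assumes the classic SHA-1 object format. The helper names are made up for illustration, and the author, timestamp, and message are placeholder values.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Return the SHA-1 object ID git would assign to an object of this kind."""
    # Git hashes "<kind> <size>\0" followed by the raw body.
    header = f"{kind} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree_id: str, author: str, when: str, message: str) -> bytes:
    """Assemble the text of a parentless commit object (illustrative values only)."""
    lines = [
        f"tree {tree_id}",
        f"author {author} {when}",
        f"committer {author} {when}",
        "",
        message,
    ]
    return "\n".join(lines).encode("utf-8")


# The empty tree is used as a stand-in; everything else here is a placeholder.
EMPTY_TREE = git_object_id("tree", b"")
body = commit_body(EMPTY_TREE, "A U Thor <author@example.com>",
                   "1732320000 +0000", "Initial commit\n")
print(git_object_id("commit", body))
```

The commit message is only a few bytes of the hashed buffer, so the resulting ID is the same whether the message was edited in a file or in memory.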

It should also be mentioned that the editor used for editing commit
messages is configurable, so git's process for producing a commit needs
to work with any and every editor.

What problem are you trying to solve by keeping a kilobyte or three in
memory instead of persisting it to disk where any and every utility in
the toolbox can operate on it?
Arne Vajhøj
2024-11-24 03:35:41 UTC
Post by Craig A. Berry
Post by Arne Vajhøj
There is no "client."  In a DVCS like git, when you commit a change,
everything is written locally.  Pushing to a server is an optional
separate operation and what you push is the version history that has
been written locally first.  There is never a point where the commit
message is sent over the network to another machine before being stored
as one component of a commit.
OK. I am still thinking SVNish. Sorry.
But does it matter?
edit disk file--read disk file--write to local repo
vs
edit in memory--write to local repo
still seem like a difference to me.
Or is git external editor actual editing the final file
inside the repo?
As I tried to explain before, a git commit consists of the metadata
(author, timestamp, etc.), the commit message, the branch, and the
actual diff content of the changeset.  All of the other pieces are
stored on-disk, so it's hard to see a reason to keep the commit message
in memory when it needs to be combined with the other pieces in order to
produce the commit.
I am thinking differently.

Let us say hypothetically that you were doing a code review of some
code and you noticed that function A was passing a string to
function B by having A write it to a temporary file and B reading
it from the temporary file. Both A and B are doing lots of other
IO so the overall performance impact is insignificant. Would you
approve the code? I assume not.

In that case it is obviously funky because passing a string
between functions is very basic.

Doing in memory editing is not quite as basic. But it is possible.
With EDT and TPU on VMS.

So I was thinking that maybe it is still funky to do it
that way.
Post by Craig A. Berry
It should also be mentioned that the editor used for editing commit
messages is configurable, so git's process for producing a commit needs
to work with any and every editor.
I know. That was the topic in one of the VSI forum threads.

I am not expecting GIT to change.

Very few editors will be callable and allow for a custom fileio
function.

And:

if ((OS = "VMS") and ((editor = "EDT") or (editor = "TPU"))) then
    use callable editor on in memory data
else
    use external editor on temp file
end if

is not pretty.
Post by Craig A. Berry
What problem are you trying to solve by keeping a kilobyte or three in
memory instead of persisting it to disk where any and every utility in
the toolbox can operate on it?
I am not really trying to solve a GIT problem. GIT works. And
VMS is not an important platform for GIT.

More like using the GIT editor case to point out that the
standard VMS editors have some nice capabilities that could
be useful for somebody.

If somebody has a program that needs to allow the user to
edit data, the program is VMS specific, and the target
editors are EDT and TPU, then maybe doing the edit in
memory makes sense.

You can still argue that both a subprocess with an external editor
and temp file and a simple callable editor with temp file are
simpler code-wise, using just the VMS-provided APIs and
not some wrapper found on the internet.

But I like multiple options.

Arne
Arne Vajhøj
2024-11-24 04:10:46 UTC
Post by Arne Vajhøj
If somebody have a program that need to allow user to
edit data and the program is VMS specific and the target
editors are EDT and TPU, then maybe doing the edit in
memory makes sense.
You can still argue that both subprocess with external editor
and temp file or simple callable editor editor with temp file is
simpler code wise using just the VMS provided API's and
not some wrapper found on the internet.
But I like multiple options.
I just created a JNI wrapper around the callable
editor wrappers.

So now the Groovy snippets:

res = Edit.edt("A\nBB\nCCC")
print(res)

and:

res = Edit.tpu("A\nBB\nCCC")
print(res)

work.

Arne
Arne Vajhøj
2024-11-25 00:27:35 UTC
Post by Arne Vajhøj
Post by Arne Vajhøj
If somebody have a program that need to allow user to
edit data and the program is VMS specific and the target
editors are EDT and TPU, then maybe doing the edit in
memory makes sense.
You can still argue that both subprocess with external editor
and temp file or simple callable editor editor with temp file is
simpler code wise using just the VMS provided API's and
not some wrapper found on the internet.
But I like multiple options.
I just created a JNI wrapper around the callable
editor wrappers.
res = Edit.edt("A\nBB\nCCC")
print(res)
res = Edit.tpu("A\nBB\nCCC")
print(res)
work.
And using a normal shareable image and ctypes,
it also works in Python:

import edit

res = edit.edt('A\nBB\nCCC')
print(res)

and:

import edit

res = edit.tpu('A\nBB\nCCC')
print(res)

https://www.vajhoej.dk/arne/vmsstuff/memedit/ updated
with JVM and Python examples.

Arne
Lawrence D'Oliveiro
2024-11-25 00:33:29 UTC
And using a normal shareable image and ctypes, then it also works in Python:
Soon as you mentioned “ctypes”, I had to have a look. ;)

I’m not a fan of wildcard imports. I know you tend to need a lot of stuff
from ctypes, but I prefer to do

import ctypes as ct

so you can then write

memedit.memedt.argtypes = [ct.c_char_p, ct.c_char_p, ct.c_int]
memedit.memedt.restype = ct.c_int

etc.
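
For illustration, here is a minimal sketch of calling a function with that (input, output, size) shape through ctypes and getting a Python string back. The helper `edit_text`, the stand-in `fake_editor`, and the Latin-1 encoding are assumptions made for this example. Loading the real shareable image is platform-specific and not shown, and the meaning of the integer return value is not stated in the thread.

```python
import ctypes as ct


def edit_text(editor, text, bufsize=10000, encoding="latin-1"):
    """Call an editor(input, output, size) style function; return (status, text)."""
    out = ct.create_string_buffer(bufsize)  # writable, zero-filled char array
    status = editor(text.encode(encoding), out, bufsize)
    # .value stops at the first NUL byte, like a C string.
    return status, out.value.decode(encoding)


def fake_editor(inp, out, size):
    """Stand-in for the real routine: upper-cases the text instead of editing it."""
    data = inp.upper()[: size - 1]  # leave room for the terminating NUL
    ct.memmove(out, data, len(data))
    return 1  # placeholder status value, not taken from the real wrappers


print(edit_text(fake_editor, "a\nbb\nccc"))
```

A fixed-size output buffer means an edit result longer than the buffer cannot be returned whole, so the caller has to pick a size that fits the expected text.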
Arne Vajhøj
2024-11-25 00:51:22 UTC
Post by Lawrence D'Oliveiro
And using a normal shareable image and ctypes, then it also works in
Soon as you mentioned “ctypes”, I had to have a look. ;)
I’m not a fan of wildcard imports. I know you tend to need a lot of stuff
from ctypes, but I prefer to do
import ctypes as ct
so you can then write
memedit.memedt.argtypes = [ct.c_char_p, ct.c_char_p, ct.c_int]
memedit.memedt.restype = ct.c_int
etc.
I have an algorithm to decide whether to wildcard import
or not:

wildcardimport = isOnlyOptionInLanguage() ? true : (isScriptLanguage() ?
rng.nextDouble() > 0.1 : rng.nextDouble() > 0.9)

:-)

Arne

Lawrence D'Oliveiro
2024-11-24 06:02:55 UTC
Post by Arne Vajhøj
Doing in memory editing is not quite as basic. But it is possible.
With EDT and TPU on VMS.
The idea of having callable libraries with a fixed API, as opposed to
spawning separate processes via command lines, sounded quite appealing,
back in the 1980s, for these sorts of tasks. But in the real world, it
turned out to be the inferior solution.

Unix got it right. But then, Unix always had a better command-line
interface than any DEC-original OS.
Dan Cross
2024-11-24 04:15:11 UTC
Post by Arne Vajhøj
There is no "client."  In a DVCS like git, when you commit a change,
everything is written locally.  Pushing to a server is an optional
separate operation and what you push is the version history that has
been written locally first.  There is never a point where the commit
message is sent over the network to another machine before being stored
as one component of a commit.
OK. I am still thinking SVNish. Sorry.
But does it matter?
edit disk file--read disk file--write to local repo
vs
edit in memory--write to local repo
still seem like a difference to me.
A difference yes, but why is it an interesting difference?
Aside from a "that's neat" how is this usefully different?
Post by Arne Vajhøj
Or is git external editor actual editing the final file
inside the repo?
It edits a file that is in a known location in the repo, but
that is then folded into the commit. It does not edit the final
committed artifact in the repo.

- Dan C.
Dan Cross
2024-11-23 20:35:39 UTC
Post by Arne Vajhøj
Post by Craig A. Berry
But this is what a source control system really should be using for such
functionality. No need for temporary disk files.
"should" seems awfully strong there and I don't understand why temporary
disk files pose a problem.
It is likely not a problem with any measurable impact.
But for the task as hand - having the user write a
commit message that is to be send to a server over the
network - then the use of a temporary files seems like
an unnecessary detour to me.
That's not really how git works. Git puts the entire commit
into the _local_ repository, which one can then push to a
remote.
Post by Arne Vajhøj
Post by Craig A. Berry
  To compute the commit ID, git has to
calculate the SHA1 of the actual content changes, the metadata (who,
when, etc.), and the commit message. While that could theoretically all
be done in memory, how can be you sure it would all fit in memory?
The files being committed are on disk, so Git will be doing disk IO.
But I don't see that as an argument for that the commit message need to
pass through a file.
Post by Craig A. Berry
  Plus
debugging and recovery from failed operations would surely be much
easier with some kind of persistence of intermediate steps.
Maybe. But It is not obvious to me that having commit message
on disk in a temporary file will help troubleshooting.
Post by Craig A. Berry
  So I think
the actual design of git is much better than this hypothetical one that
tries to avoid saving anything to disk until the last step.
The commit message should not be saved on disk client side at all.
The message get created and get sent to the server over the network.
That's just not how git works.

- Dan C.
Lawrence D'Oliveiro
2024-11-23 21:19:21 UTC
Post by Arne Vajhøj
The commit message should not be saved on disk client side at all.
The message get created and get sent to the server over the network.
Git is a DVCS, a “Distributed Version Control System”. Every user is
running their own copy of the code, operating on their own copy of the
source repo and commit history.

You’re thinking of older-style VCSes like Subversion, which did indeed
have a client/server architecture. It was various problems with those that
led to the creation of Git.
Craig A. Berry
2024-11-23 21:25:59 UTC
Post by Lawrence D'Oliveiro
You’re thinking of older-style VCSes like Subversion, which did indeed
have a client/server architecture. It was various problems with those that
led to the creation of Git.
No, it was the restrictive licensing of BitKeeper, which was already a DVCS.
Dan Cross
2024-11-23 20:34:05 UTC
Post by Craig A. Berry
Post by Dan Cross
Post by Craig A. Berry
Post by Lawrence D'Oliveiro
But this is what a source control system really should be using for such
functionality. No need for temporary disk files.
"should" seems awfully strong there and I don't understand why temporary
disk files pose a problem. To compute the commit ID, git has to
calculate the SHA1 of the actual content changes, the metadata (who,
when, etc.), and the commit message. While that could theoretically all
be done in memory, how can be you sure it would all fit in memory? Plus
debugging and recovery from failed operations would surely be much
easier with some kind of persistence of intermediate steps. So I think
the actual design of git is much better than this hypothetical one that
tries to avoid saving anything to disk until the last step.
It does seem like this is solving a non-problem. The approach
is mildly interesting, though, and vaguely reminds me of how
VM/CMS used addressability to the text editor to build all sorts
of useful interfaces. It's not exact, of course, but the idea
of an in-memory, editor-centric interface is not far off.
Post by Craig A. Berry
Post by Dan Cross
Post by Craig A. Berry
Post by Lawrence D'Oliveiro
On Linux systems at least, temporary files are usually created in /tmp,
and distros commonly mount an instance of tmpfs on that, so no actual disk
files need be created.
But that is not what git does when staging a commit.
Regardless of that, it is what git does when composing a commit
message.
I simply meant staging in the sense of preparing, not staging in the
git-specific sense of adding to the index (which happens before the
commit operation), so you've made a distinction without a meaningful
difference. I was simply trying to make clear that Lawrence's comments
about /tmp and tmpfs have nothing to do with the matter at hand. At
least by default the commit messge goes in .git/COMMIT_EDITMSG in the
current repository, so /tmp and how it's implemented are irrelevant.
Except that the editor likely creates a temporary file in /tmp,
but I kind of see what you're saying.

- Dan C.
Lawrence D'Oliveiro
2024-11-23 21:15:52 UTC
Post by Craig A. Berry
Post by Lawrence D'Oliveiro
On Linux systems at least, temporary files are usually created in /tmp,
and distros commonly mount an instance of tmpfs on that, so no actual
disk files need be created.
But that is not what git does when staging a commit.
Hmm, git does use special fixed file names like .git/COMMIT_EDITMSG
and .git/ADD_EDIT.patch for particular editing purposes ... given that
COMMIT_EDITMSG retains its previous contents, perhaps it doesn’t quite
count as a “temporary file” ...

I was thinking more about what happens in general when some program
invokes $EDITOR or $VISUAL to let the user create/edit some text input. It
is usual to put the file in $TMPDIR.
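
A minimal sketch of that general pattern, for illustration only: the function name is hypothetical, the "vi" fallback is an assumption, and this is not what git itself does for commit messages (that goes through .git/COMMIT_EDITMSG, as noted above).

```python
import os
import shlex
import subprocess
import tempfile


def edit_with_external_editor(text, editor=None):
    """Write text to a temporary file, run an editor on it, return the result."""
    # Conventional lookup order; "vi" as the last resort is an assumption.
    command = editor or os.environ.get("VISUAL") or os.environ.get("EDITOR") or "vi"
    # mkstemp places the file in tempfile.gettempdir(), which honours TMPDIR.
    fd, path = tempfile.mkstemp(suffix=".txt", text=True)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
        # The variable may carry arguments as well, so split it shell-style.
        subprocess.run(shlex.split(command) + [path], check=True)
        with open(path) as handle:
            return handle.read()
    finally:
        os.unlink(path)  # the file only lives for the duration of the edit
```

Calling `edit_with_external_editor("draft\n")` blocks until the editor exits and then returns whatever the user saved.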