Discussion:
Ksplice equivalent for VMS ?
Simon Clubley
2025-02-19 13:25:32 UTC
Permalink
Oracle have a kernel patching tool called Ksplice, which they acquired
back in 2011. It allows their Linux users with support contracts to apply
many Linux kernel patches without having to reboot the server:

https://en.wikipedia.org/wiki/Ksplice

Given the high-availability mindset for VMS users, I wonder if VSI ever
considered creating something similar for VMS ?

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Robert A. Brooks
2025-02-19 15:05:22 UTC
Permalink
Post by Simon Clubley
Oracle have a kernel patching tool called Ksplice that they acquired
back in 2011. It allows their support contract Linux users to apply
https://en.wikipedia.org/wiki/Ksplice
Given the high-availability mindset for VMS users, I wonder if VSI ever
considered creating something similar for VMS ?
No.
--
-- Rob
Arne Vajhøj
2025-02-19 19:10:29 UTC
Permalink
Post by Simon Clubley
Oracle have a kernel patching tool called Ksplice that they acquired
back in 2011. It allows their support contract Linux users to apply
https://en.wikipedia.org/wiki/Ksplice
Given the high-availability mindset for VMS users, I wonder if VSI ever
considered creating something similar for VMS ?
No.
What about process migration?

Arne
Robert A. Brooks
2025-02-19 19:50:53 UTC
Permalink
Post by Arne Vajhøj
Post by Simon Clubley
Oracle have a kernel patching tool called Ksplice that they acquired
back in 2011. It allows their support contract Linux users to apply
https://en.wikipedia.org/wiki/Ksplice
Given the high-availability mindset for VMS users, I wonder if VSI ever
considered creating something similar for VMS ?
No.
What about process migration?
Like Galaxy on Alpha?

There is vMotion for virtual machines on ESXi, but that's not exactly the same.
--
-- Rob
Arne Vajhøj
2025-02-19 20:05:35 UTC
Permalink
Post by Robert A. Brooks
Post by Arne Vajhøj
Post by Simon Clubley
Oracle have a kernel patching tool called Ksplice that they acquired
back in 2011. It allows their support contract Linux users to apply
https://en.wikipedia.org/wiki/Ksplice
Given the high-availability mindset for VMS users, I wonder if VSI ever
considered creating something similar for VMS ?
No.
What about process migration?
Like Galaxy on Alpha?
I thought Galaxy was multiple logical systems on one physical system.
DEC's answer to IBM's LPAR.

I am thinking about a scenario like:
* cluster with node A and B
* critical process P that, for whatever reason, cannot run
  concurrently on multiple nodes and therefore runs on A
* node A needs to be taken down for some reason
* so VMS on nodes A and B does some magic and migrates P from A to B,
  transparently to users (which obviously requires a cluster IP address
  or a load balancer)
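A toy sketch of that scenario, with every name (Node, freeze, migrate) invented purely for illustration; VMS exposes no such API:

```python
# Toy model of migrating process P from node A to node B in a cluster.
# Entirely hypothetical: these freeze/migrate primitives do not exist
# on VMS; this only illustrates the shape of the idea.

class Node:
    def __init__(self, name):
        self.name = name
        self.processes = {}

    def run(self, pid, state):
        self.processes[pid] = dict(state)

    def freeze(self, pid):
        # Quiesce the process and capture its complete state snapshot.
        return self.processes.pop(pid)

def migrate(pid, src, dst):
    state = src.freeze(pid)   # stop scheduling P on A, capture its state
    dst.run(pid, state)       # recreate P on B from the same state
    # a cluster alias or load balancer would now route clients to dst

a, b = Node("A"), Node("B")
a.run("P", {"pc": 0x1000, "open_files": ["DKA0:[DATA]X.DAT"]})
migrate("P", a, b)
print("P" in b.processes, "P" in a.processes)  # prints: True False
```

The hard part hiding behind freeze() is, of course, capturing *all* of the state, which is what the rest of the thread turns on.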
Post by Robert A. Brooks
There is vMotion for virtual machines on ESXi, but that's not exactly the same.
I know about that.

Not exactly the same but similar concept.

Arne
Dan Cross
2025-02-19 22:26:48 UTC
Permalink
Post by Arne Vajhøj
Post by Robert A. Brooks
Post by Arne Vajhøj
Post by Simon Clubley
Oracle have a kernel patching tool called Ksplice that they acquired
back in 2011. It allows their support contract Linux users to apply
https://en.wikipedia.org/wiki/Ksplice
Given the high-availability mindset for VMS users, I wonder if VSI ever
considered creating something similar for VMS ?
No.
What about process migration?
Like Galaxy on Alpha?
I thought Galaxy was multiple logical systems on one physical system.
DEC answer to IBM LPAR.
* cluster with node A and B
* critical process P that, for whatever reason, cannot run
  concurrently on multiple nodes and therefore runs on A
* node A needs to be taken down for some reason
* so VMS on node A and B does some magic and migrate P from A to B
transparent to users (obviously require a cluster IP address or
load balancer)
While this may be an acceptable method to "hotpatch" a host with
minimal disruption to whatever workload it's running, it is
completely unlike what ksplice does. For one, it requires that
sufficient resources exist wherever you'd migrate the process
to, for the duration of the update. Moreover, it requires that
all aspects of state that are required to resume execution of
the process are accessible and replicable on other, similar
hardware.

Many hyperscale cloud providers do something similar for
updates, but there are serious limitations and downsides; for
example, direct passthrough to hardware devices (storage, compute
accelerators, etc.) can make it impossible to move a VM.

Ksplice updates code in the running system, basically thunking
out function calls to point to new code. It has fairly
significant limitations, but doesn't require any sort of
migration.
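The thunking idea can be loosely illustrated with a toy in Python (this is not how Ksplice actually works; it rewrites the entry instructions of the live kernel function, while this sketch just routes calls through an indirection table):

```python
# Toy illustration of hot patching via an indirection ("thunk") table:
# callers reach a function only through the table, so swapping the
# table entry redirects all future calls to the patched code without
# restarting anything. Purely illustrative; real Ksplice patches
# compiled kernel text in place.

dispatch = {}

def register(name):
    def wrap(fn):
        dispatch[name] = fn
        return fn
    return wrap

def call(name, *args):
    # Every call goes through the table, like a thunked entry point.
    return dispatch[name](*args)

@register("checksum")
def checksum_v1(data):
    return sum(data) % 256  # pretend this version has a bug

def checksum_v2(data):
    # the "patched" version, weighting each byte by position
    return sum((i + 1) * b for i, b in enumerate(data)) % 256

result_before = call("checksum", [1, 2, 3])
dispatch["checksum"] = checksum_v2   # splice in the fix, no restart
result_after = call("checksum", [1, 2, 3])
print(result_before, result_after)   # prints: 6 14
```

The real difficulties Ksplice faces (quiescing in-flight calls, changed data-structure layouts) are exactly what this toy hides.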

- Dan C.
Arne Vajhøj
2025-02-19 23:53:56 UTC
Permalink
Post by Dan Cross
Post by Arne Vajhøj
* cluster with node A and B
* critical process P that, for whatever reason, cannot run
  concurrently on multiple nodes and therefore runs on A
* node A needs to be taken down for some reason
* so VMS on node A and B does some magic and migrate P from A to B
transparent to users (obviously require a cluster IP address or
load balancer)
While this may be an acceptable method to "hotpatch" a host with
minimal disruption to whatever workload it's running, it is
completely unlike what ksplice does. For one, it requires that
sufficient resources exist in wherever you'd migrate the process
to for the duration of the update.
That is a requirement.

:-)
Post by Dan Cross
Moreover, it requires that
all aspects of state that are required to resume execution of
the process are accessible and replicable on other, similar
hardware.
Yes. Which becomes a little easier when restricted to a
cluster instead of any systems.
Post by Dan Cross
Many hyperscale cloud providers do something similar for
updates, but there are serious limitations and downsides; for
example, direct passthrough to hardware devices (storage, compute
accelerators, etc.) can make it impossible to move a VM.
Moving VMs is common. Robert started by mentioning vMotion.

Arne
Dan Cross
2025-02-20 02:26:41 UTC
Permalink
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
* cluster with node A and B
* critical process P that, for whatever reason, cannot run
  concurrently on multiple nodes and therefore runs on A
* node A needs to be taken down for some reason
* so VMS on node A and B does some magic and migrate P from A to B
transparent to users (obviously require a cluster IP address or
load balancer)
While this may be an acceptable method to "hotpatch" a host with
minimal disruption to whatever workload it's running, it is
completely unlike what ksplice does. For one, it requires that
sufficient resources exist in wherever you'd migrate the process
to for the duration of the update.
That is a requirement.
:-)
Post by Dan Cross
Moreover, it requires that
all aspects of state that are required to resume execution of
the process are accessible and replicable on other, similar
hardware.
Yes. Which becomes a little easier when restricted to a
cluster instead of any systems.
I don't know what you mean when you say, "restricted to a
cluster instead of any systems." If you mean that this somehow
makes managing state during process migration easier, then no,
not really; all of the same caveats apply. For instance,
if a program is using (say) a GPU for computation, part of
migrating it will be extracting whatever state it has in the
GPU out of the GPU, and replicating it on the destination
system.

At one point, the internal numbering of cores in the GPU was
visible to code running on the GPU, creating an $n \choose k$
fingerprinting problem for migration.

Besides, clusters can contain heterogeneous systems.
Post by Arne Vajhøj
Post by Dan Cross
Many hyperscale cloud providers do something similar for
updates, but there are serious limitations and downsides; for
example, direct passthrough to hardware devices (storage, compute
accelerators, etc.) can make it impossible to move a VM.
Moving VMs is common. Robert started by mentioning vMotion.
I don't see how that's relevant to the points I raised.

- Dan C.
Arne Vajhøj
2025-02-20 02:58:04 UTC
Permalink
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
* cluster with node A and B
* critical process P that, for whatever reason, cannot run
  concurrently on multiple nodes and therefore runs on A
* node A needs to be taken down for some reason
* so VMS on node A and B does some magic and migrate P from A to B
transparent to users (obviously require a cluster IP address or
load balancer)
While this may be an acceptable method to "hotpatch" a host with
minimal disruption to whatever workload it's running, it is
completely unlike what ksplice does. For one, it requires that
sufficient resources exist in wherever you'd migrate the process
to for the duration of the update.
That is a requirement.
:-)
Post by Dan Cross
Moreover, it requires that
all aspects of state that are required to resume execution of
the process are accessible and replicable on other, similar
hardware.
Yes. Which becomes a little easier when restricted to a
cluster instead of any systems.
I don't know what you mean when you say, "restricted to a
cluster instead of any systems."
A and B being in a cluster instead of being two
standalone nodes.
Post by Dan Cross
If you mean that this somehow
makes managing state during process migration easier, then no,
not really; all of the same caveats apply. For instance,
if a program is using (say) a GPU for computation, part of
migrating it will be extracting whatever state it has in the
GPU out of the GPU, and replicating it on the destination
system.
At one point, the internal numbering of cores in the GPU was
visible to code running on the GPU, creating an $n \choose k$
fingerprinting problem for migration.
A VMS server process will not be using a GPU.

I guess as part of the migration the process would need to
be non-CUR and release the CPU (and the GPU, if VMS adds support for
CUDA or similar in the future).

Main memory will need to be migrated. And the cluster will
not help with that.

But a cluster with shared storage will help with disk files.

And a cluster with shared SYSUAF will help with identity.

And a cluster with a shared queue database will help with jobs.
Post by Dan Cross
Besides, clusters can contain heterogeneous systems.
Yes.

The nodes would need to be compatible.

A mixed-architecture cluster is definitely out
of the question.

:-)

Arne
Dan Cross
2025-02-20 03:07:30 UTC
Permalink
Post by Arne Vajhøj
Post by Dan Cross
[snip]
Yes. Which becomes a little easier when restricted to a
cluster instead of any systems.
I don't know what you mean when you say, "restricted to a
cluster instead of any systems."
A and B being in a cluster instead of being two
standalone nodes.
Oh I see, you're using cluster here as a shorthand to
mean that they're in the same administrative domain.
Post by Arne Vajhøj
Post by Dan Cross
If you mean that this somehow
makes managing state during process migration easier, then no,
not really; all of the same caveats apply. For instance,
if a program is using (say) a GPU for computation, part of
migrating it will be extracting whatever state it has in the
GPU out of the GPU, and replicating it on the destination
system.
At one point, the internal numbering of cores in the GPU was
visible to code running on the GPU, creating an $n \choose k$
fingerprinting problem for migration.
A VMS server process will not be using GPU.
Sure. GPUs, as compute accelerators, are just one example.
It could be some other resource. The point is, process-level
migration is not a panacea; ksplice has its place, even with
its limitations.
Post by Arne Vajhøj
I guess as part of the migration the process would need to
be non-CUR and release CPU (and GPU if VMS adds support for
CUDA or similar in the future).
Main memory will need to be migrated. And cluster will
not help with that.
But cluster with shared storage will help with disk files.
And cluster with shared SYSUAF will help with identity.
And cluster with shared queue database will help with jobs.
Post by Dan Cross
Besides, clusters can contain heterogeneous systems.
Yes.
The nodes would need to be compatible.
Mixed architecture cluster is definitely out
of the question.
:-)
Correct.

- Dan C.
Arne Vajhøj
2025-02-20 03:19:48 UTC
Permalink
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
[snip]
Yes. Which becomes a little easier when restricted to a
cluster instead of any systems.
I don't know what you mean when you say, "restricted to a
cluster instead of any systems."
A and B being in a cluster instead of being two
standalone nodes.
Oh I see, you're using cluster here as a shorthand to
mean that they're in the same administrative domain.
VMS nodes in a VMS cluster with some common VMS cluster setup.

That does mean same administrative domain, but more than that.
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
If you mean that this somehow
makes managing state during process migration easier, then no,
not really; all of the same caveats apply. For instance,
if a program is using (say) a GPU for computation, part of
migrating it will be extracting whatever state it has in the
GPU out of the GPU, and replicating it on the destination
system.
At one point, the internal numbering of cores in the GPU was
visible to code running on the GPU, creating an $n \choose k$
fingerprinting problem for migration.
A VMS server process will not be using GPU.
Sure. GPUs, as compute accelerators, are just one example.
It could be some other resource. The point is, process-level
migration is not a panacea; ksplice has its place, even with
its limitations.
Process migration would be a huge task to implement.

But VSI are not considering the ksplice model, so I wanted to
know if they are considering the process migration model.

Arne
Lawrence D'Oliveiro
2025-02-19 23:37:23 UTC
Permalink
Post by Arne Vajhøj
* cluster with node A and B
* critical process P that, for whatever reason, cannot run
  concurrently on multiple nodes and therefore runs on A
* node A needs to be taken down for some reason
* so VMS on node A and B does some magic and migrate P from A to B
transparent to users (obviously require a cluster IP address or load
balancer)
Linux virtualization migrates entire VMs that way, rather than individual
processes.
Arne Vajhøj
2025-02-19 23:48:41 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
* cluster with node A and B
* critical process P that, for whatever reason, cannot run
  concurrently on multiple nodes and therefore runs on A
* node A needs to be taken down for some reason
* so VMS on node A and B does some magic and migrate P from A to B
transparent to users (obviously require a cluster IP address or load
balancer)
Linux virtualization migrates entire VMs that way, rather than individual
processes.
vMotion, which does that for ESXi, was mentioned in what you chose
not to quote.

Arne
Lawrence D'Oliveiro
2025-02-20 01:35:51 UTC
Permalink
vMotion, which does that for ESXi, was mentioned in what you chose not
to quote.
Didn’t see any mention that that was what it did, so I considered it
irrelevant.
Waldek Hebisch
2025-02-21 12:48:54 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
* cluster with node A and B
* critical process P that, for whatever reason, cannot run
  concurrently on multiple nodes and therefore runs on A
* node A needs to be taken down for some reason
* so VMS on node A and B does some magic and migrate P from A to B
transparent to users (obviously require a cluster IP address or load
balancer)
Linux virtualization migrates entire VMs that way, rather than individual
processes.
Migrating the whole VM will also migrate the _current_ kernel. The whole
point of the original question is updating the kernel.
--
Waldek Hebisch
Dan Cross
2025-02-21 13:13:12 UTC
Permalink
Post by Waldek Hebisch
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
* cluster with node A and B
* critical process P that, for whatever reason, cannot run
  concurrently on multiple nodes and therefore runs on A
* node A needs to be taken down for some reason
* so VMS on node A and B does some magic and migrate P from A to B
transparent to users (obviously require a cluster IP address or load
balancer)
Linux virtualization migrates entire VMs that way, rather than individual
processes.
Migrating the whole VM will also migrate the _current_ kernel. The whole
point of the original question is updating the kernel.
Indeed. But if the goal is to update the host, and not the
guest, it's an acceptable method, provided you can meet the
requirements vis-a-vis resources et al., and can handle the
limitations with respect to direct access to devices. And if the
workloads in question can tolerate the drag on performance during
the migration (no matter how you shake it, there's a mandatory
blackout period where the guest is not running on either the
source or target systems).

I designed the live migration protocol used for Bhyve in the
Oxide architecture. Minimizing that period was an explicit
design goal, but at some point you simply have to bottle up the
last bits of state from the source and move them to the
destination.
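The shape of that kind of protocol can be sketched as a pre-copy loop. This toy model (every number and name is invented for illustration) shows why a final blackout copy of the residual dirty pages is unavoidable:

```python
import random

def live_migrate(num_pages=1024, dirty_rate=0.05, max_rounds=10):
    # Toy pre-copy live migration: copy memory while the guest keeps
    # running, re-copy whatever it dirties, and pause the guest
    # ("blackout") only for the last small residue of dirty pages.
    random.seed(0)                      # deterministic for the example
    pending = set(range(num_pages))     # pages not yet (re)copied
    rounds = 0
    # Iterate until the residue is small, or we give up converging.
    while len(pending) > num_pages // 100 and rounds < max_rounds:
        # Copy everything pending; meanwhile the running guest
        # dirties a fresh set of pages that must be copied again.
        pending = {p for p in range(num_pages)
                   if random.random() < dirty_rate}
        rounds += 1
    # Blackout: guest paused on the source while the residue is
    # copied, then resumed on the destination. A smaller residue
    # means a shorter blackout, but it can never be empty if the
    # guest keeps writing.
    return rounds, len(pending)

rounds, blackout_pages = live_migrate()
```

With a dirty rate that outpaces the convergence threshold, the loop stops making progress and the implementation must eventually pause the guest and bottle up whatever remains, which is exactly the mandatory blackout described above.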

- Dan C.
Lawrence D'Oliveiro
2025-02-21 21:36:56 UTC
Permalink
Migrating the whole VM will also migrate the _current_ kernel. The whole
point of the original question is updating the kernel.
Which is what Ksplice does.
Stephen Hoffman
2025-02-21 20:32:17 UTC
Permalink
Post by Robert A. Brooks
Post by Arne Vajhøj
Post by Simon Clubley
Oracle have a kernel patching tool called Ksplice that they acquired
back in 2011. It allows their support contract Linux users to apply
https://en.wikipedia.org/wiki/Ksplice
Given the high-availability mindset for VMS users, I wonder if VSI ever
considered creating something similar for VMS ?
No.
DEC OpenVMS Engineering did look at work akin to Ksplice (with some
predecessor dynamic-patch tool), but that was a very long time ago. It
was not particularly feasible within what was then available, and the
task has probably only gotten more difficult.

Getting consistent online backups was another related discussion around
the same era, but that proposal never became a project.

IIRC, there were patching-related patents from HP, Microsoft, and other
organizations starting in the early 2000s, though some of those patents
took an appeal or two and a few years to be granted.

Some related and more recent reading:
https://web.eecs.umich.edu/~weimerw/p/weimer-dsn2020-kshot.pdf

The provided alternative within OpenVMS is a rolling reboot in a
cluster, with cluster-aware apps. That's documented and supported, and
works well. Works within Galaxy configurations, too.

There's not much documentation on creating cluster-aware apps
unfortunately, and the necessary APIs are scattered around the docs,
but various developers have succeeded in that task. I've written a few
of these cluster-aware apps over the years too, though the cluster
pricing scared many if not most sites away from that approach.

Erlang is another option here.

Quiescing apps and triggering some shadowing shenanigans was an option
for obtaining consistent backups, though lots of apps "borrowed" a
database with journaling support. A few apps use RMS journaling too,
but that feature never caught on widely.
Post by Robert A. Brooks
Post by Arne Vajhøj
What about process migration?
Like Galaxy on Alpha?
OpenVMS Galaxy can't migrate processes across instances, though.
Processors, yes. Processes, no.

Semi-related, DEC had Checkpoint-Restart AKA Snapshot AKA FastBoot on
standalone VAX workstations; support for that arrived in OpenVMS VAX
V6.0 and was withdrawn in OpenVMS VAX V7.1.
Post by Robert A. Brooks
There is vMotion for virtual machines on ESXi, but that's not exactly the same.
You're most definitely right about that. It's not the same.

I'd not expect to see anything approaching Ksplice for OpenVMS from VSI.
--
Pure Personal Opinion | HoffmanLabs LLC
Scott Dorsey
2025-02-23 20:45:26 UTC
Permalink
Post by Simon Clubley
Oracle have a kernel patching tool called Ksplice that they acquired
back in 2011. It allows their support contract Linux users to apply
https://en.wikipedia.org/wiki/Ksplice
Given the high-availability mindset for VMS users, I wonder if VSI ever
considered creating something similar for VMS ?
No.
I have to say ksplice is not really that reliable, although it works more
often than not. IBM has managed to make patching on the fly work for many
years on their big iron, though.
--scott
--
"C'est un Nagra. C'est suisse, et tres, tres precis."