Discussion: Ksplice equivalent for VMS?
Simon Clubley
2025-02-19 13:25:32 UTC
Oracle have a kernel patching tool called Ksplice that they acquired
back in 2011. It allows their Linux users with support contracts to
apply many Linux kernel patches without having to reboot the server:

https://en.wikipedia.org/wiki/Ksplice

Given the high-availability mindset of VMS users, I wonder if VSI ever
considered creating something similar for VMS?

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Robert A. Brooks
2025-02-19 15:05:22 UTC
Post by Simon Clubley
Oracle have a kernel patching tool called Ksplice that they acquired
back in 2011. It allows their Linux users with support contracts to
apply many Linux kernel patches without having to reboot the server:
https://en.wikipedia.org/wiki/Ksplice
Given the high-availability mindset of VMS users, I wonder if VSI ever
considered creating something similar for VMS?
No.
--
-- Rob
Arne Vajhøj
2025-02-19 19:10:29 UTC
Post by Simon Clubley
Oracle have a kernel patching tool called Ksplice that they acquired
back in 2011. It allows their Linux users with support contracts to
apply many Linux kernel patches without having to reboot the server:
https://en.wikipedia.org/wiki/Ksplice
Given the high-availability mindset of VMS users, I wonder if VSI ever
considered creating something similar for VMS?
No.
What about process migration?

Arne
Robert A. Brooks
2025-02-19 19:50:53 UTC
Post by Arne Vajhøj
Post by Simon Clubley
Oracle have a kernel patching tool called Ksplice that they acquired
back in 2011. It allows their Linux users with support contracts to
apply many Linux kernel patches without having to reboot the server:
https://en.wikipedia.org/wiki/Ksplice
Given the high-availability mindset of VMS users, I wonder if VSI ever
considered creating something similar for VMS?
No.
What about process migration?
Like Galaxy on Alpha?

There is vMotion for virtual machines on ESXi, but that's not exactly the same.
--
-- Rob
Arne Vajhøj
2025-02-19 20:05:35 UTC
Post by Robert A. Brooks
Post by Arne Vajhøj
Post by Simon Clubley
Oracle have a kernel patching tool called Ksplice that they acquired
back in 2011. It allows their Linux users with support contracts to
apply many Linux kernel patches without having to reboot the server:
https://en.wikipedia.org/wiki/Ksplice
Given the high-availability mindset of VMS users, I wonder if VSI ever
considered creating something similar for VMS?
No.
What about process migration?
Like Galaxy on Alpha?
I thought Galaxy was multiple logical systems on one physical system:
the DEC answer to IBM's LPARs.

I am thinking about a scenario like:
* a cluster with nodes A and B
* a critical process P that, for whatever reason, cannot run
  concurrently on multiple nodes, and which runs on A
* node A needs to be taken down for some reason
* so VMS on nodes A and B does some magic and migrates P from A to B,
  transparently to users (which obviously requires a cluster IP
  address or a load balancer)
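
In Linux terms that is roughly what checkpoint/restore tools do. Below
is a toy C sketch of the freeze/serialize/resume-elsewhere step
(everything here, including the struct and the file name, is invented
for illustration; a real migration would also have to capture memory,
open channels, locks, and scheduler state):

#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for "all the state process P needs to resume". */
struct proc_state {
    long counter;
    char phase[16];
};

/* On node A: freeze P and write its state to cluster-shared storage. */
static int checkpoint(const struct proc_state *s, const char *path)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t n = fwrite(s, sizeof *s, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}

/* On node B: read the state back so P can continue where it stopped. */
static int restore(struct proc_state *s, const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(s, sizeof *s, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}

int main(void)
{
    /* "p_checkpoint.dat" stands in for a file on a shared cluster disk. */
    const char *path = "p_checkpoint.dat";
    struct proc_state s = { 42, "billing" };

    if (checkpoint(&s, path) != 0 || restore(&s, path) != 0) {
        perror("migrate");
        return EXIT_FAILURE;
    }
    printf("resumed: counter=%ld phase=%s\n", s.counter, s.phase);
    return EXIT_SUCCESS;
}

The part the cluster buys you is that the checkpoint file can live on
a disk both nodes already see.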
Post by Robert A. Brooks
There is vMotion for virtual machines on ESXi, but that's not exactly the same.
I know about that.

Not exactly the same, but a similar concept.

Arne
Dan Cross
2025-02-19 22:26:48 UTC
Post by Arne Vajhøj
Post by Robert A. Brooks
Post by Arne Vajhøj
Post by Simon Clubley
Oracle have a kernel patching tool called Ksplice that they acquired
back in 2011. It allows their Linux users with support contracts to
apply many Linux kernel patches without having to reboot the server:
https://en.wikipedia.org/wiki/Ksplice
Given the high-availability mindset of VMS users, I wonder if VSI ever
considered creating something similar for VMS?
No.
What about process migration?
Like Galaxy on Alpha?
I thought Galaxy was multiple logical systems on one physical system:
the DEC answer to IBM's LPARs.
* a cluster with nodes A and B
* a critical process P that, for whatever reason, cannot run
  concurrently on multiple nodes, and which runs on A
* node A needs to be taken down for some reason
* so VMS on nodes A and B does some magic and migrates P from A to B,
  transparently to users (which obviously requires a cluster IP
  address or a load balancer)
While this may be an acceptable method to "hotpatch" a host with
minimal disruption to whatever workload it's running, it is
completely unlike what ksplice does. For one, it requires that
sufficient resources exist wherever you'd migrate the process
to for the duration of the update. Moreover, it requires that
all aspects of state that are required to resume execution of
the process are accessible and replicable on other, similar
hardware.

Many hyperscaler cloud providers do something similar for
updates, but there are serious limitations and downsides; for
example, direct passthrough to hardware devices (storage, compute
accelerators, etc.) can make it impossible to move a VM.

Ksplice updates code in the running system, basically thunking
out function calls to point to new code. It has fairly
significant limitations, but doesn't require any sort of
migration.
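
To make "thunking out function calls" concrete, here is a minimal
user-space sketch of the entry-point redirection idea. This is my toy
illustration, not ksplice's actual implementation: it assumes x86-64
Linux, compilation without inlining (e.g. cc -O0), invented function
names, and it skips the hard parts ksplice solves, such as quiescing
CPUs and checking that no thread is executing the old code:

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static int old_impl(int x) { return x + 1; }  /* "buggy" version   */
static int new_impl(int x) { return x + 2; }  /* "patched" version */

/* Redirect calls to `from` by writing "jmp rel32" over its entry. */
static void hotpatch(void *from, void *to)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    uintptr_t page = (uintptr_t)from & ~(uintptr_t)(pagesz - 1);

    /* Make the text writable; two pages in case the 5-byte patch
     * straddles a page boundary. */
    if (mprotect((void *)page, 2 * pagesz,
                 PROT_READ | PROT_WRITE | PROT_EXEC) != 0) {
        perror("mprotect");
        return;
    }

    unsigned char jmp[5] = { 0xE9 };  /* 0xE9 = jmp rel32 on x86-64 */
    int32_t rel = (int32_t)((uintptr_t)to - ((uintptr_t)from + 5));
    memcpy(jmp + 1, &rel, sizeof rel);
    memcpy(from, jmp, sizeof jmp);
}

int main(void)
{
    printf("before patch: %d\n", old_impl(1));  /* prints 2 */
    hotpatch((void *)old_impl, (void *)new_impl);
    printf("after patch:  %d\n", old_impl(1));  /* prints 3 */
    return 0;
}

Every later call to the old entry point now lands in the new code;
the open problem in a kernel is doing that write safely while other
CPUs may be executing the function you're overwriting.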

- Dan C.
Arne Vajhøj
2025-02-19 23:53:56 UTC
Post by Dan Cross
Post by Arne Vajhøj
* a cluster with nodes A and B
* a critical process P that, for whatever reason, cannot run
  concurrently on multiple nodes, and which runs on A
* node A needs to be taken down for some reason
* so VMS on nodes A and B does some magic and migrates P from A to B,
  transparently to users (which obviously requires a cluster IP
  address or a load balancer)
While this may be an acceptable method to "hotpatch" a host with
minimal disruption to whatever workload it's running, it is
completely unlike what ksplice does. For one, it requires that
sufficient resources exist wherever you'd migrate the process
to for the duration of the update.
That is a requirement.

:-)
Post by Dan Cross
Moreover, it requires that
all aspects of state that are required to resume execution of
the process are accessible and replicable on other, similar
hardware.
Yes. Which becomes a little easier when restricted to a
cluster instead of any systems.
Post by Dan Cross
Many hyperscaler cloud providers do something similar for
updates, but there are serious limitations and downsides; for
example, direct passthrough to hardware devices (storage, compute
accelerators, etc.) can make it impossible to move a VM.
Moving VMs is common. Robert started by mentioning vMotion.

Arne
Dan Cross
2025-02-20 02:26:41 UTC
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
* a cluster with nodes A and B
* a critical process P that, for whatever reason, cannot run
  concurrently on multiple nodes, and which runs on A
* node A needs to be taken down for some reason
* so VMS on nodes A and B does some magic and migrates P from A to B,
  transparently to users (which obviously requires a cluster IP
  address or a load balancer)
While this may be an acceptable method to "hotpatch" a host with
minimal disruption to whatever workload it's running, it is
completely unlike what ksplice does. For one, it requires that
sufficient resources exist wherever you'd migrate the process
to for the duration of the update.
That is a requirement.
:-)
Post by Dan Cross
Moreover, it requires that
all aspects of state that are required to resume execution of
the process are accessible and replicable on other, similar
hardware.
Yes. Which becomes a little easier when restricted to a
cluster instead of any systems.
I don't know what you mean when you say, "restricted to a
cluster instead of any systems." If you mean that this somehow
makes managing state during process migration easier, then no,
not really; all of the same caveats apply. For instance,
if a program is using (say) a GPU for computation, part of
migrating it will be extracting whatever state it has in the
GPU out of the GPU, and replicating it on the destination
system.

At one point, the internal numbering of cores in the GPU was
visible to code running on the GPU, creating an $n \choose k$
fingerprinting problem for migration.

Besides, clusters can contain heterogeneous systems.
Post by Arne Vajhøj
Post by Dan Cross
Many hyperscaler cloud providers do something similar for
updates, but there are serious limitations and downsides; for
example, direct passthrough to hardware devices (storage, compute
accelerators, etc.) can make it impossible to move a VM.
Moving VMs is common. Robert started by mentioning vMotion.
I don't see how that's relevant to the points I raised.

- Dan C.
Arne Vajhøj
2025-02-20 02:58:04 UTC
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
* a cluster with nodes A and B
* a critical process P that, for whatever reason, cannot run
  concurrently on multiple nodes, and which runs on A
* node A needs to be taken down for some reason
* so VMS on nodes A and B does some magic and migrates P from A to B,
  transparently to users (which obviously requires a cluster IP
  address or a load balancer)
While this may be an acceptable method to "hotpatch" a host with
minimal disruption to whatever workload it's running, it is
completely unlike what ksplice does. For one, it requires that
sufficient resources exist wherever you'd migrate the process
to for the duration of the update.
That is a requirement.
:-)
Post by Dan Cross
Moreover, it requires that
all aspects of state that are required to resume execution of
the process are accessible and replicable on other, similar
hardware.
Yes. Which becomes a little easier when restricted to a
cluster instead of any systems.
I don't know what you mean when you say, "restricted to a
cluster instead of any systems."
A and B being in a cluster instead of being two
standalone nodes.
Post by Dan Cross
If you mean that this somehow
makes managing state during process migration easier, then no,
not really; all of the same caveats apply. For instance,
if a program is using (say) a GPU for computation, part of
migrating it will be extracting whatever state it has in the
GPU out of the GPU, and replicating it on the destination
system.
At one point, the internal numbering of cores in the GPU was
visible to code running on the GPU, creating an $n \choose k$
fingerprinting problem for migration.
A VMS server process will not be using a GPU.

I guess as part of the migration the process would need to
be made non-CUR and release the CPU (and the GPU, if VMS adds
support for CUDA or similar in the future).

Main memory will need to be migrated. And the cluster will
not help with that.

But a cluster with shared storage will help with disk files.

And a cluster with a shared SYSUAF will help with identity.

And a cluster with a shared queue database will help with jobs.
Post by Dan Cross
Besides, clusters can contain heterogeneous systems.
Yes.

The nodes would need to be compatible.

A mixed-architecture cluster is definitely out
of the question.

:-)

Arne
Dan Cross
2025-02-20 03:07:30 UTC
Post by Arne Vajhøj
Post by Dan Cross
[snip]
Yes. Which becomes a little easier when restricted to a
cluster instead of any systems.
I don't know what you mean when you say, "restricted to a
cluster instead of any systems."
A and B being in a cluster instead of being two
standalone nodes.
Oh I see, you're using cluster here as a shorthand to
mean that they're in the same administrative domain.
Post by Arne Vajhøj
Post by Dan Cross
If you mean that this somehow
makes managing state during process migration easier, then no,
not really; all of the same caveats apply. For instance,
if a program is using (say) a GPU for computation, part of
migrating it will be extracting whatever state it has in the
GPU out of the GPU, and replicating it on the destination
system.
At one point, the internal numbering of cores in the GPU was
visible to code running on the GPU, creating an $n \choose k$
fingerprinting problem for migration.
A VMS server process will not be using a GPU.
Sure. GPUs, as compute accelerators, are just one example.
It could be some other resource. The point is, process-level
migration is not a panacea; ksplice has its place, even with
its limitations.
Post by Arne Vajhøj
I guess as part of the migration the process would need to
be made non-CUR and release the CPU (and the GPU, if VMS adds
support for CUDA or similar in the future).
Main memory will need to be migrated. And the cluster will
not help with that.
But a cluster with shared storage will help with disk files.
And a cluster with a shared SYSUAF will help with identity.
And a cluster with a shared queue database will help with jobs.
Post by Dan Cross
Besides, clusters can contain heterogeneous systems.
Yes.
The nodes would need to be compatible.
A mixed-architecture cluster is definitely out
of the question.
:-)
Correct.

- Dan C.
Arne Vajhøj
2025-02-20 03:19:48 UTC
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
[snip]
Yes. Which becomes a little easier when restricted to a
cluster instead of any systems.
I don't know what you mean when you say, "restricted to a
cluster instead of any systems."
A and B being in a cluster instead of being two
standalone nodes.
Oh I see, you're using cluster here as a shorthand to
mean that they're in the same administrative domain.
VMS nodes in a VMS cluster with some common VMS cluster setup.

That does mean the same administrative domain, but also more than that.
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
If you mean that this somehow
makes managing state during process migration easier, then no,
not really; all of the same caveats apply. For instance,
if a program is using (say) a GPU for computation, part of
migrating it will be extracting whatever state it has in the
GPU out of the GPU, and replicating it on the destination
system.
At one point, the internal numbering of cores in the GPU was
visible to code running on the GPU, creating an $n \choose k$
fingerprinting problem for migration.
A VMS server process will not be using a GPU.
Sure. GPUs, as compute accelerators, are just one example.
It could be some other resource. The point is, process-level
migration is not a panacea; ksplice has its place, even with
its limitations.
Process migration would be a huge task to implement.

But VSI is not considering the ksplice model, so I wanted to
know whether they are considering the process migration model.

Arne
Lawrence D'Oliveiro
2025-02-19 23:37:23 UTC
Post by Arne Vajhøj
* a cluster with nodes A and B
* a critical process P that, for whatever reason, cannot run
  concurrently on multiple nodes, and which runs on A
* node A needs to be taken down for some reason
* so VMS on nodes A and B does some magic and migrates P from A to B,
  transparently to users (which obviously requires a cluster IP
  address or a load balancer)
Linux virtualization migrates entire VMs that way, rather than individual
processes.
Arne Vajhøj
2025-02-19 23:48:41 UTC
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
* a cluster with nodes A and B
* a critical process P that, for whatever reason, cannot run
  concurrently on multiple nodes, and which runs on A
* node A needs to be taken down for some reason
* so VMS on nodes A and B does some magic and migrates P from A to B,
  transparently to users (which obviously requires a cluster IP
  address or a load balancer)
Linux virtualization migrates entire VMs that way, rather than individual
processes.
vMotion, which does that for ESXi, was mentioned in what you chose
not to quote.

Arne
Lawrence D'Oliveiro
2025-02-20 01:35:51 UTC
vMotion, which does that for ESXi, was mentioned in what you chose
not to quote.
Didn’t see any mention that that was what it did, so I considered it
irrelevant.
Waldek Hebisch
2025-02-21 12:48:54 UTC
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
* a cluster with nodes A and B
* a critical process P that, for whatever reason, cannot run
  concurrently on multiple nodes, and which runs on A
* node A needs to be taken down for some reason
* so VMS on nodes A and B does some magic and migrates P from A to B,
  transparently to users (which obviously requires a cluster IP
  address or a load balancer)
Linux virtualization migrates entire VMs that way, rather than individual
processes.
Migrating a whole VM will also migrate the _current_ kernel. The whole
point of the original question is updating the kernel.
--
Waldek Hebisch
Dan Cross
2025-02-21 13:13:12 UTC
Post by Waldek Hebisch
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
* a cluster with nodes A and B
* a critical process P that, for whatever reason, cannot run
  concurrently on multiple nodes, and which runs on A
* node A needs to be taken down for some reason
* so VMS on nodes A and B does some magic and migrates P from A to B,
  transparently to users (which obviously requires a cluster IP
  address or a load balancer)
Linux virtualization migrates entire VMs that way, rather than individual
processes.
Migrating a whole VM will also migrate the _current_ kernel. The whole
point of the original question is updating the kernel.
Indeed. But if the goal is to update the host, and not the
guest, it's an acceptable method, provided you can meet the
requirements vis-a-vis resources and the like, can handle the
limitations with respect to direct access to devices, and the
workloads in question can tolerate the drag on performance during
the migration (no matter how you shake it, there's a mandatory
blackout period during which the guest is not running on either
the source or the target system).
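
To see why the blackout is unavoidable, here is a toy model of the
usual pre-copy loop (all page counts are invented; a real migration
tracks the actual dirty pages via the MMU):

#include <stdio.h>

int main(void)
{
    long dirty         = 1L << 20;  /* guest pages still to copy      */
    const long rate    = 1L << 18;  /* pages we can copy per round    */
    const long redirty = 1L << 16;  /* pages guest re-dirties a round */
    int round = 0;

    /* Pre-copy: the guest keeps running, so some already-copied
     * pages get dirtied again. Converges only while rate > redirty. */
    while (dirty > rate) {
        dirty = dirty - rate + redirty;
        printf("round %d: %ld dirty pages left\n", ++round, dirty);
    }

    /* Blackout: pause the guest, copy the remainder, resume on the
     * target. This final stop-and-copy is the unavoidable window. */
    printf("pause guest, copy final %ld pages, resume on target\n", dirty);
    return 0;
}

While the copy rate exceeds the dirtying rate the set shrinks each
round; whatever remains has to move with the guest paused.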

I designed the live migration protocol used for Bhyve in the
Oxide architecture. Minimizing that period was an explicit
design goal, but at some point you simply have to bottle up the
last bits of state from the source and move them to the
destination.

- Dan C.
