Post by Simon Clubley
Why is the VMS codebase apparently so convoluted?
Operating systems are very complex constructs, and filled with
trade-offs. There are always trade-offs. There are decisions that
are the least wrong among bad choices. There are the inevitable
compromises around available developers and scheduling; around what you
can get done with the budget and time and staff you have.
In various cases, not the least of which is shadowing, the code
addresses a simple-looking problem yet has to handle errors arising
from networking and from local hardware: NICs, simple controllers,
RAID controllers, and disk devices. It has to do the appropriate thing
in cases such as whether or not the disk supports revectoring.
Shadowing also has to deal with a wide variety of hardware and
firmware, some of which is sometimes... in the most charitable of
phrasing... somewhat odd.
Some of that hardware should be long gone, but — because there's little
precedent for deprecation — somebody's probably still using it. Some
devices and some configurations should have been yanked long ago.
I've worked with more than a few widgets that simply lock up — this
from user-mode code! — and you have to power-cycle the whole box to get
them back. There's more than a little of this hard-earned knowledge
baked into the code of shadowing and of OpenVMS in general, too. That
knowledge is really hard to replicate.
There's also the fact that rewriting existing source code isn't often
the best use of anybody's time. The old code works. Better to spend
the time designing and working on a "strangler" than on a direct
rewrite, if you're going to undertake the effort at all. Don't just
rewrite; make the replacement substantially better. Then deprecate and
remove the old code (there's little precedent for deprecation, though
it has happened with the first generation of shadowing). Don't just
patch problems in customer- or business- or future-critical areas:
design or redesign the solution, provide fundamental and potentially
marketable enhancements, and then schedule the deprecation of the
problem areas.
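To make the "strangler" idea concrete, here's a minimal and entirely
hypothetical sketch in C. The names (mail_send, mail_send_legacy,
mail_send_mime, feature_enabled) are invented for illustration and are
not OpenVMS or MAIL interfaces: callers keep using one stable entry
point while each operation is routed to either the legacy code or its
replacement, and the legacy path gets scheduled for deprecation once
the replacement has proven itself.

/* A "strangler" shim, sketched in C.  The names here are invented for
 * illustration; they are not OpenVMS or MAIL interfaces. */
#include <stdbool.h>
#include <stdio.h>

/* Existing, working code path; left untouched while the replacement matures. */
static int mail_send_legacy(const char *addr)
{
    printf("legacy send to %s\n", addr);
    return 0;
}

/* New code path carrying the fundamental improvement (say, MIME support)
 * that justifies a replacement rather than a straight rewrite. */
static int mail_send_mime(const char *addr)
{
    printf("MIME send to %s\n", addr);
    return 0;
}

/* Per-operation switch, flipped as each piece of the replacement proves out. */
static bool feature_enabled(const char *name)
{
    (void)name;
    return false;               /* default to the old, known-working behavior */
}

/* The stable entry point callers keep using.  Once every operation routes
 * to the new code, the legacy path can be scheduled for removal. */
int mail_send(const char *addr)
{
    if (feature_enabled("mime_send"))
        return mail_send_mime(addr);
    return mail_send_legacy(addr);
}

int main(void)
{
    return mail_send("user@example.com");
}

The point of the shim is that the old code keeps carrying the load
while the replacement is built and proven, operation by operation.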
As differentiated from rewriting, there's refactoring the source code.
Refactoring tools for OpenVMS source code are entirely non-existent,
and the formatting tools are all add-ons. On other platforms, using
something as limited as EDT for source code development is seen as
akin to using Notepad on Windows, or even punched cards: absurd. But
on OpenVMS there aren't particularly good development tools available.
Elsewhere, I've worked with tools that are quite good at cleaning up
code.
Then there's having the necessary schedule time available to refactor
the code. Once there's working and neatly formatted code and test
cases (and code reviews, where those are done), there's tremendous
pressure to move on to the next project, not to re-solve the problem
based on what the developer(s) learned from the first solution. But
doing that refactoring later means you have to reset your context and
relearn the old code; best to do the refactoring immediately. I, and
most of you, have had test and prototype code put into production,
too. Un-refactored, un-rewritten, "ship it" code. That sketchy code
becomes permanent, and it can and variously does come back and
bite... somebody: the developer, the end user, and sometimes the
board, when Brian Krebs calls up the PR folks. Technical debt comes
due. Always. The question then becomes whether the original
developer(s) or designer(s), or the product, or the whole
organization, are gone before the debt comes due.
One of the more pernicious problems here is hardware and software
compatibility. Compatibility with old hardware. That hits storage
more than you can imagine, as well as the terminal driver. There's
hardware-specific code in OpenVMS that goes back decades, for hardware
that nobody maintaining and updating existing code or doing new work
even remotely cares about. Some old hardware has gotten deprecated,
such as the DEQNA. But that VT52 will probably still work. Even DCL
procedures are difficult to tweak, because simple changes to their
output can throw off the tools that parse it. The MAIL rewrite ran
into exactly these slight differences, for instance, and more than a
little work went into the rewrite to avoid breaking existing tools.
That was effort that didn't go to tasks such as integrating MIME into
MAIL, which you'd certainly want to do if you were setting out to
strangle the old MAIL application. There's compatibility with old
software to preserve, too.
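Purely as a hypothetical illustration of that fragility (the report
layout and field offsets below are invented, not actual MAIL or DCL
output), here's what a downstream consumer that extracts a field by
fixed column positions can look like. Widen a column or reorder the
fields in the producing utility, and it breaks, often silently.

/* Hypothetical fragile consumer of utility output: it extracts the
 * new-message count by fixed column positions.  The report layout and
 * field offsets shown here are invented for illustration. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assume a report line such as:
 *   "HOFFMAN          12   3"
 * with the new-message count starting at column 17 (zero-based). */
#define COUNT_COLUMN 17
#define COUNT_WIDTH   4

static int new_message_count(const char *line)
{
    char field[COUNT_WIDTH + 1];

    if (strlen(line) < COUNT_COLUMN + COUNT_WIDTH)
        return -1;                      /* line shorter than expected */

    memcpy(field, line + COUNT_COLUMN, COUNT_WIDTH);
    field[COUNT_WIDTH] = '\0';

    /* Widen a column, add a heading, or reorder the fields in the
     * producing utility, and this quietly returns the wrong number. */
    return atoi(field);
}

int main(void)
{
    printf("%d\n", new_message_count("HOFFMAN          12   3"));
    return 0;
}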
There's no right decision around these compatibility trade-offs,
though. Only that never deprecating, and never occasionally and
selectively breaking compatibility with older and problematic
hardware, will eventually and inevitably occlude all substantive
future work. And that breaking too much, too fast, and/or with no easy
migration path will cause the customers to port elsewhere.
If you can't deprecate and replace problematic areas of an operating
system or application (the known-to-be-insecure password hash, for
instance), you can only accrue complexity and technical debt. Changes
get more and more difficult, expensive, and hazardous to
compatibility. Sooner or later you end up in a situation where
developers choose to make isolated changes, such as adding metadata
storage into the LINKER, because designing and making more systemic
changes, such as enhancing the file system to provide a more generic
solution for metadata, means far more work and far more risk. As
another example, there's the spectacular 64-bit memory addressing
design in OpenVMS. It was one of the more brilliant efforts, and one
that allowed existing applications to be incrementally upgraded to
64-bit addressing. It also (remember, there are always trade-offs)
left OpenVMS with a completely hideous native 64-bit addressing
scheme.
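For a flavor of what that incremental upgrade path looks like at the
source level, here's a minimal sketch, assuming the VSI C pointer-size
pragmas and a compile with /POINTER_SIZE=32 so that mixed pointer
sizes are in play; it's illustrative, and not code from OpenVMS
itself.

/* A minimal sketch of the mixed 32-/64-bit pointer model, assuming the
 * VSI C pointer-size pragmas behave as documented and the module is
 * compiled with /POINTER_SIZE=32.  Illustrative only. */
#include <stdio.h>

#pragma pointer_size save
#pragma pointer_size 32
typedef char *short_ptr;    /* 32-bit pointer: the historic P0/P1 space */
#pragma pointer_size 64
typedef char *long_ptr;     /* 64-bit pointer: can also reach P2 space */
#pragma pointer_size restore

int main(void)
{
    /* Existing code compiled with 32-bit pointers keeps working unchanged,
     * while new code opts into 64-bit addressing one routine or one data
     * structure at a time (e.g. via the 64-bit C RTL entry points such as
     * _malloc64). */
    printf("short_ptr is %d bytes, long_ptr is %d bytes\n",
           (int)sizeof(short_ptr), (int)sizeof(long_ptr));

    /* The trade-off: every interface now has to state which pointer size
     * it accepts, and the mixing is where the hideousness accumulates. */
    return 0;
}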
There are also issues around developer experience. Not to impugn the
VSI staff, but the experience of the VSI team is very much focused on
OpenVMS itself. Customers and users, too, become accustomed to how
OpenVMS works now. ACLs, for instance, ceased to be a competitive
differentiator well over a decade ago. Deep but limited experience is
neither good nor bad, but it does tend to reduce the number of
different, new, and variously better approaches that might be
incorporated. Microsoft is not the behemoth it once was, and while
there are good ideas in Windows and Windows Server, there are more
than a few good ideas (and bad ideas to avoid) in packages and
products and tools elsewhere. More subtly, existing customers are
seldom a good source of suggestions for wholly new enhancements.
Incremental changes, removing pain points, sure. Wholly new features
or substantial updates? Not so much. And every single customer will
prefer to avoid changes, as happened at the boot camp, where folks
preferred to avoid the suggested changes before they even knew what
the benefits were.
Each vendor needs to look at their own products with some brutal
self-honesty, and also needs to look around, learn from the (relevant)
advantages and disadvantages of other platforms, and incorporate the
best of what's found there.
Then there's having an idea about what the product can and cannot do,
and the ability to say "no". In a small company, that's exceedingly
difficult, not least for reasons of funding: VSI will be loath to turn
down changes associated with any substantive prospective sale, if they
can make any profit from it at all. You can bet larger organizations
know this, too. The same goes for any hardware or software products
that any substantial number of VSI customers need, but that VSI does
not itself control, whether in VSI's own supply chain or in the supply
chains of VSI customers. You can bet those vendors know their
positional advantages here, too.
That's a short answer.
I have more than a little reading in this area, but here are a few
relevant to what's written above....
Pure Personal Opinion | HoffmanLabs LLC