Stephen Hoffman via Info-vax
Sent: January 21, 2017 11:53 AM
Subject: Re: [Info-vax] Doing time on VAX/VMS
Post by IanD
I've always had this general notion (a head-based heuristic) that IT
problems are pretty much caused by or somehow related to time.
Post by IanD
conditions, synchronization, timing, recording bla bla bla. Errors of
logic excluded of course...
Many of the traditional programming languages lack support for
threading and related constructs, but that gets lost in most
discussions. Newer incarnations of C get better here, though OpenVMS
itself requires KP Threads to get anywhere, and those haven't been
integrated into any of the languages. Constructs such as
libdispatch/GCD are absent. ASTs are there, and I'd originally found
them very useful, but libdispatch/GCD with blocks and dispatch queues
are just as nice to program and much more flexible than ASTs, and the
blocks keep the related code together rather than necessarily and
inherently scattering it around the module, as happens with ASTs.
POSIX threads in C and KP threads in OpenVMS can work here, too, but
they're rather more complex to use, and tend to scatter the source
code logic around.
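For what it's worth, here's roughly what the block/dispatch-queue style looks like. This is a sketch assuming clang's blocks extension and libdispatch (so not something you'd build on OpenVMS today), and the queue name is made up:

#include <stdio.h>
#include <dispatch/dispatch.h>

int main(void)
{
    /* A serial dispatch queue: work submitted to it runs in order,
       off the caller's thread, without hand-rolled locking. */
    dispatch_queue_t q = dispatch_queue_create("example.worker", NULL);

    for (int i = 0; i < 4; i++) {
        dispatch_async(q, ^{
            /* The block captures i by value; the completion logic
               stays next to the code that requested the work instead
               of moving into a separate AST routine elsewhere. */
            printf("request %d handled\n", i);
        });
    }

    /* An empty synchronous block drains everything queued above. */
    dispatch_sync(q, ^{ });
    dispatch_release(q);
    return 0;
}

The point being that the "what happens when this completes" code sits right where the work is requested, where an AST completion routine would be a separate function somewhere else in the module.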
Post by IanD
How to enhance VMS clusters so that they can use a relative time
source? I guess the original design was based on a universal time
measurement and therefore required a single based time source for
How do distributed systems ultimately resolve synchronization? Do
they use a single synchronization source?
There's a joke: a person with one watch knows what time it is. A
person with two watches is never sure. Between the different
computers and clocks, and the distances between the computers and
the clocks, and with the inevitable occasional packet losses and restarts
and assorted skewage, things start to get murky.
For many apps, accurate time is rather less interesting than the local
arrival order and transactional controls, or than simply having the
most recent update, for status data that can be UDP-multicast or such.
It comes down to what the particular application expects and needs.
Hopefully few of us are still saddled with local time values assumed
to be inviolate, monotonically ascending indexes, and hopefully we're
avoiding most of the problems that can arise with errant use of
CLOCK_REALTIME and CLOCK_MONOTONIC or equivalent, but that's fodder
for another discussion or three. There are many presentations and
papers on the general topic of distributed time and ordering.
This also all ties back to the previous discussions of the CAP theorem.
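On the CLOCK_REALTIME/CLOCK_MONOTONIC point, a minimal sketch, assuming a C run-time that provides clock_gettime() with both clock IDs:

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec wall, t0, t1;

    /* CLOCK_REALTIME is wall-clock time: right for timestamps, but it
       can be stepped by NTP or an operator, so it is not a safe
       monotonically ascending index. */
    clock_gettime(CLOCK_REALTIME, &wall);
    printf("wall clock: %lld.%09ld\n",
           (long long)wall.tv_sec, wall.tv_nsec);

    /* CLOCK_MONOTONIC never steps backwards: use it for intervals,
       timeouts, and ordering within a single host. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* ... the work being timed goes here ... */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    long long ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
                   + (t1.tv_nsec - t0.tv_nsec);
    printf("elapsed: %lld ns\n", ns);
    return 0;
}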
Post by IanD
How to take VMS and its clustering to a hierarchical system or better
still, fully relational (I do like Google Groups and its circular
Clustering doesn't do much for you that can't also be done with file
shares and Zookeeper or etcd or such.
For folks using the DLM on OpenVMS for selecting a primary or leader or
coordinator, the following discussion (from several years ago) should
be of interest.
That, and the DLM sequence involving $enq[w] and $deq[w] and ASTs
for this same leadership-selection task certainly works, but it involves
absurd amounts of glue code. I've ended up writing the necessary code
for abstracting these APIs, as most other folks have done. Flipping
huge piles of glue code to elect a leader, or otherwise. Ponder
whether new-to-OpenVMS developers want to have to learn and
write and support those same abstractions, for what should be a
readily available task within a cluster. They'll certainly ponder loading
Zookeeper or etcd and Kubernetes and running on CentOS or RHEL or
Void or otherwise, though.
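For reference, the core of that DLM pattern is small; it's the requeueing, blocking ASTs, value blocks, and error paths around it that balloon into the glue code. A rough sketch from memory, with an illustrative resource name, and worth checking against the SYS$ENQW documentation before trusting any of the arguments:

#include <stdio.h>
#include <string.h>
#include <descrip.h>
#include <lckdef.h>
#include <ssdef.h>
#include <starlet.h>

/* Lock status block: completion status, reserved word, lock id,
   and the 16-byte lock value block. */
typedef struct {
    unsigned short status;
    unsigned short reserved;
    unsigned int   lock_id;
    unsigned char  value_block[16];
} lksb_t;

int main(void)
{
    /* Every candidate process queues for an EX-mode lock on the same
       agreed-upon resource name; whichever process is granted the lock
       is the leader until it releases the lock or exits.  The resource
       name here is illustrative. */
    $DESCRIPTOR(resnam, "MYAPP_LEADER");
    lksb_t lksb;
    unsigned int status;

    memset(&lksb, 0, sizeof lksb);
    status = sys$enqw(0,                /* event flag */
                      LCK$K_EXMODE,     /* exclusive lock mode */
                      (void *) &lksb,   /* lock status block */
                      0,                /* flags */
                      (void *) &resnam, /* resource name descriptor */
                      0, 0, 0, 0, 0, 0, 0);

    if (!(status & 1) || !(lksb.status & 1)) {
        printf("lock request failed: %u / %u\n", status, lksb.status);
        return status;
    }

    printf("this process is now the leader (lock id %u)\n", lksb.lock_id);
    /* ... leader work here; sys$deq(lksb.lock_id, 0, 0, 0) steps down. */
    return SS$_NORMAL;
}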
OpenVMS and its clustering are stuck in the DLM and shared-file-access
era, and the developer is then necessarily off to other tools and APIs
for service discovery, configuration, distributed authentication, app
distribution and coordination and app containment, and other such.
More glue code there, and absolutely no clear examples of how to do any
of those tasks in the "proper OpenVMS way", either. Zippo for service
discovery, outside of some old RPC bits nobody uses, or DNS SRV
or such, or maybe rolling your own integration with LDAP. There's
also little documentation around keeping OpenVMS apps from
stepping on each other, too — experienced OpenVMS developers
know how, but we're all one facility prefix collision or one leaking DEC C
logical name away from a Really Weird Bug. And I've been hitting
those cases more often, as we're spending more time and effort
integrating apps from disparate sources — from app stacking and from
longer dependency chains, or whatever y'all want to call it. That's
before discussing potentially nefarious apps and tools, and y'all are
seriously deluded if you think that's not eventually going to happen (to
you), if it hasn't already (and you just don't know it).
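On the service-discovery point above, the DNS SRV route looks roughly like the following on platforms with the BIND-style resolver APIs (res_query and friends; whether your OpenVMS TCP/IP stack exposes these is another question, and the service name here is made up):

#include <stdio.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>

int main(void)
{
    unsigned char answer[NS_PACKETSZ];
    ns_msg handle;
    ns_rr rr;
    char target[NS_MAXDNAME];
    int i, len;

    /* Look up the SRV records for a (hypothetical) service name. */
    len = res_query("_myapp._tcp.example.com", ns_c_in, ns_t_srv,
                    answer, sizeof answer);
    if (len < 0) { perror("res_query"); return 1; }

    if (ns_initparse(answer, len, &handle) < 0) return 1;

    for (i = 0; i < ns_msg_count(handle, ns_s_an); i++) {
        if (ns_parserr(&handle, ns_s_an, i, &rr) < 0) continue;
        if (ns_rr_type(rr) != ns_t_srv) continue;

        /* SRV rdata: priority(2), weight(2), port(2), target name. */
        const unsigned char *rdata = ns_rr_rdata(rr);
        unsigned priority = ns_get16(rdata);
        unsigned weight   = ns_get16(rdata + 2);
        unsigned port     = ns_get16(rdata + 4);

        if (dn_expand(ns_msg_base(handle), ns_msg_end(handle),
                      rdata + 6, target, sizeof target) < 0) continue;

        printf("priority %u weight %u  %s:%u\n",
               priority, weight, target, port);
    }
    return 0;
}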
Then there's the fact that cluster shared-storage access and
distributed file locking are great right up until that same disk
storage, HDD or SSD, becomes a bottleneck, and that HBVS and related
approaches just don't scale, even now. How many of us have tussled
with hot files and hot disks and excessive I/O rates?
Then consider whether we're really expecting to be sharing traditional
disk storage going forward, when we probably want to be running
directly out of memory and journaling to slower storage, or shadowing
it for the purposes of redundancy. I'm working more and more with data in
memory and only journaling writes to local non-volatile storage or to
memory or storage on another server, and spending somewhat less time
working around sharing disks across hosts. That old shared-HDD
approach from clustering still works certainly — and SSD helps alleviate
some of the performance limitations — but I just don't see the
popularity of that approach doing anything but declining over the next
decade, as compared with apps using RDMA or other access to transfer
and to journal data (or ZeroMQ or RabbitMQ or otherwise, for
transactional processing), and with the application data residing in
volatile and increasingly non-volatile byte-addressable memory.
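The local journaling half of that doesn't need to be exotic, either; stripped of record framing, checksums, rotation, and the replication to the other server, it's roughly an append plus a flush. A sketch, with an illustrative file name:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Append one record to a local journal and force it to stable storage
   before the in-memory copy is considered committed. */
static int journal_append(int fd, const void *rec, size_t len)
{
    ssize_t written = write(fd, rec, len);
    if (written != (ssize_t)len)
        return -1;
    /* fsync() is the expensive part; batching records per fsync is
       the usual throughput knob. */
    return fsync(fd);
}

int main(void)
{
    /* "app.journal" is an illustrative file name. */
    int fd = open("app.journal", O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0) { perror("open"); return 1; }

    const char rec[] = "update key=42 value=17\n";
    if (journal_append(fd, rec, sizeof rec - 1) != 0)
        perror("journal_append");

    close(fd);
    return 0;
}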
TL;DR: Go Big or Go Home
There is a fundamental issue with what you are saying above.
Yes, there are alternatives to shared disk (OpenVMS, Linux/GFS, z/OS), but while one does need to follow the recommended shared-disk programming models, the shared-nothing model's inter-node management (node adds/deletes), HA, data consistency, data sharding (perf impacts), file replication, DR/inter-site load balancing, etc. are all done at the Application level. That is a HUGE amount of additional coding, complexity, etc. that the App developer needs to consider and design into their App code. And each Application group might decide to do it differently.
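To put a small amount of code behind that point: even deciding which node owns a given key becomes application logic in a shared-nothing design, and any node add/delete re-shuffles the key space, which the application also has to handle. A toy hash-routing sketch, with made-up node names:

#include <stdio.h>

/* Illustrative node list; in a shared-nothing app this has to be kept
   consistent across every node and updated on node adds/deletes. */
static const char *nodes[] = { "NODEA", "NODEB", "NODEC" };
#define NNODES (sizeof nodes / sizeof nodes[0])

/* Route a key to a node by hashing; any change to NNODES re-shuffles
   most keys, which is exactly the re-sharding work the application
   has to own. */
static const char *owner_of(const char *key)
{
    unsigned long h = 5381;              /* djb2-style string hash */
    while (*key)
        h = h * 33 + (unsigned char)*key++;
    return nodes[h % NNODES];
}

int main(void)
{
    printf("CUSTOMER_1001 -> %s\n", owner_of("CUSTOMER_1001"));
    printf("CUSTOMER_1002 -> %s\n", owner_of("CUSTOMER_1002"));
    return 0;
}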
With a shared disk model, these aspects of the solution are primarily handled at the OS layer: the Cluster takes care of node adds/deletes, not the application; data consistency is handled by HBVS, not by file replication designed at the App level (gotta love the term "eventual consistency"); and any node can directly update any data on any system, not via DB update routing designed, implemented, and maintained at the App level.
So yes, while it's not perfect and the OpenVMS shared disk model certainly does warrant upgrades and improvements (expanded cluster node limits as just one example), the reality is that the shared disk model does work and is considered rock solid by most Customers who use it.
As I have stated before, comparing the traditional shared nothing programming model to the shared disk model is really asking the question "do you want a dragster (shared nothing) or a Porsche (shared disk)?"
Extract "Comparing shared-nothing and shared-disk in benchmarks is analogous to comparing a dragster and a Porsche. The dragster, like the hand-tuned shared-nothing database, will beat the Porsche in a straight quarter mile race. However, the Porsche, like a shared-disk database, will easily beat the dragster on regular roads. If your selected benchmark is a quarter mile straightaway that tests all out speed, like Sysbench, a shared-nothing database will win. However, shared-disk will perform better in real world environments."