2021-06-01 16:22:47 UTC
Hello everyone, after the odd DCL "READ/TIMEOUT=n" issue that I posted a couple of weeks ago, I have another problem to ask about. Unfortunately, unlike the last issue, I don't have any hypothesis about what's causing the problem nor how to prevent it.
So: I'm running OpenVMS Alpha V7.3-2; HP TCP/IP V5.4 with ECO 6. I have a few hundred "reverse Telnet" printers defined on the server. These have TNAxxxx: devices created, but they do not have VMS-level queues. They're used by code that simply opens the TNA device, writes to it, and - at some point - closes it. None of them are set "spooled." The actual printers are varied: Mostly HP LaserJets and OKI laser printers, plus an assortment of other brands and models. Most use port 9100. To make things interesting, some are serial printers connected to Lantronix terminal servers, and those use other port numbers (10001 and above.) The network infrastructure is Cisco routers and switches, shared with plenty of other servers running Windows or Linux or AIX. This has been in place for years, relatively trouble free.
But intermittently, a printer will have its TNA device decide that it is owned by a non-existent process: SHOW DEVICE/FULL will show a non-zero PID, but a null process name. SHOW PROCESS/ID=xxxxxxxx on the PID will give a non-existent process message. If we wait, rarely the problem will resolve itself, but most of the time (perhaps 95+%) it does not. When it does not clear itself, we know that we'll eventually get a phone call from users that are complaining that their reports aren't printing. This problem seems to occur on average perhaps 3 or 4 times a day. It occurs on the busier printers more often than the infrequently used printers.
In no particular order, we've come up with three ways that we can _usually_ "fix" this:
1) Delete and re-create the TNA device (TELNET /DELETE_SESSION, TELNET /CREATE_SESSION)
2) From a privileged account, copy any file to the TNA device (such as COPY NL: TNAxxxx:)
3) Reboot (Obviously this is not at all desirable, but it's guaranteed to work!)
As a workaround, I've set up a DCL batch job that checks all TNA printer devices every five minutes with F$GETDVI, and if it find one with a non-zero owner PID, and F$GETJPI on that PID returns a non-existent process status, I copy NL: to the TNA: device and check the owner PID again. So far this works reliably, with the re-checked PID coming up as zero.
But that's a hack. I'd much rather prevent the problem from occurring in the first place. Any ideas on what causes this and/or how to stop it?