Discussion:
resetting SSH without restarting the computer
Joukj
2015-12-18 10:43:46 UTC
Hi All

For some reason on my OpenVMS 8.4 (AXP+IA64) and TCP/IP services V5.7 -
ECO 5, once in a while the SSH-server stops making connections. On the
client side I see:

$ ssh -v valeta
debug(18-DEC-2015 11:38:03.32): Ssh2/SSH2.C:1896: CRTL version
(SYS$SHARE:DECC$SHR.EXE ident) is ELF
debug(18-DEC-2015 11:38:03.34): SshAppCommon/SSHAPPCOMMON.C:313:
Allocating global SshRegex context.
debug(18-DEC-2015 11:38:03.34): SshConfig/SSHCONFIG.C:3482: Metaconfig
parsing stopped at line 4.
debug(18-DEC-2015 11:38:03.35): SshConfig/SSHCONFIG.C:890: Setting
variable 'VerboseMode' to 'FALSE'.
debug(18-DEC-2015 11:38:03.35): SshConfig/SSHCONFIG.C:3390: Unable to
open ssh2/ssh2_config
debug(18-DEC-2015 11:38:03.36): Connecting to valeta, port 22... (SOCKS
not used)
debug(18-DEC-2015 11:38:03.36): Ssh2/SSH2.C:2881: Entering event loop.
debug(18-DEC-2015 11:38:03.37): Ssh2Client/SSHCLIENT.C:1655: Creating
transport protocol.
debug(18-DEC-2015 11:38:03.37):
SshAuthMethodClient/SSHAUTHMETHODC.C:104: Added "publickey" to usable
methods.
debug(18-DEC-2015 11:38:03.37):
SshAuthMethodClient/SSHAUTHMETHODC.C:104: Added "keyboard-interactive"
to usable methods.
debug(18-DEC-2015 11:38:03.37):
SshAuthMethodClient/SSHAUTHMETHODC.C:104: Added "password" to usable
methods.
debug(18-DEC-2015 11:38:03.37): Ssh2Client/SSHCLIENT.C:1696: Creating
userauth protocol.
debug(18-DEC-2015 11:38:03.37): client supports 3 auth methods:
'publickey,keyboard-interactive,password'
debug(18-DEC-2015 11:38:03.37): SshUnixTcp/SSHUNIXTCP.C:1758: using
local hostname hrem159.nano.tudelft.nl
debug(18-DEC-2015 11:38:03.37): Ssh2Common/SSHCOMMON.C:541: local ip =
100.100.100.1, local port = 49207
debug(18-DEC-2015 11:38:03.37): Ssh2Common/SSHCOMMON.C:543: remote ip =
100.100.100.2, remote port = 22
debug(18-DEC-2015 11:38:03.37): SshConnection/SSHCONN.C:2584: Wrapping...
debug(18-DEC-2015 11:38:03.37): SshReadLine/SSHREADLINE.C:3662:
Initializing ReadLine...
debug(18-DEC-2015 11:38:03.49): Ssh2Common/SSHCOMMON.C:180: DISCONNECT
received: Connection closed by remote host.
debug(18-DEC-2015 11:38:03.49): SshReadLine/SSHREADLINE.C:3728:
Uninitializing ReadLine...
warning: Authentication failed.
debug(18-DEC-2015 11:38:03.49): Ssh2/SSH2.C:327: locally_generated = TRUE
Disconnected; connection lost (Connection closed by remote host.).

debug(18-DEC-2015 11:38:03.49): Ssh2Client/SSHCLIENT.C:1731: Destroying
client.
debug(18-DEC-2015 11:38:03.49): SshConfig/SSHCONFIG.C:2888: Freeing pki.
(host_pki != NULL, user_pki = NULL)
debug(18-DEC-2015 11:38:03.49): SshConnection/SSHCONN.C:2636: Destroying
SshConn object.
debug(18-DEC-2015 11:38:03.49): Ssh2Client/SSHCLIENT.C:1799: Destroying
client completed.
debug(18-DEC-2015 11:38:03.49):
SshAuthMethodClient/SSHAUTHMETHODC.C:109: Destroying authentication
method array.
debug(18-DEC-2015 11:38:03.52): SshAppCommon/SSHAPPCOMMON.C:326: Freeing
global SshRegex context.
debug(18-DEC-2015 11:38:03.52): SshConfig/SSHCONFIG.C:2888: Freeing pki.
(host_pki = NULL, user_pki = NULL)


Question 1:
If I reboot, everything is OK of course. But how can I reset the
SSH-server without rebooting?
I already tried
$ @SYS$MANAGER:TCPIP$SSH_SHUTDOWN.COM
$ @SYS$MANAGER:TCPIP$SSH_STARTUP.COM
But that did not solve the problem.




Question 2:
Does anyone have a clue what could cause this problem?


Regards
Jouk
Roy Omond
2015-12-18 11:05:28 UTC
Post by Joukj
Hi All
For some reason on my OpenVMS 8.4 (AXP+IA64) and TCP/IP services V5.7 -
ECO 5, once in a while the SSH-server stops making connections. On the
[...snip...]
If I reboot everything is OK of course. But how can I reset the
SSH-server without rebooting?
I already tried
But that did not solve the problem.
Does anyone have a clue what could cause this problem?
Reached some limit on the server side ?

MAXPROCESSCNT ?
Limit on the SSH service ?
Limit on number of interactive logins ?
Limit on individual user's Maxjobs ?
Non-paged pool ?
Joukj
2015-12-18 12:18:03 UTC
Post by Roy Omond
Post by Joukj
Hi All
For some reason on my OpenVMS 8.4 (AXP+IA64) and TCP/IP services V5.7 -
ECO 5, once in a while the SSH-server stops making connections. On the
[...snip...]
If I reboot everything is OK of course. But how can I reset the
SSH-server without rebooting?
I already tried
But that did not solve the problem.
Does anyone have a clue what could cause this problem?
Reached some limit on the server side ?
MAXPROCESSCNT ?
About 30 processes with MAXPROCESSCNT=1033
Post by Roy Omond
Limit on the SSH service ?
What limit? Presently there are 0 connections.
Post by Roy Omond
Limit on number of interactive logins ?
Less than 10 (maximum is 64)
Post by Roy Omond
Limit on individual user's Maxjobs ?
It happens before the user logs on, so we are talking to the server only....
Post by Roy Omond
Non-paged pool ?
More than half is free.



Jouk
Kerry Main
2015-12-18 12:49:12 UTC
Post by Joukj
[...snip...]
This is a good example where using Availability Manager (free on
Alpha) might be a good troubleshooting tool.

Among many other features, it not only dynamically shows the quotas
for each process, but ALSO, allows one to increase them on the fly.

I am not aware of any other tool that allows you to dynamically increase
process quotas on running systems.

Reference:
http://h71000.www7.hp.com/openvms/products/availman/index.html

Regards,

Kerry Main
Kerry dot main at starkgaming dot com
John E. Malmberg
2015-12-18 14:10:03 UTC
Post by Joukj
Hi All
For some reason on my OpenVMS 8.4 (AXP+IA64) and TCP/IP services V5.7 -
ECO 5, once in a while the SSH-server stops making connections. On the
[...snip...]
If I reboot everything is OK of course. But how can I reset the
SSH-server without rebooting?
I already tried
But that did not solve the problem.
Does anyone have a clue what could cause this problem?
It is stating an authentication issue in your log.

Does it do this for all users when this happens, or just a certain user?

Is anything showing up for break in evasion?

Is anything showing up in accounting? (Assuming that is enabled)

Does the SSH server leave any logs behind on VMS of the failure?

I use SSH with TCP 5.7 all the time and it has been 134 days since the
last power failure longer than the UPS battery and I have not seen this
issue.

Are you running a high volume of SSH requests?

Regards,
-John
Joukj
2015-12-18 14:50:58 UTC
Post by John E. Malmberg
Post by Joukj
Hi All
For some reason on my OpenVMS 8.4 (AXP+IA64) and TCP/IP services V5.7 -
ECO 5, once in a while the SSH-server stops making connections. On the
[...snip...]
If I reboot everything is OK of course. But how can I reset the
SSH-server without rebooting?
I already tried
But that did not solve the problem.
Does anyone have a clue what could cause this problem?
It is stating an authentication issue in your log.
Does it do this for all users when this happens, or just a certain user?
All users
Post by John E. Malmberg
Is anything showing up for break in evasion?
I suffer from script-kids all the time. Maybe all the failed attempts have
broken something.
Post by John E. Malmberg
Is anything showing up in accounting? (Assuming that is enabled)
Accounting for a failed connection gives:

NETWORK Process Termination
---------------------------
Username:          TCPIP$SSH          UIC:               [TCPIP$AUX,TCPIP$SSH]
Account:           TCPIP              Finish time:       18-DEC-2015 15:39:40.93
Process ID:        21E071B9           Start time:        18-DEC-2015 15:39:40.73
Owner ID:                             Elapsed time:      0 00:00:00.19
Terminal name:                        Processor time:    0 00:00:00.05
Remote node addr:                     Priority:          8
Remote node name:                     Privilege <31-00>: 00108000
Remote ID:         TCPIP$SSH          Privilege <63-32>: 00000000
Remote full name:  131.180.116.42
Posix UID:         -2                 Posix GID:         -2 (%XFFFFFFFE)
Queue entry:                          Final status code: 1000000C
Queue name:
Job name:
Final status text: %SYSTEM-F-ACCVIO, access violation, reason mask=!XB, virtual
Page faults:       672                Direct IO:         44
Page fault reads:  82                 Buffered IO:       117
Peak working set:  8896               Volumes mounted:   0
Peak page file:    195184             Images executed:   5
Post by John E. Malmberg
Does the SSH server leave any logs behind on VMS of the failure?
See my answers to Steven's posts
Post by John E. Malmberg
I use SSH with TCP 5.7 all the time and it has been 134 days since the
last power failure longer than the UPS battery and I have not seen this
issue.
Are you running a high volume of SSH requests?
Sometimes, due to attacks on the machines: I see several of them each
week. But presently it is quiet.


Note that some satellites have the problem and others do not, although
they boot from the same boot member and are configured in exactly the
same way.




Jouk
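
A note on the "Final status code: 1000000C" in the accounting record above: an OpenVMS condition value packs severity, message number, facility and control bits into 32 bits, so the code can be taken apart by hand. The sketch below is in Python purely for illustration, and the function and dictionary names are made up here rather than taken from any VMS tool. It suggests that 1000000C is the ordinary SS$_ACCVIO value 0000000C with the "inhibit message" control bit set, which agrees with the ACCVIO text in the same record.

```python
# Illustrative sketch: split a 32-bit OpenVMS condition value into its fields.
# Names below (SEVERITY, decode_condition) are hypothetical helpers.

SEVERITY = {
    0: "W (warning)",
    1: "S (success)",
    2: "E (error)",
    3: "I (informational)",
    4: "F (fatal)",
}


def decode_condition(value):
    """Return the standard fields of a 32-bit condition value as a dict."""
    return {
        "severity": SEVERITY.get(value & 0x7, "reserved"),  # bits 0-2
        "message_number": (value >> 3) & 0x1FFF,            # bits 3-15
        "facility_number": (value >> 16) & 0xFFF,           # bits 16-27
        "inhibit_message": bool(value & 0x10000000),        # bit 28
        "code_without_control_bits": value & 0x0FFFFFFF,    # bits 0-27
    }


# The final status code from the accounting record.
fields = decode_condition(0x1000000C)
for name, field_value in fields.items():
    print(f"{name}: {field_value}")
```

For 1000000C this gives severity F, message number 1, facility number 0 (the SYSTEM facility) and the inhibit bit set, leaving 0000000C once the control bits are masked off.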
j***@yahoo.co.uk
2015-12-18 17:16:55 UTC
Post by Joukj
[...snip...]
Here be lots of questions, no answers, and a little thought.

If you shut down the SSH server without the immediate SSH restart, do
you see anything unexpected from e.g. SHOW SYS ?

If you then wait long enough for failed SSH/TCP connections to time
out, then look again, then restart TCPIP$SSH, how does the system
then behave?

[Thinking: something may be being (re-)initialised by a reboot which
isn't being properly re-initialised by SSH server shutdown quickly
followed by SSH server startup. Lots of possibilities there...]

"I suffer from script-kids all the time" isn't an entirely transparent
answer to "Is anything showing up for break in evasion?" but maybe
there's too much sensitive data to post?

Is there any of that info you can actually post without posting stuff
which shouldn't be posted?

Is it always the same remote client(s) in the failure records? Is
there anything special about the remote (client) host (e.g. dubious IP
address range?) or are they all known and explainable?

Dumb question: is there a support contract?

Question to the experts: might there be any value in a system dump?

Followup to joukj: if a system dump would help, would it be possible?
(don't see why not, in principle, given that the current fix is to
reboot, but...)

Best of luck.
Jouk
2015-12-18 19:36:13 UTC
Post by j***@yahoo.co.uk
[...snip...]
Here be lots of questions, no answers, and a little thought.
If you shut down the SSH server without the immediate SSH restart, do
you see anything unexpected from e.g. SHOW SYS ?
No suspicious entries...
Post by j***@yahoo.co.uk
If you then wait long enough for failed SSH/TCP connections to time
out, then look again, then restart TCPIP$SSH, how does the system
then behave?
I'm going to try this....
Post by j***@yahoo.co.uk
[Thinking: something may be being (re-)initialised by a reboot which
isn't being properly re-initialised by SSH server shutdown quickly
followed by SSH server startup. Lots of possibilities there...]
That is clear, but what?
Post by j***@yahoo.co.uk
"I suffer from script-kids all the time" isn't an entirely transparent
answer to "Is anything showing up for break in evasion?" but maybe
there's too much sensitive data to post?
I see a lot of failed logins, e.g. via ssh, in the output of "SHOW INTRUSION".
I never noticed any trace of a successful break-in.
Post by j***@yahoo.co.uk
Is there any of that info you can actually post without posting stuff
which shouldn't be posted?
Is it always the same remote client(s) in the failure records? Is
there anything special about the remote (client) host (e.g. dubious IP
address range?) or are they all known and explainable?
Of course I see dubious IP addresses, from the countries you would expect.
Post by j***@yahoo.co.uk
Dumb question: is there a support contract?
No.
Post by j***@yahoo.co.uk
Question to the experts: might there be any value in a system dump?
Probably there is nothing in the dump, because sshd is not running at
that time.
Post by j***@yahoo.co.uk
Followup to joukj: if a system dump would help, would it be possible?
(don't see why not, in principle, given that the current fix is to
reboot, but...)
But I have good reasons for not rebooting the machine at the moment. That
brings me back to question no. 1: how to reset without rebooting.
Post by j***@yahoo.co.uk
Best of luck.
j***@yahoo.co.uk
2015-12-18 22:20:20 UTC
Post by Jouk
[...snip...]
OK, thanks for that.

Dave Froble provided a handy-sounding suggestion to try if my
suggestion of stopping TCPIP$SSH, waiting, and restarting
doesn't help: stop and restart the whole IP stack (if acceptable).

It seems something isn't being cleaned up once the problem arises.
TCPIP$SSH (or some related component) seems to see it, we can't
see it, but it seems highly likely that some item in a system
dump might shed light on it (regardless of whether TCPIP$SSH is
actually running) given that the problem persists after a quick
SSHD shutdown and restart. All it needs is the attention of a
suitable expert :)

Once again, best of luck.
Steven Schweda
2015-12-18 23:33:27 UTC
Post by j***@yahoo.co.uk
It seems something isn't being cleaned up once the problem
arises.
Or some user quota gets exhausted as the system runs, and
doesn't improve until enough processes get killed (like all
of them). %SYSTEM-F-ACCVIO could be caused by something so
simple as a failing malloc() (with bad error handling).

Other potentially interesting experiments might include:

mcr tcpip$ssh_sshd2.exe

with various options (as might be suggested by
TCPIP$SSH_RUN.COM).
Steven Schweda
2015-12-18 23:46:09 UTC
Post by Steven Schweda
mcr tcpip$ssh_sshd2.exe
That seems more entertaining/productive after:

set default tcpip$ssh_device:[tcpip$ssh]

"-i" seems to be the most realistic mode. Adding "-d 99"
makes it noisier. Either way, I get to a message
"INFORMATIONAL: Starting image in auxiliary server mode."
without an ACCVIO. With no args, I get the program version
before it dies:

ALP $ mcr tcpip$ssh_sshd2.exe
alp$dkc0:[sys0.syscommon.][sysexe]tcpip$ssh_sshd2.exe: SSH Secure Shell OpenVMS
(V5.5) 3.2.0 on COMPAQ Professional Workstation - VMS V8.4
Fri 18 17:39:43 WARNING: Starting image in non-auxiliary server mode.
[...]
Steven Schweda
2015-12-18 13:41:12 UTC
Post by Joukj
Does anyone have a clue what could cause this problem?
I know nothing, but I'd look for a clue in
TCPIP$SSH_DEVICE:[TCPIP$SSH]TCPIP$SSH_RUN.LOG
(See: tcpip show service /full ssh) Of course, around here,
that file hit ;32767 back around 8-MAY-2014. Sigh. (The
PURGE commands in those TCPIP$xxx_RUN.COM scripts are so
cute.)
Post by Joukj
[...] the SSH-server stops making connections. [...]
debug(18-DEC-2015 11:38:03.49): Ssh2Common/SSHCOMMON.C:180: DISCONNECT
received: Connection closed by remote host.
More precisely, it makes connections, but it aborts them
prematurely?

My guess also would be some resource problem on the
server. Especially if the SSH-stop-start doesn't help, but
system-stop-start does. Does it fail this way for all users?

TCPIP$SYSTEM:TCPIP$SSH_RUN.COM looks at some logical names
(like, say, "tcpip$ssh_server_debug"), which might add some
info. (Or edit the script directly.)
Joukj
2015-12-18 14:17:17 UTC
Post by Steven Schweda
Post by Joukj
Does anyone have a clue what could cause this problem?
I know nothing, but I'd look for a clue in
TCPIP$SSH_DEVICE:[TCPIP$SSH]TCPIP$SSH_RUN.LOG
(See: tcpip show service /full ssh) Of course, around here,
that file hit ;32767 back around 8-MAY-2014. Sigh. (The
PURGE commands in those TCPIP$xxx_RUN.COM scripts are so
cute.)
It seems to create a new file for each connection (thus also for all
connections from those ssh-script-kids that try to hack my machine), so it
was at version 32767. After deleting it and trying to connect I got

$ Set NoOn
$ VERIFY = F$VERIFY(F$TRNLNM("SYLOGIN_VERIFY"))
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual
address=0000000000000014, PC=00000000000B0B90, PS=0000001B

Improperly handled condition, image exit forced.
Signal arguments: Number = 0000000000000005
Name = 000000000000000C
0000000000000000
0000000000000014
00000000000B0B90
000000000000001B

Register dump:
R0 = 0000000000000000 R1 = 0000000000920000 R2 = 0000000000000000
R3 = 0000000000000004 R4 = 000000007FFCF818 R5 = 000000007FFCF8B0
R6 = 0000000010000001 R7 = 0000000000000001 R8 = 0000000000830098
R9 = 0000000000000069 R10 = 000000000000002D R11 = 0000000000000000
SP = 000000007AB4F360 TP = 000000007B50A1C8 R14 = 000000000003001A
R15 = 000000007B8B1BF8 R16 = FFFFFFFF84236180 R17 = 0000000000000000
R18 = 0000000000030010 R19 = 0000000000000000 R20 = 0001000000000000
R21 = 0000000000000002 R22 = 000000007AB4F579 R23 = 0000000000030011
R24 = 0001000000000000 R25 = 0000000000000001 R26 = 0000000000000045
R27 = 000000007AB4F580 R28 = 0000000000030018 R29 = 0000000000000002
R30 = 0000000000000001 R31 = 0000000000000001 PC = 00000000000B0B90
BSP/STORE = 000007FDBFFD4240 / 000007FDBFFD4240 PSR = 00001013084AE010
IIPA = FFFFFFFF803AABD0
B0 = 000000000007BC80 B6 = FFFFFFFF84236180 B7 = FFFFFFFF845797F0

Interrupted Frame RSE Backing Store, Size = 2 registers

R32 = 0000000000000014 R33 = C000000000000613
TCPIP$SSH job terminated at 18-DEC-2015 15:13:26.46

Accounting information:
Buffered I/O count:              102      Peak working set size:       8576
Direct I/O count:                 38      Peak virtual size:         194880
Page faults:                     650      Mounted volumes:                0
Charged CPU time:      0 00:00:00.06      Elapsed time:       0 00:00:00.15


That does not look good.

Jouk
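
One thing that can be read straight from the dump above: the faulting virtual address is 0000000000000014, a tiny offset from zero. That pattern would be consistent with (but does not prove) code reading a field through a null pointer. The minimal Python sketch below pulls the numbers out of the message text. The helper names are made up, and the 64 KiB "near zero" threshold is an arbitrary value chosen only for illustration. Reading bit 2 of the reason mask as "write attempt" is an assumption here and should be checked against the OpenVMS documentation.

```python
import re

# Illustrative sketch: extract the numeric fields from an ACCVIO message.
# ACCVIO_RE, parse_accvio and NEAR_ZERO_LIMIT are hypothetical names.
ACCVIO_RE = re.compile(
    r"reason mask=(?P<mask>[0-9A-F]+),\s*virtual\s+address=(?P<va>[0-9A-F]+),"
    r"\s*PC=(?P<pc>[0-9A-F]+),\s*PS=(?P<ps>[0-9A-F]+)",
    re.IGNORECASE,
)


def parse_accvio(text):
    """Return mask, va, pc and ps from an ACCVIO message as integers."""
    match = ACCVIO_RE.search(text)
    if match is None:
        raise ValueError("no ACCVIO details found")
    return {key: int(value, 16) for key, value in match.groupdict().items()}


# Message text copied from the log, including its line wrap.
LOG = """%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual
address=0000000000000014, PC=00000000000B0B90, PS=0000001B"""

info = parse_accvio(LOG)

# Arbitrary illustrative threshold: anything below 64 KiB counts as near zero.
NEAR_ZERO_LIMIT = 0x10000
info["near_zero"] = info["va"] < NEAR_ZERO_LIMIT

# Assumption: bit 2 of the reason mask set means a write was attempted.
info["write_fault"] = bool(info["mask"] & 0x04)

for name, value in info.items():
    print(name, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
```

With the values from this log the address is flagged as near zero and, under the stated assumption about the mask, the fault was a read rather than a write.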
David Froble
2015-12-18 17:20:46 UTC
Post by Joukj
Post by Steven Schweda
Post by Joukj
Does anyone have a clue what could cause this problem?
I know nothing, but I'd look for a clue in
TCPIP$SSH_DEVICE:[TCPIP$SSH]TCPIP$SSH_RUN.LOG
(See: tcpip show service /full ssh) Of course, around here,
that file hit ;32767 back around 8-MAY-2014. Sigh. (The
PURGE commands in those TCPIP$xxx_RUN.COM scripts are so
cute.)
It seems to create a new file for each connection (thus also for all
connections from those ssh-script-kids that try to hack my machine), so it
was at version 32767. After deleting it and trying to connect I got
Well, if a VMS reboot solves the problem, for a time, then it sure isn't version
overrun, because a VMS reboot does nothing for that problem.
Stephen Hoffman
2015-12-18 19:05:55 UTC
Post by David Froble
Well, if a VMS reboot solves the problem, for a time, then it sure
isn't version overrun, because a VMS reboot does nothing for that
problem.
It shouldn't. But it wouldn't surprise me to see deletion logic
lurking in some product startup.

Some ~forty years on, OpenVMS and its layered products still stink at
handling file versions. There's no API for this, either.

Only a handful of products deal with this case specifically, some tip
over, and more than a few folks use an intentionally-created ;32767 as a
sabot tossed into the version machinery.
--
Pure Personal Opinion | HoffmanLabs LLC
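
With no API for this, as noted above, spotting log files that are about to hit the version limit tends to be done by hand or by script. As a rough illustration only, the Python sketch below finds the highest version per file name and compares it with 32767. The function names and the `headroom` parameter are hypothetical, and the file list is typed in by hand rather than read from a real directory.

```python
# Illustrative sketch: flag files whose highest version is close to the limit.
MAX_VERSION = 32767  # highest file version number mentioned in this thread


def split_version(filespec):
    """Split 'NAME.TYPE;N' into ('NAME.TYPE', N)."""
    name, sep, version = filespec.rpartition(";")
    if not sep or not version.isdigit():
        raise ValueError(f"no explicit version in {filespec!r}")
    return name, int(version)


def versions_at_risk(filespecs, headroom=100):
    """Return {name: highest_version} for files within `headroom` of the limit."""
    highest = {}
    for spec in filespecs:
        name, version = split_version(spec)
        highest[name] = max(version, highest.get(name, 0))
    return {
        name: version
        for name, version in highest.items()
        if version >= MAX_VERSION - headroom
    }


# Hand-typed example listing; a real check would read the directory instead.
listing = [
    "TCPIP$SSH_RUN.LOG;32767",
    "TCPIP$SSH_RUN.LOG;32766",
    "LOGIN.COM;3",
]
print(versions_at_risk(listing))
```

Here only TCPIP$SSH_RUN.LOG is reported, at version 32767. What to do about a flagged file (purge, rename, or leave the sabot in place on purpose) is a separate decision.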
Jouk
2015-12-18 19:23:43 UTC
Post by Stephen Hoffman
Post by David Froble
Well, if a VMS reboot solves the problem, for a time, then it sure
isn't version overrun, because a VMS reboot does nothing for that
problem.
It shouldn't. But it wouldn't surprise me to see deletion logic
lurking in some product startup.
Some ~forty years on, OpenVMS and its layered products still stink at
handling file versions. There's no API for this, either.
Only a handful of products deal with this case specifically, some tip
over, and more than a few folks use an intentionally-created ;32767 as a
sabot tossed into the version machinery.
The version overrun is not the problem: I deleted the last and only
version and it just created a new version ;1 with the "access violation"
dump in it.
Paul Sture
2015-12-18 19:22:45 UTC
Post by Stephen Hoffman
Post by David Froble
Well, if a VMS reboot solves the problem, for a time, then it sure
isn't version overrun, because a VMS reboot does nothing for that
problem.
It shouldn't. But it wouldn't surprise me to see deletion logic
lurking in some product startup.
Some ~forty years on, OpenVMS and its layered products still stink at
handling file versions. There's no API for this, either.
Only a handful of products deal with this case specifically, some tip
over, and more than a few folks use an intentionally-created ;32767 as a
sabot tossed into the version machinery.
One of the grounds for intentionally creating version ;32767 is where a
software product itself offers no sensible means to switch off a spew of
unwanted log files. Log file verbosity is another issue here, as is
the ability to read logs with a long lifetime (file locking options).
--
An invention needs to make sense in the world in which it's finished,
not the world in which it's started. -- Ray Kurzweil
Stephen Hoffman
2015-12-18 20:04:51 UTC
Post by Paul Sture
One of the grounds for intentionally creating version ;32767 is where a
software product itself offers no sensible means to switch off a spew
of unwanted log files. Log file verbosity is another issue here, as is
the ability to read logs with a long lifetime (file locking options).
Ayup; have used that sabot myself.

Mechanisms such as application logging, operator communications, and
application crash handling and related tasks are all absurdly primitive
on OpenVMS.
--
Pure Personal Opinion | HoffmanLabs LLC
j***@yahoo.co.uk
2015-12-18 22:35:27 UTC
Permalink
Post by Stephen Hoffman
Post by Paul Sture
One of the grounds for intentionally creating version ;32767 is where a
software product itself offers no sensible means to switch off a spew
of unwanted log files. Log file verbosity is another issue here, as is
the ability to read logs with a long lifetime (file locking options).
Ayup; have used that sabot myself.
Mechanisms such as application logging, operator communications, and
application crash handling and related tasks are all absurdly primitive
on OpenVMS.
--
Pure Personal Opinion | HoffmanLabs LLC
On some future occasion (I won't have time for a while) perhaps
you could share with us what you see as missing from app logging,
operator comms, etc, and what you currently see as best in class.

I've used VMS for 3 decades and Windows and Linux for 2 decades each,
and had a spattering of exposure to Tru64 and Solaris. This has been
as a mixture of application developer, sysadmin, and occasional
dragged-in troubleshooter on the three platforms.

At system level, the one which I find contains the most useful
information (most relevant, most trustworthy) has for years been
VMS.

The Windows event logger is largely a shiny joke, anything useful
is (a) probably not in there (b) if it is there it's probably
impossible to find unless you already know where to look
(c) probably undocumented if you do find it. The UI is nice
and shiny though, and (marginally related) the resource monitor
is really quite neat sometimes.

Linux syslog might be marginally better. Doubtless systemd will
have the logging capability to end all logging capabilities in
due course.

I know even less about Apple. Maybe they're best in class?

Like I say, no rush.
Stephen Hoffman
2015-12-19 16:23:22 UTC
Permalink
Post by Stephen Hoffman
Mechanisms such as application logging, operator communications, and
application crash handling and related tasks are all absurdly primitive
on OpenVMS.
On some future occasion (I won't have time for a while) perhaps you
could share with us what you see as missing from app logging, operator
comms, etc, and what you currently see as best in class.
...
Like I say, no rush.
There's massive room for improvement here. What's here is state of the
art, circa 1980s.

Off the top...

There's no remote and no centralized collection of logs (OS X
Console.app is a dumb-as-a-post approach for viewing all of the logs,
and utterly blows away what OpenVMS offers here), it's all text-based
and requiring filters to parse (and no filters) for alerts (yes, there
are some trade-offs with binary encoding), audits go in one place,
accounting data goes in another, and OPCOM and OPERATOR.LOG gets
who-knows-what, errors get logged into yet another place, crashes and
dumps go... somewhere, logs go... everywhere, the extensibility of
these current logging mechanisms is somewhere between limited and
baroque, there's no way to automate responses and no generic system
operator API (no, the schtick of creating a terminal and using a
terminal mailbox or using pseudo-terminals isn't a generic API).

There's no central collection of performance data, or licensing data,
configuration and configuration-change auditing, or whether the backups
ran to completion, for that matter.

No tie-ins with security alarms, no mechanism for triggering actions on
certain events — application- or site-specific access control list
routines — save by turning on a whole lot and plugging into and
snorting the entire auditing activity.

Then there's that you have to go dink around with any collection
processes you need. There's no syslog/syslog-ng client or server
support. No VSI-supported way to have a log server. (I'm not married
to syslog-ng et al, those just happen to be the most common choices,
and OpenVMS is going to have to interface with those and with tools
such as Splunk, if VSI wants to play in the same data centers as Linux
and BSD and Windows.)
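
For readers who have not seen what a syslog client amounts to, here is a minimal Python sketch using the standard library's `logging.handlers.SysLogHandler` to forward records to a collector over UDP. It illustrates the general mechanism on a platform that has it, not anything that exists on OpenVMS; the helper name `make_remote_logger` and the host and port are assumptions.

```python
import logging
import logging.handlers


def make_remote_logger(name, host, port):
    """Return a logger that forwards its records to a syslog collector via UDP.

    host/port identify the (assumed) central log server.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # SysLogHandler defaults to UDP datagrams when given a (host, port) tuple.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(
        logging.Formatter("%(name)s: %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

An application would call `make_remote_logger("myapp", "loghost.example", 514)` once and then use the ordinary `logger.warning(...)` calls; the collector decides where the data ends up.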

There's no mechanism to upload application and system crashes
(PKE-encrypted, opt-in, yada yada) to detect patterns of problems.
Either locally, or in conjunction with the support vendor. CLUE CRASH
and what used to be known as CCAT was pretty good for its time, but now
this collection is commonplace. If HPE or VSI gets bombarded by ssh
server daemon crashes from all over the place, or gets crashes from one
particular combination of software and hardware, maybe they learn
slightly more quickly that there's a general problem or
incompatibility, for instance. Or maybe an OpenVMS-specific security
attack is underway, for that matter. And get a head-start on a fix.
(This does mean being able to sort through a not-inconsequential volume
of incoming data. The back-end here is as important as all the probes,
and it's more complex.)

There's no automatic log clean-up. Duh. Logs get written. Duh. Logs
get rolled over. Duh. Sooner or later, you must either delete the
less-interesting logs, or you fill your disk(s) and you die. Duh.
This is not rocket science.
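
As a rough illustration of how little is needed, here is a minimal Python sketch of a size-budget clean-up: delete the oldest log files until the directory fits under a limit. The function name `prune_logs`, the `*.log` pattern and the budget are assumptions for the example, and a real policy would probably also weigh how interesting each log is.

```python
from pathlib import Path


def prune_logs(directory, max_total_bytes, pattern="*.log"):
    """Delete the oldest matching files until their total size fits the budget.

    Returns the list of paths that were deleted, oldest first.
    """
    # Oldest first, judged by modification time.
    files = sorted(Path(directory).glob(pattern),
                   key=lambda p: p.stat().st_mtime)
    total = sum(p.stat().st_size for p in files)
    deleted = []
    for path in files:
        if total <= max_total_bytes:
            break  # already within budget, keep everything newer
        total -= path.stat().st_size
        path.unlink()
        deleted.append(path)
    return deleted
```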

Operator communications also include notifications of available
patches, and (opt-in, yada yada) these patches get downloaded and
staged locally. Or the system administrator gets the patch release
notes and a URL that allows them to purchase the patch(es) or to
purchase support online, if they're not already under contract.

Add in the ability to stage patches and software updates on a local
server, and push those out and install them across a herd of servers
from some up-rated and secured management interface — AMDS/AvailMan/OMS
etc, and pretty soon you start looking like actual distributed server
management capabilities.

If anybody ever actually gets serious about OpenVMS security, new root
certificates and certificate revocations need to get pushed out, too.
Ways to push out security data and related settings, such as telling
the (entirely non-existent) OpenVMS firewall to block a range of ports
or a particular internal server that's been causing problems in some
other part of the environment — think of network-wide break-in evasion,
with the firewalls tied to the web servers tied to the ssh servers.

Then there's searching. OpenVMS SEARCH is stone-knives-and-bearskins
primitive. Why do I mention SEARCH in the context of system and
application logging? Because one of the things you do with logs is
search them. And you want those searches to be fast, fast, fast,
fast, fast. OpenVMS SEARCH is anything but. It's reliable, but it's
limited and entirely metadata-blind and completely unintegrated and
un-embeddable and it's glacially slow.
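
A minimal Python sketch of what a metadata-aware search could look like, as opposed to matching strings in free text. It assumes log records stored one JSON object per line, which is my assumption for the example and not something any OpenVMS component produces; the field names and `search_records` are hypothetical too.

```python
import json


def search_records(lines, **criteria):
    """Yield parsed log records whose fields equal every given criterion.

    lines: iterable of strings, each holding one JSON object.
    criteria: field=value pairs that must all match exactly.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if all(record.get(key) == value for key, value in criteria.items()):
            yield record
```

Calling `search_records(open_file, facility="ssh", severity="error")` returns only the matching records, without a pattern accidentally matching the word "error" inside some unrelated message text.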

Then we get into provisioning and automated management. Which is what
you do, once you get logging under control, and when you finally flush
the idea of individually managing bespoke server configurations from
your cranium. Now you need ways to push out your unified logging — as
well as your other server and application requirements — and preferably
pushing out your logging and auditing profiles in some sort of
OpenVMS-generic format.

OpenVMS has an epic pile of ad-hoc features and bags grafted onto the
environment — much like Microsoft Windows, in that regard — and with
seemingly no idea and no plan and no design and no user interface for
where any of this stuff is going, or how it'll be used, nor how it'll
be automated and scaled up. All of which annoys me to no end. Once
you see systems that do have some sort of general idea and some
consistency, the holes in ad-hoc designs become much more obvious.
But I digress.

Anybody that can run a compiler can build crappy auditing and can log
random stuff in files in servers, and can dump on themselves all over
their file system. Rolling up this blizzard of data into something
useful, and particularly as you start rolling out more than the
ones-and-twos server deployments most OpenVMS sites are used to? That
gets ugly, with OpenVMS.

The paramount implementation of logging on OpenVMS? Is that some
combination of
ELV/DECevent/WBEM/SEA/WBES/ELMC/x86-64tools/ErrorToolDuJour and the
long-dead VCS VAXcluster console, some home-grown syslog-ng clients
piping a subset of the data to a text-based scanner or Splunk on some
remote box, and — who could forget? — the completely unencrypted SNMPv2
traffic via the wildly-insecure SMH tool?

Some OS features are part of the core foundation of an OS, and are used
and are integrated across OS tools and applications. Logging and log
management and monitoring is part of the foundation. Searching is a
key part, too. Encryption and certificates and password storage.
Backups. There are other areas.

Then there's what passes for logging in more than a few of "enterprise"
applications and the trend toward ever-crappier user interfaces and
logging and troubleshooting — whether through a lack of design and
thought, or by simply not looking at whether spending time on the
OpenVMS-original 1980s-vintage user interface is really the right
approach to be using in 2015 or 2020.

Logging on OpenVMS is crayon-grade. There are other issues, limits
and potentially-massive improvements lurking here, too.

VSI is occupied for ~three years, save for incremental upgrades and the
port. Which means the earliest we see this stuff starting to get used
is probably 2020. What will server and private-cloud logging look
like in 2020? It won't look like the 1980s.
--
Pure Personal Opinion | HoffmanLabs LLC
Johnny Billquist
2015-12-19 17:02:00 UTC
Permalink
Post by Stephen Hoffman
Post by Stephen Hoffman
Mechanisms such as application logging, operator communications, and
application crash handling and related tasks are all absurdly
primitive on OpenVMS.
On some future occasion (I won't have time for a while) perhaps you
could share with us what you see as missing from app logging, operator
comms, etc, and what you currently see as best in class.
...
Like I say, no rush.
There's massive room for improvement here. What's here is state of the
art, circa 1980s.
I'm going to be obnoxious again.
While I totally agree that what VMS has is lacking, the fact is that in
general, Unix systems are not much better, and that includes OS-X.
Yes, you have an API for logging, which is good, and VMS should have one
as well. Unfortunately, not everything in Unix uses those APIs, so you
always find logs done in different ways, placed in other places as
well. And the API, while good, does not actually prevent the
proliferation of different log files. syslogd just acts as a turnstile.
Logs are still getting written to different files in different
directories, and sometimes on different machines. And the format of
those logs can still be a confusing mess. And they are still, generally,
just text files. Meaning filtering has to be done on free-flowing text.
And yes, that includes OS-X. Just check /var/log/...
OS-X is just like all the other Unix systems, but with a bunch of tools
on top, which makes it look polished as long as you don't pop the hood.
And, in fact, when Apple became creative, with binary information
instead of text files, it became an even uglier mess with the proplists,
which no one really wants to deal with.

So, in truth, there isn't really any system that can be used as an
example of how it should be done today. Definitely do not look too much
towards Unix here, or it will just be a new morass. Unix is older than
VMS, and sometimes it really shows. Mostly in how inconsistent most
things are.
There are good bits, and bad bits, but the worst bit is that nothing is
the same as the next thing.
Post by Stephen Hoffman
There's no automatic log clean-up. Duh. Logs get written. Duh. Logs
get rolled over. Duh. Sooner or later, you must either delete the
less-interesting logs, or you fill your disk(s) and you die. Duh. This
is not rocket science.
Um. If you talk about what Unix does, then VMS has that ability
already. Unix likes to write logs, and from time to time rename them,
appending a number at the end, and delete the oldest one.
Gee, isn't that exactly what file version numbers already do for you,
and you can set a version limit with automatic deletion of old versions?

So no, I disagree with you here, Hoff. VMS already can do exactly the
same as Unix systems do with a bunch of tools.
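
For comparison, here is a minimal sketch of the Unix-style scheme described above (rename, append a number, drop the oldest), using Python's standard `logging.handlers.RotatingFileHandler`. The helper name `make_rotating_logger` and the sizes are assumptions; `backupCount` plays roughly the role a version limit plays on VMS.

```python
import logging
import logging.handlers


def make_rotating_logger(path, max_bytes, keep):
    """Return a logger writing to `path`, rotated by size.

    When the file would exceed max_bytes it is renamed to path.1
    (older copies shift to .2, .3, ...) and at most `keep` numbered
    copies are retained; the oldest one is deleted.
    """
    logger = logging.getLogger(f"rotating:{path}")
    logger.setLevel(logging.INFO)
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=keep
    )
    logger.addHandler(handler)
    return logger
```

However many records are written, the directory never holds more than the live file plus `keep` numbered copies.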

Or did you have something else in mind, where some magic cleaning
happens based on some more clever base than just time?


For the rest of it (patching, upgrading, security and the other items
you mention), I can only agree.

Johnny
--
Johnny Billquist || "I'm on a bus
|| on a psychedelic trip
email: ***@softjar.se || Reading murder books
pdp is alive! || tryin' to stay hip" - B. Idol
Kerry Main
2015-12-20 17:32:35 UTC
Permalink
-----Original Message-----
Johnny Billquist via Info-vax
Sent: 19-Dec-15 12:02 PM
Subject: Re: [New Info-vax] OT: best in class system+application logging,
crash dumps etc
Re: log files management

One should also keep in mind that there are 3rd party management
products that do this extremely well and are multi-platform, which is
likely what a typical med-large DC environment would prefer anyway.
Especially when there is event filtering (integrated with NOC/ECC?),
log backup/ offsite archiving, common log mgmt. GUI across platforms
etc.

[note for future - Customers looking to automate their DC M&M (mgmt.
and monitoring) are also looking to use big data products for log file
analysis. When there are hundreds/thousands of P/V OS instances on
many different platforms, one can see why they are looking at these
big data tools.]

A few console mgmt. samples:
- TDI
http://www.tditechnologies.com/products/openvms-solutions

- Cockpit Manager
http://www.bgsoftware.nl/www1/index.php/legacy/cockpitmgr

- CA Console Management
http://www.ca.com/~/media/Files/ProductBriefs/ca-console-management-for-openvms.PDF


Regards,

Kerry Main
Kerry dot main at starkgaming dot com
V***@SendSpamHere.ORG
2015-12-20 23:38:24 UTC
Permalink
{...snip...}
[note for future - Customers looking to automate their DC M&M (mgmt.
What about customers running on AC? :P
--
VAXman- A Bored Certified VMS Kernel Mode Hacker VAXman(at)TMESIS(dot)ORG

I speak to machines with the voice of humanity.
Kerry Main
2015-12-21 01:01:55 UTC
Permalink
-----Original Message-----
VAXman---- via Info-vax
Sent: 20-Dec-15 6:38 PM
Subject: Re: [New Info-vax] OT: best in class system+application logging,
crash dumps etc
{...snip...}
[note for future - Customers looking to automate their DC M&M
(mgmt.
What about customers running on AC? :P
You mean the ones that swing both ways?

:-)

Regards,

Kerry Main
Kerry dot main at starkgaming dot com
David Froble
2015-12-19 20:38:46 UTC
Permalink
Well, I could just say, "same old rant", but perhaps that's a bit unfair.

My first question is, just what logging are you discussing? I can see that OS
features should have a good method to track activity and exceptions. Well, some
of them, and tailored to a particular user's needs.

But let's look at a few things.

BACKUP:

It would be my opinion that this is site specific. Now, I seem to have some
small concept of backups, and how to set them up, and how to ensure they
complete successfully. It's not hard, and reporting of exceptions to a central
entity isn't all that hard. The best exception report is no report, if your
procedures are set up correctly. This doesn't relieve the site of the
responsibility of testing backups now and then.

I don't think I'd want some canned procedure, which may not fit my needs.


DISK errors and such:

Yes, please, much more and much better ....


APPLICATIONS:

Real simple. You (plural) don't know my applications. You don't know my needs.
You don't need to be involved in my applications. You'll just make things worse.

Now, that said, if the application developer doesn't set up appropriate logging
and such, then it's a bad application. Definitely not an OS issue, since no OS
can or will do better.

So, it's not all black and white. In Codis, we know when something goes astray,
and when I say "we", I mean Consolidated Data, not just the users and customers.


VMS:

On the other hand, due to inadequacies in the HW and OS, we don't know when a
mirrored disk array has a problem, until the last disk goes down. Sure, there
is a tool to look at the status of the RAID arrays, an interactive tool. That's
crap! There should be tools that can automate the monitoring of the RAID
arrays. Simple tools. And when there is an exception, the StarTrek "Red Alert"
sound should play, at least in the computer room, and the system manager's
office, if not company wide. Perhaps some electrodes in the system manager's
chair would also be appropriate.

We have actually had the situation where someone goes into the computer room,
and notices some red lights where there should not be any red lights. Then we
get to wonder, how long has that disk been malfunctioning?
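
A minimal Python sketch of the kind of simple, automatable check asked for above. The input format is entirely made up for illustration (one "unit: state" pair per line); it is not the output of any real RAID utility, and `degraded_units` is a hypothetical name. The point is only that once status is available as text or data, flagging exceptions takes a few lines.

```python
def degraded_units(status_text, healthy_states=("OK",)):
    """Return (unit, state) pairs for every unit that is not healthy.

    status_text: lines of the form 'unit: state' (a made-up format).
    healthy_states: upper-case states that need no attention.
    """
    problems = []
    for line in status_text.splitlines():
        if ":" not in line:
            continue  # skip headers and blank lines
        unit, state = (part.strip() for part in line.split(":", 1))
        if state.upper() not in healthy_states:
            problems.append((unit, state))
    return problems
```

A scheduled job could run this against each status report and raise the alarm whenever the returned list is not empty, instead of waiting for someone to notice a red light.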
Stephen Hoffman
2015-12-19 21:35:52 UTC
Permalink
Post by David Froble
It would be my opinion that this is site specific. Now, I seem to have
some small concept of backups, and how to set them up, and how to
insure they complete successfully. It's not hard, and reporting of
exceptions to a central entity isn't all that hard. The best exception
report is no report, if your procedures are set up correctly. This
doesn't relieve the site of the responsibility of testing backups now
and then.
I don't think I'd want some canned procedure, which may not fit my needs.
You are aware that BACKUP uses OPCOM to provide (some) information on
the status of BACKUP operations? Not all that well, but it does.

As for "some canned procedure", if the tool or utility got my data out
and back in again reliably, I'd look at using it. One less thing to
deal with.

I'd be happy to have something that melded Tru64 Unix Logical Storage
Manager, SCACP, SAS$UTIL, MSA$UTIL, BACKUP$MANAGER, ELV and other
related giblets, and I suspect more than a few folks would look to use
it, too. Rather than the usual and current solution, which is
scattered all over the place, and buried in startups and buried under
piles of home-grown DCL procedures invoked from home-grown task
schedulers with home-grown error logging, and with recovery and
restoration procedures — and logging procedures — that may or may not
still work.
Post by David Froble
Real simple. You (plural) don't know my applications. You don't know
my needs. You don't need to be involved in my applications. You'll
just make things worse.
Now, that said, if the application developer doesn't set up appropriate
logging and such, then it's a bad application. Definitely not an OS
issue, since no OS can or will do better.
So, it's not all black and white. In Codis, we know when something
goes astray, and when I say "we", I mean Consolidated Data, not just
the users and customers.
So you don't use the standard mechanisms and/or don't centrally log
your data and/or don't use some of the available analysis and reduction
tools, and/or you have your own logging. Entirely your call. Good on
you. Go for it.

While you might not need these or other tools, maybe you can see to the
place where new applications and new customers might? Because if
OpenVMS doesn't start attracting new folks and new applications,
OpenVMS and VSI are gone. Rolling your own logging is a slog,
particularly as you add filtering and triggers and related tasks.
Yes, it can be done, but — like rolling your own database — a task that
you're clearly quite familiar with — now you own the upkeep and
enhancements associated with that effort, too. Or cases where — if
your business increased by an order of magnitude, or the scale of the
data increased by an order, or the numbers of servers involved
increased by an order — now you need to really review how your logging
or your database is done.

There are and always will be cases where you can and will want your own
bespoke logging, or your own bespoke database.

But I'd wager there are many cases where you don't, or where the
particular application you're working with doesn't warrant spending
your time on adding it, but it'd still be handy to feed the log data or
the crash data into your central servers for monitoring.

Storage management and storage errors and error logging and backups and
related are all aspects of the same OS foundation. If the
applications want or need to roll their own logging, go for it.
Many folks will want to use the same tools for all of the data, whether
that's on OpenVMS via some new mechanism or maybe via (for instance)
Splunk or Apache Flume, and process the server and error and backup and
application data through the same mechanisms.

Because none of this is about the next three or five years. That's
already happened, in terms of software development life-cycles. It's
about the five or ten years after that. It's about new folks and new
applications. Yes, VSI has to keep the existing folks happy. But —
if the historical trends are any guide — that's not going to be enough.
--
Pure Personal Opinion | HoffmanLabs LLC
Joukj
2015-12-18 14:33:07 UTC
Permalink
Post by Joukj
debug(18-DEC-2015 11:38:03.49): Ssh2Common/SSHCOMMON.C:180: DISCONNECT
received: Connection closed by remote host.
More precisely, it makes connections, but it aborts them
prematurely?
My guess also would be some resource problem on the
server. Especially if the SSH-stop-start doesn't help, but
system-stop-start does. Does it fail this way for all users?
All users.
Post by Joukj
TCPIP$SYSTEM:TCPIP$SSH_RUN.COM looks at some logical names
(like, say, "tcpip$ssh_server_debug"), which might add some
info. (Or edit the script directly.)
That gives some extra info. No idea what to learn from it:

$ Set NoOn
$ VERIFY = F$VERIFY(F$TRNLNM("SYLOGIN_VERIFY"))
debug(18-DEC-2015 15:31:31.40): SshEventLoop/SSHUNIXELOOP.C:810:
Registered signal 13.
debug(18-DEC-2015 15:31:31.40): SshEventLoop/SSHUNIXELOOP.C:810:
Registered signal 1.
debug(18-DEC-2015 15:31:31.40): SshEventLoop/SSHUNIXELOOP.C:810:
Registered signal 20.
debug(18-DEC-2015 15:31:31.40): Sshd2/SSHD2.C:3575: CRTL version
(SYS$SHARE:DECC$SHR.EXE ident) is ELF 
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual
address=0000000000000014, PC=00000000000B0B90, PS=0000001B
etc....
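
For anyone reading along, the fields in that ACCVIO line can be pulled apart mechanically. Below is a minimal Python sketch; the function names and the 64 KB threshold are my own assumptions, nothing that ships with TCP/IP Services. A faulting virtual address as low as 0x14 is what one would typically expect from code dereferencing a null pointer plus a small structure offset, but that is only a heuristic, not a diagnosis.

```python
import re

# Hypothetical helper: split an OpenVMS ACCVIO message into its hex fields.
ACCVIO_RE = re.compile(
    r"reason mask=(?P<mask>[0-9A-Fa-f]+),\s*"
    r"virtual\s+address=(?P<va>[0-9A-Fa-f]+),\s*"
    r"PC=(?P<pc>[0-9A-Fa-f]+),\s*"
    r"PS=(?P<ps>[0-9A-Fa-f]+)"
)


def parse_accvio(text):
    """Return the fields of an ACCVIO message as integers, or None."""
    # Log lines are often wrapped, so collapse all whitespace runs first.
    flat = " ".join(text.split())
    match = ACCVIO_RE.search(flat)
    if match is None:
        return None
    return {name: int(value, 16) for name, value in match.groupdict().items()}


def looks_like_null_deref(fields, limit=0x10000):
    """Heuristic (assumed threshold): a very low faulting address
    suggests a null pointer plus a small offset."""
    return fields["va"] < limit
```

Feeding it the message above gives a faulting address of 0x14 (decimal 20) and a PC of 0xB0B90, which is at least a starting point for whoever ends up looking at the image.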
Stephen Hoffman
2015-12-18 15:05:21 UTC
Permalink
Post by Joukj
All users.
Are your BG devices rolling past 10K units?
--
Pure Personal Opinion | HoffmanLabs LLC
Joukj
2015-12-18 15:42:05 UTC
Permalink
Post by Stephen Hoffman
Post by Joukj
All users.
Are your BG devices rolling past 10K units?
valeta-jj) sh dev bg

Device       Device     Error
 Name        Status     Count
BG0:         Mounted        0
BG1:         Mounted        0
BG36:        Mounted        0
BG41:        Mounted        0
BG44:        Mounted        0
BG49:        Mounted        0
BG51:        Mounted        0
BG52:        Mounted        0
BG54:        Mounted        0
BG57:        Mounted        0
BG61:        Mounted        0
BG70:        Mounted        0
BG71:        Mounted        0
BG2957:      Mounted        0
BG3023:      Mounted        0
BG4093:      Mounted        0
BG7092:      Mounted        0
BG7096:      Mounted        0
BG7113:      Mounted        0
BG34209:     Mounted        0
BG42901:     Mounted        0
BG44757:     Mounted        0
BG44758:     Mounted        0
BG58360:     Mounted        0
BG58372:     Mounted        0
BG58410:     Mounted        0

Looks like it.... What problem does that give?
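
Checking a long listing like this by eye gets tedious, so here is a rough Python sketch that pulls the unit numbers out of captured SHOW DEVICE BG output and lists the ones past 9999. The function names and the idea of saving the output to a file first are my own assumptions for illustration.

```python
import re

# Matches device names such as "BG58410:" at the start of a line.
BG_RE = re.compile(r"^\s*BG(\d+):", re.IGNORECASE | re.MULTILINE)


def bg_units(show_device_output):
    """Return the sorted BG unit numbers found in SHOW DEVICE BG output."""
    return sorted(int(number) for number in BG_RE.findall(show_device_output))


def units_past(units, limit=9999):
    """Return the unit numbers above the given limit (default: past 10K)."""
    return [unit for unit in units if unit > limit]
```

Run over the listing above, `units_past(bg_units(text))` would return the seven units from BG34209 up to BG58410.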


Jouk
Stephen Hoffman
2015-12-18 16:26:56 UTC
Permalink
Post by Joukj
Post by Stephen Hoffman
Post by Joukj
All users.
Are your BG devices rolling past 10K units?
BG58410: Mounted 0
Looks like it.... What problem does that give?
Donno yet.

There is clearly a latent bug in the ssh server.

Whether the bug is related to the 10K units support changes, I don't know.

The 10K changes did tip over a few applications. (Part of that same
exposed-kernel-structures mess that's being discussed over in the FIDs
thread. Switching to OO and messaging on these APIs would allow more
of these details to become opaque. But I digress.)

Anybody here have a working TCP/IP Services V5.7 ECO5 ssh server, with
BG devices past 10K units?

Any access violations in security-relevant processes should be setting
off klaxons at HPE and VSI.

Reversing the code around the ACCVIO would be the next step, to see
what it's doing (wrong); either at HPE and/or VSI, and/or locally.

This is part of why VSI needs to start collecting crash data from
everybody. Opt-in only, yada yada...
--
Pure Personal Opinion | HoffmanLabs LLC
Jouk
2015-12-18 19:24:54 UTC
Permalink
Post by Stephen Hoffman
Post by Joukj
Post by Stephen Hoffman
Post by Joukj
All users.
Are your BG devices rolling past 10K units?
BG58410: Mounted 0
Looks like it.... What problem does that give?
Donno yet.
There is clearly a latent bug in the ssh server.
Whether the bug is related to the 10K units support changes, I don't know.
The 10K changes did tip over a few applications. (Part of that same
exposed-kernel-structures mess that's being discussed over in the FIDs
thread. Switching to OO and messaging on these APIs would allow more of
these details to become opaque. But I digress.)
Anybody here have a working TCP/IP Services V5.7 ECO5 ssh server, with
BG devices past 10K units?
I do not think it is the 10K unit numbers: I also see unit numbers
past 10K on the satellites that still work.

Jouk
Steven Schweda
2015-12-18 19:32:33 UTC
Permalink
Post by Stephen Hoffman
Anybody here have a working TCP/IP Services V5.7 ECO5 ssh
server, with BG devices past 10K units?
alp $ tcpip show vers

HP TCP/IP Services for OpenVMS Alpha Version V5.7 - ECO 5
on a COMPAQ Professional Workstation XP1000 running OpenVMS V8.4

alp $ show devi bg /outp = bg1.out
alp $ ssh alp-l
[...]
Welcome to VMS (Alpha) V8.4 on ALP.
[...]
alp $ show devi bg /outp = bg2.out
alp $ gdiff bg1.out bg2.out
236a237,243
> BG15619: Mounted 0
> BG15624: Mounted 0
> BG15627: Mounted 0
> BG15628: Mounted 0
> BG15632: Mounted 0
> BG15634: Mounted 0
> BG15637: Mounted 0
[Some of those are client, some server, I assume.]

alp $ logout
SMS logged out at 18-DEC-2015 13:00:44.40
Connection to alp-l closed.

alp $ show devi bg /outp = bg3.out
alp $ gdiff bg1.out bg3.out
alp $

They're all gone now.
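
The same before/after comparison can be done without gdiff. A minimal Python sketch, assuming the two SHOW DEVICE BG listings have been saved as text; the function names are made up for illustration:

```python
def bg_names(snapshot):
    """Collect BG device names (e.g. 'BG15619') from SHOW DEVICE BG output."""
    names = set()
    for line in snapshot.splitlines():
        fields = line.split()
        if not fields:
            continue
        token = fields[0].upper()
        # Accept only tokens of the form BG<digits>: and drop the colon.
        if token.startswith("BG") and token.endswith(":") and token[2:-1].isdigit():
            names.add(token[:-1])
    return names


def compare_snapshots(before, after):
    """Return (created, released) device names between two snapshots,
    each sorted by unit number."""
    old, new = bg_names(before), bg_names(after)

    def by_unit(name):
        return int(name[2:])

    return sorted(new - old, key=by_unit), sorted(old - new, key=by_unit)
```

If the "released" list stays empty after a session has ended, that would point at sockets being left behind, which is the sort of leak worth ruling out here.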
Post by Stephen Hoffman
$ Set NoOn
$ VERIFY = F$VERIFY(F$TRNLNM("SYLOGIN_VERIFY"))
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual
address=0000000000000014, PC=00000000000B0B90, PS=0000001B
That's quick. Around here:

$ Set NoOn
$ VERIFY = F$VERIFY(F$TRNLNM("SYLOGIN_VERIFY"))
Fri 18 12:59:30 INFORMATIONAL: Starting image in auxiliary server mode.
Fri 18 12:59:30 INFORMATIONAL: connection from "10.0.0.9"
[...]

You might want to add some diagnostics to
TCPIP$SSH_RUN.COM, to make sure that you know what's
exploding, but it's sure early.
Post by Stephen Hoffman
I suffer from script-kids all the time. [...]
I configured my router to forward a port other than 22
(instead of 22) to port 22 on my VMS Alpha system.
Internally, the default port (22) is still used everywhere
(so commands stay simple), but, from outside, you need to
specify the correct (other) port. That seems to have stopped
practically all the break-in attempts. Similar for Telnet
and 23.
Post by Stephen Hoffman
Any access violations in security-relevant processes should
be setting off klaxons at HPE and VSI.
Lately, I've been trying to use Mac Mail to read MIMEish
e-mail. One result is that, from time to time, SYSTEM gets
messages like the following:

#70 18-DEC-2015 08:14:34.36
MAIL
From: ALP::TCPIP$IMAP
To: SYSTEM
CC:
Subj: ALP - %SYSTEM-F-ACCVIO, access violation, reason mask=!XB, virtual address=!XH, PC=!XH, PS=!XL


The TCP/IP IMAP server has experienced a runtime error. The reason
for the error should appear on the subject line of this message.

Please investigate this problem as quickly as possible.
Thank you.


To a know-nothing like me, it is not entirely clear exactly
what I should be investigating, or how. But this is probably
the wrong thread in which to pursue it. Great message,
though.
Jan-Erik Soderholm
2015-12-18 23:24:15 UTC
Permalink
Post by Stephen Hoffman
Post by Joukj
Post by Stephen Hoffman
Post by Joukj
All users.
Are your BG devices rolling past 10K units?
BG58410: Mounted 0
Looks like it.... What problem does that give?
Donno yet.
There is clearly a latent bug in the ssh server.
Whether the bug is related to the 10K units support changes, I don't know.
The 10K changes did tip over a few applications. (Part of that same
exposed-kernel-structures mess that's being discussed over in the FIDs
thread. Switching to OO and messaging on these APIs would allow more of
these details to become opaque. But I digress.)
Anybody here have a working TCP/IP Services V5.7 ECO5 ssh server, with BG
devices past 10K units?
I have a V5.7 ECO3 system with BG devices well over 10K.
Was that supposed to have been included in ECO5!?

Btw, SSH works OK and gets BG devices > 30K.

$ tcpip sh dev
...
bg32910 STREAM 56190 22 127.0.0.1
bg32911 STREAM 22 56190 SSH 127.0.0.1
...

At least using localhost, if it matters...

Jan-Erik.
David Froble
2015-12-18 17:14:52 UTC
Permalink
Post by Joukj
Hi All
For some reason on my OpenVMS8.4 (AXP+IA64) and TCP/IP services V5.7 -
ECO 5, once in a while the SSH-server stops making connections. On the
$ ssh -v valeta
debug(18-DEC-2015 11:38:03.32): Ssh2/SSH2.C:1896: CRTL version
(SYS$SHARE:DECC$SHR.EXE ident) is ELF
Allocating global SshRegex context.
debug(18-DEC-2015 11:38:03.34): SshConfig/SSHCONFIG.C:3482: Metaconfig
parsing stopped at line 4.
debug(18-DEC-2015 11:38:03.35): SshConfig/SSHCONFIG.C:890: Setting
variable 'VerboseMode' to 'FALSE'.
debug(18-DEC-2015 11:38:03.35): SshConfig/SSHCONFIG.C:3390: Unable to
open ssh2/ssh2_config
debug(18-DEC-2015 11:38:03.36): Connecting to valeta, port 22... (SOCKS
not used)
debug(18-DEC-2015 11:38:03.36): Ssh2/SSH2.C:2881: Entering event loop.
debug(18-DEC-2015 11:38:03.37): Ssh2Client/SSHCLIENT.C:1655: Creating
transport protocol.
SshAuthMethodClient/SSHAUTHMETHODC.C:104: Added "publickey" to usable
methods.
SshAuthMethodClient/SSHAUTHMETHODC.C:104: Added "keyboard-interactive"
to usable methods.
SshAuthMethodClient/SSHAUTHMETHODC.C:104: Added "password" to usable
methods.
debug(18-DEC-2015 11:38:03.37): Ssh2Client/SSHCLIENT.C:1696: Creating
userauth protocol.
'publickey,keyboard-interactive,password'
debug(18-DEC-2015 11:38:03.37): SshUnixTcp/SSHUNIXTCP.C:1758: using
local hostname hrem159.nano.tudelft.nl
debug(18-DEC-2015 11:38:03.37): Ssh2Common/SSHCOMMON.C:541: local ip =
100.100.100.1, local port = 49207
debug(18-DEC-2015 11:38:03.37): Ssh2Common/SSHCOMMON.C:543: remote ip =
100.100.100.2, remote port = 22
debug(18-DEC-2015 11:38:03.37): SshConnection/SSHCONN.C:2584: Wrapping...
Initializing ReadLine...
debug(18-DEC-2015 11:38:03.49): Ssh2Common/SSHCOMMON.C:180: DISCONNECT
received: Connection closed by remote host.
Uninitializing ReadLine...
warning: Authentication failed.
debug(18-DEC-2015 11:38:03.49): Ssh2/SSH2.C:327: locally_generated = TRUE
Disconnected; connection lost (Connection closed by remote host.).
debug(18-DEC-2015 11:38:03.49): Ssh2Client/SSHCLIENT.C:1731: Destroying
client.
debug(18-DEC-2015 11:38:03.49): SshConfig/SSHCONFIG.C:2888: Freeing pki.
(host_pki != NULL, user_pki = NULL)
debug(18-DEC-2015 11:38:03.49): SshConnection/SSHCONN.C:2636: Destroying
SshConn object.
debug(18-DEC-2015 11:38:03.49): Ssh2Client/SSHCLIENT.C:1799: Destroying
client completed.
SshAuthMethodClient/SSHAUTHMETHODC.C:109: Destroying authentication
method array.
debug(18-DEC-2015 11:38:03.52): SshAppCommon/SSHAPPCOMMON.C:326: Freeing
global SshRegex context.
debug(18-DEC-2015 11:38:03.52): SshConfig/SSHCONFIG.C:2888: Freeing pki.
(host_pki = NULL, user_pki = NULL)
If I reboot everything is OK of course. But how can I reset the
SSH-server without rebooting?
I already tried
But that did not solve the problem.
You're stopping and re-starting SSH. If the problem is above that, perhaps in
TCP/IP, re-starting SSH might not help. Try stopping and re-starting TCP/IP.
Jouk
2015-12-18 19:22:15 UTC
Permalink
Post by David Froble
Post by Joukj
Hi All
For some reason on my OpenVMS8.4 (AXP+IA64) and TCP/IP services V5.7
- ECO 5, once in a while the SSH-server stops making connections. On
$ ssh -v valeta
If I reboot everything is OK of course. But how can I reset the
SSH-server without rebooting?
I already tried
But that did not solve the problem.
You're stopping and re-starting SSH. If the problem is above that,
perhaps in TCP/IP, re-starting SSH might not help. Try stopping and
re-starting TCP/IP.
That is what I would like to avoid: another process, which is important
to me, is writing to an external NFS share. I do not want to interrupt
it before it finishes in a few days.

Jouk
David Froble
2015-12-19 01:41:07 UTC
Permalink
Post by Jouk
Post by David Froble
Post by Joukj
Hi All
For some reason on my OpenVMS8.4 (AXP+IA64) and TCP/IP services V5.7
- ECO 5, once in a while the SSH-server stops making connections. On
$ ssh -v valeta
If I reboot everything is OK of course. But how can I reset the
SSH-server without rebooting?
I already tried
But that did not solve the problem.
You're stopping and re-starting SSH. If the problem is above that,
perhaps in TCP/IP, re-starting SSH might not help. Try stopping and
re-starting TCP/IP.
That is what I would like to avoid: another process, which is important
to me, is writing to an external NFS share. I do not want to interrupt
it before it finishes in a few days.
Jouk
Understood; that would defeat your goals just as a VMS reboot would.

The reason I made that suggestion was to possibly determine whether the problem
was in TCP/IP, or perhaps further down in VMS. It's always good to diagnose a
problem one small step at a time.
Joukj
2015-12-21 07:32:42 UTC
Permalink
Post by David Froble
The reason I made that suggestion was to possibly determine whether the
problem was in TCP/IP, or perhaps further down in VMS. It's always good
to diagnose a problem one small step at a time.
I tried to shut down and restart the whole IP stack on one of my
machines. It complained that some of the BG devices still existed, and
it did not solve the ssh-access problem. After restarting the machine
the problem was gone on this node.
