Discussion:
Change of tape drive causing SCSI bus errors on AlphaServer DS20E
Jeremy Begg
2020-10-09 01:12:39 UTC
Hi,

A customer of mine has an AlphaServer DS20E running VMS 7.3-2. There is
NO chance it will ever be upgraded. (In fact this system will be retired
in the first half of next year.)

Until a couple of months ago they were using a DLT8000 tape drive for
the daily backups. That drive died and the replacement supplied by
local HP support was also dead. The customer had a spare SDLT drive
floating around so we connected that and it works, and VMS BACKUP is
happy.

The problem is that VMS is logging hundreds of errors on the SCSI bus
for that drive every day:

WODV02> sh dev/fu mke500

Magtape WODV02$MKE500:, device type COMPAQ SuperDLT1, is online, record-oriented
    device, file-oriented device, available to cluster, error logging is
    enabled, controller supports compaction (compaction enabled), device
    supports fastskip (per_io).

    Error count                    8    Operations completed            30020052
    Owner process                 ""    Owner UIC                       [SYSTEM]
    Owner process ID        00000000    Dev Prot             S:RWPL,O:RWPL,G:R,W
    Reference count                0    Default buffer size                  512
    Density                     SDLT    Format                         Normal-11

    Volume status:  no-unload on dismount, beginning-of-tape, odd parity.

WODV02> sh err
Device                          Error Count
PKA0:                                     1
$10$DKA0:       (WODV02)                  1
$10$DKA300:     (WODV02)                  1
$10$DQA1:       (WODV02)                  1
$10$DQB0:       (WODV02)                  1
$10$DQB1:       (WODV02)                  1
PKE0:                                 25940   <----- WRAPS at 64K every few days
WODV02$TTB0:                              3
WODV02$MKE400:                           67
WODV02$MKE500:                            8
WODV02>
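
Because the PKE0 counter wraps, two SHOW ERROR readings cannot simply be subtracted to get the number of new errors. A minimal Python sketch of the arithmetic, assuming that "wraps at 64K" means a 16-bit counter (modulus 65536) and that the counter wrapped at most once between readings. The function name and every sample reading other than 25940 are hypothetical:

```python
COUNTER_MODULUS = 1 << 16  # assumption: "wraps at 64K" means a 16-bit counter


def errors_between(earlier: int, later: int, modulus: int = COUNTER_MODULUS) -> int:
    """Errors logged between two readings, assuming at most one wrap."""
    return (later - earlier) % modulus


# Hypothetical second reading taken some time after the 25940 shown above.
print(errors_between(25940, 26100))  # 160

# Hypothetical pair of readings that straddles a wrap.
print(errors_between(65500, 100))    # 136
```

If the counter could wrap more than once between readings, the result is only a lower bound, so the readings would need to be taken more often.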

This started happening when the old tape drive (MKE400) was removed and the new
drive (MKE500) was connected in its place. I am assured the SCSI terminator is
in place. (But due to COVID-19 lockdown it's very difficult to get this double-
checked.) There is no other device on that SCSI bus.

The errors are logged even when the tape drive is not in use.

The ERRLOG.SYS entries all refer to the SCSI controller (PKE0) and all include
a textual reference to the SDLT tape drive, for example (and apologies for the
wrapping):

Dump untranslatable event body

00000000 00000000 00003980 00000010 00000000 00000000 00000008 00020001  .....................9..........  00000000 02B7DD1C
570A0000 00000000 0C440000 00010004 00000096 00016559 00000000 00000000  ........Ye............D........W  00000020 02B7DD3C
43000000 00000000 00000000 00000043 44462E30 5F303745 4B502432 3056444F  ODV02$PKE70_0.FDC..............C  00000040 02B7DD5C
03050000 00300000 00000000 00000000 4131544C 44726570 75532051 41504D4F  OMPAQ SuperDLT1A..........0.....  00000060 02B7DD7C
3B200700 00000000 00000000 00000003 00010302 01C00500 FF000000 12060480  .............................. ;  00000080 02B7DD9C
44474552 00000000 00000000 00000000 00000000 00000000 0000814F 3B20814F  O. ;O.......................REGD  000000A0 02B7DDBC
02250770 868B0000 00240820 35812C4A E8020F00 0085000B 04006707 0020D87D  }. ..g..........J,.5 .$.....p.%.  000000C0 02B7DDDC
00000001 96010769 8010DC0E D0040014 8F017582 08602997 00000028 02250778  x.%.(....)`..u..........i.......  000000E0 02B7DDFC
000C1000 B1604526 EAD11997 00000001 AC4A0A00 00000101 85D9EAE0 00000002  ..............J.........&E`.....  00000100 02B7DE1C
570A0000 00000111 00000000 CC02E00C 02021389 00020400 0032322E 312E3256  V2.1.22........................W  00000120 02B7DE3C
020038A0 00000000 2020322D 332E3756 FD2200B5 70704045 4B502432 3056444F  ODV02$PKE@pp..".V7.3-2  .....8..  00000140 02B7DE5C
03050000 00302045 30325344 20726576 72655361 68706C41 20514150 4D4F431F  .COMPAQ AlphaServer DS20E 0.....  00000160 02B7DE7C
010302 01C00500 FF000000 12060480 ...............  00000180 02B7DE9C
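
Each row of a dump like this prints the highest-addressed longword first, so the hex columns read right to left while the ASCII column reads left to right. A minimal Python sketch for cross-checking the ASCII column against the hex, assuming eight complete longwords per row and the usual convention of showing non-printable bytes as a dot. The function name is hypothetical:

```python
def decode_dump_row(hex_longwords: str) -> str:
    """Return the ASCII column for one row of hex longwords.

    Reversing the whole byte string undoes both the right-to-left longword
    order and the most-significant-byte-first printing of each longword.
    """
    raw = bytes.fromhex("".join(hex_longwords.split()))[::-1]
    # Assumption: bytes outside 0x20..0x7E are displayed as '.'
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in raw)


# Hex columns of the row at offset 00000060 in the dump above.
row = "03050000 00300000 00000000 00000000 4131544C 44726570 75532051 41504D4F"
print(decode_dump_row(row))  # OMPAQ SuperDLT1A..........0.....
```

This only recovers the text fragments; it does not interpret the SCSI status fields, which is what a translating tool is needed for.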


Does anyone know of a setting in the tape drive, or possibly for the
operating system, which might reduce the number of errors?

Thanks,

Jeremy Begg
Steven Schweda
2020-10-09 04:29:54 UTC
What is PKE0? Is there a PKF0 nearby which you could try?

Post by Jeremy Begg
[...] There is no other device on that SCSI bus.

Do you get errors when nothing's connected (except, perhaps, a
terminator)?

Post by Jeremy Begg
Does anyone know of a setting in the tape drive, or possibly for the
operating system, which might reduce the number of errors?

Not I. I'd worry about the hardware (cable, termination).
Volker Halle
2020-10-09 04:50:32 UTC
Jeremy,

Try to get these errlog entries analyzed with DECevent ($ DIAGNOSE ...). That tool is much better at translating SCSI device errors.

Volker.
Volker Halle
2020-10-09 11:19:39 UTC
Jeremy,

You could also use $ SET DEVICE/NOERROR_LOGGING PKE0 to prevent those errors from being logged to ERRLOG.SYS.

Volker.
Scott Dorsey
2020-10-09 17:00:08 UTC
Post by Jeremy Begg
This started happening when the old tape drive (MKE400) was removed and the new
drive (MKE500) was connected in its place. I am assured the SCSI terminator is
in place. (But due to COVID-19 lockdown it's very difficult to get this double-
checked.) There is no other device on that SCSI bus.
How many SCSI terminators are in place? Does the drive have internal
termination? Is there a bunch of cable lying loose after the terminator?

Have they swapped cables yet?
--scott
--
"C'est un Nagra. C'est suisse, et tres, tres precis."