Discussion:
When running the OpenVMS Monitor Utility (monitor lock), I see a high value for "ENQs Forced To Wait Rate"
Dann Corbit
2017-03-07 08:24:23 UTC
What are the factors associated with "ENQs Forced To Wait Rate" that cause a large value to occur?

What are the things that can be done to reduce locking contention for a network process?

OpenVMS Monitor Utility
LOCK MANAGEMENT STATISTICS
on node I2VMS
6-MAR-2017 15:43:07.90

                              CUR        AVE        MIN        MAX

New ENQ Rate              5822.33    3239.05       0.00    5840.66
Converted ENQ Rate       18729.66   10346.82       0.00   18781.66

DEQ Rate                  5822.33    3236.59       0.00    5840.66
Blocking AST Rate            0.00       0.00       0.00       0.00

ENQs Forced To Wait Rate  1554.33     847.85       0.00    1658.33
ENQs Not Queued Rate         0.00       1.88       0.00       8.66

Deadlock Search Rate         0.00       0.00       0.00       0.00
Deadlock Find Rate           0.00       0.00       0.00       0.00

Total Locks               1657.00    1665.65    1605.00    1710.00
Bob Gezelter
2017-03-07 08:48:53 UTC
Post by Dann Corbit
What are the factors associated with "ENQs Forced To Wait Rate" that cause a large value to occur?
What are the things that can be done to reduce locking contention for a network process?
Dann,

More information is needed about what your processes are doing with their files and locks.

If I were to hazard a guess, more than one process is attempting to lock the resource. There are other statistics that one can gather (see Keith Parris' response to a similar question at https://groups.google.com/forum/#!msg/comp.os.vms/UA3ecyi6dc4/OwAkQLYWZp8J).

Getting the rate down means reducing the locking conflict. That needs more understanding of the application and workload.

If I have been unclear, please feel free to contact me.

- Bob Gezelter, http://www.rlgsc.com
David Froble
2017-03-07 14:36:28 UTC
Post by Bob Gezelter
Dann,
More information is needed about what your processes are doing with their files and locks.
If I were to hazard a guess, more than one process is attempting to lock the resource. There are other statistics that one can gather (see Keith Parris' response to a similar question at https://groups.google.com/forum/#!msg/comp.os.vms/UA3ecyi6dc4/OwAkQLYWZp8J).
Getting the rate down means reducing the locking conflict. That needs more understanding of the application and workload.
If I have been unclear, please feel free to contact me.
- Bob Gezelter, http://www.rlgsc.com
I agree with Bob. Got to know what the applications are attempting to do.

Is this just use of RMS, or is this a home grown application using the DLM?
Could be required work, or, could be poor implementation.

The conversion rate looks excessively high to me, compared to ENQ rate and total
locks. But impossible to tell without lots more information.
Jan-Erik Soderholm
2017-03-07 20:25:17 UTC
Post by David Froble
I agree with Bob. Got to know what the applications are attempting to do.
Is this just use of RMS, or is this a home grown application using the DLM?
Could be required work, or, could be poor implementation.
The conversion rate looks excessively high to me, compared to ENQ rate and
total locks.
I'd say that it is hard to compare a "rate" value to an "amount" value.
You can have many active locks but few lock operations, or fewer locks
but higher rates against these locks.

The New:Convert ratio above is approx 1:3, the same I'm seeing here (but
lower total values). The ratio between Convert and Forced is also approx
the same here (approx 20:1 but lower values).

On the other hand we have 75+ K locks compared with 1.6 K above, but
I do not think that is relevant.

If this system runs Rdb you can use the RMU/SHOW STAT utility and view
the screens "Summary Locking Statistics", "Locking (one lock type)"
and "Locking (one stat field)". That can give further info about what
locking activity is going on.
Dann Corbit
2017-03-07 23:52:06 UTC
Post by Bob Gezelter
Dann,
More information is needed about what your processes are doing with their files and locks.
If I were to hazard a guess, more than one process is attempting to lock the resource. There are other statistics that one can gather (see Keith Parris' response to a similar question at https://groups.google.com/forum/#!msg/comp.os.vms/UA3ecyi6dc4/OwAkQLYWZp8J).
Getting the rate down means reducing the locking conflict. That needs more understanding of the application and workload.
If I have been unclear, please feel free to contact me.
- Bob Gezelter, http://www.rlgsc.com
More information:
The problem is related to TCP/IP processing.
We have an application that does RMS I/O operations and sends the data to a client requester.

To simplify the problem, we took a simple echo server and modified it to send and receive representative record packages.

When we turn off the TCP/IP send/receive we get stupendous throughput (many tens of thousands of records per second). When we add the TCP/IP send/receive (with everything else intact, including the random and sequential RMS reads) the process slows to 10% of the throughput.

We have 1 Gb (gigabit) network cards, full duplex.
It's not the transfer rate that is the problem because we can read a large file using FTP at a transfer rate that should accomplish our record I/O goals. But the small packets of TCP/IP bog the whole system down enormously.

We have the Nagle algorithm turned off (made very little difference).
We even set the FAB$V_NQL option in the FAB$B_SHR field to request that RMS use no query locking. (made very little difference).

So we can read the RMS file like mad, but when we delay things by sending small TCP/IP packets, it goes right into the crapper.

I should mention that it is an SMP program, with each client getting its own connection.

The application is written in C++.
The sample program is also written in C++.
David Froble
2017-03-08 00:44:22 UTC
Post by Dann Corbit
The problem is related to TCP/IP processing.
We have an application that does RMS I/O operations and sends the data to a client requester.
To simplify the problem, we took a simple echo server and modified it to send and receive representative record packages.
When we turn off the TCP/IP send/receive we get stupendous throughput (many tens of thousands of records per second). When we add the TCP/IP send/receive (with everything else intact, including the random and sequential RMS reads) the process slows to 10% of the throughput.
We have 1 GB network cards, full duplex.
It's not the transfer rate that is the problem because we can read a large file using FTP at a transfer rate that should accomplish our record I/O goals. But the small packets of TCP/IP bog the whole system down enormously.
We have the Nagle algorithm turned off (made very little difference).
We even set the FAB$V_NQL option in the FAB$B_SHR field to request that RMS use no query locking. (made very little difference).
So we can read the RMS file like mad, but when we delay things by sending small TCP/IP packets, it goes right into the crapper.
I should mention that it is a SMP program, with each client getting their own connection.
The application is written in C++.
The sample program is also written in C++.
I'm guessing that it's not a locking problem at all. As far as I know, TCP/IP
does not use the DLM.

With caching, file I/O can be very surprising.

Can we assume that there is no locking and updating occurring?

I've got applications with clients getting connections, asking for and receiving
data, and they are rather quick. They use sockets via the system service
interface, not any C interface. I cannot comment on any differences, if such
exist.

I have read complaints in the past that the HP TCP/IP performance is not so
great, and I've read complaints that network I/O on VMS is not so great. If the
software cannot keep up with GB network HW, you'll be limited to what the
software can do.

One place to look is some of the SYSGEN parameters.

Are all connections through one process, or one process per connection, or ??

Process limits could be an issue.

How many connections, and transfers, are you looking at per second? Total?

Maybe still not enough detail information ....
Dann Corbit
2017-03-08 01:16:26 UTC
Post by David Froble
I'm guessing that it's not a locking problem at all. As far as I know, TCP/IP
does not use the DLM.
With caching, file I/O can be very surprising.
Can we assume that there is no locking and updating occurring?
I've got applications with clients getting connections, asking for and receiving
data, and they are rather quick. Using sockets using the system service
interface, not any C interface. I cannot comment on any differences, if such
may exist.
I have read complaints in the past that the HP TCP/IP performance is not so
great, and I've read complaints that network I/O on VMS is not so great. If the
software cannot keep up with GB network HW, you'll be limited to what the
software can do.
One place to look is some of the SYSGEN parameters.
We set up our SYSGEN parameters to mirror the ORACLE OpenVMS recommendations. It did not help much.
Post by David Froble
Are all connections through one process, or one process per connection, or ??
On the OpenVMS side of the fence, we create a new process for each connection. The processes (potentially) read the same file.
Post by David Froble
Process limits could be an issue.
Don't think it is related to process limits; we have the limits set very high, and when a limit is exceeded we get an error message. We don't see any error messages of that nature.
Post by David Froble
How many connections, and transfers, are you looking at per second? Total?
Ideally, we would like to be able to scale to large numbers. At about 5 processes we hit a ceiling, and adding processes does not cause any speedup for total throughput.

We can easily read 50000 down to 25000 records per second (depending on how wide the records are). But when we uncomment the TCP/IP transfer, we drop to 10% of that throughput.
Post by David Froble
Maybe still not enough detail information ....
Bob Gezelter
2017-03-08 02:47:38 UTC
Post by Dann Corbit
We set up our SYSGEN parameters to mirror the ORACLE OpenVMS recommendations. It did not help much.
Post by David Froble
Are all connections through one process, or one process per connection, or ??
On the OpenVMS side of the fence, we create a new process for each connection. The processes (potentially) read the same file.
Post by David Froble
Process limits could be an issue.
Don't think it is related to process limits, we have the limit set very high, and when it fails, we get an error message. We don't see any error messages of that nature.
Post by David Froble
How many connections, and transfers, are you looking at per second? Total?
Ideally, we would like to be able to scale to large numbers. At about 5 processes we hit a ceiling, and adding processes does not cause any speedup for total throughput.
We can easily read 50000 to 25000 records per second (depending on how wide the records are). But when we uncomment the TCP/IP transfer, we drop to 10% throughput.
Post by David Froble
Maybe still not enough detail information ....
Dann,

First, creating processes for each connection is relatively expensive on OpenVMS, and depending on how it is being done, could cause quite a bit of file activity.

Hard to say without looking at the code. The devil is almost always in the details. Socket level code in particular has a number of ways to lose impressive amounts of efficiency.

FTP pushes high volumes of data. If you are doing ping-pong at TCP level, the throughput will drop drastically. There are ways to fix this, but as I said, the devil is in the details.

- Bob Gezelter, http://www.rlgsc.com
David Froble
2017-03-08 05:06:51 UTC
Post by Bob Gezelter
Dann,
First, creating processes for each connection is relatively expensive on
OpenVMS, and depending on how it is being done, could cause quite a bit of
file activity.
Agreed, but, with today's processors, it isn't the problem it was on an 11/780.
Of course, as you write, it depends on how it's done.

How I've done it is to have the service running, receiving connection requests,
granting the connection, reading the client request, performing the operation,
sending the result, and closing the connection. But the process stays.
Post by Bob Gezelter
Hard to say without looking at the code. The devil is almost always in the
details. Socket level code in particular has a number of ways to lose
impressive amounts of efficiency.
FTP pushes high volumes of data. If you are doing ping-pong at TCP level, the
throughput will drop drastically. There are ways to fix this, but as I said,
the devil is in the details.
The OP mentions "large numbers", but, just what is large numbers? 50, 100, 1000?
Dann Corbit
2017-03-08 05:33:27 UTC
Post by David Froble
Post by Bob Gezelter
Post by Dann Corbit
Post by David Froble
Post by Dann Corbit
Post by Bob Gezelter
Post by Dann Corbit
What are the factors associated with "ENQs Forced To Wait Rate" that cause a large value to occur?
What are the things that can be done to reduce locking contention for a network process?
OpenVMS Monitor Utility
LOCK MANAGEMENT STATISTICS
on node I2VMS
6-MAR-2017 15:43:07.90
CUR AVE MIN MAX
New ENQ Rate 5822.33 3239.05 0.00 5840.66
Converted ENQ Rate 18729.66 10346.82 0.00 18781.66
DEQ Rate 5822.33 3236.59 0.00 5840.66
Blocking AST Rate 0.00 0.00 0.00 0.00
ENQs Forced To Wait Rate 1554.33 847.85 0.00 1658.33
ENQs Not Queued Rate 0.00 1.88 0.00 8.66
Deadlock Search Rate 0.00 0.00 0.00 0.00
Deadlock Find Rate 0.00 0.00 0.00 0.00
Total Locks 1657.00 1665.65 1605.00 1710.00
Dann,
More information is needed about what your processes are doing with their files and locks.
If I were to hazard a guess, more than one process is attempting to lock the resource. There are other statistics that one can gather (see Keith Parris' response to a similar question at https://groups.google.com/forum/#!msg/comp.os.vms/UA3ecyi6dc4/OwAkQLYWZp8J).
Getting the rate down means reducing the locking conflict. That needs more understanding of the application and workload.
If I have been unclear, please feel free to contact me.
- Bob Gezelter, http://www.rlgsc.com
The problem is related to TCP/IP processing.
We have an application that does RMS I/O operations and sends the data to a client requester.
To simplify the problem, we took a simple echo server and modified it to send and receive representative record packages.
When we turn off the TCP/IP send/receive we get stupendous throughput (many tens of thousands of records per second). When we add the TCP/IP send/receive (with everything else intact, including the random and sequential RMS reads) the process slows to 10% of the throughput.
We have 1 GB network cards, full duplex.
It's not the transfer rate that is the problem because we can read a large file using FTP at a transfer rate that should accomplish our record I/O goals. But the small packets of TCP/IP bog the whole system down enormously.
We have the Nagle algorithm turned off (made very little difference).
We even set the FAB$V_NQL option in the FAB$B_SHR field to request that RMS use no query locking. (made very little difference).
So we can read the RMS file like mad, but when we delay things by sending small TCP/IP packets, it goes right into the crapper.
I should mention that it is a SMP program, with each client getting their own connection.
The application is written in C++.
The sample program is also written in C++.
I'm guessing that it's not a locking problem at all. As far as I know, TCP/IP
does not use the DLM.
With caching, file I/O can be very surprising.
Can we assume that there is no locking and updating occurring?
I've got applications with clients getting connections, asking for and receiving
data, and they are rather quick. Using sockets using the system service
interface, not any C interface. I cannot comment on any differences, if such
may exist.
I have read complaints in the past that the HP TCP/IP performance is not so
great, and I've read complaints that network I/O on VMS is not so great. If the
software cannot keep up with GB network HW, you'll be limited to what the
software can do.
One place to look is some of the SYSGEN parameters.
We set up our SYSGEN parameters to mirror the ORACLE OpenVMS recommendations. It did not help much.
Post by David Froble
Are all connections through one process, or one process per connection, or ??
On the OpenVMS side of the fence, we create a new process for each connection. The processes (potentially) read the same file.
Post by David Froble
Process limits could be an issue.
Don't think it is related to process limits, we have the limit set very high, and when it fails, we get an error message. We don't see any error messages of that nature.
Post by David Froble
How many connections, and transfers, are you looking at per second? Total?
Ideally, we would like to be able to scale to large numbers. At about 5 processes we hit a ceiling, and adding processes does not cause any speedup for total throughput.
We can easily read 25000 to 50000 records per second (depending on how wide the records are). But when we uncomment the TCP/IP transfer, we drop to 10% throughput.
Post by David Froble
Maybe still not enough detail information ....
Dann,
First, creating processes for each connection is relatively expensive on
OpenVMS, and depending on how it is being done, could cause quite a bit of
file activity.
Agreed, but, with today's processors, it isn't the problem it was on an 11/780.
Of course, as you write, it depends on how it's done.
How I've done it is to have the service running, receiving connection requests,
granting the connection, read the client request, perform the operation, send
the result, and close the connection. But, the process stays.
Post by Bob Gezelter
Hard to say without looking at the code. The devil is almost always in the
details. Socket level code in particular has a number of ways to lose
impressive amounts of efficiency.
FTP pushes high volumes of data. If you are doing ping-pong at TCP level, the
throughput will drop drastically. There are ways to fix this, but as I said,
the devil is in the details.
The OP mentions "large numbers", but, just what is large numbers? 50, 100, 1000?
In this particular case, we have requests, which contain the key, and responses, which contain the record. The overhead is about 50 bytes (not counting TCP/IP encapsulation) plus the data.

Our benchmark is using 37 byte records, so the size of the message is about 90 bytes.

We can read tens of thousands of rows in a second on the OpenVMS side of the fence. Adding processes in parallel to manage the load rapidly loses efficiency as some kind of contention builds up. The contention is not disk-based I/O. If we simply comment out the TCP/IP part and leave in the seeks (based on a randomized list of keys) and reads, the speed goes up by a factor of ten.
j***@gmail.com
2017-03-08 13:58:32 UTC
Permalink
Raw Message
Post by Dann Corbit
We can read tens of thousands of rows in a second on the OpenVMS side of the
fence. Adding processes in parallel to manage the load rapidly loses efficiency
as there is some kind of contention that builds up. The contention is not disk
based I/O related. If we simply comment out the TCP/IP part and leave in seeks
(based on a randomized list of keys) and reads the speed goes up by a factor of ten.
One idea to consider is to gather data from the spinlock trace utility. My
memory is fuzzy here, but I thought there was a utility buried in SYS$EXAMPLES:
that would gather up a sample of spinlock contention and provide a report on it.

I would expect to see a ton of IOLOCK8 contention when you run with your socket
code enabled vs. when it isn't. My own experience with TCP/IP stacks on VMS was
that they are terribly slow and burn time holding onto IOLOCK8.

I have a post from a few years back that details some of this. I can try to find
a link to that. It's buried somewhere here on comp.os.vms.

EJ
Dann Corbit
2017-03-08 15:01:17 UTC
Permalink
Raw Message
I guess it was this post:
https://groups.google.com/forum/#!searchin/comp.os.vms/IOLOCK8$20vms|sort:relevance/comp.os.vms/9wxcJ0bGgSw/KwATgBwa1H8J

Thanks for the feedback.

I did not run any profiling yet, but supposing it is IOLOCK8 contention, is there anything that can be done about it?

We have several TCP/IP stacks here, but we cannot dictate that to customers.
j***@gmail.com
2017-03-08 15:35:18 UTC
Permalink
Raw Message
Post by Dann Corbit
Thanks for the feedback.
Another idea for you is to run with the socket code, but instead of opening
a file to return data, can you rig it up so that it just returns a random
blob of bytes that you generate on the spot? Or perhaps just recycle a valid
response? That way you can determine if the file reading plays a role or not.

I suspect the results will still be poor and will continue to point to the
TCP/IP stack or the act of creating processes along with sending small messages
via TCP/IP.

EJ
David Froble
2017-03-08 16:27:08 UTC
Permalink
Raw Message
Post by Dann Corbit
Post by David Froble
The OP mentions "large numbers", but, just what is large numbers? 50, 100, 1000?
Still haven't specified how many concurrent connections you're wanting to have?
Post by Dann Corbit
In this particular case, we have requests, which contain the key and
responses that contain the record. The overhead is about 50 bytes (along
with TCP/IP encapsulation, not considered) and the data.
Our benchmark is using 37 byte records, so the size of the message is about 90 bytes.
We can read tens of thousands of rows in a second on the OpenVMS side of the
fence. Adding processes in parallel to manage the load rapidly loses
efficiency as there is some kind of contention that builds up. The
contention is not disk based I/O related. If we simply comment out the
TCP/IP part and leave in seeks (based on a randomized list of keys) and reads
the speed goes up by a factor of ten.
With the exception of having multiple CPUs, keep in mind that everything happens
serially. Whether one process, or 100 processes, doesn't matter in some ways.
It seems to me that a single process just might be faster. If trying to handle
multiple connections concurrently, there will be contention there also.
Sometimes serializing the work flow provides the quickest overall completion.
That doesn't mean that every connection request will be serviced as quickly as
required. But, that happens anyway, right?

Some current performance numbers:

1/100 second for an inventory inquiry

In one day:

3290 connection requests
9924 part numbers
recv 263 k bytes
Return 1.1 MB
Transmission wall time 54 seconds
16 ms avg to receive request
91 seconds wall time to perform work
27 ms avg per complete transaction

Also, we have some applications where a web server (not on VMS) needs to make
inquiries. For that we've set up the communications socket to stay open and
between transactions is in a read wait state, with periodic keep-alives
transmitted. This only works when the communication is constantly with the same
remote site.

Some more stuff.

I'm told that the single largest task is opening / creating the socket.
Dann Corbit
2017-03-08 05:09:18 UTC
Permalink
Raw Message
Post by Bob Gezelter
Post by Dann Corbit
Post by David Froble
Post by Dann Corbit
Post by Bob Gezelter
Post by Dann Corbit
What are the factors associated with "ENQs Forced To Wait Rate" that cause a large value to occur?
What are the things that can be done to reduce locking contention for a network process?
OpenVMS Monitor Utility
LOCK MANAGEMENT STATISTICS
on node I2VMS
6-MAR-2017 15:43:07.90
CUR AVE MIN MAX
New ENQ Rate 5822.33 3239.05 0.00 5840.66
Converted ENQ Rate 18729.66 10346.82 0.00 18781.66
DEQ Rate 5822.33 3236.59 0.00 5840.66
Blocking AST Rate 0.00 0.00 0.00 0.00
ENQs Forced To Wait Rate 1554.33 847.85 0.00 1658.33
ENQs Not Queued Rate 0.00 1.88 0.00 8.66
Deadlock Search Rate 0.00 0.00 0.00 0.00
Deadlock Find Rate 0.00 0.00 0.00 0.00
Total Locks 1657.00 1665.65 1605.00 1710.00
Dann,
More information is needed about what your processes are doing with their files and locks.
If I were to hazard a guess, more than one process is attempting to lock the resource. There are other statistics that one can gather (see Keith Parris' response to a similar question at https://groups.google.com/forum/#!msg/comp.os.vms/UA3ecyi6dc4/OwAkQLYWZp8J).
Getting the rate down means reducing the locking conflict. That needs more understanding of the application and workload.
If I have been unclear, please feel free to contact me.
- Bob Gezelter, http://www.rlgsc.com
The problem is related to TCP/IP processing.
We have an application that does RMS I/O operations and sends the data to a client requester.
To simplify the problem, we took a simple echo server and modified it to send and receive representative record packages.
When we turn off the TCP/IP send/receive we get stupendous throughput (many tens of thousands of records per second). When we add the TCP/IP send/receive (with everything else intact, including the random and sequential RMS reads) the process slows to 10% of the throughput.
We have 1 GB network cards, full duplex.
It's not the transfer rate that is the problem because we can read a large file using FTP at a transfer rate that should accomplish our record I/O goals. But the small packets of TCP/IP bog the whole system down enormously.
We have the Nagle algorithm turned off (made very little difference).
We even set the FAB$V_NQL option in the FAB$B_SHR field to request that RMS use no query locking. (made very little difference).
So we can read the RMS file like mad, but when we delay things by sending small TCP/IP packets, it goes right into the crapper.
I should mention that it is a SMP program, with each client getting their own connection.
The application is written in C++.
The sample program is also written in C++.
I'm guessing that it's not a locking problem at all. As far as I know, TCP/IP
does not use the DLM.
With caching, file I/O can be very surprising.
Can we assume that there is no locking and updating occurring?
I've got applications with clients getting connections, asking for and receiving
data, and they are rather quick. Using sockets using the system service
interface, not any C interface. I cannot comment on any differences, if such
may exist.
I have read complaints in the past that the HP TCP/IP performance is not so
great, and I've read complaints that network I/O on VMS is not so great. If the
software cannot keep up with GB network HW, you'll be limited to what the
software can do.
One place to look is some of the SYSGEN parameters.
We set up our SYSGEN parameters to mirror the ORACLE OpenVMS recommendations. It did not help much.
Post by David Froble
Are all connections through one process, or one process per connection, or ??
On the OpenVMS side of the fence, we create a new process for each connection. The processes (potentially) read the same file.
Post by David Froble
Process limits could be an issue.
Don't think it is related to process limits, we have the limit set very high, and when it fails, we get an error message. We don't see any error messages of that nature.
Post by David Froble
How many connections, and transfers, are you looking at per second? Total?
Ideally, we would like to be able to scale to large numbers. At about 5 processes we hit a ceiling, and adding processes does not cause any speedup for total throughput.
We can easily read 25000 to 50000 records per second (depending on how wide the records are). But when we uncomment the TCP/IP transfer, we drop to 10% throughput.
Post by David Froble
Maybe still not enough detail information ....
Dann,
First, creating processes for each connection is relatively expensive on OpenVMS, and depending on how it is being done, could cause quite a bit of file activity.
The processes are typically long lasting, and it works out better this way because the processes need the rights and login information for the actual user who is collecting data. With threads, rights management is far more difficult.
Post by Bob Gezelter
Hard to say without looking at the code. The devil is almost always in the details. Socket level code in particular has a number of ways to lose impressive amounts of efficiency.
FTP pushes high volumes of data. If you are doing ping-pong at TCP level, the throughput will drop drastically. There are ways to fix this, but as I said, the devil is in the details.
- Bob Gezelter, http://www.rlgsc.com
V***@SendSpamHere.ORG
2017-03-08 16:52:46 UTC
Permalink
Raw Message
Post by Dann Corbit
[...]
So we can read the RMS file like mad, but when we delay things by sending small TCP/IP packets, it goes right into the crapper.
[...]
It sure sounds to me like you need to revisit your program. As was pointed
out, there'd be very little DLM in TCPIP. If you can turn on and off TCPIP
and see your throughput diminish or increase, you need to look at how you're
doing your file access WRT transmitting its data over the wire.
--
VAXman- A Bored Certified VMS Kernel Mode Hacker VAXman(at)TMESIS(dot)ORG

I speak to machines with the voice of humanity.
Stephen Hoffman
2017-03-08 19:46:53 UTC
Permalink
Raw Message
Post by V***@SendSpamHere.ORG
It sure sounds to me like you need to revisit your program. As was
pointed out, there'd be very little DLM in TCPIP. If you can turn on
and off TCPIP and see your throughput diminish or increase, you need to
look at how you're doing your file access WRT transmitting its data
over the wire.
Ayup. Certainly find the names of the lock resources that are most
busy, and work to resolve those that are hindering performance. Or
post some reproducer client and server code, if you want folks to look
at that.
--
Pure Personal Opinion | HoffmanLabs LLC
Craig A. Berry
2017-03-08 04:17:31 UTC
Permalink
Raw Message
More information: The problem is related to TCP/IP processing. We
have an application that does RMS I/O operations and sends the data
to a client requester.
To simplify the problem, we took a simple echo server and modified it
to send and receive representative record packages.
When we turn off the TCP/IP send/receive we get stupendous throughput
(many tens of thousands of records per second). When we add the
TCP/IP send/receive (with everything else intact, including the
random and sequential RMS reads) the process slows to 10% of the
throughput.
We have 1 GB network cards, full duplex. It's not the transfer rate
that is the problem because we can read a large file using FTP at a
transfer rate that should accomplish our record I/O goals. But the
small packets of TCP/IP bog the whole system down enormously.
We have the Nagle algorithm turned off (made very little
difference). We even set the FAB$V_NQL option in the FAB$B_SHR field
to request that RMS use no query locking. (made very little
difference).
So we can read the RMS file like mad, but when we delay things by
sending small TCP/IP packets, it goes right into the crapper.
I should mention that it is a SMP program, with each client getting their own connection.
The application is written in C++. The sample program is also written
in C++.
What size are these packages or packets that you are sending? Bigger
than fit in one socket buffer? How big is your socket buffer (default is
only 255 bytes and it takes privs to change it)? Do you have jumbo
frames enabled? Do you have the packet processing engine enabled? Are
the sockets blocking or non-blocking? Are there any errors on the network?
David Froble
2017-03-08 05:08:31 UTC
Permalink
Raw Message
Post by Craig A. Berry
[...]
Are there any errors on the network?
Yeah, like mis-matched duplex. I've seen that bring network activity to well
below its knees ....
Dann Corbit
2017-03-08 05:24:58 UTC
Permalink
Raw Message
Post by Craig A. Berry
[...]
What size are these packages or packets that you are sending?
Typically fairly small (100 - 200 bytes) but possibly much larger if the RMS records are large. It contains more than just the record, but the overhead is not that much.
Post by Craig A. Berry
Bigger
than fit in one socket buffer?
I think sb_max is 128K, and our buffers will never be nearly so large.
Post by Craig A. Berry
How big is your socket buffer (default is
only 255 bytes and it takes privs to change it)?
Our tcp/ip buffer size is 16384.
Post by Craig A. Berry
Do you have jumbo
frames enabled?
Not sure.
Post by Craig A. Berry
Do you have the packet processing engine enabled?
Not sure.
Post by Craig A. Berry
Are
the sockets blocking or non-blocking?
We tried non-blocking with ASTs and synchronous, and both took about the same time.
Post by Craig A. Berry
Are there any errors on the network?
No network errors.

I will have to talk to the OpenVMS systems guy to get answers to some of these questions.

However, our message size is small in this case. So things like huge buffers won't matter (unless huge buffers are somehow useful in this case, but I can't imagine how).
Michael Moroney
2017-03-08 22:17:55 UTC
Permalink
Raw Message
Post by Dann Corbit
The problem is related to TCP/IP processing.
[...]
So we can read the RMS file like mad, but when we delay things by sending small TCP/IP packets, it goes right into the crapper.
Try this:

Run the echo server on the same node (perhaps with a boosted priority)
and have the application connect to localhost/127.0.0.1. This may tell
you whether it's the TCP/IP stack itself that's slowing you down or the
process of getting out onto the ethernet that's doing so. This is because
you'll use the internal loopback "connection" instead of a real connection.

Does it now run faster, despite the fact the echo server is running on the
same system? Or just as bad/worse?
Bob Gezelter
2017-03-08 22:35:00 UTC
Permalink
Raw Message
Post by Michael Moroney
Post by Dann Corbit
[...]
Run the echo server on the same node (perhaps with a boosted priority)
and have the application connect to localhost/127.0.0.1. This may tell
you whether it's the TCP/IP stack itself that's slowing you down or the
process of getting out onto the ethernet that's doing so. This is because
you'll use the internal loopback "connection" instead of a real connection.
Does it now run faster, despite the fact the echo server is running on the
same system? Or just as bad/worse?
Dann,

I am curious if the Nagle algorithm is actually disabled.

Mike's local loopback idea is a good one. It would also be useful to have a network trace of the connection to see what is going on. There are other things I would try (e.g., faking data to totally remove RMS from the scenario).

- Bob Gezelter, http://www.rlgsc.com
Dann Corbit
2017-03-08 23:25:37 UTC
Permalink
Raw Message
Post by Bob Gezelter
Post by Michael Moroney
Post by Dann Corbit
[...]
Run the echo server on the same node (perhaps with a boosted priority)
and have the application connect to localhost/127.0.0.1. This may tell
you whether it's the TCP/IP stack itself that's slowing you down or the
process of getting out onto the ethernet that's doing so. This is because
you'll use the internal loopback "connection" instead of a real connection.
Does it now run faster, despite the fact the echo server is running on the
same system? Or just as bad/worse?
Dann,
I am curious if the Nagle algorithm is actually disabled.
Yes, we did this:

// Disable the TCP/IP Nagle algorithm ... major performance drag.
// (TCP_NODELAY comes from <netinet/tcp.h> on most stacks.)
int nOpt = 1;

if ( setsockopt( m_conn, IPPROTO_TCP, TCP_NODELAY,
                 (char *)&nOpt, sizeof( nOpt ) ) < 0 )
    perror( "setsockopt TCP_NODELAY" );
Post by Bob Gezelter
Mike's local loopback idea is a good one. It would also be useful to have a network trace of the connection to see what is going on. There are other things I would try (e.g., faking data to totally remove RMS from the scenario.
With RMS reads included (both keyed and sequential) we achieved fabulous performance, so we did not need to remove the RMS calls. Just doing a simple send/receive pair with 20-byte packets (even with mock data) threw the whole performance down the drain.

We isolated the problem to be TCP/IP. We found some simple solutions to increase throughput:
1. Connect to a cluster by name rather than to a node. That way, the process scales much better, as previously 5 connections saturated the throughput, and now we get N*5 where N is the node count in the cluster.
2. Batch small collections of keys instead of one key at a time. Increased performance by about 3 times.
Post by Bob Gezelter
- Bob Gezelter, http://www.rlgsc.com
Thanks to everyone for all the help. All of the input was extremely valuable and helped us to solve the problem.
Stephen Hoffman
2017-03-08 23:52:04 UTC
Permalink
Raw Message
Post by Dann Corbit
Thanks to everyone for all the help. All of the input was extremely
valuable and helped us to solve the problem.
Okay. so .... MONITOR snapshots are a poor tool for performance
monitoring (trends are necessary for that) and for debugging (MONITOR
provides no idea what's going on), and that knowing the lock resource
names involved can be important, and that instrumenting or profiling
the application code can be useful. DECset PCA, if you have it.
Probably also that there might still be a saturated network controller
involved somewhere in this configuration, might want to see if there is
a misconfigured controller (duplex or speed) elsewhere. Beyond a NIC
issue, also look for a flaky or misconfigured switch port. (Local
rule of thumb: never, ever, ever trust managed switch displays about
duplex and speed settings. Always verify the connection
independently. One of the best known vendors is the worst here, too.)
Of all these, DECset PCA is probably the easiest way to spot where
the code is spending most of its time, but there are other ways to
sample PC addresses using SDA or otherwise. tcpdump — patched to
current, given the barrage of security bugs reported in that — or via
other tools — using switch port mirroring or otherwise — can also
provide some idea of the network flow, too.
--
Pure Personal Opinion | HoffmanLabs LLC
V***@SendSpamHere.ORG
2017-03-07 14:42:12 UTC
Post by Dann Corbit
What are the factors associated with "ENQs Forced To Wait Rate" that cause a large value to occur?
Locks that cannot be granted immediately are placed on a wait queue. When
the resource specified in the $ENQ request becomes available, a waiting lock
can then be granted.
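The behavior described above can be illustrated with a toy Python model. To be clear, this is a sketch of the concept only, not the OpenVMS DLM (which has six lock modes, conversions, and cluster-wide resource trees); the class and counter names are invented, with `forced_to_wait` playing the role of the MONITOR "ENQs Forced To Wait Rate" counter.

```python
from collections import deque

class ToyResource:
    """Toy model of one lock resource: a single exclusive holder
    plus a FIFO wait queue of blocked requesters."""
    def __init__(self):
        self.holder = None
        self.waiters = deque()
        self.forced_to_wait = 0  # analogous to "ENQs Forced To Wait"

    def enq(self, process):
        if self.holder is None:
            self.holder = process      # granted immediately
            return "granted"
        self.waiters.append(process)   # incompatible: queue it
        self.forced_to_wait += 1
        return "waiting"

    def deq(self):
        # Release the lock; grant the oldest waiter, if any.
        self.holder = self.waiters.popleft() if self.waiters else None
```

Every time the `enq` path above takes the waiting branch, the counter rises; a high rate therefore means many requests are arriving for resources that are already held.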
Post by Dann Corbit
What are the things that can be done to reduce locking contention for a network process?
Stop using the system.
--
VAXman- A Bored Certified VMS Kernel Mode Hacker VAXman(at)TMESIS(dot)ORG

I speak to machines with the voice of humanity.
David Froble
2017-03-07 18:37:02 UTC
Post by V***@SendSpamHere.ORG
Post by Dann Corbit
What are the factors associated with "ENQs Forced To Wait Rate" that cause a large value to occur?
Locks that can not be granted immediately are placed on a wait queue. When
the resource specified in the $ENQ request is available, a waiting lock can
then be granted.
The itanics are rather fast. When designing anything that has contention, best
practice is to minimize the time a resource is locked. Poor programming
practices can defeat this concept.

Yes, existing locks can be converted, with less overhead, but I've never seen lock
conversions run so high relative to the total lock count. And so Bob's question
remains: just what is being attempted?
V***@SendSpamHere.ORG
2017-03-07 18:48:43 UTC
Post by David Froble
Post by V***@SendSpamHere.ORG
Post by Dann Corbit
What are the factors associated with "ENQs Forced To Wait Rate" that cause a large value to occur?
Locks that can not be granted immediately are placed on a wait queue. When
the resource specified in the $ENQ request is available, a waiting lock can
then be granted.
The itanics are rather fast.
...and...???
Post by David Froble
When designing anything that has contention, best
practice is to minimize the time a resource is locked. Poor programming
practices can defeat this concept.
If a shared resource requires one-at-a-time access, it's not poor programming
that is at fault, and all the programming mojo in the world will NOT change the
fact that the resource cannot be shared or used by more than one process at a time.
Post by David Froble
Yes, pending locks can be converted, less overhead, but I've never seen the lock
conversions so high with respect to the total locks. And so, Bob's questions
remain, just what is being attempted?
Less overhead, but the fact remains that a lock cannot be granted or converted
while the resource is held by another entity -- it must wait regardless.
--
VAXman- A Bored Certified VMS Kernel Mode Hacker VAXman(at)TMESIS(dot)ORG

I speak to machines with the voice of humanity.
David Froble
2017-03-07 20:45:47 UTC
Post by V***@SendSpamHere.ORG
Post by David Froble
Post by V***@SendSpamHere.ORG
Post by Dann Corbit
What are the factors associated with "ENQs Forced To Wait Rate" that cause a large value to occur?
Locks that can not be granted immediately are placed on a wait queue. When
the resource specified in the $ENQ request is available, a waiting lock can
then be granted.
The itanics are rather fast.
...and...???
The faster an operation can take a lock, do its update, and release the lock, the
sooner another operation can get a lock on the same resource. Today's speeds sure
beat an 11/780, lock processing included.
Post by V***@SendSpamHere.ORG
Post by David Froble
When designing anything that has contention, best
practice is to minimize the time a resource is locked. Poor programming
practices can defeat this concept.
If a shared resource requires one-at-a-time sharing, it's not poor programming
that is at fault and all the programming mojo in the world will NOT change the
fact that the resource can not be share/used by more than one at a time.
I fully agree. However, I've seen instances of a lock being obtained, and then
the program waiting for keyboard input before completing the operation and releasing the lock.
Post by V***@SendSpamHere.ORG
Post by David Froble
Yes, pending locks can be converted, less overhead, but I've never seen the lock
conversions so high with respect to the total locks. And so, Bob's questions
remain, just what is being attempted?
Less overhead but the fact remains that it can not be granted or converted if
the resource is presently locked by another entity -- wait for it regardless.
Having implemented a database that uses the DLM, I'm rather familiar with its
usage. As part of the implementation, several studies involving locking were
performed.

I've beaten up the DLM to see what I could get away with. It's that experience
I was using to make my observations.
Volker Halle
2017-03-08 17:14:57 UTC
Dann,

Make use of the built-in OpenVMS tools to find out which locks/resources are affected:

$ ANAL/SYS
SDA> SHOW LOCK/CONV ! Show the locks in the conversion queue

SDA> LCK LOAD
SDA> LCK SHOW ACT ! Show the most active resource trees

SDA> EXIT

Regards,

Volker.