Discussion:
Apache + mod_php performance
Arne Vajhøj
2024-09-24 18:28:05 UTC
I am not impressed by Apache + mod_php performance on VMS.

The basic numbers I see (simple PHP code for getting some data
out of a MySQL database and displaying) are:

Apache + CGI : 4 req/sec = 240 req/min
Apache + mod_php : 11 req/sec = 660 req/min
Tomcat + Quercus : 127 req/sec = 7620 req/min

(VMS x86-64 9.2-2, Apache 2.4-58, Berryman PHP 8.1,
Java 8u372, Tomcat 8.5-89, Quercus 4.0)

That CGI is slow is no surprise. Using CGI for performance
is like doing the 100 meter crawl dressed in medieval armor.

But I had expected much better numbers for mod_php. Instead
of the actual x2.5 (CGI to mod_php) and x10 (mod_php to Quercus),
I had expected something like x10 and x2.5 between the three.

Does anyone have any ideas about why it is like this and what
can be done about it?

Arne

PS: And before anyone jumps at the great Quercus numbers - yes,
Quercus is a very nice product, but Resin stopped development
many years ago and it is stuck at PHP 5.x - so it is only
a solution for DIY PHP 5.x code, not a solution for any
recent version of common MVC frameworks like Laravel.
Dan Cross
2024-09-24 21:09:25 UTC
Post by Arne Vajhøj
I am not impressed by Apache + mod_php performance on VMS.
The basic numbers I see (simple PHP code for getting some data
Apache + CGI : 4 req/sec = 240 req/min
Apache + mod_php : 11 req/sec = 660 req/min
Tomcat + Quercus : 127 req/sec = 7620 req/min
(VMS x86-64 9.2-2, Apache 2.4-58, Berryman PHP 8.1,
Java 8u372, Tomcat 8.5-89, Quercus 4.0)
That CGI is slow is no surprise. Using CGI for performance
is like doing 100 meter crawl dressed in medieval armor.
But I had expected much better numbers for mod_php. Instead
of the actual x2.5 and x10 I had expected like x10 and x2.5
between the three.
Anyone having any ideas for why it is like this and what
can be done about it?
Did you try running your test script under the PHP interpreter
directly, without the web stack? What kind of QPS numbers do
you see if it's just PHP talking to MySQL?

With no further details, I'd wonder if you're not caching
connections to the database between queries.

- Dan C.
Arne Vajhøj
2024-09-25 00:53:22 UTC
Post by Dan Cross
Post by Arne Vajhøj
I am not impressed by Apache + mod_php performance on VMS.
The basic numbers I see (simple PHP code for getting some data
Apache + CGI : 4 req/sec = 240 req/min
Apache + mod_php : 11 req/sec = 660 req/min
Tomcat + Quercus : 127 req/sec = 7620 req/min
(VMS x86-64 9.2-2, Apache 2.4-58, Berryman PHP 8.1,
Java 8u372, Tomcat 8.5-89, Quercus 4.0)
That CGI is slow is no surprise. Using CGI for performance
is like doing 100 meter crawl dressed in medieval armor.
But I had expected much better numbers for mod_php. Instead
of the actual x2.5 and x10 I had expected like x10 and x2.5
between the three.
Anyone having any ideas for why it is like this and what
can be done about it?
Did you try running your test script under the PHP interpreter
directly, without the web stack? What kind of QPS numbers do
you see if it's just PHP talking to MySQL?
Just executing the same PHP code in a loop gives much higher
performance.

Single process : 158 executions per second = 9480 executions per minute

And multi process could probably get significantly higher.
Post by Dan Cross
With no further details, I'd wonder if you're not caching
connections to the database between queries.
Does not matter.

I just found out that Tomcat+Quercus numbers get even higher
after some warmup.

                     no db con pool   db con pool
Apache + CGI                      4           N/A
Apache + mod_php                 11            11
Tomcat + Quercus                208           214

Arne
Dan Cross
2024-09-25 12:48:46 UTC
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
I am not impressed by Apache + mod_php performance on VMS.
The basic numbers I see (simple PHP code for getting some data
Apache + CGI : 4 req/sec = 240 req/min
Apache + mod_php : 11 req/sec = 660 req/min
Tomcat + Quercus : 127 req/sec = 7620 req/min
(VMS x86-64 9.2-2, Apache 2.4-58, Berryman PHP 8.1,
Java 8u372, Tomcat 8.5-89, Quercus 4.0)
That CGI is slow is no surprise. Using CGI for performance
is like doing 100 meter crawl dressed in medieval armor.
But I had expected much better numbers for mod_php. Instead
of the actual x2.5 and x10 I had expected like x10 and x2.5
between the three.
Anyone having any ideas for why it is like this and what
can be done about it?
Did you try running your test script under the PHP interpreter
directly, without the web stack? What kind of QPS numbers do
you see if it's just PHP talking to MySQL?
Just executing the same PHP code in a loop give much higher
performance.
Single process : 158 executions per second = 9480 executions per minute
And multi process could probably get significantly higher.
So this suggests that your PHP code, by itself, is not the
bottleneck, though it remains unclear to me what you mean when
you say, "just executing the same PHP code in a loop...": does
this mean that you're running the PHP interpreter itself in a
loop? As in, starting it fresh on every iteration? Or does
this mean that you've got a loop inside the PHP program that
runs your test and you're measuring the throughput of that? And
is this standalone, or executed under the web framework? That
is, are you running this under Apache and hitting some query
that then causes the PHP interpreter to repeatedly query the
database?
Post by Arne Vajhøj
Post by Dan Cross
With no further details, I'd wonder if you're not caching
connections to the database between queries.
Does not matter.
Surely it does. If, for whatever reason, you're not holding
onto the connection to the database between queries, but rather,
re-establishing it each time, that will obviously have overhead
that will impact performance.

Or perhaps you're saying this because of some unstated
assumption alluded to in the questions above?
Post by Arne Vajhøj
I just found out that Tomcat+Quercus numbers get even higher
after some warmup.
                     no db con pool   db con pool
Apache + CGI                      4           N/A
Apache + mod_php                 11            11
Tomcat + Quercus                208           214
That's nice, but that seems irrelevant to the question of why
PHP under Apache is so slow.

Perhaps a simpler question: what sort of throughput does Apache
on VMS give you if you just hit a simple static resource
repeatedly?
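
Even a trivial client loop would do for that - sketched here in PHP,
with a placeholder host and file name:

<?php
// Trivial fetch loop against a small static file - URL is a placeholder.
$url = "http://vmshost.example.com/small.txt";
$n = 100;
$t0 = microtime(true);
for ($i = 0; $i < $n; $i++) {
    file_get_contents($url);   // new connection for every request
}
$t1 = microtime(true);
printf("%.1f req/sec\n", $n / ($t1 - $t0));
?>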

- Dan C.
Arne Vajhøj
2024-09-25 15:49:13 UTC
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
I am not impressed by Apache + mod_php performance on VMS.
The basic numbers I see (simple PHP code for getting some data
Apache + CGI : 4 req/sec = 240 req/min
Apache + mod_php : 11 req/sec = 660 req/min
Tomcat + Quercus : 127 req/sec = 7620 req/min
(VMS x86-64 9.2-2, Apache 2.4-58, Berryman PHP 8.1,
Java 8u372, Tomcat 8.5-89, Quercus 4.0)
That CGI is slow is no surprise. Using CGI for performance
is like doing 100 meter crawl dressed in medieval armor.
But I had expected much better numbers for mod_php. Instead
of the actual x2.5 and x10 I had expected like x10 and x2.5
between the three.
Anyone having any ideas for why it is like this and what
can be done about it?
Did you try running your test script under the PHP interpreter
directly, without the web stack? What kind of QPS numbers do
you see if it's just PHP talking to MySQL?
Just executing the same PHP code in a loop give much higher
performance.
Single process : 158 executions per second = 9480 executions per minute
And multi process could probably get significantly higher.
So this suggests that your PHP code, by itself, is not the
bottleneck,
The PHP code is very simple: read 3 rows from a database table
and output 35 lines of HTML.
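
For illustration, a minimal sketch of that kind of page (table, column
and credential names are invented here, not the actual code):

<?php
// Minimal sketch only - read 3 rows and emit a small HTML page.
$db = new mysqli("localhost", "testuser", "testpwd", "testdb");
$res = $db->query("SELECT id, txt FROM items ORDER BY id LIMIT 3");
echo "<html>\n<head><title>Test</title></head>\n<body>\n<table>\n";
while ($row = $res->fetch_assoc()) {
    echo "<tr><td>" . htmlspecialchars($row["id"]) .
         "</td><td>" . htmlspecialchars($row["txt"]) . "</td></tr>\n";
}
echo "</table>\n</body>\n</html>\n";
$res->free();
$db->close();
?>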
Post by Dan Cross
though it remains unclear to me what you mean when
you say, "just executing the same PHP code in a loop...": does
this mean that you're running the PHP interpreter itself in a
loop? As in, starting it fresh on every iteration? Or does
this mean that you've got a loop inside the PHP program that
runs your test and you're measuring the throughput of that? And
is this standalone, or executed under the web framework? That
is, are you running this under Apache and hitting some query
that then causes the PHP interpreter to repeatedly query the
database?
A PHP script with a loop executing the same code as the web
request inside the loop. The PHP script is run from the command line.
No Apache or mod_php involved.
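
Something along these lines - a sketch only; table and credential names
are invented, and the generated HTML is just built, not written out:

<?php
// Command line loop test: repeat the same work as the web request
// N times and report throughput.
$n = 1000;
$t0 = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $db = new mysqli("localhost", "testuser", "testpwd", "testdb");
    $res = $db->query("SELECT id, txt FROM items ORDER BY id LIMIT 3");
    $html = "";
    while ($row = $res->fetch_assoc()) {
        $html .= "<tr><td>" . $row["id"] . "</td><td>" . $row["txt"] . "</td></tr>\n";
    }
    $res->free();
    $db->close();
}
$t1 = microtime(true);
printf("%.1f executions per second\n", $n / ($t1 - $t0));
?>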
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
With no further details, I'd wonder if you're not caching
connections to the database between queries.
Does not matter.
Surely it does. If, for whatever reason, you're not holding
onto the connection to the database between queries, but rather,
re-establishing it each time, that will obviously have overhead
that will impact performance.
Or perhaps you're saying this because of some unstated
assumption alluded to in the questions above?
I am saying this because the numbers were the same. 11 req/sec
in both cases.
Post by Dan Cross
Post by Arne Vajhøj
I just found out that Tomcat+Quercus numbers get even higher
after some warmup.
                     no db con pool   db con pool
Apache + CGI                      4           N/A
Apache + mod_php                 11            11
Tomcat + Quercus                208           214
That's nice, but that seems irrelevant to the question of why
PHP under Apache is so slow.
You brought up the topic, so I tested.
Post by Dan Cross
Perhaps a simpler question: what sort of throughput does Apache
on VMS give you if you just hit a simple static resource
repeatedly?
Now it becomes interesting.

nop.php also gives 11 req/sec.

And nop.txt also gives 11 req/sec.

So the arrow is definitely pointing towards Apache.

So either something to speed up Apache or switching to WASD or OSU.

Arne
Dan Cross
2024-09-25 18:41:17 UTC
[snip]
Post by Dan Cross
Post by Arne Vajhøj
Just executing the same PHP code in a loop give much higher
performance.
Single process : 158 executions per second = 9480 executions per minute
And multi process could probably get significantly higher.
So this suggests that your PHP code, by itself, is not the
bottleneck,
The PHP code is very simple: read 3 rows from a database table
and output 35 lines of HTML.
Post by Dan Cross
though it remains unclear to me what you mean when
you say, "just executing the same PHP code in a loop...": does
this mean that you're running the PHP interpreter itself in a
loop? As in, starting it fresh on every iteration? Or does
this mean that you've got a loop inside the PHP program that
runs your test and you're measuring the throughput of that? And
is this standalone, or executed under the web framework? That
is, are you running this under Apache and hitting some query
that then causes the PHP interpreter to repeatedly query the
database?
PHP script with a loop executing the same code as the web
request inside the loop. PHP script run command line.
No Apache or mod_php involved.
So PHP talking to your database seems fine, then.
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
With no further details, I'd wonder if you're not caching
connections to the database between queries.
Does not matter.
Surely it does. If, for whatever reason, you're not holding
onto the connection to the database between queries, but rather,
re-establishing it each time, that will obviously have overhead
that will impact performance.
Or perhaps you're saying this because of some unstated
assumption alluded to in the questions above?
I am saying this because the numbers were the same. 11 req/sec
in both cases.
Oh, I see what you mean now. That was a statement of fact based
on your findings, not an assertion. Sorry, I missed your "db
con pool" numbers in your previous post.
Post by Dan Cross
Post by Arne Vajhøj
I just found out that Tomcat+Quercus numbers get even higher
after some warmup.
                     no db con pool   db con pool
Apache + CGI                      4           N/A
Apache + mod_php                 11            11
Tomcat + Quercus                208           214
That's nice, but that seems irrelevant to the question of why
PHP under Apache is so slow.
You brought up the topic, so I tested.
Hmm, I just went back and looked at the thread, and I don't see
where I asked about Tomcat/Quercus.
Post by Dan Cross
Perhaps a simpler question: what sort of throughput does Apache
on VMS give you if you just hit a simple static resource
repeatedly?
Now it becomes interesting.
nop.php also gives 11 req/sec.
And nop.txt also gives 11 req/sec.
So the arrow is definitely pointing towards Apache.
I should think so. Lesson #1: always verify your base
assumptions when investigating something like this.
So either something to speed up Apache or switching to WASD or OSU.
Well, the question now becomes, "what makes Apache so slow?"

I would concentrate on your nop.txt test; I assume that's a
small (possibly empty) text file and as an example has the
fewest number of variables.

Do your logs give any indications of what might be going on?
For example, do the logs have host names in them, possibly
implying you're stalling on reverse DNS lookups or something
similar?

- Dan C.
Arne Vajhøj
2024-09-25 21:10:43 UTC
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
Perhaps a simpler question: what sort of throughput does Apache
on VMS give you if you just hit a simple static resource
repeatedly?
Now it becomes interesting.
nop.php also gives 11 req/sec.
And nop.txt also gives 11 req/sec.
So the arrow is definitely pointing towards Apache.
I should think so. Lesson #1: always verify your base
assumptions when investigating something like this.
Post by Arne Vajhøj
So either something to speed up Apache or switching to WASD or OSU.
Well, the question now becomes, "what makes Apache so slow?"
I would concentrate on your nop.txt test; I assume that's a
small (possibly empty) text file and as an example has the
fewest number of variables.
Do your logs give any indications of what might be going on?
For example, do the logs have host names in them, possibly
implying your stalling on reverse DNS lookups or something
similar?
Just logging IP address.

It must be Apache.

Apache on VMS is prefork MPM. Yuck.

MaxSpareServers 10 -> 50
MaxClients 150 -> 300

actually did improve performance - doubling it from 11 to 22
req/sec.

But the system did not like further increases. And besides,
these numbers are absurdly high for handling a simulator doing requests
from just 20 threads.

But not sure what else I can change.
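
For reference, the knobs in play sit together in the prefork section of
httpd.conf. A sketch with the values tried above (the other directives
are stock defaults, shown only for context; the comments describe stock
Apache 2.4 behavior - I do not know whether the VMS port differs):

# prefork settings as tried above; the other directives are stock
# defaults. Note that in stock Apache 2.4 MaxClients is the legacy
# name for MaxRequestWorkers and is capped by ServerLimit (default
# 256), so values above 256 also need ServerLimit raised.
StartServers          5
MinSpareServers       5
MaxSpareServers      50
MaxClients          300
ServerLimit         300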

Arne
Arne Vajhøj
2024-09-25 21:17:32 UTC
Post by Arne Vajhøj
Apache on VMS is prefork MPM. Yuck.
Which puzzles me.

VMS is a threading OS not a forking OS.

And prefork MPM is really an early 90's traditional Unix design.

Using worker MPM on VMS would make more sense IMHO.

The best would probably have been to create a VMS MPM
based on the WinNT MPM.

Arne
Lawrence D'Oliveiro
2024-09-25 21:49:22 UTC
Post by Arne Vajhøj
Using worker MPM on VMS would make more sense IMHO.
Server-side proxy is the way to go.

The suggestion to use PHP-FPM is basically along these lines, except that
the whole FastCGI protocol is IMHO best considered “legacy” at this point.
Proxying (or “reverse proxying”, if you prefer) makes use of standard
HTTP, and easily supports extras like WebSocket connections.
Lawrence D'Oliveiro
2024-09-25 22:24:43 UTC
Post by Arne Vajhøj
Using worker MPM on VMS would make more sense IMHO.
According to the docs
<https://httpd.apache.org/docs/2.4/mod/worker.html>, this is a hybrid
multithread/multiprocess model. But threading won’t work with PHP, because
mod_php isn’t threadsafe.
Arne Vajhøj
2024-09-25 23:10:34 UTC
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
Using worker MPM on VMS would make more sense IMHO.
According to the docs
<https://httpd.apache.org/docs/2.4/mod/worker.html>, this is a hybrid
multithread/multiprocess model.
Yes. And I think that would fit better with VMS.
Post by Lawrence D'Oliveiro
But threading won’t work with PHP, because
mod_php isn’t threadsafe.
The worker MPM works fine with PHP.

Two different ways:
A) Build mod_php and PHP extensions thread safe
B) Use fcgi or fpm

Option #A is common on Windows.

Option #B is common on Linux.

I think #A would fit better with VMS.
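
For reference, option #B on Linux is typically just a matter of handing
*.php requests to a PHP-FPM pool via mod_proxy_fcgi, something like this
in httpd.conf (127.0.0.1:9000 is the usual FPM default; whether the VMS
kits include PHP-FPM at all I do not know):

# hand *.php to a PHP-FPM pool - requires mod_proxy and mod_proxy_fcgi
<FilesMatch "\.php$">
    SetHandler "proxy:fcgi://127.0.0.1:9000"
</FilesMatch>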

Arne
Lawrence D'Oliveiro
2024-09-26 00:21:26 UTC
Post by Arne Vajhøj
The worker MPM works fine with PHP.
A) Build mod_php and PHP extensions thread safe
B) Use fcgi or fpm
Option #A is common on Windows.
Option #B is common on Linux.
I think #A would fit better with VMS.
I don’t think the Windows option is known for good performance. Just
saying ...
Craig A. Berry
2024-09-26 14:44:14 UTC
Post by Arne Vajhøj
Post by Arne Vajhøj
Apache on VMS is prefork MPM. Yuck.
Which puzzles me.
VMS is a threading OS not a forking OS.
Preforking in Apache just means it creates subprocesses at start-up
time. Whoever invented the term apparently thought fork() was the only
way to create a subprocess. On VMS it will obviously use LIB$SPAWN or
SYS$CREPRC.
Post by Arne Vajhøj
And prefork MPM is really an early 90's traditional Unix design.
Using worker MPM on VMS would make more sense IMHO.
That requires everything running in each MPM process to be thread-safe.
It also probably doesn't provide the scaling advantages on VMS that it
would on unixen because there is no pthread_sigsetmask: all signals are
delivered in the main thread. Which means that somewhere around where
threads could provide a scaling advantage, the main thread will get
saturated and the advantage disappears. This is based on the assumption
that signals would be used for things like asynchronous I/O completion;
I don't really know that for sure, but it seems like a pretty safe
assumption.
Post by Arne Vajhøj
The best would probably have been to create a VMS MPM
based on the WinNT MPM.
And all of the Apache extensions would have to be rewritten to use QIOs
and ASTs? That's a pretty big ask.
Arne Vajhøj
2024-09-26 14:55:00 UTC
Post by Craig A. Berry
Post by Arne Vajhøj
Post by Arne Vajhøj
Apache on VMS is prefork MPM. Yuck.
Which puzzles me.
VMS is a threading OS not a forking OS.
Preforking in Apache just means it creates subprocesses at start-up
time.  Whoever invented the term apparently thought fork() was the only
way to create a subprocess.  On VMS it will obviously use LIB$SPAWN or
SYS$CREPRC.
Yes. But they behave differently from fork.
Post by Craig A. Berry
Post by Arne Vajhøj
And prefork MPM is really an early 90's traditional Unix design.
Using worker MPM on VMS would make more sense IMHO.
That requires everything running in each MPM process to be thread-safe.
It also probably doesn't provide the scaling advantages on VMS that it
would on unixen because there is no pthread_sigsetmask: all signals are
delivered in the main thread. Which means that somewhere around where
threads could provide a scaling advantage, the main thread will get
saturated and the advantage disappears. This based on the assumption
that signals would be used for things like asynchronous I/O completion;
I don't really know that for sure, but it seems like a pretty safe
assumption.
Well - the threaded PHP engine that does run on VMS (Tomcat + Quercus)
performs much better, so I am optimistic. I am pretty sure that the
Java RT uses standard socket IO and pthreads, so it must be possible
to achieve the same numbers in C.
Post by Craig A. Berry
Post by Arne Vajhøj
The best would probably have been to create a VMS MPM
based on the WinNT MPM.
And all of the Apache extensions would have to be rewritten to use QIOs
and ASTs?  That's a pretty big ask.
Why would extensions require being rewritten to use QIO's and
AST's?

Thread safe IO does not require those.

And most extensions are already available in a thread safe
version. Obviously no guarantee that it will build unchanged
on VMS, but ...

Arne
Lawrence D'Oliveiro
2024-09-26 20:40:03 UTC
Post by Craig A. Berry
Whoever invented the term apparently thought fork() was the only
way to create a subprocess.
It is the most natural way in this case, because it creates a complete
copy of the parent process, which is what you want.
Post by Craig A. Berry
On VMS it will obviously use LIB$SPAWN or SYS$CREPRC.
Not only is that more expensive, it also requires additional setup to
recreate the effect of fork(2).
Arne Vajhøj
2024-09-26 23:15:40 UTC
Post by Lawrence D'Oliveiro
Post by Craig A. Berry
Whoever invented the term apparently thought fork() was the only
way to create a subprocess.
It is the most natural way in this case, because it creates a complete
copy of the parent process, which is what you want.
Post by Craig A. Berry
On VMS it will obviously use LIB$SPAWN or SYS$CREPRC.
Not only is that more expensive, it also requires additional setup to
recreate the effect of fork(2).
They have one big advantage over fork on VMS.

They exist!

:-)

Arne
Lawrence D'Oliveiro
2024-09-26 23:22:00 UTC
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
Post by Craig A. Berry
Whoever invented the term apparently thought fork() was the only
way to create a subprocess.
It is the most natural way in this case, because it creates a complete
copy of the parent process, which is what you want.
Post by Craig A. Berry
On VMS it will obviously use LIB$SPAWN or SYS$CREPRC.
Not only is that more expensive, it also requires additional setup to
recreate the effect of fork(2).
They have one big advantage over fork on VMS.
They exist!
Not enough to make up for the performance disadvantage, though ...
Craig A. Berry
2024-09-27 00:52:30 UTC
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
Post by Craig A. Berry
Whoever invented the term apparently thought fork() was the only
way to create a subprocess.
It is the most natural way in this case, because it creates a complete
copy of the parent process, which is what you want.
Post by Craig A. Berry
On VMS it will obviously use LIB$SPAWN or SYS$CREPRC.
Not only is that more expensive, it also requires additional setup to
recreate the effect of fork(2).
They have one big advantage over fork on VMS.
They exist!
Not enough to make up for the performance disadvantage, though ...
The ability to restart Apache in a second or so on Linux rather than a
couple seconds on VMS is nice but has nothing to do with the problem
Arne reported, which is about performance after the worker processes are
started.
Lawrence D'Oliveiro
2024-09-27 01:26:13 UTC
Post by Craig A. Berry
The ability to restart Apache in a second or so on Linux rather than a
couple seconds on VMS is nice but has nothing to do with the problem
Arne reported, which is about performance after the worker processes are
started.
That could be to do with the more convoluted way to do nondeterministic
I/O under VMS.
Arne Vajhøj
2024-09-25 21:22:18 UTC
Post by Arne Vajhøj
But the system did not like further increases.
And in case someone wonders what "did not like" means:

$ anal/crash

OpenVMS system dump analyzer
...analyzing an x86-64 interleaved memory dump...

%SDA-W-DUMPINCOMPL, the dump file write was not completed
%SDA-I-LMBEMPTY, empty "Non-Key Global Pages" LMB in file #1 at VBN 000006DE
%SDA-W-NOTSAVED, some processes not found in dump file
Dump taken on 25-SEP-2024 15:27:12.64 using version V9.2-2
RESEXH, Resources exhausted, system shutting down

Arne
Arne Vajhøj
2024-09-25 23:14:58 UTC
Post by Arne Vajhøj
It must be Apache.
Apache on VMS is prefork MPM. Yuck.
MaxSpareServers 10 -> 50
MaxClients 150 -> 300
actually did improve performance - double from 11 to 22
req/sec.
But the system did not like further increases. And besides
these numbers are absurd high to handle a simulator doing requests
from just 20 threads.
But not sure what else I can change.
Does anyone know the internals of Apache on VMS?

Based on some messages in error_log it looks like
it uses mailboxes APACHE$AWS_CONTROL_MBX_nn.

If every request results in mailbox communication between
the master process and a child process, then that could
slow things down.

Arne
Arne Vajhøj
2024-09-26 00:57:44 UTC
Based on some messages in error_log it looks like it uses mailboxes
APACHE$AWS_CONTROL_MBX_nn.
If every request result in mailbox comm between master process and a
child process, then that could slow down things.
I also wonder about the difference in handling nondeterministic
communication.
On *nix systems, you have poll/select (also other options like epoll or
kqueue, depending on the *nix variant) for monitoring multiple
communication channels at once, and all your files/pipes/sockets are
treated as byte streams.
On VMS, you have to have async QIO calls pending on every channel that you
want to monitor, and all the communication channels are treated as record-
oriented.
I don't recognize that.

On VMS you can use select if using the socket API and
IO$_SETMODE|IO$M_READATTN if using $QIO(W) API.

And both socket API and $QIO(W) API are stream oriented.

Arne
Lawrence D'Oliveiro
2024-09-26 01:52:57 UTC
Post by Arne Vajhøj
On VMS you can use select if using the socket API and
IO$_SETMODE|IO$M_READATTN if using $QIO(W) API.
That sends an AST to tell you there is something to read. Extra mechanism
overhead.
Craig A. Berry
2024-09-26 12:38:35 UTC
Post by Arne Vajhøj
Post by Arne Vajhøj
It must be Apache.
Apache on VMS is prefork MPM. Yuck.
MaxSpareServers 10 -> 50
MaxClients 150 -> 300
actually did improve performance - double from 11 to 22
req/sec.
But the system did not like further increases. And besides
these numbers are absurd high to handle a simulator doing requests
from just 20 threads.
But not sure what else I can change.
Does anyone know internals of Apache on VMS?
Based on some messages in error_log it looks like
it uses mailboxes APACHE$AWS_CONTROL_MBX_nn.
If every request result in mailbox comm between
master process and a child process, then that could
slow down things.
I vaguely remember that there was a separate image installed with
privileges that increased socket buffer size from 255 bytes to something
reasonable and these sockets were used as pipes for IPC.

The following links still seem to work if you want (old) sources:

https://ftp.hp.com/pub/openvms/apache/CSWS-V20-2-ALPHA-SRC-KIT.BCK_SFX_AXPEXE

https://ftp.hp.com/pub/openvms/apache/CSWS-V20-2-I64-SRC-KIT.BCK_SFX_I64EXE
Craig A. Berry
2024-09-26 14:30:04 UTC
Post by Craig A. Berry
Post by Arne Vajhøj
Post by Arne Vajhøj
It must be Apache.
Apache on VMS is prefork MPM. Yuck.
MaxSpareServers 10 -> 50
MaxClients 150 -> 300
actually did improve performance - double from 11 to 22
req/sec.
But the system did not like further increases. And besides
these numbers are absurd high to handle a simulator doing requests
from just 20 threads.
But not sure what else I can change.
Does anyone know internals of Apache on VMS?
Based on some messages in error_log it looks like
it uses mailboxes APACHE$AWS_CONTROL_MBX_nn.
If every request result in mailbox comm between
master process and a child process, then that could
slow down things.
I vaguely remember that there was a separate image installed with
privileges that increased socket buffer size from 255 bytes to something
reasonable and these sockets were used as pipes for IPC.
https://ftp.hp.com/pub/openvms/apache/CSWS-V20-2-ALPHA-SRC-KIT.BCK_SFX_AXPEXE
https://ftp.hp.com/pub/openvms/apache/CSWS-V20-2-I64-SRC-KIT.BCK_SFX_I64EXE
Trial and error shows that *slightly* later versions are available at
the same place:

https://ftp.hp.com/pub/openvms/apache/CSWS-V21-1-ALPHA-SRC-KIT.BCK_SFX_AXPEXE

https://ftp.hp.com/pub/openvms/apache/CSWS-V21-1-I64-SRC-KIT.BCK_SFX_I64EXE

Dunno why VSI doesn't release source code for their v2.4 port.
Chris Townley
2024-09-26 15:59:57 UTC
Post by Craig A. Berry
Post by Craig A. Berry
https://ftp.hp.com/pub/openvms/apache/CSWS-V20-2-ALPHA-SRC-KIT.BCK_SFX_AXPEXE
https://ftp.hp.com/pub/openvms/apache/CSWS-V20-2-I64-SRC-KIT.BCK_SFX_I64EXE
Trial and error shows that *slightly* later versions are available at
https://ftp.hp.com/pub/openvms/apache/CSWS-V21-1-ALPHA-SRC-KIT.BCK_SFX_AXPEXE
https://ftp.hp.com/pub/openvms/apache/CSWS-V21-1-I64-SRC-KIT.BCK_SFX_I64EXE
Dunno why VSI doesn't release source code for their v2.4 port.
ISTR that for all the VSI versions of open source that require source to
be available, they say they do not publish, but will release source on
request. Not sure how we request, or if there will be any caveats...
--
Chris
Arne Vajhøj
2024-09-26 16:35:35 UTC
Post by Chris Townley
Post by Craig A. Berry
Post by Craig A. Berry
https://ftp.hp.com/pub/openvms/apache/CSWS-V20-2-ALPHA-SRC-KIT.BCK_SFX_AXPEXE
https://ftp.hp.com/pub/openvms/apache/CSWS-V20-2-I64-SRC-KIT.BCK_SFX_I64EXE
Trial and error shows that *slightly* later versions are available at
https://ftp.hp.com/pub/openvms/apache/CSWS-V21-1-ALPHA-SRC-KIT.BCK_SFX_AXPEXE
https://ftp.hp.com/pub/openvms/apache/CSWS-V21-1-I64-SRC-KIT.BCK_SFX_I64EXE
Dunno why VSI doesn't release source code for their v2.4 port.
ISTR that for all the VSI versions of opensource that requires source to
be available, they say they do not publish, will release source on
request. Not sure how we request, or if there will be any caveats...
(they have published some: https://github.com/vmssoftware)

Apache httpd is under the Apache license, which does not make such
a requirement.

Arne
Arne Vajhøj
2024-09-27 01:20:38 UTC
Post by Craig A. Berry
Post by Craig A. Berry
Post by Arne Vajhøj
Does anyone know internals of Apache on VMS?
Based on some messages in error_log it looks like
it uses mailboxes APACHE$AWS_CONTROL_MBX_nn.
If every request result in mailbox comm between
master process and a child process, then that could
slow down things.
I vaguely remember that there was a separate image installed with
privileges that increased socket buffer size from 255 bytes to something
reasonable and these sockets were used as pipes for IPC.
https://ftp.hp.com/pub/openvms/apache/CSWS-V20-2-ALPHA-SRC-KIT.BCK_SFX_AXPEXE
https://ftp.hp.com/pub/openvms/apache/CSWS-V20-2-I64-SRC-KIT.BCK_SFX_I64EXE
Trial and error shows that *slightly* later versions are available at
https://ftp.hp.com/pub/openvms/apache/CSWS-V21-1-ALPHA-SRC-KIT.BCK_SFX_AXPEXE
https://ftp.hp.com/pub/openvms/apache/CSWS-V21-1-I64-SRC-KIT.BCK_SFX_I64EXE
I took a look.

It has:

[.httpd.server.mpm.prefork]prefork.c with 1350 lines calling fork
[.httpd.server.mpm.vms]prefork.c with 3900 lines calling sys$creprc

So it looks like the porting approach was to reimplement prefork
more or less from scratch for VMS.

A very quick search makes me think that the mailbox is only used
for control, not for data.

So I am still stuck.

Arne
Lawrence D'Oliveiro
2024-09-27 01:26:51 UTC
A very quick search make me think that the mailbox is only used for
control not for data.
Could still be a bottleneck, though. That and the need for all the ASTs.
Arne Vajhøj
2024-09-27 02:17:29 UTC
Post by Lawrence D'Oliveiro
A very quick search make me think that the mailbox is only used for
control not for data.
Could still be a bottleneck, though.
If it is only used for the parent to signal the child to terminate?
Post by Lawrence D'Oliveiro
That and the need for all the ASTs.
Why should AST's be a problem?

The "call this function when task is done" approach is
a very common design today. DEC was ahead of time with
that.

And implementation wise then the AST's worked on VAX 700
series 45 years ago. Todays systems are extremely much
faster - maybe a factor 10000 faster.

Arne
Lawrence D'Oliveiro
2024-09-27 02:36:18 UTC
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
That and the need for all the ASTs.
Why should AST's be a problem?
The "call this function when task is done" approach is a very common
design today. DEC was ahead of time with that.
Set attention AST → wait for AST to trigger → queue actual I/O → wait for
AST to signal completion. Too many system calls and transitions back and
forth between user and kernel modes.

Note that Linux servers can efficiently handle thousands of concurrent
client connections without this.
Arne Vajhøj
2024-09-27 13:59:16 UTC
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
That and the need for all the ASTs.
Why should AST's be a problem?
The "call this function when task is done" approach is a very common
design today. DEC was ahead of time with that.
Set attention AST → wait for AST to trigger → queue actual I/O → wait for
AST to signal completion. Too many system calls and transitions back and
forth between user and kernel modes.
I don't think anyone would use that flow.

You set up a read attention AST; when it triggers you know that
there is data to be read. There is no reason to make the read async
then, because you know it will not block.

Arne
Lawrence D'Oliveiro
2024-09-27 23:13:06 UTC
You setup a read attention AST, it triggers and then you know that there
are data to be read. There is no reason to make reading async then,
because you know it will not block.
I wouldn’t bother with ASTs at all, though I would use async QIOs with I/O
status blocks. Here is my proposed control flow, which is very similar to
a *nix-style poll loop:

0) Clear the event flag you are going to use in the next step.
1) Start the initial set of async QIOs on all the channels I want to
monitor. Give them all the same event flag to set on completion (e.g. the
usual default EFN 0). Don’t specify any completion ASTs, but do specify
I/O status blocks.
2) Wait for the specified EFN to become set.
3) Clear that EFN.
4) Go through all your I/O status blocks, and process all I/Os that have
completed (status field ≠ 0). Queue new async I/Os for those channels (and
any new ones) as appropriate.
5) If you still have a nonempty set of async QIOs outstanding (i.e. a
nonempty set of channels being monitored), then go back to step 2.
Otherwise, you are shutting down, so stop.

How does that sound?

Hmmm ... I just realized ... doesn’t QIO immediately clear the EFN you
specify, before queueing the actual I/O request? That might blow the whole
thing ...
Simon Clubley
2024-09-27 13:00:33 UTC
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
A very quick search make me think that the mailbox is only used for
control not for data.
Could still be a bottleneck, though.
If it is only used for the parent to signal the child to terminate?
Post by Lawrence D'Oliveiro
That and the need for all the ASTs.
Why should AST's be a problem?
The "call this function when task is done" approach is
a very common design today. DEC was ahead of time with
that.
And implementation wise then the AST's worked on VAX 700
series 45 years ago. Todays systems are extremely much
faster - maybe a factor 10000 faster.
Can you try this on an Alpha system (emulated or otherwise) and see
how the figures compare ?

Just wondering if this performance overhead is something that is x86-64
specific.

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Arne Vajhøj
2024-09-27 14:26:02 UTC
Post by Simon Clubley
Can you try this on an Alpha system (emulated or otherwise) and see
how the figures compare ?
Just wondering if this performance overhead is something that is x86-64
specific.
Getting a tiny text file on a slow Alpha emulator gives 5 req/sec.
Which is damn good compared to the 22 req/sec on 4 VCPU x86-64.
But then it is probably not a CPU issue.

I suspect that performance on that slow Alpha emulator would be
bad doing something more CPU intensive (like actually running PHP).

The interesting number comparison is:

Apache: 5 req/sec
OSU: 100 req/sec

Arne
Craig A. Berry
2024-09-27 13:18:07 UTC
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
A very quick search make me think that the mailbox is only used for
control not for data.
Could still be a bottleneck, though.
If it is only used for the parent to signal the child to terminate?
Post by Lawrence D'Oliveiro
                                     That and the need for all the ASTs.
Why should AST's be a problem?
The "call this function when task is done" approach is
a very common design today. DEC was ahead of time with
that.
And implementation wise then the AST's worked on VAX 700
series 45 years ago. Todays systems are extremely much
faster - maybe a factor 10000 faster.
There are some limitations around ASTs, especially when mixed with
threads. The definitive wizard article is here:

https://forum.vmssoftware.com/viewtopic.php?t=5198

But I think your basic question is why Apache is slower than Tomcat,
right? The only thing I can think of that hasn't already been mentioned
is that Tomcat code is JIT-compiled, which is likely to be pretty good,
optimized code, whereas Apache is probably either cross-compiled or
native-compiled with an early enough field test compiler that there are
no optimizations.
Arne Vajhøj
2024-09-27 13:55:48 UTC
Post by Craig A. Berry
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
A very quick search make me think that the mailbox is only used for
control not for data.
Could still be a bottleneck, though.
If it is only used for the parent to signal the child to terminate?
Post by Lawrence D'Oliveiro
                                     That and the need for all the ASTs.
Why should AST's be a problem?
The "call this function when task is done" approach is
a very common design today. DEC was ahead of time with
that.
And implementation wise then the AST's worked on VAX 700
series 45 years ago. Todays systems are extremely much
faster - maybe a factor 10000 faster.
There are some limitations around ASTs, especially when mixed with
https://forum.vmssoftware.com/viewtopic.php?t=5198
Technically interesting.

But I don't think it is a big problem. To me it is either an event-
driven model with a single thread and everything non-blocking, where
AST's make sense, or a thread model with multiple threads, everything
blocking, and no need for AST's.
Post by Craig A. Berry
But I think your basic question is why Apache is slower than Tomcat,
right?
Yes - it can be worded that way.

The question is why Apache (with PHP but that does not seem to matter)
is so slow.

Tomcat (with Quercus to provide PHP support) having much higher
numbers on the same system proves that it is not HW or VMS.

Also Apache on Windows on the same CPU has much higher numbers
(the number of threads has been bumped).
Post by Craig A. Berry
  The only thing I can think of that hasn't already been mentioned
is that Tomcat code is JIT-compiled, which is likely to be pretty good,
optimized code, whereas Apache is probably either cross-compiled or
native-compiled with an early enough field test compiler that there are
no optimizations.
That is a possible explanation.

But the difference in numbers is crazy big.

Apache getting a static text file with 2 bytes: 22 req/sec

Tomcat with Quercus and PHP getting data out of a MySQL database on
Windows and outputting HTML: over 200 req/sec

Tomcat using JSP (which gets triple compiled) getting data out of a MySQL
database on Windows (with db connection pool) and outputting HTML: over
600 req/sec.

My gut feeling is that cross-compilation may contribute to but not
fully explain the difference.

Arne
Dan Cross
2024-09-27 14:16:31 UTC
Post by Arne Vajhøj
Post by Craig A. Berry
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
A very quick search make me think that the mailbox is only used for
control not for data.
Could still be a bottleneck, though.
If it is only used for the parent to signal the child to terminate?
Post by Lawrence D'Oliveiro
                                     That and the need for all the ASTs.
Why should AST's be a problem?
The "call this function when task is done" approach is
a very common design today. DEC was ahead of time with
that.
And implementation wise then the AST's worked on VAX 700
series 45 years ago. Todays systems are extremely much
faster - maybe a factor 10000 faster.
There are some limitations around ASTs, especially when mixed with
https://forum.vmssoftware.com/viewtopic.php?t=5198
Technically interesting.
But I don't think it is a big problem. To me it is either an event
driven model with single thread and everything non-blocking where
AST's make sense or a thread model with multiple threads and everything
blocking and no need for AST's.
Post by Craig A. Berry
But I think your basic question is why Apache is slower than Tomcat,
right?
Yes - it can be worded that way.
The question is why Apache (with PHP but that does not seem to matter)
is so slow.
Tomcat (with Quercus to provide PHP support) having much higher
numbers on the same system proves that it is not HW or VMS.
Also Apache on Windows on same CPU have much higher numbers
(number of threads have been bumped).
Post by Craig A. Berry
  The only thing I can think of that hasn't already been mentioned
is that Tomcat code is JIT-compiled, which is likely to be pretty good,
optimized code, whereas Apache is probably either cross-compiled or
native-compiled with an early enough field test compiler that there are
no optimizations.
That is a possible explanation.
But the difference in numbers are crazy big.
Apache getting a static text file with 2 bytes: 22 req/sec
Tomcat with Quercus and PHP getting data out of a MySQL database on
Windows and outputting HTML: over 200 req/sec
Tomcat using JSP (which get triple compiled) getting data out of a MySQL
database on Windows (with db connection pool) and outputting HTML: over
600 req/sec.
My gut feeling is that cross-compilation may contribute to but not
fully explain the difference.
Almost certainly not; this is an IO bound application, not CPU
bound.

My strong suspicion is that what you're seeing is the result of
a serious impedance mismatch between the multi-process model
Apache was written to use, and its realization using the event
signalling infrastructure on VMS. You're undoubtedly hitting
some sort of serialization point, but with added overhead; at
that point, Amdahl's law dominates.

Again, I would try to establish a baseline. Cut out the MPM
stuff as much as you can; ideally, see what kind of numbers you
can get fetching your text file from a single Apache process.
Simply adding more threads or worker processes is unlikely to
significantly increase performance, and indeed the numbers you
posted are typical of performance collapse one usually sees due
to some kind of contention bottleneck.

Some things to consider: are you creating a new network
connection for each incoming request? It's possible that that's
hitting a single listener, which is then trying to dispatch the
connection to an available worker, using some mechanism that is
slow on VMS. Is there a profiler available? If you can narrow
down where it's spending its time, that'd provide a huge clue.

- Dan C.
Arne Vajhøj
2024-09-27 16:41:03 UTC
Post by Dan Cross
Post by Arne Vajhøj
Post by Craig A. Berry
  The only thing I can think of that hasn't already been mentioned
is that Tomcat code is JIT-compiled, which is likely to be pretty good,
optimized code, whereas Apache is probably either cross-compiled or
native-compiled with an early enough field test compiler that there are
no optimizations.
That is a possible explanation.
But the difference in numbers are crazy big.
Apache getting a static text file with 2 bytes: 22 req/sec
Tomcat with Quercus and PHP getting data out of a MySQL database on
Windows and outputting HTML: over 200 req/sec
Tomcat using JSP (which get triple compiled) getting data out of a MySQL
database on Windows (with db connection pool) and outputting HTML: over
600 req/sec.
My gut feeling is that cross-compilation may contribute to but not
fully explain the difference.
Almost certainly not; this is an IO bound application, not CPU
bound.
With static content, yes.

With dynamic content and the volume Apache+mod_php delivers, yes.

With dynamic content and high volume, CPU can matter. Tomcat
and Quercus can do over 200 req/sec, but CPU utilization fluctuates
between 150% and 250% - with 4 VCPUs it is not CPU bound, but it
could have been with just 2 VCPUs.
Post by Dan Cross
My strong suspicion is that what you're seeing is the result of
a serious impedance mismatch between the multi-process model
Apache was written to use, and its realization using the event
signalling infrastructure on VMS.
Yes.

Or actually slightly worse.

Prefork MPM is the multi-process model used in Apache 1.x - it is still
around in Apache 2.x, but Apache 2.x on Linux uses the event or worker
MPM (which are a mix of processes and threads) and Apache 2.x on Windows
uses the winnt MPM (which is threads only).
Post by Dan Cross
Again, I would try to establish a baseline. Cut out the MPM
stuff as much as you can;
MPM is the core of the server.
Post by Dan Cross
ideally, see what kind of numbers you
can get fetching your text file from a single Apache process.
Simply adding more threads or worker processes is unlikely to
significantly increase performance, and indeed the numbers you
posted are typical of performance collapse one usually sees due
to some kind of contention bottleneck.
It increases but not enough.

MaxClients 1   -> 0.1 req/sec
MaxClients 150 -> 11 req/sec
MaxClients 300 -> 22 req/sec
Post by Dan Cross
Some things to consider: are you creating a new network
connection for each incoming request?
Yes. Having the load test program keep connections alive
would be misleading as real world clients would be on different
systems.
Post by Dan Cross
It's possible that that's
hitting a single listener, which is then trying to dispatch the
connection to an available worker,
That is the typical web server model.
Post by Dan Cross
using some mechanism that is
slow on VMS.
It is a good question how Apache on VMS is actually doing that.

All thread-based solutions (OSU, Tomcat etc.) just pass a
pointer/reference in memory to the thread. Easy.

Fork creates a process copy with the open socket. I am not quite
sure about the details of how it works, but it works.

If the model on VMS is:

---(HTTP)---parent---(IPC)---child

then it could explain being so slow.

I may have to read some of those bloody 3900 lines of code (in a
single file!).
Post by Dan Cross
Is there a profiler available? If you can narrow
down where it's spending its time, that'd provide a huge clue.
Or I take another path.

Arne
Dan Cross
2024-09-27 19:39:21 UTC
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
Post by Craig A. Berry
  The only thing I can think of that hasn't already been mentioned
is that Tomcat code is JIT-compiled, which is likely to be pretty good,
optimized code, whereas Apache is probably either cross-compiled or
native-compiled with an early enough field test compiler that there are
no optimizations.
That is a possible explanation.
But the difference in numbers are crazy big.
Apache getting a static text file with 2 bytes: 22 req/sec
Tomcat with Quercus and PHP getting data out of a MySQL database on
Windows and outputting HTML: over 200 req/sec
Tomcat using JSP (which get triple compiled) getting data out of a MySQL
database on Windows (with db connection pool) and outputting HTML: over
600 req/sec.
My gut feeling is that cross-compilation may contribute to but not
fully explain the difference.
Almost certainly not; this is an IO bound application, not CPU
bound.
With static content yes.
Correct. That's all you ought to be looking at until you
understand why that's slow.
Post by Arne Vajhøj
With dynamic content and the volume Apache+mod_php delivers yes.
Maybe, but without a profile you really don't know. But beyond
that, it is currently irrelevant. You see approximately the
same numbers with static and dynamic content; this heavily
implies that the dynamic content case is not related to the
present slow-down; including it now is premature, and likely
just masks what's _actually_ wrong.
Post by Arne Vajhøj
With dynamic content and high volume then CPU can matter. Tomcat
and Quercus can do over 200 req/sec, but CPU utilization fluctuate
between 150% and 250% - 4 VCPU used so not CPU bound, but could
have been if it had been just 2 VCPU.
See above. You know that there's a problem with Apache and
static content, but you don't know _what_ that problem is. Why
would you jump ahead of yourself worrying about things like that
until you actually understand what's going on?

In this case, concentrating on static content, CPU time consumed
by Apache itself due to poor optimization or something seems
like a low-probability root cause of the performance problems
you are seeing, as static file service like this is IO, not
compute, bound. Keep your eye on the ball.
Post by Arne Vajhøj
Post by Dan Cross
My strong suspicion is that what you're seeing is the result of
a serious impedance mismatch between the multi-process model
Apache was written to use, and its realization using the event
signalling infrastructure on VMS.
Yes.
Maybe. You really haven't done enough investigation to know, at
least going by what you've reported here.
Post by Arne Vajhøj
Or actually slightly worse.
Prefork MPM is the multi-process model used in Apache 1.x - it is still
around in Apache 2.x, but Apache 2.x on Linux use event or worker
MPM (that are a mix of processes and threads) and Apache 2.x on Windows
use winnt MPM (that is threads only).
Ok, sure. But as you posted earlier, Apache on VMS, as you're
using it, is using the MPM model, no?
Post by Arne Vajhøj
Post by Dan Cross
Again, I would try to establish a baseline. Cut out the MPM
stuff as much as you can;
MPM is the core of the server.
No, you misunderstand. Try to cut down on contention due to
coordination between multiple entities; you do this by
_lowering_ the number of things at play (processes, threads,
whatever). The architecture of the server is irrelevant in
this case; what _is_ relevant is minimizing concurrency in its
_configuration_. Does that make sense?
Post by Arne Vajhøj
Post by Dan Cross
ideally, see what kind of numbers you
can get fetching your text file from a single Apache process.
Simply adding more threads or worker processes is unlikely to
significantly increase performance, and indeed the numbers you
posted are typical of performance collapse one usually sees due
to some kind of contention bottleneck.
It increases but not enough.
1 -> 0.1 req/sec
150 -> 11 req/sec
300 -> 22 req/sec
Post by Dan Cross
Some things to consider: are you creating a new network
connection for each incoming request?
Yes. Having the load test program keep connections alive
would be misleading as real world clients would be on different
systems.
Again, you're getting ahead of yourself. Try simulating a
single client making multiple, repeated tests to a single
server, ideally reusing a single HTTP connection. This will
tell you whether the issue is with query processing _inside_
the server, or if it has something to do with handling new
connections for each request. If you use HTTP keep alives
and the number of QPS jumps up, you've narrowed down your
search space. If it doesn't, you've eliminated one more
variable, and again, you've cut down on your search space.

Does that make sense?
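
For example, a keep-alive client can be as simple as reusing a single
curl handle - a sketch in PHP, with a placeholder URL:

<?php
// Single client, single connection: reusing one curl handle lets
// libcurl keep the TCP connection open between requests.
$ch = curl_init("http://vmshost.example.com/nop.txt");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$n = 1000;
$t0 = microtime(true);
for ($i = 0; $i < $n; $i++) {
    curl_exec($ch);
}
$t1 = microtime(true);
curl_close($ch);
printf("%.1f req/sec over one kept-alive connection\n", $n / ($t1 - $t0));
?>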
Post by Arne Vajhøj
Post by Dan Cross
It's possible that that's
hitting a single listener, which is then trying to dispatch the
connection to an available worker,
That is the typical web server model.
No, it is _a_ common model, but not _the_ "typical" model. For
instance, many high-performance web solutions are built on an
asynchronous model, which effectively implement state machines
where state transitions yield callbacks that are distributed
across a collection of executor threads. There's no single
"worker" or dedicated handoff.

Moreover, there are many different _ways_ to implement the
"listener hands connection to worker" model, and it _may_ be
that the way that Apache on VMS is trying to do it is
inherently slow. We don't know, do we? But that's what we're
trying to figure out, and that's why I'm encouraging you to
start simply and build on what you can actually know from
observation, as opposed to faffing about making guesses.
Post by Arne Vajhøj
Post by Dan Cross
using some mechanism that is
slow on VMS.
It is a good question how Apache on VMS is actually doing that.
All thread based solutions (OSU, Tomcat etc.) just pass a
pointer/reference in memory to the thread. Easy.
Fork create a process copy with the open socket. I am not quite
sure about the details of how it works, but it works.
---(HTTP)---parent---(IPC)---child
then it could explain being so slow.
I may have to read some of those bloody 3900 lines of code (in a
single file!).
Precisely. And maybe run some more experiments.
Post by Arne Vajhøj
Post by Dan Cross
Is there a profiler available? If you can narrow
down where it's spending its time, that'd provide a huge clue.
Or I take another path.
This is a useful exercise either way; getting to the root cause
of a problem like this may teach you something you could apply
to other, similar, problems in the future.

- Dan C.
Dan Cross
2024-09-26 15:44:51 UTC
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
Perhaps a simpler question: what sort of throughput does Apache
on VMS give you if you just hit a simple static resource
repeatedly?
Now it becomes interesting.
nop.php also gives 11 req/sec.
And nop.txt also gives 11 req/sec.
So the arrow is definitely pointing towards Apache.
I should think so. Lesson #1: always verify your base
assumptions when investigating something like this.
Post by Arne Vajhøj
So either something to speed up Apache or switching to WASD or OSU.
Well, the question now becomes, "what makes Apache so slow?"
I would concentrate on your nop.txt test; I assume that's a
small (possibly empty) text file and as an example has the
fewest number of variables.
Do your logs give any indications of what might be going on?
For example, do the logs have host names in them, possibly
implying your stalling on reverse DNS lookups or something
similar?
Just logging IP address.
It must be Apache.
Apache on VMS is prefork MPM. Yuck.
MaxSpareServers 10 -> 50
MaxClients 150 -> 300
actually did improve performance - double from 11 to 22
req/sec.
But the system did not like further increases. And besides
these numbers are absurd high to handle a simulator doing requests
from just 20 threads.
But not sure what else I can change.
My guess is that communications overhead is slowing things down.
What happens if you set these super low, ideally so there's a
single process handling requests, then see what sort of QPS
numbers you get for your trivial text file.

- Dan C.
Arne Vajhøj
2024-09-27 16:06:09 UTC
Post by Dan Cross
Post by Arne Vajhøj
It must be Apache.
Apache on VMS is prefork MPM. Yuck.
MaxSpareServers 10 -> 50
MaxClients 150 -> 300
actually did improve performance - double from 11 to 22
req/sec.
But the system did not like further increases. And besides
these numbers are absurd high to handle a simulator doing requests
from just 20 threads.
But not sure what else I can change.
My guess is that communications overhead is slowing things down.
What happens if you set these super low, ideally so there's a
single process handling requests, then see what sort of QPS
numbers you get for your trivial text file.
I set it down to 1.

0.1 req/sec

Note that even if it had performed great, it would not
have been a solution, because the real thing (the PHP scripts)
has significant latency when interacting with the external database,
so parallelization is a must.

Arne
Chris Townley
2024-09-27 16:28:45 UTC
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
It must be Apache.
Apache on VMS is prefork MPM. Yuck.
MaxSpareServers 10 -> 50
MaxClients 150 -> 300
actually did improve performance - double from 11 to 22
req/sec.
But the system did not like further increases. And besides
these numbers are absurd high to handle a simulator doing requests
from just 20 threads.
But not sure what else I can change.
My guess is that communications overhead is slowing things down.
What happens if you set these super low, ideally so there's a
single process handling requests, then see what sort of QPS
numbers you get for your trivial text file.
I set it down to 1.
0.1 req/sec
Note that even if it had performed great then it would not
have been a solution because the real thing the PHP scripts
has significant latency when interacting with external database
so parallelization is a must.
Arne
I would have thought you would have done something with Java, then used
tomcat
--
Chris
Arne Vajhøj
2024-09-27 16:47:18 UTC
Post by Chris Townley
I would have thought you would have done something with Java, then used
tomcat
If I were to create a web app on VMS to be used for a specific
purpose, then I would probably choose something JVM based.

JSF (that I actually like!), Spring MVC or Grails.

(JSF would mean Tomcat, Spring MVC either Tomcat or Spring Boot and
Grails either Tomcat or standalone)

But even though I have a preference for everything Java,
I am also interested in other stuff.

And now I wanted to take a deeper dive in Apache and PHP -
especially those low performance numbers.

Arne
Dan Cross
2024-09-27 19:16:31 UTC
Permalink
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
It must be Apache.
Apache on VMS is prefork MPM. Yuck.
MaxSpareServers 10 -> 50
MaxClients 150 -> 300
actually did improve performance - double from 11 to 22
req/sec.
But the system did not like further increases. And besides
these numbers are absurd high to handle a simulator doing requests
from just 20 threads.
But not sure what else I can change.
My guess is that communications overhead is slowing things down.
What happens if you set these super low, ideally so there's a
single process handling requests, then see what sort of QPS
numbers you get for your trivial text file.
I set it down to 1.
0.1 req/sec
So a single request takes 10 seconds? Or you can only make one
request every 10 seconds, but the time taken to process that
request is relatively small?
Post by Arne Vajhøj
Note that even if it had performed great then it would not
have been a solution because the real thing the PHP scripts
has significant latency when interacting with external database
so parallelization is a must.
We're not at the point of discussing solutions. You still don't
understand what the actual problem is; we're trying to figure
that out right now.

Again, it's about understanding the baseline performance
characteristics first. Your goal right now ought to be to figure
out why requests for a simple static resource, like a text file,
are so slow; the point of trying something simple is to reduce
noise from confounding variables.

The fact that this is as slow as it is tells you something. Had
this performed better, that would tell you something as well,
but in this case, you know that there's some sort of basic
slowdown even in the simplest cases. If you can figure out why that
is, and address it, _then_ you move on to re-evaluating your
actual use case, and if necessary, investigate other slowdowns.
But right now, there's little point in doing that: you know you
see a non-linear slowdown as you increase threads (you _did_
notice that, right?).

- Dan C.
Arne Vajhøj
2024-09-27 19:27:14 UTC
Permalink
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
It must be Apache.
Apache on VMS is prefork MPM. Yuck.
MaxSpareServers 10 -> 50
MaxClients 150 -> 300
actually did improve performance - double from 11 to 22
req/sec.
But the system did not like further increases. And besides
these numbers are absurd high to handle a simulator doing requests
from just 20 threads.
But not sure what else I can change.
My guess is that communications overhead is slowing things down.
What happens if you set these super low, ideally so there's a
single process handling requests, then see what sort of QPS
numbers you get for your trivial text file.
I set it down to 1.
0.1 req/sec
So a single request takes 10 seconds? Or you can only make one
request every 10 seconds, but the time taken to process that
request is relatively small?
It is throughput.

N / time it takes to get response for N requests

With 20 threads in the client there will always be 20 outstanding
requests.

Arne
Dan Cross
2024-09-27 19:40:48 UTC
Permalink
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
It must be Apache.
Apache on VMS is prefork MPM. Yuck.
MaxSpareServers 10 -> 50
MaxClients 150 -> 300
actually did improve performance - double from 11 to 22
req/sec.
But the system did not like further increases. And besides
these numbers are absurd high to handle a simulator doing requests
from just 20 threads.
But not sure what else I can change.
My guess is that communications overhead is slowing things down.
What happens if you set these super low, ideally so there's a
single process handling requests, then see what sort of QPS
numbers you get for your trivial text file.
I set it down to 1.
0.1 req/sec
So a single request takes 10 seconds? Or you can only make one
request every 10 seconds, but the time taken to process that
request is relatively small?
It is throughput.
N / time it takes to get response for N requests
With 20 threads in client then there will always be 20 outstanding
requests.
How long does it take to serve a single request?

- Dan C.
Arne Vajhøj
2024-09-27 20:11:36 UTC
Permalink
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
It must be Apache.
Apache on VMS is prefork MPM. Yuck.
MaxSpareServers 10 -> 50
MaxClients 150 -> 300
actually did improve performance - double from 11 to 22
req/sec.
But the system did not like further increases. And besides
these numbers are absurd high to handle a simulator doing requests
from just 20 threads.
But not sure what else I can change.
My guess is that communications overhead is slowing things down.
What happens if you set these super low, ideally so there's a
single process handling requests, then see what sort of QPS
numbers you get for your trivial text file.
I set it down to 1.
0.1 req/sec
So a single request takes 10 seconds? Or you can only make one
request every 10 seconds, but the time taken to process that
request is relatively small?
It is throughput.
N / time it takes to get response for N requests
With 20 threads in client then there will always be 20 outstanding
requests.
How long does it take to serve a single request?
Based on the above information it should be about 200 seconds
(20 outstanding requests divided by 0.1 req/sec).

But it is actually more like 340 seconds. So apparently the 0.1
req/sec is rounded up a bit.

Arne
Dan Cross
2024-09-27 20:13:50 UTC
Permalink
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
It must be Apache.
Apache on VMS is prefork MPM. Yuck.
MaxSpareServers 10 -> 50
MaxClients 150 -> 300
actually did improve performance - double from 11 to 22
req/sec.
But the system did not like further increases. And besides
these numbers are absurd high to handle a simulator doing requests
from just 20 threads.
But not sure what else I can change.
My guess is that communications overhead is slowing things down.
What happens if you set these super low, ideally so there's a
single process handling requests, then see what sort of QPS
numbers you get for your trivial text file.
I set it down to 1.
0.1 req/sec
So a single request takes 10 seconds? Or you can only make one
request every 10 seconds, but the time taken to process that
request is relatively small?
It is throughput.
N / time it takes to get response for N requests
With 20 threads in client then there will always be 20 outstanding
requests.
How long does it take to serve a single request?
Based on the above information it should be 200 seconds.
But it is actually more like 340 seconds. So apparently the 0.1
req/sec is rounded up a bit.
Ok, just to clarify, you hit the web server with a single
request for a small static resource, while no other traffic was
hitting it, and that request took more than _five minutes_ to
complete?

- Dan C.
Arne Vajhøj
2024-09-27 20:22:02 UTC
Permalink
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
It must be Apache.
Apache on VMS is prefork MPM. Yuck.
MaxSpareServers 10 -> 50
MaxClients 150 -> 300
actually did improve performance - double from 11 to 22
req/sec.
But the system did not like further increases. And besides
these numbers are absurd high to handle a simulator doing requests
from just 20 threads.
But not sure what else I can change.
My guess is that communications overhead is slowing things down.
What happens if you set these super low, ideally so there's a
single process handling requests, then see what sort of QPS
numbers you get for your trivial text file.
I set it down to 1.
0.1 req/sec
So a single request takes 10 seconds? Or you can only make one
request every 10 seconds, but the time taken to process that
request is relatively small?
It is throughput.
N / time it takes to get response for N requests
With 20 threads in client then there will always be 20 outstanding
requests.
How long does it take to serve a single request?
Based on the above information it should be 200 seconds.
But it is actually more like 340 seconds. So apparently the 0.1
req/sec is rounded up a bit.
Ok, just to clarify, you hit the web server with a single
request for a small static resource, while no other traffic was
hitting it, and that request took more than _five minutes_ to
complete?
340 seconds is with 20 client threads.

With 1 client thread the time is 17 seconds.

As expected.

Arne
Dan Cross
2024-09-27 20:50:00 UTC
Permalink
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
It must be Apache.
Apache on VMS is prefork MPM. Yuck.
MaxSpareServers 10 -> 50
MaxClients 150 -> 300
actually did improve performance - double from 11 to 22
req/sec.
But the system did not like further increases. And besides
these numbers are absurd high to handle a simulator doing requests
from just 20 threads.
But not sure what else I can change.
My guess is that communications overhead is slowing things down.
What happens if you set these super low, ideally so there's a
single process handling requests, then see what sort of QPS
numbers you get for your trivial text file.
I set it down to 1.
0.1 req/sec
So a single request takes 10 seconds? Or you can only make one
request every 10 seconds, but the time taken to process that
request is relatively small?
It is throughput.
N / time it takes to get response for N requests
With 20 threads in client then there will always be 20 outstanding
requests.
How long does it take to serve a single request?
Based on the above information it should be 200 seconds.
But it is actually more like 340 seconds. So apparently the 0.1
req/sec is rounded up a bit.
Ok, just to clarify, you hit the web server with a single
request for a small static resource, while no other traffic was
hitting it, and that request took more than _five minutes_ to
complete?
340 seconds is with 20 client threads.
With 1 client thread time is 17 seconds.
So again, to clarify, the time to issue one request against an
otherwise idle server and retrieve a small amount of static data
in response to that request is 17 seconds?
Post by Arne Vajhøj
As expected.
It is always good to verify. I might add that there's no
relevant environment where that's a reasonable expectation.

- Dan C.
Arne Vajhøj
2024-09-27 21:10:05 UTC
Permalink
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
It must be Apache.
Apache on VMS is prefork MPM. Yuck.
MaxSpareServers 10 -> 50
MaxClients 150 -> 300
actually did improve performance - double from 11 to 22
req/sec.
But the system did not like further increases. And besides
these numbers are absurd high to handle a simulator doing requests
from just 20 threads.
But not sure what else I can change.
My guess is that communications overhead is slowing things down.
What happens if you set these super low, ideally so there's a
single process handling requests, then see what sort of QPS
numbers you get for your trivial text file.
I set it down to 1.
0.1 req/sec
So a single request takes 10 seconds? Or you can only make one
request every 10 seconds, but the time taken to process that
request is relatively small?
It is throughput.
N / time it takes to get response for N requests
With 20 threads in client then there will always be 20 outstanding
requests.
How long does it take to serve a single request?
Based on the above information it should be 200 seconds.
But it is actually more like 340 seconds. So apparently the 0.1
req/sec is rounded up a bit.
Ok, just to clarify, you hit the web server with a single
request for a small static resource, while no other traffic was
hitting it, and that request took more than _five minutes_ to
complete?
340 seconds is with 20 client threads.
With 1 client thread time is 17 seconds.
So again, to clarify, the time to issue one request against an
otherwise idle server and retrieve a small amount of static data
in response to that request is 17 seconds?
Yes.
Post by Dan Cross
Post by Arne Vajhøj
As expected.
It is always good to verify. I might add that there's no
relevant environment where that's a reasonable expectation.
I was referring to the math: 17 = 340 / 20.

There is nothing reasonable about those numbers.

Arne
Dan Cross
2024-09-27 21:18:19 UTC
Permalink
Post by Arne Vajhøj
[snip]
So again, to clarify, the time to issue one request against an
otherwise idle server and retrieve a small amount of static data
in response to that request is 17 seconds?
Yes.
Post by Arne Vajhøj
As expected.
It is always good to verify. I might add that there's no
relevant environment where that's a reasonable expectation.
I was referring the math 17 = 340 / 20.
There is nothing reasonable about those numbers.
Agreed. I reiterate my request from
https://comp.os.vms.narkive.com/uWy2ouua/apache-mod-php-performance#post50:
do those numbers change substantially if you send
multiple queries over a single connection?

- Dan C.
Lawrence D'Oliveiro
2024-09-27 23:16:20 UTC
Permalink
... PHP scripts has significant latency
when interacting with external database so parallelization is a must.
I haven’t noticed such latency (albeit mainly with intranet apps written
for an SME), but what I have noticed is that getting the data with fewer
queries is faster than getting the same data with more queries.
Arne Vajhøj
2024-09-27 23:34:07 UTC
Permalink
Post by Lawrence D'Oliveiro
... PHP scripts has significant latency
when interacting with external database so parallelization is a must.
I haven’t noticed such latency (albeit mainly with intranet apps written
for an SME),
Sending an SQL query from the web server to the database server over
the network, having the database server find the data, and sending the
data back to the web server over the network takes time. Milliseconds.
Post by Lawrence D'Oliveiro
but what I have noticed is getting the data with fewer
queries is faster than getting the same data with more queries.
total latency from queries = number of queries * average latency of one
query

It adds up.
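
As a sketch of that point (connection details, table and column names
are made up): with PDO, one IN-query pays the network round trip once,
while a loop of single-row queries pays it once per row.

    <?php
    // Illustrative only - fetch 50 rows from a hypothetical "items" table.
    $pdo = new PDO('mysql:host=dbhost;dbname=test', 'user', 'pass');
    $ids = range(1, 50);

    // 50 queries: 50 network round trips to the database server.
    $rows = [];
    foreach ($ids as $id) {
        $stmt = $pdo->prepare('SELECT id, name FROM items WHERE id = ?');
        $stmt->execute([$id]);
        $rows[] = $stmt->fetch(PDO::FETCH_ASSOC);
    }

    // 1 query: one round trip for the same data.
    $placeholders = implode(',', array_fill(0, count($ids), '?'));
    $stmt = $pdo->prepare("SELECT id, name FROM items WHERE id IN ($placeholders)");
    $stmt->execute($ids);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);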

Arne
Lawrence D'Oliveiro
2024-09-27 23:55:38 UTC
Permalink
Post by Arne Vajhøj
Sending SQL query from web server to database server over the network,
having database server find data and sending data from database server
to web server over the network takes time. Milliseconds.
You know we can use AF_UNIX sockets within the same machine, right?
Arne Vajhøj
2024-09-28 00:13:01 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
Sending SQL query from web server to database server over the network,
having database server find data and sending data from database server
to web server over the network takes time. Milliseconds.
You know we can use AF_UNIX sockets within the same machine, right?
If supported.

But in that case latency will be small. Microseconds.

But running the application and the database on the same system is
not an option if it is a high volume solution, or a high availability
solution with a load-sharing application tier and a failover database.

Arne
Lawrence D'Oliveiro
2024-09-28 01:54:11 UTC
Permalink
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
Sending SQL query from web server to database server over the network,
having database server find data and sending data from database server
to web server over the network takes time. Milliseconds.
You know we can use AF_UNIX sockets within the same machine, right?
If supported.
But in that case latency will be small. Microseconds.
Also loopback network connections should be similarly fast. It’s just that
Unix sockets allow peers to verify each other’s identity, to shortcut
authentication issues.
Post by Arne Vajhøj
But running application and database on same system is not an option if
it is a high volume solution ...
Of course it’s an option. Performance is a tradeoff between conflicting
system parameters.
Arne Vajhøj
2024-09-28 01:58:57 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
But running application and database on same system is not an option if
it is a high volume solution ...
Of course it’s an option. Performance is a tradeoff between conflicting
system parameters.
If volume requires sharding then ...

Arne
Lawrence D'Oliveiro
2024-09-28 02:01:56 UTC
Permalink
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
But running application and database on same system is not an option
if it is a high volume solution ...
Of course it’s an option. Performance is a tradeoff between conflicting
system parameters.
If volume requires sharding then ...
“Sharding” means “split across multiple physical persistent storage”.
Arne Vajhøj
2024-09-28 02:13:09 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
But running application and database on same system is not an option
if it is a high volume solution ...
Of course it’s an option. Performance is a tradeoff between conflicting
system parameters.
If volume requires sharding then ...
“Sharding” means “split across multiple physical persistent storage”.
It means that you have N active database servers, each with 1/N of the
data (possibly with replication to N or 2N passive database servers).

Arne
Lawrence D'Oliveiro
2024-09-28 02:19:38 UTC
Permalink
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
But running application and database on same system is not an option
if it is a high volume solution ...
Of course it’s an option. Performance is a tradeoff between
conflicting system parameters.
If volume requires sharding then ...
“Sharding” means “split across multiple physical persistent storage”.
It means that you have N active database servers each with 1/N of the
data (possible with replication to N or 2N passive database servers).
Quite unnecessary, given that the bottleneck is usually the latency
and bandwidth of the persistent storage, not the CPU.

Particularly since your network connections introduce latency and
bandwidth limitations of their own.

See what I mean about “tradeoffs”?
Arne Vajhøj
2024-09-28 02:42:10 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
If volume requires sharding then ...
“Sharding” means “split across multiple physical persistent storage”.
It means that you have N active database servers each with 1/N of the
data (possible with replication to N or 2N passive database servers).
Quite unnecessary, given that the bottleneck is the usually the latency
and bandwidth of the persistent storage, not the CPU.
I don't think Facebook could move their 1800 MySQL shards onto
a single server.
Post by Lawrence D'Oliveiro
Particularly since your network connections introduce latency and
bandwidth limitations of their own.
That is not the problem with shards. Applications are usually OK
with network latency.

The problem with shards is that not all data usage models fit
nicely with sharding.

If you need to get/update a row by primary key then sharding
works perfectly - you go to the right server and just do it.

If you need to get/update a number of rows and you don't
know which servers they are on, then it means querying all
servers, which creates both performance and consistency
problems.
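
As a minimal sketch of the primary-key case (shard count, DSNs and
table names are made up): the client picks the shard from the key and
talks to exactly one server.

    <?php
    // Illustrative only - 4 shards, routed by primary key modulo shard count.
    $shards = [
        'mysql:host=db0;dbname=app',
        'mysql:host=db1;dbname=app',
        'mysql:host=db2;dbname=app',
        'mysql:host=db3;dbname=app',
    ];

    $userId = 123456;
    $dsn = $shards[$userId % count($shards)];   // the shard that owns this key
    $pdo = new PDO($dsn, 'user', 'pass');
    $stmt = $pdo->prepare('SELECT * FROM users WHERE id = ?');
    $stmt->execute([$userId]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    // A query not keyed by id would have to be sent to all four servers
    // and the results merged - the problematic case described above.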

Arne
Lawrence D'Oliveiro
2024-09-28 05:09:18 UTC
Permalink
If you need to get/update a row by primary key then sharding works
perfect - you go to the right server and just do it.
If you need to get/update a number of rows and you don't know which
servers they are on, then it means querying all servers, which both
create performance and consistency problems.
Well, you were the one who brought up sharding, not me ...
Arne Vajhøj
2024-09-28 01:54:54 UTC
Permalink
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
Sending SQL query from web server to database server over the network,
having database server find data and sending data from database server
to web server over the network takes time. Milliseconds.
You know we can use AF_UNIX sockets within the same machine, right?
If supported.
But in that case latency will be small. Microseconds.
But running application and database on same system is not
an option if it is a high volume solution or a high availability
solution with load sharing application and failover database.
Note that besides Unix sockets, some databases also
support shared memory.

That includes both MS SQLserver and Oracle Rdb.

Oracle Rdb is interesting because it supports a load-sharing
database out of the box, which makes running application and
database on the same servers a bit easier.

In that regard Rdb is pretty cool!

Arne
Lawrence D'Oliveiro
2024-09-28 02:03:18 UTC
Permalink
Note that besides the Unix sockets then some databases also supports
shared memory.
That includes both MS SQLserver and Oracle Rdb.
I wonder what are the biggest-scale applications where these products have
been deployed?

Facebook, for example, has billions of active users. And Facebook uses
MySQL.
Arne Vajhøj
2024-09-28 02:18:39 UTC
Permalink
Post by Lawrence D'Oliveiro
Note that besides the Unix sockets then some databases also supports
shared memory.
That includes both MS SQLserver and Oracle Rdb.
I wonder what are the biggest-scale applications where these products have
been deployed?
Rdb probably not so big. Despite the coolness factor.

SQLServer is used at a few high volume places like MS's own Office 365
web and StackExchange.
Post by Lawrence D'Oliveiro
Facebook, for example, has billions of active users. And Facebook uses
MySQL.
FaceBook have a lot of MySQL/MariaDB servers. Sharded! :-)

They also use some PostgreSQL.

And for the really big data they use HBase.

Arne
Arne Vajhøj
2024-09-28 02:27:30 UTC
Permalink
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
Facebook, for example, has billions of active users. And Facebook uses
MySQL.
FaceBook have a lot of MySQL/MariaDB servers. Sharded! :-)
Also note that FaceBook are using a rather customized
version. RocksDB as storage engine not normal InnoDB.
And they have done some stuff to manage failover and
replication (Raft based??).

Arne
Lawrence D'Oliveiro
2024-09-28 05:07:49 UTC
Permalink
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
Facebook, for example, has billions of active users. And Facebook uses
MySQL.
FaceBook have a lot of MySQL/MariaDB servers. Sharded! :-)
Also note that FaceBook are using a rather customized version. RocksDB
as storage engine not normal InnoDB.
And they have done some stuff to manage failover and replication (Raft
based??).
And being Open Source, they can do that kind of thing.

They even developed their own PHP implementation, HHVM, which they have
open-sourced.
Arne Vajhøj
2024-09-28 13:12:53 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
Facebook, for example, has billions of active users. And Facebook uses
MySQL.
FaceBook have a lot of MySQL/MariaDB servers. Sharded! :-)
Also note that FaceBook are using a rather customized version. RocksDB
as storage engine not normal InnoDB.
And they have done some stuff to manage failover and replication (Raft
based??).
And being Open Source, they can do that kind of thing.
Yes.
Post by Lawrence D'Oliveiro
They even developed their own PHP implementation, HHVM, which they have
open-sourced.
Note that:

very old versions of HHVM: support PHP
old versions of HHVM: support PHP and Hack
recent versions of HHVM: support Hack

Hack being what Facebook thought PHP 7.x should have been.

Arne
Lawrence D'Oliveiro
2024-09-28 05:08:18 UTC
Permalink
Post by Arne Vajhøj
SQLServer is used at a few high volume places like MS own Office 365 web
and StackExchange.
I wonder if it was used as part of that London Stock Exchange system that
imploded so spectacularly?
Arne Vajhøj
2024-09-28 13:26:51 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
SQLServer is used at a few high volume places like MS own Office 365 web
and StackExchange.
I wonder if it was used as part of that London Stock Exchange system that
imploded so spectacularly?
The TradElect system used by the London Stock Exchange 2007-2011
(which was not a success) used SQLServer 2000 as its database.

Arne
Chris Townley
2024-09-28 15:21:43 UTC
Permalink
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
I wonder if it was used as part of that London Stock Exchange system that
imploded so spectacularly?
The TradElect system used by London Stock Exchange 2007-2011 (which was
not a success) used SQLServer 2000 as database.
Arne
and the NYSE outage last year was due to Berkshire Hathaway A shares
being valued too high!

Mind, they fixed it in hours
--
Chris
Lawrence D'Oliveiro
2024-09-28 21:33:49 UTC
Permalink
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
SQLServer is used at a few high volume places like MS own Office 365
web and StackExchange.
I wonder if it was used as part of that London Stock Exchange system
that imploded so spectacularly?
The TradElect system used by London Stock Exchange 2007-2011 (which was
not a success) used SQLServer 2000 as database.
Not exactly a recommendation for mission-critical use, is it?
Arne Vajhøj
2024-09-28 23:11:52 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
SQLServer is used at a few high volume places like MS own Office 365
web and StackExchange.
I wonder if it was used as part of that London Stock Exchange system
that imploded so spectacularly?
The TradElect system used by London Stock Exchange 2007-2011 (which was
not a success) used SQLServer 2000 as database.
Not exactly a recommendation for mission-critical use, is it?
The solution did not work well - it went down several times
when load was extraordinarily high.

But it has never been documented exactly what the problem was.

SQLServer vs another RDBMS (most likely Oracle DB or IBM DB2) does
not make my top 6 guesses. SQLServer was a rather mature
product at the time (partly due to its Sybase heritage).

My top 6 guesses would be:

1) Implementation team consisting of general consultants
without domain expertise doing a poor job implementing.
2) The choice of .NET, a relatively new technology at the time,
   for implementation - the version is not known, but
   the system went into production in 2007, so most
   likely the project was started with .NET 1.1 (2003) and
   not .NET 2.0 (2005) - new stuff and critical systems
   are not a good combo, because all bugs may not have been
   sorted out yet and people's understanding of the new stuff
   may be limited
3) Inadequate hardware to handle unexpectedly high peak load
4) The decision to use a GC'd language and an RDBMS for something
   that is, let us call it, soft real time
5) Other bad architectural decisions
6) Network issues - just because that was what LSE actually claimed

Arne
Craig A. Berry
2024-09-29 15:01:10 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
SQLServer is used at a few high volume places like MS own Office 365
web and StackExchange.
I wonder if it was used as part of that London Stock Exchange system
that imploded so spectacularly?
The TradElect system used by London Stock Exchange 2007-2011 (which was
not a success) used SQLServer 2000 as database.
Not exactly a recommendation for mission-critical use, is it?
Saying SQL Server is a poor choice for anything based on one story about
SQL 2000 is like saying you should never use a web browser because
Netscape 1.0 had its problems. In the last 20+ years, Microsoft has
made massive investments in improving SQL Server's performance and
scalability, has ported it to Linux, and has added boatloads of new
features. Whether it would be first choice for a trading system I don't
know, but as Arne said, they run quite a bit of their own mission
critical stuff on it.
Arne Vajhøj
2024-09-29 15:41:23 UTC
Permalink
Post by Craig A. Berry
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
SQLServer is used at a few high volume places like MS own Office 365
web and StackExchange.
I wonder if it was used as part of that London Stock Exchange system
that imploded so spectacularly?
The TradElect system used by London Stock Exchange 2007-2011 (which was
not a success) used SQLServer 2000 as database.
Not exactly a recommendation for mission-critical use, is it?
Saying SQL Server is a poor choice for anything based on one story about
SQL 2000 is like saying you should never use a web browser because
Netscape 1.0 had its problems.  In the last 20+ years, Microsoft has
made massive investments in improving SQL Server's performance and
scalability, has ported it to Linux, and has added boatloads of new
features.  Whether it would be first choice for a trading system I don't
know, but as Arne said, they run quite a bit of their own mission
critical stuff on it.
I consider SQLServer a very nice database. I love the ability to
write CLR SP and UDF.

But a Windows license + SQLServer license for a big box is
not cheap.

(and despite SQLServer being available for Linux, I believe Windows
is still by far the most common platform for it)

Arne
Lawrence D'Oliveiro
2024-09-29 23:08:47 UTC
Permalink
Post by Craig A. Berry
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
SQLServer is used at a few high volume places like MS own Office 365
web and StackExchange.
I wonder if it was used as part of that London Stock Exchange system
that imploded so spectacularly?
The TradElect system used by London Stock Exchange 2007-2011 (which
was not a success) used SQLServer 2000 as database.
Not exactly a recommendation for mission-critical use, is it?
Saying SQL Server is a poor choice for anything based on one story about
SQL 2000 is like saying you should never use a web browser because
Netscape 1.0 had its problems. In the last 20+ years, Microsoft has
made massive investments in improving SQL Server's performance and
scalability, has ported it to Linux, and has added boatloads of new
features.
And yet the whole Windows Server thing is very definitely in its sunset
years -- at least the on-prem version. Did Microsoft’s reputation ever
recover from that screwup, especially after it made such a big, noisy deal
about winning such an important contract against a Linux alternative in
the first place? The “Highly Reliable Times” ad campaign, and all that
self-serving bullshit?

What was George C Scott’s line from “Dr Strangelove”:

“I don’t think it’s quite fair to condemn a whole program because
of a single slipup, sir.”

Remember what the slipup was ...
Arne Vajhøj
2024-09-28 00:07:13 UTC
Permalink
Post by Arne Vajhøj
It must be Apache.
Apache on VMS is prefork MPM. Yuck.
MaxSpareServers 10 -> 50
MaxClients 150 -> 300
actually did improve performance - double from 11 to 22
req/sec.
But the system did not like further increases. And besides
these numbers are absurd high to handle a simulator doing requests
from just 20 threads.
But not sure what else I can change.
And we have a solution.

httpd.conf

KeepAlive On
->
KeepAlive Off

And numbers improve dramatically.

nop.txt 281 req/sec
nop.php 176 req/sec
real PHP no db con pool 94 req/sec
real PHP db con pool 103 req/sec

Numbers are not great, but acceptable.

It is a bug in the code.

Comment in httpd.conf say:

# KeepAlive: Whether or not to allow persistent connections (more than
# one request per connection). Set to "Off" to deactivate.

It does not say that it will reduce throughput to 1/10th if left on.

And note that keep alive was not needed for me, but it is needed in many
other scenarios:
- web pages with lots of graphics
- high volume server to server web services
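
For reference, the keep alive knobs in httpd.conf are just these three
directives (the values shown here are illustrative, not necessarily
what the VMS kit ships with):

    # Allow more than one request per connection
    KeepAlive On
    # Max requests served over one kept-alive connection
    MaxKeepAliveRequests 100
    # Seconds a worker waits for the next request on an idle connection
    KeepAliveTimeout 15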

Arne
Dan Cross
2024-09-28 00:38:18 UTC
Permalink
Post by Arne Vajhøj
Post by Arne Vajhøj
It must be Apache.
Apache on VMS is prefork MPM. Yuck.
MaxSpareServers 10 -> 50
MaxClients 150 -> 300
actually did improve performance - double from 11 to 22
req/sec.
But the system did not like further increases. And besides
these numbers are absurd high to handle a simulator doing requests
from just 20 threads.
But not sure what else I can change.
And we have a solution.
httpd.conf
KeepAlive On
->
KeepAlive Off
And numbers improve dramatically.
Hmm. You had already said that you were _Not_ using keep alives
because that would somehow mimic multiple machines querying
simultaneously.

This was, of course, the area of investigation I had suggested
to you previously to try and nail down the baseline. I question
whether this will impact your single query latency, however, or
whether this is masking it in your benchmark.
Post by Arne Vajhøj
nop.txt 281 req/sec
nop.php 176 req/sec
real PHP no db con pool 94 req/sec
real PHP db con pool 103 req/sec
Numbers are not great, but within acceptable.
What is your single query latency? Not calculated, but
actually measured.
Post by Arne Vajhøj
It is a bug in the code.
The evidence in hand is insufficient to make that claim.
Post by Arne Vajhøj
# KeepAlive: Whether or not to allow persistent connections (more than
# one request per connection). Set to "Off" to deactivate.
It does not say that it will reduce throughput to 1/10'th if on.
And note that keep alive was not needed for me, but it is needed in many
- web pages with lots of graphics
- high volume server to server web services
Actually, it's useful for any scenario in which you may send
several requests to the same server at roughly the same time,
such as an HTML document and separate CSS stylesheet, not just
graphics or "server to server web services".

- Dan C.
Arne Vajhøj
2024-09-28 01:11:15 UTC
Permalink
Post by Dan Cross
Post by Arne Vajhøj
Post by Arne Vajhøj
It must be Apache.
Apache on VMS is prefork MPM. Yuck.
MaxSpareServers 10 -> 50
MaxClients 150 -> 300
actually did improve performance - double from 11 to 22
req/sec.
But the system did not like further increases. And besides
these numbers are absurd high to handle a simulator doing requests
from just 20 threads.
But not sure what else I can change.
And we have a solution.
httpd.conf
KeepAlive On
->
KeepAlive Off
And numbers improve dramatically.
Hmm. You had already said that you were _Not_ using keep alives
because that would somehow mimic multiple machines querying
simultaneously.
That is correct.

The client was not using keep alive.

But the server was configured to support keep alive.

Turning that capability off on the server solved the performance
problem.

No changes on client.

No keep alive used before - no keep alive used after.

Just disabling the capability on the server.
Post by Dan Cross
This was, of course, the area of investigation I had suggested
to you previously to try and nail down the baseline. I question
whether this will impact your single query latency, however, or
whether this is masking it in your benchmark.
Post by Arne Vajhøj
nop.txt 281 req/sec
nop.php 176 req/sec
real PHP no db con pool 94 req/sec
real PHP db con pool 103 req/sec
Numbers are not great, but within acceptable.
What is your single query latency? Not calculated, but
actually measured.
It is a rather uninteresting number.

But easy to test. It obviously varies a bit, but the times
are all in the 50-100 millisecond range.
Post by Dan Cross
Post by Arne Vajhøj
It is a bug in the code.
The evidence in hand is insufficient to make that claim.
I believe that a server config supporting keep alive
causing performance to drop to 1/10th for clients
not using keep alive is a bug.

Arne
Lawrence D'Oliveiro
2024-09-28 01:55:12 UTC
Permalink
I believe that server config supporting keep alive causing performance
to drop to 1/10'th for clients not using keep alive is a bug.
I would too, since the very reason keepalive was introduced was to improve
performance.
Dave Froble
2024-09-28 19:51:59 UTC
Permalink
Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
Post by Arne Vajhøj
It must be Apache.
Apache on VMS is prefork MPM. Yuck.
MaxSpareServers 10 -> 50
MaxClients 150 -> 300
actually did improve performance - double from 11 to 22
req/sec.
But the system did not like further increases. And besides
these numbers are absurd high to handle a simulator doing requests
from just 20 threads.
But not sure what else I can change.
And we have a solution.
httpd.conf
KeepAlive On
->
KeepAlive Off
And numbers improve dramatically.
Hmm. You had already said that you were _Not_ using keep alives
because that would somehow mimic multiple machines querying
simultaneously.
That is correct.
The client was not using keep alive.
But the server was configured to support keep alive.
Turning that capability off on the server solved the performance
problem.
No changes on client.
No keep alive used before - no keep alive used after.
Just disabling the capability on the server.
Post by Dan Cross
This was, of course, the area of investigation I had suggested
to you previously to try and nail down the baseline. I question
whether this will impact your single query latency, however, or
whether this is masking it in your benchmark.
Post by Arne Vajhøj
nop.txt 281 req/sec
nop.php 176 req/sec
real PHP no db con pool 94 req/sec
real PHP db con pool 103 req/sec
Numbers are not great, but within acceptable.
What is your single query latency? Not calculated, but
actually measured.
It is a rather uninteresting number.
But easy to test. It obviously vary a bit, but they
are all in the 50-100 millisecond range.
Post by Dan Cross
Post by Arne Vajhøj
It is a bug in the code.
The evidence in hand is insufficient to make that claim.
I believe that server config supporting keep alive
causing performance to drop to 1/10'th for clients
not using keep alive is a bug.
Arne
Feature ...
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Arne Vajhøj
2024-09-29 14:59:20 UTC
Permalink
Post by Dan Cross
Post by Arne Vajhøj
And note that keep alive was not needed for me, but it is needed in many
- web pages with lots of graphics
- high volume server to server web services
Actually, it's useful for any scenario in which you may send
several requests to the same server at roughly the same time,
such as an HTML document and separate CSS stylesheet, not just
graphics or "server to server web services".
There is no difference in how graphics and CSS are handled,
so the benefits of reusing a connection are the same.

But there is a difference in the number of requests. CSS will typically
be cached by the browser. So the number of CSS requests will be a
fraction of the number of HTML requests, while pages with lots of
graphics will have many graphics requests per HTML request.

Arne
Arne Vajhøj
2024-09-28 14:52:46 UTC
Permalink
Post by Arne Vajhøj
Post by Arne Vajhøj
It must be Apache.
Apache on VMS is prefork MPM. Yuck.
MaxSpareServers 10 -> 50
MaxClients 150 -> 300
actually did improve performance - double from 11 to 22
req/sec.
But the system did not like further increases. And besides
these numbers are absurd high to handle a simulator doing requests
from just 20 threads.
But not sure what else I can change.
And we have a solution.
httpd.conf
KeepAlive On
->
KeepAlive Off
And numbers improve dramatically.
nop.txt 281 req/sec
nop.php 176 req/sec
real PHP no db con pool 94 req/sec
real PHP db con pool 103 req/sec
Numbers are not great, but within acceptable.
It is a bug in the code.
# KeepAlive: Whether or not to allow persistent connections (more than
# one request per connection). Set to "Off" to deactivate.
It does not say that it will reduce throughput to 1/10'th if on.
Note that the problem may not impact anyone in
the real world.

I am simulating thousands of independent users using keep alive
with a single simulator not using keep alive.

It could very well be the case that the problem only arises for
the simulator and not for the real users.

Still weird though.

Arne
Arne Vajhøj
2024-09-29 00:50:21 UTC
Permalink
Post by Arne Vajhøj
Post by Arne Vajhøj
And we have a solution.
httpd.conf
KeepAlive On
->
KeepAlive Off
And numbers improve dramatically.
nop.txt 281 req/sec
nop.php 176 req/sec
real PHP no db con pool 94 req/sec
real PHP db con pool 103 req/sec
Numbers are not great, but within acceptable.
It is a bug in the code.
# KeepAlive: Whether or not to allow persistent connections (more than
# one request per connection). Set to "Off" to deactivate.
It does not say that it will reduce throughput to 1/10'th if on.
Note that the problem may not impact anyone in
the real world.
I am simulating thousands of independent users using keep alive
with a single simulator not using keep alive.
It could very well be the case that the problem only arise for
the simulator and not for the real users.
Still weird though.
Another update.

Client side can also impact keep alive.

HTTP 1.0 : no problem
HTTP 1.1 with "Connection: close" header : no problem
HTTP 1.1 without "Connection: close" header : problem

Server side:

KeepAlive On -> Off

solves the problem. But it obviously has the drawback of losing
the keep alive capability.

Not a disaster. Back in the early 00's when prefork MPM was
common, KeepAlive Off was sometimes suggested for high
volume sites. But inconvenient.

With KeepAlive On then we have a performance problem.

The cause is that worker processes are unavailable while
waiting for the next request from a client, even though the
client is long gone.

That indicates that the cap is:

max throughput (req/sec) = MaxClients / KeepAliveTimeout

The formula holds for low resulting throughput but it does
not scale and seems to be more like 1/3 of that for higher
resulting throughput.

But if one wants keep alive enabled, then it is something one
can work with.

My experiments indicate that:

KeepAlive On
KeepAliveTimeout 15 -> 1
MaxSpareServers 50 -> 300
MaxClients 150 -> 300

is almost acceptable.

nop.txt : 100 req/sec

And 1 second should be more than enough for a browser to request
additional assets within a static HTML page.

But having hundreds of processes each using 25 MB for serving a 2 byte
file at such a low throughput is ridiculous.

OSU (or WASD) still seems like a better option.

Arne
Arne Vajhøj
2024-09-29 00:55:37 UTC
Permalink
Post by Arne Vajhøj
With KeepAlive On then we have a performance problem.
The cause is that worker processes are unavailable while
waiting for next request from client even though client is
long gone.
max throughput (req/sec) = MaxClients / KeepAliveTimeout
The formula holds for low resulting throughput but it does
not scale and seems to be more like 1/3 of that for higher
resulting throughput.
But if one wants keep alive enabled, then it is something one
can work with.
KeepAlive On
KeepAliveTimeout 15 -> 1
MaxSpareServers 50 -> 300
MaxClients 150 -> 300
is almost acceptable.
nop.txt : 100 req/sec
The MaxSpareServers increase is necessary to improve the numbers. It
seems like without it Apache spends too much time killing children
when they are not needed and starting them when needed (and process
creation is expensive on VMS).

The downside is obviously that all those processes are kept even
if the load is small for a long time.

Arne
Lawrence D'Oliveiro
2024-09-29 01:18:45 UTC
Permalink
Post by Arne Vajhøj
The cause is that worker processes are unavailable while
waiting for next request from client even though client is
long gone.
That shouldn’t matter, if the client closed the connection properly.

Also, why shouldn’t a worker handle a request for another client?
Arne Vajhøj
2024-09-29 01:31:22 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
The cause is that worker processes are unavailable while
waiting for next request from client even though client is
long gone.
That shouldn’t matter, if the client closed the connection properly.
It doesn't know that the client closed the connection.
Post by Lawrence D'Oliveiro
Also, why shouldn’t a worker handle a request for another client?
These are single-threaded, all-sync workers, so when they wait for a
new request from an existing client, they can't handle a new
client.

Arne
Lawrence D'Oliveiro
2024-09-29 01:41:31 UTC
Permalink
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
The cause is that worker processes are unavailable while waiting for
next request from client even though client is long gone.
That shouldn’t matter, if the client closed the connection properly.
It doesn't know the client closed connection.
That would only happen if the client crashed.

Note also that TLS connections require an explicit connection-closing
exchange at the end, to guard against data-truncation attacks.
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
Also, why shouldn’t a worker handle a request for another client?
This is singlethreaded all sync workers so when they wait for a new
request from an existing client, then they can't handle a new client.
Why not? The whole point of fork(2) is that all the processes are
effectively clones. If you put all the client context into shared memory
sections, then it becomes possible for any process to service any client.

Of course, I’m assuming that all the processes can share the same network
socket connections. This might not be true under VMS ...
Arne Vajhøj
2024-09-29 02:35:07 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
The cause is that worker processes are unavailable while waiting for
next request from client even though client is long gone.
That shouldn’t matter, if the client closed the connection properly.
It doesn't know the client closed connection.
That would only happen if the client crashed.
OK. Then it sounds like the client doesn't actually close the socket
but just stops using it and moves on to a new connection.

I wonder why the HttpClient library does not do an actual close. But
it may be the best simulation of a browser - the browser doesn't know
if the user will want to view another page at the same site, so it
probably doesn't close the socket.

I also wonder why the "Connection: close" header exists in requests if
the client could just close the socket. But that doesn't change how
things are.

Arne
Lawrence D'Oliveiro
2024-09-29 03:38:19 UTC
Permalink
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
It doesn't know the client closed connection.
That would only happen if the client crashed.
OK. Then it sounds like the client doesn't actually close the socket
but just stop using it and move on to a new connection.
So your load simulator is buggy?
Arne Vajhøj
2024-09-29 14:46:12 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
Post by Lawrence D'Oliveiro
Also, why shouldn’t a worker handle a request for another client?
This is singlethreaded all sync workers so when they wait for a new
request from an existing client, then they can't handle a new client.
Why not? The whole point of fork(2) is that all the processes are
effectively clones. If you put all the client context into shared memory
sections, then it becomes possible for any process to service any client.
Of course, I’m assuming that all the processes can share the same network
socket connections. This might not be true under VMS ...
That is not how Apache prefork MPM works.

It is not how the newer worker and event MPM's work either.

The end state sounds more like how single process multi-thread
works.

And I don't understand the "put all the client context into
shared memory" either. Are you saying that if socket descriptors
are put in shared memory then any process that map that memory
can use those sockets????

Arne
Lawrence D'Oliveiro
2024-09-29 22:57:11 UTC
Permalink
Post by Arne Vajhøj
That is not how Apache prefork MPM works.
If that’s not how it works, then how can you serve a client from more than
one member of the worker pool?
Post by Arne Vajhøj
And I don't understand the "put all the client context into shared
memory" either. Are you saying that if socket descriptors are put in
shared memory then any process that map that memory can use those
sockets????
No, but the shared-memory context can contain an index into a table of
socket descriptors in private per-process memory. If the process trying
to serve a client context does not actually have a socket descriptor in
the slot for that context, it can ask for one.
Arne Vajhøj
2024-09-29 23:16:48 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
That is not how Apache prefork MPM works.
If that’s not how it works, then how can you serve a client from more than
one member of the worker pool?
A new request on a new connection goes to another worker. No problem.
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
And I don't understand the "put all the client context into shared
memory" either. Are you saying that if socket descriptors are put in
shared memory then any process that map that memory can use those
sockets????
No, but the shared-memory context can contain an index into a table of
socket descriptors in private per-process memory. If the process trying to
server a client context does not actually have a socket descriptor in the
slot for that context, it can ask for one.
I still can't follow the idea.

Client X has a connection to server worker A. That means that A has
index 77 in shared memory that points to the socket descriptor for
the connection from X.

Worker B wants to serve X as well and it gets index 77 from shared
memory. And then it does what?

Arne

Arne Vajhøj
2024-09-27 15:02:26 UTC
Permalink
Post by Arne Vajhøj
nop.php also gives 11 req/sec.
And nop.txt also gives 11 req/sec.
So the arrow is definitely pointing towards Apache.
So either something to speed up Apache or switching to WASD or OSU.
Increasing spare servers made it possible to increase performance for
nop.txt from 11 to 22 req/sec.

But OSU (with no tuning at all) serves nop.txt at 373 req/sec. Which may
not be great, but is good enough for me.

Maybe I should try to get PHP working with OSU or WASD.

Arne
Lawrence D'Oliveiro
2024-09-24 21:28:39 UTC
Permalink
Post by Arne Vajhøj
I am not impressed by Apache + mod_php performance on VMS.
I recall that Apache offers a choice of worker processes or worker
threads. I suspect mod_php is not thread-safe, so you have to use multiple
worker processes. And process creation on VMS is expensive.

I wonder how Nginx deals with this: I don’t think it can load the Apache-
specific mod_php, so it offloads PHP to a separate process and uses
“reverse proxying” (actually server-side proxying) instead.

Server-side proxying is the way to go, anyway: I use it for my Python code
now. It lets you manage your own process context, create your own
subprocesses/threads/tasks ... whatever you want.

And I can use WebSockets as well, which PHP has trouble supporting.
Arne Vajhøj
2024-09-25 00:24:42 UTC
Permalink
Post by Lawrence D'Oliveiro
Post by Arne Vajhøj
I am not impressed by Apache + mod_php performance on VMS.
I recall that Apache offers a choice of worker processes or worker
threads. I suspect mod_php is not thread-safe, so you have to use multiple
worker processes. And process creation on VMS is expensive.
Apache on VMS starts multiple processes. But they should
be reusable (otherwise it would be CGI reimplemented). The
client test app uses 20 threads, so Apache should
only need to start 20 child processes and let them
process requests. So even though process creation indeed
is expensive on VMS, it should not kill performance
like what I see.

Arne
Simon Clubley
2024-09-25 12:25:20 UTC
Permalink
Post by Arne Vajhøj
I am not impressed by Apache + mod_php performance on VMS.
Is the PHP FPM option available to you on VMS ?

https://www.php.net/manual/en/install.fpm.php

If it is, be aware that .htaccess no longer works to control PHP when
it is in FPM mode and you need to use .user.ini files instead:

https://www.php.net/manual/en/configuration.file.per-user.php
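
For what it's worth, a .user.ini is just php.ini syntax dropped into the
script's directory; a minimal illustrative example (settings picked only
as examples) would be:

    ; .user.ini - per-directory PHP settings when running under FPM/CGI
    memory_limit = 256M
    upload_max_filesize = 32M
    display_errors = Off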

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Arne Vajhøj
2024-09-25 13:24:06 UTC
Permalink
Post by Simon Clubley
Post by Arne Vajhøj
I am not impressed by Apache + mod_php performance on VMS.
Is the PHP FPM option available to you on VMS ?
I don't see anything in either Apache or PHP to
indicate so.

Arne