Post by Arne Vajhøj
Post by Dan Cross
Post by Arne Vajhøj
Post by Craig A. Berry
The only thing I can think of that hasn't already been mentioned
is that Tomcat code is JIT-compiled, which is likely to be pretty good,
optimized code, whereas Apache is probably either cross-compiled or
native-compiled with an early enough field test compiler that there are
no optimizations.
That is a possible explanation.
But the difference in the numbers is crazy big.
Apache getting a static text file with 2 bytes: 22 req/sec
Tomcat with Quercus and PHP getting data out of a MySQL database on
Windows and outputting HTML: over 200 req/sec
Tomcat using JSP (which get triple compiled) getting data out of a MySQL
database on Windows (with db connection pool) and outputting HTML: over
600 req/sec.
My gut feeling is that cross-compilation may contribute to but not
fully explain the difference.
Almost certainly not; this is an IO bound application, not CPU
bound.
With static content yes.
Correct. That's all you ought to be looking at until you
understand why that's slow.
Post by Arne Vajhøj
With dynamic content and the volume Apache+mod_php delivers, yes.
Maybe, but without a profile you really don't know. But beyond
that, it is currently irrelevant. You see approximately the
same numbers with static and dynamic content; this heavily
implies that the dynamic content case is not related to the
present slow-down, including it now is premature, and likely
just masks what's _actually_ wrong.
Post by Arne Vajhøj
With dynamic content and high volume, CPU can matter. Tomcat
and Quercus can do over 200 req/sec, but CPU utilization fluctuates
between 150% and 250% - 4 VCPUs used, so not CPU bound, but could
have been if it had been just 2 VCPUs.
See above. You know that there's a problem with Apache and
static content, but you don't know _what_ that problem is. Why
would you jump ahead of yourself worrying about things like that
until you actually understand what's going on?
In this case, concentrating on static content, CPU time consumed
by Apache itself due to poor optimization or something seems
like a low-probability root cause of the performance problems
you are seeing, as static file service like this is IO, not
compute, bound. Keep your eye on the ball.
Post by Arne Vajhøj
Post by Dan Cross
My strong suspicion is that what you're seeing is the result of
a serious impedance mismatch between the multi-process model
Apache was written to use, and its realization using the event
signalling infrastructure on VMS.
Yes.
Maybe. You really haven't done enough investigation to know, at
least going by what you've reported here.
Post by Arne Vajhøj
Or actually slightly worse.
The prefork MPM is the multi-process model used in Apache 1.x - it is
still around in Apache 2.x, but Apache 2.x on Linux uses the event or
worker MPM (which are a mix of processes and threads) and Apache 2.x on
Windows uses the winnt MPM (which is threads only).
Ok, sure. But as you posted earlier, Apache on VMS, as you're
using it, is using the MPM model, no?
Post by Arne Vajhøj
Post by Dan Cross
Again, I would try to establish a baseline. Cut out the MPM
stuff as much as you can;
MPM is the core of the server.
No, you misunderstand. Try to cut down on contention due to
coordination between multiple entities; you do this by
_lowering_ the number of things at play (processes, threads,
whatever). The architecture of the server is irrelevant in
this case; what _is_ relevant is minimizing concurrency in its
_configuration_. Does that make sense?
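To make that concrete, here is one way such a minimal-concurrency baseline might look for the prefork MPM. These are standard Apache 2.x directives; whether the VMS port honors all of them is an assumption.

```apache
# Force a single worker process so any remaining slowness cannot be
# blamed on contention between workers. Raise these again only after
# the single-process number is understood.
StartServers            1
MinSpareServers         1
MaxSpareServers         1
MaxRequestWorkers       1
MaxConnectionsPerChild  0
```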
Post by Arne Vajhøj
Post by Dan Cross
ideally, see what kind of numbers you
can get fetching your text file from a single Apache process.
Simply adding more threads or worker processes is unlikely to
significantly increase performance, and indeed the numbers you
posted are typical of performance collapse one usually sees due
to some kind of contention bottleneck.
It increases but not enough.
1 -> 0.1 req/sec
150 -> 11 req/sec
300 -> 22 req/sec
Post by Dan Cross
Some things to consider: are you creating a new network
connection for each incoming request?
Yes. Having the load test program keep connections alive
would be misleading as real world clients would be on different
systems.
Again, you're getting ahead of yourself. Try simulating a
single client making multiple, repeated tests to a single
server, ideally reusing a single HTTP connection. This will
tell you whether the issue is with query processing _inside_
the server, or if it has something to do with handling new
connections for each request. If you use HTTP keep alives
and the number of QPS jumps up, you've narrowed down your
search space. If it doesn't, you've eliminated one more
variable, and again, you've cut down on your search space.
Does that make sense?
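The experiment described above can be sketched as follows. This is a stand-alone Python sketch, not the actual load-test program from the thread: it spins up a throwaway local HTTP server and times N requests over fresh connections versus N requests over one kept-alive connection. The server, port, and request count are all illustrative.

```python
import http.client
import http.server
import threading
import time

class Handler(http.server.SimpleHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # HTTP/1.1 so keep-alive actually works
    def log_message(self, *args):   # silence per-request logging
        pass

# Throwaway local server so the comparison is self-contained.
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
N = 50

def fresh_connections():
    # One TCP connection per request: what the original load test did.
    statuses = []
    for _ in range(N):
        conn = http.client.HTTPConnection("127.0.0.1", port)
        conn.request("GET", "/")
        resp = conn.getresponse()
        resp.read()
        statuses.append(resp.status)
        conn.close()
    return statuses

def keep_alive():
    # One reused connection: isolates request processing inside the
    # server from connection setup/teardown cost.
    statuses = []
    conn = http.client.HTTPConnection("127.0.0.1", port)
    for _ in range(N):
        conn.request("GET", "/")
        resp = conn.getresponse()
        resp.read()
        statuses.append(resp.status)
    conn.close()
    return statuses

t0 = time.perf_counter(); fresh = fresh_connections()
t1 = time.perf_counter(); kept = keep_alive()
t2 = time.perf_counter()
print(f"fresh: {t1 - t0:.3f}s  keep-alive: {t2 - t1:.3f}s")
server.shutdown()
```

If the keep-alive run is dramatically faster against the real server, connection handling is implicated; if the two are comparable, per-request processing is.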
Post by Arne Vajhøj
Post by Dan Cross
It's possible that that's
hitting a single listener, which is then trying to dispatch the
connection to an available worker,
That is the typical web server model.
No, it is _a_ common model, but not _the_ "typical" model. For
instance, many high-performance web solutions are built on an
asynchronous model, which effectively implements state machines
where state transitions yield callbacks that are distributed
across a collection of executor threads. There's no single
"worker" or dedicated handoff.
Moreover, there are many different _ways_ to implement the
"listener hands connection to worker" model, and it _may_ be
that the way that Apache on VMS is trying to do it is
inherently slow. We don't know, do we? But that's what we're
trying to figure out, and that's why I'm encouraging you to
start simply and build on what you can actually know from
observation, as opposed to faffing about making guesses.
Post by Arne Vajhøj
Post by Dan Cross
using some mechanism that is
slow on VMS.
It is a good question how Apache on VMS is actually doing that.
All thread based solutions (OSU, Tomcat etc.) just pass a
pointer/reference in memory to the thread. Easy.
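In a thread-based server, "easy" really is the whole story: the accepted socket is just an in-memory object, so the listener-to-worker handoff is a single queue operation. A hypothetical sketch (the queue, worker, and one-byte protocol are all illustrative):

```python
import queue
import socket
import threading

work = queue.Queue()

def worker():
    # Worker thread: pull accepted connections off the queue and serve them.
    while True:
        conn = work.get()
        if conn is None:
            break
        conn.sendall(b"hello\n")   # trivial stand-in for request handling
        conn.close()

def serve_one(listener):
    conn, _ = listener.accept()
    work.put(conn)                 # the whole handoff: a pointer in memory

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen()
port = listener.getsockname()[1]

threading.Thread(target=worker, daemon=True).start()
threading.Thread(target=serve_one, args=(listener,), daemon=True).start()

client = socket.create_connection(("127.0.0.1", port))
reply = client.makefile("rb").readline()
print(reply)
client.close()
work.put(None)
listener.close()
```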
Fork creates a process copy with the open socket. I am not quite
sure about the details of how it works, but it works.
But if Apache on VMS instead has the parent relay the data
---(HTTP)---parent---(IPC)---child
then it could explain being so slow.
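For contrast, on POSIX systems the open descriptor itself can be handed to another process over a Unix-domain socket (SCM_RIGHTS; exposed in Python 3.9+ as socket.send_fds/recv_fds), after which the child talks to the client directly, with no parent in the data path. A sketch, using a socketpair to stand in for both the client connection and the parent/child IPC channel:

```python
import socket

# IPC channel between "parent" and "child" (in real life, two processes).
parent, child = socket.socketpair()

# Something worth passing: one end of a connected socket pair, with
# pending "client" data already written to the other end.
a, b = socket.socketpair()
a.sendall(b"request bytes")

# "Parent" sends the descriptor for b over the IPC channel.
socket.send_fds(parent, [b"fd"], [b.fileno()])

# "Child" receives a duplicated descriptor and reads the client data
# directly -- the parent never touches the request bytes again.
msg, fds, flags, addr = socket.recv_fds(child, 16, 1)
inherited = socket.socket(fileno=fds[0])
data = inherited.recv(64)
print(data)

for s in (parent, child, a, b, inherited):
    s.close()
```

If the VMS port cannot pass descriptors this way and falls back to relaying every byte through the parent, that is two extra copies (and two extra context switches) per request.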
I may have to read some of those bloody 3900 lines of code (in a
single file!).
Precisely. And maybe run some more experiments.
Post by Arne Vajhøj
Post by Dan Cross
Is there a profiler available? If you can narrow
down where it's spending its time, that'd provide a huge clue.
Or I take another path.
This is a useful exercise either way; getting to the root cause
of a problem like this may teach you something you could apply
to other, similar, problems in the future.
- Dan C.