Discussion:
Itanium support is back in GCC 15
Simon Clubley
2024-11-04 18:26:20 UTC
Permalink
Itanium support will no longer be removed from GCC and Itanium will
instead continue as a supported architecture (at least for Linux).

https://www.theregister.com/2024/11/01/gcc_15_keep_itanium_support/

There's a call in that article for an open source full-system emulator.
Good luck with that one, especially for one that would run VMS as well. :-)

One question: Why ? :-)

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Arne Vajhøj
2024-11-04 20:16:09 UTC
Permalink
Post by Simon Clubley
Itanium support will no longer be removed from GCC and Itanium will
instead continue as a supported architecture (at least for Linux).
https://www.theregister.com/2024/11/01/gcc_15_keep_itanium_support/
There's a call in that article for an open source full-system emulator.
Good luck with that one, especially for one that would run VMS as well. :-)
One question: Why ? :-)
Regarding why: it seems obvious that there is no
good commercial reason for GCC to support Itanium, but
apparently someone is willing to do the work just for fun.

And in the open source world if someone is willing
to do the work for fun then it (usually) does happen.

And Itanium is rather different from most other
architectures, so from an academic perspective it
may be interesting.

I wish someone would volunteer to create VMS support
in GCC 16 or whatever!

Arne
gcalliet
2024-11-07 17:33:57 UTC
Permalink
Post by Arne Vajhøj
Post by Simon Clubley
Itanium support will no longer be removed from GCC and Itanium will
instead continue as a supported architecture (at least for Linux).
https://www.theregister.com/2024/11/01/gcc_15_keep_itanium_support/
There's a call in that article for an open source full-system emulator.
Good luck with that one, especially for one that would run VMS as well. :-)
One question: Why ? :-)
Regarding why, then it seems obvious that there are no
good commercial reason for GCC to support Itanium, but
apparently someone is willing to do the work just for fun.
And in the open source world if someone is willing
to do the work for fun then it (usually) does happen.
And Itanium is rather different from most other
architectures, so from an academic perspective it
may be interesting.
I wish someone would volunteer to create VMS support
in GCC 16 or whatever!
Arne
Because I created (via a Canadian cross build) GNAT Ada (on GCC) for VMS Itanium,
and because we were on GCC 4.7, there is some work ahead, but why not :)

The big issue is the step to GCC 5, where they switched the compiler
sources to C++. It is one of the reasons why AdaCore didn't continue
support of GNAT Ada on VMS in 2015.

I'd like to know who likes Itanium so much :)

gcalliet
Arne Vajhøj
2024-11-07 21:48:49 UTC
Permalink
Post by gcalliet
Post by Arne Vajhøj
I wish someone would volunteer to create VMS support
in GCC 16 or whatever!
Because I created (canadian method) Gnat Ada (on gcc) for VMS Itanium,
and because we were on gcc 4.7, there is some work ahead, but why not :)
The big issue is the step to gcc 5, where they upgraded to c++ mode. It
is one of the reasons why Adacore didn't continue support of gnat ada on
VMS in 2015.
VMS x86-64 has a better C++ compiler than VMS Itanium.

But I have no idea which is best for bootstrapping:

g++/Linux -> GXX/VMS

clang/VMS -> GXX/VMS

I assume that if a recent GXX/VMS is working then getting
GFortran and Gnat working would become a lot easier.

But obviously a lot of work. And I do not expect it to happen. Just
a thought given that someone wanted to support GCC/Itanium.

Arne
gcalliet
2024-11-08 08:59:50 UTC
Permalink
Post by Arne Vajhøj
But obviously a lot of work. And I do not expect it to happen. Just
a thought given that someone wanted to support GCC/Itanium.
You are right, a lot of work. But perhaps a lot of fun :)

About Itanium, who knows? I have heard about some specific uses of Itanium,
so perhaps a small Itanium business could exist at some point.

On my side, I have always thought that the failure of Itanium - they said
"Itanic" - was just the unfortunate meeting of the conservatism of
geeks and the inchoate laws of the market. Our hatred of Itanium
contributed to the long life of the very archaic x86, to which the very
wise Intel returned, for its greater good.

Just to initiate a great controversy :)

gcalliet
Arne Vajhøj
2024-11-08 14:04:05 UTC
Permalink
Post by gcalliet
About Itanium, who knows? I heard about some specific uses of Itanium.
So perhaps a very little business with Itanium could exist sometime.
On my side I have always thought the failure of Itanium - they said
Itanic - have been just the bad meeting between the conservatism of
geeks and the inchoate laws of the market. Our hatred of Itanium
contributed to the long life of the very archaic x86 to which the very
wise Intel returned, for its greater good.
VMS people never liked Itanium. We loved VAX and Alpha, we are OK with
x86-64, but Itanium was only bought because for almost 2 decades it
was the only option for a new VMS box.

Itanium never had a chance. But it was due to money.

The CPU cost structure (huge fixed costs for design and fab construction
vs. relatively small variable costs) means that only CPUs selling
in the hundreds of millions can compete cost-wise. So Itanium fell
behind in clock speed, number of cores and energy efficiency.

The EPIC concept translates to "leave the real work to the
compiler", and for that to succeed, huge investments in
compiler technology would have been needed - hundreds, maybe
thousands, of engineers working on compiler backends. That did not
happen - not at HP, not at Intel, not anywhere. So on VMS Itanium
the generated "bundles" have a huge percentage of NOPs.

Could Itanium design have worked out if by magic the necessary
money for CPU development and compiler backend development had
been there? That is an academic question with no practical
impact - it did not happen and it could never have happened.

But from a technical perspective I do see some
benefits in the Itanium design. CPUs have hit the GHz
cap - just doubling clock speed every generation
is not physically possible. x86-64 has worked around that
mostly by increasing the number of cores. 1->2->4->8->16->24->32 cores
worked pretty well, as both servers and desktop computers run
a lot of processes and/or threads in parallel. But 64, 128,
192 and 256 cores? If running a hypervisor and 10 VMs then all
good, but what if that is not the case? The Itanium bundles
offer a way to parallelize hardware usage for single
threads.

Modern x86-64 does a lot of advanced stuff under the hood to
do similar things. But it is limited by the instructions
and the memory model. With the same level of investment,
I believe Itanium would do better.

But it is all pretty pointless. It is like asking: what if the speed
of light were 20 MPH instead of 200,000 miles per second?

Arne
John Dallman
2024-11-08 22:17:00 UTC
Permalink
Post by gcalliet
About Itanium, who knows? I heard about some specific uses of
Itanium. So perhaps a very little business with Itanium could exist
sometime.
It can't last now. There is a finite supply of Itanium CPUs and no more
are being made.
Post by gcalliet
On my side I have always thought the failure of Itanium - they said
Itanic - have been just the bad meeting between the conservatism of
geeks and the inchoate laws of the market.
It also had fundamental technical flaws. The basic idea of EPIC, that a
compiler with plenty of time to plan can optimise advance loads from
memory well enough to make out-of-order execution unnecessary, is wrong.

It would be possible to do that in a single-core system with no processor
caches, a single-tasking operating system, and few interrupts going off.
In a multi-processor, multi-tasking system which is taking interrupts, it
is impossible to know in advance what data will be in which cache levels,
and hence to optimise memory access in advance.

John
Waldek Hebisch
2024-11-11 01:03:29 UTC
Permalink
Post by gcalliet
On my side I have always thought the failure of Itanium - they said
Itanic - have been just the bad meeting between the conservatism of
geeks and the inchoate laws of the market. Our hatred of Itanium
contributed to the long life of the very archaic x86 to which the very
wise Intel returned, for its greater good.
The failure of Itanic was extensively discussed in comp.arch. There
were fundamental issues: the EPIC concept required the compiler to
arrange code in a clever way to get good performance. Hand-coding small
examples suggested that it is possible to write fast code for
EPIC, but both when the Itanic project started and now, nobody knows
how to do this in a compiler. There is a related issue: when
Itanic started, it was not known how to get good instruction-level
parallelism on conventional architectures. But then branch predictors
happened, and Intel and AMD were able to get good ILP from x86
(the same could be done with many other architectures, but it is
incompatible with Itanic principles).

Besides the fundamental problems, there were several specific blunders.

Anyway, Itanic was late, expensive and had unimpressive performance.
Some people were waiting for it, but what was promised (top
performance) never appeared.
--
Waldek Hebisch
Scott Dorsey
2025-02-23 18:11:19 UTC
Permalink
Post by Arne Vajhøj
VMS x86-64 has a better C++ compiler than VMS Itanium.
It is MUCH harder to write an efficient VLIW compiler than an efficient
compiler for a traditional architecture. The need to keep as many
parts of the processor working at the same time for optimal performance
makes for a lot of added work by the compiler back end.

The whole idea of the VLIW system is that the compiler will be able to
optimize the code to gain parallelism of the units inside a single
processor. This is a very, very ingenious idea, but nobody has yet
been able to make a compiler that could do it well enough for it to be
a real win.

It is a very difficult job. A lot of work was put into it. The
available resources for that work have all evaporated now, gone
elsewhere to other better-performing projects. The chance of the
fundamental problems ever getting solved at this point is slim.
--scott
--
"C'est un Nagra. C'est suisse, et tres, tres precis."
John Dallman
2025-02-23 21:29:00 UTC
Permalink
Post by Scott Dorsey
The whole idea of the VLIW system is that the compiler will be able
to optimize the code to gain paralellism of units inside the single
processor. This is a very very ingenious idea but nobody has yet
been able to make a compiler that could do it well enough for it to
be a real win.
Sadly, the job is *impossible*.

The fundamental problem in optimisation for modern computers is the
slowness of main RAM, which isn't currently solvable at a reasonable cost.
We use caches to mitigate it.

Out-of-order execution addresses this problem by tracking the data
dependencies on memory and registers in real time and executing
instructions when their data is available. This has worked pretty well
for almost thirty years for x86 and the other architectures that are
still competing on performance.

Itanium/EPIC was an alternative to this. The management of data
dependencies wasn't to be done dynamically by hardware, but in advance by
the compiler. This requires the compiler to track what data is in cache
so that advance loads can be scheduled correctly to have data available
in time. Unfortunately, in a multi-core system with a multi-tasking
operating system, it's impossible to know in advance what data will be in
cache, because that depends on what else is running.

Other flaws of Itanium include the bulky instruction set, which needs
more memory bandwidth and larger caches than other architectures, and an
architectural misfeature which means floating-point advance loads that
are outstanding across subroutine calls can fail silently.

If anyone tries to re-use ideas from Itanium, they'd be well-advised to
keep quiet about where they got them. There is remaining prejudice
against it, which is well-justified.

John
Stephen Hoffman
2025-02-24 17:22:35 UTC
Permalink
Post by John Dallman
Post by Scott Dorsey
The whole idea of the VLIW system is that the compiler will be able to
optimize the code to gain paralellism of units inside the single
processor. This is a very very ingenious idea but nobody has yet been
able to make a compiler that could do it well enough for it to be a
real win.
Sadly, the job is *impossible*.
The fundamental problem in optimisation for modern computers is the
slowness of main RAM, which isn't currently solvable at a reasonable
cost. We use caches to mitigate it.
Out-of-order execution addresses this problem by tracking the data
dependencies on memory and registers in real time and executing
instructions when their data is available....
The Itanium compiler optimizer just doesn't (and can't) know enough
about the system memory state, yes. Among other (no pun intended)
issues.

The attempt to address that included providing run-time feedback into
the executables; providing post-link, post-execution tuning. (Caliper /
Atom / OM / etc.)

https://www.cs.tufts.edu/comp/150PAT/tools/caliper/wiess-rev-4.pdf

This Alpha versus IA-64 Itanium paper from 1999 describes the issues
with Itanium quite well too, for those interested:

https://web.archive.org/web/20010611202933/http://www.compaq.com/hpc/ref/ref_alpha_ia64.doc
--
Pure Personal Opinion | HoffmanLabs LLC
John Reagan
2025-02-24 19:00:44 UTC
Permalink
Post by Stephen Hoffman
Post by John Dallman
Post by Scott Dorsey
The whole idea of the VLIW system is that the compiler will be able
to optimize the code to gain paralellism of units inside the single
processor. This is a very very ingenious idea but nobody has yet been
able to make a compiler that could do it well enough for it to  be a
real win.
Sadly, the job is *impossible*.
The fundamental problem in optimisation for modern computers is the
slowness of main RAM, which isn't currently solvable at a reasonable
cost. We use caches to mitigate it.
Out-of-order execution addresses this problem by tracking the data
dependencies on memory and registers in real time and executing
instructions when their data is available....
The Itanium compiler optimizer just doesn't (and can't) know enough
about the system memory state, yes. Among other (no pun intended) issues.
The attempt to address that included providing run-time feedback into
the executables; providing post-link, post-execution tuning. (Caliper /
Atom / OM / etc.)
https://www.cs.tufts.edu/comp/150PAT/tools/caliper/wiess-rev-4.pdf
This Alpha versus IA-64 Itanium paper from 1999 describes the issues
https://web.archive.org/web/20010611202933/http://www.compaq.com/hpc/
ref/ref_alpha_ia64.doc
Clearly that old Alpha/IA64 comparison was written with an agenda.
There is no clear attribution in the document, but all the "we did" and
"we designed" clearly indicate authorship within the Alpha hardware group.

Some of their assumptions, like that it would be impossible to do
out-of-order execution on IA64, were wrong, since the last Itaniums
actually implemented OoO and existing images saw an immediate benefit.

They were comparing the Itanium of the day to what they thought Alpha
could someday do. The Itanium of the day was pretty bad compared to the
Alpha of the day (or of the next 2 years). And it is more than just the
architecture. It is the chip, the process, the interface chips, etc.

And yes, it was a challenge for compilers. The GEM implementation is a
good V1 but is lacking; GEM wasn't designed around such a hardware
model. I'm sure that with additional time/money/people, subsequent
versions would have been better. Of all the backends I've seen, the HP-UX
one is the best. During the Itanium port, I owned some of the COBOL RTL
routines for datatype conversion. We had C code, and the performance out
of GEM was horrible. We were considering our own assembly versions, but
I was directed to some of the HP-UX compiler folks. I gave them the C
code, and in a few weeks I had Itanium assembly code that I could not
recognize. It used all sorts of Itanium features. It was several times
faster (I'm thinking 10x, but I don't remember). That code is in the
COBOL RTL today. That was on those early Itaniums without OoO. How
good would the GEM code be on "modern" Itanium? Don't know. Never
tried. Doesn't matter.

As you say, cache is king. Intel doesn't price their chips based on
clock speed. They price them based on cache size.

I'll agree that Alpha was the better floating-point system. The weird
bundling rules in the Itanium architecture make it difficult for a
floating-point application.

Not to relitigate the argument (though that is what c.o.v. does best),
but it was clear to many that upper Digital management didn't want to hear
technical arguments about the decision. Asking your own
choir doesn't give you any information about a transformational change
in the underlying technology.
John Dallman
2025-02-24 21:27:00 UTC
Permalink
Post by Stephen Hoffman
The Itanium compiler optimizer just doesn't (and can't) know enough
about the system memory state, yes. Among other (no pun intended)
issues.
The attempt to address that included providing run-time feedback
into the executables; providing post-link, post-execution tuning.
(Caliper / Atom / OM / etc.)
"Attempt" is about right.

I did several years porting work to Itanium. I tried run-time feedback
zero times: doing the link of the instrumented build took over an hour,
up from about a minute, because it was doing all the code generation at
link time.

The claim was "you only do this for the build you'll ship." My response
was "The compiler is so immature that I'm reporting new bugs every week,
and you want me to give the compiler new and difficult challenges?"

I never heard of anyone who got anywhere with profile-guided optimisation
on Itanium. Have you?

John
John Reagan
2025-02-24 21:55:32 UTC
Permalink
Post by John Dallman
Post by Stephen Hoffman
The Itanium compiler optimizer just doesn't (and can't) know enough
about the system memory state, yes. Among other (no pun intended)
issues.
The attempt to address that included providing run-time feedback
into the executables; providing post-link, post-execution tuning.
(Caliper / Atom / OM / etc.)
"Attempt" is about right.
I did several years porting work to Itanium. I tried run-time feedback
zero times: doing the link of the instrumented build took over an hour,
up from about a minute, because it was doing all the code generation at
link time.
The claim was "you only do this for the build you'll ship." My response
was "The compiler is so immature that I'm reporting new bugs every week,
and you want me to give the compiler new and difficult challenges?"
I never heard of anyone who got anywhere with profile-guided optimisation
on Itanium. Have you?
John
Actually, the NonStop Itanium kernel is built using PGO. Apparently,
for their test workload of transactions, the savings were a measurable
"few" percent. [I think in the 3-5% range, but I'm not sure anymore]
John Dallman
2025-02-25 13:05:00 UTC
Permalink
Post by John Reagan
Actually, the NonStop Itanium kernel is built using PGO.
Apparently, for their test workload of transactions, the savings
was a measurable "few" percent. [I think in the 3-5% range but I'm
not sure anymore]
OK, I'll believe that.

John
Stephen Hoffman
2025-02-24 22:41:15 UTC
Permalink
Post by John Dallman
I never heard of anyone who got anywhere with profile-guided
optimisation on Itanium. Have you?
While there were whitepapers and related material, AFAIK the OM tools
were never released for OpenVMS.

Various developers I've chatted with were skeptical about supporting
and debugging post-link-optimized executables.

And then there was The Graph:
https://commons.wikimedia.org/wiki/File:Itanium_Sales_Forecasts_edit.svg
--
Pure Personal Opinion | HoffmanLabs LLC
Michael S
2025-02-24 17:42:39 UTC
Permalink
On Thu, 7 Nov 2024 16:48:49 -0500
Post by Arne Vajhøj
Post by gcalliet
Post by Arne Vajhøj
I wish someone would volunteer to create VMS support
in GCC 16 or whatever!
Because I created (canadian method) Gnat Ada (on gcc) for VMS
Itanium, and because we were on gcc 4.7, there is some work ahead,
but why not :)
The big issue is the step to gcc 5, where they upgraded to c++
mode. It is one of the reasons why Adacore didn't continue support
of gnat ada on VMS in 2015.
VMS x86-64 has a better C++ compiler than VMS Itanium.
According to the benchmarks that you posted here several months (a
year?) ago, the VMS x86-64 compilers are quite awful compared to the
x86-64 compilers available on Windows/Linux/BSD.
Do you want to say that VMS Itanium compilers are worse?
Post by Arne Vajhøj
g++/Linux -> GXX/VMS
clang/VMS -> GXX/VMS
I assume that if a recent GXX/VMS is working then getting
GFortran and Gnat working would become a lot easier.
But obviously a lot of work. And I do not expect it to happen. Just
a thought given that someone wanted to support GCC/Itanium.
Arne
Robert A. Brooks
2025-02-24 18:10:34 UTC
Permalink
Post by Michael S
On Thu, 7 Nov 2024 16:48:49 -0500
Post by Arne Vajhøj
Post by gcalliet
Post by Arne Vajhøj
I wish someone would volunteer to create VMS support
in GCC 16 or whatever!
Because I created (canadian method) Gnat Ada (on gcc) for VMS
Itanium, and because we were on gcc 4.7, there is some work ahead,
but why not :)
The big issue is the step to gcc 5, where they upgraded to c++
mode. It is one of the reasons why Adacore didn't continue support
of gnat ada on VMS in 2015.
VMS x86-64 has a better C++ compiler than VMS Itanium.
According to the benchmarks that you posted here several months (a
year?) ago, VMS x86-64 compilers are quite awful comparatively to
x86-64 compilers available on Windows/Linux/BSD.
Do you want to say that VMS Itanium compilers are worse?
Without knowing the compiler version, it's impossible to comment.
If they were cross-compilers, there was no optimization at all.
--
-- Rob
Simon Clubley
2025-02-24 18:57:55 UTC
Permalink
Post by Michael S
On Thu, 7 Nov 2024 16:48:49 -0500
Post by Arne Vajhøj
VMS x86-64 has a better C++ compiler than VMS Itanium.
According to the benchmarks that you posted here several months (a
year?) ago, VMS x86-64 compilers are quite awful comparatively to
x86-64 compilers available on Windows/Linux/BSD.
Given all the various bits of movement in multiple areas over the last
year or so, it might be time for those same tests to be run again against
current compiler versions.

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Arne Vajhøj
2025-02-24 20:11:07 UTC
Permalink
Post by Simon Clubley
Post by Michael S
On Thu, 7 Nov 2024 16:48:49 -0500
Post by Arne Vajhøj
VMS x86-64 has a better C++ compiler than VMS Itanium.
According to the benchmarks that you posted here several months (a
year?) ago, VMS x86-64 compilers are quite awful comparatively to
x86-64 compilers available on Windows/Linux/BSD.
Given all the various bits of movement in multiple areas over the last
year or so, it might be time for those same tests to be run again against
current compiler versions.
I have updated the VMS numbers with new compiler versions.

The traditional languages are still behind C++.

Arne
Simon Clubley
2025-02-26 18:32:38 UTC
Permalink
Post by Arne Vajhøj
Post by Simon Clubley
Post by Michael S
On Thu, 7 Nov 2024 16:48:49 -0500
Post by Arne Vajhøj
VMS x86-64 has a better C++ compiler than VMS Itanium.
According to the benchmarks that you posted here several months (a
year?) ago, VMS x86-64 compilers are quite awful comparatively to
x86-64 compilers available on Windows/Linux/BSD.
Given all the various bits of movement in multiple areas over the last
year or so, it might be time for those same tests to be run again against
current compiler versions.
I have updated the VMS numbers with new compiler versions.
The traditional languages are still behind C++.
Thanks Arne,

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Arne Vajhøj
2025-02-24 20:08:57 UTC
Permalink
Post by Michael S
On Thu, 7 Nov 2024 16:48:49 -0500
Post by Arne Vajhøj
Post by gcalliet
Post by Arne Vajhøj
I wish someone would volunteer to create VMS support
in GCC 16 or whatever!
Because I created (canadian method) Gnat Ada (on gcc) for VMS
Itanium, and because we were on gcc 4.7, there is some work ahead,
but why not :)
The big issue is the step to gcc 5, where they upgraded to c++
mode. It is one of the reasons why Adacore didn't continue support
of gnat ada on VMS in 2015.
VMS x86-64 has a better C++ compiler than VMS Itanium.
That comment was about C++ standard compliance, not performance.

C++ on VMS x86-64 is clang, which for the (older) clang version used
should mean C++14, while C++ on VMS Itanium is very, very old (like
C++98 old).
Post by Michael S
According to the benchmarks that you posted here several months (a
year?) ago, VMS x86-64 compilers are quite awful comparatively to
x86-64 compilers available on Windows/Linux/BSD.
Do you want to say that VMS Itanium compilers are worse?
I believe the conclusion was that the VMS x86-64 compilers, except C++,
were slower than C/C++ on other OSes and C++ on VMS.

My guess is that it is a combination of the GEM to LLVM translation
and a desire from VSI to be a little conservative (prioritizing
correctness over speed).

Arne
Michael S
2025-02-24 21:22:22 UTC
Permalink
On Mon, 24 Feb 2025 15:08:57 -0500
Post by Arne Vajhøj
Post by Michael S
On Thu, 7 Nov 2024 16:48:49 -0500
Post by Arne Vajhøj
Post by gcalliet
Post by Arne Vajhøj
I wish someone would volunteer to create VMS support
in GCC 16 or whatever!
Because I created (canadian method) Gnat Ada (on gcc) for VMS
Itanium, and because we were on gcc 4.7, there is some work ahead,
but why not :)
The big issue is the step to gcc 5, where they upgraded to c++
mode. It is one of the reasons why Adacore didn't continue support
of gnat ada on VMS in 2015.
VMS x86-64 has a better C++ compiler than VMS Itanium.
That comment was about C++ standard compliance not performance.
Ok
Post by Arne Vajhøj
C++ VMS x86-64 is clang which in the (older) clang version used
should mean C++14 while C++ VMS Itanium is very very old (like
C++ 98 old).
Post by Michael S
According to the benchmarks that you posted here several months (a
year?) ago, VMS x86-64 compilers are quite awful comparatively to
x86-64 compilers available on Windows/Linux/BSD.
Do you want to say that VMS Itanium compilers are worse?
I believe the conclusion was that the VMS x86-64 compilers except C++
was slower than C/C++ on other OS and C++ on VMS.
Somehow I got the impression that the C++ compilers were also significantly
slower than C++ compilers on other platforms.
Do I misremember?
Post by Arne Vajhøj
My guess is that it is a combination of the GEM to LLVM translation
and a desire from VSI to be a little conservative (prioritizing
correctness over speed).
Arne
Arne Vajhøj
2025-02-24 21:43:29 UTC
Permalink
Post by Michael S
On Mon, 24 Feb 2025 15:08:57 -0500
Post by Arne Vajhøj
C++ VMS x86-64 is clang which in the (older) clang version used
should mean C++14 while C++ VMS Itanium is very very old (like
C++ 98 old).
Post by Michael S
According to the benchmarks that you posted here several months (a
year?) ago, VMS x86-64 compilers are quite awful comparatively to
x86-64 compilers available on Windows/Linux/BSD.
Do you want to say that VMS Itanium compilers are worse?
I believe the conclusion was that the VMS x86-64 compilers except C++
was slower than C/C++ on other OS and C++ on VMS.
Somehow I got an impression that C++ compilers were also significantly
slower than C++ compilers on other platforms.
Do I misremember?
I don't even remember that I posted non-VMS numbers here. Age! :-)

But I just checked VMS C++ latest (CXX/OPT=LEVEL:5 and clang -O3) vs a
random Windows GCC 14.1 (g++ -O3):

VMS is a little faster for integer
they are about the same for floating point
Windows is a lot faster for string

And given that this is a micro-benchmark, in reality just an inner
loop evaluating a single expression, which means huge uncertainty,
I don't see this as proof of a significant difference.

Arne
John Reagan
2025-02-24 22:02:08 UTC
Permalink
Post by Arne Vajhøj
Post by Michael S
On Mon, 24 Feb 2025 15:08:57 -0500
Post by Arne Vajhøj
C++ VMS x86-64 is clang which in the (older) clang version used
should mean C++14 while C++ VMS Itanium is very very old (like
C++ 98 old).
Post by Michael S
According to the benchmarks that you posted here several months (a
year?) ago, VMS x86-64 compilers are quite awful comparatively to
x86-64 compilers available on Windows/Linux/BSD.
Do you want to say that VMS Itanium compilers are worse?
I believe the conclusion was that the VMS x86-64 compilers except C++
was slower than C/C++ on other OS and C++ on VMS.
Somehow I got an impression that C++ compilers were also significantly
slower than C++ compilers on other platforms.
Do I misremember?
I don't even remember that I posted non-VMS numbers here. Age! :-)
But I just checked VMS C++ latest (CXX/OPT=LEVEL:5 and clang -O3) vs a
VMS is a little faster for integer
they are about the same for floating point
Windows is a lot faster for string
And given that this is a micro-benchmark with in reality just an inner
loop evaluating a single expression, which means huge uncertainty, then
I don't see this as proof of a significant difference.
Arne
We are aware of the string/char performance issues.

On Alpha and Itanium, the low-level routines inside LIBOTS for things
like OTS$MOVE, string compare, memmove, etc. are all written in
hand-crafted assembly. For x86, we are still using a set of simple
BLISS reference code. Plus, the LIBOTS we all have on our
systems was compiled with a non-optimizing BLISS cross-compiler.

We are currently playing with natively compiled LIBOTS code and doing some
benchmarks. Besides the brain-dead BLISS code, we have versions that
loop over larger chunks of data, which are even faster. The fastest
we've seen so far is a native assembly version that uses the REP
instruction prefix on MOVSB. That version didn't check for
overlapping source/dest, however, so any real version gets a little
slower. I'm not sure when we can incorporate these, but I'm trying to
push them as soon as possible.

A fun reference to read is

https://cdrdv2-public.intel.com/814198/248966-Optimization-Reference-Manual-V1-050.pdf
Dan Cross
2025-02-25 21:35:46 UTC
Permalink
Post by John Reagan
Post by Arne Vajhøj
Post by Michael S
On Mon, 24 Feb 2025 15:08:57 -0500
Post by Arne Vajhøj
C++ VMS x86-64 is clang which in the (older) clang version used
should mean C++14 while C++ VMS Itanium is very very old (like
C++ 98 old).
Post by Michael S
According to the benchmarks that you posted here several months (a
year?) ago, VMS x86-64 compilers are quite awful comparatively to
x86-64 compilers available on Windows/Linux/BSD.
Do you want to say that VMS Itanium compilers are worse?
I believe the conclusion was that the VMS x86-64 compilers except C++
was slower than C/C++ on other OS and C++ on VMS.
Somehow I got an impression that C++ compilers were also significantly
slower than C++ compilers on other platforms.
Do I misremember?
I don't even remember that I posted non-VMS numbers here. Age! :-)
But I just checked VMS C++ latest (CXX/OPT=LEVEL:5 and clang -O3) vs a
VMS is a little faster for integer
they are about the same for floating point
Windows is a lot faster for string
And given that this is a micro-benchmark with in reality just an inner
loop evaluating a single expression, which means huge uncertainty, then
I don't see this as proof of a significant difference.
Arne
We are aware of the string/char performance issues.
On Alpha and Itanium, the lowlevel routines inside of LIBOTS for things
like OTS$MOVE, string compare, memmove, etc. are all written in
hand-crafted assembly. For x86, we are still using a set of BLISS
reference code that is simple. Plus the LIBOTS we all have on our
systems was compiled with a non-optimizing BLISS cross-compiler.
Hmm. It strikes me that LLVM has intrinsics for `memmove` that
would also work for OTS$MOVE3; I would think that that would be
most efficient, as for small moves, this could lower directly
to a couple of loads and/or stores?
Post by John Reagan
We are currently playing with native compiled LIBOTS code and doing some
benchmarks. Besides the brain-dead BLISS code, we have versions that
loop with larger chunks of data which are even faster. The fastest
we've seen so far is a native assembly version that uses the REP
instruction prefix on the MOVSB. That version didn't check for
overlapping source/dest however so any real version gets a little
slower. I'm not sure when we can incorporate these, but I'm trying to
push them as soon as possible.
Yeah, Intel made `REP MOVSB`/`REP STOSB` actually fast a few
uarchs ago. Good stuff, though startup overhead still dominates
below roughly 128 bytes, and having to muck with
the DF flag remains a bummer.
Post by John Reagan
A fun reference to read is
https://cdrdv2-public.intel.com/814198/248966-Optimization-Reference-Manual-V1-050.pdf
Agner Fog's optimization guides can also be a useful resource
for things like this: https://www.agner.org/optimize/

- Dan C.