[InstCombine] Fix name clashes in check lines (NFC)
These used both lower and upper case variants of the same name,
resulting in malformed check lines when regenerated.
[InstCombine] Thwart complexity-based canonicalization (NFC)
These tests did not test what they were supposed to. The transform
fails to actually handle the commuted cases.
[AMDGPU] Fix GFX12 encoding of s_wait_event export_ready (#89622)
As well as flipping the sense of the bit, GFX12 moved it from bit 0 to
bit 1 in the encoded simm16 operand.
(cherry picked from commit e0a763c490d8ef58dca867e0ef834978ccf8e17d)
[SelectionDAG] Mark frame index as "aliased" at argument copy elison (#89712)
This is a fix for miscompiles reported in
https://github.com/llvm/llvm-project/issues/89060
After argument copy elison the IR value for the eliminated alloca
is aliasing with the fixed stack object. This patch is making sure
that we mark the fixed stack object as being aliased with IR values
to avoid that for example schedulers are reordering accesses to
the fixed stack object. This could otherwise happen when there is a
mix of MemOperands refering the shared fixed stack slow via both
the IR value for the elided alloca, and via a fixed stack pseudo
source value (as would be the case when lowering the arguments).
(cherry picked from commit d8b253be56b3e9073b3e59123cf2da0bcde20c63)
[X86][EVEX512] Add `HasEVEX512` when `NoVLX` used for 512-bit patterns (#91106)
With KNL/KNC being deprecated, we don't need to care about such no VLX
cases anymore. We may remove such patterns in the future.
Fixes #90844
(cherry picked from commit 7963d9a2b3c20561278a85b19e156e013231342c)
[AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12 (#90595)
Code to determine if a waitcnt is required before a barrier instruction
only
considered S_BARRIER.
gfx12 adds barrier_signal/wait so need to enhance the existing code to
look for
a barrier start (which is just an S_BARRIER for earlier architectures).
[Workflows] Re-write release-binaries workflow (#89521)
This updates the release-binaries workflow so that the different build
stages are split across multiple jobs. This saves money by reducing the
time spent on the larger github runners and also makes it easier to
debug, because now it's possible to build a smaller release package
(with clang and lld) using only the free GitHub runners.
The workflow no longer uses the test-release.sh script but instead uses
the Release.cmake cache. This gives the workflow more flexibility and
ensures that the binary package will always be created even if the tests
fail.
This idea to split the stages comes from the "LLVM Precommit CI through
Github Actions" RFC:
https://discourse.llvm.org/t/rfc-llvm-precommit-ci-through-github-actions/76456
(cherry picked from commit abac98479b81cc0cc717bb6cdbae6f774e3b0232)
[CMake][Release] Add stage2-package target (#89517)
This target will be used to generate the release binary package for
uploading to GitHub.
(cherry picked from commit a38f201f1ec70c2b1f3cf46e7f291c53bb16753e)
[Github] Add repository checks to release-binaries workflow (#84437)
This patch adds repository checks to the release-binaries workflow jobs.
People were observing that the job was running on a schedule in their
forks. This only happens on old forks, but those probably exist in great
number given how prolific LLVM is. This is also good practice anyways,
on top of solving the direct problem of these jobs running with the cron
schedule on people's forks.
(cherry picked from commit 9f5be5f0092a636274953389cd5771c45ac0a568)
[CMake][Release] Enable CMAKE_POSITION_INDEPENDENT_CODE (#90139)
Set this in the cache file directly instead of via the test-release.sh
script so that the release builds can be reproduced with just the cache
file.
(cherry picked from commit 53ff002c6f7ec64a75ab0990b1314cc6b4bb67cf)
[CMake][Release] Refactor cache file and use two stages for non-PGO builds (#89812)
Completely refactor the cache file to simplify it and remove unnecessary
variables. The main functional change here is that the non-PGO builds
now use two stages, so `ninja -C build stage2-package` can be used with
both PGO and non-PGO builds.
(cherry picked from commit 6473fbf2d68c8486d168f29afc35d3e8a6fabe69)
[llvm][MachO] Fix integer truncation in rebase/bind parsing (#89337)
`Count` and `Skip` should use `uint64_t` as they are encoded/decoded
using 64-bit ULEB128.
In `*_OPCODE_DO_*_ULEB_TIMES_SKIPPING_ULEB`, `Skip` could be encoded as
a two's complement for moving `SegmentOffset` backwards. Having a 32-bit
`Skip` truncates the encoded value and leads to a malformed
`AdvanceAmount`
and invalid `SegmentOffset` that extends past valid sections.
[PowerPC] Tune AIX shared library TLS model at function level (#84132)
Under some circumstance (library loaded with the main program), TLS
initial-exec model can be applied to local-dynamic access(es). We
could use some simple heuristic to decide the update at function level:
* If there is equal or less than a number of TLS local-dynamic access(es)
in the function, use TLS initial-exec model. (the threshold which default to
1 is controlled by hidden option)
[analyzer] MallocChecker: Recognize std::atomics in smart pointer suppression. (#90918)
Fixes #90498.
Same as 5337efc69cdd5 for atomic builtins, but for `std::atomic` this
time. This is useful because even though the actual builtin atomic is
still there, it may be buried beyond the inlining depth limit.
Also add one popular custom smart pointer class name to the name-based
heuristics, which isn't necessary to fix the bug but arguably a good
idea regardless.
[BOLT] Add test case for PIC fixed indirect jump (#91547)
A compiler can generate a redundant indirection for a jump via a fixed
jump table target. Add a test case that covers such pattern that covers
PIC case. We already have non-PIC case detection.
Currently XFAIL.
[RISCV] Don't use std::vector<std::string> for split extensions in RISCVISAInfo::parseArchString. NFC (#91538)
We can use a SmallVector<StringRef>.
Adjust the code so we check for empty strings in the loop instead of
making a copy of the vector returned from StringRef::split.
This overlaps with #91532 which also removed the std::vector, but
that PR may be more controversial.
[flang] Lowering changes for assigning dummy_scope to hlfir.declare. (#90989)
The lowering produces fir.dummy_scope operation if the current
function has dummy arguments. Each hlfir.declare generated
for a dummy argument is then using the result of fir.dummy_scope
as its dummy_scope operand. This is only done for HLFIR.
I was not able to find a reliable way to identify dummy symbols
in `genDeclareSymbol`, so I added a set of registered dummy symbols
that is alive during the variables instantiation for the current
function. The set is initialized during the mapping of the dummy
argument symbols to their MLIR values. It is reset right after
all variables are instantiated - this is done to avoid generating
hlfir.declare operations with dummy_scope for the clones of
the dummy symbols (e.g. this happens with OpenMP privatization).
If this can be done in a cleaner way, please advise.