LLVM/project 37c5faallvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/RISCV combined-loads-stored.ll

[𝘀𝗽𝗿] initial version

Created using spr 1.3.5
DeltaFile
+56-20llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+6-9llvm/test/Transforms/SLPVectorizer/RISCV/combined-loads-stored.ll
+62-292 files

LLVM/project 39e24bdllvm/lib/CodeGen MachineLICM.cpp, llvm/test/CodeGen/AMDGPU machinelicm-copy-like-instrs.mir global_atomics_i64_system.ll

MachineLICM: Allow hoisting REG_SEQUENCE (#90638)

DeltaFile
+134-0llvm/test/CodeGen/AMDGPU/machinelicm-copy-like-instrs.mir
+24-17llvm/lib/CodeGen/MachineLICM.cpp
+5-5llvm/test/CodeGen/AMDGPU/global_atomics_i64_system.ll
+2-2llvm/test/CodeGen/AMDGPU/optimize-negated-cond.ll
+1-0llvm/test/CodeGen/Hexagon/expand-vstorerw-undef.ll
+166-245 files

LLVM/project e83c6ddllvm/test/Transforms/SLPVectorizer/RISCV combined-loads-stored.ll

[SLP][NFC]Add a test with the non profitable masked gather loads.
DeltaFile
+35-0llvm/test/Transforms/SLPVectorizer/RISCV/combined-loads-stored.ll
+35-01 files

LLVM/project e22ce61clang/tools/clang-format ClangFormat.cpp, llvm/lib/ToolDrivers/llvm-lib LibDriver.cpp

[z/OS] treat text files as text files so auto-conversion is done (#90128)

To support auto-conversion on z/OS, text files need to be opened as text files. These changes fix a number of LIT failures caused by text files not being converted to the internal code page.

* Update a number of tools so they open text files as text files.
* Add support in cat.py for opening a text file as a text file (Windows continues to treat all files as binary so that newlines are handled correctly).
* Add env var definitions to the lit config file to enable auto-conversion.
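
The text-versus-binary distinction above can be sketched roughly as follows. This is a minimal illustration, not the actual cat.py change: the helper name `read_for_cat` and its `is_text` and `platform` parameters are hypothetical, and the sketch assumes that reading in text mode is what gives the platform a chance to convert the file.

```python
import os
import sys
import tempfile


def read_for_cat(path, is_text, platform=sys.platform):
    """Return the file contents as bytes.

    Hypothetical helper: `is_text` and `platform` exist only for this
    illustration and are not part of lit's real interface.
    """
    if is_text and not platform.startswith("win"):
        # Text mode: the runtime decodes the file on read. newline=""
        # keeps line endings exactly as they are stored.
        with open(path, "r", newline="") as f:
            return f.read().encode()
    # Binary mode: bytes are passed through exactly as stored.
    with open(path, "rb") as f:
        return f.read()


# Usage: write a small ASCII file and read it back both ways.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"line one\nline two\n")
text_bytes = read_for_cat(tmp.name, is_text=True, platform="linux")
raw_bytes = read_for_cat(tmp.name, is_text=False, platform="linux")
os.unlink(tmp.name)
```

For plain ASCII content on a system without auto-conversion the two paths return the same bytes; the difference only shows up where the platform converts text files on read.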
DeltaFile
+16-2llvm/utils/lit/lit/builtin_commands/cat.py
+6-3llvm/tools/llvm-cxxmap/llvm-cxxmap.cpp
+4-3clang/tools/clang-format/ClangFormat.cpp
+7-0llvm/utils/lit/lit/llvm/config.py
+2-1llvm/lib/ToolDrivers/llvm-lib/LibDriver.cpp
+1-1llvm/tools/yaml2obj/yaml2obj.cpp
+36-106 files

LLVM/project 78270cbllvm/lib/Analysis ValueTracking.cpp

[UndefOrPoison] [CompileTime] Avoid IDom walk unless required. NFC (#90092)

If the value is not boolean and we are checking for `Undef` or
`UndefOrPoison`, we can avoid the potentially expensive IDom walk.
    
This should improve compile time for isGuaranteedNotToBeUndefOrPoison
and isGuaranteedNotToBeUndef.
DeltaFile
+26-22llvm/lib/Analysis/ValueTracking.cpp
+26-221 files

LLVM/project f050660clang/include/clang/Basic DiagnosticParseKinds.td, clang/lib/Parse ParseOpenMP.cpp

[OpenMP][TR12] change property of map-type modifier. (#90499)

The map-type property changes to "default" instead of "ultimate" as in [OpenMP5.2].

This change allows map-type to be placed at any location within the map
modifiers, not only at the last location in the modifiers-list; map-type
can also be omitted.
DeltaFile
+59-46clang/test/OpenMP/target_map_messages.cpp
+58-0clang/test/OpenMP/target_ast_print.cpp
+40-4clang/lib/Parse/ParseOpenMP.cpp
+5-0clang/include/clang/Basic/DiagnosticParseKinds.td
+162-504 files

LLVM/project be5075aclang/lib/CodeGen CGCUDANV.cpp, clang/test/CodeGenCUDA kernel-stub-name.cu

[CUDA] make kernel stub ICF-proof (#90155)

The MSVC linker merges comdat functions that have identical sets of
instructions. CUDA uses the kernel stub function as the key to look up kernels
in device executables. If the kernel stub functions for different kernels are
merged by ICF, incorrect kernels will be launched.

To prevent ICF from merging kernel stub functions, a unique global
variable is created for each kernel stub function having comdat, and a
store is added to the kernel stub function. This makes the set of
instructions in each kernel stub function unique.
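
The effect can be illustrated with a toy model of identical code folding. This is only a sketch: the instruction strings, the `guard_` global names, and the assumption that the added store targets the per-kernel global are made up for illustration and do not reflect the real linker or the code clang emits.

```python
def fold_identical(functions):
    """Toy ICF: map each function name to the first function seen
    with exactly the same instruction sequence."""
    representative = {}
    folded = {}
    for name, body in functions.items():
        key = tuple(body)
        folded[name] = representative.setdefault(key, name)
    return folded


def make_stub(kernel, unique_store):
    """Build a hypothetical kernel stub body as a list of instructions."""
    body = ["call launch_kernel"]
    if unique_store:
        # Assumed shape of the fix: a store to a global that is
        # unique to this kernel, so no two stub bodies are identical.
        body.insert(0, f"store @guard_{kernel}")
    return body


kernels = ["foo", "bar"]
before = fold_identical({k: make_stub(k, False) for k in kernels})
after = fold_identical({k: make_stub(k, True) for k in kernels})
```

In this model `before` maps both stubs to `foo`, so a lookup keyed on the stub for `bar` would find `foo`; in `after` each stub remains its own representative.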

Fixes: https://github.com/llvm/llvm-project/issues/88883
DeltaFile
+60-39clang/test/CodeGenCUDA/kernel-stub-name.cu
+27-0clang/lib/CodeGen/CGCUDANV.cpp
+87-392 files

LLVM/project f07a2edlldb/source/Plugins/SymbolLocator/Default SymbolLocatorDefault.cpp

[lldb] Teach LocateExecutableSymbolFile to look into LOCALBASE on FreeBSD (#81355)

FreeBSD ports will now install debuginfo under $LOCALBASE/lib/debug/, where $LOCALBASE is typically /usr/local. On FreeBSD search this path in addition to existing debug info paths.
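
A rough sketch of the resulting search order is shown below. The function name, the `platform` strings, and the baseline directory list are assumptions for illustration; how lldb actually obtains LOCALBASE is outside the sketch, so it is passed in as a parameter that defaults to the typical /usr/local.

```python
import posixpath


def debug_search_dirs(platform, localbase="/usr/local"):
    """Return a hypothetical list of directories to search for debug info."""
    # Assumed baseline paths, for illustration only.
    dirs = [".", "/usr/lib/debug"]
    if platform == "freebsd":
        # FreeBSD ports install debuginfo under $LOCALBASE/lib/debug/.
        dirs.append(posixpath.join(localbase, "lib", "debug"))
    return dirs


freebsd_dirs = debug_search_dirs("freebsd")
linux_dirs = debug_search_dirs("linux")
custom_dirs = debug_search_dirs("freebsd", localbase="/opt/ports")
```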

Relevant change on the FreeBSD side: https://reviews.freebsd.org/D43515
DeltaFile
+22-0lldb/source/Plugins/SymbolLocator/Default/SymbolLocatorDefault.cpp
+22-01 files

LLVM/project cfca977llvm/cmake/modules LLVMConfig.cmake.in, llvm/include/llvm/TargetParser AArch64TargetParser.h

[AArch64][TargetParser] autogen ArchExtKind enum (#90314)

Re-land 61b2a0e3336aaa0132bbed06dc185aca4ff5d2db. Some Windows builds
were failing because AArch64TargetParserDef.inc is a generated header
which is included transitively into some clang components, but this
information is not available to the build system and therefore there is
a missing edge in the dependency graph. This patch incorporates the
fixes described in ac1ffd3caca12c254e0b8c847aa8ce8e51b6cfbf/D142403.

Thanks to ExtensionSet::toLLVMFeatureList, all values of ArchExtKind
should correspond to a particular -target-feature. The valid values of
-target-feature are in turn defined by SubtargetFeature defs.

Therefore we can generate ArchExtKind from the tablegen data. This is
done by adding an Extension class which derives from SubtargetFeature.

Because the Has* FieldNames do not always correspond to the AEK_
names ("extensions", as defined in TargetParser), and AEK_ names do
not always correspond to -march strings, some additional enum entries

    [3 lines not shown]
DeltaFile
+126-115llvm/lib/Target/AArch64/AArch64Features.td
+22-81llvm/include/llvm/TargetParser/AArch64TargetParser.h
+12-0llvm/lib/Target/ARM/ARMFeatures.td
+11-0llvm/utils/TableGen/ARMTargetDefEmitter.cpp
+6-0llvm/cmake/modules/LLVMConfig.cmake.in
+177-1965 files

LLVM/project 167b506libcxx/utils/ci run-buildbot

[libcxx][ci] In picolib build, ask clang for the normalised triple (#90722)

This is needed for a workaround to make sure the link later succeeds. I
don't know the reason for that but it is definitely needed.

https://github.com/llvm/llvm-project/pull/89234 will/wants to correct
the triple normalisation for -none- and this means that clang prior to
19, and clang 19 and above will have different answers and therefore
different library paths.

I don't want to bootstrap a clang just for libcxx CI, or require that
anyone building for Arm do the same, so ask the compiler what the triple
should be.

This will be compatible with 17 and 19 when we do update to that
version.

I'm assuming $CC is what anyone locally would set to override the
compiler, and `cc` is the binary name in our CI containers. It's not
perfect but it should cover most use cases.
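
The compiler lookup described above could look roughly like this. It is a sketch under stated assumptions: the function names are hypothetical, the example target `armv7m-none-eabi` is chosen only for illustration, and it assumes clang's `-print-target-triple` option is the query being used.

```python
import subprocess


def triple_query_command(target, env):
    """Build the command that asks the compiler for its normalised triple."""
    # Prefer $CC if it is set and non-empty, otherwise fall back to `cc`.
    compiler = env.get("CC") or "cc"
    return [compiler, f"--target={target}", "-print-target-triple"]


def normalised_triple(target, env):
    """Run the query and return the compiler's answer (needs clang installed)."""
    result = subprocess.run(
        triple_query_command(target, env),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


# Usage: only the command is built here, so no compiler is required.
default_cmd = triple_query_command("armv7m-none-eabi", {})
override_cmd = triple_query_command("armv7m-none-eabi", {"CC": "clang-19"})
```

Because the answer comes from whichever compiler is in use, the library path follows that compiler's normalisation rather than a hard-coded string.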
DeltaFile
+7-1libcxx/utils/ci/run-buildbot
+7-11 files

LLVM/project e312f07offload CMakeLists.txt

[Offload] Fix CMake detection when it is not found (#90729)

Summary:
This variable could be unset if not found or when building standalone.
We should check for that and set it to true or false.

Fixes: https://github.com/llvm/llvm-project/issues/90708
DeltaFile
+6-1offload/CMakeLists.txt
+6-11 files

LLVM/project d30a434llvm/test/CodeGen/AMDGPU memory-legalizer-global-agent.ll memory-legalizer-global-workgroup.ll

Rebase

Created using spr 1.3.5
DeltaFile
+13,404-6,344llvm/test/CodeGen/AMDGPU/memory-legalizer-global-agent.ll
+12,949-6,549llvm/test/CodeGen/AMDGPU/memory-legalizer-global-workgroup.ll
+12,712-6,088llvm/test/CodeGen/AMDGPU/memory-legalizer-global-system.ll
+12,041-6,669llvm/test/CodeGen/AMDGPU/memory-legalizer-global-wavefront.ll
+12,041-6,669llvm/test/CodeGen/AMDGPU/memory-legalizer-global-singlethread.ll
+12,958-4,611llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-agent.ll
+76,105-36,9305,185 files not shown
+449,241-206,3605,191 files

LLVM/project 0647b2allvm/utils/gn/secondary/clang/lib/Headers BUILD.gn

[gn build] Port df241b19c952
DeltaFile
+1-0llvm/utils/gn/secondary/clang/lib/Headers/BUILD.gn
+1-01 files

LLVM/project 9ebf2f8llvm/utils/gn/secondary/llvm/include/llvm/Config BUILD.gn, llvm/utils/gn/secondary/llvm/test BUILD.gn

Revert "[gn] port 088aa81a5454 (LLVM_HAS_LOGF128)"

This reverts commit 68b863b7fa68a196bcc02d12c028dea7dcd9b97b.
088aa81a5454 was reverted in efce8a05aa4e.
DeltaFile
+0-1llvm/utils/gn/secondary/llvm/include/llvm/Config/BUILD.gn
+0-1llvm/utils/gn/secondary/llvm/test/BUILD.gn
+0-22 files

LLVM/project efce8a0llvm/cmake config-ix.cmake, llvm/include/llvm/ADT APFloat.h APInt.h

Revert "Constant Fold logf128 calls"

This reverts commit 088aa81a545421933254f19cd3c8914a0373b493.
DeltaFile
+0-26llvm/include/llvm/Support/float128.h
+0-24llvm/lib/Support/APFloat.cpp
+0-13llvm/include/llvm/ADT/APFloat.h
+0-11llvm/cmake/config-ix.cmake
+0-11llvm/lib/Analysis/ConstantFolding.cpp
+0-8llvm/include/llvm/ADT/APInt.h
+0-937 files not shown
+0-11513 files

LLVM/project 034912dllvm CMakeLists.txt

[SystemZ][z/OS] Build in ASCII 64 bit mode on z/OS (#90630)

Set the correct build flags on z/OS to build LLVM as a 64-bit ASCII
application.

DeltaFile
+5-0llvm/CMakeLists.txt
+5-01 files

LLVM/project 68b863bllvm/utils/gn/secondary/llvm/include/llvm/Config BUILD.gn, llvm/utils/gn/secondary/llvm/test BUILD.gn

[gn] port 088aa81a5454 (LLVM_HAS_LOGF128)

If we want to turn this on on some platforms, we'll also want to
define HAS_LOGF128 for AnalysisTest, see
llvm/unittests/Analysis/CMakeLists.txt
DeltaFile
+1-0llvm/utils/gn/secondary/llvm/include/llvm/Config/BUILD.gn
+1-0llvm/utils/gn/secondary/llvm/test/BUILD.gn
+2-02 files

LLVM/project 57d0d3bflang/lib/Lower Bridge.cpp, flang/test/Lower/OpenMP parallel-private-clause-fixes.f90 cfg-conversion-omp.private.f90

[Flang][OpenMP] Handle more character allocatable cases in privatization (#90449)

Fixes #84732, #81947, #81946

Note: This is an interim fix until delayed privatization is enabled.
DeltaFile
+104-0flang/test/Lower/OpenMP/parallel-private-clause-fixes.f90
+22-19flang/lib/Lower/Bridge.cpp
+1-1flang/test/Lower/OpenMP/cfg-conversion-omp.private.f90
+1-1flang/test/Lower/OpenMP/delayed-privatization-allocatable-private.f90
+128-214 files

LLVM/project 088aa81llvm/cmake config-ix.cmake, llvm/include/llvm/ADT APFloat.h APInt.h

Constant Fold logf128 calls

This is a second attempt to land #84501 which failed on several targets.

This patch adds the HAS_IEE754_FLOAT128 define, which makes the check for
typedef'ing float128 more precise by checking whether __uint128_t is available
and whether the host does not use __ibm128, which is prevalent on PowerPC
targets and replaces IEEE754 float128s.
DeltaFile
+26-0llvm/include/llvm/Support/float128.h
+24-0llvm/lib/Support/APFloat.cpp
+13-0llvm/include/llvm/ADT/APFloat.h
+11-0llvm/cmake/config-ix.cmake
+11-0llvm/lib/Analysis/ConstantFolding.cpp
+8-0llvm/include/llvm/ADT/APInt.h
+93-07 files not shown
+115-013 files

LLVM/project df241b1clang/lib/Headers CMakeLists.txt stddef.h, clang/lib/Headers/zos_wrappers builtins.h

[z/OS] add support for z/OS system headers to clang std header wrappers (#89995)

Update the wrappers for the C std headers so that they always forward to
the z/OS system headers.
DeltaFile
+17-2clang/lib/Headers/CMakeLists.txt
+18-0clang/lib/Headers/zos_wrappers/builtins.h
+17-0clang/lib/Headers/stddef.h
+12-0clang/lib/Headers/stdarg.h
+6-0clang/lib/Headers/stdnoreturn.h
+5-1clang/lib/Headers/varargs.h
+75-38 files not shown
+111-314 files

LLVM/project 442990bllvm/utils/gn/secondary/llvm/test BUILD.gn

[gn] port 8cde1cfc60e3 (LLVM_APPEND_VC_REV for lit)
DeltaFile
+2-0llvm/utils/gn/secondary/llvm/test/BUILD.gn
+2-01 files

LLVM/project 576261allvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/RISCV complex-loads.ll

[SLP]Improve reordering for consts, splats and ops from same nodes + improved analysis.

Improved detection of const/splat candidates, their matching and analysis of instructions from same nodes.

Metric: size..text

Program                                                                    size..text
                                                                               results    results0   diff
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test     92952.00    93096.00   0.2%
test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test            779832.00   780136.00   0.0%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test                 839923.00   840179.00   0.0%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test                 392708.00   392740.00   0.0%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test      1171131.00  1171147.00   0.0%

test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test    1391089.00  1391073.00  -0.0%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test   1391089.00  1391073.00  -0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test   12352780.00 12352636.00  -0.0%

    [20 lines not shown]
DeltaFile
+106-106llvm/test/Transforms/SLPVectorizer/RISCV/complex-loads.ll
+77-18llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+15-17llvm/test/Transforms/SLPVectorizer/X86/vec_list_bias_external_insert_shuffled.ll
+8-8llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll
+5-8llvm/test/Transforms/SLPVectorizer/X86/extractelement-single-use-many-nodes.ll
+4-8llvm/test/Transforms/SLPVectorizer/X86/addsub.ll
+215-1657 files not shown
+230-18213 files

LLVM/project 67e726allvm/lib/Transforms/Vectorize SLPVectorizer.cpp, llvm/test/Transforms/SLPVectorizer/RISCV strided-stores-vectorized.ll

[SLP]Transform stores + reverse to strided stores with stride -1, if profitable.

Adds a transformation of consecutive vector store + reverse into strided
stores with stride -1, if it is profitable.
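
The equivalence behind this transformation can be checked with a small model of memory. This is only an illustration of the idea, not the SLP vectorizer's code: the function names are hypothetical, memory is a plain list, and the stride is counted in elements rather than bytes.

```python
def store_reversed_consecutive(memory, base, values):
    """Reverse the vector, then store it to consecutive addresses."""
    for offset, value in enumerate(reversed(values)):
        memory[base + offset] = value


def store_strided(memory, start, stride, values):
    """Store lane i of the vector at address start + i * stride."""
    for lane, value in enumerate(values):
        memory[start + lane * stride] = value


values = [10, 20, 30, 40]
mem_a = [0] * 8
mem_b = [0] * 8

store_reversed_consecutive(mem_a, 2, values)
# Same effect without the reverse: start at the last address of the
# range and step backwards by one element per lane.
store_strided(mem_b, 2 + len(values) - 1, -1, values)
```

Both calls leave memory in the same state, which is why the reverse shuffle can be dropped when the target makes the strided store cheaper.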

Reviewers: RKSimon, preames

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/90464
DeltaFile
+66-8llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+5-26llvm/test/Transforms/SLPVectorizer/RISCV/strided-stores-vectorized.ll
+71-342 files

LLVM/project 7d3623ellvm/test/CodeGen/AMDGPU memory-legalizer-global-agent.ll memory-legalizer-global-workgroup.ll

Rebase

Created using spr 1.3.5
DeltaFile
+13,404-6,344llvm/test/CodeGen/AMDGPU/memory-legalizer-global-agent.ll
+12,949-6,549llvm/test/CodeGen/AMDGPU/memory-legalizer-global-workgroup.ll
+12,712-6,088llvm/test/CodeGen/AMDGPU/memory-legalizer-global-system.ll
+12,041-6,669llvm/test/CodeGen/AMDGPU/memory-legalizer-global-wavefront.ll
+12,041-6,669llvm/test/CodeGen/AMDGPU/memory-legalizer-global-singlethread.ll
+12,958-4,611llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-agent.ll
+76,105-36,9301,594 files not shown
+237,550-108,6431,600 files

LLVM/project 803e03fllvm/include/llvm/CodeGen MachineScheduler.h

[llvm] Revive constructor of 'ResourceSegments'

582c6a82b4bc2ac5cbff803960eeb022bff10168 removed a constructor of
'ResourceSegments' that is needed in LLVM unit tests.

* Revert 582c6a82b4bc2ac5cbff803960eeb022bff10168
* Update the constructor to take a const reference of
  `std::list` as pointed out in #89193.
DeltaFile
+4-0llvm/include/llvm/CodeGen/MachineScheduler.h
+4-01 files

LLVM/project ccb198dllvm/test/CodeGen/AArch64 sve-streaming-mode-fixed-length-fp-compares.ll sve-streaming-mode-fixed-length-int-rem.ll

[AArch64] NFC: Add RUN lines for streaming-compatible code. (#90617)

The intent is to test lowering of vector operations by scalarization,
for functions that are streaming-compatible (and thus cannot use NEON)
and also don't have the +sve attribute.

The generated code is clearly wrong at the moment, but a series of
patches will follow to fix up all cases to use scalar instructions.

A bit of context:

This work will form the base to decouple SME from SVE later on, as it
will make sure that no NEON instructions are used in
streaming[-compatible] mode. Later this will be followed by a patch that
changes `useSVEForFixedLengthVectors` to only return `true` if SVE is
available for the given runtime mode, at which point I'll change the
`-mattr=+sme -force-streaming-compatible-sve` to `-mattr=+sme
-force-streaming-sve` in the RUN lines, so that the tests are considered
to be executed in Streaming-SVE mode.
DeltaFile
+2,486-0llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-compares.ll
+1,631-0llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-rem.ll
+1,145-0llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-div.ll
+1,058-0llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-reduce.ll
+989-0llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-arith.ll
+965-0llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-fp-minmax.ll
+8,274-050 files not shown
+20,992-056 files

LLVM/project f898161llvm/lib/Target/AMDGPU SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU waitcnt-sample-waw.mir

[AMDGPU] Fix image_msaa_load waitcnt insertion for pre-gfx12 (#90710)

https://github.com/llvm/llvm-project/pull/90201 made some fixes for gfx12
image_msaa_load waitcnt insertion. That fix might break in some situations
for pre-gfx12. This patch fixes that by explicitly checking for VSAMPLE, which
always requires an s_wait_samplecnt, and leaves the previous logic intact for
non-gfx12.
DeltaFile
+24-0llvm/test/CodeGen/AMDGPU/waitcnt-sample-waw.mir
+6-6llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+30-62 files

LLVM/project 5fb1e28llvm/lib/Target/AMDGPU SIInstrInfo.h SIInsertWaitcnts.cpp, llvm/test/CodeGen/AMDGPU llvm.amdgcn.s.barrier.wait.ll llvm.amdgcn.s.barrier.ll

[AMDGPU] Enhance s_waitcnt insertion before barrier for gfx12 (#90595)

Code to determine if a waitcnt is required before a barrier instruction only
considered S_BARRIER. gfx12 adds barrier_signal/wait, so the existing code
needs to be enhanced to look for a barrier start (which is just an S_BARRIER
for earlier architectures).
DeltaFile
+22-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.wait.ll
+11-0llvm/lib/Target/AMDGPU/SIInstrInfo.h
+2-0llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.barrier.ll
+1-1llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+36-14 files

LLVM/project 582c6a8llvm/include/llvm/CodeGen MachineScheduler.h

[llvm] Remove unused constructor (NFC)

Closes #89193
DeltaFile
+0-4llvm/include/llvm/CodeGen/MachineScheduler.h
+0-41 files

LLVM/project 0b21b25llvm/test/CodeGen/AMDGPU memory-legalizer-global-agent.ll memory-legalizer-global-workgroup.ll

[AMDGPU] Do not optimize away pre-existing waitcnt instructions at -O0 (#90716)

The autogenerated memory legalizer tests use -O0 so this allows us to
see the exact waitcnts that were inserted by the memory legalizer
without them being optimized away.
DeltaFile
+13,404-6,344llvm/test/CodeGen/AMDGPU/memory-legalizer-global-agent.ll
+12,949-6,549llvm/test/CodeGen/AMDGPU/memory-legalizer-global-workgroup.ll
+12,712-6,088llvm/test/CodeGen/AMDGPU/memory-legalizer-global-system.ll
+12,041-6,669llvm/test/CodeGen/AMDGPU/memory-legalizer-global-singlethread.ll
+12,041-6,669llvm/test/CodeGen/AMDGPU/memory-legalizer-global-wavefront.ll
+12,958-4,611llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-system.ll
+76,105-36,93022 files not shown
+168,414-77,58028 files