OpenBSD/src SqjSSDl sys/arch/amd64/amd64 fpu.c locore.S, sys/arch/amd64/include fpu.h cpu.h

   Switch from lazy FPU switching to semi-eager FPU switching: track whether
   curproc's xstate ("extended state") is loaded in the CPU or not.
    - context switch, sendsig(), vmm, and in-kernel CPU crypto all
      check the flag and, if set, save the old thread's state to the PCB,
      clear the flag, and then load the _blank_ state
    - when returning to userspace, if the flag is clear then set it and restore
      the thread's state
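   A minimal user-space model of the flag tracking described above, in C.
   All names (model_kernel_enter, model_switch, model_userret, the
   userxstate flag, and a 4-int array standing in for the xstate area)
   are hypothetical, not the kernel's identifiers; memcpy stands in for
   the real save and restore instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model: NREGS ints stand in for the xstate save area. */
#define NREGS 4

struct pcb {
    int savefpu[NREGS];     /* thread's saved xstate */
};

struct cpu {
    int fpregs[NREGS];      /* the CPU's live FPU/SSE registers */
    bool userxstate;        /* curproc's xstate is loaded in the CPU */
    struct pcb *curpcb;     /* PCB of the thread now running */
};

static const int blank_state[NREGS] = { 0 };    /* the "blank" state */

/* Context switch, sendsig(), vmm, in-kernel crypto: if the thread's
 * state is live, save it to the PCB, clear the flag, load blank. */
static void model_kernel_enter(struct cpu *ci)
{
    if (ci->userxstate) {
        memcpy(ci->curpcb->savefpu, ci->fpregs, sizeof(ci->fpregs));
        ci->userxstate = false;
        memcpy(ci->fpregs, blank_state, sizeof(ci->fpregs));
    }
}

/* Context switch: save the old thread (if live), then switch PCBs.
 * The new thread's state is NOT loaded here. */
static void model_switch(struct cpu *ci, struct pcb *newpcb)
{
    model_kernel_enter(ci);
    ci->curpcb = newpcb;
}

/* Return to userspace: if the flag is clear, set it and restore. */
static void model_userret(struct cpu *ci)
{
    if (!ci->userxstate) {
        ci->userxstate = true;
        memcpy(ci->fpregs, ci->curpcb->savefpu, sizeof(ci->fpregs));
    }
}
```

   In this model the restore happens on the way back to userspace rather
   than at the switch itself, so a thread that is switched back out
   before reaching userspace never reloads its state, and repeated kernel
   uses while the flag is clear cost no extra saves.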

   This simpler tracking also fixes the restoring of FPU state after nested
   signal handlers.

   With this, %cr0's TS flag is never set, the FPU #DNA trap can no
   longer happen, and IPIs are no longer necessary for flushing or
   syncing FPU state; on the other hand, restoring xstate while returning
   to userspace means we must handle xrstor faulting whenever the state
   being loaded may have been altered.  If that happens, reset the state,
   fake a #GP fault (SIGBUS), and recheck for ASTs.
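   A minimal sketch of the recovery described above, in C.  It assumes a
   hypothetical XSTATE_OK header check in place of whatever actually makes
   xrstor fault, and every name is hypothetical; the real return-to-user
   path and signal delivery are not reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NREGS 4
#define XSTATE_OK 0x5a      /* hypothetical "header is valid" marker */

struct xstate {
    int hdr;                /* stands in for the fields xrstor checks */
    int regs[NREGS];
};

struct thread {
    struct xstate saved;    /* state in the PCB, possibly altered */
    bool sigbus_pending;    /* set by the faked #GP */
    bool ast_pending;       /* forces another pass through AST handling */
};

/* Stand-in for xrstor: "faults" (returns false) on a bad header. */
static bool model_xrstor(int hw[NREGS], const struct xstate *xs)
{
    if (xs->hdr != XSTATE_OK)
        return false;
    memcpy(hw, xs->regs, sizeof(xs->regs));
    return true;
}

/* Restore on the way back to userspace; on a fault, reset the state,
 * post SIGBUS (the faked #GP), and request an AST recheck. */
static void model_restore_user(int hw[NREGS], struct thread *t)
{
    if (model_xrstor(hw, &t->saved))
        return;
    memset(&t->saved, 0, sizeof(t->saved));
    t->saved.hdr = XSTATE_OK;           /* clean, loadable state */
    memset(hw, 0, NREGS * sizeof(hw[0]));
    t->sigbus_pending = true;
    t->ast_pending = true;              /* signal must be delivered first */
}
```

   The AST recheck matters because the fake fault posts a signal after
   the normal AST processing has already run; without another pass the
   thread would return to userspace with the signal still pending.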

   While here, regularize fxsave/fxrstor vs xsave/xrstor handling by
   using codepatching to switch to xsave/xrstor when the CPU supports
   them.  In addition, codepatch in xsaveopt in most places when the
   CPU supports it.  Use the 64-bit variants of the instructions in
   all cases so that x87 instruction fault IPs are reported correctly.
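   The boot-time instruction choice can be sketched as a pure function of
   CPU features.  This is only a model: real codepatching rewrites
   instruction bytes in place at boot, while the hypothetical
   pick_save_insn and pick_restore_insn here just return the mnemonic a
   site would end up with.  The opt_ok parameter is a hypothetical marker
   for the call sites where xsaveopt is allowed; which sites those are is
   not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical CPU feature summary, filled in from CPUID at boot. */
struct cpu_features {
    bool xsave;
    bool xsaveopt;
};

/* Mnemonic patched into a save site; always the 64-bit form. */
static const char *pick_save_insn(struct cpu_features f, bool opt_ok)
{
    if (!f.xsave)
        return "fxsave64";
    if (f.xsaveopt && opt_ok)
        return "xsaveopt64";
    return "xsave64";
}

/* Mnemonic patched into a restore site; always the 64-bit form. */
static const char *pick_restore_insn(struct cpu_features f)
{
    return f.xsave ? "xrstor64" : "fxrstor64";
}
```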

   This change has three motivations:
   1) with modern clang, SSE registers are used even in rcrt0.o, making
      lazy FPU switching less of a win relative to the trap cost
   2) the Intel SDM warns that lazy FPU switching may increase power costs
   3) post-Spectre rumors suggest that the %cr0 TS flag might not block
      speculation, permitting leaking of information about FPU state
      (AES keys?) across protection boundaries.

   tested by many in snaps; prodding from deraadt@
Version  Delta      File
1.40     +13 -231   sys/arch/amd64/amd64/fpu.c
1.98     +189 -49   sys/arch/amd64/amd64/locore.S
1.200    +23 -56    sys/arch/amd64/amd64/vmm.c
1.245    +43 -21    sys/arch/amd64/amd64/machdep.c
1.121    +41 -6     sys/arch/amd64/amd64/cpu.c
1.14     +9 -19     sys/arch/amd64/include/fpu.h
1.31     +3 -21     sys/arch/amd64/amd64/ipifuncs.c
1.16     +4 -20     sys/arch/amd64/amd64/process_machdep.c
1.32     +2 -21     sys/arch/amd64/amd64/via.c
1.42     +5 -17     sys/arch/amd64/amd64/vm_machdep.c
1.60     +4 -12     sys/arch/amd64/amd64/vector.S
1.123    +3 -7      sys/arch/amd64/include/cpu.h
1.18     +3 -5      sys/arch/amd64/include/intrdefs.h
1.36     +4 -4      sys/arch/amd64/amd64/genassym.cf
1.81     +3 -2      sys/arch/amd64/amd64/acpi_machdep.c
1.17     +1 -4      sys/arch/amd64/include/pcb.h
1.17     +2 -2      sys/arch/amd64/amd64/mptramp.S
1.5      +3 -1      sys/arch/amd64/include/codepatch.h
1.73     +2 -2      sys/arch/amd64/include/specialreg.h
1.10     +1 -2      sys/arch/amd64/include/proc.h
1.69     +2 -1      sys/arch/amd64/amd64/trap.c
         +360 -503  21 files
