As mentioned by @jgreco and @NickF, there will always be some inherent overhead from virtualization, especially in a case like this where you've allocated every available physical thread to your VM (1 CPU/32 cores/2 threads). Your host OS (SCALE) still needs some CPU time for context switching and handling I/O. That means it has to cycle between running itself on bare metal and running the kvm/qemu process, while also emulating any virtual hardware the VM relies on and handling interrupts. Leaving a core or two free might actually improve performance.
The other issue, with frequency reporting, could be related to the underlying scheduler. For single-threaded tests, if the physical CPU (pCPU) running the kvm/qemu workload for a given vCPU changes frequently, that workload may never reach peak turbo speeds. Note that your virtual VRay benchmark in post #12 reported a clock speed of 1.32GHz via CPUID inside the VM.
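If you want to test the scheduler theory from the SCALE host, one option is to sample which physical CPU each qemu thread last ran on. This is a rough sketch that assumes a Linux host. It reads the `processor` field (field 39) of `/proc/<pid>/task/<tid>/stat`, and you have to pass in the PID of the VM's qemu process yourself. The helper names are made up for illustration. With libvirt-managed VMs the vCPU threads are often, though not always, named something like `CPU 0/KVM`. Because the script only samples periodically, it will undercount migrations that happen between samples.

```python
import os
import sys
import time
from collections import Counter


def thread_name(stat_line: str) -> str:
    """Return the comm field (thread name), which sits inside the parentheses."""
    return stat_line[stat_line.index("(") + 1 : stat_line.rindex(")")]


def last_cpu_from_stat(stat_line: str) -> int:
    """Return field 39 ('processor') of a /proc/<pid>/task/<tid>/stat line:
    the CPU this thread last ran on."""
    # comm may contain spaces (e.g. "CPU 0/KVM"), so split after the LAST ')'
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 39 is rest[36]
    return int(rest[36])


def count_migrations(samples: list[int]) -> int:
    """Count how often consecutive samples landed on a different CPU."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a != b)


def sample_thread(pid: int, tid: int, n: int = 20, interval: float = 0.1) -> tuple[str, list[int]]:
    """Poll one thread's stat file n times; return its name and the CPUs seen."""
    path = f"/proc/{pid}/task/{tid}/stat"
    cpus: list[int] = []
    name = ""
    for _ in range(n):
        with open(path) as f:
            line = f.read()
        name = thread_name(line)
        cpus.append(last_cpu_from_stat(line))
        time.sleep(interval)
    return name, cpus


if __name__ == "__main__" and len(sys.argv) > 1:
    qemu_pid = int(sys.argv[1])  # assumption: PID of the VM's qemu process, found by hand
    for tid in sorted(int(t) for t in os.listdir(f"/proc/{qemu_pid}/task")):
        name, cpus = sample_thread(qemu_pid, tid)
        print(f"{tid} {name!r}: {count_migrations(cpus)} migrations, CPUs seen {dict(Counter(cpus))}")
```

If a busy vCPU thread keeps showing up on a different pCPU from one sample to the next during a single-threaded run, that would support the idea that it never stays on one core long enough to turbo fully.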
But I'm still inclined to believe the reports of a significant AVX2 ratio offset. That's commonly observed even on bare metal as a way to keep thermals and power consumption under control, and with an 85W sustained TDP there simply isn't the headroom for 32 cores to go bananas with those instructions.
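As a rough back-of-the-envelope check, assume the whole 85W package budget were split evenly across the cores. In practice the uncore/IO also draws from that budget, so each core's real share is lower than this:

```latex
\frac{85\,\text{W}}{32\ \text{cores}} \approx 2.66\,\text{W per core}
```

That leaves very little per-core power for sustained AVX2 work on every core at once, which fits with the CPU dropping to a lower AVX2 ratio.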