The helper doesn't use __builtin_cpu_supports (and instead makes raw
cpuid calls) for three reasons:
- __builtin_cpu_supports only works on x86_64, so its use would need to
be guarded with the preprocessor, much like the current code.
Moreover, we will have to use custom mechanisms to detect features on
ARM, since there is no such thing as cpuid there (and __builtin_cpu_*
are not provided).
- __builtin_cpu_supports doesn't currently support the "sha" feature on
all targeted toolchains.
- And, of course, NIH.
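
For illustration, a raw cpuid check for the "sha" feature could look
roughly like the sketch below. This is not the helper's actual code: the
name cpu_has_sha is hypothetical, and the sketch assumes a GCC- or
Clang-compatible <cpuid.h> that provides __get_cpuid_max and
__cpuid_count. The SHA extensions are reported in CPUID leaf 7,
subleaf 0, EBX bit 29. On non-x86 targets the sketch simply reports the
feature as absent, standing in for whatever custom mechanism is used
there.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang header: __get_cpuid_max, __cpuid_count */
#endif

/* Hypothetical helper: returns 1 if the CPU reports the SHA extensions,
 * 0 otherwise (including on every non-x86 target). */
static int cpu_has_sha(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* Leaf 7 must be supported before it can be queried. */
    if (__get_cpuid_max(0, NULL) < 7)
        return 0;

    /* Leaf 7, subleaf 0: structured extended feature flags. */
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    (void)eax; (void)ecx; (void)edx;

    /* EBX bit 29 indicates the SHA extensions. */
    return (int)((ebx >> 29) & 1u);
#else
    /* No cpuid here; a platform-specific check would go in this branch. */
    return 0;
#endif
}
```

The preprocessor guard is still needed in this form, but the feature bit
is read directly, so the check does not depend on which feature names a
given toolchain's __builtin_cpu_supports accepts.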