Apparently gcc-6.2 with -O3 and -mfpu=neon can use ARM instructions
that requires 64bit alignment (here vst1.64) with data that
is not 64bit aligned (g->strcache) https://i.imgur.com/vYktsUn.png
So we need to be able to build "arm_neon_functions.cc" with
-mfpu=neon, while not automatically using NEON for the rest
of the codebase, unless explicitly asked for.