Here are some helpful hints to get optimal performance:
     
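- The EMMS call is expensive, so try to separate floating point and MMX operations. Because the MMX registers share state with the x87 floating point registers, EMMS must run between the two, and interleaving them forces many EMMS calls.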
     
- Use MMX only in low-level routines. The compiler saves all MMX registers in use
     whenever it calls a subroutine, so MMX code that calls other routines pays that cost on every call.
     
- MMX has no native NOT instruction, so the compiler has to generate a workaround,
     which makes this operation inefficient.
     
- Simple assignments of floating point numbers don’t use the floating point registers, so
     they don’t require a call to the EMMS procedure. You only need to call EMMS before
     doing floating point arithmetic.
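The hints can be illustrated with a short sketch. It is written in C with the MMX intrinsics from `<mmintrin.h>` (available in GCC and Clang on x86 and x86-64) rather than with this compiler's own MMX support, so treat it as an analogy, not as the code the compiler generates. The function names `mmx_not`, `add_then_not_u16` and `average_u16` are hypothetical. All MMX work is batched in one low-level routine that calls EMMS (`_mm_empty`) once at the end, the floating point code lives in a separate routine, and NOT is emulated with the common XOR-with-all-ones workaround.

```c
#include <mmintrin.h> /* MMX intrinsics (GCC/Clang on x86 and x86-64) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: bitwise NOT of a 64-bit MMX value.
   MMX has no NOT instruction, so XOR with an all-ones mask is used. */
static __m64 mmx_not(__m64 x)
{
    return _mm_xor_si64(x, _mm_set1_pi32(-1));
}

/* Hypothetical low-level routine: out = NOT(a + b) for n_groups groups of
   four 16-bit words. The loop contains only MMX work and calls no other
   routines; EMMS is issued once, after the loop, not per iteration. */
static void add_then_not_u16(const uint16_t *a, const uint16_t *b,
                             uint16_t *out, size_t n_groups)
{
    for (size_t i = 0; i < n_groups; i++) {
        __m64 va, vb;
        memcpy(&va, a + 4 * i, sizeof va);           /* load 4 words */
        memcpy(&vb, b + 4 * i, sizeof vb);
        __m64 r = mmx_not(_mm_add_pi16(va, vb));     /* wrapping 16-bit add */
        memcpy(out + 4 * i, &r, sizeof r);           /* store 4 words */
    }
    _mm_empty(); /* clear MMX state once so later x87 code sees empty registers */
}

/* Floating point arithmetic kept in a separate routine, away from MMX code. */
static double average_u16(const uint16_t *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return n ? sum / (double)n : 0.0;
}
```

For example, `add_then_not_u16(a, b, out, 1)` with `a = {1, 2, 3, 0xFFFF}` and `b = {1, 2, 3, 1}` stores `{0xFFFD, 0xFFFB, 0xFFF9, 0xFFFF}` in `out`. The extra XOR and the all-ones constant are the overhead that makes NOT slower than operations MMX supports directly.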