Intel MMX Instruction Set, plus the extra Cyrix extensions.

A.26 `EMMS': Empty MMX State

       EMMS                      ; 0F 77             [PENT,MMX]

`EMMS' sets the FPU tag word (which marks which floating-point registers are available) to all ones, meaning that all registers are available for the FPU to use. It should be used after executing MMX instructions and before executing any subsequent floating-point operations.

A.103 `MOVD': Move Doubleword to/from MMX Register

       MOVD mmxreg,r/m32         ; 0F 6E /r          [PENT,MMX]
       MOVD r/m32,mmxreg         ; 0F 7E /r          [PENT,MMX]

`MOVD' copies 32 bits from its source (second) operand into its destination (first) operand. When the destination is a 64-bit MMX register, the top 32 bits are set to zero.

A.104 `MOVQ': Move Quadword to/from MMX Register

       MOVQ mmxreg,r/m64         ; 0F 6F /r          [PENT,MMX]
       MOVQ r/m64,mmxreg         ; 0F 7F /r          [PENT,MMX]

`MOVQ' copies 64 bits from its source (second) operand into its destination (first) operand.

A.113 `PACKSSDW', `PACKSSWB', `PACKUSWB': Pack Data

       PACKSSDW mmxreg,r/m64     ; 0F 6B /r          [PENT,MMX]
       PACKSSWB mmxreg,r/m64     ; 0F 63 /r          [PENT,MMX]
       PACKUSWB mmxreg,r/m64     ; 0F 67 /r          [PENT,MMX]

All these instructions start by forming a notional 128-bit word by placing the source (second) operand on the left of (that is, above) the destination (first) operand. `PACKSSDW' then splits this 128-bit word into four doublewords, converts each to a word, and loads them side by side into the destination register; `PACKSSWB' and `PACKUSWB' both split the 128-bit word into eight words, convert each to a byte, and load _those_ side by side into the destination register.

`PACKSSDW' and `PACKSSWB' perform signed saturation when reducing the length of numbers: if the number is too large to fit into the reduced space, they replace it by the largest signed number (`7FFFh' or `7Fh') that _will_ fit, and if it is too small then they replace it by the smallest signed number (`8000h' or `80h') that will fit.
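The signed saturation rule can be checked with a small C model. This is a sketch, not NASM's or Intel's definition: it assumes an MMX register is held in a `uint64_t' with element 0 in the least significant bits, and the function names `packsswb' and `packssdw' are hypothetical. The source operand supplies the upper half of the result because it forms the left (high) half of the notional 128-bit word.

```c
#include <stdint.h>

/* Hypothetical model: an MMX register is a uint64_t, element 0 in the low bits. */

/* Clamp a signed word to the signed byte range (7Fh .. 80h). */
static int8_t sat_s16_to_s8(int16_t v)
{
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* Clamp a signed doubleword to the signed word range (7FFFh .. 8000h). */
static int16_t sat_s32_to_s16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* PACKSSWB dst,src: bytes 0-3 come from dst's words, bytes 4-7 from src's words. */
uint64_t packsswb(uint64_t dst, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t from = (i < 4) ? dst : src;
        int16_t word = (int16_t)(uint16_t)(from >> (16 * (i % 4)));
        result |= (uint64_t)(uint8_t)sat_s16_to_s8(word) << (8 * i);
    }
    return result;
}

/* PACKSSDW dst,src: words 0-1 come from dst's doublewords, words 2-3 from src's. */
uint64_t packssdw(uint64_t dst, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t from = (i < 2) ? dst : src;
        int32_t dword = (int32_t)(uint32_t)(from >> (32 * (i % 2)));
        result |= (uint64_t)(uint16_t)sat_s32_to_s16(dword) << (16 * i);
    }
    return result;
}
```

For example, `packsswb(0xFFFF80007FFF0001, 0)' yields `0x00000000FF807F01': the words `0001h', `7FFFh', `8000h' and `FFFFh' become the bytes `01h', `7Fh', `80h' and `FFh'.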
`PACKUSWB' performs unsigned saturation: it treats each input word as a signed number, replaces any value larger than `0FFh' by `0FFh', and replaces any negative value by `00h'.

A.114 `PADDxx': MMX Packed Addition

       PADDB mmxreg,r/m64        ; 0F FC /r          [PENT,MMX]
       PADDW mmxreg,r/m64        ; 0F FD /r          [PENT,MMX]
       PADDD mmxreg,r/m64        ; 0F FE /r          [PENT,MMX]
       PADDSB mmxreg,r/m64       ; 0F EC /r          [PENT,MMX]
       PADDSW mmxreg,r/m64       ; 0F ED /r          [PENT,MMX]
       PADDUSB mmxreg,r/m64      ; 0F DC /r          [PENT,MMX]
       PADDUSW mmxreg,r/m64      ; 0F DD /r          [PENT,MMX]

The `PADDxx' instructions all perform packed addition between their two 64-bit operands, storing the result in the destination (first) operand. The `PADDxB' forms treat the 64-bit operands as vectors of eight bytes, and add each byte individually; the `PADDxW' forms treat the operands as vectors of four words; and `PADDD' treats its operands as vectors of two doublewords.

`PADDSB' and `PADDSW' perform signed saturation on the sum of each pair of bytes or words: if the result of an addition is too large or too small to fit into a signed byte or word result, it is clipped (saturated) to the largest or smallest value which _will_ fit. `PADDUSB' and `PADDUSW' similarly perform unsigned saturation, clipping to `0FFh' or `0FFFFh' if the result is larger than that.

A.115 `PADDSIW': MMX Packed Addition to Implicit Destination

       PADDSIW mmxreg,r/m64      ; 0F 51 /r          [CYRIX,MMX]

`PADDSIW', specific to the Cyrix extensions to the MMX instruction set, performs the same function as `PADDSW', except that the result is not placed in the register specified by the first operand, but instead in the register whose number differs from the first operand only in the last bit. So `PADDSIW MM0,MM2' would put the result in `MM1', but `PADDSIW MM1,MM2' would put the result in `MM0'.

A.116 `PAND', `PANDN': MMX Bitwise AND and AND-NOT

       PAND mmxreg,r/m64         ; 0F DB /r          [PENT,MMX]
       PANDN mmxreg,r/m64        ; 0F DF /r          [PENT,MMX]

`PAND' performs a bitwise AND operation between its two operands (i.e. each bit of the result is 1 if and only if the corresponding bits of the two inputs were both 1), and stores the result in the destination (first) operand.

`PANDN' performs the same operation, but first performs a one's complement operation on the destination (first) operand.

A.117 `PAVEB': MMX Packed Average

       PAVEB mmxreg,r/m64        ; 0F 50 /r          [CYRIX,MMX]

`PAVEB', specific to the Cyrix MMX extensions, treats its two operands as vectors of eight unsigned bytes, and calculates the average of the corresponding bytes in the operands. The resulting vector of eight averages is stored in the first operand.

A.118 `PCMPxx': MMX Packed Comparison

       PCMPEQB mmxreg,r/m64      ; 0F 74 /r          [PENT,MMX]
       PCMPEQW mmxreg,r/m64      ; 0F 75 /r          [PENT,MMX]
       PCMPEQD mmxreg,r/m64      ; 0F 76 /r          [PENT,MMX]
       PCMPGTB mmxreg,r/m64      ; 0F 64 /r          [PENT,MMX]
       PCMPGTW mmxreg,r/m64      ; 0F 65 /r          [PENT,MMX]
       PCMPGTD mmxreg,r/m64      ; 0F 66 /r          [PENT,MMX]

The `PCMPxx' instructions all treat their operands as vectors of bytes, words, or doublewords; corresponding elements of the source and destination are compared, and the corresponding element of the destination (first) operand is set to all zeros or all ones depending on the result of the comparison. `PCMPxxB' treats the operands as vectors of eight bytes, `PCMPxxW' treats them as vectors of four words, and `PCMPxxD' as two doublewords.

`PCMPEQx' sets the corresponding element of the destination operand to all ones if the two elements compared are equal; `PCMPGTx' sets the destination element to all ones if the element of the first (destination) operand is greater (treated as a signed integer) than that of the second (source) operand.

A.119 `PDISTIB': MMX Packed Distance and Accumulate with Implied Register

       PDISTIB mmxreg,mem64      ; 0F 54 /r          [CYRIX,MMX]

`PDISTIB', specific to the Cyrix MMX extensions, treats its two input operands as vectors of eight unsigned bytes.
For each byte position, it finds the absolute difference between the bytes in that position in the two input operands, and adds that value to the byte in the same position in the implied output register. The addition is saturated to an unsigned byte in the same way as `PADDUSB'.

The implied output register is found in the same way as `PADDSIW' (section A.115).

Note that `PDISTIB' cannot take a register as its second source operand.

A.120 `PMACHRIW': MMX Packed Multiply and Accumulate with Rounding

       PMACHRIW mmxreg,mem64     ; 0F 5E /r          [CYRIX,MMX]

`PMACHRIW' acts almost identically to `PMULHRIW' (section A.123), but instead of _storing_ its result in the implied destination register, it _adds_ its result, as four packed words, to the implied destination register. No saturation is done: the addition can wrap around.

Note that `PMACHRIW' cannot take a register as its second source operand.

A.121 `PMADDWD': MMX Packed Multiply and Add

       PMADDWD mmxreg,r/m64      ; 0F F5 /r          [PENT,MMX]

`PMADDWD' treats its two inputs as vectors of four signed words. It multiplies corresponding elements of the two operands, giving four signed doubleword results. The top two of these are added and placed in the top 32 bits of the destination (first) operand; the bottom two are added and placed in the bottom 32 bits.

A.122 `PMAGW': MMX Packed Magnitude

       PMAGW mmxreg,r/m64        ; 0F 52 /r          [CYRIX,MMX]

`PMAGW', specific to the Cyrix MMX extensions, treats both its operands as vectors of four signed words. It compares the absolute values of the words in corresponding positions, and sets each word of the destination (first) operand to whichever of the two words in that position had the larger absolute value.

A.123 `PMULHRW', `PMULHRIW': MMX Packed Multiply High with Rounding

       PMULHRW mmxreg,r/m64      ; 0F 59 /r          [CYRIX,MMX]
       PMULHRIW mmxreg,r/m64     ; 0F 5D /r          [CYRIX,MMX]

These instructions, specific to the Cyrix MMX extensions, treat their operands as vectors of four signed words.
Words in corresponding positions are multiplied, to give a 32-bit value in which bits 30 and 31 are equal (the one exception is when both words are `8000h', whose product is `40000000h'). Bits 30 to 15 of this value (bit mask `0x7FFF8000') are taken and stored in the corresponding position of the destination operand, after first rounding: the operation is equivalent to adding `0x4000' before extracting bits 30 to 15.

For `PMULHRW', the destination operand is the first operand; for `PMULHRIW', the destination operand is implied by the first operand in the manner of `PADDSIW' (section A.115).

A.124 `PMULHW', `PMULLW': MMX Packed Multiply

       PMULHW mmxreg,r/m64       ; 0F E5 /r          [PENT,MMX]
       PMULLW mmxreg,r/m64       ; 0F D5 /r          [PENT,MMX]

`PMULxW' treats its two inputs as vectors of four signed words. It multiplies corresponding elements of the two operands, giving four signed doubleword results. `PMULHW' then stores the top 16 bits of each doubleword in the destination (first) operand; `PMULLW' stores the bottom 16 bits of each doubleword in the destination operand.

A.125 `PMVccZB': MMX Packed Conditional Move

       PMVZB mmxreg,mem64        ; 0F 58 /r          [CYRIX,MMX]
       PMVNZB mmxreg,mem64       ; 0F 5A /r          [CYRIX,MMX]
       PMVLZB mmxreg,mem64       ; 0F 5B /r          [CYRIX,MMX]
       PMVGEZB mmxreg,mem64      ; 0F 5C /r          [CYRIX,MMX]

These instructions, specific to the Cyrix MMX extensions, perform parallel conditional moves. The two input operands are treated as vectors of eight bytes. Each byte of the destination (first) operand is either written from the corresponding byte of the source (second) operand, or left alone, depending on the value of the byte in the _implied_ operand (specified in the same way as `PADDSIW', in section A.115).

`PMVZB' performs each move if the corresponding byte in the implied operand is zero. `PMVNZB' moves if the byte is non-zero. `PMVLZB' moves if the byte, treated as signed, is less than zero, and `PMVGEZB' moves if it is greater than or equal to zero.

Note that these instructions cannot take a register as their second source operand.
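The multiply instructions of sections A.121, A.123 and A.124 can be sketched in C to make the bit selection concrete. This assumes an MMX register is held in a `uint64_t' with element 0 in the least significant bits; the function names are hypothetical, and `pmulhrw' follows the text's description (add `4000h', keep bits 30 to 15) rather than any Cyrix reference implementation.

```c
#include <stdint.h>

/* Hypothetical model: signed word i (0 = least significant) of a register. */
static int16_t word_at(uint64_t reg, int i)
{
    return (int16_t)(uint16_t)(reg >> (16 * i));
}

/* Turns one signed 32-bit product into the 16-bit value kept in the result. */
typedef uint16_t (*word_op)(int32_t product);

static uint64_t map_products(uint64_t dst, uint64_t src, word_op op)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        int32_t product = (int32_t)word_at(dst, i) * word_at(src, i);
        result |= (uint64_t)op(product) << (16 * i);
    }
    return result;
}

static uint16_t low16(int32_t p)  { return (uint16_t)((uint32_t)p & 0xFFFF); }
static uint16_t high16(int32_t p) { return (uint16_t)((uint32_t)p >> 16); }

/* Bits 30..15 of the product after adding 4000h (round to nearest). */
static uint16_t round15(int32_t p)
{
    return (uint16_t)(((uint32_t)(p + 0x4000) >> 15) & 0xFFFF);
}

uint64_t pmullw(uint64_t dst, uint64_t src)  { return map_products(dst, src, low16); }
uint64_t pmulhw(uint64_t dst, uint64_t src)  { return map_products(dst, src, high16); }
uint64_t pmulhrw(uint64_t dst, uint64_t src) { return map_products(dst, src, round15); }

/* PMADDWD: sum adjacent products into two doublewords; the sum wraps modulo 2^32. */
uint64_t pmaddwd(uint64_t dst, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 2; i++) {
        int32_t p0 = (int32_t)word_at(dst, 2 * i) * word_at(src, 2 * i);
        int32_t p1 = (int32_t)word_at(dst, 2 * i + 1) * word_at(src, 2 * i + 1);
        uint32_t sum = (uint32_t)p0 + (uint32_t)p1;
        result |= (uint64_t)sum << (32 * i);
    }
    return result;
}
```

Reading each word as a signed 1.15 fixed-point fraction, `pmulhrw' applied to `0x4000' (0.5) and `0x4000' gives `0x2000' (0.25), whereas `pmulhw' would give `0x1000', because it keeps bits 31 to 16 rather than 30 to 15.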
A.129 `POR': MMX Bitwise OR

       POR mmxreg,r/m64          ; 0F EB /r          [PENT,MMX]

`POR' performs a bitwise OR operation between its two operands (i.e. each bit of the result is 1 if and only if at least one of the corresponding bits of the two inputs was 1), and stores the result in the destination (first) operand.

A.130 `PSLLx', `PSRLx', `PSRAx': MMX Bit Shifts

       PSLLW mmxreg,r/m64        ; 0F F1 /r          [PENT,MMX]
       PSLLW mmxreg,imm8         ; 0F 71 /6 ib       [PENT,MMX]
       PSLLD mmxreg,r/m64        ; 0F F2 /r          [PENT,MMX]
       PSLLD mmxreg,imm8         ; 0F 72 /6 ib       [PENT,MMX]
       PSLLQ mmxreg,r/m64        ; 0F F3 /r          [PENT,MMX]
       PSLLQ mmxreg,imm8         ; 0F 73 /6 ib       [PENT,MMX]
       PSRAW mmxreg,r/m64        ; 0F E1 /r          [PENT,MMX]
       PSRAW mmxreg,imm8         ; 0F 71 /4 ib       [PENT,MMX]
       PSRAD mmxreg,r/m64        ; 0F E2 /r          [PENT,MMX]
       PSRAD mmxreg,imm8         ; 0F 72 /4 ib       [PENT,MMX]
       PSRLW mmxreg,r/m64        ; 0F D1 /r          [PENT,MMX]
       PSRLW mmxreg,imm8         ; 0F 71 /2 ib       [PENT,MMX]
       PSRLD mmxreg,r/m64        ; 0F D2 /r          [PENT,MMX]
       PSRLD mmxreg,imm8         ; 0F 72 /2 ib       [PENT,MMX]
       PSRLQ mmxreg,r/m64        ; 0F D3 /r          [PENT,MMX]
       PSRLQ mmxreg,imm8         ; 0F 73 /2 ib       [PENT,MMX]

`PSxxQ' perform simple bit shifts on the 64-bit MMX registers: the destination (first) operand is shifted left or right by the number of bits given in the source (second) operand, and the vacated bits are filled in with zeros. Only logical shifts exist for quadwords: there is no `PSRAQ'.

`PSxxW' and `PSxxD' perform packed bit shifts: the destination operand is treated as a vector of four words or two doublewords, and each element is shifted individually, so bits shifted out of one element do not interfere with empty bits coming into the next.

`PSLLx' and `PSRLx' perform logical shifts: the vacated bits at one end of the shifted number are filled with zeros. `PSRAx' performs an arithmetic right shift: the vacated bits at the top of the shifted number are filled with copies of the original top (sign) bit.
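The difference between logical and arithmetic packed shifts can be seen in a short C sketch. It assumes an MMX register is held in a `uint64_t' with element 0 in the least significant bits, and the function names are hypothetical. The handling of counts larger than the element width (each element becomes zero for logical shifts, or all copies of its sign bit for arithmetic shifts) follows Intel's documentation for these instructions rather than anything stated in this section.

```c
#include <stdint.h>

/* Hypothetical model of PSRLW: logical right shift of each of the four words. */
uint64_t psrlw(uint64_t dst, uint64_t count)
{
    if (count > 15)
        return 0;                       /* every bit is shifted out */
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t w = (uint16_t)(dst >> (16 * i));
        result |= (uint64_t)(uint16_t)(w >> count) << (16 * i);
    }
    return result;
}

/* Hypothetical model of PSRAW: vacated bits take copies of each word's sign bit. */
uint64_t psraw(uint64_t dst, uint64_t count)
{
    if (count > 15)
        count = 15;                     /* leaves only copies of the sign bit */
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t w = (uint16_t)(dst >> (16 * i));
        uint16_t shifted = (uint16_t)(w >> count);
        if (w & 0x8000)                 /* negative: set the top 'count' bits */
            shifted |= (uint16_t)(0xFFFFu << (16 - count));
        result |= (uint64_t)shifted << (16 * i);
    }
    return result;
}

/* Hypothetical model of PSLLQ: one 64-bit logical left shift. */
uint64_t psllq(uint64_t dst, uint64_t count)
{
    return count > 63 ? 0 : dst << count;
}
```

For example, shifting the words `8000h' and `1234h' right by 4 gives `F800h' and `0123h' with `psraw', but `0800h' and `0123h' with `psrlw'.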
A.131 `PSUBxx': MMX Packed Subtraction

       PSUBB mmxreg,r/m64        ; 0F F8 /r          [PENT,MMX]
       PSUBW mmxreg,r/m64        ; 0F F9 /r          [PENT,MMX]
       PSUBD mmxreg,r/m64        ; 0F FA /r          [PENT,MMX]
       PSUBSB mmxreg,r/m64       ; 0F E8 /r          [PENT,MMX]
       PSUBSW mmxreg,r/m64       ; 0F E9 /r          [PENT,MMX]
       PSUBUSB mmxreg,r/m64      ; 0F D8 /r          [PENT,MMX]
       PSUBUSW mmxreg,r/m64      ; 0F D9 /r          [PENT,MMX]

The `PSUBxx' instructions all perform packed subtraction between their two 64-bit operands, storing the result in the destination (first) operand. The `PSUBxB' forms treat the 64-bit operands as vectors of eight bytes, and subtract each byte individually; the `PSUBxW' forms treat the operands as vectors of four words; and `PSUBD' treats its operands as vectors of two doublewords.

In all cases, the elements of the source (second) operand are subtracted from the corresponding elements of the destination (first) operand, not the other way round.

`PSUBSB' and `PSUBSW' perform signed saturation on the difference of each pair of bytes or words: if the result of a subtraction is too large or too small to fit into a signed byte or word result, it is clipped (saturated) to the largest or smallest value which _will_ fit. `PSUBUSB' and `PSUBUSW' similarly perform unsigned saturation, clipping to zero if the result would be negative.

A.132 `PSUBSIW': MMX Packed Subtract with Saturation to Implied Destination

       PSUBSIW mmxreg,r/m64      ; 0F 55 /r          [CYRIX,MMX]

`PSUBSIW', specific to the Cyrix extensions to the MMX instruction set, performs the same function as `PSUBSW', except that the result is not placed in the register specified by the first operand, but instead in the implied destination register, specified as for `PADDSIW' (section A.115).
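Saturating addition and subtraction differ from ordinary wrap-around arithmetic only at the range limits, which a C sketch makes easy to test. It assumes an MMX register is held in a `uint64_t' with element 0 in the least significant bits; `paddusb', `psubusb', `paddsw' and `psubsw' are hypothetical names, and only the byte-unsigned and word-signed variants are shown.

```c
#include <stdint.h>

static int32_t clamp(int32_t v, int32_t lo, int32_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Unsigned byte lanes: sign = +1 models PADDUSB, sign = -1 models PSUBUSB. */
static uint64_t bytes_unsigned_sat(uint64_t dst, uint64_t src, int sign)
{
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        int32_t a = (uint8_t)(dst >> (8 * i));
        int32_t b = (uint8_t)(src >> (8 * i));
        int32_t r = clamp(a + sign * b, 0, 0xFF);   /* clip to 00h..0FFh */
        result |= (uint64_t)r << (8 * i);
    }
    return result;
}

/* Signed word lanes: sign = +1 models PADDSW, sign = -1 models PSUBSW. */
static uint64_t words_signed_sat(uint64_t dst, uint64_t src, int sign)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        int32_t a = (int16_t)(uint16_t)(dst >> (16 * i));
        int32_t b = (int16_t)(uint16_t)(src >> (16 * i));
        int32_t r = clamp(a + sign * b, INT16_MIN, INT16_MAX); /* 8000h..7FFFh */
        result |= (uint64_t)(uint16_t)r << (16 * i);
    }
    return result;
}

/* dst - src, matching the operand order described above. */
uint64_t paddusb(uint64_t d, uint64_t s) { return bytes_unsigned_sat(d, s, +1); }
uint64_t psubusb(uint64_t d, uint64_t s) { return bytes_unsigned_sat(d, s, -1); }
uint64_t paddsw(uint64_t d, uint64_t s)  { return words_signed_sat(d, s, +1); }
uint64_t psubsw(uint64_t d, uint64_t s)  { return words_signed_sat(d, s, -1); }
```

For example, `psubusb' applied to the bytes `05h' and `20h' gives `00h' rather than the wrapped value `E5h', and `paddsw' applied to `7FF0h' and `0020h' gives `7FFFh'.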
A.133 `PUNPCKxxx': Unpack Data

       PUNPCKHBW mmxreg,r/m64    ; 0F 68 /r          [PENT,MMX]
       PUNPCKHWD mmxreg,r/m64    ; 0F 69 /r          [PENT,MMX]
       PUNPCKHDQ mmxreg,r/m64    ; 0F 6A /r          [PENT,MMX]
       PUNPCKLBW mmxreg,r/m64    ; 0F 60 /r          [PENT,MMX]
       PUNPCKLWD mmxreg,r/m64    ; 0F 61 /r          [PENT,MMX]
       PUNPCKLDQ mmxreg,r/m64    ; 0F 62 /r          [PENT,MMX]

The `PUNPCKxx' instructions all treat their operands as vectors, and produce a new vector generated by interleaving elements from the two inputs. The `PUNPCKHxx' instructions start by throwing away the bottom half of each input operand, and the `PUNPCKLxx' instructions throw away the top half.

The remaining elements, totalling 64 bits, are then interleaved into the destination, alternating elements from the second (source) operand and the first (destination) operand: so the leftmost element in the result always comes from the second operand, and the rightmost from the destination.

`PUNPCKxBW' works a byte at a time, `PUNPCKxWD' a word at a time, and `PUNPCKxDQ' a doubleword at a time.

So, for example, if the first operand held `0x7A6A5A4A3A2A1A0A' and the second held `0x7B6B5B4B3B2B1B0B', then:

(*) `PUNPCKHBW' would return `0x7B7A6B6A5B5A4B4A'.
(*) `PUNPCKHWD' would return `0x7B6B7A6A5B4B5A4A'.
(*) `PUNPCKHDQ' would return `0x7B6B5B4B7A6A5A4A'.
(*) `PUNPCKLBW' would return `0x3B3A2B2A1B1A0B0A'.
(*) `PUNPCKLWD' would return `0x3B2B3A2A1B0B1A0A'.
(*) `PUNPCKLDQ' would return `0x3B2B1B0B3A2A1A0A'.

A.134 `PUSH': Push Data on Stack

       PUSH reg16                ; o16 50+r          [8086]
       PUSH reg32                ; o32 50+r          [386]
       PUSH r/m16                ; o16 FF /6         [8086]
       PUSH r/m32                ; o32 FF /6         [386]
       PUSH CS                   ; 0E                [8086]
       PUSH DS                   ; 1E                [8086]
       PUSH ES                   ; 06                [8086]
       PUSH SS                   ; 16                [8086]
       PUSH FS                   ; 0F A0             [386]
       PUSH GS                   ; 0F A8             [386]
       PUSH imm8                 ; 6A ib             [286]
       PUSH imm16                ; o16 68 iw         [286]
       PUSH imm32                ; o32 68 id         [386]

`PUSH' decrements the stack pointer (`SP' or `ESP') by 2 or 4, and then stores the given value at `[SS:SP]' or `[SS:ESP]'.
The address-size attribute of the instruction determines whether `SP' or `ESP' is used as the stack pointer: to deliberately override the default given by the `BITS' setting, you can use an `a16' or `a32' prefix.

The operand-size attribute of the instruction determines whether the stack pointer is decremented by 2 or 4: this means that segment register pushes in `BITS 32' mode will push 4 bytes on the stack, of which the upper two are undefined. If you need to override that, you can use an `o16' or `o32' prefix.

The above opcode listings give two forms for general-purpose register push instructions: for example, `PUSH BX' has the two forms `53' and `FF F3'. NASM will always generate the shorter form when given `PUSH BX'. NDISASM will disassemble both.

Unlike the undocumented and barely supported `POP CS', `PUSH CS' is a perfectly valid and sensible instruction, supported on all processors.

The instruction `PUSH SP' may be used to distinguish an 8086 from later processors: on an 8086, the value of `SP' stored is the value it has _after_ the push instruction, whereas on later processors it is the value _before_ the push instruction.

A.135 `PUSHAx': Push All General-Purpose Registers

       PUSHA                     ; 60                [186]
       PUSHAD                    ; o32 60            [386]
       PUSHAW                    ; o16 60            [186]

`PUSHAW' pushes, in succession, `AX', `CX', `DX', `BX', `SP', `BP', `SI' and `DI' on the stack, decrementing the stack pointer by a total of 16.

`PUSHAD' pushes, in succession, `EAX', `ECX', `EDX', `EBX', `ESP', `EBP', `ESI' and `EDI' on the stack, decrementing the stack pointer by a total of 32.

In both cases, the value of `SP' or `ESP' pushed is its _original_ value, as it had before the instruction was executed.

`PUSHA' is an alias mnemonic for either `PUSHAW' or `PUSHAD', depending on the current `BITS' setting.

Note that the registers are pushed in order of their numeric values in opcodes (see section A.2.1).

See also `POPA' (section A.127).
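The push order and the treatment of `SP' by `PUSHAW' can be illustrated with a toy C model of a 16-bit machine. Everything here is hypothetical scaffolding for illustration (the `cpu16' structure, the register enumeration in opcode order, and the helpers `push16', `pushaw' and `peek16'); it models only what the text states: each push decrements `SP' by 2 and stores the word at `[SS:SP]' (little-endian), and the `SP' value pushed is the original one.

```c
#include <stdint.h>

/* Hypothetical register numbering, in opcode order (section A.2.1). */
enum { AX, CX, DX, BX, SP, BP, SI, DI };

/* Hypothetical 16-bit machine: eight registers plus a 64 KiB stack segment. */
struct cpu16 {
    uint16_t reg[8];
    uint8_t  stack[65536];
};

/* PUSH: decrement SP by 2, then store the word at [SS:SP], low byte first. */
static void push16(struct cpu16 *c, uint16_t value)
{
    c->reg[SP] -= 2;                                   /* wraps modulo 64 KiB */
    c->stack[c->reg[SP]]                 = (uint8_t)value;
    c->stack[(uint16_t)(c->reg[SP] + 1)] = (uint8_t)(value >> 8);
}

/* PUSHAW: push AX, CX, DX, BX, SP, BP, SI, DI; SP is pushed as its original value. */
void pushaw(struct cpu16 *c)
{
    uint16_t original_sp = c->reg[SP];
    for (int r = AX; r <= DI; r++)
        push16(c, r == SP ? original_sp : c->reg[r]);
}

/* Read back the word stored at [SS:addr]. */
uint16_t peek16(const struct cpu16 *c, uint16_t addr)
{
    return (uint16_t)(c->stack[addr] | (c->stack[(uint16_t)(addr + 1)] << 8));
}
```

Starting with `SP' = `0100h', the model leaves `SP' at `00F0h', with `AX' stored at `[SS:00FEh]', the original `SP' value `0100h' at `[SS:00F6h]', and `DI' at `[SS:00F0h]'.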
A.136 `PUSHFx': Push Flags Register

       PUSHF                     ; 9C                [8086]
       PUSHFD                    ; o32 9C            [386]
       PUSHFW                    ; o16 9C            [8086]

`PUSHFW' decrements the stack pointer by 2 and stores the bottom 16 bits of the flags register (or the whole flags register, on processors below a 386) on the stack.

`PUSHFD' decrements the stack pointer by 4 and stores the 32-bit flags register on the stack (the processor clears the `VM' and `RF' bits in the stored copy).

`PUSHF' is an alias mnemonic for either `PUSHFW' or `PUSHFD', depending on the current `BITS' setting.

See also `POPF' (section A.128).

A.137 `PXOR': MMX Bitwise XOR

       PXOR mmxreg,r/m64         ; 0F EF /r          [PENT,MMX]

`PXOR' performs a bitwise XOR operation between its two operands (i.e. each bit of the result is 1 if and only if exactly one of the corresponding bits of the two inputs was 1), and stores the result in the destination (first) operand.
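The four MMX bitwise instructions (`PAND', `PANDN', `POR' and `PXOR') reduce to single C operators on a 64-bit value, which makes `PANDN''s operand order easy to check: it is the destination that is complemented. The sketch assumes an MMX register is held in a `uint64_t'; all function names are hypothetical, and `select_bits' illustrates one common way these instructions are combined with a mask of all-ones and all-zeros elements such as `PCMPxx' produces.

```c
#include <stdint.h>

/* Each function returns the new value of the destination (first) operand. */
uint64_t pand(uint64_t dst, uint64_t src)  { return dst & src; }
uint64_t pandn(uint64_t dst, uint64_t src) { return ~dst & src; } /* complement dst, then AND */
uint64_t por(uint64_t dst, uint64_t src)   { return dst | src; }
uint64_t pxor(uint64_t dst, uint64_t src)  { return dst ^ src; }

/* Hypothetical helper built from PAND, PANDN and POR:
   where a mask bit is 1 the result bit comes from a, otherwise from b. */
uint64_t select_bits(uint64_t mask, uint64_t a, uint64_t b)
{
    return por(pand(mask, a), pandn(mask, b));
}
```

`PXOR MM0,MM0' is a common way to zero a register, since any value XORed with itself is zero. Feeding a `PCMPGTW' result as the mask to a select of this shape picks, word by word, the larger of two signed vectors.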