Best Programmer
Sunday, 03 Dey 1385, 15:32
What are floating-point numbers?
So far, all the number systems discussed in this chapter have revolved around whole numbers. Whole
numbers represent numbers that are used for counting, such as one dog, two cats, and ten horses.
Eventually, the concept of negative numbers was included along with whole numbers to incorporate
the signed integer number system. Both the integer and BCD data types can only contain whole integer
values.
As you know, not all numerical relationships can be described using integers. At some point, the concept
of fractions was introduced. This meant that an infinite number of values could be contained between
two integer values. Besides the infinite number of values between integers, there is also an infinite number
of integer values in the number system. All of these numbers combined are referred to as real numbers.
Real numbers can contain any numerical value from positive infinity to negative infinity, with any
number of decimal places. An example of a real number would be 72,326.224576.
Chapter 7
Working with real numbers on a computer can be a challenge, especially when there are many different
magnitudes of numbers. The floating-point format was developed to produce a standard method for
representing real numbers on computer systems.
Floating-point format
The floating-point format represents real numbers using a scientific notation. If you had any type of science
class in school you are probably familiar with scientific notation. Scientific notation presents numbers
as a coefficient (also called the mantissa) and an exponent, such as 3.6845 × 10^2. In the decimal world,
the exponent is based on a value of 10, and represents the number of places the decimal point has been
moved to produce the coefficient. Each time the decimal point is moved one place to the left, the exponent
increases by one. Each time it is moved one place to the right, the exponent decreases by one.
For example, the real number 25.92 would be represented in scientific notation as 2.592 × 10^1. The
value 2.592 is the coefficient, and 1 is the exponent. You must multiply the coefficient by 10 raised to
the exponent to obtain the original real number. As another example, the value .00172 would be represented
as 1.72 × 10^-3. The number 1.72 must be multiplied by 10^-3 to obtain the original value.
Binary floating-point format
Computer systems use binary floating-point numbers, which express values in binary scientific notation
format. Because the numbers are in binary format, the coefficient and exponent are based on a binary
value, not a decimal value. An example of this would be 1.0101 × 2^2. Working with the fractional part
of the coefficient (the part after the binary point) can be confusing.
To decipher the binary floating-point value, you must first understand how fractional binary numbers
work. In the decimal world, you are used to seeing values such as 0.159. What this value represents is
0 + 1⁄10 + 5⁄100 + 9⁄1000. The same principle applies to the binary world.
The coefficient value 1.0101 multiplied by the exponent 2^2 would yield the binary value 101.01, which
represents the decimal whole number 5, plus the fraction 0⁄2 + 1/4. This yields the decimal value 5.25.
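The place-value rule described above can be sketched in a few lines of Python. This is only an illustration, not part of the book's material; the helper name binary_to_decimal is made up for this example.

```python
def binary_to_decimal(bits: str) -> float:
    """Convert a binary string such as '101.01' to its decimal value."""
    whole, _, frac = bits.partition(".")
    value = float(int(whole, 2)) if whole else 0.0
    # Each digit after the binary point is worth 1/2, 1/4, 1/8, ... in turn.
    for position, digit in enumerate(frac, start=1):
        if digit == "1":
            value += 2.0 ** -position
    return value


print(binary_to_decimal("101.01"))  # 5.25
print(binary_to_decimal("0.1"))     # 0.5
```

The same function can be used to check any row of the binary fraction tables by hand.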
Fractional binary numbers are the most confusing part of dealing with floating-point values. The following
table shows the first few binary fractions and their decimal equivalents.
Binary      Decimal Fraction   Decimal Value
0.1         1/2                0.5
0.01        1/4                0.25
0.001       1/8                0.125
0.0001      1/16               0.0625
0.00001     1/32               0.03125
0.000001    1/64               0.015625
Using Numbers
To help demonstrate binary fractions, the following table shows a few examples of binary values
with a fractional part:
Binary        Decimal Fraction         Decimal Value
10.101        2 + 1/2 + 1/8            2.625
10011.001     19 + 1/8                 19.125
10110.1101    22 + 1/2 + 1/4 + 1/16    22.8125
1101.011      13 + 1/4 + 1/8           13.375
The examples in the table have a finite fractional part. However, just as decimal fractions can have a
repeating value (such as the decimal value of 1/3), binary fractions can also have a repeating fraction
value. These values must be truncated at some point and can only estimate the decimal fraction in
binary.
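As a rough illustration of a repeating binary fraction, the Python sketch below generates binary digits by repeated doubling. The helper name binary_fraction_digits is hypothetical; it uses exact rational arithmetic (fractions.Fraction) so the digits are not disturbed by rounding.

```python
from fractions import Fraction


def binary_fraction_digits(value: Fraction, count: int) -> str:
    """Return the first `count` binary digits after the binary point."""
    digits = []
    for _ in range(count):
        value *= 2
        if value >= 1:
            digits.append("1")
            value -= 1
        else:
            digits.append("0")
    return "".join(digits)


print(binary_fraction_digits(Fraction(3, 8), 16))   # 0110000000000000 (finite)
print(binary_fraction_digits(Fraction(1, 10), 16))  # 0001100110011001 (repeats 0011)
print(0.1 + 0.2 == 0.3)                             # False, a visible effect of truncation
```

The decimal value 0.1 never terminates in binary, which is why it can only be approximated once the digits are cut off.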
Fortunately, the GNU assembler does this work for us, so don’t get too worried if you are not completely
comfortable with binary fractions and binary floating-point format.
When writing binary floating-point values, the binary values are usually normalized. This process
moves the binary point so that it sits just after the leftmost one digit, and modifies the exponent to
compensate. For example, the value 1101.011 becomes 1.101011 × 2^3.
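Normalization can be checked with a short Python sketch. It assumes Python's float is an IEEE 754 double, which holds on common platforms. math.frexp returns a mantissa in the range 0.5 to 1, so the sketch shifts it by one bit to get the 1.xxx form used in the text.

```python
import math

value = 13.375  # binary 1101.011

# frexp gives value == mantissa * 2**exponent with 0.5 <= mantissa < 1.
mantissa, exponent = math.frexp(value)

# Shift one bit to obtain the normalized form 1.xxx * 2**exponent.
significand, exponent = mantissa * 2, exponent - 1
print(significand, exponent)  # 1.671875 3

# float.hex shows the same thing in hexadecimal: 1.101011 binary is 1.ac hex.
print(value.hex())            # 0x1.ac00000000000p+3
```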
Trying to properly represent binary floating-point numbers in a computer system was a challenge in the
early days of computing. Fortunately, standards were developed to help programmers deal with floating-
point numbers. A set of standard floating-point data types was created to simplify handling real
numbers in computer programs. The next section describes the standard floating-point data types.
Standard floating-point data types
While there are an infinite number of possible real number values, processors have a finite number of
bits available to handle the values. Because of this, a standard system was created for approximating real
numbers in a computer environment. While the approximations are not perfect, they provide a system
for working with a realistic subset of the real number system.
In 1985, the Institute of Electrical and Electronics Engineers (IEEE) created what is called the IEEE
Standard 754 floating-point formats. These formats are used universally to represent real numbers in
computer systems. Intel has adopted this standard in the IA-32 platform for representing floating-point
values.
IEEE Standard 754 defines real numbers as binary floating-point values
using three components:
❑ A sign
❑ A significand
❑ An exponent
The sign bit denotes if the value is negative or positive. A one in the sign bit denotes a negative value,
and a zero denotes a positive value.
The significand part represents the coefficient (or mantissa) of the floating-point number. The coefficient
can be either normalized or denormalized. When a binary value is normalized, it is written with a one
before the decimal point. The exponent is modified to accommodate how many bit positions have been
shifted to accomplish this (similar to the scientific notation method). This means that in a normalized
value, the significand is always comprised of a one and a binary fraction.
The exponent represents the exponent part of the floating-point number. Because the exponent value can
be positive or negative, it is offset by a bias value. This ensures that the exponent field can only be a positive
unsigned integer. It also limits the minimum and maximum exponent values available for use in
the format. The general format of the binary floating-point number is shown in Figure 7-11.
Figure 7-11: Binary floating-point format (sign bit, exponent, coefficient)
These three parts of the floating-point number are contained within a fixed-size data format. The IEEE
Standard 754 defines two sizes of floating-point numbers:
❑ 32 bits (called single-precision)
❑ 64 bits (called double-precision)
The number of bits available for representing the significand determines the precision. Figure 7-12
shows the bit layouts for the two different precision types.
Figure 7-12
The single-precision floating-point number uses a 23-bit significand value. However, the floating-point
format assumes that the integer value of the significand will always be a one and does not use it in the
significand value. This effectively makes 24 bits of precision for the significand. The exponent uses an
8-bit value, with a bias value of 127. The all-zeros and all-ones exponent fields are reserved for special
values, so the exponent of a normalized number can have a value between -126 and +127 (binary exponent).
This combination produces a decimal range for single-precision floating-point numbers
of 1.18 × 10^-38 to 3.40 × 10^38.
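To make the three fields concrete, here is a minimal Python sketch that splits a single-precision value into sign, biased exponent, and fraction using the standard struct module. The function name decode_single is invented for this illustration.

```python
import struct


def decode_single(value: float):
    """Return (sign, biased exponent, 23-bit fraction) of value as a single."""
    # Pack as a 32-bit float, then reread the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, biased_exponent, fraction


sign, biased, fraction = decode_single(12.34)
print(sign, biased - 127, hex(fraction))  # 0 3 0x4570a4

# Rebuild the stored value: implied leading one plus the fraction, times 2**exponent.
rebuilt = (1 + fraction / 2**23) * 2 ** (biased - 127)
print(rebuilt)
```

The rebuilt value is slightly different from 12.34 because only 24 bits of precision are available.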
The double-precision floating-point number uses a 52-bit fraction value, which provides 53 bits of precision
for the significand. The exponent uses an 11-bit value, with a bias value of 1023. This means that the
exponent can have a value between -1022 and +1023 (binary exponent). This combination produces a
decimal range for double-precision floating-point numbers of 2.23 × 10^-308 to 1.79 × 10^308.
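The quoted ranges can be reproduced from the bit patterns of the smallest and largest normalized values. The sketch below is only a cross-check and assumes that Python's float is an IEEE 754 double, which is the case on common platforms.

```python
import struct
import sys

# Double precision: Python exposes the limits of its own float type.
print(sys.float_info.min)       # 2.2250738585072014e-308
print(sys.float_info.max)       # 1.7976931348623157e+308
print(sys.float_info.mant_dig)  # 53 bits of precision

# Single precision: build the extreme normalized values from their bit patterns.
single_min = struct.unpack("<f", struct.pack("<I", 0x00800000))[0]  # exponent field 1
single_max = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]  # exponent field 254
print(single_min)               # 1.1754943508222875e-38
print(single_max)               # 3.4028234663852886e+38
```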
IA-32 floating-point values
The IA-32 platform uses both the IEEE Standard 754 single- and double-precision floating-point formats,
along with its own 80-bit format called the double-extended-precision floating-point format. The three
formats provide for different levels of precision when performing floating-point math. The
double-extended-precision floating-point format is used within the FPU 80-bit registers during floating-point
mathematical processes.
The Intel 80-bit double-extended-precision floating-point format uses 64 bits for the significand and
15 bits for the exponent. The bias value used for the double-extended-precision floating-point format
is 16,383, producing an exponent range of -16382 to +16383, for a decimal range of 3.37 × 10^-4932 to
1.18 × 10^4932.
IEEE Standard 754 Floating Point Formats (Figure 7-12)
Single precision: sign in bit 31, exponent in bits 30-23, significand in bits 22-0
Double precision: sign in bit 63, exponent in bits 62-52, significand in bits 51-0
The following table sums up the three types of floating-point formats used on the standard IA-32
platform.
Data Type          Length   Significand Bits   Exponent Bits   Range
Single precision   32       24                 8               1.18 x 10^-38 to 3.40 x 10^38
Double precision   64       53                 11              2.23 x 10^-308 to 1.79 x 10^308
Double extended    80       64                 15              3.37 x 10^-4932 to 1.18 x 10^4932
Defining floating-point values in GAS
The GNU assembler provides directives for defining single- and double-precision floating-point values
(see Chapter 5, “Moving Data”). At the time of this writing, gas does not have a directive for defining
double-extended-precision floating-point values.
Floating-point values are stored in memory using the little-endian format. Arrays are stored in the order
in which the values are defined in the directive. The .float directive is used to create 32-bit
single-precision values, while the .double directive is used to create 64-bit double-precision values.
Moving floating-point values
The FLD instruction is used to load floating-point values from memory onto the FPU register stack. The
format of the FLD instruction is
fld source
where source can be a 32-, 64-, or 80-bit memory location.
The floattest.s program demonstrates how floating-point data values are defined and used in
assembly language programs:
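# floattest.s - An example of using floating point numbers
.section .data
value1:
    .float 12.34            # 32-bit single-precision value
value2:
    .double 2353.631        # 64-bit double-precision value
.section .bss
    .lcomm data, 8          # 8-byte buffer for one double-precision value
.section .text
.globl _start
_start:
    nop
    flds value1             # push the single-precision value onto the FPU stack (ST0)
    fldl value2             # push the double-precision value; value1 moves to ST1
    fstl data               # store ST0 to data as a double-precision value
    movl $1, %eax           # Linux exit system call
    movl $0, %ebx           # exit status 0
    int $0x80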
The value1 label points to a single-precision floating-point value stored in 4 bytes of memory. The
value2 label points to a double-precision floating-point value stored in 8 bytes of memory. The data
label points to an empty buffer in memory that will be used to transfer a double-precision floating-point
value.
The IA-32 FLD instruction is used for loading single- and double-precision floating-point numbers stored
in memory onto the FPU register stack. To differentiate between the data sizes, the GNU assembler uses
the FLDS instruction for loading single-precision floating-point numbers, and the FLDL instruction for
loading double-precision floating-point numbers.
Similarly, the FST instruction is used for retrieving the top value on the FPU register stack and placing
the value in a memory location. Again, for single-precision numbers, the instruction is FSTS, and for
double-precision numbers, FSTL.
After assembling the floattest.s program, watch the memory locations and register values as the
instructions execute. First, look at how the floating-point values are stored in the memory locations:
(gdb) x/4b &value1
0x8049094 <value1>: 0xa4 0x70 0x45 0x41
(gdb) x/8b &value2
0x8049098 <value2>: 0x8d 0x97 0x6e 0x12 0x43 0x63 0xa2 0x40
(gdb)
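The byte sequences that gdb shows can be reproduced outside the debugger. The following Python sketch is an independent cross-check, not part of the book's workflow; it packs the same two constants in little-endian order with the standard struct module.

```python
import struct

single = struct.pack("<f", 12.34)     # 4 bytes, little-endian single precision
double = struct.pack("<d", 2353.631)  # 8 bytes, little-endian double precision

print(" ".join(f"0x{b:02x}" for b in single))
# 0xa4 0x70 0x45 0x41
print(" ".join(f"0x{b:02x}" for b in double))
# 0x8d 0x97 0x6e 0x12 0x43 0x63 0xa2 0x40
```

The least significant byte comes first in both cases, so the byte holding the sign and the top of the exponent (0x41 and 0x40) appears last.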
If you want to view the decimal values, you can use the f option of the x command:
(gdb) x/f &value1
0x8049094 <value1>: 12.3400002
(gdb) x/gf &value2
0x8049098 <value2>: 2353.6309999999999
(gdb)
Notice that when the debugger attempts to calculate the values for display, rounding errors are already
present. The f option only displays single-precision numbers. To display the double-precision value, you
need to use the gf option, which displays quadword values.
After stepping through the first FLDS instruction, look at the value of the ST0 register using either the
info reg or print command:
(gdb) print $st0
$1 = 12.340000152587890625
(gdb)
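The long decimal that gdb prints for ST0 is the exact value of the nearest single-precision number to 12.34. A small Python sketch, using struct and decimal from the standard library, shows the same digits; it is a cross-check only and does not involve the FPU registers.

```python
import struct
from decimal import Decimal

# Round 12.34 to single precision and read it back as a Python float.
as_single = struct.unpack("<f", struct.pack("<f", 12.34))[0]

# Decimal(float) converts exactly, so every stored digit is visible.
print(Decimal(as_single))  # 12.340000152587890625
```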
The value in the value1 memory location was correctly placed in the ST0 register. Now step through the
next instruction, and look at the value in the ST0 register:
(gdb) print $st0
$2 = 2353.6309999999998581188265234231949
(gdb)
The value has been replaced with the newly loaded double-precision value (and the debugger correctly
displayed the value as a double-precision floating-point number). To see what happened with the originally
loaded value, look at the ST1 register:
(gdb) print $st1
$3 = 12.340000152587890625
(gdb)
As expected, the value in ST0 was shifted down to the ST1 register when the new value was loaded.
Now look at the value of the data label, then step through the FSTL instruction, and look again:
(gdb) x/gf &data
0x80490a0 <data>: 0
(gdb) s
18 movl $1, %eax
(gdb) x/gf &data
0x80490a0 <data>: 2353.6309999999999
(gdb)
The FSTL instruction stored the value in the ST0 register to the memory location pointed to by the
data label.
Using preset floating-point values
The IA-32 instruction set includes some preset floating-point values that can be loaded into the FPU register
stack. These are shown in the following table.
Instruction   Description
FLD1          Push +1.0 onto the FPU stack
FLDL2T        Push log(base 2) 10 onto the FPU stack
FLDL2E        Push log(base 2) e onto the FPU stack
FLDPI         Push the value of pi onto the FPU stack
FLDLG2        Push log(base 10) 2 onto the FPU stack
FLDLN2        Push log(base e) 2 onto the FPU stack
FLDZ          Push +0.0 onto the FPU stack
These instructions provide an easy way to push common mathematical values onto the FPU stack
for operations with your data. You may notice something odd about the FLDZ instruction. In the
floating-point data types, there is a difference between +0.0 and –0.0. For most operations they are
considered the same value, but they produce different results when used as a divisor (dividing a
positive number by +0.0 gives positive infinity, and dividing it by –0.0 gives negative infinity).
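The two zeros can be told apart by their sign bit. The Python sketch below only illustrates the encoding; note that Python itself raises ZeroDivisionError on float division by zero, so the infinity results described above are not reproduced here.

```python
import math
import struct

print(0.0 == -0.0)                    # True: the two zeros compare equal
print(struct.pack("<f", 0.0).hex())   # 00000000
print(struct.pack("<f", -0.0).hex())  # 00000080 (only the sign bit is set)
print(math.copysign(1.0, -0.0))       # -1.0: the sign is still there
```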
The fpuvals.s program demonstrates how the preset floating-point values can be used:
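# fpuvals.s - An example of pushing floating point constants
.section .text
.globl _start
_start:
    nop
    fld1                    # push +1.0
    fldl2t                  # push log2(10)
    fldl2e                  # push log2(e)
    fldpi                   # push pi
    fldlg2                  # push log10(2)
    fldln2                  # push ln(2)
    fldz                    # push +0.0
    movl $1, %eax           # Linux exit system call
    movl $0, %ebx           # exit status 0
    int $0x80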
The fpuvals.s program simply pushes the various floating-point constants onto the FPU register stack.
You can assemble the program and run it in the debugger to watch the FPU register stack as the instructions
are executed. At the end of the list, the registers should look like this:
(gdb) info all
.
.
.
st0 0 (raw 0x00000000000000000000)
st1 0.6931471805599453094286904741849753 (raw 0x3ffeb17217f7d1cf79ac)
st2 0.30102999566398119522564642835948945 (raw 0x3ffd9a209a84fbcff799)
st3 3.1415926535897932385128089594061862 (raw 0x4000c90fdaa22168c235)
st4 1.4426950408889634073876517827983434 (raw 0x3fffb8aa3b295c17f0bc)
st5 3.3219280948873623478083405569094566 (raw 0x4000d49a784bcd1b8afe)
st6 1 (raw 0x3fff8000000000000000)
st7 0 (raw 0x00000000000000000000)
(gdb)
The values are in the reverse order from how they were placed into the stack.
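As a quick sanity check of the register dump, the same constants can be computed with Python's math module. This is a sketch that models the FPU stack as a plain list; the variable names are illustrative only, and Python doubles carry fewer digits than the 80-bit registers.

```python
import math

# Constants in the order fpuvals.s pushes them.
pushed = [
    ("FLD1", 1.0),
    ("FLDL2T", math.log2(10)),
    ("FLDL2E", math.log2(math.e)),
    ("FLDPI", math.pi),
    ("FLDLG2", math.log10(2)),
    ("FLDLN2", math.log(2)),
    ("FLDZ", 0.0),
]

# The last value pushed is ST0, so the register view is the reverse order.
registers = list(reversed(pushed))
for index, (name, value) in enumerate(registers):
    print(f"st{index} {name:7s} {value:.15f}")
```

ST0 comes out as the FLDZ value and ST6 as the FLD1 value, matching the order in the gdb listing.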
SSE floating-point data types
Besides the three standard floating-point data types, Intel processors that implement the SSE technology
include two advanced floating-point data types. The SSE technology incorporates eight 128-bit XMM
registers (see Chapter 2 for more details) that can be used to hold packed floating-point numbers.
Similar to the packed BCD concept, packed floating-point numbers enable multiple floating-point values
to be stored in a single register. Floating-point calculations can be performed in parallel using the multiple
data elements, producing results quicker than sequentially processing the data.
The following two new 128-bit floating-point data types are available:
❑ 128-bit packed single-precision floating-point (in SSE)
❑ 128-bit packed double-precision floating-point (in SSE2)
Because a single-precision floating-point value requires 32 bits, the 128-bit register can hold four packed
single-precision floating-point values. Similarly, it can hold two 64-bit packed double-precision
floating-point values. This is shown in Figure 7-13.
Figure 7-13
These new data types are not available in the FPU or MMX registers. They can only be used in the XMM registers
and only on processors that support SSE or SSE2. Special instructions must be used to load and
retrieve the data values, as well as special math instructions for performing mathematical operations on
the packed floating-point data.
Moving SSE floating-point values
As expected, the IA-32 instruction set includes instructions for moving the new SSE floating-point data
type values around the processor. The instructions are divided between the SSE instructions that operate
on packed single-precision floating-point data, and the SSE2 instructions that operate on packed
double-precision floating-point data.
SSE floating-point values
There is a complete set of instructions for moving 128-bit packed single-precision floating-point values
between memory and the XMM registers on the processor. The instructions for moving SSE packed
single-precision floating-point data are shown in the following table.
Instruction   Description
MOVAPS        Move four aligned, packed single-precision values to XMM registers or memory
MOVUPS        Move four unaligned, packed single-precision values to XMM registers or memory
MOVSS         Move a single-precision value to memory or the low doubleword of a register
MOVLPS        Move two single-precision values to memory or the low quadword of a register
MOVHPS        Move two single-precision values to memory or the high quadword of a register
MOVLHPS       Move two single-precision values from the low quadword to the high quadword
MOVHLPS       Move two single-precision values from the high quadword to the low quadword
Each of these instructions uses the 128-bit XMM registers to move packed 32-bit single-precision
floating-point values between the XMM registers and memory. Not only can you move entire groups of packed
single-precision floating-point values, you can also move a subset of two packed single-precision
floating-point values between XMM registers.
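The two register-to-register subset moves can be modeled with ordinary lists. This Python sketch is a model only, under the assumption that element 0 is the lowest 32 bits of the register; the function names simply mirror the instruction mnemonics.

```python
def movlhps(dest, src):
    """Low two elements of src go to the high two of dest; dest's low half is kept."""
    return dest[:2] + src[:2]


def movhlps(dest, src):
    """High two elements of src go to the low two of dest; dest's high half is kept."""
    return src[2:] + dest[2:]


xmm0 = [1.0, 2.0, 3.0, 4.0]      # element 0 is the low doubleword
xmm1 = [10.0, 20.0, 30.0, 40.0]

print(movlhps(xmm0, xmm1))  # [1.0, 2.0, 10.0, 20.0]
print(movhlps(xmm0, xmm1))  # [30.0, 40.0, 3.0, 4.0]
```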
An example of moving SSE packed single-precision floating-point values is shown in ssefloat.s:
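# ssefloat.s - An example of moving SSE FP data types
.section .data
value1:
    .float 12.34, 2345.543, -3493.2, 0.44901
value2:
    .float -5439.234, 32121.4, 1.0094, 0.000003
.section .bss
    .lcomm data, 16         # 16-byte buffer for one packed value
.section .text
.globl _start
_start:
    nop
    movups value1, %xmm0    # load four packed singles from memory
    movups value2, %xmm1
    movups %xmm0, %xmm2     # copy register to register
    movups %xmm0, data      # store four packed singles to memory
    movl $1, %eax           # Linux exit system call
    movl $0, %ebx           # exit status 0
    int $0x80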
The ssefloat.s program creates two data arrays of four single-precision floating-point values (value1
and value2). These will become the packed data value to be stored in the XMM registers. Also, a data
buffer is created with enough space to hold four single-precision floating-point values (a single packed
value). The program then uses the MOVUPS instruction to move the packed single-precision floating-point
values around between the XMM registers and memory.
After assembling the program, you can see what happens in the debugger. After stepping through the
first three instructions, the XMM registers should look like this:
(gdb) print $xmm0
$1 = {v4_float = {5.84860315e+35, 2.63562489, 1.79352231e-36, 5.07264233},
v2_double = {12.34, 2345.5430000000001},
v16_int8 = “xxxx\024x(@u\223\030\004\026Sx@”, v8_int16 = {18350, 31457,
-20972, 16424, -27787, 1048, 21270, 16546}, v4_int32 = {2061584302,
1076407828, 68719477, 1084379926}, v2_int64 = {4623136420479977390,
4657376318677619573}, uint128 = 0x40a25316041893754028ae147ae147ae}
(gdb) print $xmm1
$2 = {v4_float = {-1.11704749e+24, -5.66396856, -1.58818684e-23, 6.98026705},
v2_double = {-5439.2340000000004, 32121.400000000001},
v16_int8 = “D\213lxxxx\232\231\231\231xxxx”, v8_int16 = {-29884, -6292,
16187, -16203, -26214, -26215, 24153, 16607}, v4_int32 = {-412316860,
-1061863621, -1717986918, 1088380505}, v2_int64 = {-4560669521124488380,
4674558677155944858}, uint128 = 0x40df5e599999999ac0b53f3be76c8b44}
(gdb) print $xmm2
$3 = {v4_float = {5.84860315e+35, 2.63562489, 1.79352231e-36, 5.07264233},
v2_double = {12.34, 2345.5430000000001},
v16_int8 = “xxxx\024x(@u\223\030\004\026Sx@”, v8_int16 = {18350, 31457,
-20972, 16424, -27787, 1048, 21270, 16546}, v4_int32 = {2061584302,
1076407828, 68719477, 1084379926}, v2_int64 = {4623136420479977390,
4657376318677619573}, uint128 = 0x40a25316041893754028ae147ae147ae}
(gdb)
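The debugger prints each XMM register under several interpretations of the same 128 bits. For packed
single-precision data, the v4_float format is the one that shows the four values. Note that in the output
reproduced above, the recognizable values (12.34 and 2345.543) appear in the v2_double field rather than
in v4_float, so this listing appears to have been captured from the double-precision version of the
program. The x/4f output of the data location, shown next, confirms the single-precision values.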
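The way gdb shows one register as v4_float, v2_double, and so on can be imitated by reading the same 16 bytes with different struct formats. The Python sketch below packs two doubles, the case that matches the v2_double field in the output above, and then rereads the bytes as four singles.

```python
import struct

raw = struct.pack("<2d", 12.34, 2345.543)  # 16 bytes, as in a 128-bit register

as_doubles = struct.unpack("<2d", raw)     # the interpretation that makes sense
as_floats = struct.unpack("<4f", raw)      # same bits, read as four singles

print(as_doubles)                          # (12.34, 2345.543)
print(as_floats)
```

The four single-precision readings are meaningless numbers. The second and fourth come out near 2.63562489 and 5.07264233, the same values that appear in the v4_float field of the XMM0 listing.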
The final instruction step is to copy a value from the XMM register to the data location. You can display
the results using the x/4f command:
(gdb) x/4f &data
0x80490c0 <data>: 12.3400002 2345.54297 -3493.19995 0.449010015
(gdb)
The 4f option of the x command interprets the 16 bytes at the data memory location as four
single-precision floating-point values. The data memory location now contains the data loaded from the
value1 memory location into the XMM register, and copied to the data memory location. Just in case the
rounding errors in the debugger fool you, you can double-check the answer in hexadecimal:
(gdb) x/16b &data
0x80490c0 <data>: 0xa4 0x70 0x45 0x41 0xb0 0x98 0x12 0x45
0x80490c8 <data+8>: 0x33 0x53 0x5a 0xc5 0xa4 0xe4 0xe5 0x3e
(gdb) x/16b &value1
0x804909c <value1>: 0xa4 0x70 0x45 0x41 0xb0 0x98 0x12 0x45
0x80490a4 <value1+8>: 0x33 0x53 0x5a 0xc5 0xa4 0xe4 0xe5 0x3e
(gdb)
Yes, they do match!
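The same 16 bytes can also be derived without a debugger. This Python sketch packs the four value1 constants as little-endian singles; it is an independent check and assumes nothing beyond the standard struct module.

```python
import struct

value1 = (12.34, 2345.543, -3493.2, 0.44901)
packed = struct.pack("<4f", *value1)  # four 32-bit singles, 16 bytes in total

print(len(packed))                    # 16
print(" ".join(f"0x{b:02x}" for b in packed[:8]))
# 0xa4 0x70 0x45 0x41 0xb0 0x98 0x12 0x45
print(" ".join(f"0x{b:02x}" for b in packed[8:]))
# 0x33 0x53 0x5a 0xc5 0xa4 0xe4 0xe5 0x3e
```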
SSE2 floating-point values
Similar to the SSE data types, the IA-32 platform includes instructions for moving the new SSE2 packed
double-precision floating-point data types. The following table describes the new instructions that can
be used.
Instruction Description
MOVAPD        Move two aligned, double-precision values to XMM registers or memory
MOVUPD        Move two unaligned, double-precision values to XMM registers or memory
MOVSD         Move one double-precision value to memory or the low quadword of a register
MOVHPD        Move one double-precision value to memory or the high quadword of a register
MOVLPD        Move one double-precision value to memory or the low quadword of a register
Each of these instructions uses the 128-bit XMM register to move 64-bit double-precision floating-point
values. The MOVAPD and MOVUPD instructions move the complete packed double-precision floating-point
value into and out of the XMM registers.
The sse2float.s program demonstrates these instructions:
# sse2float.s - An example of moving SSE2 FP data types
.section .data
value1:
    .double 12.34, 2345.543
value2:
    .double -5439.234, 32121.4
.section .bss
    .lcomm data, 16         # 16-byte buffer for one packed value
.section .text
.globl _start
_start:
    nop
    movupd value1, %xmm0    # load two packed doubles from memory
    movupd value2, %xmm1
    movupd %xmm0, %xmm2     # copy register to register
    movupd %xmm0, data      # store two packed doubles to memory
    movl $1, %eax           # Linux exit system call
    movl $0, %ebx           # exit status 0
    int $0x80
This time the data values stored in memory are changed to double-precision floating-point values.
Because the program will transfer packed values, an array of two values is created.
After assembling the program, you can again watch the operations in the debugger. After stepping
through the MOVUPD instructions, look at the contents of the pertinent XMM registers:
(gdb) print $xmm0
$1 = {v4_float = {0.0587499999, 2.57562494, -7.46297859e-36, -2.33312488},
v2_double = {10.42, -5.3300000000000001},
v16_int8 = “xxp=\xx$@Rx\036\205xQ\025x”, v8_int16 = {-23593, 15728, -10486,
16420, -18350, -31458, 20971, -16363}, v4_int32 = {1030792151, 1076156170,
-2061584302, -1072344597}, v2_int64 = {4622055556569408471,
-4605684971923916718}, uint128 = 0xc01551eb851eb8524024d70a3d70a3d7}
(gdb) print $xmm1
$2 = {v4_float = {0, 2.265625, -107374184, 2.01249981}, v2_double = {4.25,
2.1000000000000001},
v16_int8 = “\000\000\000\000\000\000\021@xxxxxx\000@”, v8_int16 = {0, 0, 0,
16401, -13107, -13108, -13108, 16384}, v4_int32 = {0, 1074855936,
-858993459, 1073794252}, v2_int64 = {4616471093031469056,
4611911198408756429}, uint128 = 0x4000cccccccccccd4011000000000000}
(gdb) print $xmm2
$3 = {v4_float = {1.40129846e-44, 2.80259693e-44, 4.20389539e-44,
5.60519386e-44}, v2_double = {4.2439915824246103e-313,
8.4879831653432862e-313},
v16_int8 = “\n\000\000\000\024\000\000\000\036\000\000\000( \000\000”,
v8_int16 = {10, 0, 20, 0, 30, 0, 40, 0}, v4_int32 = {10, 20, 30, 40},
v2_int64 = {85899345930, 171798691870},
uint128 = 0x000000280000001e000000140000000a}
(gdb) print $xmm3
$4 = {v4_float = {7.00649232e-45, 2.1019477e-44, 3.50324616e-44,
4.90454463e-44}, v2_double = {3.1829936866949413e-313,
7.4269852696136172e-313},
v16_int8 = “\005\000\000\000\017\000\000\000\031\000\000\00 0#\000\000”,
v8_int16 = {5, 0, 15, 0, 25, 0, 35, 0}, v4_int32 = {5, 15, 25, 35},
v2_int64 = {64424509445, 150323855385},
uint128 = 0x00000023000000190000000f00000005}
(gdb)
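Again you have to sift through the debugger output, but this time you are looking for the v2_double
data types. Running sse2float.s as listed should show 12.34 and 2345.543 in the v2_double field of XMM0
and XMM2, and -5439.234 and 32121.4 in XMM1. The register dump reproduced above shows different values
(and an XMM3 register that the program never touches), so it appears to come from a different program.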
Next, examine the data memory location to ensure that the proper values have been copied there as well:
(gdb) x/2gf &data
0x80490c0 <data>: 12.34 2345.5430000000001
(gdb)
Because the data memory location contains two double-precision floating-point values, you must use
the 2gf option of the x command to display both values stored at the memory location. Again, we got
what we expected.
SSE3 instructions
On a final note, the SSE3 technology, available in Pentium 4 and later processors, adds three additional
instructions to facilitate moving packed single- and double-precision floating-point values around:
❑ MOVSHDUP: Moves a 128-bit value from memory or an XMM register, duplicating the second and
fourth 32-bit data elements. Thus, moving the data element consisting of 32-bit single-precision
floating-point values DCBA would create the 128-bit packed single-precision floating-point value
consisting of DDBB.
❑ MOVSLDUP: Moves a 128-bit value from memory or an XMM register, duplicating the first and
third 32-bit data elements. Thus, moving the data element consisting of 32-bit single-precision
floating-point values DCBA would create the 128-bit packed single-precision floating-point value
consisting of CCAA.
❑ MOVDDUP: Moves a 64-bit double-precision floating-point value from memory or an XMM register,
duplicating it into a 128-bit XMM register. Thus, moving the data element consisting of 64-
bit double-precision floating-point value A would create the 128-bit packed double-precision
floating-point value AA.
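The three duplication patterns can be modeled with lists to check the DCBA examples. This Python sketch is a model only: lists are stored with element 0 as the lowest element, and the hypothetical helper high_to_low prints them in the high-to-low order the text uses.

```python
def movshdup(src):
    """Duplicate the second and fourth elements (indexes 1 and 3)."""
    return [src[1], src[1], src[3], src[3]]


def movsldup(src):
    """Duplicate the first and third elements (indexes 0 and 2)."""
    return [src[0], src[0], src[2], src[2]]


def movddup(value):
    """Duplicate one double-precision value into both halves."""
    return [value, value]


def high_to_low(elements):
    """Join elements from the highest position down, as written in the text."""
    return "".join(reversed(elements))


source = ["A", "B", "C", "D"]           # written high to low this is DCBA
print(high_to_low(movshdup(source)))    # DDBB
print(high_to_low(movsldup(source)))    # CCAA
print(high_to_low(movddup("A")))        # AA
```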
At the time of this writing, the only IA-32 processors that support SSE3 instructions are the Pentium 4
Hyperthreading processors.
Source: Professional Assembly (Page 182 - 196)