From Wikipedia, the free encyclopedia

The XOP (eXtended Operations[1]) instruction set, announced by AMD on May 1, 2009, is an extension to the 128-bit SSE core instructions in the x86 and AMD64 instruction set for the Bulldozer processor core, which was released on October 12, 2011.[2] However, AMD removed support for XOP from the Zen microarchitecture onward.[3]

The XOP instruction set contains several different types of vector instructions since it was originally intended as a major upgrade to SSE. Most of the instructions are integer instructions, but it also contains floating point permutation and floating point fraction extraction instructions. See the index for a list of instruction types.

History

XOP is a revised subset of what was originally intended as SSE5. It was changed to be similar to, but not overlapping with, AVX. Parts that overlapped with AVX were removed or moved to separate standards such as FMA4 (floating-point vector multiply–accumulate) and CVT16 (half-precision floating-point conversion, implemented as F16C by Intel).[1]

All SSE5 instructions that were equivalent or similar to instructions in the AVX and FMA4 instruction sets announced by Intel have been changed to use the coding proposed by Intel. Integer instructions without equivalents in AVX were classified as the XOP extension.[1] The XOP instructions use the opcode byte 8F (hexadecimal) but otherwise have almost the same coding scheme as AVX with the 3-byte VEX prefix.

Commentators[4] have seen this as evidence that Intel has not allowed AMD to use any part of the large VEX coding space. AMD has been forced to use different codes in order to avoid using any code combination that Intel might possibly be using in its development pipeline for something else. The XOP coding scheme is as close to the VEX scheme as technically possible without risking that the AMD codes overlap with future Intel codes. This inference is speculative, since no public information is available about negotiations between the two companies on this issue.

The use of the 8F byte requires that the m-bits (see VEX coding scheme) have a value larger than or equal to 8 in order to avoid overlap with existing instructions.[Note 1] The C4 byte used in the VEX scheme has no such restriction. This may prevent the use of the m-bits for other purposes in the future in the XOP scheme, but not in the VEX scheme. Another possible problem is that the pp bits have the value 00 in the XOP scheme, while they have the value 01 in the VEX scheme for instructions that have no legacy equivalent. This may complicate the use of the pp bits for other purposes in the future.
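
The disambiguation rule described above can be sketched in Python. This is only an illustration: the function name `classify_after_8f` is hypothetical, the m-bits are assumed to occupy bits 0-4 of the byte that follows 8F (as Note 1 states), and none of the other checks a real decoder performs are modeled.

```python
def classify_after_8f(next_byte: int) -> str:
    """Classify the byte that follows an 0x8F byte (illustrative only)."""
    if not 0 <= next_byte <= 0xFF:
        raise ValueError("expected a single byte value")
    # Bits 0-4 are the m-bits (map select) in the XOP scheme.
    m_bits = next_byte & 0x1F
    # A legacy POP uses this byte as ModR/M with the reg field
    # (bits 3-5) equal to zero, so its low five bits are always below 8.
    return "XOP" if m_bits >= 8 else "POP"


if __name__ == "__main__":
    for byte in (0xE8, 0xC0, 0x45):
        print(f"8F {byte:02X} -> {classify_after_8f(byte)}")
```

Because the check only looks at bits 0-4, any m-bits value below 8 remains indistinguishable from a POP encoding, which is the restriction the paragraph above describes.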

A similar compatibility issue is the difference between the FMA3 and FMA4 instruction sets. Intel initially proposed FMA4 in AVX/FMA specification version 3 to supersede the 3-operand FMA proposed by AMD in SSE5. After AMD adopted FMA4, Intel canceled FMA4 support and reverted to FMA3 in the AVX/FMA specification version 5 (See FMA history).[1][5][6]

In March 2015, AMD explicitly revealed in the description of a patch for the GNU Binutils package that Zen, its third-generation x86-64 architecture, in its first iteration (znver1 – Zen, version 1) would not support the TBM, FMA4, XOP and LWP instructions developed specifically for the "Bulldozer" family of micro-architectures.[7][8]

Integer vector multiply–accumulate instructions

These are integer versions of the FMA instructions. They are all four-operand instructions similar to FMA4, and they all operate on signed integers.

Instruction Description[9] Operation
VPMACSWW, VPMACSSWW Multiply Accumulate (with Saturation) Word to Word 2x8 words (a0-a7, b0-b7) + 8 words (c0-c7) → 8 words (r0-r7)

r0 = a0 * b0 + c0, r1 = a1 * b1 + c1, ..

VPMACSWD, VPMACSSWD Multiply Accumulate (with Saturation) Low Word to Doubleword 2x8 words (a0-a7, b0-b7) + 4 doublewords (c0-c3) → 4 doublewords (r0-r3)

r0 = a0 * b0 + c0, r1 = a2 * b2 + c1, ..[2]

VPMACSDD, VPMACSSDD Multiply Accumulate (with Saturation) Doubleword to Doubleword 2x4 doublewords (a0-a3, b0-b3) + 4 doublewords (c0-c3) → 4 doublewords (r0-r3)

r0 = a0 * b0 + c0, r1 = a1 * b1 + c1, ..

VPMACSDQL, VPMACSSDQL Multiply Accumulate (with Saturation) Low Doubleword to Quadword 2x4 doublewords (a0-a3, b0-b3) + 2 quadwords (c0-c1) → 2 quadwords (r0-r1)

r0 = a0 * b0 + c0, r1 = a2 * b2 + c1

VPMACSDQH, VPMACSSDQH Multiply Accumulate (with Saturation) High Doubleword to Quadword 2x4 doublewords (a0-a3, b0-b3) + 2 quadwords (c0-c1) → 2 quadwords (r0-r1)

r0 = a1 * b1 + c0, r1 = a3 * b3 + c1

VPMADCSWD, VPMADCSSWD Multiply Add Accumulate (with Saturation) Word to Doubleword 2x8 words (a0-a7, b0-b7) + 4 doublewords (c0-c3) → 4 doublewords (r0-r3)

r0 = a0 * b0 + a1 * b1 + c0, r1 = a2 * b2 + a3 * b3 + c1, ..
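
As a rough illustration of the operations listed above, the following Python sketch models the word-to-word form (VPMACSWW and its saturating variant VPMACSSWW) and the multiply-add-accumulate form (VPMADCSWD, VPMADCSSWD) on plain lists of signed integers. The function names are hypothetical, and the sketch assumes that the non-saturating forms wrap around to the destination width while the saturating forms clamp to the signed range.

```python
def wrap_signed(value, bits):
    """Keep the low `bits` bits and reinterpret them as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def saturate_signed(value, bits):
    """Clamp to the signed range representable in `bits` bits."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(low, min(high, value))


def macs_ww(a, b, c, saturate=False):
    """r[i] = a[i] * b[i] + c[i] on 16-bit words (word to word)."""
    fit = saturate_signed if saturate else wrap_signed
    return [fit(x * y + z, 16) for x, y, z in zip(a, b, c)]


def madcs_wd(a, b, c, saturate=False):
    """r[i] = a[2i]*b[2i] + a[2i+1]*b[2i+1] + c[i] (words to doublewords)."""
    fit = saturate_signed if saturate else wrap_signed
    return [
        fit(a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1] + c[i], 32)
        for i in range(len(c))
    ]


if __name__ == "__main__":
    words = [200] * 8
    print(macs_ww(words, words, [1] * 8))                 # wrapping form
    print(macs_ww(words, words, [1] * 8, saturate=True))  # saturating form
    print(madcs_wd([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8, [10, 20, 30, 40]))
```

The only difference between the two variants in this model is the final fitting step, which is why a single `saturate` flag is enough.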

Integer vector horizontal addition

Horizontal addition instructions add adjacent values in the input vector to each other. The output size in the instructions below describes how wide the horizontal addition is. For instance, horizontal byte to word adds two bytes at a time and returns the result as a vector of words, whereas byte to quadword adds eight bytes at a time and returns the result as a vector of quadwords. Six additional horizontal addition and subtraction instructions can be found in SSSE3, but they operate on two input vectors and only add or subtract pairs of adjacent elements.

Instruction Description[9] Operation
VPHADDBW, VPHADDUBW Horizontal add two signed/unsigned bytes to word 16 bytes (a0-a15) → 8 words (r0-r7)

r0 = a0+a1, r1 = a2+a3, r2 = a4+a5, ...

VPHADDBD, VPHADDUBD Horizontal add four signed/unsigned bytes to doubleword 16 bytes (a0-a15) → 4 doublewords (r0-r3)

r0 = a0+a1+a2+a3, r1 = a4+a5+a6+a7, ...

VPHADDBQ, VPHADDUBQ Horizontal add eight signed/unsigned bytes to quadword 16 bytes (a0-a15) → 2 quadwords (r0-r1)

r0 = a0+a1+a2+a3+a4+a5+a6+a7, ...

VPHADDWD, VPHADDUWD Horizontal add two signed/unsigned words to doubleword 8 words (a0-a7) → 4 doublewords (r0-r3)

r0 = a0+a1, r1 = a2+a3, r2 = a4+a5, ...

VPHADDWQ, VPHADDUWQ Horizontal add four signed/unsigned words to quadword 8 words (a0-a7) → 2 quadwords (r0-r1)

r0 = a0+a1+a2+a3, r1 = a4+a5+a6+a7

VPHADDDQ, VPHADDUDQ Horizontal add two signed/unsigned doublewords to quadword 4 doublewords (a0-a3) → 2 quadwords (r0-r1)

r0 = a0+a1, r1 = a2+a3

VPHSUBBW Horizontal subtract two signed bytes to word 16 bytes (a0-a15) → 8 words (r0-r7)

r0 = a0-a1, r1 = a2-a3, r2 = a4-a5, ...

VPHSUBWD Horizontal subtract two signed words to doubleword 8 words (a0-a7) → 4 doublewords (r0-r3)

r0 = a0-a1, r1 = a2-a3, r2 = a4-a5, ...

VPHSUBDQ Horizontal subtract two signed doublewords to quadword 4 doublewords (a0-a3) → 2 quadwords (r0-r1)

r0 = a0-a1, r1 = a2-a3
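
The grouping in the table above can be sketched in Python as follows. The helper names are hypothetical; inputs are given as raw element bit patterns (for example 0-255 for bytes), and results are returned as ordinary Python integers instead of packed register contents.

```python
def to_signed(value, bits):
    """Reinterpret an unsigned bit pattern as a two's-complement integer."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def horizontal_add(elements, group, bits=8, signed=True):
    """Sum each run of `group` adjacent elements (2, 4 or 8 in the table)."""
    values = [to_signed(e, bits) if signed else e for e in elements]
    return [sum(values[i:i + group]) for i in range(0, len(values), group)]


def horizontal_sub(elements, bits=8):
    """Signed r[i] = a[2i] - a[2i+1], as in the VPHSUB* rows."""
    values = [to_signed(e, bits) for e in elements]
    return [values[i] - values[i + 1] for i in range(0, len(values), 2)]


if __name__ == "__main__":
    data = list(range(16))
    print(horizontal_add(data, 2))  # byte to word grouping
    print(horizontal_add(data, 4))  # byte to doubleword grouping
    print(horizontal_add(data, 8))  # byte to quadword grouping
    print(horizontal_sub(data))
```

The `signed` flag corresponds to the signed and unsigned instruction pairs: the same bit pattern 0xFF contributes -1 in the signed case and 255 in the unsigned case.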

Integer vector compare

Each of these vector compare instructions takes an immediate as an extra argument, which selects the kind of comparison performed. Eight comparisons are possible for each instruction. The vectors are compared element by element: a comparison that evaluates to true sets all corresponding bits in the destination to 1, and a false comparison sets them to 0. The result can be used directly in the VPCMOV instruction for a vectorized conditional move.

Instruction Description[9]
VPCOMB Compare Vector Signed Bytes
VPCOMW Compare Vector Signed Words
VPCOMD Compare Vector Signed Doublewords
VPCOMQ Compare Vector Signed Quadwords
VPCOMUB Compare Vector Unsigned Bytes
VPCOMUW Compare Vector Unsigned Words
VPCOMUD Compare Vector Unsigned Doublewords
VPCOMUQ Compare Vector Unsigned Quadwords
Immediate Comparison
000 Less Than
001 Less Than or Equal
010 Greater Than
011 Greater Than or Equal
100 Equal
101 Not Equal
110 False
111 True
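
A minimal Python sketch of the immediate table above follows. The name `vector_compare` is hypothetical; elements are passed as raw bit patterns, and the `signed` flag stands in for the choice between the signed (VPCOMB etc.) and unsigned (VPCOMUB etc.) instructions.

```python
import operator

# Immediate value -> comparison, following the table above.
COMPARISONS = {
    0b000: operator.lt,
    0b001: operator.le,
    0b010: operator.gt,
    0b011: operator.ge,
    0b100: operator.eq,
    0b101: operator.ne,
    0b110: lambda x, y: False,
    0b111: lambda x, y: True,
}


def to_signed(value, bits):
    """Reinterpret an unsigned bit pattern as a two's-complement integer."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def vector_compare(a, b, imm, bits=8, signed=True):
    """Return an all-ones element where the comparison holds, else zero."""
    all_ones = (1 << bits) - 1
    compare = COMPARISONS[imm & 0b111]
    if signed:
        a = [to_signed(x, bits) for x in a]
        b = [to_signed(y, bits) for y in b]
    return [all_ones if compare(x, y) else 0 for x, y in zip(a, b)]


if __name__ == "__main__":
    print(vector_compare([1, 5, 3], [2, 5, 1], 0b000))  # less than
    print(vector_compare([1, 5, 3], [2, 5, 1], 0b100))  # equal
```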

Vector conditional move

VPCMOV works as a bitwise variant of the blend instructions in SSE4. Like the AVX instruction VPBLENDVB, it is a four-operand instruction with three source operands and a destination. For each bit in the third operand (which acts as a selector), 1 selects the same bit in the first source, and 0 selects the same bit in the second source. When used together with the XOP vector comparison instructions above, this can implement a vectorized ternary move or, if the second input is the same as the destination, a conditional move (CMOV).

Instruction Description[9]
VPCMOV Vector Conditional Move
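
The bitwise selection described above amounts to a single Boolean expression. The Python sketch below treats each register as one integer; the name `conditional_move` and the `bits` parameter are illustrative assumptions, not part of any real API.

```python
def conditional_move(src1, src2, selector, bits=128):
    """Per bit: take src1 where the selector bit is 1, src2 where it is 0."""
    mask = (1 << bits) - 1
    return ((src1 & selector) | (src2 & ~selector)) & mask


if __name__ == "__main__":
    # The selector's upper byte is all ones, so the upper byte comes
    # from src1 and the lower byte from src2.
    print(hex(conditional_move(0xAAAA, 0x5555, 0xFF00, bits=16)))
```

Feeding an all-ones or all-zeros element mask, such as the one produced by a vector compare, as the selector turns this into a per-element select.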

Integer vector shift and rotate instructions

The shift instructions here differ from those in SSE2 in that they can shift each unit by a different amount, using a vector register interpreted as packed signed integers. The sign indicates the direction of the shift or rotate, with positive values causing a left shift and negative values a right shift.[10] Intel has specified a different, incompatible set of variable vector shift instructions in AVX2.[11]

Instruction Description[9]
VPROTB Packed Rotate Bytes
VPROTW Packed Rotate Words
VPROTD Packed Rotate Doublewords
VPROTQ Packed Rotate Quadwords
VPSHAB Packed Shift Arithmetic Bytes
VPSHAW Packed Shift Arithmetic Words
VPSHAD Packed Shift Arithmetic Doublewords
VPSHAQ Packed Shift Arithmetic Quadwords
VPSHLB Packed Shift Logical Bytes
VPSHLW Packed Shift Logical Words
VPSHLD Packed Shift Logical Doublewords
VPSHLQ Packed Shift Logical Quadwords
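
The per-element, sign-directed behaviour can be sketched in Python as below. The function names are hypothetical, elements are raw bit patterns, and the sketch assumes every count has a magnitude smaller than the element width; how the hardware treats larger counts is not modeled here.

```python
def to_signed(value, bits):
    """Reinterpret an unsigned bit pattern as a two's-complement integer."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def shift_logical(elements, counts, bits=8):
    """Positive count: shift left. Negative count: logical shift right."""
    mask = (1 << bits) - 1
    return [
        (v << c) & mask if c >= 0 else (v & mask) >> -c
        for v, c in zip(elements, counts)
    ]


def shift_arithmetic(elements, counts, bits=8):
    """Like shift_logical, but right shifts replicate the sign bit."""
    mask = (1 << bits) - 1
    return [
        (v << c) & mask if c >= 0 else (to_signed(v, bits) >> -c) & mask
        for v, c in zip(elements, counts)
    ]


def rotate(elements, counts, bits=8):
    """Positive count: rotate left. Negative count: rotate right."""
    mask = (1 << bits) - 1
    result = []
    for v, c in zip(elements, counts):
        n = c % bits  # a right rotate by k equals a left rotate by bits - k
        result.append(((v << n) | (v >> (bits - n))) & mask)
    return result


if __name__ == "__main__":
    print(shift_logical([0x01, 0x80, 0xFF], [1, -7, -4]))
    print(shift_arithmetic([0x80], [-7]))
    print(rotate([0x81, 0x81], [1, -1]))
```

Each element carries its own count, which is the difference from the SSE2 shifts that the paragraph above points out.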

Vector permute

VPPERM is a single instruction that combines the SSSE3 instructions PALIGNR and PSHUFB and adds more to both. Some compare it to the Altivec instruction VPERM.[12] It takes three registers as input: the first two are source registers and the third is the selector register. Each byte in the selector selects one of the bytes in one of the two input registers for the output. The selector can also apply effects to the selected byte, such as setting it to 0, reversing the bit order, or repeating the most-significant bit. In addition, the input or the result of any of these effects can be inverted.

The VPERMIL2PD and VPERMIL2PS instructions are two-source versions of the VPERMILPD and VPERMILPS instructions in AVX, which means that, like VPPERM, they can select output from any of the fields in the two inputs.

Instruction Description[9]
VPPERM Packed Permute Byte
VPERMIL2PD Permute Two-Source Double-Precision Floating-Point
VPERMIL2PS Permute Two-Source Single-Precision Floating-Point

Floating-point fraction extraction

These instructions extract the fractional part of a floating-point value, that is, the part that would be lost in conversion to an integer.

Instruction Description[9]
VFRCZPD Extract Fraction Packed Double-Precision Floating-Point
VFRCZPS Extract Fraction Packed Single-Precision Floating-Point
VFRCZSD Extract Fraction Scalar Double-Precision Floating-Point
VFRCZSS Extract Fraction Scalar Single-Precision Floating Point
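
As a rough model of what these instructions compute, the Python sketch below returns the part of each value that truncation toward zero would discard. The name `extract_fraction` is hypothetical, and the sketch assumes the fraction keeps the sign of its input; special values such as NaN and infinity, and single-precision rounding, are not modeled.

```python
import math


def extract_fraction(values):
    """Return the fractional part of each value (sign follows the input)."""
    # math.modf splits a float into (fractional part, integer part).
    return [math.modf(v)[0] for v in values]


if __name__ == "__main__":
    print(extract_fraction([2.75, -2.75, 4.0]))
```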

CPUs with XOP

See also

Notes

  1. ^ Byte value 0x8F is an existing opcode for a POP instruction. This instruction uses the ModR/M byte, which follows the opcode, but it does not make use of the "reg" (register) field, which is bits 3-5. Some opcodes which don't use "reg" multiplex instructions by using these bits to signify eight different instructions (0x80-0x83 and 0xD0-0xDF, among others); 0x8F does not. This means that, for a standard POP instruction, bits 3-5 should always be zero. Since the m-bits are bits 0-4, requiring a value of 8 or higher guarantees that bit 3 or bit 4 of the byte following 0x8F is set, so that byte cannot be mistaken for the ModR/M byte of a standard POP.

References

  1. ^ a b c d Dave Christie (2025-08-05), Striking a balance, AMD Developer blogs, archived from the original on 2025-08-05, retrieved 2025-08-05
  2. ^ a b AMD64 Architecture Programmer's Manual Volume 6: 128-Bit and 256-Bit XOP, FMA4 and CVT16 Instructions (PDF), AMD, May 1, 2009
  3. ^ Michael Larabel (March 3, 2017). "The Impact Of GCC Zen Compiler Tuning On AMD Ryzen Performance". Phoronix. But with Zen being a clean-sheet design, there are some instruction set extensions found in Bulldozer processors not found in Zen/znver1. Those no longer present include FMA4 and XOP.
  4. ^ Agner Fog (December 5, 2009), Stop the instruction set war
  5. ^ Intel AVX Programming Reference, March 2008, archived from the original (PDF) on 2025-08-05, retrieved 2025-08-05
  6. ^ Intel Advanced Vector Extensions Programming Reference, January 2009, archived from the original on February 29, 2012, retrieved 2025-08-05
  7. ^ Ganesh Gopalasubramanian (March 10, 2015). "[PATCH] add znver1 processor". binutils@sourceware.org (Mailing list).
  8. ^ Amit Pawar (August 7, 2015). "[PATCH] Remove CpuFMA4 From Znver1 CPU Flags". binutils@sourceware.org (Mailing list).
  9. ^ a b c d e f g "AMD64 Architecture Programmer's Manual, Volume4: 128-Bit and 256-Bit Media Instructions" (PDF). AMD. Retrieved 2025-08-05.
  10. ^ "New "Bulldozer" and "Piledriver" Instructions" (PDF). AMD. Retrieved 2025-08-05.
  11. ^ "Intel Architecture Instruction Set Extensions Programming Reference". Intel. Archived from the original (PDF) on February 1, 2014. Retrieved 2025-08-05.
  12. ^ "Buldozer x264 optimisations". Archived from the original on 2025-08-05. Retrieved 2025-08-05.
  13. ^ Dave Christie (2025-08-05), Striking a balance, AMD Developer blogs, archived from the original on 2025-08-05, retrieved 2025-08-05
  14. ^ New "Bulldozer" and "Piledriver" Instructions (PDF), AMD, October 2012