From Wikipedia, the free encyclopedia
Bit manipulation instructions sets (BMI sets) are extensions to the x86 instruction set architecture for microprocessors from Intel and AMD. The purpose of these instruction sets is to improve the speed of bit manipulation. All the instructions in these sets are non-SIMD and operate only on general-purpose registers.

Intel published two sets: BMI (now referred to as BMI1) and BMI2. Both were introduced with the Haswell microarchitecture; BMI1 matches features offered by AMD's ABM instruction set, and BMI2 extends them. AMD published another two sets: ABM (Advanced Bit Manipulation, which is also a subset of SSE4a and is implemented by Intel as part of SSE4.2 and BMI1) and TBM (Trailing Bit Manipulation, an extension to BMI1 introduced with Piledriver-based processors but dropped again in Zen-based processors).[1]

ABM (Advanced Bit Manipulation)

AMD was the first to introduce the instructions that now form Intel's BMI1 as part of its ABM (Advanced Bit Manipulation) instruction set, then later added support for Intel's new BMI2 instructions. AMD today advertises the availability of these features via Intel's BMI1 and BMI2 cpuflags and instructs programmers to target them accordingly.[2]

While Intel considers POPCNT as part of SSE4.2 and LZCNT as part of BMI1, both Intel and AMD advertise the presence of these two instructions individually. POPCNT has a separate CPUID flag of the same name, and Intel and AMD use AMD's ABM flag to indicate LZCNT support (since LZCNT combined with BMI1 and BMI2 completes the expanded ABM instruction set).[2][3]
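
As a rough illustration of the separate feature flags described above, the following sketch decodes raw CPUID register values into per-feature booleans. The bit positions are taken from the Intel and AMD manuals rather than from this article, and the type and function names (`bit_features`, `decode_bit_features`) are hypothetical; obtaining the register values is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the feature flags discussed above. */
typedef struct {
    bool popcnt; /* CPUID.01H:ECX bit 23 */
    bool lzcnt;  /* CPUID.80000001H:ECX bit 5 (AMD's ABM flag) */
    bool bmi1;   /* CPUID.(EAX=07H, ECX=0):EBX bit 3 */
    bool bmi2;   /* CPUID.(EAX=07H, ECX=0):EBX bit 8 */
    bool tbm;    /* CPUID.80000001H:ECX bit 21 */
} bit_features;

/* Decode raw CPUID register values. The caller is assumed to have
   executed CPUID for leaves 01H, 07H (subleaf 0) and 80000001H. */
static bit_features decode_bit_features(uint32_t leaf1_ecx,
                                        uint32_t leaf7_ebx,
                                        uint32_t ext1_ecx)
{
    bit_features f;
    f.popcnt = ((leaf1_ecx >> 23) & 1u) != 0;
    f.lzcnt  = ((ext1_ecx  >>  5) & 1u) != 0;
    f.bmi1   = ((leaf7_ebx >>  3) & 1u) != 0;
    f.bmi2   = ((leaf7_ebx >>  8) & 1u) != 0;
    f.tbm    = ((ext1_ecx  >> 21) & 1u) != 0;
    return f;
}
```

With GCC or Clang, the raw register values could be read with the `__get_cpuid` and `__get_cpuid_count` helpers from `<cpuid.h>`; other toolchains provide their own equivalents.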

Encoding     Instruction  Description[4]
F3 0F B8 /r  POPCNT       Population count
F3 0F BD /r  LZCNT        Leading zeros count

LZCNT is related to the Bit Scan Reverse (BSR) instruction, but it sets ZF if the result is zero and CF if the source is zero, whereas BSR sets ZF if the source is zero. It also produces a defined result (the source operand size in bits) when the source operand is zero. For a non-zero argument, the sum of the LZCNT and BSR results is the argument bit width minus 1 (for example, for the 32-bit argument 0x000f0000, LZCNT gives 12 and BSR gives 19).

The encoding of LZCNT is such that if ABM is not supported, the BSR instruction is executed instead.[4]: 227
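
The relationship between the two results can be checked with a small software model. This is only a sketch: `lzcnt32_soft` and `bsr32_soft` are hypothetical helper names that model the result values (not the flags) of the 32-bit instructions.

```c
#include <stdint.h>

/* Software model of the 32-bit LZCNT result: number of leading zero
   bits, with the defined result 32 for a zero source. */
static unsigned lzcnt32_soft(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Software model of the 32-bit BSR result for a non-zero source:
   index of the highest set bit. (The hardware result for a zero
   source is undefined; this model simply returns 0 in that case.) */
static unsigned bsr32_soft(uint32_t x)
{
    unsigned idx = 0;
    while (x >>= 1)
        idx++;
    return idx;
}
```

For the argument 0x000f0000 used above, `lzcnt32_soft` returns 12 and `bsr32_soft` returns 19, which sum to 31.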

BMI1 (Bit Manipulation Instruction Set 1)

The instructions below are those enabled by the BMI bit in CPUID. Intel officially considers LZCNT as part of BMI, but advertises LZCNT support using the ABM CPUID feature flag.[3] BMI1 is available in AMD's Jaguar,[5] Piledriver[6] and newer processors, and in Intel's Haswell[7] and newer processors.

Encoding           Instruction  Description[3]                          Equivalent C expression[8][9][10]
VEX.LZ.0F38 F2 /r  ANDN         Logical and not                         ~x & y
VEX.LZ.0F38 F7 /r  BEXTR        Bit field extract (with register)       (src >> start) & ((1 << len) - 1)
VEX.LZ.0F38 F3 /3  BLSI         Extract lowest set isolated bit         x & -x
VEX.LZ.0F38 F3 /2  BLSMSK       Get mask up to lowest set bit           x ^ (x - 1)
VEX.LZ.0F38 F3 /1  BLSR         Reset lowest set bit                    x & (x - 1)
F3 0F BC /r        TZCNT        Count the number of trailing zero bits  31 + (!x)
                                                                          - (((x & -x) & 0x0000FFFF) ? 16 : 0)
                                                                          - (((x & -x) & 0x00FF00FF) ? 8 : 0)
                                                                          - (((x & -x) & 0x0F0F0F0F) ? 4 : 0)
                                                                          - (((x & -x) & 0x33333333) ? 2 : 0)
                                                                          - (((x & -x) & 0x55555555) ? 1 : 0)
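
The equivalent C expressions in the table can be wrapped as small functions, shown here as a sketch for 32-bit operands. The function names are hypothetical, and the range checks in `bextr32` exist only to avoid undefined shifts in C.

```c
#include <stdint.h>

/* ANDN: logical and not. */
static uint32_t andn32(uint32_t x, uint32_t y) { return ~x & y; }

/* BEXTR: extract len bits of src starting at bit position start.
   Shifting a 32-bit value by 32 or more is undefined in C, so
   out-of-range arguments are handled explicitly. */
static uint32_t bextr32(uint32_t src, unsigned start, unsigned len)
{
    if (start >= 32 || len == 0)
        return 0;
    src >>= start;
    return len >= 32 ? src : (src & ((1u << len) - 1u));
}

/* BLSI: isolate the lowest set bit (x & -x). */
static uint32_t blsi32(uint32_t x) { return x & (0u - x); }

/* BLSMSK: mask up to and including the lowest set bit. */
static uint32_t blsmsk32(uint32_t x) { return x ^ (x - 1u); }

/* BLSR: reset the lowest set bit. */
static uint32_t blsr32(uint32_t x) { return x & (x - 1u); }
```

For example, with x = 0x58 (binary 0101 1000), `blsi32` gives 0x08, `blsmsk32` gives 0x0F, and `blsr32` gives 0x50.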

TZCNT is almost identical to the Bit Scan Forward (BSF) instruction, but it sets ZF if the result is zero and CF if the source is zero, whereas BSF sets ZF if the source is zero. For a non-zero argument, the results of TZCNT and BSF are equal.

As with LZCNT, the encoding of TZCNT is such that if BMI1 is not supported, the BSF instruction is executed instead.[4]: 352
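
The branch-free TZCNT expression from the table can be compared against a straightforward loop. This is a sketch for 32-bit operands only; `tzcnt32_expr` and `tzcnt32_loop` are hypothetical names, and both model the result value rather than the flags.

```c
#include <stdint.h>

/* TZCNT via the table's expression: isolate the lowest set bit, then
   subtract a power of two for each mask that bit falls into. */
static unsigned tzcnt32_expr(uint32_t x)
{
    uint32_t low = x & (0u - x); /* lowest set bit, or 0 if x == 0 */
    return 31u + (x == 0)
         - ((low & 0x0000FFFFu) ? 16u : 0u)
         - ((low & 0x00FF00FFu) ?  8u : 0u)
         - ((low & 0x0F0F0F0Fu) ?  4u : 0u)
         - ((low & 0x33333333u) ?  2u : 0u)
         - ((low & 0x55555555u) ?  1u : 0u);
}

/* Reference loop: count trailing zeros one bit at a time,
   returning the operand size (32) for a zero source. */
static unsigned tzcnt32_loop(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}
```

Both functions return 32 for a zero source, matching the defined result of TZCNT.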

BMI2 (Bit Manipulation Instruction Set 2)

Intel introduced BMI2 together with BMI1 in its line of Haswell processors. Only AMD has produced processors supporting BMI1 without BMI2; BMI2 is supported by AMD's Excavator architecture and newer.[11]

Encoding                 Instruction  Description
VEX.LZ.0F38 F5 /r        BZHI         Zero high bits starting with specified bit position: src & ((1 << inx) - 1)
VEX.LZ.F2.0F38 F6 /r     MULX         Unsigned multiply without affecting flags, and arbitrary destination registers
VEX.LZ.F2.0F38 F5 /r     PDEP         Parallel bits deposit
VEX.LZ.F3.0F38 F5 /r     PEXT         Parallel bits extract
VEX.LZ.F2.0F3A F0 /r ib  RORX         Rotate right logical without affecting flags
VEX.LZ.F3.0F38 F7 /r     SARX         Shift arithmetic right without affecting flags
VEX.LZ.F2.0F38 F7 /r     SHRX         Shift logical right without affecting flags
VEX.LZ.66.0F38 F7 /r     SHLX         Shift logical left without affecting flags
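
A few of these operations can be modelled in portable C, as in the sketch below for 32-bit operands. The function names are hypothetical, only the result values are modelled (the point of the real instructions is that they leave the flags alone), and `bzhi32` assumes the caller passes the index value that the hardware would read from the low 8 bits of its index operand.

```c
#include <stdint.h>

/* BZHI: clear all bits at position inx and above. An index of 32 or
   more leaves the source unchanged; handling it separately also
   avoids an undefined shift in C. */
static uint32_t bzhi32(uint32_t src, unsigned inx)
{
    return inx >= 32 ? src : (src & ((1u << inx) - 1u));
}

/* RORX: rotate right by n bits (n taken modulo 32). */
static uint32_t rorx32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? ((x >> n) | (x << (32u - n))) : x;
}

/* MULX: unsigned 32x32 -> 64-bit multiply, with the high and low
   halves of the product written to separate destinations. */
static void mulx32(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
{
    uint64_t product = (uint64_t)a * b;
    *hi = (uint32_t)(product >> 32);
    *lo = (uint32_t)product;
}
```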

Parallel bit deposit and extract

The PDEP and PEXT instructions are generalized bit-level compress and expand instructions. They take two inputs: a source and a selector. The selector is a bitmap selecting the bits that are to be packed or unpacked. PEXT copies selected bits from the source to contiguous low-order bits of the destination; higher-order destination bits are cleared. PDEP does the opposite for the selected bits: contiguous low-order bits are copied to selected bits of the destination; other destination bits are cleared. This can be used to extract any bitfield of the input and to perform bit-level shuffling that previously would have been expensive. Although these instructions resemble bit-level gather-scatter SIMD instructions, PDEP and PEXT (like the rest of the BMI instruction sets) operate on general-purpose registers.[12]

The instructions are available in 32-bit and 64-bit versions. An example using arbitrary source and selector in 32-bit mode is:

Instruction  Selector mask  Source      Destination
PEXT         0xff00fff0     0x12345678  0x00012567
PDEP         0xff00fff0     0x00012567  0x12005670
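
The behaviour in the table can be reproduced with a simple bit-by-bit software model. This is a reference sketch for 32-bit operands, not the zp7 method or any vendor implementation; `pext32_soft` and `pdep32_soft` are hypothetical names.

```c
#include <stdint.h>

/* PEXT model: walk the selector from its lowest set bit upward and
   pack the selected source bits into contiguous low-order bits. */
static uint32_t pext32_soft(uint32_t src, uint32_t mask)
{
    uint32_t dst = 0;
    uint32_t out_bit = 1; /* next destination bit to fill */
    while (mask != 0) {
        uint32_t lowest = mask & (0u - mask); /* lowest selector bit */
        if (src & lowest)
            dst |= out_bit;
        out_bit <<= 1;
        mask &= mask - 1u; /* clear that selector bit */
    }
    return dst;
}

/* PDEP model: take contiguous low-order source bits and deposit
   them at the positions selected by the mask. */
static uint32_t pdep32_soft(uint32_t src, uint32_t mask)
{
    uint32_t dst = 0;
    uint32_t in_bit = 1; /* next source bit to read */
    while (mask != 0) {
        uint32_t lowest = mask & (0u - mask);
        if (src & in_bit)
            dst |= lowest;
        in_bit <<= 1;
        mask &= mask - 1u;
    }
    return dst;
}
```

With the selector 0xff00fff0, `pext32_soft(0x12345678, ...)` yields 0x00012567 and `pdep32_soft(0x00012567, ...)` yields 0x12005670, matching the table.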

AMD processors before Zen 3[13] that implement PDEP and PEXT do so in microcode, with a latency of 18 cycles[14] rather than the 3 cycles of Zen 3.[15] As a result, it is often faster to use other instructions on these processors.[16]

TBM (Trailing Bit Manipulation)

TBM consists of instructions complementary to the instruction set started by BMI1; their complementary nature means they do not necessarily need to be used directly but can be generated by an optimizing compiler when supported. AMD introduced TBM together with BMI1 in its Piledriver[6] line of processors; later AMD Jaguar and Zen-based processors do not support TBM.[5] No Intel processors (at least through Alder Lake) support TBM.

Encoding            Instruction  Description[4]                           Equivalent C expression[17][9]
XOP.LZ.0A 10 /r id  BEXTR        Bit field extract (with immediate)       (src >> start) & ((1 << len) - 1)
XOP.LZ.09 01 /1     BLCFILL      Fill from lowest clear bit               x & (x + 1)
XOP.LZ.09 02 /6     BLCI         Isolate lowest clear bit                 x | ~(x + 1)
XOP.LZ.09 01 /5     BLCIC        Isolate lowest clear bit and complement  ~x & (x + 1)
XOP.LZ.09 02 /1     BLCMSK       Mask from lowest clear bit               x ^ (x + 1)
XOP.LZ.09 01 /3     BLCS         Set lowest clear bit                     x | (x + 1)
XOP.LZ.09 01 /2     BLSFILL      Fill from lowest set bit                 x | (x - 1)
XOP.LZ.09 01 /6     BLSIC        Isolate lowest set bit and complement    ~x | (x - 1)
XOP.LZ.09 01 /7     T1MSKC       Inverse mask from trailing ones          ~x | (x + 1)
XOP.LZ.09 01 /4     TZMSK        Mask from trailing zeros                 ~x & (x - 1)
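
Because no current processors implement TBM, the C expressions in the table are the practical way to obtain these results. The sketch below wraps them as 32-bit functions; the names simply mirror the mnemonics and are not an existing API.

```c
#include <stdint.h>

/* Operations keyed on the lowest clear bit (x + 1 flips the
   trailing ones and sets that bit). */
static uint32_t blcfill32(uint32_t x) { return x & (x + 1u); }
static uint32_t blci32(uint32_t x)    { return x | ~(x + 1u); }
static uint32_t blcic32(uint32_t x)   { return ~x & (x + 1u); }
static uint32_t blcmsk32(uint32_t x)  { return x ^ (x + 1u); }
static uint32_t blcs32(uint32_t x)    { return x | (x + 1u); }
static uint32_t t1mskc32(uint32_t x)  { return ~x | (x + 1u); }

/* Operations keyed on the lowest set bit (x - 1 flips the
   trailing zeros and clears that bit). */
static uint32_t blsfill32(uint32_t x) { return x | (x - 1u); }
static uint32_t blsic32(uint32_t x)   { return ~x | (x - 1u); }
static uint32_t tzmsk32(uint32_t x)   { return ~x & (x - 1u); }
```

For example, with x = 0x57 (binary 0101 0111), `blcs32` gives 0x5F and `blcmsk32` gives 0x0F; with x = 0x58 (binary 0101 1000), `blsfill32` gives 0x5F and `tzmsk32` gives 0x07.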

Supporting CPUs

Instruction extension support means only that the processor can execute the supported instructions, for software compatibility purposes; it might not execute them quickly. For example, Excavator through Zen 2 processors implement the PEXT and PDEP instructions in microcode, so these instructions execute significantly slower than the same behaviour recreated using other instructions.[20] (A software method called "zp7" is, in fact, faster on these machines.)[21] For optimum performance, compiler developers are advised to choose individual instructions from the extensions based on architecture-specific performance profiles rather than on extension availability.

References

  1. ^ a b "New "Bulldozer" and "Piledriver" Instructions" (PDF). Retrieved 2025-08-06.
  2. ^ a b "AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions" (PDF). Retrieved 2025-08-06.
  3. ^ a b c "Intel Advanced Vector Extensions Programming Reference" (PDF). intel.com. Intel. June 2011. Retrieved 2025-08-06.
  4. ^ a b c d "AMD64 Architecture Programmer's Manual, Volume 3: General-Purpose and System Instructions" (PDF). Revision 3.32. AMD. March 2021. Archived (PDF) from the original on 2025-08-06. Retrieved 2025-08-06.
  5. ^ a b c d "Family 16h AMD A-Series Data Sheet" (PDF). amd.com. AMD. October 2013. Retrieved 2025-08-06.
  6. ^ a b Hollingsworth, Brent. "New "Bulldozer" and "Piledriver" instructions" (PDF). Advanced Micro Devices, Inc. Archived from the original (PDF) on 26 July 2014. Retrieved 11 December 2014.
  7. ^ a b Locktyukhin, Max. "How to detect New Instruction support in the 4th generation Intel® Core™ processor family". www.intel.com. Intel. Retrieved 11 December 2014.
  8. ^ "bmiintrin.h from GCC 4.8". Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  9. ^ a b "sandpile.org -- x86 architecture -- bits". Retrieved 2025-08-06.
  10. ^ "Abseil - C++ Common Libraries". GitHub. 4 November 2021.
  11. ^ a b "AMD Excavator Core May Bring Dramatic Performance Increases". X-bit labs. October 18, 2013. Archived from the original on October 23, 2013. Retrieved November 24, 2013.
  12. ^ Yedidya Hilewitz; Ruby B. Lee (August 2009). "A New Basis for Shifters in General-Purpose Processors for Existing and Advanced Bit Manipulations" (PDF). palms.princeton.edu. IEEE Transactions on Computers. pp. 1035–1048. Archived from the original (PDF) on 2025-08-06. Retrieved 2025-08-06.
  13. ^ "Zen 3 - Microarchitectures - AMD - WikiChip".
  14. ^ "Instruction tables" (PDF). Retrieved 2025-08-06.
  15. ^ "Software Optimization Guide for AMD Family 19h Processors". AMD Developer Central. Retrieved 2025-08-06.
  16. ^ "Saving Private Ryzen: PEXT/PDEP 32/64b replacement functions for #AMD CPUs (BR/#Zen/Zen+/#Zen2) based on @zwegner's zp7". Twitter. Retrieved 2025-08-06.
  17. ^ "tbmintrin.h from GCC 4.8". Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  18. ^ "BIOS and Kernel Developer's Guide for AMD Family 14h" (PDF). Retrieved 2025-08-06.
  19. ^ "AMD Zen 3 Ryzen Deep Dive Review: 5950X, 5900X, 5800X and 5600X Tested". Retrieved 2025-08-06.
  20. ^ "Dolphin Progress Report: December 2019 and January 2020". Dolphin Emulator. 7 February 2020. Retrieved 2025-08-06.
  21. ^ Wegner, Zach (4 November 2020). "zwegner/zp7". GitHub.
