this post was submitted on 07 Jul 2023
Professor Lemire, btw, is a professor who works on high-performance software and has been writing a lot of articles on AVX-512 techniques for the past few years. His blog posts are very popular on Hacker News (news.ycombinator.com). Pretty cool guy; I think it's well worth following his blog if you're into low-level assembly, low-level memory optimizations and the like.
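pext (and the reverse, pdep) are basically a 64-bit bitwise gather instruction and a 64-bit bitwise scatter instruction: pext pulls the bits selected by a mask down into the low bits of the result, and pdep spreads the low bits of its input back out to the mask positions. On Intel, they execute at about one per clock cycle, but on AMD they took around 19 cycles (at least, a few years ago). Rumor is that the newest AMD chips are faster at it.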
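For anyone who hasn't used them, here is a minimal software sketch of what the two instructions compute. This is plain Python emulating the bit movement one mask bit at a time, not the hardware intrinsics (in C those would be `_pext_u64` / `_pdep_u64`), and the function names `pext` and `pdep` here are just my own labels for the emulation.

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit registers


def pext(src: int, mask: int) -> int:
    """Bitwise gather: pack the bits of src selected by mask into the low bits."""
    src &= MASK64
    mask &= MASK64
    result = 0
    out_bit = 0
    while mask:
        lowest = mask & -mask          # isolate the lowest set bit of the mask
        if src & lowest:
            result |= 1 << out_bit     # copy that source bit to the next low slot
        out_bit += 1
        mask &= mask - 1               # clear the lowest set bit
    return result


def pdep(src: int, mask: int) -> int:
    """Bitwise scatter: spread the low bits of src out to the positions set in mask."""
    src &= MASK64
    mask &= MASK64
    result = 0
    in_bit = 0
    while mask:
        lowest = mask & -mask          # next destination position
        if (src >> in_bit) & 1:
            result |= lowest           # deposit the next low source bit there
        in_bit += 1
        mask &= mask - 1
    return result


# Gather the two nibbles selected by 0x0F0F, then scatter them back.
packed = pext(0xABCD, 0x0F0F)    # 0xBD
spread = pdep(packed, 0x0F0F)    # 0x0B0D
```

Note that pdep only restores the bits that the mask covered; everything outside the mask comes back as zero.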
pdep and pext are some of my favorite instructions, because gather/scatter is an important supercomputer / parallelism concept, and Intel invented an extremely elegant way to describe bit movement in 64-bit registers. Given how important gather/scatter has been to supercomputer algorithms over the past 40 years, I expect many, many more applications of pdep/pext.
My own experiments with pdep and pext were to create a small bit-scale relational database for solving four-color-theorem-like problems. I was able to implement "select" with a pext and "join" with a pdep. (4 bits is a single-column table, 16 bits is a dual-column table, and 64 bits is a triple-column table.)
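To make the idea concrete, here is a rough sketch of how such a bit-scale table could work. The layout is an assumption for illustration, not necessarily the scheme described above: each column holds one of 4 colors, a table is a bitset with one bit per possible row, a two-column row (a, b) lives at bit a*4 + b, and a three-column row (a, b, c) lives at bit a*16 + b*4 + c. The helper names (`NE`, `select_b`, `select_a`, `join_ab_bc`) are hypothetical, and `pext` / `pdep` are the same software emulation of the instructions, repeated so the block runs on its own.

```python
MASK64 = (1 << 64) - 1


def pext(src: int, mask: int) -> int:
    """Software emulation of pext: gather the mask-selected bits into the low bits."""
    result, out_bit = 0, 0
    mask &= MASK64
    while mask:
        if src & (mask & -mask):
            result |= 1 << out_bit
        out_bit += 1
        mask &= mask - 1
    return result


def pdep(src: int, mask: int) -> int:
    """Software emulation of pdep: scatter the low bits out to the mask positions."""
    result, in_bit = 0, 0
    mask &= MASK64
    while mask:
        if (src >> in_bit) & 1:
            result |= mask & -mask
        in_bit += 1
        mask &= mask - 1
    return result


# Assumed example relation: NE(a, b) holds when two neighbours get different colors.
NE = sum(1 << (a * 4 + b) for a in range(4) for b in range(4) if a != b)


def select_b(table2: int, b: int) -> int:
    """SELECT a FROM table2 WHERE b = <b>: returns a 4-bit single-column table."""
    return pext(table2, 0x1111 << b)


def select_a(table3: int, a: int) -> int:
    """SELECT b, c FROM table3 WHERE a = <a>: returns a 16-bit two-column table."""
    return pext(table3, 0xFFFF << (16 * a))


def join_ab_bc(r: int, s: int) -> int:
    """Join R(a, b) with S(b, c) on b, giving a 64-bit table over (a, b, c)."""
    # pdep places each R(a, b) bit at the c = 0 slot of row (a, b, *);
    # multiplying by 0xF copies it into the c = 1, 2, 3 slots as well.
    r_wide = (pdep(r, 0x1111111111111111) * 0xF) & MASK64
    # S(b, c) already has the right low-order layout; repeat it for every a.
    s_wide = (s * 0x0001000100010001) & MASK64
    return r_wide & s_wide


# All proper 4-colorings of a 3-vertex path a - b - c.
path = join_ab_bc(NE, NE)
```

In this sketch the join is a pdep plus two multiplies and an AND, and the whole three-column result fits in a single 64-bit word, which is what makes the approach attractive for small constraint problems like graph coloring.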
I agree he’s doing very cool work! I’m just waiting for the day I get to use some of his tidbits! :)