this post was submitted on 08 Aug 2023
50 points (98.1% liked)
Technology
37757 readers
254 users here now
A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.
Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.
Subcommunities on Beehaw:
This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.
founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
Oh hey, a non-Intel speculative execution vulnerability.
I've been wondering whether speculative execution as it's currently done is just fundamentally broken and guaranteed to leak information one way or another, but I really don't know enough about CPU design to even make a guess
It was done to do the both side of if/else while waiting for the check to finish, then it jumps to the correct branch result execution point to keep going, whatever that was "wrong" is wasted and should be flushed. Don't know if this fits the modern definition but that's why they do this type of thing.
Actual design or implementation is more complex than I described as there are a couple ways to tackle this branching delay issue.
Does it execute both, or does it execute the branch that is more likely to be valid? Branch prediction seems like it'd be way more performant than executing both branches until the result of the branch condition is available. If you think about it, what you're proposing will cause the CPU to always execute instructions that are not meant to be executed when confronted with a branch whereas branch prediction will only execute these "useless" instructions in the unlikely scenario where the prediction is incorrect.
NOTE: this is based on my personal understanding which might be wrong/outdated in the modern setting, so if some lemmy is expert in this regard feel free to correct me.
some source, I only skimp it and decide what I typed might still be relevant: https://en.wikipedia.org/wiki/Branch_predictor#
If anything that my previous cpu architecture course(in the early 2000, so almost 20 years old) is that intuition doesn't work as you imagined. The branch prediction is like a counter you can keep or adjust or flush and there is a very good reason that you execute both when you have the registers and processing unit that can run the instructions in parallel and just pick the correct path once the IF check result is completed. Why they have this in the beginning even before multicore CPU is that the penalty of waiting far outweight just run as if the "if" never exist and "abandon" the result and skip the if block. Common if block is to hide expensive calculation, which means you usually have fetches and potential cache miss and needs to go all the way to read disk(that's why we have prefetching etc instructions that compiler guess for you depending on your code), so if you waited all the cycle and then cache miss, you are gonna stall and wait until a couple hundred cycles waiting for the data to fetch and fill the registers/cache required to run that instruction in side the if block. is there a penalty if the predictor guess wrong? yes, but the penalty is far cheaper compare to if you don't run at all.
For blocks that have both if/else that are roughly similar, then running both is preferred unless your scheduling etc can't really do this(compile will decide this for you) and you probably need some profiler to check why your CPU stalled and fix your code so compiler can schedule better for you(profiler assisted optimization is a thing). It doesn't have to be all equal weight for both if and else part, that's why predictor kicks in so if the past batch of data all resulted in true, they allocate more in the true block and run less in the false block and eat the penalty if only a couple in the same data set resulted in having to run longer in the false block. Modern compiler/CPUs are really good with this and usually are better using the predictor/default optimization flag over hand tuning(like writing your own unwrap/asm etc).
Is there cases where running both is not really practical? yes, I think there will always be cases where your program maybe data heavy where say, true and false fetches entirely different block of data for calculation and doing both makes your cache miss more, it would just be game of "eating less penalty" and usually is out of reach for us since compiler just did it for you. And that's not my specialty at all, I am just your common pleb programmer that relies on compiler -O3.
It will prefetch the instructions and put into the pipeline the branch it thinks is mostly likely. It may do ahead-of-time speculative execution on certain instructions but not always. If it missed the correct branch it will flush the pipeline and start the pipeline over again from the correct branch. Afaik it doesn't execute or prefetch both branches. The other guy is saying it does but that doesn't really gel with my own recollection or the Wikipedia article he cited. You can see some further discussion that suggests only one branch gets prefetched here here and here. Reasons cited for only predicting one branch are: 1) Two pipelines with all the associated circuitry to look ahead, decode, and speculatively execute is incredibly expensive in terms of both processing requirements and die real estate. 2) Caching both would thrash your caches with new data constantly. 3) Modern branch prediction is already so accurate, there's really no need for two pipelines anyways.