Geekbench 6 warns about inconsistent benchmarking performance from new Core Ultra 200S Plus chips — says Intel's IPC boosting Binary Optimization Tool modifies scores in 'unclear' fashion

Intel Arrow Lake Refresh
(Image credit: Intel)

One of the most noticeable upgrades in Intel's latest Core Ultra 7 270K Plus and 250K Plus CPUs is the Binary Optimization Tool, which manipulates instructions at the hardware level to boost IPC. The tool helps squeeze extra performance out of the Arrow Lake architecture, but it has raised concerns over benchmarking accuracy and consistency on these chips. John Poole from Geekbench posted a warning to users that Intel's latest tool can't be trusted at this time, and that there's no way to identify whether the tool was enabled or disabled during a benchmark run.

Poole revealed that Intel has no public documentation on the techniques the Binary Optimization Tool (or iBOT) uses to optimize code, making it difficult to determine how effective those techniques are across different applications. The lack of documentation also makes it impossible for Primate Labs (the makers of Geekbench) and its user base to understand how iBOT boosts performance compared to benchmark runs without it. According to Poole, Geekbench 6 workload scores on the chips increase by up to 40% with iBOT enabled, with overall scores improving by up to 8%. "Since the tool modifies the benchmark, and it is unclear to both Primate Labs and the general public how these changes occur," he warned.

To deal with this problem, Geekbench will provide a warning on all Geekbench benchmark listings featuring iBOT-supported chips with the following description: “This benchmark result may be invalid due to binary modification tools that can run on this system.”

Aaron Klotz
Contributing Writer

Aaron Klotz is a contributing writer for Tom’s Hardware, covering news related to computer hardware such as CPUs and graphics cards.

  • Notton
    Alternate title: Geekbench 6 is a worthless benchmarking tool and no one should use it or listen to what they say
  • Neilbob
    Notton said:
    Alternate title: Geekbench 6 is a worthless benchmarking tool and no one should use it or listen to what they say
    But what else are people going to use to define wildly unrealistic performance narratives? What other tool are certain large corporations WILDLY invested in?
  • usertests
    Notton said:
    Alternate title: Geekbench 6 is a worthless benchmarking tool and no one should use it or listen to what they say
    I always prefer other benchmarks, particularly PassMark as a quick synthetic comparison, and actual games to show the benefits of bigger L3 cache.

    However, it's not great that Intel's very short list of iBOT-enhanceable games/apps includes a benchmark that people expect to measure the CPU's performance. Maybe it's not really benchmark cheating to use iBOT, but if it works with 0.01% of games/apps instead of say, 80%, it's creating false expectations.

    On the other hand, if Intel is going to diligently update iBOT to work with new and CPU demanding games in the coming years, they are giving their users access to free performance and efficiency. So let them cook.
  • Geef
    Easy way to resolve the problem when you benchmark. As long as it doesn't take much time to convert the EXE file, just make two copies of the benchmark and then tag the Excel data sheet with a check box that says iBOT.
    The amount of time it took to write their article was probably enough to get all of the above set up. Unless... maybe if they were never trained in Excel and don't know how to insert another column... :eek: The horror!
  • ejolson
    A useful benchmark should produce a verifiable result that can be checked to verify the expected calculation was performed in the measured time. I actually thought that's what Geekbench did and would be surprised if it didn't.

    In other news, since IBM introduced the System/360 in the 1960s, processors have not executed the instructions generated by the programmers and the compilers. A micro-op instruction cache, branch prediction, register renaming and speculative execution can also increase performance.
  • cknobman
    If iBOT optimizations have to be implemented by Intel for there to be any benefit, then when looking at processor performance compared to competitors IT IS USELESS.

    And it would be disingenuous for Intel to optimize for iBOT in Geekbench specifically and claim those scores are valid.

    Right now iBOT seems like a neat marketing pitch and has potential. But the reality is without Intel making optimizations for the applications it will be worthless.

    See the issue with that?
  • TerryLaze
    cknobman said:
    If iBOT optimizations have to be implemented by Intel for there to be any benefit, then when looking at processor performance compared to competitors IT IS USELESS.

    And it would be disingenuous for Intel to optimize for iBOT in Geekbench specifically and claim those scores are valid.

    Right now iBOT seems like a neat marketing pitch and has potential. But the reality is without Intel making optimizations for the applications it will be worthless.

    See the issue with that?
    They don't need to make optimizations; the CPU does that by itself. They do have to make sure, though, that what the CPU does isn't complete nonsense that will get your system bricked and kill all your data... see the issue with that?!
    That's why they only allow the titles they have tested enough to be sure about (and still give you an off/on button just in case).
  • usertests
    cknobman said:
    If iBOT optimizations have to be implemented by Intel for there to be any benefit, then when looking at processor performance compared to competitors IT IS USELESS.
    I honestly don't think it's useless if they optimized, for example, 10-20 more CPU-intensive or popular titles every year. A small percentage of games are getting a majority of the play time. But if that's what they're forced to do, I don't trust that they will keep it up for a long time.

    They should also enable it by default for these titles if possible.

    If it's considered cheating for reviews, reviewers could consider turning it off or picking more obscure games to test.