By Jack Ganssle

CoreMark

How fast is your CPU?

That, of course, is rather a meaningless question. The amount of work a processor can get done in a period of time depends on many factors, including the compiler (and its optimization level), wait states, background activity like DMA that can steal cycles, and much more. Yet plenty of folks have tried to establish benchmarks to make some level of comparison possible. Principal among these is Dhrystone.

But Dhrystone has problems. Compiler writers target the benchmark with optimizations that may not help developers much, but give better scores. Much of the execution time is spent in libraries, which can vary wildly between compilers. And neither the source code nor the reporting methods are standardized.

A few years ago the EEMBC people addressed these and other issues with their CoreMark benchmark, which is targeted at evaluating just the processor core. It's small - about 16k of code, with little I/O. All of the computations are made at run time so the compiler can't cleverly solve parts of the problem ahead of time. CoreMark is focused primarily on integer operations - the kind of control problems embedded systems address.

The four bits of workload tested are matrix manipulation, linked lists, state machines, and CRCs. The output of each stage is input to the next to thwart over-eager compiler writers.
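To make the chaining idea concrete, here is a minimal sketch in Python. It is not CoreMark's source: the three stage functions are toy stand-ins I made up for illustration, and the CRC is a common CRC-16 variant (reflected polynomial 0xA001) chosen as an assumption. The point is only that each stage consumes the previous stage's output and that a running CRC ties the final answer to every step, so nothing can be computed ahead of time without the run-time seed.

```python
def crc16_update(crc, byte):
    """Fold one byte into a 16-bit CRC (reflected polynomial 0xA001)."""
    crc ^= byte & 0xFF
    for _ in range(8):
        crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def crc16_word(crc, value):
    """Fold a 16-bit value into the CRC, low byte first."""
    crc = crc16_update(crc, value & 0xFF)
    return crc16_update(crc, (value >> 8) & 0xFF)


def list_stage(seed):
    """Toy list workload: build a list from the seed, sort it, pick the middle."""
    items = [(seed * 31 + i * 17) & 0xFF for i in range(8)]
    items.sort()
    return items[len(items) // 2]


def matrix_stage(seed):
    """Toy matrix workload: square a 2x2 matrix and sum the entries."""
    a = [[seed & 0xF, 1], [2, (seed >> 4) & 0xF]]
    prod = [[sum(a[i][k] * a[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
    return sum(map(sum, prod)) & 0xFFFF


def state_stage(seed):
    """Toy state machine: count state changes while scanning 16 bits of the seed."""
    state, transitions = 0, 0
    for bit in range(16):
        nxt = (seed >> bit) & 1
        transitions += nxt != state
        state = nxt
    return transitions


def run_chain(seed):
    """Run the stages in order; each stage's output is the next stage's input."""
    crc = 0
    for stage in (list_stage, matrix_stage, state_stage):
        seed = stage(seed)            # output of one stage feeds the next
        crc = crc16_word(crc, seed)   # checksum depends on every stage's result
    return crc


if __name__ == "__main__":
    print(hex(run_chain(0x1234)))
```

Because the seed arrives at run time and the checksum depends on every intermediate value, a compiler cannot drop a stage or precompute its result without changing the final CRC.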

One rule is that each reported score must include the name and version of the compiler used, as well as the compiler flags. Full disclosure, no hiding behind games.

The result has been good news for us. Some of the compiler vendors have taken on CoreMark as the new battleground, publishing their scores and improving their tools to ace the competition. IAR and Green Hills are examples.

Scores are expressed as raw CoreMark, CoreMark/MHz (more interesting to me), and CoreMark/Core (for multi-core devices). There are two types of results - those submitted by vendors, and those certified by EEMBC's staff (for a charge).

Results range from 0.03 CoreMark/MHz for a PIC18F97J60 to 168 for a Tilera TILEPro64 running 64 threads. The single-threaded max is 5.1 for a Fujitsu SPARC64 V(8).

But away from speed demons like Pentium-class or SPARC machines, the highest score is for Atmel's SAM4S16CAU - a Cortex-M4 device - which notches in at 3.38 CoreMark/MHz. That beats out a lot of high-end devices.

Clock rates do matter, and while the Intel Core i5 gets a score of 5.09 CoreMark/MHz, its raw result, at 2500 MHz, is 12715, or 6458 CoreMark/core. That thrashes the Atmel device, which was tested at 21 MHz, where it netted 71 CoreMark.
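The arithmetic behind these comparisons is just a division, but it is worth seeing the two views side by side. The sketch below uses only the raw scores and clock rates quoted above; the function name `coremark_per_mhz` is my own, not anything from EEMBC.

```python
def coremark_per_mhz(raw_score, clock_mhz):
    """Normalize a raw CoreMark score by the clock rate it was measured at."""
    if clock_mhz <= 0:
        raise ValueError("clock rate must be positive")
    return raw_score / clock_mhz


# Raw scores and clock rates as quoted in this article.
i5 = coremark_per_mhz(12715, 2500)
sam4s = coremark_per_mhz(71, 21)

print(f"Core i5:     {i5:.2f} CoreMark/MHz")
print(f"SAM4S16CAU:  {sam4s:.2f} CoreMark/MHz")
print(f"per-MHz ratio (i5 / SAM4S): {i5 / sam4s:.2f}")
print(f"raw ratio (i5 / SAM4S):     {12715 / 71:.0f}")
```

Per MHz the i5 comes out only about one and a half times ahead; in raw score, thanks to the clock, it is roughly 179 times ahead.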

There are some caveats. Some processors can load the entire test in cache. For those, it makes sense to use some of EEMBC's more comprehensive benchmarks. Wait states are a problem, so tests report where the code runs: if it's from flash it'll generally be slower than from RAM. The nearly-shocking news that the Core i5 scores less than twice the CoreMark/MHz of a Cortex-M4 neglects nifty features like floating point (the i5 has an insanely-fast FPU, which the benchmark's integer tests ignore).

Some companies couple CoreMark with EEMBC's EnergyBench to compute performance per mA, a number of increasing importance.

Best of all, the code is freely available at coremark.org.

I've turned into a crack-head, and my drug of choice is the CoreMark scores. It's fascinating to compare various processors and compilers. The results can be pretty surprising.

Thanks to Markus Levy of EEMBC for answers to my questions.