2015/04/28 ARM Gives Peek at Road Map @ 真乄科技業的頂尖投資團隊

The Cortex-A72 shares branch prediction, load/store and floating point units with an unannounced high-end ARM core.

SANTA CLARA, Calif. — While sharing new details about its latest Cortex-A72 core, ARM gave a peek at a next-generation high-end core it has yet to announce. The company also reiterated its support for its big.LITTLE multiprocessing technology, although it has not yet been widely adopted.

“We’ve seen a big improvement in what phones can do, and I think we are about to see another improvement,” said Brian Jeff, director of marketing for ARM’s Cortex-A products, in a talk on the A72 at the Linley Mobile event here. Jeff described three functional units in the A72 taken from the design of an unannounced “high-end core which was a brand new redesign.”

The A72 is the highest-end member of ARM’s 64-bit ARMv8 series that the company has announced so far. Jeff described its design goals mainly in terms of power efficiency rather than raw performance.

“When we set the scope [for the A72], we had a little more than a year to take as much power out as possible and keep performance constant,” he said.

Although he repeatedly described work on optimizing the A72 for mobile workloads, Jeff said the core is also configurable for enterprise applications that would benefit from its efficiency. For example, it can be used in SoCs with up to 48 cores and supports full ECC protection, accelerator interfaces and an AMBA 5 bus.

The description suggested the yet-to-be-announced core targets maximum performance, perhaps in an effort to leapfrog Intel’s Broadwell design. However, Jeff would provide no details on the next-gen core except to say the A72 borrows three blocks from the design: a branch prediction unit, a load/store unit and a floating point unit.

Lowering power was clearly the top priority for the A72 on the heels of experiences with its A15 core. “The improvements in the A15 were a big jump up in performance, but you couldn’t tap into it all because it got thermally throttled, so we’ve been focused on reining in power at the high end…to fit in to an all-day performance of a smartphone,” he said.

As a result, the A72 is expected to hit a maximum clock frequency of 2.5 GHz in a 14/16nm process, up from 1.6 GHz for the A15 in a 28nm process and a slight bump up from 2.3 GHz for the A53 in a 14/16nm process. Individual A72 cores can draw 600-750 milliwatts in SoCs that consume up to 2.5W, he added.
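A rough way to read those power figures is to ask how many cores could run flat out inside the quoted SoC envelope. The sketch below is only back-of-the-envelope arithmetic: the function names are hypothetical, and it assumes, unrealistically, that the entire 2.5W budget is available to the CPU cores, with nothing set aside for the GPU, memory or radios.

```python
# Figures as quoted in the talk; everything else here is an illustrative assumption.
SOC_BUDGET_MW = 2500              # quoted SoC ceiling, in milliwatts
CORE_POWER_RANGE_MW = (600, 750)  # quoted per-core draw range, in milliwatts


def max_cores_at_full_power(core_mw: int, budget_mw: int = SOC_BUDGET_MW) -> int:
    """Whole cores that fit in the budget if every core draws core_mw.

    Assumes (hypothetically) that the full SoC budget goes to CPU cores.
    """
    return budget_mw // core_mw


def budget_share(core_mw: int, n_cores: int, budget_mw: int = SOC_BUDGET_MW) -> float:
    """Fraction of the SoC budget consumed by n_cores drawing core_mw each."""
    return core_mw * n_cores / budget_mw


if __name__ == "__main__":
    for core_mw in CORE_POWER_RANGE_MW:
        fit = max_cores_at_full_power(core_mw)
        share = budget_share(core_mw, 4)
        print(f"{core_mw} mW/core: {fit} cores fit; four cores use {share:.0%} of budget")
```

Under these assumptions, four cores at the top of the quoted range would overshoot the envelope, which is consistent with Jeff's point about thermal throttling on earlier designs.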

The figures translate to the A72 using 75% less energy than an A15 or delivering 3.5x more performance, he said. However, Jeff cautioned that ARM doesn’t “have A72 silicon running at speed yet.”
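The arithmetic behind those claims can be unpacked in a few lines. This is a sketch under stated assumptions, not ARM's own methodology: it treats "75% less energy" as applying to the same amount of work, and it naively divides the quoted 3.5x performance figure by the ratio of the quoted peak clock rates to see how much of the gain would have to come from somewhere other than frequency. The function names are hypothetical.

```python
def efficiency_gain(energy_reduction: float) -> float:
    """Work per unit of energy, relative to baseline, for a fractional energy cut.

    Assumes the same work is done with (1 - energy_reduction) of the energy.
    """
    return 1.0 / (1.0 - energy_reduction)


def non_clock_speedup(total_speedup: float, new_ghz: float, old_ghz: float) -> float:
    """Speedup left over after factoring out the clock ratio.

    A naive split: it assumes performance scales linearly with clock rate,
    which real workloads rarely do.
    """
    return total_speedup / (new_ghz / old_ghz)


if __name__ == "__main__":
    print(f"Energy efficiency vs. A15: {efficiency_gain(0.75):.1f}x")
    print(f"Clock ratio, 2.5 GHz vs. 1.6 GHz: {2.5 / 1.6:.2f}x")
    print(f"Speedup not explained by clock: {non_clock_speedup(3.5, 2.5, 1.6):.2f}x")
```

On these assumptions, a 75% energy reduction corresponds to four times the work per joule, and well over half of the claimed 3.5x would have to come from design and process changes rather than clock speed alone.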

Jeff laid out a laundry list of power efficiency improvements in the three new blocks shared with the unannounced high-end core. For example, the branch predictor eats more power but significantly reduces cache misses. “It more than pays for its increase in size,” he said.

He provided a partial list of more than a dozen improvements in the load/store unit that reduce power or boost performance, each providing a quarter-percent benefit. For instance, it decodes micro-operations at later stages in the pipeline to “get more throughput in the front end and save decode power,” he said.
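To see how such small gains add up, the sketch below compounds a series of equal fractional improvements. It assumes the gains are independent and multiply together, which the talk did not state, and it uses exactly 12 improvements as a stand-in for "more than a dozen." The function name is hypothetical.

```python
def compound_gain(per_step: float, steps: int) -> float:
    """Total fractional gain from `steps` independent gains of `per_step` each.

    Assumes each gain multiplies the result of the previous ones.
    """
    return (1.0 + per_step) ** steps - 1.0


if __name__ == "__main__":
    quarter_percent = 0.0025
    for steps in (12, 24, 48):
        total = compound_gain(quarter_percent, steps)
        print(f"{steps} improvements of 0.25% each: about {total:.2%} overall")
```

With a dozen quarter-percent gains the total comes to roughly 3%, barely more than the simple sum, so any one of them would be hard to notice on its own.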

The new floating point unit focuses on lowering latency, by 25-50% for some operations, which are now “on par or ahead of Broadwell,” he said. Some execution units also provide 2-3x more bandwidth for integer operations.

The full list Jeff provided is on the final page of this report. However, each segment of that list was marked as partial, a nod to the dozens of fine-grained upgrades inside the A72.