As is well known, we like speeds, feeds, slots, and watts metrics here at The Next Platform for any type of equipment that runs in the data center. But we are also known to follow the money. And when it comes to the largest data centers in the world, and to those that rent out cloud computing capacity from their infrastructure, the metric that matters is the cost per unit of work per unit of electricity consumed.
This equation, more than any other, drives the hyperscalers for sure, and it drives the cloud builders, too, since they sell platforms and software-as-a-service as well as raw infrastructure.
And so, we have been eagerly anticipating the availability of the homegrown Graviton3 Arm server chips from Amazon Web Services – or, more precisely, from the Annapurna Labs division of the company that makes its Nitro DPUs, its Graviton CPUs, its custom Trainium AI training ASICs, and its Inferentia AI inference engines. (And we think it is quite possible that AWS is also designing its own switch ASICs.)
The Graviton3 chips were unveiled at the re:Invent 2021 conference last December, and we did a deep dive on them here. After creating the Arm-based Nitro DPUs to offload compute, network, and storage virtualization and encryption work from its X86 servers, AWS decided in 2018 to scale that design up and create the first Graviton to test the idea of using Arm servers in production. The original 16-core Graviton1 was not all that impressive, but it offered much better value for money than many of its X86 server instances, and it was fine as a testbed and for the modest workloads that dominate processing at many companies.
A year later, after seeing the enthusiastic adoption of the Graviton1 and wanting to control more of its stack – years before the coronavirus pandemic had all IT vendors wishing for tighter reins on their supply chains – AWS released the Graviton2 based on the “Ares” Neoverse N1 cores from Arm Holdings, cramming a very respectable 64 cores onto a die and creating a true server-class processor, not just a DPU with server aspirations.
With the Graviton3, AWS moves from the Ares core to the “Zeus” V1 core, which does not have much more integer performance – somewhere between 25% and 30% more in the AWS design, based on the statements and benchmarks shown so far – but which has 2X the floating point performance and 3X the machine learning inference performance of the Graviton2, plus 50% more memory bandwidth to maintain balance.
AWS has clarified a few more details about the Graviton3 chip, and we’ve updated the table of key features for the Graviton chip family:
James Hamilton, the distinguished AWS engineer who is often behind its custom hardware and data center designs, revealed the sizes of the L1 and L2 caches in the Graviton3 in a short presentation video, and added that thousands of customers have Graviton chips in production and that there are 13 instance families in 23 AWS regions based on the Graviton family of processors.
At the moment, Graviton3 processors are only available in the US West (Oregon) and US East (N. Virginia) regions, and it is unknown how many have been installed. But presumably this will be the chip that AWS enthusiastically deploys, given its drastically improved performance on many workloads and its continued improvement in value for the dollar across the fleet.
To give you an idea of the evolution and scope of the Graviton family, we have put together some tables and charts for comparison. Let’s start with the A1 instances from 2018, based on the original Graviton, which were really just a pumped-up Nitro, as we said. There were not many of them and they did not have a lot of features:
With a Graviton chip, which does not support simultaneous multithreading (SMT), a virtual CPU (or vCPU) is a physical core. With Intel Xeon SP and AMD Epyc processors, a vCPU is one thread, and there are two threads per core. In any event, the A1 instances based on the Graviton1 chip only had Elastic Block Storage (EBS) over the AWS network as storage (meaning no local flash or disk), and they had quite modest EBS and Ethernet network performance, as you can see. They were also fairly cheap for their time and offered decent performance for the dollar for modest workloads.
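That vCPU accounting can be sketched in a few lines of Python. The threads-per-core values simply encode the SMT rule described above; the instance sizes in the example are illustrative, not official AWS specs.

```python
# Physical cores behind a vCPU count: Graviton has no SMT (one thread
# per core), while Xeon SP and Epyc expose two hardware threads per core.
THREADS_PER_CORE = {"graviton": 1, "xeon": 2, "epyc": 2}

def physical_cores(vcpus: int, family: str) -> int:
    """Translate a vCPU count into physical cores for a given CPU family."""
    return vcpus // THREADS_PER_CORE[family]

# Illustrative sizes: the same 64 vCPUs mean twice the physical cores
# on Graviton as on a hyperthreaded X86 part.
print(physical_cores(64, "graviton"))  # 64 cores
print(physical_cores(64, "xeon"))      # 32 cores
```

This is why vCPU-for-vCPU comparisons between Graviton and X86 instances flatter neither side cleanly: a Graviton vCPU is a whole core, while an X86 vCPU is half of one.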
We have worked backwards from statements AWS has made to estimate the relative performance, in EC2 Compute Units (ECUs), of the Graviton chip family in the tables in this story (and in previous tables we have built about the Gravitons) so that we can eventually have a Rosetta Stone for gauging relative performance between X86 and Arm servers on the AWS cloud. (We are not in the business of doing benchmarks ourselves, but we want to be able to make comparisons, even if they are odious to some.)
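The back-of-the-envelope method here is just compounding: start from an instance with a known or assumed ECU rating, then multiply through the vendor’s claimed generational speedups. The baseline figure and the gain ratios below are placeholders for illustration, not AWS’s actual numbers.

```python
def chain_ecu(baseline_ecu: float, speedups: list[float]) -> list[float]:
    """Estimate ECU ratings for successive chip generations by
    compounding claimed relative performance gains onto a baseline."""
    ratings = [baseline_ecu]
    for gain in speedups:
        ratings.append(ratings[-1] * gain)
    return ratings

# Hypothetical example: a baseline rated at 10 ECU, with vendor-claimed
# generational gains of 40% and then 30%.
print([round(r, 2) for r in chain_ecu(10.0, [1.40, 1.30])])
# [10.0, 14.0, 18.2]
```

The obvious caveat is that each claimed speedup is workload-specific, so the chained estimate inherits whatever benchmark bias each generation’s marketing numbers carried.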
Then we have the Graviton2 chips, and below we look at the base C6g instances – not the ones AWS actually used for its March 2021 comparisons against Xeon servers, from which we developed our initial ECU estimate. Here are the C6g instances:
In these comparisons, AWS pitted its C6gd instances with local NVM-Express flash against its R5d instances based on Xeon SP-8000 Platinum processors, and its R6gd instances (again using Graviton2) against instances based on very old Xeon E7-8800 v3 processors.
With the C7g instances based on the Graviton3, which have just become available, these are the base instances with only EBS storage, but their network performance has been boosted a bit and their EBS performance is also up, while overall performance has increased by between 25% and 30% and the hourly cost of on-demand instances has increased by 6.7%. We assume that 30% is the better metric for the raw performance increase (based on the measured relative SPECint_rate2017 results that AWS disclosed last winter in its architecture discussion) and have adjusted our ECU numbers in the tables above to reflect this. And when you do the math, at least when it comes to integer work, Graviton3 offers about 18% lower cost per unit of performance than Graviton2.
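The arithmetic behind that value-for-money claim is easy to reproduce. Plugging in the roughly 30% performance gain and 6.7% price increase quoted above, cost per unit of performance falls by about 18% (or, flipped around, performance per dollar rises by about 22%):

```python
def value_change(perf_gain: float, price_gain: float) -> tuple[float, float]:
    """Given fractional performance and price increases between two
    generations, return the change in performance per dollar and the
    drop in cost per unit of performance."""
    perf_ratio = 1 + perf_gain    # new perf / old perf
    price_ratio = 1 + price_gain  # new price / old price
    perf_per_dollar = perf_ratio / price_ratio - 1
    cost_per_perf_drop = 1 - price_ratio / perf_ratio
    return perf_per_dollar, cost_per_perf_drop

ppd, cpp = value_change(0.30, 0.067)  # Graviton3 vs Graviton2, integer work
print(f"{ppd:.1%} more performance per dollar")        # ~21.8%
print(f"{cpp:.1%} lower cost per unit of performance") # ~17.9%
```

Note that the two framings are not the same number: a 30% performance gain at 6.7% higher price is a ~22% gain in performance per dollar, but only a ~18% reduction in cost per unit of performance, which is the figure we are using here.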
It is hard to visualize performance and price/performance trends from lines in a table, so we have created this chart to help you see how Graviton performance keeps rising as the cost per unit of performance keeps falling. Take a look:
The orange bars for cost per unit of capacity are getting shorter with each generation of Graviton, and the black bars for performance are getting longer. And there is no reason to believe that trend will not continue with Graviton4, which we expect at re:Invent 2022 this winter, using either a refined 5 nanometer process from Taiwan Semiconductor Manufacturing Co or even the 4 nanometer process (TSMC’s 4N process, to be specific) that Nvidia uses to manufacture its “Hopper” GH100 GPUs. It is possible that Graviton4 will be based on a 3 nanometer process, and that could mean waiting until 2023 unless TSMC has worked out the kinks in that process.