In 2019, StorageReview wrote a review of our HCI-224 solution for Azure Stack HCI with Intel® Optane™ NVMe SSDs. In their testing, they found the Azure Stack HCI and DataON solution was 4x faster than VMware vSAN on the same hardware, calling it “the fastest we’ve seen in a mid-market 4-node HCI cluster.” In addition, StorageReview found Azure Stack HCI to be “simple to deploy, easy to manage, and exceptionally performant, all things you want.”
Since then, Microsoft has introduced Azure Stack HCI as a new hyper-converged operating system delivered as an Azure service and Intel has come out with their 3rd Gen Intel® Xeon® Scalable processor. So we asked StorageReview to check out our new solution with the latest software and hardware.
Watch the StorageReview interviews and podcasts
Cosmos Darwin joins the StorageReview podcast to talk about the latest on Microsoft Azure Stack HCI, what’s coming very soon in the next release, and our review that saw 4+ million IOPS out of a tiny cluster.
StorageReview talks with Rocky Shek, Technical Product Manager at DataON, to get the lowdown on what’s new with Integrated Systems for Azure Stack HCI.
StorageReview is joined by DataON’s Howard Lo who gives an update on where DataON is today and how the Intel partnership keeps them ahead in the Azure Stack HCI world.
With DataON’s latest Integrated Systems comes MUST Pro, which lets Azure Stack HCI admins update the entire software and hardware stack from a single interface.
StorageReview chats with Gina Merjanian about Intel’s important partnership with DataON and the company’s plans for new US fabs.
Microsoft’s Jeff Woolsey hosts a comprehensive demo video showcasing how Azure Stack HCI is different from other HCI products.
DataON AZS-6224 Integrated System for Azure Stack HCI Review
Hyperconverged Infrastructure (HCI) has gained popularity for the simplicity of deployment and management. For those in the Hyper-V world, HCI is consumed via Azure Stack HCI. The good news here is that Microsoft has continued to pour features into Azure Stack HCI. Features like support for Azure Kubernetes Service (AKS) enable a flexible hybrid cloud experience. On the hardware side, Intel’s 3rd Generation Xeon Scalable processor release means a generous performance boost for Azure Stack HCI nodes. To help organizations take advantage of all these advances, DataON has launched a series of new AZS Integrated Systems.
We’ve looked at DataON systems a number of times over the years. Their Optane-based system won our Editor’s Choice award two years ago. We also took a look at their two-node QLC-heavy system that brought cost-effective flash to the edge. What’s new this time is a tech refresh thanks to Intel, plus new software that turns what we used to think of as Ready Nodes for Azure Stack HCI into a new Integrated System model, bringing DataON closer to the level of Dell and Lenovo.
DataON Integrated Systems
So what makes an integrated system? In the case of DataON, we start with the Intel server, which DataON orders on demand as customer orders come in. These systems show up at DataON fully configured, with the appropriate 3rd Gen Intel Xeon CPUs, DRAM, and networking. On the storage side, DataON supports the full Intel storage stack, including the popular P5510 TLC and P5316 QLC SSDs, Optane P5800X SSDs, and PMem. For those looking at hybrid solutions, DataON also offers HDD support.
Once in their lab, DataON engineers install and configure the Azure Stack HCI OS, which is fundamentally different from the past, when Azure Stack HCI was more of a feature in Windows Server. DataON then performs a burn-in to make sure all system components are working and performing as expected. For more on the new OS and the hardware, check out podcast #86 with Howard Lo.
DataON doesn’t stop there though. They’ve long offered a free plugin for Windows Admin Center called MUST, which provides management, monitoring, and alerts for DataON HCI clusters. Now they have MUST Pro, which keeps the Integrated Systems updated with the latest validated firmware and drivers. So now in one interface customers can update their Microsoft software, alongside key drivers and firmware updates for the servers and other components. Kevin did a deep dive on MUST Pro with Henry Fu when we were recently on-site.
One last note about the Integrated Systems is that they also include a new support ticketing mechanism. This better unites the Microsoft and DataON support teams so that they’re working off a common ticketing system. When customers call into either DataON or Microsoft, engineers from both teams can pass information back and forth to resolve the issue. Support for multi-vendor systems like this is a common pain point for customers who tire of finger-pointing. With this solution, the hardware and software teams work together to solve any issues that occur.
While DataON will continue to offer a wide variety of nodes for Windows Server-first use cases, they’ve identified three key configuration families in their Integrated System portfolio. The AZS-6112, AZS-6212, and AZS-6224 are all similar in that they run the Intel server, CPU, and storage stack. The 6112 is a smaller 1U 12-bay NVMe system. The 6224 is a more mainstream 24-bay NVMe system, with more PCIe expansion than the smaller 1U offering. The 6212 is a hybrid system with twelve (12) 3.5″ bays (2x NVMe) for those who have a high-capacity need.
Our review system is the mainstream AZS-6224, with a modest base-level configuration.
DataON AZS-6224 Configuration
3x DataON S2D-6224 2U 24x 2.5″ all-NVMe server nodes.
Per node:
- 2x Intel® Xeon® Scalable Gen3 Gold 6330 2.0 GHz, 28-Core, 42MB Cache
- 32x Samsung 64GB DDR4 3200MHz ECC-Register RDIMM
- 2x Intel® S4520 480GB SATA M.2 Boot Drive
- 1x NVIDIA/Mellanox ConnectX-6 Dx EN Dual-Port QSFP56 100GbE RDMA Card, PCIe 4.0 x16
- 5x Intel DC P5510 NVMe 3.8TB 2.5″ 144L 3D TLC SSD
Nodes connected via NVIDIA/Mellanox SN2010 100GbE switch.
DataON AZS-6224 Performance
To measure the performance of the DataON AZS-6224 cluster, we provisioned VM Fleet spread evenly across the 3-node cluster, with a balance of storage to compute resources. We deployed 168 VMs on the cluster (one per CPU core), each with 20GB of storage in use. This leveraged roughly 3TB of storage evenly spread across the cluster, with enough compute behind it that the CPUs would not hold up the high-bandwidth or I/O tests.
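The VM count and storage footprint follow directly from the node configuration. The short sketch below reproduces them using only figures listed in this review (3 nodes, 2x 28-core Gold 6330 per node, 20GB per VM, 5x 3.8TB P5510 per node). The roughly 3TB quoted above lines up with the working set expressed in binary TiB; which unit convention the review used is our assumption.

```python
# Cluster sizing arithmetic for the VM Fleet run described above.
# All inputs come from the review's configuration list.
NODES = 3
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 28      # Intel Xeon Gold 6330
GB_PER_VM = 20             # storage in use per VM
DRIVES_PER_NODE = 5        # Intel P5510 NVMe SSDs
TB_PER_DRIVE = 3.8         # capacity as listed in the configuration


def vm_count() -> int:
    """One VM per physical CPU core across the cluster."""
    return NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET


def footprint_gb() -> int:
    """Total storage touched by the VM Fleet working set, in decimal GB."""
    return vm_count() * GB_PER_VM


def footprint_tib() -> float:
    """The same footprint expressed in binary TiB (2**40 bytes)."""
    return footprint_gb() * 10**9 / 2**40


def raw_capacity_tb() -> float:
    """Raw NVMe capacity before any resiliency overhead."""
    return NODES * DRIVES_PER_NODE * TB_PER_DRIVE


if __name__ == "__main__":
    print(f"VMs: {vm_count()}")
    print(f"Working set: {footprint_gb()} GB ({footprint_tib():.2f} TiB)")
    print(f"Raw capacity: {raw_capacity_tb():.1f} TB")
```

The working set is only about 6% of raw capacity, so the test data is small relative to the drives themselves.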
We leveraged the following workloads to accurately profile the DataON AZS-6224 cluster:
- 4K Random read/write
- 32K Sequential read/write
- 64K Sequential read/write
- 4K Random 70% read, 80% read and 90% read
- 8K Random 70% read, 80% read and 90% read
- 16K Random 70% read, 80% read and 90% read
- SQL Server: 80% read, 90% read and 97% read
- VDI: Bootstorm, initial login and Monday login
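VM Fleet generates load with Microsoft's DiskSpd inside each VM. As a rough illustration, the synthetic profiles above could map onto DiskSpd switches such as `-b` (block size), `-r`/`-s` (random or sequential), `-w` (write percentage), `-t` (threads), `-o` (outstanding I/Os per thread), `-d` (duration), `-Sh` (caching off), and `-L` (latency statistics). This is a sketch under stated assumptions: the thread count, queue depth, duration, and target path are hypothetical placeholders, since the review does not publish its exact settings, and the SQL and VDI profiles are application-shaped mixes not reproduced here.

```python
from dataclasses import dataclass

# HYPOTHETICAL run settings: the review does not publish these values.
THREADS = 4
OUTSTANDING_PER_THREAD = 8
DURATION_S = 60
TARGET = r"C:\run\testfile.dat"  # hypothetical path inside each VM


@dataclass(frozen=True)
class Profile:
    name: str
    block_kib: int
    random: bool
    write_pct: int  # DiskSpd's -w switch takes the WRITE percentage


def build_profiles() -> list[Profile]:
    """Four-corner and mixed random profiles from the test list (SQL and VDI omitted)."""
    profiles = [
        Profile("4K random read", 4, True, 0),
        Profile("4K random write", 4, True, 100),
    ]
    for kib in (32, 64):
        profiles.append(Profile(f"{kib}K sequential read", kib, False, 0))
        profiles.append(Profile(f"{kib}K sequential write", kib, False, 100))
    for kib in (4, 8, 16):
        for read_pct in (70, 80, 90):
            # A 70% read mix is expressed to DiskSpd as -w30.
            profiles.append(Profile(f"{kib}K random {read_pct}% read", kib, True, 100 - read_pct))
    return profiles


def diskspd_args(p: Profile) -> list[str]:
    """Build one DiskSpd command line for a profile."""
    return [
        "diskspd.exe",
        f"-b{p.block_kib}K",            # block size
        "-r" if p.random else "-s",     # random vs. sequential access
        f"-w{p.write_pct}",             # write percentage
        f"-t{THREADS}",                 # threads per target
        f"-o{OUTSTANDING_PER_THREAD}",  # outstanding I/Os per thread
        f"-d{DURATION_S}",              # duration in seconds
        "-Sh",                          # disable software caching and hardware write caching
        "-L",                           # capture latency statistics
        TARGET,
    ]


for profile in build_profiles():
    print(f"{profile.name:<24} {' '.join(diskspd_args(profile))}")
```

The main gotcha this highlights is that DiskSpd expresses a mix by its write share, so "70% read" becomes `-w30`.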
In our first test, we look at small-block random performance with a 4K read and write profile. The 3-node cluster backed with 100GbE network connectivity performed incredibly well, measuring over 4M IOPS read and 525K IOPS write. In read, it measured just 0.13ms average latency, while write latency came in at 0.03ms.
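Little's Law (average items in flight equal throughput times average time in the system) gives a quick sense of how much concurrency the fleet sustained to reach the 4K read result. The back-of-the-envelope sketch below treats "over 4M IOPS" as exactly 4,000,000, which is an assumption.

```python
def outstanding_ios(iops: float, latency_ms: float) -> float:
    """Little's Law: average I/Os in flight = throughput x average latency (ms -> s)."""
    return iops * latency_ms / 1000.0


VMS = 168  # one VM per core, from the test setup

# ASSUMPTION: "over 4M IOPS" is treated as exactly 4,000,000.
reads_in_flight = outstanding_ios(4_000_000, 0.13)
per_vm = reads_in_flight / VMS
print(f"~{reads_in_flight:.0f} reads in flight cluster-wide, ~{per_vm:.1f} per VM")
```

Because the IOPS input is a lower bound and the latency is an average, treat the result as a sanity check rather than the queue depth the benchmark actually configured.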
While random performance is great, it is also important to see how well a cluster performs in terms of bandwidth. To start, we looked at a 32K sequential profile. Here the DataON AZS-6224 really surprised us with an incredible 45.6GB/s in read and 14.3GB/s in write.
Switching to a larger 64K block size in a sequential workload, the AZS-6224 pushed the 100GbE fabric to the brink, delivering 91.5GB/s read and 13.6GB/s write. To say we were impressed would be putting it lightly; it was great to see this level of performance out of a three-node cluster.
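To put the sequential numbers in hardware terms, the sketch below divides each cluster-wide result by the 15 drives and 3 nodes in the configuration. It assumes load is spread perfectly evenly, which VM Fleet's even placement approximates but does not guarantee, and the write figures are what the VMs saw; any resiliency copies written behind the scenes are not counted.

```python
DRIVES = 15  # 3 nodes x 5 Intel P5510 SSDs
NODES = 3

# Sequential results reported above, in GB/s as measured from the VMs.
sequential_gbps = {
    "32K read": 45.6,
    "32K write": 14.3,
    "64K read": 91.5,
    "64K write": 13.6,
}


def per_drive(gbps: float) -> float:
    """Cluster bandwidth spread evenly over every NVMe drive."""
    return gbps / DRIVES


def per_node(gbps: float) -> float:
    """Cluster bandwidth spread evenly over every node."""
    return gbps / NODES


for name, gbps in sequential_gbps.items():
    print(f"{name}: {per_drive(gbps):.2f} GB/s per drive, {per_node(gbps):.1f} GB/s per node")
```

At 64K read, that works out to about 6.1GB/s per drive and 30.5GB/s per node.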
Next, we start to look more at mixed workloads with random traffic centered around the 4K blocksize. We look at different read percentages including 70%, 80%, and 90% workloads. Here the AZS-6224 cluster measured 2.8M IOPS at 90% read, 2.2M IOPS at 80% read, and 1.5M IOPS at 70% read. Average latency measured 0.18ms, 0.70ms, and 1.93ms respectively.
In our 8K random workload with 70%, 80%, and 90% read mixes, the DataON AZS-6224 continued to shine. We measured 2.7M IOPS at 90%, 2.1M IOPS at 80%, and 1.5M IOPS at 70%, keeping the cluster in roughly the same IOPS range as its 4K workloads while bandwidth ramped up with the larger block size. Average latency remained low, measuring 0.22ms at 90%, 0.78ms at 80%, and 1.90ms at 70% read.
Moving up to 16K random transfers, we kept the same 70%, 80%, and 90% read mixes. IOPS levels barely slowed down on the AZS-6224 from the 4K and 8K profiles. We measured 2.6M IOPS at 90%, 2M IOPS at 80%, and 1.5M IOPS at 70% read, again showing the cluster had no problem keeping pace as the bandwidth requirements of the larger block size increased. Average latency measured 0.36ms at 90%, 0.87ms at 80%, and 1.94ms at 70% read.
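Since IOPS held nearly flat while block size doubled twice, the bandwidth behind these mixed results roughly doubled at each step. The sketch below converts the 90% read results to bandwidth, assuming "4K" means 4,096 bytes and GB/s is decimal (10^9 bytes); the review states neither convention. The figures combine reads and writes.

```python
KIB = 1024


def bandwidth_gbps(iops: float, block_kib: int) -> float:
    """Convert IOPS at a fixed block size to decimal GB/s (reads + writes combined)."""
    return iops * block_kib * KIB / 1e9


# 90% read results from the 4K, 8K, and 16K mixed-workload tests.
ninety_pct_read = {4: 2.8e6, 8: 2.7e6, 16: 2.6e6}

for kib, iops in ninety_pct_read.items():
    print(f"{kib}K @ 90% read: {iops / 1e6:.1f}M IOPS -> {bandwidth_gbps(iops, kib):.1f} GB/s")
```

That is roughly 11.5, 22.1, and 42.6GB/s, which matches the "ramping up bandwidth" behavior described above.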
In our final two mixed workload groups, we move toward synthetic approximations of SQL and VDI workloads. The first is SQL Server in 80%, 90%, and 97% read combinations. For a 3-node cluster with 5 NVMe drives per node, performance remained very strong. We measured 2.1M IOPS at 80% read, 2.7M IOPS at 90% read, and 3.4M IOPS at 97% read. Latency measured 0.81ms, 0.24ms, and 0.17ms respectively across the group.
Finally, we move into our VDI workloads, covering profiles such as bootstorm, initial login, and Monday login activities. In this area, the 3-node DataON AZS-6224 maintained its trend of throwing down impressive numbers. We measured 2.5M IOPS in the bootstorm profile, 600K IOPS in Initial Login, and 807K IOPS in Monday login.
In addition to testing Azure Stack HCI, we decided to reprovision the exact same hardware with VMware vSAN, just for perspective. Azure Stack HCI and vSAN aren’t exactly the same thing: vSAN remains more full-featured and, given some of its storage and networking design considerations, isn’t trying to be the fastest HCI around.
Given the performance differential, we focused on a four-corners comparison and one mixed workload between Azure Stack HCI and VMware vSAN. While the Azure Stack HCI side allows you to use a flat flash architecture (fully leveraging all drives in each node for read and write activity), the VMware vSAN layout needs two SSDs for write cache and three or more SSDs for capacity. This puts vSAN in a tough position for very high-performance workloads, a disadvantage that worsens with smaller node configurations.
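The layout difference can be expressed as a simple per-node drive count for these 5-drive nodes. The sketch below models only the drive roles described above (all five drives in a flat pool versus two write-cache drives plus three capacity drives); it is an illustration, not a model of how either platform schedules I/O, and the `NodeLayout` type is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeLayout:
    name: str
    cache_drives: int     # drives reserved as a write-cache tier
    capacity_drives: int  # drives that hold persistent data

    @property
    def total_drives(self) -> int:
        return self.cache_drives + self.capacity_drives

    def capacity_share(self) -> float:
        """Fraction of a node's drives that hold persistent data."""
        return self.capacity_drives / self.total_drives


# Per-node layouts for the 5-drive AZS-6224 nodes, as described above.
layouts = [
    NodeLayout("Azure Stack HCI, flat all-NVMe", cache_drives=0, capacity_drives=5),
    NodeLayout("vSAN, 2 cache + 3 capacity", cache_drives=2, capacity_drives=3),
]

for layout in layouts:
    print(f"{layout.name}: {layout.capacity_share():.0%} of {layout.total_drives} drives hold data")
```

With only five drives per node, dedicating two to cache leaves 60% of the flash holding data, which is why smaller node configurations feel the constraint most.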
For 4K random read and write using HCIBench on VMware, we measured 699K IOPS read and 257K IOPS write from vSAN, with average latencies of 0.59ms and 1.51ms respectively. This put the Azure Stack HCI setup more than 5.8x faster in read and 2x faster in write throughput.
Moving to large-block sequential performance with a 64K workload, we measured 19GB/s read and 3.2GB/s write through vSAN, at average latencies of 1.57ms and 8.09ms respectively. Here Azure Stack HCI more than quadrupled the read performance and delivered more than 4.1x the write performance.
Switching focus to mixed workloads, we compared the SQL Server 80%, 90%, and 97% read workloads between the two setups. On vSAN we measured 436K IOPS at 80%, 546K IOPS at 90%, and 630K IOPS at 97%. With mixed workloads, Azure Stack HCI came in about 4.7 to 5.4x faster. Again, the vSAN figures by themselves are still very strong, especially for a three-node cluster; so strong, in fact, that for most customers they would deliver more performance than needed. This comparison simply highlights where Azure Stack HCI has optimized around NVMe flash designs to help customers who need or demand bleeding-edge performance.
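The multiples quoted in this comparison are plain ratios of the two platforms' results on the same hardware. The sketch below recomputes them from the rounded published figures, so small differences from the quoted multiples are expected (SQL Server at 80% read, for example, computes to about 4.8x). The 4K random read pair is left out because the Azure Stack HCI figure is given only as "over 4M".

```python
# (Azure Stack HCI, vSAN) result pairs on the same three-node hardware.
comparisons = {
    "64K sequential read (GB/s)": (91.5, 19.0),
    "64K sequential write (GB/s)": (13.6, 3.2),
    "4K random write (IOPS)": (525_000, 257_000),
    "SQL Server 80% read (IOPS)": (2_100_000, 436_000),
    "SQL Server 90% read (IOPS)": (2_700_000, 546_000),
    "SQL Server 97% read (IOPS)": (3_400_000, 630_000),
}


def speedup(azure_stack_hci: float, vsan: float) -> float:
    """How many times faster Azure Stack HCI was than vSAN on one metric."""
    return azure_stack_hci / vsan


for metric, (ashci, vsan) in comparisons.items():
    print(f"{metric}: {speedup(ashci, vsan):.2f}x")
```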
Conclusion
DataON already offered some of our favorite Azure Stack HCI solutions. Now they’ve updated their line with the latest Intel technologies, new software, and an improved support matrix. This combination, especially with the new MUST Pro application, puts them on the same playing field as others like Lenovo and Dell who offer integrated solutions for Azure Stack HCI. DataON, however, is faster to adopt and certify new Intel gear and, based on quotes StorageReview has seen, is much more cost-effective.
The Gen4 NVMe-based DataON AZS-6224 Azure Stack HCI cluster really hit it out of the park on performance across the board. Peak bandwidth topped out at 91.5GB/s, which a couple of years ago would have been unheard of on a performance cluster, let alone a 3-node HCI platform.
Random and mixed workloads were no trouble for this platform either, with 4K random read peaking at over 4M IOPS and 8K 70/30 topping 1.5M IOPS. The performance discussion on this type of platform gets pretty interesting: storage is so far from being the bottleneck that you start looking at the networking fabric or applications for tuning. Or the IT admin can just sit back and enjoy users commenting on barely perceptible response times.
Microsoft has recently made a fundamental and critical change in its approach to HCI. Azure Stack HCI is no longer a feature of Windows Server; it’s a distinct entity. One of Microsoft’s missions is to keep pace with the Azure public cloud, while also adding features, like GPU support, where it might be a little behind. The pace of new feature adoption is accelerating quickly now, with a major 2H21 release coming very soon.
With everything new that DataON has already deployed and future updates from Microsoft around the corner, we’re again really impressed with what such a small company (relatively speaking) is achieving. The DataON AZS-6224 is an amazing HCI solution that, with only 15 Intel P5510 SSDs, posted over 4 million IOPS and 91.5GB/s. That’s insane.
This article was originally posted on StorageReview.com and is reposted with permission.