27 Mar

Nvidia announces new DPU, GPUs

    New Nvidia hardware and software are tailored to supporting AI.

    Nvidia launched its GPU Technology Conference with a mix of hardware and software news, all of it centered on AI.

    The first big hardware announcement is the BlueField-3 network data-processing unit (DPU), designed to offload network processing tasks from the CPU. BlueField comes from Nvidia’s Mellanox acquisition and is a SmartNIC, or intelligent networking card.

    BlueField-3 has twice as many Arm processor cores as the prior-generation product, along with more accelerators in general, and can run workloads up to eight times faster. It can accelerate networking for high-performance computing and AI workloads across the cloud and on premises in a hybrid setting.

    Kevin Deierling, vice president of networking at Nvidia, said BlueField-3 offloads MPI collective operations from the CPU, delivering a speedup of nearly 20%, which translates to $18 million in cost savings for large-scale supercomputers.

    Oracle is the first cloud provider to offer BlueField-3 acceleration across its Oracle Cloud Infrastructure service, along with Nvidia’s DGX Cloud GPU hardware. BlueField-3 partners include Cisco, Dell EMC, DDN, Juniper, Palo Alto Networks, Red Hat, and VMware.

    New GPUs

    Nvidia also announced new GPU-based products, the first of which is the Nvidia L4 card. It is the successor to the Nvidia T4, uses passive cooling, and does not require a power connector.

    Nvidia described the L4 as a universal accelerator for efficient video, AI, and graphics. Because it’s a low-profile card, it will fit in any server, turning any server or data center into an AI data center. It’s specifically optimized for AI video, with new encoder and decoder accelerators.

    Nvidia said the L4 is four times faster than its predecessor, the T4, and 120 times faster than a traditional CPU server. It also uses 99% less energy than a traditional CPU server and can decode 1,040 video streams coming in from different mobile devices, according to Nvidia.

    Google will be the launch partner of sorts for this card, with the L4 supporting generative AI services available to Google Cloud customers.

    Another new GPU is Nvidia’s H100 NVL, which is basically two H100 processors on one card. The two GPUs work as one to deploy large language models and GPT inference models with anywhere from 5 billion to 200 billion parameters, delivering 12 times the throughput of an x86 processor, Nvidia claims.

    DGX Cloud Details

    Nvidia gave a little more detail on DGX Cloud, its AI systems hosted by cloud service providers including Microsoft Azure, Google Cloud, and Oracle Cloud Infrastructure. Nvidia CEO Jensen Huang announced the service on an earnings call with analysts last month but was short on details.

    DGX Cloud is not just the hardware but also a full software stack that makes it a turnkey training-as-a-service offering. Just point to the data set you want to train on, say where the results should go, and the training is carried out.

    DGX Cloud instances start at $36,999 per instance per month. It will also be available for purchase and deployment on-premises.

    Nvidia gets into processor lithography

    Making chips is not a trivial process when you’re dealing with transistors measured in nanometers. The process is called lithography, in which chip designs created on a computer are printed onto a piece of silicon; the computing work behind it is known as computational lithography.

    As chip designs have gotten smaller, more computational processing is required to make the images. Now entire data centers are dedicated to doing nothing but computational lithography.

    Nvidia has come up with a solution called cuLitho, a set of new algorithms that accelerate the underlying calculations of computational lithography. So far, using the Hopper architecture, Nvidia has demonstrated a 40-fold speedup in performing the calculations. Nvidia says 500 Hopper systems (4,000 GPUs) can do the work of 40,000 CPU systems while using an eighth of the space and a ninth of the power. A chip design that typically would take two weeks to process can now be processed overnight.

    This means a significant reduction in the time needed to process and create chips. Faster manufacturing means more supply and, hopefully, a price drop. ASML, TSMC, and Synopsys are the initial customers. cuLitho is expected to be in production in June 2023.
