NVIDIA L40S


NVIDIA L40S (GPU-NVL40S)

 


| Specification | NVIDIA L40S |
| --- | --- |
| GPU Architecture | NVIDIA Ada Lovelace |
| GPU Memory | 48 GB GDDR6 with ECC |
| Memory Bandwidth | 864 GB/s |
| Interconnect Interface | PCIe Gen4 x16: 64 GB/s bidirectional |
| NVIDIA Ada Lovelace Architecture-Based CUDA® Cores | 18,176 |
| NVIDIA Third-Generation RT Cores | 142 |
| NVIDIA Fourth-Generation Tensor Cores | 568 |
| RT Core Performance | 212 TFLOPS |
| FP32 | 91.6 TFLOPS |
| TF32 Tensor Core | 183 / 366* TFLOPS |
| BFLOAT16 Tensor Core | 362.05 / 733* TFLOPS |
| FP16 Tensor Core | 362.05 / 733* TFLOPS |
| FP8 Tensor Core | 733 / 1,466* TFLOPS |
| Peak INT8 Tensor | 733 / 1,466* TOPS |
| Peak INT4 Tensor | 733 / 1,466* TOPS |
| Form Factor | 4.4" (H) x 10.5" (L), dual slot |
| Display Ports | 4x DisplayPort 1.4a |
| Max Power Consumption | 350 W |
| Power Connector | 16-pin |
| Thermal | Passive |
| Virtual GPU (vGPU) Software Support | Yes |
| vGPU Profiles Supported | See the virtual GPU licensing guide |
| NVENC / NVDEC | 3x / 3x (includes AV1 encode and decode) |
| Secure Boot with Root of Trust | Yes |
| NEBS Ready | Level 3 |
| Multi-Instance GPU (MIG) Support | No |
| NVIDIA® NVLink® Support | No |

*With Sparsity
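
If a card is already installed, the headline figures above can be sanity-checked directly. A minimal sketch assuming PyTorch built with CUDA; the L40S reports compute capability 8.9 (Ada Lovelace) and 142 SMs (128 CUDA cores each, giving the 18,176 total in the table).

```python
# Quick sanity check of the spec table on an installed card (PyTorch with CUDA assumed).
import torch

props = torch.cuda.get_device_properties(0)
print(f"Name:               {props.name}")                             # e.g. 'NVIDIA L40S'
print(f"Compute capability: {props.major}.{props.minor}")              # 8.9 for Ada Lovelace
print(f"Total memory:       {props.total_memory / 1024**3:.1f} GiB")   # ~48 GB GDDR6
print(f"SM count:           {props.multi_processor_count}")            # 142 SMs -> 18,176 CUDA cores
```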
  • The Ada Lovelace architecture includes new streaming multiprocessors, fourth-generation Tensor Cores, third-generation RT Cores, and 91.6 TFLOPS of FP32 performance.
  • Experience the power of generative AI, LLM training, and inference through the FP8 Transformer Engine, tensor performance exceeding 1.5 petaflops*, and a large L2 cache.
  • Unleash unparalleled 3D graphics and rendering with 212 TFLOPS of RT Core performance, DLSS 3.0 AI frame generation, and Shader Execution Reordering (SER).
  • Accelerate multimedia with 3 encode and 3 decode engines, 4 JPEG decoders, and support for AV1 encoding and decoding.
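
The Tensor Core numbers in the table are peak figures; achieved throughput can be checked with a simple matrix-multiply micro-benchmark. A minimal sketch in PyTorch using BF16 (FP8 normally requires NVIDIA's Transformer Engine library); the matrix size and iteration count are arbitrary choices, and measured throughput will sit below the 362 TFLOPS dense BF16 peak.

```python
# Minimal BF16 Tensor Core throughput sketch (PyTorch with CUDA assumed).
import torch

n = 8192
a = torch.randn(n, n, device="cuda", dtype=torch.bfloat16)
b = torch.randn(n, n, device="cuda", dtype=torch.bfloat16)

# Warm up, then time with CUDA events for a GPU-side measurement.
for _ in range(3):
    torch.matmul(a, b)
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
iters = 20
start.record()
for _ in range(iters):
    torch.matmul(a, b)
end.record()
torch.cuda.synchronize()

seconds = start.elapsed_time(end) / 1000          # elapsed_time() returns milliseconds
tflops = 2 * n**3 * iters / seconds / 1e12        # 2*n^3 FLOPs per n x n matmul
print(f"Achieved BF16 matmul throughput: {tflops:.0f} TFLOPS (peak dense BF16: 362 TFLOPS)")
```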

Why NVIDIA L40S - Key Features

  • Impressive performance: for LLM workloads, better performance than even the HGX A100 in many scenarios, including GPT-170B-class models, except for large-scale training from scratch.
  • Ideal for using pre-trained foundation models from NVIDIA or open-source models and for fine-tuning. Better availability (shorter lead times; available from September).
  • Includes graphics and robust multimedia engines (not available on the A100/H100).
  • 20-25% better price than the A100 (a rough perf-per-dollar estimate follows this list).
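
A rough perf-per-dollar estimate that follows directly from the claims above. The price ratios and the assumption of at-least-equal LLM inference throughput come from this list, not from official pricing:

```python
# Back-of-envelope perf-per-dollar estimate based on the claims above.
# Assumptions (not official figures): L40S price is 20-25% below the A100,
# and LLM inference throughput is at least on par with the A100.
price_ratio_low, price_ratio_high = 0.75, 0.80    # L40S price / A100 price
perf_ratio = 1.0                                  # conservative: equal inference throughput

gain_low = perf_ratio / price_ratio_high          # ~1.25x
gain_high = perf_ratio / price_ratio_low          # ~1.33x
print(f"Perf/$ advantage vs. A100: {gain_low:.2f}x - {gain_high:.2f}x")
```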

Key questions for customers considering the L40S instead of the H100 or A100:

  1. What is the workload?
    • Are you working with generative AI / large language models (LLMs)? Are you training a large model from scratch on a massive dataset, or fine-tuning a pre-trained model?
    • Is most of your inference based on pre-trained models?
    • Are you planning to run HPC workloads such as scientific/engineering simulations? Is FP64 precision important?
    • Does your workload involve graphics, video encoding/decoding/transcoding?
    • Will these be edge applications?
  2. What are the relevant benchmarks for the workload?
  3. What is the scale? How many GPUs are needed?
    • For example, 4,000 L40S GPUs at FP8 precision can fully train GPT-170B on 300B tokens in less than 4 days, faster and cheaper than the HGX A100 (a back-of-envelope check follows this list).
  4. Any specific technical specifications or bottlenecks, e.g., GPU memory, memory bandwidth, GPU interconnect, or latency?
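
A back-of-envelope check of the example in point 3, using the standard ~6 x parameters x tokens FLOP approximation for transformer training. The 35% model FLOPs utilization (MFU) is an assumed value, not a measured one:

```python
# Rough sanity check of the "4,000 L40S, GPT-170B, 300B tokens, < 4 days" claim.
params = 170e9          # GPT-170B parameters
tokens = 300e9          # training tokens
gpus = 4000             # cluster size
peak_fp8 = 733e12       # dense FP8 FLOP/s per L40S (from the spec table)
mfu = 0.35              # assumed model FLOPs utilization

total_flops = 6 * params * tokens                 # ~3.1e23 FLOPs
cluster_flops_per_s = gpus * peak_fp8 * mfu       # ~1.0e18 FLOP/s
days = total_flops / cluster_flops_per_s / 86400
print(f"Estimated training time: {days:.1f} days")  # ~3.5 days, consistent with the claim
```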

Important:

  • The NVIDIA L40S does not support NVLink; multi-GPU communication runs over PCIe Gen4 (see the sketch below).
  • The NVIDIA L40S is roughly 15% cheaper than the A100.
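
Because there is no NVLink, any GPU-to-GPU traffic goes over PCIe Gen4 x16 (64 GB/s bidirectional per the spec table). A minimal sketch assuming PyTorch with CUDA and at least two L40S cards, showing how to check peer-to-peer access:

```python
# Check peer-to-peer access between two cards; without NVLink this path runs over PCIe.
import torch

if torch.cuda.device_count() >= 2:
    p2p = torch.cuda.can_device_access_peer(0, 1)
    print(f"GPU0 -> GPU1 peer access: {p2p}")

# With no NVLink, inter-GPU bandwidth is bounded by PCIe Gen4 x16
# (~64 GB/s bidirectional), which matters for communication-heavy multi-GPU training.
pcie_gen4_x16_bidir_gb_s = 64
print(f"PCIe Gen4 x16 bidirectional bandwidth: ~{pcie_gen4_x16_bidir_gb_s} GB/s")
```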

Related Pages

  1. Supermicro servers dedicated to the NVIDIA L40S
  2. Gigabyte servers dedicated to the NVIDIA L40S
  3. ChatGPT New Liquid-Cooled Workstations: Supermicro SYS-551A-T and Supermicro SYS-751GE-TNRT-NV1 Designed for AI
  4. New GIGABYTE G363-SR0 and G593-SD2 Servers for AI and HPC
  5. Artificial Intelligence (AI) ChatGPT, Bing, Bard - part 1
  6. Supermicro GPU Platforms