Instant Clusters

You can now deploy high-performance GPU training clusters with an InfiniBand interconnect from the Verda Cloud Console, the same way you would deploy a single GPU instance.

The only available contract length is Pay As You Go.

Instant clusters are available with either NVIDIA B200 SXM6 GPUs or NVIDIA H200 SXM5 GPUs, a 3.2 Tb/s InfiniBand interconnect per node (eight 400 Gb/s links), and a 100 Gb/s Ethernet network. The uplink to the Internet is a symmetric 2 Gb/s.
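The per-node InfiniBand figure is simple multiplication (8 × 400 Gb/s = 3,200 Gb/s = 3.2 Tb/s). The short Python sketch below makes that arithmetic explicit, converts it to bytes, and estimates a best-case transfer over the Internet uplink. The 100 GB dataset size is a made-up example, and the calculation ignores protocol overhead, so real-world throughput will be lower.

```python
# Link rates as stated for instant clusters (Gb/s = gigabits per second).
IB_LINKS_PER_NODE = 8        # eight InfiniBand links per node
IB_LINK_GBPS = 400           # 400 Gb/s per InfiniBand link
ETHERNET_GBPS = 100          # Ethernet network rate
INTERNET_UPLINK_GBPS = 2     # symmetric Internet uplink


def ib_bandwidth_per_node_gbps(links: int = IB_LINKS_PER_NODE,
                               link_gbps: int = IB_LINK_GBPS) -> int:
    """Aggregate InfiniBand bandwidth of one node, in Gb/s."""
    return links * link_gbps


def gbps_to_gbytes(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


def transfer_seconds(size_gbytes: float, link_gbps: float) -> float:
    """Best-case time to move `size_gbytes` GB over a link running at line rate."""
    return size_gbytes * 8 / link_gbps


if __name__ == "__main__":
    per_node = ib_bandwidth_per_node_gbps()
    print(f"InfiniBand per node: {per_node} Gb/s "
          f"({per_node / 1000} Tb/s, {gbps_to_gbytes(per_node):.0f} GB/s)")
    print(f"Ratio to Ethernet: {per_node // ETHERNET_GBPS}x")
    # Hypothetical 100 GB dataset pulled in over the Internet uplink.
    print(f"100 GB over the Internet uplink: "
          f"{transfer_seconds(100, INTERNET_UPLINK_GBPS):.0f} s at best")
```

Each node's InfiniBand bandwidth (400 GB/s) is 32 times the 100 Gb/s Ethernet rate, which is why multi-node training traffic is expected to run over InfiniBand rather than Ethernet.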

Our instant clusters range from 16 to 128 GPUs. Each cluster has up to 16 worker nodes with 8 GPUs each, plus one jump host. Each worker node has local NVMe storage and access to a configurable shared filesystem with up to 50 TB of storage.
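With 8 GPUs per worker node, the GPU count determines the number of worker nodes (16 GPUs means 2 workers, 128 GPUs means 16 workers), and the single jump host comes on top. The Python sketch below does that sizing. It assumes any multiple of 8 GPUs within the documented 16 to 128 range is valid; the console may only offer specific sizes, and the function names are hypothetical.

```python
GPUS_PER_NODE = 8            # GPUs per worker node
MIN_GPUS, MAX_GPUS = 16, 128 # documented cluster size range
JUMP_HOSTS = 1               # each cluster has one jump host


def worker_nodes_for(gpus: int) -> int:
    """Number of 8-GPU worker nodes for a cluster of `gpus` GPUs.

    Assumption: any multiple of 8 within 16-128 GPUs can be requested.
    """
    if not MIN_GPUS <= gpus <= MAX_GPUS:
        raise ValueError(f"cluster size must be {MIN_GPUS}-{MAX_GPUS} GPUs, got {gpus}")
    if gpus % GPUS_PER_NODE:
        raise ValueError(f"cluster size must be a multiple of {GPUS_PER_NODE}, got {gpus}")
    return gpus // GPUS_PER_NODE


def total_machines(gpus: int) -> int:
    """Worker nodes plus the jump host."""
    return worker_nodes_for(gpus) + JUMP_HOSTS


if __name__ == "__main__":
    for size in (16, 32, 64, 128):
        print(f"{size:>3} GPUs -> {worker_nodes_for(size):>2} worker nodes, "
              f"{total_machines(size):>2} machines incl. jump host")
```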

Clusters come with Slurm pre-installed for job management and a Grafana dashboard for monitoring and alerts. NVIDIA B200 instant clusters are currently available in the FIN-03 location, and H200 instant clusters in the ICE-01 location.
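As an illustration of multi-node job submission with Slurm, the Python sketch below generates a minimal batch script that requests whole 8-GPU nodes and launches one task per GPU with `srun`. The `#SBATCH` options used (`--job-name`, `--nodes`, `--ntasks-per-node`, `--gpus-per-node`, `--time`) are standard Slurm options, but whether these clusters expose GPUs through GRES, which partition to use, and the `train.py` command are assumptions; the helper name is hypothetical.

```python
def make_sbatch_script(job_name: str, nodes: int, command: str,
                       gpus_per_node: int = 8, time_limit: str = "01:00:00") -> str:
    """Build a minimal multi-node Slurm batch script with one task per GPU.

    Assumes GPUs are exposed to Slurm as a GRES so --gpus-per-node applies.
    """
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={gpus_per_node}",  # one process per GPU
        f"#SBATCH --gpus-per-node={gpus_per_node}",    # request all GPUs on each node
        f"#SBATCH --time={time_limit}",
        "",
        f"srun {command}",                             # launch the command on every task
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical 16-GPU job: 2 worker nodes x 8 GPUs running train.py.
    print(make_sbatch_script("train", nodes=2, command="python train.py"), end="")
```

Saving the output as, for example, `train.sbatch` and running `sbatch train.sbatch` would submit it, presumably from the jump host if it acts as the Slurm login node.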

View more:

Deploying an Instant cluster

Slurm

Environments

Containers

Monitoring

Good to know
