Snapshot and migrate your workloads across instances

Command your compute

Why Cedana? 

Reduce compute costs up to 80%

Eliminate idle compute. Automatically suspend and resume workloads based on activity, and automatically bin-pack containers to free up resources.

Unprecedented reliability

Upon hardware or out-of-memory (OOM) failure, workloads automatically resume on a new instance without losing work.

Reduce latency 2-10x

Accelerate cold starts and time to first token by resuming your CPU/GPU workload from its previous state, eliminating boot time, initialization, and other startup steps.

Use Cases

Seamless stateful workload management for Kubernetes.

  • Native Kubernetes integration, seamless at all levels
  • 20-80% lower compute costs
  • 2-10x faster cold starts
  • Zero-downtime OS/HW upgrades
  • Automatic stateful workload failover

Maximize value and reliability with automated GPU orchestration.

  • 20-80% increase in utilization
  • GPU live migration
  • Low-latency, elastic scaling
  • Automatic workload failover
  • Zero-downtime OS/HW upgrades
  • Dynamically resize workloads onto optimal instances without interruption

Give your customers the fastest, lowest-cost inferencing. Deliver enterprise SLAs.

  • 2-10x faster time-to-first token
  • Dynamically resize workloads to optimal instances
  • Automatically reduce idle inferencing time
  • Use spot instances without interruption
  • Faster model hotswapping

Increase the throughput, reliability, and speed of advanced large-model training.

  • Real-time checkpoint/restore of multi-node systems
  • Automatic workload failover preserves in-progress work down to the mini-batch
  • Fully transparent, no code modifications
  • Fine-grained system-level checkpointing
  • 20-80% lower compute costs
  • High availability and reliability, swap in GPUs and nodes on failure

Increase automation, throughput, and reliability of your workflow orchestration

  • Increase workload throughput
  • Automate workflows conditionally based on time and success criteria
  • Reduce redundant compute
  • Reduce manual intervention
  • Implement step-level retries

Orchestrate agent inferencing and training autonomously. Maximize utilization, reliability, and performance.

  • Increase GPU utilization with efficient hot swapping and bin-packing
  • Dynamic scaling for
    • Larger models
    • Increasing task complexity, context windows, and agent counts
    • Variable workload demands
  • Persistent agent state

Improve the performance and reliability of your gaming infrastructure

  • Reduce latency by migrating workloads to player geographies
  • Load balance workloads to eliminate resource bottlenecks
  • Automated workload failover
  • Zero-downtime OS/HW upgrades
  • Reduce costs 20-80%

Increase automation, throughput, and reliability of your HPC workloads.

  • Never lose work on long-running workloads
  • Schedule, queue, and prioritize workloads across users and groups dynamically
  • 20-80% lower compute costs
  • Increase workload throughput
  • Automate workflows conditionally based on time and success criteria

Transform database deployment and operations with zero downtime migration

  • Live migration of in-memory databases
  • Zero-downtime OS/HW upgrades
  • Dynamically resize workloads to optimal instances
  • Eliminate over-provisioning
  • Automatically reduce idle compute

"We reduced our cloud costs 50% by integrating Cedana's Save, Migrate, and Resume capability into our product. If an instance fails, workloads automatically continue without losing work."
Debo Ray
CEO, DevZero

Easy Integration

Use Cedana's REST API to checkpoint your application’s state, transfer it to a new instance, cloud, or resource, and resume operations. No code modifications needed.
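The checkpoint-transfer-resume flow can be sketched as a pair of REST calls. The host, endpoint paths (`/checkpoint`, `/restore`), and payload fields below are illustrative assumptions, not Cedana's actual API; consult the API reference for the real routes:

```python
import json
import urllib.request

# Hypothetical daemon address -- an assumption for illustration.
BASE_URL = "http://localhost:8080"

def checkpoint_request(job_id: str, dump_dir: str) -> urllib.request.Request:
    """Build a POST asking the daemon to checkpoint a running job."""
    body = json.dumps({"job_id": job_id, "dir": dump_dir}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/checkpoint",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def restore_request(checkpoint_path: str) -> urllib.request.Request:
    """Build a POST asking the daemon to resume a job from a checkpoint."""
    body = json.dumps({"path": checkpoint_path}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/restore",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Typical flow: checkpoint on the old instance, copy the dump to the
# new instance, then restore there:
#   urllib.request.urlopen(checkpoint_request("job-1", "/tmp/ckpt"))
#   ... transfer /tmp/ckpt to the new host ...
#   urllib.request.urlopen(restore_request("/tmp/ckpt"))
```

Because the checkpoint captures full process state, the application resumes exactly where it left off; no application-level save/restore code is involved.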

Get started

Play in the sandbox

We’ve deployed a test cluster where you can interact with and experiment with the system.

Sandbox

Get a demo

Learn more about how Cedana is transforming compute orchestration and how we can help your organization.

Connect

API Reference & Guides

From deploying on your cluster, to market, to GPU checkpointing, learn our system and get started quickly.

VIEW DOCS
Backers / Partners