Batch Jobs

What are batch jobs?

Batch jobs are an autoscaling Containers feature for long-running, one-off work.

Each job gets a dedicated replica. That replica is destroyed as soon as the job finishes.

Why use batch jobs instead of continuous deployments?

When inference takes a long time (typically more than 3 minutes), scaling down is tricky:

  • A high Scale-down delay keeps in-flight requests from being killed, but it also leaves replicas idle and wastes money.

  • A low Scale-down delay can terminate a replica mid-request.

Batch jobs avoid this. They tie replica lifetime to the job lifecycle.

Your app must be able to exit the process to signal completion. Use exit code 0 for success. Use a non-zero code for failure.
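A minimal sketch of this exit-code convention, assuming a Python entrypoint. The names `run_job` and `example_job` are hypothetical and not part of the platform; the only behavior taken from the text above is "exit 0 on success, non-zero on failure".

```python
import logging
import sys
from typing import Callable

logger = logging.getLogger("batch_job")


def run_job(job: Callable[[], None]) -> int:
    """Run `job` and translate its outcome into a process exit code."""
    try:
        job()
    except Exception:
        # Log the traceback, then report failure with a non-zero code.
        logger.exception("job failed")
        return 1
    return 0


def example_job() -> None:
    """Hypothetical placeholder for the real workload."""
    logger.info("doing the work")


def main() -> None:
    # Exiting the process is what signals completion.
    sys.exit(run_job(example_job))
```

Calling `main()` from the container's entrypoint ends the process with 0 or 1 depending on whether `example_job` raised.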

Key differences vs continuous deployments

  • Batch jobs are always async. See Async Inference.

  • Each job has a deadline. When the deadline is reached, the replica is killed even if the job is still running.

  • A job is considered “done” only when your process exits.

Usage and example

This example uses:

When creating the deployment, the batch-job specific settings are:

  • Max concurrent jobs: the maximum number of replicas. The deployment scales to 0 when the queue is empty.

  • Deadline: maximum time a replica can stay up for a job.

1) Start a job

Trigger a job that runs for 10 seconds:

Note: to use a custom job ID, set the X-Inference-Id: <custom-id> header on the request.
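As a rough illustration of attaching the header, the sketch below builds (but does not send) a request with Python's standard library. Only the `X-Inference-Id` header name comes from this page; the function name `build_job_request`, the endpoint URL, and the JSON payload shape are all hypothetical placeholders, so substitute the values from your own deployment.

```python
import json
import urllib.request
from typing import Optional


def build_job_request(
    endpoint_url: str, payload: dict, job_id: Optional[str] = None
) -> urllib.request.Request:
    """Build a POST request for a job, optionally with a custom job ID."""
    headers = {"Content-Type": "application/json"}
    if job_id is not None:
        # Header name taken from the note above.
        headers["X-Inference-Id"] = job_id
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Hypothetical URL and payload, for illustration only.
request = build_job_request(
    "https://example.invalid/my-deployment",
    {"duration_seconds": 10},
    job_id="my-job-001",
)
```

Passing `request` to `urllib.request.urlopen` would send it; authentication and the real endpoint path are not shown here because they depend on your deployment.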

2) Check job status

3) Fetch the result

Best practices

  • Use batch jobs for workloads that usually run longer than ~3 minutes.

  • Exit the process when the job is done (success or failure).

  • If you return an HTTP response, exit after the response is sent.

  • Log heavily. Use DEBUG during development. Use INFO/WARNING in production.
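One way to switch between DEBUG in development and INFO/WARNING in production is to read the level from the environment. This is a sketch only: the `LOG_LEVEL` variable name and the `configure_logging` helper are assumptions, not platform settings.

```python
import logging
import os


def configure_logging(default_level: str = "INFO") -> int:
    """Configure root logging from the LOG_LEVEL env var (assumed name)."""
    level_name = os.environ.get("LOG_LEVEL", default_level).upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        # Unknown names fall back to INFO instead of crashing the job.
        level = logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace any handlers configured earlier
    )
    return level
```

Setting `LOG_LEVEL=DEBUG` in a development deployment and leaving it unset (or `WARNING`) in production then requires no code change.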

Troubleshooting

  • Replica keeps running after the job is done

    • Make sure you actually exit the process.

    • Make sure you exit with the right exit code (0 for success, non-zero for failure).

    • Catch unhandled exceptions. A web framework may turn them into an HTTP error response while keeping the process alive.

  • Replica was killed before the job finished

    • Set Deadline higher than your expected job duration.

  • No response is returned

    • Make sure the process doesn’t exit before sending the response.

    • In FastAPI, exit from a BackgroundTasks task after returning.

    • In Node.js, exit via setImmediate() after writing the response.

  • Replica isn’t accepting jobs

    • Make sure you implement a GET /health endpoint.
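Several of the points above (answer GET /health, send the response before exiting, exit even when the job raises) can be sketched in one small server. This version uses only Python's standard library rather than FastAPI or Node.js; `make_handler`, `serve`, the port, and the POST route are hypothetical. It uses `os._exit` because `sys.exit` called from a worker thread only ends that thread, not the process.

```python
import os
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def make_handler(
    job: Callable[[bytes], bytes],
    exit_fn: Callable[[int], None] = os._exit,
):
    """Build a handler that runs `job` once per POST, replies, then exits."""

    class JobHandler(BaseHTTPRequestHandler):
        def _send(self, status: int, body: bytes) -> None:
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
            self.wfile.flush()

        def do_GET(self) -> None:
            # Health check so the replica can be marked ready.
            if self.path == "/health":
                self._send(200, b"ok")
            else:
                self._send(404, b"not found")

        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length", 0))
            payload = self.rfile.read(length)
            try:
                body, status, code = job(payload), 200, 0
            except Exception:
                # A failed job still has to end the process.
                body, status, code = b"job failed", 500, 1
            self._send(status, body)
            # Exit shortly after the response is written, never before.
            threading.Timer(0.1, exit_fn, args=(code,)).start()

    return JobHandler


def serve(job: Callable[[bytes], bytes], port: int = 8000) -> None:
    """Serve until a job completes and the process exits (port is assumed)."""
    HTTPServer(("0.0.0.0", port), make_handler(job)).serve_forever()
```

The `exit_fn` parameter exists only so the exit can be replaced in tests; in a real container the default `os._exit` ends the process with the job's exit code.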
