Batch Jobs

What are batch jobs?

Batch jobs are an autoscaling Containers feature for long-running, one-off work.

Each job gets a dedicated replica. That replica is destroyed as soon as the job finishes.

Why use batch jobs instead of continuous deployments?

When a single inference runs for a long time (typically more than 3 minutes), scaling down is hard to tune:

A high Scale-down delay keeps in-flight requests from being killed, but it leaves replicas idle and wastes money.
A low Scale-down delay saves money, but it can terminate a replica mid-request.

Batch jobs avoid this trade-off by tying each replica's lifetime to the lifecycle of its job.

Your app must exit its process to signal that the job is complete. Use exit code 0 for success and a non-zero code for failure.
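The exit-code contract can be sketched in Python as follows. This is a minimal illustration, not the example app's actual code: run_job, exit_code_for, and main are hypothetical names, and the placeholder job does no real work. The point is that every outcome, including an unhandled exception, ends with the process exiting.

```python
import sys
import traceback


def run_job() -> None:
    """Hypothetical placeholder for the real workload."""


def exit_code_for(job) -> int:
    """Run the job and map its outcome to a process exit code."""
    try:
        job()
    except Exception:
        # Log the failure, then report it through a non-zero exit code
        # instead of leaving the process alive.
        traceback.print_exc()
        return 1
    return 0


def main() -> None:
    """Entry point: always terminates the process when the job is over."""
    sys.exit(exit_code_for(run_job))
```

Calling main() from the container's entry point ends the process with 0 on success and 1 on any exception.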

Key differences vs continuous deployments

Batch jobs are always async. See Async Inference.
Each job has a deadline. When the deadline is reached, the replica is killed even if the job is still running.
A job is considered “done” only when your process exits.

Usage and example

This example uses:

Source: verda-cloud/batch-jobs-example
Image: ghcr.io/verda-cloud/batch-jobs-example:1.0.1
Exposed port: 8000
Health check path: /health

When creating the deployment, the batch-job specific settings are:

Max concurrent jobs: the maximum number of replicas. The deployment scales to 0 when the queue is empty.
Deadline: maximum time a replica can stay up for a job.

1) Start a job

Trigger a job that runs for 10 seconds:

curl -X POST "https://tasks.datacrunch.io/<DEPLOYMENT_NAME>/job?duration=10" \
  --header "Authorization: Bearer <INFERENCE_TOKEN>"

# Response:
{
  "Id": "632c1e18-85e6-4567-ac15-f04749a51b9e",
  "StatusPath": "/status/<DEPLOYMENT_NAME>",
  "ResultPath": "/result/<DEPLOYMENT_NAME>"
}

To use a custom job ID, add the header X-Inference-Id: <custom-id> to the request.
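The same request can be built from Python with the standard library. The sketch below mirrors the curl call above. The function names are hypothetical, and the /job route with its duration parameter belongs to the example app, so your own app may expose a different path.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://tasks.datacrunch.io"


def build_start_request(
    deployment: str, token: str, duration: int, job_id: Optional[str] = None
) -> urllib.request.Request:
    """Build the POST request that starts a job on the example app."""
    headers = {"Authorization": f"Bearer {token}"}
    if job_id is not None:
        # Optional: supply your own job ID instead of a generated one.
        headers["X-Inference-Id"] = job_id
    url = f"{BASE_URL}/{deployment}/job?duration={duration}"
    return urllib.request.Request(url, method="POST", headers=headers)


def start_job(request: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the first part easy to inspect or log before any network call is made.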

2) Check job status

curl -X GET "https://tasks.datacrunch.io/status/<DEPLOYMENT_NAME>" \
  --header "X-Inference-Id: 632c1e18-85e6-4567-ac15-f04749a51b9e" \
  --header "Authorization: Bearer <INFERENCE_TOKEN>"

# Response:
{
  "Id": "632c1e18-85e6-4567-ac15-f04749a51b9e",
  "Status": "Queue"
}
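Because jobs are asynchronous, a client usually polls the status endpoint. The sketch below assumes only what the response above shows: a "Status" field whose value is "Queue" while the job waits for a replica. Other status values are not listed here, so the helper simply returns the first status that is not "Queue". Function names, the polling interval, and the poll limit are illustrative choices.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://tasks.datacrunch.io"


def fetch_status(deployment: str, job_id: str, token: str) -> dict:
    """GET /status/<deployment> for one job, mirroring the curl call above."""
    request = urllib.request.Request(
        f"{BASE_URL}/status/{deployment}",
        headers={"X-Inference-Id": job_id, "Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_until_dequeued(
    get_status: Callable[[], dict],
    interval_s: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job's status is no longer "Queue" and return it."""
    for _ in range(max_polls):
        status = get_status()["Status"]
        if status != "Queue":
            return status
        sleep(interval_s)
    raise TimeoutError("job was still queued after the last poll")
```

A call such as wait_until_dequeued(lambda: fetch_status(name, job_id, token)) ties the two helpers together.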

3) Fetch the result

curl -X GET "https://tasks.datacrunch.io/result/<DEPLOYMENT_NAME>" \
  --header "X-Inference-Id: 632c1e18-85e6-4567-ac15-f04749a51b9e" \
  --header "Authorization: Bearer <INFERENCE_TOKEN>"

# Response (example app payload; your app can return anything):
{
  "success": true,
  "message": "Job completed successfully",
  "executionTime": 5,
  "timestamp": "2025-11-06 11:03:03"
}

Best practices

Use batch jobs for workloads that usually run longer than ~3 minutes.
Exit the process when the job is done (success or failure).
If you return an HTTP response, exit after the response is sent.
Log heavily. Use DEBUG during development. Use INFO/WARNING in production.

Troubleshooting

Replica keeps running after the job is done
- Make sure your app actually exits the process.
- Make sure it exits with the right status code (0 for success, non-zero for failure).
- An unhandled exception may return an HTTP error but leave the process alive. Catch it and exit with a non-zero code.

Replica was killed before the job finished
- Set Deadline higher than your expected job duration.

No response is returned
- Make sure the process doesn’t exit before the response has been sent.
- In FastAPI, exit from a BackgroundTasks task after returning.
- In Node.js, exit via setImmediate() after writing the response.

Replica isn’t accepting jobs
- Make sure your app implements a GET /health endpoint.
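Several of these fixes can be combined in one small server. The sketch below uses Python's standard http.server rather than FastAPI or Node.js, so it illustrates the pattern and is not the example app's code. It answers GET /health, treats every POST as a job, sends the response first, and only then schedules the exit. The names make_handler and serve, the 0.1 second delay, and the JSON payloads are assumptions.

```python
import json
import os
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_handler(run_job, exit_process=os._exit):
    """Build a handler that serves /health and exits after each job."""

    class JobHandler(BaseHTTPRequestHandler):
        def _send_json(self, status, payload):
            body = json.dumps(payload).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
            self.wfile.flush()

        def do_GET(self):
            if self.path == "/health":
                self._send_json(200, {"status": "ok"})
            else:
                self._send_json(404, {"error": "not found"})

        def do_POST(self):
            # Drain the request body so the connection closes cleanly.
            self.rfile.read(int(self.headers.get("Content-Length", 0)))
            try:
                status, payload, code = 200, run_job(), 0
            except Exception as exc:
                # A failed job still ends the process, with a non-zero code.
                status, payload, code = 500, {"success": False, "message": str(exc)}, 1
            self._send_json(status, payload)
            # Exit shortly after the response is written, never before it.
            threading.Timer(0.1, exit_process, args=(code,)).start()

    return JobHandler


def serve(run_job, port=8000):
    """Serve on the exposed port until a job ends the process."""
    ThreadingHTTPServer(("0.0.0.0", port), make_handler(run_job)).serve_forever()
```

Passing exit_process as a parameter lets the handler be exercised locally without killing the process that runs it.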
