API Reference

Public Inference API

Submit ONNX inference requests and retrieve results through a simple, synchronous REST API. Authenticate with your API key and start running inference in minutes.

Base URL: https://infinite-gpu-backend-bvh8a7c3fdgxd7c5.canadacentral-01.azurewebsites.net
Content-Type: application/json

Authentication

Every request must include your API key in the X-Api-Key header. Your API key is generated when you register and can be found in the task creation dialog when selecting "Fill by API" mode.

Request header
// Include in every request
X-Api-Key: pk-xxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

⚠ Security

Keep your API key secret. Do not expose it in client-side code or public repositories. Rotate your key from the console if you suspect it has been compromised.
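One simple way to keep the key out of source code is to load it from an environment variable. A minimal Python sketch (the variable name INFINITE_GPU_API_KEY is illustrative, not required by the API):

```python
import os

# Load the API key from the environment instead of hardcoding it.
# INFINITE_GPU_API_KEY is an example name; use whatever your deployment defines.
API_KEY = os.environ.get("INFINITE_GPU_API_KEY", "")

# Headers reused by every request to the API.
HEADERS = {
    "Content-Type": "application/json",
    "X-Api-Key": API_KEY,
}
```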

POST

Submit Inference

Submit an inference request to an existing task configured with "Fill by API" mode. The request blocks until the inference completes (up to 180 seconds) and then returns the result synchronously. If the subtask has not finished when the window elapses, the response comes back with state set to pending; re-check it later via the Get Inference Result endpoint.

POST /api/inference/tasks/{taskId}

Path Parameters

taskId (GUID)
    The ID of the task created with fillBindingsViaApi: true.

Request Body

bindings (array, required)
    Array of input tensor bindings to feed into the ONNX graph.
bindings[].tensorName (string, required)
    Name of the input tensor as defined in the ONNX model.
bindings[].payloadType (string, required)
    One of: Json, Text, Binary.
bindings[].payload (string or null, conditional)
    Inline payload data. Required for the Json and Text payload types.
bindings[].fileUrl (string or null, conditional)
    URL pointing to a binary file. Required for the Binary payload type.
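Note that for a Json binding, payload is a string containing serialized JSON, not a nested JSON object (see the request examples below, where the arrays are quoted). A short Python sketch of building a request body:

```python
import json

# For a Json binding, "payload" must be a *string* of serialized JSON,
# not a nested object - serialize the value with json.dumps first.
input_ids = [[101, 2054, 2003, 102]]

binding = {
    "tensorName": "input_ids",
    "payloadType": "Json",
    "payload": json.dumps(input_ids),
    "fileUrl": None,
}

body = {"bindings": [binding]}
```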

Request Example

cURL
curl -X POST https://infinite-gpu-backend-bvh8a7c3fdgxd7c5.canadacentral-01.azurewebsites.net/api/inference/tasks/{taskId} \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: pk-xxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
  -d '{
  "bindings": [
    {
      "tensorName": "input_ids",
      "payloadType": "Json",
      "payload": "[[101, 2054, 2003, 1996, 3007, 1997, 2605, 102]]",
      "fileUrl": null
    },
    {
      "tensorName": "attention_mask",
      "payloadType": "Json",
      "payload": "[[1, 1, 1, 1, 1, 1, 1, 1]]",
      "fileUrl": null
    }
  ]
}'

Binary Payload Example

cURL – binary tensor via URL
curl -X POST https://infinite-gpu-backend-bvh8a7c3fdgxd7c5.canadacentral-01.azurewebsites.net/api/inference/tasks/{taskId} \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: pk-xxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
  -d '{
  "bindings": [
    {
      "tensorName": "image",
      "payloadType": "Binary",
      "payload": null,
      "fileUrl": "https://your-storage.blob.core.windows.net/inputs/image.npy"
    }
  ]
}'

Response

The endpoint blocks until inference completes or times out (180s). The response always includes:

id (GUID)
    Subtask identifier for this inference run.
state (string)
    One of: success, failed, pending.
data (object or null)
    Inference results. Present when state is success.
error (string or null)
    Error message. Present when state is failed.
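A small Python sketch of branching on the three states (the helper name is illustrative):

```python
def handle_result(result: dict):
    """Dispatch on the subtask state returned by the API."""
    state = result["state"]
    if state == "success":
        return result["data"]
    if state == "failed":
        raise RuntimeError(result["error"] or "inference failed")
    # "pending": the subtask is still running; keep result["id"] and
    # re-check it via GET /api/inference/subtasks/{subtaskId}.
    return None
```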

Response Example (Success)

200 OK
{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "state": "success",
  "data": {
    "output_logits": [[0.12, -0.34, 0.98, 1.45, ...]]
  },
  "error": null
}

Response Example (Error)

200 OK (with failed state)
{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "state": "failed",
  "data": null,
  "error": "Subtask failed to execute."
}

HTTP Status Codes

Code Meaning
200 Inference completed (check state for result)
400 Invalid request body or task not configured for API bindings
401 Missing or invalid API key
404 Task not found
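Since only a 200 response carries a result body, it can be convenient to translate the other codes into errors before parsing. A minimal Python sketch (the function name and messages are illustrative):

```python
def check_status(status_code: int) -> None:
    """Raise a descriptive error for any non-200 response."""
    messages = {
        400: "Invalid request body or task not configured for API bindings",
        401: "Missing or invalid API key",
        404: "Task not found",
    }
    if status_code != 200:
        raise RuntimeError(messages.get(status_code, f"Unexpected status {status_code}"))
```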

GET

Get Inference Result

Retrieve the result of a previously submitted inference subtask. Useful if you need to re-check a result or if the original submit request timed out while the subtask was still pending.

GET /api/inference/subtasks/{subtaskId}

Path Parameters

subtaskId (GUID)
    The subtask id returned from the Submit Inference endpoint.

Request Example

cURL
curl https://infinite-gpu-backend-bvh8a7c3fdgxd7c5.canadacentral-01.azurewebsites.net/api/inference/subtasks/{subtaskId} \
  -H "X-Api-Key: pk-xxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

Response

Returns the same response structure as the Submit Inference endpoint.

200 OK
{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "state": "success",
  "data": { ... },
  "error": null
}
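If a submit request came back with state pending, you can poll this endpoint until the subtask reaches a terminal state. A Python sketch; the HTTP call is injected as a callable so the loop can be exercised without a live endpoint, and the interval and attempt limit are illustrative defaults:

```python
import time

def wait_for_result(fetch, subtask_id, interval=2.0, max_attempts=30):
    """Poll for a subtask result until it is no longer pending.

    `fetch` is any callable that maps a subtask id to the parsed JSON
    response, e.g. a wrapper around GET /api/inference/subtasks/{subtaskId}.
    """
    for _ in range(max_attempts):
        result = fetch(subtask_id)
        if result["state"] in ("success", "failed"):
            return result
        time.sleep(interval)  # still pending; wait before the next poll
    raise TimeoutError(f"subtask {subtask_id} still pending after {max_attempts} polls")
```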

Code Examples

Complete integration examples in popular languages.

Python

inference.py
import requests

API_KEY  = "pk-xxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
BASE_URL = "https://infinite-gpu-backend-bvh8a7c3fdgxd7c5.canadacentral-01.azurewebsites.net"
TASK_ID  = "your-task-id-here"

# Submit inference
response = requests.post(
    f"{BASE_URL}/api/inference/tasks/{TASK_ID}",
    headers={
        "Content-Type": "application/json",
        "X-Api-Key": API_KEY,
    },
    json={
        "bindings": [
            {
                "tensorName": "input_ids",
                "payloadType": "Json",
                "payload": "[[101, 2054, 2003, 102]]",
                "fileUrl": None,
            }
        ]
    },
    timeout=200,  # the server holds the request for up to 180 s
)

result = response.json()
print(result["state"])  # "success", "failed", or "pending"
print(result["data"])   # inference output

# Re-check result later
subtask_id = result["id"]
poll = requests.get(
    f"{BASE_URL}/api/inference/subtasks/{subtask_id}",
    headers={"X-Api-Key": API_KEY},
)
print(poll.json())

JavaScript / Node.js

inference.js
const API_KEY  = "pk-xxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
const BASE_URL = "https://infinite-gpu-backend-bvh8a7c3fdgxd7c5.canadacentral-01.azurewebsites.net";
const TASK_ID  = "your-task-id-here";

// Submit inference
const response = await fetch(
  `${BASE_URL}/api/inference/tasks/${TASK_ID}`,
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-Api-Key": API_KEY,
    },
    body: JSON.stringify({
      bindings: [
        {
          tensorName: "input_ids",
          payloadType: "Json",
          payload: "[[101, 2054, 2003, 102]]",
          fileUrl: null,
        },
      ],
    }),
  }
);

const result = await response.json();
console.log(result.state); // "success", "failed", or "pending"
console.log(result.data);  // inference output

// Re-check result later
const poll = await fetch(
  `${BASE_URL}/api/inference/subtasks/${result.id}`,
  { headers: { "X-Api-Key": API_KEY } }
);
console.log(await poll.json());
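The examples above submit once and re-check once. As one way to combine the two endpoints, here is a Python sketch that submits and then keeps polling while the state is pending. The session argument accepts any object with requests-style post/get methods (such as requests.Session()) and is injected so the helper is easy to exercise without a live endpoint; the poll interval and limit are illustrative defaults:

```python
import time

BASE_URL = "https://infinite-gpu-backend-bvh8a7c3fdgxd7c5.canadacentral-01.azurewebsites.net"

def submit_and_wait(session, api_key, task_id, bindings, poll_interval=2.0, max_polls=30):
    """Submit an inference request, then poll until a terminal state."""
    headers = {"Content-Type": "application/json", "X-Api-Key": api_key}
    resp = session.post(
        f"{BASE_URL}/api/inference/tasks/{task_id}",
        headers=headers,
        json={"bindings": bindings},
    )
    result = resp.json()
    # Re-check via GET while the subtask is still pending.
    while result["state"] == "pending" and max_polls > 0:
        time.sleep(poll_interval)
        resp = session.get(
            f"{BASE_URL}/api/inference/subtasks/{result['id']}",
            headers={"X-Api-Key": api_key},
        )
        result = resp.json()
        max_polls -= 1
    return result
```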