API Reference

Public Inference API

Submit ONNX inference requests and retrieve results through a simple, synchronous REST API. Authenticate with your API key and start running inference in minutes.

Base URL: https://infinite-gpu-backend-bvh8a7c3fdgxd7c5.canadacentral-01.azurewebsites.net
Content-Type: application/json

Authentication

Every request must include your API key in the X-Api-Key header. Your API key is generated when you register and can be found in the task creation dialog when selecting "Fill by API" mode.

Request header
// Include in every request
X-Api-Key: pk-xxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

⚠ Security

Keep your API key secret. Do not expose it in client-side code or public repositories. Rotate your key from the console if you suspect it has been compromised.
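One simple way to keep the key out of source code is to load it from an environment variable. A minimal Python sketch (the variable name INFINITE_GPU_API_KEY is illustrative, not required by the API):

```python
import os

# Load the API key from the environment instead of hardcoding it.
# INFINITE_GPU_API_KEY is an example name; use whatever your deployment defines.
API_KEY = os.environ.get("INFINITE_GPU_API_KEY", "")

# Headers reused by every request to the API.
HEADERS = {
    "Content-Type": "application/json",
    "X-Api-Key": API_KEY,
}
```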

POST

Submit Inference

Submit an inference request to an existing task configured with "Fill by API" mode. The request blocks until the inference completes (up to 180 seconds) and then returns the result synchronously. If the subtask has not finished when the window elapses, the response comes back with state set to pending; re-check it later via the Get Inference Result endpoint.

POST /api/inference/tasks/{taskId}

Path Parameters

taskId (GUID)
    The ID of the task created with fillBindingsViaApi: true.

Request Body

bindings (array, required)
    Array of input tensor bindings to feed into the ONNX graph.
bindings[].tensorName (string, required)
    Name of the input tensor as defined in the ONNX model.
bindings[].payloadType (string, required)
    One of: Json, Text, Binary.
bindings[].payload (string or null, conditional)
    Inline payload data. Required for the Json and Text payload types.
bindings[].fileUrl (string or null, conditional)
    URL pointing to a binary file. Required for the Binary payload type.
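Note that for a Json binding, payload is a string containing serialized JSON, not a nested JSON object (see the request examples below, where the arrays are quoted). A short Python sketch of building a request body:

```python
import json

# For a Json binding, "payload" must be a *string* of serialized JSON,
# not a nested object - serialize the value with json.dumps first.
input_ids = [[101, 2054, 2003, 102]]

binding = {
    "tensorName": "input_ids",
    "payloadType": "Json",
    "payload": json.dumps(input_ids),
    "fileUrl": None,
}

body = {"bindings": [binding]}
```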

Request Example

cURL
curl -X POST https://infinite-gpu-backend-bvh8a7c3fdgxd7c5.canadacentral-01.azurewebsites.net/api/inference/tasks/{taskId} \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: pk-xxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
  -d '{
  "bindings": [
    {
      "tensorName": "input_ids",
      "payloadType": "Json",
      "payload": "[[101, 2054, 2003, 1996, 3007, 1997, 2605, 102]]",
      "fileUrl": null
    },
    {
      "tensorName": "attention_mask",
      "payloadType": "Json",
      "payload": "[[1, 1, 1, 1, 1, 1, 1, 1]]",
      "fileUrl": null
    }
  ]
}'

Binary Payload Example

cURL – binary tensor via URL
curl -X POST https://infinite-gpu-backend-bvh8a7c3fdgxd7c5.canadacentral-01.azurewebsites.net/api/inference/tasks/{taskId} \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: pk-xxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
  -d '{
  "bindings": [
    {
      "tensorName": "image",
      "payloadType": "Binary",
      "payload": null,
      "fileUrl": "https://your-storage.blob.core.windows.net/inputs/image.npy"
    }
  ]
}'

Response

The endpoint blocks until inference completes or times out (180s). The response always includes:

id (GUID)
    Subtask identifier for this inference run.
state (string)
    One of: success, failed, pending.
data (object or null)
    Inference results. Present when state is success.
error (string or null)
    Error message. Present when state is failed.
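A small Python sketch of branching on the three states (the helper name is illustrative):

```python
def handle_result(result: dict):
    """Dispatch on the subtask state returned by the API."""
    state = result["state"]
    if state == "success":
        return result["data"]
    if state == "failed":
        raise RuntimeError(result["error"] or "inference failed")
    # "pending": the subtask is still running; keep result["id"] and
    # re-check it via GET /api/inference/subtasks/{subtaskId}.
    return None
```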

Response Example (Success)

200 OK
{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "state": "success",
  "data": {
    "output_logits": [[0.12, -0.34, 0.98, 1.45, ...]]
  },
  "error": null
}

Response Example (Error)

200 OK (with failed state)
{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "state": "failed",
  "data": null,
  "error": "Subtask failed to execute."
}

HTTP Status Codes

Code Meaning
200 Inference completed (check state for result)
400 Invalid request body or task not configured for API bindings
401 Missing or invalid API key
404 Task not found
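Since only a 200 response carries a result body, it can be convenient to translate the other codes into errors before parsing. A minimal Python sketch (the function name and messages are illustrative):

```python
def check_status(status_code: int) -> None:
    """Raise a descriptive error for any non-200 response."""
    messages = {
        400: "Invalid request body or task not configured for API bindings",
        401: "Missing or invalid API key",
        404: "Task not found",
    }
    if status_code != 200:
        raise RuntimeError(messages.get(status_code, f"Unexpected status {status_code}"))
```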

GET

Get Inference Result

Retrieve the result of a previously submitted inference subtask. Useful if you need to re-check a result or if the original submit request timed out while the subtask was still pending.

GET /api/inference/subtasks/{subtaskId}

Path Parameters

subtaskId (GUID)
    The subtask id returned from the Submit Inference endpoint.

Request Example

cURL
curl https://infinite-gpu-backend-bvh8a7c3fdgxd7c5.canadacentral-01.azurewebsites.net/api/inference/subtasks/{subtaskId} \
  -H "X-Api-Key: pk-xxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

Response

Returns the same response structure as the Submit Inference endpoint.

200 OK
{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "state": "success",
  "data": { ... },
  "error": null
}
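If a submit request came back with state pending, you can poll this endpoint until the subtask reaches a terminal state. A Python sketch; the HTTP call is injected as a callable so the loop can be exercised without a live endpoint, and the interval and attempt limit are illustrative defaults:

```python
import time

def wait_for_result(fetch, subtask_id, interval=2.0, max_attempts=30):
    """Poll for a subtask result until it is no longer pending.

    `fetch` is any callable that maps a subtask id to the parsed JSON
    response, e.g. a wrapper around GET /api/inference/subtasks/{subtaskId}.
    """
    for _ in range(max_attempts):
        result = fetch(subtask_id)
        if result["state"] in ("success", "failed"):
            return result
        time.sleep(interval)  # still pending; wait before the next poll
    raise TimeoutError(f"subtask {subtask_id} still pending after {max_attempts} polls")
```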

Code Examples

Complete integration examples in popular languages.

Python

inference.py
import requests

API_KEY  = "pk-xxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
BASE_URL = "https://infinite-gpu-backend-bvh8a7c3fdgxd7c5.canadacentral-01.azurewebsites.net"
TASK_ID  = "your-task-id-here"

# Submit inference
response = requests.post(
    f"{BASE_URL}/api/inference/tasks/{TASK_ID}",
    headers={
        "Content-Type": "application/json",
        "X-Api-Key": API_KEY,
    },
    json={
        "bindings": [
            {
                "tensorName": "input_ids",
                "payloadType": "Json",
                "payload": "[[101, 2054, 2003, 102]]",
                "fileUrl": None,
            }
        ]
    },
    timeout=200,  # the server holds the request for up to 180 s
)

result = response.json()
print(result["state"])  # "success", "failed", or "pending"
print(result["data"])   # inference output

# Re-check result later
subtask_id = result["id"]
poll = requests.get(
    f"{BASE_URL}/api/inference/subtasks/{subtask_id}",
    headers={"X-Api-Key": API_KEY},
)
print(poll.json())

JavaScript / Node.js

inference.js
const API_KEY  = "pk-xxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
const BASE_URL = "https://infinite-gpu-backend-bvh8a7c3fdgxd7c5.canadacentral-01.azurewebsites.net";
const TASK_ID  = "your-task-id-here";

// Submit inference
const response = await fetch(
  `${BASE_URL}/api/inference/tasks/${TASK_ID}`,
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-Api-Key": API_KEY,
    },
    body: JSON.stringify({
      bindings: [
        {
          tensorName: "input_ids",
          payloadType: "Json",
          payload: "[[101, 2054, 2003, 102]]",
          fileUrl: null,
        },
      ],
    }),
  }
);

const result = await response.json();
console.log(result.state); // "success", "failed", or "pending"
console.log(result.data);  // inference output

// Re-check result later
const poll = await fetch(
  `${BASE_URL}/api/inference/subtasks/${result.id}`,
  { headers: { "X-Api-Key": API_KEY } }
);
console.log(await poll.json());
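The examples above submit once and re-check once. As one way to combine the two endpoints, here is a Python sketch that submits and then keeps polling while the state is pending. The session argument accepts any object with requests-style post/get methods (such as requests.Session()) and is injected so the helper is easy to exercise without a live endpoint; the poll interval and limit are illustrative defaults:

```python
import time

BASE_URL = "https://infinite-gpu-backend-bvh8a7c3fdgxd7c5.canadacentral-01.azurewebsites.net"

def submit_and_wait(session, api_key, task_id, bindings, poll_interval=2.0, max_polls=30):
    """Submit an inference request, then poll until a terminal state."""
    headers = {"Content-Type": "application/json", "X-Api-Key": api_key}
    resp = session.post(
        f"{BASE_URL}/api/inference/tasks/{task_id}",
        headers=headers,
        json={"bindings": bindings},
    )
    result = resp.json()
    # Re-check via GET while the subtask is still pending.
    while result["state"] == "pending" and max_polls > 0:
        time.sleep(poll_interval)
        resp = session.get(
            f"{BASE_URL}/api/inference/subtasks/{result['id']}",
            headers={"X-Api-Key": api_key},
        )
        result = resp.json()
        max_polls -= 1
    return result
```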