LLM API Documentation
Everything you need to connect OpenClaw, Hermes Agent, or any AI tool to our multi-model LLM API provider.
Base URL
https://banana2.pro/llm-api
All endpoints in this documentation are relative to this base URL.
Authentication
All LLM API requests require your API key. Create keys in your Dashboard. Each API key works with every model — no separate Claude API key or OpenAI key needed.
OpenAI / Gemini / Responses Format
Pass the API key via the Authorization header (for OpenAI SDK, Gemini, and other non-Anthropic clients):
Authorization: Bearer llm_YOUR_API_KEY
Anthropic Format
The Anthropic SDK uses the x-api-key header by default. We support both methods:
x-api-key: llm_YOUR_API_KEY
// or
Authorization: Bearer llm_YOUR_API_KEY
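As a quick sanity check, the sketch below builds either header style and lists models (a non-authoritative example; it assumes the `/v1/models` list endpoint documented later on this page):

```python
import requests

BASE = "https://banana2.pro/llm-api"
API_KEY = "llm_YOUR_API_KEY"

def auth_headers(style: str = "bearer") -> dict:
    """Build either accepted header style; both carry the same key."""
    if style == "bearer":
        return {"Authorization": f"Bearer {API_KEY}"}
    return {"x-api-key": API_KEY}

if __name__ == "__main__":
    # Smoke test: list models using the Bearer header.
    r = requests.get(f"{BASE}/v1/models", headers=auth_headers("bearer"))
    print(r.status_code)
```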
Integration Guides for LLM API
Hermes Agent API Setup
Hermes Agent is an AI coding assistant framework with flexible LLM provider configuration. Connect it to our LLM API via ~/.hermes/config.yaml to access all available models. Choose the configuration that matches the type of models you want to use.
For GPT-4o, Qwen, DeepSeek, GLM, and other models that use the OpenAI Chat Completions format. Hermes automatically uses the /v1/chat/completions endpoint.
model:
  provider: custom
  base_url: https://banana2.pro/llm-api/v1
  default: gpt-4o
  api_key: llm_YOUR_API_KEY
For GPT-5.4, GPT-5-mini, and other models that use the OpenAI Responses API. When Hermes detects a model name starting with gpt-5, it automatically switches to the /v1/responses endpoint.
model:
  provider: custom
  base_url: https://banana2.pro/llm-api/api.openai.com/v1
  default: gpt-5.4
  api_key: llm_YOUR_API_KEY
For Claude Sonnet, Claude Opus and other Anthropic models. When base_url ends with /anthropic, Hermes automatically uses the Anthropic SDK and calls the /v1/messages endpoint.
model:
  provider: custom
  base_url: https://banana2.pro/llm-api/anthropic
  default: claude-sonnet-4-6
  api_key: llm_YOUR_API_KEY
Model Type & Base URL Reference
| Model Type | base_url | Actual Endpoint |
|---|---|---|
| GPT-4o, Qwen, DeepSeek, GLM | /llm-api/v1 | /v1/chat/completions |
| GPT-5.x | /llm-api/api.openai.com/v1 | /v1/responses |
| Claude | /llm-api/anthropic | /v1/messages |
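The routing in the table above can be summarized in a few lines of Python (an illustrative helper, not part of Hermes itself):

```python
BASE = "https://banana2.pro/llm-api"

def hermes_base_url(model: str) -> str:
    """Pick the Hermes base_url for a model, per the reference table."""
    if model.startswith("claude"):
        return f"{BASE}/anthropic"          # -> /v1/messages
    if model.startswith("gpt-5"):
        return f"{BASE}/api.openai.com/v1"  # -> /v1/responses
    return f"{BASE}/v1"                     # -> /v1/chat/completions
```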
OpenClaw API Configuration
OpenClaw is one of the most popular open-source AI coding agents. It uses a configuration file to define which LLM API provider to use. The configuration below shows how to connect both OpenAI-compatible models and Anthropic models — each with its own baseUrl and api format. Once configured, OpenClaw routes all LLM API calls through our provider automatically.
{
  "providers": [
    {
      "name": "banana-2-pro",
      "api": "openai-completions",
      "baseUrl": "https://banana2.pro/llm-api/v1",
      "models": ["gpt-4o", "deepseek-r1", "qwen-max"],
      "authProfiles": [
        { "apiKey": "llm_YOUR_API_KEY" }
      ]
    },
    {
      "name": "banana-2-pro-claude",
      "api": "anthropic",
      "baseUrl": "https://banana2.pro/llm-api/anthropic",
      "models": ["claude-sonnet-4-6"],
      "authProfiles": [
        { "apiKey": "llm_YOUR_API_KEY" }
      ]
    }
  ]
}
OpenAI-compatible models (GPT-4o, DeepSeek, Qwen, etc.) use baseUrl /llm-api/v1, while Claude models use /llm-api/anthropic. Both share the same API key.
Python SDK Examples
OpenAI SDK
from openai import OpenAI

client = OpenAI(
    api_key="llm_YOUR_API_KEY",
    base_url="https://banana2.pro/llm-api/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
Anthropic SDK
import anthropic

client = anthropic.Anthropic(
    api_key="llm_YOUR_API_KEY",
    base_url="https://banana2.pro/llm-api/anthropic"
)

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)

print(message.content[0].text)
cURL Examples
OpenAI Completions
curl https://banana2.pro/llm-api/v1/chat/completions \
-H "Authorization: Bearer llm_YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'
Anthropic Messages
curl https://banana2.pro/llm-api/anthropic/v1/messages \
-H "x-api-key: llm_YOUR_API_KEY" \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{"model": "claude-sonnet-4-6", "max_tokens": 1024, "messages": [{"role": "user", "content": "Hello"}]}'
Gemini
curl "https://banana2.pro/llm-api/v1beta/models/gemini-2.5-flash:streamGenerateContent" \
-H "Authorization: Bearer llm_YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"contents": [{"parts": [{"text": "Hello"}]}]}'
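The same Gemini call can be made from Python with `requests`. This sketch uses the non-streaming `generateContent` endpoint and assumes the response follows Google's standard `candidates` shape:

```python
import requests

BASE = "https://banana2.pro/llm-api"
HEADERS = {
    "Authorization": "Bearer llm_YOUR_API_KEY",
    "Content-Type": "application/json",
}

def gemini_payload(text: str) -> dict:
    """Gemini's native request shape: a contents array of parts."""
    return {"contents": [{"parts": [{"text": text}]}]}

if __name__ == "__main__":
    r = requests.post(
        f"{BASE}/v1beta/models/gemini-2.5-flash:generateContent",
        headers=HEADERS,
        json=gemini_payload("Hello"),
    )
    data = r.json()
    # Standard Gemini response shape (assumption): candidates -> content -> parts
    print(data["candidates"][0]["content"]["parts"][0]["text"])
```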
Media Generation API
Generate images using the OpenAI-compatible Images API. Tasks are processed asynchronously — you submit a request and receive a task_id, then poll for results or receive a webhook notification.
1. Submit Generation Request
curl -X POST https://banana2.pro/llm-api/v1/images/generations \
  -H "Authorization: Bearer llm_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A futuristic cityscape at sunset",
    "size": "16:9",
    "resolution": "1k",
    "n": 1,
    "webhook_url": "https://your-server.com/webhook"
  }'
Response (HTTP 202):
{
  "task_id": "media_abc123def456",
  "status": "pending",
  "model": "gpt-image-2",
  "charge": "0.01200"
}
2. Poll Task Status
curl https://banana2.pro/llm-api/v1/tasks/media_abc123def456 \
-H "Authorization: Bearer llm_YOUR_API_KEY"
Response (completed):
{
  "task_id": "media_abc123def456",
  "status": "completed",
  "model": "gpt-image-2",
  "result": {
    "urls": ["https://cdn.example.com/generated-image.png"]
  }
}
Request Parameters
| Parameter | Type | Description |
|---|---|---|
| model | string (required) | Model name, e.g. gpt-image-2 |
| prompt | string (required) | Text description of the image to generate (Chinese/English supported) |
| size | string | Aspect ratio: auto, 1:1, 16:9, 9:16, 3:2, 2:3, 4:3, 3:4, 5:4, 4:5, 2:1, 1:2, 21:9, 9:21 (default: 1:1) |
| resolution | string | Output resolution: 1k, 2k, 4k (default: 1k). 4K only supports: 16:9, 9:16, 1:2, 2:1, 21:9, 9:21 |
| n | integer | Number of images (default: 1, currently only 1 supported) |
| image_urls | array | Reference image URLs for image-to-image generation (up to 16, supports URLs and base64 data URIs) |
| webhook_url | string | Optional URL to receive task completion/failure notifications |
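If you use webhook_url, your server needs an endpoint that accepts the notification POST. The sketch below uses only the standard library; the exact payload shape is an assumption here (it likely mirrors the task-status response shown above), so treat the parsed fields as illustrative:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def parse_task_event(body: bytes) -> dict:
    """Extract the fields we care about. The payload shape is assumed
    to mirror the task-status response (task_id, status, result.urls)."""
    event = json.loads(body)
    return {
        "task_id": event.get("task_id"),
        "status": event.get("status"),
        "urls": (event.get("result") or {}).get("urls", []),
    }

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        info = parse_task_event(self.rfile.read(length))
        print(f"Task {info['task_id']} -> {info['status']}: {info['urls']}")
        self.send_response(200)  # acknowledge receipt
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), WebhookHandler).serve_forever()
```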
Python Example
import requests, time

BASE = "https://banana2.pro/llm-api"
HEADERS = {
    "Authorization": "Bearer llm_YOUR_API_KEY",
    "Content-Type": "application/json"
}

# 1. Submit
resp = requests.post(f"{BASE}/v1/images/generations", headers=HEADERS, json={
    "model": "gpt-image-2",
    "prompt": "A cat wearing a space suit",
    "size": "1:1",
    "resolution": "1k"
})
task = resp.json()
task_id = task["task_id"]
print(f"Task submitted: {task_id}, charge: ${task['charge']}")

# 2. Poll until done
while True:
    time.sleep(5)
    r = requests.get(f"{BASE}/v1/tasks/{task_id}", headers=HEADERS)
    result = r.json()
    if result["status"] == "completed":
        print("Image URLs:", result["result"]["urls"])
        break
    elif result["status"] == "failed":
        print("Error:", result.get("error"))
        break
Billing: Media generation charges are deducted upfront when the task is submitted. If the task fails, the charge is automatically refunded to your balance.
LLM API Format Details
Our LLM provider supports four native API formats. Each model uses the format that matches its official API specification.
OpenAI Chat Completions
The most widely supported LLM API format. Compatible with GPT-4o, GPT-4o-mini, and any model that uses the OpenAI completions spec. This is the format used by most AI agents, including OpenClaw in its default configuration. Send requests to /v1/chat/completions with the standard messages array format. Supports streaming via the stream parameter.
OpenAI Responses
The newer OpenAI Responses API format. Used for models that support the responses endpoint. Send requests to /v1/responses with the input field instead of messages. This LLM API format supports both streaming and non-streaming modes.
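A minimal sketch of a raw Responses call, assuming only what is stated above (the `/v1/responses` path and the `input` field in place of `messages`):

```python
import requests

BASE = "https://banana2.pro/llm-api"
HEADERS = {
    "Authorization": "Bearer llm_YOUR_API_KEY",
    "Content-Type": "application/json",
}

def responses_payload(model: str, text: str) -> dict:
    """Responses API takes `input` where Chat Completions takes `messages`."""
    return {"model": model, "input": text}

if __name__ == "__main__":
    r = requests.post(f"{BASE}/v1/responses", headers=HEADERS,
                      json=responses_payload("gpt-5.4", "Hello!"))
    print(r.json())
```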
Anthropic Messages
The native Claude API format. Use this for Claude Sonnet, Claude Opus, and other Anthropic models. Send requests to /v1/messages with the Anthropic messages format. Requires the max_tokens parameter. This is the format Hermes Agent and other tools use when they expect a dedicated Claude API endpoint.
Gemini API
Google's native Gemini API format. Use /v1beta/models/{model}:generateContent for non-streaming or :streamGenerateContent for streaming. The contents array format with parts is specific to Gemini. Our LLM provider handles the routing to ensure your Gemini requests reach the right backend.
LLM API Best Practices
Choose the Right Model
Not every task needs the most expensive model. Use Gemini Flash for simple classifications and quick responses. Use GPT-4o for general-purpose tasks. Reserve Claude Sonnet for complex reasoning and code generation. Our LLM API makes it easy to switch between models — just change the model parameter in your request.
Use Streaming for Better UX
Enable streaming in your LLM API calls to show responses as they generate. This significantly improves perceived performance for end users. All four API formats support streaming through our provider.
Monitor Your Usage
Check the Dashboard regularly to track your LLM API usage across all models. Export usage data as CSV for detailed cost analysis. Our per-request tracking shows exactly how many tokens each call consumed.
Available LLM Models & Pricing
The live model list, with each model's API format and per-token pricing (input/output $ per 1M tokens), is available in your Dashboard.
LLM API Endpoints Reference
/llm-api/v1/chat/completions
OpenAI Chat Completions format — works with GPT-4o, Qwen, DeepSeek, GLM and compatible models. Supports streaming.
/llm-api/v1/responses
OpenAI Responses format — for GPT-5.x models. Supports streaming.
/llm-api/api.openai.com/v1/responses
OpenAI Responses format (Hermes compatible) — automatically used when Hermes base_url is set to /llm-api/api.openai.com/v1.
/llm-api/v1/messages
Anthropic Messages format — for Claude API models. Supports streaming.
/llm-api/anthropic/v1/messages
Anthropic Messages format (Hermes compatible) — automatically used when Hermes base_url ends with /anthropic.
/llm-api/v1beta/models/{model}:generateContent
Gemini format (non-streaming).
/llm-api/v1beta/models/{model}:streamGenerateContent
Gemini format (streaming).
/llm-api/v1/models
List OpenAI Completions format models (GET) — returns GPT-4o, Qwen, DeepSeek, GLM etc.
/llm-api/api.openai.com/v1/models
List OpenAI Responses format models (GET) — returns GPT-5.x models. For Hermes GPT-5 integration.
/llm-api/anthropic/models
List Anthropic format models (GET) — returns Claude series models.
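The three model-list endpoints above differ only in their path prefix, so they can be queried from one loop (a sketch; endpoint paths taken from this reference):

```python
import requests

BASE = "https://banana2.pro/llm-api"
HEADERS = {"Authorization": "Bearer llm_YOUR_API_KEY"}

# One list endpoint per API format, as documented above.
MODEL_LIST_PATHS = {
    "chat-completions": "/v1/models",
    "responses": "/api.openai.com/v1/models",
    "anthropic": "/anthropic/models",
}

def list_url(fmt: str) -> str:
    """Full URL for a format's model-list endpoint."""
    return BASE + MODEL_LIST_PATHS[fmt]

if __name__ == "__main__":
    for fmt in MODEL_LIST_PATHS:
        r = requests.get(list_url(fmt), headers=HEADERS)
        print(fmt, r.status_code)
```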
Ready to start using our LLM API?