LLM API Documentation
Everything you need to connect OpenClaw, Hermes Agent, or any AI tool to our multi-model LLM API provider.
Base URL
https://banana2.pro/llm-api
All endpoints in this documentation are relative to this base URL.
Authentication
All LLM API requests require your API key. Create keys in your Dashboard. Each API key works with every model — no separate Claude API key or OpenAI key needed.
OpenAI / Gemini / Responses Format
Pass the API key via the Authorization header (for OpenAI SDK, Gemini, and other non-Anthropic clients):
Authorization: Bearer llm_YOUR_API_KEY
Anthropic Format
The Anthropic SDK uses the x-api-key header by default. We support both methods:
x-api-key: llm_YOUR_API_KEY
// or
Authorization: Bearer llm_YOUR_API_KEY
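As a quick sanity check, the sketch below builds either header style and lists models (a non-authoritative example; it assumes the `/v1/models` list endpoint documented later on this page):

```python
import requests

BASE = "https://banana2.pro/llm-api"
API_KEY = "llm_YOUR_API_KEY"

def auth_headers(style: str = "bearer") -> dict:
    """Build either accepted header style; both carry the same key."""
    if style == "bearer":
        return {"Authorization": f"Bearer {API_KEY}"}
    return {"x-api-key": API_KEY}

if __name__ == "__main__":
    # Smoke test: list models using the Bearer header.
    r = requests.get(f"{BASE}/v1/models", headers=auth_headers("bearer"))
    print(r.status_code)
```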
Integration Guides for LLM API
Hermes Agent API Setup
Hermes Agent is an AI coding assistant framework with flexible LLM provider configuration. Connect it to our LLM API via ~/.hermes/config.yaml to access all available models. Choose the configuration that matches the type of models you want to use.
For GPT-4o, Qwen, DeepSeek, GLM, and other models that use the OpenAI Chat Completions format. Hermes automatically uses the /v1/chat/completions endpoint.
model:
  provider: custom
  base_url: https://banana2.pro/llm-api/v1
  default: gpt-4o
  api_key: llm_YOUR_API_KEY
For GPT-5.4, GPT-5-mini, and other models that use the OpenAI Responses API. When Hermes detects a model name starting with gpt-5, it automatically switches to the /v1/responses endpoint.
model:
  provider: custom
  base_url: https://banana2.pro/llm-api/api.openai.com/v1
  default: gpt-5.4
  api_key: llm_YOUR_API_KEY
For Claude Sonnet, Claude Opus and other Anthropic models. When base_url ends with /anthropic, Hermes automatically uses the Anthropic SDK and calls the /v1/messages endpoint.
model:
  provider: custom
  base_url: https://banana2.pro/llm-api/anthropic
  default: claude-sonnet-4-6
  api_key: llm_YOUR_API_KEY
Model Type & Base URL Reference
| Model Type | base_url | Actual Endpoint |
|---|---|---|
| GPT-4o, Qwen, DeepSeek, GLM | /llm-api/v1 | /v1/chat/completions |
| GPT-5.x | /llm-api/api.openai.com/v1 | /v1/responses |
| Claude | /llm-api/anthropic | /v1/messages |
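The routing in the table above can be summarized in a few lines of Python (an illustrative helper, not part of Hermes itself):

```python
BASE = "https://banana2.pro/llm-api"

def hermes_base_url(model: str) -> str:
    """Pick the Hermes base_url for a model, per the reference table."""
    if model.startswith("claude"):
        return f"{BASE}/anthropic"          # -> /v1/messages
    if model.startswith("gpt-5"):
        return f"{BASE}/api.openai.com/v1"  # -> /v1/responses
    return f"{BASE}/v1"                     # -> /v1/chat/completions
```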
OpenClaw API Configuration
OpenClaw is one of the most popular open-source AI coding agents. It uses a configuration file to define which LLM API provider to use. The configuration below shows how to connect both OpenAI-compatible models and Anthropic models — each with its own baseUrl and api format. Once configured, OpenClaw routes all LLM API calls through our provider automatically.
{
  "providers": [
    {
      "name": "banana-2-pro",
      "api": "openai-completions",
      "baseUrl": "https://banana2.pro/llm-api/v1",
      "models": ["gpt-4o", "deepseek-r1", "qwen-max"],
      "authProfiles": [
        { "apiKey": "llm_YOUR_API_KEY" }
      ]
    },
    {
      "name": "banana-2-pro-claude",
      "api": "anthropic",
      "baseUrl": "https://banana2.pro/llm-api/anthropic",
      "models": ["claude-sonnet-4-6"],
      "authProfiles": [
        { "apiKey": "llm_YOUR_API_KEY" }
      ]
    }
  ]
}
OpenAI-compatible models (GPT-4o, DeepSeek, Qwen, etc.) use baseUrl /llm-api/v1, while Claude models use /llm-api/anthropic. Both share the same API key.
Python SDK Examples
OpenAI SDK
from openai import OpenAI

client = OpenAI(
    api_key="llm_YOUR_API_KEY",
    base_url="https://banana2.pro/llm-api/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
Anthropic SDK
import anthropic

client = anthropic.Anthropic(
    api_key="llm_YOUR_API_KEY",
    base_url="https://banana2.pro/llm-api/anthropic"
)

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)

print(message.content[0].text)
cURL Examples
OpenAI Completions
curl https://banana2.pro/llm-api/v1/chat/completions \
-H "Authorization: Bearer llm_YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'
Anthropic Messages
curl https://banana2.pro/llm-api/anthropic/v1/messages \
-H "x-api-key: llm_YOUR_API_KEY" \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{"model": "claude-sonnet-4-6", "max_tokens": 1024, "messages": [{"role": "user", "content": "Hello"}]}'
Gemini
curl "https://banana2.pro/llm-api/v1beta/models/gemini-2.5-flash:streamGenerateContent" \
-H "Authorization: Bearer llm_YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"contents": [{"parts": [{"text": "Hello"}]}]}'
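The same Gemini call can be made from Python with `requests`. This sketch uses the non-streaming `generateContent` endpoint and assumes the response follows Google's standard `candidates` shape:

```python
import requests

BASE = "https://banana2.pro/llm-api"
HEADERS = {
    "Authorization": "Bearer llm_YOUR_API_KEY",
    "Content-Type": "application/json",
}

def gemini_payload(text: str) -> dict:
    """Gemini's native request shape: a contents array of parts."""
    return {"contents": [{"parts": [{"text": text}]}]}

if __name__ == "__main__":
    r = requests.post(
        f"{BASE}/v1beta/models/gemini-2.5-flash:generateContent",
        headers=HEADERS,
        json=gemini_payload("Hello"),
    )
    data = r.json()
    # Standard Gemini response shape (assumption): candidates -> content -> parts
    print(data["candidates"][0]["content"]["parts"][0]["text"])
```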
Media Generation API
Generate images using the OpenAI-compatible Images API. Tasks are processed asynchronously — you submit a request and receive a task_id, then poll for results or receive a webhook notification.
1. Submit Generation Request
curl -X POST https://banana2.pro/llm-api/v1/images/generations \
  -H "Authorization: Bearer llm_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2",
    "prompt": "A futuristic cityscape at sunset",
    "size": "16:9",
    "resolution": "1k",
    "n": 1,
    "webhook_url": "https://your-server.com/webhook"
  }'
Response (HTTP 202):
{
  "task_id": "media_abc123def456",
  "status": "pending",
  "model": "gpt-image-2",
  "charge": "0.01200"
}
2. Poll Task Status
curl https://banana2.pro/llm-api/v1/tasks/media_abc123def456 \
-H "Authorization: Bearer llm_YOUR_API_KEY"
Response (completed):
{
  "task_id": "media_abc123def456",
  "status": "completed",
  "model": "gpt-image-2",
  "result": {
    "urls": ["https://cdn.example.com/generated-image.png"]
  }
}
Request Parameters
| Parameter | Type | Description |
|---|---|---|
| model | string (required) | Model name, e.g. gpt-image-2 |
| prompt | string (required) | Text description of the image to generate (Chinese/English supported) |
| size | string | Aspect ratio: auto, 1:1, 16:9, 9:16, 3:2, 2:3, 4:3, 3:4, 5:4, 4:5, 2:1, 1:2, 21:9, 9:21 (default: 1:1) |
| resolution | string | Output resolution: 1k, 2k, 4k (default: 1k). 4K only supports: 16:9, 9:16, 1:2, 2:1, 21:9, 9:21 |
| n | integer | Number of images (default: 1, currently only 1 supported) |
| image_urls | array | Reference image URLs for image-to-image generation (up to 16, supports URLs and base64 data URIs) |
| webhook_url | string | Optional URL to receive task completion/failure notifications |
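If you use webhook_url, your server needs an endpoint that accepts the notification POST. The sketch below uses only the standard library; the exact payload shape is an assumption here (it likely mirrors the task-status response shown above), so treat the parsed fields as illustrative:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def parse_task_event(body: bytes) -> dict:
    """Extract the fields we care about. The payload shape is assumed
    to mirror the task-status response (task_id, status, result.urls)."""
    event = json.loads(body)
    return {
        "task_id": event.get("task_id"),
        "status": event.get("status"),
        "urls": (event.get("result") or {}).get("urls", []),
    }

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        info = parse_task_event(self.rfile.read(length))
        print(f"Task {info['task_id']} -> {info['status']}: {info['urls']}")
        self.send_response(200)  # acknowledge receipt
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), WebhookHandler).serve_forever()
```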
Python Example
import requests, time

BASE = "https://banana2.pro/llm-api"
HEADERS = {
    "Authorization": "Bearer llm_YOUR_API_KEY",
    "Content-Type": "application/json"
}

# 1. Submit
resp = requests.post(f"{BASE}/v1/images/generations", headers=HEADERS, json={
    "model": "gpt-image-2",
    "prompt": "A cat wearing a space suit",
    "size": "1:1",
    "resolution": "1k"
})
task = resp.json()
task_id = task["task_id"]
print(f"Task submitted: {task_id}, charge: ${task['charge']}")

# 2. Poll until done
while True:
    time.sleep(5)
    r = requests.get(f"{BASE}/v1/tasks/{task_id}", headers=HEADERS)
    result = r.json()
    if result["status"] == "completed":
        print("Image URLs:", result["result"]["urls"])
        break
    elif result["status"] == "failed":
        print("Error:", result.get("error"))
        break
Billing: Media generation charges are deducted upfront when the task is submitted. If the task fails, the charge is automatically refunded to your balance.
LLM API Format Details
Our LLM provider supports four native API formats. Each model uses the format that matches its official API specification.
OpenAI Chat Completions
The most widely supported LLM API format. Compatible with GPT-4o, GPT-4o-mini, and any model that uses the OpenAI completions spec. This is the format used by most AI agents, including OpenClaw in its default configuration. Send requests to /v1/chat/completions with the standard messages array format. Supports streaming via the stream parameter.
OpenAI Responses
The newer OpenAI Responses API format. Used for models that support the responses endpoint. Send requests to /v1/responses with the input field instead of messages. This LLM API format supports both streaming and non-streaming modes.
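A minimal sketch of a raw Responses call, assuming only what is stated above (the `/v1/responses` path and the `input` field in place of `messages`):

```python
import requests

BASE = "https://banana2.pro/llm-api"
HEADERS = {
    "Authorization": "Bearer llm_YOUR_API_KEY",
    "Content-Type": "application/json",
}

def responses_payload(model: str, text: str) -> dict:
    """Responses API takes `input` where Chat Completions takes `messages`."""
    return {"model": model, "input": text}

if __name__ == "__main__":
    r = requests.post(f"{BASE}/v1/responses", headers=HEADERS,
                      json=responses_payload("gpt-5.4", "Hello!"))
    print(r.json())
```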
Anthropic Messages
The native Claude API format. Use this for Claude Sonnet, Claude Opus, and other Anthropic models. Send requests to /v1/messages with the Anthropic messages format. Requires the max_tokens parameter. This is the format Hermes Agent and other tools use when they expect a dedicated Claude API endpoint.
Gemini API
Google's native Gemini API format. Use /v1beta/models/{model}:generateContent for non-streaming or :streamGenerateContent for streaming. The contents array format with parts is specific to Gemini. Our LLM provider handles the routing to ensure your Gemini requests reach the right backend.
LLM API Best Practices
Choose the Right Model
Not every task needs the most expensive model. Use Gemini Flash for simple classifications and quick responses. Use GPT-4o for general-purpose tasks. Reserve Claude Sonnet for complex reasoning and code generation. Our LLM API makes it easy to switch between models — just change the model parameter in your request.
Use Streaming for Better UX
Enable streaming in your LLM API calls to show responses as they generate. This significantly improves perceived performance for end users. All four API formats support streaming through our provider.
Monitor Your Usage
Check the Dashboard regularly to track your LLM API usage across all models. Export usage data as CSV for detailed cost analysis. Our per-request tracking shows exactly how many tokens each call consumed.
Available LLM Models & Pricing
The live model list, with each model's API format and per-token pricing (input/output $ per 1M tokens), is available in your Dashboard.
LLM API Endpoints Reference
/llm-api/v1/chat/completions
OpenAI Chat Completions format — works with GPT-4o, Qwen, DeepSeek, GLM and compatible models. Supports streaming.
/llm-api/v1/responses
OpenAI Responses format — for GPT-5.x models. Supports streaming.
/llm-api/api.openai.com/v1/responses
OpenAI Responses format (Hermes compatible) — automatically used when Hermes base_url is set to /llm-api/api.openai.com/v1.
/llm-api/v1/messages
Anthropic Messages format — for Claude API models. Supports streaming.
/llm-api/anthropic/v1/messages
Anthropic Messages format (Hermes compatible) — automatically used when Hermes base_url ends with /anthropic.
/llm-api/v1beta/models/{model}:generateContent
Gemini format (non-streaming).
/llm-api/v1beta/models/{model}:streamGenerateContent
Gemini format (streaming).
/llm-api/v1/models
List OpenAI Completions format models (GET) — returns GPT-4o, Qwen, DeepSeek, GLM etc.
/llm-api/api.openai.com/v1/models
List OpenAI Responses format models (GET) — returns GPT-5.x models. For Hermes GPT-5 integration.
/llm-api/anthropic/models
List Anthropic format models (GET) — returns Claude series models.
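The three model-list endpoints above differ only in their path prefix, so they can be queried from one loop (a sketch; endpoint paths taken from this reference):

```python
import requests

BASE = "https://banana2.pro/llm-api"
HEADERS = {"Authorization": "Bearer llm_YOUR_API_KEY"}

# One list endpoint per API format, as documented above.
MODEL_LIST_PATHS = {
    "chat-completions": "/v1/models",
    "responses": "/api.openai.com/v1/models",
    "anthropic": "/anthropic/models",
}

def list_url(fmt: str) -> str:
    """Full URL for a format's model-list endpoint."""
    return BASE + MODEL_LIST_PATHS[fmt]

if __name__ == "__main__":
    for fmt in MODEL_LIST_PATHS:
        r = requests.get(list_url(fmt), headers=HEADERS)
        print(fmt, r.status_code)
```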
Ready to start using our LLM API?