A harness is the bridge between your AI agent and Archal’s digital twins. It receives a task, discovers tools, calls an LLM, executes tool calls, and repeats until done.
## Quick start

Create a real harness entrypoint in the repo you want to test, then point `archal run` at it:

```sh
archal run scenario.md --harness ./.archal/harness.ts
```

`archal run` preflights the harness automatically before provisioning hosted twins. Use the rest of this page when you need the runtime contract, model knobs, or compatibility details.
## Custom harness directory

Point `--harness` at any directory containing your agent code:

```sh
archal run scenario.md \
  --harness ./my-agent \
  -n 3
```

No manifest file is required when Archal can resolve an explicit harness entrypoint. If a manifest is absent, Archal can still load `.md` files in the harness directory as prompt context. Add a `harness.json` manifest when you want explicit command defaults, prompt files, or a default model.
## Manifest (harness.json)

The manifest tells Archal how to spawn your agent. Drop it in your harness directory.

```json
{
  "version": 1,
  "defaultModel": "gpt-4.1",
  "promptFiles": ["system-prompt.md", "safety-guidelines.md"],
  "local": {
    "command": "node",
    "args": ["agent.mjs"],
    "env": {
      "MY_CUSTOM_VAR": "value"
    }
  }
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| `version` | `1` | Yes | Schema version. Must be `1`. |
| `defaultModel` | string | No | Fallback model ID when `--agent-model` is not provided. |
| `promptFiles` | string[] | No | Markdown files loaded in order and prepended to the scenario task. Paths are relative to the harness directory. |
| `local.command` | string | No | Command to spawn (e.g., `node`, `python`, `npx`). |
| `local.args` | string[] | No | Arguments passed to the command. |
| `local.env` | object | No | Extra environment variables injected into the harness process. |
## Setting the system prompt

Three options, depending on what you're building.
### Option 1: Prompt files in the manifest

The cleanest approach. List markdown files in `promptFiles`; they're concatenated in order and prepended to the scenario task:

```json
{
  "version": 1,
  "promptFiles": ["system-prompt.md", "safety-guidelines.md"],
  "local": { "command": "node", "args": ["agent.mjs"] }
}
```
Your agent receives the combined content as the `ARCHAL_ENGINE_TASK` environment variable:

```text
[Contents of system-prompt.md]

[Contents of safety-guidelines.md]

---

[Scenario prompt]
```
Common prompt file patterns:
- System prompt — role definition, reasoning instructions, tool-use guidance
- Safety guidelines — refusal policies, escalation procedures, authorization checks
- Domain context — company-specific terminology, workflow rules
### Option 2: Inline system prompt in your agent code

Read `ARCHAL_ENGINE_TASK` for the scenario task and construct messages however you want:
```js
const TASK = process.env.ARCHAL_ENGINE_TASK;
const messages = [
  {
    role: 'system',
    content: `You are a cautious agent. Never delete anything without confirmation.
Always read before writing. Explain your reasoning.`
  },
  { role: 'user', content: TASK }
];
```
### Option 3: No manifest, just .md files

Drop markdown files in your harness directory without a manifest. Archal finds them, concatenates them alphabetically, and passes them as part of `ARCHAL_ENGINE_TASK`. Simple, zero-config.
```text
my-agent/
  agent.mjs
  01-role.md      ← loaded first
  02-safety.md    ← loaded second
```
### How system prompts are delivered to different providers

The way you send a system prompt depends on the LLM provider:

| Provider | Mechanism | Notes |
|---|---|---|
| OpenAI (GPT-4o, GPT-4.1, GPT-5.2) | `system` message role | Standard chat format |
| OpenAI (o1, o3, o4-mini) | `developer` message role | System messages aren't supported; use `developer` or merge into the user message |
| Anthropic (Claude) | `system` parameter (separate from messages) | Not inside the `messages` array. Supports `cache_control` for prompt caching. |
| Gemini | `system_instruction` parameter | Separate from the `contents` array |
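To make the table concrete, here is a sketch of the same system prompt in each provider's request shape. The field names follow each provider's public chat APIs; the model IDs and prompt strings are placeholders:

```js
// Sketch: one system prompt, three request shapes. Model IDs are placeholders.
const SYSTEM = 'You are a cautious agent.';
const TASK = process.env.ARCHAL_ENGINE_TASK;

// OpenAI chat models: a system-role message inside the messages array
const openaiBody = {
  model: 'gpt-4.1',
  messages: [
    { role: 'system', content: SYSTEM },
    { role: 'user', content: TASK },
  ],
};

// Anthropic: top-level `system` parameter; messages hold only the conversation
const anthropicBody = {
  model: 'claude-model-id', // placeholder
  max_tokens: 16384,
  system: SYSTEM,
  messages: [{ role: 'user', content: TASK }],
};

// Gemini: `system_instruction`, separate from the `contents` array
const geminiBody = {
  system_instruction: { parts: [{ text: SYSTEM }] },
  contents: [{ role: 'user', parts: [{ text: TASK }] }],
};
```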
## Tuning the LLM

Archal injects environment variables into your harness process. Your agent reads them and passes them to the LLM provider; Archal does not intercept your API calls.
### Temperature

Lower values produce more deterministic, consistent tool calls. Higher values produce more varied responses.

```sh
export ARCHAL_TEMPERATURE=0.0
```
| Use case | Temperature |
|---|---|
| Tool calling and structured actions | 0.0 |
| Balanced agent tasks | 0.2 |
| Creative or generative tasks | 0.7–1.0 |
Provider caveats:

- OpenAI reasoning models (o1, o3, o4-mini): temperature is rejected by the API.
- Anthropic with extended thinking: temperature cannot be modified when thinking is enabled.
- Gemini: works normally; values below 0.1 can cause response looping.
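A simple way to honor these caveats is to read `ARCHAL_TEMPERATURE` but omit it for models that reject the parameter. A minimal sketch, assuming a model-name check is sufficient for your model set:

```js
// Sketch: only attach temperature when the target model accepts it.
const model = process.env.ARCHAL_ENGINE_MODEL || 'gpt-4.1';
const temperature = Number(process.env.ARCHAL_TEMPERATURE ?? 0.2);

// OpenAI o-series reasoning models reject the temperature parameter.
// The name check is an assumption; adjust it for the models you run.
const isReasoningModel = /^o[134]/.test(model);

const body = {
  model,
  messages: [{ role: 'user', content: process.env.ARCHAL_ENGINE_TASK }],
  ...(isReasoningModel ? {} : { temperature }),
};
```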
### Max output tokens

```sh
export ARCHAL_MAX_TOKENS=16384
```

Set this high enough for the model to reason and generate tool calls, but not so high that it rambles. Sensible defaults per model:
| Model | Default | Notes |
|---|---|---|
| GPT-4o, GPT-4o-mini | 32,768 | |
| GPT-4.1 | 65,536 | Large context model |
| GPT-5.2 | 32,768 | |
| o-series | 32,768–65,536 | Uses max_completion_tokens |
| Claude Opus 4.6 | 32,768 | |
| Claude Sonnet 4.6 | 32,768 | |
| Claude Haiku 4.5 | 16,384 | |
| Gemini 2.x Flash | 16,384 | |
| Gemini 2.5/3.0 Pro | 32,768–65,536 | |
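In a harness this typically becomes a lookup, with the `ARCHAL_MAX_TOKENS` override taking precedence. A sketch; the regex patterns for model IDs are assumptions, while the values mirror the table above:

```js
// Sketch: resolve max output tokens (override first, then per-model default).
// Patterns are illustrative; extend them for the models you actually run.
const MODEL_DEFAULTS = [
  [/^gpt-4\.1/, 65536],
  [/haiku/, 16384],
  [/^gemini-2\.\d-flash/, 16384],
];

function maxTokens(model) {
  if (process.env.ARCHAL_MAX_TOKENS) return Number(process.env.ARCHAL_MAX_TOKENS);
  for (const [pattern, value] of MODEL_DEFAULTS) {
    if (pattern.test(model)) return value;
  }
  return 32768; // the most common default in the table above
}
```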
### Extended thinking
Some models can “think” internally before responding. This usually improves performance on complex multi-step tasks but costs more tokens.
Anthropic:

```sh
export ARCHAL_THINKING_BUDGET=adaptive   # default: model decides
export ARCHAL_THINKING_BUDGET=off        # disable
export ARCHAL_THINKING_BUDGET=8192       # explicit token budget (min 1024)
```

When enabled, `tool_choice` is limited to `auto` or `none`, and `temperature`/`top_k` cannot be modified.
OpenAI o-series:

```sh
export ARCHAL_REASONING_EFFORT=medium   # low | medium | high
```

Reasoning is hidden: you can't see or control the thinking content. Don't use chain-of-thought prompting; the model already reasons internally.
Gemini 2.5+:

```sh
export ARCHAL_THINKING_BUDGET=4096
```

When thinking is enabled, Gemini returns encrypted "thought signatures" that must be passed back in subsequent requests.
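Your harness is responsible for translating `ARCHAL_THINKING_BUDGET` into the provider's request parameter. A sketch for Anthropic, assuming `adaptive` means "omit the parameter and let the provider default apply":

```js
// Sketch: map ARCHAL_THINKING_BUDGET onto Anthropic's `thinking` parameter.
// The handling of 'adaptive' here is an assumption: we simply omit `thinking`.
const budget = process.env.ARCHAL_THINKING_BUDGET ?? 'adaptive';

let thinking;
if (budget === 'off') {
  thinking = { type: 'disabled' };
} else if (budget !== 'adaptive') {
  // Explicit token budget; Anthropic requires at least 1024.
  thinking = { type: 'enabled', budget_tokens: Math.max(1024, Number(budget)) };
}

const body = {
  model: process.env.ARCHAL_ENGINE_MODEL,
  max_tokens: 16384,
  messages: [{ role: 'user', content: process.env.ARCHAL_ENGINE_TASK }],
  ...(thinking && { thinking }),
};
```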
### Tool choice

Controls whether the model must, may, or must not call tools on a given turn:
| Intent | OpenAI | Anthropic | Gemini |
|---|---|---|---|
| Model decides | `"auto"` | `{"type": "auto"}` | `"AUTO"` |
| Must call a tool | `"required"` | `{"type": "any"}` | `"ANY"` |
| Force specific tool | `{"type": "function", "function": {"name": "..."}}` | `{"type": "tool", "name": "..."}` | Use `allowed_function_names` |
| No tools | `"none"` | `{"type": "none"}` | `"NONE"` |

For most agents, `auto` is correct. Use `required` / `any` only when the model should always call a tool on the next turn.
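As a sketch, forcing one specific tool (here a hypothetical `create_issue`) looks like this per provider:

```js
// Sketch: request-body fragments forcing the (hypothetical) create_issue tool.
const openaiChoice = {
  tool_choice: { type: 'function', function: { name: 'create_issue' } },
};
const anthropicChoice = {
  tool_choice: { type: 'tool', name: 'create_issue' },
};
const geminiChoice = {
  tool_config: {
    function_calling_config: { mode: 'ANY', allowed_function_names: ['create_issue'] },
  },
};
```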
## Environment variables

Archal injects these into every harness process.

### Task and model
| Variable | Description |
|---|---|
| `ARCHAL_ENGINE_TASK` | The full task text (prompt files + scenario prompt). `## Expected Behavior` is never included; it's the evaluator holdout. |
| `ARCHAL_ENGINE_MODEL` | Model identifier from `--agent-model` or manifest `defaultModel`. |
### Twin connectivity

| Variable | Description |
|---|---|
| `ARCHAL_MCP_CONFIG` | Path to MCP server config JSON. |
| `ARCHAL_MCP_SERVERS` | Stringified MCP servers JSON (same data as the config file). |
| `ARCHAL_TWIN_NAMES` | Comma-separated list of twin names in the scenario. |
| `ARCHAL_{TWIN}_REST_URL` | REST endpoint per twin (ends in `/api`). |
| `ARCHAL_{TWIN}_MCP_URL` | MCP endpoint per twin (ends in `/mcp`). |
| `ARCHAL_TOKEN` | Bearer token for authenticated twin requests. |
`ARCHAL_{TWIN}_BASE_URL` and `ARCHAL_{TWIN}_URL` are also set as backward-compat aliases, but prefer the explicit `_REST_URL` / `_MCP_URL` pair.
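Putting the connectivity variables together, a harness can enumerate twins and their endpoints like this (a minimal sketch, assuming twin names map to variable segments by uppercasing):

```js
// Sketch: discover every twin's endpoints from the injected variables.
const twins = (process.env.ARCHAL_TWIN_NAMES ?? '')
  .split(',')
  .filter(Boolean)
  .map((name) => {
    const key = name.trim().toUpperCase();
    return {
      name: name.trim(),
      restUrl: process.env[`ARCHAL_${key}_REST_URL`],
      mcpUrl: process.env[`ARCHAL_${key}_MCP_URL`],
    };
  });
```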
### API keys

Passed through from your environment or config:

| Variable | Provider |
|---|---|
| `OPENAI_API_KEY` | OpenAI |
| `ANTHROPIC_API_KEY` | Anthropic |
| `GEMINI_API_KEY` | Google |
| `ARCHAL_ENGINE_API_KEY` | Generic override; takes priority over provider-specific keys |
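Resolution order in code, mirroring the table (the `provider` argument is a hypothetical name your harness already knows):

```js
// Sketch: ARCHAL_ENGINE_API_KEY wins over provider-specific keys.
function resolveApiKey(provider) {
  if (process.env.ARCHAL_ENGINE_API_KEY) return process.env.ARCHAL_ENGINE_API_KEY;
  return {
    openai: process.env.OPENAI_API_KEY,
    anthropic: process.env.ANTHROPIC_API_KEY,
    gemini: process.env.GEMINI_API_KEY,
  }[provider];
}
```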
### Tuning overrides

| Variable | Description | Default |
|---|---|---|
| `ARCHAL_MAX_TOKENS` | Max completion tokens per LLM call | Model-specific |
| `ARCHAL_TEMPERATURE` | Sampling temperature | 0.2 |
| `ARCHAL_REASONING_EFFORT` | OpenAI reasoning models: `low`, `medium`, `high` | `medium` |
| `ARCHAL_THINKING_BUDGET` | Extended thinking: `adaptive`, `off`, or a token count | `adaptive` |
| `ARCHAL_LLM_TIMEOUT` | Per-LLM-call timeout in seconds | 120 |
| `ARCHAL_LOG_LEVEL` | Harness log verbosity | `info` |
### Base URL overrides

For Azure OpenAI, API proxies, or self-hosted endpoints:

| Variable | Default |
|---|---|
| `ARCHAL_OPENAI_BASE_URL` | `https://api.openai.com/v1` |
| `ARCHAL_ANTHROPIC_BASE_URL` | `https://api.anthropic.com` |
| `ARCHAL_GEMINI_BASE_URL` | `https://generativelanguage.googleapis.com/v1beta` |
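In the harness, read the override and fall back to the public endpoint. A sketch for OpenAI, using the default from the table:

```js
// Sketch: honor ARCHAL_OPENAI_BASE_URL so the same harness code works
// against api.openai.com, Azure OpenAI, or a proxy.
const OPENAI_BASE = process.env.ARCHAL_OPENAI_BASE_URL ?? 'https://api.openai.com/v1';

const res = await fetch(`${OPENAI_BASE}/chat/completions`, {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'gpt-4.1',
    messages: [{ role: 'user', content: 'ping' }],
  }),
});
```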
### Metrics and trace output

Set by the orchestrator; your harness can write to them for richer reports:

| Variable | Description |
|---|---|
| `ARCHAL_METRICS_FILE` | Path to write metrics JSON (token counts, timing, exit reason) |
| `ARCHAL_AGENT_TRACE_FILE` | Path to write agent trace JSON (thinking, text, tool calls per step) |
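A harness can write these files just before exiting. A sketch; the table only names token counts, timing, and exit reason, so treat the exact field names here as assumptions:

```js
import fs from 'node:fs';

// Sketch: write run metrics at exit. Field names are illustrative; the doc
// guarantees only that the file holds token counts, timing, and exit reason.
function writeMetrics({ inputTokens, outputTokens, startedAt, exitReason }) {
  const file = process.env.ARCHAL_METRICS_FILE;
  if (!file) return; // orchestrator did not request metrics
  fs.writeFileSync(
    file,
    JSON.stringify(
      { inputTokens, outputTokens, durationMs: Date.now() - startedAt, exitReason },
      null,
      2,
    ),
  );
}
```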
## Twin transport

Two ways for the harness to talk to twins. REST is recommended for most agents: plain HTTP, easiest to debug. MCP (Model Context Protocol) is the full tool-discovery protocol from `@modelcontextprotocol/sdk`; pick it if your agent already speaks MCP natively.
### REST (recommended)

Simple HTTP endpoints. Each twin exposes its REST URL via `ARCHAL_{TWIN}_REST_URL`.

```js
// Discover tools
const res = await fetch(`${process.env.ARCHAL_GITHUB_REST_URL}/tools`);
const tools = await res.json();

// Call a tool
const result = await fetch(`${process.env.ARCHAL_GITHUB_REST_URL}/tools/call`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ name: 'create_issue', arguments: { title: '...' } }),
});
```
### MCP

Full Model Context Protocol transport using `@modelcontextprotocol/sdk`. Archal writes an MCP server config to the path in `ARCHAL_MCP_CONFIG`.

```js
import fs from 'node:fs';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';

const config = JSON.parse(fs.readFileSync(process.env.ARCHAL_MCP_CONFIG, 'utf8'));
const transport = new StreamableHTTPClientTransport(new URL(config.mcpServers.github.url));
const client = new Client({ name: 'my-agent', version: '1.0.0' });
await client.connect(transport);
const { tools } = await client.listTools();
```
When presenting tools to your LLM, namespace them as `mcp__{twin}__{tool_name}` (e.g., `mcp__github__create_issue`). This matches the format Archal's evaluator expects.
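A small helper that applies the namespacing to the tools returned by `listTools()`, shaped here for the OpenAI tools array (the conversion target is a choice, not something the SDK dictates):

```js
// Sketch: namespace MCP tools as mcp__{twin}__{tool_name} for the LLM.
function toLlmTools(twin, mcpTools) {
  return mcpTools.map((tool) => ({
    type: 'function',
    function: {
      name: `mcp__${twin}__${tool.name}`,
      description: tool.description,
      parameters: tool.inputSchema,
    },
  }));
}

const llmTools = toLlmTools('github', tools); // `tools` from client.listTools()
```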
## Agent loop

The core pattern:

1. Read `ARCHAL_ENGINE_TASK`
2. Discover tools from twins (REST `/tools` or MCP `listTools`)
3. Build initial messages (system prompt + task)
4. Loop:
   a. Call the LLM with messages and tools
   b. If no tool calls → done
   c. Execute each tool call against the twin
   d. Append results to messages
   e. Repeat
Practical tips:

- Investigate before acting. Strong agents read state, check statuses, and review policies before executing write actions. This catches social engineering in scenarios.
- Handle errors. Tool calls can fail; return the error message to the LLM and let it retry. Bail out after ~5 consecutive errors to avoid infinite loops.
- Retry transient LLM failures. API calls fail with 429, 500, 502, 503. Use exponential backoff (1s → 2s → 4s, capped at 30s) and respect `Retry-After` (see the sketch after this list).
- Cap iterations. Limit the loop to 20–50 steps. Without a cap, an agent can loop indefinitely on ambiguous tasks.
- Use temperature 0–0.2 for tool calling. Deterministic outputs produce consistent, valid structured data.
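Here is the retry sketch referenced above; it wraps any fetch-returning function with exponential backoff and honors `Retry-After` when present:

```js
// Sketch: retry transient LLM failures (429/500/502/503) with backoff.
async function callWithRetry(makeRequest, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await makeRequest();
    if (res.ok || ![429, 500, 502, 503].includes(res.status)) return res;
    const retryAfter = Number(res.headers.get('retry-after')) * 1000; // 0 if absent
    const backoff = Math.min(1000 * 2 ** attempt, 30000); // 1s → 2s → 4s, cap 30s
    await new Promise((resolve) => setTimeout(resolve, retryAfter || backoff));
  }
  throw new Error('LLM call failed after retries');
}
```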
## CLI flags

These flags on `archal run` affect harness behavior:
| Flag | Description | Default |
|---|---|---|
| `--harness PATH` | Custom harness file or directory | None |
| `--agent-model MODEL` | Model identifier passed to the harness | Manifest `defaultModel` |
| `--api-key KEY` | API key for the model provider | From env vars |
| `-n, --runs COUNT` | Number of runs per scenario | 1 |
| `-t, --timeout SECONDS` | Timeout per run; the harness is killed when it elapses (max 3600) | 180 |
| `--seed NAME` | Override twin seed | Scenario default |
| `--rate-limit COUNT` | Max tool calls before 429 | None |
| `-q, --quiet` | Suppress non-error output | false |
| `-v, --verbose` | Enable debug logging | false |
## Example: minimal custom harness

A complete harness in ~40 lines using REST transport and the OpenAI API:
```js
// agent.mjs
const TASK = process.env.ARCHAL_ENGINE_TASK;
const MODEL = process.env.ARCHAL_ENGINE_MODEL || 'gpt-4.1';
const API_KEY = process.env.OPENAI_API_KEY;

// Discover tools from all twins
const tools = [];
for (const [key, url] of Object.entries(process.env)) {
  const match = key.match(/^ARCHAL_(\w+)_REST_URL$/);
  if (!match || !url) continue;
  const twin = match[1].toLowerCase();
  const res = await fetch(`${url}/tools`);
  for (const tool of await res.json()) {
    tools.push({
      type: 'function',
      function: {
        name: `mcp__${twin}__${tool.name}`,
        description: tool.description,
        parameters: tool.inputSchema,
      },
    });
  }
}

// Agent loop
const messages = [
  { role: 'system', content: 'You are a helpful agent. Use tools to complete the task.' },
  { role: 'user', content: TASK },
];
for (let step = 0; step < 30; step++) {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: { Authorization: `Bearer ${API_KEY}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: MODEL, messages, tools, temperature: 0, max_tokens: 16384 }),
  });
  const data = await res.json();
  const choice = data.choices[0];
  messages.push(choice.message);
  if (!choice.message.tool_calls?.length) break; // no tool calls: the agent is done
  for (const call of choice.message.tool_calls) {
    // Route mcp__{twin}__{tool} back to the owning twin's REST endpoint
    const [, twin, toolName] = call.function.name.match(/^mcp__(\w+)__(.+)$/);
    const twinUrl = process.env[`ARCHAL_${twin.toUpperCase()}_REST_URL`];
    const result = await fetch(`${twinUrl}/tools/call`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ name: toolName, arguments: JSON.parse(call.function.arguments) }),
    });
    messages.push({ role: 'tool', tool_call_id: call.id, content: JSON.stringify(await result.json()) });
  }
}
```
With a manifest:

```json
{
  "version": 1,
  "defaultModel": "gpt-4.1",
  "local": { "command": "node", "args": ["agent.mjs"] }
}
```

Run it:

```sh
archal run scenario.md --harness ./my-agent
```