ClawClone.prompt
· 17 KiB · Text
Raw
You are now implementing the real application in this workspace.
The target implementation language is LANGUAGE.
This workspace should already contain a verified LANGUAGE project skeleton with basic support for CLI, HTTP, SQLite, config parsing, tests, and build/test commands. Begin by inspecting the existing workspace and README before changing anything.
Your first task is to name the application.
The placeholder name is:
*Claw
Replace the wildcard with a short, distinctive prefix suitable for this LANGUAGE implementation. Examples of the naming style:
LispClaw
BeamClaw
LogicClaw
CrystalClaw
CobolClaw
Do not use an existing project’s name or branding. Once you choose the name, use it consistently for:
- executable name
- README title
- config directory
- default workspace directory
- log names
- test names where appropriate
If the selected name is not suitable for an executable on macOS, create a lowercase/kebab-case executable form and document the relationship. For example:
Application name: LogicClaw
Executable: logicclaw
The goal is to build a small, local-first agent runtime. It should run as a single command-line application that can load configuration, talk to a model provider, expose a CLI agent loop, execute a small set of tools through a security gate, persist memory, and write tamper-evident tool receipts.
Do not mention or depend on any external agent runtime project. Treat this as a clean-room implementation from this spec.
Core workflow:
app init
app config validate
app config show
app provider list
app provider test NAME
app tool list
app tool run NAME --json ARGS
app agent
app agent -m "What files are in this project?"
app memory search "previous topic"
app receipt verify
app estop
Use the final executable name you selected instead of `app`.
The agent must:
- accept input from the CLI channel
- send the conversation to a configured model provider
- advertise available tools to the model
- parse tool calls from the model response
- validate each tool call through a security policy
- execute approved tools
- feed tool results back into the model
- persist the final exchange, tool calls, tool results, and receipts
- return a final answer to the user
Required architecture:
The implementation must have visible separation of responsibility for these areas:
runtime
agent loop, request lifecycle, orchestration
config
config loading, validation, defaults, path expansion
providers
model provider abstraction and concrete providers
channels
CLI channel and optional HTTP/gateway channel
tools
time, file_list, file_read, file_write, shell, http, memory_search
security
autonomy levels, command/path policy, tool-risk classification
memory
SQLite persistence, or JSONL only if SQLite is impractical in LANGUAGE
receipts
tamper-evident tool-call receipts
sop
optional deterministic workflow runner
service
optional install/start/stop/status wrappers
Do not force object-oriented structure if LANGUAGE is not object-oriented. Use idiomatic LANGUAGE design, but preserve the conceptual boundaries.
Configuration:
The application must use a user-editable config file.
Default location should be based on the final app name, for example:
~/.logicclaw/config.toml
TOML is preferred. JSON, INI, S-expression, or another idiomatic config format is acceptable if TOML support is weak in LANGUAGE. If not using TOML, document why.
Minimum config shape:
workspace_dir = "~/logicclaw-workspace"
default_provider = "local"
default_model = "mock"
[security]
autonomy = "supervised"
workspace_only = true
forbidden_paths = ["/etc", "/sys", "/boot", "~/.ssh"]
forbidden_commands = ["rm", "shutdown", "reboot", "mkfs", "dd"]
audit_log = true
[providers.models.local]
kind = "mock"
model = "mock"
[providers.models.openai_compatible]
kind = "openai-compatible"
base_url = "http://localhost:1234/v1"
model = "local-model"
api_key_env = "OPENAI_API_KEY"
[channels.cli]
enabled = true
tools_allow = ["file_read", "file_list", "time", "memory_search", "shell"]
[memory]
backend = "sqlite"
path = "~/.logicclaw/memory.sqlite"
[receipts]
enabled = true
path = "~/.logicclaw/tool_receipts.log"
Adjust paths to match the application name you chose.
Config requirements:
- load defaults when keys are absent
- expand ~ and environment variables
- validate enum values
- validate that workspace exists or create it during init
- do not require API keys for mock mode
- support provider credentials by environment variable
- never print secret values in logs or config dumps
- config validate must report all detected errors in one pass when practical
Provider abstraction:
Create an idiomatic equivalent of:
Provider
name() -> string
capabilities() -> ProviderCapabilities
chat(request: ChatRequest) -> ChatResponse
ChatRequest must contain:
- system_prompt
- messages
- tools
- model
- optional temperature
- optional metadata
ChatResponse must contain:
- final_text
- tool_calls
- optional raw_provider_payload
- optional usage
Required providers:
mock
Deterministic provider used for tests. It must be able to return ordinary text and tool calls from scripted fixtures.
openai-compatible
Sends requests to an OpenAI-compatible /chat/completions endpoint. Full support for every provider is not required. Implement non-streaming chat completion. Tool/function call support is required if reasonably practical in LANGUAGE; otherwise document the limitation clearly.
Optional providers:
reliable
Wrapper provider that tries provider names in order and falls back on network/auth/timeout errors.
router
Wrapper provider that chooses a provider from request metadata hints.
Channel abstraction:
Create an idiomatic equivalent of:
Channel
name() -> string
start(runtime_handle)
send(conversation_id, message)
supports_draft_updates() -> bool
Required channel:
cli
CLI behavior:
app agent
starts a REPL
app agent -m "message"
runs one turn and exits
REPL commands:
/exit
exits
/tools
lists active tools
/memory <query>
searches memory
/policy
prints current autonomy and workspace boundary
Optional gateway channel:
localhost HTTP server
Minimum optional gateway endpoints:
GET /health
GET /status
GET /tools
POST /chat
GET /memory/search?q=...
GET /receipts
POST /estop
Tool abstraction:
Create an idiomatic equivalent of:
Tool
name() -> string
description() -> string
parameters_schema() -> JSON Schema object or equivalent metadata
risk(args, context) -> low | medium | high
invoke(args, context) -> ToolResult
ToolResult must contain:
- success: bool
- output: string
- optional error: string
- optional metadata
- optional receipt_id
Required built-in tools:
time
Returns current local time, UTC time, and timezone if available.
file_list
Lists files under a path inside workspace.
file_read
Reads a UTF-8 text file inside workspace.
file_write
Writes a UTF-8 text file inside workspace.
shell
Executes a shell command inside workspace, subject to security policy.
http
Performs HTTP GET. POST is optional.
memory_search
Searches persisted conversations.
Optional tools:
web_search
May be stubbed unless a search API key is configured.
pdf_extract
Optional.
ask_user
In CLI mode, asks the user a question and returns the answer.
Security model:
Implement three autonomy levels:
readonly
Low-risk read-only tools allowed.
No file_write.
No shell execution except optionally harmless commands such as pwd.
supervised
Low-risk tools run automatically.
Medium-risk tools require operator approval.
High-risk tools are blocked.
full
Low and medium run automatically.
High-risk is still blocked if explicitly forbidden by path or command policy.
Default must be:
supervised
Risk rules:
time, memory_search, file_list, file_read inside workspace:
low
http GET to allowed domains:
low
file_write inside workspace:
medium
shell command from allowlist:
medium
shell command not on allowlist:
high
any path outside workspace when workspace_only = true:
blocked
any path under forbidden_paths:
blocked
any command whose basename appears in forbidden_commands:
blocked
Any shell command containing obvious destructive patterns must be blocked. Minimum patterns:
rm -rf /
rm -rf *
mkfs
dd if=
:(){ :|:& };:
shutdown
reboot
chmod -R 777 /
chown -R
curl ... | sh
wget ... | sh
Approval flow in CLI mode:
When a medium-risk action requires approval, print something like:
Tool request:
tool: file_write
risk: medium
reason: writes to workspace
args: ...
Approve? [y/N]
Default is deny.
Tool receipts:
Every attempted tool invocation must produce a receipt whether it is allowed, denied, failed, or approved.
Receipt fields:
{
"id": "receipt-...",
"timestamp": "2026-05-12T14:00:00Z",
"conversation_id": "...",
"tool": "file_read",
"args_hash": "...",
"result_hash": "...",
"status": "allowed|denied|failed",
"risk": "low|medium|high",
"previous_hash": "...",
"receipt_hash": "..."
}
Receipt hash:
receipt_hash = SHA256(canonical_json(receipt_without_receipt_hash))
Tamper-evident chain:
- each receipt includes the previous receipt’s hash
- receipt verify must replay the log
- it must report the first broken link
Optional stronger version:
HMAC-SHA256 with a locally stored secret key
Memory:
Use SQLite if practical in LANGUAGE. Use JSONL only if SQLite support is impractical or broken.
Persist:
- conversation_id
- turn_id
- timestamp
- role
- content
- tool_calls
- tool_results
- provider
- model
- metadata
Required commands:
app memory search QUERY
app memory show CONVERSATION_ID
app memory list
app memory clear --yes
Search may be simple substring search.
Optional scoring:
- tokenize query and content
- rank by term frequency
- boost recent conversations
Agent loop:
Implement this loop:
1. Receive user message from channel.
2. Create or resume conversation.
3. Load recent memory context.
4. Build system prompt.
5. Build tool schemas from active tools.
6. Call provider.
7. If provider returns text only, persist and reply.
8. If provider returns tool calls:
a. For each tool call, classify risk.
b. Validate policy.
c. Ask approval when required.
d. Invoke or deny.
e. Write receipt.
f. Persist tool call and result.
9. Send tool results back to provider.
10. Repeat until final text or max_tool_rounds is reached.
11. Persist final assistant response.
12. Reply to channel.
Guardrails:
max_tool_rounds default:
5
max_response_bytes default:
1 MB
tool execution timeout default:
30 seconds
shell timeout default:
15 seconds
HTTP timeout default:
20 seconds
The runtime must not recursively invoke tools forever.
Required CLI command surface:
app init
app onboard
app config validate
app config show
app provider list
app provider test NAME
app tool list
app tool run NAME --json ARGS
app agent
app agent -m MESSAGE
app memory list
app memory search QUERY
app memory show CONVERSATION_ID
app receipt list
app receipt verify
app estop
Optional commands:
app service install
app service start
app service stop
app service status
app sop list
app sop validate
app sop run NAME
app plugin list
app plugin install PATH
SOP engine, optional but valuable:
Implement deterministic workflows loaded from:
~/.appname/workspace/sops/<name>/SOP.toml
Minimum SOP format:
name = "daily-check"
description = "Run a daily workspace check"
[[steps]]
id = "list"
kind = "tool"
tool = "file_list"
args = { path = "." }
[[steps]]
id = "summarize"
kind = "agent"
prompt = "Summarize the file list from the previous step."
[[steps]]
id = "approval"
kind = "approval"
prompt = "Continue to write report?"
[[steps]]
id = "write"
kind = "tool"
tool = "file_write"
args = { path = "daily-check.txt", content_from = "summarize" }
Requirements:
- validate step IDs are unique
- validate referenced tools exist
- persist SOP run state
- stop at approval steps until approved
- support on_failure = "abort"
- support on_failure = "continue"
Plugin system, stretch goal:
A plugin is a directory:
plugin-name/
manifest.toml
executable-or-script
Minimum manifest:
name = "echo-plugin"
version = "0.1.0"
capabilities = ["tool"]
[[tools]]
name = "echo"
description = "Echoes input"
command = "./echo-plugin"
schema = { type = "object" }
The runtime discovers plugins under:
~/.appname/plugins/
Simpler acceptable version:
Support external process tools where the runtime invokes a configured executable with JSON on stdin and reads JSON from stdout.
Observability:
Minimum logging:
- human-readable logs to stderr
- structured JSON logs when APPNAME_LOG=json, adjusted to the executable name
- never log secrets
Log events:
- startup
- config path
- workspace path
- provider selected
- channel started
- conversation started
- tool requested
- tool approved
- tool denied
- tool completed
- tool failed
- receipt written
- memory persisted
- estop triggered
Optional metrics endpoint:
GET /metrics
Expose counters if the endpoint is implemented:
app_conversations_total
app_tool_calls_total
app_tool_denials_total
app_provider_errors_total
app_receipt_chain_valid
Emergency stop:
app estop
Creates:
~/.appname/ESTOP
When this file exists:
- no new tool calls may run
- existing long-running shell/http tasks should be cancelled if possible
- the agent may still answer text-only messages explaining that tool use is stopped
app estop --clear
Removes the file.
Acceptance tests:
Test 1: init creates expected files.
Given no ~/.appname directory
When app init runs
Then ~/.appname/config file exists
And memory database or memory JSONL exists
And workspace_dir exists
Test 2: config validation catches invalid autonomy.
Given autonomy = "godmode"
When app config validate runs
Then exit code is nonzero
And output mentions allowed values
Test 3: mock provider text-only response.
Given mock provider fixture returns "hello"
When app agent -m "hi" runs
Then stdout contains "hello"
And memory contains the user and assistant turn
Test 4: model-triggered file_list tool.
Given mock provider fixture emits tool_call file_list { path = "." }
When app agent -m "list files" runs
Then file_list executes inside workspace
And a tool receipt is written
And final answer includes the file list summary
Test 5: workspace escape blocked.
Given workspace_only = true
When model requests file_read { path = "/etc/passwd" }
Then tool is denied
And a denied receipt is written
And the provider receives a tool error
Test 6: supervised approval.
Given autonomy = "supervised"
When model requests file_write
Then CLI asks for approval
And default empty answer denies
And "y" approves
Test 7: forbidden command blocked.
When model requests shell { command = "rm -rf /" }
Then tool is blocked before execution
And receipt status is denied
Test 8: receipt chain detects tampering.
Given three receipts exist
When the second receipt is edited manually
Then app receipt verify reports invalid chain at receipt 2
Test 9: provider fallback, if reliable provider is implemented.
Given reliable provider = [bad_provider, mock_provider]
And bad_provider times out
When agent runs
Then runtime logs fallback
And response comes from mock_provider
Test 10: memory search.
Given a previous conversation contains "Aardvark adapter"
When app memory search "aardvark" runs
Then the previous conversation ID is returned
Implementation priorities:
First produce a working vertical slice:
1. application naming
2. init
3. config loading and validation
4. mock provider
5. CLI one-shot agent mode
6. tools: time, file_list, file_read
7. security policy for workspace paths
8. memory persistence
9. receipt writing and verification
10. tests
Then add:
11. interactive REPL
12. file_write with approval
13. shell with blocking rules
14. HTTP GET tool
15. OpenAI-compatible provider
16. optional gateway
17. optional SOP engine
18. optional external-process plugins
Quality requirements:
- Keep the implementation idiomatic for LANGUAGE.
- Do not quietly substitute another implementation language.
- Do not use Python, JavaScript, Rust, or C as the primary implementation language.
- Shell scripts are acceptable only for setup convenience.
- Prefer simple, boring dependencies.
- Write tests for denied actions, not just successful actions.
- Keep secrets out of logs.
- Keep workspace path handling strict and well-tested.
- Use deterministic mock fixtures so tests do not require network access.
- Update README.md with architecture, config, security policy, commands, and test instructions.
Do not stop after creating stubs. Implement the core behavior. If a feature is not practical in LANGUAGE, document the limitation and implement the closest useful equivalent.
| 1 | You are now implementing the real application in this workspace. |
| 2 | |
| 3 | The target implementation language is LANGUAGE. |
| 4 | |
| 5 | This workspace should already contain a verified LANGUAGE project skeleton with basic support for CLI, HTTP, SQLite, config parsing, tests, and build/test commands. Begin by inspecting the existing workspace and README before changing anything. |
| 6 | |
| 7 | Your first task is to name the application. |
| 8 | |
| 9 | The placeholder name is: |
| 10 | |
| 11 | *Claw |
| 12 | |
| 13 | Replace the wildcard with a short, distinctive prefix suitable for this LANGUAGE implementation. Examples of the naming style: |
| 14 | |
| 15 | LispClaw |
| 16 | BeamClaw |
| 17 | LogicClaw |
| 18 | CrystalClaw |
| 19 | CobolClaw |
| 20 | |
| 21 | Do not use an existing project’s name or branding. Once you choose the name, use it consistently for: |
| 22 | |
| 23 | - executable name |
| 24 | - README title |
| 25 | - config directory |
| 26 | - default workspace directory |
| 27 | - log names |
| 28 | - test names where appropriate |
| 29 | |
| 30 | If the selected name is not suitable for an executable on macOS, create a lowercase/kebab-case executable form and document the relationship. For example: |
| 31 | |
| 32 | Application name: LogicClaw |
| 33 | Executable: logicclaw |
| 34 | |
| 35 | The goal is to build a small, local-first agent runtime. It should run as a single command-line application that can load configuration, talk to a model provider, expose a CLI agent loop, execute a small set of tools through a security gate, persist memory, and write tamper-evident tool receipts. |
| 36 | |
| 37 | Do not mention or depend on any external agent runtime project. Treat this as a clean-room implementation from this spec. |
| 38 | |
| 39 | Core workflow: |
| 40 | |
| 41 | app init |
| 42 | app config validate |
| 43 | app config show |
| 44 | app provider list |
| 45 | app provider test NAME |
| 46 | app tool list |
| 47 | app tool run NAME --json ARGS |
| 48 | app agent |
| 49 | app agent -m "What files are in this project?" |
| 50 | app memory search "previous topic" |
| 51 | app receipt verify |
| 52 | app estop |
| 53 | |
| 54 | Use the final executable name you selected instead of `app`. |
| 55 | |
| 56 | The agent must: |
| 57 | |
| 58 | - accept input from the CLI channel |
| 59 | - send the conversation to a configured model provider |
| 60 | - advertise available tools to the model |
| 61 | - parse tool calls from the model response |
| 62 | - validate each tool call through a security policy |
| 63 | - execute approved tools |
| 64 | - feed tool results back into the model |
| 65 | - persist the final exchange, tool calls, tool results, and receipts |
| 66 | - return a final answer to the user |
| 67 | |
| 68 | Required architecture: |
| 69 | |
| 70 | The implementation must have visible separation of responsibility for these areas: |
| 71 | |
| 72 | runtime |
| 73 | agent loop, request lifecycle, orchestration |
| 74 | |
| 75 | config |
| 76 | config loading, validation, defaults, path expansion |
| 77 | |
| 78 | providers |
| 79 | model provider abstraction and concrete providers |
| 80 | |
| 81 | channels |
| 82 | CLI channel and optional HTTP/gateway channel |
| 83 | |
| 84 | tools |
| 85 | time, file_list, file_read, file_write, shell, http, memory_search |
| 86 | |
| 87 | security |
| 88 | autonomy levels, command/path policy, tool-risk classification |
| 89 | |
| 90 | memory |
| 91 | SQLite persistence, or JSONL only if SQLite is impractical in LANGUAGE |
| 92 | |
| 93 | receipts |
| 94 | tamper-evident tool-call receipts |
| 95 | |
| 96 | sop |
| 97 | optional deterministic workflow runner |
| 98 | |
| 99 | service |
| 100 | optional install/start/stop/status wrappers |
| 101 | |
| 102 | Do not force object-oriented structure if LANGUAGE is not object-oriented. Use idiomatic LANGUAGE design, but preserve the conceptual boundaries. |
| 103 | |
| 104 | Configuration: |
| 105 | |
| 106 | The application must use a user-editable config file. |
| 107 | |
| 108 | Default location should be based on the final app name, for example: |
| 109 | |
| 110 | ~/.logicclaw/config.toml |
| 111 | |
| 112 | TOML is preferred. JSON, INI, S-expression, or another idiomatic config format is acceptable if TOML support is weak in LANGUAGE. If not using TOML, document why. |
| 113 | |
| 114 | Minimum config shape: |
| 115 | |
| 116 | workspace_dir = "~/logicclaw-workspace" |
| 117 | default_provider = "local" |
| 118 | default_model = "mock" |
| 119 | |
| 120 | [security] |
| 121 | autonomy = "supervised" |
| 122 | workspace_only = true |
| 123 | forbidden_paths = ["/etc", "/sys", "/boot", "~/.ssh"] |
| 124 | forbidden_commands = ["rm", "shutdown", "reboot", "mkfs", "dd"] |
| 125 | audit_log = true |
| 126 | |
| 127 | [providers.models.local] |
| 128 | kind = "mock" |
| 129 | model = "mock" |
| 130 | |
| 131 | [providers.models.openai_compatible] |
| 132 | kind = "openai-compatible" |
| 133 | base_url = "http://localhost:1234/v1" |
| 134 | model = "local-model" |
| 135 | api_key_env = "OPENAI_API_KEY" |
| 136 | |
| 137 | [channels.cli] |
| 138 | enabled = true |
| 139 | tools_allow = ["file_read", "file_list", "time", "memory_search", "shell"] |
| 140 | |
| 141 | [memory] |
| 142 | backend = "sqlite" |
| 143 | path = "~/.logicclaw/memory.sqlite" |
| 144 | |
| 145 | [receipts] |
| 146 | enabled = true |
| 147 | path = "~/.logicclaw/tool_receipts.log" |
| 148 | |
| 149 | Adjust paths to match the application name you chose. |
| 150 | |
| 151 | Config requirements: |
| 152 | |
| 153 | - load defaults when keys are absent |
| 154 | - expand ~ and environment variables |
| 155 | - validate enum values |
| 156 | - validate that workspace exists or create it during init |
| 157 | - do not require API keys for mock mode |
| 158 | - support provider credentials by environment variable |
| 159 | - never print secret values in logs or config dumps |
| 160 | - config validate must report all detected errors in one pass when practical |
| 161 | |
| 162 | Provider abstraction: |
| 163 | |
| 164 | Create an idiomatic equivalent of: |
| 165 | |
| 166 | Provider |
| 167 | name() -> string |
| 168 | capabilities() -> ProviderCapabilities |
| 169 | chat(request: ChatRequest) -> ChatResponse |
| 170 | |
| 171 | ChatRequest must contain: |
| 172 | |
| 173 | - system_prompt |
| 174 | - messages |
| 175 | - tools |
| 176 | - model |
| 177 | - optional temperature |
| 178 | - optional metadata |
| 179 | |
| 180 | ChatResponse must contain: |
| 181 | |
| 182 | - final_text |
| 183 | - tool_calls |
| 184 | - optional raw_provider_payload |
| 185 | - optional usage |
| 186 | |
| 187 | Required providers: |
| 188 | |
| 189 | mock |
| 190 | Deterministic provider used for tests. It must be able to return ordinary text and tool calls from scripted fixtures. |
| 191 | |
| 192 | openai-compatible |
| 193 | Sends requests to an OpenAI-compatible /chat/completions endpoint. Full support for every provider is not required. Implement non-streaming chat completion. Tool/function call support is required if reasonably practical in LANGUAGE; otherwise document the limitation clearly. |
| 194 | |
| 195 | Optional providers: |
| 196 | |
| 197 | reliable |
| 198 | Wrapper provider that tries provider names in order and falls back on network/auth/timeout errors. |
| 199 | |
| 200 | router |
| 201 | Wrapper provider that chooses a provider from request metadata hints. |
| 202 | |
| 203 | Channel abstraction: |
| 204 | |
| 205 | Create an idiomatic equivalent of: |
| 206 | |
| 207 | Channel |
| 208 | name() -> string |
| 209 | start(runtime_handle) |
| 210 | send(conversation_id, message) |
| 211 | supports_draft_updates() -> bool |
| 212 | |
| 213 | Required channel: |
| 214 | |
| 215 | cli |
| 216 | |
| 217 | CLI behavior: |
| 218 | |
| 219 | app agent |
| 220 | starts a REPL |
| 221 | |
| 222 | app agent -m "message" |
| 223 | runs one turn and exits |
| 224 | |
| 225 | REPL commands: |
| 226 | |
| 227 | /exit |
| 228 | exits |
| 229 | |
| 230 | /tools |
| 231 | lists active tools |
| 232 | |
| 233 | /memory <query> |
| 234 | searches memory |
| 235 | |
| 236 | /policy |
| 237 | prints current autonomy and workspace boundary |
| 238 | |
| 239 | Optional gateway channel: |
| 240 | |
| 241 | localhost HTTP server |
| 242 | |
| 243 | Minimum optional gateway endpoints: |
| 244 | |
| 245 | GET /health |
| 246 | GET /status |
| 247 | GET /tools |
| 248 | POST /chat |
| 249 | GET /memory/search?q=... |
| 250 | GET /receipts |
| 251 | POST /estop |
| 252 | |
| 253 | Tool abstraction: |
| 254 | |
| 255 | Create an idiomatic equivalent of: |
| 256 | |
| 257 | Tool |
| 258 | name() -> string |
| 259 | description() -> string |
| 260 | parameters_schema() -> JSON Schema object or equivalent metadata |
| 261 | risk(args, context) -> low | medium | high |
| 262 | invoke(args, context) -> ToolResult |
| 263 | |
| 264 | ToolResult must contain: |
| 265 | |
| 266 | - success: bool |
| 267 | - output: string |
| 268 | - optional error: string |
| 269 | - optional metadata |
| 270 | - optional receipt_id |
| 271 | |
| 272 | Required built-in tools: |
| 273 | |
| 274 | time |
| 275 | Returns current local time, UTC time, and timezone if available. |
| 276 | |
| 277 | file_list |
| 278 | Lists files under a path inside workspace. |
| 279 | |
| 280 | file_read |
| 281 | Reads a UTF-8 text file inside workspace. |
| 282 | |
| 283 | file_write |
| 284 | Writes a UTF-8 text file inside workspace. |
| 285 | |
| 286 | shell |
| 287 | Executes a shell command inside workspace, subject to security policy. |
| 288 | |
| 289 | http |
| 290 | Performs HTTP GET. POST is optional. |
| 291 | |
| 292 | memory_search |
| 293 | Searches persisted conversations. |
| 294 | |
| 295 | Optional tools: |
| 296 | |
| 297 | web_search |
| 298 | May be stubbed unless a search API key is configured. |
| 299 | |
| 300 | pdf_extract |
| 301 | Optional. |
| 302 | |
| 303 | ask_user |
| 304 | In CLI mode, asks the user a question and returns the answer. |
| 305 | |
| 306 | Security model: |
| 307 | |
| 308 | Implement three autonomy levels: |
| 309 | |
| 310 | readonly |
| 311 | Low-risk read-only tools allowed. |
| 312 | No file_write. |
| 313 | No shell execution except optionally harmless commands such as pwd. |
| 314 | |
| 315 | supervised |
| 316 | Low-risk tools run automatically. |
| 317 | Medium-risk tools require operator approval. |
| 318 | High-risk tools are blocked. |
| 319 | |
| 320 | full |
| 321 | Low and medium run automatically. |
| 322 | High-risk is still blocked if explicitly forbidden by path or command policy. |
| 323 | |
| 324 | Default must be: |
| 325 | |
| 326 | supervised |
| 327 | |
| 328 | Risk rules: |
| 329 | |
| 330 | time, memory_search, file_list, file_read inside workspace: |
| 331 | low |
| 332 | |
| 333 | http GET to allowed domains: |
| 334 | low |
| 335 | |
| 336 | file_write inside workspace: |
| 337 | medium |
| 338 | |
| 339 | shell command from allowlist: |
| 340 | medium |
| 341 | |
| 342 | shell command not on allowlist: |
| 343 | high |
| 344 | |
| 345 | any path outside workspace when workspace_only = true: |
| 346 | blocked |
| 347 | |
| 348 | any path under forbidden_paths: |
| 349 | blocked |
| 350 | |
| 351 | any command whose basename appears in forbidden_commands: |
| 352 | blocked |
| 353 | |
| 354 | Any shell command containing obvious destructive patterns must be blocked. Minimum patterns: |
| 355 | |
| 356 | rm -rf / |
| 357 | rm -rf * |
| 358 | mkfs |
| 359 | dd if= |
| 360 | :(){ :|:& };: |
| 361 | shutdown |
| 362 | reboot |
| 363 | chmod -R 777 / |
| 364 | chown -R |
| 365 | curl ... | sh |
| 366 | wget ... | sh |
| 367 | |
| 368 | Approval flow in CLI mode: |
| 369 | |
| 370 | When a medium-risk action requires approval, print something like: |
| 371 | |
| 372 | Tool request: |
| 373 | tool: file_write |
| 374 | risk: medium |
| 375 | reason: writes to workspace |
| 376 | args: ... |
| 377 | Approve? [y/N] |
| 378 | |
| 379 | Default is deny. |
| 380 | |
| 381 | Tool receipts: |
| 382 | |
| 383 | Every attempted tool invocation must produce a receipt whether it is allowed, denied, failed, or approved. |
| 384 | |
| 385 | Receipt fields: |
| 386 | |
| 387 | { |
| 388 | "id": "receipt-...", |
| 389 | "timestamp": "2026-05-12T14:00:00Z", |
| 390 | "conversation_id": "...", |
| 391 | "tool": "file_read", |
| 392 | "args_hash": "...", |
| 393 | "result_hash": "...", |
| 394 | "status": "allowed|denied|failed", |
| 395 | "risk": "low|medium|high", |
| 396 | "previous_hash": "...", |
| 397 | "receipt_hash": "..." |
| 398 | } |
| 399 | |
| 400 | Receipt hash: |
| 401 | |
| 402 | receipt_hash = SHA256(canonical_json(receipt_without_receipt_hash)) |
| 403 | |
| 404 | Tamper-evident chain: |
| 405 | |
| 406 | - each receipt includes the previous receipt’s hash |
| 407 | - receipt verify must replay the log |
| 408 | - it must report the first broken link |
| 409 | |
| 410 | Optional stronger version: |
| 411 | |
| 412 | HMAC-SHA256 with a locally stored secret key |
| 413 | |
| 414 | Memory: |
| 415 | |
| 416 | Use SQLite if practical in LANGUAGE. Use JSONL only if SQLite support is impractical or broken. |
| 417 | |
| 418 | Persist: |
| 419 | |
| 420 | - conversation_id |
| 421 | - turn_id |
| 422 | - timestamp |
| 423 | - role |
| 424 | - content |
| 425 | - tool_calls |
| 426 | - tool_results |
| 427 | - provider |
| 428 | - model |
| 429 | - metadata |
| 430 | |
| 431 | Required commands: |
| 432 | |
| 433 | app memory search QUERY |
| 434 | app memory show CONVERSATION_ID |
| 435 | app memory list |
| 436 | app memory clear --yes |
| 437 | |
| 438 | Search may be simple substring search. |
| 439 | |
| 440 | Optional scoring: |
| 441 | |
| 442 | - tokenize query and content |
| 443 | - rank by term frequency |
| 444 | - boost recent conversations |
| 445 | |
| 446 | Agent loop: |
| 447 | |
| 448 | Implement this loop: |
| 449 | |
| 450 | 1. Receive user message from channel. |
| 451 | 2. Create or resume conversation. |
| 452 | 3. Load recent memory context. |
| 453 | 4. Build system prompt. |
| 454 | 5. Build tool schemas from active tools. |
| 455 | 6. Call provider. |
| 456 | 7. If provider returns text only, persist and reply. |
| 457 | 8. If provider returns tool calls: |
| 458 | a. For each tool call, classify risk. |
| 459 | b. Validate policy. |
| 460 | c. Ask approval when required. |
| 461 | d. Invoke or deny. |
| 462 | e. Write receipt. |
| 463 | f. Persist tool call and result. |
| 464 | 9. Send tool results back to provider. |
| 465 | 10. Repeat until final text or max_tool_rounds is reached. |
| 466 | 11. Persist final assistant response. |
| 467 | 12. Reply to channel. |
| 468 | |
| 469 | Guardrails: |
| 470 | |
| 471 | max_tool_rounds default: |
| 472 | 5 |
| 473 | |
| 474 | max_response_bytes default: |
| 475 | 1 MB |
| 476 | |
| 477 | tool execution timeout default: |
| 478 | 30 seconds |
| 479 | |
| 480 | shell timeout default: |
| 481 | 15 seconds |
| 482 | |
| 483 | HTTP timeout default: |
| 484 | 20 seconds |
| 485 | |
| 486 | The runtime must not recursively invoke tools forever. |
| 487 | |
| 488 | Required CLI command surface: |
| 489 | |
| 490 | app init |
| 491 | app onboard |
| 492 | app config validate |
| 493 | app config show |
| 494 | app provider list |
| 495 | app provider test NAME |
| 496 | app tool list |
| 497 | app tool run NAME --json ARGS |
| 498 | app agent |
| 499 | app agent -m MESSAGE |
| 500 | app memory list |
| 501 | app memory search QUERY |
| 502 | app memory show CONVERSATION_ID |
| 503 | app receipt list |
| 504 | app receipt verify |
| 505 | app estop |
| 506 | |
| 507 | Optional commands: |
| 508 | |
| 509 | app service install |
| 510 | app service start |
| 511 | app service stop |
| 512 | app service status |
| 513 | app sop list |
| 514 | app sop validate |
| 515 | app sop run NAME |
| 516 | app plugin list |
| 517 | app plugin install PATH |
| 518 | |
| 519 | SOP engine, optional but valuable: |
| 520 | |
| 521 | Implement deterministic workflows loaded from: |
| 522 | |
| 523 | ~/.appname/workspace/sops/<name>/SOP.toml |
| 524 | |
| 525 | Minimum SOP format: |
| 526 | |
| 527 | name = "daily-check" |
| 528 | description = "Run a daily workspace check" |
| 529 | |
| 530 | [[steps]] |
| 531 | id = "list" |
| 532 | kind = "tool" |
| 533 | tool = "file_list" |
| 534 | args = { path = "." } |
| 535 | |
| 536 | [[steps]] |
| 537 | id = "summarize" |
| 538 | kind = "agent" |
| 539 | prompt = "Summarize the file list from the previous step." |
| 540 | |
| 541 | [[steps]] |
| 542 | id = "approval" |
| 543 | kind = "approval" |
| 544 | prompt = "Continue to write report?" |
| 545 | |
| 546 | [[steps]] |
| 547 | id = "write" |
| 548 | kind = "tool" |
| 549 | tool = "file_write" |
| 550 | args = { path = "daily-check.txt", content_from = "summarize" } |
| 551 | |
| 552 | Requirements: |
| 553 | |
| 554 | - validate step IDs are unique |
| 555 | - validate referenced tools exist |
| 556 | - persist SOP run state |
| 557 | - stop at approval steps until approved |
| 558 | - support on_failure = "abort" |
| 559 | - support on_failure = "continue" |
| 560 | |
| 561 | Plugin system, stretch goal: |
| 562 | |
| 563 | A plugin is a directory: |
| 564 | |
| 565 | plugin-name/ |
| 566 | manifest.toml |
| 567 | executable-or-script |
| 568 | |
| 569 | Minimum manifest: |
| 570 | |
| 571 | name = "echo-plugin" |
| 572 | version = "0.1.0" |
| 573 | capabilities = ["tool"] |
| 574 | |
| 575 | [[tools]] |
| 576 | name = "echo" |
| 577 | description = "Echoes input" |
| 578 | command = "./echo-plugin" |
| 579 | schema = { type = "object" } |
| 580 | |
| 581 | The runtime discovers plugins under: |
| 582 | |
| 583 | ~/.appname/plugins/ |
| 584 | |
| 585 | Simpler acceptable version: |
| 586 | |
| 587 | Support external process tools where the runtime invokes a configured executable with JSON on stdin and reads JSON from stdout. |
| 588 | |
| 589 | Observability: |
| 590 | |
| 591 | Minimum logging: |
| 592 | |
| 593 | - human-readable logs to stderr |
| 594 | - structured JSON logs when APPNAME_LOG=json, adjusted to the executable name |
| 595 | - never log secrets |
| 596 | |
| 597 | Log events: |
| 598 | |
| 599 | - startup |
| 600 | - config path |
| 601 | - workspace path |
| 602 | - provider selected |
| 603 | - channel started |
| 604 | - conversation started |
| 605 | - tool requested |
| 606 | - tool approved |
| 607 | - tool denied |
| 608 | - tool completed |
| 609 | - tool failed |
| 610 | - receipt written |
| 611 | - memory persisted |
| 612 | - estop triggered |
| 613 | |
| 614 | Optional metrics endpoint: |
| 615 | |
| 616 | GET /metrics |
| 617 | |
| 618 | Expose counters if the endpoint is implemented: |
| 619 | |
| 620 | app_conversations_total |
| 621 | app_tool_calls_total |
| 622 | app_tool_denials_total |
| 623 | app_provider_errors_total |
| 624 | app_receipt_chain_valid |
| 625 | |
| 626 | Emergency stop: |
| 627 | |
| 628 | app estop |
| 629 | |
| 630 | Creates: |
| 631 | |
| 632 | ~/.appname/ESTOP |
| 633 | |
| 634 | When this file exists: |
| 635 | |
| 636 | - no new tool calls may run |
| 637 | - existing long-running shell/http tasks should be cancelled if possible |
| 638 | - the agent may still answer text-only messages explaining that tool use is stopped |
| 639 | |
| 640 | app estop --clear |
| 641 | |
| 642 | Removes the file. |
| 643 | |
| 644 | Acceptance tests: |
| 645 | |
| 646 | Test 1: init creates expected files. |
| 647 | |
| 648 | Given no ~/.appname directory |
| 649 | When app init runs |
| 650 | Then ~/.appname/config file exists |
| 651 | And memory database or memory JSONL exists |
| 652 | And workspace_dir exists |
| 653 | |
| 654 | Test 2: config validation catches invalid autonomy. |
| 655 | |
| 656 | Given autonomy = "godmode" |
| 657 | When app config validate runs |
| 658 | Then exit code is nonzero |
| 659 | And output mentions allowed values |
| 660 | |
| 661 | Test 3: mock provider text-only response. |
| 662 | |
| 663 | Given mock provider fixture returns "hello" |
| 664 | When app agent -m "hi" runs |
| 665 | Then stdout contains "hello" |
| 666 | And memory contains the user and assistant turn |
| 667 | |
| 668 | Test 4: model-triggered file_list tool. |
| 669 | |
| 670 | Given mock provider fixture emits tool_call file_list { path = "." } |
| 671 | When app agent -m "list files" runs |
| 672 | Then file_list executes inside workspace |
| 673 | And a tool receipt is written |
| 674 | And final answer includes the file list summary |
| 675 | |
| 676 | Test 5: workspace escape blocked. |
| 677 | |
| 678 | Given workspace_only = true |
| 679 | When model requests file_read { path = "/etc/passwd" } |
| 680 | Then tool is denied |
| 681 | And a denied receipt is written |
| 682 | And the provider receives a tool error |
| 683 | |
| 684 | Test 6: supervised approval. |
| 685 | |
| 686 | Given autonomy = "supervised" |
| 687 | When model requests file_write |
| 688 | Then CLI asks for approval |
| 689 | And default empty answer denies |
| 690 | And "y" approves |
| 691 | |
| 692 | Test 7: forbidden command blocked. |
| 693 | |
| 694 | When model requests shell { command = "rm -rf /" } |
| 695 | Then tool is blocked before execution |
| 696 | And receipt status is denied |
| 697 | |
| 698 | Test 8: receipt chain detects tampering. |
| 699 | |
| 700 | Given three receipts exist |
| 701 | When the second receipt is edited manually |
| 702 | Then app receipt verify reports invalid chain at receipt 2 |
| 703 | |
| 704 | Test 9: provider fallback, if reliable provider is implemented. |
| 705 | |
| 706 | Given reliable provider = [bad_provider, mock_provider] |
| 707 | And bad_provider times out |
| 708 | When agent runs |
| 709 | Then runtime logs fallback |
| 710 | And response comes from mock_provider |
| 711 | |
| 712 | Test 10: memory search. |
| 713 | |
| 714 | Given a previous conversation contains "Aardvark adapter" |
| 715 | When app memory search "aardvark" runs |
| 716 | Then the previous conversation ID is returned |
| 717 | |
| 718 | Implementation priorities: |
| 719 | |
| 720 | First produce a working vertical slice: |
| 721 | |
| 722 | 1. application naming |
| 723 | 2. init |
| 724 | 3. config loading and validation |
| 725 | 4. mock provider |
| 726 | 5. CLI one-shot agent mode |
| 727 | 6. tools: time, file_list, file_read |
| 728 | 7. security policy for workspace paths |
| 729 | 8. memory persistence |
| 730 | 9. receipt writing and verification |
| 731 | 10. tests |
| 732 | |
| 733 | Then add: |
| 734 | |
| 735 | 11. interactive REPL |
| 736 | 12. file_write with approval |
| 737 | 13. shell with blocking rules |
| 738 | 14. HTTP GET tool |
| 739 | 15. OpenAI-compatible provider |
| 740 | 16. optional gateway |
| 741 | 17. optional SOP engine |
| 742 | 18. optional external-process plugins |
| 743 | |
| 744 | Quality requirements: |
| 745 | |
| 746 | - Keep the implementation idiomatic for LANGUAGE. |
| 747 | - Do not quietly substitute another implementation language. |
| 748 | - Do not use Python, JavaScript, Rust, or C as the primary implementation language. |
| 749 | - Shell scripts are acceptable only for setup convenience. |
| 750 | - Prefer simple, boring dependencies. |
| 751 | - Write tests for denied actions, not just successful actions. |
| 752 | - Keep secrets out of logs. |
| 753 | - Keep workspace path handling strict and well-tested. |
| 754 | - Use deterministic mock fixtures so tests do not require network access. |
| 755 | - Update README.md with architecture, config, security policy, commands, and test instructions. |
| 756 | |
| 757 | Do not stop after creating stubs. Implement the core behavior. If a feature is not practical in LANGUAGE, document the limitation and implement the closest useful equivalent. |