cadwaladyr revised this gist 3 weeks ago.
1 file changed, 1 insertion, 7 deletions
ClawClone.prompt
| @@ -10,13 +10,7 @@ The placeholder name is: | |||
| 10 | 10 | ||
| 11 | 11 | *Claw | |
| 12 | 12 | ||
| 13 | - | Replace the wildcard with a short, distinctive prefix suitable for this LANGUAGE implementation. Examples of the naming style: | |
| 14 | - | ||
| 15 | - | LispClaw | |
| 16 | - | BeamClaw | |
| 17 | - | LogicClaw | |
| 18 | - | CrystalClaw | |
| 19 | - | CobolClaw | |
| 13 | + | Replace the wildcard with a short, distinctive prefix suitable for this implementation. | |
| 20 | 14 | ||
| 21 | 15 | Do not use an existing project’s name or branding. Once you choose the name, use it consistently for: | |
| 22 | 16 | ||
cadwaladyr revised this gist 3 weeks ago.
1 file changed, 757 insertions
ClawClone.prompt(file created)
| @@ -0,0 +1,757 @@ | |||
| 1 | + | You are now implementing the real application in this workspace. | |
| 2 | + | ||
| 3 | + | The target implementation language is LANGUAGE. | |
| 4 | + | ||
| 5 | + | This workspace should already contain a verified LANGUAGE project skeleton with basic support for CLI, HTTP, SQLite, config parsing, tests, and build/test commands. Begin by inspecting the existing workspace and README before changing anything. | |
| 6 | + | ||
| 7 | + | Your first task is to name the application. | |
| 8 | + | ||
| 9 | + | The placeholder name is: | |
| 10 | + | ||
| 11 | + | *Claw | |
| 12 | + | ||
| 13 | + | Replace the wildcard with a short, distinctive prefix suitable for this LANGUAGE implementation. Examples of the naming style: | |
| 14 | + | ||
| 15 | + | LispClaw | |
| 16 | + | BeamClaw | |
| 17 | + | LogicClaw | |
| 18 | + | CrystalClaw | |
| 19 | + | CobolClaw | |
| 20 | + | ||
| 21 | + | Do not use an existing project’s name or branding. Once you choose the name, use it consistently for: | |
| 22 | + | ||
| 23 | + | - executable name | |
| 24 | + | - README title | |
| 25 | + | - config directory | |
| 26 | + | - default workspace directory | |
| 27 | + | - log names | |
| 28 | + | - test names where appropriate | |
| 29 | + | ||
| 30 | + | If the selected name is not suitable for an executable on macOS, create a lowercase/kebab-case executable form and document the relationship. For example: | |
| 31 | + | ||
| 32 | + | Application name: LogicClaw | |
| 33 | + | Executable: logicclaw | |
| 34 | + | ||
| 35 | + | The goal is to build a small, local-first agent runtime. It should run as a single command-line application that can load configuration, talk to a model provider, expose a CLI agent loop, execute a small set of tools through a security gate, persist memory, and write tamper-evident tool receipts. | |
| 36 | + | ||
| 37 | + | Do not mention or depend on any external agent runtime project. Treat this as a clean-room implementation from this spec. | |
| 38 | + | ||
| 39 | + | Core workflow: | |
| 40 | + | ||
| 41 | + | app init | |
| 42 | + | app config validate | |
| 43 | + | app config show | |
| 44 | + | app provider list | |
| 45 | + | app provider test NAME | |
| 46 | + | app tool list | |
| 47 | + | app tool run NAME --json ARGS | |
| 48 | + | app agent | |
| 49 | + | app agent -m "What files are in this project?" | |
| 50 | + | app memory search "previous topic" | |
| 51 | + | app receipt verify | |
| 52 | + | app estop | |
| 53 | + | ||
| 54 | + | Use the final executable name you selected instead of `app`. | |
| 55 | + | ||
| 56 | + | The agent must: | |
| 57 | + | ||
| 58 | + | - accept input from the CLI channel | |
| 59 | + | - send the conversation to a configured model provider | |
| 60 | + | - advertise available tools to the model | |
| 61 | + | - parse tool calls from the model response | |
| 62 | + | - validate each tool call through a security policy | |
| 63 | + | - execute approved tools | |
| 64 | + | - feed tool results back into the model | |
| 65 | + | - persist the final exchange, tool calls, tool results, and receipts | |
| 66 | + | - return a final answer to the user | |
| 67 | + | ||
| 68 | + | Required architecture: | |
| 69 | + | ||
| 70 | + | The implementation must have visible separation of responsibility for these areas: | |
| 71 | + | ||
| 72 | + | runtime | |
| 73 | + | agent loop, request lifecycle, orchestration | |
| 74 | + | ||
| 75 | + | config | |
| 76 | + | config loading, validation, defaults, path expansion | |
| 77 | + | ||
| 78 | + | providers | |
| 79 | + | model provider abstraction and concrete providers | |
| 80 | + | ||
| 81 | + | channels | |
| 82 | + | CLI channel and optional HTTP/gateway channel | |
| 83 | + | ||
| 84 | + | tools | |
| 85 | + | time, file_list, file_read, file_write, shell, http, memory_search | |
| 86 | + | ||
| 87 | + | security | |
| 88 | + | autonomy levels, command/path policy, tool-risk classification | |
| 89 | + | ||
| 90 | + | memory | |
| 91 | + | SQLite persistence, or JSONL only if SQLite is impractical in LANGUAGE | |
| 92 | + | ||
| 93 | + | receipts | |
| 94 | + | tamper-evident tool-call receipts | |
| 95 | + | ||
| 96 | + | sop | |
| 97 | + | optional deterministic workflow runner | |
| 98 | + | ||
| 99 | + | service | |
| 100 | + | optional install/start/stop/status wrappers | |
| 101 | + | ||
| 102 | + | Do not force object-oriented structure if LANGUAGE is not object-oriented. Use idiomatic LANGUAGE design, but preserve the conceptual boundaries. | |
| 103 | + | ||
| 104 | + | Configuration: | |
| 105 | + | ||
| 106 | + | The application must use a user-editable config file. | |
| 107 | + | ||
| 108 | + | Default location should be based on the final app name, for example: | |
| 109 | + | ||
| 110 | + | ~/.logicclaw/config.toml | |
| 111 | + | ||
| 112 | + | TOML is preferred. JSON, INI, S-expression, or another idiomatic config format is acceptable if TOML support is weak in LANGUAGE. If not using TOML, document why. | |
| 113 | + | ||
| 114 | + | Minimum config shape: | |
| 115 | + | ||
| 116 | + | workspace_dir = "~/logicclaw-workspace" | |
| 117 | + | default_provider = "local" | |
| 118 | + | default_model = "mock" | |
| 119 | + | ||
| 120 | + | [security] | |
| 121 | + | autonomy = "supervised" | |
| 122 | + | workspace_only = true | |
| 123 | + | forbidden_paths = ["/etc", "/sys", "/boot", "~/.ssh"] | |
| 124 | + | forbidden_commands = ["rm", "shutdown", "reboot", "mkfs", "dd"] | |
| 125 | + | audit_log = true | |
| 126 | + | ||
| 127 | + | [providers.models.local] | |
| 128 | + | kind = "mock" | |
| 129 | + | model = "mock" | |
| 130 | + | ||
| 131 | + | [providers.models.openai_compatible] | |
| 132 | + | kind = "openai-compatible" | |
| 133 | + | base_url = "http://localhost:1234/v1" | |
| 134 | + | model = "local-model" | |
| 135 | + | api_key_env = "OPENAI_API_KEY" | |
| 136 | + | ||
| 137 | + | [channels.cli] | |
| 138 | + | enabled = true | |
| 139 | + | tools_allow = ["file_read", "file_list", "time", "memory_search", "shell"] | |
| 140 | + | ||
| 141 | + | [memory] | |
| 142 | + | backend = "sqlite" | |
| 143 | + | path = "~/.logicclaw/memory.sqlite" | |
| 144 | + | ||
| 145 | + | [receipts] | |
| 146 | + | enabled = true | |
| 147 | + | path = "~/.logicclaw/tool_receipts.log" | |
| 148 | + | ||
| 149 | + | Adjust paths to match the application name you chose. | |
| 150 | + | ||
| 151 | + | Config requirements: | |
| 152 | + | ||
| 153 | + | - load defaults when keys are absent | |
| 154 | + | - expand ~ and environment variables | |
| 155 | + | - validate enum values | |
| 156 | + | - validate that workspace exists or create it during init | |
| 157 | + | - do not require API keys for mock mode | |
| 158 | + | - support provider credentials by environment variable | |
| 159 | + | - never print secret values in logs or config dumps | |
| 160 | + | - config validate must report all detected errors in one pass when practical | |
| 161 | + | ||
| 162 | + | Provider abstraction: | |
| 163 | + | ||
| 164 | + | Create an idiomatic equivalent of: | |
| 165 | + | ||
| 166 | + | Provider | |
| 167 | + | name() -> string | |
| 168 | + | capabilities() -> ProviderCapabilities | |
| 169 | + | chat(request: ChatRequest) -> ChatResponse | |
| 170 | + | ||
| 171 | + | ChatRequest must contain: | |
| 172 | + | ||
| 173 | + | - system_prompt | |
| 174 | + | - messages | |
| 175 | + | - tools | |
| 176 | + | - model | |
| 177 | + | - optional temperature | |
| 178 | + | - optional metadata | |
| 179 | + | ||
| 180 | + | ChatResponse must contain: | |
| 181 | + | ||
| 182 | + | - final_text | |
| 183 | + | - tool_calls | |
| 184 | + | - optional raw_provider_payload | |
| 185 | + | - optional usage | |
| 186 | + | ||
| 187 | + | Required providers: | |
| 188 | + | ||
| 189 | + | mock | |
| 190 | + | Deterministic provider used for tests. It must be able to return ordinary text and tool calls from scripted fixtures. | |
| 191 | + | ||
| 192 | + | openai-compatible | |
| 193 | + | Sends requests to an OpenAI-compatible /chat/completions endpoint. Full support for every provider is not required. Implement non-streaming chat completion. Tool/function call support is required if reasonably practical in LANGUAGE; otherwise document the limitation clearly. | |
| 194 | + | ||
| 195 | + | Optional providers: | |
| 196 | + | ||
| 197 | + | reliable | |
| 198 | + | Wrapper provider that tries provider names in order and falls back on network/auth/timeout errors. | |
| 199 | + | ||
| 200 | + | router | |
| 201 | + | Wrapper provider that chooses a provider from request metadata hints. | |
| 202 | + | ||
| 203 | + | Channel abstraction: | |
| 204 | + | ||
| 205 | + | Create an idiomatic equivalent of: | |
| 206 | + | ||
| 207 | + | Channel | |
| 208 | + | name() -> string | |
| 209 | + | start(runtime_handle) | |
| 210 | + | send(conversation_id, message) | |
| 211 | + | supports_draft_updates() -> bool | |
| 212 | + | ||
| 213 | + | Required channel: | |
| 214 | + | ||
| 215 | + | cli | |
| 216 | + | ||
| 217 | + | CLI behavior: | |
| 218 | + | ||
| 219 | + | app agent | |
| 220 | + | starts a REPL | |
| 221 | + | ||
| 222 | + | app agent -m "message" | |
| 223 | + | runs one turn and exits | |
| 224 | + | ||
| 225 | + | REPL commands: | |
| 226 | + | ||
| 227 | + | /exit | |
| 228 | + | exits | |
| 229 | + | ||
| 230 | + | /tools | |
| 231 | + | lists active tools | |
| 232 | + | ||
| 233 | + | /memory <query> | |
| 234 | + | searches memory | |
| 235 | + | ||
| 236 | + | /policy | |
| 237 | + | prints current autonomy and workspace boundary | |
| 238 | + | ||
| 239 | + | Optional gateway channel: | |
| 240 | + | ||
| 241 | + | localhost HTTP server | |
| 242 | + | ||
| 243 | + | Minimum optional gateway endpoints: | |
| 244 | + | ||
| 245 | + | GET /health | |
| 246 | + | GET /status | |
| 247 | + | GET /tools | |
| 248 | + | POST /chat | |
| 249 | + | GET /memory/search?q=... | |
| 250 | + | GET /receipts | |
| 251 | + | POST /estop | |
| 252 | + | ||
| 253 | + | Tool abstraction: | |
| 254 | + | ||
| 255 | + | Create an idiomatic equivalent of: | |
| 256 | + | ||
| 257 | + | Tool | |
| 258 | + | name() -> string | |
| 259 | + | description() -> string | |
| 260 | + | parameters_schema() -> JSON Schema object or equivalent metadata | |
| 261 | + | risk(args, context) -> low | medium | high | |
| 262 | + | invoke(args, context) -> ToolResult | |
| 263 | + | ||
| 264 | + | ToolResult must contain: | |
| 265 | + | ||
| 266 | + | - success: bool | |
| 267 | + | - output: string | |
| 268 | + | - optional error: string | |
| 269 | + | - optional metadata | |
| 270 | + | - optional receipt_id | |
| 271 | + | ||
| 272 | + | Required built-in tools: | |
| 273 | + | ||
| 274 | + | time | |
| 275 | + | Returns current local time, UTC time, and timezone if available. | |
| 276 | + | ||
| 277 | + | file_list | |
| 278 | + | Lists files under a path inside workspace. | |
| 279 | + | ||
| 280 | + | file_read | |
| 281 | + | Reads a UTF-8 text file inside workspace. | |
| 282 | + | ||
| 283 | + | file_write | |
| 284 | + | Writes a UTF-8 text file inside workspace. | |
| 285 | + | ||
| 286 | + | shell | |
| 287 | + | Executes a shell command inside workspace, subject to security policy. | |
| 288 | + | ||
| 289 | + | http | |
| 290 | + | Performs HTTP GET. POST is optional. | |
| 291 | + | ||
| 292 | + | memory_search | |
| 293 | + | Searches persisted conversations. | |
| 294 | + | ||
| 295 | + | Optional tools: | |
| 296 | + | ||
| 297 | + | web_search | |
| 298 | + | May be stubbed unless a search API key is configured. | |
| 299 | + | ||
| 300 | + | pdf_extract | |
| 301 | + | Optional. | |
| 302 | + | ||
| 303 | + | ask_user | |
| 304 | + | In CLI mode, asks the user a question and returns the answer. | |
| 305 | + | ||
| 306 | + | Security model: | |
| 307 | + | ||
| 308 | + | Implement three autonomy levels: | |
| 309 | + | ||
| 310 | + | readonly | |
| 311 | + | Low-risk read-only tools allowed. | |
| 312 | + | No file_write. | |
| 313 | + | No shell execution except optionally harmless commands such as pwd. | |
| 314 | + | ||
| 315 | + | supervised | |
| 316 | + | Low-risk tools run automatically. | |
| 317 | + | Medium-risk tools require operator approval. | |
| 318 | + | High-risk tools are blocked. | |
| 319 | + | ||
| 320 | + | full | |
| 321 | + | Low- and medium-risk tools run automatically. | |
| 322 | + | High-risk tools are still blocked if explicitly forbidden by path or command policy. | |
| 323 | + | ||
| 324 | + | Default must be: | |
| 325 | + | ||
| 326 | + | supervised | |
| 327 | + | ||
| 328 | + | Risk rules: | |
| 329 | + | ||
| 330 | + | time, memory_search, file_list, file_read inside workspace: | |
| 331 | + | low | |
| 332 | + | ||
| 333 | + | http GET to allowed domains: | |
| 334 | + | low | |
| 335 | + | ||
| 336 | + | file_write inside workspace: | |
| 337 | + | medium | |
| 338 | + | ||
| 339 | + | shell command from allowlist: | |
| 340 | + | medium | |
| 341 | + | ||
| 342 | + | shell command not on allowlist: | |
| 343 | + | high | |
| 344 | + | ||
| 345 | + | any path outside workspace when workspace_only = true: | |
| 346 | + | blocked | |
| 347 | + | ||
| 348 | + | any path under forbidden_paths: | |
| 349 | + | blocked | |
| 350 | + | ||
| 351 | + | any command whose basename appears in forbidden_commands: | |
| 352 | + | blocked | |
| 353 | + | ||
| 354 | + | Any shell command containing obvious destructive patterns must be blocked. Minimum patterns: | |
| 355 | + | ||
| 356 | + | rm -rf / | |
| 357 | + | rm -rf * | |
| 358 | + | mkfs | |
| 359 | + | dd if= | |
| 360 | + | :(){ :|:& };: | |
| 361 | + | shutdown | |
| 362 | + | reboot | |
| 363 | + | chmod -R 777 / | |
| 364 | + | chown -R | |
| 365 | + | curl ... | sh | |
| 366 | + | wget ... | sh | |
| 367 | + | ||
| 368 | + | Approval flow in CLI mode: | |
| 369 | + | ||
| 370 | + | When a medium-risk action requires approval, print something like: | |
| 371 | + | ||
| 372 | + | Tool request: | |
| 373 | + | tool: file_write | |
| 374 | + | risk: medium | |
| 375 | + | reason: writes to workspace | |
| 376 | + | args: ... | |
| 377 | + | Approve? [y/N] | |
| 378 | + | ||
| 379 | + | Default is deny. | |
| 380 | + | ||
| 381 | + | Tool receipts: | |
| 382 | + | ||
| 383 | + | Every attempted tool invocation must produce a receipt, whether it was allowed (automatically or after operator approval), denied, or failed. | |
| 384 | + | ||
| 385 | + | Receipt fields: | |
| 386 | + | ||
| 387 | + | { | |
| 388 | + | "id": "receipt-...", | |
| 389 | + | "timestamp": "2026-05-12T14:00:00Z", | |
| 390 | + | "conversation_id": "...", | |
| 391 | + | "tool": "file_read", | |
| 392 | + | "args_hash": "...", | |
| 393 | + | "result_hash": "...", | |
| 394 | + | "status": "allowed|denied|failed", | |
| 395 | + | "risk": "low|medium|high", | |
| 396 | + | "previous_hash": "...", | |
| 397 | + | "receipt_hash": "..." | |
| 398 | + | } | |
| 399 | + | ||
| 400 | + | Receipt hash: | |
| 401 | + | ||
| 402 | + | receipt_hash = SHA256(canonical_json(receipt_without_receipt_hash)) | |
| 403 | + | ||
| 404 | + | Tamper-evident chain: | |
| 405 | + | ||
| 406 | + | - each receipt includes the previous receipt’s hash | |
| 407 | + | - receipt verify must replay the log | |
| 408 | + | - it must report the first broken link | |
| 409 | + | ||
| 410 | + | Optional stronger version: | |
| 411 | + | ||
| 412 | + | HMAC-SHA256 with a locally stored secret key | |
| 413 | + | ||
| 414 | + | Memory: | |
| 415 | + | ||
| 416 | + | Use SQLite if practical in LANGUAGE. Use JSONL only if SQLite support is impractical or broken. | |
| 417 | + | ||
| 418 | + | Persist: | |
| 419 | + | ||
| 420 | + | - conversation_id | |
| 421 | + | - turn_id | |
| 422 | + | - timestamp | |
| 423 | + | - role | |
| 424 | + | - content | |
| 425 | + | - tool_calls | |
| 426 | + | - tool_results | |
| 427 | + | - provider | |
| 428 | + | - model | |
| 429 | + | - metadata | |
| 430 | + | ||
| 431 | + | Required commands: | |
| 432 | + | ||
| 433 | + | app memory search QUERY | |
| 434 | + | app memory show CONVERSATION_ID | |
| 435 | + | app memory list | |
| 436 | + | app memory clear --yes | |
| 437 | + | ||
| 438 | + | Search may be simple substring search. | |
| 439 | + | ||
| 440 | + | Optional scoring: | |
| 441 | + | ||
| 442 | + | - tokenize query and content | |
| 443 | + | - rank by term frequency | |
| 444 | + | - boost recent conversations | |
| 445 | + | ||
| 446 | + | Agent loop: | |
| 447 | + | ||
| 448 | + | Implement this loop: | |
| 449 | + | ||
| 450 | + | 1. Receive user message from channel. | |
| 451 | + | 2. Create or resume conversation. | |
| 452 | + | 3. Load recent memory context. | |
| 453 | + | 4. Build system prompt. | |
| 454 | + | 5. Build tool schemas from active tools. | |
| 455 | + | 6. Call provider. | |
| 456 | + | 7. If provider returns text only, persist and reply. | |
| 457 | + | 8. If provider returns tool calls: | |
| 458 | + | a. For each tool call, classify risk. | |
| 459 | + | b. Validate policy. | |
| 460 | + | c. Ask approval when required. | |
| 461 | + | d. Invoke or deny. | |
| 462 | + | e. Write receipt. | |
| 463 | + | f. Persist tool call and result. | |
| 464 | + | 9. Send tool results back to provider. | |
| 465 | + | 10. Repeat from step 6 until the provider returns final text or max_tool_rounds is reached. | |
| 466 | + | 11. Persist final assistant response. | |
| 467 | + | 12. Reply to channel. | |
| 468 | + | ||
| 469 | + | Guardrails: | |
| 470 | + | ||
| 471 | + | max_tool_rounds default: | |
| 472 | + | 5 | |
| 473 | + | ||
| 474 | + | max_response_bytes default: | |
| 475 | + | 1 MB | |
| 476 | + | ||
| 477 | + | tool execution timeout default: | |
| 478 | + | 30 seconds | |
| 479 | + | ||
| 480 | + | shell timeout default: | |
| 481 | + | 15 seconds | |
| 482 | + | ||
| 483 | + | HTTP timeout default: | |
| 484 | + | 20 seconds | |
| 485 | + | ||
| 486 | + | The runtime must not recursively invoke tools forever. | |
| 487 | + | ||
| 488 | + | Required CLI command surface: | |
| 489 | + | ||
| 490 | + | app init | |
| 491 | + | app onboard | |
| 492 | + | app config validate | |
| 493 | + | app config show | |
| 494 | + | app provider list | |
| 495 | + | app provider test NAME | |
| 496 | + | app tool list | |
| 497 | + | app tool run NAME --json ARGS | |
| 498 | + | app agent | |
| 499 | + | app agent -m MESSAGE | |
| 500 | + | app memory list | |
| 501 | + | app memory search QUERY | |
| 502 | + | app memory show CONVERSATION_ID | |
| 503 | + | app receipt list | |
| 504 | + | app receipt verify | |
| 505 | + | app estop | |
| 506 | + | ||
| 507 | + | Optional commands: | |
| 508 | + | ||
| 509 | + | app service install | |
| 510 | + | app service start | |
| 511 | + | app service stop | |
| 512 | + | app service status | |
| 513 | + | app sop list | |
| 514 | + | app sop validate | |
| 515 | + | app sop run NAME | |
| 516 | + | app plugin list | |
| 517 | + | app plugin install PATH | |
| 518 | + | ||
| 519 | + | SOP engine, optional but valuable: | |
| 520 | + | ||
| 521 | + | Implement deterministic workflows loaded from: | |
| 522 | + | ||
| 523 | + | ~/.appname/workspace/sops/<name>/SOP.toml | |
| 524 | + | ||
| 525 | + | Minimum SOP format: | |
| 526 | + | ||
| 527 | + | name = "daily-check" | |
| 528 | + | description = "Run a daily workspace check" | |
| 529 | + | ||
| 530 | + | [[steps]] | |
| 531 | + | id = "list" | |
| 532 | + | kind = "tool" | |
| 533 | + | tool = "file_list" | |
| 534 | + | args = { path = "." } | |
| 535 | + | ||
| 536 | + | [[steps]] | |
| 537 | + | id = "summarize" | |
| 538 | + | kind = "agent" | |
| 539 | + | prompt = "Summarize the file list from the previous step." | |
| 540 | + | ||
| 541 | + | [[steps]] | |
| 542 | + | id = "approval" | |
| 543 | + | kind = "approval" | |
| 544 | + | prompt = "Continue to write report?" | |
| 545 | + | ||
| 546 | + | [[steps]] | |
| 547 | + | id = "write" | |
| 548 | + | kind = "tool" | |
| 549 | + | tool = "file_write" | |
| 550 | + | args = { path = "daily-check.txt", content_from = "summarize" } | |
| 551 | + | ||
| 552 | + | Requirements: | |
| 553 | + | ||
| 554 | + | - validate step IDs are unique | |
| 555 | + | - validate referenced tools exist | |
| 556 | + | - persist SOP run state | |
| 557 | + | - stop at approval steps until approved | |
| 558 | + | - support on_failure = "abort" | |
| 559 | + | - support on_failure = "continue" | |
| 560 | + | ||
| 561 | + | Plugin system, stretch goal: | |
| 562 | + | ||
| 563 | + | A plugin is a directory: | |
| 564 | + | ||
| 565 | + | plugin-name/ | |
| 566 | + | manifest.toml | |
| 567 | + | executable-or-script | |
| 568 | + | ||
| 569 | + | Minimum manifest: | |
| 570 | + | ||
| 571 | + | name = "echo-plugin" | |
| 572 | + | version = "0.1.0" | |
| 573 | + | capabilities = ["tool"] | |
| 574 | + | ||
| 575 | + | [[tools]] | |
| 576 | + | name = "echo" | |
| 577 | + | description = "Echoes input" | |
| 578 | + | command = "./echo-plugin" | |
| 579 | + | schema = { type = "object" } | |
| 580 | + | ||
| 581 | + | The runtime discovers plugins under: | |
| 582 | + | ||
| 583 | + | ~/.appname/plugins/ | |
| 584 | + | ||
| 585 | + | Simpler acceptable version: | |
| 586 | + | ||
| 587 | + | Support external process tools where the runtime invokes a configured executable with JSON on stdin and reads JSON from stdout. | |
| 588 | + | ||
| 589 | + | Observability: | |
| 590 | + | ||
| 591 | + | Minimum logging: | |
| 592 | + | ||
| 593 | + | - human-readable logs to stderr | |
| 594 | + | - structured JSON logs when APPNAME_LOG=json, where APPNAME is replaced to match the chosen executable name | |
| 595 | + | - never log secrets | |
| 596 | + | ||
| 597 | + | Log events: | |
| 598 | + | ||
| 599 | + | - startup | |
| 600 | + | - config path | |
| 601 | + | - workspace path | |
| 602 | + | - provider selected | |
| 603 | + | - channel started | |
| 604 | + | - conversation started | |
| 605 | + | - tool requested | |
| 606 | + | - tool approved | |
| 607 | + | - tool denied | |
| 608 | + | - tool completed | |
| 609 | + | - tool failed | |
| 610 | + | - receipt written | |
| 611 | + | - memory persisted | |
| 612 | + | - estop triggered | |
| 613 | + | ||
| 614 | + | Optional metrics endpoint: | |
| 615 | + | ||
| 616 | + | GET /metrics | |
| 617 | + | ||
| 618 | + | Expose counters if the endpoint is implemented: | |
| 619 | + | ||
| 620 | + | app_conversations_total | |
| 621 | + | app_tool_calls_total | |
| 622 | + | app_tool_denials_total | |
| 623 | + | app_provider_errors_total | |
| 624 | + | app_receipt_chain_valid | |
| 625 | + | ||
| 626 | + | Emergency stop: | |
| 627 | + | ||
| 628 | + | app estop | |
| 629 | + | ||
| 630 | + | Creates: | |
| 631 | + | ||
| 632 | + | ~/.appname/ESTOP | |
| 633 | + | ||
| 634 | + | When this file exists: | |
| 635 | + | ||
| 636 | + | - no new tool calls may run | |
| 637 | + | - existing long-running shell/http tasks should be cancelled if possible | |
| 638 | + | - the agent may still answer text-only messages explaining that tool use is stopped | |
| 639 | + | ||
| 640 | + | app estop --clear | |
| 641 | + | ||
| 642 | + | Removes the file. | |
| 643 | + | ||
| 644 | + | Acceptance tests: | |
| 645 | + | ||
| 646 | + | Test 1: init creates expected files. | |
| 647 | + | ||
| 648 | + | Given no ~/.appname directory | |
| 649 | + | When app init runs | |
| 650 | + | Then ~/.appname/config file exists | |
| 651 | + | And memory database or memory JSONL exists | |
| 652 | + | And workspace_dir exists | |
| 653 | + | ||
| 654 | + | Test 2: config validation catches invalid autonomy. | |
| 655 | + | ||
| 656 | + | Given autonomy = "godmode" | |
| 657 | + | When app config validate runs | |
| 658 | + | Then exit code is nonzero | |
| 659 | + | And output mentions allowed values | |
| 660 | + | ||
| 661 | + | Test 3: mock provider text-only response. | |
| 662 | + | ||
| 663 | + | Given mock provider fixture returns "hello" | |
| 664 | + | When app agent -m "hi" runs | |
| 665 | + | Then stdout contains "hello" | |
| 666 | + | And memory contains the user and assistant turn | |
| 667 | + | ||
| 668 | + | Test 4: model-triggered file_list tool. | |
| 669 | + | ||
| 670 | + | Given mock provider fixture emits tool_call file_list { path = "." } | |
| 671 | + | When app agent -m "list files" runs | |
| 672 | + | Then file_list executes inside workspace | |
| 673 | + | And a tool receipt is written | |
| 674 | + | And final answer includes the file list summary | |
| 675 | + | ||
| 676 | + | Test 5: workspace escape blocked. | |
| 677 | + | ||
| 678 | + | Given workspace_only = true | |
| 679 | + | When model requests file_read { path = "/etc/passwd" } | |
| 680 | + | Then tool is denied | |
| 681 | + | And a denied receipt is written | |
| 682 | + | And the provider receives a tool error | |
| 683 | + | ||
| 684 | + | Test 6: supervised approval. | |
| 685 | + | ||
| 686 | + | Given autonomy = "supervised" | |
| 687 | + | When model requests file_write | |
| 688 | + | Then CLI asks for approval | |
| 689 | + | And an empty answer defaults to deny | |
| 690 | + | And "y" approves | |
| 691 | + | ||
| 692 | + | Test 7: forbidden command blocked. | |
| 693 | + | ||
| 694 | + | When model requests shell { command = "rm -rf /" } | |
| 695 | + | Then tool is blocked before execution | |
| 696 | + | And receipt status is denied | |
| 697 | + | ||
| 698 | + | Test 8: receipt chain detects tampering. | |
| 699 | + | ||
| 700 | + | Given three receipts exist | |
| 701 | + | When the second receipt is edited manually | |
| 702 | + | Then app receipt verify reports invalid chain at receipt 2 | |
| 703 | + | ||
| 704 | + | Test 9: provider fallback, if reliable provider is implemented. | |
| 705 | + | ||
| 706 | + | Given reliable provider = [bad_provider, mock_provider] | |
| 707 | + | And bad_provider times out | |
| 708 | + | When agent runs | |
| 709 | + | Then runtime logs fallback | |
| 710 | + | And response comes from mock_provider | |
| 711 | + | ||
| 712 | + | Test 10: memory search. | |
| 713 | + | ||
| 714 | + | Given a previous conversation contains "Aardvark adapter" | |
| 715 | + | When app memory search "aardvark" runs | |
| 716 | + | Then the previous conversation ID is returned | |
| 717 | + | ||
| 718 | + | Implementation priorities: | |
| 719 | + | ||
| 720 | + | First produce a working vertical slice: | |
| 721 | + | ||
| 722 | + | 1. application naming | |
| 723 | + | 2. init | |
| 724 | + | 3. config loading and validation | |
| 725 | + | 4. mock provider | |
| 726 | + | 5. CLI one-shot agent mode | |
| 727 | + | 6. tools: time, file_list, file_read | |
| 728 | + | 7. security policy for workspace paths | |
| 729 | + | 8. memory persistence | |
| 730 | + | 9. receipt writing and verification | |
| 731 | + | 10. tests | |
| 732 | + | ||
| 733 | + | Then add: | |
| 734 | + | ||
| 735 | + | 11. interactive REPL | |
| 736 | + | 12. file_write with approval | |
| 737 | + | 13. shell with blocking rules | |
| 738 | + | 14. HTTP GET tool | |
| 739 | + | 15. OpenAI-compatible provider | |
| 740 | + | 16. optional gateway | |
| 741 | + | 17. optional SOP engine | |
| 742 | + | 18. optional external-process plugins | |
| 743 | + | ||
| 744 | + | Quality requirements: | |
| 745 | + | ||
| 746 | + | - Keep the implementation idiomatic for LANGUAGE. | |
| 747 | + | - Do not quietly substitute another implementation language. | |
| 748 | + | - Do not use Python, JavaScript, Rust, or C as the primary implementation language. | |
| 749 | + | - Shell scripts are acceptable only for setup convenience. | |
| 750 | + | - Prefer simple, boring dependencies. | |
| 751 | + | - Write tests for denied actions, not just successful actions. | |
| 752 | + | - Keep secrets out of logs. | |
| 753 | + | - Keep workspace path handling strict and well-tested. | |
| 754 | + | - Use deterministic mock fixtures so tests do not require network access. | |
| 755 | + | - Update README.md with architecture, config, security policy, commands, and test instructions. | |
| 756 | + | ||
| 757 | + | Do not stop after creating stubs. Implement the core behavior. If a feature is not practical in LANGUAGE, document the limitation and implement the closest useful equivalent. | |
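The receipt chain in the "Tool receipts" section of the prompt can be sketched as follows. This is a minimal illustration in Python, chosen only for readability: the prompt itself excludes Python as the implementation language, so treat it as pseudocode to port to LANGUAGE. Several details are assumptions not stated in the prompt: the function names (`canonical_json`, `append_receipt`, `first_broken_link`), the all-zero `GENESIS` value used as `previous_hash` for the first receipt, and the reading of "canonical JSON" as sorted keys with compact separators.

```python
import hashlib
import json

# Assumption: the first receipt links to a fixed all-zero hash.
GENESIS = "0" * 64


def canonical_json(obj):
    """Serialize with sorted keys and no extra whitespace (assumed canonical form)."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def receipt_hash(receipt):
    """SHA256 over the receipt with its own receipt_hash field removed."""
    body = {k: v for k, v in receipt.items() if k != "receipt_hash"}
    return hashlib.sha256(canonical_json(body)).hexdigest()


def append_receipt(chain, fields):
    """Link a new receipt to the previous one and append it to the chain."""
    previous = chain[-1]["receipt_hash"] if chain else GENESIS
    receipt = dict(fields, previous_hash=previous)
    receipt["receipt_hash"] = receipt_hash(receipt)
    chain.append(receipt)
    return receipt


def first_broken_link(chain):
    """Replay the chain; return the 1-based index of the first bad receipt, or None."""
    previous = GENESIS
    for index, receipt in enumerate(chain, start=1):
        if receipt.get("previous_hash") != previous:
            return index
        if receipt.get("receipt_hash") != receipt_hash(receipt):
            return index
        previous = receipt["receipt_hash"]
    return None
```

With three receipts appended and the second one edited by hand, `first_broken_link` returns 2, which is the behaviour Test 8 asks `receipt verify` to report. If the edited receipt's own hash is also recomputed, the break surfaces at receipt 3 instead, because its `previous_hash` no longer matches.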