Last active 3 weeks ago

Revision 271e99a6320e5db56c878619d069c893a8d1e49c

ClawClone.prompt Raw
1You are now implementing the real application in this workspace.
2
3The target implementation language is LANGUAGE.
4
5This workspace should already contain a verified LANGUAGE project skeleton with basic support for CLI, HTTP, SQLite, config parsing, tests, and build/test commands. Begin by inspecting the existing workspace and README before changing anything.
6
7Your first task is to name the application.
8
9The placeholder name is:
10
11 *Claw
12
13Replace the wildcard with a short, distinctive prefix suitable for this LANGUAGE implementation. Examples of the naming style:
14
15 LispClaw
16 BeamClaw
17 LogicClaw
18 CrystalClaw
19 CobolClaw
20
21Do not use an existing project’s name or branding. Once you choose the name, use it consistently for:
22
23 - executable name
24 - README title
25 - config directory
26 - default workspace directory
27 - log names
28 - test names where appropriate
29
30If the selected name is not suitable for an executable on macOS, create a lowercase/kebab-case executable form and document the relationship. For example:
31
32 Application name: LogicClaw
33 Executable: logicclaw
34
35The goal is to build a small, local-first agent runtime. It should run as a single command-line application that can load configuration, talk to a model provider, expose a CLI agent loop, execute a small set of tools through a security gate, persist memory, and write tamper-evident tool receipts.
36
37Do not mention or depend on any external agent runtime project. Treat this as a clean-room implementation from this spec.
38
39Core workflow:
40
41 app init
42 app config validate
43 app config show
44 app provider list
45 app provider test NAME
46 app tool list
47 app tool run NAME --json ARGS
48 app agent
49 app agent -m "What files are in this project?"
50 app memory search "previous topic"
51 app receipt verify
52 app estop
53
54Use the final executable name you selected instead of `app`.
55
56The agent must:
57
58 - accept input from the CLI channel
59 - send the conversation to a configured model provider
60 - advertise available tools to the model
61 - parse tool calls from the model response
62 - validate each tool call through a security policy
63 - execute approved tools
64 - feed tool results back into the model
65 - persist the final exchange, tool calls, tool results, and receipts
66 - return a final answer to the user
67
68Required architecture:
69
70The implementation must have visible separation of responsibility for these areas:
71
72 runtime
73 agent loop, request lifecycle, orchestration
74
75 config
76 config loading, validation, defaults, path expansion
77
78 providers
79 model provider abstraction and concrete providers
80
81 channels
82 CLI channel and optional HTTP/gateway channel
83
84 tools
85 time, file_list, file_read, file_write, shell, http, memory_search
86
87 security
88 autonomy levels, command/path policy, tool-risk classification
89
90 memory
91 SQLite persistence, or JSONL only if SQLite is impractical in LANGUAGE
92
93 receipts
94 tamper-evident tool-call receipts
95
96 sop
97 optional deterministic workflow runner
98
99 service
100 optional install/start/stop/status wrappers
101
102Do not force object-oriented structure if LANGUAGE is not object-oriented. Use idiomatic LANGUAGE design, but preserve the conceptual boundaries.
103
104Configuration:
105
106The application must use a user-editable config file.
107
108Default location should be based on the final app name, for example:
109
110 ~/.logicclaw/config.toml
111
112TOML is preferred. JSON, INI, S-expression, or another idiomatic config format is acceptable if TOML support is weak in LANGUAGE. If not using TOML, document why.
113
114Minimum config shape:
115
116 workspace_dir = "~/logicclaw-workspace"
117 default_provider = "local"
118 default_model = "mock"
119
120 [security]
121 autonomy = "supervised"
122 workspace_only = true
123 forbidden_paths = ["/etc", "/sys", "/boot", "~/.ssh"]
124 forbidden_commands = ["rm", "shutdown", "reboot", "mkfs", "dd"]
125 audit_log = true
126
127 [providers.models.local]
128 kind = "mock"
129 model = "mock"
130
131 [providers.models.openai_compatible]
132 kind = "openai-compatible"
133 base_url = "http://localhost:1234/v1"
134 model = "local-model"
135 api_key_env = "OPENAI_API_KEY"
136
137 [channels.cli]
138 enabled = true
139 tools_allow = ["file_read", "file_list", "time", "memory_search", "shell"]
140
141 [memory]
142 backend = "sqlite"
143 path = "~/.logicclaw/memory.sqlite"
144
145 [receipts]
146 enabled = true
147 path = "~/.logicclaw/tool_receipts.log"
148
149Adjust paths to match the application name you chose.
150
151Config requirements:
152
153 - load defaults when keys are absent
154 - expand ~ and environment variables
155 - validate enum values
156 - validate that workspace exists or create it during init
157 - do not require API keys for mock mode
158 - support provider credentials by environment variable
159 - never print secret values in logs or config dumps
160 - config validate must report all detected errors in one pass when practical
161
162Provider abstraction:
163
164Create an idiomatic equivalent of:
165
166 Provider
167 name() -> string
168 capabilities() -> ProviderCapabilities
169 chat(request: ChatRequest) -> ChatResponse
170
171ChatRequest must contain:
172
173 - system_prompt
174 - messages
175 - tools
176 - model
177 - optional temperature
178 - optional metadata
179
180ChatResponse must contain:
181
182 - final_text
183 - tool_calls
184 - optional raw_provider_payload
185 - optional usage
186
187Required providers:
188
189 mock
190 Deterministic provider used for tests. It must be able to return ordinary text and tool calls from scripted fixtures.
191
192 openai-compatible
193 Sends requests to an OpenAI-compatible /chat/completions endpoint. Full support for every provider is not required. Implement non-streaming chat completion. Tool/function call support is required if reasonably practical in LANGUAGE; otherwise document the limitation clearly.
194
195Optional providers:
196
197 reliable
198 Wrapper provider that tries provider names in order and falls back on network/auth/timeout errors.
199
200 router
201 Wrapper provider that chooses a provider from request metadata hints.
202
203Channel abstraction:
204
205Create an idiomatic equivalent of:
206
207 Channel
208 name() -> string
209 start(runtime_handle)
210 send(conversation_id, message)
211 supports_draft_updates() -> bool
212
213Required channel:
214
215 cli
216
217CLI behavior:
218
219 app agent
220 starts a REPL
221
222 app agent -m "message"
223 runs one turn and exits
224
225REPL commands:
226
227 /exit
228 exits
229
230 /tools
231 lists active tools
232
233 /memory <query>
234 searches memory
235
236 /policy
237 prints current autonomy and workspace boundary
238
239Optional gateway channel:
240
241 localhost HTTP server
242
243Minimum optional gateway endpoints:
244
245 GET /health
246 GET /status
247 GET /tools
248 POST /chat
249 GET /memory/search?q=...
250 GET /receipts
251 POST /estop
252
253Tool abstraction:
254
255Create an idiomatic equivalent of:
256
257 Tool
258 name() -> string
259 description() -> string
260 parameters_schema() -> JSON Schema object or equivalent metadata
261 risk(args, context) -> low | medium | high
262 invoke(args, context) -> ToolResult
263
264ToolResult must contain:
265
266 - success: bool
267 - output: string
268 - optional error: string
269 - optional metadata
270 - optional receipt_id
271
272Required built-in tools:
273
274 time
275 Returns current local time, UTC time, and timezone if available.
276
277 file_list
278 Lists files under a path inside workspace.
279
280 file_read
281 Reads a UTF-8 text file inside workspace.
282
283 file_write
284 Writes a UTF-8 text file inside workspace.
285
286 shell
287 Executes a shell command inside workspace, subject to security policy.
288
289 http
290 Performs HTTP GET. POST is optional.
291
292 memory_search
293 Searches persisted conversations.
294
295Optional tools:
296
297 web_search
298 May be stubbed unless a search API key is configured.
299
300 pdf_extract
301 Optional.
302
303 ask_user
304 In CLI mode, asks the user a question and returns the answer.
305
306Security model:
307
308Implement three autonomy levels:
309
310 readonly
311 Low-risk read-only tools allowed.
312 No file_write.
313 No shell execution except optionally harmless commands such as pwd.
314
315 supervised
316 Low-risk tools run automatically.
317 Medium-risk tools require operator approval.
318 High-risk tools are blocked.
319
320 full
321 Low and medium run automatically.
322 High-risk is still blocked if explicitly forbidden by path or command policy.
323
324Default must be:
325
326 supervised
327
328Risk rules:
329
330 time, memory_search, file_list, file_read inside workspace:
331 low
332
333 http GET to allowed domains:
334 low
335
336 file_write inside workspace:
337 medium
338
339 shell command from allowlist:
340 medium
341
342 shell command not on allowlist:
343 high
344
345 any path outside workspace when workspace_only = true:
346 blocked
347
348 any path under forbidden_paths:
349 blocked
350
351 any command whose basename appears in forbidden_commands:
352 blocked
353
354Any shell command containing obvious destructive patterns must be blocked. Minimum patterns:
355
356 rm -rf /
357 rm -rf *
358 mkfs
359 dd if=
360 :(){ :|:& };:
361 shutdown
362 reboot
363 chmod -R 777 /
364 chown -R
365 curl ... | sh
366 wget ... | sh
367
368Approval flow in CLI mode:
369
370When a medium-risk action requires approval, print something like:
371
372 Tool request:
373 tool: file_write
374 risk: medium
375 reason: writes to workspace
376 args: ...
377 Approve? [y/N]
378
379Default is deny.
380
381Tool receipts:
382
383Every attempted tool invocation must produce a receipt whether it is allowed, denied, failed, or approved.
384
385Receipt fields:
386
387 {
388 "id": "receipt-...",
389 "timestamp": "2026-05-12T14:00:00Z",
390 "conversation_id": "...",
391 "tool": "file_read",
392 "args_hash": "...",
393 "result_hash": "...",
394 "status": "allowed|denied|failed",
395 "risk": "low|medium|high",
396 "previous_hash": "...",
397 "receipt_hash": "..."
398 }
399
400Receipt hash:
401
402 receipt_hash = SHA256(canonical_json(receipt_without_receipt_hash))
403
404Tamper-evident chain:
405
406 - each receipt includes the previous receipt’s hash
407 - receipt verify must replay the log
408 - it must report the first broken link
409
410Optional stronger version:
411
412 HMAC-SHA256 with a locally stored secret key
413
414Memory:
415
416Use SQLite if practical in LANGUAGE. Use JSONL only if SQLite support is impractical or broken.
417
418Persist:
419
420 - conversation_id
421 - turn_id
422 - timestamp
423 - role
424 - content
425 - tool_calls
426 - tool_results
427 - provider
428 - model
429 - metadata
430
431Required commands:
432
433 app memory search QUERY
434 app memory show CONVERSATION_ID
435 app memory list
436 app memory clear --yes
437
438Search may be simple substring search.
439
440Optional scoring:
441
442 - tokenize query and content
443 - rank by term frequency
444 - boost recent conversations
445
446Agent loop:
447
448Implement this loop:
449
450 1. Receive user message from channel.
451 2. Create or resume conversation.
452 3. Load recent memory context.
453 4. Build system prompt.
454 5. Build tool schemas from active tools.
455 6. Call provider.
456 7. If provider returns text only, persist and reply.
457 8. If provider returns tool calls:
458 a. For each tool call, classify risk.
459 b. Validate policy.
460 c. Ask approval when required.
461 d. Invoke or deny.
462 e. Write receipt.
463 f. Persist tool call and result.
464 9. Send tool results back to provider.
465 10. Repeat until final text or max_tool_rounds is reached.
466 11. Persist final assistant response.
467 12. Reply to channel.
468
469Guardrails:
470
471 max_tool_rounds default:
472 5
473
474 max_response_bytes default:
475 1 MB
476
477 tool execution timeout default:
478 30 seconds
479
480 shell timeout default:
481 15 seconds
482
483 HTTP timeout default:
484 20 seconds
485
486The runtime must not recursively invoke tools forever.
487
488Required CLI command surface:
489
490 app init
491 app onboard
492 app config validate
493 app config show
494 app provider list
495 app provider test NAME
496 app tool list
497 app tool run NAME --json ARGS
498 app agent
499 app agent -m MESSAGE
500 app memory list
501 app memory search QUERY
502 app memory show CONVERSATION_ID
503 app receipt list
504 app receipt verify
505 app estop
506
507Optional commands:
508
509 app service install
510 app service start
511 app service stop
512 app service status
513 app sop list
514 app sop validate
515 app sop run NAME
516 app plugin list
517 app plugin install PATH
518
519SOP engine, optional but valuable:
520
521Implement deterministic workflows loaded from:
522
523 ~/.appname/workspace/sops/<name>/SOP.toml
524
525Minimum SOP format:
526
527 name = "daily-check"
528 description = "Run a daily workspace check"
529
530 [[steps]]
531 id = "list"
532 kind = "tool"
533 tool = "file_list"
534 args = { path = "." }
535
536 [[steps]]
537 id = "summarize"
538 kind = "agent"
539 prompt = "Summarize the file list from the previous step."
540
541 [[steps]]
542 id = "approval"
543 kind = "approval"
544 prompt = "Continue to write report?"
545
546 [[steps]]
547 id = "write"
548 kind = "tool"
549 tool = "file_write"
550 args = { path = "daily-check.txt", content_from = "summarize" }
551
552Requirements:
553
554 - validate step IDs are unique
555 - validate referenced tools exist
556 - persist SOP run state
557 - stop at approval steps until approved
558 - support on_failure = "abort"
559 - support on_failure = "continue"
560
561Plugin system, stretch goal:
562
563A plugin is a directory:
564
565 plugin-name/
566 manifest.toml
567 executable-or-script
568
569Minimum manifest:
570
571 name = "echo-plugin"
572 version = "0.1.0"
573 capabilities = ["tool"]
574
575 [[tools]]
576 name = "echo"
577 description = "Echoes input"
578 command = "./echo-plugin"
579 schema = { type = "object" }
580
581The runtime discovers plugins under:
582
583 ~/.appname/plugins/
584
585Simpler acceptable version:
586
587Support external process tools where the runtime invokes a configured executable with JSON on stdin and reads JSON from stdout.
588
589Observability:
590
591Minimum logging:
592
593 - human-readable logs to stderr
594 - structured JSON logs when APPNAME_LOG=json, adjusted to the executable name
595 - never log secrets
596
597Log events:
598
599 - startup
600 - config path
601 - workspace path
602 - provider selected
603 - channel started
604 - conversation started
605 - tool requested
606 - tool approved
607 - tool denied
608 - tool completed
609 - tool failed
610 - receipt written
611 - memory persisted
612 - estop triggered
613
614Optional metrics endpoint:
615
616 GET /metrics
617
618Expose counters if the endpoint is implemented:
619
620 app_conversations_total
621 app_tool_calls_total
622 app_tool_denials_total
623 app_provider_errors_total
624 app_receipt_chain_valid
625
626Emergency stop:
627
628 app estop
629
630Creates:
631
632 ~/.appname/ESTOP
633
634When this file exists:
635
636 - no new tool calls may run
637 - existing long-running shell/http tasks should be cancelled if possible
638 - the agent may still answer text-only messages explaining that tool use is stopped
639
640 app estop --clear
641
642Removes the file.
643
644Acceptance tests:
645
646Test 1: init creates expected files.
647
648 Given no ~/.appname directory
649 When app init runs
650 Then ~/.appname/config file exists
651 And memory database or memory JSONL exists
652 And workspace_dir exists
653
654Test 2: config validation catches invalid autonomy.
655
656 Given autonomy = "godmode"
657 When app config validate runs
658 Then exit code is nonzero
659 And output mentions allowed values
660
661Test 3: mock provider text-only response.
662
663 Given mock provider fixture returns "hello"
664 When app agent -m "hi" runs
665 Then stdout contains "hello"
666 And memory contains the user and assistant turn
667
668Test 4: model-triggered file_list tool.
669
670 Given mock provider fixture emits tool_call file_list { path = "." }
671 When app agent -m "list files" runs
672 Then file_list executes inside workspace
673 And a tool receipt is written
674 And final answer includes the file list summary
675
676Test 5: workspace escape blocked.
677
678 Given workspace_only = true
679 When model requests file_read { path = "/etc/passwd" }
680 Then tool is denied
681 And a denied receipt is written
682 And the provider receives a tool error
683
684Test 6: supervised approval.
685
686 Given autonomy = "supervised"
687 When model requests file_write
688 Then CLI asks for approval
689 And default empty answer denies
690 And "y" approves
691
692Test 7: forbidden command blocked.
693
694 When model requests shell { command = "rm -rf /" }
695 Then tool is blocked before execution
696 And receipt status is denied
697
698Test 8: receipt chain detects tampering.
699
700 Given three receipts exist
701 When the second receipt is edited manually
702 Then app receipt verify reports invalid chain at receipt 2
703
704Test 9: provider fallback, if reliable provider is implemented.
705
706 Given reliable provider = [bad_provider, mock_provider]
707 And bad_provider times out
708 When agent runs
709 Then runtime logs fallback
710 And response comes from mock_provider
711
712Test 10: memory search.
713
714 Given a previous conversation contains "Aardvark adapter"
715 When app memory search "aardvark" runs
716 Then the previous conversation ID is returned
717
718Implementation priorities:
719
720First produce a working vertical slice:
721
722 1. application naming
723 2. init
724 3. config loading and validation
725 4. mock provider
726 5. CLI one-shot agent mode
727 6. tools: time, file_list, file_read
728 7. security policy for workspace paths
729 8. memory persistence
730 9. receipt writing and verification
731 10. tests
732
733Then add:
734
735 11. interactive REPL
736 12. file_write with approval
737 13. shell with blocking rules
738 14. HTTP GET tool
739 15. OpenAI-compatible provider
740 16. optional gateway
741 17. optional SOP engine
742 18. optional external-process plugins
743
744Quality requirements:
745
746 - Keep the implementation idiomatic for LANGUAGE.
747 - Do not quietly substitute another implementation language.
748 - Do not use Python, JavaScript, Rust, or C as the primary implementation language.
749 - Shell scripts are acceptable only for setup convenience.
750 - Prefer simple, boring dependencies.
751 - Write tests for denied actions, not just successful actions.
752 - Keep secrets out of logs.
753 - Keep workspace path handling strict and well-tested.
754 - Use deterministic mock fixtures so tests do not require network access.
755 - Update README.md with architecture, config, security policy, commands, and test instructions.
756
757Do not stop after creating stubs. Implement the core behavior. If a feature is not practical in LANGUAGE, document the limitation and implement the closest useful equivalent.