ClawClone.prompt - Opengist

You are now implementing the real application in this workspace.

The target implementation language is LANGUAGE.

This workspace should already contain a verified LANGUAGE project skeleton with basic support for CLI, HTTP, SQLite, config parsing, tests, and build/test commands. Begin by inspecting the existing workspace and README before changing anything.

Your first task is to name the application.

The placeholder name is:

*Claw

Replace the wildcard with a short, distinctive prefix suitable for this implementation.

Do not use an existing project’s name or branding. Once you choose the name, use it consistently for:

- executable name
- README title
- config directory
- default workspace directory
- log names
- test names where appropriate

If the selected name is not suitable for an executable on macOS, create a lowercase/kebab-case executable form and document the relationship. For example:

Application name: LogicClaw
Executable: logicclaw

The goal is to build a small, local-first agent runtime. It should run as a single command-line application that can load configuration, talk to a model provider, expose a CLI agent loop, execute a small set of tools through a security gate, persist memory, and write tamper-evident tool receipts.

Do not mention or depend on any external agent runtime project. Treat this as a clean-room implementation from this spec.

Core workflow:

app init
app config validate
app config show
app provider list
app provider test NAME
app tool list
app tool run NAME --json ARGS
app agent
app agent -m "What files are in this project?"
app memory search "previous topic"
app receipt verify
app estop

Use the final executable name you selected instead of `app`.

The agent must:

- accept input from the CLI channel
- send the conversation to a configured model provider
- advertise available tools to the model
- parse tool calls from the model response
- validate each tool call through a security policy
- execute approved tools
- feed tool results back into the model
- persist the final exchange, tool calls, tool results, and receipts
- return a final answer to the user

Required architecture:

The implementation must have visible separation of responsibility for these areas:

runtime
agent loop, request lifecycle, orchestration

config
config loading, validation, defaults, path expansion

providers
model provider abstraction and concrete providers

channels
CLI channel and optional HTTP/gateway channel

tools
time, file_list, file_read, file_write, shell, http, memory_search

security
autonomy levels, command/path policy, tool-risk classification

memory
SQLite persistence, or JSONL only if SQLite is impractical in LANGUAGE

receipts
tamper-evident tool-call receipts

sop
optional deterministic workflow runner

service
optional install/start/stop/status wrappers

Do not force object-oriented structure if LANGUAGE is not object-oriented. Use idiomatic LANGUAGE design, but preserve the conceptual boundaries.

Configuration:

The application must use a user-editable config file.

Default location should be based on the final app name, for example:

~/.logicclaw/config.toml

TOML is preferred. JSON, INI, S-expression, or another idiomatic config format is acceptable if TOML support is weak in LANGUAGE. If not using TOML, document why.

Minimum config shape:

workspace_dir = "~/logicclaw-workspace"
default_provider = "local"
default_model = "mock"

[security]
autonomy = "supervised"
workspace_only = true
forbidden_paths = ["/etc", "/sys", "/boot", "~/.ssh"]
forbidden_commands = ["rm", "shutdown", "reboot", "mkfs", "dd"]
audit_log = true

[providers.models.local]
kind = "mock"
model = "mock"

[providers.models.openai_compatible]
kind = "openai-compatible"
base_url = "http://localhost:1234/v1"
model = "local-model"
api_key_env = "OPENAI_API_KEY"

[channels.cli]
enabled = true
tools_allow = ["file_read", "file_list", "time", "memory_search", "shell"]

[memory]
backend = "sqlite"
path = "~/.logicclaw/memory.sqlite"

[receipts]
enabled = true
path = "~/.logicclaw/tool_receipts.log"

Adjust paths to match the application name you chose.

Config requirements:

- load defaults when keys are absent
- expand ~ and environment variables
- validate enum values
- validate that workspace exists or create it during init
- do not require API keys for mock mode
- support provider credentials by environment variable
- never print secret values in logs or config dumps
- config validate must report all detected errors in one pass when practical
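
Illustrative sketch only: the following shows one way the path-expansion and one-pass validation requirements above could look. Go is used purely as an example language; translate the idea into idiomatic LANGUAGE. The names `ExpandPath`, `Config`, and `Validate` are hypothetical, and the `"jsonl"` backend value is an assumption, since this spec only names `"sqlite"`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// ExpandPath substitutes $VAR / ${VAR} from the environment, then expands a
// leading "~" to home. Note: os.ExpandEnv replaces undefined variables with
// an empty string, which a real implementation may prefer to report.
func ExpandPath(p, home string) string {
	p = os.ExpandEnv(p)
	if p == "~" {
		return home
	}
	if strings.HasPrefix(p, "~/") {
		return filepath.Join(home, p[2:])
	}
	return p
}

// Config holds only the two enum fields checked in this sketch.
type Config struct {
	Autonomy      string
	MemoryBackend string
}

// Validate collects every problem it finds instead of stopping at the first,
// so "config validate" can report all errors in one pass.
func Validate(c Config) []string {
	var errs []string
	switch c.Autonomy {
	case "readonly", "supervised", "full":
	default:
		errs = append(errs, fmt.Sprintf(
			"security.autonomy = %q: allowed values are readonly, supervised, full", c.Autonomy))
	}
	switch c.MemoryBackend {
	case "sqlite", "jsonl": // "jsonl" is an assumed value name
	default:
		errs = append(errs, fmt.Sprintf(
			"memory.backend = %q: allowed values are sqlite, jsonl", c.MemoryBackend))
	}
	return errs
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		home = "."
	}
	fmt.Println(ExpandPath("~/.logicclaw/config.toml", home))
	for _, e := range Validate(Config{Autonomy: "godmode", MemoryBackend: "sqlite"}) {
		fmt.Println("error:", e)
	}
}
```

Returning a list of messages rather than a single error keeps the "report all detected errors" requirement easy to test.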

Provider abstraction:

Create an idiomatic equivalent of:

Provider
name() -> string
capabilities() -> ProviderCapabilities
chat(request: ChatRequest) -> ChatResponse

ChatRequest must contain:

- system_prompt
- messages
- tools
- model
- optional temperature
- optional metadata

ChatResponse must contain:

- final_text
- tool_calls
- optional raw_provider_payload
- optional usage

Required providers:

mock
Deterministic provider used for tests. It must be able to return ordinary text and tool calls from scripted fixtures.

openai-compatible
Sends requests to an OpenAI-compatible /chat/completions endpoint. Full support for every provider is not required. Implement non-streaming chat completion. Tool/function call support is required if reasonably practical in LANGUAGE; otherwise document the limitation clearly.
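
Illustrative sketch only: one possible shape for the provider abstraction and a scripted mock, written in Go as an example language rather than the required LANGUAGE. Field types such as `ToolSpec`, `Usage`, and the `Script` slice are assumptions; the spec fixes only the field names listed above.

```go
package main

import (
	"errors"
	"fmt"
)

type Message struct{ Role, Content string }

// ToolSpec is a hypothetical, minimal stand-in for an advertised tool schema.
type ToolSpec struct{ Name, Description string }

type ToolCall struct {
	ID   string
	Name string
	Args map[string]any
}

type Usage struct{ PromptTokens, CompletionTokens int }

type ChatRequest struct {
	SystemPrompt string
	Messages     []Message
	Tools        []ToolSpec
	Model        string
	Temperature  *float64          // optional: nil means "provider default"
	Metadata     map[string]string // optional
}

type ChatResponse struct {
	FinalText          string
	ToolCalls          []ToolCall
	RawProviderPayload []byte // optional
	Usage              *Usage // optional
}

type ProviderCapabilities struct{ ToolCalls bool }

type Provider interface {
	Name() string
	Capabilities() ProviderCapabilities
	Chat(req ChatRequest) (ChatResponse, error)
}

// MockProvider replays scripted responses in order, so tests stay
// deterministic and need no network access.
type MockProvider struct {
	Script []ChatResponse
	next   int
}

func (m *MockProvider) Name() string { return "mock" }

func (m *MockProvider) Capabilities() ProviderCapabilities {
	return ProviderCapabilities{ToolCalls: true}
}

func (m *MockProvider) Chat(req ChatRequest) (ChatResponse, error) {
	if m.next >= len(m.Script) {
		return ChatResponse{}, errors.New("mock provider: script exhausted")
	}
	resp := m.Script[m.next]
	m.next++
	return resp, nil
}

func main() {
	var p Provider = &MockProvider{Script: []ChatResponse{
		{ToolCalls: []ToolCall{{ID: "1", Name: "file_list", Args: map[string]any{"path": "."}}}},
		{FinalText: "The workspace contains two files."},
	}}
	for i := 0; i < 2; i++ {
		resp, err := p.Chat(ChatRequest{Model: "mock"})
		fmt.Println(resp.FinalText, len(resp.ToolCalls), err)
	}
}
```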

Optional providers:

reliable
Wrapper provider that tries provider names in order and falls back on network/auth/timeout errors.

router
Wrapper provider that chooses a provider from request metadata hints.

Channel abstraction:

Create an idiomatic equivalent of:

Channel
name() -> string
start(runtime_handle)
send(conversation_id, message)
supports_draft_updates() -> bool

Required channel:

cli

CLI behavior:

app agent
starts a REPL

app agent -m "message"
runs one turn and exits

REPL commands:

/exit
exits

/tools
lists active tools

/memory <query>
searches memory

/policy
prints current autonomy and workspace boundary

Optional gateway channel:

localhost HTTP server

Minimum optional gateway endpoints:

GET /health
GET /status
GET /tools
POST /chat
GET /memory/search?q=...
GET /receipts
POST /estop

Tool abstraction:

Create an idiomatic equivalent of:

Tool
name() -> string
description() -> string
parameters_schema() -> JSON Schema object or equivalent metadata
risk(args, context) -> low | medium | high
invoke(args, context) -> ToolResult

ToolResult must contain:

- success: bool
- output: string
- optional error: string
- optional metadata
- optional receipt_id

Required built-in tools:

time
Returns current local time, UTC time, and timezone if available.

file_list
Lists files under a path inside workspace.

file_read
Reads a UTF-8 text file inside workspace.

file_write
Writes a UTF-8 text file inside workspace.

shell
Executes a shell command inside workspace, subject to security policy.

http
Performs HTTP GET. POST is optional.

memory_search
Searches persisted conversations.

Optional tools:

web_search
May be stubbed unless a search API key is configured.

pdf_extract
Optional.

ask_user
In CLI mode, asks the user a question and returns the answer.

Security model:

Implement three autonomy levels:

readonly
Low-risk read-only tools allowed.
No file_write.
No shell execution except optionally harmless commands such as pwd.

supervised
Low-risk tools run automatically.
Medium-risk tools require operator approval.
High-risk tools are blocked.

full
Low and medium run automatically.
High-risk is still blocked if explicitly forbidden by path or command policy.

Default must be:

supervised

Risk rules:

time, memory_search, file_list, file_read inside workspace:
low

http GET to allowed domains:
low

file_write inside workspace:
medium

shell command from allowlist:
medium

shell command not on allowlist:
high

any path outside workspace when workspace_only = true:
blocked

any path under forbidden_paths:
blocked

any command whose basename appears in forbidden_commands:
blocked
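
Illustrative sketch only: a minimal workspace-containment check and an autonomy/risk decision table, in Go as an example language. `InsideWorkspace` and `Decide` are hypothetical names. Two assumptions are made loudly here: the path check is purely lexical (it does not resolve symlinks, which a real implementation should also handle), and under `full` this sketch conservatively blocks every high-risk call, whereas the spec only requires blocking when path or command policy forbids it.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

type Decision string

const (
	Allow         Decision = "allow"
	NeedsApproval Decision = "needs-approval"
	Block         Decision = "block"
)

// InsideWorkspace reports whether target, resolved against workspace, stays
// inside it. Lexical only: symlinks are NOT resolved in this sketch.
func InsideWorkspace(workspace, target string) bool {
	if !filepath.IsAbs(target) {
		target = filepath.Join(workspace, target)
	}
	rel, err := filepath.Rel(filepath.Clean(workspace), filepath.Clean(target))
	if err != nil {
		return false
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

// Decide maps an autonomy level and a risk class to a decision.
// Unknown levels or risks fall through to Block (fail closed).
func Decide(autonomy, risk string) Decision {
	switch autonomy {
	case "readonly":
		if risk == "low" {
			return Allow
		}
	case "supervised":
		switch risk {
		case "low":
			return Allow
		case "medium":
			return NeedsApproval
		}
	case "full":
		// Assumption: high risk is always blocked in this sketch.
		if risk == "low" || risk == "medium" {
			return Allow
		}
	}
	return Block
}

func main() {
	fmt.Println(InsideWorkspace("/home/u/ws", "notes/todo.txt"))
	fmt.Println(InsideWorkspace("/home/u/ws", "/etc/passwd"))
	fmt.Println(Decide("supervised", "medium"))
}
```

Checks against forbidden_paths and forbidden_commands would run before `Decide`, so that a blocked path or command never reaches the approval prompt.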

Any shell command containing obvious destructive patterns must be blocked. Minimum patterns:

rm -rf /
rm -rf *
mkfs
dd if=
:(){ :|:& };:
shutdown
reboot
chmod -R 777 /
chown -R
curl ... | sh
wget ... | sh

Approval flow in CLI mode:

When a medium-risk action requires approval, print something like:

Tool request:
tool: file_write
risk: medium
reason: writes to workspace
args: ...
Approve? [y/N]

Default is deny.

Tool receipts:

Every attempted tool invocation must produce a receipt whether it is allowed, denied, failed, or approved.

Receipt fields:

{
  "id": "receipt-...",
  "timestamp": "2026-05-12T14:00:00Z",
  "conversation_id": "...",
  "tool": "file_read",
  "args_hash": "...",
  "result_hash": "...",
  "status": "allowed|denied|failed",
  "risk": "low|medium|high",
  "previous_hash": "...",
  "receipt_hash": "..."
}

Receipt hash:

receipt_hash = SHA256(canonical_json(receipt_without_receipt_hash))

Tamper-evident chain:

- each receipt includes the previous receipt’s hash
- receipt verify must replay the log
- it must report the first broken link
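
Illustrative sketch only: the hash chain above could be built and replayed roughly as follows, shown in Go as an example language. It assumes "canonical JSON" means a JSON object with sorted keys and no extra whitespace, which Go's `encoding/json` produces for maps, and it assumes the first receipt uses an empty `previous_hash`. `Append` and `Verify` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Receipt is a string map so json.Marshal emits keys in sorted order,
// which this sketch treats as its canonical form.
type Receipt map[string]string

// hashReceipt computes SHA-256 over the receipt minus its receipt_hash field.
func hashReceipt(r Receipt) string {
	body := make(map[string]string, len(r))
	for k, v := range r {
		if k != "receipt_hash" {
			body[k] = v
		}
	}
	b, _ := json.Marshal(body) // a map[string]string always marshals
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// Append links r to the last receipt in the chain and seals it.
func Append(chain []Receipt, r Receipt) []Receipt {
	prev := "" // assumed genesis value
	if n := len(chain); n > 0 {
		prev = chain[n-1]["receipt_hash"]
	}
	r["previous_hash"] = prev
	r["receipt_hash"] = hashReceipt(r)
	return append(chain, r)
}

// Verify replays the chain and returns the 1-based position of the first
// broken receipt, or 0 when the chain is intact.
func Verify(chain []Receipt) int {
	prev := ""
	for i, r := range chain {
		if r["previous_hash"] != prev || r["receipt_hash"] != hashReceipt(r) {
			return i + 1
		}
		prev = r["receipt_hash"]
	}
	return 0
}

func main() {
	var chain []Receipt
	for _, tool := range []string{"time", "file_list", "file_read"} {
		chain = Append(chain, Receipt{"tool": tool, "status": "allowed", "risk": "low"})
	}
	fmt.Println("intact:", Verify(chain) == 0)
	chain[1]["tool"] = "shell" // simulate manual tampering
	fmt.Println("first broken receipt:", Verify(chain))
}
```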

Optional stronger version:

HMAC-SHA256 with a locally stored secret key

Memory:

Use SQLite if practical in LANGUAGE. Use JSONL only if SQLite support is impractical or broken.

Persist:

- conversation_id
- turn_id
- timestamp
- role
- content
- tool_calls
- tool_results
- provider
- model
- metadata

Required commands:

app memory search QUERY
app memory show CONVERSATION_ID
app memory list
app memory clear --yes

Search may be simple substring search.

Optional scoring:

- tokenize query and content
- rank by term frequency
- boost recent conversations
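
Illustrative sketch only: a small in-memory version of the optional scoring, in Go as an example language. It covers tokenizing and term-frequency ranking; the recency boost is deliberately left out. `Turn`, `tokenize`, and `Search` are hypothetical names, and a real implementation would read turns from the memory store.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Turn carries only the fields this sketch needs.
type Turn struct {
	ConversationID string
	Content        string
}

// tokenize lowercases s and splits it on anything that is not a letter or digit.
func tokenize(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Search returns conversation IDs ordered by how often the query terms occur
// in their turns; ties are broken by ID so results are deterministic.
func Search(turns []Turn, query string) []string {
	terms := tokenize(query)
	scores := map[string]int{}
	for _, t := range turns {
		for _, word := range tokenize(t.Content) {
			for _, q := range terms {
				if word == q {
					scores[t.ConversationID]++
				}
			}
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		if scores[ids[i]] != scores[ids[j]] {
			return scores[ids[i]] > scores[ids[j]]
		}
		return ids[i] < ids[j]
	})
	return ids
}

func main() {
	turns := []Turn{
		{"conv-1", "Notes on the Aardvark adapter"},
		{"conv-2", "Unrelated chat about lunch"},
		{"conv-3", "Aardvark adapter, again: the aardvark wins"},
	}
	fmt.Println(Search(turns, "aardvark"))
}
```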

Agent loop:

Implement this loop:

1. Receive user message from channel.
2. Create or resume conversation.
3. Load recent memory context.
4. Build system prompt.
5. Build tool schemas from active tools.
6. Call provider.
7. If provider returns text only, persist and reply.
8. If provider returns tool calls:
   a. For each tool call, classify risk.
   b. Validate policy.
   c. Ask approval when required.
   d. Invoke or deny.
   e. Write receipt.
   f. Persist tool call and result.
9. Send tool results back to provider.
10. Repeat until final text or max_tool_rounds is reached.
11. Persist final assistant response.
12. Reply to channel.

Guardrails:

max_tool_rounds default:
5

max_response_bytes default:
1 MB

tool execution timeout default:
30 seconds

shell timeout default:
15 seconds

HTTP timeout default:
20 seconds

The runtime must not recursively invoke tools forever.
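
Illustrative sketch only: the bounded core of the loop (steps 6 to 10) might look like this, in Go as an example language. Persistence, receipts, approval, and timeouts are collapsed into the hypothetical `ToolFunc` callback, and `ChatFunc` stands in for the provider's chat call. Returning an error when the round limit is hit is one possible choice; the spec does not say what the runtime should reply in that case.

```go
package main

import (
	"errors"
	"fmt"
)

type ToolCall struct {
	Name string
	Args map[string]string
}

// Reply is either final text (no Calls) or a batch of tool calls.
type Reply struct {
	Text  string
	Calls []ToolCall
}

// ChatFunc stands in for the provider; it sees the transcript so far.
type ChatFunc func(transcript []string) (Reply, error)

// ToolFunc stands in for classify + policy + approval + invoke + receipt,
// and returns the text fed back to the provider.
type ToolFunc func(call ToolCall) string

var ErrMaxRounds = errors.New("max_tool_rounds reached without a final answer")

// RunTurn calls the provider until it returns text only, executing at most
// maxRounds rounds of tool calls so the loop can never run forever.
func RunTurn(chat ChatFunc, runTool ToolFunc, userMsg string, maxRounds int) (string, error) {
	transcript := []string{"user: " + userMsg}
	for round := 0; ; round++ {
		reply, err := chat(transcript)
		if err != nil {
			return "", err
		}
		if len(reply.Calls) == 0 {
			return reply.Text, nil // step 7: text only
		}
		if round >= maxRounds {
			return "", ErrMaxRounds // guardrail
		}
		for _, call := range reply.Calls {
			transcript = append(transcript, "tool "+call.Name+": "+runTool(call))
		}
	}
}

func main() {
	chat := func(transcript []string) (Reply, error) {
		if len(transcript) == 1 {
			return Reply{Calls: []ToolCall{{Name: "file_list", Args: map[string]string{"path": "."}}}}, nil
		}
		return Reply{Text: "The workspace has two files."}, nil
	}
	runTool := func(call ToolCall) string { return "README.md main.txt" }
	fmt.Println(RunTurn(chat, runTool, "list files", 5))
}
```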

Required CLI command surface:

app init
app onboard
app config validate
app config show
app provider list
app provider test NAME
app tool list
app tool run NAME --json ARGS
app agent
app agent -m MESSAGE
app memory list
app memory search QUERY
app memory show CONVERSATION_ID
app receipt list
app receipt verify
app estop

Optional commands:

app service install
app service start
app service stop
app service status
app sop list
app sop validate
app sop run NAME
app plugin list
app plugin install PATH

SOP engine, optional but valuable:

Implement deterministic workflows loaded from:

~/.appname/workspace/sops/<name>/SOP.toml

Minimum SOP format:

name = "daily-check"
description = "Run a daily workspace check"

[[steps]]
id = "list"
kind = "tool"
tool = "file_list"
args = { path = "." }

[[steps]]
id = "summarize"
kind = "agent"
prompt = "Summarize the file list from the previous step."

[[steps]]
id = "approval"
kind = "approval"
prompt = "Continue to write report?"

[[steps]]
id = "write"
kind = "tool"
tool = "file_write"
args = { path = "daily-check.txt", content_from = "summarize" }

Requirements:

- validate step IDs are unique
- validate referenced tools exist
- persist SOP run state
- stop at approval steps until approved
- support on_failure = "abort"
- support on_failure = "continue"

Plugin system, stretch goal:

A plugin is a directory:

plugin-name/
  manifest.toml
  executable-or-script

Minimum manifest:

name = "echo-plugin"
version = "0.1.0"
capabilities = ["tool"]

[[tools]]
name = "echo"
description = "Echoes input"
command = "./echo-plugin"
schema = { type = "object" }

The runtime discovers plugins under:

~/.appname/plugins/

Simpler acceptable version:

Support external process tools where the runtime invokes a configured executable with JSON on stdin and reads JSON from stdout.

Observability:

Minimum logging:

- human-readable logs to stderr
- structured JSON logs when APPNAME_LOG=json, adjusted to the executable name
- never log secrets

Log events:

- startup
- config path
- workspace path
- provider selected
- channel started
- conversation started
- tool requested
- tool approved
- tool denied
- tool completed
- tool failed
- receipt written
- memory persisted
- estop triggered

Optional metrics endpoint:

GET /metrics

Expose counters if the endpoint is implemented:

app_conversations_total
app_tool_calls_total
app_tool_denials_total
app_provider_errors_total
app_receipt_chain_valid

Emergency stop:

app estop

Creates:

~/.appname/ESTOP

When this file exists:

- no new tool calls may run
- existing long-running shell/http tasks should be cancelled if possible
- the agent may still answer text-only messages explaining that tool use is stopped

app estop --clear

Removes the file.
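
Illustrative sketch only: the ESTOP marker file could be handled roughly like this, in Go as an example language. `Trigger`, `Clear`, and `Active` are hypothetical names, and `roundTrip` exists only to exercise them in a temporary directory. Treating any stat error other than "not found" as an active stop is a design assumption of this sketch (fail closed), not something the spec states.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func estopPath(configDir string) string {
	return filepath.Join(configDir, "ESTOP")
}

// Trigger creates the marker file; it succeeds if the file already exists.
func Trigger(configDir string) error {
	f, err := os.OpenFile(estopPath(configDir), os.O_CREATE|os.O_WRONLY, 0600)
	if err != nil {
		return err
	}
	return f.Close()
}

// Clear removes the marker file; a missing file is not an error.
func Clear(configDir string) error {
	err := os.Remove(estopPath(configDir))
	if os.IsNotExist(err) {
		return nil
	}
	return err
}

// Active reports whether tool calls must be refused. Any stat failure other
// than "not found" counts as active, so the gate fails closed.
func Active(configDir string) bool {
	_, err := os.Stat(estopPath(configDir))
	return !os.IsNotExist(err)
}

// roundTrip exercises the three helpers in a throwaway directory.
func roundTrip() (before, during, after bool, err error) {
	dir, err := os.MkdirTemp("", "estop-demo")
	if err != nil {
		return
	}
	defer os.RemoveAll(dir)
	before = Active(dir)
	if err = Trigger(dir); err != nil {
		return
	}
	during = Active(dir)
	err = Clear(dir)
	after = Active(dir)
	return
}

func main() {
	before, during, after, err := roundTrip()
	fmt.Println(before, during, after, err)
}
```

The tool gate would call `Active` immediately before every invocation, so a stop issued mid-conversation takes effect on the next tool call.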

Acceptance tests:

Test 1: init creates expected files.

Given no ~/.appname directory
When app init runs
Then ~/.appname/config file exists
And memory database or memory JSONL exists
And workspace_dir exists

Test 2: config validation catches invalid autonomy.

Given autonomy = "godmode"
When app config validate runs
Then exit code is nonzero
And output mentions allowed values

Test 3: mock provider text-only response.

Given mock provider fixture returns "hello"
When app agent -m "hi" runs
Then stdout contains "hello"
And memory contains the user and assistant turn

Test 4: model-triggered file_list tool.

Given mock provider fixture emits tool_call file_list { path = "." }
When app agent -m "list files" runs
Then file_list executes inside workspace
And a tool receipt is written
And final answer includes the file list summary

Test 5: workspace escape blocked.

Given workspace_only = true
When model requests file_read { path = "/etc/passwd" }
Then tool is denied
And a denied receipt is written
And the provider receives a tool error

Test 6: supervised approval.

Given autonomy = "supervised"
When model requests file_write
Then CLI asks for approval
And default empty answer denies
And "y" approves

Test 7: forbidden command blocked.

When model requests shell { command = "rm -rf /" }
Then tool is blocked before execution
And receipt status is denied

Test 8: receipt chain detects tampering.

Given three receipts exist
When the second receipt is edited manually
Then app receipt verify reports invalid chain at receipt 2

Test 9: provider fallback, if reliable provider is implemented.

Given reliable provider = [bad_provider, mock_provider]
And bad_provider times out
When agent runs
Then runtime logs fallback
And response comes from mock_provider

Test 10: memory search.

Given a previous conversation contains "Aardvark adapter"
When app memory search "aardvark" runs
Then the previous conversation ID is returned

Implementation priorities:

First produce a working vertical slice:

1. application naming
2. init
3. config loading and validation
4. mock provider
5. CLI one-shot agent mode
6. tools: time, file_list, file_read
7. security policy for workspace paths
8. memory persistence
9. receipt writing and verification
10. tests

Then add:

11. interactive REPL
12. file_write with approval
13. shell with blocking rules
14. HTTP GET tool
15. OpenAI-compatible provider
16. optional gateway
17. optional SOP engine
18. optional external-process plugins

Quality requirements:

- Keep the implementation idiomatic for LANGUAGE.
- Do not quietly substitute another implementation language.
- Do not use Python, JavaScript, Rust, or C as the primary implementation language.
- Shell scripts are acceptable only for setup convenience.
- Prefer simple, boring dependencies.
- Write tests for denied actions, not just successful actions.
- Keep secrets out of logs.
- Keep workspace path handling strict and well-tested.
- Use deterministic mock fixtures so tests do not require network access.
- Update README.md with architecture, config, security policy, commands, and test instructions.

Do not stop after creating stubs. Implement the core behavior. If a feature is not practical in LANGUAGE, document the limitation and implement the closest useful equivalent.