ClawClone.prompt - Opengist

Revision 271e99a6320e5db56c878619d069c893a8d1e49c

ClawClone.prompt · 17 KiB · Text Raw

You are now implementing the real application in this workspace. The target implementation language is LANGUAGE. This workspace should already contain a verified LANGUAGE project skeleton with basic support for CLI, HTTP, SQLite, config parsing, tests, and build/test commands. Begin by inspecting the existing workspace and README before changing anything. Your first task is to name the application. The placeholder name is: *Claw Replace the wildcard with a short, distinctive prefix suitable for this LANGUAGE implementation. Examples of the naming style: LispClaw BeamClaw LogicClaw CrystalClaw CobolClaw Do not use an existing project’s name or branding. Once you choose the name, use it consistently for: - executable name - README title - config directory - default workspace directory - log names - test names where appropriate If the selected name is not suitable for an executable on macOS, create a lowercase/kebab-case executable form and document the relationship. For example: Application name: LogicClaw Executable: logicclaw The goal is to build a small, local-first agent runtime. It should run as a single command-line application that can load configuration, talk to a model provider, expose a CLI agent loop, execute a small set of tools through a security gate, persist memory, and write tamper-evident tool receipts. Do not mention or depend on any external agent runtime project. Treat this as a clean-room implementation from this spec. Core workflow: app init app config validate app config show app provider list app provider test NAME app tool list app tool run NAME --json ARGS app agent app agent -m "What files are in this project?" app memory search "previous topic" app receipt verify app estop Use the final executable name you selected instead of `app`. The agent must: - accept input from the CLI channel - send the conversation to a configured model provider - advertise available tools to the model - parse tool calls from the model response - validate each tool call through a security policy - execute approved tools - feed tool results back into the model - persist the final exchange, tool calls, tool results, and receipts - return a final answer to the user Required architecture: The implementation must have visible separation of responsibility for these areas: runtime agent loop, request lifecycle, orchestration config config loading, validation, defaults, path expansion providers model provider abstraction and concrete providers channels CLI channel and optional HTTP/gateway channel tools time, file_list, file_read, file_write, shell, http, memory_search security autonomy levels, command/path policy, tool-risk classification memory SQLite persistence, or JSONL only if SQLite is impractical in LANGUAGE receipts tamper-evident tool-call receipts sop optional deterministic workflow runner service optional install/start/stop/status wrappers Do not force object-oriented structure if LANGUAGE is not object-oriented. Use idiomatic LANGUAGE design, but preserve the conceptual boundaries. Configuration: The application must use a user-editable config file. Default location should be based on the final app name, for example: ~/.logicclaw/config.toml TOML is preferred. JSON, INI, S-expression, or another idiomatic config format is acceptable if TOML support is weak in LANGUAGE. If not using TOML, document why. Minimum config shape: workspace_dir = "~/logicclaw-workspace" default_provider = "local" default_model = "mock" [security] autonomy = "supervised" workspace_only = true forbidden_paths = ["/etc", "/sys", "/boot", "~/.ssh"] forbidden_commands = ["rm", "shutdown", "reboot", "mkfs", "dd"] audit_log = true [providers.models.local] kind = "mock" model = "mock" [providers.models.openai_compatible] kind = "openai-compatible" base_url = "http://localhost:1234/v1" model = "local-model" api_key_env = "OPENAI_API_KEY" [channels.cli] enabled = true tools_allow = ["file_read", "file_list", "time", "memory_search", "shell"] [memory] backend = "sqlite" path = "~/.logicclaw/memory.sqlite" [receipts] enabled = true path = "~/.logicclaw/tool_receipts.log" Adjust paths to match the application name you chose. Config requirements: - load defaults when keys are absent - expand ~ and environment variables - validate enum values - validate that workspace exists or create it during init - do not require API keys for mock mode - support provider credentials by environment variable - never print secret values in logs or config dumps - config validate must report all detected errors in one pass when practical Provider abstraction: Create an idiomatic equivalent of: Provider name() -> string capabilities() -> ProviderCapabilities chat(request: ChatRequest) -> ChatResponse ChatRequest must contain: - system_prompt - messages - tools - model - optional temperature - optional metadata ChatResponse must contain: - final_text - tool_calls - optional raw_provider_payload - optional usage Required providers: mock Deterministic provider used for tests. It must be able to return ordinary text and tool calls from scripted fixtures. openai-compatible Sends requests to an OpenAI-compatible /chat/completions endpoint. Full support for every provider is not required. Implement non-streaming chat completion. Tool/function call support is required if reasonably practical in LANGUAGE; otherwise document the limitation clearly. Optional providers: reliable Wrapper provider that tries provider names in order and falls back on network/auth/timeout errors. router Wrapper provider that chooses a provider from request metadata hints. Channel abstraction: Create an idiomatic equivalent of: Channel name() -> string start(runtime_handle) send(conversation_id, message) supports_draft_updates() -> bool Required channel: cli CLI behavior: app agent starts a REPL app agent -m "message" runs one turn and exits REPL commands: /exit exits /tools lists active tools /memory <query> searches memory /policy prints current autonomy and workspace boundary Optional gateway channel: localhost HTTP server Minimum optional gateway endpoints: GET /health GET /status GET /tools POST /chat GET /memory/search?q=... GET /receipts POST /estop Tool abstraction: Create an idiomatic equivalent of: Tool name() -> string description() -> string parameters_schema() -> JSON Schema object or equivalent metadata risk(args, context) -> low | medium | high invoke(args, context) -> ToolResult ToolResult must contain: - success: bool - output: string - optional error: string - optional metadata - optional receipt_id Required built-in tools: time Returns current local time, UTC time, and timezone if available. file_list Lists files under a path inside workspace. file_read Reads a UTF-8 text file inside workspace. file_write Writes a UTF-8 text file inside workspace. shell Executes a shell command inside workspace, subject to security policy. http Performs HTTP GET. POST is optional. memory_search Searches persisted conversations. Optional tools: web_search May be stubbed unless a search API key is configured. pdf_extract Optional. ask_user In CLI mode, asks the user a question and returns the answer. Security model: Implement three autonomy levels: readonly Low-risk read-only tools allowed. No file_write. No shell execution except optionally harmless commands such as pwd. supervised Low-risk tools run automatically. Medium-risk tools require operator approval. High-risk tools are blocked. full Low and medium run automatically. High-risk is still blocked if explicitly forbidden by path or command policy. Default must be: supervised Risk rules: time, memory_search, file_list, file_read inside workspace: low http GET to allowed domains: low file_write inside workspace: medium shell command from allowlist: medium shell command not on allowlist: high any path outside workspace when workspace_only = true: blocked any path under forbidden_paths: blocked any command whose basename appears in forbidden_commands: blocked Any shell command containing obvious destructive patterns must be blocked. Minimum patterns: rm -rf / rm -rf * mkfs dd if= :(){ :|:& };: shutdown reboot chmod -R 777 / chown -R curl ... | sh wget ... | sh Approval flow in CLI mode: When a medium-risk action requires approval, print something like: Tool request: tool: file_write risk: medium reason: writes to workspace args: ... Approve? [y/N] Default is deny. Tool receipts: Every attempted tool invocation must produce a receipt whether it is allowed, denied, failed, or approved. Receipt fields: { "id": "receipt-...", "timestamp": "2026-05-12T14:00:00Z", "conversation_id": "...", "tool": "file_read", "args_hash": "...", "result_hash": "...", "status": "allowed|denied|failed", "risk": "low|medium|high", "previous_hash": "...", "receipt_hash": "..." } Receipt hash: receipt_hash = SHA256(canonical_json(receipt_without_receipt_hash)) Tamper-evident chain: - each receipt includes the previous receipt’s hash - receipt verify must replay the log - it must report the first broken link Optional stronger version: HMAC-SHA256 with a locally stored secret key Memory: Use SQLite if practical in LANGUAGE. Use JSONL only if SQLite support is impractical or broken. Persist: - conversation_id - turn_id - timestamp - role - content - tool_calls - tool_results - provider - model - metadata Required commands: app memory search QUERY app memory show CONVERSATION_ID app memory list app memory clear --yes Search may be simple substring search. Optional scoring: - tokenize query and content - rank by term frequency - boost recent conversations Agent loop: Implement this loop: 1. Receive user message from channel. 2. Create or resume conversation. 3. Load recent memory context. 4. Build system prompt. 5. Build tool schemas from active tools. 6. Call provider. 7. If provider returns text only, persist and reply. 8. If provider returns tool calls: a. For each tool call, classify risk. b. Validate policy. c. Ask approval when required. d. Invoke or deny. e. Write receipt. f. Persist tool call and result. 9. Send tool results back to provider. 10. Repeat until final text or max_tool_rounds is reached. 11. Persist final assistant response. 12. Reply to channel. Guardrails: max_tool_rounds default: 5 max_response_bytes default: 1 MB tool execution timeout default: 30 seconds shell timeout default: 15 seconds HTTP timeout default: 20 seconds The runtime must not recursively invoke tools forever. Required CLI command surface: app init app onboard app config validate app config show app provider list app provider test NAME app tool list app tool run NAME --json ARGS app agent app agent -m MESSAGE app memory list app memory search QUERY app memory show CONVERSATION_ID app receipt list app receipt verify app estop Optional commands: app service install app service start app service stop app service status app sop list app sop validate app sop run NAME app plugin list app plugin install PATH SOP engine, optional but valuable: Implement deterministic workflows loaded from: ~/.appname/workspace/sops/<name>/SOP.toml Minimum SOP format: name = "daily-check" description = "Run a daily workspace check" [[steps]] id = "list" kind = "tool" tool = "file_list" args = { path = "." } [[steps]] id = "summarize" kind = "agent" prompt = "Summarize the file list from the previous step." [[steps]] id = "approval" kind = "approval" prompt = "Continue to write report?" [[steps]] id = "write" kind = "tool" tool = "file_write" args = { path = "daily-check.txt", content_from = "summarize" } Requirements: - validate step IDs are unique - validate referenced tools exist - persist SOP run state - stop at approval steps until approved - support on_failure = "abort" - support on_failure = "continue" Plugin system, stretch goal: A plugin is a directory: plugin-name/ manifest.toml executable-or-script Minimum manifest: name = "echo-plugin" version = "0.1.0" capabilities = ["tool"] [[tools]] name = "echo" description = "Echoes input" command = "./echo-plugin" schema = { type = "object" } The runtime discovers plugins under: ~/.appname/plugins/ Simpler acceptable version: Support external process tools where the runtime invokes a configured executable with JSON on stdin and reads JSON from stdout. Observability: Minimum logging: - human-readable logs to stderr - structured JSON logs when APPNAME_LOG=json, adjusted to the executable name - never log secrets Log events: - startup - config path - workspace path - provider selected - channel started - conversation started - tool requested - tool approved - tool denied - tool completed - tool failed - receipt written - memory persisted - estop triggered Optional metrics endpoint: GET /metrics Expose counters if the endpoint is implemented: app_conversations_total app_tool_calls_total app_tool_denials_total app_provider_errors_total app_receipt_chain_valid Emergency stop: app estop Creates: ~/.appname/ESTOP When this file exists: - no new tool calls may run - existing long-running shell/http tasks should be cancelled if possible - the agent may still answer text-only messages explaining that tool use is stopped app estop --clear Removes the file. Acceptance tests: Test 1: init creates expected files. Given no ~/.appname directory When app init runs Then ~/.appname/config file exists And memory database or memory JSONL exists And workspace_dir exists Test 2: config validation catches invalid autonomy. Given autonomy = "godmode" When app config validate runs Then exit code is nonzero And output mentions allowed values Test 3: mock provider text-only response. Given mock provider fixture returns "hello" When app agent -m "hi" runs Then stdout contains "hello" And memory contains the user and assistant turn Test 4: model-triggered file_list tool. Given mock provider fixture emits tool_call file_list { path = "." } When app agent -m "list files" runs Then file_list executes inside workspace And a tool receipt is written And final answer includes the file list summary Test 5: workspace escape blocked. Given workspace_only = true When model requests file_read { path = "/etc/passwd" } Then tool is denied And a denied receipt is written And the provider receives a tool error Test 6: supervised approval. Given autonomy = "supervised" When model requests file_write Then CLI asks for approval And default empty answer denies And "y" approves Test 7: forbidden command blocked. When model requests shell { command = "rm -rf /" } Then tool is blocked before execution And receipt status is denied Test 8: receipt chain detects tampering. Given three receipts exist When the second receipt is edited manually Then app receipt verify reports invalid chain at receipt 2 Test 9: provider fallback, if reliable provider is implemented. Given reliable provider = [bad_provider, mock_provider] And bad_provider times out When agent runs Then runtime logs fallback And response comes from mock_provider Test 10: memory search. Given a previous conversation contains "Aardvark adapter" When app memory search "aardvark" runs Then the previous conversation ID is returned Implementation priorities: First produce a working vertical slice: 1. application naming 2. init 3. config loading and validation 4. mock provider 5. CLI one-shot agent mode 6. tools: time, file_list, file_read 7. security policy for workspace paths 8. memory persistence 9. receipt writing and verification 10. tests Then add: 11. interactive REPL 12. file_write with approval 13. shell with blocking rules 14. HTTP GET tool 15. OpenAI-compatible provider 16. optional gateway 17. optional SOP engine 18. optional external-process plugins Quality requirements: - Keep the implementation idiomatic for LANGUAGE. - Do not quietly substitute another implementation language. - Do not use Python, JavaScript, Rust, or C as the primary implementation language. - Shell scripts are acceptable only for setup convenience. - Prefer simple, boring dependencies. - Write tests for denied actions, not just successful actions. - Keep secrets out of logs. - Keep workspace path handling strict and well-tested. - Use deterministic mock fixtures so tests do not require network access. - Update README.md with architecture, config, security policy, commands, and test instructions. Do not stop after creating stubs. Implement the core behavior. If a feature is not practical in LANGUAGE, document the limitation and implement the closest useful equivalent.

1	You are now implementing the real application in this workspace.
2
3	The target implementation language is LANGUAGE.
4
5	This workspace should already contain a verified LANGUAGE project skeleton with basic support for CLI, HTTP, SQLite, config parsing, tests, and build/test commands. Begin by inspecting the existing workspace and README before changing anything.
6
7	Your first task is to name the application.
8
9	The placeholder name is:
10
11	*Claw
12
13	Replace the wildcard with a short, distinctive prefix suitable for this LANGUAGE implementation. Examples of the naming style:
14
15	LispClaw
16	BeamClaw
17	LogicClaw
18	CrystalClaw
19	CobolClaw
20
21	Do not use an existing project’s name or branding. Once you choose the name, use it consistently for:
22
23	- executable name
24	- README title
25	- config directory
26	- default workspace directory
27	- log names
28	- test names where appropriate
29
30	If the selected name is not suitable for an executable on macOS, create a lowercase/kebab-case executable form and document the relationship. For example:
31
32	Application name: LogicClaw
33	Executable: logicclaw
34
35	The goal is to build a small, local-first agent runtime. It should run as a single command-line application that can load configuration, talk to a model provider, expose a CLI agent loop, execute a small set of tools through a security gate, persist memory, and write tamper-evident tool receipts.
36
37	Do not mention or depend on any external agent runtime project. Treat this as a clean-room implementation from this spec.
38
39	Core workflow:
40
41	app init
42	app config validate
43	app config show
44	app provider list
45	app provider test NAME
46	app tool list
47	app tool run NAME --json ARGS
48	app agent
49	app agent -m "What files are in this project?"
50	app memory search "previous topic"
51	app receipt verify
52	app estop
53
54	Use the final executable name you selected instead of `app`.
55
56	The agent must:
57
58	- accept input from the CLI channel
59	- send the conversation to a configured model provider
60	- advertise available tools to the model
61	- parse tool calls from the model response
62	- validate each tool call through a security policy
63	- execute approved tools
64	- feed tool results back into the model
65	- persist the final exchange, tool calls, tool results, and receipts
66	- return a final answer to the user
67
68	Required architecture:
69
70	The implementation must have visible separation of responsibility for these areas:
71
72	runtime
73	agent loop, request lifecycle, orchestration
74
75	config
76	config loading, validation, defaults, path expansion
77
78	providers
79	model provider abstraction and concrete providers
80
81	channels
82	CLI channel and optional HTTP/gateway channel
83
84	tools
85	time, file_list, file_read, file_write, shell, http, memory_search
86
87	security
88	autonomy levels, command/path policy, tool-risk classification
89
90	memory
91	SQLite persistence, or JSONL only if SQLite is impractical in LANGUAGE
92
93	receipts
94	tamper-evident tool-call receipts
95
96	sop
97	optional deterministic workflow runner
98
99	service
100	optional install/start/stop/status wrappers
101
102	Do not force object-oriented structure if LANGUAGE is not object-oriented. Use idiomatic LANGUAGE design, but preserve the conceptual boundaries.
103
104	Configuration:
105
106	The application must use a user-editable config file.
107
108	Default location should be based on the final app name, for example:
109
110	~/.logicclaw/config.toml
111
112	TOML is preferred. JSON, INI, S-expression, or another idiomatic config format is acceptable if TOML support is weak in LANGUAGE. If not using TOML, document why.
113
114	Minimum config shape:
115
116	workspace_dir = "~/logicclaw-workspace"
117	default_provider = "local"
118	default_model = "mock"
119
120	[security]
121	autonomy = "supervised"
122	workspace_only = true
123	forbidden_paths = ["/etc", "/sys", "/boot", "~/.ssh"]
124	forbidden_commands = ["rm", "shutdown", "reboot", "mkfs", "dd"]
125	audit_log = true
126
127	[providers.models.local]
128	kind = "mock"
129	model = "mock"
130
131	[providers.models.openai_compatible]
132	kind = "openai-compatible"
133	base_url = "http://localhost:1234/v1"
134	model = "local-model"
135	api_key_env = "OPENAI_API_KEY"
136
137	[channels.cli]
138	enabled = true
139	tools_allow = ["file_read", "file_list", "time", "memory_search", "shell"]
140
141	[memory]
142	backend = "sqlite"
143	path = "~/.logicclaw/memory.sqlite"
144
145	[receipts]
146	enabled = true
147	path = "~/.logicclaw/tool_receipts.log"
148
149	Adjust paths to match the application name you chose.
150
151	Config requirements:
152
153	- load defaults when keys are absent
154	- expand ~ and environment variables
155	- validate enum values
156	- validate that workspace exists or create it during init
157	- do not require API keys for mock mode
158	- support provider credentials by environment variable
159	- never print secret values in logs or config dumps
160	- config validate must report all detected errors in one pass when practical
161
162	Provider abstraction:
163
164	Create an idiomatic equivalent of:
165
166	Provider
167	name() -> string
168	capabilities() -> ProviderCapabilities
169	chat(request: ChatRequest) -> ChatResponse
170
171	ChatRequest must contain:
172
173	- system_prompt
174	- messages
175	- tools
176	- model
177	- optional temperature
178	- optional metadata
179
180	ChatResponse must contain:
181
182	- final_text
183	- tool_calls
184	- optional raw_provider_payload
185	- optional usage
186
187	Required providers:
188
189	mock
190	Deterministic provider used for tests. It must be able to return ordinary text and tool calls from scripted fixtures.
191
192	openai-compatible
193	Sends requests to an OpenAI-compatible /chat/completions endpoint. Full support for every provider is not required. Implement non-streaming chat completion. Tool/function call support is required if reasonably practical in LANGUAGE; otherwise document the limitation clearly.
194
195	Optional providers:
196
197	reliable
198	Wrapper provider that tries provider names in order and falls back on network/auth/timeout errors.
199
200	router
201	Wrapper provider that chooses a provider from request metadata hints.
202
203	Channel abstraction:
204
205	Create an idiomatic equivalent of:
206
207	Channel
208	name() -> string
209	start(runtime_handle)
210	send(conversation_id, message)
211	supports_draft_updates() -> bool
212
213	Required channel:
214
215	cli
216
217	CLI behavior:
218
219	app agent
220	starts a REPL
221
222	app agent -m "message"
223	runs one turn and exits
224
225	REPL commands:
226
227	/exit
228	exits
229
230	/tools
231	lists active tools
232
233	/memory <query>
234	searches memory
235
236	/policy
237	prints current autonomy and workspace boundary
238
239	Optional gateway channel:
240
241	localhost HTTP server
242
243	Minimum optional gateway endpoints:
244
245	GET /health
246	GET /status
247	GET /tools
248	POST /chat
249	GET /memory/search?q=...
250	GET /receipts
251	POST /estop
252
253	Tool abstraction:
254
255	Create an idiomatic equivalent of:
256
257	Tool
258	name() -> string
259	description() -> string
260	parameters_schema() -> JSON Schema object or equivalent metadata
261	risk(args, context) -> low \| medium \| high
262	invoke(args, context) -> ToolResult
263
264	ToolResult must contain:
265
266	- success: bool
267	- output: string
268	- optional error: string
269	- optional metadata
270	- optional receipt_id
271
272	Required built-in tools:
273
274	time
275	Returns current local time, UTC time, and timezone if available.
276
277	file_list
278	Lists files under a path inside workspace.
279
280	file_read
281	Reads a UTF-8 text file inside workspace.
282
283	file_write
284	Writes a UTF-8 text file inside workspace.
285
286	shell
287	Executes a shell command inside workspace, subject to security policy.
288
289	http
290	Performs HTTP GET. POST is optional.
291
292	memory_search
293	Searches persisted conversations.
294
295	Optional tools:
296
297	web_search
298	May be stubbed unless a search API key is configured.
299
300	pdf_extract
301	Optional.
302
303	ask_user
304	In CLI mode, asks the user a question and returns the answer.
305
306	Security model:
307
308	Implement three autonomy levels:
309
310	readonly
311	Low-risk read-only tools allowed.
312	No file_write.
313	No shell execution except optionally harmless commands such as pwd.
314
315	supervised
316	Low-risk tools run automatically.
317	Medium-risk tools require operator approval.
318	High-risk tools are blocked.
319
320	full
321	Low and medium run automatically.
322	High-risk is still blocked if explicitly forbidden by path or command policy.
323
324	Default must be:
325
326	supervised
327
328	Risk rules:
329
330	time, memory_search, file_list, file_read inside workspace:
331	low
332
333	http GET to allowed domains:
334	low
335
336	file_write inside workspace:
337	medium
338
339	shell command from allowlist:
340	medium
341
342	shell command not on allowlist:
343	high
344
345	any path outside workspace when workspace_only = true:
346	blocked
347
348	any path under forbidden_paths:
349	blocked
350
351	any command whose basename appears in forbidden_commands:
352	blocked
353
354	Any shell command containing obvious destructive patterns must be blocked. Minimum patterns:
355
356	rm -rf /
357	rm -rf *
358	mkfs
359	dd if=
360	:(){ :\|:& };:
361	shutdown
362	reboot
363	chmod -R 777 /
364	chown -R
365	curl ... \| sh
366	wget ... \| sh
367
368	Approval flow in CLI mode:
369
370	When a medium-risk action requires approval, print something like:
371
372	Tool request:
373	tool: file_write
374	risk: medium
375	reason: writes to workspace
376	args: ...
377	Approve? [y/N]
378
379	Default is deny.
380
381	Tool receipts:
382
383	Every attempted tool invocation must produce a receipt whether it is allowed, denied, failed, or approved.
384
385	Receipt fields:
386
387	{
388	"id": "receipt-...",
389	"timestamp": "2026-05-12T14:00:00Z",
390	"conversation_id": "...",
391	"tool": "file_read",
392	"args_hash": "...",
393	"result_hash": "...",
394	"status": "allowed\|denied\|failed",
395	"risk": "low\|medium\|high",
396	"previous_hash": "...",
397	"receipt_hash": "..."
398	}
399
400	Receipt hash:
401
402	receipt_hash = SHA256(canonical_json(receipt_without_receipt_hash))
403
404	Tamper-evident chain:
405
406	- each receipt includes the previous receipt’s hash
407	- receipt verify must replay the log
408	- it must report the first broken link
409
410	Optional stronger version:
411
412	HMAC-SHA256 with a locally stored secret key
413
414	Memory:
415
416	Use SQLite if practical in LANGUAGE. Use JSONL only if SQLite support is impractical or broken.
417
418	Persist:
419
420	- conversation_id
421	- turn_id
422	- timestamp
423	- role
424	- content
425	- tool_calls
426	- tool_results
427	- provider
428	- model
429	- metadata
430
431	Required commands:
432
433	app memory search QUERY
434	app memory show CONVERSATION_ID
435	app memory list
436	app memory clear --yes
437
438	Search may be simple substring search.
439
440	Optional scoring:
441
442	- tokenize query and content
443	- rank by term frequency
444	- boost recent conversations
445
446	Agent loop:
447
448	Implement this loop:
449
450	1. Receive user message from channel.
451	2. Create or resume conversation.
452	3. Load recent memory context.
453	4. Build system prompt.
454	5. Build tool schemas from active tools.
455	6. Call provider.
456	7. If provider returns text only, persist and reply.
457	8. If provider returns tool calls:
458	a. For each tool call, classify risk.
459	b. Validate policy.
460	c. Ask approval when required.
461	d. Invoke or deny.
462	e. Write receipt.
463	f. Persist tool call and result.
464	9. Send tool results back to provider.
465	10. Repeat until final text or max_tool_rounds is reached.
466	11. Persist final assistant response.
467	12. Reply to channel.
468
469	Guardrails:
470
471	max_tool_rounds default:
472	5
473
474	max_response_bytes default:
475	1 MB
476
477	tool execution timeout default:
478	30 seconds
479
480	shell timeout default:
481	15 seconds
482
483	HTTP timeout default:
484	20 seconds
485
486	The runtime must not recursively invoke tools forever.
487
488	Required CLI command surface:
489
490	app init
491	app onboard
492	app config validate
493	app config show
494	app provider list
495	app provider test NAME
496	app tool list
497	app tool run NAME --json ARGS
498	app agent
499	app agent -m MESSAGE
500	app memory list
501	app memory search QUERY
502	app memory show CONVERSATION_ID
503	app receipt list
504	app receipt verify
505	app estop
506
507	Optional commands:
508
509	app service install
510	app service start
511	app service stop
512	app service status
513	app sop list
514	app sop validate
515	app sop run NAME
516	app plugin list
517	app plugin install PATH
518
519	SOP engine, optional but valuable:
520
521	Implement deterministic workflows loaded from:
522
523	~/.appname/workspace/sops/<name>/SOP.toml
524
525	Minimum SOP format:
526
527	name = "daily-check"
528	description = "Run a daily workspace check"
529
530	[[steps]]
531	id = "list"
532	kind = "tool"
533	tool = "file_list"
534	args = { path = "." }
535
536	[[steps]]
537	id = "summarize"
538	kind = "agent"
539	prompt = "Summarize the file list from the previous step."
540
541	[[steps]]
542	id = "approval"
543	kind = "approval"
544	prompt = "Continue to write report?"
545
546	[[steps]]
547	id = "write"
548	kind = "tool"
549	tool = "file_write"
550	args = { path = "daily-check.txt", content_from = "summarize" }
551
552	Requirements:
553
554	- validate step IDs are unique
555	- validate referenced tools exist
556	- persist SOP run state
557	- stop at approval steps until approved
558	- support on_failure = "abort"
559	- support on_failure = "continue"
560
561	Plugin system, stretch goal:
562
563	A plugin is a directory:
564
565	plugin-name/
566	manifest.toml
567	executable-or-script
568
569	Minimum manifest:
570
571	name = "echo-plugin"
572	version = "0.1.0"
573	capabilities = ["tool"]
574
575	[[tools]]
576	name = "echo"
577	description = "Echoes input"
578	command = "./echo-plugin"
579	schema = { type = "object" }
580
581	The runtime discovers plugins under:
582
583	~/.appname/plugins/
584
585	Simpler acceptable version:
586
587	Support external process tools where the runtime invokes a configured executable with JSON on stdin and reads JSON from stdout.
588
589	Observability:
590
591	Minimum logging:
592
593	- human-readable logs to stderr
594	- structured JSON logs when APPNAME_LOG=json, adjusted to the executable name
595	- never log secrets
596
597	Log events:
598
599	- startup
600	- config path
601	- workspace path
602	- provider selected
603	- channel started
604	- conversation started
605	- tool requested
606	- tool approved
607	- tool denied
608	- tool completed
609	- tool failed
610	- receipt written
611	- memory persisted
612	- estop triggered
613
614	Optional metrics endpoint:
615
616	GET /metrics
617
618	Expose counters if the endpoint is implemented:
619
620	app_conversations_total
621	app_tool_calls_total
622	app_tool_denials_total
623	app_provider_errors_total
624	app_receipt_chain_valid
625
626	Emergency stop:
627
628	app estop
629
630	Creates:
631
632	~/.appname/ESTOP
633
634	When this file exists:
635
636	- no new tool calls may run
637	- existing long-running shell/http tasks should be cancelled if possible
638	- the agent may still answer text-only messages explaining that tool use is stopped
639
640	app estop --clear
641
642	Removes the file.
643
644	Acceptance tests:
645
646	Test 1: init creates expected files.
647
648	Given no ~/.appname directory
649	When app init runs
650	Then ~/.appname/config file exists
651	And memory database or memory JSONL exists
652	And workspace_dir exists
653
654	Test 2: config validation catches invalid autonomy.
655
656	Given autonomy = "godmode"
657	When app config validate runs
658	Then exit code is nonzero
659	And output mentions allowed values
660
661	Test 3: mock provider text-only response.
662
663	Given mock provider fixture returns "hello"
664	When app agent -m "hi" runs
665	Then stdout contains "hello"
666	And memory contains the user and assistant turn
667
668	Test 4: model-triggered file_list tool.
669
670	Given mock provider fixture emits tool_call file_list { path = "." }
671	When app agent -m "list files" runs
672	Then file_list executes inside workspace
673	And a tool receipt is written
674	And final answer includes the file list summary
675
676	Test 5: workspace escape blocked.
677
678	Given workspace_only = true
679	When model requests file_read { path = "/etc/passwd" }
680	Then tool is denied
681	And a denied receipt is written
682	And the provider receives a tool error
683
684	Test 6: supervised approval.
685
686	Given autonomy = "supervised"
687	When model requests file_write
688	Then CLI asks for approval
689	And default empty answer denies
690	And "y" approves
691
692	Test 7: forbidden command blocked.
693
694	When model requests shell { command = "rm -rf /" }
695	Then tool is blocked before execution
696	And receipt status is denied
697
698	Test 8: receipt chain detects tampering.
699
700	Given three receipts exist
701	When the second receipt is edited manually
702	Then app receipt verify reports invalid chain at receipt 2
703
704	Test 9: provider fallback, if reliable provider is implemented.
705
706	Given reliable provider = [bad_provider, mock_provider]
707	And bad_provider times out
708	When agent runs
709	Then runtime logs fallback
710	And response comes from mock_provider
711
712	Test 10: memory search.
713
714	Given a previous conversation contains "Aardvark adapter"
715	When app memory search "aardvark" runs
716	Then the previous conversation ID is returned
717
718	Implementation priorities:
719
720	First produce a working vertical slice:
721
722	1. application naming
723	2. init
724	3. config loading and validation
725	4. mock provider
726	5. CLI one-shot agent mode
727	6. tools: time, file_list, file_read
728	7. security policy for workspace paths
729	8. memory persistence
730	9. receipt writing and verification
731	10. tests
732
733	Then add:
734
735	11. interactive REPL
736	12. file_write with approval
737	13. shell with blocking rules
738	14. HTTP GET tool
739	15. OpenAI-compatible provider
740	16. optional gateway
741	17. optional SOP engine
742	18. optional external-process plugins
743
744	Quality requirements:
745
746	- Keep the implementation idiomatic for LANGUAGE.
747	- Do not quietly substitute another implementation language.
748	- Do not use Python, JavaScript, Rust, or C as the primary implementation language.
749	- Shell scripts are acceptable only for setup convenience.
750	- Prefer simple, boring dependencies.
751	- Write tests for denied actions, not just successful actions.
752	- Keep secrets out of logs.
753	- Keep workspace path handling strict and well-tested.
754	- Use deterministic mock fixtures so tests do not require network access.
755	- Update README.md with architecture, config, security policy, commands, and test instructions.
756
757	Do not stop after creating stubs. Implement the core behavior. If a feature is not practical in LANGUAGE, document the limitation and implement the closest useful equivalent.