HASHI v2.1: From Chat Bridge to Self-Evolving Multi-Agent Orchestra

Two weeks ago, I shared the first version of HASHI — a privacy-first bridge that let you chat with multiple AI agents through a single WhatsApp or Telegram account. It was Version 1.0: functional and fun.

Today, I am releasing HASHI v2.1, and I genuinely struggle to describe how much has changed. If v1 was a bridge for conversations, v2.1 is a self-evolving multi-agent orchestration platform — one that can design its own workflows, critique its own designs using a different AI vendor, learn from every run, and recover from failures automatically.

(Illustration generated by AI)

Let me walk you through what happened.

The v2.0 Foundation: Agents That Can Actually Do Things

Before we get to the headline feature, let me cover the v2.0 upgrades that made v2.1 possible. These shipped over the past two weeks:

🔧 Tool Execution Layer (11 Local Tools)

In v1, OpenRouter-backed agents could only talk. They could write beautiful prose about editing a file, but they couldn’t actually touch one. That’s fixed now.

Every OpenRouter agent can now execute 11 built-in tools: run shell commands, read and write files, apply patches, search the web (via Brave API), fetch URLs, make HTTP requests, list and kill processes, and even send Telegram messages. The bridge handles the tool loop — the model proposes a tool call, HASHI executes it locally, returns the result, and the model continues. Up to 15 iterations per turn.

This single change transformed HASHI from “a thing you talk to” into “a thing that gets stuff done.”
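Here is a minimal Python sketch of that propose → execute → return loop, capped at 15 iterations per turn. It is not HASHI's actual code. `ModelReply`, `run_turn` and the message dictionaries are hypothetical and deliberately simplified (real OpenRouter tool calls use the provider's JSON tool-call schema), and a scripted fake model stands in for a real one.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

MAX_TOOL_ITERATIONS = 15  # per-turn cap described above


@dataclass
class ModelReply:
    """Simplified model output: either final text or one tool call (hypothetical shape)."""
    text: Optional[str] = None
    tool_name: Optional[str] = None
    tool_args: Dict[str, Any] = field(default_factory=dict)


def run_turn(call_model: Callable[[List[dict]], ModelReply],
             tools: Dict[str, Callable[..., Any]],
             messages: List[dict]) -> str:
    """Model proposes a tool call, the bridge runs it locally, the result goes back to the model."""
    for _ in range(MAX_TOOL_ITERATIONS):
        reply = call_model(messages)
        if reply.tool_name is None:           # no tool requested: this is the final answer
            return reply.text or ""
        tool = tools.get(reply.tool_name)
        if tool is None:
            result: Any = f"error: unknown tool {reply.tool_name!r}"
        else:
            try:
                result = tool(**reply.tool_args)
            except Exception as exc:          # report tool failures back instead of crashing the turn
                result = f"error: {exc}"
        messages.append({"role": "tool", "name": reply.tool_name, "content": str(result)})
    return "[stopped: tool iteration limit reached]"


def scripted_model(replies: List[ModelReply]) -> Callable[[List[dict]], ModelReply]:
    """Stand-in for a real model: returns the given replies in order."""
    it = iter(replies)
    return lambda messages: next(it)


if __name__ == "__main__":
    history: List[dict] = []
    model = scripted_model([ModelReply(tool_name="add", tool_args={"a": 2, "b": 3}),
                            ModelReply(text="2 + 3 = 5")])
    print(run_turn(model, {"add": lambda a, b: a + b}, history))  # prints "2 + 3 = 5"
```

The iteration cap is what keeps a confused model from looping on tool calls forever; when it is hit, the turn ends with an explicit marker instead of hanging.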

🌐 Browser Automation

All agents — regardless of backend — can now control a real web browser via Playwright. Six actions: screenshot, get text, get HTML, click elements, fill forms, and run arbitrary JavaScript. Two modes: standalone headless Chromium, or CDP mode that attaches to your already-running Chrome with all your cookies and sessions intact.

My agents use this daily to check dashboards, scrape pages, and interact with web apps that don’t have APIs.
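The six actions map naturally onto methods of Playwright's synchronous `Page` object (`screenshot`, `inner_text`, `content`, `click`, `fill`, `evaluate`). The sketch below shows one way to dispatch them; the function name and action strings are hypothetical, not HASHI's API. Obtaining a real page is shown only in comments because it needs Playwright and a browser installed; the CDP line assumes Chrome was started with `--remote-debugging-port=9222`.

```python
from typing import Any

# Obtaining a real page (needs `pip install playwright` and `playwright install chromium`):
#   from playwright.sync_api import sync_playwright
#   with sync_playwright() as p:
#       browser = p.chromium.launch(headless=True)                       # standalone mode
#       # browser = p.chromium.connect_over_cdp("http://localhost:9222") # attach to running Chrome
#       # page = browser.contexts[0].new_page()                          # reuse existing cookies
#       page = browser.new_page()


def run_browser_action(page: Any, action: str, selector: str = "body",
                       value: str = "", script: str = "",
                       path: str = "screenshot.png") -> Any:
    """Dispatch one of six (hypothetical) action names onto a Playwright-style Page."""
    if action == "screenshot":
        page.screenshot(path=path)      # writes a PNG to `path`
        return path
    if action == "get_text":
        return page.inner_text(selector)
    if action == "get_html":
        return page.content()           # full page HTML
    if action == "click":
        page.click(selector)
        return "clicked"
    if action == "fill":
        page.fill(selector, value)
        return "filled"
    if action == "evaluate":
        return page.evaluate(script)    # runs arbitrary JavaScript in the page
    raise ValueError(f"unknown browser action: {action!r}")
```

Keeping the dispatcher independent of how the page was obtained is what lets the same six actions work in both headless and CDP mode.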

💾 Pack & Go: USB Zero-Install Deployment

This one I’m particularly proud of. Run prepare_usb.bat (Windows) or prepare_usb.sh (macOS) on any machine with internet. It downloads an embedded Python runtime, installs all dependencies, and packages everything onto a USB drive. Hand that USB to anyone — they double-click a launcher and HASHI runs. No Python installation, no pip, no terminal, nothing.

I built this because I wanted to share HASHI with people who have never opened a command line in their lives. It works.

📺 TUI: Terminal Interface

Not everyone wants a browser open. tui.py gives you a split-panel terminal UI built with Textual — log stream on top (~80%), chat input on the bottom (~20%), status bar showing current agent and backend. It connects to the same orchestrator, so Telegram messages and TUI messages share the same session.

🧠 Vector Memory

HASHI now embeds conversation turns and memories using BGE-M3 (local ONNX inference, no API calls) and stores them in bridge_memory.sqlite with sqlite-vec for cosine similarity search. When you send a message, the bridge vectorizes it, retrieves the top-K most relevant memories, and injects them into the prompt. Your agents remember things without you having to remind them.
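A rough, dependency-free sketch of the retrieve-and-inject step follows. In HASHI, sqlite-vec performs the similarity search inside SQLite; here, as a stand-in, vectors are stored as JSON in a plain `sqlite3` table and ranked by brute-force cosine similarity in Python. The ranking is the same, but it is much slower at scale. The toy 3-dimensional vectors replace real BGE-M3 embeddings, and the table layout and prompt format are assumptions.

```python
import json
import math
import sqlite3
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema; sqlite-vec would use a vec0 virtual table instead.
    conn.execute("CREATE TABLE IF NOT EXISTS memories ("
                 "id INTEGER PRIMARY KEY, text TEXT NOT NULL, embedding TEXT NOT NULL)")


def add_memory(conn: sqlite3.Connection, text: str, embedding: Sequence[float]) -> None:
    conn.execute("INSERT INTO memories (text, embedding) VALUES (?, ?)",
                 (text, json.dumps(list(embedding))))


def top_k(conn: sqlite3.Connection, query: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k stored memories most similar to `query`, best first."""
    rows = conn.execute("SELECT text, embedding FROM memories").fetchall()
    scored = [(cosine(query, json.loads(emb)), text) for text, emb in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(text, score) for score, text in scored[:k]]


def build_prompt(user_message: str, memories: List[Tuple[str, float]]) -> str:
    """Prepend retrieved memories to the user's message before it goes to the model."""
    if not memories:
        return user_message
    lines = "\n".join(f"- {text}" for text, _ in memories)
    return f"Relevant memories:\n{lines}\n\nUser: {user_message}"
```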

Other v2.0 Additions

Flex/Fixed Backend Switching — /backend switches between CLI and OpenRouter mid-conversation. No session restart needed.
Workbench Web UI — React + Vite local interface for multi-agent chat.
/dream Skill — Nightly AI memory consolidation. Your agent “sleeps,” reviews the day’s transcript, extracts important memories, and optionally updates its own personality file. Includes snapshot-based undo for morning rollback.
Process-Tree Stop — /stop now kills the agent's entire process group using os.killpg(). No more orphaned Node.js workers holding pipes open.
/retry Persistence — Resend your last prompt or regenerate the agent's last response.
/memory Command — Surgical memory control: pause injection, wipe stored data, check status.
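For the process-tree stop, the usual POSIX pattern is to launch the backend in its own session (and therefore its own process group), then signal the whole group so grandchildren such as Node.js workers go down with it. The sketch below assumes that pattern; `start_agent` and `stop_agent` are hypothetical names, it is POSIX-only (`os.killpg` does not exist on Windows), and it only reaches children that stay in the group.

```python
import os
import signal
import subprocess
import sys
from typing import List


def start_agent(cmd: List[str]) -> subprocess.Popen:
    # start_new_session=True calls setsid() in the child, so it leads a new process group
    return subprocess.Popen(cmd, start_new_session=True)


def stop_agent(proc: subprocess.Popen, grace: float = 3.0) -> int:
    """SIGTERM the whole group, wait briefly, then SIGKILL anything left."""
    pgid = proc.pid  # group id equals the leader's pid because of start_new_session
    try:
        os.killpg(pgid, signal.SIGTERM)    # ask every process in the group to exit
    except ProcessLookupError:
        return proc.wait()                 # group already gone
    try:
        code = proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        code = None
    try:
        os.killpg(pgid, signal.SIGKILL)    # sweep stragglers that ignored SIGTERM
    except ProcessLookupError:
        pass
    return proc.wait() if code is None else code


if __name__ == "__main__":
    # A child that spawns its own long-running grandchild, like a CLI backend would.
    child = "import subprocess, sys, time; subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(60)']); time.sleep(60)"
    p = start_agent([sys.executable, "-c", child])
    print(stop_agent(p))  # negative signal number on POSIX, e.g. -15
```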

The Main Event: Nagare Flow System (v2.1)

Everything above was the foundation. Now for the part that changes the game entirely.

Nagare (流れ, Japanese for “flow”) is HASHI’s multi-agent workflow orchestration engine. It coordinates multiple AI agents — potentially from different vendors — through a declarative pipeline, producing work that no single agent or prompt chain could achieve.

Why Does This Exist?

Every AI model, no matter how capable, operates inside a single reasoning session. Within that session, it cannot:

Run parallel sub-tasks with true separation of concerns
Call itself with a fresh perspective to critique its own output
Remember lessons from previous runs
Escalate only when necessary without pausing the whole conversation

For any task requiring more than two or three coherent reasoning steps, quality tends to collapse. A brilliant translation model becomes inconsistent across chapters. A capable code writer misses cross-file implications. A thorough analyst ignores its own contradictions.

Nagare solves this at the architecture level — not by making a bigger model, but by coordinating many focused agents, each excellent at their narrow role.

The 12-Step Meta-Workflow

Here’s the killer feature: describe a task in natural language, and Nagare designs a complete multi-agent workflow for it automatically.

Say you tell it: “I want a workflow that takes academic papers, extracts key claims, searches for contradicting evidence, and writes a critical analysis report.”

Nagare’s meta-workflow will:

1. Analyze requirements (Claude Opus): deep task decomposition
2. Generate pre-flight questions (Claude Opus): score each question on necessity × impact × clarity and ask you only the top 5
3. Integrate your answers (Claude Sonnet): merge human input with smart defaults
4. Validate completeness (Claude Opus): ensure nothing is missing before proceeding
5. Design the workflow (Claude Opus): full YAML + DAG + rationale
6. Critique the design (GPT-5.4): a Devil's Advocate from a different AI vendor challenges every assumption
7. Create workflow files (Claude Opus): materialize the validated design
8. Validate the YAML (Claude Sonnet): format and schema check
9. Independent review (GPT-5.4): cross-vendor audit
10. Evaluate and improve (GPT-5.4): quality scoring + Knowledge Base update
11. Apply improvements (GPT-5.4): low-risk fixes auto-applied; high-risk ones queued for approval
12. Notify completion (Claude Sonnet): push notification with results

The entire pipeline runs in the background. You get notified when it’s done.
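To show what "declarative pipeline" means in practice, here is a hedged sketch of the twelve steps expressed as a DAG and executed in dependency order with Python's standard `graphlib`. Nagare itself uses YAML; this uses a Python dict only to stay in one language. The step keys, model labels and `call_agent` callback are hypothetical, and since the listed steps form a simple chain, each one depends only on the previous step.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# step -> (model label, steps whose outputs it needs); labels are illustrative only
META_WORKFLOW: Dict[str, Tuple[str, List[str]]] = {
    "analyze_requirements":  ("claude-opus",   []),
    "preflight_questions":   ("claude-opus",   ["analyze_requirements"]),
    "integrate_answers":     ("claude-sonnet", ["preflight_questions"]),
    "validate_completeness": ("claude-opus",   ["integrate_answers"]),
    "design_workflow":       ("claude-opus",   ["validate_completeness"]),
    "critique_design":       ("gpt-5.4",       ["design_workflow"]),
    "create_files":          ("claude-opus",   ["critique_design"]),
    "validate_yaml":         ("claude-sonnet", ["create_files"]),
    "independent_review":    ("gpt-5.4",       ["validate_yaml"]),
    "evaluate_and_improve":  ("gpt-5.4",       ["independent_review"]),
    "apply_improvements":    ("gpt-5.4",       ["evaluate_and_improve"]),
    "notify_completion":     ("claude-sonnet", ["apply_improvements"]),
}


def execution_order(workflow: Dict[str, Tuple[str, List[str]]]) -> List[str]:
    """Topologically sort steps so every step runs after its dependencies."""
    graph = {step: set(deps) for step, (_model, deps) in workflow.items()}
    return list(TopologicalSorter(graph).static_order())


def run(workflow: Dict[str, Tuple[str, List[str]]],
        call_agent: Callable[[str, str, Dict[str, str]], str]) -> Dict[str, str]:
    """Run each step with only its dependencies' outputs as context."""
    outputs: Dict[str, str] = {}
    for step in execution_order(workflow):
        model, deps = workflow[step]
        outputs[step] = call_agent(model, step, {d: outputs[d] for d in deps})
    return outputs
```

Passing each step only its declared inputs is the point of the DAG: an agent sees the context it needs for its narrow role, not the whole conversation.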

Cross-Vendor Anti-Bias: Why This Matters

This is the design decision I’m most proud of: Claude never evaluates Claude.

When a model writes something and then reviews it in the same session, it has already “committed” to its choices. Its review is biased. Nagare architecturally enforces independence: Claude designs, GPT critiques. Claude generates, GPT audits. This isn’t a convention you can forget to follow — it’s how the system is wired.

I haven’t seen any other open-source project do this systematically.
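One way to make "Claude never evaluates Claude" a property of the wiring rather than a habit is to validate it when a workflow is loaded, before any agent runs. The sketch below is an illustration under assumptions: the model-to-vendor table, step names and reviewer-to-author mapping are hypothetical, not Nagare's actual configuration.

```python
from typing import Dict

# Hypothetical model -> vendor table
VENDOR: Dict[str, str] = {
    "claude-opus": "anthropic",
    "claude-sonnet": "anthropic",
    "gpt-5.4": "openai",
}

# step -> model (a subset of the meta-workflow)
STEP_MODEL: Dict[str, str] = {
    "design_workflow": "claude-opus",
    "critique_design": "gpt-5.4",
    "create_files": "claude-opus",
    "independent_review": "gpt-5.4",
}

# reviewer step -> the step whose output it evaluates (assumed pairing)
REVIEWS: Dict[str, str] = {
    "critique_design": "design_workflow",
    "independent_review": "create_files",
}


def check_cross_vendor(step_model: Dict[str, str], reviews: Dict[str, str],
                       vendor: Dict[str, str]) -> None:
    """Refuse to load a workflow in which any step is reviewed by its own vendor."""
    problems = []
    for reviewer, author in reviews.items():
        rv = vendor[step_model[reviewer]]
        av = vendor[step_model[author]]
        if rv == av:
            problems.append(f"{reviewer} and {author} both use {rv}")
    if problems:
        raise ValueError("same-vendor review: " + "; ".join(problems))
```

Because the check raises at load time, a misconfigured workflow never starts, which is the difference between enforcing independence and merely recommending it.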

Pre-Flight: Ask Everything Once, Then Run Clean

Most AI workflows either require constant babysitting or make assumptions without asking. Nagare’s pre-flight system does something different:

Categorizes every unknown into three layers: design-time (must ask the human), runtime (collected when the generated workflow runs), and implementation detail (use a smart default)
Scores each question on a 3-dimensional scale and filters to maximum 5 questions
If you don’t respond within 5 minutes, smart defaults kick in automatically

Once confirmed, the workflow runs uninterrupted. No mid-task “hey, what did you mean by…?” interruptions.
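The pre-flight rules can be sketched in a few lines: keep only design-time unknowns, rank them by necessity × impact × clarity, ask at most five, and fall back to defaults if no answer arrives within the timeout (300 seconds here). The `Unknown` record, the 0 to 1 score scales and the queue-based answer channel are assumptions made for illustration.

```python
import queue
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Unknown:
    question: str
    layer: str          # "design-time", "runtime" or "implementation" (assumed labels)
    necessity: float    # assumed 0..1 scales
    impact: float
    clarity: float
    default: str        # smart default used when the human does not answer

    @property
    def score(self) -> float:
        return self.necessity * self.impact * self.clarity


def select_questions(unknowns: List[Unknown], limit: int = 5) -> List[Unknown]:
    """Only design-time unknowns are asked; keep the highest-scoring `limit` of them."""
    design = [u for u in unknowns if u.layer == "design-time"]
    return sorted(design, key=lambda u: u.score, reverse=True)[:limit]


def collect_answers(questions: List[Unknown], answers: "queue.Queue[Dict[str, str]]",
                    timeout_s: float = 300.0) -> Dict[str, str]:
    """Wait once for the human's reply; any question left unanswered gets its default."""
    try:
        reply = answers.get(timeout=timeout_s)
    except queue.Empty:
        reply = {}
    return {q.question: reply.get(q.question, q.default) for q in questions}
```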

Self-Improving: The Evaluation Knowledge Base

Every workflow run feeds lessons back into an Evaluation Knowledge Base:

What patterns worked
What failures occurred
Model performance benchmarks per task type
Improvement proposals with confidence scores

Improvements are classified into three risk tiers:

Class	Risk	Action	Examples
A	Low	Auto-applied	Prompt rewording, timeout tweaks
B	Medium	Needs approval	Agent role changes, model substitution
C	High	Needs approval	New agents, DAG restructuring

The aim is that the 201st workflow run is better than the 1st, because the previous 200 have taught the system what works.
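A minimal sketch of how the risk table could drive what happens to each proposal: Class A changes are applied automatically, while B and C wait for approval. The change-kind names are hypothetical labels for the examples in the table, and treating an unrecognized kind as Class C is a conservative assumption of this sketch, not documented Nagare behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class RiskClass(Enum):
    A = "low"       # auto-applied
    B = "medium"    # needs approval
    C = "high"      # needs approval


# Hypothetical change kinds, one per example in the table above
TIER_BY_KIND: Dict[str, RiskClass] = {
    "prompt_rewording": RiskClass.A,
    "timeout_tweak": RiskClass.A,
    "agent_role_change": RiskClass.B,
    "model_substitution": RiskClass.B,
    "new_agent": RiskClass.C,
    "dag_restructure": RiskClass.C,
}


@dataclass
class Proposal:
    kind: str
    description: str


def route(proposals: List[Proposal]) -> Tuple[List[Proposal], List[Proposal]]:
    """Split proposals into (auto-apply, approval queue)."""
    applied: List[Proposal] = []
    queued: List[Proposal] = []
    for p in proposals:
        tier = TIER_BY_KIND.get(p.kind, RiskClass.C)  # unknown kinds treated as high risk
        (applied if tier is RiskClass.A else queued).append(p)
    return applied, queued
```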

Crash Recovery and Debug Agents

Nagare uses atomic state persistence — write to tmp → fsync → rename — so if your machine crashes mid-workflow, you resume at the exact step that was interrupted, without re-running completed work.
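Here is a minimal sketch of that write-to-tmp → fsync → rename pattern plus resume-from-last-completed-step. The JSON state shape (`{"completed": [...]}`) and the function names are assumptions; the key points are that `os.replace` swaps the file in a single atomic step on POSIX (same filesystem), so a crash leaves either the old state or the new one and never a half-written file, and that completed steps are skipped on restart.

```python
import json
import os
import tempfile
from typing import Callable, Iterable


def save_state(path: str, state: dict) -> None:
    """Atomically replace `path`: write a temp file, fsync it, then rename over the old file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())          # bytes are on disk before the rename
        os.replace(tmp_path, path)        # atomic swap on POSIX, same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    if hasattr(os, "O_DIRECTORY"):        # POSIX: also persist the directory entry
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)


def load_state(path: str) -> dict:
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {"completed": []}          # fresh run


def run_workflow(steps: Iterable[str], path: str, run_step: Callable[[str], None]) -> dict:
    """Skip steps already recorded as completed; checkpoint after each new one."""
    state = load_state(path)
    for name in steps:
        if name in state["completed"]:
            continue
        run_step(name)
        state["completed"].append(name)
        save_state(path, state)
    return state
```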

When a step fails, a Debug Agent automatically analyzes the failure and retries with an adjusted prompt, up to 3 times. Only after 3 failures does it escalate to a human. In practice, most transient errors self-recover.
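The retry-then-escalate policy can be sketched as a small loop. Whether the initial attempt counts toward the three is not spelled out above; this sketch assumes up to three debug-adjusted retries after the first failure. `execute`, `debug_agent` and `EscalateToHuman` are hypothetical stand-ins for the step runner, the Debug Agent and the human hand-off.

```python
from typing import Callable, List


class EscalateToHuman(Exception):
    """Raised when a step is still failing after all debug retries."""

    def __init__(self, step: str, errors: List[Exception]):
        super().__init__(f"step {step!r} failed {len(errors)} times; escalating to a human")
        self.step = step
        self.errors = errors


def run_with_debug(step: str, prompt: str,
                   execute: Callable[[str], str],
                   debug_agent: Callable[[str, str, Exception], str],
                   max_retries: int = 3) -> str:
    """Run a step; on failure let the debug agent rewrite the prompt and try again."""
    errors: List[Exception] = []
    for attempt in range(max_retries + 1):     # first try + up to max_retries retries
        try:
            return execute(prompt)
        except Exception as exc:
            errors.append(exc)
            if attempt == max_retries:
                break
            prompt = debug_agent(step, prompt, exc)   # adjusted prompt for the next attempt
    raise EscalateToHuman(step, errors)
```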

The Big Picture: Three Generations in Two Weeks

Version	What It Was	Released
v1.0	A chat bridge — talk to AI agents via WhatsApp/Telegram	Mar 15
v2.0	A tool platform — agents that can take real actions locally	Mar 23
v2.1	A self-evolving orchestration engine — agents that design, critique, and improve their own workflows	Mar 28

Each version didn’t just add features — it changed what the system fundamentally is.

Get Started

HASHI is open source under the MIT License.

GitHub: github.com/Bazza1982/HASHI
Requirements: Python 3.10+ and at least one AI backend (Claude CLI, Gemini CLI, Codex CLI, or an OpenRouter API key)
Quick start: Clone → pip install -r requirements.txt → python onboarding/onboarding_main.py
USB deployment: Run prepare_usb.bat / prepare_usb.sh → hand the USB to anyone

Honest Disclaimer

This is still a prototype built through vibe-coding. I’m a PhD candidate in sustainability assurance, not a software engineer. Every line of code was written by AI (Claude, Gemini, Codex) and cross-reviewed by AI, with me directing the architecture and making judgment calls.

It works. I use it every day — my agents check my email, manage my calendar, write code, and now orchestrate multi-step workflows autonomously. But expect edge cases, cryptic error messages, and the occasional surprise.

If you find bugs, the Issues page is always open.

Nagare — because the most capable model and the cleverest prompt are still just one voice. Orchestration is the difference between a monologue and a symphony.

Built with Vision. Written by AI. Directed by Human.

Barry Li
