Band of Agents Hackathon · Judging Reference
Role specialization, shared context, task state, and visible coordination, each mapped to where it appears in the build. Everything below can be checked in the repo or watched live in the Band room.
At a glance
Judging criteria mapping
No abstractions — each claim below points to something a judge can actually watch happen in the Band room or read in the repo.
Role specialization: six agents, six distinct jobs. No agent does another's work.
Shared context: a defined schema chain, so every agent gets exactly the structured context it needs.
Task state: every review has a tracked lifecycle, not a single stateless pass.
Visible coordination: all six agents in one Band room. Every handoff is a readable message.
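One way to picture the schema chain is as a sequence of typed records, each wrapping the output of the previous agent. The sketch below is illustrative only: every class and field name is hypothetical, chosen to match the agent jobs described in this document, and is not taken from the repo.

```python
from dataclasses import dataclass


# All class and field names here are hypothetical illustrations.
@dataclass(frozen=True)
class ReviewEvent:  # what Monitor might surface
    review_id: str
    text: str
    rating: int


@dataclass(frozen=True)
class TriageResult:  # what Triage might add
    review: ReviewEvent
    sentiment: str
    urgency: str


@dataclass(frozen=True)
class ResearchContext:  # what Research might add
    triage: TriageResult
    brand_voice: str
    business_context: str


@dataclass(frozen=True)
class Draft:  # what Drafter might produce
    context: ResearchContext
    text: str
    revision: int = 0


@dataclass(frozen=True)
class QAVerdict:  # what QA might return
    draft: Draft
    score: float
    approved: bool


event = ReviewEvent("r1", "Great service, slow kitchen.", 4)
triage = TriageResult(event, sentiment="mixed", urgency="low")
context = ResearchContext(triage, brand_voice="warm, concise",
                          business_context="family restaurant")
draft = Draft(context, text="Thank you for the feedback.")
verdict = QAVerdict(draft, score=0.9, approved=True)
```

Frozen dataclasses are used so that a downstream agent cannot silently mutate what an upstream agent produced; the real build may pass context differently.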
Architecture
Reviews move left to right. Most exit through QA → Publish; flagged cases exit early through Escalation.
| # | Agent | Job | Model |
|---|---|---|---|
| 01 | Monitor | Polls for new reviews, surfaces them into Band | No LLM |
| 02 | Triage | Classifies sentiment & urgency, routes the case | Featherless |
| 03 | Research | Enriches with brand voice and business context | Featherless |
| 04 | Drafter | Generates the response; revises on QA rejection | DeepSeek-V4 |
| 05 | QA | Scores drafts; approves or loops back (max 2x) | DeepSeek-V4 |
| 06 | Escalation | Holds critical or flagged cases for human review — terminal stage, no auto-publish | Featherless |
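The routing described above can be sketched as a single decision made after Triage. This is a minimal sketch under stated assumptions: the function name, the urgency labels, and the `flagged` parameter are hypothetical, and it assumes a critical case goes straight from Triage to Escalation (the document only states that drafting is skipped).

```python
from typing import List


def route_after_triage(urgency: str, flagged: bool = False) -> List[str]:
    """Return the remaining stages for a review once Triage has classified it.

    Hypothetical helper: stage names follow the agent table, but the
    urgency labels and the flag are illustrative assumptions.
    """
    if flagged or urgency == "critical":
        # Safety path: terminal stage, held for a human, nothing auto-publishes.
        return ["Escalation"]
    # Normal path: enrich, draft, score, then publish.
    return ["Research", "Drafter", "QA", "Publish"]


normal_path = route_after_triage("low")
critical_path = route_after_triage("critical")
flagged_path = route_after_triage("low", flagged=True)
```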
The coordination showpiece
This is the single best moment to watch live — it's the clearest proof that coordination, not just generation, is happening.
Seed #04 is engineered to fail QA on the first pass — a verbose draft trips an explicit rejection rule — guaranteeing this loop is visible every time it's demoed, not left to chance. Capped at 2 iterations; a second rejection routes to Escalation instead of looping indefinitely.
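The capped loop can be sketched as follows. This is an illustration, not the project's implementation: the function names, the `Verdict` type, and the word-count rejection rule are assumptions standing in for the real Drafter, the real QA agent, and whatever explicit rule seed #04 trips.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

MAX_QA_PASSES = 2  # cap stated in this document
WORD_LIMIT = 12    # hypothetical stand-in for the "verbose draft" rule


@dataclass
class Verdict:
    approved: bool
    reason: str = ""


def run_qa_loop(
    review: str,
    draft_fn: Callable[[str, str], str],
    qa_fn: Callable[[str], Verdict],
) -> Tuple[str, str, List[str]]:
    """Draft, score, and revise; escalate after the second rejection."""
    trail: List[str] = []
    feedback = ""
    draft = ""
    for attempt in range(1, MAX_QA_PASSES + 1):
        draft = draft_fn(review, feedback)
        verdict = qa_fn(draft)
        trail.append(f"pass {attempt}: {'approved' if verdict.approved else 'rejected'}")
        if verdict.approved:
            return "publish", draft, trail
        feedback = verdict.reason  # handed back to the Drafter for revision
    return "escalation", draft, trail


def stub_drafter(review: str, feedback: str) -> str:
    # Verbose on the first pass, concise once QA feedback arrives.
    if feedback:
        return "We are sorry. Please contact us so we can fix this."
    return " ".join(["We sincerely apologize for everything that happened."] * 4)


def stub_qa(draft: str) -> Verdict:
    if len(draft.split()) > WORD_LIMIT:
        return Verdict(False, "too verbose")
    return Verdict(True)


outcome, final_draft, trail = run_qa_loop("Worst experience.", stub_drafter, stub_qa)
stuck_outcome, _, stuck_trail = run_qa_loop(
    "Worst experience.", lambda review, feedback: "word " * 50, stub_qa
)
```

With the stubs above, the first run is rejected once and then approved, while the second run never shortens its draft and is routed to Escalation after two rejections.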
Live demo guide
Each step is triggered with POST /inject?id=seed_XX for a reliable, repeatable run — no dependency on live review platforms.
Establishes the baseline — full pipeline, single pass, clean publish. Proves the mechanics work end-to-end before anything interesting happens.
Shows Research's enrichment actually shaping the response tone — not a templated reply, a context-aware one.
The centerpiece. QA rejects the first draft, Drafter revises, QA approves the second. The clearest live proof of agent coordination in the whole demo.
Shows the safety path — drafting is skipped entirely, Escalation fires immediately, and the case is held for a human. Nothing auto-publishes.
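For scripted runs, the seed trigger can be built with the Python standard library. This is a sketch under assumptions: the helper names are hypothetical, the base URL reuses the local inject port from the run instructions, and the query-string form follows `POST /inject?id=seed_XX` as written in this guide. Adjust it if the server expects a JSON body instead.

```python
import urllib.request


def build_inject_request(seed: int, base_url: str = "http://localhost:8002") -> urllib.request.Request:
    """Build (but do not send) the POST request that triggers one demo seed."""
    url = f"{base_url}/inject?id=seed_{seed:02d}"
    return urllib.request.Request(url, method="POST")


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code. Needs a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


req = build_inject_request(4)
```

Calling `send(req)` while the inject server is running would fire the QA revision loop seed; building the request separately keeps the URL easy to inspect before anything is sent.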
How to run
    # 1. clone & install python deps
    git clone git@github.com:resilientbeast/review-response-system.git
    cd review-response-system
    uv sync

    # 2. frontend deps
    cd dashboard/frontend && npm install && cd ../..

    # 3. configure
    cp .env.example .env
    cp agent_config.yaml.example agent_config.yaml
    # add BAND_API_KEY, OPENAI_API_KEY, etc.

    # 4. init local database
    sqlite3 data/reviews.db < migrations/001_initial.sql

    # run: three processes, separate terminals
    uv run python run_all.py                          # agent pipeline
    uv run uvicorn dashboard.bridge:app --port 8001   # SSE bridge
    cd dashboard/frontend && npm run dev              # dashboard UI

    # trigger seed_04: the QA revision loop showpiece
    curl -X POST http://localhost:8002/inject \
      -H "Content-Type: application/json" \
      -d '{
        "platform": "tripadvisor",
        "business_id": "loc_demo",
        "review": {
          "text": "Worst experience of my life. The manager was completely dismissive when I raised my concerns. The food was inedible and no one seemed to care. I will be leaving reviews everywhere I can.",
          "rating": 1,
          "author": "Sarah K.",
          "url": "http://demo.platform",
          "language": "en"
        }
      }'

    # other seeds (01, 02, 05): same structure, swap the "review" object
    # fallback if the local inject server isn't running: POST the same body to
    # http://localhost:8000/demo/inject
    # optional: external webhook ingestion (see README)
Verification
Tech stack