OVATION — Judges Overview | Band of Agents Hackathon

At a glance

The essentials.

Project: OVATION
Pipeline: 6 specialized agents
Coordination: Band (shared room)
LLM providers: Featherless + AI/ML API
Generation model: DeepSeek-V4
Demo control: HTTP inject endpoint
Seed scenarios: 8 engineered cases
Build tool: Antigravity
Repo: resilientbeast/review-response-system

Judging criteria mapping

Four criteria. Direct evidence for each.

No abstractions — each claim below points to something a judge can actually watch happen in the Band room or read in the repo.

Role Specialization

Six agents, six distinct jobs. No agent does another's work.

Monitor only surfaces reviews — it never classifies or drafts
Drafter only generates — it never scores its own work
QA only evaluates — it never writes the published text
Each agent is a separate registered Band identity

Shared Context

A defined schema chain — every agent gets exactly the structured context it needs.

ReviewEvent → TriageContext → ResearchPack
ResearchPack → DraftPackage → QAResult
QAResult branches to RevisionNotes or EscalationAlert
No agent re-derives what an upstream agent already decided
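
The chain above can be pictured as a set of typed records, each wrapping the one before it, so a downstream agent reads an upstream decision instead of recomputing it. The following is a minimal Python sketch: only the seven type names come from this overview, while every field name, the `verdict` values, and the `follow_up` helper are hypothetical placeholders rather than the project's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass

# Type names follow the schema chain above; all fields are hypothetical.


@dataclass(frozen=True)
class ReviewEvent:
    review_id: str
    platform: str
    text: str
    rating: int


@dataclass(frozen=True)
class TriageContext:
    event: ReviewEvent
    sentiment: str
    urgency: str


@dataclass(frozen=True)
class ResearchPack:
    triage: TriageContext
    brand_voice: str
    business_context: str


@dataclass(frozen=True)
class DraftPackage:
    research: ResearchPack
    version: int
    text: str


@dataclass(frozen=True)
class QAResult:
    draft: DraftPackage
    verdict: str  # assumed values: "approve", "revise", "escalate"
    reasons: tuple = ()


@dataclass(frozen=True)
class RevisionNotes:
    result: QAResult
    notes: tuple


@dataclass(frozen=True)
class EscalationAlert:
    result: QAResult
    reason: str


def follow_up(result: QAResult):
    """Map a QAResult to its branch record, or None when the draft is approved."""
    if result.verdict == "revise":
        return RevisionNotes(result, result.reasons)
    if result.verdict == "escalate":
        return EscalationAlert(result, "; ".join(result.reasons) or "flagged by QA")
    return None
```

Because each record carries its predecessor, a late-stage agent can read, for example, `result.draft.research.triage.urgency` directly from the message it received.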

Task State

Every review has a tracked lifecycle, not a single stateless pass.

States: drafted → in QA → revised → approved / escalated
The QA loop explicitly models a revision cycle, capped at 2 iterations
A failed second pass changes the task's path — to Escalation
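
One way to read the lifecycle above is as a small state machine with an explicit transition table. The sketch below is illustrative only: the state names and the cap of 2 come from this overview, but the `ReviewTask` class, the `ALLOWED` table, and the exact transition set are assumptions about how such tracking could be written, not the project's code.

```python
from enum import Enum


class State(Enum):
    DRAFTED = "drafted"
    IN_QA = "in QA"
    REVISED = "revised"
    APPROVED = "approved"
    ESCALATED = "escalated"


# Hypothetical transition table inferred from the lifecycle line above.
ALLOWED = {
    State.DRAFTED: {State.IN_QA},
    State.IN_QA: {State.APPROVED, State.REVISED, State.ESCALATED},
    State.REVISED: {State.IN_QA},
    State.APPROVED: set(),   # terminal
    State.ESCALATED: set(),  # terminal
}

MAX_QA_PASSES = 2  # cap stated in this overview


class ReviewTask:
    """Tracks one review's state and how many QA passes it has used."""

    def __init__(self):
        self.state = State.DRAFTED
        self.qa_passes = 0
        self.history = [State.DRAFTED]

    def advance(self, target):
        if target not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition: {self.state.value} -> {target.value}")
        # After the final allowed QA pass, a revision is no longer an option.
        if target is State.REVISED and self.qa_passes >= MAX_QA_PASSES:
            raise ValueError("QA pass cap reached; escalate instead of revising")
        if target is State.IN_QA:
            self.qa_passes += 1
        self.state = target
        self.history.append(target)
```

With this shape, a task that fails its second QA pass has exactly one legal move left besides approval, which is the transition to `ESCALATED`.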

Visible Coordination

All six agents in one Band room. Every handoff is a readable message.

No hidden API calls between agents — only Band messages
Mentions route each message to the agent that should act next
A judge can read the full decision trail without touching code
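
To make the mention-based routing concrete, here is a toy model of a shared room in plain Python. It does not use the Band SDK and makes no claim about its API: the `Room` class, the `@handle` syntax, and the message texts are all hypothetical. It only shows the idea that every handoff is a logged message and that a mention decides who acts next.

```python
from __future__ import annotations

import re
from typing import Callable

MENTION = re.compile(r"@([a-z_]+)")  # assumed mention syntax


class Room:
    """Toy stand-in for a shared room; NOT the Band SDK."""

    def __init__(self) -> None:
        self.log: list[tuple[str, str]] = []                 # full, readable trail
        self.handlers: dict[str, Callable[[str], None]] = {}

    def register(self, handle: str, handler: Callable[[str], None]) -> None:
        self.handlers[handle] = handler

    def post(self, sender: str, text: str) -> None:
        # Every message is recorded before anyone acts on it.
        self.log.append((sender, text))
        for handle in MENTION.findall(text):
            handler = self.handlers.get(handle)
            if handler is not None:
                handler(text)


room = Room()
room.register("triage", lambda text: room.post("triage", "@research negative, routine"))
room.register("research", lambda text: room.post("research", "@drafter context attached"))
room.register("drafter", lambda text: None)  # end of this short demo chain

room.post("monitor", "@triage new review r1")
for sender, text in room.log:
    print(f"{sender}: {text}")
```

Reading `room.log` top to bottom gives the decision trail without inspecting any agent's internals.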

Architecture

The six-agent pipeline.

Reviews move through the agents in numbered order. Most exit through QA → Publish; flagged cases exit early through Escalation.

#	Agent	Job	Model
01	Monitor	Polls for new reviews, surfaces them into Band	No LLM
02	Triage	Classifies sentiment & urgency, routes the case	Featherless
03	Research	Enriches with brand voice and business context	Featherless
04	Drafter	Generates the response; revises on QA rejection	DeepSeek-V4
05	QA	Scores drafts; approves or sends back for revision (max 2 QA passes)	DeepSeek-V4
06	Escalation	Holds critical or flagged cases for human review — terminal stage, no auto-publish	Featherless

The coordination showpiece

The QA revision loop.

This is the single best moment to watch live — it's the clearest proof that coordination, not just generation, is happening.

DRAFTER → QA → REJECT (v1) → DRAFTER REVISES → QA → APPROVE (v2)

Seed #04 is engineered to fail QA on the first pass — a verbose draft trips an explicit rejection rule — guaranteeing this loop is visible every time it's demoed, not left to chance. Capped at 2 iterations; a second rejection routes to Escalation instead of looping indefinitely.
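
The loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's QA agent: the real QA step calls an LLM, whereas here a simple word-count rule stands in for the "explicit rejection rule", and the `MAX_WORDS` threshold, the `qa_check` and `run_loop` names, and the trail strings are all hypothetical.

```python
MAX_QA_PASSES = 2   # cap stated in this overview
MAX_WORDS = 40      # hypothetical verbosity threshold


def qa_check(draft):
    """Return a list of problems; an empty list means the draft passes."""
    words = len(draft.split())
    if words > MAX_WORDS:
        return [f"too verbose: {words} words, limit {MAX_WORDS}"]
    return []


def run_loop(draft, revise):
    """Run QA up to MAX_QA_PASSES times, revising between passes.

    `revise` is a callable (draft, problems) -> new draft, standing in
    for the Drafter agent. Returns (outcome, trail).
    """
    trail = []
    for version in range(1, MAX_QA_PASSES + 1):
        problems = qa_check(draft)
        if not problems:
            trail.append(f"QA: APPROVE (v{version})")
            return "approved", trail
        trail.append(f"QA: REJECT (v{version})")
        if version < MAX_QA_PASSES:
            draft = revise(draft, problems)
            trail.append("DRAFTER: REVISE")
    # A rejection on the final pass leaves the loop and goes to a human.
    return "escalated", trail


def shorten(draft, problems):
    """Toy reviser: keep only the first MAX_WORDS words."""
    return " ".join(draft.split()[:MAX_WORDS])


verbose_draft = " ".join(["sorry"] * 60)
print(run_loop(verbose_draft, shorten))
```

Swapping `shorten` for a reviser that fails to fix the problem exercises the other exit, where the second rejection yields `"escalated"` instead of a third attempt.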

Live demo guide

Recommended sequence: four seeds, four proofs.

Each step is triggered with POST /inject?id=seed_XX for a reliable, repeatable run — no dependency on live review platforms.

Happy Path (seed_01)

Establishes the baseline — full pipeline, single pass, clean publish. Proves the mechanics work end-to-end before anything interesting happens.

Routine Complaint (seed_02)

Shows Research's enrichment actually shaping the response tone — not a templated reply, a context-aware one.

QA Revision Loop ★ (seed_04)

The centerpiece. QA rejects the first draft, Drafter revises, QA approves the second. The clearest live proof of agent coordination in the whole demo.

Legal Threat (seed_05)

Shows the safety path — drafting is skipped entirely, Escalation fires immediately, and the case is held for a human. Nothing auto-publishes.
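
The safety path amounts to a routing decision made before any draft exists. The sketch below shows that shape only. In the actual pipeline Triage classifies with an LLM (Featherless, per the architecture table), so the keyword pattern here is a deliberately crude stand-in, and the `route` function, the term list, and the path names are hypothetical.

```python
import re

# Crude stand-in for Triage's LLM classification; terms are illustrative only.
LEGAL_PATTERN = re.compile(r"\b(lawyer|attorney|lawsuit|sue|legal action)\b", re.IGNORECASE)

NORMAL_PATH = ["triage", "research", "drafter", "qa"]
SAFETY_PATH = ["triage", "escalation"]  # Drafter and QA are never invoked


def route(review_text):
    """Return the ordered list of agents a review visits after Monitor."""
    if LEGAL_PATTERN.search(review_text):
        return list(SAFETY_PATH)
    return list(NORMAL_PATH)


print(route("My lawyer will be in touch about this."))
print(route("There was an issue with my order."))
```

The point of the design is that the escalated case never reaches a component that can generate publishable text, so holding it for a human does not depend on a later check catching it.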

How to run

From clone to live demo.

# 1. clone & install python deps
git clone git@github.com:resilientbeast/review-response-system.git
cd review-response-system
uv sync

# 2. frontend deps
cd dashboard/frontend && npm install && cd ../..

# 3. configure
cp .env.example .env
cp agent_config.yaml.example agent_config.yaml
# add BAND_API_KEY, OPENAI_API_KEY, etc.

# 4. init local database
sqlite3 data/reviews.db < migrations/001_initial.sql

# run — three processes, separate terminals
uv run python run_all.py                          # agent pipeline
uv run uvicorn dashboard.bridge:app --port 8001    # SSE bridge
cd dashboard/frontend && npm run dev               # dashboard UI

# trigger seed_04 — the QA revision loop showpiece
curl -X POST http://localhost:8002/inject \
  -H "Content-Type: application/json" \
  -d '{
    "platform": "tripadvisor",
    "business_id": "loc_demo",
    "review": {
      "text": "Worst experience of my life. The manager was completely dismissive when I raised my concerns. The food was inedible and no one seemed to care. I will be leaving reviews everywhere I can.",
      "rating": 1,
      "author": "Sarah K.",
      "url": "http://demo.platform",
      "language": "en"
    }
  }'

# other seeds (01, 02, 05): same structure, swap the "review" object
# fallback if the local inject server isn't running: POST the same body to
#   http://localhost:8000/demo/inject

# optional: external webhook ingestion (see README)
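
For scripted demo runs, the curl call above could be reproduced from Python with only the standard library. This is a sketch under the assumption that the endpoint accepts exactly the JSON body shown in the curl example. The helper names `build_inject_request` and `inject` are hypothetical, and because the response format is not documented here, only the HTTP status code is returned.

```python
import json
import urllib.request

INJECT_URL = "http://localhost:8002/inject"  # same URL as the curl example


def build_inject_request(review, platform="tripadvisor",
                         business_id="loc_demo", url=INJECT_URL):
    """Build (but do not send) a POST request mirroring the curl payload."""
    body = {"platform": platform, "business_id": business_id, "review": review}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def inject(review, **kwargs):
    """Send the request and return the HTTP status code."""
    request = build_inject_request(review, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Shortened placeholder review; field names match the curl example.
demo_review = {
    "text": "The manager was completely dismissive when I raised my concerns.",
    "rating": 1,
    "author": "Sarah K.",
    "url": "http://demo.platform",
    "language": "en",
}

if __name__ == "__main__":
    print(inject(demo_review))
```

Passing `url="http://localhost:8000/demo/inject"` targets the fallback endpoint mentioned in the comments above.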

Verification

Pre-submission checklist.

Seed scenarios verified: 8 / 8 producing expected outcomes

Band integration: all 6 agents registered and responding in the shared room

Featherless integration: Triage, Research, Escalation calling successfully

AI/ML API integration: Drafter and QA calling DeepSeek-V4 successfully

HTTP inject endpoint: all 8 seed IDs trigger correctly

QA loop fires on demand: seed_04 reliably rejects on v1 and approves on v2

Tech stack

What it's built on.

Band: coordination layer (shared room, agent handles, visible message routing)

Featherless: LLM provider for Triage, Research, and Escalation (cost-efficient inference)

AI/ML API · DeepSeek-V4: LLM provider for Drafter and QA (premium reasoning for generation and scoring)

HTTP Trigger: controlled inject endpoint for reliable, repeatable live demo scenarios

Antigravity: AI code builder used to implement the spec against the verified Band SDK