Open Source · MIT License

Your AI assistant,
self-hosted.

Persistent memory, web search, email integration, and dream-cycle cognition — validated on 1,049 QA items with F1 0.48.

ScallopBot · online
What did Sarah say about the project deadline?
10:41
memory_search · 847 memories · 3 results
Sarah mentioned on Oct 15 that the deadline moved to November 30. She noted the design review needs to happen first.
10:42
Search for recent papers on sparse attention
10:43
web_search · "sparse attention 2026"
Found 3 papers. Most cited: "Sub-quadratic Attention" (Chen et al.) — linear cross-attention with 40% fewer FLOPs.
10:43
Anything urgent in my emails?
10:44
email_check · 2 unread from team
Jake sent a deployment blocker 30m ago. Lisa’s Q3 review is due Friday.
10:45
Hybrid Memory · Model Routing · 9 Channels · Local Voice · Modular Skills · Dream Cycles · Dashboard · Scheduling · Reliability

LoCoMo benchmark evaluation

Evaluated on LoCoMo, a standardized long-conversation memory benchmark, using 1,049 QA items across 5 conversations and 138 sessions. Both systems use identical models (Moonshot kimi-k2.5) and embeddings (Ollama nomic-embed-text). ScallopBot’s hybrid retrieval with LLM reranking, temporal query detection, and score-gated context achieves F1 0.48 vs OpenClaw’s 0.38, a 26% relative improvement. The system comprises 367 TypeScript source files (~63,000 lines of code) with 1,560 tests across 95 test files.
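
ScallopBot’s retrieval code is not reproduced here. As a rough illustration of what hybrid retrieval with score-gated context can look like, this sketch blends a semantic (cosine) score with a simple keyword-overlap score and drops any memory below a threshold before it can reach the prompt. The `Memory` shape, the `hybridSearch` function, the blend weight `alpha`, and the gate `threshold` are all hypothetical, and the LLM reranking step is omitted.

```typescript
// Hypothetical memory record; ScallopBot's real schema may differ.
interface Memory {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Fraction of query terms that also appear in the memory text (a crude keyword score).
function keywordScore(query: string, text: string): number {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  if (terms.length === 0) return 0;
  const words = new Set(text.toLowerCase().split(/\W+/));
  return terms.filter((t) => words.has(t)).length / terms.length;
}

// Blend both scores, gate out weak matches, and return the top-k survivors.
function hybridSearch(
  query: string,
  queryEmbedding: number[],
  memories: Memory[],
  alpha = 0.7,     // hypothetical weight on the semantic score
  threshold = 0.5, // hypothetical score gate: below this, nothing is injected
  topK = 3,
): { id: string; score: number }[] {
  return memories
    .map((m) => ({
      id: m.id,
      score: alpha * cosine(queryEmbedding, m.embedding) + (1 - alpha) * keywordScore(query, m.text),
    }))
    .filter((r) => r.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Toy 2-dim embeddings stand in for real 768-dim vectors.
const demoMemories: Memory[] = [
  { id: "m1", text: "Sarah said the deadline moved to November 30", embedding: [1, 0] },
  { id: "m2", text: "Lunch order for Friday", embedding: [0, 1] },
];
console.log(hybridSearch("Sarah deadline", [1, 0], demoMemories));
```

The gate matters most for unanswerable questions: when nothing clears the threshold, the model sees no context and is more likely to refuse than to fabricate.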

F1 Score: 0.48 (+26% vs OpenClaw on 1,049 QA items)
Adversarial Gain: +0.20 (F1 0.97 vs 0.77 on unanswerable questions)
Daily cost: $0.06–0.10 (full cognitive pipeline, 7 LLM providers)
LoCoMo Results by Category (F1, ScallopBot vs OpenClaw)

Overall F1: 0.48 vs 0.38 (+26% relative improvement). Token-level F1 across all 1,049 QA items (5 conversations, 138 sessions).
Adversarial Questions: 0.97 vs 0.77 (+26% relative improvement). Unanswerable questions designed to test refusal accuracy.
Multi-hop Questions: 0.42 vs 0.32 (+31% relative improvement). Questions requiring synthesis of facts across multiple sessions.
Temporal Questions: 0.34 vs 0.26 (+31% relative improvement). Questions requiring time-based reasoning across sessions.
Single-hop Questions: 0.20 vs 0.14 (+43% relative improvement). Direct factual recall from a single conversation session.
Open-domain Questions: 0.09 vs 0.07 (+29% relative improvement). General knowledge questions not tied to specific conversation sessions.
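
The scores above are token-level F1. A common SQuAD-style definition normalizes the predicted and gold answers into tokens, counts overlapping tokens, and takes the harmonic mean of precision and recall. This sketch assumes that normalization (lowercase, strip punctuation and the articles a/an/the); the exact LoCoMo scoring script may differ, for example by applying stemming. It also recomputes the relative-improvement figures as (ScallopBot − OpenClaw) / OpenClaw. The names `tokenF1` and `relativeGain` are illustrative.

```typescript
// SQuAD-style normalization (assumed): lowercase, punctuation to spaces, drop articles.
function normalizeTokens(s: string): string[] {
  return s
    .toLowerCase()
    .replace(/[^\w\s]/g, " ")
    .split(/\s+/)
    .filter((t) => t.length > 0 && !["a", "an", "the"].includes(t));
}

// Token-level F1 between a predicted answer and a gold answer.
function tokenF1(prediction: string, gold: string): number {
  const pred = normalizeTokens(prediction);
  const ref = normalizeTokens(gold);
  // Both empty counts as a match; only one empty counts as a miss.
  if (pred.length === 0 || ref.length === 0) return pred.length === ref.length ? 1 : 0;
  // Count gold tokens so repeated words are matched at most as often as they occur.
  const counts = new Map<string, number>();
  for (const t of ref) counts.set(t, (counts.get(t) ?? 0) + 1);
  let overlap = 0;
  for (const t of pred) {
    const c = counts.get(t) ?? 0;
    if (c > 0) {
      overlap++;
      counts.set(t, c - 1);
    }
  }
  if (overlap === 0) return 0;
  const precision = overlap / pred.length;
  const recall = overlap / ref.length;
  return (2 * precision * recall) / (precision + recall);
}

// Relative improvement of a over b, rounded to a whole percentage.
function relativeGain(a: number, b: number): number {
  return Math.round(((a - b) / b) * 100);
}

console.log(tokenF1("November 30", "The deadline moved to November 30"));
console.log(relativeGain(0.48, 0.38));
```

For instance, `relativeGain(0.48, 0.38)` returns 26, matching the Overall row; the same arithmetic reproduces every percentage in the category breakdown.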

The benchmark run uses real embeddings (Ollama nomic-embed-text, 768-dim) and a real LLM (Moonshot kimi-k2.5). Adversarial gains are driven by score-gating and anti-fabrication constraints. Multi-hop gains come from memory fusion, NREM dream consolidation, and increased retrieval depth. Temporal gains come from date-embedded memories and regex-based temporal query detection. The full cognitive pipeline adds ~$0.02/day to the base conversation cost. The design was validated against 30 research works from 2023–2026 across six domains.
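
The temporal gains are attributed to regex-based temporal query detection and date-embedded memories. The patterns and helpers here are hypothetical stand-ins, not ScallopBot’s actual rules: `isTemporalQuery` flags questions that contain time cues, and `withDate` prefixes a memory with its calendar date so the date becomes part of the text that gets embedded.

```typescript
// Hypothetical time-cue patterns. "May" is left out of the month list
// because it collides with the modal verb ("may I ask...").
const TEMPORAL_PATTERNS: RegExp[] = [
  /\b(when|how long ago|what date|what time)\b/i,
  /\b(before|after|since|until|during)\b/i,
  /\b(yesterday|today|tomorrow)\b/i,
  /\b(last|next|this) (week|month|year)\b/i,
  /\b(january|february|march|april|june|july|august|september|october|november|december)\b/i,
  /\b\d{4}-\d{2}-\d{2}\b/, // ISO dates such as 2025-10-15
];

// True if any time cue appears; a caller could then widen retrieval or order hits by date.
function isTemporalQuery(query: string): boolean {
  return TEMPORAL_PATTERNS.some((re) => re.test(query));
}

// Prefix a memory with its UTC calendar date (YYYY-MM-DD) before embedding and storage.
function withDate(text: string, timestamp: Date): string {
  return `[${timestamp.toISOString().slice(0, 10)}] ${text}`;
}

console.log(isTemporalQuery("When did Sarah move the deadline?"));
console.log(withDate("Deadline moved to November 30", new Date(Date.UTC(2025, 9, 15))));
```

The year in the example date is arbitrary. Embedding the date alongside the content lets a question like "what changed in October?" match on both meaning and time without a separate date index.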

Up and running in minutes

One script installs everything on a fresh Ubuntu server. Add a provider key and you're live.

# Clone the repo
git clone https://github.com/tashfeenahmed/scallopbot
cd scallopbot

# One-command server setup (Node 22, PM2, voice deps, Ollama)
bash scripts/server-install.sh

# Configure your provider key
cp .env.example .env
nano .env  # add at least ANTHROPIC_API_KEY

# Build and start
npm run build
node dist/cli.js start

Own your AI assistant

MIT licensed. Self-hosted. No vendor lock-in.

Get Started on GitHub