feat: brain-engine + brain-ui + docs — standalone full-stack template
- brain-engine: server, embed, search, RAG, MCP, start.sh (standalone)
- brain-ui: full React source, build.sh, DocsView with tier colors
- docs: 14 human-guide pages (getting-started, architecture, sessions, workflows, agents, tier views)
- brain-compose.yml v0.9.0: featured tier added, sessions/agents per tier, coach_level, API key schema
- DISTRIBUTION_CHECKLIST v1.2: brain-engine + brain-ui + docs added to the checklist
This commit is contained in:
77  brain-engine/README.md  Normal file
@@ -0,0 +1,77 @@
---
name: brain-engine
type: reference
context_tier: cold
---

# brain-engine — Local engine

> The brain of the brain. Semantic search, local API, embeddings, BSI.

---

## Quick start

```bash
bash brain-engine/start.sh
```

This does everything: it installs the Python dependencies, creates brain.db, indexes the corpus if Ollama is present, and starts the server on port 7700.

---

## Prerequisites

- **Python 3.10+** — `sudo apt install python3 python3-pip python3-venv`
- **Ollama** (optional but recommended) — `curl -fsSL https://ollama.com/install.sh | sh`
  - Embedding model: `ollama pull nomic-embed-text`
  - Without Ollama the server still runs, but semantic search is unavailable

---

## Architecture

```
brain-engine/
  start.sh          <- standalone startup script
  server.py         <- HTTP API (FastAPI, port 7700)
  mcp_server.py     <- MCP server (FastMCP, port 7701)
  embed.py          <- embedding pipeline (Ollama + nomic-embed-text)
  search.py         <- cosine-similarity search + scope filter
  rag.py            <- RAG layer (boot queries + ad hoc)
  schema.sql        <- SQLite tables (claims, signals, embeddings, sessions)
  migrate.py        <- brain.db migration
  distill.py        <- session memory distillation (featured+)
  requirements.txt  <- Python dependencies
```
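search.py ranks chunks by cosine similarity against the query embedding. As a rough standalone sketch of that scoring — illustrative only, not the actual search.py code, and using toy 3-d vectors instead of real embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Rank chunks by similarity to a query vector (toy 3-d vectors)
query = [1.0, 0.0, 0.0]
chunks = {'a': [1.0, 0.0, 0.0], 'b': [0.0, 1.0, 0.0]}
ranked = sorted(chunks, key=lambda k: cosine_similarity(query, chunks[k]),
                reverse=True)
```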

---

## Main endpoints

- `GET /health` — server status
- `GET /search?q=` — semantic search over the brain
- `GET /agents` — list of available agents
- `GET /boot` — initial context for a session
- `GET /workflows` — open BSI claims
- `GET /tier` — active tier
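For example, a search query is passed as a standard URL-encoded `q` parameter. A minimal client-side sketch (the base URL and port come from start.sh above; the `search_url` helper name is made up for illustration):

```python
from urllib.parse import urlencode

BASE = 'http://localhost:7700'  # default port from start.sh

def search_url(query: str) -> str:
    """Build the GET /search URL for a semantic query (hypothetical helper)."""
    return f'{BASE}/search?{urlencode({"q": query})}'

url = search_url('session memory distillation')
```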

---

## Standalone mode

With no token configured, the server grants full access on localhost. This is the default mode when you fork the brain.

Without an API key (`brain_api_key: null`), the tier is `free` — all core features are available.
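Illustratively, the corresponding brain-compose.yml fragment looks like this (only the `brain_api_key` field and its `null` default are confirmed by this README):

```yaml
# brain-compose.yml (fragment)
brain_api_key: null   # no key → tier "free", all core features available
```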

---

## Connecting Claude Code (MCP)

```bash
# Start the MCP server
python3 brain-engine/mcp_server.py

# Register it in Claude Code
claude mcp add brain --transport http http://localhost:7701/mcp/
```
401  brain-engine/distill.py  Normal file
@@ -0,0 +1,401 @@
#!/usr/bin/env python3
"""
brain-engine/distill.py — BE-5 session memory distillation
Distills a BSI session (Claude .jsonl) into chunks indexed in brain.db.

Usage:
    python3 brain-engine/distill.py <session.jsonl>            → distill the session
    python3 brain-engine/distill.py <session.jsonl> --dry-run  → preview without writing
    python3 brain-engine/distill.py --last                     → distill the latest Claude session

LLM substitution point: the summarize() function — local Ollama (pro tier).
For the full tier: replace summarize() with a Claude/OpenAI API call.

Scope: work — distillates are reachable via brain_search (MCP + owner).
"""

import os
import sys
import json
import re
import argparse
import sqlite3
import urllib.request
import urllib.error
from pathlib import Path
from datetime import datetime

sys.path.insert(0, str(Path(__file__).parent))
from embed import connect, upsert_chunk, get_embedding, chunk_id, OLLAMA_URL

# ── Config ─────────────────────────────────────────────────────────────────────

BRAIN_ROOT = Path(__file__).parent.parent
DISTILL_MODEL = os.getenv('DISTILL_MODEL', 'mistral:7b')  # local LLM for summaries
SCOPE = 'work'

# Claude sessions — default path
CLAUDE_SESSIONS_DIR = Path.home() / '.claude' / 'projects'

# Max context size sent to the LLM (chars) — kept small to preserve the few-shot format (BE-5d)
MAX_CONTEXT_CHARS = 12_000

# Max recent messages sent to the LLM — avoids English narratives on large sessions (BE-5d)
MAX_MESSAGES = 50

# Minimum threshold — sessions that are too short contain only the brief, no real decisions (BE-5d)
MIN_MESSAGES = 10

# Lever 2 — max chunks per aspect (Strategy A, post-LLM split)
CHUNK_LIMITS = {'decisions': 10, 'code': 5, 'todos': 5}

# ── Session extraction ─────────────────────────────────────────────────────────

def extract_messages(jsonl_path: Path) -> list[dict]:
    """Extract the human/assistant messages from a Claude .jsonl file."""
    messages = []
    try:
        with open(jsonl_path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue
                msg = entry.get('message', {})
                role = msg.get('role')
                if role not in ('user', 'assistant'):
                    continue
                content = msg.get('content', '')
                if isinstance(content, list):
                    # Extract the text from content blocks
                    parts = [b.get('text', '') for b in content
                             if isinstance(b, dict) and b.get('type') == 'text']
                    content = '\n'.join(parts)
                if content and content.strip():
                    messages.append({'role': role, 'content': content.strip()})
    except FileNotFoundError:
        sys.exit(f'❌ File not found: {jsonl_path}')
    return messages


def build_context(messages: list[dict], max_chars: int = MAX_CONTEXT_CHARS) -> str:
    """Build a truncated context for the LLM.
    Prioritizes the last N messages (MAX_MESSAGES) to keep the LLM in the few-shot format.
    """
    # Bug 2 fix — prioritize recent messages on large sessions
    if len(messages) > MAX_MESSAGES:
        messages = messages[-MAX_MESSAGES:]
    lines = []
    total = 0
    # Take the most recent messages first
    for msg in reversed(messages):
        prefix = 'USER' if msg['role'] == 'user' else 'ASSISTANT'
        line = f'[{prefix}] {msg["content"][:500]}'
        if total + len(line) > max_chars:
            break
        lines.append(line)
        total += len(line)
    lines.reverse()
    return '\n\n'.join(lines)

# ── LLM — substitution point ──────────────────────────────────────────────────

def summarize(context: str, aspect: str) -> str | None:
    """
    Summarize the context for the requested aspect.
    SUBSTITUTION POINT: replace with a Claude/OpenAI API call for the full tier.

    aspect: 'decisions' | 'code' | 'todos'
    """
    prompts = {
        'decisions': (
            'You are a technical memory extractor. '
            'Extract the architectural and technical decisions made in this session.\n\n'
            'REQUIRED FORMAT: one decision per line, starting with "- ".\n'
            'If there are no decisions, answer exactly "none".\n\n'
            'EXAMPLES:\n'
            'Session: "We picked mistral:7b because mistral-small was too slow"\n'
            '→\n'
            '- Distillation LLM: mistral:7b chosen (mistral-small rejected — latency)\n\n'
            'Session: "We keep 3 chunks per session, max 10 decisions, 5 code, 5 todos"\n'
            '→\n'
            '- BE-5 chunking: 3 aspects (decisions/code/todos), caps 10/5/5\n\n'
            'Session: "In the end we use SQLite rather than Postgres for brain.db"\n'
            '→\n'
            '- brain.db storage: SQLite chosen (Postgres rejected — operational overhead)\n\n'
            'Answer in the same language as the session. Max 15 words per bullet.\n\n'
            'Session:\n'
        ),
        'code': (
            'You are a technical memory extractor. '
            'Extract the files created or modified, the key functions implemented, and the bugs fixed.\n\n'
            'REQUIRED FORMAT: one entry per line, starting with "- ".\n'
            'If nothing is notable, answer exactly "none".\n\n'
            'EXAMPLES:\n'
            'Session: "We created distill.py with the functions extract_messages, build_context and summarize"\n'
            '→\n'
            '- brain-engine/distill.py created — distillation pipeline: extract_messages(), build_context(), summarize()\n\n'
            'Session: "I fixed the timeout in embed.py, it is now 90s instead of 60s"\n'
            '→\n'
            '- embed.py:get_embedding() — timeout fix 60s → 90s\n\n'
            'Session: "We added CHUNK_LIMITS and parse_bullets in distill.py"\n'
            '→\n'
            '- distill.py — added CHUNK_LIMITS (10/5/5) + parse_bullets() strategy A\n\n'
            'Answer in the same language as the session. Be concise.\n\n'
            'Session:\n'
        ),
        'todos': (
            'You are a technical memory extractor. '
            'Extract the open tasks, blockers and next steps mentioned in this session.\n\n'
            'REQUIRED FORMAT: one task per line, starting with "- ".\n'
            'If there are no tasks, answer exactly "none".\n\n'
            'EXAMPLES:\n'
            'Session: "We should test deepseek-coder for the code aspect later"\n'
            '→\n'
            '- Test deepseek-coder:6.7b for the "code" aspect (lever 3 BE-5)\n\n'
            'Session: "The VPS cron is not viable as long as Ollama does not run on the VPS"\n'
            '→\n'
            '- Install Ollama on the VPS to enable the automatic distillation cron\n\n'
            'Session: "We will externalize the prompts in BE-5c if needed"\n'
            '→\n'
            '- BE-5c (optional): externalize distill prompts into brain-engine/prompts/*.txt\n\n'
            'Answer in the same language as the session. Be concise.\n\n'
            'Session:\n'
        ),
    }
    prompt = prompts[aspect] + context

    url = f'{OLLAMA_URL}/api/generate'
    payload = json.dumps({
        'model': DISTILL_MODEL,
        'prompt': prompt,
        'stream': False,
        'options': {'temperature': 0.1, 'num_predict': 400},
    }).encode()
    req = urllib.request.Request(url, data=payload,
                                 headers={'Content-Type': 'application/json'})
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            data = json.loads(resp.read())
            return data.get('response', '').strip()
    except (urllib.error.URLError, TimeoutError) as e:
        print(f'⚠️ Ollama unavailable ({OLLAMA_URL}): {e}', file=sys.stderr)
        return None

# ── Bullet parsing (Strategy A — post-split) ──────────────────────────────────

def parse_bullets(text: str) -> list[str]:
    """
    Extract the bullets from an LLM response.
    Recognizes '- ', '• ', '* ', '– ' at the start of a line.
    Handles continuations (an indented line with no prefix continues the previous bullet).
    """
    bullets: list[str] = []
    current: list[str] = []

    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        # Recognized prefixes: hyphen, bullet point, asterisk, en dash
        is_bullet = (
            stripped[:2] in ('- ', '• ', '* ')
            or (stripped[0] == '–' and len(stripped) > 1 and stripped[1] == ' ')
        )
        if is_bullet:
            if current:
                bullets.append(' '.join(current))
            # Every recognized prefix is two characters (marker + space)
            current = [stripped[2:].strip()]
        elif current:
            # Continuation of a multi-line bullet
            current.append(stripped)

    if current:
        bullets.append(' '.join(current))

    return [b for b in bullets if b]

# ── Two-pass summarization (BE-5e) ────────────────────────────────────────────

def summarize_2pass(messages: list[dict], aspect: str) -> str | None:
    """
    Two-pass summarization for large sessions (BE-5e).
    Pass 1: summarize each block of MAX_MESSAGES messages.
    Pass 2: final summary over the concatenation of the partial summaries.
    """
    blocks = [messages[i:i + MAX_MESSAGES] for i in range(0, len(messages), MAX_MESSAGES)]
    partial_summaries = []
    for idx, block in enumerate(blocks):
        context = build_context(block)
        partial = summarize(context, aspect)
        if partial and partial.strip().lower() not in ('none', 'aucune', 'aucun', 'ninguno', 'ninguna', ''):
            partial_summaries.append(f'# Block {idx + 1}/{len(blocks)}\n{partial}')

    if not partial_summaries:
        return None

    combined = '\n\n'.join(partial_summaries)
    # Pass 2: final summary
    return summarize(combined[:MAX_CONTEXT_CHARS], aspect)

# ── Distillation ──────────────────────────────────────────────────────────────

def distill_session(jsonl_path: Path, dry_run: bool = False) -> int:
    """
    Distill a session into granular chunks (1 bullet = 1 chunk).
    Caps: decisions ≤ 10, code ≤ 5, todos ≤ 5.
    Returns the number of chunks indexed.
    """
    print(f'📖 Reading: {jsonl_path.name}')
    messages = extract_messages(jsonl_path)
    if not messages:
        print('⚠️ No extractable message — empty session or unknown format.')
        return 0

    print(f'   {len(messages)} messages extracted')

    # Bug 1 fix — filter micro-sessions (bootstrap brief only, no real decisions)
    if len(messages) < MIN_MESSAGES:
        print(f'⚠️ Session too short ({len(messages)} messages < {MIN_MESSAGES}) — skipping.')
        return 0

    is_large = len(messages) > MAX_MESSAGES
    context = build_context(messages) if not is_large else None
    if is_large:
        print(f'   ⚡ Large session ({len(messages)} msg) — 2-pass mode enabled')
    sess_id = jsonl_path.stem  # e.g. c22807f5-04df-...
    date_str = datetime.now().strftime('%Y-%m-%d')

    conn = connect() if not dry_run else None
    total = 0

    # Bug 3 fix — purge old chunks without a numeric suffix (pre-BE-5b format)
    if conn:
        cur = conn.cursor()
        cur.execute(
            'DELETE FROM embeddings WHERE filepath LIKE ? AND filepath NOT LIKE ?',
            (f'sessions/{sess_id}/%', f'sessions/{sess_id}/%/%'),
        )
        purged = cur.rowcount
        if purged:
            print(f'   🧹 {purged} old chunk(s) purged (pre-BE-5b format)')
        conn.commit()

    for aspect in ('decisions', 'code', 'todos'):
        limit = CHUNK_LIMITS[aspect]
        if is_large:
            print(f'   🧠 Distilling [{aspect}] (2-pass)...', end=' ', flush=True)
            summary = summarize_2pass(messages, aspect)
        else:
            print(f'   🧠 Distilling [{aspect}]...', end=' ', flush=True)
            summary = summarize(context, aspect)

        if not summary or summary.strip().lower() in ('aucune', 'aucun', 'none', 'ninguno', 'ninguna', ''):
            print('empty — skipped')
            continue

        bullets = parse_bullets(summary)
        if not bullets:
            # Fallback: the LLM did not follow the format — keep 1 raw chunk rather than lose the info
            bullets = [summary.strip()]

        # Filter stray "none" bullets (the LLM sometimes emits "none:" instead of the sentinel)
        _none_words = {'none', 'aucune', 'aucun', 'ninguno', 'ninguna'}
        bullets = [b for b in bullets
                   if b.strip().lower().split()[0].rstrip(':') not in _none_words]

        bullets = bullets[:limit]
        print(f'{len(bullets)} bullet(s)')

        for i, bullet in enumerate(bullets):
            filepath = f'sessions/{sess_id}/{aspect}/{i:02d}'
            title = f'Session {date_str} — {aspect} #{i+1:02d}'
            chunk = {
                'filepath': filepath,
                'title': title,
                'text': f'# {title}\n\nSource: {jsonl_path.name}\n\n- {bullet}',
                'scope': SCOPE,
            }

            if dry_run:
                print(f'   [{aspect}/{i:02d}] {bullet[:100]}')
                total += 1
                continue

            vector = get_embedding(chunk['text'])
            if vector:
                upsert_chunk(conn, chunk, vector)
                conn.commit()
                total += 1
            else:
                print(f'⚠️ embed failed [{aspect}/{i:02d}] — stored without a vector')
                upsert_chunk(conn, chunk, None)
                conn.commit()

    if conn:
        conn.close()

    return total

# ── Helpers ───────────────────────────────────────────────────────────────────

def find_last_session() -> Path | None:
    """Find the .jsonl of the latest Claude session in ~/.claude/projects."""
    jsonl_files = list(CLAUDE_SESSIONS_DIR.glob('**/*.jsonl'))
    if not jsonl_files:
        return None
    return max(jsonl_files, key=lambda p: p.stat().st_mtime)

# ── CLI ───────────────────────────────────────────────────────────────────────

def main():
    parser = argparse.ArgumentParser(
        description='brain-engine distill — BE-5 session memory distillation'
    )
    parser.add_argument('session', nargs='?', type=Path,
                        help='Path to the Claude session .jsonl')
    parser.add_argument('--last', action='store_true',
                        help='Automatically distill the latest Claude session')
    parser.add_argument('--dry-run', action='store_true',
                        help='Preview without writing to brain.db')
    args = parser.parse_args()

    if args.last:
        jsonl = find_last_session()
        if not jsonl:
            sys.exit('❌ No session found in ~/.claude/projects/')
        print(f'📌 Latest session: {jsonl}')
    elif args.session:
        jsonl = args.session
    else:
        parser.print_help()
        sys.exit(1)

    mode = ' (dry-run)' if args.dry_run else ''
    print(f'\n🔬 BE-5 distillation{mode}\n')

    n = distill_session(jsonl, dry_run=args.dry_run)

    if n == 0:
        print('\n⚠️ No chunk produced — empty session or Ollama unavailable.')
        sys.exit(2)

    print(f'\n✅ {n} chunk(s) distilled → brain.db (scope: {SCOPE})')
    if not args.dry_run:
        print('   → brain_search "previous session" to retrieve this context')


if __name__ == '__main__':
    main()
524  brain-engine/embed.py  Normal file
@@ -0,0 +1,524 @@
#!/usr/bin/env python3
"""
brain-engine/embed.py — BE-2c embedding pipeline
Indexes the brain corpus via Ollama nomic-embed-text → embeddings table in brain.db

Usage:
    python3 brain-engine/embed.py                              → index the whole corpus
    python3 brain-engine/embed.py --dry-run                    → list the chunks without embedding
    python3 brain-engine/embed.py --file agents/helloWorld.md  → reindex one file
    python3 brain-engine/embed.py --stats                      → stats for the current index

Headless: zero Wayland/display dependency.
OLLAMA_URL: env variable (default localhost:11434) — supports the local network.

Zone filter — ADR-033a (2026-03-18):
    kernel (agents/, wiki/, toolkit/, contexts/, KERNEL.md) → always indexed
    project (projets/, handoffs/, workspace/)               → git-based 60-day TTL
    session (claims/)                                       → NEVER indexed
    personal (profil/bact/, profil/collaboration.md)        → NEVER indexed
    profil/decisions/ → frontmatter scope (kernel | project)

Chunking strategy by type:
    agents/*.md, projets/*.md, wiki/**/*.md   → chunk per H2 section
    workspace/**/*.md, profil/decisions/*.md  → H2, or whole file if < 512 tokens
    KERNEL.md, focus.md, contexts/            → whole file (short documents)
"""

import os
import re
import sys
import json
import struct
import hashlib
import argparse
import sqlite3
import subprocess
import time
import urllib.request
import urllib.error
from datetime import datetime
from pathlib import Path

BRAIN_ROOT = Path(__file__).parent.parent
DB_PATH = BRAIN_ROOT / 'brain.db'
OLLAMA_URL = os.getenv('OLLAMA_URL', 'http://localhost:11434')
EMBED_MODEL = os.getenv('EMBED_MODEL', 'nomic-embed-text')

# Guardrail — generic LLMs forbidden: guaranteed machine freeze on the full corpus
# (validated empirically: mistral:7b + qwen3:8b → total freeze ~20min, 2026-03-16)
_BLOCKED_MODELS = ['mistral', 'qwen', 'llama', 'gemma', 'phi', 'deepseek']
if any(b in EMBED_MODEL.lower() for b in _BLOCKED_MODELS):
    sys.exit(f"❌ EMBED_MODEL='{EMBED_MODEL}' forbidden — generic LLM → machine freeze on the full corpus.\n"
             f"   Use a dedicated embedding model: nomic-embed-text, mxbai-embed-large, all-minilm")

CHUNK_TOKENS = 512   # max tokens per chunk (approximated: 1 token ≈ 4 chars)
CHUNK_OVERLAP = 64   # overlap between consecutive chunks

# ── Access zones ──────────────────────────────────────────────────────────────

# Zone 0 — never indexed (absolutely private) — ADR-033a
PRIVATE_PATHS = [
    'profil/capital.md',
    'profil/objectifs.md',
    'profil/bact/',            # personal — never
    'profil/collaboration.md', # personal — never
    'progression/',            # personal — journal + the whole directory
    'MYSECRETS',
]

# Zone by prefix — first match wins — ADR-033a + KERNEL.md zones
# Zones: kernel | instance | satellite | public (private = total exclusion above)
PATH_SCOPES = [
    # KERNEL — maximum protection
    ('contexts/', 'kernel'),
    ('profil/decisions/', 'kernel'),
    ('profil/', 'kernel'),
    ('KERNEL.md', 'kernel'),
    ('brain-constitution.md', 'kernel'),
    ('scripts/', 'kernel'),
    # INSTANCE — machine configuration + active projects
    ('focus.md', 'instance'),
    ('projets/', 'instance'),
    ('PATHS.md', 'instance'),
    ('now.md', 'instance'),
    # SATELLITE — free-living, promotion possible
    ('toolkit/', 'satellite'),
    ('todo/', 'satellite'),
    ('workspace/', 'satellite'),
    ('handoffs/', 'satellite'),
    ('intentions/', 'satellite'),
    # PUBLIC — visible, distributed
    ('wiki/', 'public'),
    ('agents/', 'public'),
    ('infrastructure/', 'public'),
    ('BRAIN-INDEX.md', 'public'),
]
DEFAULT_SCOPE = 'public'


TTL_PROJECT_DAYS = 60  # ADR-033a — project TTL, git-based

def is_private(filepath: str) -> bool:
    """Zone 0 — never indexed, never accessible."""
    return any(filepath == p or filepath.startswith(p) for p in PRIVATE_PATHS)


def resolve_scope(filepath: str) -> str:
    """Return the access zone (kernel | instance | satellite | public)."""
    for prefix, scope in PATH_SCOPES:
        if filepath == prefix or filepath.startswith(prefix):
            return scope
    return DEFAULT_SCOPE


def get_frontmatter_scope(filepath: Path) -> str | None:
    """
    Read the scope: field from a .md file's YAML frontmatter.
    Returns 'kernel' | 'project' | 'personal' | None if absent.
    ADR-033a Rule 2 — overrides the directory rule.
    """
    try:
        text = filepath.read_text(errors='replace')
        if not text.startswith('---'):
            return None
        end = text.find('\n---', 3)
        if end == -1:
            return None
        for line in text[3:end].splitlines():
            line = line.strip()
            if line.startswith('scope:'):
                val = line[len('scope:'):].strip()
                val = val.split('#')[0].strip()  # drop inline comments
                return val if val else None
    except Exception:
        pass
    return None


def get_git_age_days(filepath: Path) -> int | None:
    """
    Return the number of days since the last git commit touching this file.
    None if the file is not tracked or git fails.
    ADR-033a — git-based TTL, no BSI coupling.
    """
    try:
        result = subprocess.run(
            ['git', 'log', '-1', '--format=%ct', '--', str(filepath)],
            capture_output=True, text=True, cwd=str(BRAIN_ROOT), timeout=5
        )
        ts = result.stdout.strip()
        if not ts:
            return None
        age_secs = time.time() - int(ts)
        return int(age_secs / 86400)
    except Exception:
        return None


def should_skip_by_zone(filepath: Path) -> bool:
    """
    Apply the ADR-033a rules — return True if the file must be excluded.

    Rule 1 — directory (default)
    Rule 2 — frontmatter scope: (overrides Rule 1, for profil/decisions/)

    Zones:
        kernel              → False (always indexed)
        project + TTL > 60d → True (stale)
        personal            → True (never)
    """
    rel = str(filepath.relative_to(BRAIN_ROOT))

    # profil/decisions/ — Rule 2: scope via frontmatter
    if rel.startswith('profil/decisions/'):
        scope = get_frontmatter_scope(filepath)
        if scope == 'personal':
            return True
        if scope == 'project':
            age = get_git_age_days(filepath)
            return age is not None and age > TTL_PROJECT_DAYS
        # scope: kernel or absent → always indexed
        return False

    # Project zone — git-based TTL
    if any(rel.startswith(p) for p in ('projets/', 'handoffs/', 'workspace/')):
        age = get_git_age_days(filepath)
        return age is not None and age > TTL_PROJECT_DAYS

    return False

# Corpus to index — paths relative to BRAIN_ROOT — ADR-033a
# kernel → always | project → git 60-day TTL | omitted → NEVER
CORPUS_PATHS = [
    # ── kernel — always indexed ───────────────────────────────────────────────
    ('agents', '*.md', 'h2'),        # brain agents
    ('wiki', '**/*.md', 'h2'),       # documentation (submodule)
    ('toolkit', '**/*.md', 'h2'),    # reusable patterns
    ('contexts', '*.yml', 'file'),   # session contexts
    # ── project — git-based 60-day TTL ───────────────────────────────────────
    ('projets', '*.md', 'h2'),
    ('handoffs', '*.md', 'file'),
    ('workspace', '**/*.md', 'h2'),
    # ── profil/decisions — scope via frontmatter (kernel | project) ──────────
    ('profil/decisions', '*.md', 'file'),
    # ── kernel root files ─────────────────────────────────────────────────────
    ('.', 'KERNEL.md', 'file'),
    ('.', 'focus.md', 'file'),
    ('.', 'BRAIN-INDEX.md', 'file'),
    # REMOVED: ('ADR', ...) — obsolete path (ADRs live in profil/decisions/)
    # REMOVED: ('profil', ...) — too broad, includes bact/ — handled via scope
    # REMOVED: ('claims', ...) — NEVER indexed per ADR-033a (structured session)
]

# Files to exclude
EXCLUDE_PATTERNS = [
    'brain-template/',
    'brain-engine/',
    '.git/',
    'node_modules/',
]

# ── Helpers ───────────────────────────────────────────────────────────────────

def should_exclude(filepath: Path) -> bool:
    s = str(filepath)
    if any(p in s for p in EXCLUDE_PATTERNS):
        return True
    # Zone 0 — absolutely private, never indexed
    if filepath.is_absolute():
        try:
            rel = str(filepath.relative_to(BRAIN_ROOT))
        except ValueError:
            rel = s  # path outside BRAIN_ROOT — is_private unlikely but safe
    else:
        rel = s
    return is_private(rel)


def chunk_by_h2(text: str, filepath: str) -> list[dict]:
    """Split a markdown document into chunks, one per H2 section."""
    sections = re.split(r'\n(?=## )', text)
    chunks = []
    for sec in sections:
        sec = sec.strip()
        if not sec:
            continue
        # Section too long → re-split by size
        if len(sec) > CHUNK_TOKENS * 4:
            sub = chunk_by_size(sec, filepath)
            chunks.extend(sub)
        else:
            title = sec.split('\n')[0].strip('#').strip()
            chunks.append({'text': sec, 'title': title, 'filepath': filepath})
    return chunks if chunks else [{'text': text, 'title': '', 'filepath': filepath}]


def chunk_by_size(text: str, filepath: str) -> list[dict]:
    """Split a text into chunks of roughly CHUNK_TOKENS tokens."""
    max_chars = CHUNK_TOKENS * 4
    overlap_chars = CHUNK_OVERLAP * 4
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        # Cut on a newline when possible
        if end < len(text):
            nl = text.rfind('\n', start, end)
            if nl > start:
                end = nl
        chunk_text = text[start:end].strip()
        if chunk_text:
            chunks.append({'text': chunk_text, 'title': '', 'filepath': filepath})
        if end >= len(text):
            break
        # Always move forward: if the overlap would land before start, jump to end
        next_start = end - overlap_chars
        start = next_start if next_start > start else end
    return chunks


def chunk_file(filepath: Path, strategy: str) -> list[dict]:
    """Read a file and return its chunks according to the strategy."""
    try:
        text = filepath.read_text(errors='replace').strip()
    except Exception as e:
        print(f"   ⚠️ {filepath.name}: read error — {e}")
        return []

    if not text:
        return []

    rel = str(filepath.relative_to(BRAIN_ROOT))

    if strategy == 'h2':
        return chunk_by_h2(text, rel)
    else:
        # Whole file — if too long, chunk by size
        if len(text) > CHUNK_TOKENS * 4:
            return chunk_by_size(text, rel)
        title = filepath.stem
        return [{'text': text, 'title': title, 'filepath': rel}]


def chunk_id(filepath: str, text: str) -> str:
    """Deterministic ID: hash(filepath + text[:64])."""
    h = hashlib.sha1(f"{filepath}::{text[:64]}".encode()).hexdigest()[:12]
    return f"emb-{h}"

# ── Ollama API ────────────────────────────────────────────────────────────────

def get_embedding(text: str) -> list[float] | None:
    """Call the Ollama embeddings API — returns None if unavailable."""
    url = f"{OLLAMA_URL}/api/embeddings"
    payload = json.dumps({"model": EMBED_MODEL, "prompt": text}).encode()
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            data = json.loads(resp.read())
            return data.get('embedding')
    except (urllib.error.URLError, TimeoutError) as e:
        print(f"  ⚠️ Ollama unavailable ({OLLAMA_URL}): {e}")
        return None


def vector_to_blob(vec: list[float]) -> bytes:
    """Serialize a float32 vector into a SQLite BLOB."""
    return struct.pack(f'{len(vec)}f', *vec)


def blob_to_vector(blob: bytes) -> list[float]:
    """Deserialize a SQLite BLOB into a float32 vector."""
    n = len(blob) // 4
    return list(struct.unpack(f'{n}f', blob))

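`vector_to_blob` and `blob_to_vector` are symmetric up to float32 precision. A quick roundtrip check (values chosen to be exactly representable in float32):

```python
import struct

def vector_to_blob(vec):
    return struct.pack(f'{len(vec)}f', *vec)

def blob_to_vector(blob):
    n = len(blob) // 4          # 4 bytes per float32
    return list(struct.unpack(f'{n}f', blob))

vec = [0.25, -1.5, 3.0]
blob = vector_to_blob(vec)
print(len(blob))                 # 12 (3 floats x 4 bytes)
print(blob_to_vector(blob) == vec)  # True
```

Arbitrary float64 values would come back slightly rounded after the float32 roundtrip, which is fine for cosine similarity.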
# ── SQLite ────────────────────────────────────────────────────────────────────

def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    conn.execute("PRAGMA journal_mode=WAL")
    # Create the embeddings table if absent (extends the base schema)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS embeddings (
            chunk_id   TEXT PRIMARY KEY,
            filepath   TEXT NOT NULL,
            title      TEXT,
            chunk_text TEXT NOT NULL,
            vector     BLOB,               -- NULL if Ollama was unavailable at chunk time
            model      TEXT,
            indexed    INTEGER DEFAULT 0,  -- 1 = vector present
            scope      TEXT NOT NULL DEFAULT 'work',  -- kernel | instance | satellite | public
            created_at TEXT NOT NULL DEFAULT (datetime('now')),
            updated_at TEXT NOT NULL DEFAULT (datetime('now'))
        )
    """)
    # Migration — add scope if missing (db created before BE-4)
    try:
        conn.execute("ALTER TABLE embeddings ADD COLUMN scope TEXT NOT NULL DEFAULT 'work'")
        conn.commit()
        # Backfill — resolve each existing chunk's scope from its filepath
        rows = conn.execute("SELECT DISTINCT filepath FROM embeddings WHERE scope = 'work'").fetchall()
        for row in rows:
            fp = row['filepath']
            s = resolve_scope(fp)
            if s != 'work':
                conn.execute("UPDATE embeddings SET scope = ? WHERE filepath = ?", (s, fp))
        conn.commit()
    except Exception:
        pass  # column already present
    conn.execute("CREATE INDEX IF NOT EXISTS idx_emb_filepath ON embeddings(filepath)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_emb_indexed ON embeddings(indexed)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_emb_scope ON embeddings(scope)")
    conn.commit()
    return conn

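The try/ALTER/except pattern makes `connect()` idempotent across restarts. A minimal in-memory sketch of the same idea (table and column names reused for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE embeddings (chunk_id TEXT PRIMARY KEY)")

def ensure_scope_column(conn):
    # ALTER TABLE fails if the column already exists; swallowing the
    # error makes repeated runs a no-op.
    try:
        conn.execute("ALTER TABLE embeddings ADD COLUMN scope TEXT NOT NULL DEFAULT 'work'")
    except sqlite3.OperationalError:
        pass

ensure_scope_column(conn)
ensure_scope_column(conn)  # second run: no error, no duplicate column
cols = [r[1] for r in conn.execute("PRAGMA table_info(embeddings)")]
print(cols)  # ['chunk_id', 'scope']
```

SQLite accepts a NOT NULL column with a constant default on an existing table, which is what lets the backfill run against old databases.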
def upsert_chunk(conn: sqlite3.Connection, chunk: dict,
                 vector: list[float] | None, dry_run: bool = False) -> bool:
    cid = chunk_id(chunk['filepath'], chunk['text'])
    blob = vector_to_blob(vector) if vector else None
    indexed = 1 if vector else 0
    scope = chunk.get('scope', resolve_scope(chunk['filepath']))

    if dry_run:
        return True

    conn.execute("""
        INSERT INTO embeddings(chunk_id, filepath, title, chunk_text, vector, model, indexed, scope, updated_at)
        VALUES (?,?,?,?,?,?,?,?, datetime('now'))
        ON CONFLICT(chunk_id) DO UPDATE SET
            chunk_text = excluded.chunk_text,
            vector     = COALESCE(excluded.vector, embeddings.vector),
            indexed    = MAX(excluded.indexed, embeddings.indexed),  -- keep 1 when an old vector survives
            scope      = excluded.scope,
            updated_at = excluded.updated_at
    """, (cid, chunk['filepath'], chunk.get('title', ''), chunk['text'],
          blob, EMBED_MODEL if vector else None, indexed, scope))
    return True

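The conflict clause above keeps an existing vector when a re-run has none. A minimal in-memory sketch, assuming `MAX` for the `indexed` flag so that a vector-less re-run cannot clear it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE embeddings (chunk_id TEXT PRIMARY KEY, vector BLOB, indexed INTEGER)")

def upsert(cid, vector):
    conn.execute("""
        INSERT INTO embeddings(chunk_id, vector, indexed) VALUES (?, ?, ?)
        ON CONFLICT(chunk_id) DO UPDATE SET
            vector  = COALESCE(excluded.vector, embeddings.vector),
            indexed = MAX(excluded.indexed, embeddings.indexed)
    """, (cid, vector, 1 if vector else 0))

upsert("emb-abc", b"\x00\x00\x80?")  # first run: Ollama up, vector stored
upsert("emb-abc", None)              # re-run: Ollama down, vector preserved
row = conn.execute("SELECT vector, indexed FROM embeddings").fetchone()
print(row[0] is not None, row[1])    # True 1
```

`excluded.*` refers to the row the INSERT tried to write, so the COALESCE falls back to the stored vector only when the new one is NULL.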
# ── Main pipeline ─────────────────────────────────────────────────────────────

def collect_files(target_file: str | None = None) -> list[tuple[Path, str]]:
    """Return the (path, strategy) list of files to index."""
    files = []
    seen = set()

    if target_file:
        p = (BRAIN_ROOT / target_file).resolve()
        if not str(p).startswith(str(BRAIN_ROOT.resolve())):
            print(f"  🚨 --file outside BRAIN_ROOT refused: {p}")
            return files
        if p.exists():
            # Pick the strategy from the file's directory
            for base, pattern, strategy in CORPUS_PATHS:
                if str(p).startswith(str(BRAIN_ROOT / base)):
                    files.append((p, strategy))
                    break
            else:
                files.append((p, 'h2'))
        return files

    for base, pattern, strategy in CORPUS_PATHS:
        base_path = BRAIN_ROOT / base
        if not base_path.exists():
            continue
        for p in sorted(base_path.glob(pattern)):
            if p in seen or not p.is_file():
                continue
            if should_exclude(p):
                continue
            if should_skip_by_zone(p):
                continue
            seen.add(p)
            files.append((p, strategy))

    return files

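The single-file branch relies on Python's for/else: the `else` runs only when no corpus base matched. The idiom in isolation (the corpus entries here are hypothetical):

```python
def pick_strategy(path: str, corpus: list[tuple[str, str]]) -> str:
    for base, strategy in corpus:
        if path.startswith(base):
            chosen = strategy
            break
    else:
        # Runs only when the loop completed without hitting break.
        chosen = 'h2'
    return chosen

corpus = [('profil/', 'whole'), ('projets/', 'h2')]
print(pick_strategy('profil/decisions/adr-1.md', corpus))  # whole
print(pick_strategy('notes/scratch.md', corpus))           # h2
```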
def run(dry_run: bool = False, target_file: str | None = None,
        stats_only: bool = False):

    conn = connect()

    if stats_only:
        total = conn.execute("SELECT COUNT(*) FROM embeddings").fetchone()[0]
        indexed = conn.execute("SELECT COUNT(*) FROM embeddings WHERE indexed=1").fetchone()[0]
        pending = total - indexed
        files_n = conn.execute("SELECT COUNT(DISTINCT filepath) FROM embeddings").fetchone()[0]
        print("Embeddings index:")
        print(f"  total chunks   : {total}")
        print(f"  indexed        : {indexed} ({100*indexed//total if total else 0}%)")
        print(f"  without vector : {pending}")
        print(f"  files          : {files_n}")
        print(f"  model          : {EMBED_MODEL} @ {OLLAMA_URL}")
        conn.close()
        return

    files = collect_files(target_file)
    print(f"Corpus: {len(files)} file(s) — model {EMBED_MODEL} @ {OLLAMA_URL}")

    # Probe Ollama before looping
    test_vec = get_embedding("connection test") if not dry_run else None
    ollama_ok = test_vec is not None
    if not ollama_ok and not dry_run:
        print("  ⚠️ Ollama unavailable — chunks stored without vectors (indexed=0)")

    total_chunks = 0
    total_indexed = 0

    for filepath, strategy in files:
        chunks = chunk_file(filepath, strategy)
        if not chunks:
            continue

        file_chunks = 0
        for chunk in chunks:
            chunk['scope'] = resolve_scope(chunk['filepath'])
            vec = None
            if ollama_ok and not dry_run:
                vec = get_embedding(chunk['text'])
                if vec:
                    total_indexed += 1

            upsert_chunk(conn, chunk, vec, dry_run=dry_run)
            total_chunks += 1
            file_chunks += 1

        rel = str(filepath.relative_to(BRAIN_ROOT))
        status = "✅" if ollama_ok else "⬜"
        print(f"  {status} {rel} — {file_chunks} chunk(s)")

    if not dry_run:
        conn.commit()

    print(f"\n{'[dry] ' if dry_run else ''}Chunks processed: {total_chunks}")
    if not dry_run:
        print(f"Vectors generated: {total_indexed}")
        if not ollama_ok:
            print("⚠️ Re-run with Ollama active to complete the index")

    conn.close()

# ── CLI ───────────────────────────────────────────────────────────────────────

def main():
    parser = argparse.ArgumentParser(description='brain-engine embed — BE-2c embeddings pipeline')
    parser.add_argument('--dry-run', action='store_true', help='List chunks without embedding them')
    parser.add_argument('--file', metavar='PATH', help='Re-index a specific file')
    parser.add_argument('--stats', action='store_true', help='Stats for the current index')
    args = parser.parse_args()

    run(dry_run=args.dry_run, target_file=args.file, stats_only=args.stats)


if __name__ == '__main__':
    main()

412
brain-engine/mcp_server.py
Normal file
@@ -0,0 +1,412 @@
#!/usr/bin/env python3
"""
brain-engine/mcp_server.py — BE-4 MCP Server
Exposes the brain as a native context source for Claude.

Transport: StreamableHTTP (MCP 1.x)
Port: 7701 (default) — distinct from the BaaS HTTP port (7700)
Auth: BRAIN_TOKEN_MCP in MYSECRETS → passed via the x-api-key header

Exposed tools:
  brain_search(query, top)  → semantic search (public + work zones)
  brain_boot()              → boot context (3 targeted queries)
  brain_workflows()         → active workflows (open BSI claims)
  brain_agents(name)        → list of agents, or one agent's content
  brain_decisions(last)     → latest architecture decision records (ADRs)
  brain_focus()             → current brain focus (direction + projects + blockers)
  brain_write(path, content)→ write a file into the brain via PUT /brain/{path}

Usage:
  python3 brain-engine/mcp_server.py                      → port 7701 (default)
  BRAIN_MCP_PORT=8000 python3 brain-engine/mcp_server.py

Claude Code connection:
  claude mcp add brain --transport http http://localhost:7701/mcp/

Auth in Claude Code:
  Settings → MCP → brain → Headers → x-api-key: <BRAIN_TOKEN_MCP>
"""

import os
import sys
import logging
from pathlib import Path

from mcp.server.fastmcp import FastMCP
from starlette.requests import Request
from starlette.responses import JSONResponse

sys.path.insert(0, str(Path(__file__).parent))
from rag import run_boot_queries, run_single_query, format_compact, format_full

# ── Config ─────────────────────────────────────────────────────────────────────

BRAIN_MCP_PORT = int(os.getenv('BRAIN_MCP_PORT', 7701))
BRAIN_TOKEN_MCP = os.getenv('BRAIN_TOKEN_MCP') or os.getenv('BRAIN_TOKEN')

# Scopes allowed for the MCP token
MCP_SCOPES = ['public', 'work']

logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s %(message)s')
log = logging.getLogger('brain-mcp')

# ── MCP Server ─────────────────────────────────────────────────────────────────

mcp = FastMCP(
    name='brain',
    instructions=(
        'Brain-as-a-Service — the brain\'s semantic memory. '
        'Use brain_search to find precise context on a topic. '
        'Use brain_boot at the start of a session to load the active context. '
        'Results are markdown file chunks ranked by relevance. '
        'Accessible zones: focus, todos, projects, agents, infrastructure.'
    ),
)

# ── Auth middleware ─────────────────────────────────────────────────────────────

class BrainAuthMiddleware:
    """
    ASGI wrapper — checks x-api-key before every MCP request.
    Note: Python dunders (__call__) are resolved on the class, not the instance.
    A real ASGI wrapper is required (instance monkey-patching does not work).
    """
    def __init__(self, app, token: str | None):
        self._app = app
        self._token = token

    async def __call__(self, scope, receive, send):
        if scope['type'] == 'http' and self._token:
            headers = dict(scope.get('headers', []))
            api_key = headers.get(b'x-api-key', b'').decode()
            if api_key != self._token:
                async def _send_401():
                    await send({'type': 'http.response.start', 'status': 401,
                                'headers': [(b'content-type', b'application/json')]})
                    await send({'type': 'http.response.body',
                                'body': b'{"error":"Unauthorized"}', 'more_body': False})
                await _send_401()
                return
        await self._app(scope, receive, send)


mcp_app = BrainAuthMiddleware(mcp.streamable_http_app(), BRAIN_TOKEN_MCP)

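The header check can be exercised without starting a server, by driving the middleware with a stub ASGI app (the class is re-declared here, slightly trimmed, for self-containment):

```python
import asyncio

class BrainAuthMiddleware:
    """Same shape as above: reject HTTP requests whose x-api-key does not match."""
    def __init__(self, app, token):
        self._app = app
        self._token = token

    async def __call__(self, scope, receive, send):
        if scope['type'] == 'http' and self._token:
            headers = dict(scope.get('headers', []))
            if headers.get(b'x-api-key', b'').decode() != self._token:
                await send({'type': 'http.response.start', 'status': 401, 'headers': []})
                await send({'type': 'http.response.body', 'body': b'{"error":"Unauthorized"}'})
                return
        await self._app(scope, receive, send)

async def inner_app(scope, receive, send):
    await send({'type': 'http.response.start', 'status': 200, 'headers': []})
    await send({'type': 'http.response.body', 'body': b'ok'})

async def probe(token_sent):
    events = []
    async def send(event):
        events.append(event)
    app = BrainAuthMiddleware(inner_app, 'secret')
    await app({'type': 'http', 'headers': [(b'x-api-key', token_sent)]}, None, send)
    return events[0]['status']

print(asyncio.run(probe(b'secret')))  # 200
print(asyncio.run(probe(b'wrong')))   # 401
```

Because the wrapper returns before calling the inner app on a bad key, the MCP endpoint never sees unauthenticated traffic.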
# ── MCP tools ──────────────────────────────────────────────────────────────────

@mcp.tool()
def brain_search(query: str, top: int = 5, full: bool = False) -> str:
    """
    Semantic search in the brain.

    Args:
        query : Natural-language question (e.g. "how does BSI v2 work?")
        top   : Number of results (default: 5, recommended max: 10)
        full  : True = full chunks, False = 120-char excerpts (default)

    Returns:
        Markdown block with the most relevant chunks, sorted by score.
        Each result shows the source filepath and a content excerpt.
    """
    log.info('brain_search query=%r top=%d full=%s', query, top, full)
    results = run_single_query(query, top_k=top, allowed_scopes=MCP_SCOPES)
    if not results:
        return f'No results for: {query!r}'
    label = f'brain_search — {query}'
    return format_full(results, label=label) if full else format_compact(results, label=label)

@mcp.tool()
def brain_state() -> str:
    """
    Fundamental brain environment — derived in real time, never stored.

    Returns the active services (pm2), the brain version (git), and the
    configured ports. Layer 2 only (localhost).

    Call it at the start of a session to know the infrastructure state
    without having to ask "which port? which service is running?".

    Returns:
        Structured markdown block with hostname, version, pm2 status, ports.
        "unavailable" if brain-engine is offline.
    """
    import json
    import urllib.request
    log.info('brain_state')
    try:
        with urllib.request.urlopen('http://127.0.0.1:7700/state', timeout=3) as resp:
            data = json.loads(resp.read())
        lines = ['## Fundamental environment\n']
        lines.append(f"**Machine** : {data.get('hostname', '?')}")
        lines.append(f"**Brain** : {data.get('brain_version', '?')}\n")
        pm2 = data.get('pm2', [])
        if pm2:
            lines.append('**Services (pm2)**')
            lines.append('| Name | Status | Restarts |')
            lines.append('|------|--------|----------|')
            for p in pm2:
                icon = '🟢' if p.get('status') == 'online' else '🔴'
                lines.append(f"| {p['name']} | {icon} {p.get('status','?')} | {p.get('restarts',0)} |")
        ports = data.get('ports', {})
        if ports:
            lines.append(f"\n**Ports** : engine={ports.get('brain_engine','?')} · mcp={ports.get('brain_mcp','?')} · key={ports.get('brain_key','?')}")
        return '\n'.join(lines)
    except Exception as exc:
        log.warning('brain_state failed: %s', exc)
        return f'Environment unavailable: {exc}'

@mcp.tool()
def brain_boot() -> str:
    """
    Load the brain's boot context.

    Sequence:
        1. brain/now.md   — guaranteed slot (pushed by the previous session)
        2. brain_state()  — derived fundamental environment (pm2, ports)
        3. 3 targeted RAG queries (recent decisions, priority todos, active sprint)

    Call it at the start of a session to enrich the context without
    saturating the context window. Exits silently if Ollama is unavailable.

    Returns:
        Additive markdown block with the full boot context.
    """
    log.info('brain_boot')
    sections = []

    # 1. Guaranteed slot — brain/now.md
    now_path = Path(__file__).parent.parent / 'brain' / 'now.md'
    if now_path.exists():
        try:
            content = now_path.read_text(encoding='utf-8').strip()
            if content:
                sections.append(content)
        except Exception:
            pass

    # 2. Derived environment
    env = brain_state()
    if env and 'unavailable' not in env:
        sections.append(env)

    # 3. RAG queries
    results = run_boot_queries(allowed_scopes=MCP_SCOPES)
    if results:
        sections.append(format_compact(results, label='brain_boot'))

    return '\n\n---\n\n'.join(sections) if sections else ''

@mcp.tool()
def brain_workflows() -> str:
    """
    Return the brain's active workflows (open BSI claims).

    Returns:
        Markdown block with the workflows in progress: name, project, steps, statuses.
        Useful at the start of a session to know the state of active sprints.
    """
    import json
    import urllib.request
    log.info('brain_workflows')
    try:
        url = 'http://127.0.0.1:7700/workflows'
        with urllib.request.urlopen(url, timeout=3) as resp:
            data = json.loads(resp.read())
        workflows = data.get('workflows', [])
        if not workflows:
            return 'No active workflow.'
        lines = ['## Active workflows\n']
        for wf in workflows:
            lines.append(f"### {wf.get('name', wf.get('id', '?'))} — {wf.get('project', '')}")
            for step in wf.get('steps', []):
                status = step.get('status', '?')
                icon = {'done': '✅', 'in-progress': '🔄', 'pending': '⬜',
                        'gate': '🔶', 'blocked': '🔴', 'fail': '❌'}.get(status, '•')
                gate = ' [GATE]' if step.get('isGate') else ''
                lines.append(f"  {icon} {step.get('label', step.get('id', '?'))}{gate}")
            lines.append('')
        return '\n'.join(lines)
    except Exception as exc:
        log.warning('brain_workflows failed: %s', exc)
        return f'Workflows unavailable: {exc}'

@mcp.tool()
def brain_agents(name: str = '') -> str:
    """
    Return the agents available in the brain.

    Args:
        name : Agent name (without the .md extension). If empty, returns the
               full list. Examples: "debug", "vps", "code-review".

    Returns:
        Agent list as a markdown table (name, status, context_tier, description)
        or the raw content of agents/{name}.md if name is given.
        Falls back to the filesystem if brain-engine is unavailable.
    """
    import json
    import urllib.request
    BRAIN_ROOT = Path(__file__).parent.parent
    log.info('brain_agents name=%r', name)

    if name:
        agent_path = BRAIN_ROOT / 'agents' / f'{name}.md'
        if not agent_path.exists():
            return f'Agent not found: agents/{name}.md'
        return agent_path.read_text(encoding='utf-8')

    # List via brain-engine
    try:
        with urllib.request.urlopen('http://127.0.0.1:7700/agents', timeout=3) as resp:
            data = json.loads(resp.read())
        agents = data.get('agents', data) if isinstance(data, dict) else data
        if not agents:
            return 'No agents found.'
        lines = ['## Available agents\n', '| Name | Status | Tier | Description |',
                 '|------|--------|------|-------------|']
        for ag in agents:
            nom = ag.get('name', ag.get('id', '?'))
            stat = ag.get('status', '—')
            tier = ag.get('context_tier', '—')
            desc = (ag.get('boot_summary') or ag.get('description') or '')[:80]
            lines.append(f'| {nom} | {stat} | {tier} | {desc} |')
        return '\n'.join(lines)
    except Exception as exc:
        log.warning('brain_agents HTTP failed, fallback filesystem: %s', exc)

    # Filesystem fallback
    agents_dir = BRAIN_ROOT / 'agents'
    if not agents_dir.exists():
        return 'agents/ directory not found.'
    files = sorted(agents_dir.glob('*.md'))
    if not files:
        return 'No agents found.'
    lines = ['## Available agents (filesystem)\n', '| Name |', '|------|']
    for f in files:
        lines.append(f'| {f.stem} |')
    return '\n'.join(lines)

@mcp.tool()
def brain_decisions(last: int = 5) -> str:
    """
    Return the latest architecture decision records (ADRs).

    Reads profil/decisions/*.md, sorted by name in descending order
    (numbering → most recent first).

    Args:
        last : Number of ADRs to return (default: 5).

    Returns:
        Markdown block with each ADR's number, title, status, date and a
        150-char summary. "No decisions found" if the directory is missing.
    """
    BRAIN_ROOT = Path(__file__).parent.parent
    log.info('brain_decisions last=%d', last)
    decisions_dir = BRAIN_ROOT / 'profil' / 'decisions'
    if not decisions_dir.exists():
        return 'No decisions found.'
    files = sorted(decisions_dir.glob('*.md'), reverse=True)[:last]
    if not files:
        return 'No decisions found.'
    lines = ['## Recent architecture decisions\n']
    for f in files:
        body = f.read_text(encoding='utf-8')
        # Extract the title (first "# ..." line)
        titre = next((l.lstrip('# ').strip() for l in body.splitlines() if l.startswith('#')), f.stem)
        # Extract status and date from the first lines (standard ADR format)
        statut = '—'
        date = '—'
        for line in body.splitlines():
            ll = line.lower()
            if ll.startswith('statut') or ll.startswith('status') or ll.startswith('- statut'):
                statut = line.split(':', 1)[-1].strip()
            if ll.startswith('date') or ll.startswith('- date'):
                date = line.split(':', 1)[-1].strip()
        # Summary: first non-heading, non-empty line, capped at 150 chars
        resume = ''
        for line in body.splitlines():
            if line.startswith('#') or not line.strip():
                continue
            resume = line.strip()[:150]
            break
        lines.append(f'### {f.stem} — {titre}')
        lines.append(f'**Status** : {statut} | **Date** : {date}')
        lines.append(f'{resume}')
        lines.append('')
    return '\n'.join(lines)

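The line-prefix parsing can be checked against a toy ADR body (a minimal sketch assuming the standard `Statut:`/`Date:` header lines; the ADR content is invented):

```python
body = """# ADR-012: Switch to WAL mode

Statut: accepted
Date: 2025-01-15

SQLite WAL mode lets readers proceed during writes.
"""

# Title: first "# ..." line, stripped of the heading markers.
titre = next((l.lstrip('# ').strip() for l in body.splitlines() if l.startswith('#')), '?')
statut, date = '—', '—'
for line in body.splitlines():
    ll = line.lower()
    if ll.startswith('statut') or ll.startswith('status'):
        statut = line.split(':', 1)[-1].strip()
    if ll.startswith('date'):
        date = line.split(':', 1)[-1].strip()

print(titre)   # ADR-012: Switch to WAL mode
print(statut)  # accepted
print(date)    # 2025-01-15
```

Matching on both `statut` and `status` keeps the parser tolerant of mixed French/English ADR headers.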
@mcp.tool()
def brain_focus() -> str:
    """
    Return the brain's current focus.

    Reads BRAIN_ROOT/focus.md and returns the raw content.
    Useful to know the active direction, ongoing projects and blockers.

    Returns:
        Full content of focus.md, or "focus.md not found".
    """
    BRAIN_ROOT = Path(__file__).parent.parent
    log.info('brain_focus')
    focus_path = BRAIN_ROOT / 'focus.md'
    if not focus_path.exists():
        return 'focus.md not found.'
    return focus_path.read_text(encoding='utf-8')

@mcp.tool()
def brain_write(path: str, content: str) -> str:
    """
    Write a file into the brain via PUT /brain/{path}.

    Reserved for owner sessions. Lets a Claude session with MCP enabled
    update any file in the brain.

    Args:
        path    : Relative path inside the brain (e.g. "focus.md", "todos/sprint.md").
        content : Full content of the file to write.

    Returns:
        JSON {"ok": true, "path": path} on success, an error message otherwise.
        403 → "Requires owner tier".
    """
    import json
    import urllib.request
    log.info('brain_write path=%r len=%d', path, len(content))
    url = f'http://127.0.0.1:7700/brain/{path}'
    payload = json.dumps({'content': content}).encode('utf-8')
    req = urllib.request.Request(
        url, data=payload, method='PUT',
        headers={'Content-Type': 'application/json'},
    )
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            body = resp.read()
        return json.dumps({'ok': True, 'path': path})
    except urllib.error.HTTPError as exc:
        if exc.code == 403:
            return 'Requires owner tier — write refused.'
        return f'Error {exc.code}: {exc.reason}'
    except Exception as exc:
        log.warning('brain_write failed: %s', exc)
        return f'brain_write unavailable: {exc}'

# ── Entrypoint ─────────────────────────────────────────────────────────────────

if __name__ == '__main__':
    import uvicorn
    auth_status = 'token active' if BRAIN_TOKEN_MCP else 'auth disabled (dev)'
    log.info('Brain MCP BE-4 — port %d — %s — scopes: %s',
             BRAIN_MCP_PORT, auth_status, MCP_SCOPES)
    uvicorn.run(mcp_app, host='0.0.0.0', port=BRAIN_MCP_PORT,
                forwarded_allow_ips='*', proxy_headers=True)

348
brain-engine/migrate.py
Normal file
@@ -0,0 +1,348 @@
#!/usr/bin/env python3
"""
brain-engine/migrate.py — BE-1 + BE-2b migration
Ingests the brain's existing sources into brain.db

Sources:
  - claims/*.yml              → claims table
  - BRAIN-INDEX.md ## Signals → signals table (markdown parsing)
  - handoffs/*.md             → handoffs table (frontmatter parsing)
  - claims → sessions         → sessions table (derived from claims, BE-2b)

Usage:
  python3 brain-engine/migrate.py [--dry-run] [--reset]

Anti-drift:
  - Read-only on the sources — never modifies the .md files
  - Idempotent — re-running does not duplicate data (UPSERT)
  - On a parsing error → warning + skip, no crash
"""

import sqlite3
import os
import re
import sys
import argparse
from datetime import datetime

BRAIN_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
DB_PATH = os.path.join(BRAIN_ROOT, 'brain.db')
SCHEMA_PATH = os.path.join(BRAIN_ROOT, 'brain-engine', 'schema.sql')


def connect(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA foreign_keys=ON")
    return conn


def init_schema(conn: sqlite3.Connection):
    with open(SCHEMA_PATH) as f:
        schema = f.read()
    conn.executescript(schema)
    conn.commit()
    print(f"✅ Schema initialized from {SCHEMA_PATH}")

def parse_yml_field(content: str, field: str, default=None) -> str:
    """Extract a simple YAML field (deliberately no full YAML parsing)."""
    m = re.search(rf'^{re.escape(field)}:\s*(.+)', content, re.MULTILINE)
    if m:
        return m.group(1).strip().strip('"\'')
    return default

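The regex extraction behaves like this on a claim snippet (standalone replica of `parse_yml_field`; the field values are invented):

```python
import re

def parse_yml_field(content, field, default=None):
    # Match "field: value" at the start of a line; strip quotes from the value.
    m = re.search(rf'^{re.escape(field)}:\s*(.+)', content, re.MULTILINE)
    if m:
        return m.group(1).strip().strip('"\'')
    return default

claim = 'sess_id: "sess-042"\nstatus: open\nscope: brain-engine\n'
print(parse_yml_field(claim, 'sess_id'))             # sess-042
print(parse_yml_field(claim, 'status'))              # open
print(parse_yml_field(claim, 'missing', 'default'))  # default
```

`re.escape` protects against field names containing regex metacharacters, and the `^` anchor with `re.MULTILINE` keeps nested/indented keys from matching.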
def migrate_claims(conn: sqlite3.Connection, dry_run: bool = False) -> int:
    """Migrate claims/*.yml → claims table."""
    claims_dir = os.path.join(BRAIN_ROOT, 'claims')
    if not os.path.isdir(claims_dir):
        print(f"⚠️ claims/ not found: {claims_dir}")
        return 0

    count = 0
    for filename in sorted(os.listdir(claims_dir)):
        if not filename.startswith('sess-') or not filename.endswith('.yml'):
            continue

        filepath = os.path.join(claims_dir, filename)
        try:
            with open(filepath) as f:
                content = f.read()
        except Exception as e:
            print(f"  ⚠️ (unknown): read error — {e}")
            continue

        # Handles v1 (name:) and v2 (sess_id:)
        sess_id = parse_yml_field(content, 'sess_id') or \
                  parse_yml_field(content, 'name', filename.replace('.yml', ''))
        scope = parse_yml_field(content, 'scope', '—')
        status = parse_yml_field(content, 'status', 'closed')
        opened_at = parse_yml_field(content, 'opened_at') or \
                    parse_yml_field(content, 'opened', '—')
        closed_at = parse_yml_field(content, 'closed_at') or \
                    parse_yml_field(content, 'closed')
        sess_type = parse_yml_field(content, 'type', 'brain')
        handoff_lvl = parse_yml_field(content, 'handoff_level')
        story_angle = parse_yml_field(content, 'story_angle')

        if not sess_id or sess_id == '—':
            print("  ⚠️ (unknown): sess_id not found — skipped")
            continue

        if not dry_run:
            conn.execute("""
                INSERT INTO claims(sess_id, type, scope, status, opened_at, closed_at,
                                   handoff_level, story_angle)
                VALUES (?,?,?,?,?,?,?,?)
                ON CONFLICT(sess_id) DO UPDATE SET
                    status=excluded.status,
                    closed_at=excluded.closed_at,
                    story_angle=excluded.story_angle
            """, (sess_id, sess_type, scope, status, opened_at, closed_at,
                  handoff_lvl, story_angle))
        else:
            print(f"  [dry] claim: {sess_id} | {status} | {scope}")

        count += 1

    if not dry_run:
        conn.commit()
    print(f"✅ Claims migrated: {count}")
    return count

def migrate_signals(conn: sqlite3.Connection, dry_run: bool = False) -> int:
    """Migrate ## Signals from BRAIN-INDEX.md → signals table."""
    index_path = os.path.join(BRAIN_ROOT, 'BRAIN-INDEX.md')
    if not os.path.exists(index_path):
        print("⚠️ BRAIN-INDEX.md not found")
        return 0

    with open(index_path) as f:
        content = f.read()

    # Extract the ## Signals section
    m = re.search(r'## Signals.*?\n(.*?)(?=\n##|\Z)', content, re.DOTALL)
    if not m:
        print("⚠️ ## Signals section not found in BRAIN-INDEX.md")
        return 0

    signals_section = m.group(1)

    # Parse the markdown table
    # Format: | sig_id | From | To | Type | Project | Payload | State |
    row_pattern = re.compile(
        r'^\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|\s*([^|]*?)\s*\|\s*([^|]*?)\s*\|\s*([^|]*?)\s*\|',
        re.MULTILINE
    )

    count = 0
    for m in row_pattern.finditer(signals_section):
        sig_id, from_sess, to_sess, sig_type, projet, payload, state = [
            v.strip() for v in m.groups()
        ]

        # Skip header and separator rows
        if sig_id.startswith('ID') or sig_id.startswith('-'):
            continue
        if not sig_id.startswith('sig-'):
            continue

        VALID_TYPES = {'READY_FOR_REVIEW', 'REVIEWED', 'BLOCKED_ON', 'HANDOFF', 'CHECKPOINT', 'INFO'}
        if sig_type not in VALID_TYPES:
            continue

        state = state.lower().strip()
        if state not in ('pending', 'delivered', 'archived'):
            state = 'delivered'

        if not dry_run:
            conn.execute("""
                INSERT INTO signals(sig_id, from_sess, to_sess, type, projet, payload, state, created_at)
                VALUES (?,?,?,?,?,?,?,?)
                ON CONFLICT(sig_id) DO UPDATE SET state=excluded.state
            """, (sig_id, from_sess, to_sess, sig_type, projet, payload, state,
                  datetime.now().isoformat()))
        else:
            print(f"  [dry] signal: {sig_id} | {sig_type} | {state}")

        count += 1

    if not dry_run:
        conn.commit()
    print(f"✅ Signals migrated: {count}")
    return count

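The row pattern and the header filtering can be checked against a two-row table (same regex as above; the signal row is invented):

```python
import re

row_pattern = re.compile(
    r'^\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|\s*([^|]*?)\s*\|\s*([^|]*?)\s*\|\s*([^|]*?)\s*\|',
    re.MULTILINE)

section = (
    "| ID | From | To | Type | Project | Payload | State |\n"
    "|----|------|----|------|---------|---------|-------|\n"
    "| sig-001 | sess-a | sess-b | HANDOFF | brain | see handoffs/x.md | pending |\n"
)

rows = [[v.strip() for v in m.groups()] for m in row_pattern.finditer(section)]
# Header and separator rows are filtered the same way as in migrate_signals:
# only cells that look like signal IDs survive.
data = [r for r in rows if r[0].startswith('sig-')]
print(data[0][0], data[0][3], data[0][6])  # sig-001 HANDOFF pending
```

The regex matches all three rows (including the `|----|` separator), so the `sig-` prefix check is what actually selects data rows.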
def migrate_handoffs(conn: sqlite3.Connection, dry_run: bool = False) -> int:
    """Migrate handoffs/*.md → handoffs table."""
    handoffs_dir = os.path.join(BRAIN_ROOT, 'handoffs')
    if not os.path.isdir(handoffs_dir):
        print(f"⚠️ handoffs/ not found: {handoffs_dir}")
        return 0

    count = 0
    for filename in sorted(os.listdir(handoffs_dir)):
        if not filename.endswith('.md') or filename.startswith('_'):
            continue

        filepath = os.path.join(handoffs_dir, filename)
        try:
            with open(filepath) as f:
                content = f.read()
        except Exception as e:
            print(f"  ⚠️ (unknown): read error — {e}")
            continue

        # Extract the frontmatter
        fm_match = re.match(r'^---\n(.*?)\n---', content, re.DOTALL)
        if not fm_match:
            continue

        fm = fm_match.group(1)
        htype = parse_yml_field(fm, 'type', 'HANDOFF')
        projet = parse_yml_field(fm, 'projet') or parse_yml_field(fm, 'project')
        status = parse_yml_field(fm, 'status', 'active')
        from_s = parse_yml_field(fm, 'from') or parse_yml_field(fm, 'source')
        created = parse_yml_field(fm, 'created') or parse_yml_field(fm, 'date',
                                                                    datetime.now().strftime('%Y-%m-%d'))

        if status not in ('active', 'consumed', 'archived'):
            status = 'active'

        if not dry_run:
            conn.execute("""
                INSERT INTO handoffs(filename, type, projet, status, from_sess, created_at)
                VALUES (?,?,?,?,?,?)
                ON CONFLICT(filename) DO UPDATE SET status=excluded.status
            """, (filename, htype, projet, status, from_s, created))
        else:
            print(f"  [dry] handoff: (unknown) | {status}")

        count += 1

    if not dry_run:
        conn.commit()
    print(f"✅ Handoffs migrated: {count}")
    return count

def migrate_sessions(conn: sqlite3.Connection, dry_run: bool = False) -> int:
|
||||
"""
|
||||
Peuple la table sessions depuis claims (BE-2b).
|
||||
|
||||
Stratégie : claims = sessions — chaque claim est une session brain.
|
||||
Les champs metabolism (tokens_used, duration_min, etc.) restent NULL
|
||||
jusqu'à ce que metabolism-scribe les alimente directement.
|
||||
|
||||
Mapping :
|
||||
claims.sess_id → sessions.sess_id
|
||||
claims.opened_at → sessions.date (partie date uniquement)
|
||||
claims.type → sessions.type
|
||||
claims.handoff_level → sessions.handoff_level
|
||||
claims.health_score → sessions.health_score (si présent dans yml)
|
||||
claims.cold_start_kpi_pass → sessions.cold_start_kpi_pass
|
||||
"""
|
||||
if dry_run:
|
||||
rows = conn.execute("SELECT COUNT(*) as n FROM claims").fetchone()
|
||||
print(f" [dry] sessions à créer depuis claims : {rows['n']}")
|
||||
return rows['n']
|
||||
|
||||
# UPSERT : ne pas écraser les champs metabolism déjà renseignés
|
||||
conn.execute("""
|
||||
INSERT INTO sessions(sess_id, date, type, handoff_level, health_score, cold_start_kpi_pass)
|
||||
SELECT
|
||||
c.sess_id,
|
||||
SUBSTR(c.opened_at, 1, 10) AS date,
|
||||
c.type,
|
||||
c.handoff_level,
|
||||
c.health_score,
|
||||
c.cold_start_kpi_pass
|
||||
FROM claims c
|
||||
WHERE TRUE
|
||||
ON CONFLICT(sess_id) DO UPDATE SET
|
||||
date = COALESCE(excluded.date, sessions.date),
|
||||
type = COALESCE(excluded.type, sessions.type),
|
||||
handoff_level = COALESCE(excluded.handoff_level, sessions.handoff_level),
|
||||
health_score = COALESCE(excluded.health_score, sessions.health_score),
|
||||
cold_start_kpi_pass = COALESCE(excluded.cold_start_kpi_pass, sessions.cold_start_kpi_pass)
|
||||
""")
|
||||
conn.commit()
|
||||
|
||||
count = conn.execute("SELECT COUNT(*) FROM sessions").fetchone()[0]
|
||||
kpi_row = conn.execute("""
|
||||
SELECT
|
||||
COUNT(*) as total,
|
||||
SUM(CASE WHEN cold_start_kpi_pass = 1 THEN 1 ELSE 0 END) as passes
|
||||
FROM sessions WHERE handoff_level = 'NO'
|
||||
""").fetchone()
|
||||
|
||||
print(f"✅ Sessions migrées : {count}")
|
||||
if kpi_row and kpi_row[0] > 0:
|
||||
print(f" cold_start KPI (handoff=NO) : {kpi_row[1]}/{kpi_row[0]} passes")
|
||||
return count
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(description='Brain state engine — migration BE-1 + BE-2b')
|
||||
parser.add_argument('--dry-run', action='store_true', help='Simulation sans écriture')
|
||||
parser.add_argument('--reset', action='store_true', help='Supprimer brain.db avant migration')
|
||||
parser.add_argument('--sessions-only', action='store_true', help='Rejouer uniquement migrate_sessions')
|
||||
args = parser.parse_args()
|
||||
|
||||
if args.reset and os.path.exists(DB_PATH):
|
||||
os.remove(DB_PATH)
|
||||
print(f"♻️ brain.db supprimé — reconstruction depuis zéro")
|
||||
|
||||
print(f"Brain root : {BRAIN_ROOT}")
|
||||
print(f"DB path : {DB_PATH}")
|
||||
print(f"Mode : {'DRY RUN' if args.dry_run else 'WRITE'}")
|
||||
print()
|
||||
|
||||
conn = connect(DB_PATH)
|
||||
init_schema(conn)
|
||||
|
||||
if args.sessions_only:
|
||||
print("\n── Sessions (replay) ───────────────────")
|
||||
migrate_sessions(conn, dry_run=args.dry_run)
|
||||
else:
|
||||
print("\n── Claims ──────────────────────────────")
|
||||
migrate_claims(conn, dry_run=args.dry_run)
|
||||
|
||||
print("\n── Signals ─────────────────────────────")
|
||||
migrate_signals(conn, dry_run=args.dry_run)
|
||||
|
||||
print("\n── Handoffs ────────────────────────────")
|
||||
migrate_handoffs(conn, dry_run=args.dry_run)
|
||||
|
||||
print("\n── Sessions ────────────────────────────")
|
||||
migrate_sessions(conn, dry_run=args.dry_run)
|
||||
|
||||
if not args.dry_run:
|
||||
# Vérification finale
|
||||
print("\n── Vérification ────────────────────────")
|
||||
for table in ('claims', 'signals', 'handoffs', 'agent_memory', 'sessions'):
|
||||
row = conn.execute(f"SELECT COUNT(*) as n FROM {table}").fetchone()
|
||||
print(f" {table:<15} : {row['n']} entrées")
|
||||
|
||||
print("\n── Vues ────────────────────────────────")
|
||||
row = conn.execute("SELECT * FROM v_open_claims").fetchall()
|
||||
print(f" v_open_claims : {len(row)} claim(s) open")
|
||||
row = conn.execute("SELECT * FROM v_stale_claims").fetchall()
|
||||
if row:
|
||||
print(f" ⚠️ v_stale_claims : {len(row)} claim(s) stale !")
|
||||
else:
|
||||
print(f" v_stale_claims : ✅ aucun stale")
|
||||
row = conn.execute("SELECT * FROM v_cold_start_kpi").fetchone()
|
||||
if row and row['total_no_handoff'] > 0:
|
||||
rate = row['pass_rate_pct'] or 0
|
||||
print(f" v_cold_start_kpi: {row['passes']}/{row['total_no_handoff']} passes ({rate:.0f}%)")
|
||||
|
||||
conn.close()
|
||||
print(f"\n✅ Migration terminée — brain.db prêt")
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
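migrate_handoffs relies on a `parse_yml_field` helper defined earlier in migrate.py, outside this hunk. As a rough sketch of what such a helper needs to do (a hypothetical minimal version, not the actual implementation):

```python
import re

def parse_yml_field(fm: str, key: str, default=None):
    """Hypothetical minimal lookup of a 'key: value' line in a frontmatter block."""
    m = re.search(rf'^{re.escape(key)}\s*:\s*(.+?)\s*$', fm, re.MULTILINE)
    return m.group(1).strip('"\'') if m else default

fm = "type: HANDOFF\nprojet: brain\nstatus: consumed"
print(parse_yml_field(fm, 'projet'))      # brain
print(parse_yml_field(fm, 'from', None))  # None
```

A line-based regex like this is enough for flat frontmatter; nested YAML would need PyYAML (already in requirements.txt).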
30  brain-engine/queries/cold-start-kpi.sql  Normal file
@@ -0,0 +1,30 @@

-- cold-start-kpi.sql — North Star KPI: productive with NO HANDOFF in under 2 min
-- Ref: brain-constitution.md §3
-- Usage: sqlite3 brain.db < brain-engine/queries/cold-start-kpi.sql

-- Global view
SELECT
    total_no_handoff,
    passes,
    pass_rate_pct || '%' AS pass_rate,
    CASE
        WHEN pass_rate_pct >= 80 THEN '✅ Layer 0 stable'
        WHEN pass_rate_pct >= 60 THEN '⚠️ Layer 0 needs watching'
        ELSE '🔴 Layer 0 insufficient: enrich brain-constitution.md'
    END AS verdict
FROM v_cold_start_kpi;

-- Per-session detail
SELECT
    sess_id,
    date,
    CASE cold_start_kpi_pass
        WHEN 1 THEN '✅ pass'
        WHEN 0 THEN '❌ fail'
        ELSE '— not measured'
    END AS kpi,
    notes
FROM sessions
WHERE handoff_level = 'NO'
ORDER BY date DESC
LIMIT 10;
16  brain-engine/queries/graduation-candidates.sql  Normal file
@@ -0,0 +1,16 @@

-- graduation-candidates.sql — L3a patterns ready to graduate to L3b (toolkit)
-- Usage: sqlite3 brain.db < brain-engine/queries/graduation-candidates.sql

SELECT
    agent,
    projet,
    stack,
    pattern_id,
    validations,
    seuil_graduation,
    ROUND(CAST(validations AS REAL) / seuil_graduation * 100) || '%' AS progress,
    last_validated
FROM agent_memory
WHERE graduated = 0
  AND validations >= seuil_graduation
ORDER BY validations DESC;
24  brain-engine/queries/metabolism-dashboard.sql  Normal file
@@ -0,0 +1,24 @@

-- metabolism-dashboard.sql — Brain health view over 7 days
-- Usage: sqlite3 brain.db < brain-engine/queries/metabolism-dashboard.sql

-- use-brain / build-brain ratio over 7 days
SELECT
    COUNT(*) AS sessions_7d,
    SUM(CASE WHEN type = 'build-brain' THEN 1 ELSE 0 END) AS build_brain,
    SUM(CASE WHEN type = 'use-brain' THEN 1 ELSE 0 END) AS use_brain,
    ROUND(
        CAST(SUM(CASE WHEN type='use-brain' THEN 1 ELSE 0 END) AS REAL) /
        NULLIF(SUM(CASE WHEN type='build-brain' THEN 1 ELSE 0 END), 0),
        2) AS ratio_use_build,
    ROUND(AVG(health_score), 2) AS avg_health_score,
    CASE
        WHEN ROUND(CAST(SUM(CASE WHEN type='use-brain' THEN 1 ELSE 0 END) AS REAL) /
                   NULLIF(SUM(CASE WHEN type='build-brain' THEN 1 ELSE 0 END), 0), 2) >= 1.0
             THEN '✅ balanced'
        WHEN ROUND(CAST(SUM(CASE WHEN type='use-brain' THEN 1 ELSE 0 END) AS REAL) /
                   NULLIF(SUM(CASE WHEN type='build-brain' THEN 1 ELSE 0 END), 0), 2) >= 0.5
             THEN '⚠️ needs watching'
        ELSE '🔴 narcissistic loop'
    END AS verdict
FROM sessions
WHERE date >= date('now', '-7 days');
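The CASE ladder in that query maps the use/build ratio to a verdict. The same thresholds expressed in Python, for illustration only (the SQL view stays the source of truth; the English verdict strings here are an assumption, the repo's own messages are French):

```python
def metabolism_verdict(use_brain: int, build_brain: int) -> str:
    # Mirrors the SQL CASE, including the NULLIF edge case:
    # with zero build-brain sessions the ratio is NULL and falls through to ELSE.
    if build_brain == 0:
        return '🔴 narcissistic loop'
    ratio = round(use_brain / build_brain, 2)
    if ratio >= 1.0:
        return '✅ balanced'
    if ratio >= 0.5:
        return '⚠️ needs watching'
    return '🔴 narcissistic loop'

print(metabolism_verdict(5, 4))  # ✅ balanced
print(metabolism_verdict(1, 4))  # 🔴 narcissistic loop
```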
12  brain-engine/queries/stale-claims.sql  Normal file
@@ -0,0 +1,12 @@

-- stale-claims.sql — Claims open for more than 4 hours
-- Usage: sqlite3 brain.db < brain-engine/queries/stale-claims.sql

SELECT
    sess_id,
    scope,
    opened_at,
    ROUND((julianday('now') - julianday(opened_at)) * 24, 1) AS age_hours
FROM claims
WHERE status = 'open'
  AND julianday('now') > julianday(opened_at, '+4 hours')
ORDER BY age_hours DESC;
190  brain-engine/rag.py  Normal file
@@ -0,0 +1,190 @@

#!/usr/bin/env python3
"""
brain-engine/rag.py — BE-3a RAG layer
Enriches Claude's boot context with additive chunks (not redundant with helloWorld).

Usage:
    python3 brain-engine/rag.py                  → boot queries (3 targeted, skips helloWorld files)
    python3 brain-engine/rag.py "custom query"   → ad-hoc query (compact)
    python3 brain-engine/rag.py "query" --full   → full chunks
    python3 brain-engine/rag.py --json           → raw JSON (boot)
    python3 brain-engine/rag.py "query" --json   → raw JSON (ad-hoc)
    python3 brain-engine/rag.py "query" --top 10 → top-10 results

Output: a markdown block ready to inject into Claude's context.
Silent when there are no results or Ollama is unavailable (exit 0).
"""

import sys
import json
import argparse
from pathlib import Path

# Import search from the same directory
sys.path.insert(0, str(Path(__file__).parent))
from search import search as semantic_search


# ── Config ─────────────────────────────────────────────────────────────────────

# Files already loaded by helloWorld — skipped in boot results
# to avoid duplicating context that is already present.
HELLOWORLD_SKIP = frozenset({
    'focus.md',
    'KERNEL.md',
    'BRAIN-INDEX.md',
    'agents/helloWorld.md',
    'agents/secrets-guardian.md',
    'agents/coach.md',
    'profil/collaboration.md',
})

# Targeted boot queries — surface what helloWorld does not load.
# Each tuple: (query, top_k)
RAG_BOOT_QUERIES = [
    ("décisions architecturales récentes", 3),   # ADRs, architecture choices
    ("todos prioritaires backlog actif", 3),     # todo/*.md beyond the README
    ("sprint en cours workspace actif", 2),      # workspace/shadow-*/
]

# Minimum score at boot — avoids noise from low-relevance chunks
BOOT_MIN_SCORE = 0.30


# ── Core ───────────────────────────────────────────────────────────────────────

def run_boot_queries(allowed_scopes: list[str] | None = None) -> list[dict]:
    """
    Runs the 3 boot queries in sequence.
    Deduplicates by filepath and filters out helloWorld files.
    Keeps the source query in the '_query' field for formatting.
    """
    seen_filepaths: set[str] = set()
    results: list[dict] = []

    for query, top_k in RAG_BOOT_QUERIES:
        hits = semantic_search(query, top_k=top_k, min_score=BOOT_MIN_SCORE,
                               allowed_scopes=allowed_scopes)
        for hit in hits:
            fp = hit['filepath']
            if fp in HELLOWORLD_SKIP:
                continue
            if fp in seen_filepaths:
                continue
            seen_filepaths.add(fp)
            results.append({**hit, '_query': query})

    return results


def run_single_query(query: str, top_k: int = 5,
                     allowed_scopes: list[str] | None = None) -> list[dict]:
    """Ad-hoc query: no helloWorld skip, no cross-query deduplication."""
    hits = semantic_search(query, top_k=top_k, min_score=0.0,
                           allowed_scopes=allowed_scopes)
    return [{**h, '_query': query} for h in hits]


# ── Formatting ─────────────────────────────────────────────────────────────────

def format_compact(results: list[dict], label: str = 'RAG boot') -> str:
    """
    Format A (default): filepath + 120-char excerpt.
    ~100 tokens per chunk, lean enough for boot injection.
    """
    if not results:
        return ''

    lines = [f'## Brain context ({label})\n']
    current_query: str | None = None

    for r in results:
        q = r.get('_query', '')
        if q and q != current_query:
            current_query = q
            lines.append(f'\n### {q}\n')

        fp = r['filepath']
        score = r['score']
        title = r.get('title') or ''
        excerpt = r['chunk_text'].replace('\n', ' ')[:120].strip()
        if title:
            excerpt = f'[{title}] {excerpt}'

        lines.append(f'- `{fp}` *(score: {score:.2f})* — {excerpt}…\n')

    return ''.join(lines)


def format_full(results: list[dict], label: str = 'RAG — full') -> str:
    """
    Format B (--full): full chunks.
    For deep ad-hoc queries where the excerpt is not enough.
    """
    if not results:
        return ''

    lines = [f'## Brain context ({label})\n']
    for r in results:
        fp = r['filepath']
        score = r['score']
        title = r.get('title') or ''
        chunk = r['chunk_text']

        header = f'### `{fp}`'
        if title:
            header += f' — {title}'
        header += f' *(score: {score:.2f})*'

        lines.append(f'\n{header}\n\n{chunk}\n')

    return ''.join(lines)


def format_json(results: list[dict]) -> str:
    out = [{
        'score': round(r['score'], 4),
        'filepath': r['filepath'],
        'title': r.get('title') or '',
        'chunk_text': r['chunk_text'],
        'query': r.get('_query', ''),
    } for r in results]
    return json.dumps(out, ensure_ascii=False, indent=2)


# ── CLI ────────────────────────────────────────────────────────────────────────

def main():
    parser = argparse.ArgumentParser(description='brain-engine RAG — BE-3a')
    parser.add_argument('query', nargs='?',
                        help='Ad-hoc query (no arg = boot mode)')
    parser.add_argument('--full', action='store_true',
                        help='Full chunks (default: compact)')
    parser.add_argument('--top', type=int, default=5,
                        help='Top-K for ad-hoc queries (default: 5)')
    parser.add_argument('--json', action='store_true',
                        help='Raw JSON output')
    args = parser.parse_args()

    # Boot mode when no query is given
    if not args.query:
        results = run_boot_queries()
        label = 'RAG boot'
    else:
        results = run_single_query(args.query, top_k=args.top)
        label = f'RAG — {args.query}'

    # Silent when there are no results: do not pollute the context
    if not results:
        sys.exit(0)

    if args.json:
        print(format_json(results))
    elif args.full:
        print(format_full(results, label=label))
    else:
        print(format_compact(results, label=label))


if __name__ == '__main__':
    main()
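run_boot_queries deduplicates by filepath across queries while skipping helloWorld files. The core of that pattern, isolated as a standalone sketch with fabricated hits:

```python
def dedupe_hits(batches, skip=frozenset()):
    # batches: iterable of (query, hits); keeps the first hit per filepath,
    # tagging each kept hit with the query that surfaced it.
    seen, out = set(), []
    for query, hits in batches:
        for hit in hits:
            fp = hit['filepath']
            if fp in skip or fp in seen:
                continue
            seen.add(fp)
            out.append({**hit, '_query': query})
    return out

batches = [
    ('q1', [{'filepath': 'a.md'}, {'filepath': 'KERNEL.md'}]),
    ('q2', [{'filepath': 'a.md'}, {'filepath': 'b.md'}]),
]
print(dedupe_hits(batches, skip=frozenset({'KERNEL.md'})))
# [{'filepath': 'a.md', '_query': 'q1'}, {'filepath': 'b.md', '_query': 'q2'}]
```

First-seen-wins ordering means the boot query list doubles as a priority order: earlier queries claim a filepath before later ones can.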
6  brain-engine/requirements.txt  Normal file
@@ -0,0 +1,6 @@

fastapi>=0.110.0
uvicorn[standard]>=0.29.0
mcp[cli]>=1.0.0
PyYAML>=6.0
umap-learn>=0.5.6
numpy>=1.26.0
188  brain-engine/schema.sql  Normal file
@@ -0,0 +1,188 @@

-- brain-engine/schema.sql — Brain State Engine (BE-1)
-- Source of truth: the .md files remain sovereign.
-- This schema is a QUERYABLE INDEX derived from the files.
-- brain.db = read-only over the brain — never writes back to the .md files.
--
-- Ref: ADR-012 (L3a), ADR-011 (autonomy), workspace/brain-engine/vision.md
-- Migration: brain-engine/migrate.py

PRAGMA journal_mode=WAL;  -- Concurrent reads safe (multi-session)
PRAGMA foreign_keys=ON;

-- ── BSI claims ───────────────────────────────────────────────────────────────
-- ADR-036: BSI source of truth — claims/*.yml migrate here
CREATE TABLE IF NOT EXISTS claims (
  sess_id             TEXT PRIMARY KEY,
  type                TEXT NOT NULL,                 -- brainstorm | work | deploy | debug | coach | brain
  scope               TEXT NOT NULL,                 -- e.g. brain/memory-sql
  status              TEXT NOT NULL DEFAULT 'open',  -- open | closed | stale
  opened_at           TEXT NOT NULL,                 -- ISO8601
  closed_at           TEXT,                          -- ISO8601, null while still open
  handoff_level       TEXT,                          -- NO | SEMI | SEMI+ | FULL
  story_angle         TEXT,                          -- optional narrative angle
  health_score        REAL,                          -- filled by metabolism-scribe at close
  context_at_close    INTEGER,                       -- % of context used at close
  cold_start_kpi_pass INTEGER,                       -- 1=true 0=false NULL=not measured
  -- BSI v3 fields (ADR-036)
  ttl_hours           INTEGER DEFAULT 4,             -- default TTL for deep work
  expires_at          TEXT,                          -- ISO8601, computed at boot
  instance            TEXT,                          -- brain_name@machine
  parent_sess         TEXT,                          -- parent satellite
  satellite_type      TEXT,                          -- code|brain-write|test|deploy|search|domain
  satellite_level     TEXT,                          -- leaf|domain
  theme_branch        TEXT,                          -- theme/<name>
  zone                TEXT,                          -- kernel|project|personal (inferred)
  mode                TEXT,                          -- rendering|pilote|etc.
  result_status       TEXT,                          -- success|partial|fail
  result_json         TEXT,                          -- {files_modified, tests, children, signal_id}
  CHECK(status IN ('open', 'closed', 'stale')),
  -- NB: "IN (..., NULL)" never rejects anything in SQLite; allow NULL explicitly instead
  CHECK(handoff_level IS NULL OR handoff_level IN ('NO', 'SEMI', 'SEMI+', 'FULL'))
);

-- ── BSI locks (ADR-036 — formerly file-lock.sh) ─────────────────────────────
CREATE TABLE IF NOT EXISTS locks (
  id         INTEGER PRIMARY KEY AUTOINCREMENT,
  filepath   TEXT NOT NULL UNIQUE,                     -- normalized path (e.g. agents/foo.md)
  holder     TEXT NOT NULL,                            -- holding sess_id
  claimed_at TEXT NOT NULL DEFAULT (datetime('now')),  -- ISO8601
  expires_at TEXT NOT NULL,                            -- ISO8601
  ttl_min    INTEGER NOT NULL DEFAULT 60
);

-- ── BSI circuit breaker (ADR-036) ───────────────────────────────────────────
CREATE TABLE IF NOT EXISTS circuit_breaker (
  sess_id      TEXT PRIMARY KEY,
  fail_count   INTEGER NOT NULL DEFAULT 0,
  last_fail_at TEXT,                                   -- ISO8601
  updated_at   TEXT NOT NULL DEFAULT (datetime('now'))
);

-- ── Inter-session signals ─────────────────────────────────────────────────────
CREATE TABLE IF NOT EXISTS signals (
  sig_id       TEXT PRIMARY KEY,                 -- sig-YYYYMMDD-<seq>
  from_sess    TEXT,                             -- source sess_id
  to_sess      TEXT NOT NULL,                    -- target sess_id or brain_name@machine
  type         TEXT NOT NULL,                    -- READY_FOR_REVIEW | REVIEWED | BLOCKED_ON | HANDOFF | CHECKPOINT | INFO
  projet       TEXT,
  payload      TEXT,                             -- description or handoff file path
  state        TEXT NOT NULL DEFAULT 'pending',  -- pending | delivered | archived
  created_at   TEXT NOT NULL,                    -- ISO8601
  delivered_at TEXT,
  CHECK(type IN ('READY_FOR_REVIEW','REVIEWED','BLOCKED_ON','HANDOFF','CHECKPOINT','INFO')),
  CHECK(state IN ('pending','delivered','archived'))
);

-- ── Handoffs ──────────────────────────────────────────────────────────────────
CREATE TABLE IF NOT EXISTS handoffs (
  filename    TEXT PRIMARY KEY,                  -- handoffs/<name>.md
  type        TEXT,                              -- CHECKPOINT | HANDOFF | FEEDBACK
  projet      TEXT,
  status      TEXT NOT NULL DEFAULT 'active',    -- active | consumed | archived
  from_sess   TEXT,
  consumed_by TEXT,                              -- sess_id that consumed this handoff
  created_at  TEXT NOT NULL,
  consumed_at TEXT,
  CHECK(status IN ('active','consumed','archived'))
);

-- ── L3a agent memory ──────────────────────────────────────────────────────────
-- Fed by metabolism-scribe via kpi.yml in agent-memory/<agent>/<projet>/
CREATE TABLE IF NOT EXISTS agent_memory (
  id               INTEGER PRIMARY KEY AUTOINCREMENT,
  agent            TEXT NOT NULL,                -- e.g. tech-lead, debug, vps
  projet           TEXT NOT NULL,                -- project slug
  stack            TEXT NOT NULL,                -- e.g. node-express-jwt
  pattern_id       TEXT NOT NULL,                -- pattern slug
  validations      INTEGER NOT NULL DEFAULT 0,   -- sessions in which the pattern was validated
  kpi_score        REAL NOT NULL DEFAULT 0.0,    -- 0.0 → 1.0
  graduated        INTEGER NOT NULL DEFAULT 0,   -- 0=false 1=true (→ L3b toolkit)
  seuil_graduation INTEGER NOT NULL DEFAULT 3,
  last_validated   TEXT,                         -- ISO8601
  notes            TEXT,
  created_at       TEXT NOT NULL DEFAULT (datetime('now')),
  updated_at       TEXT NOT NULL DEFAULT (datetime('now')),
  UNIQUE(agent, projet, stack, pattern_id)
);

-- ── Metabolism sessions ───────────────────────────────────────────────────────
CREATE TABLE IF NOT EXISTS sessions (
  sess_id             TEXT PRIMARY KEY,
  date                TEXT NOT NULL,
  type                TEXT,         -- build-brain | use-brain | auto
  mode                TEXT,
  handoff_level       TEXT,
  tokens_used         INTEGER,
  context_peak_pct    INTEGER,
  context_at_close    INTEGER,
  duration_min        INTEGER,
  commits             INTEGER,
  todos_closed        INTEGER,
  saturation_flag     INTEGER,      -- 0/1
  health_score        REAL,
  cold_start_kpi_pass INTEGER,      -- 0/1/NULL
  notes               TEXT
);

-- ── Agents loaded per session ────────────────────────────────────────────────
CREATE TABLE IF NOT EXISTS agent_loads (
  id               INTEGER PRIMARY KEY AUTOINCREMENT,
  sess_id          TEXT NOT NULL REFERENCES claims(sess_id),
  agent            TEXT NOT NULL,
  tokens_estimated INTEGER,
  loaded_at        TEXT NOT NULL DEFAULT (datetime('now')),
  reason           TEXT             -- why it was loaded
);

-- ── Utility views ────────────────────────────────────────────────────────────

CREATE VIEW IF NOT EXISTS v_open_claims AS
SELECT sess_id, scope, opened_at,
       ROUND((julianday('now') - julianday(opened_at)) * 24, 1) AS age_hours
FROM claims
WHERE status = 'open'
ORDER BY opened_at DESC;

CREATE VIEW IF NOT EXISTS v_stale_claims AS
SELECT sess_id, scope, opened_at,
       ROUND((julianday('now') - julianday(opened_at)) * 24, 1) AS age_hours
FROM claims
WHERE status = 'open'
  AND julianday('now') > julianday(opened_at, '+4 hours')
ORDER BY age_hours DESC;

CREATE VIEW IF NOT EXISTS v_active_locks AS
SELECT filepath, holder, claimed_at, expires_at,
       CASE WHEN julianday('now') < julianday(expires_at) THEN 'active' ELSE 'expired' END AS lock_status
FROM locks
ORDER BY claimed_at DESC;

CREATE VIEW IF NOT EXISTS v_graduation_candidates AS
SELECT agent, projet, stack, pattern_id, validations, kpi_score,
       ROUND(CAST(validations AS REAL) / seuil_graduation, 2) AS progress
FROM agent_memory
WHERE graduated = 0
  AND validations >= seuil_graduation
ORDER BY validations DESC;

CREATE VIEW IF NOT EXISTS v_cold_start_kpi AS
SELECT
  COUNT(*) AS total_no_handoff,
  SUM(CASE WHEN cold_start_kpi_pass = 1 THEN 1 ELSE 0 END) AS passes,
  ROUND(
    100.0 * SUM(CASE WHEN cold_start_kpi_pass = 1 THEN 1 ELSE 0 END)
    / NULLIF(SUM(CASE WHEN cold_start_kpi_pass IS NOT NULL THEN 1 ELSE 0 END), 0),
    1) AS pass_rate_pct
FROM sessions
WHERE handoff_level = 'NO';

CREATE VIEW IF NOT EXISTS v_metabolism_7d AS
SELECT
  date,
  type,
  AVG(health_score) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS health_7d_avg,
  SUM(CASE WHEN type='build-brain' THEN 1 ELSE 0 END)
    OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS build_7d,
  SUM(CASE WHEN type='use-brain' THEN 1 ELSE 0 END)
    OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS use_7d
FROM sessions
ORDER BY date DESC;
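The claims table leans on CHECK constraints to keep enum-like columns clean. A minimal in-memory exercise of that idea (a simplified demo table, not the full schema; note that an IN list containing NULL would never reject anything, so NULL must be allowed with an explicit IS NULL):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE claims_demo (
        sess_id TEXT PRIMARY KEY,
        handoff_level TEXT,
        CHECK(handoff_level IS NULL OR handoff_level IN ('NO','SEMI','SEMI+','FULL'))
    )
""")
conn.execute("INSERT INTO claims_demo VALUES ('s1', 'NO')")
conn.execute("INSERT INTO claims_demo VALUES ('s2', NULL)")  # NULL is allowed
try:
    conn.execute("INSERT INTO claims_demo VALUES ('s3', 'BOGUS')")
except sqlite3.IntegrityError:
    print("rejected")  # prints "rejected"
```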
227
brain-engine/search.py
Normal file
227
brain-engine/search.py
Normal file
@@ -0,0 +1,227 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
brain-engine/search.py — Recherche sémantique BE-2d
|
||||
Embed une query → cosine similarity sur brain.db → top-K chunks
|
||||
|
||||
Usage :
|
||||
python3 brain-engine/search.py "décisions archi SuperOAuth"
|
||||
python3 brain-engine/search.py "cold start" --top 10
|
||||
python3 brain-engine/search.py "agents helloWorld" --mode file
|
||||
python3 brain-engine/search.py "sessions metabolism" --mode json
|
||||
|
||||
Modes :
|
||||
human (défaut) → tableau lisible : score | filepath | extrait
|
||||
file → filepaths dédupliqués, triés par score (pour Claude : charger ces fichiers)
|
||||
json → JSON brut : [{score, filepath, title, chunk_text}]
|
||||
|
||||
Headless : zéro dépendance display/Wayland.
|
||||
OLLAMA_URL : variable d'env (défaut localhost:11434).
|
||||
"""
|
||||
|
||||
import os
|
||||
import sys
|
||||
import json
|
||||
import struct
|
||||
import argparse
|
||||
import sqlite3
|
||||
import urllib.request
|
||||
import urllib.error
|
||||
from pathlib import Path
|
||||
|
||||
BRAIN_ROOT = Path(__file__).parent.parent
|
||||
DB_PATH = BRAIN_ROOT / 'brain.db'
|
||||
OLLAMA_URL = os.getenv('OLLAMA_URL', 'http://localhost:11434')
|
||||
EMBED_MODEL = os.getenv('EMBED_MODEL', 'nomic-embed-text')
|
||||
|
||||
# Guardrail — cohérent avec embed.py
|
||||
_BLOCKED_MODELS = ['mistral', 'qwen', 'llama', 'gemma', 'phi', 'deepseek']
|
||||
if any(b in EMBED_MODEL.lower() for b in _BLOCKED_MODELS):
|
||||
sys.exit(f"❌ EMBED_MODEL='{EMBED_MODEL}' interdit — utiliser nomic-embed-text ou mxbai-embed-large")
|
||||
|
||||
|
||||
# ── Maths ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
def cosine_sim(a: list[float], b: list[float]) -> float:
|
||||
dot = sum(x * y for x, y in zip(a, b))
|
||||
norm_a = sum(x * x for x in a) ** 0.5
|
||||
norm_b = sum(x * x for x in b) ** 0.5
|
||||
if norm_a == 0.0 or norm_b == 0.0:
|
||||
return 0.0
|
||||
return dot / (norm_a * norm_b)
|
||||
|
||||
|
||||
def blob_to_vector(blob: bytes) -> list[float]:
|
||||
n = len(blob) // 4
|
||||
return list(struct.unpack(f'{n}f', blob))
|
||||
|
||||
|
||||
# ── Ollama ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
def embed_query(text: str) -> list[float] | None:
|
||||
url = f"{OLLAMA_URL}/api/embeddings"
|
||||
payload = json.dumps({"model": EMBED_MODEL, "prompt": text}).encode()
|
||||
req = urllib.request.Request(url, data=payload,
|
||||
headers={"Content-Type": "application/json"})
|
||||
try:
|
||||
with urllib.request.urlopen(req, timeout=30) as resp:
|
||||
data = json.loads(resp.read())
|
||||
return data.get('embedding')
|
||||
except (urllib.error.URLError, TimeoutError) as e:
|
||||
print(f"❌ Ollama indisponible ({OLLAMA_URL}) : {e}", file=sys.stderr)
|
||||
return None
|
||||
|
||||
|
||||
# ── SQLite ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
def load_vectors(conn: sqlite3.Connection,
|
||||
allowed_scopes: list[str] | None = None,
|
||||
include_historical: bool = False) -> list[dict]:
|
||||
"""Charge les chunks indexés depuis brain.db, filtrés par scope si fourni.
|
||||
Shadow indexing (ADR-037) : scope='historical' exclu par défaut."""
|
||||
historical_filter = "" if include_historical else "AND scope != 'historical'"
|
||||
if allowed_scopes:
|
||||
placeholders = ','.join('?' * len(allowed_scopes))
|
||||
rows = conn.execute(f"""
|
||||
SELECT chunk_id, filepath, title, chunk_text, vector
|
||||
FROM embeddings
|
||||
WHERE indexed = 1 AND vector IS NOT NULL
|
||||
AND scope IN ({placeholders})
|
||||
{historical_filter}
|
||||
""", allowed_scopes).fetchall()
|
||||
else:
|
||||
rows = conn.execute(f"""
|
||||
SELECT chunk_id, filepath, title, chunk_text, vector
|
||||
FROM embeddings
|
||||
WHERE indexed = 1 AND vector IS NOT NULL
|
||||
{historical_filter}
|
||||
""").fetchall()
|
||||
result = []
|
||||
for row in rows:
|
||||
result.append({
|
||||
'chunk_id': row['chunk_id'],
|
||||
'filepath': row['filepath'],
|
||||
'title': row['title'] or '',
|
||||
'chunk_text': row['chunk_text'],
|
||||
'vector': blob_to_vector(row['vector']),
|
||||
})
|
||||
return result
|
||||
|
||||
|
||||
# ── Search ─────────────────────────────────────────────────────────────────────
|
||||
|
||||
def search(query: str, top_k: int = 5, min_score: float = 0.0,
|
||||
allowed_scopes: list[str] | None = None) -> list[dict]:
|
||||
"""Retourne les top-K chunks les plus proches de la query."""
|
||||
# 1. Embed la query
|
||||
q_vec = embed_query(query)
|
||||
if q_vec is None:
|
||||
return []
|
||||
|
||||
# 2. Charger les vecteurs (filtrés par scope si fourni)
|
||||
conn = sqlite3.connect(DB_PATH)
|
||||
conn.row_factory = sqlite3.Row
|
||||
chunks = load_vectors(conn, allowed_scopes=allowed_scopes)
|
||||
conn.close()
|
||||
|
||||
if not chunks:
|
||||
print("⚠️ Index vide — lancer embed.py d'abord", file=sys.stderr)
|
||||
return []
|
||||
|
||||
# 3. Cosine similarity
|
||||
scored = []
|
||||
for chunk in chunks:
|
||||
score = cosine_sim(q_vec, chunk['vector'])
|
||||
if score >= min_score:
|
||||
scored.append({**chunk, 'score': score})
|
||||
|
||||
# 4. Trier, dédupliquer par chunk_id (déjà unique), retourner top-K
|
||||
scored.sort(key=lambda x: x['score'], reverse=True)
|
||||
top_results = scored[:top_k]
|
||||
|
||||
# 5. Tracking V1 (ADR-037) — hit_count + last_queried_at sur les chunks retournés
|
||||
if top_results:
|
||||
try:
|
||||
track_conn = sqlite3.connect(DB_PATH)
|
||||
chunk_ids = [r['chunk_id'] for r in top_results if r.get('chunk_id')]
|
||||
if chunk_ids:
|
||||
placeholders = ','.join('?' * len(chunk_ids))
|
||||
track_conn.execute(f"""
|
||||
UPDATE embeddings
|
||||
SET hit_count = COALESCE(hit_count, 0) + 1,
|
||||
last_queried_at = datetime('now')
|
||||
            WHERE chunk_id IN ({placeholders})
        """, chunk_ids)
        track_conn.commit()
        track_conn.close()
    except Exception:
        pass  # tracking is best-effort — never breaks search

    return top_results


# ── Output ─────────────────────────────────────────────────────────────────────


def print_human(results: list[dict], query: str):
    if not results:
        print(f"No results for: {query!r}")
        return
    print(f"\nQuery: {query!r} ({len(results)} result(s))\n")
    print(f"{'Score':>6} {'File':<50} Excerpt")
    print("─" * 100)
    for r in results:
        score = f"{r['score']:.3f}"
        fp = r['filepath']
        if len(fp) > 50:
            fp = '…' + fp[-49:]
        title = r['title']
        excerpt = r['chunk_text'].replace('\n', ' ')[:80]
        if title:
            excerpt = f"[{title}] {excerpt}"
        print(f"{score:>6} {fp:<50} {excerpt}")
    print()


def print_files(results: list[dict]):
    """Deduplicated filepaths, ordered by best score."""
    seen = []
    for r in results:
        if r['filepath'] not in seen:
            seen.append(r['filepath'])
    for fp in seen:
        print(fp)


def print_json(results: list[dict]):
    out = [{
        'score': round(r['score'], 4),
        'filepath': r['filepath'],
        'title': r['title'],
        'chunk_text': r['chunk_text'],
    } for r in results]
    print(json.dumps(out, ensure_ascii=False, indent=2))


# ── CLI ────────────────────────────────────────────────────────────────────────


def main():
    parser = argparse.ArgumentParser(description='brain-engine search — BE-2d')
    parser.add_argument('query', help='Natural-language query')
    parser.add_argument('--top', type=int, default=5, help='Number of results (default: 5)')
    parser.add_argument('--mode', choices=['human', 'file', 'json'], default='human',
                        help='Output format (default: human)')
    parser.add_argument('--min-score', type=float, default=0.0,
                        help='Minimum cosine score (0.0–1.0, default: 0.0)')
    args = parser.parse_args()

    results = search(args.query, top_k=args.top, min_score=args.min_score)

    if args.mode == 'file':
        print_files(results)
    elif args.mode == 'json':
        print_json(results)
    else:
        print_human(results, args.query)


if __name__ == '__main__':
    main()
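The CLI above ranks chunks by cosine similarity and drops anything below `--min-score` before taking the top `--top` hits. A minimal sketch of that scoring step, assuming embeddings are plain Python lists of floats (the actual `search.py` internals and the `rank_chunks` helper name here are illustrative, not the real implementation):

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0  # degenerate vector: treat as no similarity
    return dot / (na * nb)


def rank_chunks(query_vec, chunks, top_k=5, min_score=0.0):
    """chunks: iterable of (filepath, vector). Filter by min_score, keep top_k."""
    scored = [(cosine(query_vec, vec), fp) for fp, vec in chunks]
    scored = [pair for pair in scored if pair[0] >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

With `--min-score 0.5`, an orthogonal chunk scores 0.0 and is filtered out, so only genuinely related chunks survive.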
1531
brain-engine/server.py
Normal file
File diff suppressed because it is too large
74
brain-engine/start.sh
Executable file
@@ -0,0 +1,74 @@
#!/bin/bash
# brain-engine/start.sh — Standalone startup
# Usage: bash brain-engine/start.sh
# Prerequisites: Python 3.10+, Ollama (for embeddings; optional on first boot)

set -e

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
BRAIN_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"

echo "=== brain-engine — standalone boot ==="
echo "Brain root: $BRAIN_ROOT"

# 1. Check Python
if ! command -v python3 &>/dev/null; then
    echo "❌ Python 3 is required. Install it: sudo apt install python3 python3-pip python3-venv"
    exit 1
fi

# 2. Install dependencies (venv recommended)
if [ ! -d "$SCRIPT_DIR/.venv" ]; then
    echo "→ Creating virtual environment..."
    python3 -m venv "$SCRIPT_DIR/.venv"
fi
source "$SCRIPT_DIR/.venv/bin/activate"
pip install -q -r "$SCRIPT_DIR/requirements.txt"

# 3. Initialize brain.db if missing
if [ ! -f "$BRAIN_ROOT/brain.db" ]; then
    echo "→ Initializing brain.db..."
    python3 "$SCRIPT_DIR/migrate.py" --reset 2>/dev/null || python3 "$SCRIPT_DIR/migrate.py"
    echo "✅ brain.db created"
else
    echo "✅ brain.db already exists"
fi

# 4. Embedding (optional; requires Ollama)
if command -v ollama &>/dev/null; then
    INDEXED=$(python3 -c "
import sqlite3, os
db = os.path.join('$BRAIN_ROOT', 'brain.db')
if os.path.exists(db):
    c = sqlite3.connect(db)
    try: print(c.execute('SELECT COUNT(*) FROM embeddings WHERE indexed=1').fetchone()[0])
    except Exception: print(0)
    c.close()
else: print(0)
" 2>/dev/null || echo "0")

    if [ "$INDEXED" = "0" ]; then
        echo "→ First-time corpus embedding (Ollama detected)..."
        python3 "$SCRIPT_DIR/embed.py"
        echo "✅ Corpus indexed"
    else
        echo "✅ $INDEXED chunks already indexed"
    fi
else
    echo "⚠️  Ollama not detected: semantic search will not be available."
    echo "   Install Ollama: curl -fsSL https://ollama.com/install.sh | sh"
    echo "   Then run: ollama pull nomic-embed-text && bash brain-engine/start.sh"
    echo "   The server will start anyway (BSI, docs, basic endpoints)."
fi

# 5. Launch the server
PORT="${BRAIN_PORT:-7700}"
echo ""
echo "=== Starting brain-engine on port $PORT ==="
echo "  Health : http://localhost:$PORT/health"
echo "  Search : http://localhost:$PORT/search?q=comment+ca+marche"
echo "  Agents : http://localhost:$PORT/agents"
echo ""

cd "$BRAIN_ROOT"
python3 "$SCRIPT_DIR/server.py"
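Step 4 of the script shells out to an inline `python3 -c` snippet to count already-indexed chunks, returning 0 on any failure so the boot sequence never aborts. The same best-effort check can be written as a standalone helper; a sketch assuming the `embeddings` table from `schema.sql` has an `indexed` column (the helper name `indexed_chunk_count` is illustrative):

```python
import os
import sqlite3


def indexed_chunk_count(db_path: str) -> int:
    """Count already-indexed chunks; return 0 if the db or table is missing."""
    if not os.path.exists(db_path):
        return 0
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT COUNT(*) FROM embeddings WHERE indexed=1"
        ).fetchone()
        return row[0]
    except sqlite3.Error:
        return 0  # table absent or schema mismatch: treat as "nothing indexed"
    finally:
        conn.close()
```

Returning 0 instead of raising mirrors the script's `|| echo "0"` fallback: an unreadable database simply triggers a fresh embedding pass rather than a crash.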
1312
brain-engine/test_brain_engine.py
Normal file
File diff suppressed because it is too large