# Documentation

> Complete documentation for Large Language Models


---

## Document: Quick Start

Get Cortex running in 5 minutes with Docker

URL: /quickstart


# Quick Start

Get Cortex running locally in just a few minutes using Docker Compose.

## Prerequisites

Before you begin, make sure you have:

- [Docker](https://docs.docker.com/get-docker/) and Docker Compose installed
- [Git](https://git-scm.com/) for cloning the repository
- An **OpenAI API key** (or credentials for another OpenAI-compatible provider, such as Venice, Anthropic, or a local LLM)

## Installation

### Step 1: Clone the Repository

```bash
git clone https://github.com/mocaOS/cortex-app.git
cd cortex-app
```

### Step 2: Configure Environment

Copy the example environment file:

```bash
cp .env.example .env
```

Open `.env` in your editor and set the required values:

```bash
# Required: Neo4j password
NEO4J_PASSWORD=your-secure-password-here

# Required: Admin credentials for the frontend
ADMIN_EMAIL=admin@example.com
ADMIN_PASSWORD=your-admin-password
ADMIN_API_KEY=cortex_admin_your-secure-key-here
SESSION_SECRET=a-random-string-at-least-32-characters-long
```

> **Tip:** Generate a secure session secret with: `openssl rand -base64 32`

#### LLM Configuration

Cortex uses LLMs for Q&A, entity extraction, relationship extraction, image understanding, and embeddings. Each capability can point to a different model or provider (any OpenAI-compatible API), giving you 5 independently configurable model slots.

**Recommended Minimal Stack**: a bench-validated 2-model setup. Fill in your API key and you're done. Relationship extraction inherits the extraction model, vision is set to the same model explicitly (it does not inherit), and output budgets cascade automatically.

```bash
# Primary — agentic Q&A / researcher (DeepSeek-V4-Flash: 1M context window)
OPENAI_API_KEY=
OPENAI_API_BASE=https://api.venice.ai/api/v1
OPENAI_MODEL=deepseek-v4-flash
OPENAI_MAX_CONTEXT=1000000

# Extraction: relationship extraction inherits this model (Qwen3.6-27B: 256K window)
GRAPH_EXTRACTION_MODEL=qwen3-6-27b
GRAPH_EXTRACTION_MAX_CONTEXT=256000

# Vision — image analysis (does NOT inherit; api_base/api_key inherit from OPENAI_*)
VISION_MODEL=qwen3-6-27b

# Embeddings — text embedding model (Qwen3-Embedding-8B: native 4096, MRL 32–4096)
EMBEDDING_MODEL=text-embedding-qwen3-8b
EMBEDDING_DIMENSION=4096            # Native dimension; Neo4j 5.26 (default) supports up to 4096-dim vector indexes
# EMBEDDING_MAX_INPUT_TOKENS stays at default 8192 — Venice/OpenAI cap inputs at 8192 server-side.
# Self-hosted vLLM users can lift to 32768 to use Qwen3-Embedding-8B's full native context.
```

**Performance tuning (Venice-validated)** — pair with the stack above to crank ingestion throughput. Bench-validated on Venice as the LLM provider; safe on Venice / Compute3 / large vLLM endpoints. Dial back for stock OpenAI or smaller hosts.

```bash
BATCH_PROCESSING_CONCURRENCY=3    # docs processed in parallel (default 2)
CONCURRENT_EXTRACTIONS=4          # entity-extraction threads per doc (default 3)
CONCURRENT_RELATIONS=4            # per-chunk relationship threads per doc (default 3)
VISION_MAX_CONCURRENT=4           # system-wide vision-API semaphore (default 3)
```

Or configure each tier independently:

```bash
# ── Primary LLM (Q&A, research, chat) ───────────────────────────
# Recommended: powerful reasoning models (e.g. Minimax M2.7, GLM5, Kimi K2.5)
OPENAI_API_KEY=
OPENAI_API_BASE=https://api.example.com/v1
OPENAI_MODEL=

# ── Graph Extraction (entity discovery + community summarization) ─
# Recommended: instruction-following models (e.g. Mistral Small 24B, Ministral 14B)
ENABLE_GRAPH_EXTRACTION=                     # true = extract entities/relationships, false = skip
GRAPH_EXTRACTION_MODEL=                      # defaults to OPENAI_MODEL
GRAPH_EXTRACTION_API_BASE=                   # defaults to OPENAI_API_BASE
GRAPH_EXTRACTION_API_KEY=                    # defaults to OPENAI_API_KEY

# ── Relationship Model (relationship discovery) ─────────────────
# Recommended: instruction-following models (e.g. OpenAI GPT OSS 120B)
# Used for per-chunk (Step 1) and cross-document (Step 2) relationship extraction
# Runs on a separate rate limit from entity extraction
RELATIONSHIP_EXTRACTION_MODEL=               # defaults to GRAPH_EXTRACTION_MODEL → OPENAI_MODEL
RELATIONSHIP_EXTRACTION_API_BASE=            # defaults to GRAPH_EXTRACTION_API_BASE
RELATIONSHIP_EXTRACTION_API_KEY=             # defaults to GRAPH_EXTRACTION_API_KEY

# Context budgets — primary OPENAI_MAX_CONTEXT cascades to sub-tiers via the
# fallback chain (0 = inherit). Override only when a sub-tier model has a
# different window than the primary.
GRAPH_EXTRACTION_MAX_CONTEXT=0               # 0 = inherit OPENAI_MAX_CONTEXT; set when the extraction model's window differs
RELATIONSHIP_MAX_CONTEXT=0                   # 0 = inherit GRAPH_EXTRACTION_MAX_CONTEXT → OPENAI_MAX_CONTEXT

# ── Vision (image analysis during document ingestion) ────────────
VISION_MODEL=
VISION_MODEL_API_BASE=                       # defaults to OPENAI_API_BASE
VISION_MODEL_API_KEY=                        # defaults to OPENAI_API_KEY

# ── Embeddings ───────────────────────────────────────────────────
EMBEDDING_MODEL=
EMBEDDING_DIMENSION=
EMBEDDING_SEND_DIMENSIONS=true               # set false for models with fixed output dim
EMBEDDING_API_BASE=                          # defaults to OPENAI_API_BASE
EMBEDDING_API_KEY=                           # defaults to OPENAI_API_KEY
```
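The "defaults to" comments above form a fallback chain for each tier's model name, API base, and API key. As a rough illustration, assuming an empty variable means "inherit", the effective values resolve like the bash sketch below. The `resolve_tiers` function and the `GX_*`/`REL_*`/`VIS_*`/`EMB_*` names are hypothetical, not part of Cortex:

```bash
#!/usr/bin/env bash
# Sketch: resolve each tier's effective model / API base / API key,
# following the "defaults to" comments in the .env template.
# Assumption: empty = inherit. This mirrors the documented chain, not Cortex's code.
resolve_tiers() {
  # Graph extraction inherits from the primary tier.
  GX_MODEL=${GRAPH_EXTRACTION_MODEL:-$OPENAI_MODEL}
  GX_BASE=${GRAPH_EXTRACTION_API_BASE:-$OPENAI_API_BASE}
  GX_KEY=${GRAPH_EXTRACTION_API_KEY:-$OPENAI_API_KEY}

  # Relationship extraction inherits from graph extraction (and so from primary).
  REL_MODEL=${RELATIONSHIP_EXTRACTION_MODEL:-$GX_MODEL}
  REL_BASE=${RELATIONSHIP_EXTRACTION_API_BASE:-$GX_BASE}
  REL_KEY=${RELATIONSHIP_EXTRACTION_API_KEY:-$GX_KEY}

  # Vision and embeddings inherit only endpoint/key from the primary tier;
  # their model names (VISION_MODEL, EMBEDDING_MODEL) are set separately.
  VIS_BASE=${VISION_MODEL_API_BASE:-$OPENAI_API_BASE}
  VIS_KEY=${VISION_MODEL_API_KEY:-$OPENAI_API_KEY}
  EMB_BASE=${EMBEDDING_API_BASE:-$OPENAI_API_BASE}
  EMB_KEY=${EMBEDDING_API_KEY:-$OPENAI_API_KEY}
}

# Example: only the primary tier and the extraction model are set.
OPENAI_MODEL=deepseek-v4-flash
OPENAI_API_BASE=https://api.venice.ai/api/v1
GRAPH_EXTRACTION_MODEL=qwen3-6-27b
resolve_tiers
echo "extraction=$GX_MODEL relationship=$REL_MODEL base=$REL_BASE"
```

Setting only `GRAPH_EXTRACTION_MODEL` is enough to move both entity and relationship extraction off the primary model, which is why the minimal stack needs so few variables.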

#### Performance Tuning

Controls how much work runs in parallel:

```bash
BATCH_PROCESSING_CONCURRENCY=2               # documents processed in parallel
CONCURRENT_EXTRACTIONS=3                     # entity extraction thread pool size
CONCURRENT_RELATIONS=3                       # per-chunk relationship extractions per document
VISION_MAX_CONCURRENT=3                      # concurrent vision API calls (system-wide)
PARALLEL_RELATIONSHIP_BATCHES=5              # relationship analysis batches in parallel (1 = sequential)
```

> `BATCH_PROCESSING_CONCURRENCY` controls how many documents go through the pipeline simultaneously. Within each document, `CONCURRENT_EXTRACTIONS` sizes the entity extraction thread pool and `CONCURRENT_RELATIONS` controls per-chunk relationship extraction concurrency. `VISION_MAX_CONCURRENT` independently caps the background image analysis pipeline across all documents. `PARALLEL_RELATIONSHIP_BATCHES` is the most impactful lever for speeding up cross-document relationship analysis — increase it to run multiple LLM calls concurrently.

See the [Configuration Reference](/configuration) for all 50+ environment variables.

### Step 3: Start Services

```bash
docker compose up -d
```

This starts all services:

| Service | URL | Description |
|---------|-----|-------------|
| Frontend | http://localhost:3000 | Next.js web interface |
| Backend API | http://localhost:8000 | FastAPI REST API |
| Neo4j Browser | http://localhost:7474 | Database admin UI |
| Neo4j Bolt | bolt://localhost:7687 | Database connection |

### Step 4: Verify Installation

Check that all containers are running:

```bash
docker compose ps
```

Test the API health endpoint:

```bash
curl http://localhost:8000/health
```

Expected response:

```json
{
  "status": "healthy",
  "neo4j_connected": true,
  "version": "1.0.0"
}
```

### Step 5: Access the Web Interface

Open http://localhost:3000 in your browser and log in with your admin credentials.

## First Steps

### 1. Upload Your First Document

1. Navigate to the **Documents** page
2. Click the **Upload** button to open the upload modal
3. Drag and drop a PDF, TXT, MD, or DOCX file
4. Click **Start Processing** to begin — you'll be taken back to the Documents page where you can watch progress in real-time

### 2. Explore Your Knowledge Base

1. Go to the **Explore** section
2. Use the **Knowledge Graph** tab to visualize entities and relationships
3. Use the **Deep Research** tab for multi-step reasoning over your documents
4. Use the **Chat** tab to ask questions and get AI-generated answers with source citations

## Common Commands

```bash
# Start all services
docker compose up -d

# Stop all services
docker compose down

# View logs
docker compose logs -f

# View logs for specific service
docker compose logs -f backend

# Restart a service
docker compose restart backend

# Rebuild after code changes
docker compose up -d --build
```

## Troubleshooting

### Container won't start

Check if ports are already in use:

```bash
lsof -i :3000  # Frontend
lsof -i :8000  # Backend
lsof -i :7474  # Neo4j Browser
lsof -i :7687  # Neo4j Bolt
```

### Neo4j connection failed

Wait 30-60 seconds for Neo4j to fully initialize, then restart the backend:

```bash
docker compose restart backend
```
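Rather than guessing at the wait, you can poll the health endpoint until Neo4j reports a connection. A minimal bash sketch, assuming the `/health` JSON shape shown in Step 4 (`neo4j_connected`); the function names and the 5-second poll interval are illustrative choices:

```bash
#!/usr/bin/env bash
# Sketch: poll the backend health endpoint until Neo4j reports connected.
# Field name follows the example /health response in Step 4.

# Succeeds if the JSON on stdin reports a live Neo4j connection.
neo4j_connected() {
  grep -q '"neo4j_connected"[[:space:]]*:[[:space:]]*true'
}

wait_for_backend() {
  local url=${1:-http://localhost:8000/health}
  local attempts=${2:-12}   # 12 x 5 s = 60 s, the upper end of the suggested wait
  local i
  for ((i = 1; i <= attempts; i++)); do
    if curl -fsS "$url" 2>/dev/null | neo4j_connected; then
      echo "backend healthy after $i attempt(s)"
      return 0
    fi
    sleep 5
  done
  echo "Neo4j still not connected after $attempts attempts" >&2
  return 1
}

# Usage: wait_for_backend || docker compose restart backend
```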

### OpenAI API errors

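Verify that your API key is accepted by the endpoint in `OPENAI_API_BASE`:

```bash
# Export the values from .env into this shell first, e.g.:
#   set -a; . ./.env; set +a
curl "${OPENAI_API_BASE:-https://api.openai.com/v1}/models" \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```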

## Next Steps

- [Configuration Reference](/configuration) - All 50+ environment variables
- [Document Upload Guide](/features/document-upload) - Supported formats and options
- [Deployment Guide](/guides/deployment) - Production deployment with Coolify
- [API Reference](/api) - Full API documentation


---

## Document: Introduction

Welcome to the Cortex documentation - an agentic knowledge base powered by AI and GraphRAG

URL: /introduction


# Introduction

import { Mermaid } from "zudoku/mermaid";

**Cortex** is an **agentic knowledge base** that transforms your documents into a searchable, AI-powered knowledge graph. Upload documents, ask questions, and get intelligent answers with full source citations.

Built with **FastAPI**, **Haystack 2.0**, **Neo4j**, and **Next.js**, Cortex combines large language models with graph-based retrieval, so answers can draw on the relationships between entities as well as on text similarity.

## Key Features

| Feature | Description |
|---------|-------------|
| **Document Processing** | Upload PDFs, text files, markdown, and Word documents with automatic chunking |
| **Knowledge Graph** | AI-powered entity and relationship extraction using GraphRAG |
| **Hybrid Search** | Combines vector similarity, keyword search, and graph traversal with RRF fusion |
| **Ask AI** | RAG-powered Q&A with Deep Research (multi-step reasoning) and Chat modes |
| **Collections** | Organize documents into collections with scoped graphs |
| **Communities** | Automatic detection of document clusters and theme summarization |
| **Turbo Mode** ⚠️ on hold | GPU-accelerated inference via Compute3 integration — partnership prepared in 2025, Compute3 service not yet in production; feature non-functional today |
| **Custom Inputs** | Add knowledge manually via Q&A pairs, text, or markdown |
| **Prompt Security** | Built-in protection against prompt injection attacks |

## Architecture

<Mermaid chart={`
flowchart LR
    subgraph Frontend["Frontend"]
        UI["Next.js 15<br/>React 19<br/>TypeScript"]
    end
    subgraph Backend["Backend"]
        API["FastAPI<br/>Haystack 2.0<br/>Python"]
    end
    subgraph Database["Database"]
        Neo4j[("Neo4j 5<br/>Graph + Vector")]
    end
    subgraph AI["AI Services"]
        LLM["OpenAI / LLM"]
        Embed["Embeddings"]
        Rerank["Cross-Encoder"]
    end
    
    UI --> API
    API --> Neo4j
    API --> LLM
    API --> Embed
    API --> Rerank
`} />

## Technology Stack

| Layer | Technologies |
|-------|--------------|
| **Frontend** | Next.js 15, React 19, TypeScript, Tailwind CSS, Framer Motion |
| **Backend** | FastAPI, Haystack 2.0, Python 3.11+, Pydantic |
| **Database** | Neo4j 5.x (Graph storage + Vector similarity search) |
| **AI/LLM** | OpenAI GPT-4o, Anthropic Claude, or local models via LiteLLM |
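| **Embeddings** | OpenAI text-embedding-3-small (1536 dimensions) or any OpenAI-compatible embedding model (e.g. Qwen3-Embedding-8B, 4096 dimensions) |
| **Re-ranking** | Cross-encoder (ms-marco-MiniLM-L-6-v2) |
| **GPU Compute** | Compute3 for accelerated inference (optional; Turbo Mode is on hold and not currently functional) |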

## Document Ingestion Pipeline

<Mermaid chart={`
flowchart TD
    A[Document Upload] --> B[Text Extraction]
    B --> C[Chunking]
    C --> D[Embedding Generation]
    D --> E[Entity Extraction]
    E --> F[Semantic Resolution]
    F --> G[Neo4j Storage]
    G --> H[Ready for Search]
    
    B -.-> B1["PDF, DOCX, TXT, MD"]
    C -.-> C1["Sentence-based splitting"]
    D -.-> D1["text-embedding-3-small"]
    E -.-> E1["LLM extracts entities"]
    F -.-> F1["Deduplicate similar entities"]
`} />

## Query Pipeline (Hybrid Search + RAG)

<Mermaid chart={`
flowchart TD
    Q[User Question] --> S1[Vector Search]
    Q --> S2[Keyword Search]
    Q --> S3[Graph Traversal]
    
    S1 --> RRF[Reciprocal Rank Fusion]
    S2 --> RRF
    S3 --> RRF
    
    RRF --> RERANK[Cross-Encoder Re-ranking]
    RERANK --> LLM[LLM Generation]
    LLM --> ANS[Answer + Sources]
    
    S1 -.-> W1["Weight: 0.5"]
    S2 -.-> W2["Weight: 0.3"]
    S3 -.-> W3["Weight: 0.2"]
`} />
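The diagram fuses three ranked result lists with Reciprocal Rank Fusion and per-retriever weights. A minimal bash/awk sketch of weighted RRF, assuming each retriever contributes `weight / (k + rank)` with the commonly used `k = 60`. The exact formula and constant Cortex uses are not documented here, so treat this as an illustration of the technique rather than the backend's code:

```bash
#!/usr/bin/env bash
# Sketch of weighted Reciprocal Rank Fusion.
# Input lines: <retriever> <rank> <chunk_id>   (rank 1 = best)
# score(chunk) = sum over retrievers of weight / (k + rank); a chunk missing
# from a list simply gets no contribution from that retriever.
# Assumptions: k = 60 and the VECTOR/KEYWORD/GRAPH weight defaults (0.5/0.3/0.2).
weighted_rrf() {
  awk -v k=60 \
      -v wv="${VECTOR_WEIGHT:-0.5}" \
      -v wk="${KEYWORD_WEIGHT:-0.3}" \
      -v wg="${GRAPH_WEIGHT:-0.2}" '
    BEGIN { w["vector"] = wv; w["keyword"] = wk; w["graph"] = wg }
    { score[$3] += w[$1] / (k + $2) }
    END { for (id in score) printf "%.6f %s\n", score[id], id }
  ' | LC_ALL=C sort -k1,1 -rn
}

# "b" is found by two retrievers, so it outranks "a" despite a worse vector rank.
printf '%s\n' "vector 1 a" "vector 2 b" "keyword 1 b" "graph 1 c" | weighted_rrf
```

Because RRF only looks at ranks, scores from different retrievers (vector similarity, keyword relevance, graph traversal) never need to be normalized onto a common scale.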

## API Overview

Cortex exposes **60+ REST API endpoints**, in categories including:

| Category | Endpoints | Description |
|----------|-----------|-------------|
| Health | 2 | Health check and statistics |
| Documents | 8 | Upload, list, delete, reprocess, bulk operations |
| Custom Inputs | 4 | Add Q&A pairs, text, markdown manually |
| Search | 3 | Semantic, hybrid, and filtered search |
| Ask AI | 4 | RAG Q&A, streaming, agentic mode |
| Knowledge Graph | 9 | Entities, relationships, subgraphs, visualization, deduplication |
| Collections | 6 | CRUD, document assignment |
| Communities | 5 | Detection, summarization, search |
| Tasks | 5 | Background task tracking |
| Admin | 5 | API key management |
| Turbo Mode ⚠️ on hold | 3 | GPU-accelerated inference (not currently live) |

## Quick Links

- [Quick Start](/quickstart) - Get running in 5 minutes with Docker
- [Configuration](/configuration) - All 50+ environment variables
- [API Reference](/api) - Complete OpenAPI documentation
- [GitHub](https://github.com/mocaOS/cortex-app) - Source code and issues


---

## Document: Configuration

Complete reference for all 50+ environment variables in Cortex

URL: /configuration


# Configuration

Cortex is configured through environment variables. This page documents every available option.

## Required Variables

These must be set for Cortex to function:

### Neo4j Database

```bash
NEO4J_URI=bolt://neo4j:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-secure-password-here
```

### LLM Provider

At least one LLM provider is required:

```bash
# OpenAI or any OpenAI-compatible provider
OPENAI_API_KEY=sk-your-api-key-here
OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_MODEL=gpt-4o-mini
```

### Admin Authentication

```bash
ADMIN_EMAIL=admin@example.com
ADMIN_PASSWORD=your-secure-admin-password
ADMIN_API_KEY=cortex_admin_your-secure-api-key-here
SESSION_SECRET=your-session-secret-key-at-least-32-characters-long
```

---

## LLM Configuration

### Model Selection

```bash
# Primary model for Q&A, research, chat
# Recommended: powerful reasoning models (e.g. Minimax M2.7, GLM5, Kimi K2.5)
OPENAI_MODEL=gpt-4o-mini

# Optional: Faster/cheaper model for "Fast Mode"
OPENAI_MODEL_FAST_MODE=gpt-4o-mini

# Primary token budgets (root of the fallback chain; see "Budget Fallback Chain" below)
OPENAI_MAX_OUTPUT_TOKENS=8000   # Default output tokens for every LLM call
OPENAI_MAX_CONTEXT=32768        # Default input context budget
```

### Embedding Configuration

```bash
EMBEDDING_MODEL=openai/text-embedding-3-small
EMBEDDING_DIMENSION=1536
EMBEDDING_SEND_DIMENSIONS=true        # Set to false for models with fixed output dimensions
USE_OPENAI_EMBEDDINGS=true
# EMBEDDING_API_BASE=            # API base URL for embeddings (defaults to OPENAI_API_BASE)
# EMBEDDING_API_KEY=             # API key for embeddings (defaults to OPENAI_API_KEY)
```

---

## Document Processing

### Upload Settings

```bash
UPLOAD_DIR=/app/uploads
CUSTOM_INPUTS_DIR=/app/custom_inputs
MAX_FILE_SIZE_MB=50
```

### Chunking Configuration

```bash
# Chunk size in tokens
CHUNK_SIZE=500

# Overlap between chunks
CHUNK_OVERLAP=50

# Chunking strategy: "sentence" or "token"
CHUNK_BY=sentence

# Sentences per chunk (when CHUNK_BY=sentence)
SENTENCES_PER_CHUNK=5
```
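With token chunking, `CHUNK_SIZE` and `CHUNK_OVERLAP` define a sliding window. Assuming overlap is measured in tokens and each chunk starts `CHUNK_SIZE - CHUNK_OVERLAP` tokens after the previous one (Cortex's splitter may handle boundaries differently), this bash sketch estimates how many chunks a document produces; `estimate_chunks` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Sketch: chunk count for a fixed-stride token window.
# stride = size - overlap; chunks = ceil((tokens - overlap) / stride), minimum 1.
estimate_chunks() {
  local tokens=$1 size=${2:-${CHUNK_SIZE:-500}} overlap=${3:-${CHUNK_OVERLAP:-50}}
  local stride=$((size - overlap))
  if ((stride <= 0)); then
    echo "CHUNK_OVERLAP must be smaller than CHUNK_SIZE" >&2
    return 1
  fi
  if ((tokens <= size)); then
    echo 1
    return 0
  fi
  # Integer ceiling division: (a + b - 1) / b
  echo $(((tokens - overlap + stride - 1) / stride))
}

estimate_chunks 1000 500 50   # windows [0,500) [450,950) [900,1000) -> 3
```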

### Batch Processing

```bash
# Concurrent document processing
BATCH_PROCESSING_CONCURRENCY=2

# Thread workers for processing
PROCESSING_THREAD_WORKERS=4
```

---

## GraphRAG Configuration

### Entity Extraction

```bash
# Enable/disable graph extraction
ENABLE_GRAPH_EXTRACTION=true

# Model for entity extraction, community summarization, and query-side entity
# extraction during RAG search (defaults to OPENAI_MODEL)
# Recommended: instruction-following models (e.g. Mistral Small 24B, Ministral 14B)
# Also handles community summarization for consistent structured output
# GRAPH_EXTRACTION_MODEL=gpt-4o-mini

# API base/key for extraction model (defaults to OPENAI_API_BASE / OPENAI_API_KEY)
# GRAPH_EXTRACTION_API_BASE=http://localhost:11434/v1
# GRAPH_EXTRACTION_API_KEY=ollama

# Tip: Use a small multimodal model (e.g. 8B) for both GRAPH_EXTRACTION_MODEL
# and VISION_MODEL to get fast extraction + image analysis from the same endpoint.

# Maximum graph hops for context retrieval
MAX_GRAPH_HOPS=2

# Concurrent extraction operations
CONCURRENT_EXTRACTIONS=3

# Max context window tokens for entity extraction batching
# 0 = inherit OPENAI_MAX_CONTEXT. Set explicitly when the extraction model's window differs from the primary's.
# GRAPH_EXTRACTION_MAX_CONTEXT=0

# Output budget for entity-extraction LLM calls
# 0 = inherit OPENAI_MAX_OUTPUT_TOKENS (8000 by default, which already covers
# verbose Qwen3-family output). Override only to tighten or expand this tier.
# EXTRACTION_MAX_OUTPUT_TOKENS=0
```

### Relationship Model

Optional dedicated model for all relationship extraction — both per-chunk extraction
during document processing (Step 1) and batch cross-document analysis (Step 2).
Runs on a separate rate limit from entity extraction. Falls back to the extraction
model, then the primary model.

```bash
# Model for relationship extraction (defaults to GRAPH_EXTRACTION_MODEL → OPENAI_MODEL)
# Recommended: instruction-following models (e.g. OpenAI GPT OSS 120B)
# RELATIONSHIP_EXTRACTION_MODEL=gpt-4o-mini

# API configuration (defaults to GRAPH_EXTRACTION_API_BASE / GRAPH_EXTRACTION_API_KEY)
# RELATIONSHIP_EXTRACTION_API_BASE=http://localhost:11434/v1
# RELATIONSHIP_EXTRACTION_API_KEY=ollama

# Concurrent per-chunk relationship extractions per document (default: 3)
# CONCURRENT_RELATIONS=3
```

### Relationship Analysis

Cross-document relationship discovery (Step 2 above, labeled "Phase 2" in the
variable comments below) uses the relationship model, falling back to the
extraction model and then the primary model. It runs a two-stage pipeline
(candidate scanning, then structured confirmation) with co-occurrence batching
and multi-round discovery:

```bash
# Max INPUT context window tokens for Phase 2 batch relationship analysis
# 0 = inherit GRAPH_EXTRACTION_MAX_CONTEXT → OPENAI_MAX_CONTEXT
# RELATIONSHIP_MAX_CONTEXT=0

# Output budget for per-chunk + candidate-pair scan (in the inheritance chain)
# 0 = inherit EXTRACTION_MAX_OUTPUT_TOKENS → OPENAI_MAX_OUTPUT_TOKENS
# RELATIONSHIP_MAX_OUTPUT_TOKENS=0

# Output budget for Phase 2 batch relationship analysis (standalone, NOT in chain)
# Batch processes hundreds of entity pairs per call — needs ~16k headroom.
# RELATIONSHIP_BATCH_MAX_OUTPUT_TOKENS=16000

# Number of relationship analysis batches to process in parallel
# PARALLEL_RELATIONSHIP_BATCHES=5

# Target relationship-to-entity ratio (ERR metric)
# Higher values = more relationships discovered per entity
# RELATIONSHIP_TARGET_RATIO=1.0

# Max discovery rounds per batch (default 3 for initial, 1 for re-analyze)
# RELATIONSHIP_MAX_ROUNDS=3

# Max hours for relationship analysis (0 = no time limit)
# RELATIONSHIP_MAX_HOURS=0

# Soft cap on relationships per entity (0 = no cap)
# Prevents hub entities from accumulating disproportionate connections
# RELATIONSHIP_MAX_PER_ENTITY=50
```

### Reasoning Control for Ingestion

Reasoning hurts structured extraction (drift, hidden-token cost, latency, malformed JSON). These knobs let reasoning-capable models (GPT-5/5.1, Claude 4.x, Qwen3, DeepSeek-R1, GLM-4.6, Kimi K2, MiniMax M2) be used for ingestion while suppressing their thinking. Cortex auto-detects the provider from `base_url` and the model family from the model name — works for OpenAI, OpenRouter, Venice, Anthropic, and vLLM/Compute3.

Accepted values: `off | minimal | auto | low | medium | high` (`none`/`disabled` are aliases for OFF).

```bash
# Default OFF: skips reasoning on entity extraction, summaries, communities,
# entity enrichment, query-entity extraction. No-op for pure instruct models.
EXTRACTION_REASONING_MODE=off

# Default OFF: skips reasoning on candidate scan, gleaning pass, per-chunk
# and batch relationship extraction.
RELATIONSHIP_REASONING_MODE=off

# Default OFF: skips reasoning on the vision-model image-description call.
# Lets a reasoning multimodal model (e.g. Qwen3-VL-27B) be used as VISION_MODEL
# without <think> tokens leaking into image descriptions.
VISION_REASONING_MODE=off

# Default AUTO: the Q&A / researcher path stays on the provider default.
# reasoning_effort=minimal breaks parallel tool calls on OpenAI, so don't
# force it OFF here.
DEFAULT_REASONING_MODE=auto

# Escape hatch for novel models the heuristics get wrong.
# Format: "model1:mode1,model2:mode2"
# REASONING_MODEL_OVERRIDES=gpt-5.8:none,custom-llm:minimal
```

**Forward compatibility.** Same-family minor releases route automatically — `gpt-5.8` works like `gpt-5.1`. For new majors (e.g. `gpt-6`) or models the heuristic misclassifies, set `REASONING_MODEL_OVERRIDES`. If the API rejects the param, the wrapper strips it on retry, logs a warning, and caches the model so future calls skip the param upfront.

**Caveats.** `gpt-5-pro` is hard-pinned to `reasoning_effort=high` (OFF is silently ignored, one-time WARN). `gpt-5-codex` auto-downgrades `minimal`→`low`. Anthropic Opus 4.7+ uses adaptive thinking — manual `thinking` returns 400, so the helper omits the param. OpenRouter `exclude:true` does NOT save tokens; we use `effort:"none"` instead.
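To make the `REASONING_MODEL_OVERRIDES` format concrete, here is a bash sketch that looks up one model's override and maps the `none`/`disabled` aliases to `off`. The `reasoning_override` helper is hypothetical: it assumes exact, case-sensitive model names with no spaces, and splits each entry on its last colon so names such as `qwen3:8b` still parse. The backend's own matching rules may differ:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the override mode for a model; fail if none.
# Format: "model1:mode1,model2:mode2" (see REASONING_MODEL_OVERRIDES above).
reasoning_override() {
  local model=$1 entry name mode
  local IFS=','
  for entry in ${REASONING_MODEL_OVERRIDES:-}; do
    name=${entry%:*}    # everything before the last colon
    mode=${entry##*:}   # everything after the last colon
    if [[ $name == "$model" ]]; then
      case $mode in
        none | disabled) echo off ;;   # documented aliases for OFF
        *) echo "$mode" ;;
      esac
      return 0
    fi
  done
  return 1
}

REASONING_MODEL_OVERRIDES="gpt-5.8:none,custom-llm:minimal"
reasoning_override gpt-5.8      # prints: off
reasoning_override custom-llm   # prints: minimal
```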

### Budget Fallback Chain

Every LLM call has two budgets that matter: **output tokens** (how much the model may write) and **input context** (how much input the call may consume). All sub-tier knobs default to `0` (= inherit from the next tier up), letting you configure a multi-model stack with just two or three env vars.

```
OUTPUT TOKENS:                          INPUT CONTEXT:
  OPENAI_MAX_OUTPUT_TOKENS=8000           OPENAI_MAX_CONTEXT=32768
       ↓                                       ↓
  EXTRACTION_MAX_OUTPUT_TOKENS            GRAPH_EXTRACTION_MAX_CONTEXT
       ↓                                       ↓
  RELATIONSHIP_MAX_OUTPUT_TOKENS          RELATIONSHIP_MAX_CONTEXT
       ↓
  VISION_MAX_OUTPUT_TOKENS

  RELATIONSHIP_BATCH_MAX_OUTPUT_TOKENS=16000   (standalone, Phase 2 only)
```
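Read top to bottom, each arrow means "a value of `0` (or unset) falls through to the tier above". A bash sketch of that resolution, using the defaults documented on this page (8000 output tokens, 32768 context, 16000 for the standalone batch budget); `first_set` and `resolve_budgets` are hypothetical helpers, not Cortex code:

```bash
#!/usr/bin/env bash
# Sketch of the budget fallback chain: a tier set to 0 (or unset)
# inherits the tier above it.

# Print the first argument that is non-empty and not "0".
first_set() {
  local v
  for v in "$@"; do
    if [[ -n $v && $v != 0 ]]; then
      echo "$v"
      return
    fi
  done
}

resolve_budgets() {
  local out_primary=${OPENAI_MAX_OUTPUT_TOKENS:-8000}
  local ctx_primary=${OPENAI_MAX_CONTEXT:-32768}

  # Output tokens: primary -> extraction -> relationship -> vision
  EXTRACTION_OUT=$(first_set "${EXTRACTION_MAX_OUTPUT_TOKENS:-}" "$out_primary")
  RELATIONSHIP_OUT=$(first_set "${RELATIONSHIP_MAX_OUTPUT_TOKENS:-}" "$EXTRACTION_OUT")
  VISION_OUT=$(first_set "${VISION_MAX_OUTPUT_TOKENS:-}" "$RELATIONSHIP_OUT")

  # Input context: primary -> extraction -> relationship
  EXTRACTION_CTX=$(first_set "${GRAPH_EXTRACTION_MAX_CONTEXT:-}" "$ctx_primary")
  RELATIONSHIP_CTX=$(first_set "${RELATIONSHIP_MAX_CONTEXT:-}" "$EXTRACTION_CTX")

  # Standalone Phase 2 batch budget: inherits from nothing.
  BATCH_OUT=${RELATIONSHIP_BATCH_MAX_OUTPUT_TOKENS:-16000}
}

# The recommended minimal stack sets only the two context windows:
OPENAI_MAX_CONTEXT=1000000
GRAPH_EXTRACTION_MAX_CONTEXT=256000
resolve_budgets
echo "extraction_ctx=$EXTRACTION_CTX relationship_ctx=$RELATIONSHIP_CTX vision_out=$VISION_OUT"
```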

**Recommended minimal stack** — configure two models + two context windows; everything else inherits:

```bash
OPENAI_MODEL=deepseek-v4-flash        # primary / agentic (1M window)
OPENAI_MAX_CONTEXT=1000000                 # unlock DeepSeek-V4-Flash full input window

GRAPH_EXTRACTION_MODEL=qwen3-6-27b    # extraction + (inherited) relationship (256K window)
GRAPH_EXTRACTION_MAX_CONTEXT=256000        # unlock Qwen3.6-27B full input window; RELATIONSHIP_MAX_CONTEXT inherits

VISION_MODEL=qwen3-6-27b              # image analysis (does NOT inherit from extraction)

EMBEDDING_MODEL=text-embedding-qwen3-8b    # text embedding (native 4096, MRL 32–4096)
EMBEDDING_DIMENSION=4096                   # Native; Neo4j 5.26 (default) supports up to 4096-dim vector indexes
# Output budgets cascade through defaults. EMBEDDING_MAX_INPUT_TOKENS stays at default 8192:
# Venice and OpenAI cap embed inputs at 8192 at the gateway regardless of model. Self-hosted
# vLLM users running Qwen3-Embedding-8B can lift to 32768.
```

Both `*_MAX_CONTEXT` overrides are required because the conservative default (32768) matches neither model's actual input window; without them, DeepSeek-V4-Flash and Qwen3.6-27B would be capped at a fraction of their real capability.

The embedding model uses the primary `OPENAI_API_BASE` + `OPENAI_API_KEY` unless overridden via `EMBEDDING_API_BASE`/`EMBEDDING_API_KEY`. `EMBEDDING_SEND_DIMENSIONS=true` (default) works because Qwen3-Embedding-8B supports MRL: the API honors the dimensions parameter and truncates to any value between 32 and 4096. `EMBEDDING_MAX_INPUT_TOKENS` defaults to 8192 to match the cap Venice/OpenAI enforce at the API gateway (8192 regardless of the underlying model's native window); oversized inputs are truncated by character count on the client side to avoid HTTP 400 *"Input text exceeds the maximum token limit"* rejections. On self-hosted vLLM you can raise it to the model's native context (e.g. 32768 for Qwen3-Embedding-8B).

**Performance tuning (Venice-validated).** Pair these four knobs with the stack above to maximize ingestion throughput. Bench-validated against Venice; safe on Venice, Compute3, or large self-hosted vLLM endpoints. Dial back on stock OpenAI or smaller hosts to avoid rate-limit errors.

```bash
BATCH_PROCESSING_CONCURRENCY=3    # docs processed in parallel (default 2)
CONCURRENT_EXTRACTIONS=4          # entity-extraction threads per doc (default 3 — biggest multiplier)
CONCURRENT_RELATIONS=4            # per-chunk relationship threads per doc (default 3)
VISION_MAX_CONCURRENT=4           # system-wide vision-API semaphore (default 3)
```

The two `CONCURRENT_*` knobs are *per-document* limits, so they compound with `BATCH_PROCESSING_CONCURRENCY` as documents move through the pipeline. The ingestion pipeline staggers extraction, per-chunk relationships, and vision across each doc's lifecycle, so actual in-flight concurrency stays below the worst-case theoretical product. If you swap to a slower provider, `CONCURRENT_EXTRACTIONS` is the first knob to lower — it's the biggest multiplier and entity extraction is the heaviest call. `VISION_MAX_CONCURRENT` is a global semaphore (system-wide) and does *not* multiply with `BATCH_PROCESSING_CONCURRENCY`.
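As a back-of-the-envelope check before raising these knobs, you can compute the worst-case number of concurrent LLM calls during per-document ingestion. This bash sketch assumes the extraction and relationship pools of every in-flight document are saturated at once (the staggering described above keeps real load lower) and that vision is a single global pool. It ignores the separate cross-document batch stage (`PARALLEL_RELATIONSHIP_BATCHES`), and `worst_case_inflight` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Rough upper bound on in-flight LLM calls during per-document ingestion:
#   docs * (extraction threads + relationship threads) + global vision slots
worst_case_inflight() {
  local docs=${BATCH_PROCESSING_CONCURRENCY:-2}
  local extract=${CONCURRENT_EXTRACTIONS:-3}
  local rel=${CONCURRENT_RELATIONS:-3}
  local vision=${VISION_MAX_CONCURRENT:-3}
  echo $((docs * (extract + rel) + vision))
}

worst_case_inflight   # defaults: 2*(3+3)+3 = 15
BATCH_PROCESSING_CONCURRENCY=3 CONCURRENT_EXTRACTIONS=4 \
  CONCURRENT_RELATIONS=4 VISION_MAX_CONCURRENT=4 worst_case_inflight   # Venice profile: 3*(4+4)+4 = 28
```

If your provider's concurrent-request limit sits below this number, lower `CONCURRENT_EXTRACTIONS` first, as described above.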

**Override examples:**

```bash
# Constrain extraction-tier output independently (inherited 8000 default covers
# Qwen3-family verbose XML by default; only override to tighten or expand this tier)
EXTRACTION_MAX_OUTPUT_TOKENS=2000

# Dedicated big-context relationship model: raise only that tier's input context
RELATIONSHIP_MAX_CONTEXT=131072

# Raise only the vision tier's output budget for detailed chart images (overrides whatever it would inherit)
VISION_MAX_OUTPUT_TOKENS=8000
```

**Migration note.** The env var `RELATIONSHIP_MAX_OUTPUT_TOKENS` previously controlled the Phase 2 batch budget (16000). It now feeds the **per-chunk + candidate scan** chain. The Phase 2 batch budget moved to `RELATIONSHIP_BATCH_MAX_OUTPUT_TOKENS` (default still 16000, behavior unchanged for users who never set the legacy var). If you explicitly set `RELATIONSHIP_MAX_OUTPUT_TOKENS=16000` in your `.env`, per-chunk extraction will run with a 16000 cap (harmless overkill); migrate to the new name when convenient.

**Migration note.** `EXTRACTION_MAX_CONTEXT` was renamed to `GRAPH_EXTRACTION_MAX_CONTEXT` to match the `GRAPH_EXTRACTION_MODEL`/`GRAPH_EXTRACTION_API_BASE`/`GRAPH_EXTRACTION_API_KEY` prefix convention. The legacy name is honored as a deprecated alias for one release; the backend logs a one-shot `WARN` at startup if your `.env` still uses it. Rename to `GRAPH_EXTRACTION_MAX_CONTEXT` at your convenience — value semantics are identical.

### Semantic Entity Resolution

```bash
# Enable embedding-based entity deduplication during storage
# Uses vector similarity via Neo4j vector index to catch semantic matches
# (e.g., "Museum of Crypto Art" ↔ "MOCA") that string similarity misses
# Falls back to Levenshtein string matching when disabled
ENABLE_SEMANTIC_ENTITY_RESOLUTION=true

# Similarity threshold for merging (0.0-1.0)
ENTITY_SIMILARITY_THRESHOLD=0.85

# Model for entity embeddings (defaults to EMBEDDING_MODEL)
# ENTITY_EMBEDDING_MODEL=openai/text-embedding-3-small
```

---

## Search & RAG Configuration

### Hybrid Search

```bash
# Enable hybrid search (vector + keyword + graph)
ENABLE_HYBRID_SEARCH=true

# Weight distribution for hybrid search (must sum to 1.0)
VECTOR_WEIGHT=0.5
KEYWORD_WEIGHT=0.3
GRAPH_WEIGHT=0.2

# Batch a search's queries into one entity-extraction + one embedding call
# (instead of one each per query) to cut round-trips during deep research.
ENABLE_BATCHED_QUERY_EXTRACTION=true
```
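Since the three weights must sum to 1.0, a pre-flight check can catch a typo before the backend starts. A minimal bash sketch using awk for the floating-point sum with a small rounding tolerance; `check_hybrid_weights` is hypothetical and reads the variable names shown above, falling back to the documented defaults:

```bash
#!/usr/bin/env bash
# Sketch: verify VECTOR_WEIGHT + KEYWORD_WEIGHT + GRAPH_WEIGHT == 1.0
# (within +/- 0.001 to absorb floating-point rounding).
check_hybrid_weights() {
  awk -v v="${VECTOR_WEIGHT:-0.5}" -v k="${KEYWORD_WEIGHT:-0.3}" -v g="${GRAPH_WEIGHT:-0.2}" 'BEGIN {
    sum = v + k + g
    if (sum < 0.999 || sum > 1.001) {
      printf "hybrid weights sum to %.3f, expected 1.0\n", sum > "/dev/stderr"
      exit 1
    }
    print "ok"
  }'
}

check_hybrid_weights
```

Run it after loading `.env` into the shell (for example with `set -a; . ./.env; set +a`) so it checks your actual values.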

### Re-ranking

```bash
# Enable cross-encoder re-ranking for precision
ENABLE_RERANKING=true
RERANKING_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2

# Lifecycle (per-instance footprint tuning)
RERANKER_PRELOAD=false          # eager-load at startup; off = lazy, leaner idle instances
RERANKER_IDLE_TTL_SECONDS=1800  # unload idle model to reclaim ~780 MB; 0 = never unload
```

The local cross-encoder pulls ~780 MB into the process. It is lazy-loaded by default; the first reranked query's load is hidden behind the preceding LLM/search work, and the model unloads after `RERANKER_IDLE_TTL_SECONDS` of inactivity.

### Shared Model Services (cortex-helper)

Offload the cross-encoder and Docling converter to a service hosted once per physical machine (the `cortex-helper` repo), so many tenant stacks on that host don't each load their own copy. Falls back to the built-in local path automatically when unset or unreachable.

```bash
RERANKER_SERVICE_URL=http://cortex-helper:3030   # set = no local cross-encoder loaded
DOCLING_SERVICE_URL=http://cortex-helper:3030    # set = convert via warm service, not subprocess
HELPER_SERVICE_TOKEN=                             # shared secret -> X-Helper-Token (match helper's HELPER_TOKEN)
```

### Agentic RAG

```bash
# Enable multi-step reasoning
ENABLE_AGENTIC_RAG=true

# Maximum reasoning steps (legacy pipeline)
MAX_AGENTIC_STEPS=3

# Conversation history length
MAX_CONVERSATION_HISTORY=6
```

### Agent-Based Research Pipeline

The agent pipeline replaces the legacy fixed-step agentic RAG with an LLM-driven
researcher/writer architecture. The researcher agent uses function-calling to
iteratively gather information via tools (`knowledge_search`, `community_search`,
`entity_lookup`, `reasoning`), then the writer synthesizes a streamed answer.

**LLM requirement:** Your model must support **function calling / tool use** (the OpenAI
`tools` parameter). Compatible models include GPT-4o, GPT-4o-mini, Claude, Mistral Large,
and Command R+. Many smaller or local models behind LiteLLM do not support this.
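A quick way to confirm tool-use support is to send one Chat Completions request with a `tools` array and check whether the reply contains `tool_calls`. The bash sketch below assumes `OPENAI_API_BASE`, `OPENAI_API_KEY`, and `OPENAI_MODEL` are exported in your shell and that the provider implements the OpenAI Chat Completions API. The `knowledge_search` schema is a made-up probe, not the researcher's real tool definition. Some providers reject `"tool_choice": "required"`; if so, retry with `"auto"` before concluding the model lacks tool support:

```bash
#!/usr/bin/env bash
# Sketch: ask the model to call a dummy tool and inspect the response.
probe_tool_calling() {
  curl -sS "${OPENAI_API_BASE:-https://api.openai.com/v1}/chat/completions" \
    -H "Authorization: Bearer ${OPENAI_API_KEY}" \
    -H "Content-Type: application/json" \
    -d @- <<EOF
{
  "model": "${OPENAI_MODEL}",
  "messages": [{"role": "user", "content": "Search the knowledge base for 'neo4j'."}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "knowledge_search",
      "description": "Search the knowledge base",
      "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"]
      }
    }
  }],
  "tool_choice": "required"
}
EOF
}

# Reads a Chat Completions response on stdin; succeeds if it contains a tool_calls array.
has_tool_calls() {
  grep -q '"tool_calls"[[:space:]]*:[[:space:]]*\['
}

# Usage: probe_tool_calling | has_tool_calls && echo "tool use OK" || echo "no tool calls"
```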

**When to disable (`ENABLE_AGENT_RESEARCH=false`):**

- Your LLM does not support function calling (e.g., local models via Ollama/vLLM without tool-use support)
- You want lower token usage — the agent pipeline uses 3-5x more tokens due to multiple researcher iterations
- You want lower latency — the legacy pipeline makes 2 LLM calls vs 4-8 for the agent
- You want deterministic behavior — the legacy pipeline follows a fixed decompose → search → synthesize path, while the agent decides dynamically what to search and when to stop

```bash
# Use agent pipeline for deep research mode (set false for legacy fixed pipeline)
ENABLE_AGENT_RESEARCH=true

# Use agent pipeline for standard chat mode (required for skills in chat)
ENABLE_AGENT_CHAT=true

# Max agent loop iterations per mode
# RESEARCHER_MAX_ITERATIONS_SPEED=2      # Chat mode (speed)
# RESEARCHER_MAX_ITERATIONS_QUALITY=10   # Deep research mode (quality)

# Max output tokens for the writer per mode
# WRITER_MAX_TOKENS_SPEED=1200           # Chat mode
# WRITER_MAX_TOKENS_QUALITY=4000         # Deep research mode
```

### Reasoning Visibility

```bash
# Stream reasoning steps to client
STREAM_REASONING_STEPS=true

# Show retrieval statistics in response
SHOW_RETRIEVAL_STATS=true
```

---

## Community Detection

```bash
# Enable community detection
ENABLE_COMMUNITY_DETECTION=true

# Minimum entities per community
MIN_COMMUNITY_SIZE=3

# Maximum communities to detect
MAX_COMMUNITIES=50

# Enable automatic summarization
ENABLE_GRAPH_SUMMARIZATION=true

# Model for summaries (defaults to GRAPH_EXTRACTION_MODEL → OPENAI_MODEL)
# Community summarization uses the extraction model for consistent structured output
# COMMUNITY_SUMMARY_MODEL=gpt-4o-mini
```
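
To confirm which model community summaries will actually use, you can reproduce the fallback order from the comment above with nested shell defaults. This is a sketch, not the backend's resolution code. It assumes an empty value counts as unset, and it uses the documented `OPENAI_MODEL` default (`gpt-4o-mini`) as the last fallback.

```bash
# Hypothetical helper: print the model community summaries would use,
# following COMMUNITY_SUMMARY_MODEL -> GRAPH_EXTRACTION_MODEL -> OPENAI_MODEL.
# ":-" treats an empty value the same as an unset one.
resolve_summary_model() {
  echo "${COMMUNITY_SUMMARY_MODEL:-${GRAPH_EXTRACTION_MODEL:-${OPENAI_MODEL:-gpt-4o-mini}}}"
}

# Usage: load your .env into the shell first, then
#   resolve_summary_model
```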

---

## Collections

```bash
# Enable collection feature
ENABLE_COLLECTIONS=true

# Default collection for uncategorized documents
DEFAULT_COLLECTION=default
```

---

## Git Integration

Connect GitHub, GitLab, and Gitea repositories as a knowledge source and (optionally) let the agent open pull requests. See the [Git Integration feature guide](/features/git-integration) for the full walkthrough.

```bash
# Master switch (connector, endpoints, scheduler, agent git_repo tool)
ENABLE_GIT_INTEGRATION=true

# Where clone working copies are cached (mount a volume in production)
GIT_WORK_DIR=/data/git_repos

# Shallow-clone depth
GIT_CLONE_DEPTH=1

# Abort a sync above this repo size (0 = unlimited)
GIT_MAX_REPO_SIZE_MB=500

# Skip individual files larger than this (0 = no limit)
GIT_SYNC_MAX_FILE_SIZE_MB=5

# Minutes between scheduled-sync checks
GIT_SYNC_POLL_INTERVAL=5

# Timeout (seconds) for git provider REST calls
GIT_HTTP_TIMEOUT=30

# Comma-separated hosts allowed to skip TLS verification (e.g. self-hosted servers with self-signed certificates)
GIT_HTTP_INSECURE_HOSTS=
```

The backend image bundles the `git` binary. Connections are created and managed from **Settings → Git Integration** (admin only); the personal access token is stored server-side and never exposed to the agent.
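
Before connecting a large repository, you can roughly estimate whether it will exceed `GIT_MAX_REPO_SIZE_MB` by shallow-cloning it yourself and measuring the working copy. This is only an approximation. How Cortex measures repository size is not documented here, and the function name and the `OWNER/REPO` URL are placeholders.

```bash
# Return 0 (true) if DIR is larger than LIMIT_MB megabytes.
# A limit of 0 means "unlimited", mirroring GIT_MAX_REPO_SIZE_MB=0.
exceeds_repo_limit() {
  local dir="$1" limit_mb="$2" size_mb
  if [ "$limit_mb" -eq 0 ]; then
    return 1
  fi
  size_mb=$(du -sm "$dir" | cut -f1)   # on-disk size in MB, including .git
  [ "$size_mb" -gt "$limit_mb" ]
}

# Example (network access required):
#   git clone --depth "${GIT_CLONE_DEPTH:-1}" https://github.com/OWNER/REPO.git /tmp/repo-check
#   exceeds_repo_limit /tmp/repo-check "${GIT_MAX_REPO_SIZE_MB:-500}" && echo "too large"
```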

---

## Vision Model Configuration

Configure image analysis capabilities:

```bash
# Vision model for image analysis (optional)
# If not set, uses Docling's built-in SmolDocling for image descriptions
VISION_MODEL=gpt-4o

# API configuration (falls back to OPENAI settings if not set)
VISION_MODEL_API_BASE=https://api.openai.com/v1
VISION_MODEL_API_KEY=sk-your-api-key-here

# Concurrency for vision API calls (default: 3)
# VISION_MAX_CONCURRENT=10
```

### Vision Model Options

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `VISION_MODEL` | No | - | Vision model for image analysis (e.g., `gpt-4o`, `claude-3-5-sonnet`) |
| `VISION_MODEL_API_BASE` | No | `OPENAI_API_BASE` | API endpoint for vision model |
| `VISION_MODEL_API_KEY` | No | `OPENAI_API_KEY` | API key for vision model |
| `VISION_MAX_CONCURRENT` | No | `3` | Max concurrent vision API calls system-wide. Increase for faster image-heavy document processing |
| `VISION_REASONING_MODE` | No | `off` | Reasoning mode for the vision-model call. Same value set as the ingestion modes — see [Reasoning Control for ingestion](#reasoning-control-for-ingestion). Lets you use reasoning multimodal models (Qwen3-VL, GLM-V) as `VISION_MODEL` without `<think>` tokens in image descriptions |

When no vision model is configured, Docling's built-in image description (SmolDocling) is used automatically. See [Image Analysis Guide](/guides/image-analysis) for details.

---

## Security

### Deployment hardening & CORS

```bash
ENVIRONMENT=production            # fail fast on weak/default secrets at startup
CORS_ALLOWED_ORIGINS=https://app.example.com,https://admin.example.com
```

With `ENVIRONMENT=production`, startup refuses to boot if `NEO4J_PASSWORD` is empty or the default `password123`, or if `SESSION_SECRET` is shorter than 32 characters while `ADMIN_PASSWORD` is set. `CORS_ALLOWED_ORIGINS` defaults to `*` (any origin, credentials disabled since auth is header-based); set an explicit comma-separated allowlist for production.
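
The production check aborts startup, so it helps to run the same tests before you set the flag. The following is a hedged shell sketch of the rules stated above, not the backend's own validation code. The function name is hypothetical, and the usage assumes your `.env` can be sourced by the shell (values containing spaces must be quoted).

```bash
# Hypothetical preflight mirroring the documented ENVIRONMENT=production checks.
check_production_secrets() {
  local status=0
  local neo4j_password="${NEO4J_PASSWORD:-}"
  local session_secret="${SESSION_SECRET:-}"
  if [ -z "$neo4j_password" ] || [ "$neo4j_password" = "password123" ]; then
    echo "NEO4J_PASSWORD is empty or the default 'password123'" >&2
    status=1
  fi
  if [ -n "${ADMIN_PASSWORD:-}" ] && [ "${#session_secret}" -lt 32 ]; then
    echo "SESSION_SECRET must be at least 32 characters when ADMIN_PASSWORD is set" >&2
    status=1
  fi
  return "$status"
}

# Usage:
#   set -a; . ./.env; set +a
#   check_production_secrets && echo "safe to set ENVIRONMENT=production"
```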

### Prompt Security

```bash
# Enable prompt injection protection
PROMPT_SECURITY=true
```

### API Key Authentication

See [Authentication Guide](/guides/authentication) for details.

---

## Compute3 Turbo Mode (ON HOLD)

> ⚠️ **Not currently available.** The Compute3 partnership was prepared in 2025, but their service is not yet in production. The variables below remain in the codebase in anticipation of future activation and have no runtime effect today. It is safe to leave all of them unset.

GPU-accelerated inference for faster processing:

```bash
# Enable Compute3 integration
COMPUTE3_API_KEY=your-c3-api-key-here
COMPUTE3_API_BASE=https://api.compute3.ai

# GPU configuration
COMPUTE3_GPU_TYPE=h100
COMPUTE3_GPU_COUNT=4

# Model to use on Compute3
COMPUTE3_MODEL=MiniMaxAI/MiniMax-M2.1

# Docker image for inference
COMPUTE3_DOCKER_IMAGE=vllm/vllm-openai:latest

# Default runtime in seconds
COMPUTE3_DEFAULT_RUNTIME=3600
```

---

## Frontend Configuration

```bash
# Backend API URL (for client-side requests)
NEXT_PUBLIC_API_URL=http://localhost:8000

# Custom Logo URL (optional)
# If set, the logo will be replaced with the image from this URL
# NEXT_PUBLIC_LOGO_URL=https://example.com/logo.png

# Custom Accent Color (optional)
# If set, overrides the default accent color in the UI
# Accepts any valid CSS color value (hex, rgb, hsl, oklch, etc.)
# Examples:
#   NEXT_PUBLIC_ACCENT_COLOR=#3b82f6
#   NEXT_PUBLIC_ACCENT_COLOR=rgb(59, 130, 246)
#   NEXT_PUBLIC_ACCENT_COLOR=oklch(0.65 0.2 250)
# NEXT_PUBLIC_ACCENT_COLOR=
```

---

## Environment Variable Summary

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `NEO4J_URI` | Yes | - | Neo4j connection URI |
| `NEO4J_USER` | Yes | - | Neo4j username |
| `NEO4J_PASSWORD` | Yes | - | Neo4j password |
| `OPENAI_API_KEY` | Yes | - | OpenAI API key |
| `OPENAI_MODEL` | No | `gpt-4o-mini` | LLM model |
| `OPENAI_MAX_OUTPUT_TOKENS` | No | `8000` | Floor of the output-token budget chain |
| `OPENAI_MAX_CONTEXT` | No | `32768` | Floor of the input-context budget chain |
| `EMBEDDING_MODEL` | No | `openai/text-embedding-3-small` | Embedding model |
| `EMBEDDING_DIMENSION` | No | `1536` | Embedding dimensions |
| `EMBEDDING_SEND_DIMENSIONS` | No | `true` | Send `dimensions` param to embedding API. Set `false` for fixed-dim models |
| `EMBEDDING_API_BASE` | No | `OPENAI_API_BASE` | API base URL for embeddings |
| `EMBEDDING_API_KEY` | No | `OPENAI_API_KEY` | API key for embeddings |
| `CHUNK_SIZE` | No | `500` | Tokens per chunk |
| `CHUNK_OVERLAP` | No | `50` | Overlap tokens |
| `ENABLE_GRAPH_EXTRACTION` | No | `true` | Enable GraphRAG |
| `ENABLE_SEMANTIC_ENTITY_RESOLUTION` | No | `true` | Use embedding-based vector similarity for entity dedup (falls back to Levenshtein) |
| `GRAPH_EXTRACTION_MODEL` | No | `OPENAI_MODEL` | Model for entity extraction and community summarization |
| `GRAPH_EXTRACTION_API_BASE` | No | `OPENAI_API_BASE` | API base for extraction model |
| `GRAPH_EXTRACTION_API_KEY` | No | `OPENAI_API_KEY` | API key for extraction model |
| `RELATIONSHIP_EXTRACTION_MODEL` | No | `GRAPH_EXTRACTION_MODEL` | Model for relationship extraction |
| `RELATIONSHIP_EXTRACTION_API_BASE` | No | `GRAPH_EXTRACTION_API_BASE` | API base for relationship model |
| `RELATIONSHIP_EXTRACTION_API_KEY` | No | `GRAPH_EXTRACTION_API_KEY` | API key for relationship model |
| `CONCURRENT_RELATIONS` | No | `3` | Concurrent per-chunk relationship extractions per document |
| `GRAPH_EXTRACTION_MAX_CONTEXT` | No | `0` (=inherit) | Input context for entity extraction batching. Inherits `OPENAI_MAX_CONTEXT`. Renamed from `EXTRACTION_MAX_CONTEXT` (deprecated alias still honored — startup WARN if used) |
| `EXTRACTION_MAX_OUTPUT_TOKENS` | No | `0` (=inherit) | Output budget for entity extraction. Inherits `OPENAI_MAX_OUTPUT_TOKENS` |
| `RELATIONSHIP_MAX_CONTEXT` | No | `0` (=inherit) | Input context for Phase 2 batch. Inherits `GRAPH_EXTRACTION_MAX_CONTEXT` → primary |
| `RELATIONSHIP_MAX_OUTPUT_TOKENS` | No | `0` (=inherit) | Output budget for **per-chunk + candidate scan** (was Phase 2 batch in old releases — see migration note). Inherits `EXTRACTION_MAX_OUTPUT_TOKENS` |
| `RELATIONSHIP_BATCH_MAX_OUTPUT_TOKENS` | No | `16000` | Output budget for **Phase 2 batch** (standalone, NOT in chain) |
| `VISION_MAX_OUTPUT_TOKENS` | No | `0` (=inherit) | Output budget for image analysis. Inherits `RELATIONSHIP_MAX_OUTPUT_TOKENS` → extraction → primary |
| `PARALLEL_RELATIONSHIP_BATCHES` | No | `5` | Parallel relationship analysis batches |
| `RELATIONSHIP_TARGET_RATIO` | No | `1.0` | Target entity-to-relationship ratio (ERR) |
| `RELATIONSHIP_MAX_ROUNDS` | No | `3` | Max discovery rounds per batch |
| `RELATIONSHIP_MAX_HOURS` | No | `0` | Max hours for analysis (0 = no limit) |
| `RELATIONSHIP_MAX_PER_ENTITY` | No | `50` | Soft cap on relationships per entity (0 = no cap) |
| `EXTRACTION_REASONING_MODE` | No | `off` | Reasoning mode for extraction/summary/community calls; the default `off` forces reasoning off. Values: `off\|minimal\|auto\|low\|medium\|high` |
| `RELATIONSHIP_REASONING_MODE` | No | `off` | Reasoning mode for candidate scan + relationship extraction; the default `off` forces reasoning off. Same values as above |
| `VISION_REASONING_MODE` | No | `off` | Reasoning mode for the vision model's image-description call; the default `off` forces reasoning off. Same values as above |
| `DEFAULT_REASONING_MODE` | No | `auto` | Reasoning mode for Q&A path (researcher agent stays AUTO to preserve parallel tool calls) |
| `REASONING_MODEL_OVERRIDES` | No | empty | Per-model override. Format: `model1:mode1,model2:mode2`. Example: `gpt-5.8:none,custom:minimal` |
| `ENABLE_HYBRID_SEARCH` | No | `true` | Enable hybrid search |
| `ENABLE_RERANKING` | No | `true` | Enable re-ranking |
| `RERANKER_PRELOAD` | No | `false` | Eager-load cross-encoder at startup |
| `RERANKER_IDLE_TTL_SECONDS` | No | `1800` | Unload idle reranker after N s (0 = never) |
| `RERANKER_SERVICE_URL` | No | - | Offload reranking to cortex-helper |
| `DOCLING_SERVICE_URL` | No | - | Offload Docling conversion to cortex-helper |
| `HELPER_SERVICE_TOKEN` | No | - | Shared secret for the helper service |
| `ENVIRONMENT` | No | `development` | `production` = fail fast on weak secrets |
| `CORS_ALLOWED_ORIGINS` | No | `*` | Comma-separated CORS allowlist |
| `ENABLE_AGENTIC_RAG` | No | `true` | Enable agentic mode |
| `ENABLE_AGENT_RESEARCH` | No | `true` | Use agent pipeline for deep research (vs legacy) |
| `ENABLE_AGENT_CHAT` | No | `true` | Use agent pipeline for standard chat (required for skills) |
| `RESEARCHER_MAX_ITERATIONS_SPEED` | No | `2` | Agent loop iterations for chat mode |
| `RESEARCHER_MAX_ITERATIONS_QUALITY` | No | `10` | Agent loop iterations for deep research |
| `WRITER_MAX_TOKENS_SPEED` | No | `1200` | Max output tokens for chat answers |
| `WRITER_MAX_TOKENS_QUALITY` | No | `4000` | Max output tokens for deep research answers |
| `ENABLE_COMMUNITY_DETECTION` | No | `true` | Enable communities |
| `PROMPT_SECURITY` | No | `true` | Enable prompt guard |
| `VISION_MODEL` | No | - | Vision model for image analysis |
| `VISION_MODEL_API_BASE` | No | `OPENAI_API_BASE` | Vision model API endpoint |
| `VISION_MODEL_API_KEY` | No | `OPENAI_API_KEY` | Vision model API key |
| `VISION_MAX_CONCURRENT` | No | `3` | Max concurrent vision API calls |
| `ADMIN_EMAIL` | Yes | - | Admin login email |
| `ADMIN_PASSWORD` | Yes | - | Admin login password |
| `ADMIN_API_KEY` | Yes | - | Admin API key |
| `SESSION_SECRET` | Yes | - | JWT session secret |
| `ENCRYPTION_KEY` | No | - | At-rest encryption for git PATs + skill secrets (comma-separated Fernet keys; first encrypts, all decrypt) |
| `NEXT_PUBLIC_LOGO_URL` | No | - | Custom logo URL |
| `NEXT_PUBLIC_ACCENT_COLOR` | No | - | Custom UI accent color (CSS) |
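
The `0 (=inherit)` entries above form fallback chains. As a hedged illustration, the sketch below resolves two of them in the way the table describes. The first value that is set and non-zero wins, and the primary `OPENAI_*` value (with its documented default) is treated as the last fallback. The function names are hypothetical, and the deprecated `EXTRACTION_MAX_CONTEXT` alias is not modeled.

```bash
# Print the first argument that is set and non-zero; "" or "0" means "inherit".
first_nonzero() {
  local v
  for v in "$@"; do
    if [ -n "$v" ] && [ "$v" != "0" ]; then
      echo "$v"
      return 0
    fi
  done
  return 1
}

# VISION_MAX_OUTPUT_TOKENS -> RELATIONSHIP_MAX_OUTPUT_TOKENS -> EXTRACTION_MAX_OUTPUT_TOKENS -> OPENAI_MAX_OUTPUT_TOKENS
effective_vision_output_tokens() {
  first_nonzero "${VISION_MAX_OUTPUT_TOKENS:-0}" \
    "${RELATIONSHIP_MAX_OUTPUT_TOKENS:-0}" \
    "${EXTRACTION_MAX_OUTPUT_TOKENS:-0}" \
    "${OPENAI_MAX_OUTPUT_TOKENS:-8000}"
}

# RELATIONSHIP_MAX_CONTEXT -> GRAPH_EXTRACTION_MAX_CONTEXT -> OPENAI_MAX_CONTEXT
effective_relationship_context() {
  first_nonzero "${RELATIONSHIP_MAX_CONTEXT:-0}" \
    "${GRAPH_EXTRACTION_MAX_CONTEXT:-0}" \
    "${OPENAI_MAX_CONTEXT:-32768}"
}
```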

---

## Validation

Verify your configuration is working:

```bash
# Check API health
curl http://localhost:8000/health

# Check stats (requires API key)
curl -H "X-API-Key: your-api-key" http://localhost:8000/api/stats
```
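
Right after `docker compose up -d`, the backend may need some time before `/health` answers. A small polling loop avoids racing it. The function name, attempt count, and delay below are illustrative defaults, not Cortex settings.

```bash
# Poll a health URL until it returns a 2xx status or the attempts run out.
wait_for_health() {
  local url="${1:-http://localhost:8000/health}" attempts="${2:-30}" delay="${3:-2}" i
  for ((i = 1; i <= attempts; i++)); do
    # -f makes curl fail on HTTP errors; --max-time bounds each attempt
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "no healthy response from $url after $attempts attempt(s)" >&2
  return 1
}

# Usage: wait_for_health && curl -H "X-API-Key: your-api-key" http://localhost:8000/api/stats
```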

## Efficiency Flags (v-next)

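All flags default to **off** except `RESEARCHER_STABLE_PROMPT`, which is on by default (`RELATIONSHIP_CHUNKS_PER_CALL` is a tuning value for `ENABLE_BATCHED_CHUNK_RELATIONSHIPS`, not a flag). Enable them per stack after an A/B bench run (`bench/BASELINE.md`). None of them change API shapes or graph semantics.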

| Variable | Default | Description |
|----------|---------|-------------|
| `ENTITY_DEDUP_PREFILTER` | `false` | Levenshtein entity dedup scores only the top-50 fulltext-index candidates instead of scanning every entity. |
| `ENABLE_BATCHED_KG_WRITES` | `false` | Write entities/links/relationships via UNWIND batches (~10 Neo4j round trips per document instead of hundreds), preserving per-item dedup semantics. |
| `ENABLE_BATCHED_CHUNK_RELATIONSHIPS` | `false` | Pack several chunks into one per-chunk relationship-extraction LLM call. |
| `RELATIONSHIP_CHUNKS_PER_CALL` | `4` | Max chunks per batched relationship-extraction call. |
| `ENABLE_PHASEB_CHECKPOINTING` | `false` | Persist Phase B batch progress — crash/redeploy resumes; rounds 2+ reuse round 1's candidate scan. |
| `ENABLE_REPROCESS_DELTA` | `false` | Skip reprocessing when file bytes + extraction config are unchanged since the last successful run. |
| `RESEARCHER_STABLE_PROMPT` | `true` | Keep the researcher system prompt byte-stable across loop iterations (provider prefix caches hit from iteration 2). |
| `ENABLE_PROMPT_CACHE_CONTROL` | `false` | Anthropic `cache_control` breakpoints when routed via OpenRouter to `anthropic/*` models. |

## Observability, Limits & Resilience

| Variable | Default | Description |
|----------|---------|-------------|
| `LOG_FORMAT` | `plain` | `plain` keeps the legacy log format; `json` emits structured lines carrying a `request_id`, read from the incoming `X-Request-ID` header and echoed back as `X-Request-ID`. |
| `METRICS_ENABLED` | `true` | Prometheus metrics at `GET /metrics` (admin API key required; not exposed through the prod nginx). |
| `RATE_LIMIT_QPM` | `0` | Per-API-key requests/minute on ask/upload endpoints (0 = off). 429 + `Retry-After` on excess. |
| `RATE_LIMIT_BURST` | `10` | Token-bucket burst capacity for `RATE_LIMIT_QPM`. |
| `RESEARCHER_WALL_CLOCK_SECONDS` | `0` | Wall-clock budget for the researcher loop (0 = unlimited); on expiry the writer synthesizes from gathered results. |
| `RERANK_TOP_K` | `15` | Candidates kept/reranked per knowledge search. |
| `HELPER_STRICT_REMOTE` | `false` | With `DOCLING_SERVICE_URL` set: conversion failure marks the document failed instead of falling back to local docling. |
| `INSTANCE_ID` | hostname | Stack identity sent to cortex-helper (`X-Tenant-ID`) for fair queuing. |
| `NEO4J_MAX_POOL_SIZE` | `100` | Neo4j driver connection pool size. |
| `NEO4J_CONNECTION_TIMEOUT` | `10` | Neo4j TCP connect timeout (seconds). |
| `NEO4J_CONNECTION_ACQUISITION_TIMEOUT` | `60` | Max wait for a pooled connection (seconds). |

Compose-level settings: `NEO4J_MEM_LIMIT` (default `4g`), `NEO4J_HEAP_INITIAL` / `NEO4J_HEAP_MAX` / `NEO4J_PAGECACHE`, and `FRONTEND_MEM_LIMIT` (default `1g`). Backups: add the `docker-compose.backup.yml` overlay and set `BACKUP_INTERVAL_SECONDS` (default `86400`), `BACKUP_RETENTION_DAYS` (default `7`), and `NEO4J_ENTERPRISE_BACKUP` (default `false`).

**Slim image**: build with `--build-arg INSTALL_LOCAL_ML=false` for a torch-free backend (~1.2 GB) when reranking + conversion are offloaded to cortex-helper. Requires OpenAI embeddings; pair with `HELPER_STRICT_REMOTE=true`.
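
Here is a minimal sketch of a slim-image deployment. It assumes that `Dockerfile.prod` sits in the build context directory and that the helper is reachable as `cortex-helper` on port 8000. The path, image tag, host, and port are all hypothetical, so substitute your own. The environment values combine the offload variables documented above. The embedding model shown is the documented default, and it must remain an OpenAI model on a slim image.

```bash
# Build the torch-free backend (run where Dockerfile.prod lives; tag is a placeholder)
docker build -f Dockerfile.prod --build-arg INSTALL_LOCAL_ML=false -t cortex-backend:slim .

# .env: offload reranking and conversion to cortex-helper
# (host and port below are hypothetical)
RERANKER_SERVICE_URL=http://cortex-helper:8000
DOCLING_SERVICE_URL=http://cortex-helper:8000
# Must match the helper's HELPER_TOKEN
HELPER_SERVICE_TOKEN=change-me-shared-helper-token
# Fail documents instead of falling back to local docling
HELPER_STRICT_REMOTE=true
# Slim images require OpenAI embeddings
EMBEDDING_MODEL=openai/text-embedding-3-small
```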


---

## Document: Changelog

A chronological log of all notable changes, improvements, and fixes to Cortex.

URL: /changelog


# Changelog

All notable changes to Cortex are documented here, organized by date.

---

## June 10, 2026 (v-next efficiency & hardening)

### LLM-stack efficiency (flag-gated, default off)

A set of opt-in flags cuts the LLM and database cost of knowledge-graph building without changing any API or output semantics. Each flag ships **default-off** and is meant to be enabled after an A/B run against your bench baseline (`bench/BASELINE.md`):

- **`ENABLE_BATCHED_KG_WRITES`** — entities, chunk links, and relationships are written in UNWIND batches: a handful of Neo4j round trips per document instead of one per item (hundreds→~10 on extraction-heavy docs), while preserving the exact dedup/merge semantics of the per-item path (locked by parity tests).
- **`ENABLE_BATCHED_CHUNK_RELATIONSHIPS`** (+ `RELATIONSHIP_CHUNKS_PER_CALL`, default 4) — per-chunk relationship extraction packs several chunks into one LLM call (roughly 4× fewer calls at the default batch size), with a per-batch safety net that re-dispatches through the single-chunk path on parse failure.
- **`ENABLE_PHASEB_CHECKPOINTING`** — cross-document relationship analysis persists per-batch progress: a crash or redeploy resumes instead of re-paying every batch, and multi-round runs reuse round 1's candidate scan (~50% fewer Phase 1 calls).
- **`ENABLE_REPROCESS_DELTA`** — reprocessing a document whose file bytes and extraction config are unchanged is skipped entirely; git re-syncs of unchanged files cost ~zero.
- **`ENTITY_DEDUP_PREFILTER`** — entity dedup scores the top-50 fulltext candidates instead of scanning every entity (big win on 10k+ entity graphs).
- **Prompt caching**: the researcher system prompt is now byte-stable across loop iterations (`RESEARCHER_STABLE_PROMPT`, default on — provider prefix caches hit from iteration 2), and `ENABLE_PROMPT_CACHE_CONTROL` adds Anthropic `cache_control` breakpoints when routed via OpenRouter. Skill catalog/instructions are TTL-cached per process instead of being reloaded (Neo4j + filesystem + decryption) on every request.

### Production hardening

- **Resilient helper transport**: calls to the shared `cortex-helper` now retry with backoff behind a circuit breaker; `HELPER_STRICT_REMOTE=true` keeps docling out of tenant containers even on helper outages. The helper itself gained per-tenant fair queuing (`CONVERT_PER_TENANT`, `CONVERT_QUEUE_TIMEOUT` → 503 + Retry-After) and a shutdown drain.
- **Graceful shutdown**: rolling restarts drain in-flight requests (uvicorn `--timeout-graceful-shutdown` + compose `stop_grace_period`); SSE streams receive a terminal `event: shutdown` frame so clients can reconnect instead of seeing a dead socket.
- **Observability**: optional JSON logging with `X-Request-ID` correlation end-to-end (chat → backend → helper), and an admin-protected Prometheus `GET /metrics` endpoint (HTTP latency/route, SSE streams, document outcomes, conversion timing, helper breaker state). `LOG_FORMAT=plain` (default) keeps logs byte-identical.
- **Per-key rate limiting** (`RATE_LIMIT_QPM`, default off) on ask/upload endpoints — a burst guardrail beneath the monthly quota, returning 429 + `Retry-After`.
- **Memory caps everywhere**: Neo4j, frontend, and nginx now carry compose `mem_limit`s (tunable via `NEO4J_MEM_LIMIT` etc.) so one stack's blowup can't OOM a neighbor; the Neo4j driver pool is configurable; nginx gained a dedicated unbuffered SSE location with a 1-hour read timeout.
- **Slim backend image**: `Dockerfile.prod --build-arg INSTALL_LOCAL_ML=false` builds a torch-free image (~1.2 GB vs ~7 GB dev image) for stacks that fully offload to cortex-helper — the biggest per-tenant footprint lever yet. CI smoke-builds it on every PR.
- **Backups**: an opt-in compose overlay (`docker-compose.backup.yml`) runs nightly online graph exports (via APOC, which works on Neo4j Community Edition) and file-volume archives with retention, plus a documented restore runbook (`ops/backup/backup.sh`).
- Fixes: skill HTTP calls now honor `SKILL_HTTP_TIMEOUT` (was hardcoded 15s); vector-index health is verified at startup and silent semantic-dedup degradation is surfaced in `/api/stats` (`vector_search_failures`); inbound conversation-memory blobs are size-clamped.

**No API or schema breakage** — all changes are additive or flag-gated; existing `.env` files and deployments keep working unchanged.

---

## June 10, 2026

### Leaner instances: lazy models + a shared per-machine model service

Backend startup memory dropped from ~2.2 GB to ~1 GB by deferring heavy model loads. The cross-encoder reranker no longer loads at startup (`RERANKER_PRELOAD=false` by default) — its ~7 s cold start is now hidden behind the query-analysis and search work that precedes reranking, and the model unloads after an idle period (`RERANKER_IDLE_TTL_SECONDS`, default 1800; `0` = never) to reclaim ~1 GB. Docling is no longer imported at startup (it ran in a subprocess anyway), saving ~244 MB per instance.

**New `cortex-helper` service.** For deployments running many isolated stacks on one host, the cross-encoder reranker and Docling converter can now be hosted **once per physical machine** and shared by all stacks via `RERANKER_SERVICE_URL` / `DOCLING_SERVICE_URL` / `HELPER_SERVICE_TOKEN`. Models stay warm (Docling conversions skip the per-document model reload — ~0.04 s vs ~4.5 s), and both clients fall back to the built-in local path automatically if the service is unset or unreachable. Single-instance and on-prem users are unaffected.

### Production hardening

- **CORS** is now allowlist-driven via `CORS_ALLOWED_ORIGINS` (default `*` allows any origin but with credentials disabled, since auth is header-based). Set an explicit allowlist for production.
- **Fail-fast secrets**: `ENVIRONMENT=production` refuses to boot on weak/default secrets — an empty/`password123` `NEO4J_PASSWORD`, or a `SESSION_SECRET` under 32 characters when `ADMIN_PASSWORD` is set.
- **CI**: a GitHub Actions workflow now runs backend tests + lint and frontend type-check + lint on every PR.

### Upgrade notes

**No database migration and no API changes** — endpoints, request/response shapes, and the Neo4j schema are unchanged, so existing API-key consumers and cortex-chat are unaffected. Rebuild the backend image and redeploy; the defaults preserve current behavior closely. A few things to decide per deployment:

- **Reranker is now lazy + idle-unloading.** Previously it was warmed at startup and stayed resident; now the first reranked query after a restart (or after an idle period) pays a ~7 s in-memory load (no re-download — the model is baked into the image). To restore always-warm behavior, set `RERANKER_PRELOAD=true` and `RERANKER_IDLE_TTL_SECONDS=0`. Recommended for latency-sensitive single-tenant deployments; leave the new defaults for multi-tenant density.
- **CORS default disables credentials with the wildcard.** Harmless because auth is header-based (`X-API-Key`), and browsers reject wildcard-with-credentials anyway. If a browser client genuinely needs credentialed CORS, set an explicit `CORS_ALLOWED_ORIGINS` allowlist (which re-enables credentials).
- **`ENVIRONMENT=production` fail-fast is opt-in.** Deployments that don't set it are unaffected. But once set, startup **aborts** if `NEO4J_PASSWORD` is empty/`password123` or `SESSION_SECRET` is under 32 characters (with `ADMIN_PASSWORD` set) — fix those secrets *before* enabling the flag.
- **Adopting the shared `cortex-helper` service is optional and fails safe.** Leave `RERANKER_SERVICE_URL` / `DOCLING_SERVICE_URL` unset to keep the built-in local path; when set, both clients fall back to local automatically if the service is unreachable. Ensure the stacks can reach the helper (a shared external Docker network for permanence), that `HELPER_SERVICE_TOKEN` matches the helper's `HELPER_TOKEN`, and that the helper's `RERANKING_MODEL` matches expectations.
- **Memory:** with full offload, backend idle drops to ~1 GB and no longer spikes — `BACKEND_MEM_LIMIT` can be lowered. Without offload (or docling-only), a local rerank still spikes to ~2 GB, so keep `mem_limit` at ≥ ~2.5 GB; budget the helper's own ~0.6–4 GB once per host.

---

## June 8, 2026

### Snappier chat: live status, heartbeats, and a Deep Research toggle

The chat no longer feels stuck in the seconds before the first token. The streaming endpoint now emits structured `status` events at each pipeline stage (`analyzing → searching → reranking → generating`, in every mode including standard Chat) and SSE heartbeats during silent windows. The chat UI replaces the static "Thinking…" dots with a **live indicator** — a blinking accent dot, the current stage label, a running elapsed-seconds counter, and a reassurance line after 12 s. Gated by `STREAM_REASONING_STEPS`.

**Deep Research is now an in-chat toggle.** Chat is the default; an Erlenmeyer-flask button flips on Deep Research (multi-step depth) at any time — even mid-conversation — instead of being a separate destination.

### Conversation Memory — a multi-bucket context curator for the agent

The research agent gains an opt-in, **client-carried conversation memory** that replaces blunt history truncation (the agent previously kept only the last 6 messages, silently forgetting older turns). The backend stays stateless: send an opaque `conversation_memory` blob on the request, read the updated one back from a new `memory_update` SSE event, and replay it next turn. Omit the field and nothing changes — it's fully backward-compatible.

A new **context curator** rebuilds a small, fixed context each turn from several memory buckets instead of re-feeding ever-growing history:

- **transcript** — recent turns verbatim + a rolling summary of older ones (no more amnesia past a few turns).
- **facts / open_questions / intent** — durable knowledge, unresolved threads, and the user's goal + preferred answer style/language, extracted automatically.
- **source_ledger** — every cited source gets a conversation-stable `sid` (now on each `sources` event), so a citation in turn 2 keeps its identity in turn 5.
- **kg_context** — a knowledge-graph snapshot for grounding follow-ups.

**Faster, cheaper follow-ups.** A memory fast-path lets questions answerable from memory alone ("summarize that", "why?", "in German") **skip retrieval entirely** — a single lightweight pass instead of the full research loop. Memory updates (summary folding + bucket extraction) run *after* the answer streams using a cheap fast model, so they add no user-visible latency. See [Ask AI → Conversation Memory](/features/ask-ai#conversation-memory). Tunable via `CONVERSATION_MEMORY_*` and `ENABLE_MEMORY_FAST_PATH`.
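
A client that opts in has to capture the updated blob from the `memory_update` SSE event and send it back as `conversation_memory` on the next request. The parser below sketches the capture half for a shell client. It assumes that the event's payload arrives on a single `data:` line, since multi-line SSE data is not joined here. It does not show the request itself; see the Ask AI documentation linked above for that. The function name is hypothetical.

```bash
# Read an SSE stream on stdin and print the payload of the last memory_update event.
last_memory_update() {
  local line event="" latest=""
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line%$'\r'}"                       # tolerate CRLF line endings
    case "$line" in
      "event:"*) event="${line#event:}"; event="${event# }" ;;
      "data:"*)
        if [ "$event" = "memory_update" ]; then
          latest="${line#data:}"; latest="${latest# }"
        fi ;;
      "") event="" ;;                          # blank line ends the event
    esac
  done
  [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Usage: pipe the streamed response (e.g. from curl -N) into it and keep the
# result for the next request's conversation_memory field:
#   ... | last_memory_update > memory.json
```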

### Skill API failures are now surfaced instead of silently swallowed

When the research agent calls an external API through a skill's `http_request` tool and the call fails — an error status (`401`, `403`, `422`, `5xx`) or a timeout — that failure is now visible to you rather than glossed over.

Previously a failed call (for example, a Zammad ticket creation returning `403 Forbidden`) was only fed back to the model internally: nothing in the UI distinguished it from a success, and the assistant could narrate the action as if it had completed. Now:

- **A red skill step** appears in the chat's research process showing the failing method, URL, and HTTP status (e.g. `API call failed: POST https://.../tickets → HTTP 403`).
- **The final answer explicitly reports the failure** — it states the attempted action did not succeed and explains why, using the API's error message, and will not imply the action worked. This is driven by a dedicated "Failed Actions" instruction passed to the answer writer, so it holds even for small/fast models.

This is most important for **write actions** (creating tickets, opening PRs, posting data). Common causes of a failed call: a missing or expired token (configure it in the skill's Setup Wizard), insufficient permissions on the token's account, or a payload the target API rejects as invalid. See [Agent Skills](/features/skills#the-http_request-tool).

---

## June 3, 2026

### Git integration — connect GitHub, GitLab & Gitea repositories as a living knowledge source

Cortex can now connect directly to git repositories and keep their knowledge in sync. This is a major capability expansion: a connected repo becomes a **first-class, bidirectional interface** — Cortex ingests the repo's files and wiki into the knowledge graph (read), and the research agent can act on the repo by opening pull requests (read/write). Works with **GitHub, GitLab, and Gitea** (including self-hosted) behind a single provider abstraction. Enable with `ENABLE_GIT_INTEGRATION=true`.

**Two surfaces, one connection.** Each connection carries an `access_level`:

- **Read** → ingestion. Cortex clones the repo and feeds matching files (and optionally the wiki) through the normal pipeline — chunking, embedding, entity/relationship extraction — so repo content is searchable and shows up in the graph alongside your other documents.
- **Read/write** → the agent gains a `git_repo` tool. It can read live file contents and **propose changes as pull requests**. Writes *always* land on a fresh `cortex/agent-*` branch and open a PR/MR for human review — never a direct push to the default branch. Write actions are rejected server-side on read-only connections, so the toggle is a hard guarantee.

**Incremental, versioning-aware sync.** Re-syncing doesn't re-ingest everything. Cortex stores the last-synced commit and runs `git diff --name-status -M` to classify each change — **Added** (new document), **Modified** (re-extract in place), **Deleted** (flag the document for review, never auto-delete), **Renamed** (remap the path). If history was rewritten (force-push) or the filters changed, it self-heals by falling back to a full-tree reconcile that compares every file's blob SHA to what's stored. Documents carry git provenance (`git_connection_id`, `git_path`, `git_blob_sha`, `git_commit_sha`) so the sync key is the repo path, not a filename. After a sync that changed anything, the graph is flagged stale so you know to re-run relationship analysis and community detection.
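
The classification step maps directly onto `git diff --name-status -M` output. The shell function below is a sketch and is not Cortex's `git_connector_service.py`. It turns each status line into the action described above, and it labels copies, type changes, and other statuses as `other`. Paths that git quotes because of unusual characters are not unquoted. The function and variable names are hypothetical.

```bash
# Map `git diff --name-status -M OLD NEW` lines to sync actions.
# Rename lines look like "R097<TAB>old/path<TAB>new/path".
classify_changes() {
  local status path1 path2
  while IFS=$'\t' read -r status path1 path2; do
    case "$status" in
      A)  echo "add $path1" ;;                    # new document
      M)  echo "reextract $path1" ;;              # re-extract in place
      D)  echo "flag-for-review $path1" ;;        # never auto-delete
      R*) echo "remap $path1 -> $path2" ;;        # keep the document, update its path
      "") ;;
      *)  echo "other $status $path1" ;;
    esac
  done
}

# Usage: git -C "$repo_dir" diff --name-status -M "$last_synced_commit" HEAD | classify_changes
```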

**Curated "documents only" default.** New connections default to ingesting **`.pdf` and `.md` files only** — a single checkbox in the connect form, checked by default. Uncheck it to reveal custom include/exclude globs (gitignore-style). Code files (`.py`, `.ts`, `.go`, …) and markdown ingest through a new Docling-free **fast path** (running Docling on source code is wasteful); PDFs and Office documents route through Docling as usual. Images and audio are deliberately excluded from repo sync so a connection never OCRs every logo.

**Manual + scheduled.** Sync on demand with a "Sync now" button, or set a per-connection interval and a background poller keeps the repo current. No webhooks required (no public endpoint needed). GitHub wikis are cloned via `repo.wiki.git`; GitLab/Gitea wikis use their Wikis API.

**Setup UX.** A new **Git Integration** card on the Settings page walks you through it: pick a provider, paste a personal access token, and an inline, vendor-specific guide tells you exactly which token to create (always recommending the least-privilege option) with a direct link to the right settings page. "Test" verifies the token, then you pick the repo, access level, and (optionally) advanced filters. Existing connections are fully editable — access level, branch, schedule, globs, wiki, and token rotation.

**Credentials.** Single-tenant, personal-access-token per connection. The token is injected server-side into every git and API call — the agent/LLM never sees it — masked in all API responses, scrubbed from logs and errors, and never persisted in `.git/config`.

**Bonus correctness fix (applies to all documents).** While building the sync's re-extraction path we fixed a pre-existing bug: relationships carry a `source_document_id` but previously survived reprocess/delete unless an endpoint entity became fully orphaned, leaving stale `RELATES_TO` edges. A new `delete_relationships_by_source_document` now runs in `delete_document_chunks` and `delete_document`, so reprocessing or deleting **any** document (uploads and custom inputs included) cleans up its relationships.

New env vars: `ENABLE_GIT_INTEGRATION`, `GIT_WORK_DIR`, `GIT_CLONE_DEPTH`, `GIT_MAX_REPO_SIZE_MB`, `GIT_SYNC_MAX_FILE_SIZE_MB`, `GIT_SYNC_POLL_INTERVAL`, `GIT_HTTP_TIMEOUT`, `GIT_HTTP_INSECURE_HOSTS`. The backend image now bundles `git`; `pathspec` was added for glob matching. New endpoints under `/api/integrations/git/*` (admin-gated). New backend: `services/git_providers/` (GitHub/GitLab/Gitea adapters) and `services/git_connector_service.py`; new frontend: `components/admin/GitIntegrations.tsx`; new docs: `handbook/22-git-integration.md`, `documentation/pages/features/git-integration.mdx`, and `.claude/domain/git-integration.md`.

### Knowledge Graph wording: "relations" and "cross-document relations"

The Knowledge Graph page (and docs) now use simpler terminology: Step 1 counts are labeled **relations** (formerly "within-document relationships") and Step 2 counts are labeled **cross-document relations** (formerly "cross-document relationships"). All of these are relations — only the cross-document ones need the qualifier. No behavior change; `per_chunk_relationship_count` and other API fields are unchanged.

---

## June 1, 2026

### Library export AND import stream to/from disk instead of buffering — fixes OOM crash on large instances

Exporting a large instance (~1,300 documents / 22K entities / 49K relationships, all carrying embedding vectors) crashed the backend container with no logs and left the subsequent redeploy failing. Root cause was memory, not storage: the export built a full Python list of every serialized record and then did `"\n".join(lines)`, holding the entire payload **twice** in RAM. Chunks *and* entities both carry embedding vectors, so the two heaviest sections alone could hit 1–2 GB transient — enough to trip the kernel OOM killer, which sent an uncatchable SIGKILL (hence no logs) and took Neo4j and the redeploy down with it.

**Export** now streams: a new `_write_ndjson` helper writes each NDJSON entry one JSON line at a time via `zf.open(name, "w", force_zip64=True)`, and the embedding-heavy sections pull 500-row batches from Neo4j and stream them straight into the zip. New batched query methods in `neo4j_service.py`: `export_entity_count`/`export_all_entities_batched` (ORDER BY `e.name`) and `export_relationship_count`/`export_all_entity_relationships_batched` (ORDER BY `elementId(r)` for stable SKIP/LIMIT pagination).

**Import** got the mirror treatment so restoring a large archive doesn't OOM the *target* instance: `_iter_ndjson` reads each heavy entry through an `io.TextIOWrapper` one line at a time, `_iter_ndjson_batches` feeds the existing `import_*_batch` inserts (chunks, entities, chunk mentions), and relationships stream one-at-a-time. The pre-import plan-limit guards (`MAX_FILES`, `MAX_ENTITIES`) now use `_count_ndjson` — which counts lines without parsing or buffering — instead of loading the whole entities-with-embeddings file into RAM just to call `len()`. Progress totals are read from `manifest.stats`.

Peak RAM on both sides is now ~one 500-record batch regardless of corpus size. The archive format, endpoints, and on-disk output are unchanged — existing exports/imports remain fully compatible. Touched `backend/app/services/{library_transfer_service.py, neo4j_service.py}` and `.claude/domain/admin-features.md`.

---

## May 22, 2026

### Image downscaling + JPEG recompression before vision API call

PDF pages are rendered at 2× DPI for OCR-grade text legibility (`images_scale=2.0`), so a typical page becomes a ~2400×1700 PIL image. Encoded as PNG with our original `_pil_to_data_url`, that's several MB of base64 — and some hosted vision deployments (LiteLLM-wrapped vLLM, custom OpenAI-compatible endpoints) tokenize the base64 data-URL payload as *text* instead of using tile-based image accounting. One customer instance hit **184K input tokens against a 192K context cap** on a single image as a result.

Two new env knobs, read by `_pil_to_data_url` in `backend/app/services/vision_analyzer.py`:

- `VISION_MAX_IMAGE_SIDE` (default `1568`) — caps the longer side via `Image.thumbnail` with Lanczos resampling, preserves aspect ratio, original PIL untouched. 1568 matches Claude's recommended max side: high enough for OCR text legibility, low enough to keep JPEG payloads under ~700 KB. Set 0 to disable.
- `VISION_JPEG_QUALITY` (default `85`) — JPEG quality for opaque content. Images with alpha (mode `RGBA`) still go out as PNG; everything else uses JPEG with `optimize=True` for 5–10× smaller payloads at visually-near-lossless quality.

Surfaced in the admin Settings page (LLM Configuration → Vision Model) and documented in `.env.example`, `.claude/environment.md`, and `handbook/14-image-analysis.md`.
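
The sizing and format rules can be sketched as a pure function that needs no Pillow, which makes them easy to test. In the real code, Pillow's `Image.thumbnail(...)` does the resize with Lanczos resampling. `Image.save(..., format="JPEG", quality=..., optimize=True)` does the encode. `plan_vision_payload` and its return shape are hypothetical, and Pillow's own rounding may differ by a pixel.

```python
from typing import Dict, Tuple

# Hypothetical helper: mirrors the VISION_MAX_IMAGE_SIDE / VISION_JPEG_QUALITY rules.


def plan_vision_payload(width: int, height: int, mode: str,
                        max_side: int = 1568,
                        jpeg_quality: int = 85) -> Tuple[int, int, str, Dict]:
    """Return (new_width, new_height, image_format, save_kwargs)."""
    longest = max(width, height)
    if max_side and longest > max_side:
        # Scale the longer side down to max_side, keeping the aspect ratio.
        scale = max_side / longest
        new_w = max(1, round(width * scale))
        new_h = max(1, round(height * scale))
    else:
        # Disabled (0) or already small enough: never upscale.
        new_w, new_h = width, height
    if mode == "RGBA":
        return new_w, new_h, "PNG", {}
    # Opaque content goes out as JPEG; non-RGB modes would need
    # convert("RGB") before saving.
    return new_w, new_h, "JPEG", {"quality": jpeg_quality, "optimize": True}
```

For the ~2400×1700 page render described above, this yields a 1568×1111 JPEG.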

### Venice perf-tuning numbers recalibrated to verifiably-stable defaults

The previously documented Venice tuning block (`BATCH_PROCESSING_CONCURRENCY=5`, `CONCURRENT_EXTRACTIONS=10`, `CONCURRENT_RELATIONS=5`, `VISION_MAX_CONCURRENT=5`) was too aggressive — a 52-doc ingestion run logged **1012 × HTTP 429 "Too Many Requests"** against Venice's embeddings endpoint (tripping their 20-failure / 30-second penalty), and **34 entity-dedup batches fell back to Levenshtein** because the rate-limit cascade killed embedding calls used for semantic resolution. Recalibrated to:

```env
BATCH_PROCESSING_CONCURRENCY=3
CONCURRENT_EXTRACTIONS=4
CONCURRENT_RELATIONS=4
VISION_MAX_CONCURRENT=4
```

A subsequent 52-doc batch on the new values processed clean: **0 × 429s, 0 × Levenshtein fallbacks, 0 × event-loop block warnings, 52/52 succeeded**. Updated in `.env.recommended`, `.env.example`, `README.md`, `.claude/environment.md`, `documentation/pages/{configuration.mdx, quickstart.mdx}`, `handbook/{03-getting-started.md, 04-configuration.md}` — eight locations total. The old values remain reasonable on bigger self-hosted vLLM endpoints where you control the rate-limit policy; they were just too hot for Venice's gateway.
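
The changelog does not spell out how these caps bound in-flight requests. A common pattern, assumed here, is one `asyncio.Semaphore` per knob. `run_bounded` is hypothetical, and it also reports the peak concurrency it observed. If the document-level and per-document knobs nest (also an assumption), the worst case is roughly their product: 3 × 4 = 12 concurrent extraction calls on the new values, versus 5 × 10 = 50 on the old ones.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Tuple


async def run_bounded(jobs: List[Callable[[], Awaitable[Any]]],
                      limit: int) -> Tuple[List[Any], int]:
    """Run coroutine factories with at most `limit` in flight at once.

    Returns (results in input order, peak observed concurrency).
    """
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def guarded(job: Callable[[], Awaitable[Any]]) -> Any:
        nonlocal in_flight, peak
        async with sem:  # waits here once `limit` calls are already running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await job()
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(j) for j in jobs))
    return list(results), peak
```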

### Recommended stack switches to DeepSeek-V4-Flash + Qwen3-Embedding-8B native context

The bench-validated primary moves from MiniMax-M2.7 (196K window, inlined `<think>` issues on Venice for relationship-tier calls) to DeepSeek-V4-Flash (1M window). The recommended block in `.env.recommended`, `.env.example`, `README.md`, `.claude/environment.md`, `documentation/pages/{configuration.mdx, quickstart.mdx}`, and `handbook/{03-getting-started.md, 04-configuration.md}` now reads `OPENAI_MODEL=deepseek-v4-flash` / `OPENAI_MAX_CONTEXT=1000000`.

### `EMBEDDING_MAX_INPUT_TOKENS` env var + post-chunker token-cap sub-splitter

New env knob `EMBEDDING_MAX_INPUT_TOKENS` (default `8192`) — matches the cap Venice and OpenAI enforce at the API gateway (8192 regardless of the underlying model's native context). Two-layer protection now sits between the chunker and the embed call:

1. **Sub-splitter** — `_enforce_embed_token_cap` in `backend/app/services/document_processor.py` walks the chunker output after URL restoration. Any chunk exceeding `EMBEDDING_MAX_INPUT_TOKENS × ~2.8 chars/token` gets recursively split (paragraph → line → sentence → space → hard char slice) into safe pieces with the parent's `meta` preserved. Zero content loss; new pieces slot into `chunk_index` via the existing `enumerate` loop. Catches Docling-extracted tables and custom-input pastes without sentence boundaries — the exact inputs that previously emitted single oversize chunks.
2. **Truncation safety net** — `_truncate_for_embedding` (~2.8 chars/token, deliberately conservative for markdown/code/CJK) runs immediately before each `OpenAIDocumentEmbedder.run()` call as belt-and-suspenders. With the sub-splitter in front, it should fire essentially never on normal docs.

Together they stop the `HTTP 400 "Input text exceeds the maximum token limit of N tokens"` (OpenAI-direct) and `litellm.ContextWindowExceededError` (proxied) failures that silently dropped embeddings. Self-hosted vLLM users running Qwen3-Embedding-8B can lift the cap to 32768 to use the model's full native context; managed-provider users should leave the default.
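
Below is a simplified sketch of the two layers, assuming chunks are dicts with `content` and `meta` keys. `split_to_cap`, `enforce_cap`, and `truncate_for_embedding` are hypothetical stand-ins for `_enforce_embed_token_cap` and `_truncate_for_embedding`. The separator order follows the paragraph → line → sentence → space → hard-slice cascade described above, and the 2.8 chars/token ratio is the one quoted.

```python
from typing import Dict, List, Sequence

CHARS_PER_TOKEN = 2.8  # conservative ratio quoted above
# Coarsest to finest: paragraph, line, sentence, space (then hard slice).
SEPARATORS = ("\n\n", "\n", ". ", " ")


def split_to_cap(text: str, max_chars: int,
                 seps: Sequence[str] = SEPARATORS) -> List[str]:
    """Recursively split text into pieces of at most max_chars; nothing is dropped."""
    if len(text) <= max_chars:
        return [text]
    for i, sep in enumerate(seps):
        parts = text.split(sep)
        if len(parts) == 1:
            continue  # separator absent; try the next, finer one
        pieces, current = [], ""
        for j, part in enumerate(parts):
            # Re-attach the separator so "".join(pieces) == text.
            chunk = part + (sep if j < len(parts) - 1 else "")
            if current and len(current) + len(chunk) > max_chars:
                pieces.append(current)
                current = ""
            current += chunk
        if current:
            pieces.append(current)
        out: List[str] = []
        for piece in pieces:
            # An oversize piece is a single part, so only finer separators can split it.
            out.extend(split_to_cap(piece, max_chars, seps[i + 1:]))
        return out
    # No separator left: hard character slice.
    return [text[k:k + max_chars] for k in range(0, len(text), max_chars)]


def enforce_cap(chunks: List[Dict], max_tokens: int) -> List[Dict]:
    """Split oversize chunks; every sub-piece keeps a copy of its parent's meta."""
    max_chars = int(max_tokens * CHARS_PER_TOKEN)
    return [
        {"content": piece, "meta": dict(chunk["meta"])}
        for chunk in chunks
        for piece in split_to_cap(chunk["content"], max_chars)
    ]


def truncate_for_embedding(text: str, max_tokens: int) -> str:
    """Last-resort safety net applied right before the embed call."""
    return text[: int(max_tokens * CHARS_PER_TOKEN)]
```

With the default cap, the character budget is `int(8192 * 2.8)` = 22,937 characters per piece.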

### `VISION_MIN_IMAGE_SIDE` skips sub-threshold images before the vision call

Default `64`. PDFs expose ~10–40 px bullets, icons, and dingbats as Docling `PictureItem`s; Venice (and most hosted vision APIs) reject any image with a side under ~64 px as `HTTP 400 "Supplied image did not pass validation checks"`. Direct API probing confirmed the cutoff: 31×31, 32×32, 48×48, 56×56 all fail; 64×64 passes. Cortex now short-circuits in `vision_analyzer.analyze_image_with_vision_model` when `min(width, height) < VISION_MIN_IMAGE_SIDE` — saves API spend, eliminates the 3-retry cascade per tiny image, and removes the noisy fallback storage entries from the graph. Set 0 to disable. Format (PNG vs JPEG) was *not* the cause — both behave identically.
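
The guard itself is a single comparison. The hypothetical helpers below assume the env var is parsed as an integer, with `0` meaning disabled.

```python
import os


def vision_min_side() -> int:
    """Read VISION_MIN_IMAGE_SIDE (default 64); 0 disables the guard."""
    return int(os.environ.get("VISION_MIN_IMAGE_SIDE", "64"))


def should_skip_image(width: int, height: int, min_side: int) -> bool:
    """True when the shorter side is under min_side, so no vision call is made."""
    return min_side > 0 and min(width, height) < min_side
```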

### Stats endpoint no longer multiplies file_size by chunk count

`GET /api/stats` was reporting `total_size` ~70× larger than reality (23 GB shown for a 320 MB corpus). Root cause in `neo4j_service.py:get_stats`: the unscoped query did `MATCH (d:Document) OPTIONAL MATCH (d)-[:HAS_CHUNK]->(c:Chunk) WITH … sum(d.file_size)`, which fanned out to one row per chunk, so each document's `file_size` was added once for every chunk it had. Fixed by hoisting the `total_size` aggregation before the chunk join. The scoped (per-collection) variant got a matching defensive `WITH DISTINCT d` fix in case documents appear in multiple collections.
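
Here is a toy reproduction of the fan-out in plain Python, using made-up sizes and chunk counts. The buggy sum weights each document's size by its row count. The inflation factor is therefore roughly the size-weighted average chunk count. That suggests the reported 23 GB against 320 MB (about 72×) reflects documents averaging around 70 chunks, weighted by size.

```python
# Toy corpus (made-up numbers): file sizes in bytes and chunk counts.
docs = {
    "a.pdf": {"file_size": 200, "chunks": 3},
    "b.pdf": {"file_size": 100, "chunks": 2},
    "c.pdf": {"file_size": 50, "chunks": 0},  # not chunked yet
}

# OPTIONAL MATCH (d)-[:HAS_CHUNK]->(c) yields one row per (document, chunk)
# pair, and a single null-chunk row for a document with no chunks.
rows = [
    (name, d["file_size"])
    for name, d in docs.items()
    for _ in range(max(1, d["chunks"]))
]

buggy_total = sum(size for _, size in rows)               # size counted once per row
fixed_total = sum(d["file_size"] for d in docs.values())  # aggregated before the join
print(buggy_total, fixed_total)  # 850 350
```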

### Image entities now use the same embedding-based dedup as text entities

Previously, entities extracted from image descriptions always went through fuzzy-only Levenshtein 85% deduplication, even when `ENABLE_SEMANTIC_ENTITY_RESOLUTION=true`. Text entities benefited from embedding-first dedup (catching "MOCA" ↔ "Museum of Crypto Art"); image entities did not. This left a cross-source dedup gap: a "MOCA" entity extracted from a chart caption couldn't merge with an existing "Museum of Crypto Art" text entity.

- `backend/app/services/neo4j_service.py:store_graph_extraction()` gains an optional `entity_embeddings` parameter. When provided **and** the semantic resolution flag is on, each entity routes through `store_entity_with_embedding()` (embedding-first vector match, Levenshtein fallback for typos). Otherwise it falls back to the existing fuzzy-only path — no behavior change for callers that don't pass embeddings.
- `backend/app/services/document_processor.py:process_single_image()` now batch-embeds each image's extracted entities via `generate_entity_embeddings_batch_async()` before storing, mirroring what the per-document text path has been doing. One `embeddings` call per image-with-entities (typically fewer than 10 names, very cheap). On embedding failure, the image still stores via Levenshtein with a logged warning; it never aborts the document.
- Image-derived entities now populate the same `entity_embedding` Neo4j vector index that text entities populate. Cross-source duplicates collapse at write time; the `/deduplicate` page's manual workflow becomes more effective because semantic matches happen during ingestion instead of being deferred. Verified end-to-end: a test document with 32 images stored 190 image entities, **100% carrying 4096-dim embeddings**, zero fallback warnings.
- Docs updated: `.claude/domain/entities.md`, `.claude/domain/document-pipeline.md`, `handbook/13-deduplication.md`, `handbook/14-image-analysis.md`, `documentation/pages/features/knowledge-graph.mdx`.
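
Below is a linear-scan sketch of embedding-first resolution with a Levenshtein fallback. The real path queries the `entity_embedding` Neo4j vector index rather than scanning a list. The 0.9 cosine threshold and the way Levenshtein distance is normalized into a similarity ratio are assumptions. The 85% fuzzy threshold comes from the text above, and `resolve` and its arguments are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

VEC_THRESHOLD = 0.9     # assumption: cosine cut-off for a semantic match
FUZZY_THRESHOLD = 0.85  # the "Levenshtein 85%" fallback


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def levenshtein(s: str, t: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


def name_similarity(s: str, t: str) -> float:
    s, t = s.lower(), t.lower()
    return 1 - levenshtein(s, t) / (max(len(s), len(t)) or 1)


def resolve(name: str, embedding: Optional[Sequence[float]],
            existing: List[Tuple[str, Optional[Sequence[float]]]]) -> Optional[str]:
    """Return the existing entity to merge into, or None to create a new one."""
    if embedding is not None:
        scored = [(cosine(embedding, e), n) for n, e in existing if e is not None]
        if scored:
            best_score, best_name = max(scored)
            if best_score >= VEC_THRESHOLD:
                return best_name
    # Embedding missing or no close vector: fall back to fuzzy name matching.
    for other, _ in existing:
        if name_similarity(name, other) >= FUZZY_THRESHOLD:
            return other
    return None
```

The last test case shows the gap the change closes. Without an embedding, "MOCA" cannot reach "Museum of Crypto Art" by edit distance alone.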

### One-click Generate Graph from /documents

Clicking **Generate Graph** on the Documents page previously navigated to `/extract` and then required a second click to actually start the pipeline. The button now appends `?autostart=1`; the Knowledge Graph page detects the param on mount, waits for its initial data fetch, fires `handleRegenerateGraph()` exactly once, and `router.replace`s the URL clean so a refresh won't re-fire.

- `frontend/src/components/DocumentList.tsx` — `router.push("/extract?autostart=1")` instead of plain `/extract`.
- `frontend/src/app/extract/page.tsx` — new `useEffect` reads `useSearchParams().get("autostart")`, guards via a `hasAutoStarted` ref, fires once when `loading === false && documents.length > 0 && !isRegenerating`.
- Trigger code stays in one place on the destination page; the documents page just emits the intent via query param. The destructive-action confirm dialog inside `handleRegenerateGraph` is preserved — auto-start does not bypass it when entities already exist.

---

## May 21, 2026

### Reasoning Suppression Control (Provider-Agnostic)

Modern reasoning models (GPT-5/5.1, Claude 4.x, Qwen3, DeepSeek-R1, GLM, Kimi, MiniMax M2) over-think on structured-extraction tasks, inline `<think>` tokens that corrupt XML output, and burn token budget on chain-of-thought that the parser never sees. A new provider-agnostic switch forces reasoning OFF (or any other level) on every LLM call inside the knowledge-graph pipeline while leaving the researcher/writer Q&A path on AUTO.

- New `backend/app/services/reasoning_config.py` — `ReasoningMode` enum, regex-based model-family classifier, and a per-backend dispatch table. Each backend gets the correct kwargs:
  - **OpenAI** → `reasoning_effort` (`none`/`minimal`/`low`/…)
  - **OpenRouter** → `extra_body.reasoning.effort`
  - **Venice** → `extra_body.venice_parameters.disable_thinking`
  - **Anthropic** → `extra_body.thinking={"type":"disabled"}` (omitted on Opus 4.7+ adaptive thinking)
  - **vLLM / Compute3** → `extra_body.chat_template_kwargs.enable_thinking=false`
- `safe_chat_completion` / `safe_chat_completion_sync` wrappers handle a runtime fallback: if a model 400s on the reasoning param, the wrapper retries without it and caches the `(base_url, model)` pair so future calls skip the param upfront — one wasted call per misclassified model, then nothing.
- All 14 LLM call sites in `graph_extractor.py` route through the wrappers — entity extraction, document summaries, community naming, entity enrichment, candidate scan, gleaning pass, per-chunk + Phase-2 relationship extraction.
- **Vision-model image analysis** uses the same switch via a `flatten_reasoning_body()` helper that converts OpenAI-SDK-style `extra_body` into a raw-HTTP body dict (vision uses `httpx`, not the OpenAI client). One-shot 400-fallback works the same way.
- New env vars: `EXTRACTION_REASONING_MODE` (default `off`), `RELATIONSHIP_REASONING_MODE` (default `off`), `VISION_REASONING_MODE` (default `off`), `DEFAULT_REASONING_MODE` (default `auto` — Q&A stays on provider default). Per-model escape hatch via `REASONING_MODEL_OVERRIDES=model1:mode1,model2:mode2`.
- Defaults are no-ops for pure-instruct models (Mistral, Llama, GPT-OSS) — zero behavior change for users not on reasoning models.
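
Below is a condensed sketch of the dispatch table plus the one-shot 400 fallback. The values of `ReasoningMode`, the layout of `OFF_KWARGS`, and the signature of `safe_chat_completion` are all assumptions. The provider kwargs mirror the list above, with two caveats. OpenAI's OFF value (`none` here) depends on the model, and the sketch omits OpenRouter and the model-family classifier. `call` stands in for an OpenAI-SDK-style `chat.completions.create`, and `BadRequest` stands in for the provider's HTTP 400 error.

```python
from enum import Enum
from typing import Any, Callable, Dict, Set, Tuple


class ReasoningMode(str, Enum):
    AUTO = "auto"  # leave the provider default alone
    OFF = "off"


# Kwargs that switch reasoning OFF per backend, following the list above.
# OpenAI's OFF value is an assumption ("none" vs "minimal" depends on the model).
OFF_KWARGS: Dict[str, Dict[str, Any]] = {
    "openai": {"reasoning_effort": "none"},
    "venice": {"extra_body": {"venice_parameters": {"disable_thinking": True}}},
    "anthropic": {"extra_body": {"thinking": {"type": "disabled"}}},
    "vllm": {"extra_body": {"chat_template_kwargs": {"enable_thinking": False}}},
}


class BadRequest(Exception):
    """Stand-in for a provider's HTTP 400 error."""


# (base_url, model) pairs that already rejected the reasoning param once.
_rejects_reasoning: Set[Tuple[str, str]] = set()


def safe_chat_completion(call: Callable[..., Any], base_url: str, model: str,
                         backend: str, mode: ReasoningMode, **kwargs: Any) -> Any:
    key = (base_url, model)
    extra: Dict[str, Any] = {}
    if mode == ReasoningMode.OFF and key not in _rejects_reasoning:
        extra = OFF_KWARGS.get(backend, {})
    if not extra:
        return call(model=model, **kwargs)
    try:
        return call(model=model, **{**kwargs, **extra})
    except BadRequest:
        # One wasted call, then this pair skips the param from now on.
        _rejects_reasoning.add(key)
        return call(model=model, **kwargs)
```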

### Token & Context Budget Inheritance Chain

The hardcoded `max_output_tokens=2000` default in per-chunk relationship extraction was collapsing Qwen3-family output (Run 03 stored 13 relationships across 77 candidate pairs because the model truncated mid-XML). Budgets are now configurable across the whole stack via a fallback chain so a 3-model deployment can be configured with just two env vars.

**Inheritance chain (output tokens):**

```
OPENAI_MAX_OUTPUT_TOKENS=8000      (new primary default — covers Qwen3 verbose XML)
     ↓ (when EXTRACTION_*=0)
EXTRACTION_MAX_OUTPUT_TOKENS
     ↓ (when RELATIONSHIP_*=0)
RELATIONSHIP_MAX_OUTPUT_TOKENS      ← per-chunk + candidate scan
     ↓ (when VISION_*=0)
VISION_MAX_OUTPUT_TOKENS

RELATIONSHIP_BATCH_MAX_OUTPUT_TOKENS=16000   (standalone — Phase 2 batch only, NOT in chain)
```

**Inheritance chain (input context):**

```
OPENAI_MAX_CONTEXT=32768
     ↓ (when EXTRACTION_*=0)
GRAPH_EXTRACTION_MAX_CONTEXT
     ↓ (when RELATIONSHIP_*=0)
RELATIONSHIP_MAX_CONTEXT
```

- `0` is the inherit sentinel for ints (consistent with `MAX_FILES=0` = unlimited convention).
- `EXTRACTION_MAX_CONTEXT` → **renamed to `GRAPH_EXTRACTION_MAX_CONTEXT`** for prefix consistency with `GRAPH_EXTRACTION_MODEL` / `GRAPH_EXTRACTION_API_BASE`. Legacy name honored as a deprecated alias for one release; a one-shot startup `WARN` from `app.config._warn_deprecated_env_aliases` nudges migration.
- `RELATIONSHIP_MAX_OUTPUT_TOKENS` **semantic split**: the env var now drives per-chunk + candidate-scan in the chain; the Phase 2 batch budget moved to the new standalone `RELATIONSHIP_BATCH_MAX_OUTPUT_TOKENS=16000`. Users who explicitly set `RELATIONSHIP_MAX_OUTPUT_TOKENS=16000` for Phase 2 should rename to the new var; the legacy value still works (per-chunk gets harmless 16000-token headroom).
- Entity-extraction call sites at `graph_extractor.py:821`, `1014`, `1840` now read `settings.extraction_max_output_tokens` instead of hardcoded `3000`/`8000`. Per-chunk relationship caller in `document_processor.py:1461` passes `settings.relationship_max_output_tokens`. Phase 2 batch caller switched to `settings.relationship_batch_max_output_tokens`. Vision call (`vision_analyzer.py:304`) reads `settings.vision_max_output_tokens`.
- Utility-tier hardcoded budgets stay hardcoded (50/300/500/1000 for community names, entity descriptions, query-time extraction) — those are task-tuned and would harm functionality if scaled with the primary.
- `backend/tests/test_budget_fallback.py` (17 tests) covers all chain combinations, mid-chain overrides, Phase 2 isolation, legacy alias loading, and the deprecation WARN trigger.
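
The `0`-means-inherit rule can be sketched as a fold over an ordered list of env var names. The defaults come from the diagrams above. The function names, the reader, and the warning text are hypothetical. `RELATIONSHIP_BATCH_MAX_OUTPUT_TOKENS` is deliberately absent because it sits outside the chain.

```python
import logging
from typing import Dict, List, Mapping

log = logging.getLogger("budget_chain_sketch")

OUTPUT_CHAIN = [
    "OPENAI_MAX_OUTPUT_TOKENS",
    "EXTRACTION_MAX_OUTPUT_TOKENS",
    "RELATIONSHIP_MAX_OUTPUT_TOKENS",  # per-chunk + candidate scan
    "VISION_MAX_OUTPUT_TOKENS",
]
CONTEXT_CHAIN = ["OPENAI_MAX_CONTEXT", "GRAPH_EXTRACTION_MAX_CONTEXT", "RELATIONSHIP_MAX_CONTEXT"]
DEFAULTS = {"OPENAI_MAX_OUTPUT_TOKENS": 8000, "OPENAI_MAX_CONTEXT": 32768}
LEGACY_ALIASES = {"GRAPH_EXTRACTION_MAX_CONTEXT": "EXTRACTION_MAX_CONTEXT"}


def read_int(name: str, env: Mapping[str, str]) -> int:
    raw = env.get(name)
    legacy = LEGACY_ALIASES.get(name)
    if raw is None and legacy is not None and legacy in env:
        log.warning("%s is deprecated; rename it to %s", legacy, name)
        raw = env[legacy]
    return int(raw) if raw else DEFAULTS.get(name, 0)


def resolve_chain(chain: List[str], env: Mapping[str, str]) -> Dict[str, int]:
    """Walk the chain in order; a value of 0 (or unset) inherits the previous link."""
    resolved: Dict[str, int] = {}
    current = 0
    for name in chain:
        value = read_int(name, env)
        if value != 0:
            current = value
        resolved[name] = current
    return resolved
```

A mid-chain override propagates downward only. Setting `RELATIONSHIP_MAX_OUTPUT_TOKENS` changes the relationship and vision budgets but leaves extraction on the primary's value.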

### Neo4j 5.26 Upgrade (4096-dim Vector Indexes)

Neo4j 5.15-community caps vector-index dimensions at 2048, blocking native use of Qwen3-Embedding-8B (4096) and other modern embedding models. Bumped to Neo4j 5.26 LTS across all compose files.

- `image: neo4j:5.26-community` in `docker-compose.yml`, `coolify/docker-compose.coolify.yml`, `dokploy/docker-compose.dokploy.yml`; `neo4j:5.26-enterprise` in `docker-compose.prod.yml`.
- In-place data file migration (within Neo4j 5.x the named volume `neo4j_data` upgrades automatically on first boot).
- Python `neo4j` driver pin (`>=5.17.0`) is already forward-compatible with the new server — no driver changes needed.
- APOC plugin auto-installs the matching APOC version via `NEO4J_PLUGINS=["apoc"]` — no separate APOC version bump required.
- **Existing deployments at 1536/2048-dim embeddings keep their corpora intact**: Cortex preserves the vector index when `EMBEDDING_DIMENSION` matches the stored one. Only bumping the dim triggers an auto drop-and-recreate of the index (requires re-embedding).
- New deployments can set `EMBEDDING_DIMENSION=4096` directly for native Qwen3-Embedding-8B fidelity.
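
Below is a sketch of the preserve-or-recreate decision, returning the Cypher it would run. The `CREATE VECTOR INDEX ... OPTIONS {indexConfig: ...}` and `DROP INDEX ... IF EXISTS` statements use standard Neo4j 5.x syntax. The `Entity` label, the `embedding` property, and cosine similarity are assumptions; only the `entity_embedding` index name appears elsewhere in this changelog. `plan_vector_index` itself is hypothetical.

```python
from typing import List, Optional


def plan_vector_index(existing_dim: Optional[int], configured_dim: int,
                      name: str = "entity_embedding", label: str = "Entity",
                      prop: str = "embedding") -> List[str]:
    """Return the Cypher statements needed to match EMBEDDING_DIMENSION.

    existing_dim is None when the index does not exist yet.
    """
    create = (
        f"CREATE VECTOR INDEX {name} IF NOT EXISTS "
        f"FOR (n:{label}) ON (n.{prop}) "
        f"OPTIONS {{indexConfig: {{`vector.dimensions`: {configured_dim}, "
        f"`vector.similarity_function`: 'cosine'}}}}"
    )
    if existing_dim == configured_dim:
        return []  # dimensions match: keep the index and the stored corpus
    if existing_dim is None:
        return [create]
    # Dimension changed: drop and recreate; stored vectors must be re-embedded.
    return [f"DROP INDEX {name} IF EXISTS", create]
```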

### Recommended Minimal Stack + `.env.recommended`

The model + budget inheritance chains compress a 3-model 12-env-var stack down to a 2-model 6-env-var stack. New `.env.recommended` template at the project root captures the bench-validated Venice setup:

```env
OPENAI_API_KEY=...
OPENAI_API_BASE=https://api.venice.ai/api/v1
OPENAI_MODEL=minimax-m27
OPENAI_MAX_CONTEXT=196608

GRAPH_EXTRACTION_MODEL=qwen3-6-27b
GRAPH_EXTRACTION_MAX_CONTEXT=256000

VISION_MODEL=qwen3-6-27b

EMBEDDING_MODEL=text-embedding-qwen3-8b
EMBEDDING_DIMENSION=4096

BATCH_PROCESSING_CONCURRENCY=5
CONCURRENT_EXTRACTIONS=10
CONCURRENT_RELATIONS=5
VISION_MAX_CONCURRENT=5
```

- The same canonical block now appears in `.env.example` (top-of-file Quick Start), `README.md` (Quick Setup), `.claude/environment.md`, `documentation/pages/{configuration.mdx, quickstart.mdx}`, and `handbook/{03-getting-started.md, 04-configuration.md}` — eight locations total.
- Companion **performance tuning (Venice-validated)** block documents the concurrency knobs and warns that `CONCURRENT_EXTRACTIONS` is the biggest multiplier to dial down on smaller providers.
- Relationship + (optionally) Vision sub-tier models inherit `GRAPH_EXTRACTION_MODEL` automatically; embedding API base/key inherit from `OPENAI_*` so no separate config is needed when Venice serves both.
- **Vision exception**: `VISION_MODEL` does NOT inherit — set it explicitly or Cortex falls back to Docling's built-in SmolVLM (no API call).

### Autonomous LLM-Stack Benchmark Harness

New `bench/` directory — a self-contained orchestrator that cycles a fixed dataset through arbitrary LLM-model combinations against the Cortex ingestion pipeline, measuring extraction yield, relationship density, and runtime per combo. Includes a model registry, automated `.env` backup/restore safety, and per-run heuristics. *Not yet publicly documented; CLI is internal-only for now.*

### Admin Settings Page Enhancements

The Settings page (`/admin`) previously didn't expose the new budget knobs. The **LLM Configuration** section now shows:

- **Primary Model**: added **Context Window** (`OPENAI_MAX_CONTEXT`) and **Output Tokens** (`OPENAI_MAX_OUTPUT_TOKENS`) rows
- **Extraction Model**: added **Output Tokens** (`EXTRACTION_MAX_OUTPUT_TOKENS`)
- **Relationship Model**: split into **Output Tokens (per-chunk)** and **Output Tokens (batch)** — reflecting the inheritance-chain + standalone split
- **Vision Model**: added **Output Tokens** (`VISION_MAX_OUTPUT_TOKENS`)

Each new field includes a tooltip explaining its inheritance path and where it sits in the budget chain. Backend `SystemConfigResponse` model + `/api/admin/config` endpoint extended with the corresponding fields.

### Pricing & Plan Limits

- New `MAX_ENTITIES` env var (default `0` = unlimited) caps the total entity count across the graph. Enforced at upload-time and custom-input creation: new ingestion is rejected once `get_stats()["entity_count"]` is at or above the cap. A single in-flight document can push the post-extraction count slightly above the cap (accepted tradeoff per PRICING.md §4.2).
- `MAX_QUERIES_PER_MONTH` instance-wide cap on chat-style queries — applies to `POST /api/search`, `POST /api/ask`, `POST /api/ask/stream`, `POST /api/ask/stream/thinking`. Returns `429 Too Many Requests` with a `Retry-After` header (seconds until next UTC month) when exceeded.
- Error messages on limit-rejection updated to indicate users should upgrade their plan rather than just stating the limit was hit.
- Comprehensive test coverage in `backend/tests/` for `MAX_FILES`, `MAX_ENTITIES`, `MAX_COLLECTIONS` enforcement across upload, custom input, collection creation, and library import endpoints.
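
The `Retry-After` value is the time remaining until the next UTC month boundary. The minimal sketch below assumes `0` means unlimited for both caps; the text above documents that only for `MAX_ENTITIES`. The function names are hypothetical.

```python
import math
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


def seconds_until_next_utc_month(now: datetime) -> int:
    """Seconds from a timezone-aware `now` until 00:00:00 UTC on the 1st of next month."""
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        boundary = datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        boundary = datetime(now.year, now.month + 1, 1, tzinfo=timezone.utc)
    return math.ceil((boundary - now).total_seconds())


def check_query_quota(used: int, cap: int,
                      now: datetime) -> Optional[Tuple[int, Dict[str, str]]]:
    """None if the query may run, else (status, headers) for the 429 response."""
    if cap > 0 and used >= cap:
        return 429, {"Retry-After": str(seconds_until_next_utc_month(now))}
    return None


def entity_cap_reached(entity_count: int, max_entities: int) -> bool:
    """MAX_ENTITIES=0 is unlimited; otherwise reject at or above the cap."""
    return max_entities > 0 and entity_count >= max_entities
```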

### Skill `http_request` Tool Improvements

- **Config variable substitution in request body**: `_substitute_variables()` previously ran on the URL only, so skills that referenced configured values in a JSON body (e.g. `"group": "ZAMMAD_GROUP_NAME"`) shipped the literal placeholder string instead of the substituted value. Now applied to URL + body.
- **Default `Content-Type` injection**: the tool previously only injected `Authorization` headers from skill config schemas. Skills `POST`/`PUT`-ing JSON bodies without explicitly setting `Content-Type` were getting `text/plain` and being rejected by strict APIs. Default is now `application/json` when a body is present, overridable per request.
- **Improved researcher-agent logging**: tool call inputs/outputs are now logged at INFO level with payload truncation for easier debugging of skill-driven research flows.
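
Below is a sketch of body-aware substitution and the `Content-Type` default. It assumes placeholders are bare config keys matched on word boundaries, as in the `ZAMMAD_GROUP_NAME` example. `substitute_variables` and `build_request` are hypothetical, the real `_substitute_variables()` may differ, and `ZAMMAD_URL` in the test is a made-up key.

```python
import json
import re
from typing import Any, Dict, Optional, Tuple


def substitute_variables(value: Any, config: Dict[str, str]) -> Any:
    """Replace bare config keys (e.g. ZAMMAD_GROUP_NAME) in strings, recursively."""
    if isinstance(value, str):
        for key, replacement in config.items():
            # \b keeps ZAMMAD_URL from matching inside ZAMMAD_URL_V2; the lambda
            # stops re.sub from interpreting backslashes in the replacement.
            value = re.sub(rf"\b{re.escape(key)}\b",
                           lambda _m, r=replacement: str(r), value)
        return value
    if isinstance(value, dict):
        return {k: substitute_variables(v, config) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute_variables(v, config) for v in value]
    return value


def build_request(method: str, url: str, body: Any, config: Dict[str, str],
                  headers: Optional[Dict[str, str]] = None
                  ) -> Tuple[str, str, Dict[str, str], Optional[str]]:
    headers = dict(headers or {})
    url = substitute_variables(url, config)
    data = None
    if body is not None:
        body = substitute_variables(body, config)
        # Default to JSON only when the skill did not set Content-Type itself.
        if not any(h.lower() == "content-type" for h in headers):
            headers["Content-Type"] = "application/json"
        data = body if isinstance(body, str) else json.dumps(body)
    return method.upper(), url, headers, data
```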

### Design System & UI Kit

- **MOCA Library UI Kit**: complete UI kit for the MOCA Library web app, including the `ManageScreen` for document management and the `Shell` component for top-level navigation chrome.
- **Design system preview files + tokens**: HTML previews for drop zone, glass surface, icons, input, logo, motion, nav pill, radii, shadows, spacing scale, stats card, and other primitives. Tokens published as the canonical reference for in-product surfaces and partner integrations.

### Minor Fixes

- **Restart-vs-recreate gotcha documented**: `docker compose restart` does NOT re-read `env_file:` (env vars are baked at container *create* time). Added a callout to `handbook/03-getting-started.md` and a dedicated section in `handbook/20-troubleshooting.md` pointing users to `docker compose up -d --force-recreate backend` after any `.env` change.
- Build resilience: logo download via `curl` no longer fails the frontend build when the Directus URL is unreachable — logs a warning and continues with the default logo (`81d99af`).
- Code structure cleanup across `.claude/architecture.md`, `.claude/design-system.md`, and several services (`172c1db`).

---

## April 8, 2026

### Collection-Based Auth System

A comprehensive security audit remediation that closes every gap in the collection-scoped API key system. Previously, restricted keys could silently bypass collection boundaries on graph, stats, community, and task endpoints. All endpoints now enforce collection scope consistently.

**Foundation fixes (`neo4j_service.py`)**

- Fixed `get_document()`, `get_document_content()`, and `get_documents_file_paths()` to include the `Collection` join — they were returning `collection_id: null`, causing every downstream collection access check to silently pass
- Added `allowed_collection_ids` parameter to all graph query methods, implementing the **4-hop scoping pattern** (`Collection→Document→Chunk→Entity`) throughout: `get_graph_visualization_data()`, `list_entities_paginated()`, `get_entity_relationships()`, `get_graph_subgraph()`, `get_entity_types()`, `get_relationship_types()`, `find_entities_by_name()`, `list_relationships_paginated()`, `suggest_duplicate_entities()`
- Added `allowed_collection_ids` to community methods using a **5-hop pattern** (`Community→Entity→Chunk→Document→Collection`): `list_communities_paginated()`, `get_community()`, `search_communities_by_content()`
- Updated `get_stats()` to accept `allowed_collection_ids` — restricted keys now see document/entity/community counts scoped to their allowed collections only

**Full endpoint authentication coverage (`main.py`)**

All endpoints now require an `X-API-Key` header. Previously unprotected endpoints now enforce appropriate permission levels:

- *Read* — `GET /api/tasks`, `GET /api/tasks/{id}`, `GET /api/tasks/{id}/result`, `GET /api/turbo/status`
- *Manage* — `POST /api/documents/{id}/reprocess` (+ per-document collection validation), `POST /api/documents/reprocess` (+ per-document collection check in loop), `POST /api/documents/process-pending`, `POST /api/cleanup/orphaned-entities`, `PATCH /api/graph/entity/{name}`, `POST /api/entities/merge`, `POST /api/graph/relationships/analyze` (+ `collection_id` validated if supplied), `DELETE /api/graph/relationships`, `DELETE /api/graph/entities`, `POST /api/graph/communities/detect` (+ `collection_id` validated if supplied), `DELETE /api/graph/communities/{id}`, `DELETE /api/graph/communities`, `POST /api/graph/communities/summarize`, `DELETE /api/tasks/{id}`, `POST /api/tasks/cleanup`, `POST /api/custom-input/generate-topic`
- *Admin* — `GET /api/turbo/balance`, `POST /api/turbo/start`, `POST /api/turbo/stop`, `POST /api/turbo/extend`, `GET /api/turbo/jobs`, `GET /api/turbo/jobs/{id}`, `GET /api/turbo/jobs/{id}/logs` (Turbo Mode manages billing-sensitive GPU resources — elevated from unprotected to admin-only)

**Collection scoping propagated to all read endpoints**

Restricted keys now receive automatically filtered results across every read endpoint — no endpoint leaks cross-collection data:

- `/api/stats` and `/api/graph/status` — counts scoped to allowed collections
- `/api/graph/entities`, `/api/graph/entity-types`, `/api/graph/visualization`, `/api/graph/entity/{name}`, `/api/graph/entity/{name}/relationships`, `/api/graph/search`, `/api/graph/subgraph` — all scoped via 4-hop pattern
- `/api/graph/relationships`, `/api/graph/relationship-types` — scoped via source/target entity 4-hop filter
- `/api/graph/communities`, `/api/graph/communities/{id}`, `/api/graph/communities/search` — scoped via 5-hop pattern
- `/api/entities/duplicates` — entity candidates scoped to allowed collections
- Researcher agent (`researcher_agent.py`) and query processor (`document_processor.py`) — `allowed_collection_ids` propagated through `run_research_pipeline()`, `graph_search_async()`, `_execute_knowledge_search()`, `hybrid_search_with_graph()`, and `semantic_search()` so Ask AI and Deep Research respect collection boundaries end-to-end

**Bug fixes**

- Removed dead `get_stats()` method (`neo4j_service.py`) that was shadowed by the new scoped version — the dead method returned a stripped-down dict missing `community_count`, `collection_count`, `pending_count` and other fields, making it a silent trap for future callers
- Fixed `reprocess_documents` and `reprocess_document` endpoints incorrectly using `collection_id or "default"` when checking collection access for documents without a collection assignment — `can_access_collection(None)` already returns `True` for restricted keys (global queries are allowed), so the fallback was incorrectly blocking access to uncollected documents
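
The access rule and the removed fallback can be contrasted in a few lines. This two-argument `can_access_collection` is a hypothetical simplification, since the real check is scoped to the API key. It assumes `allowed=None` marks an unrestricted key.

```python
from typing import Optional, Set


def can_access_collection(collection_id: Optional[str],
                          allowed: Optional[Set[str]]) -> bool:
    """allowed=None is an unrestricted key; collection_id=None is a global doc."""
    if allowed is None or collection_id is None:
        return True
    return collection_id in allowed


def old_reprocess_check(collection_id: Optional[str],
                        allowed: Optional[Set[str]]) -> bool:
    # The removed fallback: coercing None to "default" made an uncollected
    # document look like a member of a collection the key may not hold.
    return can_access_collection(collection_id or "default", allowed)
```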

---

## April 1, 2026

### Agent Skills: Setup Wizard, http_request Tool, and Reliable Skill Execution

- **Setup Wizard**: After installing a skill, the primary LLM analyzes the SKILL.md to extract required configuration variables (API tokens, URLs, etc.). A dynamic modal prompts the user for values — stored persistently in `config.json` in the skill directory. No more `.env` editing or container restarts to configure skills.
- **`http_request` built-in tool**: The researcher agent can now call external APIs described in skill instructions. The LLM provides only `method` and `url` — authentication headers are injected server-side from the config schema's `auth_header` template. The LLM never touches tokens or API keys.
- **Auto-activation**: Enabled skills are now automatically loaded at the start of every research/chat session. The full SKILL.md body (with auth-related lines stripped) is injected into the system prompt. Replaces the previous on-demand `activate_skill`/`list_skills` pattern.
- **Server-side auth injection**: Auth headers are built from the config schema's `auth_header` field (e.g. `Authorization: Bearer API_TOKEN`, `X-API-Key: API_KEY`) — works for any auth pattern, any skill.
- **Config API endpoints**: `POST /api/admin/skills/{id}/analyze` (LLM analysis), `GET /api/admin/skills/{id}/config` (schema + masked values), `PUT /api/admin/skills/{id}/config` (save values with mask preservation)
- **Config status**: Skills show "Needs setup" badge when required config is missing. "Configure" button in expanded skill details on Settings page.
- **Skills directory persistence**: Added `skills_data` Docker named volume to prod and Coolify compose files. Fixed path resolution bug (4→3 parent levels) that was writing skills to ephemeral container storage.
- **Smart API response truncation**: JSON responses are intelligently slimmed (truncate long strings, flatten nested objects) to keep all entries within context budget instead of cutting mid-object.
- **Source dedup fix**: `_deduplicate_sources()` no longer drops skill API responses (sources without `chunk_id`). Skill API data is now passed to the writer as a priority source.
- **Chat mode improvements**: Agentic chat (`ENABLE_AGENT_CHAT`) now defaults to `true`. Speed mode gains the `reasoning` tool when skills are active. Max iterations bumped from 2 to 5. Retry with `tool_choice=required` when the model skips tools on first iteration.
- **Frontend**: New `SkillConfigModal` component, "View SKILL.md" button in skill details, removed "Powered by Neo4j + Haystack" footer, chat/research views use full viewport height.
- **`tools.json` deprecated**: Not part of the [agentskills.io](https://agentskills.io/specification) standard. The standard approach is instruction skills with API documentation + the built-in `http_request` tool.
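
Below is a sketch of template-based auth injection and mask preservation. The placeholder convention follows the `auth_header` examples above. The mask string and all three function names are assumptions, not the Cortex API.

```python
import re
from typing import Dict, Set, Tuple

MASK = "********"  # assumption: fixed placeholder returned instead of secrets


def build_auth_header(template: str, config: Dict[str, str]) -> Tuple[str, str]:
    """Turn e.g. 'Authorization: Bearer API_TOKEN' into ('Authorization', 'Bearer <token>')."""
    name, _, value = template.partition(":")
    for key, secret in config.items():
        value = re.sub(rf"\b{re.escape(key)}\b", lambda _m, s=secret: str(s), value)
    return name.strip(), value.strip()


def mask_config(config: Dict[str, str], secret_keys: Set[str]) -> Dict[str, str]:
    """What a GET of the config could return: set secrets replaced by the mask."""
    return {k: (MASK if k in secret_keys and v else v) for k, v in config.items()}


def merge_config_update(stored: Dict[str, str], incoming: Dict[str, str]) -> Dict[str, str]:
    """Mask preservation on PUT: a value still equal to the mask keeps the stored secret."""
    merged = dict(stored)
    for key, value in incoming.items():
        if value == MASK and key in stored:
            continue
        merged[key] = value
    return merged
```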

### Document Source Tracking

- Added `source` field to documents — tracks the origin of each document (e.g. `upload`, `custom_input`, or a custom identifier like `youtube-transcriber`)
- Upload endpoint (`POST /api/upload`) accepts optional `source` query parameter (defaults to `upload`)
- Custom input endpoint (`POST /api/custom-input`) accepts optional `source` field in body (defaults to `custom_input`)
- Document list and detail endpoints return the `source` field
- Frontend shows source label on document cards when source is not the default `upload`
- Source filter dropdown appears automatically in the Documents page toolbar when documents have multiple distinct sources
- Existing documents are automatically backfilled on startup: `upload` for regular uploads, `custom_input` for custom inputs
- Library export/import preserves the `source` field

---

## March 30, 2026

### Agent Skills in Library Transfer

- Agent Skills are now included in library export/import: `Skill` nodes, `SKILL.md` files, and `tools.json` are bundled in the ZIP archive with `directory_path` remapping on import
- Full system reset now cleans up skill nodes and skill directories on disk
- Fixed vector index dimension mismatch: Neo4j vector indexes are now auto-detected and recreated when embedding dimension config changes (fixes broken search after importing a library with different dimensions)
- Step 2 progress now shows only newly discovered cross-document relationships instead of total relationship count

---

## March 28, 2026

### Image Extraction Integration Improvements

- Image-extracted relationships are now tagged as `extraction_method="per_chunk"` so they count toward within-document relationships in the Step 1 display
- Image entities now use `store_entity_with_resolution()` for the same fuzzy dedup as text entities, instead of basic `store_entity()`
- Page number and caption are now stored in image chunk metadata for document position tracking
- Image-extracted entities are cross-linked to text chunks via fuzzy MENTIONS matching
- Entity provenance (`source_document_id`) is now passed through `store_graph_extraction()` for image entities

### Graph Generation Pipeline Resilience

- Added retry with exponential backoff (up to 5 attempts) on all poll functions, preventing the entire 3-step pipeline from aborting on the first transient network error
- Added `visibilitychange` listener to immediately resume polling when a browser tab becomes active (browsers throttle `setTimeout` in background tabs)
- Active poll state is tracked via ref for reliable resume across tab switches
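The retry behavior above can be sketched as follows. The real poll loop runs in the browser, so this is only a language-neutral illustration in Python. The `poll_with_backoff` name and the delay schedule (1s doubling up to a 30s cap) are assumptions, not the frontend's actual values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying transient network errors with exponential backoff.

    Hypothetical helper: delays are 1s, 2s, 4s, 8s (capped at max_delay).
    After max_attempts failures the last error is re-raised, so a persistent
    outage still stops the pipeline instead of looping forever.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise
            sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
    raise RuntimeError("max_attempts must be at least 1")
```

Passing `sleep` in as a parameter keeps the helper testable. Resuming on `visibilitychange` amounts to running the next poll immediately instead of waiting for a throttled timer.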

---

## March 27, 2026

### Agent Skills System

- Integrated the open [AgentSkills](https://agentskills.io/) standard into Deep Research and Chat flows
- Skills are `SKILL.md` files with YAML frontmatter (name, description, license, metadata) supporting two types: **instruction skills** (modify researcher behavior) and **tool-providing skills** (include `tools.json` with HTTP/script execution)
- **On-demand activation**: researcher agent receives a compact skill catalog and `activate_skill`/`list_skills` tools — full instructions and tools are loaded only when the agent decides they're relevant to the query
- Skills discoverable from `.agents/skills/` directory on startup, installable from direct URLs or the [skills.sh](https://skills.sh) registry
- Added `SkillsManager` component on Settings page for installing, enabling, disabling, and deleting skills with registry search
- Skill tool calls and activations rendered in chat with Puzzle icon
- New environment variables: `ENABLE_SKILLS` (default: true), `SKILLS_DIR`, `ENABLE_SKILL_SCRIPTS` (default: false), `SKILL_SCRIPT_TIMEOUT`, `SKILL_HTTP_TIMEOUT`, `MAX_SKILL_TOOLS`, `MAX_SKILL_INSTRUCTIONS_TOKENS`
- Admin-only CRUD endpoints: `GET/POST/PATCH/DELETE /api/admin/skills/*`

### Library Import/Export

- Added full instance migration via Settings page → Data Management section
- **Export** (`POST /api/admin/export`) runs as a background task building a ZIP64 archive containing 12 NDJSON data files (documents, chunks with embeddings, entities, relationships, communities, collections, merge history, system meta, and more) plus original document files
- **Import** (`POST /api/admin/import`) accepts multipart ZIP upload with `clean` mode (requires empty instance) or `replace` mode (auto-wipes first)
- Validates manifest, checks embedding model/dimension compatibility, remaps file paths, restores all nodes and edges including dynamic APOC relationship types
- Concurrency guard prevents simultaneous export/import operations (409 conflict)
- Frontend shows Export card (stats summary + progress bar + download) and Import card (mode selector + drag-and-drop upload + progress bar + result summary with warnings)

### Three-Tier LLM Architecture

- Added dedicated relationship model (`RELATIONSHIP_EXTRACTION_MODEL`, `RELATIONSHIP_EXTRACTION_API_BASE`, `RELATIONSHIP_EXTRACTION_API_KEY`) with its own API endpoint and rate limit for all relationship work (Step 1 per-chunk + Step 2 batch analysis)
- Three-tier model separation: **Primary** (reasoning, for Q&A/research), **Extraction** (instruction-following, for entity extraction and community summarization), **Relationship** (instruction-following, for all relationship discovery)
- Added `get_relationship_llm_config()` with fallback chain: relationship model → extraction model → primary model
- Added `CONCURRENT_RELATIONS` (default: 3) for per-chunk relationship extraction concurrency, separate from entity extraction
- Changed `PARALLEL_RELATIONSHIP_BATCHES` default from 0 to 5
- Community summarization moved from primary model to extraction model for more reliable structured output
- Added Relationship Model section to Settings page LLM Configuration between Extraction and Vision
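A small resolver shows how the fallback chain works. This sketch assumes that each field (model, API base, API key) falls back independently through the `RELATIONSHIP_EXTRACTION_*`, `GRAPH_EXTRACTION_*`, and `OPENAI_*` variables. The helper names are hypothetical, and this is not the actual `get_relationship_llm_config()`.

```python
from dataclasses import dataclass
from typing import Mapping

# Highest priority first: relationship model -> extraction model -> primary model.
TIER_PREFIXES = ("RELATIONSHIP_EXTRACTION", "GRAPH_EXTRACTION", "OPENAI")


@dataclass(frozen=True)
class LLMConfig:
    model: str
    api_base: str
    api_key: str


def _first_set(env: Mapping[str, str], suffix: str) -> str:
    """Return the first non-empty <PREFIX>_<suffix> value, or '' if none is set."""
    for prefix in TIER_PREFIXES:
        value = env.get(f"{prefix}_{suffix}", "").strip()
        if value:
            return value
    return ""


def relationship_llm_config_sketch(env: Mapping[str, str]) -> LLMConfig:
    """Resolve the relationship-tier config field by field (hypothetical helper)."""
    return LLMConfig(
        model=_first_set(env, "MODEL"),
        api_base=_first_set(env, "API_BASE"),
        api_key=_first_set(env, "API_KEY"),
    )
```

`os.environ` is a `Mapping`, so in practice you would call `relationship_llm_config_sketch(os.environ)`.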

### Per-Chunk Relationship Extraction Fixes

- Added tenacity retry with exponential backoff (4 attempts, 2–30s wait) for 429 rate-limit errors during per-chunk extraction
- Original-to-canonical entity name mapping is now tracked during dedup, and relationship source/target are remapped before storing — fixes silent storage failures when entity names were merged during fuzzy resolution
- Self-referential relationships (source == target) are now filtered at both extraction and storage
- Fixed stats query to count all Entity→Entity edges, not just `RELATED_TO` type
- Fixed `per_chunk_relationship_count` missing from stats API response
- Step 2 rebuild now only deletes batch-analysis relationships via `delete_batch_relationships()`, preserving per-chunk relationships from Step 1
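The remapping and self-reference filter fit into a single pass over the extracted relationships. The dict shape (`source`, `target`, `type` keys) and the `canonical` mapping are assumptions for illustration.

```python
from typing import Iterable


def remap_relationships(
    relationships: Iterable[dict], canonical: dict[str, str]
) -> list[dict]:
    """Rewrite source/target to canonical entity names and drop self-references.

    `canonical` maps each original extracted name to the name it was merged
    into during fuzzy dedup; names that were not merged are kept unchanged.
    """
    remapped = []
    for rel in relationships:
        source = canonical.get(rel["source"], rel["source"])
        target = canonical.get(rel["target"], rel["target"])
        if source == target:
            continue  # both endpoints resolved to the same entity
        remapped.append({**rel, "source": source, "target": target})
    return remapped
```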

### Knowledge Graph Page UX Improvements

- Step 1 renamed to "Entity Extraction & Relationship Discovery" — now shows entity count and within-document relationship count
- Step 2 renamed to "Deep Relationship Analysis" — now shows only cross-document relationship count
- "Re-analyze" button renamed to "Find more"; rebuild warning clarifies that per-chunk relationships are preserved
- ERR (Entity-Relationship Ratio) indicator now shows 2 decimal places for better incremental visibility
- Fresh instance warning on Step 1 (0 entities) recommends "Generate Graph" instead of "Extract Entities"
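Here is a sketch of the ERR indicator. It assumes ERR is the number of Entity→Entity relationships divided by the number of entities, which this page does not spell out. It uses the color thresholds green ≥ 0.69, yellow ≥ 0.29, and red below that. The function name is hypothetical.

```python
def err_indicator(relationship_count: int, entity_count: int) -> tuple[str, str]:
    """Return (label, color) for the Entity-Relationship Ratio badge."""
    if entity_count == 0:
        return "0.00", "red"  # fresh instance: nothing extracted yet
    ratio = relationship_count / entity_count
    if ratio >= 0.69:
        color = "green"
    elif ratio >= 0.29:
        color = "yellow"
    else:
        color = "red"
    return f"{ratio:.2f}", color  # two decimal places, as shown on the page
```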

---

## March 26, 2026

### Relationship Analysis Pipeline Overhaul

- **Two-phase per-batch analysis**: Phase 1 scans entity batches for candidate pairs, Phase 2 confirms and classifies with structured XML output including confidence scores (0.0–1.0) — relationships with confidence < 0.5 are filtered before storage
- **Co-occurrence batching via Union-Find clustering**: entities sharing chunks are grouped into the same batch for richer context; replaces O(n²) greedy BFS sort that was freezing on 32k entities; new algorithm is O(n·chunks), handles 100k+ entities
- **Dynamic chunk context filling**: token budget split 60/40 between entities and source text; chunk context fetcher accepts a token budget and fills up to it, replacing the old fixed 10 chunks × 500 chars (~1,250 tokens out of ~44k available)
- **Multi-round discovery**: initial analysis runs up to `RELATIONSHIP_MAX_ROUNDS` (default: 3) rounds, stopping early if target ERR reached or `RELATIONSHIP_MAX_HOURS` exhausted; "Find more" always does 1 round
- Removed 5,000 entity fetch cap from `get_all_entities_for_collection()` — was silently ignoring 27k of 32k entities
- Removed 500 existing-relationship fetch cap between rounds
- Added Entity-Relationship Ratio (ERR) metric shown on Knowledge Graph page with color-coded indicator (green ≥ 0.69, yellow ≥ 0.29, red < 0.29) and tooltip
- New config variables: `RELATIONSHIP_TARGET_RATIO` (default: 1.0), `RELATIONSHIP_MAX_ROUNDS` (default: 3), `RELATIONSHIP_MAX_HOURS` (default: 0), `RELATIONSHIP_MAX_OUTPUT_TOKENS` bumped from 8k to 16k
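Co-occurrence batching with Union-Find works as follows. Entities that appear in the same chunk are unioned, each resulting cluster is emitted contiguously, and large clusters are split at the batch size. The `mentions` input shape and the largest-cluster-first packing order are assumptions. The real pipeline also adds batch overlap and token budgeting, which this sketch omits.

```python
from collections import defaultdict


class UnionFind:
    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cooccurrence_batches(mentions: dict[str, list[str]], batch_size: int) -> list[list[str]]:
    """Group entities that share chunks; `mentions` maps chunk_id -> entity names."""
    uf = UnionFind()
    for entities in mentions.values():
        for name in entities:
            uf.find(name)  # register singletons too
        for name in entities[1:]:
            uf.union(entities[0], name)

    clusters: dict[str, list[str]] = defaultdict(list)
    for name in list(uf.parent):
        clusters[uf.find(name)].append(name)

    batches, current = [], []
    for cluster in sorted(clusters.values(), key=len, reverse=True):
        for name in sorted(cluster):
            current.append(name)
            if len(current) == batch_size:
                batches.append(current)
                current = []
    if current:
        batches.append(current)
    return batches
```

Each mention is visited a constant number of times (up to the near-constant Union-Find overhead). That is why this approach scales where pairwise BFS sorting does not.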

### Relationship Quality and Anti-Hub Protections

- Removed `MENTIONS` from allowed relationship types (now 14 types) — was being used as a lazy catch-all for co-occurrence
- Added per-chunk relationship extraction during entity processing: chunks with 2+ entities get an LLM call using chunk text as direct evidence, stored with `extraction_method='per_chunk'`
- Added few-shot good/bad examples to Phase 1 candidate scan and Phase 2 extraction prompts
- Added anti-hub negative instructions: "If no clear relationship exists, do not create one. Co-occurrence is not a relationship."
- Added `RELATIONSHIP_MAX_PER_ENTITY` config (default: 50) — soft cap that skips storage when both endpoints are saturated
- Capped existing relationships shown to LLM at 20 per entity between rounds to prevent hub reinforcement
- Reduced batch overlap from 15% to 5% with degree-aware selection (excludes entities already in 2+ batches, prefers low-connection entities)
- Relationship type fuzzy-matching via rapidfuzz (80% threshold, fallback to `RELATED_TO`) with plaintext arrow format fallback parser
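The standard library can approximate the type normalization. This sketch swaps rapidfuzz for `difflib.SequenceMatcher`, which produces a similar but not identical score on a 0.0–1.0 scale instead of 0–100. It also uses a hypothetical subset of the allowed types; the real list has 14 entries.

```python
from difflib import SequenceMatcher

# Hypothetical subset of the allowed relationship types.
ALLOWED_TYPES = ("WORKS_FOR", "LOCATED_IN", "CREATED", "RELATED_TO")


def normalize_relationship_type(raw: str, threshold: float = 0.8) -> str:
    """Map an LLM-produced type onto an allowed type, falling back to RELATED_TO."""
    candidate = raw.strip().upper().replace(" ", "_").replace("-", "_")
    if candidate in ALLOWED_TYPES:
        return candidate
    best_type, best_score = "RELATED_TO", 0.0
    for allowed in ALLOWED_TYPES:
        score = SequenceMatcher(None, candidate, allowed).ratio()
        if score > best_score:
            best_type, best_score = allowed, score
    return best_type if best_score >= threshold else "RELATED_TO"
```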

### Entity Deduplication Improvements

- Wired embedding-based entity dedup into the main storage path, catching semantic matches (e.g., "Museum of Crypto Art" / "MOCA") that Levenshtein misses; controlled by `ENABLE_SEMANTIC_ENTITY_RESOLUTION` (default: true), falls back to Levenshtein
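Embedding-based matching comes down to comparing a new entity's name embedding against existing ones by cosine similarity. In this sketch the 0.9 threshold and the function names are assumptions. A return value of `None` stands for "no semantic match, fall back to Levenshtein". In practice the vectors would come from the configured embedding model.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_semantic_match(
    embedding: Sequence[float],
    existing: dict[str, Sequence[float]],
    threshold: float = 0.9,
) -> Optional[str]:
    """Return the most similar existing entity name at or above threshold, else None."""
    best_name, best_score = None, threshold
    for name, vector in existing.items():
        score = cosine_similarity(embedding, vector)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```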

### Graph Visualization Diversity

- Replaced `ORDER BY mention_count` visualization query with a diversity score: `mention_count / (1 + log(1 + degree))` — prevents hub entities from dominating the default 100-node view
- Stronger d3 force charge and link distance for better graph spacing
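The diversity score takes only a few lines to reproduce. This sketch assumes the natural logarithm, which is what Cypher's `log()` returns, and uses hypothetical `mention_count`/`degree` fields on plain dicts.

```python
import math


def diversity_score(mention_count: int, degree: int) -> float:
    """mention_count / (1 + log(1 + degree)): frequent entities rank high, hubs are damped."""
    return mention_count / (1 + math.log(1 + degree))


def top_nodes(entities: list[dict], limit: int = 100) -> list[dict]:
    """Pick the entities to show in the default graph view."""
    return sorted(
        entities,
        key=lambda e: diversity_score(e["mention_count"], e["degree"]),
        reverse=True,
    )[:limit]
```

Consider a hub with 50 mentions and 1,000 connections: it scores about 6.3. An entity with 20 mentions and 2 connections scores about 9.5, so the less-connected entity is shown first.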

### Bug Fixes

- Fixed multi-round progress: cumulative batch counts across all rounds with global ETA instead of per-round reset
- Fixed rebuild flag not passed through to `analyze_collection_relationships`, causing rebuild mode to run only 1 round instead of `RELATIONSHIP_MAX_ROUNDS`
- Fixed login white screen by moving `redirect()` out of server action into client-side `useEffect`

---

## March 25, 2026

### Bulk Document Download

- Added `POST /api/documents/download-zip` endpoint that accepts a list of document IDs, builds a ZIP64-enabled archive of original files with duplicate filename disambiguation, and streams the response in 1MB chunks
- Added Download button to the bulk actions toolbar on the Documents page — select documents and download them as a single ZIP file
- Handles 1000+ files with ZIP64 support
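The standard `zipfile` module can handle both duplicate filename disambiguation and ZIP64 packaging. The `name (1).ext` suffix scheme is an assumption. This version builds the archive in memory for brevity. An endpoint handling 1000+ files would write to a temporary file and stream it from disk.

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import Iterator


def unique_arcname(name: str, used: set[str]) -> str:
    """Return name, or 'stem (n).ext' if that name is already in the archive."""
    path = PurePosixPath(name)
    candidate, n = name, 1
    while candidate in used:
        candidate = str(path.with_name(f"{path.stem} ({n}){path.suffix}"))
        n += 1
    used.add(candidate)
    return candidate


def build_zip(files: list[tuple[str, bytes]]) -> bytes:
    buffer = io.BytesIO()
    used: set[str] = set()
    # allowZip64 lets the archive exceed 4 GiB or 65,535 entries.
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED, allowZip64=True) as archive:
        for name, data in files:
            archive.writestr(unique_arcname(name, used), data)
    return buffer.getvalue()


def iter_chunks(data: bytes, chunk_size: int = 1024 * 1024) -> Iterator[bytes]:
    """Yield the archive in 1 MB pieces, as a streaming response body would."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]
```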

---

## March 20, 2026

### Parallel Image Analysis Within Documents

- Images within a single document are now analyzed concurrently using `asyncio.gather`, replacing the previous sequential for-loop
- Added `VISION_MAX_CONCURRENT` environment variable (default: 3) to control the maximum number of concurrent vision API calls system-wide
- Thread pool sizes for image processing now scale automatically with `VISION_MAX_CONCURRENT`
- Fixed "Event loop is closed" errors during concurrent image analysis by using thread-local HTTP clients
- For a document with 200 images at ~30s per image: processing time reduced from ~100 minutes (sequential) to ~10 minutes (with `VISION_MAX_CONCURRENT=10`)
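The concurrency pattern is `asyncio.gather` bounded by a semaphore. This sketch takes the analysis coroutine as a parameter and creates a semaphore per call. The real limit is system-wide, which would mean sharing one semaphore across all documents.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def analyze_images(
    image_ids: list[str],
    analyze: Callable[[str], Awaitable[T]],
    max_concurrent: int = 3,  # mirrors the VISION_MAX_CONCURRENT default
) -> list[T]:
    """Run analyze() for every image with at most max_concurrent calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(image_id: str) -> T:
        async with semaphore:
            return await analyze(image_id)

    # gather() returns results in input order, regardless of completion order.
    return await asyncio.gather(*(guarded(i) for i in image_ids))
```

From synchronous code, call `asyncio.run(analyze_images(ids, my_vision_call))`, where `my_vision_call` is a placeholder for your async vision client.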

---

## March 18, 2026

### Entity Deduplication

- Added `GET /api/entities/duplicates` endpoint to find duplicate entity candidates using fuzzy name similarity, with configurable threshold (0.5-1.0) and limit parameters
- Added `POST /api/entities/merge` endpoint to merge duplicate entities into a canonical entity, transferring all relationships, chunk mentions, and community memberships; merged entity names are stored as aliases on the canonical entity
- Added `GET /api/entities/merge-history` endpoint to view a chronological log of past merge operations
- Updated Python client class with `find_duplicate_entities()`, `merge_entities()`, and `get_merge_history()` methods
- Added cURL examples for all three entity deduplication endpoints
- Updated OpenAPI specification with new endpoints and schemas

---

## March 17, 2026

### Two-Phase Graph Extraction Pipeline

- Refactored entity extraction to work per-document instead of per-chunk, reducing LLM calls from N (one per chunk) to 1-4 (batched by token budget) per document
- Added dedicated graph extraction model configuration (`GRAPH_EXTRACTION_MODEL`, `GRAPH_EXTRACTION_API_BASE`, `GRAPH_EXTRACTION_API_KEY`) to allow using a smaller/faster model for entity extraction
- Entity-to-chunk linking now uses fuzzy string matching (rapidfuzz) to create `MENTIONS` relationships
- Added entity provenance tracking: `source_documents`, `extraction_count`, `last_extracted_at` properties on Entity nodes
- New Phase B relationship analysis endpoint `POST /api/graph/relationships/analyze` discovers cross-document relationships using the main (large) model
- Relationship analysis runs as a background task with progress tracking via `/api/tasks/{id}`
- Added relationship provenance: `extracted_at`, `extraction_method`, `source_document_id` properties on relationships
- New auto-trigger options: `AUTO_RELATIONSHIP_ANALYSIS_AFTER_BATCH` and `AUTO_COMMUNITY_DETECTION_AFTER_BATCH` for automated pipeline after batch processing
- Turbo mode always overrides both extraction and main model configs for massive-scale ingestion
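Batching a document's chunks by token budget is essentially greedy packing. Here is a minimal sketch, assuming per-chunk token counts are already known. The 8,000-token budget used in the example is hypothetical.

```python
def batch_chunks_by_budget(chunk_tokens: list[int], budget: int) -> list[list[int]]:
    """Greedily pack consecutive chunk indices into batches of at most `budget` tokens.

    A single chunk larger than the budget still gets its own batch rather than
    being dropped.
    """
    batches: list[list[int]] = []
    current: list[int] = []
    used = 0
    for index, tokens in enumerate(chunk_tokens):
        if current and used + tokens > budget:
            batches.append(current)
            current, used = [], 0
        current.append(index)
        used += tokens
    if current:
        batches.append(current)
    return batches
```

For example, `batch_chunks_by_budget([3000, 3000, 3000, 5000], 8000)` yields two batches, and therefore two LLM calls instead of four. Keeping chunks in document order preserves local context within each call.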

### Community Deletion and Graph Cleanup

- Added `DELETE /api/graph/communities/{id}` endpoint to delete a specific community (entities are unlinked but preserved)
- Added `DELETE /api/graph/communities` endpoint to delete all communities at once
- Enhanced `POST /api/cleanup/orphaned-entities` to also clean orphaned communities in one call
- Added delete buttons (individual and "Delete All") to the Explore page communities panel and the Collections page community section
- StatsBar now shows an orphaned data warning banner with a cleanup button when entities/relationships exist but no documents are present
- Frontend API client now includes `deleteCommunity()`, `deleteAllCommunities()`, and `cleanupOrphanedEntities()` methods

---

## March 16, 2026

### Ask AI API Endpoint Update

- Renamed API endpoints from `/api/rag` to `/api/ask` across all examples and documentation for improved clarity and consistency
- Updated configuration settings for document processing and community detection endpoints

---

## March 5, 2026

### Large PDF Processing Improvements

- Added memory optimizations for handling large PDF files, including chunked processing and backend unloading to prevent out-of-memory errors
- Introduced `PAGE_CHUNK_SIZE` and `MAX_PAGES_PER_CHUNK` environment variables for configuring chunked PDF processing
- Integrated pypdf for lightweight PDF page counting
- Added fallback to PyPdfium for large files when the primary converter encounters memory constraints
- Fixed page index conversion in the Docling worker to correctly handle 1-based page indices

### Image Analysis Execution Optimization

- Implemented a synchronous version of the image analysis method in VisionAnalyzer to allow thread pool execution, preventing the main event loop from being blocked
- Updated document converter to support image format options for more flexible input handling
- Updated Dockerfile with necessary system dependencies for PDF/image processing, including X11 libraries and Tesseract OCR

---

## March 4, 2026

### Docling Integration and Advanced Document Processing

- Integrated Docling as the primary document conversion engine, replacing the previous processing pipeline
- Added a dedicated thread pool for image analysis to optimize performance and prevent competition with document processing tasks
- Implemented subprocess-based document conversion for improved efficiency with large files
- Added event loop watchdog to monitor and log potential blocking issues
- Enhanced VisionAnalyzer with retry logic for API calls
- New fields for image analysis progress in the Document model, with frontend components to display processing status
- Added configuration options for vision models in the environment setup
- Increased upload body size limit to 100MB

### Logging and Robustness Improvements

- Suppressed Neo4j notification warnings by adjusting the logging level
- Added a function to clean image placeholders in document processing
- Conditionally configured OCR and image description based on vision model availability
- Enhanced GraphExtractor to handle potential `None` responses from document summary generation
- Refactored Neo4jService queries to use `coalesce` for better handling of missing data
- Refined default analysis prompt in VisionAnalyzer for improved clarity and document retrieval purposes

---

## March 2, 2026

### Image Extraction and Vision Analysis

- Added a new vision analyzer service that extracts images from documents (PDF, DOCX, PPTX) and analyzes them using configurable vision models
- Supports OpenAI-compatible APIs with fallback to Docling's built-in image descriptions
- Image analyses are integrated into the RAG pipeline for searchable visual content
- Enabled Docling's built-in image description generation (`do_picture_description`) as an additional source
- Added EasyOCR for enhanced image text extraction
- Updated Dockerfile with Tesseract OCR and additional libraries for improved document processing

### Enhanced File Upload Support

- Expanded the allowed file extensions to cover additional document and media formats
- Updated frontend FileUpload and UploadZone components to reflect the expanded file type support
- Updated file type icons for better visual representation of supported formats

---

## February 27, 2026

### Document Processing Engine Overhaul

- Updated Dockerfile to include necessary system dependencies for Docling, enabling PDF and image processing
- Added `docling-haystack` dependency for enhanced document handling
- Expanded allowed file extensions in configuration to support a wider range of document types
- Refactored document processor to utilize the new Docling converter, streamlining handling of various document formats

---

## February 25, 2026

### Customizable Branding and Theming

- Added support for custom logo configuration via the `CUSTOM_LOGO_URL` environment variable, allowing white-label deployments
- Introduced custom accent color support via the `CUSTOM_ACCENT_COLOR` environment variable for theme customization
- Added configurable branding options (site title, description) through environment variables
- Increased logo dimensions for improved sharpness on high-DPI screens
- Documented all frontend customization variables in the deployment guide

---

## February 6, 2026

### API Key Usage Tracking and Analytics

- Introduced a dedicated API usage tracking service that records every request per API key, categorized by endpoint type (ask, search, upload, documents, graph, collections, admin, etc.)
- Added middleware to automatically track API requests and errors for all authenticated endpoints
- Admin API key records are now persisted in Neo4j for long-term usage statistics
- New backend models for API key usage statistics, usage data points, and usage history

### Admin Dashboard Improvements

- Added an **API Key Analytics** panel with interactive usage charts showing request volume over time and endpoint breakdown
- New **API Key Manager** component for creating, viewing, and managing API keys directly from the admin UI
- Introduced a **System Reset Modal** with granular deletion options (documents, uploaded files, custom inputs, collections, API keys) and a confirmation safeguard
- Admin page now displays system configuration overview including API key usage settings
- Updated Cortex logo assets (dark and light variants)

### Resource Limits

- Added `MAX_FILES` and `MAX_COLLECTIONS` environment variables to cap the number of documents and collections (set to `0` for unlimited)
- Backend now enforces these limits on file uploads, custom input creation, and collection creation, returning a `403` error when limits are exceeded
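The limit check reduces to a comparison where `0` means unlimited. This sketch uses a hypothetical exception type to stand in for the framework's 403 response.

```python
class LimitExceeded(Exception):
    """Hypothetical error the API layer would translate into HTTP 403."""

    status_code = 403


def enforce_limit(current_count: int, limit: int, resource: str) -> None:
    """Raise if creating one more `resource` would exceed `limit` (0 = unlimited)."""
    if limit > 0 and current_count >= limit:
        raise LimitExceeded(f"{resource} limit reached ({current_count}/{limit})")
```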

### Collection-Scoped AI Queries

- The Ask AI feature now supports scoping queries to a specific collection via `collection_id`, so answers are drawn only from documents in that collection
- Collection scoping works across all ask modes: standard, streaming, streaming with thinking, and fast search
- Ask AI settings (streaming, agentic mode, fast search, selected collection) are now persisted to localStorage so preferences survive page reloads

### Skill Library Restructure

- Renamed the skill package from `openclaw-library` to `library`
- Added sync scripts (Python, Shell, and JavaScript) for automated skill file synchronization
- Removed obsolete OpenClaw library skill files

---

## February 5, 2026

### Document Deletion with Task Cancellation

- Deleting a document now automatically cancels any active processing task for that document before removal, preventing orphaned background jobs
- Introduced proper task tracking for active document processing with cancellation flag support
- Added `cancel_multiple_documents` and improved `cancel_processing_task` methods in the DocumentProcessor with deadlock-safe locking
- Processing tasks are now started with cancellation flags initialized upfront, ensuring graceful shutdown from the very first checkpoint

### Knowledge Graph Cleanup on Deletion

- Document deletion now triggers a comprehensive cleanup of the knowledge graph, removing orphaned entities and communities that are no longer referenced by any remaining document
- API responses for deletion endpoints now include detailed cleanup statistics (entities removed, communities removed, etc.)
- Updated documentation pages for Document Upload and Knowledge Graph features to explain the new deletion and cleanup behavior

### API Parameter Changes

- `collection_id` and `start_processing` parameters on the upload endpoint are now passed as URL query parameters instead of form data fields
- Updated API examples in the documentation (cURL and Python) to reflect the new parameter format
- Skill documentation updated to emphasize that both `api_key` and `base_url` credentials are required
- Skill version bumped to 1.2.0
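You can build the upload URL with these query parameters using `urllib.parse`. The helper name and the `true`/`false` encoding of `start_processing` are assumptions. `source` is the optional query parameter described under Document Source Tracking. The file itself still goes in the multipart form body.

```python
from typing import Optional
from urllib.parse import urlencode


def upload_url(
    base_url: str,
    collection_id: Optional[str] = None,
    start_processing: Optional[bool] = None,
    source: Optional[str] = None,
) -> str:
    """Build the POST /api/upload URL; only parameters that are set are included."""
    params: dict[str, str] = {}
    if collection_id is not None:
        params["collection_id"] = collection_id
    if start_processing is not None:
        params["start_processing"] = "true" if start_processing else "false"
    if source is not None:
        params["source"] = source
    url = f"{base_url.rstrip('/')}/api/upload"
    return f"{url}?{urlencode(params)}" if params else url
```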

---

## February 4, 2026

### Admin Authentication and API Key Management

- Implemented a full admin authentication system with email/password login validated against environment variables (`ADMIN_EMAIL`, `ADMIN_PASSWORD`)
- Added session management with encrypted JWT tokens stored in HTTP-only cookies
- Introduced Next.js middleware to protect all routes (except `/login`) behind authentication
- Created a dedicated login page with form validation and error handling, wrapped in React Suspense for smooth loading states

### Permission-Based API Protection

- All API endpoints are now protected with a three-tier permission system: **read**, **manage**, and **admin**
- Read endpoints (search, ask, list documents, stats) require at minimum a valid API key with read permission
- Write endpoints (upload, delete, create collections) require manage permission
- Admin endpoints (API key CRUD, system management) require admin-level access
- New backend services: `auth_service` for request authentication and `api_key_service` for key lifecycle management
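The three tiers behave like an ordered hierarchy, which maps naturally onto an `IntEnum`. The assumption that admin implies manage and manage implies read follows from the "at minimum" wording above. The endpoint-to-tier map is illustrative only.

```python
from enum import IntEnum


class Permission(IntEnum):
    READ = 1
    MANAGE = 2
    ADMIN = 3


# Illustrative mapping of endpoint categories to the tier they require.
REQUIRED = {
    "search": Permission.READ,
    "ask": Permission.READ,
    "upload": Permission.MANAGE,
    "collections:create": Permission.MANAGE,
    "api_keys": Permission.ADMIN,
}


def is_allowed(granted: Permission, endpoint: str) -> bool:
    """A key may call an endpoint if its tier is at least the required tier."""
    return granted >= REQUIRED[endpoint]
```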

### Documentation Site Launch

- Launched the full documentation site built with Zudoku, featuring:
  - Getting Started guides (Introduction, Quickstart, Configuration)
  - Core Features documentation (Document Upload, Search, Ask AI, Knowledge Graph, Collections, Communities, Turbo Mode)
  - Guides (Deployment, Authentication, Security)
  - Code examples (Python, cURL, Integration)
  - OpenAPI-powered API Reference
  - Pagefind-based full-text search
  - LLM-friendly output (`llms.txt` and `llms-full.txt`)
- Created comprehensive backend API documentation (`BACKEND_API_DOCUMENTATION.md`)
- Cleaned up documentation pages by removing redundant section headers for a streamlined reading experience

---

## January 29, 2026

### Rebrand to Cortex

- Renamed the project from **MOCA Knowledge Base** to **Cortex**
- Updated the tagline to **"The Agentic Knowledge Base for the AI Era"**
- Added a new "What is Cortex?" section to the README explaining the project's philosophy around portable, long-term AI memory
- Introduced the memory hierarchy concept: Context (short-term) → Agent Memory Stack (mid-term) → Cortex (long-term)


---

## Document: Turbo Mode

GPU-accelerated inference with Compute3 for faster processing

URL: /features/turbo-mode


# Turbo Mode

> ⚠️ **On hold: not currently available.** Turbo Mode is a Compute3 partnership prepared in 2025. The Compute3 service is not yet in production, so this feature does not work today. The integration code and `COMPUTE3_*` environment variables remain in the codebase in anticipation of future activation. Setting them now has no effect: the UI toggle stays hidden unless `COMPUTE3_API_KEY` is set, and even then no live endpoint is reachable. This page is kept for reference only; disregard it for production setups.

Turbo Mode provides **GPU-accelerated LLM inference** through Compute3 integration, enabling faster document processing and lower-latency responses.

## What is Compute3?

[Compute3](https://compute3.ai) is a distributed GPU compute platform that provides:

- **Fast Inference** - Low-latency LLM responses on H100/A100 GPUs
- **Scalability** - Handle high request volumes with auto-scaling
- **Cost Efficiency** - Pay-per-use pricing, no idle GPU costs
- **Model Flexibility** - Run open-source models like Llama, Mistral, etc.

## Performance Comparison

| Mode | Latency | Throughput | Cost |
|------|---------|------------|------|
| Standard (OpenAI GPT-4o-mini) | ~1-2s | ~50 req/min | $$$ |
| Turbo (Compute3 Llama-70B) | ~300-500ms | ~200 req/min | $$ |

## Enabling Turbo Mode

### 1. Get a Compute3 API Key

Sign up at [console.compute3.ai](https://console.compute3.ai) and create an API key.

### 2. Configure Environment

```bash
# Required
COMPUTE3_API_KEY=your-c3-api-key-here
COMPUTE3_API_BASE=https://api.compute3.ai

# GPU configuration
COMPUTE3_GPU_TYPE=h100        # h100 or a100
COMPUTE3_GPU_COUNT=4          # Number of GPUs

# Model to use
COMPUTE3_MODEL=MiniMaxAI/MiniMax-M2.1  # or meta-llama/Llama-3.1-70B-Instruct

# Docker image for vLLM inference
COMPUTE3_DOCKER_IMAGE=vllm/vllm-openai:nightly

# Default runtime in seconds (max job duration)
COMPUTE3_DEFAULT_RUNTIME=3600
```

### 3. Start a Turbo Job

Before using Turbo Mode, you need to start a GPU job:

```bash
curl -X POST "http://localhost:8000/api/turbo/start" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "runtime_seconds": 3600
  }'
```

**Response:**

```json
{
  "job_id": "job_abc123",
  "status": "starting",
  "gpu_type": "h100",
  "gpu_count": 4,
  "model": "MiniMaxAI/MiniMax-M2.1",
  "estimated_ready_in_seconds": 120
}
```

## API Usage

### Check Turbo Status

```bash
curl "http://localhost:8000/api/turbo/status" \
  -H "X-API-Key: your-api-key"
```

```json
{
  "enabled": true,
  "status": "running",
  "job_id": "job_abc123",
  "gpu_type": "h100",
  "gpu_count": 4,
  "model": "MiniMaxAI/MiniMax-M2.1",
  "avg_latency_ms": 487,
  "requests_served": 1234,
  "runtime_remaining_seconds": 2847
}
```

### List Turbo Jobs

Check active and past GPU jobs:

```bash
curl "http://localhost:8000/api/turbo/jobs" \
  -H "X-API-Key: your-api-key"
```

### Check Balance

View your Compute3 balance:

```bash
curl "http://localhost:8000/api/turbo/balance" \
  -H "X-API-Key: your-api-key"
```

### Stop Turbo Job

Stop the GPU job to save costs:

```bash
curl -X POST "http://localhost:8000/api/turbo/stop" \
  -H "X-API-Key: your-api-key"
```

## Use Cases

Turbo Mode is ideal for:

### Batch Processing

Process many documents quickly during ingestion:

```bash
# Enable turbo during bulk processing
curl -X POST "http://localhost:8000/api/documents/process-pending" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"turbo_mode": true}'
```

### Real-Time Applications

Low-latency requirements for chat interfaces:

- Interactive Q&A sessions
- Live document analysis
- Streaming responses

### High Volume

Many concurrent users or API requests:

- Production deployments
- Team knowledge bases
- Public-facing applications

## Model Options

Supported models on Compute3:

| Model | Parameters | Speed | Quality |
|-------|------------|-------|---------|
| `MiniMaxAI/MiniMax-M2.1` | ~230B (MoE, ~10B active) | Fast | Excellent |
| `meta-llama/Llama-3.1-70B-Instruct` | 70B | Fast | Excellent |
| `meta-llama/Llama-3.1-8B-Instruct` | 8B | Very Fast | Good |
| `mistralai/Mistral-7B-Instruct-v0.3` | 7B | Very Fast | Good |

## Cost Optimization

1. **Use Runtime Limits** - Set appropriate `COMPUTE3_DEFAULT_RUNTIME`
2. **Stop When Idle** - Call `/api/turbo/stop` when not in use
3. **Batch Requests** - Process multiple items in one session
4. **Right-Size GPUs** - Use fewer GPUs for lighter workloads

## Fallback Behavior

If Turbo Mode is unavailable, Cortex automatically falls back to the standard LLM provider (OpenAI/Anthropic):

```json
{
  "answer": "...",
  "turbo_used": false,
  "fallback_reason": "Turbo job not running"
}
```

## Monitoring

Check the status of individual Turbo jobs:

```bash
curl "http://localhost:8000/api/turbo/jobs/{job_id}" \
  -H "X-API-Key: your-api-key"
```

View job logs:

```bash
curl "http://localhost:8000/api/turbo/jobs/{job_id}/logs" \
  -H "X-API-Key: your-api-key"
```


---

## Document: Agent Skills

Extend Deep Research and Chat with external capabilities from the AgentSkills ecosystem

URL: /features/skills


# Agent Skills

Agent Skills let you add new capabilities to Cortex Library's researcher agent. Skills are reusable instruction sets from the open [AgentSkills](https://agentskills.io/) ecosystem — install them from the [skills.sh](https://skills.sh) registry or any URL, run through a setup wizard, and the agent uses them automatically.

## Overview

Enabled skills are **automatically activated** at the start of every research session. The agent sees their full instructions and can call external APIs using the built-in `http_request` tool. Authentication is handled entirely server-side — the LLM never sees tokens or API keys.

Skills work in both **Deep Research** and **Chat** modes. Agentic chat mode is enabled by default (`ENABLE_AGENT_CHAT=true`).

### How It Works

1. You install a skill and run the **Setup Wizard** to provide any required credentials
2. You enable the skill from Settings
3. When a research or chat session starts, all enabled skills are loaded into the agent's context
4. The agent reads the skill instructions and calls APIs via `http_request` when relevant
5. Auth headers are built server-side from your saved configuration — the LLM only provides the HTTP method and URL

## Installing Skills

### From the Registry

1. Go to **Settings > Agent Skills**
2. Use the search bar under **Browse Registry** to find skills from [skills.sh](https://skills.sh)
3. Click **Install** on any result

### From a URL

1. Go to **Settings > Agent Skills**
2. Paste a direct SKILL.md URL into the **Install Skill** field
3. Click **Install**

### Local Skills

Place skill directories in your skills folder (default: `.agents/skills/`):

```
.agents/skills/
  my-skill/
    SKILL.md          # Required — skill instructions
```

Click **Discover** to scan for new skills, or restart the application.
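Discovery amounts to scanning each subdirectory for a `SKILL.md` and reading its frontmatter. This sketch parses only flat `key: value` lines so that it needs no YAML dependency. A real implementation should use a YAML parser. The function names are hypothetical.

```python
from pathlib import Path


def parse_frontmatter(text: str) -> dict[str, str]:
    """Extract top-level `key: value` pairs between the opening and closing '---'."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep and not line[:1].isspace():  # skip nested YAML lines
            meta[key.strip()] = value.strip()
    return {}  # no closing '---': not valid frontmatter


def discover_skills(skills_dir: str) -> dict[str, Path]:
    """Map skill name -> directory for every <skills_dir>/<skill>/SKILL.md."""
    skills: dict[str, Path] = {}
    for skill_md in sorted(Path(skills_dir).glob("*/SKILL.md")):
        meta = parse_frontmatter(skill_md.read_text(encoding="utf-8"))
        if meta.get("name"):
            skills[meta["name"]] = skill_md.parent
    return skills
```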

## Setup Wizard

After installing a skill, it may show a **"Needs setup"** badge. Click **Configure** to open the Setup Wizard:

1. The primary LLM analyzes the SKILL.md to identify required configuration variables (API tokens, base URLs, etc.)
2. A modal presents each variable with a description and input field
3. Secret values (tokens, API keys) are masked and stored locally in a `config.json` file inside the skill directory
4. The configuration schema includes an `auth_header` template (e.g., `Authorization: Bearer API_TOKEN`) that tells the server how to build HTTP headers from your saved values

Once the skill is configured, the badge updates to show that it is ready to use. You can reconfigure it at any time from Settings.

## Enabling Skills

Installed skills are **disabled by default**. Toggle them on from Settings > Agent Skills. Only enabled skills are loaded into the agent at session start.

## The `http_request` Tool

When skills are active, the agent gains access to a built-in `http_request` tool for calling external APIs. The tool accepts:

| Parameter | Required | Description |
|-----------|----------|-------------|
| `method` | Yes | HTTP method: GET, POST, PUT, PATCH, DELETE |
| `url` | Yes | The full API URL to call |
| `body` | No | Optional request body for POST/PUT/PATCH |

**Server-side authentication**: The server reads the `auth_header` templates from each skill's config schema, substitutes the stored credential values, and injects the resulting headers into the outgoing request. The LLM never handles tokens — it just specifies the method and URL from the skill's documentation.
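Below is a minimal Python sketch of that substitution step. It assumes that variables appear in the template as bare names (as in `Authorization: Bearer API_TOKEN`) and that each one is replaced as a whole word. `build_auth_headers` is a hypothetical helper, not Cortex's actual implementation.

```python
import re
from typing import Dict, List, Optional


def build_auth_headers(templates: List[str], config: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Hypothetical sketch: expand `auth_header` templates into concrete headers.

    templates: e.g. ["Authorization: Bearer API_TOKEN"]
    config:    every schema variable mapped to its saved value (None if unset)
    """
    headers: Dict[str, str] = {}
    if not config:
        return headers
    # \b keeps matches whole-word, so a name like API never matches inside API_TOKEN.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, config)) + r")\b")
    for template in templates:
        header_name, sep, value_template = template.partition(":")
        if not sep:
            continue  # malformed template: no "Name: value" separator
        referenced = pattern.findall(value_template)
        if any(not config[name] for name in referenced):
            continue  # a referenced credential has not been configured yet
        # Single pass, so a substituted secret is never re-scanned for names.
        value = pattern.sub(lambda m: config[m.group(1)], value_template.strip())
        headers[header_name.strip()] = value
    return headers
```

This sketch skips a header whose credential is unset instead of sending the literal placeholder name. That behavior is a design choice for the sketch only.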

**Hostname-scoped auth**: When multiple skills are installed, the server only applies a skill's auth headers if the request URL matches that skill's known hostname (derived from its `base_url` extracted automatically from SKILL.md, or from any URL-shaped config value like `*_BASE_URL`). This prevents two skills that both define `Authorization` headers from silently overwriting each other.
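The scoping rule can be sketched as follows. The helper names are hypothetical. The sketch also assumes two things: hostnames must match exactly (no subdomain wildcards), and every http(s) value in the skill's config counts as URL-shaped.

```python
from typing import Dict, Iterable, Optional, Set, Tuple
from urllib.parse import urlsplit


def hostname_of(url: str) -> Optional[str]:
    """Lowercased hostname of an http(s) URL, or None for anything else."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https"):
        return None
    return parts.hostname  # urlsplit already lowercases the hostname


def known_hosts(base_url: Optional[str], config: Dict[str, str]) -> Set[str]:
    """Hosts a skill may authenticate against: base_url plus URL-shaped config values."""
    hosts = set()
    for candidate in [base_url or "", *config.values()]:
        host = hostname_of(candidate)
        if host:
            hosts.add(host)
    return hosts


def headers_for_request(url: str, skills: Iterable[Tuple[Set[str], Dict[str, str]]]) -> Dict[str, str]:
    """Merge auth headers only from skills whose known hosts include the URL's host."""
    host = hostname_of(url)
    merged: Dict[str, str] = {}
    for hosts, headers in skills:
        if host is not None and host in hosts:
            merged.update(headers)
    return merged
```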

**Failed calls are surfaced, never silent**: If an API call fails — an error status (e.g. `401 Unauthorized`, `403 Forbidden`, `422 Unprocessable Entity`) or a timeout — the failure is shown to you, not glossed over:

- A red skill step appears in the chat's research process with the failing method, URL, and HTTP status.
- The final answer explicitly tells you the action **did not succeed** and why (using the API's error message), rather than implying it worked.

This matters most for **write actions** (creating a ticket, opening a PR, posting data): a failed write will be reported as failed. Common causes are a missing/expired token (configure it in the skill's Setup Wizard), insufficient permissions on the token's account, or a request the target API rejects as invalid.

## On-Demand Activation

In addition to auto-activation, the agent retains `activate_skill` and `list_skills` tools for activating additional skills mid-conversation. This is a fallback mechanism — in practice, all enabled skills are pre-loaded at session start.

## Creating Skills

A skill is a directory containing a `SKILL.md` file with YAML frontmatter and instruction body:

```markdown
---
name: my-skill
description: What this skill does and when to use it.
license: Apache-2.0
metadata:
  author: your-name
  version: "1.0"
---

# My Skill

Instructions for the agent go here. Be specific about:
- When to use this skill
- What API endpoints are available
- What parameters each endpoint accepts
- What the response format looks like

## API Reference

### Search

GET https://api.example.com/search?q={query}

Returns a JSON array of results with title and url fields.
```

The agent reads these instructions and uses `http_request` to call the described endpoints. Write your SKILL.md as if you are explaining the API to a developer — the agent will follow the documentation to construct requests.
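To sanity-check a SKILL.md before installing it, you can split off the frontmatter using only the standard library. The sketch below is a deliberately naive parser for the simple shape shown above: top-level `key: value` pairs plus one nested mapping such as `metadata`. It is not how Cortex itself loads skills, and a real loader should use a full YAML parser.

```python
from typing import Any, Dict, Optional, Tuple


def parse_skill_md(text: str) -> Tuple[Dict[str, Any], str]:
    """Naive sketch: split SKILL.md into (frontmatter dict, instruction body).

    Understands only top-level `key: value` lines and one level of indented
    mapping (like `metadata:` above); anything fancier needs a real YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text  # unterminated frontmatter
    meta: Dict[str, Any] = {}
    section: Optional[str] = None  # mapping currently being filled, e.g. "metadata"
    for raw in lines[1:end]:
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue
        key, _, value = raw.strip().partition(":")
        key, value = key.strip(), value.strip().strip("\"'")
        if raw[0] in " \t" and section is not None:
            meta[section][key] = value  # indented child of the open mapping
        elif value:
            meta[key], section = value, None
        else:
            meta[key], section = {}, key  # `key:` with no value opens a mapping
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body
```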

### Configuration Schema

When a skill requires authentication, the Setup Wizard automatically detects the needed variables by analyzing the SKILL.md. The extracted schema is stored on the Neo4j Skill node and includes:

- **`name`** — Variable name (e.g., `API_TOKEN`)
- **`description`** — Where to find this value
- **`required`** — Whether the skill can function without it
- **`type`** — `"secret"` for tokens/passwords, `"text"` for URLs/identifiers
- **`auth_header`** — HTTP header template (e.g., `Authorization: Bearer API_TOKEN`)

Values are saved in `config.json` inside the skill directory. The `auth_header` template is used at runtime to build authentication headers server-side.

## Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| `ENABLE_SKILLS` | `true` | Master switch for the skills system |
| `SKILLS_DIR` | `.agents/skills` | Skills directory path |
| `ENABLE_AGENT_CHAT` | `true` | Enable agentic chat mode (required for skills in Chat) |
| `SKILL_HTTP_TIMEOUT` | `15` | HTTP tool timeout in seconds |
| `MAX_SKILL_TOOLS` | `10` | Max skill tools in the agent |
| `ENABLE_SKILL_SCRIPTS` | `false` | Allow legacy script execution (security-sensitive) |
| `SKILL_SCRIPT_TIMEOUT` | `30` | Script timeout in seconds |

## Docker Volume Persistence

In Docker deployments, the skills directory is persisted via a named volume so installed skills survive container restarts:

```yaml
volumes:
  - skills_data:/app/.agents/skills
```

This is preconfigured in both `docker-compose.prod.yml` and the Coolify deployment file.

## API Endpoints

All endpoints require Admin authentication.

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/api/admin/skills` | List installed skills |
| `GET` | `/api/admin/skills/{id}` | Get full skill details (body + tools) |
| `POST` | `/api/admin/skills/install` | Install from URL or registry |
| `PATCH` | `/api/admin/skills/{id}` | Enable/disable a skill |
| `DELETE` | `/api/admin/skills/{id}` | Uninstall a skill |
| `POST` | `/api/admin/skills/discover` | Re-scan local skills directory |
| `POST` | `/api/admin/skills/{id}/analyze` | LLM-analyze SKILL.md for config variables |
| `GET` | `/api/admin/skills/{id}/config` | Get config schema + current values (secrets masked) |
| `PUT` | `/api/admin/skills/{id}/config` | Save config values (masked values preserve existing secrets) |
| `GET` | `/api/admin/skills/registry/search?q=` | Search skills.sh registry |
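The masking contract between `GET` and `PUT` on `/api/admin/skills/{id}/config` can be sketched as follows. The sketch assumes the frontend echoes `********` back for any secret the user did not change. Both function names are hypothetical.

```python
from typing import Dict, List

MASK = "********"


def masked_view(schema: List[Dict[str, str]], values: Dict[str, str]) -> Dict[str, str]:
    """GET sketch: return current values with configured secrets masked."""
    view = {}
    for var in schema:
        value = values.get(var["name"], "")
        view[var["name"]] = MASK if var.get("type") == "secret" and value else value
    return view


def merge_update(schema: List[Dict[str, str]], stored: Dict[str, str], submitted: Dict[str, str]) -> Dict[str, str]:
    """PUT sketch: a secret submitted as the mask keeps its stored value."""
    secrets = {var["name"] for var in schema if var.get("type") == "secret"}
    merged = dict(stored)
    for name, value in submitted.items():
        if name in secrets and value == MASK:
            continue  # the client echoed the mask back unchanged
        merged[name] = value
    return merged
```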

## Security

- The LLM never sees API tokens or credentials — authentication is injected server-side
- Only `auth_header` templates from the config schema are used to build HTTP headers
- Secret values in the config endpoint are masked (`********`) in API responses
- Config files are stored locally in the skill directory, not in the database
- Script execution is **disabled by default** (`ENABLE_SKILL_SCRIPTS=false`)
- All management endpoints require admin authentication
- Tool results are capped at 4000 characters
- Auth-related lines are stripped from skill instructions before injection to prevent the model from attempting manual authentication


---

## Document: Search

Hybrid search combining vector similarity, keyword matching, and graph traversal

URL: /features/search


# Search

import { Mermaid } from "zudoku/mermaid";

Cortex implements **hybrid search**, which combines three complementary retrieval methods: vector similarity, keyword matching, and knowledge graph traversal. Each method catches matches the others miss, and their results are fused into a single ranking.

## Search Methods

### Vector Search (Semantic)

Uses embedding similarity to find conceptually related content, even with different wording.

- Powered by OpenAI text-embedding-3-small (1536 dimensions)
- Stored and queried via Neo4j's native vector index
- Finds related concepts, synonyms, and paraphrases

### Keyword Search (Full-Text)

Traditional full-text search for exact term matching.

- Uses Neo4j's full-text indexes
- Great for specific terms, names, and codes
- Supports phrase matching

### Graph Traversal

Follows relationships in the knowledge graph to find connected information.

- Traverses up to 2 hops from matched entities (default; configurable via `MAX_GRAPH_HOPS`)
- Finds related entities even if not mentioned in the query
- Leverages GraphRAG entity extraction
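In Cortex the traversal runs inside Neo4j. The in-memory sketch below only illustrates the hop limit. The adjacency-dict representation and the `expand_entities` name are assumptions for illustration, and `max_hops=2` mirrors the default `MAX_GRAPH_HOPS`.

```python
from collections import deque
from typing import Dict, Iterable, Set


def expand_entities(graph: Dict[str, Set[str]], seeds: Iterable[str], max_hops: int = 2) -> Dict[str, int]:
    """Breadth-first expansion from the entities matched by the query.

    graph: undirected adjacency (entity -> neighboring entities).
    Returns every entity within `max_hops`, mapped to its hop distance (seeds are 0).
    """
    distance = {seed: 0 for seed in seeds if seed in graph}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance
```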

## Hybrid Search Architecture

<Mermaid chart={`
flowchart TD
    Q["🔍 User Query"] --> V["Vector Search<br/><i>Weight: 0.5</i>"]
    Q --> K["Keyword Search<br/><i>Weight: 0.3</i>"]
    Q --> G["Graph Traversal<br/><i>Weight: 0.2</i>"]
    
    V --> RRF["Reciprocal Rank Fusion"]
    K --> RRF
    G --> RRF
    
    RRF --> RERANK["Cross-Encoder<br/>Re-ranking"]
    RERANK --> RESULTS["📊 Final Results"]
`} />

## API Usage

### Basic Search

```bash
curl -X POST "http://localhost:8000/api/search" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "machine learning algorithms",
    "limit": 10
  }'
```

### Response

```json
{
  "results": [
    {
      "id": "chunk_abc123",
      "content": "Machine learning algorithms can be categorized into supervised, unsupervised, and reinforcement learning...",
      "score": 0.92,
      "document_id": "doc_xyz789",
      "document_title": "ML Fundamentals.pdf",
      "metadata": {
        "page": 15,
        "chunk_index": 3
      }
    }
  ],
  "total": 45,
  "query_time_ms": 127
}
```

### Advanced Search

```bash
curl -X POST "http://localhost:8000/api/search" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "neural network training",
    "limit": 20,
    "collection_id": "research-papers",
    "search_type": "hybrid",
    "filters": {
      "document_type": "pdf"
    }
  }'
```
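For scripted use, you can build the same request in Python with only the standard library. This is a sketch: `build_search_request` is a hypothetical helper that mirrors the JSON fields shown in the curl examples.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_search_request(base_url: str, api_key: str, query: str, limit: int = 10,
                         collection_id: Optional[str] = None, search_type: str = "hybrid",
                         filters: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build a POST /api/search request mirroring the curl examples above."""
    payload: Dict[str, Any] = {"query": query, "limit": limit, "search_type": search_type}
    if collection_id is not None:
        payload["collection_id"] = collection_id  # omit to search all collections
    if filters:
        payload["filters"] = filters
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

To send the request, use `with urllib.request.urlopen(req) as resp: data = json.load(resp)` and then read `data["results"]`.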

## Search Types

| Type | Description | When to Use |
|------|-------------|-------------|
| `hybrid` | All methods combined (default) | General queries |
| `vector` | Semantic search only | Conceptual/abstract queries |
| `keyword` | Full-text search only | Exact term matching |
| `graph` | Graph traversal only | Relationship exploration |

## Reciprocal Rank Fusion (RRF)

Cortex uses RRF to combine results from multiple search methods:

```
RRF_score(d) = Σ 1 / (k + rank(d))
```

Where:
- `k` is a constant (default: 60)
- `rank(d)` is the 1-based position of document `d` in one method's result list; the sum runs over every list in which `d` appears

The weights for each method are configurable:

```bash
VECTOR_WEIGHT=0.5
KEYWORD_WEIGHT=0.3
GRAPH_WEIGHT=0.2
```
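The formula above is unweighted. A common way to apply per-method weights is to multiply each method's reciprocal-rank term by its weight. Treat this as an assumption about how Cortex combines the two, not as its confirmed code.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence, Tuple


def weighted_rrf(ranked_lists: Mapping[str, Sequence[str]], weights: Mapping[str, float],
                 k: int = 60) -> List[Tuple[str, float]]:
    """Fuse per-method rankings with weighted Reciprocal Rank Fusion.

    ranked_lists: method -> result ids, best first (rank 1 is the first element).
    weights:      method -> weight; methods without a weight count as 1.0.
    """
    scores: Dict[str, float] = defaultdict(float)
    for method, ids in ranked_lists.items():
        weight = weights.get(method, 1.0)
        for rank, result_id in enumerate(ids, start=1):
            scores[result_id] += weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With equal weights, this reduces to the plain formula. A larger `k` flattens the gap between top and lower ranks, so a result that appears in several lists can outrank one that tops only a single list.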

## Cross-Encoder Re-ranking

After RRF fusion, a cross-encoder model re-ranks the top candidates for higher precision:

- Model: `cross-encoder/ms-marco-MiniLM-L-6-v2`
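- Scores each query-chunk pair jointly, rather than comparing independently computed embeddings
- Improves the ordering of the top results, at the cost of one model inference per re-ranked candidate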

Enable/disable via:

```bash
ENABLE_RERANKING=true
RERANKING_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2
```
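The re-ranking step itself is simple once a pair scorer is available. In this sketch, `rerank` is hypothetical and takes the scorer as a parameter, so the example runs without downloading a model.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def rerank(query: str, candidates: Sequence[str],
           score_pairs: Callable[[List[Pair]], Sequence[float]],
           top_n: int = 10) -> List[Tuple[str, float]]:
    """Score each (query, candidate) pair jointly and keep the top_n by score."""
    pairs = [(query, text) for text in candidates]
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda item: float(item[1]), reverse=True)
    return [(text, float(score)) for text, score in ranked[:top_n]]
```

If the sentence-transformers library is installed, `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict` accepts a list of `(query, passage)` pairs and returns one score per pair. You can therefore pass it directly as `score_pairs`.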

## Collection-Scoped Search

By default, search queries span **all collections**. To narrow results to a specific collection, pass `collection_id`:

```bash
# Search all collections (default — omit collection_id)
curl -X POST "http://localhost:8000/api/search" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "quarterly results"
  }'

# Search a specific collection
curl -X POST "http://localhost:8000/api/search" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "quarterly results",
    "collection_id": "financial-reports"
  }'
```

## Graph Context in Search

When graph search is enabled, results may include entity relationships:

```json
{
  "results": [...],
  "graph_context": {
    "entities": [
      {"name": "Neural Networks", "type": "Concept"},
      {"name": "Deep Learning", "type": "Concept"}
    ],
    "relationships": [
      {
        "source": "Neural Networks",
        "target": "Deep Learning",
        "type": "PART_OF"
      }
    ]
  }
}
```

## Performance Tips

1. **Use collection scoping when needed** - Search spans all collections by default; narrow to a specific collection for focused results
2. **Adjust weights** - Tune `VECTOR_WEIGHT`, `KEYWORD_WEIGHT`, `GRAPH_WEIGHT` for your use case
3. **Limit results** - Start with smaller limits and paginate
4. **Enable re-ranking** - Improves precision for the final results


---

## Document: Knowledge Graph

AI-powered entity extraction, relationships, and graph visualization

URL: /features/knowledge-graph


# Knowledge Graph

import { Mermaid } from "zudoku/mermaid";

Cortex automatically builds a knowledge graph from your documents using GraphRAG (Graph Retrieval-Augmented Generation). This enables semantic understanding, relationship discovery, and enhanced search capabilities.

## What is GraphRAG?

GraphRAG uses Large Language Models to:

1. **Extract Entities** - Identify people, organizations, concepts, technologies, locations
2. **Find Relationships** - Discover how entities relate to each other
3. **Resolve Semantics** - Merge duplicate entities with different names
4. **Build Structure** - Create a traversable graph stored in Neo4j

## Two-Phase Extraction Pipeline

Cortex uses a two-phase approach for building the knowledge graph:

**Phase A (per-document, during upload):** Each document passes through the following stages:

- **Entity extraction**: a dedicated extraction model (which can be a smaller, faster model) extracts entities from the document. Entities are fuzzy-matched to chunks for provenance tracking.
- **Entity resolution**: entities are deduplicated at storage time. When `ENABLE_SEMANTIC_ENTITY_RESOLUTION=true`, embedding-based vector similarity catches semantic matches like "Museum of Crypto Art" / "MOCA". Levenshtein similarity (85%) serves as a fallback. This merges variants like "OpenAI" and "Open AI" into a single node with aliases.
- **Type enforcement**: entity types are restricted to 10 allowed types (Person, Organization, Location, Concept, Technology, Event, Product, Document, System, Process) via fuzzy matching.
- **Per-chunk relationship extraction**: after entity extraction and chunk linking, every chunk with 2+ linked entities gets an LLM call. This call uses the **relationship model** (falling back to the extraction model, then the primary model) and extracts relationships with the chunk text as direct evidence. Entity names in these relationships are mapped to their canonical (dedup-resolved) names before storage, so relationships reference the correct merged entities. Self-referential relationships (where source and target are the same entity) are filtered out.
- **Retries and concurrency**: per-chunk calls retry rate-limit errors with exponential backoff (tenacity, 4 attempts, 2-30s wait). Concurrency is controlled by `CONCURRENT_RELATIONS` (default 3).

Per-chunk relationships are stored with `extraction_method='per_chunk'`. They provide high-confidence, evidence-grounded connections before Phase B runs.
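You can approximate the per-chunk retry and concurrency behavior with the standard library alone. This is a sketch, not the tenacity-based code, and it rests on three assumptions:

- `RateLimitError` is a stand-in exception.
- The wait doubles and is clamped to 2-30 s; tenacity's exact schedule may differ.
- `sleep` is injectable only so the example can be tested without waiting.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 error."""


async def with_backoff(call: Callable[[], Awaitable[T]], attempts: int = 4,
                       min_wait: float = 2.0, max_wait: float = 30.0,
                       sleep: Callable[[float], Awaitable[None]] = asyncio.sleep) -> T:
    """Retry `call` on RateLimitError, doubling the wait and clamping it to [min_wait, max_wait]."""
    for attempt in range(1, attempts + 1):
        try:
            return await call()
        except RateLimitError:
            if attempt == attempts:
                raise  # out of attempts: surface the rate-limit error
            await sleep(min(max_wait, max(min_wait, 2.0 ** attempt)))
    raise AssertionError("attempts must be at least 1")


async def extract_all(chunks: Iterable[str], extract_one: Callable[[str], Awaitable[T]],
                      concurrency: int = 3,
                      sleep: Callable[[float], Awaitable[None]] = asyncio.sleep) -> List[T]:
    """Run per-chunk extraction with at most `concurrency` LLM calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def run(chunk: str) -> T:
        async with semaphore:
            # A fresh coroutine is created for every attempt.
            return await with_backoff(lambda: extract_one(chunk), sleep=sleep)

    return await asyncio.gather(*(run(c) for c in chunks))
```

Holding the semaphore during the backoff sleep keeps at most `concurrency` chunks in progress at once. It also stops retries from immediately refilling a rate-limited provider.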

**Phase B (per-collection, on demand):** Cross-document relationship analysis using the **relationship model** (falls back to the extraction model, then the primary model). Each batch is processed in two passes:

1. **Candidate scanning** — The relationship model scans entity pairs with their co-occurring chunk context and proposes candidate relationships. Includes few-shot good and bad examples to guide the LLM, plus anti-hub negative instructions ("If no clear relationship exists, do not create one") with bad examples showing co-occurrence pairs to avoid.
2. **Structured confirmation** — The relationship model confirms candidates with structured XML output including confidence scores (0.0-1.0), grounding each relationship in source text. Relationships with confidence < 0.5 are filtered before storage. Self-referential relationships (where source and target are the same entity) are also automatically filtered out at both the extraction and storage levels.
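Below is a sketch of the confirmation filter. The XML element and attribute names (`relationship`, `source`, `target`, `type`, `confidence`) are hypothetical, because the actual output schema is not documented here. The 0.5 threshold and the self-loop rule come from the text above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_confirmed(xml_text: str, min_confidence: float = 0.5) -> List[Dict[str, object]]:
    """Keep confirmed relationships that are complete, non-self-referential, and confident enough.

    Assumes a hypothetical shape such as:
    <relationships>
      <relationship source="A" target="B" type="USES" confidence="0.8">evidence</relationship>
    </relationships>
    Raises xml.etree.ElementTree.ParseError on malformed XML, at which point a
    caller could try a plaintext fallback parser instead.
    """
    kept = []
    for rel in ET.fromstring(xml_text).iter("relationship"):
        source = (rel.get("source") or "").strip()
        target = (rel.get("target") or "").strip()
        try:
            confidence = float(rel.get("confidence", ""))
        except ValueError:
            continue  # missing or unparseable score: treat as unconfirmed
        if not source or not target or source.casefold() == target.casefold():
            continue  # drop incomplete and self-referential edges
        if confidence < min_confidence:
            continue  # below the 0.5 storage threshold
        kept.append({
            "source": source,
            "target": target,
            "type": rel.get("type", "RELATED_TO"),
            "confidence": confidence,
            "evidence": (rel.text or "").strip(),
        })
    return kept
```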

Entities are grouped into batches using **Union-Find co-occurrence clustering** — entities that share chunks are grouped together, with high/low connection count interleaving to prevent hub entities from concentrating in early batches. Batch overlap is 5% with degree-aware selection (entities already in 2+ batches are excluded from overlap). Chunk context is filled dynamically per batch using a **60/40 token split** (60% for chunk text, 40% for entity descriptions and output budget), with **greedy entity-coverage-diversity** selection that maximizes coverage of different entities rather than always picking chunks dominated by hub entities.
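The clustering step can be sketched with a standard union-find structure. The sketch treats an entity's connection count as its number of co-occurrence links, which is an assumption. It also omits several parts of the real batching:

- splitting large clusters into token-budgeted batches
- the 5% overlap
- chunk selection

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class UnionFind:
    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cooccurrence_clusters(chunk_entities: Dict[str, Iterable[str]]) -> List[List[str]]:
    """Group entities connected through shared chunks, then order each group so
    high- and low-connection entities alternate (hubs are not all at the front)."""
    uf = UnionFind()
    degree: Dict[str, int] = defaultdict(int)
    for entities in chunk_entities.values():
        members = sorted(set(entities))
        for entity in members:
            uf.find(entity)  # register singletons too
            degree[entity] += len(members) - 1  # co-occurrence links from this chunk
        for other in members[1:]:
            uf.union(members[0], other)

    groups: Dict[str, List[str]] = defaultdict(list)
    for entity in list(uf.parent):
        groups[uf.find(entity)].append(entity)

    clusters = []
    for members in groups.values():
        by_degree = sorted(members, key=lambda e: (-degree[e], e))
        interleaved, front, back = [], 0, len(by_degree) - 1
        while front <= back:
            interleaved.append(by_degree[front])  # next most-connected
            front += 1
            if front <= back:
                interleaved.append(by_degree[back])  # next least-connected
                back -= 1
        clusters.append(interleaved)
    return clusters
```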

Phase B supports **multi-round discovery**. The initial analysis runs multiple rounds (default 3, configurable via `RELATIONSHIP_MAX_ROUNDS`), while "Find more" (re-analyze) always runs a single round. Progress is tracked cumulatively across all rounds (e.g., "Batch 400/861" for 3 rounds of 287 batches). Discovery is guided by a target **ERR (Entity-Relationship Ratio)** (default 1.0), which is also displayed on the Knowledge Graph page as a quality indicator.

**Anti-hub protections**: A per-entity relationship cap (`RELATIONSHIP_MAX_PER_ENTITY`, default 50) prevents any single entity from accumulating disproportionate connections — relationships are skipped when both endpoints exceed the cap. Existing relationships shown to the LLM between rounds are capped at 20 per entity (highest weight first) to prevent hub reinforcement. Prompts emphasize direct, evidence-based relationships and explicitly discourage star patterns through common intermediary entities.
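Both caps can be sketched as simple counters. Two details are assumptions: whether the check against the cap is `>=` or `>`, and whether a shown relationship counts against both endpoints. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping, Tuple

Triple = Tuple[str, str, str]  # (source, type, target)


def apply_entity_cap(candidates: Iterable[Triple], existing_counts: Mapping[str, int],
                     cap: int = 50) -> List[Triple]:
    """Accept new relationships in order, skipping one only when BOTH endpoints are already at the cap."""
    counts = Counter(existing_counts)
    accepted = []
    for source, rel_type, target in candidates:
        if counts[source] >= cap and counts[target] >= cap:
            continue
        accepted.append((source, rel_type, target))
        counts[source] += 1
        counts[target] += 1
    return accepted


def context_relationships(existing: Iterable[Dict], per_entity: int = 20) -> List[Dict]:
    """Choose which existing relationships to show the LLM between rounds:
    highest weight first, at most `per_entity` per endpoint."""
    shown: Counter = Counter()
    picked = []
    for rel in sorted(existing, key=lambda r: r["weight"], reverse=True):
        if shown[rel["source"]] >= per_entity or shown[rel["target"]] >= per_entity:
            continue
        picked.append(rel)
        shown[rel["source"]] += 1
        shown[rel["target"]] += 1
    return picked
```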

Phase B runs in one of two modes:

- **Incremental**: builds on existing relationships.
- **Rebuild**: deletes only batch-analysis relationships, preserving the per-chunk relationships from Step 1, then re-analyzes from scratch with multiple rounds.

Relationship types are constrained to 14 standard types via fuzzy matching. `MENTIONS` was removed because it acted as a lazy co-occurrence catch-all. When XML parsing fails, a **plaintext fallback parser** handles arrow-format output (e.g., `Entity1 -> RELATION -> Entity2`). Phase B is triggered via the API or from the Knowledge Graph page after document processing.
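The sketch below combines the plaintext fallback with the type normalization described under Relationship Types (80% threshold, fallback to `RELATED_TO`). Python's `difflib` similarity ratio stands in for whatever fuzzy matcher Cortex actually uses. The accepted line format (an optional bullet, then `->` separators) is also an assumption.

```python
import difflib
import re
from typing import List, Tuple

RELATIONSHIP_TYPES = [
    "RELATED_TO", "CREATED_BY", "WORKS_FOR", "PART_OF", "USES", "LOCATED_IN",
    "IMPLEMENTS", "DEPENDS_ON", "IS_A", "HAS_PROPERTY", "FOUNDED_BY", "FEATURES",
    "CONTAINS", "INTERACTS_WITH",
]

# Optional bullet, then "source -> RELATION -> target".
ARROW_LINE = re.compile(r"^\s*(?:[-*]\s+)?(.+?)\s*->\s*([A-Za-z_ ]+?)\s*->\s*(.+?)\s*$")


def normalize_relationship_type(raw: str, cutoff: float = 0.8) -> str:
    """Map free-form output onto the 14 allowed types, falling back to RELATED_TO."""
    candidate = raw.strip().upper().replace(" ", "_")
    if candidate in RELATIONSHIP_TYPES:
        return candidate
    match = difflib.get_close_matches(candidate, RELATIONSHIP_TYPES, n=1, cutoff=cutoff)
    return match[0] if match else "RELATED_TO"


def parse_arrow_lines(text: str) -> List[Tuple[str, str, str]]:
    """Extract (source, type, target) triples from arrow-format lines."""
    triples = []
    for line in text.splitlines():
        m = ARROW_LINE.match(line)
        if not m:
            continue  # prose or other noise
        source, relation, target = m.groups()
        if source.casefold() == target.casefold():
            continue  # self-referential relationship
        triples.append((source, normalize_relationship_type(relation), target))
    return triples
```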

<Mermaid chart={`
flowchart TD
    A["📄 Document Upload"] --> B["📝 Chunk & Embed"]
    B --> C["🤖 Phase A: Entity Extraction"]
    C --> D["🔍 Fuzzy Match to Chunks"]
    D --> E["💾 Store Entities + MENTIONS"]

    F["🔄 POST /api/graph/relationships/analyze"] --> G["🤖 Phase B: Relationship Analysis"]
    G --> H["💾 Store Relationships"]
    H --> I["🏘️ Community Detection"]

    C -.-> C1["Per-document (1-4 LLM calls)"]
    G -.-> G1["Per-collection (relationship model)"]
`} />

## Generate Graph (One-Click 3-Step Pipeline)

The **Knowledge Graph** page (`/extract`) exposes a single **Generate Graph** / **Regenerate Graph** button that runs all three steps — entity extraction, cross-document relationship analysis, and community detection — server-side as a chained background flow.

Under the hood, the frontend issues one request:

```bash
curl -X POST "http://localhost:8000/api/documents/reprocess?chain=relationship_analysis,community_detection" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"document_ids": [...]}'
```

The `chain` query parameter (also accepted on `/api/documents/process-pending` and `/api/graph/relationships/analyze`) tells the backend to automatically spawn the next pipeline step when each task finishes. Each step still produces its own task with its own `task_id`, `task_type`, and progress messages — Step 1's task is held in `running` state until background image analysis also completes, so the chain only advances when image-derived entities have landed.
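You can build the same chained request in Python with the standard library. `build_reprocess_request` is a hypothetical helper. `urlencode` sends the comma in the `chain` value as `%2C`, which a standard query-string parser decodes back to a comma.

```python
import json
import urllib.request
from typing import Iterable, Sequence
from urllib.parse import urlencode


def build_reprocess_request(base_url: str, api_key: str, document_ids: Iterable[str],
                            chain: Sequence[str] = ("relationship_analysis", "community_detection")
                            ) -> urllib.request.Request:
    """Build the Generate Graph request; pass chain=() for a single, non-chained step."""
    url = f"{base_url.rstrip('/')}/api/documents/reprocess"
    if chain:
        url += "?" + urlencode({"chain": ",".join(chain)})  # comma is sent as %2C
    return urllib.request.Request(
        url,
        data=json.dumps({"document_ids": list(document_ids)}).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```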

**Why this matters:** the full flow survives navigation away from the Knowledge Graph page, page reloads, and even closing the browser. The user can return at any time and the page re-attaches to whichever pipeline task is currently running. Plain single-step buttons ("Extract Entities", "Analyze Relationships", "Detect Communities") never auto-chain; only the Generate Graph flow sets the chain string.

## Entity Types

Cortex extracts the following 10 entity types:

| Type | Examples |
|------|----------|
| `Person` | Names, authors, researchers |
| `Organization` | Companies, institutions, teams |
| `Concept` | Ideas, theories, methodologies |
| `Technology` | Software, tools, frameworks |
| `Location` | Places, regions, countries |
| `Event` | Conferences, releases, meetings |
| `Product` | Products, services, offerings |
| `Document` | Papers, reports, articles |
| `System` | Platforms, infrastructure, OS |
| `Process` | Workflows, procedures, methods |

Entity types are strictly enforced during extraction — non-standard types are automatically fuzzy-matched to the nearest allowed type (defaulting to `Concept` if no match above 75%).
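Type enforcement can be sketched with Python's `difflib`. This is a stand-in, because the fuzzy matcher Cortex uses is not specified, so scores near the 75% cutoff may differ.

```python
import difflib

ENTITY_TYPES = ["Person", "Organization", "Location", "Concept", "Technology",
                "Event", "Product", "Document", "System", "Process"]
_BY_LOWER = {t.lower(): t for t in ENTITY_TYPES}


def normalize_entity_type(raw: str, cutoff: float = 0.75) -> str:
    """Map an extracted type onto the 10 allowed types, defaulting to Concept."""
    key = raw.strip().lower()
    if key in _BY_LOWER:
        return _BY_LOWER[key]
    match = difflib.get_close_matches(key, list(_BY_LOWER), n=1, cutoff=cutoff)
    return _BY_LOWER[match[0]] if match else "Concept"
```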

## Relationship Types

Relationships are restricted to the following 14 types:

| Relationship | Description |
|--------------|-------------|
| `RELATED_TO` | Generic/fallback relationship |
| `CREATED_BY` | Creator/creation relationship |
| `WORKS_FOR` | Employment relationship |
| `PART_OF` | Containment/membership |
| `USES` | Utilization relationship |
| `LOCATED_IN` | Geographic relationship |
| `IMPLEMENTS` | Implementation relationship |
| `DEPENDS_ON` | Dependency relationship |
| `IS_A` | Classification relationship |
| `HAS_PROPERTY` | Attribute relationship |
| `FOUNDED_BY` | Founding relationship |
| `FEATURES` | Feature/capability relationship |
| `CONTAINS` | Containment relationship |
| `INTERACTS_WITH` | Interaction relationship |

Relationship types are strictly enforced — the LLM is instructed to only use these 14 types, and any non-standard types in the output are fuzzy-matched to the nearest allowed type (80% threshold, fallback to `RELATED_TO`). The `MENTIONS` type was intentionally excluded as it was being used as a lazy catch-all for co-occurrence without meaningful semantic content.

## Graph Visualization

<Mermaid chart={`
graph LR
    subgraph Knowledge Graph
        B[GPT-4] -->|CREATED_BY| A[OpenAI]
        C[ChatGPT] -->|CREATED_BY| A
        A -->|FOUNDED_BY| D[Sam Altman]
        B -->|USES| E[Transformer]
        C -->|USES| E
        E -->|CREATED_BY| F[Google]
    end
`} />

## API Usage

### Get Graph Statistics

```bash
curl "http://localhost:8000/api/stats" \
  -H "X-API-Key: your-api-key"
```

```json
{
  "document_count": 156,
  "chunk_count": 4280,
  "entity_count": 1542,
  "relationship_count": 3891,
  "per_chunk_relationship_count": 1204,
  "community_count": 23,
  "collection_count": 5
}
```

### Get Graph Visualization Data

```bash
curl "http://localhost:8000/api/graph/visualization" \
  -H "X-API-Key: your-api-key"
```

```json
{
  "nodes": [
    {"id": "ent_1", "label": "OpenAI", "type": "Organization"},
    {"id": "ent_2", "label": "GPT-4", "type": "Technology"}
  ],
  "edges": [
    {"source": "ent_2", "target": "ent_1", "type": "CREATED_BY"}
  ]
}
```

### Get Entity Details

```bash
curl "http://localhost:8000/api/graph/entities/ent_abc123" \
  -H "X-API-Key: your-api-key"
```

```json
{
  "id": "ent_abc123",
  "name": "OpenAI",
  "type": "Organization",
  "description": "AI research company",
  "mention_count": 45,
  "related_documents": ["doc_1", "doc_2", "doc_3"],
  "relationships": [
    {"target": "GPT-4", "type": "CREATED"},
    {"target": "ChatGPT", "type": "CREATED"},
    {"target": "Sam Altman", "type": "LED_BY"}
  ]
}
```

### Get Entity Relationships

```bash
curl "http://localhost:8000/api/graph/entities/ent_abc123/relationships" \
  -H "X-API-Key: your-api-key"
```

### Query Subgraph

Get a subgraph starting from a specific entity:

```bash
curl -X POST "http://localhost:8000/api/graph/subgraph" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "entity_name": "Machine Learning",
    "max_depth": 2,
    "limit": 50
  }'
```

### Search Entities

```bash
curl "http://localhost:8000/api/graph/entities?search=neural&type=Concept&limit=20" \
  -H "X-API-Key: your-api-key"
```

## Entity Editing

You can edit an entity's name or description directly from the **Explore > Entities** tab. Click any entity to open the detail modal, then click the pencil icon next to the name or description to edit inline.

- **Name changes** preserve the old name in the entity's `aliases` array, so searches for the old name still work
- **Duplicate names** are rejected — the new name must be unique
- **Graph integrity** is maintained — all relationships and chunk mentions remain intact because Neo4j edges connect to nodes, not name strings
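The rename rules above can be sketched as follows. `rename_entity` is hypothetical. The dict keyed by name stands in for the name index only, and exact (case-sensitive) duplicate detection is an assumption.

```python
from typing import Dict


def rename_entity(entities: Dict[str, dict], old: str, new: str) -> dict:
    """Rename an entity, keeping the old name as an alias and rejecting duplicate names.

    `entities` maps name -> record and models the name lookup only; in Neo4j,
    relationships and MENTIONS edges attach to the node itself, so they need
    no rewiring when the name changes.
    """
    if old not in entities:
        raise KeyError(old)
    if new == old:
        return entities[old]
    if new in entities:
        raise ValueError(f"an entity named {new!r} already exists")
    record = entities.pop(old)
    aliases = record.setdefault("aliases", [])
    if old not in aliases:
        aliases.append(old)
    if new in aliases:
        aliases.remove(new)  # renaming back to a former alias
    record["name"] = new
    entities[new] = record
    return record
```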

### Update Entity via API

```bash
curl -X PATCH "http://localhost:8000/api/graph/entity/OpenAI" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "OpenAI, Inc.",
    "description": "AI research and deployment company"
  }'
```

```json
{
  "name": "OpenAI, Inc.",
  "type": "Organization",
  "description": "AI research and deployment company",
  "aliases": ["OpenAI"]
}
```

## Semantic Entity Resolution

Cortex automatically merges entities that refer to the same thing:

<Mermaid chart={`
flowchart LR
    A["'OpenAI'"] --> D["Merged Entity:<br/><b>OpenAI</b>"]
    B["'Open AI'"] --> D
    C["'openai'"] --> D
    
    D --> E["Used in search"]
    D --> F["Displayed in graph"]
`} />

When `ENABLE_SEMANTIC_ENTITY_RESOLUTION=true` (default), entity resolution uses **embedding-based vector similarity** via Neo4j's vector index to catch semantic matches that string similarity misses (e.g., "Museum of Crypto Art" and "MOCA"). Levenshtein string matching is used as a fallback.

This applies symmetrically to entities extracted from text **and** from image descriptions. The image-analysis pipeline batch-embeds each image's extracted entities before storing them, using one `embeddings` call per image and falling back to Levenshtein if that call fails. As a result, the same `entity_embedding` vector index covers both sources, and cross-source duplicates collapse at write time. For example, a "MOCA" entity from an image caption merges with a "Museum of Crypto Art" entity from text.

Configuration:

```bash
ENABLE_SEMANTIC_ENTITY_RESOLUTION=true
ENTITY_SIMILARITY_THRESHOLD=0.85
```
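Below is a self-contained sketch of this resolution order, built on three assumptions:

- `ENTITY_SIMILARITY_THRESHOLD` is compared against cosine similarity.
- The 85% Levenshtein check also runs when no embedding match clears the threshold.
- A Python loop can stand in for the lookup, which in Cortex goes through Neo4j's vector index.

```python
import math
from typing import List, Optional, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def string_similarity(a: str, b: str) -> float:
    a, b = a.casefold(), b.casefold()
    return 1.0 - levenshtein(a, b) / (max(len(a), len(b)) or 1)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


Candidate = Tuple[str, Optional[Sequence[float]]]  # (canonical name, embedding or None)


def resolve(name: str, embedding: Optional[Sequence[float]], existing: List[Candidate],
            threshold: float = 0.85, string_threshold: float = 0.85) -> Optional[str]:
    """Return the canonical entity to merge `name` into, or None to create a new node."""
    if embedding is not None:
        scored = [(cosine(embedding, emb), canon) for canon, emb in existing if emb is not None]
        if scored:
            best_score, best_name = max(scored)
            if best_score >= threshold:
                return best_name
    # Fallback (no embedding, or no semantic match): normalized Levenshtein similarity.
    best = max(existing, key=lambda item: string_similarity(name, item[0]), default=None)
    if best is not None and string_similarity(name, best[0]) >= string_threshold:
        return best[0]
    return None
```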

## Entity Deduplication

While Cortex automatically resolves many duplicates during extraction (via embedding-based semantic matching with Levenshtein 85% fallback), some near-duplicates may still slip through. This is especially likely across large document sets or when entity names vary in subtle ways (e.g., "Machine Learning" vs "machine learning (ML)"). The Entity Deduplication feature provides tools to find and merge these remaining duplicates.

### Quick Access from Entities

You can jump directly to deduplication for any entity from the **Explore > Entities** browser. Each entity card has a **merge icon button** that navigates to `/deduplicate?entity=EntityName`, which auto-scans and filters results to that entity. The entity detail modal also has a **Deduplicate** button in the footer.

### Find Duplicate Candidates

Scan the knowledge graph for groups of entities that appear to refer to the same thing:

```bash
curl "http://localhost:8000/api/entities/duplicates?threshold=0.85&limit=50" \
  -H "X-API-Key: your-api-key"
```

```json
{
  "groups": [
    {
      "canonical": "Machine Learning",
      "duplicates": [
        {"name": "machine learning", "type": "Concept", "similarity": 0.95, "mention_count": 12},
        {"name": "ML", "type": "Concept", "similarity": 0.87, "mention_count": 5}
      ],
      "similarity": 0.91
    },
    {
      "canonical": "OpenAI",
      "duplicates": [
        {"name": "Open AI", "type": "Organization", "similarity": 0.92, "mention_count": 3}
      ],
      "similarity": 0.92
    }
  ],
  "total_groups": 2,
  "threshold": 0.85
}
```

The `threshold` parameter (0.5 to 1.0) controls how similar entity names must be to be considered duplicates. Lower values return more candidates but may include false positives.

### Merge Entities

Once you have identified true duplicates, merge them into a single canonical entity:

```bash
curl -X POST "http://localhost:8000/api/entities/merge" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "canonical": "Machine Learning",
    "merge": ["machine learning", "ML"]
  }'
```

```json
{
  "canonical": "Machine Learning",
  "merged": ["machine learning", "ML"],
  "relationships_transferred": 8,
  "mentions_transferred": 17,
  "aliases_added": ["machine learning", "ML"]
}
```

When entities are merged:
- All relationships from the merged entities are transferred to the canonical entity
- All chunk mentions (MENTIONS links) are moved to the canonical entity
- The merged entity names are added as **aliases** on the canonical entity for future resolution
- The merged entities are removed from the graph
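The four merge effects can be modeled in memory as shown below. The graph layout and `merge_entities` are hypothetical. Dropping edges that collapse into self-loops is an assumption, borrowed from the self-loop filtering described for extraction.

```python
from typing import Dict, List, Set, Tuple


def merge_entities(graph: Dict, canonical: str, merge: List[str]) -> Dict[str, object]:
    """In-memory sketch of the merge effects listed above.

    graph = {"entities": {name: {"aliases": [...]}},
             "relationships": {(source, type, target), ...},
             "mentions": {(chunk_id, entity_name), ...}}
    """
    entities = graph["entities"]
    doomed = [n for n in merge if n != canonical]
    missing = [n for n in [canonical, *doomed] if n not in entities]
    if missing:
        raise KeyError(f"unknown entities: {missing}")
    redirect = {n: canonical for n in doomed}

    relationships: Set[Tuple[str, str, str]] = set()
    moved_rels = 0
    for source, rel_type, target in graph["relationships"]:
        new_source, new_target = redirect.get(source, source), redirect.get(target, target)
        if new_source == new_target:
            continue  # drop self-loops, including ones created by the merge
        if (new_source, new_target) != (source, target):
            moved_rels += 1
        relationships.add((new_source, rel_type, new_target))

    mentions: Set[Tuple[str, str]] = set()
    moved_mentions = 0
    for chunk_id, entity in graph["mentions"]:
        if entity in redirect:
            moved_mentions += 1
        mentions.add((chunk_id, redirect.get(entity, entity)))  # set dedups repeats

    aliases = entities[canonical].setdefault("aliases", [])
    added = [n for n in doomed if n not in aliases]
    aliases.extend(added)
    for n in doomed:
        del entities[n]
    graph["relationships"], graph["mentions"] = relationships, mentions
    return {"canonical": canonical, "merged": doomed, "relationships_transferred": moved_rels,
            "mentions_transferred": moved_mentions, "aliases_added": added}
```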

### View Merge History

Review past merge operations:

```bash
curl "http://localhost:8000/api/entities/merge-history?limit=20" \
  -H "X-API-Key: your-api-key"
```

```json
{
  "entries": [
    {
      "canonical": "Machine Learning",
      "merged": ["machine learning", "ML"],
      "relationships_transferred": 8,
      "mentions_transferred": 17,
      "merged_at": "2026-03-18T14:30:00Z"
    }
  ],
  "total": 1
}
```

### Deduplication Workflow

<Mermaid chart={`
flowchart TD
    A["1. Find Duplicates"] --> B["GET /api/entities/duplicates"]
    B --> C["2. Review Candidates"]
    C --> D{"True duplicates?"}
    D -->|Yes| E["3. Merge Entities"]
    D -->|No| F["Skip / Adjust threshold"]
    E --> G["POST /api/entities/merge"]
    G --> H["4. Verify Results"]
    H --> I["GET /api/entities/merge-history"]
    F --> A
`} />

**Tips for effective deduplication:**
- Start with the default threshold (0.85) to find obvious duplicates first
- Lower the threshold gradually to catch more subtle variants
- Use the **inspect button** (eye icon) on each entity in a group to view its full details before deciding to merge
- Always review candidates before merging: false positives at low thresholds can incorrectly combine unrelated entities
- After merging, re-run community detection to reflect the updated graph structure

## Graph in Search

During search, Cortex uses the knowledge graph to:

1. **Expand queries** - Find related entities
2. **Traverse relationships** - Discover connected information
3. **Add context** - Include entity descriptions in prompts

Configure graph usage:

```bash
MAX_GRAPH_HOPS=2
GRAPH_WEIGHT=0.2
```

## Visualization

The **Knowledge Graph** page (under Manage) provides a one-click **"Generate Graph"** (or **"Regenerate Graph"**) button that runs the full 3-step pipeline in sequence:

1. **Entity Extraction & Relationship Discovery** — Process uploaded documents to extract entities and discover relations (per-chunk) grounded in the source text. Shows entity and relation counts. This step is also **image-analysis-aware**: documents that have finished text processing but still have background image analysis running (e.g., "Analyzed 3/67 images") are treated as in-progress. The step displays an aggregate image analysis progress bar and stays in "In Progress" state until all images across all documents have been analyzed. This ensures entities from images are included before advancing to relationship analysis.
2. **Deep Relationship Analysis** — Discover cross-document relations between entities using the relationship model. Shows only cross-document relation counts (excludes per-chunk). The "Find more" button runs an additional round of incremental analysis. The ERR (Entity-Relationship Ratio) indicator is displayed to 2 decimal places.
3. **Community Detection** — Group related entities into communities.

The button label changes to "Regenerate Graph" when a graph already exists. When regenerating, Step 1 first **deletes all communities, relationships, and entities** (in that order) before reprocessing documents, ensuring a true from-scratch rebuild of the entire knowledge graph. The full cleanup order is: `DELETE /api/graph/communities` → `DELETE /api/graph/relationships` → `DELETE /api/graph/entities` → reprocess all documents → relationship analysis (rebuild mode) → community detection.

The pipeline is resilient to page refreshes — progress is persisted via `sessionStorage` with a saved task ID (`regenerateTaskId`) for the active step's backend task. On resume, the step runner checks the saved task's status: **running** → resume polling, **completed** → advance to the next step, **failed** → abort, **not found** → start fresh. This explicit task-based approach eliminates false step-skipping that could occur from stale data. Each step also tracks staleness via persisted timestamps, ensuring users are always guided to keep the knowledge graph up to date.
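The resume rule is a small decision table. The page itself runs in the browser, so this Python sketch only models the decision. `fetch_status` stands in for a status lookup by task ID and is hypothetical. Treating a missing saved ID as "start fresh" is an assumption.

```python
from enum import Enum
from typing import Callable, Optional


class ResumeAction(Enum):
    POLL = "resume polling"
    ADVANCE = "advance to next step"
    ABORT = "abort"
    START_FRESH = "start fresh"


def resume_action(saved_task_id: Optional[str],
                  fetch_status: Callable[[str], Optional[str]]) -> ResumeAction:
    """Decide what to do for the current step after a page reload.

    fetch_status(task_id) returns "running", "completed", "failed", or None
    when the backend no longer knows the task.
    """
    if not saved_task_id:
        return ResumeAction.START_FRESH
    status = fetch_status(saved_task_id)
    return {
        "running": ResumeAction.POLL,
        "completed": ResumeAction.ADVANCE,
        "failed": ResumeAction.ABORT,
    }.get(status, ResumeAction.START_FRESH)
```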

The **Documents** page also provides a **"Generate Graph"** button. It navigates to the Knowledge Graph page **and auto-starts the pipeline on arrival** — one click on Documents, no second click needed. The button appends `?autostart=1` to the destination URL; the Knowledge Graph page detects the flag, waits for its initial data fetch, fires the chain once, and clears the URL parameter so a refresh doesn't re-trigger. The destructive-action confirmation dialog still appears if you already have entities in the graph.

The **Explore** section provides an interactive graph visualization (Knowledge Graph tab) plus read-only browsers for Entities, Relationships, and Communities:

- **Node Colors** - Different colors for entity types
- **Node Size** - Based on mention frequency
- **Edge Labels** - Relationship types
- **Clustering** - Related entities grouped together
- **Filtering** - Filter by entity type or relationship
- **Search** - Find specific entities

## Configuration

```bash
# Enable/disable graph extraction
ENABLE_GRAPH_EXTRACTION=true

# Model for entity extraction and community summarization (defaults to OPENAI_MODEL)
# Recommended: instruction-following models (e.g. Mistral Small 24B, Ministral 14B)
# GRAPH_EXTRACTION_MODEL=gpt-4o-mini
# GRAPH_EXTRACTION_API_BASE=http://localhost:11434/v1
# GRAPH_EXTRACTION_API_KEY=ollama

# Model for relationship extraction (defaults to GRAPH_EXTRACTION_MODEL → OPENAI_MODEL)
# Recommended: instruction-following models (e.g. OpenAI GPT OSS 120B)
# Used for both per-chunk (Step 1) and batch analysis (Step 2)
# Runs on a separate rate limit from entity extraction
# RELATIONSHIP_EXTRACTION_MODEL=gpt-4o-mini
# RELATIONSHIP_EXTRACTION_API_BASE=http://localhost:11434/v1
# RELATIONSHIP_EXTRACTION_API_KEY=ollama

# Concurrent per-chunk relationship extractions per document (default: 3)
# CONCURRENT_RELATIONS=3

# Graph traversal depth
MAX_GRAPH_HOPS=2

# Context + output budgets cascade from primary (OPENAI_MAX_*) when set to 0.
# See "Budget Fallback Chain" in /pages/configuration for the full chain.
# GRAPH_EXTRACTION_MAX_CONTEXT=0    # 0 = inherit OPENAI_MAX_CONTEXT (=32768)
# EXTRACTION_MAX_OUTPUT_TOKENS=0    # 0 = inherit OPENAI_MAX_OUTPUT_TOKENS (=8000)
# Override EXTRACTION_MAX_OUTPUT_TOKENS only to constrain or expand this specific tier.

# Entity resolution
ENABLE_SEMANTIC_ENTITY_RESOLUTION=true
ENTITY_SIMILARITY_THRESHOLD=0.85

# Relationship analysis (Phase B / Step 2)
# RELATIONSHIP_MAX_CONTEXT=0                    # 0 = inherit GRAPH_EXTRACTION_MAX_CONTEXT → primary
# RELATIONSHIP_MAX_OUTPUT_TOKENS=0              # per-chunk + candidate scan (in chain)
# RELATIONSHIP_BATCH_MAX_OUTPUT_TOKENS=16000    # Phase 2 batch (standalone, NOT in chain)
# PARALLEL_RELATIONSHIP_BATCHES=5       # Batches processed in parallel
# RELATIONSHIP_TARGET_RATIO=1.0         # Target entity-to-relationship ratio (ERR)
# RELATIONSHIP_MAX_ROUNDS=3             # Max discovery rounds per batch (initial analysis)
# RELATIONSHIP_MAX_HOURS=0              # Max hours for relationship analysis (0 = no limit)
# RELATIONSHIP_MAX_PER_ENTITY=50        # Soft cap on relationships per entity (0 = no cap)

# Reasoning Control — lets you use reasoning-capable models for ingestion
# (GPT-5/5.1, Claude 4.x, Qwen3, DeepSeek-R1, etc.) with their thinking
# forced OFF. Reasoning hurts structured extraction (drift, hidden-token
# cost, latency, malformed JSON). No-op for pure instruct models.
# Values: off | minimal | auto | low | medium | high
EXTRACTION_REASONING_MODE=off    # extraction + summaries + communities
RELATIONSHIP_REASONING_MODE=off  # candidate scan + relationship extraction
VISION_REASONING_MODE=off        # vision-model image descriptions (e.g. Qwen3-VL-27B)
DEFAULT_REASONING_MODE=auto      # Q&A / researcher agent (stays AUTO)
# REASONING_MODEL_OVERRIDES=gpt-5.8:none,custom:minimal  # escape hatch
```
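A budget variable set to `0` defers to the next variable in its chain. Below is a minimal sketch of that lookup for the relationship context window, which checks `RELATIONSHIP_MAX_CONTEXT`, then `GRAPH_EXTRACTION_MAX_CONTEXT`, then `OPENAI_MAX_CONTEXT`. `resolve_budget` is a hypothetical helper. Treating an unset variable the same as `0` is an assumption. `RELATIONSHIP_BATCH_MAX_OUTPUT_TOKENS` is outside the chain, so it would be read directly rather than resolved this way.

```shell
# Print the first variable in the chain whose value is set and non-zero.
# Usage: resolve_budget MOST_SPECIFIC_VAR ... LEAST_SPECIFIC_VAR
# Variable names are passed to eval, so only call it with trusted names.
resolve_budget() {
  for name in "$@"; do
    eval "value=\${$name:-0}"      # unset or empty counts as 0 (inherit)
    if [ "$value" != "0" ]; then
      echo "$value"
      return 0
    fi
  done
  return 1                         # nothing in the chain is set
}
```

For example, `resolve_budget RELATIONSHIP_MAX_CONTEXT GRAPH_EXTRACTION_MAX_CONTEXT OPENAI_MAX_CONTEXT` prints the effective context window. Listing the most specific variable first is what lets a per-tier override win.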

**Reasoning across providers.** Cortex auto-detects the provider from `base_url` and the model family by regex on the model string, then injects the matching parameter:

- **OpenAI (direct)** - `reasoning_effort`
- **OpenRouter** - `extra_body.reasoning.effort`
- **Venice** - `extra_body.venice_parameters.disable_thinking`
- **Anthropic** - `extra_body.thinking={type:disabled}`
- **vLLM/Compute3 (Qwen3-style)** - `extra_body.chat_template_kwargs.enable_thinking=false`

New minor releases route automatically via the regex (e.g. `gpt-5.8` is treated like `gpt-5.1`). For models the heuristic misclassifies, use `REASONING_MODEL_OVERRIDES`. Runtime fallback: if a model rejects the parameter with a 400, the wrapper strips it on retry and caches the model so subsequent calls skip the parameter.
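The provider half of that detection can be pictured as a pattern match on `base_url`. This is a sketch only. The hostname patterns are assumptions, not Cortex's actual rules. Treating every unrecognized endpoint as vLLM-style is a simplification. The real wrapper also considers the model family, `REASONING_MODEL_OVERRIDES` and the 400-retry fallback. `thinking_off_param` is a hypothetical helper.

```shell
# Print the request field that turns thinking off for a given base URL.
# Hostname patterns are illustrative assumptions, not Cortex's detection rules.
thinking_off_param() {
  case "$1" in
    *api.openai.com*) echo 'reasoning_effort' ;;
    *openrouter.ai*)  echo 'extra_body.reasoning.effort=none' ;;
    *venice.ai*)      echo 'extra_body.venice_parameters.disable_thinking=true' ;;
    *anthropic.com*)  echo 'extra_body.thinking={type:disabled}' ;;
    *)                echo 'extra_body.chat_template_kwargs.enable_thinking=false' ;;  # vLLM / Compute3 (Qwen3-style)
  esac
}
```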

**Caveats.** `gpt-5-pro` is hard-pinned to `high` by OpenAI (an OFF request is logged and ignored). `gpt-5-codex` auto-downgrades `minimal`→`low`. Anthropic Opus 4.7+ uses adaptive thinking, so the helper omits the `thinking` parameter. OpenRouter's `exclude:true` does NOT save tokens (it only hides the reasoning from the response), so Cortex sends `effort:"none"` instead. The researcher/Q&A agent stays on AUTO because `reasoning_effort=minimal` disables parallel tool calls on OpenAI.

## Recommended Workflow

1. **Upload documents** - entities are extracted per-document (Phase A)
2. **Analyze relationships** - `POST /api/graph/relationships/analyze` (Phase B)
3. **Detect communities** - `POST /api/graph/communities/detect`
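Once uploads have finished, steps 2 and 3 can be chained from a script. Below is a minimal sketch with four assumptions:

- Both POST endpoints return a `task_id`. The Communities page documents this only for community detection.
- That task can be polled via `/api/tasks/{task_id}`.
- Both endpoints accept `{}` as a request body.
- `CORTEX_URL`, `CORTEX_API_KEY`, `POLL_SECONDS`, `json_field` and `run_step` are hypothetical names.

The JSON parsing is deliberately naive so the script does not depend on `jq`.

```shell
# Sketch: POST a graph step, then poll its task until it completes or fails.
CORTEX_URL="${CORTEX_URL:-http://localhost:8000}"

# Naive extraction of a string field from a flat JSON response (no jq required).
json_field() {
  sed -n "s/.*\"$1\": *\"\([^\"]*\)\".*/\1/p" | head -n 1
}

run_step() {
  task_id=$(curl -fsS -X POST "$CORTEX_URL$1" \
    -H "X-API-Key: $CORTEX_API_KEY" -H "Content-Type: application/json" -d '{}' \
    | json_field task_id)
  [ -n "$task_id" ] || { echo "no task_id returned by $1" >&2; return 1; }
  while :; do
    state=$(curl -fsS "$CORTEX_URL/api/tasks/$task_id" -H "X-API-Key: $CORTEX_API_KEY" \
      | json_field status)
    case "$state" in
      completed) return 0 ;;
      failed)    echo "task $task_id failed" >&2; return 1 ;;
      "")        echo "could not read status of task $task_id" >&2; return 1 ;;
    esac
    sleep "${POLL_SECONDS:-5}"   # still pending or running
  done
}
```

Run it as `run_step /api/graph/relationships/analyze && run_step /api/graph/communities/detect`. The `&&` keeps community detection from starting if relationship analysis fails.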

## Graph Cleanup

When documents are deleted, Cortex automatically cleans up the knowledge graph:

<Mermaid chart={`
flowchart TD
    A["🗑️ Delete Document"] --> B["⏹️ Cancel Processing"]
    B --> C["🔍 Find Orphaned Entities"]
    C --> D["❌ Delete Orphans"]
    D --> E["🧹 Clean Communities"]
    E --> F["✅ Graph Clean"]
    
    C -.-> C1["Entities only in<br/>deleted document"]
    E -.-> E1["Communities with<br/>no members"]
`} />

### What Gets Cleaned

| Item | Behavior |
|------|----------|
| **Active Processing** | Cancelled immediately before deletion |
| **Chunks** | All chunks from the document are removed |
| **Entities** | Removed if only mentioned by this document |
| **Shared Entities** | Preserved if mentioned by other documents |
| **Relationships** | Removed with their entities (DETACH DELETE) |
| **Communities** | Removed if no member entities remain |

### Cleanup API

Clean up orphaned data manually:

```bash
# Clean orphaned entities and communities
curl -X POST "http://localhost:8000/api/cleanup/orphaned-entities" \
  -H "X-API-Key: your-api-key"
```

Response:
```json
{
  "message": "Cleanup completed",
  "orphaned_entities_removed": 42,
  "orphaned_communities_removed": 3
}
```

The StatsBar in the UI also shows a cleanup banner when it detects orphaned graph data (entities exist but no documents).

### Entity Management

Delete all entities and their connections (DETACH DELETE). This removes every entity node and all relationships attached to them:

```bash
# Delete ALL entities and their connections
curl -X DELETE "http://localhost:8000/api/graph/entities" \
  -H "X-API-Key: your-api-key"
```

Response:
```json
{
  "entities_deleted": 1542
}
```

### Relationship Management

Delete all entity relationships (useful before re-running relationship analysis):

```bash
# Delete ALL relationships between entities
curl -X DELETE "http://localhost:8000/api/graph/relationships" \
  -H "X-API-Key: your-api-key"
```

Response:
```json
{
  "relationships_deleted": 142
}
```

### Community Management

Delete individual or all communities:

```bash
# Delete a specific community (entities are unlinked, not deleted)
curl -X DELETE "http://localhost:8000/api/graph/communities/5" \
  -H "X-API-Key: your-api-key"

# Delete ALL communities
curl -X DELETE "http://localhost:8000/api/graph/communities" \
  -H "X-API-Key: your-api-key"
```

### System Reset

For a complete wipe of all data, use the **System Reset** feature in Settings → Danger Zone, or call the API directly:

```bash
curl -X POST "http://localhost:8000/api/admin/reset" \
  -H "X-API-Key: your-admin-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "delete_documents": true,
    "delete_uploaded_files": true,
    "delete_custom_inputs": true,
    "delete_collections": true,
    "delete_api_keys": false
  }'
```

When `delete_documents` is `true`, the reset also clears all associated data:

| Data | Cleaned Up |
|------|------------|
| **Documents, Chunks** | All document and chunk nodes |
| **Entities, Relationships** | All entity nodes and their relationships |
| **Communities** | All community nodes |
| **Merge History** | The full deduplication audit trail (`MergeHistory` nodes) |
| **System Metadata** | Staleness timestamps (`SystemMeta` nodes) |
| **Client Cache** | Dismissed dedup suggestions, regeneration flow state |

## Best Practices

1. **Quality Documents** - Better source documents = better entities
2. **Review Entities** - Periodically review extracted entities for accuracy
3. **Tune Threshold** - Adjust `ENTITY_SIMILARITY_THRESHOLD` based on your domain
4. **Deduplicate Periodically** - After large ingestion batches, use `GET /api/entities/duplicates` to find and merge remaining duplicates that automatic resolution missed
5. **Use Communities** - Enable community detection for large graphs
6. **Safe Deletion** - Delete documents safely knowing the graph will be cleaned automatically
7. **Clean Up After Bulk Deletion** - After deleting many documents, run cleanup or use the UI banner to remove orphaned graph data
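`ENTITY_SIMILARITY_THRESHOLD` sets how similar two entity embeddings must be before they are treated as the same entity. This page does not name the metric. Assuming cosine similarity, the comparison looks like the sketch below. `cosine`, `same_entity` and the toy two-dimensional vectors are hypothetical. Raising the threshold merges fewer entities, and lowering it merges more.

```shell
# Cosine similarity of two comma-separated, non-zero vectors of equal length.
cosine() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ","); split(b, y, ",")
    for (i = 1; i <= n; i++) { dot += x[i] * y[i]; na += x[i] ^ 2; nb += y[i] ^ 2 }
    printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}

# Succeeds when two embeddings are similar enough to be treated as one entity.
same_entity() {
  awk -v s="$(cosine "$1" "$2")" -v t="${ENTITY_SIMILARITY_THRESHOLD:-0.85}" \
    'BEGIN { exit !(s + 0 >= t + 0) }'
}
```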


---

## Document: Git Integration

Connect GitHub, GitLab, and Gitea repositories as a living knowledge source — ingest files and wikis into the graph, and let the agent open pull requests

URL: /features/git-integration


# Git Integration

Git Integration connects Cortex directly to your repositories. A connected repo becomes a **bidirectional interface**: Cortex ingests its files and wiki into the knowledge graph (read), and the research agent can act on it by opening pull requests (read/write). It works with **GitHub, GitLab, and Gitea**, including self-hosted instances.

Enable it with `ENABLE_GIT_INTEGRATION=true`, then manage connections from **Settings → Git Integration**.

## Overview

Each connection has an **access level** that decides what Cortex can do:

| Access level | What it enables |
|---|---|
| **Read-only** | Ingestion only — repo files and (optionally) the wiki flow into the knowledge graph, searchable alongside your other documents. |
| **Read/write** | Ingestion **plus** the agent's `git_repo` tool — it can read live files and open pull requests with proposed changes. |

A repo's content is chunked, embedded, and run through entity/relationship extraction exactly like uploaded documents — so once synced, it shows up in Search, Ask AI, the knowledge graph, and communities.

## Connecting a repository

1. Go to **Settings → Git Integration** and click **Connect repository**.
2. Pick a **provider** (GitHub / GitLab / Gitea). For self-hosted GitLab/Gitea, fill in the API base URL.
3. Paste a **personal access token**. The form shows an inline, vendor-specific guide telling you exactly which token to create — always the **least-privilege** option — with a direct link to the right settings page. Click **Test** to verify it.
4. Enter the **owner/org** and **repository**.
5. Choose the **access level**.
6. Leave the default **"Only ingest `.pdf` and `.md` files"** checked, or uncheck it to set custom filters (see [Filtering](#filtering-what-gets-ingested)).
7. Click **Connect**, then **Sync** to ingest.

### Which token to create

Cortex always recommends the most narrowly-scoped token that works:

- **GitHub** — a **fine-grained personal access token** scoped to the one repo, with **Contents: Read-only** for ingestion. For read/write, add **Contents: Read and write** + **Pull requests: Read and write**. (To ingest a GitHub **wiki**, use a classic token with the `repo` scope — fine-grained tokens don't cover wikis.)
- **GitLab** — a **Project Access Token** with role **Reporter** and scope `read_repository`. For merge requests, use role **Developer** with `api` + `write_repository`.
- **Gitea** — a scoped personal access token with **Repository: Read**. For pull requests, **Repository: Read and Write** plus **Issue: Read and Write** (for PR comments).

The token is stored server-side, masked in the UI (`••••abcd`), injected automatically into every git and API call, and never exposed to the agent or written into logs.

## Filtering: what gets ingested

By default, new connections ingest **`.pdf` and `.md` files only** — a sensible, safe default for documentation. Uncheck **"Only ingest .pdf and .md files"** to reveal **include** and **exclude** glob fields (gitignore-style patterns, comma-separated), for example:

- Include: `src/**, docs/**`
- Exclude: `**/node_modules/**, *.lock`

Supported file types are text/code (`.py`, `.ts`, `.go`, `.md`, …) and documents (`.pdf`, `.docx`, `.pptx`, …). Code and markdown ingest through a fast path that skips Docling; PDFs and Office files route through Docling. Images and audio are **not** ingested from repos.
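The include/exclude decision can be approximated with shell `case` patterns. In a `case` pattern, `*` also matches `/`, so it behaves much like `**`. This is a sketch only, because real gitignore semantics (negation, anchoring, directory-only patterns) are richer. `should_ingest` and its argument order are hypothetical. An empty include list stands in for the default `.pdf`/`.md` mode.

```shell
# Succeeds if PATH matches any comma-separated glob in GLOBS.
matches_any() {
  path=$1
  globs=$2
  old_ifs=$IFS
  IFS=,
  set -f                                   # keep the shell from expanding globs against the filesystem
  for g in $globs; do
    g=$(printf '%s\n' "$g" | sed 's/^ *//; s/ *$//')   # trim spaces around commas
    case "$path" in
      $g) IFS=$old_ifs; set +f; return 0 ;;
    esac
  done
  IFS=$old_ifs
  set +f
  return 1
}

# Usage: should_ingest PATH [INCLUDE_GLOBS] [EXCLUDE_GLOBS]
# With no include globs, apply the default: only .pdf and .md files.
should_ingest() {
  if [ -z "$2" ]; then
    case "$1" in *.pdf|*.md) ;; *) return 1 ;; esac
  elif ! matches_any "$1" "$2"; then
    return 1
  fi
  ! matches_any "$1" "$3"
}
```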

## Incremental sync

Re-syncing is incremental — Cortex does **not** re-ingest the whole repo each time. It records the last-synced commit and uses git history to classify what changed:

| Change | What Cortex does |
|---|---|
| **Added** | Creates a new document |
| **Modified** | Re-extracts that document in place (old chunks/relationships replaced) |
| **Deleted** | Flags the document for review (never auto-deleted) |
| **Renamed** | Remaps the document's path |

If the branch was force-pushed or you change the filters, sync self-heals via a full-tree reconcile (comparing every file's content hash to what's stored). After any change, the graph is flagged **stale** so you know to re-run relationship analysis and community detection from the **Knowledge Graph** page.
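A full-tree reconcile compares a stored content hash for every path with the hashes in the current tree. This sketch assumes both sides are available as `<hash> <path>` lines, for example git blob IDs from `git ls-tree -r HEAD`. `reconcile` is hypothetical. It does not handle paths containing spaces, and a rename shows up as one deletion plus one addition.

```shell
# Compare stored "<hash> <path>" lines (file $1, assumed non-empty) with the
# current tree's lines (file $2) and print added / modified / deleted paths.
# The current tree can be reduced to that shape with:
#   git ls-tree -r HEAD | awk -F'\t' '{ split($1, m, " "); print m[3], $2 }'
reconcile() {
  awk '
    NR == FNR     { old[$2] = $1; next }       # first file: previously stored hashes
    !($2 in old)  { print "added", $2; next }
    old[$2] != $1 { print "modified", $2 }
                  { delete old[$2] }           # seen in the current tree
    END { for (p in old) print "deleted", p }  # stored but no longer present
  ' "$1" "$2"
}
```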

> Deleted files are **flagged, not removed.** Open a connection's details to review documents whose source file disappeared from the repo, then delete them from the Documents page if you no longer need them.

## Keeping repos up to date

- **Manual** — click **Sync** on a connection any time.
- **Scheduled** — set an **Auto-sync** interval (minutes) under Advanced; a background poller re-syncs that connection on schedule. No webhooks or public endpoint required.

## The agent's `git_repo` tool (read/write)

When a connection is **read/write**, the research agent gains a `git_repo` tool with three actions:

- **read_file** — fetch a file's current contents.
- **propose_change** — open a pull request with edits.
- **comment** — comment on an existing pull request.

Safety is enforced in code, not just by prompt: every write creates a new `cortex/agent-…` branch and opens a **pull request for human review** — the agent never pushes to your default branch. On read-only connections, write actions are refused server-side.

## Editing & removing connections

Expand a connection to **Edit** it — change access level, branch, auto-sync interval, filters, wiki ingestion, or rotate the token. Toggling away from the `.pdf`/`.md` default when you have custom filters defined asks for confirmation first.

**Delete** offers two options: keep the already-ingested documents, or purge them along with the connection.

## Configuration

| Variable | Default | Description |
|---|---|---|
| `ENABLE_GIT_INTEGRATION` | `false` | Master switch for the connector |
| `GIT_WORK_DIR` | `./git_repos` | Where clone working copies are cached (mount a volume in production) |
| `GIT_CLONE_DEPTH` | `1` | Shallow-clone depth |
| `GIT_MAX_REPO_SIZE_MB` | `500` | Abort sync above this repo size (0 = unlimited) |
| `GIT_SYNC_MAX_FILE_SIZE_MB` | `5` | Skip files larger than this (0 = no limit) |
| `GIT_SYNC_POLL_INTERVAL` | `5` | Minutes between scheduled-sync checks |
| `GIT_HTTP_TIMEOUT` | `30` | Timeout (seconds) for provider API calls |
| `GIT_HTTP_INSECURE_HOSTS` | _(empty)_ | Comma-separated hosts allowed to skip TLS verification (self-signed self-hosted) |

The backend image bundles `git`; ensure `GIT_WORK_DIR` is writable.

## See also

- [Document Upload](/features/document-upload) — how ingested content flows through the pipeline
- [Ask AI](/features/ask-ai) — querying ingested repo content
- [Agent Skills](/features/skills) — the other way to extend the agent with external capabilities


---

## Document: Document Upload

Upload and process documents into the Cortex knowledge base

URL: /features/document-upload


# Document Upload

import { Mermaid } from "zudoku/mermaid";

Cortex supports several document formats and offers upload options for both single files and bulk operations. In the web UI, you upload directly from the **Documents** page via the **Upload** button, which opens a drag-and-drop modal with collection selection. The modal closes as soon as files are selected, and upload progress is shown inline in the document list.

Once uploaded, use the **Generate Graph** button on the Documents page to run the full knowledge graph pipeline: entity extraction, relationship analysis, and community detection. The button navigates to the **Knowledge Graph** page under Manage **and auto-starts the chain on arrival**; you don't need to click Generate Graph a second time on the destination page.

## Supported Formats

| Format | Extension | Notes |
|--------|-----------|-------|
| PDF | `.pdf` | Text extraction with layout preservation |
| Plain Text | `.txt` | Direct text ingestion |
| Markdown | `.md` | Preserves formatting, headers, code blocks |
| Word | `.docx` | Text and basic formatting extracted |

## Upload Methods

### Single File Upload

Upload one file at a time with immediate processing:

```bash
# collection_id, start_processing, and source are query parameters
curl -X POST "http://localhost:8000/api/upload?collection_id=default&start_processing=true" \
  -H "X-API-Key: your-api-key" \
  -F "file=@document.pdf"
```

**With a custom source** (for API integrations):

```bash
curl -X POST "http://localhost:8000/api/upload?source=youtube-transcriber" \
  -H "X-API-Key: your-api-key" \
  -F "file=@transcript.md"
```

The `source` parameter lets you tag documents by origin. Defaults to `"upload"` for UI uploads. Set it to your app's identifier (e.g. `"slack-bot"`, `"notion-sync"`) when building integrations. Documents can be filtered by source in the UI.

**Response:**

```json
{
  "filename": "document.pdf",
  "doc_id": "doc_abc123",
  "status": "processing",
  "message": "Document uploaded and processing started",
  "source": "upload"
}
```

### Bulk Upload

For uploading many files (100+), disable immediate processing and batch process later:

```bash
# Step 1: Upload files without processing (start_processing is a query parameter)
for file in *.pdf; do
  [ -e "$file" ] || continue   # skip the literal "*.pdf" when no files match
  curl -X POST "http://localhost:8000/api/upload?start_processing=false" \
    -H "X-API-Key: your-api-key" \
    -F "file=@$file"
done

# Step 2: Start batch processing
curl -X POST "http://localhost:8000/api/documents/process-pending" \
  -H "X-API-Key: your-api-key"
```

## Processing Pipeline

When a document is uploaded, it goes through these stages:

<Mermaid chart={`
flowchart TD
    A["📤 Upload"] --> B["📄 Text Extraction"]
    B --> C["✂️ Chunking"]
    C --> D["🔢 Embedding"]
    D --> E["🔍 Entity Extraction"]
    E --> F["🔗 Semantic Resolution"]
    F --> G["💾 Neo4j Storage"]
    G --> H["✅ Text Complete"]
    G --> I["🖼️ Background Image Analysis"]
    I --> J["✅ Fully Complete"]

    B -.-> B1["PDF/DOCX parser"]
    C -.-> C1["Sentence-based splits"]
    D -.-> D1["Vector embeddings"]
    E -.-> E1["LLM extraction"]
    I -.-> I1["Vision model / Docling / fallback"]
`} />

## Tracking Progress

### Get Document Status

```bash
curl "http://localhost:8000/api/documents/doc_abc123" \
  -H "X-API-Key: your-api-key"
```

**Response:**

```json
{
  "id": "doc_abc123",
  "filename": "document.pdf",
  "status": "completed",
  "chunk_count": 42,
  "entity_count": 18,
  "created_at": "2024-01-15T10:30:00Z",
  "processed_at": "2024-01-15T10:32:15Z",
  "collection_id": "default"
}
```

### Processing Status Values

| Status | Description |
|--------|-------------|
| `pending` | Uploaded, waiting for processing |
| `processing` | Currently being processed |
| `completed` | Successfully processed (text extraction done; images may still be analyzing in the background — check `image_progress_current` vs `image_progress_total`) |
| `failed` | Processing failed (check logs) |

:::note
A document with `status: "completed"` may still have background image analysis running. The `image_progress_current`, `image_progress_total`, and `image_progress_message` fields in the document response track this progress. The Knowledge Graph pipeline (Step 1) treats these documents as still in-progress and will not advance to relationship analysis until all image analysis completes.
:::
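Code that waits for a document should therefore check both conditions. Below is a minimal sketch. It takes `status`, `image_progress_current` and `image_progress_total` as arguments instead of parsing the response. `is_fully_processed` is hypothetical. Treating missing image fields as `0` (no images) is an assumption.

```shell
# Succeeds only when text processing has completed and background image
# analysis has caught up (current >= total; missing values count as 0).
is_fully_processed() {
  status=$1
  image_current=${2:-0}
  image_total=${3:-0}
  [ "$status" = "completed" ] && [ "$image_current" -ge "$image_total" ]
}
```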

## Custom Inputs

In addition to file uploads, you can add knowledge manually:

### Q&A Pairs

```bash
curl -X POST "http://localhost:8000/api/custom-input" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "qa",
    "title": "Company FAQ",
    "question": "What is Cortex?",
    "answer": "Cortex is an agentic knowledge base..."
  }'
```

### Plain Text

```bash
curl -X POST "http://localhost:8000/api/custom-input" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "text",
    "title": "Product Description",
    "content": "Cortex is a powerful..."
  }'
```

### Markdown

```bash
curl -X POST "http://localhost:8000/api/custom-input" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "markdown",
    "title": "Technical Documentation",
    "content": "# Overview\n\nCortex provides..."
  }'
```

## Viewing Documents

You can view the original uploaded file for any document. The behavior depends on the file type:

- **Markdown files (`.md`)**: Rendered in an in-app viewer with full Markdown formatting (headings, tables, code blocks, etc.)
- **PDF files**: Opened in a new browser tab for inline viewing
- **Other file types**: Opened in a new tab — the browser decides whether to display or download the file

```bash
# Get the original file (returns the file with appropriate Content-Type)
curl "http://localhost:8000/api/documents/doc_abc123/file" \
  -H "X-API-Key: your-api-key" \
  -o document.pdf
```

In the UI, click the **eye icon** on any document card to view it.

## Document Management

### List All Documents

```bash
curl "http://localhost:8000/api/documents" \
  -H "X-API-Key: your-api-key"
```

### Delete a Document

When you delete a document, Cortex automatically:

1. **Cancels active processing** - If the document is being processed, the task is stopped
2. **Removes chunks** - All text chunks are deleted
3. **Cleans up entities** - Orphaned entities (only in this document) are removed
4. **Cleans up communities** - Empty communities are deleted

```bash
curl -X DELETE "http://localhost:8000/api/documents/doc_abc123" \
  -H "X-API-Key: your-api-key"
```

**Response:**

```json
{
  "message": "Document deleted successfully",
  "processing_cancelled": true,
  "orphaned_entities_removed": 15,
  "orphaned_communities_removed": 2
}
```

### Reprocess a Document

Useful after updating extraction settings:

```bash
curl -X POST "http://localhost:8000/api/documents/doc_abc123/reprocess" \
  -H "X-API-Key: your-api-key"
```

### Bulk Download (ZIP)

Download multiple documents as a ZIP archive. Supports large instances (1000+ documents) with ZIP64 and streaming:

```bash
curl -X POST "http://localhost:8000/api/documents/download-zip" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"document_ids": ["doc_abc123", "doc_def456"]}' \
  -o documents.zip

**Response:** A streamed `application/zip` file containing the original uploaded files. Duplicate filenames are automatically disambiguated (e.g., `report.pdf`, `report (1).pdf`).
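The disambiguation can be sketched in awk. The first occurrence keeps its name, and later duplicates get ` (1)`, ` (2)` and so on inserted before the extension. Numbering beyond the `report (1).pdf` example is an assumption. `dedupe_names` is hypothetical. The sketch does not guard against a generated name colliding with a real file of the same name.

```shell
# Read one filename per line and print a unique archive name for each, in order.
dedupe_names() {
  awk '{
    n = seen[$0]++                       # 0 for the first occurrence
    if (n == 0) { print; next }
    dot = match($0, /\.[^.]*$/)          # start of the extension, 0 if none
    if (dot > 1) print substr($0, 1, dot - 1) " (" n ")" substr($0, dot)
    else         print $0 " (" n ")"
  }'
}
```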

In the UI, select documents on the **Documents** page and click the **Download** button in the bulk actions toolbar.

### Bulk Delete

Delete multiple documents at once. All active processing is cancelled before deletion:

```bash
curl -X POST "http://localhost:8000/api/documents/delete" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"document_ids": ["doc_abc123", "doc_def456"]}'
```

**Response:**

```json
{
  "message": "Successfully deleted 2 document(s)",
  "deleted_count": 2,
  "processing_cancelled": 1,
  "orphaned_entities_removed": 28,
  "orphaned_communities_removed": 3
}
```

### Delete All Documents

Delete the entire knowledge base (use with caution):

```bash
curl -X DELETE "http://localhost:8000/api/documents" \
  -H "X-API-Key: your-api-key"
```

This cancels all active processing tasks and removes all documents, chunks, entities, and communities.

## Configuration

Control document processing behavior via environment variables:

```bash
# Maximum upload size per file (MB)
MAX_FILE_SIZE_MB=50

# Chunking settings
CHUNK_SIZE=500
CHUNK_OVERLAP=50
CHUNK_BY=sentence
SENTENCES_PER_CHUNK=5

# Processing concurrency
BATCH_PROCESSING_CONCURRENCY=2
CONCURRENT_EXTRACTIONS=3
```
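With `CHUNK_BY=sentence`, text is grouped into chunks of `SENTENCES_PER_CHUNK` sentences. Below is a rough sketch of that grouping. It assumes a sentence ends at `.`, `!` or `?` followed by whitespace. `chunk_sentences` is hypothetical, and it ignores `CHUNK_SIZE` and `CHUNK_OVERLAP`. Cortex's own splitter is likely smarter about abbreviations and decimals.

```shell
# Split stdin into chunks of N sentences (default 5), one chunk per output line.
chunk_sentences() {
  awk -v n="${SENTENCES_PER_CHUNK:-5}" '
    { text = text (text == "" ? "" : " ") $0 }   # join input lines with spaces
    END {
      gsub(/[.!?][ \t]+/, "&\n", text)             # break after each sentence end
      count = split(text, s, "\n")
      for (i = 1; i <= count; i++) {
        sub(/[ \t]+$/, "", s[i])
        if (s[i] == "") continue
        chunk = chunk (chunk == "" ? "" : " ") s[i]
        if (++k % n == 0) { print chunk; chunk = "" }
      }
      if (chunk != "") print chunk                 # final, possibly shorter chunk
    }'
}
```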


---

## Document: Communities

Automatic detection of document clusters and theme summarization

URL: /features/communities


# Communities

import { Mermaid } from "zudoku/mermaid";

Cortex automatically detects **communities** - clusters of related documents and entities that share common themes. This enables topic discovery, document organization, and focused Q&A.

## What Are Communities?

Communities are groups of entities (and their associated documents) that are densely connected in the knowledge graph. They represent natural topic clusters within your knowledge base.

<Mermaid chart={`
graph TB
    subgraph comm1["Community 1: Machine Learning"]
        A[Neural Networks]
        B[Deep Learning]
        C[Gradient Descent]
        A --- B
        B --- C
        A --- C
    end
    
    subgraph comm2["Community 2: NLP"]
        D[Transformer]
        E[BERT]
        F[Attention]
        D --- E
        E --- F
        D --- F
    end
    
    subgraph comm3["Community 3: Computer Vision"]
        G[CNN]
        H[Image Recognition]
        I[Object Detection]
        G --- H
        H --- I
    end
    
    B -.-> D
    A -.-> G
`} />

## Detection Algorithm

Cortex uses the **Leiden** (preferred) or **Louvain** algorithm for community detection, with a BFS fallback when Neo4j Graph Data Science (GDS) is unavailable:

<Mermaid chart={`
flowchart TD
    A["🕸️ Knowledge Graph"] --> A1["🧹 Cleanup old communities"]
    A1 --> B["📊 Graph Projection"]
    B --> C["🔍 Leiden / Louvain Algorithm"]
    C --> D["📦 Community Assignment"]
    D --> E["🤖 LLM Summarization"]
    E --> F["📊 Distribution Analysis"]

    B -.-> B1["Undirected + weighted edges"]
    B -.-> B2["Co-mention edges (weight 2.0)"]
    C -.-> C1["Weight-aware clustering"]
    E -.-> E1["Assistant prefill for JSON output"]
    F -.-> F1["Warn on pathological distributions"]
`} />
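Co-mention edges link every pair of entities that appear in the same chunk. This sketch derives them from `<chunk_id> <entity>` lines. The fixed weight of 2.0 comes from the diagram. Emitting each pair once, instead of accumulating weight across chunks, is an assumption. The sketch also assumes entity names contain no spaces. `comention_edges` is hypothetical.

```shell
# Input: "<chunk_id> <entity>" lines. Output: "<entity_a> <entity_b> 2.0" once per
# unordered pair of distinct entities that share at least one chunk.
comention_edges() {
  awk '
    { members[$1] = members[$1] " " $2 }
    END {
      for (chunk in members) {
        n = split(members[chunk], e, " ")
        for (i = 1; i <= n; i++)
          for (j = i + 1; j <= n; j++) {
            a = e[i]; b = e[j]
            if (a == b) continue
            if (a > b) { t = a; a = b; b = t }   # canonical order for undirected edges
            if (!((a, b) in seen)) { seen[a, b] = 1; print a, b, "2.0" }
          }
      }
    }'
}
```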

Key properties of the detection pipeline:
- **Old community cleanup** before re-detection prevents stale data accumulation
- **Undirected projection** treats all relationships as bidirectional for community membership
- **Relationship weights** (0-10 scale) influence community membership — strongly weighted relationships pull entities together
- **Co-mention edges** connect entities that appear in the same chunk, providing structure even when direct entity-to-entity relationships are sparse
- **Distribution monitoring** logs warnings for pathological distributions (e.g., one mega-community containing >50% of entities)
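The mega-community check in the last bullet compares the largest community's share of entities against 50%. This sketch reads one community ID per entity on stdin. `check_distribution` and its message are hypothetical, and Cortex may check other pathological shapes as well.

```shell
# Read one community ID per line (one line per entity); warn and fail if a
# single community holds more than half of all entities.
check_distribution() {
  awk '
    { size[$1]++; total++ }
    END {
      for (c in size) if (size[c] > max) { max = size[c]; big = c }
      if (total > 0 && max * 2 > total) {
        printf "WARN: community %s holds %d of %d entities\n", big, max, total
        exit 1
      }
    }'
}
```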

## API Usage

### Detect Communities

Start community detection (runs as a background task):

```bash
curl -X POST "http://localhost:8000/api/graph/communities/detect" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_id": "default",
    "min_community_size": 3
  }'
```

**Response:**

```json
{
  "task_id": "task_abc123",
  "status": "pending",
  "message": "Community detection started"
}
```

### Track Detection Progress

```bash
curl "http://localhost:8000/api/tasks/task_abc123" \
  -H "X-API-Key: your-api-key"
```

```json
{
  "task_id": "task_abc123",
  "task_type": "community_detection",
  "status": "running",
  "progress_percent": 45.0,
  "message": "Analyzing graph structure..."
}
```

### List Communities

```bash
curl "http://localhost:8000/api/graph/communities" \
  -H "X-API-Key: your-api-key"
```

```json
{
  "communities": [
    {
      "id": "comm_1",
      "name": "Machine Learning Fundamentals",
      "description": "Core ML concepts including neural networks, optimization, and training methods",
      "document_count": 12,
      "entity_count": 156,
      "top_entities": ["Neural Network", "Deep Learning", "Gradient Descent"]
    },
    {
      "id": "comm_2",
      "name": "Natural Language Processing",
      "description": "NLP techniques, transformers, and language models",
      "document_count": 8,
      "entity_count": 98,
      "top_entities": ["Transformer", "BERT", "Attention"]
    }
  ],
  "total": 23
}
```

### Get Community Details

```bash
curl "http://localhost:8000/api/graph/communities/comm_1" \
  -H "X-API-Key: your-api-key"
```

```json
{
  "id": "comm_1",
  "name": "Machine Learning Fundamentals",
  "description": "Core ML concepts including neural networks, optimization, and training methods",
  "summary": "This community covers the foundational aspects of machine learning, with a focus on neural network architectures, training algorithms, and optimization techniques...",
  "document_count": 12,
  "entity_count": 156,
  "entities": [
    {"name": "Neural Network", "type": "Concept", "mentions": 45},
    {"name": "Deep Learning", "type": "Concept", "mentions": 38}
  ],
  "documents": [
    {"id": "doc_1", "title": "Deep Learning Fundamentals.pdf"},
    {"id": "doc_2", "title": "Neural Network Training.pdf"}
  ]
}
```

### Get Community Documents

```bash
curl "http://localhost:8000/api/graph/communities/comm_1/documents" \
  -H "X-API-Key: your-api-key"
```

### Summarize a Community

Generate an AI summary of a community's content:

```bash
curl -X POST "http://localhost:8000/api/graph/communities/comm_1/summarize" \
  -H "X-API-Key: your-api-key"
```

```json
{
  "community_id": "comm_1",
  "summary": "This community contains 12 documents focused on machine learning fundamentals. Key topics include:\n\n1. Neural network architectures (CNNs, RNNs, Transformers)\n2. Training algorithms and optimization (SGD, Adam, learning rate scheduling)\n3. Regularization techniques (dropout, batch normalization)\n\nThe documents span from introductory material to advanced research papers.",
  "key_topics": ["Neural Networks", "Optimization", "Regularization"],
  "generated_at": "2024-01-15T10:30:00Z"
}
```

## Community-Based Queries

### Search Within a Community

```bash
curl -X POST "http://localhost:8000/api/search" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "backpropagation algorithm",
    "community_id": "comm_1"
  }'
```

### Ask About a Community

```bash
curl -X POST "http://localhost:8000/api/ask" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What are the key topics covered in this community?",
    "community_id": "comm_1"
  }'
```

## Regenerating Communities

Communities are **not** automatically updated when documents are added. To update:

```bash
curl -X POST "http://localhost:8000/api/graph/communities/detect" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "force_regenerate": true
  }'
```

## Configuration

```bash
# Enable community detection
ENABLE_COMMUNITY_DETECTION=true

# Minimum entities per community
MIN_COMMUNITY_SIZE=3

# Maximum communities to detect
MAX_COMMUNITIES=50

# Enable automatic summarization
ENABLE_GRAPH_SUMMARIZATION=true

# Model for summaries (defaults to GRAPH_EXTRACTION_MODEL → OPENAI_MODEL)
# Community summarization uses the extraction model for consistent structured output
# COMMUNITY_SUMMARY_MODEL=gpt-4o-mini
```

## Use Cases

1. **Topic Discovery** - Find natural topic clusters in your documents
2. **Content Organization** - Use communities to organize collections
3. **Focused Q&A** - Ask questions about specific topic areas
4. **Knowledge Mapping** - Visualize how topics relate
5. **Onboarding** - Help new users understand the knowledge base


---

## Document: Collections

Organize documents into collections with scoped graphs and search

URL: /features/collections


# Collections

Collections help you organize documents into logical groups for easier management, scoped search, and isolated knowledge graphs.

## Why Use Collections?

- **Organization** - Group related documents together
- **Scoped Search** - Search within specific collections only
- **Isolated Graphs** - Each collection has its own entity graph
- **Access Control** - Apply different permissions per collection
- **Multi-tenant** - Separate knowledge bases for different teams/projects

## Collection Structure

```
Cortex
├── Collection: "Research Papers"
│   ├── Document: paper1.pdf
│   ├── Document: paper2.pdf
│   └── Graph: entities & relationships
│
├── Collection: "Product Documentation"
│   ├── Document: user-guide.pdf
│   ├── Document: api-reference.md
│   └── Graph: entities & relationships
│
└── Collection: "default"
    ├── Document: misc1.txt
    └── Graph: entities & relationships
```

## API Usage

### Create a Collection

```bash
curl -X POST "http://localhost:8000/api/collections" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Research Papers",
    "description": "Academic papers on AI and ML"
  }'
```

**Response:**

```json
{
  "id": "coll_abc123",
  "name": "Research Papers",
  "description": "Academic papers on AI and ML",
  "document_count": 0,
  "created_at": "2024-01-15T10:30:00Z"
}
```

### List Collections

```bash
curl "http://localhost:8000/api/collections" \
  -H "X-API-Key: your-api-key"
```

```json
{
  "collections": [
    {
      "id": "coll_abc123",
      "name": "Research Papers",
      "document_count": 45,
      "entity_count": 892
    },
    {
      "id": "default",
      "name": "default",
      "document_count": 12,
      "entity_count": 156
    }
  ]
}
```

### Get Collection Details

```bash
curl "http://localhost:8000/api/collections/coll_abc123" \
  -H "X-API-Key: your-api-key"
```

```json
{
  "id": "coll_abc123",
  "name": "Research Papers",
  "description": "Academic papers on AI and ML",
  "document_count": 45,
  "chunk_count": 1280,
  "entity_count": 892,
  "relationship_count": 2341,
  "total_size_bytes": 52428800,
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-20T14:22:00Z"
}
```

### Update a Collection

Rename a collection or update its description. Both fields are optional — only the provided fields are changed.

**In the UI:** Click the pencil icon on a collection card to rename it inline. Press Enter to save or Escape to cancel.

**Via API:**

```bash
curl -X PUT "http://localhost:8000/api/collections/coll_abc123" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "AI Research Papers",
    "description": "Updated description"
  }'
```

**Response:**

```json
{
  "id": "coll_abc123",
  "name": "AI Research Papers",
  "description": "Updated description",
  "document_count": 45,
  "entity_count": 892
}
```

> **Note:** The default collection cannot be renamed.
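Because both fields are optional and only the provided ones change, a client can build the `PUT` body by dropping unset fields. A minimal standard-library sketch; `build_update_body` is a hypothetical helper, and its guard against renaming the default collection only mirrors the note above on the client side (the server remains the authority).

```python
import json
from typing import Optional


def build_update_body(
    collection_id: str,
    name: Optional[str] = None,
    description: Optional[str] = None,
    default_collection: str = "default",
) -> str:
    """Serialize a PUT /api/collections/{id} body containing only provided fields."""
    if name is not None and collection_id == default_collection:
        # Hypothetical client-side guard mirroring the documented rule.
        raise ValueError("The default collection cannot be renamed")
    fields = {"name": name, "description": description}
    # Omit fields left as None so the server leaves them unchanged.
    return json.dumps({key: value for key, value in fields.items() if value is not None})
```

Updating only the description of the default collection is still allowed by this sketch, since the rule above concerns renaming.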

### Delete a Collection

```bash
curl -X DELETE "http://localhost:8000/api/collections/coll_abc123" \
  -H "X-API-Key: your-api-key"
```

> **Warning:** Deleting a collection also deletes all documents and entities within it.

## Document Management

### Upload to a Collection

```bash
# collection_id is a query parameter
curl -X POST "http://localhost:8000/api/upload?collection_id=coll_abc123" \
  -H "X-API-Key: your-api-key" \
  -F "file=@paper.pdf"
```
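Note that `collection_id` travels in the query string, not in the multipart form. A minimal sketch of building that URL with the standard library; `upload_url` is a hypothetical helper, and leaving the ID out relies on the default-collection fallback described under Default Collection.

```python
from typing import Optional
from urllib.parse import urlencode


def upload_url(base_url: str, collection_id: Optional[str] = None) -> str:
    """Build the POST /api/upload URL (hypothetical helper)."""
    url = f"{base_url.rstrip('/')}/api/upload"
    if collection_id:
        # urlencode escapes characters that are unsafe in a query string.
        url += "?" + urlencode({"collection_id": collection_id})
    return url


print(upload_url("http://localhost:8000", "coll_abc123"))
# http://localhost:8000/api/upload?collection_id=coll_abc123
```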

### Move Documents Between Collections

```bash
curl -X POST "http://localhost:8000/api/documents/move" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "document_ids": ["doc_1", "doc_2"],
    "target_collection_id": "coll_def456"
  }'
```

### List Documents in a Collection

```bash
curl "http://localhost:8000/api/documents?collection_id=coll_abc123" \
  -H "X-API-Key: your-api-key"
```

## Collection-Scoped Operations

By default, all Ask AI modes (Chat, Deep Research) and basic search query across **all collections** — no `collection_id` is needed to search everything. To narrow results to a single collection, pass a `collection_id`; vector search, keyword search, and graph traversal will then only include data from that collection's documents.

### Search Within a Collection

```bash
curl -X POST "http://localhost:8000/api/search" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "transformer architecture",
    "collection_id": "coll_abc123"
  }'
```

### Ask AI About a Collection

By default, Ask AI searches across **all collections**. To scope a question to a single collection:

**In the UI:** Open the Settings panel on the Ask AI page and use the **Collection Scope** dropdown. It defaults to "All Collections" — select a specific collection to filter. A persistent indicator below the input shows the active scope.

**Via API:**

```bash
# Search all collections (default — omit collection_id)
curl -X POST "http://localhost:8000/api/ask" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What are the common themes across all papers?"
  }'

# Search a specific collection
curl -X POST "http://localhost:8000/api/ask" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What are the common themes in these papers?",
    "collection_id": "coll_abc123"
  }'
```
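The two requests above differ only in whether `collection_id` is present, so a client can add the key conditionally. A minimal sketch; `ask_body` is a hypothetical helper that mirrors the conditional spread in the TypeScript example further down this page.

```python
import json
from typing import Dict, Optional


def ask_body(question: str, collection_id: Optional[str] = None) -> str:
    """Serialize an /api/ask body; leave out collection_id to search all collections."""
    body: Dict[str, str] = {"question": question}
    if collection_id is not None:
        # Only scope the question when a collection was chosen.
        body["collection_id"] = collection_id
    return json.dumps(body)
```

Omitting the key keeps the payload minimal; the streaming endpoint documents `null` as equivalent.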

### Get Collection Graph

```bash
curl "http://localhost:8000/api/graph/visualization?collection_id=coll_abc123" \
  -H "X-API-Key: your-api-key"
```

## Default Collection

Documents uploaded without a `collection_id` go to the default collection.

Configure the default collection name:

```bash
DEFAULT_COLLECTION=default
```

## Configuration

```bash
# Enable collections feature
ENABLE_COLLECTIONS=true

# Default collection for uncategorized documents
DEFAULT_COLLECTION=default
```

## Collection-Scoped API Keys

The most powerful use of collections is pairing them with **collection-scoped API keys**. A key restricted to specific collections can only see and write to those collections — everything else returns a 403. This enables true multi-tenancy from a single Cortex instance.

### Create a restricted key

```bash
curl -X POST http://localhost:8000/api/admin/api-keys \
  -H "X-API-Key: your-admin-key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Tenant A - Read Only",
    "permissions": ["read"],
    "collection_scope": "restricted",
    "allowed_collections": ["coll_abc123"]
  }'
```

### Update collection access on an existing key

```bash
curl -X PATCH http://localhost:8000/api/admin/api-keys/{id} \
  -H "X-API-Key: your-admin-key" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_scope": "restricted",
    "allowed_collections": ["coll_abc123", "coll_def456"]
  }'
```

### Enforcement

| Endpoint type | Behavior for restricted key |
|---------------|------------------------------|
| `GET /api/collections` | Returns only allowed collections |
| `GET /api/documents` | Returns only documents in allowed collections |
| `GET /api/collections/{id}` | 403 if not in allowed list |
| `GET /api/documents/{id}` | 403 if document's collection not in allowed list |
| `POST /api/upload?collection_id=...` | 403 if collection not in allowed list |
| `POST /api/ask` with `collection_id` | 403 if collection not in allowed list |

> New collections are **never** automatically accessible to existing restricted keys — access must be explicitly granted.
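The enforcement table reduces to two operations: filtering list responses and rejecting out-of-scope IDs with 403. A minimal sketch for reasoning about a key's reach; `ApiKey` is a hypothetical model whose fields mirror the `collection_scope` and `allowed_collections` request fields, and treating any scope other than `"restricted"` as unrestricted is an assumption.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class ApiKey:
    # Hypothetical model; "all" as the unrestricted scope name is an assumption.
    collection_scope: str = "all"
    allowed_collections: List[str] = field(default_factory=list)

    def can_access(self, collection_id: str) -> bool:
        """True if this key may read or write the given collection."""
        if self.collection_scope != "restricted":
            return True
        return collection_id in self.allowed_collections

    def status_for(self, collection_id: str) -> int:
        """Status a scoped endpoint would return: 200 or 403."""
        return 200 if self.can_access(collection_id) else 403

    def visible(self, collection_ids: Iterable[str]) -> List[str]:
        """Filter a listing such as GET /api/collections to allowed collections."""
        return [cid for cid in collection_ids if self.can_access(cid)]
```

A collection created after the key stays at 403 until its ID is added to `allowed_collections`, matching the rule that access is never granted automatically.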

### In the UI

Go to **Settings → API Key Management → New Key** and select *Specific Collections* to open a multi-select picker. Each restricted key card shows an amber **N Collections** badge.

## Best Practices

1. **Meaningful Names** - Use descriptive names for easy identification
2. **Logical Grouping** - Group documents by topic, project, or source
3. **Don't Over-Segment** - Too many small collections reduce graph connectivity
4. **Regular Cleanup** - Remove unused collections periodically
5. **Use Descriptions** - Add descriptions for documentation
6. **Pair with scoped keys** - For multi-tenant deployments, give each tenant a key restricted to their collection


---

## Document: Ask AI

AI-powered question answering with RAG streaming, agentic reasoning, and real-time SSE events

URL: /features/ask-ai


# Ask AI

import { Mermaid } from "zudoku/mermaid";

Ask questions about your documents and get intelligent answers with source citations. Cortex's Ask AI feature uses Retrieval-Augmented Generation (RAG) with optional agentic reasoning for complex queries. The primary interface is the **streaming endpoint**, which delivers results in real time via Server-Sent Events (SSE).

## Frontend UI

In the Cortex frontend, Ask AI is accessed via the **Explore** section with two dedicated tabs:

| Tab | Description | Best For |
|-----|-------------|----------|
| **Deep Research** | Multi-step agentic reasoning that breaks down complex questions into sub-queries | Complex questions requiring thorough analysis |
| **Chat** | Quick hybrid search with knowledge graph context | Fast lookups and straightforward questions |

Both modes share a simplified input interface with:
- A text input field (placeholder varies by mode)
- A **Settings** button (cog icon) to configure collection scope and streaming
- A **Send** button that activates when you type
- A **collection indicator** below the input showing the active search scope

The settings dropdown allows you to:
- Toggle **Stream responses** on/off (for debugging)
- Select a **Collection Scope** — defaults to **All Collections** so searches span the entire knowledge base. Select a specific collection to narrow results to that collection's documents only.

### Response Layout

In the UI, responses are rendered in this order:

1. **Research Process** (Deep Research only) — Sub-Questions, Thinking Steps, and Reasoning Steps are displayed above the main answer content. The research process section auto-scrolls to the bottom during streaming so the latest activity is always visible.
2. **Content** — The main answer from the writer LLM.
3. **Graph Context** — Related entities, relationships, and communities.
4. **Sources** — Retrieved source documents with citations.

### Source Citation Viewer

Clicking a source opens a modal that displays the full document content with the cited chunk highlighted. The surrounding text is shown at 60% opacity while the cited chunk appears at full opacity with an accent-colored left border. The modal auto-scrolls to bring the highlighted chunk into view.

## How It Works

Both modes use a **researcher/writer agent architecture**. An LLM-driven researcher agent iteratively gathers information from the knowledge base using tool-calling, then a separate writer LLM synthesizes the gathered context into a streamed answer.

<Mermaid chart={`
flowchart TD
    Q["User Question"] --> MODE{"Mode?"}
    MODE -->|Chat| SPEED["Speed Researcher Agent"]
    MODE -->|Deep Research| QUALITY["Quality Researcher Agent"]

    SPEED --> KS1["knowledge_search (1 call, 3 queries)"]
    KS1 --> DONE1["done"]
    DONE1 --> WRITER1["Writer LLM (1200 tokens)"]

    QUALITY --> REASON["reasoning"]
    REASON --> KS2["knowledge_search (3-5+ calls)"]
    KS2 --> CS["community_search"]
    CS --> EL["entity_lookup"]
    EL --> DONE2["done"]
    DONE2 --> WRITER2["Writer LLM (4000 tokens)"]

    WRITER1 --> STREAM["SSE Stream"]
    WRITER2 --> STREAM
`} />

The researcher agent has access to these tools:

| Tool | Speed Mode | Quality Mode | Description |
|------|:----------:|:------------:|-------------|
| `knowledge_search` | 1 call | 3-5+ calls | Hybrid RRF search (vector + keyword + graph) with cross-encoder reranking. Up to 3 queries per call. |
| `community_search` | - | Yes | Search entity community summaries for thematic context |
| `entity_lookup` | - | Yes | Look up specific entities by name for deeper exploration |
| `reasoning` | - | Yes | Plan next research step (streamed to UI as thinking events) |
| `done` | Yes | Yes | Signal research completion with a summary for the writer |
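The researcher half of this architecture can be pictured as a loop that runs tool calls until the agent calls `done` or an iteration cap is reached (compare `RESEARCHER_MAX_ITERATIONS_SPEED` and `RESEARCHER_MAX_ITERATIONS_QUALITY` under Configuration). This is a simplified, hypothetical illustration: tool calls come from a scripted list instead of an LLM, and `run_researcher` and the stub tools are not Cortex's implementation.

```python
from typing import Callable, Dict, Iterable, List, Tuple

ToolCall = Tuple[str, dict]  # (tool name, arguments)


def run_researcher(
    calls: Iterable[ToolCall],
    tools: Dict[str, Callable[[dict], str]],
    max_iterations: int,
) -> Tuple[List[str], str]:
    """Execute tool calls until `done` or the cap; return (gathered context, summary)."""
    context: List[str] = []
    for iteration, (name, args) in enumerate(calls):
        if iteration >= max_iterations:
            break  # cap reached: the writer gets whatever was gathered so far
        if name == "done":
            return context, args.get("summary", "")
        context.append(tools[name](args))
    return context, "iteration limit reached"


# Hypothetical stubs standing in for knowledge_search and entity_lookup.
stub_tools = {
    "knowledge_search": lambda a: f"results for {a['queries']}",
    "entity_lookup": lambda a: f"entity {a['name']}",
}
script = [
    ("knowledge_search", {"queries": ["Paper A methodology", "Paper B methodology"]}),
    ("entity_lookup", {"name": "Paper A"}),
    ("done", {"summary": "Both methodologies covered"}),
]
context, summary = run_researcher(script, stub_tools, max_iterations=10)
print(summary)  # Both methodologies covered
```

In the real pipeline the gathered context and the `done` summary are handed to the separate writer LLM.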

## Streaming Endpoint

```
POST /api/ask/stream
```

This is the primary endpoint for asking questions. It returns a stream of Server-Sent Events containing sources, graph context, and answer tokens as they are generated.

### Authentication

All requests require the `X-API-Key` header:

```bash
curl -X POST "http://localhost:8000/api/ask/stream" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is GraphRAG?"}' \
  --no-buffer
```

### Request Schema (RAGRequest)

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `question` | `string` | **required** | The question to ask |
| `top_k` | `integer` | `5` | Number of results to retrieve (1–20) |
| `use_reranking` | `boolean` | `true` | Apply cross-encoder reranking to improve relevance |
| `use_graph` | `boolean` | `true` | Include knowledge graph context in retrieval |
| `max_hops` | `integer` | `2` | Graph traversal depth (1–3 hops) |
| `use_agentic` | `boolean` | `false` | Enable deep research mode with multi-step reasoning |
| `use_fast_search` | `boolean` | `false` | Use fast vector-only search (disables hybrid/reranking) |
| `collection_id` | `string \| null` | `null` | Scope search to a specific collection |
| `conversation_history` | `ConversationMessage[]` | `null` | Previous messages for multi-turn context |
| `conversation_memory` | `object \| null` | `null` | Opt-in client-carried memory blob (see [Conversation Memory](#conversation-memory)). Omit for stateless behavior. |
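The documented ranges (`top_k` from 1 to 20, `max_hops` from 1 to 3) can be checked before a request is sent. A minimal client-side sketch; field names and defaults are copied from the table, while the `RAGRequest` dataclass itself and its `to_payload` method are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class RAGRequest:
    # Hypothetical client model; names and defaults follow the table above.
    question: str
    top_k: int = 5
    use_reranking: bool = True
    use_graph: bool = True
    max_hops: int = 2
    use_agentic: bool = False
    use_fast_search: bool = False
    collection_id: Optional[str] = None
    conversation_history: Optional[List[Dict[str, str]]] = None
    conversation_memory: Optional[Dict[str, Any]] = None

    def __post_init__(self) -> None:
        # Validate only the documented ranges.
        if not 1 <= self.top_k <= 20:
            raise ValueError("top_k must be between 1 and 20")
        if not 1 <= self.max_hops <= 3:
            raise ValueError("max_hops must be between 1 and 3")

    def to_payload(self) -> Dict[str, Any]:
        """JSON-ready dict; optional fields left as None are dropped."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```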

### ConversationMessage Schema

```json
{
  "role": "user" | "assistant",
  "content": "Message text"
}
```

The backend keeps the most recent messages (configured by `MAX_CONVERSATION_HISTORY`, default 6).

## Two Modes

### 1. Chat Mode (Hybrid + Reranking)

The default mode for quick questions. Performs hybrid search (vector + keyword + graph), applies cross-encoder reranking, enriches with graph context, then streams the LLM answer.

```bash
curl -X POST "http://localhost:8000/api/ask/stream" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What are the main benefits of knowledge graphs?",
    "top_k": 5,
    "use_reranking": true,
    "use_graph": true
  }' --no-buffer
```

**Event sequence:** `sources` → `graph_context` → `content` (multiple) → `done`

```
data: {"sources": [{"document_id": "doc_abc", "chunk_id": "chunk_1", "content": "Knowledge graphs provide...", "score": 0.94, "metadata": {"document_title": "Overview.pdf"}}]}

data: {"graph_context": {"entities": [...], "relationships": [...], "chunks": [...]}}

data: {"content": "Knowledge graphs "}
data: {"content": "provide several "}
data: {"content": "key benefits..."}

data: {"done": true}
```
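Since every event is a small JSON object keyed by its type, a client can fold the stream into one result. A minimal sketch, assuming events are already parsed into dicts (for example by the `askStream` or `ask_stream` helpers further down this page); `fold_events` is a hypothetical name.

```python
from typing import Any, Dict, Iterable


def fold_events(events: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Accumulate parsed SSE events into a single answer record (hypothetical helper)."""
    result: Dict[str, Any] = {"answer": "", "sources": [], "graph_context": None,
                              "thinking": [], "done": False, "error": None}
    for event in events:
        if "error" in event:
            result["error"] = event["error"]
            break  # stop on the first error event
        if "content" in event:
            result["answer"] += event["content"]  # tokens arrive in order
        if "sources" in event:
            result["sources"] = event["sources"]
        if "graph_context" in event:
            result["graph_context"] = event["graph_context"]
        if "thinking" in event:
            result["thinking"].append(event["thinking"])
        if event.get("done"):
            result["done"] = True
            result["communities_used"] = event.get("communities_used", [])
    return result
```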

### 2. Deep Research Mode (Agentic)

For complex questions requiring multi-angle investigation. The researcher agent iteratively searches, explores communities, looks up entities, and cross-references information before handing off to the writer for a comprehensive answer (up to 4000 tokens).

Requires `ENABLE_AGENTIC_RAG=true` and `ENABLE_AGENT_RESEARCH=true` on the server.

```bash
curl -X POST "http://localhost:8000/api/ask/stream" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Compare the methodologies used in papers A and B",
    "use_agentic": true
  }' --no-buffer
```

**Event sequence:** `thinking` (reasoning + search progress) → `retrieval` (per search) → `sources` → `graph_context` → `retrieval_stats` → `content` (multiple) → `done`

```
data: {"thinking": "Starting research..."}

data: {"thinking": "The user wants to compare methodologies in papers A and B. I'll start by searching for each paper's methodology separately."}
data: {"thinking": "Searching: Paper A methodology, Paper B methodology, methodology comparison"}
data: {"retrieval": "Found 8 sources"}

data: {"thinking": "Good results on Paper A. I should also check for community-level context on research methodologies."}
data: {"retrieval": "Found 2 relevant communities"}

data: {"thinking": "Let me search for specific differences and limitations."}
data: {"thinking": "Searching: Paper A limitations, Paper B strengths weaknesses, methodology comparison criteria"}
data: {"retrieval": "Found 6 sources"}

data: {"thinking": "Comprehensive coverage achieved. Ready to wrap up."}

data: {"sources": [...]}
data: {"graph_context": {"entities": [...], "relationships": [...], "communities": [...]}}
data: {"retrieval_stats": {"total_sources_considered": 14, "unique_sources": 11, "search_calls": 2, "communities_used": 2}}

data: {"content": "Both papers "}
data: {"content": "approach the problem "}
data: {"content": "differently..."}

data: {"done": true, "communities_used": [1, 4]}
```

<Mermaid chart={`
flowchart TD
    Q["Complex Question"] --> R["Researcher Agent Loop"]
    R --> REASON["reasoning: plan next step"]
    REASON --> KS["knowledge_search (3 queries)"]
    KS --> REASON2["reasoning: reflect on results"]
    REASON2 --> CS["community_search"]
    CS --> REASON3["reasoning: identify gaps"]
    REASON3 --> KS2["knowledge_search (follow-up)"]
    KS2 --> EL["entity_lookup (key entities)"]
    EL --> DONE["done (summary)"]
    DONE --> WRITER["Writer: synthesize answer"]
    WRITER --> STREAM["Stream Response"]
`} />

## SSE Event Reference

Every event is a JSON object on a `data:` line. Each event carries one primary key from the table below; the `done` event may also carry `communities_used`.

| Event Key | Type | Mode | Description |
|-----------|------|------|-------------|
| `status` | `{stage, message}` | All (if `stream_reasoning_steps`) | Current pipeline stage (`analyzing`, `searching`, `reranking`, or `generating`). Drives a live "working" indicator so clients are not left silent before the first token arrives |
| `content` | `string` | All | A token of the streamed answer |
| `sources` | `SearchResult[]` | All | Retrieved source documents with scores |
| `graph_context` | `object` | All | Knowledge graph entities, relationships, and community data |
| `thinking` | `string` | Deep Research | Status message describing the current reasoning step |
| `sub_questions` | `string[]` | Deep Research | The decomposed research sub-questions |
| `retrieval` | `string` | Deep Research | Retrieval progress: one event per search call in the agent pipeline, or per sub-question in the legacy pipeline |
| `retrieval_stats` | `object` | Deep Research | Summary counts, e.g. `total_sources_considered`, `unique_sources`, `search_calls`, `communities_used` |
| `done` | `boolean` | All | `true` when the stream is complete |
| `error` | `string` | All | Error message if something went wrong |
| `communities_used` | `number[]` | Deep Research | Community IDs used, included in the `done` event |
| `memory_update` | `object` | When `conversation_memory` sent | Updated memory blob to store and replay next turn (see [Conversation Memory](#conversation-memory)) |

Each source object in `sources` also carries a stable `sid` (string) for cross-turn citation continuity.

During silent windows (8 s or longer with no event), the stream also emits SSE comment lines (`: ping`) as keep-alives. The SSE specification tells clients to ignore comment lines, so they serve only to keep proxies from closing idle connections.
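A parser only needs to act on `data:` lines; keep-alive comments and blank event separators can be skipped. A minimal incremental sketch over text chunks, assuming each event's JSON fits on one `data:` line as in the examples above (multi-line `data:` fields, which the SSE format allows, are not handled); `parse_sse` is a hypothetical name.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def parse_sse(chunks: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield JSON payloads from `data:` lines, skipping comments such as ': ping'."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Handle every complete line; keep a trailing partial line for the next chunk.
        *lines, buffer = buffer.split("\n")
        for line in lines:
            line = line.rstrip("\r")
            if line.startswith("data:"):
                # json.loads tolerates the optional space after the colon.
                yield json.loads(line[len("data:"):])
            # Lines starting with ':' are comments; blank lines end an event.
```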

### Source Object Shape

```json
{
  "document_id": "abc123",
  "chunk_id": "chunk_456",
  "content": "The relevant text from the document...",
  "score": 0.94,
  "metadata": {
    "document_title": "Report.pdf",
    "collection_id": "research"
  }
}
```

## Conversation History

Maintain context across multiple questions by passing previous messages:

```bash
curl -X POST "http://localhost:8000/api/ask/stream" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Can you elaborate on the third point?",
    "conversation_history": [
      {"role": "user", "content": "What are the benefits of knowledge graphs?"},
      {"role": "assistant", "content": "Knowledge graphs provide: 1) Structured relationships, 2) Semantic understanding, 3) Discovery of hidden connections."}
    ]
  }' --no-buffer
```

The backend automatically trims conversation history to the most recent messages (default: 6, configurable via `MAX_CONVERSATION_HISTORY`).
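Trimming to the most recent messages amounts to keeping a tail slice. A minimal sketch of that behavior, assuming the limit counts individual messages (as the "Messages to keep" comment on `MAX_CONVERSATION_HISTORY` suggests) rather than user/assistant pairs; `trim_history` is hypothetical, and the server trims on its own regardless.

```python
from typing import Dict, List

Message = Dict[str, str]


def trim_history(history: List[Message], max_messages: int = 6) -> List[Message]:
    """Keep only the most recent `max_messages` entries, oldest first."""
    if max_messages <= 0:
        return []  # history[-0:] would return the whole list, so guard explicitly
    return history[-max_messages:]
```

Client-side trimming only shrinks the payload; to keep older turns in play, use Conversation Memory.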

## Conversation Memory

For long conversations, trimming raw history silently drops older turns. **Conversation Memory** is an opt-in, **client-carried** alternative: the backend stays stateless, the client round-trips an opaque `conversation_memory` blob, and the agent curates a bounded, knowledge-grounded context from it each turn. Omit the field to keep the default stateless behavior; the feature is fully backward-compatible.

**How to use it:**
1. Send `"conversation_memory": {}` on the first turn. Omitting the field keeps the request stateless, and no `memory_update` event is emitted.
2. Read the `memory_update` SSE event emitted at the end of each turn.
3. Send that blob back as `conversation_memory` on the next turn (and keep sending the full `conversation_history`).

```jsonc
// memory_update event payload — store it client-side, send it back next turn
{
  "memory_update": {
    "version": 3,
    "transcript": { "summary": "…rolling summary of older turns…", "summarized_count": 8 },
    "facts": ["Project codename: BlueFalcon", "Deadline: Friday"],
    "open_questions": [],
    "intent": "The user wants concise answers in German.",
    "source_ledger": [{ "sid": "s_7fd8dcf10785", "filename": "doc.pdf", "gist": "…" }],
    "kg_context": { "entities": [], "communities": [] }
  }
}
```

**What it gives you:**
- **No amnesia** — older turns fold into a rolling summary plus durable `facts` / `open_questions` / `intent`, rebuilt into a small fixed context each turn (lower cost and latency than ever-growing history fed twice).
- **Citation continuity** — every source in the `sources` event carries a conversation-stable `sid`, accumulated in `source_ledger`, so a source cited in turn 2 keeps its identity in turn 5.
- **Memory fast-path** — follow-ups answerable from memory alone ("summarize that", "why?", "in German") skip retrieval entirely for a fast, cheap response (toggle with `ENABLE_MEMORY_FAST_PATH`).

Compaction runs *after* the answer streams (a cheap fast-model call), so it adds no user-visible latency. Treat the blob as **opaque** — its shape may grow; only ever store and replay what the server returns.
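The store-and-replay contract fits in a small client object. A minimal sketch, assuming events arrive as parsed dicts; `MemoryClient` is a hypothetical name, and the blob is stored exactly as received because clients should treat it as opaque.

```python
from typing import Any, Dict, Iterable, List


class MemoryClient:
    """Hypothetical client carrying conversation_memory and history between turns."""

    def __init__(self) -> None:
        self.memory: Dict[str, Any] = {}  # sent as {} on the first turn
        self.history: List[Dict[str, str]] = []

    def payload(self, question: str) -> Dict[str, Any]:
        """Request body for the next turn: full history plus the last memory blob."""
        return {
            "question": question,
            "conversation_history": list(self.history),
            "conversation_memory": self.memory,
        }

    def absorb(self, question: str, events: Iterable[Dict[str, Any]]) -> str:
        """Consume one turn's events; store memory_update untouched."""
        parts: List[str] = []
        for event in events:
            if "content" in event:
                parts.append(event["content"])
            if "memory_update" in event:
                self.memory = event["memory_update"]  # opaque: never edit it
        answer = "".join(parts)
        self.history += [{"role": "user", "content": question},
                         {"role": "assistant", "content": answer}]
        return answer
```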

## Collection Scope

By default, both Chat and Deep Research search across **all collections** in your knowledge base. This means every document you've ingested is available for retrieval regardless of which collection it belongs to.

To narrow results to a specific collection, pass `collection_id` in your API request:

```bash
curl -X POST "http://localhost:8000/api/ask/stream" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What were the Q4 results?",
    "collection_id": "financial-reports"
  }' --no-buffer
```

When `collection_id` is omitted (or `null`), all collections are searched. When provided, all retrieval steps (vector search, keyword search, and graph traversal) are filtered to that collection's data. Collection scoping works identically in both Chat and Deep Research modes.

**In the UI:** Click the Settings (cog) icon next to the send button and use the **Collection Scope** dropdown. The dropdown defaults to **All Collections** and shows a persistent indicator below the input confirming the active scope. Select a specific collection to filter, or select "All Collections" to search everything.

## Frontend Integration (JavaScript/TypeScript)

Consume the SSE stream using `fetch` and `ReadableStream`:

```typescript
async function* askStream(
  baseUrl: string,
  apiKey: string,
  question: string,
  options: {
    topK?: number;
    useReranking?: boolean;
    useGraph?: boolean;
    useAgentic?: boolean;
    useFastSearch?: boolean;
    conversationHistory?: { role: string; content: string }[];
    collectionId?: string;
  } = {}
) {
  const res = await fetch(`${baseUrl}/api/ask/stream`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-API-Key": apiKey,
    },
    body: JSON.stringify({
      question,
      top_k: options.topK ?? 5,
      use_reranking: options.useReranking ?? true,
      use_graph: options.useGraph ?? true,
      use_agentic: options.useAgentic ?? false,
      use_fast_search: options.useFastSearch ?? false,
      conversation_history: options.conversationHistory,
      ...(options.collectionId
        ? { collection_id: options.collectionId }
        : {}),
    }),
  });

  if (!res.ok) {
    const error = await res.json().catch(() => ({ detail: "Stream failed" }));
    throw new Error(error.detail || `HTTP ${res.status}`);
  }

  const reader = res.body?.getReader();
  if (!reader) throw new Error("No response body");

  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() || "";

    for (const line of lines) {
      if (line.startsWith("data: ")) {
        try {
          yield JSON.parse(line.slice(6));
        } catch {
          // skip malformed events
        }
      }
    }
  }
}

// Usage
for await (const event of askStream("http://localhost:8000", "your-api-key", "What is GraphRAG?")) {
  if (event.sources) console.log("Sources:", event.sources);
  if (event.graph_context) console.log("Graph:", event.graph_context);
  if (event.content) process.stdout.write(event.content);
  if (event.thinking) console.log("[Thinking]", event.thinking);
  if (event.done) console.log("\n--- Done ---");
  if (event.error) console.error("Error:", event.error);
}
```

## Python Integration

### Using `httpx` (recommended)

```python
import httpx
import json

def ask_stream(base_url: str, api_key: str, question: str, **kwargs):
    """Stream answers from Cortex's Ask AI endpoint."""
    payload = {"question": question, **kwargs}

    with httpx.stream(
        "POST",
        f"{base_url}/api/ask/stream",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        json=payload,
        timeout=60.0,
    ) as response:
        response.raise_for_status()
        buffer = ""
        for chunk in response.iter_text():
            buffer += chunk
            while "\n" in buffer:
                line, buffer = buffer.split("\n", 1)
                if line.startswith("data: "):
                    event = json.loads(line[6:])
                    yield event

# Standard mode
for event in ask_stream("http://localhost:8000", "your-api-key", "What is GraphRAG?"):
    if "content" in event:
        print(event["content"], end="", flush=True)
    elif "sources" in event:
        print(f"\n[{len(event['sources'])} sources retrieved]")
    elif "done" in event:
        print("\n--- Complete ---")

# Deep research mode
for event in ask_stream(
    "http://localhost:8000",
    "your-api-key",
    "Compare the approaches in these papers",
    use_agentic=True,
    top_k=10,
):
    if "thinking" in event:
        print(f"  [{event['thinking']}]")
    elif "sub_questions" in event:
        for i, q in enumerate(event["sub_questions"], 1):
            print(f"  Sub-Q {i}: {q}")
    elif "retrieval" in event:
        print(f"  {event['retrieval']}")
    elif "content" in event:
        print(event["content"], end="", flush=True)
    elif "done" in event:
        print("\n--- Complete ---")
```

### Using `requests`

```python
import requests
import json

def ask_stream(base_url: str, api_key: str, question: str, **kwargs):
    payload = {"question": question, **kwargs}

    with requests.post(
        f"{base_url}/api/ask/stream",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        json=payload,
        stream=True,
        timeout=60,
    ) as response:
        response.raise_for_status()
        for line in response.iter_lines(decode_unicode=True):
            if line and line.startswith("data: "):
                yield json.loads(line[6:])
```

### Multi-Turn Conversation

```python
history = []

def chat(question: str):
    """Ask a follow-up question with conversation context."""
    answer_parts = []
    for event in ask_stream(
        "http://localhost:8000",
        "your-api-key",
        question,
        conversation_history=history,
    ):
        if "content" in event:
            answer_parts.append(event["content"])
            print(event["content"], end="", flush=True)
        elif "done" in event:
            print()

    # Update history for next turn
    answer = "".join(answer_parts)
    history.append({"role": "user", "content": question})
    history.append({"role": "assistant", "content": answer})

chat("What are the main findings?")
chat("Can you elaborate on the second point?")
chat("How does that compare to the introduction?")
```

## Configuration

```bash
# Agentic RAG (Deep Research)
ENABLE_AGENTIC_RAG=true          # Enable deep research mode

# Agent-Based Research Pipeline
ENABLE_AGENT_RESEARCH=true       # Use agent pipeline for deep research (set false for legacy)
ENABLE_AGENT_CHAT=true           # Use agent pipeline for standard chat (required for skills)
# RESEARCHER_MAX_ITERATIONS_SPEED=5    # Agent iterations for chat mode
# RESEARCHER_MAX_ITERATIONS_QUALITY=10 # Agent iterations for deep research
# WRITER_MAX_TOKENS_SPEED=1200         # Max output tokens for chat
# WRITER_MAX_TOKENS_QUALITY=4000       # Max output tokens for deep research

# Conversation
MAX_CONVERSATION_HISTORY=6       # Messages to keep for context

# Visibility
STREAM_REASONING_STEPS=true      # Show thinking steps in deep research mode
SHOW_RETRIEVAL_STATS=true        # Include retrieval_stats events

# Prompt security
PROMPT_SECURITY=true             # Sanitize input and filter harmful output
```

### Agent vs Legacy Pipeline

The agent pipeline (`ENABLE_AGENT_RESEARCH=true`, default) provides adaptive research depth and reasoning transparency but has specific requirements and trade-offs:

| | Agent Pipeline | Legacy Pipeline |
|---|---|---|
| **LLM requirement** | Must support **function calling / tool use** (OpenAI `tools` parameter) | Any OpenAI-compatible chat endpoint |
| **Compatible models** | GPT-4o, GPT-4o-mini, Claude, Mistral Large, Command R+ | Any model (including local Ollama/vLLM) |
| **Token usage** | 3-5x higher (multiple researcher iterations) | Lower (2 LLM calls: decompose + synthesize) |
| **Latency** | 15-30s typical (4-8 LLM round-trips) | 5-10s typical (2 LLM calls) |
| **Research depth** | Adaptive — agent decides when to dig deeper | Fixed — always decomposes into N sub-questions |
| **Behavior** | Non-deterministic — agent chooses search queries dynamically | Deterministic — fixed decompose → search → synthesize path |
| **Transparency** | Reasoning tool streams agent's thought process | Hard-coded status messages |

Set `ENABLE_AGENT_RESEARCH=false` if your model doesn't support function calling, or if you prefer lower cost/latency with predictable behavior.

## Prompt Security

Cortex includes built-in protection against prompt injection attacks:

```bash
PROMPT_SECURITY=true
```

When enabled:
- Validates and sanitizes user input
- Injects anti-manipulation instructions into the system prompt
- Filters potentially harmful outputs
- Returns safe refusal messages when attacks are detected


---

## Document: Security

Security features and best practices for Cortex

URL: /guides/security


# Security

import { Mermaid } from "zudoku/mermaid";

Cortex includes several security features to protect your data and prevent abuse. This guide covers security configuration and best practices.

## Prompt Security

Cortex includes built-in protection against **prompt injection attacks** - attempts to manipulate the AI through malicious input.

### How It Works

<Mermaid chart={`
flowchart TD
    A["👤 User Input"] --> B["🔍 Input Validation"]
    B --> C["📏 Length Limits"]
    C --> D["🛡️ Anti-Injection"]
    D --> E["🤖 LLM Processing"]
    E --> F["🧹 Output Filtering"]
    F --> G["✅ Safe Response"]
    
    B -.-> B1["Check for malicious patterns"]
    C -.-> C1["Enforce max length"]
    D -.-> D1["Add defensive prompts"]
    F -.-> F1["Remove harmful content"]
`} />

### Configuration

```bash
# Enable prompt security (recommended)
PROMPT_SECURITY=true
```

### What It Protects Against

- **Jailbreak attempts** - Trying to bypass AI safety guidelines
- **Instruction injection** - Inserting malicious instructions
- **Data exfiltration** - Attempting to extract training data
- **System prompt extraction** - Trying to reveal system prompts
- **Role manipulation** - Pretending to be a different entity

### Safe Refusal

When an attack is detected, Cortex returns a safe refusal:

```json
{
  "answer": "I can't process that request. Please rephrase your question about the documents.",
  "security_flag": true
}
```

## API Key Security

### Strong Keys

Use cryptographically secure API keys:

```bash
# Generate a secure key
openssl rand -base64 32

# Example format
ADMIN_API_KEY=cortex_admin_Kj8nP2mQ9xL5vR7wY3hT
```
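If you provision keys from a script instead of the shell, Python's standard `secrets` module draws from the same OS random source. A minimal sketch, assuming you want to keep the `cortex_admin_` prefix from the example format above (the helper name `generate_api_key` is hypothetical):

```python
import secrets


def generate_api_key(prefix: str = "cortex_admin_", nbytes: int = 32) -> str:
    """Return a prefixed, URL-safe random key (hypothetical helper)."""
    # token_urlsafe(32) takes 32 random bytes from the OS CSPRNG and
    # base64url-encodes them without padding, giving 43 characters
    return prefix + secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    print(generate_api_key())
```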

### Key Permissions

Apply the principle of least privilege:

| Use Case | Recommended Permissions |
|----------|-------------------------|
| Search/Q&A only | `read` |
| Document management | `read`, `write` |
| Full automation | `read`, `write`, `delete` |
| Admin operations | `admin` |

### Key Expiration

Set expiration dates for temporary access:

```json
{
  "name": "Contractor Access",
  "permissions": ["read"],
  "expires_at": "2024-03-31T23:59:59Z"
}
```
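When keys are created from a script, `expires_at` has to match the layout in the payload above (UTC with a trailing `Z`). A minimal client-side sketch using only the standard library; the helper names are hypothetical, and how Cortex itself enforces expiry is not shown here:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Same layout as "2024-03-31T23:59:59Z": UTC with a literal "Z" suffix
EXPIRES_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def make_expires_at(days: int, now: Optional[datetime] = None) -> str:
    """Return an expires_at string `days` days after `now` (UTC)."""
    now = now or datetime.now(timezone.utc)
    return (now + timedelta(days=days)).astimezone(timezone.utc).strftime(EXPIRES_FORMAT)


def is_expired(expires_at: str, now: Optional[datetime] = None) -> bool:
    """True once the current UTC time is past expires_at."""
    expiry = datetime.strptime(expires_at, EXPIRES_FORMAT).replace(tzinfo=timezone.utc)
    return (now or datetime.now(timezone.utc)) > expiry
```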

## Session Security

### JWT Configuration

```bash
# Strong session secret (32+ characters)
SESSION_SECRET=a-very-long-random-string-at-least-32-chars

# Generate with:
openssl rand -base64 32
```

### Session Best Practices

- Use HTTPS in production
- Set appropriate session timeouts
- Rotate session secrets periodically

## Secret Encryption at Rest

User-supplied secrets — git connector personal access tokens and secret-typed
skill config fields — are encrypted at rest when `ENCRYPTION_KEY` is set:

```bash
# Generate a key
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"

ENCRYPTION_KEY=<generated-key>
```

- **Enable anytime**: existing plaintext secrets are encrypted automatically on
  the next startup. Without a key, secrets are stored in plaintext and a
  warning is logged at startup.
- **Key rotation** (zero downtime): prepend the new key —
  `ENCRYPTION_KEY=<new-key>,<old-key>` — restart (everything is re-encrypted
  with the new key), then drop the old key.
- **Back up the key**: a lost key makes stored secrets unrecoverable; affected
  git syncs and skill activations fail with a clear "re-enter the credential"
  error.
- **Exports are credential-free**: library exports strip skill secrets, and
  git connections are never exported.
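The rotation scheme above appears to follow the ordering convention of `MultiFernet` in the `cryptography` package: the first key encrypts, and every listed key is tried for decryption. A minimal sketch of how a comma-separated `ENCRYPTION_KEY` value could be split into that ordered list; whether Cortex parses it exactly this way is an assumption, and `parse_encryption_keys` is a hypothetical name. Fernet keys are URL-safe base64, so they never contain commas:

```python
def parse_encryption_keys(value: str) -> list[str]:
    """Split ENCRYPTION_KEY into keys, primary (newest) first."""
    keys = [k.strip() for k in value.split(",") if k.strip()]
    if not keys:
        raise ValueError("ENCRYPTION_KEY is set but contains no keys")
    return keys


# With the `cryptography` package installed, the ordered list maps onto MultiFernet:
#   from cryptography.fernet import Fernet, MultiFernet
#   fernet = MultiFernet([Fernet(k) for k in parse_encryption_keys(raw_value)])
#   token = fernet.encrypt(b"secret")   # always encrypts with the first key
#   token = fernet.rotate(token)        # decrypts with any key, re-encrypts with the first
```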

## Input Validation

All user inputs are validated:

### File Uploads

```bash
# Maximum file size
MAX_FILE_SIZE_MB=50
```

Validation includes:
- File size limits
- Allowed file types (PDF, TXT, MD, DOCX)
- Malware scanning (if configured)
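A minimal sketch of the size and type checks, assuming validation by file extension and that `MAX_FILE_SIZE_MB` counts binary megabytes (MiB). The function name is hypothetical, and an extension check alone does not prove the content matches the type; that is what content inspection or malware scanning adds:

```python
from pathlib import Path

MAX_FILE_SIZE_MB = 50
# Extensions from the list above; extend if your deployment accepts more types
ALLOWED_EXTENSIONS = {".pdf", ".txt", ".md", ".docx"}


def validate_upload(filename: str, size_bytes: int, max_mb: int = MAX_FILE_SIZE_MB) -> None:
    """Raise ValueError if the upload fails the type or size check."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"File type {ext or '(none)'} is not allowed")
    # Assumes MB means MiB (1024 * 1024 bytes)
    if size_bytes > max_mb * 1024 * 1024:
        raise ValueError(f"File exceeds the {max_mb} MB limit")
```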

### Query Limits

- Maximum query length enforced
- Request body size limits
- Rate limiting per API key

## Rate Limiting

Protect against abuse with rate limiting:

```bash
# Configure rate limits: requests allowed per window, window length in seconds
RATE_LIMIT_REQUESTS=100
RATE_LIMIT_WINDOW=60
```

Response headers include rate limit info:

```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1705329600
```
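These headers map naturally onto a fixed-window counter per API key, where `X-RateLimit-Reset` is the Unix timestamp at which the current window ends. A minimal in-memory sketch, assuming fixed (not sliding) windows; Cortex's actual algorithm is not documented here, the class name is hypothetical, and a multi-process deployment would need shared storage instead of a dict:

```python
import time
from typing import Optional


class FixedWindowRateLimiter:
    """Per-API-key fixed-window limiter (in-memory sketch)."""

    def __init__(self, limit: int = 100, window: int = 60) -> None:
        self.limit = limit      # RATE_LIMIT_REQUESTS
        self.window = window    # RATE_LIMIT_WINDOW, in seconds
        # api_key -> (window_start, requests counted in that window)
        self._state: dict[str, tuple[int, int]] = {}

    def check(self, api_key: str, now: Optional[float] = None) -> tuple[bool, dict[str, str]]:
        """Count one request; return (allowed, rate-limit headers)."""
        now = time.time() if now is None else now
        window_start = int(now // self.window) * self.window
        start, count = self._state.get(api_key, (window_start, 0))
        if start != window_start:  # a new window has begun: reset the counter
            count = 0
        allowed = count < self.limit
        if allowed:
            count += 1
        self._state[api_key] = (window_start, count)
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(self.limit - count),
            "X-RateLimit-Reset": str(window_start + self.window),
        }
        return allowed, headers
```

In a web framework you would call `check()` once per request, answer with HTTP 429 when `allowed` is false, and copy the headers onto the response either way.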

## CORS Configuration

Restrict which origins can access your API:

```bash
# Development (allow all)
CORS_ORIGINS=*

# Production (restrict to your domains)
CORS_ORIGINS=https://cortex.yourdomain.com,https://app.yourdomain.com
```
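Browsers send the `Origin` header as scheme, host, and optional port with no path, so allowed origins are typically compared as exact strings: `https://cortex.yourdomain.com/` (trailing slash) or an `http://` variant will not match. A minimal sketch of that comparison, assuming `CORS_ORIGINS` is split on commas (helper names are hypothetical):

```python
def parse_cors_origins(value: str) -> list[str]:
    """Split a CORS_ORIGINS value on commas, ignoring blanks."""
    return [origin.strip() for origin in value.split(",") if origin.strip()]


def is_origin_allowed(origin: str, allowed: list[str]) -> bool:
    """Exact match on scheme, host, and port; "*" allows any origin."""
    return "*" in allowed or origin in allowed
```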

## Network Security

### Firewall Rules

Block direct access to internal services:

```bash
# Only expose frontend (3000) and API (8000)
# Block direct Neo4j access (7474, 7687)
ufw allow 80/tcp
ufw allow 443/tcp
ufw deny 7474/tcp
ufw deny 7687/tcp
```

### HTTPS

Always use HTTPS in production:

```nginx
server {
    listen 80;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl http2;
    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;
    
    # Modern SSL configuration
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers off;
}
```

## Data Protection

### At Rest

- Neo4j Enterprise Edition supports encryption at rest
- Use encrypted volumes for Docker data

### In Transit

- All API communication should use HTTPS
- Internal service communication uses Docker network isolation

### Secrets Management

Never commit secrets to version control:

```bash
# Use .env files (in .gitignore)
cp .env.example .env

# Or use a secrets manager
docker run -e OPENAI_API_KEY="$(vault read -field=key secret/openai)" ...
```

## Audit Logging

Enable audit logs for compliance:

```bash
ENABLE_AUDIT_LOG=true
AUDIT_LOG_PATH=/var/log/cortex/audit.log
```

Logged events:
- API key usage
- Document uploads/deletions
- Search queries
- Configuration changes
- Authentication events

## Security Checklist

### Before Production

- [ ] Change all default passwords
- [ ] Set strong `ADMIN_API_KEY` (not the example key)
- [ ] Set strong `SESSION_SECRET` (32+ characters)
- [ ] Set `ENCRYPTION_KEY` (and back it up) so git PATs and skill secrets are encrypted at rest
- [ ] Enable `PROMPT_SECURITY=true`
- [ ] Configure HTTPS with valid certificates
- [ ] Restrict `CORS_ORIGINS` to your domains
- [ ] Block direct access to Neo4j ports
- [ ] Review API key permissions

### Ongoing

- [ ] Rotate API keys regularly
- [ ] Monitor audit logs for suspicious activity
- [ ] Keep dependencies updated
- [ ] Conduct regular security reviews
- [ ] Back up encryption keys securely
- [ ] Test disaster recovery procedures

## Reporting Security Issues

If you discover a security vulnerability:

1. **Do not** open a public GitHub issue
2. Email security concerns to the maintainers
3. Include details to reproduce the issue
4. Allow time for a fix before disclosure


---

## Document: Image Analysis

Extract and analyze images from documents using vision models

URL: /guides/image-analysis


# Image Analysis

Cortex automatically extracts and analyzes images from uploaded documents, making visual content searchable through RAG queries.

## Overview

The image analysis system:

- **Extracts images** from PDF, DOCX, PPTX, XLSX, and image files
- **Analyzes content** using configurable vision models
- **Makes images searchable** through semantic search
- **Integrates with RAG** for comprehensive answers

## Configuration

### Enable Vision Model (Optional)

Add vision model configuration to your `.env` file:

```bash
# Option A: OpenAI GPT-4o
VISION_MODEL=gpt-4o
VISION_MODEL_API_KEY=sk-your-openai-api-key

# Option B: your LiteLLM setup (use instead of Option A, not alongside it)
# VISION_MODEL=openai/gpt-4o
# VISION_MODEL_API_BASE=https://litellm.deploy.qwellco.de/v1
# VISION_MODEL_API_KEY=your-litellm-key
```

### Without Vision Model

If no vision model is configured, the system uses:

1. **Docling descriptions** - Built-in image analysis using SmolDocling (enabled by default)
2. **Basic metadata** - Page numbers and captions

Images are still extracted and searchable, but with less detail.

## Supported Vision Models

| Provider | Model | Configuration |
|----------|-------|---------------|
| **OpenAI** | GPT-4o | `VISION_MODEL=gpt-4o` |
| **Anthropic** | Claude 3.5 | `VISION_MODEL=claude-3-5-sonnet-20241022` |
| **Local** | LLaVA | `VISION_MODEL=llava` via Ollama |
| **Custom** | Any | OpenAI-compatible API |

### OpenAI GPT-4o

```bash
VISION_MODEL=gpt-4o
VISION_MODEL_API_BASE=https://api.openai.com/v1
VISION_MODEL_API_KEY=sk-your-key
```

### Anthropic Claude

```bash
VISION_MODEL=claude-3-5-sonnet-20241022
VISION_MODEL_API_BASE=https://api.anthropic.com/v1
VISION_MODEL_API_KEY=your-anthropic-key
```

### Local Models (Ollama)

```bash
VISION_MODEL=llava
VISION_MODEL_API_BASE=http://localhost:11434/v1
VISION_MODEL_API_KEY=ollama
```

### Custom Endpoints

Any OpenAI-compatible vision API:

```bash
VISION_MODEL=your-model-name
VISION_MODEL_API_BASE=https://your-api-endpoint.com/v1
VISION_MODEL_API_KEY=your-api-key
```

## Supported File Types

Images are extracted from:

- **PDF** - Embedded images, charts, diagrams
- **DOCX** - Word document images
- **PPTX** - PowerPoint images
- **XLSX** - Excel images
- **Images** - PNG, JPG, JPEG, TIFF, BMP

## How It Works

### Processing Flow

```
Document Upload
    ↓
Docling Conversion (with do_picture_description=True)
    ↓
Text Processing Completes (status: "completed")
    ↓
Background Image Analysis (async, concurrent via VISION_MAX_CONCURRENT, default 3)
    ├── Vision Model Analysis (if configured, takes precedence)
    ├── Docling Description (fallback)
    └── Basic Metadata (last resort)
    ↓
Create Image Chunks (chunk_index 1000+, type: image_analysis)
    ↓
Embed & Store + Graph Extraction (if enabled)
    ↓
Available in RAG Queries
```

:::note
Image analysis runs **asynchronously after text processing completes**. A document may show `status: "completed"` while images are still being analyzed. Progress is tracked per-document via `image_progress_current`/`image_progress_total` fields. The **Knowledge Graph pipeline** (Manage → Knowledge Graph → Step 1) is aware of this — it treats documents with pending image analysis as still in-progress, showing an aggregate progress bar and blocking advancement to relationship analysis until all images are processed.
:::
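The per-document fields can be combined into a single aggregate figure, similar to what the Knowledge Graph pipeline displays. A minimal sketch, assuming each document is a dict carrying `image_progress_current` and `image_progress_total` (missing or `null` values count as zero); fetching the documents is left out and the helper name is hypothetical:

```python
def aggregate_image_progress(documents: list[dict]) -> tuple[int, int, bool]:
    """Sum per-document image progress; return (current, total, pending)."""
    current = sum(doc.get("image_progress_current") or 0 for doc in documents)
    total = sum(doc.get("image_progress_total") or 0 for doc in documents)
    # Any document with unanalyzed images keeps the aggregate pending,
    # even if its text processing already reports status "completed"
    pending = any(
        (doc.get("image_progress_current") or 0) < (doc.get("image_progress_total") or 0)
        for doc in documents
    )
    return current, total, pending
```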

### Integration with RAG

Image analyses are automatically:

- **Searchable** - Via semantic search
- **Cited** - In RAG responses with source references
- **Contextual** - Included in answer generation

Example query:

```bash
curl -X POST "http://localhost:8000/api/ask" \
  -H "X-API-Key: your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What charts are in the quarterly report?"
  }'
```

Response includes image analyses:

```json
{
  "answer": "The quarterly report contains three charts...",
  "sources": [
    {
      "content": "[Image Analysis] Bar chart showing Q1 revenue...",
      "metadata": {
        "type": "image_analysis",
        "image_id": "picture_1",
        "analysis_method": "vision_model"
      }
    }
  ]
}
```
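Because image chunks carry `metadata.type == "image_analysis"`, a client can separate them from text sources. A minimal sketch operating on the parsed JSON response shown above (the helper name is hypothetical):

```python
def image_sources(response: dict) -> list[dict]:
    """Return only the sources that came from image analysis chunks."""
    return [
        source
        for source in response.get("sources", [])
        if source.get("metadata", {}).get("type") == "image_analysis"
    ]


# Usage with a parsed /api/ask response:
# for src in image_sources(response):
#     print(src["metadata"]["image_id"], src["metadata"]["analysis_method"])
```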

## Usage

### Upload Documents

Upload documents as usual - images are processed automatically:

```bash
curl -X POST "http://localhost:8000/api/upload" \
  -H "X-API-Key: your-key" \
  -F "file=@presentation.pptx"
```

### Query for Image Content

Ask questions about visual content:

```bash
curl -X POST "http://localhost:8000/api/ask" \
  -H "X-API-Key: your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What diagrams explain the architecture?"
  }'
```

### Search Images Directly

Search across all image analyses:

```bash
curl -X POST "http://localhost:8000/api/search" \
  -H "X-API-Key: your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "charts showing revenue growth"
  }'
```
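The same calls can be made from Python with only the standard library. A minimal sketch: the endpoints, `X-API-Key` header, and payloads mirror the curl examples above, while the helper names and the 120-second timeout are assumptions:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_request(path: str, payload: dict, api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying the X-API-Key header."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response body."""
    # Generous timeout: agentic research answers can take tens of seconds
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (requires a running backend):
# answer = send(build_request("/api/ask", {"question": "What diagrams explain the architecture?"}, "your-key"))
# hits = send(build_request("/api/search", {"query": "charts showing revenue growth"}, "your-key"))
```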

## Analysis Methods

The system uses three analysis methods:

### 1. Vision Model

When configured, provides detailed analysis:

- Object identification
- Text extraction (OCR)
- Chart/diagram interpretation
- Context understanding

Example output:

```
[Image Analysis] The image shows a bar chart comparing quarterly revenue 
across four regions. North America leads with $2.3M, followed by Europe 
at $1.8M, Asia Pacific at $1.2M, and Latin America at $0.7M. The chart 
title indicates Q4 2024 performance.
```

### 2. Docling Description

Built-in descriptions generated during document conversion:

- Basic image classification
- Simple descriptions
- Fast and free
- Always enabled (via `do_picture_description=True`)

Example output:

```
[Image Description] Bar chart showing regional sales data
```

### 3. Fallback

Basic metadata when other methods unavailable:

- Page number
- Caption (if present)

Example output:

```
Image on page 3 - Caption: Figure 2: Revenue breakdown
```
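The three methods form a fallback chain. A minimal sketch of the selection logic, assuming each earlier method yields `None` when it is unavailable or fails. Only `vision_model` appears as an `analysis_method` value in the examples above, so the `docling` and `fallback` labels and the function name are hypothetical:

```python
from typing import Optional


def select_description(
    vision_text: Optional[str],
    docling_text: Optional[str],
    page_number: Optional[int],
    caption: Optional[str],
) -> tuple[str, str]:
    """Return (analysis_method, description) in the priority order above."""
    if vision_text:  # 1. vision model output, when configured and successful
        return "vision_model", vision_text
    if docling_text:  # 2. Docling's pre-generated description
        return "docling", docling_text
    # 3. Fallback built from basic metadata
    description = f"Image on page {page_number}" if page_number is not None else "Image"
    if caption:
        description += f" - Caption: {caption}"
    return "fallback", description
```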

## Programmatic Access

### Python Example

```python
import asyncio

from app.services.vision_analyzer import get_vision_analyzer
from docling.document_converter import DocumentConverter


async def main() -> None:
    # Initialize (default converter options: Docling descriptions need
    # do_picture_description=True, see Pipeline Options)
    analyzer = get_vision_analyzer()
    converter = DocumentConverter()

    # Convert document
    result = converter.convert("report.pdf")
    doc = result.document

    # Analyze images (a coroutine, so it must run inside an event loop)
    analyses = await analyzer.analyze_all_images(doc)

    for analysis in analyses:
        print(f"Image: {analysis.image_id}")
        print(f"Method: {analysis.analysis_method}")
        print(f"Description: {analysis.description}")


asyncio.run(main())
```

### Extract Images to Disk

```python
from pathlib import Path

# Save all images from document (reuses `analyzer` and `doc` from the example above)
saved = analyzer.extract_and_save_images(
    docling_doc=doc,
    output_dir=Path("./extracted_images")
)

for image_id, file_path in saved:
    print(f"Saved {image_id} to {file_path}")
```

## Troubleshooting

### No Images Extracted

**Problem**: No images found in document

**Solutions**:

- Verify document contains embedded images (not vector graphics)
- Check if images are rendered as text/shapes
- Ensure document conversion succeeds

### Vision API Errors

**Problem**: Vision model returns errors

**Solutions**:

- Verify API key is valid
- Check API endpoint URL
- Confirm model name is correct
- Review API quotas and rate limits
- Check image size limits (usually ~20MB)

### Poor Descriptions

**Problem**: Image descriptions lack detail

**Solutions**:

- Use a more capable vision model (GPT-4o, Claude 3.5)
- Customize analysis prompt (advanced)
- Ensure images are clear and high-resolution
- Check for image corruption

### Slow Processing

**Problem**: Image analysis takes too long

**Solutions**:

- Increase `VISION_MAX_CONCURRENT` (default 3) to process more images in parallel
- Use a faster vision model
- Reduce image resolution before upload
- Consider a local vision model (Ollama/LLaVA)

## Performance Considerations

### Processing Time

- Each image adds 2-5 seconds processing time
- Depends on vision model response time
- Images within a document are processed concurrently (controlled by `VISION_MAX_CONCURRENT`, default 3)

### API Costs

Vision model usage incurs costs:

- OpenAI GPT-4o: ~$0.01-0.03 per image
- Claude 3.5: ~$0.003-0.015 per image
- Local models: Free (but requires GPU)
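Combining the figures above gives a rough back-of-the-envelope estimate. A sketch, assuming every image takes about the same time (so the semaphore effectively processes them in waves of `VISION_MAX_CONCURRENT`) and ignoring Docling conversion and other documents competing for the same global limit. For example, 10 images at the default concurrency of 3 need ceil(10/3) = 4 waves, roughly 8 to 20 seconds, and cost about $0.10 to $0.30 with GPT-4o. The function name is hypothetical:

```python
import math


def estimate_image_cost_and_time(
    num_images: int,
    max_concurrent: int = 3,                              # VISION_MAX_CONCURRENT default
    seconds_per_image: tuple[float, float] = (2.0, 5.0),  # per-image range above
    cost_per_image: tuple[float, float] = (0.01, 0.03),   # GPT-4o range above
) -> tuple[tuple[float, float], tuple[float, float]]:
    """Return ((min_s, max_s), (min_usd, max_usd)) for one document's images."""
    # Images run in "waves" of up to max_concurrent parallel calls
    waves = math.ceil(num_images / max_concurrent)
    time_range = (waves * seconds_per_image[0], waves * seconds_per_image[1])
    cost_range = (num_images * cost_per_image[0], num_images * cost_per_image[1])
    return time_range, cost_range
```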

### Optimization Tips

- Use Docling descriptions for simple images
- Process only necessary documents with vision model
- Consider batch processing during low-usage periods

## Advanced Configuration

### Custom Analysis Prompts

Modify the analysis prompt in `vision_analyzer.py`:

```python
analysis_prompt = """
Analyze this business chart. Identify:
1. Chart type (bar, line, pie, etc.)
2. Data trends and key insights
3. Notable data points
4. Title and axis labels
"""
```

### Pipeline Options

Docling is configured with image description enabled by default:

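```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

pipeline_options = PdfPipelineOptions()
pipeline_options.do_picture_description = True  # Enable automatic image descriptions
pipeline_options.generate_page_images = True
pipeline_options.images_scale = 2.0  # Higher resolution

# The options only take effect once passed to the converter (here for PDF input)
converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)
```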

This enables Docling's built-in vision model (SmolDocling) to generate image descriptions during conversion.

## Best Practices

### Document Preparation

- Use high-resolution images when possible
- Ensure text in images is legible
- Include descriptive captions in documents

### Vision Model Selection

- **GPT-4o**: Best for complex charts, diagrams, mixed content
- **Claude 3.5**: Excellent for text extraction and detailed descriptions
- **Local models**: Good for privacy-sensitive or cost-conscious use cases

### Query Formulation

Ask specific questions about visual content:

- ✅ "What does the revenue trend chart show?"
- ✅ "Explain the architecture diagram"
- ✅ "What are the key findings in the scatter plot?"
- ❌ "Tell me about the document"

## Related Guides

- [Document Processing](./document-processing) - How documents are converted and chunked
- [RAG Queries](./rag-queries) - Querying your knowledge base
- [Configuration](./configuration) - System configuration options


---

## Document: Image Analysis - Advanced

Technical details and advanced configuration for image analysis

URL: /guides/image-analysis-advanced


# Image Analysis - Advanced

Advanced configuration and technical details for the image analysis system.

## Architecture

### System Components

The image analysis system consists of:

1. **Vision Analyzer Service** - Core extraction and analysis logic
2. **Document Processor** - Integration with document processing pipeline
3. **Configuration** - Vision model settings and fallbacks
4. **Storage** - Image chunks in Neo4j database

### Data Flow

```
Upload API
    ↓
DocumentProcessor._process_document()
    ↓
DoclingConverter.run() → DoclingDocument (with do_picture_description=True)
    ↓
VisionAnalyzer.analyze_all_images()
    ↓
    ├─ extract_images_from_document()
    │   └─ _get_pil_image_from_picture()
    │
    ├─ analyze_image() [for each image]
    │   ├─ analyze_image_with_vision_model() [if configured]
    │   ├─ analyze_image_with_docling() [uses pre-generated description]
    │   └─ fallback metadata
    │
    └─ Create DocumentChunk with image analysis
        ↓
    Embed image description
        ↓
    Store in Neo4j
```

**Note**: Docling generates image descriptions during the conversion step when `do_picture_description=True` is enabled. The `analyze_image_with_docling()` method retrieves these pre-generated descriptions.

## Vision Analyzer Service

### Core Methods

#### Extract Images

```python
def extract_images_from_document(
    self, 
    docling_doc: DoclingDocument
) -> list[ExtractedImage]:
    """Extract all images from a DoclingDocument.
    
    Returns:
        List of ExtractedImage objects with:
        - image_id: Unique identifier
        - pil_image: PIL Image object
        - page_number: Page location
        - bbox: Bounding box coordinates
        - caption: Document caption (if any)
        - existing_description: Docling description (if any)
    """
```

#### Analyze with Vision Model

```python
async def analyze_image_with_vision_model(
    self,
    pil_image: Image.Image,
    prompt: Optional[str] = None
) -> Optional[str]:
    """Analyze image using configured vision model.
    
    Process:
    1. Convert PIL image to base64 data URL
    2. Prepare OpenAI-compatible request
    3. Call vision API
    4. Return description or None on failure
    
    Timeout: 60 seconds
    """
```

#### Smart Analysis

```python
async def analyze_image(
    self,
    extracted_image: ExtractedImage,
    force_vision_model: bool = False,
    custom_prompt: Optional[str] = None
) -> ImageAnalysisResult:
    """Analyze with automatic method selection.
    
    Priority:
    1. Vision model (if configured and available)
    2. Docling description (pre-generated during conversion)
    3. Basic metadata (fallback)
    
    Returns:
        ImageAnalysisResult with description, method, and metadata
    """
```

#### Docling Fallback

```python
def analyze_image_with_docling(
    self, 
    extracted_image: ExtractedImage
) -> Optional[str]:
    """Return Docling's pre-generated image description from conversion.
    
    NOTE: This requires do_picture_description=True in the Docling pipeline
    options. If not enabled during conversion, this will always return None.
    
    The description is generated by Docling's built-in vision model (SmolDocling)
    during the document conversion step, not on-demand.
    """
```

### Image Extraction Details

#### PIL Image Extraction

```python
def _get_pil_image_from_picture(
    picture: PictureItem, 
    doc: DoclingDocument
) -> Optional[Image.Image]:
    """Extract PIL image from Docling PictureItem.
    
    Methods:
    1. picture.get_image(doc) - Direct extraction
    2. Base64 data URL decoding
    3. File path loading
    
    Handles:
    - Embedded base64 images
    - File references
    - Multiple image formats
    """
```

#### Supported Image Sources

- **Embedded base64**: `data:image/png;base64,...`
- **File references**: `file:///path/to/image.png`
- **Direct extraction**: Via Docling's `get_image()` method

### Base64 Encoding

```python
def _pil_to_data_url(pil_image: Image.Image) -> str:
    """Convert PIL image to data URL.
    
    - PNG format for RGBA images
    - JPEG format for RGB images
    - Automatic mode conversion
    
    Returns: data:image/{type};base64,{data}
    """
```
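Once the pixels are bytes, the data URL handling on both sides (decoding embedded `data:image/...;base64,` sources and encoding images for the API) needs only the standard library. A minimal sketch; the helper names are hypothetical, treating every Pillow mode with an alpha channel like RGBA is an assumption, and producing the bytes (including converting other modes such as CMYK to RGB before JPEG encoding) still requires Pillow:

```python
import base64

# Pillow modes with an alpha channel; PNG keeps transparency, JPEG cannot
ALPHA_MODES = {"RGBA", "LA", "PA"}


def choose_format(mode: str) -> tuple[str, str]:
    """Pick (Pillow format name, MIME type) for a given image mode."""
    if mode in ALPHA_MODES:
        return "PNG", "image/png"
    return "JPEG", "image/jpeg"


def to_data_url(image_bytes: bytes, mime_type: str) -> str:
    """Encode raw image bytes as data:{type};base64,{data}."""
    return f"data:{mime_type};base64,{base64.b64encode(image_bytes).decode('ascii')}"


def parse_data_url(url: str) -> tuple[str, bytes]:
    """Decode a base64 data URL back into (mime_type, image_bytes)."""
    header, sep, data = url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    return header[len("data:"):-len(";base64")], base64.b64decode(data)
```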

## Vision API Integration

### Request Format

```json
{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Analyze this image..."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/jpeg;base64,..."
          }
        }
      ]
    }
  ],
  "max_tokens": 1000
}
```
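The request body above can be assembled from the `VISION_MODEL*` settings. A minimal sketch, assuming the standard OpenAI-compatible `/chat/completions` path and `Authorization: Bearer` header; the function name is hypothetical, and the result can be handed to any HTTP client, for example `httpx.AsyncClient.post(url, headers=headers, json=body)`:

```python
def build_vision_request(
    api_base: str, api_key: str, model: str, prompt: str, data_url: str, max_tokens: int = 1000
) -> tuple[str, dict[str, str], dict]:
    """Return (url, headers, json_body) for an OpenAI-compatible vision call."""
    url = api_base.rstrip("/") + "/chat/completions"
    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
    body = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": data_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }
    return url, headers, body
```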

### Response Handling

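```python
import logging
from typing import Optional

import httpx

logger = logging.getLogger(__name__)

# Success response body:
# {
#     "choices": [{
#         "message": {"content": "This image shows..."}
#     }]
# }


def extract_description(response: httpx.Response) -> Optional[str]:
    """Return the description, or None on error (hypothetical helper name)."""
    # Error handling: log non-200 responses and let the caller fall back
    if response.status_code != 200:
        logger.error(f"Vision API error: {response.status_code}")
        return None
    return response.json()["choices"][0]["message"]["content"]
```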

### Timeout Configuration

```python
async with httpx.AsyncClient(timeout=60.0) as client:
    response = await client.post(...)
```

Adjust timeout based on:
- Image size
- Vision model speed
- Network conditions

## Custom Analysis Prompts

### Default Prompt

```python
DEFAULT_PROMPT = """
Analyze this image in detail. Describe what you see, including:
1. Main objects, people, or elements visible
2. Text visible in the image (if any)
3. Charts, diagrams, or data visualizations (if any)
4. Overall context and purpose of the image
5. Any relevant details that would help understand the document

Provide a comprehensive description suitable for document retrieval and understanding.
"""
```

### Custom Prompts by Use Case

#### Business Charts

```python
CHART_PROMPT = """
Analyze this business chart. Provide:
1. Chart type (bar, line, pie, scatter, etc.)
2. Title and axis labels
3. Key data trends and patterns
4. Notable data points or anomalies
5. Overall business insights

Format for easy text search and retrieval.
"""
```

#### Technical Diagrams

```python
DIAGRAM_PROMPT = """
Analyze this technical diagram. Include:
1. Diagram type (flowchart, architecture, schema, etc.)
2. Components and their relationships
3. Data flow or process steps
4. Technical specifications visible
5. Purpose and context

Describe for technical documentation retrieval.
"""
```

#### Text-Heavy Images

```python
TEXT_PROMPT = """
Extract and analyze text from this image:
1. All visible text (OCR)
2. Document type and format
3. Layout and structure
4. Key information and data
5. Context and purpose

Optimize for text search and citation.
"""
```

### Implementing Custom Prompts

```python
# In your code
result = await analyzer.analyze_image(
    extracted_image,
    custom_prompt=CHART_PROMPT
)
```

## Storage Integration

### Image Chunk Structure

```python
image_chunk = DocumentChunk(
    id=f"{doc_id}_image_{idx}",
    document_id=doc_id,
    content=f"[Image Analysis]\n{analysis.description}",
    embedding=embedded_vector,
    chunk_index=1000 + idx,  # Separate from text chunks
    metadata={
        "type": "image_analysis",
        "image_id": analysis.image_id,
        "analysis_method": analysis.analysis_method,
        "page_number": extracted_image.page_number,
        "bbox": extracted_image.bbox,
        "caption": extracted_image.caption
    }
)
```

### Chunk Indexing

- Text chunks: `0` to `N-1` (for N text chunks)
- Image chunks: `1000` to `1000+M-1` (for M images)
- Keeps text and image chunks from overlapping, provided a document has fewer than 1000 text chunks

### Neo4j Storage

```cypher
CREATE (c:Chunk {
    id: $chunk_id,
    document_id: $doc_id,
    content: $content,
    embedding: $embedding,
    chunk_index: $index,
    metadata: $metadata
})
```
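Neo4j node properties must be primitive values or arrays of them, so passing the nested `metadata` dict (which contains `bbox` and `caption`) directly as `$metadata` would generally be rejected. One common workaround, sketched here as an assumption rather than Cortex's actual storage code, is to serialize the map to JSON and bind it as a string parameter; the `metadata_json` parameter name and the helper are hypothetical, and the value is read back with `json.loads`:

```python
import json
from typing import Any


def chunk_query_params(
    chunk_id: str, doc_id: str, content: str,
    embedding: list[float], index: int, metadata: dict[str, Any],
) -> dict[str, Any]:
    """Build Cypher parameters, flattening metadata into a JSON string."""
    return {
        "chunk_id": chunk_id,
        "doc_id": doc_id,
        "content": content,
        "embedding": embedding,  # a list of floats is a valid property value
        "index": index,
        # A nested map is not a valid property value; a JSON string is
        "metadata_json": json.dumps(metadata),
    }
```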

## Performance Optimization

### Image Preprocessing

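```python
from PIL import Image


def preprocess_image(pil_image: Image.Image) -> Image.Image:
    """Optimize image before vision API call."""

    # Work on a copy, because thumbnail() resizes in place
    pil_image = pil_image.copy()

    # Resize large images: fit within 1024x1024, keeping the aspect ratio.
    # thumbnail() never enlarges, so no size check is needed (comparing
    # size tuples with ">" is lexicographic and misses tall images).
    max_size = (1024, 1024)
    pil_image.thumbnail(max_size, Image.Resampling.LANCZOS)

    # Convert mode for compatibility (discards the alpha channel)
    if pil_image.mode == "RGBA":
        pil_image = pil_image.convert("RGB")

    return pil_image
```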

### Batch Processing

```python
import asyncio

# Images are processed concurrently, controlled by VISION_MAX_CONCURRENT (default 3).
# The built-in analyze_all_images() uses asyncio.gather with a global semaphore.
# Example of the same pattern for custom usage (`analyzer` is a VisionAnalyzer instance):
async def analyze_images_batch(
    images: list[ExtractedImage],
    max_concurrent: int = 3
) -> list[ImageAnalysisResult]:
    """Process images with controlled concurrency."""

    semaphore = asyncio.Semaphore(max_concurrent)

    async def analyze_with_limit(img):
        async with semaphore:
            return await analyzer.analyze_image(img)

    return await asyncio.gather(*[
        analyze_with_limit(img) for img in images
    ])
```

### Caching (Future Enhancement)

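```python
import hashlib
import io
from typing import Optional

from PIL import Image

# image hash -> vision model description (unbounded; add eviction for long-running use)
_vision_cache: dict[str, str] = {}


def get_image_hash(pil_image: Image.Image) -> str:
    """Generate hash for image."""
    buffer = io.BytesIO()
    pil_image.save(buffer, format="PNG")
    return hashlib.sha256(buffer.getvalue()).hexdigest()


async def cached_vision_analysis(pil_image: Image.Image) -> Optional[str]:
    """Cache vision model responses by image content hash.

    functools.lru_cache is unsuitable here: on an async function it caches
    the coroutine object, which can only be awaited once.
    """
    key = get_image_hash(pil_image)
    if key not in _vision_cache:
        description = await analyzer.analyze_image_with_vision_model(pil_image)
        if description is None:
            return None  # don't cache failures
        _vision_cache[key] = description
    return _vision_cache[key]
```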

## Monitoring and Debugging

### Logging

```python
# Enable detailed logging
import logging

logging.getLogger('app.services.vision_analyzer').setLevel(logging.DEBUG)

# Log messages include:
# - Image extraction success/failure
# - Vision API call status
# - Analysis method selected
# - Processing time per image
```

### Performance Metrics

```python
import logging
import time

logger = logging.getLogger(__name__)


async def analyze_with_metrics(
    extracted_image: ExtractedImage
) -> ImageAnalysisResult:
    # perf_counter() is monotonic, so durations are unaffected by clock changes
    start = time.perf_counter()

    result = await analyzer.analyze_image(extracted_image)

    duration = time.perf_counter() - start
    logger.info(
        f"Image analysis: {result.image_id} "
        f"method={result.analysis_method} "
        f"duration={duration:.2f}s"
    )

    return result
```

### Error Tracking

```python
from sentry_sdk import capture_exception

try:
    result = await analyzer.analyze_image(img)
except Exception as e:
    logger.error(f"Image analysis failed: {e}")
    capture_exception(e)
    # Fall back to basic metadata
    result = create_fallback_result(img)
```

## Testing

### Unit Tests

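```python
import pytest
from PIL import Image

from app.services.vision_analyzer import VisionAnalyzer


# Calls the configured vision model over the network (needs pytest-asyncio)
@pytest.mark.asyncio
async def test_analyze_image_with_vision_model():
    analyzer = VisionAnalyzer()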
    
    # Create test image
    test_image = Image.new('RGB', (100, 100), color='red')
    
    result = await analyzer.analyze_image_with_vision_model(
        test_image,
        prompt="What color is this image?"
    )
    
    assert result is not None
    assert "red" in result.lower()
```

### Integration Tests

```python
@pytest.mark.asyncio
async def test_document_processing_with_images():
    processor = DocumentProcessor()
    
    # Process document with images
    doc_id = await processor.process_file(
        file_path="test_document.pdf",
        filename="test.pdf",
        file_size=1024
    )
    
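    # Image analysis runs in the background after text processing completes,
    # so image chunks may not exist yet; poll until it finishes before asserting
    # Verify image chunks created
    chunks = neo4j.get_document_chunks(doc_id)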
    image_chunks = [c for c in chunks if c.metadata.get('type') == 'image_analysis']
    
    assert len(image_chunks) > 0
```

## Extending the System

### Custom Analyzers

```python
class CustomVisionAnalyzer(VisionAnalyzer):
    """Custom analyzer with specialized processing."""

    async def analyze_image(
        self,
        extracted_image: ExtractedImage,
        **kwargs
    ) -> ImageAnalysisResult:
        # Custom preprocessing (preprocess, is_chart and analyze_chart are your own methods)
        preprocessed = self.preprocess(extracted_image.pil_image)

        # Non-chart images: defer to the built-in method selection, which
        # already returns a complete ImageAnalysisResult
        if not self.is_chart(preprocessed):
            return await super().analyze_image(extracted_image, **kwargs)

        # Charts: custom analysis logic that returns a description string
        description = await self.analyze_chart(preprocessed)
        return ImageAnalysisResult(
            image_id=extracted_image.image_id,
            description=description,
            analysis_method="custom"
        )
```

### Multiple Vision Models

```python
class MultiModelAnalyzer(VisionAnalyzer):
    """Use different models for different image types."""
    
    async def analyze_image(
        self,
        extracted_image: ExtractedImage,
        **kwargs
    ) -> ImageAnalysisResult:
        # Detect image type
        image_type = self.classify_image(extracted_image.pil_image)
        
        # Select appropriate model
        if image_type == "chart":
            model = "gpt-4o"
        elif image_type == "text":
            model = "claude-3-5-sonnet"
        else:
            model = "llava"
        
        # Use selected model
        return await self.analyze_with_model(
            extracted_image,
            model=model,
            **kwargs
        )
```

## Best Practices

### Production Deployment

1. **Rate Limiting**: Implement API rate limits
2. **Error Handling**: Graceful fallbacks for failures
3. **Monitoring**: Track API usage and costs
4. **Caching**: Cache frequent analyses
5. **Queue Processing**: Process images asynchronously

### Cost Management

1. Use Docling for simple images
2. Select cost-effective models for scale
3. Implement image deduplication
4. Monitor API usage per document
5. Set spending alerts

### Quality Assurance

1. Test with diverse document types
2. Validate extraction accuracy
3. Review analysis quality regularly
4. Collect user feedback
5. Fine-tune prompts as needed

## Related Resources

- [Image Analysis Guide](./image-analysis) - Basic usage and configuration
- [Document Processing](./document-processing) - Processing pipeline details
- [API Reference](../api/documents) - Upload and query endpoints


---

## Document: Deployment

Deploy Cortex to production with Docker and Coolify

URL: /guides/deployment


# Deployment

This guide covers deploying Cortex to production environments using Docker Compose and Coolify.

## Prerequisites

- A server with Docker and Docker Compose installed
- At least 4GB RAM (8GB+ recommended for production)
- Domain name with DNS configured (for HTTPS)
- OpenAI API key (or alternative LLM provider)

## Docker Compose Deployment

### Production Configuration

Create a `docker-compose.prod.yml`:

```yaml
services:
  frontend:
    build:
      context: ./frontend
      dockerfile: Dockerfile.prod
    environment:
      - NODE_ENV=production
      - NEXT_PUBLIC_API_URL=https://api.yourdomain.com
    restart: unless-stopped
    depends_on:
      - backend

  backend:
    build:
      context: ./backend
      dockerfile: Dockerfile.prod
    environment:
      - DEBUG=false
      - REQUIRE_API_KEY=true
      - NEO4J_URI=bolt://neo4j:7687
      - NEO4J_USER=neo4j
      - NEO4J_PASSWORD=${NEO4J_PASSWORD}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ADMIN_API_KEY=${ADMIN_API_KEY}
    volumes:
      - uploads:/app/uploads
      - custom_inputs:/app/custom_inputs
    restart: unless-stopped
    depends_on:
      neo4j:
        condition: service_healthy

  neo4j:
    image: neo4j:5-enterprise
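    environment:
      - NEO4J_AUTH=neo4j/${NEO4J_PASSWORD}
      - NEO4J_ACCEPT_LICENSE_AGREEMENT=yes  # required by the enterprise image
      - NEO4J_PLUGINS=["apoc"]
      # Neo4j 5 setting names (server.memory.heap.*)
      - NEO4J_server_memory_heap_initial__size=1G
      - NEO4J_server_memory_heap_max__size=2G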
    volumes:
      - neo4j_data:/data
      - neo4j_logs:/logs
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "neo4j", "status"]
      interval: 10s
      timeout: 10s
      retries: 5

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro
    restart: unless-stopped
    depends_on:
      - frontend
      - backend

volumes:
  neo4j_data:
  neo4j_logs:
  uploads:
  custom_inputs:
```

### Deploy

```bash
# Set environment variables
export NEO4J_PASSWORD=your-secure-password
export OPENAI_API_KEY=sk-your-key
export ADMIN_API_KEY=cortex_admin_your-key

# Deploy
docker compose -f docker-compose.prod.yml up -d
```
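The placeholder values above must be replaced with strong secrets. The Quick Start suggests `openssl rand -base64 32`; the sketch below does the same with Python's `secrets` module, plus a `SESSION_SECRET` for the admin login. `generate_secrets` is a hypothetical helper, and the `cortex_admin_` prefix only mirrors the placeholder format used in this guide (whether Cortex requires a prefix is not stated here).

```python
import secrets


def generate_secrets() -> dict[str, str]:
    """Generate random values for the deployment's secret variables (hypothetical helper)."""
    return {
        # URL-safe random string from 32 random bytes (about 43 characters)
        "NEO4J_PASSWORD": secrets.token_urlsafe(32),
        # Prefix copied from the placeholder format used in this guide
        "ADMIN_API_KEY": "cortex_admin_" + secrets.token_urlsafe(32),
        # About 43 characters, comfortably above the 32-character minimum
        "SESSION_SECRET": secrets.token_urlsafe(32),
    }


if __name__ == "__main__":
    # Print export lines that can be pasted into the deployment shell
    for name, value in generate_secrets().items():
        print(f"export {name}={value}")
```

`token_urlsafe` only emits letters, digits, `-`, and `_`, so the printed values are safe to use unquoted in a shell.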

## Coolify Deployment

Cortex includes Coolify-ready configuration for easy deployment.

### Setup Steps

1. **Create a New Resource**
   - Go to your Coolify dashboard
   - Click "New Resource" → "Docker Compose"

2. **Configure Git Repository**
   - Repository: `https://github.com/mocaOS/cortex-app`
   - Branch: `main`
   - Compose file: `coolify/docker-compose.coolify.yml`

3. **Set Environment Variables**
   
   In Coolify's environment section, add:

   ```bash
   NEO4J_PASSWORD=your-secure-password
   OPENAI_API_KEY=sk-your-api-key
   ADMIN_EMAIL=admin@yourdomain.com
   ADMIN_PASSWORD=your-admin-password
   ADMIN_API_KEY=cortex_admin_your-secure-key
   SESSION_SECRET=your-32-character-secret
   ```

4. **Configure Domains**
   - Frontend: `cortex.yourdomain.com`
   - Backend API: `api.cortex.yourdomain.com`

5. **Deploy**
   - Click "Deploy"
   - Wait for containers to start
   - Coolify handles SSL certificates automatically

### Coolify Compose File

The `coolify/docker-compose.coolify.yml` includes Coolify-specific magic variables:

```yaml
services:
  frontend:
    environment:
      - SERVICE_FQDN_FRONTEND_3000=${FRONTEND_URL:-}
    
  backend:
    environment:
      - SERVICE_FQDN_BACKEND_8000=${BACKEND_URL:-}
```

## Nginx Configuration

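For the Docker Compose deployment above, save the reverse-proxy configuration as `nginx/nginx.conf` (the `nginx` service mounts it read-only at `/etc/nginx/nginx.conf`) and place the TLS certificate files in `nginx/ssl/`: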

```nginx
# /etc/nginx/nginx.conf
events {
    worker_connections 1024;
}

http {
    upstream frontend {
        server frontend:3000;
    }

    upstream backend {
        server backend:8000;
    }

    # Redirect HTTP to HTTPS for both hostnames
    server {
        listen 80;
        server_name cortex.yourdomain.com api.cortex.yourdomain.com;
        return 301 https://$host$request_uri;
    }

    # Frontend
    server {
        listen 443 ssl http2;
        server_name cortex.yourdomain.com;

        ssl_certificate /etc/nginx/ssl/fullchain.pem;
        ssl_certificate_key /etc/nginx/ssl/privkey.pem;

        location / {
            proxy_pass http://frontend;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection 'upgrade';
            proxy_set_header Host $host;
            proxy_cache_bypass $http_upgrade;
        }
    }

    # API
    server {
        listen 443 ssl http2;
        server_name api.cortex.yourdomain.com;

        ssl_certificate /etc/nginx/ssl/fullchain.pem;
        ssl_certificate_key /etc/nginx/ssl/privkey.pem;

        client_max_body_size 50M;

        location / {
            proxy_pass http://backend;
            proxy_http_version 1.1;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            
            # SSE support for streaming
            proxy_buffering off;
            proxy_cache off;
        }
    }
}
```

## Health Checks

### Backend Health

```bash
curl https://api.yourdomain.com/health
```

Expected response:

```json
{
  "status": "healthy",
  "neo4j_connected": true,
  "version": "1.0.0"
}
```

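### Neo4j Health

The `/health` response already reports `neo4j_connected`. To confirm that the graph is actually queryable, call the stats endpoint, which (like every endpoint except `/health`) requires an API key:

```bash
curl https://api.yourdomain.com/api/stats \
  -H "X-API-Key: your-api-key"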
```
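For automated monitoring, the backend check can be scripted. A minimal sketch using only the Python standard library, assuming the `/health` response shape shown above; `evaluate_health` and `fetch_health` are illustrative names, not part of Cortex.

```python
import json
import urllib.request


def evaluate_health(payload: dict) -> list[str]:
    """Return the problems found in a /health response; an empty list means healthy."""
    problems = []
    if payload.get("status") != "healthy":
        problems.append(f"status is {payload.get('status')!r}, expected 'healthy'")
    if payload.get("neo4j_connected") is not True:
        problems.append("backend reports Neo4j is not connected")
    return problems


def fetch_health(base_url: str, timeout: float = 5.0) -> dict:
    """GET {base_url}/health (no API key needed) and decode the JSON body.

    urlopen raises URLError/HTTPError if the backend is unreachable or returns an error status.
    """
    url = base_url.rstrip("/") + "/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A cron job or uptime monitor can then exit non-zero whenever `evaluate_health(fetch_health("https://api.yourdomain.com"))` returns a non-empty list.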

## Scaling

### Horizontal Scaling

Scale the backend for high traffic:

```yaml
services:
  backend:
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
```

### Neo4j Scaling

For large knowledge bases, increase Neo4j resources:

```yaml
services:
  neo4j:
    environment:
      - NEO4J_server_memory_heap_initial__size=4G
      - NEO4J_server_memory_heap_max__size=8G
      - NEO4J_server_memory_pagecache_size=4G
```

## Backup & Recovery

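### Backup Neo4j

`neo4j-admin database dump` only works on a stopped database, so Cortex cannot query the graph while the dump runs. With the Enterprise image, stop the database from the `system` database, dump it, copy the file to the host, and start it again:

```bash
# Take the database offline (run against the system database)
docker compose exec neo4j cypher-shell -u neo4j -p "$NEO4J_PASSWORD" -d system "STOP DATABASE neo4j"

# Dump to /backups inside the container, then copy it to the host
docker compose exec neo4j neo4j-admin database dump neo4j --to-path=/backups
docker cp $(docker compose ps -q neo4j):/backups/neo4j.dump ./backups/

# Bring the database back online
docker compose exec neo4j cypher-shell -u neo4j -p "$NEO4J_PASSWORD" -d system "START DATABASE neo4j"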
```

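### Restore Neo4j

`neo4j-admin database load` requires the database to be stopped; `--overwrite-destination=true` replaces the existing `neo4j` database with the dump:

```bash
docker cp ./backups/neo4j.dump $(docker compose ps -q neo4j):/backups/
docker compose exec neo4j cypher-shell -u neo4j -p "$NEO4J_PASSWORD" -d system "STOP DATABASE neo4j"
docker compose exec neo4j neo4j-admin database load neo4j --from-path=/backups --overwrite-destination=true
docker compose exec neo4j cypher-shell -u neo4j -p "$NEO4J_PASSWORD" -d system "START DATABASE neo4j"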
```

## Monitoring

### Logs

```bash
# All services
docker compose logs -f

# Specific service
docker compose logs -f backend
```

### Resource Usage

```bash
docker stats
```

## Security Checklist

- [ ] Strong `NEO4J_PASSWORD` set
- [ ] `ADMIN_API_KEY` changed from the example placeholder value
- [ ] `SESSION_SECRET` is 32+ characters
- [ ] HTTPS enabled with valid certificates
- [ ] API key authentication enabled (`REQUIRE_API_KEY=true`)
- [ ] CORS configured for your domains only
- [ ] Firewall blocks direct access to Neo4j (7687)
- [ ] Regular backups scheduled
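Some of these checks can be automated before running `docker compose up`. A sketch, assuming the secrets are available as environment variables in the shell you deploy from. The placeholder markers come from this guide's examples, and `preflight` is a hypothetical helper. Network-level items (HTTPS, CORS, firewall, backups) still need manual review.

```python
import os
import sys

REQUIRED = ("NEO4J_PASSWORD", "ADMIN_API_KEY", "SESSION_SECRET")
# Fragments of the placeholder values used in this guide; finding one means a value was never changed
PLACEHOLDER_MARKERS = ("your-", "your_")


def preflight(env: dict[str, str]) -> list[str]:
    """Check secret variables against the security checklist (hypothetical helper)."""
    failures = []
    for name in REQUIRED:
        value = env.get(name, "")
        if not value:
            failures.append(f"{name} is not set")
        elif any(marker in value for marker in PLACEHOLDER_MARKERS):
            failures.append(f"{name} still contains a placeholder value")
    secret = env.get("SESSION_SECRET", "")
    # Only report the length problem when the secret is set at all
    if secret and len(secret) < 32:
        failures.append("SESSION_SECRET is shorter than 32 characters")
    return failures


if __name__ == "__main__":
    problems = preflight(dict(os.environ))
    for problem in problems:
        print(f"FAIL: {problem}")
    sys.exit(1 if problems else 0)
```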


---

## Document: Data Import/Export

Export and import your entire Cortex including documents, knowledge graph, and embeddings

URL: /guides/data-transfer


# Data Import/Export

Cortex supports full instance migration through its library import/export feature. Export your entire knowledge base — documents, entities, relationships, communities, embeddings, and all graph data — as a portable ZIP archive, then import it into another instance without re-running the expensive knowledge graph generation pipeline.

## Overview

The import/export feature is accessible from **Settings > Data Management** and via the Admin API. It handles:

- All document files (PDFs, markdown, DOCX, etc.)
- Chunk data with vector embeddings
- Entity nodes with embeddings and metadata
- All entity-to-entity relationships (both per-chunk and cross-document)
- Communities and their memberships
- Collections and document assignments
- Merge history (deduplication audit trail)
- System metadata (staleness timestamps)

## Exporting

### Via the Web Interface

1. Navigate to **Settings > Data Management**
2. Review the stats summary showing your document, entity, and relationship counts
3. Click **Export Library**
4. Wait for the progress bar to complete
5. Click **Download Export** to save the ZIP file

### Via the API

```bash
# 1. Start the export task
curl -X POST http://localhost:8000/api/admin/export \
  -H "X-API-Key: your-admin-key"
```

Response:
```json
{
  "task_id": "task_abc123def456",
  "status": "pending",
  "message": "Export started"
}
```

```bash
# 2. Poll task progress
curl http://localhost:8000/api/tasks/task_abc123def456 \
  -H "X-API-Key: your-admin-key"
```

```bash
# 3. Download the completed export
curl -OJ http://localhost:8000/api/admin/export/task_abc123def456/download \
  -H "X-API-Key: your-admin-key"
```
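Step 2 is usually a loop. Below is a minimal, library-agnostic sketch of that loop; `poll_until` is a hypothetical helper. This guide only documents the `pending` status, so which statuses count as finished is an assumption you should check against real `/api/tasks/{task_id}` responses.

```python
import time
from typing import Any, Callable


def poll_until(
    fetch: Callable[[], dict[str, Any]],
    is_done: Callable[[dict[str, Any]], bool],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Call fetch() every `interval` seconds until is_done(result) or the timeout expires."""
    deadline = clock() + timeout
    while True:
        task = fetch()
        if is_done(task):
            return task
        if clock() >= deadline:
            raise TimeoutError(f"task not finished after {timeout} seconds: {task}")
        sleep(interval)
```

With `requests`, `fetch` could be `lambda: requests.get(f"http://localhost:8000/api/tasks/{task_id}", headers={"X-API-Key": key}, timeout=10).json()`, and `is_done` something like `lambda t: t.get("status") not in ("pending", "running")` (the `running` status is assumed). Passing `sleep` and `clock` as parameters keeps the loop testable without waiting.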

## Importing

### Import Modes

| Mode | Behavior | When to Use |
|------|----------|-------------|
| **Clean** (default) | Requires the target instance to be completely empty. Returns an error if any data exists. | Fresh instance setup, migration to a new server |
| **Replace** | Automatically deletes all existing data before importing. In the web interface, you must type `DELETE` to confirm. | Restoring a backup, overwriting test data |

### Via the Web Interface

1. Navigate to **Settings > Data Management**
2. Select your import mode (**Clean import** or **Replace all**)
3. Drag and drop or browse for your export ZIP file
4. For Replace mode: type `DELETE` in the confirmation field
5. Click **Import Library** (or **Replace & Import**)
6. Monitor the progress bar as data is restored
7. Review the result summary showing imported counts and any warnings

### Via the API

```bash
# Clean import (target must be empty)
curl -X POST "http://localhost:8000/api/admin/import?mode=clean" \
  -H "X-API-Key: your-admin-key" \
  -F "file=@cortex-export-2026-03-27.zip"

# Replace import (auto-wipes existing data)
curl -X POST "http://localhost:8000/api/admin/import?mode=replace" \
  -H "X-API-Key: your-admin-key" \
  -F "file=@cortex-export-2026-03-27.zip"
```

Response:
```json
{
  "task_id": "task_789xyz",
  "status": "pending",
  "message": "Import started (mode: clean)"
}
```

Poll the task status to monitor progress:
```bash
curl http://localhost:8000/api/tasks/task_789xyz \
  -H "X-API-Key: your-admin-key"
```

## Export Archive Format

The export is a ZIP64 archive using NDJSON (newline-delimited JSON) for efficient streaming:

```
cortex-export-YYYY-MM-DD.zip
├── manifest.json              # Version, date, embedding model/dimension, counts
├── documents.ndjson           # Document metadata and properties
├── chunks.ndjson              # Text chunks with vector embeddings
├── entities.ndjson            # Entity nodes with embeddings
├── relationships.ndjson       # Entity-to-entity relationships
├── communities.ndjson         # Detected communities
├── community_members.ndjson   # Community-entity memberships
├── collections.ndjson         # Document collections
├── collection_members.ndjson  # Collection-document assignments
├── chunk_mentions.ndjson      # Chunk-entity links
├── merge_history.ndjson       # Deduplication audit trail
├── system_meta.ndjson         # System timestamps
└── files/                     # Original document files
    ├── {doc-id}.pdf
    ├── {doc-id}.md
    └── ...
```

The `manifest.json` contains metadata for validation:

```json
{
  "version": "1.0",
  "export_date": "2026-03-27T14:18:32Z",
  "embedding_model": "openai/text-embedding-3-small",
  "embedding_dimension": 1536,
  "stats": {
    "document_count": 30,
    "chunk_count": 1250,
    "entity_count": 631,
    "relationship_count": 496,
    "community_count": 8,
    "collection_count": 1
  }
}
```
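Because the archive is a plain ZIP of NDJSON files, you can sanity-check an export before importing it. The sketch below compares each count in `manifest.json` with the number of lines in the matching NDJSON file. It assumes one record per line and that each stat maps to the file named in the layout above; `verify_export` is a hypothetical helper. Python's `zipfile` reads ZIP64 archives, and iterating over `ZipFile.open()` streams a member line by line instead of loading it whole.

```python
import json
import zipfile
from typing import IO, Any, Iterator, Union

# Manifest stat -> NDJSON member it should count (assumed one record per line)
STAT_FILES = {
    "document_count": "documents.ndjson",
    "chunk_count": "chunks.ndjson",
    "entity_count": "entities.ndjson",
    "relationship_count": "relationships.ndjson",
    "community_count": "communities.ndjson",
    "collection_count": "collections.ndjson",
}


def iter_ndjson(archive: zipfile.ZipFile, member: str) -> Iterator[dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of an archive member."""
    with archive.open(member) as fh:
        for raw in fh:
            line = raw.strip()
            if line:
                yield json.loads(line)


def verify_export(source: Union[str, IO[bytes]]) -> dict[str, tuple[int, int]]:
    """Return {stat: (expected, actual)} for every manifest count that does not match."""
    mismatches = {}
    with zipfile.ZipFile(source) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        for stat, member in STAT_FILES.items():
            expected = manifest.get("stats", {}).get(stat)
            if expected is None:
                continue  # stat not recorded in this manifest
            actual = sum(1 for _ in iter_ndjson(archive, member))
            if actual != expected:
                mismatches[stat] = (expected, actual)
    return mismatches
```

An empty result from `verify_export("cortex-export-2026-03-27.zip")` means every recorded count matches its file.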

## Embedding Compatibility

The export includes all vector embeddings to avoid costly recomputation. On import, the system checks the manifest against the target instance's embedding configuration:

- **Compatible** (same model and dimension): Embeddings are imported as-is. Vector search works immediately.
- **Incompatible** (different model or dimension): The import still proceeds, but returns a warning. Vector search results may be degraded until documents are reprocessed to regenerate embeddings.
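To predict which case applies before uploading, you can run the same comparison yourself. This sketch restates the rule described above; it is not Cortex's actual code. You supply the target model and dimension from your instance's embedding settings by hand, and `check_embedding_compat` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompatResult:
    compatible: bool
    warning: Optional[str] = None


def check_embedding_compat(manifest: dict, target_model: str, target_dimension: int) -> CompatResult:
    """Compatible only when both the embedding model and the dimension match."""
    src_model = manifest.get("embedding_model")
    src_dim = manifest.get("embedding_dimension")
    if src_model == target_model and src_dim == target_dimension:
        return CompatResult(compatible=True)
    return CompatResult(
        compatible=False,
        warning=(
            f"Export uses {src_model} ({src_dim} dims) but this instance uses "
            f"{target_model} ({target_dimension} dims); reprocess documents to "
            "regenerate embeddings."
        ),
    )
```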

## Important Notes

- **Concurrency**: Only one export or import can run at a time. Concurrent requests return HTTP 409.
- **API keys excluded**: API keys are instance-specific and are not included in the export.
- **File path remapping**: Document file paths are automatically remapped to the target instance's upload directory.
- **No merge mode**: Importing into an instance with existing data requires either resetting first (clean mode) or using replace mode. Merging two knowledge graphs is not supported.
- **Background tasks**: Both export and import run as background tasks with progress tracking via the standard `/api/tasks/{task_id}` endpoint.


---

## Document: Authentication

Secure your Cortex API with API key authentication

URL: /guides/authentication


# Authentication

Cortex uses API key authentication to secure endpoints. This guide covers how authentication works and how to manage API keys.

## Overview

Cortex has two authentication layers:

1. **Admin Login** - Email and password for the web interface
2. **API Keys** - Token-based authentication for API access

## Admin Login

The web interface requires admin credentials, which you set in `.env`:

```bash
# Set in .env
ADMIN_EMAIL=admin@example.com
ADMIN_PASSWORD=your-secure-password
SESSION_SECRET=at-least-32-characters-random-string
```

Login at `http://localhost:3000/login` with these credentials.

## API Key Authentication

All API endpoints (except `/health`) require an API key.

### Using API Keys

Include the key in the `X-API-Key` header:

```bash
curl "http://localhost:8000/api/documents" \
  -H "X-API-Key: your-api-key"
```

Or use the `Authorization` header:

```bash
curl "http://localhost:8000/api/documents" \
  -H "Authorization: Bearer your-api-key"
```
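Outside of `curl`, the same two header styles look like this with Python's standard library. A minimal sketch; `authed_request` is an illustrative helper that only builds the request. Send it with `urllib.request.urlopen`.

```python
import urllib.request


def authed_request(url: str, api_key: str, use_bearer: bool = False) -> urllib.request.Request:
    """Build a GET request carrying the API key in either supported header."""
    if use_bearer:
        headers = {"Authorization": f"Bearer {api_key}"}
    else:
        headers = {"X-API-Key": api_key}
    return urllib.request.Request(url, headers=headers)
```

HTTP header names are case-insensitive. Note that `urllib` stores them via `str.capitalize()`, so `X-API-Key` is read back with `get_header("X-api-key")`.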

## API Key Types

### Admin Key

The admin API key has full access to all endpoints:

```bash
ADMIN_API_KEY=cortex_admin_your-secure-key
```

Capabilities:
- All document operations (upload, delete, reprocess)
- Search and RAG queries
- Knowledge graph access
- Collection and community management
- API key management (create, revoke other keys)
- Background task management

### User Keys

User API keys have configurable permissions:

| Permission | Description |
|------------|-------------|
| `read` | Search, view documents, ask questions |
| `write` | Upload documents, create collections |
| `delete` | Delete documents and collections |
| `admin` | Full access including key management |

## Managing API Keys

### Create a New Key

```bash
curl -X POST "http://localhost:8000/api/admin/api-keys" \
  -H "X-API-Key: your-admin-key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production App",
    "permissions": ["read", "write"],
    "expires_at": "2025-12-31T23:59:59Z"
  }'
```

**Response:**

```json
{
  "id": "key_abc123",
  "name": "Production App",
  "key": "cortex_user_abc123xyz789",
  "permissions": ["read", "write"],
  "expires_at": "2025-12-31T23:59:59Z",
  "created_at": "2024-01-15T10:30:00Z",
  "message": "Store this key securely - it won't be shown again"
}
```

> **Warning:** The full API key is only shown once. Store it securely.

### List API Keys

```bash
curl "http://localhost:8000/api/admin/api-keys" \
  -H "X-API-Key: your-admin-key"
```

```json
{
  "keys": [
    {
      "id": "key_abc123",
      "name": "Production App",
      "key_prefix": "cortex_user_abc...",
      "permissions": ["read", "write"],
      "is_active": true,
      "expires_at": "2025-12-31T23:59:59Z",
      "last_used_at": "2024-01-15T14:22:00Z",
      "created_at": "2024-01-15T10:30:00Z"
    }
  ]
}
```

### Update a Key

```bash
curl -X PUT "http://localhost:8000/api/admin/api-keys/key_abc123" \
  -H "X-API-Key: your-admin-key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Updated Name",
    "permissions": ["read", "write", "delete"]
  }'
```

### Revoke a Key

```bash
curl -X POST "http://localhost:8000/api/admin/api-keys/key_abc123/revoke" \
  -H "X-API-Key: your-admin-key"
```

### Reactivate a Key

```bash
curl -X POST "http://localhost:8000/api/admin/api-keys/key_abc123/activate" \
  -H "X-API-Key: your-admin-key"
```

### Delete a Key

```bash
curl -X DELETE "http://localhost:8000/api/admin/api-keys/key_abc123" \
  -H "X-API-Key: your-admin-key"
```

## Permission Checks

API endpoints check permissions:

| Endpoint | Required Permission |
|----------|---------------------|
| `GET /api/documents` | `read` |
| `GET /api/search` | `read` |
| `POST /api/ask` | `read` |
| `POST /api/upload` | `write` |
| `POST /api/collections` | `write` |
| `DELETE /api/documents/*` | `delete` |
| `POST /api/admin/api-keys` | `admin` |

## Error Responses

### Missing API Key

```json
{
  "detail": "API key required. Provide X-API-Key header or api_key query parameter."
}
```
Status: `401 Unauthorized`

### Invalid API Key

```json
{
  "detail": "Invalid API key"
}
```
Status: `401 Unauthorized`

### Expired API Key

```json
{
  "detail": "API key has expired"
}
```
Status: `401 Unauthorized`

### Insufficient Permissions

```json
{
  "detail": "Permission 'delete' required for this operation"
}
```
Status: `403 Forbidden`

## Best Practices

### Key Rotation

Rotate API keys regularly:

1. Create a new key with the same permissions
2. Update your applications to use the new key
3. Revoke the old key after confirming the switch
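These three steps can be scripted against the key-management endpoints described earlier. The sketch keeps HTTP out of the logic. `post` is any function that sends a POST to the admin API and returns the parsed JSON, for example a thin wrapper around `requests`. `deploy` is your own hypothetical hook that rolls the new key out to applications. Both names are assumptions; the endpoints and the `key` response field come from this guide.

```python
from typing import Any, Callable, Optional

# post(path, json_body) -> parsed JSON response; wrap whatever HTTP client you use
PostFn = Callable[[str, Optional[dict]], dict]
DeployFn = Callable[[str], None]


def rotate_key(post: PostFn, old_key_id: str, name: str,
               permissions: list[str], deploy: DeployFn) -> dict[str, Any]:
    """Create a replacement key, roll it out via `deploy`, then revoke the old key."""
    # Step 1: create the new key with the same permissions
    created = post("/api/admin/api-keys", {"name": name, "permissions": permissions})
    # Step 2: hand the plaintext key (shown only once) to the rollout hook.
    # If deploy() raises, we stop here and the old key stays active.
    deploy(created["key"])
    # Step 3: revoke the old key only after the switch succeeded
    post(f"/api/admin/api-keys/{old_key_id}/revoke", None)
    return created
```

With `requests`, `post` could be `lambda path, body: requests.post(BASE_URL + path, json=body, headers={"X-API-Key": ADMIN_KEY}, timeout=10).json()`.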

### Least Privilege

Grant only necessary permissions:

- Read-only key for search and Q&A: `{"permissions": ["read"]}`
- Writer key for document management: `{"permissions": ["read", "write"]}`
- Full access (use sparingly): `{"permissions": ["read", "write", "delete", "admin"]}`

### Key Expiration

Set expiration dates for temporary access:

```json
{
  "name": "Contractor Access",
  "permissions": ["read"],
  "expires_at": "2024-03-31T23:59:59Z"
}
```

### Secure Storage

- Never commit API keys to version control
- Use environment variables or secret managers
- Rotate keys if they may have been exposed

### Audit Usage

Monitor API key usage:

```bash
curl "http://localhost:8000/api/admin/api-keys" \
  -H "X-API-Key: your-admin-key"
```

Check `last_used_at` to identify unused or suspicious keys.
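That review can be automated over the `keys` array from the list endpoint. The sketch below groups active keys into never used, idle, and expiring soon (already expired keys fall into the last group). The 30-day and 14-day thresholds are arbitrary defaults, not Cortex settings, and `audit_keys` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse timestamps such as '2024-01-15T14:22:00Z'; None or '' gives None."""
    if not value:
        return None
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def audit_keys(keys: list[dict[str, Any]], now: Optional[datetime] = None,
               stale_after: timedelta = timedelta(days=30),
               expiring_within: timedelta = timedelta(days=14)) -> dict[str, list[str]]:
    """Flag active keys that were never used, are idle, or expire soon."""
    now = now or datetime.now(timezone.utc)
    report: dict[str, list[str]] = {"never_used": [], "stale": [], "expiring_soon": []}
    for key in keys:
        if not key.get("is_active", False):
            continue  # revoked keys cannot authenticate, so skip them
        label = f"{key['name']} ({key['id']})"
        last_used = parse_ts(key.get("last_used_at"))
        if last_used is None:
            report["never_used"].append(label)
        elif now - last_used > stale_after:
            report["stale"].append(label)
        expires = parse_ts(key.get("expires_at"))
        if expires is not None and expires - now <= expiring_within:
            report["expiring_soon"].append(label)
    return report
```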


---

## Document: Python Examples

Complete Python code examples for the Cortex API

URL: /examples/python


# Python Examples

Complete Python examples for interacting with the Cortex API, including a reusable client class and common operations.

## Setup

Install the required library:

```bash
pip install requests
```

## Cortex Client Class

A reusable client wrapper for all API operations:

```python
"""Cortex Python Client"""

import requests
from typing import Optional, Dict, Any, List, Iterator
from pathlib import Path


class CortexClient:
    """Client for the Cortex API."""
    
    def __init__(self, base_url: str, api_key: str):
        """
        Initialize the Cortex client.
        
        Args:
            base_url: The API base URL (e.g., "http://localhost:8000")
            api_key: Your Cortex API key
        """
        self.base_url = base_url.rstrip('/')
        self.session = requests.Session()
        self.session.headers.update({
            'X-API-Key': api_key,
            'Content-Type': 'application/json'
        })
    
    def _request(
        self, 
        method: str, 
        path: str, 
        **kwargs
    ) -> Dict[str, Any]:
        """Make an API request."""
        url = f"{self.base_url}{path}"
        response = self.session.request(method, url, **kwargs)
        response.raise_for_status()
        return response.json()
    
    # =========================================================================
    # Health & Stats
    # =========================================================================
    
    def health(self) -> Dict[str, Any]:
        """Check API health."""
        return self._request('GET', '/health')
    
    def stats(self) -> Dict[str, Any]:
        """Get knowledge base statistics."""
        return self._request('GET', '/api/stats')
    
    # =========================================================================
    # Documents
    # =========================================================================
    
    def upload_document(
        self, 
        file_path: str, 
        collection_id: Optional[str] = None,
        start_processing: bool = True
    ) -> Dict[str, Any]:
        """
        Upload a document to the knowledge base.
        
        Args:
            file_path: Path to the file to upload
            collection_id: Optional collection to add the document to
            start_processing: Whether to start processing immediately
        """
        path = Path(file_path)
        with open(path, 'rb') as f:
            files = {'file': (path.name, f, 'application/octet-stream')}
            # collection_id and start_processing are query parameters, not form data
            params = {'start_processing': str(start_processing).lower()}
            if collection_id:
                params['collection_id'] = collection_id
            
            # Send only the API key header: requests sets the multipart Content-Type (with boundary) itself
            headers = {'X-API-Key': self.session.headers['X-API-Key']}
            response = requests.post(
                f"{self.base_url}/api/upload",
                files=files,
                params=params,  # Use params for query parameters
                headers=headers
            )
            response.raise_for_status()
            return response.json()
    
    def list_documents(
        self, 
        collection_id: Optional[str] = None,
        status: Optional[str] = None,
        limit: int = 100
    ) -> List[Dict[str, Any]]:
        """List documents in the knowledge base."""
        params = {'limit': limit}
        if collection_id:
            params['collection_id'] = collection_id
        if status:
            params['status'] = status
        return self._request('GET', '/api/documents', params=params)
    
    def get_document(self, doc_id: str) -> Dict[str, Any]:
        """Get document details."""
        return self._request('GET', f'/api/documents/{doc_id}')
    
    def delete_document(self, doc_id: str) -> Dict[str, Any]:
        """Delete a document."""
        return self._request('DELETE', f'/api/documents/{doc_id}')
    
    def reprocess_document(self, doc_id: str) -> Dict[str, Any]:
        """Reprocess a document."""
        return self._request('POST', f'/api/documents/{doc_id}/reprocess')
    
    # =========================================================================
    # Search
    # =========================================================================
    
    def search(
        self, 
        query: str, 
        limit: int = 10,
        collection_id: Optional[str] = None,
        search_type: str = "hybrid"
    ) -> Dict[str, Any]:
        """
        Search the knowledge base.
        
        Args:
            query: Search query
            limit: Maximum results to return
            collection_id: Optional collection to search within
            search_type: "hybrid", "vector", "keyword", or "graph"
        """
        payload = {
            'query': query,
            'limit': limit,
            'search_type': search_type
        }
        if collection_id:
            payload['collection_id'] = collection_id
        return self._request('POST', '/api/search', json=payload)
    
    # =========================================================================
    # Ask AI (RAG)
    # =========================================================================
    
    def ask(
        self,
        question: str,
        collection_id: Optional[str] = None,
        conversation_history: Optional[List[Dict]] = None,
        use_fast_search: bool = False,
        use_agentic: bool = False
    ) -> Dict[str, Any]:
        """
        Ask a question using RAG.

        Args:
            question: The question to ask
            collection_id: Optional collection to query
            conversation_history: Previous messages for context
            use_fast_search: Use fast vector-only search
            use_agentic: Use deep research mode
        """
        payload = {'question': question}
        if collection_id:
            payload['collection_id'] = collection_id
        if conversation_history:
            payload['conversation_history'] = conversation_history
        if use_fast_search:
            payload['use_fast_search'] = True
        if use_agentic:
            payload['use_agentic'] = True
        return self._request('POST', '/api/ask', json=payload)

    def ask_streaming(
        self,
        question: str,
        collection_id: Optional[str] = None,
        use_agentic: bool = False,
        use_fast_search: bool = False
    ) -> Iterator[Dict[str, Any]]:
        """
        Ask a question with streaming SSE response.

        Yields parsed event dicts as they arrive.
        """
        import json
        payload = {'question': question}
        if collection_id:
            payload['collection_id'] = collection_id
        if use_agentic:
            payload['use_agentic'] = True
        if use_fast_search:
            payload['use_fast_search'] = True

        response = self.session.post(
            f"{self.base_url}/api/ask/stream",
            json=payload,
            stream=True
        )
        response.raise_for_status()

        for line in response.iter_lines(decode_unicode=True):
            if line and line.startswith("data: "):
                yield json.loads(line[6:])
    
    # =========================================================================
    # Collections
    # =========================================================================
    
    def create_collection(
        self, 
        name: str, 
        description: Optional[str] = None
    ) -> Dict[str, Any]:
        """Create a new collection."""
        payload = {'name': name}
        if description:
            payload['description'] = description
        return self._request('POST', '/api/collections', json=payload)
    
    def list_collections(self) -> List[Dict[str, Any]]:
        """List all collections."""
        return self._request('GET', '/api/collections')
    
    def delete_collection(self, collection_id: str) -> Dict[str, Any]:
        """Delete a collection."""
        return self._request('DELETE', f'/api/collections/{collection_id}')
    
    # =========================================================================
    # Knowledge Graph
    # =========================================================================
    
    def get_graph_visualization(
        self, 
        collection_id: Optional[str] = None
    ) -> Dict[str, Any]:
        """Get graph visualization data."""
        params = {}
        if collection_id:
            params['collection_id'] = collection_id
        return self._request('GET', '/api/graph/visualization', params=params)
    
    def search_entities(
        self,
        search: str,
        entity_type: Optional[str] = None,
        limit: int = 20
    ) -> List[Dict[str, Any]]:
        """Search for entities in the knowledge graph."""
        params = {'search': search, 'limit': limit}
        if entity_type:
            params['type'] = entity_type
        return self._request('GET', '/api/graph/entities', params=params)

    def find_duplicate_entities(
        self,
        threshold: float = 0.85,
        limit: int = 50
    ) -> Dict[str, Any]:
        """
        Find duplicate entity candidates.

        Args:
            threshold: Similarity threshold (0.5 to 1.0)
            limit: Maximum number of duplicate groups to return
        """
        params = {'threshold': threshold, 'limit': limit}
        return self._request('GET', '/api/entities/duplicates', params=params)

    def merge_entities(
        self,
        canonical: str,
        merge: List[str]
    ) -> Dict[str, Any]:
        """
        Merge duplicate entities into a canonical entity.

        Args:
            canonical: Name of the entity to keep
            merge: List of entity names to merge into canonical
        """
        payload = {'canonical': canonical, 'merge': merge}
        return self._request('POST', '/api/entities/merge', json=payload)

    def get_merge_history(self, limit: int = 50) -> Dict[str, Any]:
        """Get entity merge history."""
        return self._request('GET', '/api/entities/merge-history', params={'limit': limit})
```

## Usage Examples

### Basic Setup

```python
# Initialize the client
client = CortexClient(
    base_url="http://localhost:8000",
    api_key="your-api-key"
)

# Check health
health = client.health()
print(f"Status: {health['status']}")
```

### Upload Documents

```python
# Upload a single document
result = client.upload_document(
    "research_paper.pdf",
    collection_id="research"
)
print(f"Uploaded: {result['doc_id']}")

# Upload multiple documents
import glob

for file_path in glob.glob("documents/*.pdf"):
    result = client.upload_document(file_path)
    print(f"Uploaded: {result['filename']}")
```

### Search

```python
# Basic search
results = client.search("machine learning")
for r in results['results']:
    print(f"- {r['document_title']}: {r['content'][:100]}...")

# Search in a specific collection
results = client.search(
    query="neural networks",
    collection_id="research",
    limit=5
)
```

### Ask Questions

```python
# Simple question
response = client.ask("What are the main findings?")
print(response['answer'])

# Print sources
for source in response.get('sources', []):
    print(f"  Source: {source['document_title']}")

# Multi-turn conversation
history = []
question = "What is GraphRAG?"
response = client.ask(question, conversation_history=history)
history.append({"role": "user", "content": question})
history.append({"role": "assistant", "content": response['answer']})

# Follow-up question
response = client.ask(
    "How does it compare to traditional RAG?",
    conversation_history=history
)
```

### Streaming Responses

```python
# Each event is a parsed JSON dict; its fields depend on the event type
for event in client.ask_streaming("Explain knowledge graphs"):
    print(event)
```

### Entity Deduplication

```python
# Find potential duplicate entities
duplicates = client.find_duplicate_entities(threshold=0.85)
print(f"Found {duplicates['total_groups']} duplicate groups")

for group in duplicates['groups']:
    canonical = group['canonical']
    dupes = [d['name'] for d in group['duplicates']]
    print(f"  {canonical} <- {dupes}")

# Merge confirmed duplicates
result = client.merge_entities(
    canonical="Machine Learning",
    merge=["machine learning", "ML"]
)
print(f"Merged {len(result['merged'])} entities")
print(f"  Relationships transferred: {result['relationships_transferred']}")
print(f"  Mentions transferred: {result['mentions_transferred']}")

# Review merge history
history = client.get_merge_history(limit=10)
for entry in history['entries']:
    print(f"  {entry['merged_at']}: {entry['merged']} -> {entry['canonical']}")
```

### Error Handling

```python
from requests.exceptions import HTTPError

try:
    result = client.search("query")
except HTTPError as e:
    if e.response.status_code == 401:
        print("Invalid API key")
    elif e.response.status_code == 429:
        print("Rate limited, please wait")
    elif e.response.status_code == 403:
        print("Insufficient permissions")
    else:
        print(f"Error: {e}")
```
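For `429` responses, and the `409` returned while an import or export is already running, retrying with exponential backoff is often better than failing outright. A generic sketch: `with_retries` is a hypothetical helper, and the backoff schedule is an arbitrary choice, not a documented Cortex limit.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], should_retry: Callable[[Exception], bool],
                 attempts: int = 5, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(); after a retryable error wait base_delay * 2**n seconds and retry."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:
            # Re-raise errors we should not retry, and the final failure
            if not should_retry(exc) or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable: the loop always returns or raises")
```

With the client above: `with_retries(lambda: client.search("query"), lambda e: isinstance(e, HTTPError) and e.response is not None and e.response.status_code in (409, 429))`.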

## Complete Example Script

```python
#!/usr/bin/env python3
"""Example: Upload documents and ask questions."""

from cortex_client import CortexClient  # the CortexClient class above, saved as cortex_client.py
import sys

def main():
    # Initialize client
    client = CortexClient(
        base_url="http://localhost:8000",
        api_key="your-api-key"
    )
    
    # Check health
    health = client.health()
    if health['status'] != 'healthy':
        print("API not healthy!")
        sys.exit(1)
    
    # Get stats
    stats = client.stats()
    print(f"Documents: {stats['document_count']}")
    print(f"Entities: {stats['entity_count']}")
    
    # Search
    results = client.search("important topic", limit=5)
    print(f"\nFound {len(results['results'])} results")
    
    # Ask a question
    response = client.ask("Summarize the key points")
    print(f"\nAnswer: {response['answer']}")

if __name__ == "__main__":
    main()
```


---

## Document: Integration Examples

Integrate Cortex with popular frameworks, tools, and platforms

URL: /examples/integration


# Integration Examples

Examples for integrating Cortex with popular frameworks, AI tools, and automation platforms.

## Next.js / React Integration

### API Route Handler

Create a backend route to proxy requests to Cortex:

```typescript
// app/api/search/route.ts
import { NextRequest, NextResponse } from 'next/server';

const CORTEX_URL = process.env.CORTEX_API_URL!;
const CORTEX_KEY = process.env.CORTEX_API_KEY!;

export async function POST(request: NextRequest) {
  try {
    const { query, limit = 10 } = await request.json();
    
    const response = await fetch(`${CORTEX_URL}/api/search`, {
      method: 'POST',
      headers: {
        'X-API-Key': CORTEX_KEY,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ query, limit }),
    });
    
    if (!response.ok) {
      throw new Error(`Cortex API error: ${response.status}`);
    }
    
    const data = await response.json();
    return NextResponse.json(data);
  } catch (error) {
    console.error('Search error:', error);
    return NextResponse.json(
      { error: 'Search failed' },
      { status: 500 }
    );
  }
}
```

### React Hook

```typescript
// hooks/useCortex.tsx (.tsx because the usage example below contains JSX)
'use client';

import { useState, useCallback } from 'react';

interface SearchResult {
  id: string;
  content: string;
  score: number;
  document_title: string;
}

interface SearchResponse {
  results: SearchResult[];
  total: number;
}

export function useCortexSearch() {
  const [results, setResults] = useState<SearchResult[]>([]);
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState<string | null>(null);

  const search = useCallback(async (query: string) => {
    setLoading(true);
    setError(null);
    
    try {
      const response = await fetch('/api/search', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ query }),
      });
      
      if (!response.ok) throw new Error('Search failed');
      
      const data: SearchResponse = await response.json();
      setResults(data.results);
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Unknown error');
    } finally {
      setLoading(false);
    }
  }, []);

  return { search, results, loading, error };
}

// Usage in component
function SearchComponent() {
  const { search, results, loading, error } = useCortexSearch();
  const [query, setQuery] = useState('');

  return (
    <div>
      <input
        value={query}
        onChange={(e) => setQuery(e.target.value)}
        placeholder="Search..."
      />
      <button onClick={() => search(query)} disabled={loading}>
        {loading ? 'Searching...' : 'Search'}
      </button>
      
      {error && <p className="error">{error}</p>}
      
      <ul>
        {results.map((r) => (
          <li key={r.id}>
            <strong>{r.document_title}</strong>
            <p>{r.content.slice(0, 200)}...</p>
          </li>
        ))}
      </ul>
    </div>
  );
}
```

### Streaming Chat Component

```typescript
// components/ChatWithCortex.tsx
'use client';

import { useState, useRef } from 'react';

export function ChatWithCortex() {
  const [messages, setMessages] = useState<Array<{role: string, content: string}>>([]);
  const [input, setInput] = useState('');
  const [streaming, setStreaming] = useState(false);

  const askQuestion = async () => {
    if (!input.trim() || streaming) return;
    
    const question = input;
    setInput('');
    setMessages(prev => [...prev, { role: 'user', content: question }]);
    setStreaming(true);
    
    try {
      // Assumes an /api/ask route (like the search route above) that proxies to
      // Cortex and streams the answer back as plain text chunks.
      const response = await fetch('/api/ask', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ 
          query: question,
          conversation_history: messages 
        }),
      });
      
      if (!response.ok || !response.body) {
        throw new Error(`Ask failed: ${response.status}`);
      }
      
      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let answer = '';
      
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        
        // stream: true keeps multi-byte characters that span chunks intact
        answer += decoder.decode(value, { stream: true });
        const current = answer;
        
        // Replace (rather than mutate) the assistant message so React re-renders
        setMessages(prev => {
          const last = prev[prev.length - 1];
          if (last?.role === 'assistant') {
            return [...prev.slice(0, -1), { ...last, content: current }];
          }
          return [...prev, { role: 'assistant', content: current }];
        });
      }
    } catch (error) {
      console.error('Ask error:', error);
    } finally {
      setStreaming(false);
    }
  };

  return (
    <div className="chat-container">
      <div className="messages">
        {messages.map((msg, i) => (
          <div key={i} className={`message ${msg.role}`}>
            {msg.content}
          </div>
        ))}
      </div>
      <input
        value={input}
        onChange={(e) => setInput(e.target.value)}
        onKeyDown={(e) => e.key === 'Enter' && askQuestion()}
        placeholder="Ask a question..."
        disabled={streaming}
      />
    </div>
  );
}
```

---

## LangChain Integration

### Custom Retriever

```python
from typing import List

import requests
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever


class CortexRetriever(BaseRetriever):
    """LangChain retriever that uses Cortex for search."""
    
    base_url: str
    api_key: str
    collection_id: str = "default"
    k: int = 5
    
    def _get_relevant_documents(self, query: str) -> List[Document]:
        """Retrieve relevant documents from Cortex."""
        response = requests.post(
            f"{self.base_url}/api/search",
            headers={"X-API-Key": self.api_key},
            json={
                "query": query,
                "limit": self.k,
                "collection_id": self.collection_id
            },
            timeout=30,
        )
        response.raise_for_status()
        
        results = response.json()["results"]
        return [
            Document(
                page_content=r["content"],
                metadata={
                    "source": r["document_title"],
                    "doc_id": r["document_id"],
                    "score": r["score"]
                }
            )
            for r in results
        ]


# Usage with LangChain
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

retriever = CortexRetriever(
    base_url="http://localhost:8000",
    api_key="your-key",
    collection_id="research",
    k=5
)

llm = ChatOpenAI(model="gpt-4o-mini")
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)

result = qa_chain.invoke({"query": "What are the key findings?"})
print(result["result"])
```

### LangChain Tool

```python
from langchain.tools import Tool
import requests


def cortex_search(query: str) -> str:
    """Search the Cortex knowledge base."""
    response = requests.post(
        "http://localhost:8000/api/search",
        headers={"X-API-Key": "your-key"},
        json={"query": query, "limit": 5},
        timeout=30,
    )
    response.raise_for_status()
    results = response.json()["results"]
    return "\n\n".join([
        f"**{r['document_title']}**: {r['content'][:500]}"
        for r in results
    ])


cortex_tool = Tool(
    name="Cortex Search",
    func=cortex_search,
    description="Search the company knowledge base for information about products, processes, and documentation."
)

# Use with an agent
from langchain.agents import initialize_agent, AgentType
from langchain_openai import ChatOpenAI

agent = initialize_agent(
    tools=[cortex_tool],
    llm=ChatOpenAI(model="gpt-4o-mini"),
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION
)

response = agent.invoke({"input": "What does our documentation say about authentication?"})
print(response["output"])
```

---

## Slack Bot Integration

```python
from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler
import requests
import os

app = App(token=os.environ["SLACK_BOT_TOKEN"])

CORTEX_URL = os.environ["CORTEX_API_URL"]
CORTEX_KEY = os.environ["CORTEX_API_KEY"]


@app.command("/ask")
def handle_ask(ack, respond, command):
    """Handle /ask slash command."""
    ack()
    
    question = command["text"]
    if not question:
        respond("Please provide a question: `/ask What is X?`")
        return
    
    respond(f"Looking up: _{question}_...")
    
    try:
        response = requests.post(
            f"{CORTEX_URL}/api/ask",
            headers={"X-API-Key": CORTEX_KEY},
            json={"query": question}
        )
        response.raise_for_status()
        
        data = response.json()
        answer = data["answer"]
        sources = data.get("sources", [])
        
        # Format response
        blocks = [
            {
                "type": "section",
                "text": {"type": "mrkdwn", "text": f"*Answer:*\n{answer}"}
            }
        ]
        
        if sources:
            source_text = "\n".join([
                f"• {s['document_title']}"
                for s in sources[:3]
            ])
            blocks.append({
                "type": "context",
                "elements": [
                    {"type": "mrkdwn", "text": f"*Sources:*\n{source_text}"}
                ]
            })
        
        respond(blocks=blocks)
        
    except Exception as e:
        respond(f"Error: {str(e)}")


@app.command("/search")
def handle_search(ack, respond, command):
    """Handle /search slash command."""
    ack()
    
    query = command["text"]
    if not query:
        respond("Please provide a search query: `/search topic`")
        return
    
    try:
        response = requests.post(
            f"{CORTEX_URL}/api/search",
            headers={"X-API-Key": CORTEX_KEY},
            json={"query": query, "limit": 5}
        )
        response.raise_for_status()
        
        results = response.json()["results"]
        
        if not results:
            respond("No results found.")
            return
        
        blocks = [
            {
                "type": "section",
                "text": {"type": "mrkdwn", "text": f"*Search results for:* _{query}_"}
            }
        ]
        
        for r in results:
            blocks.append({
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": f"*{r['document_title']}*\n{r['content'][:200]}..."
                }
            })
        
        respond(blocks=blocks)
        
    except Exception as e:
        respond(f"Error: {str(e)}")


if __name__ == "__main__":
    handler = SocketModeHandler(
        app, 
        os.environ["SLACK_APP_TOKEN"]
    )
    handler.start()
```

---

## Webhook Integration

### Configure Webhooks

Set a webhook URL in your `.env` to receive event notifications, such as when a document finishes or fails processing:

```bash
WEBHOOK_URL=https://your-app.com/webhooks/cortex
```

### Webhook Handler (Flask)

```python
from flask import Flask, request, jsonify
import hmac
import hashlib

app = Flask(__name__)
WEBHOOK_SECRET = "your-webhook-secret"


def verify_signature(payload: bytes, signature: str) -> bool:
    """Verify webhook signature."""
    expected = hmac.new(
        WEBHOOK_SECRET.encode(),
        payload,
        hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(f"sha256={expected}", signature)


@app.route('/webhooks/cortex', methods=['POST'])
def handle_webhook():
    """Handle Cortex webhooks."""
    # Verify signature
    signature = request.headers.get('X-Cortex-Signature', '')
    if not verify_signature(request.data, signature):
        return jsonify({"error": "Invalid signature"}), 401
    
    event = request.json
    event_type = event.get('type')
    
    if event_type == 'document.processed':
        doc_id = event['document_id']
        doc_title = event['document_title']
        print(f"Document processed: {doc_title} ({doc_id})")
        # Trigger your workflow
        
    elif event_type == 'document.failed':
        doc_id = event['document_id']
        error = event.get('error', 'Unknown error')
        print(f"Document failed: {doc_id} - {error}")
        # Send alert
        
    elif event_type == 'community.detected':
        count = event['community_count']
        print(f"Communities detected: {count}")
        # Update dashboard
    
    return jsonify({"status": "ok"})


if __name__ == "__main__":
    app.run(port=5000)
```
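To exercise the handler locally without waiting for a real event, you can sign a sample payload yourself. This is a minimal sketch that assumes Cortex signs the raw request body with HMAC-SHA256 and sends it as `sha256=<hex digest>` in `X-Cortex-Signature`, which is the format `verify_signature` above expects. `sign_payload` and the sample event are illustrative, not part of Cortex.

```python
import hashlib
import hmac
import json


def sign_payload(secret: str, payload: bytes) -> str:
    """Return a signature in the `sha256=<hex>` format verify_signature() checks.

    Assumption: Cortex uses HMAC-SHA256 over the raw body with this prefix.
    """
    digest = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


# Hypothetical sample event mirroring the fields the handler reads.
event = {
    "type": "document.processed",
    "document_id": "doc_abc123",
    "document_title": "Quarterly Report",
}
body = json.dumps(event).encode()
signature = sign_payload("your-webhook-secret", body)

# POST `body` to http://localhost:5000/webhooks/cortex with the headers
# Content-Type: application/json and X-Cortex-Signature: <signature>.
print(signature)
```

Send exactly the bytes you signed: re-serializing the JSON with different spacing or key order changes the digest and the handler will return 401.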

---

## n8n / Make.com

### HTTP Request Node

Configure an HTTP Request node in n8n:

**Method:** POST  
**URL:** `{{$env.CORTEX_URL}}/api/ask`  
**Headers:**
```json
{
  "X-API-Key": "{{$env.CORTEX_API_KEY}}",
  "Content-Type": "application/json"
}
```
**Body** (assuming the preceding Webhook node receives a JSON body with a `question` field):
```json
{
  "query": "{{ $json.body.question }}"
}
```

### Example n8n Workflow

1. **Webhook Trigger** - Receive incoming questions
2. **HTTP Request** - Call Cortex API
3. **Set Node** - Format the response
4. **Respond to Webhook** - Return the answer

---

## Zapier Integration

Use Webhooks by Zapier to integrate with Cortex:

1. **Trigger:** New row in Google Sheets
2. **Action:** Webhooks by Zapier → POST
3. **URL:** `https://api.your-domain.com/api/search`
4. **Payload Type:** `json` (the default form encoding will not produce a JSON body)
5. **Data:** key `query`, mapped to the Google Sheets column that holds the search text
6. **Headers:** `X-API-Key: your-key`


---

## Document: cURL Examples

Command-line examples using cURL for all Cortex API operations

URL: /examples/curl


# cURL Examples

cURL examples for the major Cortex API operations. Once the variables in Setup are exported, you can copy and paste them as-is.

## Setup

Set your API configuration as environment variables:

```bash
export CORTEX_URL="http://localhost:8000"
export CORTEX_API_KEY="your-api-key"
```

## Health & Statistics

### Health Check

```bash
curl "$CORTEX_URL/health"
```
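Right after `docker compose up`, the API may take a while to start accepting requests, so scripts that run immediately afterwards can race it. The sketch below polls `/health` using only the standard library and treats `{"status": "healthy"}` as ready, matching the check in the Python client example; the timeout and interval defaults are arbitrary choices, and `wait_until_healthy` is an illustrative helper.

```python
import json
import time
import urllib.request


def wait_until_healthy(base_url: str, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll GET /health until it reports healthy or the timeout elapses.

    Assumption: /health returns JSON with "status": "healthy" once ready.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if json.load(resp).get("status") == "healthy":
                    return True
        except (OSError, ValueError):
            # Not up yet: connection refused, HTTP error, or a non-JSON body.
            pass
        time.sleep(interval)
    return False


if __name__ == "__main__":
    ok = wait_until_healthy("http://localhost:8000")
    print("ready" if ok else "timed out")
```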

### Get Statistics

```bash
curl "$CORTEX_URL/api/stats" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

---

## Documents

### Upload a Document

```bash
# collection_id and start_processing are query parameters
curl -X POST "$CORTEX_URL/api/upload?collection_id=default&start_processing=true" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -F "file=@document.pdf"
```

### Upload Without Processing (for bulk)

```bash
# start_processing is a query parameter
curl -X POST "$CORTEX_URL/api/upload?start_processing=false" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -F "file=@document.pdf"
```

### Process Pending Documents

```bash
curl -X POST "$CORTEX_URL/api/documents/process-pending" \
  -H "X-API-Key: $CORTEX_API_KEY"
```
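For bulk ingestion, the two calls above combine naturally: upload every file with `start_processing=false`, then trigger a single `process-pending` run. A minimal sketch, assuming a `requests`-style session (any object whose `post(url, **kwargs)` returns a response with `raise_for_status()`); `bulk_upload` is a hypothetical helper, not part of a Cortex SDK.

```python
from pathlib import Path
from typing import Iterable, List


def bulk_upload(session, base_url: str, api_key: str, paths: Iterable[str],
                collection_id: str = "default") -> List[str]:
    """Upload files without processing, then start one processing run.

    Hypothetical helper; `session` is assumed to behave like requests.Session.
    """
    headers = {"X-API-Key": api_key}
    uploaded = []
    for path in paths:
        with open(path, "rb") as fh:
            resp = session.post(
                f"{base_url}/api/upload",
                headers=headers,
                # Same query parameters as the cURL upload examples.
                params={"collection_id": collection_id, "start_processing": "false"},
                files={"file": (Path(path).name, fh)},
            )
        resp.raise_for_status()
        uploaded.append(path)

    # One call queues everything that is still pending.
    session.post(f"{base_url}/api/documents/process-pending",
                 headers=headers).raise_for_status()
    return uploaded
```

Usage with `requests`: `bulk_upload(requests.Session(), "http://localhost:8000", "your-api-key", ["a.pdf", "b.pdf"])`.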

### List Documents

```bash
curl "$CORTEX_URL/api/documents" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

### List Documents in a Collection

```bash
curl "$CORTEX_URL/api/documents?collection_id=research" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

### Get Document Details

```bash
curl "$CORTEX_URL/api/documents/doc_abc123" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

### Delete a Document

```bash
curl -X DELETE "$CORTEX_URL/api/documents/doc_abc123" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

### Reprocess a Document

```bash
curl -X POST "$CORTEX_URL/api/documents/doc_abc123/reprocess" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

### Bulk Delete

```bash
curl -X POST "$CORTEX_URL/api/documents/delete" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"document_ids": ["doc_1", "doc_2", "doc_3"]}'
```

---

## Custom Inputs

### Add Q&A Pair

```bash
curl -X POST "$CORTEX_URL/api/custom-input" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "qa",
    "title": "FAQ Entry",
    "question": "What is Cortex?",
    "answer": "Cortex is an agentic knowledge base."
  }'
```

### Add Plain Text

```bash
curl -X POST "$CORTEX_URL/api/custom-input" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "text",
    "title": "Company Info",
    "content": "Cortex is a powerful knowledge management system..."
  }'
```

### Add Markdown

```bash
curl -X POST "$CORTEX_URL/api/custom-input" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "markdown",
    "title": "Technical Notes",
    "content": "# Overview\n\n## Features\n\n- Feature 1\n- Feature 2"
  }'
```

---

## Search

### Basic Search

```bash
curl -X POST "$CORTEX_URL/api/search" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "machine learning", "limit": 10}'
```

### Hybrid Search

```bash
curl -X POST "$CORTEX_URL/api/search" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "neural networks",
    "limit": 20,
    "search_type": "hybrid"
  }'
```

### Search in Collection

```bash
curl -X POST "$CORTEX_URL/api/search" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "transformer architecture",
    "collection_id": "research",
    "limit": 10
  }'
```

### Vector-Only Search

```bash
curl -X POST "$CORTEX_URL/api/search" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "semantic similarity concepts",
    "search_type": "vector"
  }'
```
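The same search calls can be made from Python with only the standard library. The sketch below builds the request without sending it, so the payload can be inspected first. It mirrors the fields used in the cURL examples (`query`, `limit`, `collection_id`, `search_type`) and omits any field left as `None` so the server applies its own defaults; `build_search_request` is an illustrative name.

```python
import json
import urllib.request
from typing import Optional


def build_search_request(base_url: str, api_key: str, query: str,
                         limit: Optional[int] = None,
                         collection_id: Optional[str] = None,
                         search_type: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /api/search request.

    Hypothetical helper; field names mirror the cURL examples above.
    """
    body = {"query": query, "limit": limit,
            "collection_id": collection_id, "search_type": search_type}
    # Drop unset fields so server-side defaults apply.
    body = {k: v for k, v in body.items() if v is not None}
    return urllib.request.Request(
        f"{base_url}/api/search",
        data=json.dumps(body).encode(),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("http://localhost:8000", "your-api-key",
                           "neural networks", limit=20, search_type="hybrid")
# To send it:
# with urllib.request.urlopen(req) as resp:
#     results = json.load(resp)["results"]
```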

---

## Ask AI (RAG)

### Ask a Question

```bash
curl -X POST "$CORTEX_URL/api/ask/stream" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the main findings?"}' \
  --no-buffer
```

### Ask About a Collection

```bash
curl -X POST "$CORTEX_URL/api/ask/stream" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Summarize the key points",
    "collection_id": "research"
  }' --no-buffer
```

### Fast Mode

```bash
curl -X POST "$CORTEX_URL/api/ask/stream" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is the company mission?",
    "use_fast_search": true
  }' --no-buffer
```

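### Streaming Response

`--no-buffer` tells cURL to write each chunk as soon as it arrives instead of buffering the output, so the answer appears while it is being generated:
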
```bash
curl -X POST "$CORTEX_URL/api/ask/stream" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"question": "Explain GraphRAG in detail"}' \
  --no-buffer
```
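When consuming this stream from Python rather than cURL, decode incrementally: a multi-byte UTF-8 character can be split across two chunks, and decoding each chunk on its own would raise an error or garble it. The sketch below accumulates text from any iterable of byte chunks. It treats the body as plain text and does not parse any event framing the endpoint may use; `iter_text` is an illustrative helper.

```python
import codecs
from typing import Iterable, Iterator


def iter_text(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield decoded text from a byte stream, keeping split UTF-8 sequences intact.

    Treats the body as plain text; any SSE/event framing is passed through as-is.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush buffered bytes; raises if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# "é" is two bytes (0xC3 0xA9); here it is split across two chunks.
chunks = [b"Caf", b"\xc3", b"\xa9 au lait"]
answer = "".join(iter_text(chunks))
print(answer)  # Café au lait
```

With `requests`, pass `response.iter_content(chunk_size=None)` from a `requests.post(..., stream=True)` call to process data as it arrives.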

### Deep Research (Agentic) Mode

```bash
curl -X POST "$CORTEX_URL/api/ask/stream" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Compare the methodologies in papers A and B",
    "use_agentic": true
  }' --no-buffer
```

### With Conversation History

```bash
curl -X POST "$CORTEX_URL/api/ask/stream" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "Can you elaborate on the third point?",
    "conversation_history": [
      {"role": "user", "content": "What are the benefits?"},
      {"role": "assistant", "content": "The main benefits are: 1) ..."}
    ]
  }' --no-buffer
```
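Each request carries the full history, so the client is responsible for appending both sides of every turn once an answer has finished streaming. A minimal sketch of that bookkeeping, assuming the `question` / `conversation_history` payload shape shown above; the `Conversation` class is illustrative.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Conversation:
    """Tracks prior turns and builds /api/ask/stream request bodies.

    Assumes the question/conversation_history payload from the cURL example.
    """
    history: List[Dict[str, str]] = field(default_factory=list)

    def payload(self, question: str) -> str:
        # History holds only *previous* turns; the new one goes in "question".
        return json.dumps({"question": question,
                           "conversation_history": self.history})

    def record(self, question: str, answer: str) -> None:
        # Call after the streamed answer has been fully received.
        self.history.append({"role": "user", "content": question})
        self.history.append({"role": "assistant", "content": answer})


conv = Conversation()
conv.record("What are the benefits?", "The main benefits are: 1) ...")
body = conv.payload("Can you elaborate on the third point?")
```

The resulting `body` string has the same shape as the `-d` argument in the cURL example.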

---

## Collections

### Create Collection

```bash
curl -X POST "$CORTEX_URL/api/collections" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Research Papers",
    "description": "Academic papers on AI and ML"
  }'
```

### List Collections

```bash
curl "$CORTEX_URL/api/collections" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

### Get Collection Details

```bash
curl "$CORTEX_URL/api/collections/coll_abc123" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

### Update Collection

```bash
curl -X PUT "$CORTEX_URL/api/collections/coll_abc123" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Updated Name",
    "description": "Updated description"
  }'
```

### Delete Collection

```bash
curl -X DELETE "$CORTEX_URL/api/collections/coll_abc123" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

### Move Documents

```bash
curl -X POST "$CORTEX_URL/api/documents/move" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document_ids": ["doc_1", "doc_2"],
    "target_collection_id": "coll_def456"
  }'
```

---

## Knowledge Graph

### Get Graph Visualization

```bash
curl "$CORTEX_URL/api/graph/visualization" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

### Search Entities

```bash
curl "$CORTEX_URL/api/graph/entities?search=neural&limit=20" \
  -H "X-API-Key: $CORTEX_API_KEY"
```
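The `search` value is part of the query string, so multi-word terms or terms with characters such as `&` must be URL-encoded (with cURL, `-G --data-urlencode "search=neural networks"` does this). In Python, `urllib.parse.urlencode` handles it; `entities_url` is an illustrative helper.

```python
from urllib.parse import urlencode


def entities_url(base_url: str, search: str, limit: int = 20) -> str:
    """Build a GET URL for /api/graph/entities with an encoded search term.

    Illustrative helper, not part of a Cortex client library.
    """
    query = urlencode({"search": search, "limit": limit})
    return f"{base_url}/api/graph/entities?{query}"


print(entities_url("http://localhost:8000", "neural networks"))
# http://localhost:8000/api/graph/entities?search=neural+networks&limit=20
```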

### Get Entity Details

```bash
curl "$CORTEX_URL/api/graph/entities/ent_abc123" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

### Get Subgraph

```bash
curl -X POST "$CORTEX_URL/api/graph/subgraph" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "entity_name": "Machine Learning",
    "max_depth": 2,
    "limit": 50
  }'
```

### Find Duplicate Entities

```bash
curl "$CORTEX_URL/api/entities/duplicates?threshold=0.85&limit=50" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

### Merge Duplicate Entities

```bash
curl -X POST "$CORTEX_URL/api/entities/merge" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "canonical": "Machine Learning",
    "merge": ["machine learning", "ML"]
  }'
```

### View Merge History

```bash
curl "$CORTEX_URL/api/entities/merge-history?limit=20" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

### Delete All Entities

```bash
curl -X DELETE "$CORTEX_URL/api/graph/entities" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

---

## Communities

### Detect Communities

```bash
curl -X POST "$CORTEX_URL/api/graph/communities/detect" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"min_community_size": 3}'
```

### List Communities

```bash
curl "$CORTEX_URL/api/graph/communities" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

### Get Community Details

```bash
curl "$CORTEX_URL/api/graph/communities/comm_1" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

### Summarize Community

```bash
curl -X POST "$CORTEX_URL/api/graph/communities/comm_1/summarize" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

---

## Background Tasks

### Get Task Status

```bash
curl "$CORTEX_URL/api/tasks/task_abc123" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

### List Tasks

```bash
curl "$CORTEX_URL/api/tasks" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

### Cancel Task

```bash
curl -X DELETE "$CORTEX_URL/api/tasks/task_abc123" \
  -H "X-API-Key: $CORTEX_API_KEY"
```
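Scripts that start long-running work typically poll `GET /api/tasks/{task_id}` until the task finishes. The sketch below assumes the response JSON has a `status` field and that `completed`, `failed`, and `cancelled` are its terminal values; those names are assumptions, so check them against the task responses of your Cortex version. `fetch_status` and `sleep` are injected so the loop can be tested without a server.

```python
import time
from typing import Callable, Dict

# Assumed terminal states; verify against your Cortex version's task responses.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_task(fetch_status: Callable[[], Dict], timeout: float = 600.0,
                  interval: float = 2.0,
                  sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll until the task reaches an (assumed) terminal state or times out."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_status()
        if task.get("status") in TERMINAL_STATES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {task.get('status')!r} after {timeout}s")
        sleep(interval)
```

With `requests`, `fetch_status` could be `lambda: requests.get(f"{url}/api/tasks/task_abc123", headers={"X-API-Key": key}).json()`.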

---

## Admin - API Keys

### Create API Key

```bash
curl -X POST "$CORTEX_URL/api/admin/api-keys" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production App",
    "permissions": ["read", "write"],
    "expires_at": "2025-12-31T23:59:59Z"
  }'
```
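`expires_at` is an ISO 8601 UTC timestamp, and a hard-coded date eventually falls in the past. Computing it relative to the current time avoids that. A minimal sketch, assuming the endpoint accepts the `YYYY-MM-DDTHH:MM:SSZ` form shown above; `expiry_timestamp` is an illustrative helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expiry_timestamp(days: int, now: Optional[datetime] = None) -> str:
    """Return a UTC timestamp `days` from `now` (pass a timezone-aware datetime).

    Assumes Cortex accepts the ...Z format used in the example above.
    """
    now = now or datetime.now(timezone.utc)
    expires = (now + timedelta(days=days)).astimezone(timezone.utc)
    return expires.strftime("%Y-%m-%dT%H:%M:%SZ")


print(expiry_timestamp(90))
```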

### List API Keys

```bash
curl "$CORTEX_URL/api/admin/api-keys" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

### Revoke API Key

```bash
curl -X POST "$CORTEX_URL/api/admin/api-keys/key_abc123/revoke" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

### Delete API Key

```bash
curl -X DELETE "$CORTEX_URL/api/admin/api-keys/key_abc123" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

---

## Turbo Mode

### Check Turbo Status

```bash
curl "$CORTEX_URL/api/turbo/status" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

### Start Turbo Job

```bash
curl -X POST "$CORTEX_URL/api/turbo/start" \
  -H "X-API-Key: $CORTEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"runtime_seconds": 3600}'
```

### Stop Turbo Job

```bash
curl -X POST "$CORTEX_URL/api/turbo/stop" \
  -H "X-API-Key: $CORTEX_API_KEY"
```

### Check Turbo Jobs

```bash
curl "$CORTEX_URL/api/turbo/jobs" \
  -H "X-API-Key: $CORTEX_API_KEY"
```