DocuClaw

YOUR DOCUMENTS. YOUR RULES.

Open-source, local-first, AI-powered document intelligence. Extract, organize, and archive invoices, receipts, and contracts — 100% on your machine.


Quick Start

# Clone & install
$ git clone https://github.com/astonysh/DocuClaw.git
$ cd DocuClaw && pip install -e .

# Process a document
$ docuclaw process \
    --entity-id "org_mycompany_01" \
    --country DE \
    --input ./scans/invoice.png
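To process a whole folder instead of a single file, a small Python wrapper can call the CLI once per scan. This is a minimal sketch: it uses only the --entity-id, --country, and --input flags shown above. The helper names build_command and process_folder and the accepted file patterns are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumed input formats; the Quick Start only shows a .png file.
SCAN_PATTERNS = ("*.png", "*.jpg", "*.jpeg")


def build_command(input_path: Path, entity_id: str, country: str) -> list[str]:
    """Return the argv for one `docuclaw process` call (flags from the Quick Start)."""
    return [
        "docuclaw", "process",
        "--entity-id", entity_id,
        "--country", country,
        "--input", str(input_path),
    ]


def process_folder(folder: Path, entity_id: str, country: str) -> list[Path]:
    """Run the CLI on each matching file; return the files whose run failed."""
    files = sorted(p for pattern in SCAN_PATTERNS for p in folder.glob(pattern))
    failed = []
    for path in files:
        # check=False: collect failures instead of stopping at the first one.
        result = subprocess.run(build_command(path, entity_id, country), check=False)
        if result.returncode != 0:
            failed.append(path)
    return failed
```

For example, process_folder(Path("./scans"), "org_mycompany_01", "DE") returns the scans whose CLI run exited with a non-zero status, so they can be retried or reviewed by hand.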

What It Does

🛡️

100% Sovereign

All data stays on YOUR machine. Zero cloud dependency. Zero telemetry. Your privacy is non-negotiable.

🏢

Multi-Entity

Manage personal docs, company invoices, and team files — all in one install. Separate or combine as you wish.

🔌

Plugin Architecture

Country-specific parsers snap in like LEGO bricks. Germany, US, China — extend DocuClaw for any locale.

📝

Markdown-Native

Every document becomes a searchable .md file with structured YAML frontmatter. Human-readable, version-controllable.

🤖

AI-Powered Extraction

Multimodal LLMs extract structured data from scans, photos, and emails. Run a local model through Ollama to keep everything on your machine, or plug in OpenAI or any other model.

Compliance-Ready

Designed around GoBD (Germany), GDPR, and audit-trail principles from day one, with a hash-chained, GoBD-compliant audit trail on the roadmap.

Architecture

┌───────────────────────────────────────────────┐
│                   CLI / API                   │
├───────────────────────────────────────────────┤
│                  Core Engine                  │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐  │
│  │  Schema   │  │  Storage  │  │  Registry │  │
│  │ (Pydantic)│  │   Layer   │  │  (Plugin) │  │
│  └───────────┘  └───────────┘  └───────────┘  │
├───────────────────────────────────────────────┤
│                Parser Plugins                 │
│  ┌──────────┐  ┌──────────┐  ┌─────────────┐  │
│  │    DE    │  │    US    │  │  Custom ... │  │
│  │  Invoice │  │  Invoice │  │ Your Parser │  │
│  └──────────┘  └──────────┘  └─────────────┘  │
├───────────────────────────────────────────────┤
│            Input Adapters (Future)            │
│        Scanner │ Email │ Webhook │ API        │
└───────────────────────────────────────────────┘
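The Registry box is where country plugins meet the core engine. One common way to build such a registry in Python is a decorator keyed by (country, document type); the sketch below assumes that design. The names register, get_parser, and parse_de_invoice, and the "Gesamtbetrag:" rule, are hypothetical illustrations, not DocuClaw's actual plugin API.

```python
from typing import Callable, Dict, Tuple

# Hypothetical parser signature: extracted text in, frontmatter fields out.
Parser = Callable[[str], dict]

_REGISTRY: Dict[Tuple[str, str], Parser] = {}


def register(country: str, document_type: str) -> Callable[[Parser], Parser]:
    """Decorator that adds a parser under (country, document_type)."""
    def decorator(func: Parser) -> Parser:
        key = (country.upper(), document_type)
        if key in _REGISTRY:
            raise ValueError(f"parser already registered for {key}")
        _REGISTRY[key] = func
        return func
    return decorator


def get_parser(country: str, document_type: str) -> Parser:
    """Look up the parser for a locale; fail loudly if none is installed."""
    try:
        return _REGISTRY[(country.upper(), document_type)]
    except KeyError:
        raise LookupError(f"no parser for {country}/{document_type}") from None


@register("DE", "b2b_invoice")
def parse_de_invoice(text: str) -> dict:
    # Toy rule: assume the total follows "Gesamtbetrag:" on its own line.
    for line in text.splitlines():
        if line.startswith("Gesamtbetrag:"):
            raw = line.split(":", 1)[1].replace("EUR", "").strip()
            # German number format: "1.234,56" -> 1234.56
            amount = float(raw.replace(".", "").replace(",", "."))
            return {"country": "DE", "amount_total": amount, "currency": "EUR"}
    return {"country": "DE"}
```

Keying on (country, document_type) lets a DE invoice parser and a US invoice parser coexist, and rejecting duplicate keys makes a conflicting plugin fail at import time instead of silently shadowing another one.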

The Data Contract

Every document, whether a €10K enterprise invoice or a personal electricity bill, is normalized into a universal Markdown schema with structured YAML frontmatter.

---
id: doc_20260215_a1b2c3d4
entity_id: "org_acme_01"
entity_type: "company"
source_type: physical_mail
country: DE
document_type: b2b_invoice
date_received: "2026-02-15"
sender_name: "AWS EMEA SARL"
amount_total: 125.50
currency: EUR
status: pending
tags: [IT_Infrastructure, Q1_Expense]
---
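The page names Pydantic as the validation layer. To stay dependency-free, the sketch below uses a standard-library dataclass with hand-written checks instead. Field names come from the example above; the ID pattern, the allowed entity types, and the code-length rules are assumptions inferred from that single record.

```python
import re
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal

# Assumed ID format, inferred from "doc_20260215_a1b2c3d4".
ID_PATTERN = re.compile(r"^doc_\d{8}_[0-9a-f]{8}$")
# Assumed set; only "company" appears in the example.
ENTITY_TYPES = {"company", "personal"}


@dataclass
class DocumentRecord:
    id: str
    entity_id: str
    entity_type: str
    source_type: str
    country: str
    document_type: str
    date_received: date
    sender_name: str
    amount_total: Decimal
    currency: str
    status: str
    tags: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # Coerce YAML-style values, loosely mimicking Pydantic's parsing.
        if isinstance(self.date_received, str):
            self.date_received = date.fromisoformat(self.date_received)
        # Decimal(str(x)) avoids float artifacts: Decimal(0.1) != Decimal("0.1").
        self.amount_total = Decimal(str(self.amount_total))

        if not ID_PATTERN.match(self.id):
            raise ValueError(f"malformed id: {self.id!r}")
        if self.entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity_type: {self.entity_type!r}")
        if len(self.country) != 2 or not self.country.isupper():
            raise ValueError("country must be a two-letter code such as DE")
        if len(self.currency) != 3 or not self.currency.isupper():
            raise ValueError("currency must be a three-letter code such as EUR")
        if self.amount_total < 0:
            raise ValueError("amount_total must not be negative")
```

Passing the example's values (with date_received="2026-02-15" and amount_total=125.50) yields a valid record, while currency="euro" or id="invoice_1" raises ValueError. Decimal is used for amounts because binary floats cannot represent most cent values exactly.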

How It Works

1. 📄 Document Input: scan, email, or API
2. 🤖 AI Extraction: LLM-powered parsing
3. 🔍 Validation: Pydantic schema check
4. 📁 Local Archive: structured Markdown
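The four stages can be sketched as plain functions. This is an illustration under stated assumptions: the LLM call is a stub that returns fixed fields, and the function names, the required-field list, the archive layout (root/entity_id/id.md), and the ID scheme (received date plus the first 8 hex digits of the file's SHA-256) are all hypothetical.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical minimum set of fields a record must carry.
REQUIRED = ("entity_id", "country", "document_type", "date_received",
            "amount_total", "currency")


def read_input(path: Path) -> bytes:
    """Stage 1: load the raw scan (email/API adapters would plug in here)."""
    return path.read_bytes()


def extract(raw: bytes) -> dict:
    """Stage 2: stub standing in for a multimodal LLM call; ignores raw."""
    return {"document_type": "b2b_invoice", "date_received": "2026-02-15",
            "sender_name": "AWS EMEA SARL", "amount_total": 125.50,
            "currency": "EUR"}


def validate(fields: dict) -> dict:
    """Stage 3: reject records that miss required keys."""
    missing = [key for key in REQUIRED if key not in fields]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return fields


def archive(fields: dict, raw: bytes, root: Path) -> Path:
    """Stage 4: write Markdown with YAML frontmatter under root/<entity_id>/."""
    digest = hashlib.sha256(raw).hexdigest()[:8]
    doc_id = f"doc_{fields['date_received'].replace('-', '')}_{digest}"
    # YAML 1.2 accepts JSON scalars and lists, so json.dumps handles quoting.
    lines = ["---", f"id: {doc_id}"]
    lines += [f"{key}: {json.dumps(value)}" for key, value in fields.items()]
    lines += ["---", ""]
    out = root / fields["entity_id"] / f"{doc_id}.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("\n".join(lines), encoding="utf-8")
    return out


def process(path: Path, entity_id: str, country: str, root: Path) -> Path:
    """Run all four stages on one file and return the archived .md path."""
    raw = read_input(path)
    fields = {"entity_id": entity_id, "country": country, **extract(raw)}
    return archive(validate(fields), raw, root)
```

Deriving the ID suffix from the file's hash means re-processing the same scan maps to the same archive path instead of creating a duplicate entry.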

Ecosystem

DocuClaw

Sovereign document intelligence & archival

OpenClaw

Personal AI assistant on any platform

ClawHub

Plugin marketplace & community hub

Roadmap

Core schema, storage engine, parser framework, CLI
Email ingestion adapter (IMAP / POP3)
Real multimodal LLM integration (Ollama, OpenAI Vision)
Web UI dashboard (local-only, no cloud)
GoBD-compliant audit trail with hash chains
Multi-entity permission model & team collaboration
Webhook & API ingestion endpoints
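The roadmap item "GoBD-compliant audit trail with hash chains" rests on a simple idea: each log entry stores the hash of the previous entry, so editing any past event breaks every later link. A minimal sketch with SHA-256 follows; the entry fields and the function names append and verify are hypothetical, and a hash chain alone does not make a system GoBD-compliant.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys, compact separators) keeps the hash reproducible.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list, event: dict) -> None:
    """Add an event that commits to the hash of the entry before it."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": entry_hash(prev, event)})


def verify(log: list) -> bool:
    """Recompute every link; an edited past event breaks the chain."""
    prev = GENESIS
    for item in log:
        if item["prev"] != prev or item["hash"] != entry_hash(prev, item["event"]):
            return False
        prev = item["hash"]
    return True
```

Note that someone who can rewrite the whole log could also recompute every hash, so the chain only proves integrity if its latest hash is also stored somewhere that person cannot change.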