The P3AK Pipeline

Scattered files.
One encrypted brain.

A visual walkthrough of exactly how your documents go from PDFs, decks, and notes to a single vault your AI queries with 98% Top-1 retrieval accuracy, in under a second.

01 Ingest · 02 Vault · 03 Lookup · 04 Ecosystem
01

Step one

Everything goes in.
Nothing is lost.

Drop in files in any of 35 formats from any folder, data room, or integration. P3AK detects each format, extracts the content, and preserves every version. Multiple drafts of the same document collapse into a single versioned entry, with history intact.

PDF · 3 versions
  • services-agreement.pdf · Nov 1, 2025 · 284 KB · v1
  • services-agreement.pdf · Nov 15, 2025 · 287 KB · v2
  • services-agreement.pdf · Dec 1, 2025 · 291 KB · v3 (current)
Word · 2 versions
  • board-minutes-nov.docx · Nov 10, 2025 · 44 KB · v1
  • board-minutes-dec.docx · Dec 10, 2025 · 48 KB · v2 (current)
PowerPoint · 12 slides
q4-investor-deck.pptx · 2.1 MB · 12 slides extracted
Slide 1 of 12: Q4 Investor Update (Silicon Bayou LLC · December 2025)
  • $47K MRR · 16 mo runway
  • p3ak-vault v0.1.0 shipped
  • Beta waitlist open
Slide 3 of 12: Q4 Financials (Revenue & Runway): $47K MRR · +18% QoQ
Slide 12 of 12: Market Expansion
Google Slides · export to PDF
growth-strategy-2026.pdf · 3.4 MB · exported from Google Slides
Slide 1 of 8: Series A Pitch Deck (Silicon Bayou LLC · 2026): P3AK, the AI brain for any organization
Slide 2 of 8: The Problem
  • AI forgets everything
  • Data scattered in SaaS
  • Hallucinations cost trust
Slide 8 of 8: Team & Roadmap
  • Q1: crates.io publish
  • Q2: PyPI + Room GA
Markdown · 3 documents
  • architecture.md · 18 KB · platform architecture · 847 lines
  • roadmap.md · 22 KB · 2026 product roadmap
  • meeting-notes.md · 9 KB · weekly ops · auto-ingested (latest)
p3ak-vault ingest — what happens to every file
01
🔍
Detect
Read magic bytes + extension. Identify format from 35 supported types.
02
⚗️
Extract
PDF text via lopdf, DOCX via docx-rs, PPTX slide text, image OCR via Tesseract, audio transcription.
03
📝
Normalize
All content becomes clean Markdown. Tables preserved. Structure retained. Metadata extracted.
04
🧬
Dedupe
SHA-256 hash of content. Skip if unchanged. Version if different. Never duplicate.
05
🗂️
Index
BM25 full-text + ZVec TF-IDF vectors + PageIndex hierarchical tree built simultaneously.
06
🔒
Encrypt
AES-256-GCM over the entire vault. WAL entry written. Hash-linked audit trail updated.
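The Dedupe step above (hash the content, skip if unchanged, version if different) can be sketched in a few lines. This is a minimal illustration, not the p3ak-vault implementation: the `VersionedStore` class and its in-memory layout are hypothetical, and it assumes a new draft is compared only against the latest stored version of the same document name.

```python
import hashlib


class VersionedStore:
    """Toy in-memory store (hypothetical): one entry per document name."""

    def __init__(self):
        # name -> list of (version_number, sha256_hex), oldest first
        self.entries = {}

    def ingest(self, name, content):
        """Return 'skipped' for unchanged content, else the new version label."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        versions = self.entries.setdefault(name, [])
        # Assumption: only the latest version is checked for equality.
        if versions and versions[-1][1] == digest:
            return "skipped"
        versions.append((len(versions) + 1, digest))
        return f"v{len(versions)}"
```

Ingesting the same text twice under one name yields a version label the first time and `"skipped"` the second; a changed draft is appended as the next version while earlier hashes stay in the list.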
Input — 10 documents
pdf services-agreement.pdf · v1 / v2 / v3
docx board-minutes-nov.docx
docx board-minutes-dec.docx
pptx q4-investor-deck.pptx · 12 slides
pdf growth-strategy-2026.pdf · 8 slides
md architecture.md + roadmap.md + meeting-notes.md
~6.2 MB · 6 formats · 3 version histories scattered across folders and drives
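For a mixed input set like this one, the Detect step ("magic bytes + extension") might look roughly like the sketch below. The signature table covers only three well-known signatures (PDF, PNG, ZIP), the function name `detect_format` is hypothetical, and the fallback rules are assumptions; the actual 35-format detector is not described in this walkthrough.

```python
from pathlib import Path

# Well-known leading byte signatures. DOCX and PPTX are ZIP containers,
# so they share the ZIP signature and need the extension to tell them apart.
MAGIC = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"PK\x03\x04", "zip"),
]
ZIP_FLAVOURS = {".docx": "docx", ".pptx": "pptx"}
TEXT_EXTENSIONS = {".md": "md", ".txt": "txt"}  # no magic bytes to check


def detect_format(filename, head):
    """Guess a format from the first bytes of a file, then from its extension."""
    ext = Path(filename).suffix.lower()
    for signature, kind in MAGIC:
        if head.startswith(signature):
            if kind == "zip":
                return ZIP_FLAVOURS.get(ext, "zip")
            return kind
    # Plain-text formats carry no signature, so the extension decides.
    return TEXT_EXTENSIONS.get(ext, "unknown")
```

Checking the bytes first means a PDF that was renamed to `.md` is still treated as a PDF; the extension is consulted only to disambiguate ZIP containers and signature-less text files.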
Output
company.vault 🔒
Format: 1 encrypted file
Documents indexed: 10 docs · 47K tokens
Version histories: preserved, not lost
Formats inside: all normalized → .md
Search latency: < 50 ms
Encryption: AES-256-GCM
02

Step two

One file.
Six layers deep.

The .vault file looks like a single binary on disk. Inside it's a layered system — encryption wrapping indexes wrapping content wrapping history. Pull any layer away and the one beneath is still intact.

company.vault — internal structure
🔒
AES-256-GCM Encryption
The entire vault is encrypted at rest. Key derived per-user. Nothing readable without the key — not even metadata.
Outermost
📋
WAL — Write-Ahead Log
Every write is logged before it's applied. Hash-linked entries create a tamper-evident audit trail. Full forensic rollback to any point.
Integrity
🌳
PageIndex — Hierarchical Tree
Documents are indexed as nested trees. Headings, sections, and sub-sections create structure-aware retrieval — not just keyword hits.
Structure
ZVec — TF-IDF Semantic Index
Sparse vector search over the vocabulary. Finds semantically similar content even when keywords don't match exactly. 98% Top-1 accuracy.
Semantic
🔤
BM25 — Full-Text Index (Tantivy)
Best-in-class full-text retrieval. Ranking by term frequency, inverse document frequency, and field weights. Exact phrase and proximity search.
Keywords
📦
Document Store
Raw normalized Markdown chunks with metadata. Source format, hash, version, room, tributary, and tags stored alongside every entry.
Core
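Of the layers above, the WAL's "hash-linked entries" are the easiest to illustrate in isolation. The sketch below shows the general hash-chain technique under stated assumptions: the entry layout, the JSON serialization, and the helper names `append` and `verify` are all hypothetical, and nothing here reflects the real on-disk format of a .vault file.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, payload):
    """Hash an entry together with the hash of the entry before it."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(log, payload):
    """Append a write record that is linked to the current tail of the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "prev": prev_hash,
        "payload": payload,
        "hash": entry_hash(prev_hash, payload),
    })


def verify(log):
    """Return the index of the first broken entry, or None if the chain holds."""
    prev_hash = GENESIS
    for index, entry in enumerate(log):
        expected = entry_hash(entry["prev"], entry["payload"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return index
        prev_hash = entry["hash"]
    return None
```

Because each hash covers the previous one, editing any past entry changes its hash and breaks every link after it, which is what makes the trail tamper-evident rather than tamper-proof.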
Vault stats — company.vault
Documents: 10 ingested
Total chunks: 847 indexed
Version histories: 3 tracked
WAL entries: 10 writes
Search mode: hybrid (BM25 + ZVec)
Encryption: AES-256-GCM
Top-1 accuracy: 98%
Search latency: < 50 ms
Read — vault contents
$ p3ak-vault read \
  --path company.vault \
  --type docs

{
  "docs": [
    {
      "id": "acme-services-agreement",
      "title": "Professional Services Agreement",
      "version": 3,
      "format": "pdf", ...
    },
    ... // 9 more docs
  ]
}
Hybrid search scoring
// For each candidate chunk:
score = (bm25 × 0.6)
      + (zvec_cosine × 0.3)
      + (page_depth × 0.1)

// Reranker applies:
final = cross_encoder(
  query, top_k_candidates
)

// Return top N results
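The weighted sum in the pseudocode above translates directly into runnable Python. In this sketch the candidate dictionaries and the function names are hypothetical, the three input scores are assumed to be precomputed, and the cross-encoder reranking stage is left out because the page does not describe the model behind it. The weights are parameters because the lookup examples on this page report different mixes per query (0.60/0.30/0.10 and 0.45/0.45/0.10).

```python
def hybrid_score(candidate, w_bm25=0.6, w_zvec=0.3, w_page=0.1):
    """Weighted sum of the three per-chunk signals from the pseudocode."""
    return (
        w_bm25 * candidate["bm25"]
        + w_zvec * candidate["zvec_cosine"]
        + w_page * candidate["page_depth"]
    )


def top_n(candidates, n, **weights):
    """Rank candidates by hybrid score, best first, and keep the top n."""
    ranked = sorted(
        candidates,
        key=lambda c: hybrid_score(c, **weights),
        reverse=True,
    )
    return ranked[:n]
```

A reranker, if used, would take the output of `top_n` as its input and reorder that short list.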
03

Step three

Ask anything.
Get the truth.

Hybrid search runs BM25 + ZVec + PageIndex simultaneously, merges the ranked results, and returns exact source citations with confidence scores. Three examples — one for each retrieval strength.

BM25 · Exact
$ "what are the renewal terms for the Acme contract?"
services-agreement.pdf · v3
pdf · legal · v3 · score 1.41
"This Agreement is effective for twenty-four (24) months from the Effective Date and shall auto-renew annually unless either party provides 30 days written notice of non-renewal."
board-minutes-dec.docx
docx · operations · score 0.88
"Renewal of vendor contract approved unanimously. Term extended to 24 months. CFO to confirm final terms by Dec 15."
BM25 weight: 0.60 · ZVec: 0.30 · PageIndex: 0.10 · Latency: 31ms
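To make the exact-match example concrete, here is a textbook Okapi BM25 scorer. It is a sketch only: whitespace tokenization, the parameter values `k1=1.2` and `b=0.75`, and the Lucene-style IDF formula are assumptions, and it ignores the field weights and phrase search mentioned for the BM25 layer. It is not the Tantivy implementation.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption of this sketch)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one Okapi BM25 score per document for the given query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Length normalization: long documents need more hits to score high.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Scores produced this way are unbounded above, which is consistent with the example results on this page showing values greater than 1.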
ZVec · Semantic
$ "show me our growth trajectory and where we're headed"
growth-strategy-2026.pdf · slide 2
slides · strategy · score 1.38
"$47K MRR (+18% QoQ). Targeting $200K MRR by Q4 2026. Key growth levers: PyPI distribution, Room GA, Series A close Q2. 16-month runway at current burn."
q4-investor-deck.pptx · slide 3
pptx · finance · score 1.14
"Revenue: $39.8K → $47K MRR. Burn: $24.1K → $23.5K/mo. Runway extended from 14 to 16 months."
roadmap.md · Q1 2026
md · tech · score 0.91
"Q1 goals: 500 cargo installs, PyPI wheels, Room private beta, CREST protocol v1. Milestone: first paying customer."
BM25 weight: 0.45 · ZVec: 0.45 · PageIndex: 0.10 · Latency: 44ms
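ZVec is described on this page as a sparse TF-IDF vector index compared by cosine similarity. The sketch below shows that general technique with plain dictionaries as sparse vectors. The smoothed IDF formula, the whitespace tokenizer, and all function names are assumptions; ZVec's actual weighting and vocabulary handling are not specified here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF so that terms present in every document keep some weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def vectorize(text, idf):
    """Project a query onto the known vocabulary; unknown terms are dropped."""
    counts = Counter(text.lower().split())
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Note that this plain version only rewards shared terms: a query with no vocabulary overlap scores zero. How ZVec matches content whose keywords differ from the query is not covered by this sketch.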
Hybrid · Synthesis
$ "summarize all decisions made in board meetings this quarter"
board-minutes-nov.docx · §3 Decisions
docx · board · score 1.29
"Approved unanimously: Q4 investor deck content; CFO hire timeline extended to Q1; vault v0.1.0 release authorized."
board-minutes-dec.docx · §2 Resolutions
docx · board · score 1.22
"Resolved: Acme contract renewal approved (24 months). Approved: Series A timeline, Q1 close target. Tabled: international expansion until Q3 2026."
P3AK synthesis — across both documents
Q4 board summary: vault v0.1.0 shipped ✓ · Acme contract renewed for 24 months ✓ · CFO hire extended ✓ · Series A target Q1 2026 ✓ · International expansion tabled until Q3 ✓
PageIndex depth: § heading match · Cross-doc: 2 sources merged · Latency: 47ms
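The "§ heading match" in this example relies on documents being indexed as nested trees of headings and sections. One plausible way to build such a tree from normalized Markdown is sketched below. The node layout and the names `build_tree` and `find_section` are hypothetical, only ATX-style `#` headings are recognized, and fenced code blocks are not handled; PageIndex's real structure is not documented on this page.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def build_tree(markdown):
    """Nest Markdown sections under their parent headings."""
    root = {"title": "(root)", "level": 0, "text": [], "children": []}
    stack = [root]  # path from the root to the section currently being filled
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            node = {"title": match.group(2).strip(), "level": level,
                    "text": [], "children": []}
            # Close sections at the same or a deeper level before attaching.
            while stack[-1]["level"] >= level:
                stack.pop()
            stack[-1]["children"].append(node)
            stack.append(node)
        elif line.strip():
            stack[-1]["text"].append(line.strip())
    return root


def find_section(node, title, path=()):
    """Return the heading path to the first section with this title, or None."""
    here = path + (node["title"],)
    if node["title"].lower() == title.lower():
        return here
    for child in node["children"]:
        found = find_section(child, title, here)
        if found:
            return found
    return None
```

A lookup for a "Decisions" section then returns the full heading path down to it, which is the kind of structural context a flat keyword hit does not carry.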
04

The P3AK Platform

Three products.
One brain.

Each product works standalone. Together they form a complete AI-native knowledge platform — from raw files to structured data room to autonomous agent that remembers everything.

01 — The Brain
p3ak‑vault
Encrypted AI knowledge store. 35 formats in. Sub-second search out. Written in Rust. Ships as a single binary.
AES-256-GCM encryption at rest
BM25 + ZVec + PageIndex hybrid search
35 file formats · OCR · Transcription
Version history · WAL audit log
98% Top-1 retrieval accuracy
cargo install p3ak-vault
$ p3ak-vault ingest \
  --path company.vault \
  --file contract.pdf

$ p3ak-vault search \
  --query "renewal terms" \
  --mode hybrid
// [1.41] contract.pdf · 31ms
02 — The Organizer
p3ak‑room
AI-native data room. Five tributaries. Upload, version, organize, and export your company's knowledge — with one click to vault.
5 tributaries: legal · finance · ops · tech · marketing
Document layers with access tiers
Export any document as .mdr
One-click vault-push to p3ak-vault
Built with Next.js 14 + Drizzle ORM
Self-host or connect to P3AK Cloud
// Room API — export + push
GET /api/companies/acme/export?format=mdr

POST /api/companies/acme/vault-push
// → p3ak-vault ingest doc.mdr
03 — The Agent
p3ak‑harness
CREST protocol agent orchestration. An AI that reads from the vault, never from raw chat history. Always cites sources. Never hallucinates facts it doesn't have.
CREST: Clarify · Risks · Establish · Sprints · Tune
Queries vault before every response
Canary-checks memory integrity on session start
Domain agents: finance · legal · ops · tech
Pi runtime — extend with custom skills
Session compaction with source citations
// CREST session start
1. canary-check company.vault
✓ recall: 98% · integrity: ok
2. vault search "current goals"
→ roadmap.md · q4-deck.pptx
3. respond with citations
// [source: roadmap.md §Q1]
p3ak-room uploads → company.vault ← p3ak-harness queries

Your data.
Your vault.

One command to install. One file to own forever. No API calls to a third party. No training on your data.

Join the beta → · Read the docs · Explore .mdr format