Foundation Operations  ·  Stack & Workflow  ·  How We Build

Not vibe-coding.
Structured systems.
Deliberate tooling.

Every platform here runs on the same foundation — a self-hosted server fleet, a custom AI infrastructure layer, and a development workflow where context doesn't get lost between sessions. This is how it works.

$fo dev --status
● Stack: React · TypeScript · Node · Python · PostgreSQL
● Infrastructure: self-hosted · zero SaaS dependencies for core ops
● AI: custom MCP server · multi-model routing · full fleet access
Core Stack

What everything is built on

Same foundation across every project. No framework churn — pick tools that work in production and go deep on them.

⚛️  Frontend
React + TypeScript
Modern React with full type safety end-to-end. Component libraries, real-time sockets, complex state — built to last, not to demo.
React 19 · TypeScript · Tailwind v4 · Vite · tRPC
🟩  Backend
Node.js + Python
Node for production APIs and real-time services. Python for automation, data pipelines, and scraping. Job queues for async workloads that can't block a request cycle.
Node.js · Express · Python · Socket.IO · Job Queues
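
The queue pattern in miniature. A minimal sketch assuming BullMQ over Redis; the actual queue library isn't named here, and generateReport stands in for the real workload.

import { Queue, Worker } from "bullmq";

const connection = { host: "localhost", port: 6379 };

// hypothetical function standing in for the slow task
declare function generateReport(userId: number): Promise<void>;

// The API route only enqueues and returns immediately...
const reports = new Queue("reports", { connection });
await reports.add("generate", { userId: 42 });

// ...while a separate worker process runs the slow part off the request cycle.
new Worker(
  "reports",
  async (job) => {
    await generateReport(job.data.userId);
  },
  { connection },
);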
🗄️  Database
PostgreSQL
Relational first — 140+ tables across production. Vector extensions for AI-powered search. The database is the source of truth, not the afterthought.
PostgreSQL 16 · pgvector · Redis · Drizzle ORM
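
What vector search looks like at the query level. A minimal sketch assuming node-postgres and pgvector; the docs table and its embedding column are hypothetical names for this example.

import { Pool } from "pg";

const pool = new Pool(); // connection settings come from PG* env vars

// Nearest-neighbour search: <=> is pgvector's cosine-distance operator.
async function semanticSearch(queryEmbedding: number[], limit = 5) {
  const { rows } = await pool.query(
    "SELECT id, title FROM docs ORDER BY embedding <=> $1::vector LIMIT $2",
    [`[${queryEmbedding.join(",")}]`, limit],
  );
  return rows;
}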
🐳  Infrastructure
Self-Hosted Fleet
Private VPS nodes, not managed cloud. Docker for containerized services. Every app has its own repo and its own deployment process, version-controlled from day one.
Docker · Nginx · Ubuntu LTS · WireGuard · GitHub
AI Infrastructure

AI that operates, not just suggests

The AI isn't a chat window that gives advice. It has direct access to the infrastructure — reads logs, edits files, queries databases, deploys apps. It operates alongside the work, not outside it.

🔗  Custom MCP Server
14 Tool Modules. Live Fleet Access.
A self-hosted Model Context Protocol server gives AI sessions direct access to every part of the stack — filesystem, databases, deployments, DNS, monitoring, GitHub. No copy-pasting. No context loss.
Filesystem · Deployments · Database · DNS · Monitoring · GitHub · Secrets
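
The shape of one tool module, as a minimal sketch against the MCP TypeScript SDK. The server name, tool name, and log path are illustrative, not the fleet's real modules.

import { readFile } from "node:fs/promises";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "fleet-ops", version: "1.0.0" });

// One tool module: lets an AI session read a service log straight off the host.
server.tool("read_log", { path: z.string() }, async ({ path }) => ({
  content: [{ type: "text", text: await readFile(path, "utf8") }],
}));

await server.connect(new StdioServerTransport());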
⚖️  Model Router
Right model for the right task.
A self-hosted proxy routes every AI request to the appropriate model based on complexity: cheap, fast models for simple tasks, heavyweight models only when the problem demands it. API costs stay rational at scale.
Hardness Routing · Local Inference · Multi-Model · Cost Optimization
📋  Session Context
Context that survives sessions.
Every project has a living architecture document that travels with each session. Every session ends with a log of what changed, what was decided, and what's next. The AI is always caught up from line one.
Master MD Files · Session Logs · Architecture Docs
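
One possible shape for a session log entry; the fields and their contents are illustrative, since the actual format isn't specified here.

## Session Log
Changed:  auth middleware moved into tRPC context
Decided:  rate limiting lives in Redis, not in-process
Next:     reconnect backoff for the socket layer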
🌐  Search Layer
Pre-scrubbed web results.
Web search runs through a dedicated search API that pre-scrubs and compresses results before they reach the AI. What used to cost thousands of tokens now costs a handful. Speed up, cost down.
Tavily · Token Efficiency · Live Web Data
Philosophy

How the work gets done

These aren't aspirational values. They're the actual constraints that every system here is built inside.

01 ──
If it runs, it's versioned.
Every script, service, and app on the fleet has a GitHub repo. Nothing runs without source control. No orphaned code, no undocumented one-offs.
02 ──
Fail fast. Don't mask errors.
Let it throw. Swallowed errors become production mysteries. Self-documenting code, clean naming, comments only where the logic is genuinely non-obvious.
03 ──
Build once. Run everywhere.
The first system for a business is the hard one. The architecture is designed so the second client deploys from configuration, not a new codebase.
04 ──
Zero SaaS for core ops.
Payments, databases, deployments, monitoring, job queues — all self-hosted. No recurring vendor lock-in for anything that sits on the critical path.
05 ──
AI is a force multiplier, not a magic button.
Structured sessions, full context, live infrastructure access. The AI does more because it knows more — not because the prompt was clever.
06 ──
Orchestrate specialists. Don't do everything yourself.
Search, inference, routing, deployment: each handled by a tool built for exactly that job. The developer coordinates; the tools execute.
Mental Models

How AI actually works

Understanding these mechanics is what separates someone who uses AI from someone who operates it.

📄  Context Windows
Every API call starts from scratch.
There's no persistent memory. Every time you send a message, the entire conversation is re-read from the top. The model isn't remembering — it's reading a growing transcript and predicting the next logical response.

This is why context management matters. As the conversation grows, the model burns more tokens just catching up on history — leaving less headroom for actual work. You want to operate inside the context window, not push against the edge of it.
Every time you send a message, it re-reads the whole conversation from the top. No memory. Just a growing transcript it has to catch up on before it can say anything back. The longer it gets, the more you're paying to re-explain yourself. — Eric Diaz
You: hi
Model reads: hi → replies: hi
You: how's your day?
Model reads: hi · hi · how's your day? → all of it, to reply: good.
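
The same mechanic as code. A minimal sketch assuming an OpenAI-style chat endpoint; client is a hypothetical wrapper, not a specific SDK.

type Turn = { role: "user" | "assistant"; content: string };

// hypothetical client; any chat-completion API has roughly this shape
declare const client: { chat(req: { messages: Turn[] }): Promise<string> };

const transcript: Turn[] = [];

async function send(userText: string): Promise<string> {
  transcript.push({ role: "user", content: userText });
  // The request carries every prior turn; there is no server-side memory.
  const reply = await client.chat({ messages: transcript });
  transcript.push({ role: "assistant", content: reply });
  return reply; // the next call re-sends all of the above, plus these two turns
}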
⚖️  Model Routing
Route by hardness, not by habit.
Not every prompt needs the most powerful model. A hardness-based router scores each request on complexity and routes it to the cheapest model capable of answering well.

Simple questions hit a fast, cheap model or local inference. Architecture problems hit something heavyweight. The workflow doesn't change — the routing happens underneath automatically.
How many dimples on a golf ball? Local · free
Summarize this log file. Fast · cheap
Write a TypeScript migration for this schema. Mid-tier
Design the architecture for a multi-tenant platform. Heavyweight
↳ scored per-request · model swapped transparently · costs stay rational at scale
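
A minimal sketch of the idea behind the examples above. The keyword heuristic and tier names are stand-ins; a production router would score hardness with a classifier, not regexes.

type Tier = "local" | "fast" | "mid" | "heavy";

// Toy hardness score: longer, architecture-flavored prompts score higher.
function hardness(prompt: string): number {
  let score = Math.min(prompt.length / 500, 1);
  if (/\b(architecture|design|multi-tenant|migration)\b/i.test(prompt)) score += 1;
  if (/\b(summarize|list|how many)\b/i.test(prompt)) score -= 0.5;
  return score;
}

function route(prompt: string): Tier {
  const h = hardness(prompt);
  if (h < 0.2) return "local"; // free, on-box inference
  if (h < 0.5) return "fast";  // cheap hosted model
  if (h < 1.0) return "mid";   // capable general model
  return "heavy";              // frontier model, only when the problem demands it
}

route("How many dimples on a golf ball?");                     // "local"
route("Design the architecture for a multi-tenant platform."); // "heavy"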
🎼  Orchestration
You're the conductor, not the musician.
The next level of AI use isn't better prompts — it's orchestrating specialists. Instead of asking one model to do everything, you bring in purpose-built tools for each job and let the AI coordinate them.

Search is handled by a scraping API. Routing is handled by a proxy. Local inference handles cheap tasks for free. The AI focuses on reasoning and judgment. You define the goal and let the system execute it.
You're not the one swinging the hammer. You're the general contractor. You know which subcontractor to call, when to call them, and how to get out of their way. That's the whole job. — Eric Diaz
🎯
You
Define the goal. Direct the work.
🧠
AI Model
Reasoning, judgment, architecture.
⚡
Fast Model
Bulk tasks. Cheap and quick.
🌐
Search API
Web results, pre-compressed.
🔗
MCP Server
Live infrastructure access.
🏠
Local Inference
Free tier. Zero API cost.
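
The conductor loop from the cards above, as a sketch. Every interface here is hypothetical; the point is that the coordinator picks a specialist and delegates rather than doing the work itself.

// Hypothetical specialist interfaces, one per role above.
interface Specialists {
  search(query: string): Promise<string>;            // pre-scrubbed web results
  infer(prompt: string): Promise<string>;            // routed model call
  ops(tool: string, args: object): Promise<string>;  // MCP fleet access
}

// Decide which specialist a step needs, then get out of its way.
async function runStep(step: string, s: Specialists): Promise<string> {
  if (step.startsWith("research:")) return s.search(step.slice(9));
  if (step.startsWith("deploy:")) return s.ops("deploy", { app: step.slice(7) });
  return s.infer(step); // reasoning and judgment stay with the model
}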
🔍  Token Efficiency
Pre-scrub the internet. Don't waste tokens on it.
Standard web search makes an AI fetch entire pages, burn thousands of tokens reading them, then extract one useful sentence. A search API like Tavily does the extraction first — it hands the AI a compressed, relevant result instead of a raw HTML dump.

Same answer. A fraction of the cost. This is the difference between a tool that does one thing perfectly and one that does everything expensively.
Raw Search
AI reads full pages
~2,000 tokens
Search API
Pre-scrubs & compresses
buffer layer
Clean Result
AI gets the answer
~3 tokens
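
The flow above as one concrete call. A sketch against Tavily's documented REST interface; the endpoint and field names follow their public docs and are worth verifying before use.

const res = await fetch("https://api.tavily.com/search", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.TAVILY_API_KEY}`,
  },
  body: JSON.stringify({ query: "how many dimples on a golf ball", max_results: 3 }),
});

// Each result arrives pre-extracted: a title, a url, and a compressed
// content snippet instead of the page's raw HTML.
const { results } = (await res.json()) as {
  results: { title: string; url: string; content: string }[];
};
for (const r of results) console.log(`${r.title} → ${r.content.slice(0, 120)}`);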
Why burn two thousand tokens reading a web page when someone else already did it and can hand you the answer in three words? That's the whole idea. — Eric Diaz