Foundation Operations  ·  Stack & Workflow  ·  How We Build

Not vibe-coding.
Structured systems.
Deliberate tooling.

Every platform here runs on the same foundation — a self-hosted server fleet, a custom AI infrastructure layer, and a development workflow where context doesn't get lost between sessions. This is how it works.

$fo dev --status
● Stack: React · TypeScript · Node · Python · PostgreSQL
● Infrastructure: self-hosted · zero SaaS dependencies for core ops
● AI: custom MCP server · multi-model routing · full fleet access
Core Stack

What everything is built on

Same foundation across every project. No framework churn — pick tools that work in production and go deep on them.

⚛️  Frontend
React + TypeScript
Modern React with full type safety end-to-end. Component libraries, real-time sockets, complex state — built to last, not to demo.
React 19 · TypeScript · Tailwind v4 · Vite · tRPC
🟩  Backend
Node.js + Python
Node for production APIs and real-time services. Python for automation, data pipelines, and scraping. Job queues for async workloads that can't block a request cycle.
Node.js · Express · Python · Socket.IO · Job Queues
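
The queue pattern in miniature. A minimal sketch assuming BullMQ over Redis; the actual queue library isn't named here, and generateReport stands in for the real workload.

import { Queue, Worker } from "bullmq";

const connection = { host: "localhost", port: 6379 };

// hypothetical function standing in for the slow task
declare function generateReport(userId: number): Promise<void>;

// The API route only enqueues and returns immediately...
const reports = new Queue("reports", { connection });
await reports.add("generate", { userId: 42 });

// ...while a separate worker process runs the slow part off the request cycle.
new Worker(
  "reports",
  async (job) => {
    await generateReport(job.data.userId);
  },
  { connection },
);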
🗄️  Database
PostgreSQL
Relational first — 140+ tables across production. Vector extensions for AI-powered search. The database is the source of truth, not the afterthought.
PostgreSQL 16 · pgvector · Redis · Drizzle ORM
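
What vector search looks like at the query level. A minimal sketch assuming node-postgres and pgvector; the docs table and its embedding column are hypothetical names for this example.

import { Pool } from "pg";

const pool = new Pool(); // connection settings come from PG* env vars

// Nearest-neighbour search: <=> is pgvector's cosine-distance operator.
async function semanticSearch(queryEmbedding: number[], limit = 5) {
  const { rows } = await pool.query(
    "SELECT id, title FROM docs ORDER BY embedding <=> $1::vector LIMIT $2",
    [`[${queryEmbedding.join(",")}]`, limit],
  );
  return rows;
}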
🐳  Infrastructure
Self-Hosted Fleet
Private VPS nodes, not managed cloud. Docker for containerized services. Every app has its own repo and its own deployment process, version-controlled from day one.
Docker · Nginx · Ubuntu LTS · WireGuard · GitHub
AI Infrastructure

AI that operates, not just suggests

The AI isn't a chat window that gives advice. It has direct access to the infrastructure — reads logs, edits files, queries databases, deploys apps. It operates alongside the work, not outside it.

🔗  Custom MCP Server
14 Tool Modules. Live Fleet Access.
A self-hosted Model Context Protocol server gives AI sessions direct access to every part of the stack — filesystem, databases, deployments, DNS, monitoring, GitHub. No copy-pasting. No context loss.
Filesystem · Deployments · Database · DNS · Monitoring · GitHub · Secrets
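
The shape of one tool module, as a minimal sketch against the MCP TypeScript SDK. The server name, tool name, and log path are illustrative, not the fleet's real modules.

import { readFile } from "node:fs/promises";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "fleet-ops", version: "1.0.0" });

// One tool module: lets an AI session read a service log straight off the host.
server.tool("read_log", { path: z.string() }, async ({ path }) => ({
  content: [{ type: "text", text: await readFile(path, "utf8") }],
}));

await server.connect(new StdioServerTransport());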
⚖️  Model Router
Right model for the right task.
A self-hosted proxy routes every AI request to the appropriate model based on complexity: cheap, fast models for simple tasks, heavyweight models only when the problem demands it. API costs stay rational at scale.
Hardness Routing · Local Inference · Multi-Model · Cost Optimization
📋  Session Context
Context that survives sessions.
Every project has a living architecture document that travels with each session. Every session ends with a log of what changed, what was decided, and what's next. The AI is always caught up from line one.
Master MD Files · Session Logs · Architecture Docs
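
One possible shape for a session log entry; the fields and their contents are illustrative, since the actual format isn't specified here.

## Session Log
Changed:  auth middleware moved into tRPC context
Decided:  rate limiting lives in Redis, not in-process
Next:     reconnect backoff for the socket layer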
🌐  Search Layer
Pre-scrubbed web results.
Web search runs through a dedicated search API that pre-scrubs and compresses results before they reach the AI. What used to cost thousands of tokens now costs a handful. Speed up, cost down.
Tavily · Token Efficiency · Live Web Data
Philosophy

How the work gets done

These aren't aspirational values. They're the actual constraints that every system here is built inside.

01 ──
If it runs, it's versioned.
Every script, service, and app on the fleet has a GitHub repo. Nothing runs without source control. No orphaned code, no undocumented one-offs.
02 ──
Fail fast. Don't mask errors.
Let it throw. Swallowed errors become production mysteries. Self-documenting code, clean naming, comments only where the logic is genuinely non-obvious.
03 ──
Build once. Run everywhere.
The first system for a business is the hard one. The architecture is designed so the second client deploys from configuration, not a new codebase.
04 ──
Zero SaaS for core ops.
Payments, databases, deployments, monitoring, job queues — all self-hosted. No recurring vendor lock-in for anything that sits on the critical path.
05 ──
AI is a force multiplier, not a magic button.
Structured sessions, full context, live infrastructure access. The AI does more because it knows more — not because the prompt was clever.
06 ──
Orchestrate specialists. Don't do everything yourself.
Search, inference, routing, deployment: each handled by a tool built for exactly that job. The developer coordinates; the tools execute.
Mental Models

How AI actually works

Understanding these mechanics is what separates someone who uses AI from someone who operates it.

📄  Context Windows
Every API call starts from scratch.
There's no persistent memory. Every time you send a message, the entire conversation is re-read from the top. The model isn't remembering — it's reading a growing transcript and predicting the next logical response.

This is why context management matters. As the conversation grows, the model burns more tokens just catching up on history — leaving less headroom for actual work. You want to operate inside the context window, not push against the edge of it.
Every time you send a message, it re-reads the whole conversation from the top. No memory. Just a growing transcript it has to catch up on before it can say anything back. The longer it gets, the more you're paying to re-explain yourself. — Eric Diaz
You: hi
Model reads: hi → replies: hi
You: how's your day?
Model reads: hi · hi · how's your day? → all of it, to reply: good.
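
The same mechanic as code. A minimal sketch assuming an OpenAI-style chat endpoint; client is a hypothetical wrapper, not a specific SDK.

type Turn = { role: "user" | "assistant"; content: string };

// hypothetical client; any chat-completion API has roughly this shape
declare const client: { chat(req: { messages: Turn[] }): Promise<string> };

const transcript: Turn[] = [];

async function send(userText: string): Promise<string> {
  transcript.push({ role: "user", content: userText });
  // The request carries every prior turn; there is no server-side memory.
  const reply = await client.chat({ messages: transcript });
  transcript.push({ role: "assistant", content: reply });
  return reply; // the next call re-sends all of the above, plus these two turns
}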
⚖️  Model Routing
Route by hardness, not by habit.
Not every prompt needs the most powerful model. A hardness-based router scores each request on complexity and routes it to the cheapest model capable of answering well.

Simple questions hit a fast, cheap model or local inference. Architecture problems hit something heavyweight. The workflow doesn't change — the routing happens underneath automatically.
How many dimples on a golf ball? Local · free
Summarize this log file. Fast · cheap
Write a TypeScript migration for this schema. Mid-tier
Design the architecture for a multi-tenant platform. Heavyweight
↳ scored per-request · model swapped transparently · costs stay rational at scale
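
A minimal sketch of the idea behind the examples above. The keyword heuristic and tier names are stand-ins; a production router would score hardness with a classifier, not regexes.

type Tier = "local" | "fast" | "mid" | "heavy";

// Toy hardness score: longer, architecture-flavored prompts score higher.
function hardness(prompt: string): number {
  let score = Math.min(prompt.length / 500, 1);
  if (/\b(architecture|design|multi-tenant|migration)\b/i.test(prompt)) score += 1;
  if (/\b(summarize|list|how many)\b/i.test(prompt)) score -= 0.5;
  return score;
}

function route(prompt: string): Tier {
  const h = hardness(prompt);
  if (h < 0.2) return "local"; // free, on-box inference
  if (h < 0.5) return "fast";  // cheap hosted model
  if (h < 1.0) return "mid";   // capable general model
  return "heavy";              // frontier model, only when the problem demands it
}

route("How many dimples on a golf ball?");                     // "local"
route("Design the architecture for a multi-tenant platform."); // "heavy"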
🎼  Orchestration
You're the conductor, not the musician.
The next level of AI use isn't better prompts — it's orchestrating specialists. Instead of asking one model to do everything, you bring in purpose-built tools for each job and let the AI coordinate them.

Search is handled by a scraping API. Routing is handled by a proxy. Local inference handles cheap tasks for free. The AI focuses on reasoning and judgment. You define the goal and let the system execute it.
You're not the one swinging the hammer. You're the general contractor. You know which subcontractor to call, when to call them, and how to get out of their way. That's the whole job. — Eric Diaz
🎯
You
Define the goal. Direct the work.
🧠
AI Model
Reasoning, judgment, architecture.
⚡
Fast Model
Bulk tasks. Cheap and quick.
🌐
Search API
Web results, pre-compressed.
🔗
MCP Server
Live infrastructure access.
🏠
Local Inference
Free tier. Zero API cost.
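
The conductor loop from the cards above, as a sketch. Every interface here is hypothetical; the point is that the coordinator picks a specialist and delegates rather than doing the work itself.

// Hypothetical specialist interfaces, one per role above.
interface Specialists {
  search(query: string): Promise<string>;            // pre-scrubbed web results
  infer(prompt: string): Promise<string>;            // routed model call
  ops(tool: string, args: object): Promise<string>;  // MCP fleet access
}

// Decide which specialist a step needs, then get out of its way.
async function runStep(step: string, s: Specialists): Promise<string> {
  if (step.startsWith("research:")) return s.search(step.slice(9));
  if (step.startsWith("deploy:")) return s.ops("deploy", { app: step.slice(7) });
  return s.infer(step); // reasoning and judgment stay with the model
}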
🔍  Token Efficiency
Pre-scrub the internet. Don't waste tokens on it.
Standard web search makes an AI fetch entire pages, burn thousands of tokens reading them, then extract one useful sentence. A search API like Tavily does the extraction first — it hands the AI a compressed, relevant result instead of a raw HTML dump.

Same answer. A fraction of the cost. This is the difference between a tool that does one thing perfectly and one that does everything expensively.
Raw Search
AI reads full pages
~2,000 tokens
Search API
Pre-scrubs & compresses
buffer layer
Clean Result
AI gets the answer
~3 tokens
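
The flow above as one concrete call. A sketch against Tavily's documented REST interface; the endpoint and field names follow their public docs and are worth verifying before use.

const res = await fetch("https://api.tavily.com/search", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.TAVILY_API_KEY}`,
  },
  body: JSON.stringify({ query: "how many dimples on a golf ball", max_results: 3 }),
});

// Each result arrives pre-extracted: a title, a url, and a compressed
// content snippet instead of the page's raw HTML.
const { results } = (await res.json()) as {
  results: { title: string; url: string; content: string }[];
};
for (const r of results) console.log(`${r.title} → ${r.content.slice(0, 120)}`);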
Why burn two thousand tokens reading a web page when someone else already did it and can hand you the answer in three words? That's the whole idea. — Eric Diaz