Over a weekend in April, I built and deployed a self-hosted news aggregation system — a pipeline that pulls from twenty sources every morning, deduplicates items using vector embeddings, generates AI summaries, clusters primary sources with their commentary, and serves everything through a web dashboard. I am not a software engineer. My daily tools are PowerPoint, SharePoint, and the occasional Excel formula. I built this using AI as my primary collaborator: Claude for design and specification, Cowork for execution, and ChatGPT and Gemini for troubleshooting along the way.
This post is a departure from the usual subject matter here, which tends to focus on judicial decisions, ethics rules, and the architectural properties of large language models. But the project illustrates something I think my readers need to see in concrete terms: the specification-first, verification-heavy workflow I have been describing in prior posts applies well beyond AI-generated legal work. The same method — specify before you build, verify in a separate session, treat every AI output as a first draft — transfers directly to infrastructure, and the skills it demands are the skills lawyers already possess.
The problem
My work sits at the intersection of AI, legal practice, legal education, and knowledge management. The relevant news comes from everywhere: judicial opinions on CourtListener, Substack newsletters from researchers like Ethan Mollick, legal tech blogs like Bob Ambrogi's LawSites, ABA ethics opinions, podcast episodes, regulatory filings, and mainstream tech coverage. No single aggregator covers this spread. I was spending fragmented time across a dozen tabs each morning, often reading the same story covered by three different outlets, and had no systematic way to archive items I wanted to revisit.
I needed a single dashboard with short digests of each item, the ability to mark things as read or important, an archive for later retrieval, and — critically — some way to reduce duplicates and cluster related coverage together. A judicial opinion and the three blog posts analyzing it should appear as one entry, not four.
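The deduplication-and-clustering requirement can be sketched in a few lines: embed each item, then greedily group items whose embedding similarity crosses a threshold. This is an illustrative sketch, not the actual pipeline code — the function names, the toy vectors, and the 0.9 threshold are all assumptions for demonstration (the real system uses vector embeddings from an embedding model, not hand-written lists).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def cluster_items(embeddings, threshold=0.85):
    """Greedy clustering: each item joins the first existing cluster
    whose representative is similar enough, else starts its own.
    Returns a list of clusters (lists of item indices)."""
    clusters = []
    for i, emb in enumerate(embeddings):
        for cluster in clusters:
            rep = embeddings[cluster[0]]  # first member acts as representative
            if cosine_similarity(emb, rep) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Toy vectors standing in for real embeddings: the first two items are
# near-duplicate coverage of one story, the third is unrelated.
items = ["Court opinion", "Blog post analyzing the opinion", "Unrelated tool launch"]
embs = [[1.0, 0.0, 0.1], [0.98, 0.05, 0.12], [0.0, 1.0, 0.0]]
print(cluster_items(embs, threshold=0.9))  # → [[0, 1], [2]]
```

The greedy approach is crude but sufficient at fifty items per day; the payoff is exactly the behavior described above — the opinion and its commentary collapse into one dashboard entry.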
Brainstorming and specification in Claude Chat
I started with a plain-language description of the problem in a Claude chat conversation. No code, no architecture diagrams — just "help me brainstorm and then design a way to collect in a single place all news relevant to my work."
Claude proposed three approaches ranging from off-the-shelf (Feedly) to fully custom (Python pipeline), and recommended a middle path: n8n as the orchestration layer, Claude's own API for summarization, and a lightweight web dashboard. We went back and forth on delivery format (I chose a dashboard over email), volume tolerance (fifty items per day with aggressive summarization), and hosting (a small cloud VPS rather than my home NAS).
The key output of this conversation was not code. It was a complete project specification document: architecture diagrams, database schema, Docker Compose configuration, the exact LLM prompts for summarization and source clustering, a phased implementation plan, and a rationale log explaining every major design decision. Claude also produced an interactive React prototype of the dashboard so I could evaluate the UX before committing to implementation.
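To give a flavor of what the schema portion of such a spec looks like, here is a minimal sketch of the item store using SQLite. The table and column names are illustrative assumptions, not the actual schema from my spec, but they capture the states the dashboard needs: read, starred, archived, and a cluster id that lets primary sources and their commentary render as one entry.

```python
import sqlite3

# In-memory database for illustration; the real system persists to disk
# (or a server database) on the VPS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    id          INTEGER PRIMARY KEY,
    source      TEXT NOT NULL,         -- feed or site the item came from
    url         TEXT NOT NULL UNIQUE,  -- exact-URL dedup as a first pass
    title       TEXT NOT NULL,
    summary     TEXT,                  -- AI-generated digest
    cluster_id  INTEGER,               -- items covering one story share a cluster
    is_read     INTEGER DEFAULT 0,
    is_starred  INTEGER DEFAULT 0,
    is_archived INTEGER DEFAULT 0,
    fetched_at  TEXT DEFAULT CURRENT_TIMESTAMP
);
""")

# A primary source and a commentary piece share cluster 1, so the
# dashboard can render them as a single entry.
conn.executemany(
    "INSERT INTO items (source, url, title, cluster_id) VALUES (?, ?, ?, ?)",
    [
        ("CourtListener", "https://example.org/opinion", "Opinion in X v. Y", 1),
        ("LawSites", "https://example.org/analysis", "Analyzing X v. Y", 1),
    ],
)
count = conn.execute(
    "SELECT COUNT(*) FROM items WHERE cluster_id = 1"
).fetchone()[0]
print(count)  # → 2
```

Writing this kind of artifact down before implementation is precisely what made each later Cowork task unambiguous.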
I then ran adversarial testing on the spec — probing for gaps, challenging assumptions, and verifying that the proposed architecture could handle edge cases. This step deserves emphasis. A specification document generated by an AI is a first draft, and I treated it the way I treat a student's first attempt at a research memo: read it critically, push on the weak spots, and send it back for revision. The judgment-delegation framework I described in a prior post applies here with equal force. I asked Claude to generate options, structures, and tradeoff analyses. I did not ask it to decide which architecture was "best" — I evaluated the alternatives myself, using the spec as my working document.
Execution in Cowork
Once the spec was solid, I shifted to Cowork — Anthropic's tool for delegating discrete tasks to Claude with file and computer access. Where Chat is a conversation, Cowork is closer to handing a task to a capable assistant along with the relevant documents.
I created a separate task conversation for each major implementation step: provisioning the VPS, installing Docker, configuring the reverse proxy, building the n8n ingestion workflow, standing up the database, and assembling the dashboard. This separation kept each conversation focused and prevented the context from getting muddled — the one-task-one-conversation principle applied to infrastructure work just as it applies to legal analysis. When I needed to reference an earlier decision, I pulled it from the spec document rather than expecting the model to recall a conversation from three tasks ago.
At each stage, I provided Claude with the project spec and asked it to execute the next phase. When things broke — and they broke often — I gave Claude screenshots, error messages, and terminal output. The pattern was consistent: describe the failure, show the evidence, get a diagnosis and fix.
Troubleshooting across models
When I hit infrastructure problems — Docker permission errors, Caddy configuration syntax issues, n8n's authentication flow not matching its documentation — I did not rely on Claude alone. I also used ChatGPT and Gemini for troubleshooting.
Different models have different strengths. Some error messages got faster, more accurate diagnoses from one model than another. When I was stuck on a Caddyfile syntax problem that had Claude and me going in circles, a fresh perspective from a different model identified the issue immediately. The practical lesson is one I have already argued in the context of sycophancy: when a model's output confirms your existing approach and you are still stuck, a second model operating without that conversational history can surface what the first one missed. Treat AI models the way you would treat colleagues with overlapping but non-identical expertise.
What went wrong
Docker permissions tripped me up repeatedly because I skipped a post-installation step the guide told me to perform. n8n's authentication system has changed since its documentation — and Claude's training data — was written, and figuring out the current approach required stripping out configuration and resetting data volumes. The VPS ran out of memory under load: three services on one gigabyte of RAM was not viable, and I had to hard-reboot through DigitalOcean's web console and resize to a larger instance. Caddy's subpath routing created cookie and redirect conflicts that were cleanest to resolve by giving n8n its own domain — a design compromise I would not have predicted at the specification stage.
Two observations about these failures. First, every one of them was solvable without engineering expertise. They required patience, the ability to read an error message and describe it clearly, and willingness to try a different approach when the first one did not work. Second, several stemmed from stale training data — the model's knowledge of how n8n handles authentication was outdated, and its assumptions about memory requirements did not match current resource demands. The lesson echoes what I have written about verifying AI-generated legal analysis: the model produces confident output regardless of whether the underlying information is current, and the user bears the burden of checking.
What worked
The specification-first approach saved significant time during implementation. Because the architecture, schema, prompts, and deployment configuration were all documented before I touched a server, each implementation step had a clear target. I was not making design decisions and debugging Docker at the same time.
The interactive dashboard prototype — built during the brainstorming phase, before any backend existed — let me validate the UX early. I could see exactly how source clustering would look, how topic filters would work, and how the read/starred/archived states would behave. Changing a UI decision at the prototype stage costs nothing; changing it after you have built the API costs real time.
Creating separate Cowork conversations per task kept the context clean. AI models perform better with focused context than with a sprawling conversation that covers everything from database schema to CSS styling. This is the infrastructure equivalent of the one-task-one-conversation rule, and it worked for the same reasons.
What this demonstrates for legal professionals
The skills that made this project work were not technical. They were the skills I use in teaching and legal scholarship every day: defining a problem with precision, evaluating a proposed solution against requirements, spotting gaps in reasoning, describing failures with enough specificity to enable diagnosis, and — critically — knowing when to seek a second opinion.
The specification document was the most valuable artifact of the entire project, more valuable than the code or the running system. It captures the reasoning behind every design decision. When something breaks six months from now, the spec explains why the system was built this way and what tradeoffs were accepted. A good transactional lawyer does the same thing when she documents not just the deal terms but the logic behind them. The discipline is identical; the domain is different.
I used three different models over the course of the project. Each contributed something the others did not. Each also made mistakes — outdated configuration syntax, deprecated API references, architecture assumptions that did not survive contact with the actual infrastructure. The skill that determined whether those mistakes derailed the project or became minor obstacles was the same skill that determines whether a lawyer catches a bad case citation: the habit of verifying before relying.
The cost of the entire system is modest. The VPS runs at twelve dollars per month. API costs for daily summarization are under five dollars per month. The AI tools I used are available on consumer-tier subscriptions. The scarcest resource was my time and attention — roughly a weekend's worth of focused work, spread across several sessions.
The broader point
Every post on this blog has argued, in one form or another, that using AI well requires the same professional skills that using any powerful tool well requires: clear delegation, critical evaluation, structured verification, and the judgment to know what to trust and what to check. This project tested that thesis outside the legal domain, and the thesis held. The workflow that produced a working news aggregation system is the same workflow I recommend for producing a reliable contract analysis: specify before you execute, keep your context focused, verify in a separate session, and never let the model's confidence substitute for your own judgment.
What surprised me was not that AI could help a non-engineer build infrastructure — the marketing promises that much. What surprised me was how precisely the failure modes mapped onto the ones I have been writing about in the legal context. Stale training data produced the same kind of confident-but-wrong output that produces hallucinated case citations. Sycophantic confirmation of my initial approach delayed the fix for the Caddyfile problem by the same mechanism that delays a lawyer's recognition that her legal theory has a hole. The mitigation strategies were identical: adversarial prompting, fresh sessions, and a reflexive distrust of agreement.
This post describes a personal project using Claude, Cowork, ChatGPT, and Gemini to build a self-hosted news aggregation system. The infrastructure runs on DigitalOcean, uses n8n for workflow orchestration, and serves content through a Caddy reverse proxy. The specification-first approach and verification strategies discussed here build on the frameworks described in prior posts on context management, judgment delegation, and sycophancy.