The journal
Working notes from the workshop.
Showing 12 of 347 articles
- Cloud & Infrastructure
Caching LLM Responses: When It Helps, When It Hurts, and How to Implement It
LLM calls are slow and expensive. Caching them is the obvious move. But caching the wrong responses breaks the user experience in ways that are subtle and hard to debug. Here's a practical guide to doing it right.
LLM · Caching · Performance · AI
- AI Integration
Feature Flags for AI Features: Shipping Safely When Outputs Are Non-Deterministic
Rolling back a bad API endpoint takes seconds. Rolling back a bad LLM integration is harder: the damage may already be in your logs, your users' inboxes, or your clients' feeds. Feature flags are how you ship AI features without betting everything on launch day.
AI · Feature Flags · Production · Developer Tools
- AI Integration
LLM API Costs Are Out of Control: A Production Guide to Cutting Your Bill
AI features ship fast. Then the monthly API bill arrives. Here's a systematic approach to understanding and reducing LLM costs without breaking the product.
AI · LLM · Cost Optimization · API
- Business
Scoping AI Projects for Clients: The Questions That Prevent Expensive Mistakes
Most AI project failures start at the scoping stage. The client wants "AI integration." The agency quotes a price. Nobody defines what that actually means. Here's how to scope these projects properly.
Agency · AI · Business · Client Management
- Technology
uv: How We Replaced pip, poetry, and pyenv With One Tool
uv is a Python package manager written in Rust that handles dependencies, virtual environments, and Python version management. We've been using it across all our projects since early 2026. Here's what actually changed.
Python · Developer Tools · uv · Package Manager
- AI Integration
Writing AI IDE Rules That Actually Work: Cursor, Windsurf, and Copilot
The AI IDE tools everyone uses share a feature most developers set up once and never tune: custom rules. Here's how to write rules that change how the tool generates code, not just what it says it will do.
AI · Developer Tools · Cursor · Productivity
- Web Development
HTTP/3 and QUIC in 2026: When to Enable It and What to Expect
HTTP/3 is in production at every major CDN and supported by all modern browsers. Whether it actually helps your application depends on factors most guides don't explain.
Web Development · Performance · HTTP · Networking
- AI Integration
LLM Structured Outputs in 2026: Reliable JSON Without the Parser Nightmares
Getting a language model to return valid, schema-conforming JSON is harder than it looks. Here's what works in production, from native structured output APIs to library-level validation.
LLM · AI · JSON · Python
- Business
What an AI Feature Actually Costs: The Budget Lines Nobody Plans For
Every AI integration budget starts with API costs and ends with surprises. Here's what production AI features actually cost once you account for everything the initial estimate missed.
Business · AI · Agency · Pricing
- Cloud & Infrastructure
Zero-Downtime Database Migrations: A Field Guide for Production Systems
ALTER TABLE locks your table. Your migration takes longer than expected. Users get errors. Here's how to handle schema changes without interrupting production traffic.
Database · PostgreSQL · Performance · Production
- Business
Fixed Price vs Time and Materials: The Contract Decision That Shapes Every Project
The choice between fixed-price and time-and-materials contracts is one of the most consequential decisions in an agency-client relationship. Each model transfers risk differently. Here's how to decide which one fits your project.
Agency · Business · Pricing · Contracts
- Cloud & Infrastructure
OpenTelemetry for AI Applications: Observability When Your Stack Thinks for Itself
Traditional monitoring tells you a request took 800ms. It doesn't tell you the LLM spent 600ms on a bad prompt, returned a hallucinated answer, and burned $0.04 in tokens. Here's how to actually instrument AI applications with OpenTelemetry.
OpenTelemetry · Observability · AI · Monitoring