Latest Posts
Transformers from First Principles: Part 2 - Embeddings
A deep dive into how transformers convert token IDs into numerical representations through embeddings. We cover token embeddings, positional embeddings, the embedding dimension tradeoff, and why models struggle at the edges of their context window. This is the second post in a series explaining transformers.
Transformers from First Principles: Part 1 - Tokenization
A deep dive into how transformers process text, starting with the first step: tokenization. We cover vocabularies, the tradeoffs of vocabulary size, special tokens, and edge cases like anomalous tokens. This is the first post in a series explaining transformers.
Using Structured Output with LLMs
This post covers three approaches to getting structured output from LLMs: use native structured output APIs when available, fall back to function calling for broader provider support, or combine careful prompting with validation as a universal solution.
Building a RAG from scratch
Learn to build a complete Retrieval-Augmented Generation (RAG) system from scratch using Python. This hands-on guide covers everything from document chunking and embeddings to similarity search and LLM integration, with interactive 3D visualizations showing how semantic search actually works under the hood.
Optimizing Inference for Image Generation Models: Memory Tricks and Quantization
Let's explore how I ran FLUX.1 Dev, an image generation model, with only 20% of the VRAM it normally requires. Through quantization and memory optimization techniques, I'll show you practical strategies that make high-quality image generation accessible on consumer GPUs, complete with performance benchmarks and real-world examples.
Agentic and workflow example implementations
Practical examples of agentic and workflow-based AI patterns, with code and design decisions inspired by Anthropic's research. Learn how to implement direct LLM calls, prompt chaining, and routing workflows while keeping your systems simple and maintainable.
Vibe check for Claude 3.7 Sonnet
Anthropic's Claude 3.7 Sonnet introduces extended thinking and a visible thought process, and posts impressive benchmark results. Here are my first impressions and why this model feels like a big step forward for practical AI.
What is the deal with Agentic AI systems?
Are agentic AI systems always the answer? In this post, I explore why simplicity often beats complexity, when agent-based architectures make sense, and why you should reach for agents only when truly needed.