<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[AI Architect's Playbook]]></title><description><![CDATA[The playbook for engineers becoming AI technical leaders. Real lessons from architecting AI-native capabilities inside IBM Db2: vector search, LLMs, RAG, agents.]]></description><link>https://aiarchitectplaybook.com</link><image><url>https://substackcdn.com/image/fetch/$s_!JxUw!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3426cdf-38a8-42c7-88f0-f09d389aec58_1300x1300.jpeg</url><title>AI Architect&apos;s Playbook</title><link>https://aiarchitectplaybook.com</link></image><generator>Substack</generator><lastBuildDate>Sun, 31 May 2026 18:23:14 GMT</lastBuildDate><atom:link href="https://aiarchitectplaybook.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Shaikh Quader]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[aiarchplaybook@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[aiarchplaybook@substack.com]]></itunes:email><itunes:name><![CDATA[Shaikh Quader]]></itunes:name></itunes:owner><itunes:author><![CDATA[Shaikh Quader]]></itunes:author><googleplay:owner><![CDATA[aiarchplaybook@substack.com]]></googleplay:owner><googleplay:email><![CDATA[aiarchplaybook@substack.com]]></googleplay:email><googleplay:author><![CDATA[Shaikh Quader]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Why Your RAG Pipeline Needs a Re-Ranker]]></title><description><![CDATA[If your RAG pipeline returns chunks that are close to your question but not quite right, a re-ranker is probably the 
fix.]]></description><link>https://aiarchitectplaybook.com/p/why-your-rag-pipeline-needs-a-re</link><guid isPermaLink="false">https://aiarchitectplaybook.com/p/why-your-rag-pipeline-needs-a-re</guid><dc:creator><![CDATA[Shaikh Quader]]></dc:creator><pubDate>Sat, 23 May 2026 12:01:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Pdmt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a8357a-c086-4336-828a-35af01bd59fe_2752x1536.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pdmt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a8357a-c086-4336-828a-35af01bd59fe_2752x1536.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pdmt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a8357a-c086-4336-828a-35af01bd59fe_2752x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Pdmt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a8357a-c086-4336-828a-35af01bd59fe_2752x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Pdmt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a8357a-c086-4336-828a-35af01bd59fe_2752x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Pdmt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a8357a-c086-4336-828a-35af01bd59fe_2752x1536.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Pdmt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a8357a-c086-4336-828a-35af01bd59fe_2752x1536.jpeg" width="1456" height="813" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9a8357a-c086-4336-828a-35af01bd59fe_2752x1536.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3579188,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/198788395?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a8357a-c086-4336-828a-35af01bd59fe_2752x1536.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Pdmt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a8357a-c086-4336-828a-35af01bd59fe_2752x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Pdmt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a8357a-c086-4336-828a-35af01bd59fe_2752x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Pdmt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a8357a-c086-4336-828a-35af01bd59fe_2752x1536.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!Pdmt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a8357a-c086-4336-828a-35af01bd59fe_2752x1536.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>If you&#8217;ve built a basic RAG pipeline, you&#8217;ve done this: chunk your documents, embed them with a model like <code>text-embedding-3-small</code>, store the vectors, and at query time embed the user&#8217;s question and pull back the top-k most similar chunks.</p><p>It works. Sort of. 
Then one day a user asks a reasonable question, and your pipeline returns chunks that are <em>about</em> the right topic but don&#8217;t answer the question.</p><p>Sometimes worse. The user asks &#8220;Is ibuprofen safe to take during late pregnancy?&#8221; The pipeline returns this chunk as the top match:</p><blockquote><p><em>&#8220;Ibuprofen is widely used and considered safe for general pain relief.&#8221;</em></p></blockquote><p>The chunk that actually answers the question sits two ranks below:</p><blockquote><p><em>&#8220;Ibuprofen is not recommended after 20 weeks of pregnancy due to fetal kidney risks.&#8221;</em></p></blockquote><p>Both chunks are about ibuprofen safety, so vector search ranks both highly. But only one answers the question the user asked. The pipeline can&#8217;t tell the difference, and confidently hands back the wrong chunk.</p><p>This post explains why that happens, and how re-rankers fix it.</p><p>I published a companion GitHub <a href="http://github.com/shaikhq/rag-reranker-demo">repo</a> with a runnable notebook that demonstrates three common ways vector search fails (negation, missed constraints, topic-over-answer) and shows the re-ranker fixing them.</p><h2>The Library Analogy</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jhRv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24177875-a890-4f29-a1f7-7081101ed330_1024x434.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jhRv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24177875-a890-4f29-a1f7-7081101ed330_1024x434.png 424w, 
https://substackcdn.com/image/fetch/$s_!jhRv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24177875-a890-4f29-a1f7-7081101ed330_1024x434.png 848w, https://substackcdn.com/image/fetch/$s_!jhRv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24177875-a890-4f29-a1f7-7081101ed330_1024x434.png 1272w, https://substackcdn.com/image/fetch/$s_!jhRv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24177875-a890-4f29-a1f7-7081101ed330_1024x434.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jhRv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24177875-a890-4f29-a1f7-7081101ed330_1024x434.png" width="1024" height="434" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/24177875-a890-4f29-a1f7-7081101ed330_1024x434.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:434,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:739442,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/198788395?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24177875-a890-4f29-a1f7-7081101ed330_1024x434.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!jhRv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24177875-a890-4f29-a1f7-7081101ed330_1024x434.png 424w, https://substackcdn.com/image/fetch/$s_!jhRv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24177875-a890-4f29-a1f7-7081101ed330_1024x434.png 848w, https://substackcdn.com/image/fetch/$s_!jhRv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24177875-a890-4f29-a1f7-7081101ed330_1024x434.png 1272w, https://substackcdn.com/image/fetch/$s_!jhRv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F24177875-a890-4f29-a1f7-7081101ed330_1024x434.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>A giant library. Every book has many sections, each holding facts, claims, definitions, maybe answers to questions someone will ask one day.</p><p>You need to make it searchable. So you hire a librarian to summarize every section on a small card. The catch: she writes the cards before any reader walks in, with no idea what readers will ask.</p><p>What&#8217;s the best she can do? Write a high-level summary. &#8220;This section discusses the causes of type 2 diabetes.&#8221; &#8220;This section covers lithium-ion battery thermal runaway.&#8221;</p><p>These cards capture the topic. They don&#8217;t capture every fact, every constraint, every nuance. A one-line summary loses detail by definition.</p><p>That&#8217;s what an embedding model does. Feed it a chunk of text and you get back a vector: a list of numbers, usually between 384 and 1536 long, depending on the model. The vector is the digital summary card. It captures roughly what the chunk is about, written before any question is asked.</p><p>A reader walks in with a question. She writes that on a card too. The librarian compares the question card against every section card and hands back the closest matches.</p><p>That&#8217;s vector search. Embed the question, compute similarity against the pre-computed chunk vectors, return the top matches.</p><h2>How Embedding Models Work, and Where They Break</h2><p>An embedding model is a transformer <strong>encoder</strong>, a model that reads text and produces numbers that represent it. In vector search, the encoder runs twice: once on each chunk (ahead of time, to build the index) and once on each query (at search time). 
The two outputs (vectors) are then compared by a similarity score computed outside the model.</p><p>This architecture is called a <strong>bi-encoder</strong>. The model never sees the query and chunk together.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M0k5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02dd0a85-c58a-4b16-9d0b-a3322ca3ae1c_1707x420.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!M0k5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02dd0a85-c58a-4b16-9d0b-a3322ca3ae1c_1707x420.png 424w, https://substackcdn.com/image/fetch/$s_!M0k5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02dd0a85-c58a-4b16-9d0b-a3322ca3ae1c_1707x420.png 848w, https://substackcdn.com/image/fetch/$s_!M0k5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02dd0a85-c58a-4b16-9d0b-a3322ca3ae1c_1707x420.png 1272w, https://substackcdn.com/image/fetch/$s_!M0k5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02dd0a85-c58a-4b16-9d0b-a3322ca3ae1c_1707x420.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!M0k5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02dd0a85-c58a-4b16-9d0b-a3322ca3ae1c_1707x420.png" width="1456" height="358" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/02dd0a85-c58a-4b16-9d0b-a3322ca3ae1c_1707x420.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:358,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:56768,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/198788395?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02dd0a85-c58a-4b16-9d0b-a3322ca3ae1c_1707x420.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!M0k5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02dd0a85-c58a-4b16-9d0b-a3322ca3ae1c_1707x420.png 424w, https://substackcdn.com/image/fetch/$s_!M0k5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02dd0a85-c58a-4b16-9d0b-a3322ca3ae1c_1707x420.png 848w, https://substackcdn.com/image/fetch/$s_!M0k5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02dd0a85-c58a-4b16-9d0b-a3322ca3ae1c_1707x420.png 1272w, https://substackcdn.com/image/fetch/$s_!M0k5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02dd0a85-c58a-4b16-9d0b-a3322ca3ae1c_1707x420.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The bi-encoder design is what makes vector search fast. You pre-compute every chunk vector once, store them in an index, and reuse them for every future query. 
At query time, you embed the question once, and an approximate nearest-neighbor index searches millions of vectors in milliseconds.</p><p>The architecture performs two compressions, both lossy:</p><p><strong>Chunks get compressed before anyone asks a question.</strong> A chunk holds several facts, possibly answers to several questions, plus context and noise. All of it gets squashed into one vector: roughly the aggregate of everything in the chunk, in the model&#8217;s coordinate space (often called &#8220;semantic space&#8221; because chunks with similar meanings land near each other). No fact gets its own representation. The model doesn&#8217;t know which fact will matter for a future question, so it can&#8217;t favor any.</p><p><strong>Questions get compressed without knowing which document they&#8217;ll meet.</strong> Same problem on the question side. Intent, constraints, framing: all reduced to one topical fingerprint. When you embed &#8220;What was Apple&#8217;s revenue in Q3 2024?&#8221;, the vector mostly represents something like &#8220;Apple revenue recent quarter.&#8221; The specific quarter, &#8220;Q3 2024&#8221;, barely registers.</p><p>You&#8217;re comparing two lossy fingerprints. Nothing in your pipeline ever looks at the raw question alongside the raw chunk.</p><h2>Three Common Ways Vector Search Fails</h2><p>The <a href="https://github.com/shaikhq/rag-reranker-demo">companion repo</a> demonstrates three canonical failure modes. Each maps to a query in the demo notebook.</p><p><strong>Negation gets lost.</strong></p><ul><li><p>Question: &#8220;Is ibuprofen recommended late in pregnancy?&#8221;</p></li><li><p>Chunk A: &#8220;Ibuprofen is widely used for general pain relief.&#8221;</p></li><li><p>Chunk B: &#8220;Ibuprofen is <em>not</em> recommended after 20 weeks of pregnancy due to fetal kidney risks.&#8221;</p></li></ul><p>Both chunks share the same vocabulary about ibuprofen, so vector search ranks them similarly. 
But only Chunk B answers the question about late pregnancy; Chunk A answers a different question, about general pain relief. The single word &#8220;not&#8221; that carries Chunk B&#8217;s warning tends to shift its vector only slightly, so the model can&#8217;t tell the difference.</p><p><strong>Constraints get washed out.</strong></p><ul><li><p>Question: &#8220;What was Apple&#8217;s revenue in Q3 2024?&#8221;</p></li><li><p>Chunk A: &#8220;Apple&#8217;s Q2 2023 revenue grew significantly, reflecting strong iPhone sales.&#8221;</p></li><li><p>Chunk B: &#8220;Apple reported Q3 2024 revenue of $85.8 billion.&#8221;</p></li></ul><p>Both chunks share the same vocabulary about Apple and quarterly revenue, so vector search ranks them similarly. But only Chunk B answers the question about Q3 2024; Chunk A answers a different question, about Q2 2023. The exact quarter barely registers in either vector.</p><p><strong>Topic match beats answer match.</strong></p><ul><li><p>Question: &#8220;How do I reset my password?&#8221;</p></li><li><p>Chunk A: &#8220;Password security is critical. Use strong passwords, enable 2FA, never share passwords...&#8221;</p></li><li><p>Chunk B: &#8220;To reset your password, click &#8216;Forgot Password&#8217; on the login page...&#8221;</p></li></ul><p>Both chunks share the same vocabulary about passwords, so vector search ranks them similarly. But only Chunk B answers the question about resetting a password; Chunk A answers a different question, about password security best practices.</p><p>All three failures trace back to one source: vector search ranks chunks by topical overlap, but topical overlap doesn&#8217;t tell you which question each chunk actually answers. The fix is in the architecture.</p><h2>The Re-Ranker</h2><p>The architectural fix is to bring the question and the chunk <em>inside</em> the same model. Instead of comparing two vectors outside the model, a re-ranker reads the question and a candidate chunk together in one forward pass, and scores how well the chunk answers that question. 
It compares them word by word.</p><p>This architecture is called a <strong>cross-encoder</strong>, because the query and passage attend across each other inside the model. A re-ranker is the API around it: you send a query and a list of candidates, and the service pairs the query with each candidate internally, runs the cross-encoder on every pair, and returns the sorted list.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YpyL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ab221a-bc63-4fb1-ac2f-c4c670668f96_1648x348.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YpyL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ab221a-bc63-4fb1-ac2f-c4c670668f96_1648x348.png 424w, https://substackcdn.com/image/fetch/$s_!YpyL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ab221a-bc63-4fb1-ac2f-c4c670668f96_1648x348.png 848w, https://substackcdn.com/image/fetch/$s_!YpyL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ab221a-bc63-4fb1-ac2f-c4c670668f96_1648x348.png 1272w, https://substackcdn.com/image/fetch/$s_!YpyL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ab221a-bc63-4fb1-ac2f-c4c670668f96_1648x348.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YpyL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ab221a-bc63-4fb1-ac2f-c4c670668f96_1648x348.png" width="1456" height="307" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f4ab221a-bc63-4fb1-ac2f-c4c670668f96_1648x348.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:307,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:36892,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/198788395?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ab221a-bc63-4fb1-ac2f-c4c670668f96_1648x348.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YpyL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ab221a-bc63-4fb1-ac2f-c4c670668f96_1648x348.png 424w, https://substackcdn.com/image/fetch/$s_!YpyL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ab221a-bc63-4fb1-ac2f-c4c670668f96_1648x348.png 848w, https://substackcdn.com/image/fetch/$s_!YpyL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ab221a-bc63-4fb1-ac2f-c4c670668f96_1648x348.png 1272w, https://substackcdn.com/image/fetch/$s_!YpyL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ab221a-bc63-4fb1-ac2f-c4c670668f96_1648x348.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The cross-encoder sees which chunk answers the asked question, even when several chunks share the same vocabulary: late-pregnancy safety vs. general use, Q3 2024 vs. 
Q2 2023, password-reset instructions vs. password best practices. It returns a relevance score for every pair: higher means more relevant.</p><p>If you were searching by hand, you&#8217;d do the same thing. You wouldn&#8217;t write summary cards in advance. You&#8217;d grab a stack of candidate books, sit down with the question in front of you, and read each passage with the question in mind. Then you&#8217;d rank them.</p><h3>Why joint processing works</h3><p>The cross-encoder&#8217;s joint processing, reading the query and passage in one forward pass, works because of the transformer&#8217;s attention mechanism. At every layer, each token weighs its relationship to every other token in the input. With the question and a chunk in the same input, the model can line up &#8220;Q3 2024&#8221; in the question against &#8220;Q3 2024&#8221; in one chunk (a match) or &#8220;Q2 2023&#8221; in another (a mismatch), and score each pair accordingly.</p><p>A bi-encoder runs two separate forward passes. The model has no chance to compare words across them. The comparison happens at the end, between two finished vectors, after the word-level details have been compressed away.</p><p>Joint processing keeps those word-level details available at the moment the model judges relevance.</p><h3>How does the model know which words are query and which are passage (answer chunk)?</h3><p>If both texts share one input, how does the model tell query from passage?</p><p>Two special tokens. <code>[CLS]</code> marks the start of the input. <code>[SEP]</code> marks the boundary between query and passage, and appears again to close the input. (These are the BERT names; RoBERTa-family models use <code>&lt;s&gt;</code> and <code>&lt;/s&gt;</code> in the same roles.) The combined input looks like this:</p><pre><code>[CLS] is ibuprofen safe during late pregnancy ? [SEP] ibuprofen is not recommended after 20 weeks . [SEP]</code></pre><p>The model processes both texts together, but the separator (and, in BERT-style models, a per-token segment embedding) tells it which words came from the query and which came from the passage.</p><h3>Input and output</h3><p>A re-ranker typically takes 10 to 100 candidate passages per query. 
It returns a relevance score per passage. Some models return values between 0 and 1; others return unbounded numbers (raw logits), often in the -10 to +10 range. In both cases, higher means more relevant. Most re-ranker APIs also return each passage&#8217;s original index alongside its score, so you know which score belongs to which passage.</p><p>You sort by score and keep the top N for your LLM.</p><p>One constraint: most cross-encoders cap input at 512 tokens for the combined query plus passage. If your chunks are long, the end of the passage gets truncated, which quietly degrades quality. Keep chunks under about 400 tokens, or pick a longer-context re-ranker like <code>bge-reranker-v2-m3</code>, which supports inputs up to 8192 tokens (though it&#8217;s fine-tuned at 1024, so very long inputs see diminishing returns).</p><h3>Bi-encoder and cross-encoder, side by side</h3><p>The bi-encoder and the cross-encoder are built on the same transformer encoder, used differently. The bi-encoder runs it separately on the query and on each chunk, then compares the vectors outside the model; the cross-encoder runs it once per query-passage pair and judges relevance inside.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PQSj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db78a71-a6e4-41e3-952f-8eae000379bd_821x575.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PQSj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db78a71-a6e4-41e3-952f-8eae000379bd_821x575.png 424w, https://substackcdn.com/image/fetch/$s_!PQSj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db78a71-a6e4-41e3-952f-8eae000379bd_821x575.png 848w, 
https://substackcdn.com/image/fetch/$s_!PQSj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db78a71-a6e4-41e3-952f-8eae000379bd_821x575.png 1272w, https://substackcdn.com/image/fetch/$s_!PQSj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db78a71-a6e4-41e3-952f-8eae000379bd_821x575.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PQSj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db78a71-a6e4-41e3-952f-8eae000379bd_821x575.png" width="821" height="575" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8db78a71-a6e4-41e3-952f-8eae000379bd_821x575.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:575,&quot;width&quot;:821,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:210337,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/198788395?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db78a71-a6e4-41e3-952f-8eae000379bd_821x575.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PQSj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db78a71-a6e4-41e3-952f-8eae000379bd_821x575.png 424w, 
https://substackcdn.com/image/fetch/$s_!PQSj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db78a71-a6e4-41e3-952f-8eae000379bd_821x575.png 848w, https://substackcdn.com/image/fetch/$s_!PQSj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db78a71-a6e4-41e3-952f-8eae000379bd_821x575.png 1272w, https://substackcdn.com/image/fetch/$s_!PQSj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8db78a71-a6e4-41e3-952f-8eae000379bd_821x575.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>The bi-encoder gives up precision to search at scale. The cross-encoder gives up scale to score with precision. Each makes up for the other&#8217;s weakness, which is why production RAG pipelines use both, in that order.</p><p>In your pipeline, the re-ranker call sits between vector search and the LLM:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!A407!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F259bc94f-f9b0-45c3-b632-b537158e303e_834x297.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!A407!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F259bc94f-f9b0-45c3-b632-b537158e303e_834x297.png 424w, https://substackcdn.com/image/fetch/$s_!A407!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F259bc94f-f9b0-45c3-b632-b537158e303e_834x297.png 848w, https://substackcdn.com/image/fetch/$s_!A407!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F259bc94f-f9b0-45c3-b632-b537158e303e_834x297.png 1272w, https://substackcdn.com/image/fetch/$s_!A407!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F259bc94f-f9b0-45c3-b632-b537158e303e_834x297.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!A407!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F259bc94f-f9b0-45c3-b632-b537158e303e_834x297.png" width="834" 
height="297" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/259bc94f-f9b0-45c3-b632-b537158e303e_834x297.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:297,&quot;width&quot;:834,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37356,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/198788395?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F259bc94f-f9b0-45c3-b632-b537158e303e_834x297.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!A407!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F259bc94f-f9b0-45c3-b632-b537158e303e_834x297.png 424w, https://substackcdn.com/image/fetch/$s_!A407!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F259bc94f-f9b0-45c3-b632-b537158e303e_834x297.png 848w, https://substackcdn.com/image/fetch/$s_!A407!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F259bc94f-f9b0-45c3-b632-b537158e303e_834x297.png 1272w, https://substackcdn.com/image/fetch/$s_!A407!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F259bc94f-f9b0-45c3-b632-b537158e303e_834x297.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h2>Why Not Use Re-Rankers for Everything?</h2><p>A bi-encoder lets you pre-compute every chunk&#8217;s vector once and index it. At query time, you embed the question once and search across millions of vectors in milliseconds.</p><p>A re-ranker can&#8217;t be pre-computed. Every (query, passage) pair runs through the full model at query time. Re-ranking a million chunks per query would take hours.</p><p>So you combine them:</p><ol><li><p><strong>Retrieval</strong> with a bi-encoder. Pull 50 to 100 candidates from the full corpus. Cast a wide net. The goal is recall: make sure the right chunks sit <em>somewhere</em> in those 100. (Recall measures how often the right answer appears in your results at all; precision measures how often the top result is right.)</p></li><li><p><strong>Re-ranking</strong> with a cross-encoder. 
Score each of the 50 to 100 candidates against the question. Sort. Keep the top 5 or 10 for the LLM.</p></li></ol><p>Speed at the recall stage, precision at the ranking stage. The two-stage approach mirrors the human librarian: narrow with rough signals, then read closely to pick the winners.</p><h2>Why Not Just Let the LLM Filter for Relevance?</h2><p>A reasonable question: why add a re-ranker at all? Why not send the top 50 chunks from vector search straight to the LLM and ask it to use only the relevant ones?</p><p>You can. It loses on accuracy, cost, and latency.</p><p><strong>Accuracy.</strong> LLMs are bad at picking the most relevant chunks when many candidates sit in the same context window. The &#8220;Lost in the Middle&#8221; paper (Liu et al., 2023) showed that across GPT-3.5 and several other models, accuracy is highest when the relevant chunk sits at the very start or end of the context, and drops 20 to 30 percentage points when it sits in the middle. Follow-up work has confirmed the same pattern on GPT-4, Claude, and Gemini, even as their context windows grew past 100K tokens. A cross-encoder scores each (query, chunk) pair on its own. Position in a list never enters the math.</p><p><strong>Cost.</strong> Sending 50 chunks (about 25,000 tokens) to a frontier LLM on every query costs roughly 10x as much in LLM input tokens as re-ranking first and sending only the top 5 (about 2,500 tokens). At 10,000 queries a day, that is about 250 million input tokens versus 25 million, before the re-ranker&#8217;s own cost.</p><p><strong>Latency.</strong> Two stages are usually faster than one. A re-rank pass takes 100 to 300ms. Sending 25,000 tokens to a frontier LLM typically takes several seconds for the first token; 2,500 tokens takes one or two seconds. A re-rank pass plus the shorter LLM call ends up noticeably faster end to end.</p><p>A re-ranker is small, fast, and trained on millions of (query, doc, relevance) pairs to do one job. An LLM is trained to write fluent answers. 
Asking it to also act as a precise relevance classifier inside the same call moves the work to a slower, costlier, less accurate component.</p><p>The exception is small candidate sets. With only 3 to 5 chunks, position bias is mild and re-ranking adds little. Skip it. Past about 10 candidates, the gap opens fast.</p><h2>When a Re-Ranker Isn&#8217;t Worth It</h2><p>Re-rankers come at a cost. Before you add one, check whether that cost is justified:</p><ul><li><p><strong>Tight latency budgets.</strong> Re-ranking adds 100 to 500ms per query. If your end-to-end budget is under a second, that hurts.</p></li><li><p><strong>Mostly exact-keyword lookups.</strong> For product SKUs, error codes, and literal strings, BM25 (a keyword algorithm that scores documents by term frequency and rarity, used by most search engines before embeddings) on its own often beats vector search.</p></li><li><p><strong>High query volume.</strong> API re-rankers charge per call. At millions of queries per day, costs add up. Self-host an open-source re-ranker if volume is high.</p></li></ul><p>If none of these apply, and your current pipeline returns &#8220;close but not quite&#8221; results, a re-ranker is worth exploring.</p><h2>Re-Ranker Options</h2><p>A short list of re-rankers in active use. 
They fall into two buckets: hosted APIs (call an endpoint, pay per query) and open-weights models (download and run yourself).</p><p><strong>API-based:</strong></p><ul><li><p><strong>IBM watsonx.ai Rerank.</strong> Hosts several cross-encoder models.</p></li><li><p><strong>Cohere Rerank.</strong> Current models include <code>rerank-v3.5</code> (4096-token context) and <code>rerank-v4.0</code>.</p></li><li><p><strong>Jina Reranker.</strong> Available as API or self-hosted.</p></li><li><p><strong>Voyage Rerank.</strong> API-based; current models include <code>rerank-2</code> and <code>rerank-2.5</code>.</p></li></ul><p><strong>Open weights, self-host:</strong></p><ul><li><p><code>BAAI/bge-reranker-v2-m3</code><strong>.</strong> Multilingual; supports inputs up to 8192 tokens.</p></li><li><p><code>mixedbread-ai/mxbai-rerank-large-v2</code><strong>.</strong> Supports inputs up to 32K tokens.</p></li><li><p><code>cross-encoder/ms-marco-MiniLM-L-6-v2</code><strong>.</strong> 6-layer MiniLM cross-encoder trained on MS MARCO; commonly used as a starter model.</p></li></ul><h2>Summary</h2><p>Vector search compares fingerprints, not specifics. It compresses every chunk and every question into their vectors, then matches them by topical overlap. That works for &#8220;what is this about?&#8221; It fails when negations, exact constraints, or specific entities decide the answer.</p><p>A re-ranker reads the question and the chunk together, word by word, and scores how well one answers the other. Vector search narrows the corpus to about 50 candidates; the re-ranker picks the winners. <strong>Rough filtering, then careful reading.</strong></p><p>The payoff is concrete: a correct chunk ranked 30th by embedding similarity gets promoted to position 1 or 2 by the re-ranker. 
That&#8217;s the precision gap the cross-encoder recovers.</p><p>If your RAG pipeline returns &#8220;close but not quite&#8221; answers, try a re-ranker next.</p><h2>Try This</h2><ul><li><p>Clone the companion repo: <strong><a href="https://github.com/shaikhq/rag-reranker-demo">github.com/shaikhq/rag-reranker-demo</a></strong>. Run the notebook end to end, see the three failure modes, see them fixed.</p></li><li><p>Swap in your own PDF and rerun. The same lift usually shows up on real data.</p></li><li><p>Wire a re-ranker (watsonx.ai, Cohere, or a self-hosted BGE model) into your pipeline behind a feature flag.</p></li><li><p>Measure top-1 and top-5 accuracy on a small eval set, with and without re-ranking.</p></li><li><p>Tune the candidate count (50 vs 100 vs 200) and watch the precision-versus-latency curve.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[How I Grew as a Technical Leader by Learning From My Mistakes Leading Major AI Projects at IBM Db2]]></title><description><![CDATA[The five lessons that came from leading AI projects from concept to delivery &#8212; and how they reshaped my technical leadership.]]></description><link>https://aiarchitectplaybook.com/p/how-i-grew-as-a-technical-leader</link><guid isPermaLink="false">https://aiarchitectplaybook.com/p/how-i-grew-as-a-technical-leader</guid><dc:creator><![CDATA[Shaikh Quader]]></dc:creator><pubDate>Thu, 30 Apr 2026 16:31:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!FLNj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d939664-5a5c-438c-8360-51c1c838ad02_2610x1864.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!FLNj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d939664-5a5c-438c-8360-51c1c838ad02_2610x1864.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FLNj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d939664-5a5c-438c-8360-51c1c838ad02_2610x1864.jpeg 424w, https://substackcdn.com/image/fetch/$s_!FLNj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d939664-5a5c-438c-8360-51c1c838ad02_2610x1864.jpeg 848w, https://substackcdn.com/image/fetch/$s_!FLNj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d939664-5a5c-438c-8360-51c1c838ad02_2610x1864.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!FLNj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d939664-5a5c-438c-8360-51c1c838ad02_2610x1864.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FLNj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d939664-5a5c-438c-8360-51c1c838ad02_2610x1864.jpeg" width="1456" height="1040" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3d939664-5a5c-438c-8360-51c1c838ad02_2610x1864.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1040,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1573728,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/196006956?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d939664-5a5c-438c-8360-51c1c838ad02_2610x1864.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FLNj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d939664-5a5c-438c-8360-51c1c838ad02_2610x1864.jpeg 424w, https://substackcdn.com/image/fetch/$s_!FLNj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d939664-5a5c-438c-8360-51c1c838ad02_2610x1864.jpeg 848w, https://substackcdn.com/image/fetch/$s_!FLNj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d939664-5a5c-438c-8360-51c1c838ad02_2610x1864.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!FLNj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d939664-5a5c-438c-8360-51c1c838ad02_2610x1864.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>I walked out of that meeting with a long list of gaps and one thought in my head: how am I going to pull all of this together?</p><p>I had just presented my plan for adding vector storage and search to Db2 at the Architecture Review Board, the committee of senior technical leaders who must approve every major new project before a line of code is written. I had walked in confident. I walked out humbled. The committee had found problem after problem in my plan that I had not considered. The list kept growing. Each item was a real use case that a customer would hit, and my plan had nothing to say about any of them.</p><p>I had spent nineteen years building software at IBM. I thought I knew what the job required.</p><p>I was wrong. Not about the technology. 
About what it actually takes to own a major AI project from concept to delivery inside a large, mature software product.</p><p><strong>My Two Recent AI Projects at Db2</strong></p><p>In the past two years I built and delivered two major AI projects at Db2.</p><p>The first was vector storage and similarity search, built directly into the database engine. This lets AI applications store collections of vector data inside the database and search through it to find similar items quickly. It is a core capability for modern AI use cases like recommendation engines and semantic search.</p><p>The second was connecting large language model APIs to Db2. This project lets application developers call these AI models directly from inside the database using standard SQL commands. Application developers no longer need to write separate code outside the database to do this.</p><p>These projects ran from the initial proposal to the final delivery under my technical ownership. More than twenty developers across multiple teams. Tight deadlines. Five Db2 platforms and many more deployment configurations. Companies have used Db2 for over thirty years, so a mistake is not just a bug report. It breaks the daily workflow of teams that have depended on it for decades.</p><p>The first project shipped to customers last year. The second has completed development and is scheduled to ship in the next few months.</p><p>These two projects taught me more about technical leadership than the previous nineteen years combined. Here are the five lessons I took from them.</p><h2>Lesson 1: A new feature must work with everything the product already has and does.</h2><p>When we started building vector support, I thought the job was simple. Add something new. A new data type for storing vectors. A new way to search those stored vectors. Clean, contained work. 
Like building a new room in a house without touching the rest.</p><p>Users have built numerous business workflows around Db2&#8217;s existing features. There are tools for copying data out of the database to a file. There are tools for moving data from one database to another. There are SQL commands for every common task a user needs to do.</p><p>When we added the vector data type, users did not care about our internal design. They cared about their work. They expected the new vector type to work with all the same tools they already used every day with other data types. If the copy tool worked with the varchar data type, they expected it to work with vectors too.</p><p>We were not just adding something new. We were taking on the responsibility of fitting into everything that already existed, without breaking the ecosystem. I had not understood that responsibility before I walked into the Architecture Review Board.</p><p>That is where the list of gaps I described in the opening came from. Each gap the committee found was a real use case that a customer would hit, and my plan had nothing to say about any of them.</p><p>I went straight to the senior architects and most experienced developers. We went through the list together, gap by gap. I asked for their guidance. We rewrote the plan several times until it was solid. A new shared plan, built together with the senior developers in my squad.</p><p>The method that came from those sessions: find the existing feature in Db2 most similar to the one you are adding. Search the entire codebase for it. Look at every place in the code that handles that existing feature, one by one, and decide whether your new feature needs to do the same thing. This gives you a map before you write a line of code. In a product built over many years, the people who built it will always find problems you missed. Bring them in before you think you are ready.</p><h2>Lesson 2: A good design does not coordinate teams. 
A leader does.</h2><p>The second project, integrating large language model APIs into Db2, made this lesson unavoidable. The feature required work from three component teams in sequence.</p><p>To add one of the new functions, we first needed new entries added to the database catalog, a central store that holds information about every object in the database. The catalog team had to finish their work before the query compiler team could start theirs. The query compiler team had to finish before the runtime team could finish theirs. Each team was waiting on the team before them. Like links in a chain.</p><p>If the catalog team finished one week late, the compiler team was stuck for a week doing nothing useful. If the compiler team was late, the runtime team was stuck too. The whole project fell behind.</p><p>I did not track these connections carefully at the start of the project. I assumed each team would manage their own work and speak up when stuck. They did not. Halfway through the project I could see that work was piling up in the wrong order. Teams were waiting on each other but not saying so out loud. Things were quietly falling apart.</p><p>I stopped the usual work and took charge of tracking every dependency myself. I asked each team one direct question: what do you need from another team before you can finish your work? I asked this at every standup meeting. If any work was not helping clear a dependency, I stopped it and pointed the team at what actually needed to happen next.</p><p>When a dependency could not be fixed in time, we found a workaround. The runtime team needed specific data from the catalog team, but the catalog work was still two weeks away. We hardcoded a temporary number as a stand-in. The runtime team used that number to keep building, completing 80 percent of their work without waiting. Once the catalog work was done, we swapped the hardcoded number for the real value. The team kept moving. 
The deadline held.</p><p>As a technical leader, you have to find the dependencies between teams, identify them clearly, write them down, and make sure every team can see what they are waiting for, and what other teams are waiting for from them. Do this early, or the project will slow down for reasons that are invisible until it is too late.</p><h2>Lesson 3: A squad is not formed by putting people on the same project. It is built.</h2><p>I formed my squad by pulling developers from different teams inside Db2. Some were experts in the runtime layer. Some worked on the catalog. Some worked on the query compiler. Every one of them was skilled and experienced in their own component.</p><p>What I did not expect was the resistance: not to the project itself, but to working outside their own component. Developers who spend years on one part of a codebase become very attached to that component. When asked to work in a different part, some resist. Not because they are difficult people, but because it is unfamiliar and they do not want to take responsibility for components owned by another team.</p><p>In a project where every layer of the product needs to work together, that reluctance causes real problems.</p><p>I had to build a new team culture through action, not words. I asked a runtime developer to write test cases for the query compiler section. I asked a compiler developer to write code for the catalog section. Every time someone stepped outside their usual area and helped another team, I called it out and thanked them. I wanted that behavior to feel normal, not unusual.</p><p>There was pushback. One situation stands out. The runtime team was waiting on a piece of work from the query compiler team. But the compiler team was in the middle of something that, if interrupted, would slow the whole project. Simply stopping their work was not the answer.</p><p>So we went deeper. 
We identified the specific code paths shared between the runtime and query compiler components, the intersection where both teams had a stake. Then we brought a runtime developer into that shared territory. They worked on the overlapping piece directly, consulting the compiler developer as needed and including them in the pull request review. Nobody had to abandon their current work. The dependency got cleared by finding where the two domains met and working from that common ground.</p><p>That became a pattern. Whenever two components had a dependency, I looked for the intersection, the code or interface they shared, and found a way to bring both developers to that point together. It was slower than simply assigning tasks. It was faster than waiting for one team to finish before the other could start.</p><p>I also made a point to listen. When someone had a concern about working outside their component, I heard them out and worked with them to find a way to contribute that felt manageable. People work better toward a shared goal when they feel heard.</p><p>Slowly the pushback faded. The squad started acting like a team. Once that happened, the speed of the work increased noticeably.</p><p>You cannot create a real team just by putting people on the same project. You build one by giving people a shared goal, listening to them, and reminding them, over and over, of what they are building together, until working as one team feels more natural than working apart.</p><h2>Lesson 4: Testing runs alongside development. Not after it.</h2><p>We wrote the code. We ran tests on our own development machines. Everything passed. We delivered the code to the release branch. I thought we were ready.</p><p>What I did not know was that Db2 has a central automated testing system. Think of it as a shared dashboard that runs thousands of tests on every piece of delivered code across all platforms and all deployment setups, automatically and continuously. 
Anyone can look at it and see which tests are passing and which are failing.</p><p>I had never looked at it. I did not fully understand what it showed.</p><p>Our QA lead, <a href="https://www.linkedin.com/in/darrin-woodard-b8b58911b/">Darrin Woodard</a>, showed it to us one day. The first time I saw the results for our delivered code, they were bad. Test failures everywhere. Red marks across every platform and every setup we needed to support.</p><p>Db2 runs on five different OS platforms: standard Linux, Linux on Power, Linux on Z, AIX, and Windows. Our code had to work correctly on all five. Db2 also runs in different deployment setups. Some installations are spread across multiple nodes; others are optimized for particular workloads, such as analytical or transactional. Every setup type can expose different bugs.</p><p>I had not thought carefully about any of this when planning the project. I underestimated how many test cases we needed to write. I underestimated how long it would take to investigate each failure and fix it.</p><p>The Db2 Chief Architect, <a href="https://www.linkedin.com/in/mike-springgay-2638a0a8/">Mike Springgay</a>, regularly reminded me of this throughout the project. He consistently brought up the importance of estimating the QA effort separately from the development effort. Eventually that lesson landed. I began to treat QA planning as its own work stream, not an afterthought.</p><p>After seeing the QA dashboard for the first time, I asked the QA lead to join our regular standup meetings. Twice a week, we opened the dashboard together and looked at every new test failure for the features we had delivered for vector support. Before the meeting ended, every failure had a name next to it: the developer in my squad responsible for investigating and fixing it.</p><p>Week by week, the failures went down. More green, less red. 
Then, before our internal delivery date, everything was green.</p><p>Writing code that works on your own development machine is only half the job. The other half is proving it works on every platform, in every deployment setup, in every situation a user might encounter. Once that lesson landed, it changed how I planned every subsequent release. QA was no longer something that happened after the work. It was part of the work.</p><p>Looking back, Lesson 1 and Lesson 4 are two sides of the same coin. Lesson 1 brought ecosystem awareness to the design phase. Lesson 4 confirmed that the implementation actually honored that ecosystem. One is about seeing clearly before you build. The other is about proving it after you do.</p><h2>Lesson 5: Writing code made me a better technical leader.</h2><p>For a long time I believed that leading a technical project meant staying at the level of architecture and direction, and that keeping my distance from the code made me more effective.</p><p>It did not. It kept me comfortable.</p><p>I was running design meetings. Writing technical proposals. Talking with directors, architects, and customers.</p><p>But all of it kept me away from writing code for the actual product features we were shipping. The developers on my team were writing code every day. I was managing, reviewing designs, making decisions, and representing the project to people outside the team. I understood the system at a high level: what it did, how the pieces fit together. I did not understand it at the level where the real problems live: inside the code.</p><p>I gave myself a coding assignment. Real product code: the kind that gets reviewed, approved, merged into the main codebase, and shipped to customers.</p><p>I started with a relatively simple function: the vector dimension count function. This calculates the number of dimensions in an input vector. I had built an earlier version of this as a user-defined function, a separate script outside the database. 
Now I needed to build it properly as a native part of Db2 itself.</p><p>I wrote the SQL grammar rules that let users call the function. I wrote the code that actually runs the calculation. I built it the same way every developer on my team builds their features: design it, write it, test it, find the bugs, fix them, write more tests, repeat.</p><p>Implementing this first function took me nearly four weeks. I was not working on it full time. I was still doing everything else too. And I was learning this part of the codebase, much of it for the first time. I wrote notes about everything as I progressed: what I tried, what worked, what did not, what I learned. I kept a running log in the Git issue tracker. Screenshots. Code comments. A record of how the feature was built from nothing.</p><p>Then I sent the code for review. To a senior developer on my team, someone whose work I had been reviewing and approving for months.</p><p>He came back with more than fifteen things that needed to change.</p><p>I had thought the code was good. It ran. The tests passed on my machine. But &#8220;it runs&#8221; and &#8220;it is ready for shipping&#8221; are very different bars. A developer who has worked in this codebase for years sees things a newer person cannot see. I read every comment. I made every change. I sent it back. More comments. More changes. Several rounds. It was hard, not because the feedback was unfair, but because it was right.</p><p>Everything I wrote down during the first feature paid off on the second. The steps were similar. I had a working template for developing new features. Four weeks became two. On the third feature, more complex than the first two, I blocked time on my calendar and focused. I opened a pull request with complete, working code in just two days.</p><p>Not because the reviewers stopped looking carefully. 
Because the code I was writing was getting better.</p><p>I was no longer an outsider, a non-developer technical leader telling others what to build. I was one of them. One of the developers in the front line, co-creating the feature code. When I sat in design discussions, my questions were more specific, more tied to the actual code, more connected to how things would actually work. The team&#8217;s relationship with me changed too, not because of my job title, but because they had looked at my code line by line, told me what to fix, and watched me fix it the same way I asked them to fix their own work.</p><p>The day I stopped thinking of myself only as the leader and started being a developer again. That was the day I became a better leader.</p><h2>What I Know Now</h2><p>On any given day across these projects I was the architect, the project manager, the developer, the QA reviewer, the stakeholder manager, and the face of the project to customers and senior leadership. Sometimes all in the same day. That is what technical ownership of an AI project at this level actually looks like. It is not one job. It is many jobs, held together by a clear goal and the discipline to keep moving toward it.</p><p>These projects gave me something harder to build than technical knowledge: real confidence, strong working relationships across many teams, and the clarity to take on what comes next. I can now lead AI projects at a scale and complexity I could not have handled two years ago. There is more to build. I plan to build it.</p><p>When I walked out of that Architecture Review Board meeting with a long list of gaps, I did not have what I have now. I had nineteen years of experience and a plan that was not ready.</p><p>What I have now came from two years of doing the work, getting things wrong, and rebuilding my understanding one lesson at a time. It came from tracking dependencies no one else was tracking. From writing code that got sent back with fifteen things to fix. 
From saying the same thing to the same team dozens of times until it finally landed.</p><p>These are not lessons I read in a book or learned in a course. They are the kind that only come from doing the work, from standing in the middle of something hard and finding your way through it.</p><p>I shared them because I wish someone had shared them with me.</p>]]></content:encoded></item><item><title><![CDATA[I Spent One Hour Showing Engineering Students How to Use AI. This Is That Talk.]]></title><description><![CDATA[How I use AI to run a full-time role at IBM and a PhD at the same time, and the free AI tools every engineering student can start with today.]]></description><link>https://aiarchitectplaybook.com/p/i-spent-one-hour-showing-engineering</link><guid isPermaLink="false">https://aiarchitectplaybook.com/p/i-spent-one-hour-showing-engineering</guid><dc:creator><![CDATA[Shaikh Quader]]></dc:creator><pubDate>Tue, 14 Apr 2026 03:26:41 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/194145457/52afbb262392caf6bf2471f1f2f8fb5b.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Fifteen years ago I was on a 24/7 on-call rotation at IBM, pager always within reach. When it went off past midnight, I opened my laptop and started reading logs. No AI. No assistant. Nobody else awake. That investigation ran for as long as it took, sometimes until mid-morning.</p><p>Last week, I hit a bug in code I was writing for Db2. The same category of problem that once kept me up until mid-morning. I shared the error with an AI coding agent. Root cause identified. Fix applied. Done. Under two minutes. That is not a small improvement. That is a different kind of work.</p><p>Last Saturday I shared both of those stories with 35 university students from India, most of them computer science majors, preparing for careers in AI engineering. 
Then I showed them exactly what sits between those two moments: the tools, the habits, and the mindset shifts that made the difference.</p><p>In the session we covered three things. First, how I currently hold a full-time role as AI Architect at IBM Db2 and complete a PhD in AI at the same time, using AI tools to run both tracks in parallel. Second, three prompting habits that immediately change what you get back from any AI tool, including how I use AI when reading something new. Third, two free tools you can open today: Claude and NotebookLM.</p><p>The session also opened up questions from the audience that made it richer. One attendee asked how a student studying neuroscience, not computer science, can start using AI. We talked about how AI makes you a sharper learner, not just a faster coder. And we closed with three specific tasks you can act on this weekend.</p><p>If you are an engineering student preparing for a career in AI, this session was made for you.</p>]]></content:encoded></item><item><title><![CDATA[I Joined IBM With No Personal Operating System. 
So I Built One.]]></title><description><![CDATA[For students stepping into their first engineering role, this is the roadmap I wish I had on day one at IBM.]]></description><link>https://aiarchitectplaybook.com/p/i-joined-ibm-with-no-personal-operating</link><guid isPermaLink="false">https://aiarchitectplaybook.com/p/i-joined-ibm-with-no-personal-operating</guid><dc:creator><![CDATA[Shaikh Quader]]></dc:creator><pubDate>Tue, 07 Apr 2026 16:40:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!BFPq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7294b0a-9730-4f58-b544-fdf65f48c9d3_1456x1048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BFPq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7294b0a-9730-4f58-b544-fdf65f48c9d3_1456x1048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BFPq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7294b0a-9730-4f58-b544-fdf65f48c9d3_1456x1048.png 424w, https://substackcdn.com/image/fetch/$s_!BFPq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7294b0a-9730-4f58-b544-fdf65f48c9d3_1456x1048.png 848w, https://substackcdn.com/image/fetch/$s_!BFPq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7294b0a-9730-4f58-b544-fdf65f48c9d3_1456x1048.png 1272w, 
https://substackcdn.com/image/fetch/$s_!BFPq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7294b0a-9730-4f58-b544-fdf65f48c9d3_1456x1048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BFPq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7294b0a-9730-4f58-b544-fdf65f48c9d3_1456x1048.png" width="1456" height="1048" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b7294b0a-9730-4f58-b544-fdf65f48c9d3_1456x1048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1048,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2156725,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/193472725?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7294b0a-9730-4f58-b544-fdf65f48c9d3_1456x1048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BFPq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7294b0a-9730-4f58-b544-fdf65f48c9d3_1456x1048.png 424w, https://substackcdn.com/image/fetch/$s_!BFPq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7294b0a-9730-4f58-b544-fdf65f48c9d3_1456x1048.png 848w, 
https://substackcdn.com/image/fetch/$s_!BFPq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7294b0a-9730-4f58-b544-fdf65f48c9d3_1456x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!BFPq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7294b0a-9730-4f58-b544-fdf65f48c9d3_1456x1048.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>My first day at IBM was in 2005. 
I was a student intern, excited, prepared technically, and completely lost by the end of the day. Not lost in the code. Lost in everything around it. I joined full time shortly after graduating and the feeling did not go away. If anything it got sharper.</p><p>This is the post I wish had existed before I walked through those doors.</p><h2>Where this came from</h2><p>Before IBM I had done internships at smaller organizations in New Brunswick, a government office and a few small IT companies. Good experiences. But nothing that prepared me for what IBM actually was. A few friends had interned there before me but nobody told me what it was like on the inside. The only thing I thought I needed to prepare was my technical skills.</p><p>I was wrong.</p><p>Day one I was lost. Not after a few weeks of settling in. Day one. The scale of it, the complexity of it, the sheer number of things I did not understand hit me immediately. The people, the team dynamics, the processes, the unspoken expectations. Smart, experienced professionals all around me who seemed to know exactly how everything worked. I did not know how to navigate any of it. I did not know what was expected of me beyond the technical work. I did not know how to read the room, how to ask the right questions, or how to exist in that environment without feeling like I did not belong there.</p><p>The technology was never the problem. I could handle that.</p><p>What I couldn&#8217;t handle was everything else.</p><p>I struggled to communicate, really communicate. Listening in meetings without tuning out. Writing clearly under pressure. Speaking with any confidence in a room full of people who seemed to know exactly where they stood. Reading the room. Navigating across personalities. Knowing when to push and when to wait.</p><p>I didn&#8217;t know how to exist in that environment. I didn&#8217;t believe I belonged there.</p><p>I didn&#8217;t know the path forward. 
I just knew I was lost.</p><p>Three people changed that.</p><p>My supervisor <a href="https://www.linkedin.com/in/tahsinrouf/">Tahsin Rouf</a>, my first IBM manager Kent Klaas, and my second manager <a href="https://www.linkedin.com/in/ed-van-gennip-8964b11a/">Ed Van Gennip</a>. Their support in my early years at IBM was not incidental. It was foundational. The trajectory I have been able to build over the past 20 years, everything I have learned, taught, and contributed, sits on the base they helped me build. Tahsin saw something in me at exactly the right moment and offered guidance when I had no idea how to ask for it. Kent supported me consistently and introduced me to mind mapping, a thinking tool I still use today for planning and problem solving. Ed believed in me deeply and gave me the kind of continuous feedback, support, and direction that most early-career people never get. I want to name them here because people like this rarely get named. They should.</p><p>And I joined Toastmasters.</p><p>That decision opened something I did not expect. Not just better communication. A deep curiosity about the foundational skills nobody had ever taught me. How to learn better. How to think more critically. How to listen deeply, and build the kind of discipline that actually compounds over a career. I started taking IBM&#8217;s internal courses. I started building personal frameworks. I started testing things on myself.</p><p>Then came the moment that pulled everything together.</p><p>About a year or two into my time at IBM, I volunteered to run Think Friday. A weekly internal knowledge-sharing session for our team. I wanted to do it well, not just show up and lead it. So before I ran my first session I read a book called &#8220;<a href="https://www.amazon.ca/Not-Another-Meeting-Practical-Facilitating/dp/1555716326">Not Another Meeting</a>&#8221; by Frances A. Micale. That book gave me something specific: a method. 
How to prepare with a clear outcome in mind, how to structure the discussion, how to manage debate productively, how to take notes that were actually useful, and how to close with clear action items organized by topic and next step.</p><p>I ran Think Friday that way for two to three years.</p><p>Looking back, those sessions taught me more than I realized at the time. How to bring a group of people toward a common objective. How to hold a productive discussion without letting it drift. How to capture what actually matters and make it useful afterward. How to lead a room, not just occupy it.</p><p>That weekly meeting was a turning point. Not because it was high-stakes. Because it was consistent. I got the repetitions. And the confidence that came from those repetitions was real in a way that no single course or credential ever produced.</p><h2>What was missing</h2><p>Looking back, I was technically prepared and foundationally underprepared. Both things were true at the same time.</p><p>The technical skills got me hired. They kept me moving in the first few weeks. But the moment I needed to work across a team, recommend a clear solution, listen in a meeting without missing half of what was actually being said, or simply believe I had a right to be in the room, the technical skills went quiet. They had nothing to say.</p><p>Nobody had ever taught me those things. Not in school. Not in any course. Not in any internship prep session I had ever sat through.</p><p>I had to figure them out the hard way, mostly alone, mostly by making mistakes I did not understand until much later. The difference was that I eventually found people who believed in me, and I found a way to practice. Most students never find either.</p><h2>The framework, step by step</h2><p>After two decades of learning this myself and then teaching it to students, I have found the foundational skills fall into a small number of areas that show up again and again. 
These are not soft skills in the vague, resume-filler sense. They are specific and learnable.</p><ol><li><p><strong>Communication that actually works in a corporate environment.</strong> Not presentation polish. The ability to write a clear email, summarize a meeting, give feedback without creating defensiveness, and ask a question without sounding lost. This also includes thinking tools. Kent Klaas showed me mind mapping early in my career as a way to organize thinking before writing or speaking. It changed how I approach any problem that feels too big to tackle in a straight line. These skills are different from anything you practiced in school and they matter from day one.</p></li><li><p><strong>Learning how to learn on the job.</strong> School gives you structured problems with known answers. Work gives you ambiguous problems with moving targets. Building a personal system for learning new things fast, the way I built one by reading a single practical book before running my first meeting, is a skill in itself.</p></li><li><p><strong>Navigating the environment, not just the work.</strong> Understanding how decisions get made. Reading team dynamics. Knowing when to speak up and when to listen. This is not political in the negative sense. It is situational awareness, and it determines whether your technical work ever lands.</p></li><li><p><strong>Believing you belong there.</strong> This one sounds soft. It is not. Self-doubt is not a feeling to push through. It is a pattern that shapes every decision you make, who you ask for help, whether you volunteer for harder problems, how you handle feedback. The students who address it directly move faster than the ones who try to outwork it.</p></li><li><p><strong>Building discipline that compounds.</strong> The habits you build in your first year set the baseline for everything that follows. Not productivity hacks. 
The small consistent things: how you prepare for meetings, how you follow up, how you manage your own energy, how you stay curious when the work gets repetitive. Running Think Friday every week for two to three years was not glamorous. It was the repetition that built the skill.</p></li></ol><h2>The two assets most students skip</h2><p>Following this kind of framework produces two things that matter more than any project portfolio.</p><p>The first is self-awareness about where you actually are. Not where you think you are, not where you hope you are. A clear read on what is working and what is not, grounded in real situations from your actual job.</p><p>The second is a personal operating system. A small set of principles and habits you have tested on yourself and trust. Not borrowed from a book. Yours. Mine started with a practical meeting guide and a weekly Friday session. It grew from there.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fev6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff73037a0-17ed-407d-8792-055065e7ee95_1400x1152.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fev6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff73037a0-17ed-407d-8792-055065e7ee95_1400x1152.png 424w, https://substackcdn.com/image/fetch/$s_!fev6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff73037a0-17ed-407d-8792-055065e7ee95_1400x1152.png 848w, 
https://substackcdn.com/image/fetch/$s_!fev6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff73037a0-17ed-407d-8792-055065e7ee95_1400x1152.png 1272w, https://substackcdn.com/image/fetch/$s_!fev6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff73037a0-17ed-407d-8792-055065e7ee95_1400x1152.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fev6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff73037a0-17ed-407d-8792-055065e7ee95_1400x1152.png" width="1400" height="1152" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f73037a0-17ed-407d-8792-055065e7ee95_1400x1152.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1152,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:181187,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/193472725?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff73037a0-17ed-407d-8792-055065e7ee95_1400x1152.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fev6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff73037a0-17ed-407d-8792-055065e7ee95_1400x1152.png 424w, 
https://substackcdn.com/image/fetch/$s_!fev6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff73037a0-17ed-407d-8792-055065e7ee95_1400x1152.png 848w, https://substackcdn.com/image/fetch/$s_!fev6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff73037a0-17ed-407d-8792-055065e7ee95_1400x1152.png 1272w, https://substackcdn.com/image/fetch/$s_!fev6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff73037a0-17ed-407d-8792-055065e7ee95_1400x1152.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>Why this works</h2><p>Corporate environments, and AI teams in particular, move fast and reward people who can operate with ambiguity, communicate clearly under pressure, and keep building without waiting to be told what to do next.</p><p>The students I have seen thrive earliest are not always the most technically brilliant. They are the ones who understood early that the job was bigger than the code. They put time into the foundational skills deliberately, not because someone told them to, but because they could see the gap and decided to close it.</p><p>That gap is real. It is common. And unlike a lot of things in your career, it is entirely within your control to work on.</p><p>After 20 years at IBM, mentoring and helping more than 100 students across undergrad, masters, and PhD programs, I still see the same gap show up in the same way. Not because students are underprepared. Because nobody told them it existed.</p><p>That is exactly why I am sharing this with you now.</p>]]></content:encoded></item><item><title><![CDATA[The Step Most AI Portfolios Skip]]></title><description><![CDATA[If you are preparing for an AI role, you have likely built something solid. 
Here is what to add that shows you can think like a professional engineer.]]></description><link>https://aiarchitectplaybook.com/p/the-step-most-ai-portfolios-skip</link><guid isPermaLink="false">https://aiarchitectplaybook.com/p/the-step-most-ai-portfolios-skip</guid><dc:creator><![CDATA[Shaikh Quader]]></dc:creator><pubDate>Fri, 03 Apr 2026 21:47:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!DG3M!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277362a5-0c3e-43a8-8f1a-f251778c67a6_1024x559.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DG3M!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277362a5-0c3e-43a8-8f1a-f251778c67a6_1024x559.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DG3M!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277362a5-0c3e-43a8-8f1a-f251778c67a6_1024x559.png 424w, https://substackcdn.com/image/fetch/$s_!DG3M!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277362a5-0c3e-43a8-8f1a-f251778c67a6_1024x559.png 848w, https://substackcdn.com/image/fetch/$s_!DG3M!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277362a5-0c3e-43a8-8f1a-f251778c67a6_1024x559.png 1272w, https://substackcdn.com/image/fetch/$s_!DG3M!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277362a5-0c3e-43a8-8f1a-f251778c67a6_1024x559.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DG3M!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277362a5-0c3e-43a8-8f1a-f251778c67a6_1024x559.png" width="1024" height="559" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/277362a5-0c3e-43a8-8f1a-f251778c67a6_1024x559.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:559,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1218923,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/193115654?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277362a5-0c3e-43a8-8f1a-f251778c67a6_1024x559.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DG3M!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277362a5-0c3e-43a8-8f1a-f251778c67a6_1024x559.png 424w, https://substackcdn.com/image/fetch/$s_!DG3M!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277362a5-0c3e-43a8-8f1a-f251778c67a6_1024x559.png 848w, https://substackcdn.com/image/fetch/$s_!DG3M!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277362a5-0c3e-43a8-8f1a-f251778c67a6_1024x559.png 1272w, 
https://substackcdn.com/image/fetch/$s_!DG3M!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F277362a5-0c3e-43a8-8f1a-f251778c67a6_1024x559.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Students reach out to me regularly. They share their portfolios, ask for feedback, want to know why they are not hearing back from applications.</p><p>I look at their work. And almost every time, I see the same pattern.</p><p>A RAG pipeline. An AI agent. A classification model. Clean code, decent README, maybe a demo video. 
And then it stops there.</p><p>Building something real takes effort, and that effort is worth acknowledging. But there is a next step most students have not been shown, and that gap is what is holding them back.</p><p>That is what this post is about.</p><p></p><h3><strong>The question that separates candidates</strong></h3><p>When I review a portfolio as an AI architect, I am not only asking <em>can you build this?</em> I am also asking <em>do you know if what you built is any good?</em></p><p>Those are not the same question. And in my experience, most portfolios only answer the first one.</p><p>Think about what the job actually involves. You will not be handed tutorials to replicate. You will be handed a system that partially works and asked to improve it. Or you will build something from scratch and be responsible for demonstrating it meets a quality standard before it ships.</p><p>The question your portfolio needs to answer is: can you take something that partially works and make it measurably better?</p><p></p><h3><strong>What this looks like in practice</strong></h3><p>Let us use RAG as the example. It is the most common AI portfolio project I see right now, and a good one to learn from.</p><p>You built a RAG pipeline. It retrieves documents and generates answers. You tested it with a few queries and the responses looked reasonable.</p><p>But do you actually know how well it is working?</p><p>There is an open-source evaluation framework called <a href="https://github.com/explodinggradients/ragas">RAGAS</a>, published at EACL 2024, built specifically to evaluate RAG systems. It gives you measurable scores on dimensions that matter in production:</p><p><strong>Faithfulness</strong> measures whether the model&#8217;s answer is grounded in the retrieved documents, or whether it is generating claims the documents never supported. 
This is how you detect and quantify hallucinations.</p><p><strong>Answer Relevancy</strong> measures whether the response actually addresses what the user asked.</p><p><strong>Context Precision and Recall</strong> measure whether your retriever is surfacing the right document chunks in the first place.</p><p>Suppose you run your system through RAGAS and get a faithfulness score of 0.61. That means roughly 4 in 10 claims your system generates are not supported by the retrieved documents. For most production use cases, that is not acceptable.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!o3fP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8201c2c8-77c6-41a0-93ec-0acb4e73caff_1674x846.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!o3fP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8201c2c8-77c6-41a0-93ec-0acb4e73caff_1674x846.png 424w, https://substackcdn.com/image/fetch/$s_!o3fP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8201c2c8-77c6-41a0-93ec-0acb4e73caff_1674x846.png 848w, https://substackcdn.com/image/fetch/$s_!o3fP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8201c2c8-77c6-41a0-93ec-0acb4e73caff_1674x846.png 1272w, https://substackcdn.com/image/fetch/$s_!o3fP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8201c2c8-77c6-41a0-93ec-0acb4e73caff_1674x846.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!o3fP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8201c2c8-77c6-41a0-93ec-0acb4e73caff_1674x846.png" width="1456" height="736" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8201c2c8-77c6-41a0-93ec-0acb4e73caff_1674x846.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:736,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:99691,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/193115654?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8201c2c8-77c6-41a0-93ec-0acb4e73caff_1674x846.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!o3fP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8201c2c8-77c6-41a0-93ec-0acb4e73caff_1674x846.png 424w, https://substackcdn.com/image/fetch/$s_!o3fP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8201c2c8-77c6-41a0-93ec-0acb4e73caff_1674x846.png 848w, https://substackcdn.com/image/fetch/$s_!o3fP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8201c2c8-77c6-41a0-93ec-0acb4e73caff_1674x846.png 1272w, https://substackcdn.com/image/fetch/$s_!o3fP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8201c2c8-77c6-41a0-93ec-0acb4e73caff_1674x846.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><em>[What most portfolios show vs. what hiring managers are also looking for]</em></p><p>So you investigate. You find your chunk size is too large and the retriever is returning loosely related content alongside relevant content. You reduce the chunk size. Faithfulness moves to 0.74. You introduce a re-ranker to improve which chunks reach the model. Now it is 0.84.</p><p>You document every step. 
The baseline score, what you changed, why you made that change, and what improved as a result.</p><p>That documented process, not just the final system, is what demonstrates engineering thinking.</p><p></p><h3><strong>This applies beyond RAG</strong></h3><p>If you are building a classification model, it is worth asking: do you know your precision, recall, and F1 score, and do you understand which one matters most for your specific use case?</p><p>If it is a regression model, can you explain what your RMSE is telling you about your system&#8217;s real-world behaviour, beyond the formula itself?</p><p>These are not advanced topics. They are foundational to working with models professionally. And developing fluency with them early will serve you well throughout your career.</p><p>There is a principle worth holding on to: you cannot confidently claim something works if you have no definition of what working means.</p><p>I came across this idea early in my career through a book called <em>Exploring Requirements: Quality Before Design</em> by Donald Gause and Gerald Weinberg. I read it years ago and it shaped how I think about building systems. Their core argument is straightforward: define quality before you design, not after. The book was written in 1989 about software engineering. The principle applies just as directly to AI systems today.</p><p></p><h3><strong>What production-ready actually means</strong></h3><p>"Production-ready" gets mentioned constantly in job descriptions. Here is what it actually means in practice.</p><p><strong>Reliability</strong> means your system handles unexpected conditions without breaking. What happens when the retriever returns nothing useful? When an API call times out? A production system fails gracefully and recovers.</p><p><strong>Observability</strong> means you can see what is happening inside your system after deployment. 
Are you logging inputs, outputs, and quality scores in a way that lets you diagnose problems when they occur?</p><p><strong>Cost</strong> means your system stays economically viable at scale. How many LLM calls does it make per query? Does that remain reasonable if usage doubles or triples?</p><p><strong>Edge cases</strong> are the inputs your system handles poorly. Every system has them. The difference between a student project and a production-ready system is often just whether those edge cases were found and documented before users encountered them.</p><p>You do not need to have solved all of this as a student. No one expects that. But if your README shows you have thought about these dimensions, identified the limitations of your own system, and formed a reasoned view on what you would address first, that communicates something important about how you think. It signals that you are ready to operate on a real engineering team.</p><p>That kind of thinking is what I have seen make a genuine difference when students move from portfolio to interview to role.</p><p></p><h3><strong>One thing to do this week</strong></h3><p>Take any project in your portfolio. Add one section to the README called <em>Evaluation and Known Limitations.</em> Answer these three questions:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;markdown&quot;,&quot;nodeId&quot;:&quot;bc9f6429-0c58-4635-95e7-6c1ff676cc8e&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-markdown">## Evaluation and Known Limitations

**1. What metrics did you use, and what did the numbers show?**
e.g. "Faithfulness: 0.84 after two iterations. Context Recall: 0.71, identified as the next area to improve."

**2. Where does the system underperform, and under what conditions?**
e.g. "Retrieval degrades on queries shorter than 5 words. Fails gracefully but returns low-confidence responses."

**3. What would you address first to improve it, and why?**
e.g. "Improve Context Recall by experimenting with hybrid retrieval. The retriever is currently keyword-only."</code></pre></div><p>The engineers who stand out are not always the ones who built the most projects. They are the ones who can examine a system honestly, articulate where it falls short, and think clearly about how to make it better.</p><p>That is the skill worth developing. And it is one you can start demonstrating today.</p><div><hr></div><p><em>If you are preparing for an AI role and want feedback on your portfolio, feel free to reach out or leave a comment below. And if you have worked with evaluation frameworks or metrics in your own projects, I would be glad to hear what you found useful.</em></p>]]></content:encoded></item><item><title><![CDATA[How to build an AI portfolio project by extending someone else's tutorial]]></title><description><![CDATA[A step-by-step approach that produced three publishable assets from a single LangChain tutorial, and what engineering students can take from it.]]></description><link>https://aiarchitectplaybook.com/p/how-to-build-an-ai-portfolio-project</link><guid isPermaLink="false">https://aiarchitectplaybook.com/p/how-to-build-an-ai-portfolio-project</guid><dc:creator><![CDATA[Shaikh Quader]]></dc:creator><pubDate>Tue, 31 Mar 2026 15:30:04 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!L9cv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0773d489-bd93-48fd-a6d3-74907cdfa962_2432x1728.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L9cv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0773d489-bd93-48fd-a6d3-74907cdfa962_2432x1728.jpeg" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!L9cv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0773d489-bd93-48fd-a6d3-74907cdfa962_2432x1728.jpeg 424w, https://substackcdn.com/image/fetch/$s_!L9cv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0773d489-bd93-48fd-a6d3-74907cdfa962_2432x1728.jpeg 848w, https://substackcdn.com/image/fetch/$s_!L9cv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0773d489-bd93-48fd-a6d3-74907cdfa962_2432x1728.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!L9cv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0773d489-bd93-48fd-a6d3-74907cdfa962_2432x1728.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L9cv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0773d489-bd93-48fd-a6d3-74907cdfa962_2432x1728.jpeg" width="1456" height="1035" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0773d489-bd93-48fd-a6d3-74907cdfa962_2432x1728.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1035,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:786772,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/192731177?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0773d489-bd93-48fd-a6d3-74907cdfa962_2432x1728.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!L9cv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0773d489-bd93-48fd-a6d3-74907cdfa962_2432x1728.jpeg 424w, https://substackcdn.com/image/fetch/$s_!L9cv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0773d489-bd93-48fd-a6d3-74907cdfa962_2432x1728.jpeg 848w, https://substackcdn.com/image/fetch/$s_!L9cv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0773d489-bd93-48fd-a6d3-74907cdfa962_2432x1728.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!L9cv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0773d489-bd93-48fd-a6d3-74907cdfa962_2432x1728.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>If you are an engineering student preparing for an AI internship and not sure what to build for your portfolio, this post is for you.</p><p>I want to share an approach that turned a single open-source tutorial into three publishable assets over a couple of weeks. It is not a shortcut. 
It is a deliberate process, and it taught me more than starting from scratch would have.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KcTk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6a910a9-d70f-4b08-b1b3-2ee0afeb31fa_1440x424.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KcTk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6a910a9-d70f-4b08-b1b3-2ee0afeb31fa_1440x424.png 424w, https://substackcdn.com/image/fetch/$s_!KcTk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6a910a9-d70f-4b08-b1b3-2ee0afeb31fa_1440x424.png 848w, https://substackcdn.com/image/fetch/$s_!KcTk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6a910a9-d70f-4b08-b1b3-2ee0afeb31fa_1440x424.png 1272w, https://substackcdn.com/image/fetch/$s_!KcTk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6a910a9-d70f-4b08-b1b3-2ee0afeb31fa_1440x424.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KcTk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6a910a9-d70f-4b08-b1b3-2ee0afeb31fa_1440x424.png" width="1440" height="424" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a6a910a9-d70f-4b08-b1b3-2ee0afeb31fa_1440x424.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:424,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:53495,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/192731177?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6a910a9-d70f-4b08-b1b3-2ee0afeb31fa_1440x424.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KcTk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6a910a9-d70f-4b08-b1b3-2ee0afeb31fa_1440x424.png 424w, https://substackcdn.com/image/fetch/$s_!KcTk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6a910a9-d70f-4b08-b1b3-2ee0afeb31fa_1440x424.png 848w, https://substackcdn.com/image/fetch/$s_!KcTk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6a910a9-d70f-4b08-b1b3-2ee0afeb31fa_1440x424.png 1272w, https://substackcdn.com/image/fetch/$s_!KcTk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6a910a9-d70f-4b08-b1b3-2ee0afeb31fa_1440x424.png 1456w" sizes="100vw"></picture></div></a></figure></div><h2><strong>Where this came from</strong></h2><p>About a year ago, after a decade of working in AI and ML, I decided to learn agentic RAG properly. Not just read about it. Build something real with it.</p><p>I found a well-structured tutorial in the LangChain repo. It builds an agentic RAG system that loads web content, chunks it, indexes it for semantic search, and uses a LangGraph agent to decide when to retrieve versus when to answer directly. Clean, well-organized, a good foundation.</p><p>My first instinct was not to modify it. It was to understand the tutorial code completely.</p><p>So I ran it as written. Worked through every component. Traced each step in the workflow. Asked myself why each design choice was made, not just what it did. Why that retrieval strategy. 
How the agent decides when retrieved documents are relevant enough to generate an answer, and when to rewrite the query and try again. Where the boundaries between components are drawn and why.</p><p>That process took time. But it was the only reason I could later see where the gaps were. You cannot see what is missing in something you have only skimmed.</p><h2><strong>What the tutorial was missing</strong></h2><p>Once I had that foundation, the gaps became clear.</p><p>The tutorial used an in-memory vector store, fine for a prototype but not for production. It loaded content from public blog posts, useful for a demo but not connected to any real enterprise use case. There was no evaluation of retrieval quality, no baseline numbers to measure anything against, and no way to run any of it outside a Jupyter notebook.</p><p>Those were my extension targets.</p><p>I replaced the in-memory vector store with IBM Db2 as the vector database. Swapped the public blog content with a technical article I had written myself, so I could evaluate the outputs against answers I already knew. Added local embedding generation using a Granite model so the entire pipeline could run offline. Then decomposed the notebook into three production microservices: a data ingestion API, a search and generation API, and a unified gateway that runs both.</p><p>Having built RAG pipelines with my team, I can tell you that decomposition step is where most of the real engineering decisions surface. Error handling, input validation, service boundaries, modularity. None of that appears in a notebook.</p><p>My full repo is here: <a href="https://github.com/shaikhq/agentic-rag-db2">shaikhq/agentic-rag-db2</a></p><h2><strong>The framework, step by step</strong></h2><p>Here is the process in the order I followed it. 
It applies to any AI tutorial you want to extend, not just RAG.</p><ol><li><p><strong>Run the original first, without touching it.</strong> Understand every component and every design choice before you form an opinion about what to change. This is the step many students skip. They skim the tutorial, get the general idea, and immediately start modifying things. That is why their extensions feel shallow. Deep understanding is what lets you see the gaps.</p></li><li><p><strong>Record baseline numbers before changing anything.</strong> Retrieval accuracy, latency, response quality, whatever is measurable for the use case. You need a starting point. Without a baseline, any improvement you make later is invisible. This is also one of the most common gaps in tutorials themselves, and closing it alone is a meaningful contribution.</p></li><li><p><strong>Swap the data source with something real to you.</strong> Replace the tutorial&#8217;s sample data with your own content, your own domain, something you can evaluate honestly. I used a technical article I had written. The pipeline&#8217;s responses immediately became more meaningful to read because I already knew what a correct answer looked like.</p></li><li><p><strong>Close one gap with a targeted improvement.</strong> Better retrieval logic, a smarter chunking strategy, an evaluation step the original skipped, a production-grade vector store. Pick one improvement at a time, implement it, and measure the difference against your baseline from step 2. That comparison is the substance of your portfolio, not the code alone.</p></li><li><p><strong>Convert the notebook to APIs.</strong> This is the step that separates a prototype from something deployable. I split the notebook code into a data ingestion API and a search and generation API, then combined them behind a gateway. It forces decisions about error handling, input validation, and service boundaries that notebook code never surfaces. 
Most students never get there, which is exactly why it is worth doing.</p></li><li><p><strong>Write the README while the work is fresh.</strong> Not a usage guide. A record of what you found in the original tutorial, what you changed, and what the numbers showed before and after. Include a workflow diagram. This is the document an interviewer will ask you to walk them through. Writing it also forces you to articulate what you actually did and why, which is harder than it sounds.</p></li></ol><h2><strong>The two assets that most students skip</strong></h2><p>Once the code is done, produce two more assets from the same work before you close the project. These are the ones that actually get you noticed.</p><p><strong>Write a post about what you extended and what you measured.</strong> Not a tutorial explaining the technology. Your own account of the extensions you made, what surprised you, and what the numbers showed. A hiring manager can tell the difference between a post written from experience and one assembled from documentation. This kind of writing is harder to produce with AI alone, which is exactly what makes it credible.</p><p><strong>Record a short video walking through the repo.</strong> Many students applying for internships will not do this. I published two walkthroughs covering the project described above: <a href="https://www.youtube.com/watch?v=deploying-agentic-rag-part1">Deploying Agentic RAG to Production, Part 1: FastAPI Data Ingestion</a> and <a href="https://www.youtube.com/watch?v=deploying-agentic-rag-part2">Deploying Agentic RAG to Production, Part 2</a>.</p><p>A hiring manager who watches three minutes of you explaining your own work has a clearer picture of how you think than one who reads a README alone.</p><blockquote><p>One tutorial. A couple of weeks of deliberate work. Three assets: an extended repo, a written reflection, and a video walkthrough. 
Each one builds on the same foundation and tells a consistent story about how you think and work.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FH6z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e98f42-2ed7-47a5-b47b-9a90d6383d71_1440x488.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FH6z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e98f42-2ed7-47a5-b47b-9a90d6383d71_1440x488.png 424w, https://substackcdn.com/image/fetch/$s_!FH6z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e98f42-2ed7-47a5-b47b-9a90d6383d71_1440x488.png 848w, https://substackcdn.com/image/fetch/$s_!FH6z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e98f42-2ed7-47a5-b47b-9a90d6383d71_1440x488.png 1272w, https://substackcdn.com/image/fetch/$s_!FH6z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e98f42-2ed7-47a5-b47b-9a90d6383d71_1440x488.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FH6z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e98f42-2ed7-47a5-b47b-9a90d6383d71_1440x488.png" width="1440" height="488" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/31e98f42-2ed7-47a5-b47b-9a90d6383d71_1440x488.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:488,&quot;width&quot;:1440,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:68657,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/192731177?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e98f42-2ed7-47a5-b47b-9a90d6383d71_1440x488.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FH6z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e98f42-2ed7-47a5-b47b-9a90d6383d71_1440x488.png 424w, https://substackcdn.com/image/fetch/$s_!FH6z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e98f42-2ed7-47a5-b47b-9a90d6383d71_1440x488.png 848w, https://substackcdn.com/image/fetch/$s_!FH6z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e98f42-2ed7-47a5-b47b-9a90d6383d71_1440x488.png 1272w, https://substackcdn.com/image/fetch/$s_!FH6z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31e98f42-2ed7-47a5-b47b-9a90d6383d71_1440x488.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2><strong>Why this works</strong></h2><p>The reason this approach produces better portfolio projects than building from scratch is not that it is easier. It is that it mirrors how engineering work actually happens. In any real job, you will inherit code, understand it, identify what is missing, and improve it. The tutorial gives you something to push against. The gaps give you something real to solve. 
The baseline numbers give you a way to show that the solution worked.</p><p>A portfolio project that shows you can do that is more convincing to a hiring manager than one that shows you can follow a course and build what the instructor built.</p><p>If you try this approach, I would be curious to hear which step felt hardest: finding a tutorial worth extending, deciding which gap to close first, or getting the post and video out after the code was done. Those are usually where people get stuck, and they are worth talking through.</p>]]></content:encoded></item><item><title><![CDATA[How to Benchmark Your Embedding API and Find the Batch Size That Maximizes Throughput]]></title><description><![CDATA[A practical guide to batch size tuning, throughput benchmarking, and the failures that only surface at production scale]]></description><link>https://aiarchitectplaybook.com/p/how-to-benchmark-your-embedding-api</link><guid isPermaLink="false">https://aiarchitectplaybook.com/p/how-to-benchmark-your-embedding-api</guid><dc:creator><![CDATA[Shaikh Quader]]></dc:creator><pubDate>Sun, 15 Mar 2026 19:41:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!FSnz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c45de02-0db2-4cd6-a256-a987253bc338_2752x1536.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FSnz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c45de02-0db2-4cd6-a256-a987253bc338_2752x1536.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!FSnz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c45de02-0db2-4cd6-a256-a987253bc338_2752x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!FSnz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c45de02-0db2-4cd6-a256-a987253bc338_2752x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!FSnz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c45de02-0db2-4cd6-a256-a987253bc338_2752x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!FSnz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c45de02-0db2-4cd6-a256-a987253bc338_2752x1536.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FSnz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c45de02-0db2-4cd6-a256-a987253bc338_2752x1536.jpeg" width="1456" height="813" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c45de02-0db2-4cd6-a256-a987253bc338_2752x1536.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:811440,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/191038445?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c45de02-0db2-4cd6-a256-a987253bc338_2752x1536.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FSnz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c45de02-0db2-4cd6-a256-a987253bc338_2752x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!FSnz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c45de02-0db2-4cd6-a256-a987253bc338_2752x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!FSnz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c45de02-0db2-4cd6-a256-a987253bc338_2752x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!FSnz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c45de02-0db2-4cd6-a256-a987253bc338_2752x1536.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>When you build an application that uses embedding models - whether for semantic search, retrieval-augmented generation, or any other vector-based workflow - measuring performance is easy to put off. You get the feature working, the embeddings look right, and you move on.</p><p>Then someone asks: <em>&#8220;How long will it take to vectorize our 50 million documents?&#8221;</em></p><p>If you have not done the performance work by that point, the question becomes a problem without a quick answer. I have been there. 
This post is about doing that work before you need the answer: building a repeatable performance testing approach rather than figuring it out under pressure.</p><p>One clarification on scope: this guide focuses specifically on <strong>batch embedding generation</strong> - sending multiple text inputs to an embedding API in a single call and measuring the throughput. If you are calling an embedding API one input at a time, some of this still applies, but the biggest gains described here come from batching.</p><p>I will walk through the performance testing approach I have been developing and refining through real experiments against cloud embedding APIs and a self-hosted vLLM endpoint. The work ranged from small single-request tests all the way to running embedding generation over datasets at million-row scale.</p><div><hr></div><h2>Why Embedding APIs Warrant Their Own Performance Testing Approach</h2><p>Embedding APIs are used differently from text generation APIs, and that difference shapes how you should think about performance.</p><p>With a generation API, you typically process one request at a time - a user sends a prompt, the model responds. With an embedding API, you are almost always working in bulk. A product catalog with 10 million descriptions needs all 10 million vectorized. A document corpus gets re-indexed every night. The volume is large, and it often runs as a background job where throughput matters more than single-request response time.</p><p>Because the work is done in bulk, small configuration choices - like batch size - can have large effects on the total time and cost to complete a job. For example, a poorly chosen batch size can cut your throughput to a fraction of what a tuned one delivers (in the sample results in Step 3, batch size 1 reaches only about a tenth of the throughput of batch size 10), and that difference compounds across every run, every day, at every scale. 
Batch size is not the only thing that matters, but it is one of the most controllable levers you have.</p><p>The performance testing approach covered here focuses on four questions:</p><ol><li><p>What limits does your embedding API put on each call?</p></li><li><p>At what batch size does throughput peak for your model and data?</p></li><li><p>How does throughput drop when you go past that batch size?</p></li><li><p>What breaks when you run the same job at production scale?</p></li></ol><div><hr></div><h2>Step 1: Know Your API&#8217;s Hard Limits Before You Test Anything</h2><p>Before running a single test, check the documentation for your embedding provider and answer one question: how does this API limit what you can send in a single call?</p><p>There are two common patterns:</p><p><strong>Input count limits.</strong> Some providers cap the number of text inputs you can include in one API call. Once you exceed that number, the request is rejected. Always verify the current limit in the provider&#8217;s documentation, as it can change.</p><p><strong>Token budget limits.</strong> Other providers do not set an input count limit directly, but cap the total number of tokens across all inputs in a single call. For example, if the limit is 300,000 tokens and your texts average 450 tokens each, that budget translates to roughly 666 inputs per call (300,000 / 450, rounded down). The limit is not stated as an input count, but it effectively becomes one based on your data.</p><p>These two approaches require different batching logic. With an input count limit, you can set a fixed maximum batch size. With a token budget, your effective batch size depends on the actual token length of your inputs, so you need to calculate it dynamically for each batch rather than hardcoding a number. Getting this wrong - for example, assuming all inputs are short when some are long - can cause unexpected failures partway through a large job.</p><p>All of the APIs I tested were REST APIs. 
The limits below reflect what the REST API accepts per call. These figures come from publicly available documentation at the time of writing. Always check the provider's current docs before using them, as limits can change.</p><ul><li><p><strong>WatsonX AI</strong> &#8212; Input count limit. Observed up to 1,000 inputs per call in my own testing. <a href="https://cloud.ibm.com/apidocs/watsonx-ai#text-embeddings">IBM Cloud API Docs</a></p></li><li><p><strong>OpenAI</strong> &#8212; Token budget. 300,000 tokens total across all inputs per call. <a href="https://platform.openai.com/docs/api-reference/embeddings/create">OpenAI API Reference</a></p></li><li><p><strong>Azure OpenAI</strong> &#8212; Input count limit. Up to 2,048 inputs per array per call. <a href="https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/embeddings">Azure OpenAI docs</a></p></li><li><p><strong>AWS Bedrock (Titan)</strong> &#8212; Single input only. The InvokeModel REST API accepts one input string per call &#8212; batch arrays are not supported. <a href="https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-titan-embed-text.html">AWS docs</a></p></li><li><p><strong>vLLM (self-hosted)</strong> &#8212; Depends on deployment. The server operator controls limits at startup via <code>--max-num-batched-tokens</code> and <code>--max-num-seqs</code>. In my own testing I was able to send up to 10,000 inputs in a single call. <a href="https://docs.vllm.ai/en/stable/configuration/engine_args/">vLLM Engine Arguments</a></p></li></ul><p><strong>A note on counting tokens before you send them.</strong> If your provider uses a token budget, you need a way to estimate token counts on your side before constructing each batch. Calling the API itself to count tokens is expensive and defeats the purpose. A practical alternative is to use an open-source tokenizer locally. For OpenAI-compatible APIs, the <code>tiktoken</code> Python library can count tokens offline at no API cost. 
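</p><p>With a local token counter in place, token-budget batching can be sketched roughly as follows. This is a minimal illustration, not any provider&#8217;s reference logic: the function names, the <code>max_tokens_per_call</code> and <code>max_inputs_per_call</code> parameters, and the whitespace-based <code>approx_token_count</code> stand-in are all assumptions made for this example. A real local tokenizer for your model should replace the stand-in.</p><pre>

```python
from typing import Callable, Iterable, Iterator, List


def approx_token_count(text: str) -> int:
    """Hypothetical stand-in tokenizer: counts whitespace-separated words.

    Swap in a real local tokenizer for your model; real token counts are
    usually higher than word counts.
    """
    return len(text.split())


def batch_by_token_budget(
    texts: Iterable[str],
    max_tokens_per_call: int,
    max_inputs_per_call: int,
    count_tokens: Callable[[str], int] = approx_token_count,
) -> Iterator[List[str]]:
    """Yield batches that respect both a token budget and an input count cap."""
    batch: List[str] = []
    batch_tokens = 0
    for text in texts:
        n = count_tokens(text)
        if n > max_tokens_per_call:
            # A single oversized input can never fit, so surface it early.
            raise ValueError(f"input of {n} tokens exceeds the per-call budget")
        over_budget = batch_tokens + n > max_tokens_per_call
        batch_full = len(batch) >= max_inputs_per_call
        if batch and (over_budget or batch_full):
            # Close the current batch before adding an input that would not fit.
            yield batch
            batch, batch_tokens = [], 0
        batch.append(text)
        batch_tokens += n
    if batch:
        yield batch


# Example with made-up inputs and deliberately tiny limits.
docs = ["short text", "a somewhat longer piece of text", "tiny"]
for batch in batch_by_token_budget(docs, max_tokens_per_call=6, max_inputs_per_call=2):
    print(len(batch), batch)
```

</pre><p>Passing both limits covers either limit style from Step 1: set the one your provider does not enforce to a very large number. Raising an error on an oversized input is one possible design choice; skipping or truncating such inputs would be reasonable alternatives.</p><p>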
For other models, the <code>transformers</code> library from Hugging Face provides tokenizers that run locally. Neither requires a network call, and both run quickly enough to be part of your batching logic at scale.</p><div><hr></div><h2>Step 2: Measure the Right Things - Break Down Where Time Goes</h2><p>Before you start adjusting batch sizes, set up your measurement code to separate request time into three distinct parts. This is easy to skip, but it is where the most useful diagnostic information comes from.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GNM9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7ec32c7-991b-42e0-846a-1210c7404bbc_2952x879.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GNM9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7ec32c7-991b-42e0-846a-1210c7404bbc_2952x879.png 424w, https://substackcdn.com/image/fetch/$s_!GNM9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7ec32c7-991b-42e0-846a-1210c7404bbc_2952x879.png 848w, https://substackcdn.com/image/fetch/$s_!GNM9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7ec32c7-991b-42e0-846a-1210c7404bbc_2952x879.png 1272w, https://substackcdn.com/image/fetch/$s_!GNM9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7ec32c7-991b-42e0-846a-1210c7404bbc_2952x879.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!GNM9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7ec32c7-991b-42e0-846a-1210c7404bbc_2952x879.png" width="1456" height="434" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f7ec32c7-991b-42e0-846a-1210c7404bbc_2952x879.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:434,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:181606,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/191038445?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7ec32c7-991b-42e0-846a-1210c7404bbc_2952x879.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GNM9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7ec32c7-991b-42e0-846a-1210c7404bbc_2952x879.png 424w, https://substackcdn.com/image/fetch/$s_!GNM9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7ec32c7-991b-42e0-846a-1210c7404bbc_2952x879.png 848w, https://substackcdn.com/image/fetch/$s_!GNM9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7ec32c7-991b-42e0-846a-1210c7404bbc_2952x879.png 1272w, https://substackcdn.com/image/fetch/$s_!GNM9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7ec32c7-991b-42e0-846a-1210c7404bbc_2952x879.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Payload preparation</strong> is the time your code spends building the request - formatting your list of input texts into the JSON structure the API expects. For small batches, this is negligible. For large batches with many long texts, it can add up and become worth optimizing separately.</p><p><strong>API wait</strong> is the time from when your request leaves your machine to when the response starts coming back. 
This includes network travel time in both directions, any time the request spends waiting in a queue on the provider&#8217;s side, and the time the model actually takes to compute the embeddings. This phase typically takes the most time, and it grows with batch size - though not in a straight line, as you will see in Step 3.</p><p><strong>Response parsing</strong> is the time your code spends reading the response and extracting the embedding vectors. At larger batch sizes, responses get large. A batch of 1,000 embeddings at 768 dimensions, stored as 32-bit floats, contains about 3 MB of numerical data (1,000 &#215; 768 &#215; 4 bytes) to read and process, and a JSON response that encodes each value as decimal text is usually larger than that on the wire. Measuring this separately helps confirm whether response handling is ever a bottleneck.</p><p>When you break latency into these three parts, you can see where to focus. If API wait is the dominant cost, then batching strategy is your main lever. If payload prep is surprisingly slow, look at how you are constructing requests. If response parsing takes longer than expected, look at how you are reading and storing results.</p><div><hr></div><h2>Step 3: Find Where Throughput Peaks - and Where It Falls Off</h2><p>This is the core of the performance testing work. The goal is to find the batch size where you get the most inputs processed per second, and to understand what happens on both sides of that point.</p><p>When you plot throughput against batch size, you get a curve with a predictable shape: it rises quickly at first, peaks somewhere in the middle, and then gradually declines. Three things drive that shape:</p><p><strong>Why throughput rises as batch size increases.</strong> Every API call has a fixed cost you pay regardless of how many inputs are in it: the time to establish the connection, send the request, and receive the response. At batch size 1, that fixed cost is most of your total call time. At batch size 10, you pay the same fixed cost but process 10 inputs instead of 1, so as long as the per-input compute time is small compared with the fixed cost, your throughput is roughly 10 times higher. 
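</p><p>The fixed-cost effect can be illustrated with a simple model in which each call takes a fixed overhead plus a per-input compute time. The sketch below is illustrative only: <code>modeled_throughput</code> is a hypothetical helper, and the 0.4-second overhead and 1 ms per-input cost are assumed values, not measurements. The model captures only the rise and the leveling off, not the extra costs that can make very large batches slower.</p><pre>

```python
def modeled_throughput(batch_size: int, fixed_cost_s: float, per_input_s: float) -> float:
    """Inputs per second when call time = fixed cost + batch_size * per-input cost."""
    call_time_s = fixed_cost_s + batch_size * per_input_s
    return batch_size / call_time_s


# Illustrative numbers only (assumptions, not measurements):
# 0.4 s of fixed overhead per call and 1 ms of compute per input.
FIXED_COST_S = 0.4
PER_INPUT_S = 0.001

for n in (1, 10, 100, 1000):
    rate = modeled_throughput(n, FIXED_COST_S, PER_INPUT_S)
    print(f"batch={n:>5}  throughput={rate:8.1f} inputs/s")
```

</pre><p>Under this model, throughput keeps rising with batch size but can never exceed <code>1 / PER_INPUT_S</code>, which is why the measured curve flattens once the fixed cost has been spread thin.</p><p>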
As batch size grows, that fixed cost gets spread across more inputs and throughput climbs.</p><p><strong>Why throughput eventually stops climbing.</strong> Once the batch is large enough, the fixed overhead is fully spread out and the model&#8217;s compute time per input becomes the main cost. Throughput levels off. This is the plateau.</p><p><strong>Why throughput can drop with very large batches.</strong> Very large batches introduce new costs: larger payloads take longer to send over the network, the server may need to split your batch into smaller internal chunks to process it, and memory pressure on the server side increases response time. The result is that throughput can actually decrease beyond a certain point, even though more data is being sent.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9Tkf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb6236d-4315-438f-a9af-a82de7963d3a_2830x1426.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9Tkf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb6236d-4315-438f-a9af-a82de7963d3a_2830x1426.png 424w, https://substackcdn.com/image/fetch/$s_!9Tkf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb6236d-4315-438f-a9af-a82de7963d3a_2830x1426.png 848w, https://substackcdn.com/image/fetch/$s_!9Tkf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb6236d-4315-438f-a9af-a82de7963d3a_2830x1426.png 1272w, 
https://substackcdn.com/image/fetch/$s_!9Tkf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb6236d-4315-438f-a9af-a82de7963d3a_2830x1426.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9Tkf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb6236d-4315-438f-a9af-a82de7963d3a_2830x1426.png" width="1456" height="734" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8bb6236d-4315-438f-a9af-a82de7963d3a_2830x1426.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:734,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:196187,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/191038445?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb6236d-4315-438f-a9af-a82de7963d3a_2830x1426.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9Tkf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb6236d-4315-438f-a9af-a82de7963d3a_2830x1426.png 424w, https://substackcdn.com/image/fetch/$s_!9Tkf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb6236d-4315-438f-a9af-a82de7963d3a_2830x1426.png 848w, 
https://substackcdn.com/image/fetch/$s_!9Tkf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb6236d-4315-438f-a9af-a82de7963d3a_2830x1426.png 1272w, https://substackcdn.com/image/fetch/$s_!9Tkf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb6236d-4315-438f-a9af-a82de7963d3a_2830x1426.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>A few patterns worth noting in this kind of result:</p><p>Batch sizes 1 and 10 can have nearly the same total call time (both around 
0.4 seconds in this example), yet batch 10 delivers about 10 times the throughput. This is the fixed-cost effect at its clearest - you are doing 10 times the work for no additional waiting time.</p><p>The provider&#8217;s maximum allowed batch size is often not the most efficient one. In this sample, batch 1,000 - the largest - produces the lowest throughput. The API accepting a batch size does not mean that size is the best choice.</p><p>The peak is typically a range, not a single number. In the sample above, batch sizes from about 200 to 300 all deliver throughput within 5-10% of the peak. This is useful in practice: if your batches are not always exactly the same size, anything in that range will perform similarly well.</p><p><strong>How to run this test.</strong> Pick a fixed dataset of text inputs with a consistent average length (matching your real workload as closely as possible). Start with batch size 1 and work up through increasing sizes - for example, 1, 10, 50, 100, 200, 250, 300, 400, 500, up to your provider&#8217;s maximum. For each batch size, run at least three separate calls and average the results to reduce the effect of network variation. Include one unmeasured warm-up call at the start of each batch size to avoid counting any startup effects in your measurements.</p><div><hr></div><h2>Step 4: Test Each Model Separately</h2><p>The batch size that gives the best throughput is not the same for every model. Smaller models compute embeddings faster per input, so they can handle larger batches before hitting the plateau. 
Larger models take more compute time per input, so the plateau arrives at a lower batch size, and the performance drop from oversized batches tends to be steeper.</p><p>When I ran the same batch size test across three models of different sizes - small (around 30M parameters), medium (around 125M), and large (around 500M or more) - the curve shape was similar for all three, but the peak location and the absolute throughput values were different for each. Larger models peaked at a lower batch size and at lower throughput overall.</p><p>The takeaway is simple: if you are working with more than one embedding model, run the batch size test for each one independently. Do not carry over the optimal batch size from a small model and assume it applies to a larger one.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!G_gP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a19817b-6ef9-4fd6-b972-ccae33412acc_2394x1596.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!G_gP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a19817b-6ef9-4fd6-b972-ccae33412acc_2394x1596.png 424w, https://substackcdn.com/image/fetch/$s_!G_gP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a19817b-6ef9-4fd6-b972-ccae33412acc_2394x1596.png 848w, https://substackcdn.com/image/fetch/$s_!G_gP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a19817b-6ef9-4fd6-b972-ccae33412acc_2394x1596.png 1272w, 
https://substackcdn.com/image/fetch/$s_!G_gP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a19817b-6ef9-4fd6-b972-ccae33412acc_2394x1596.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!G_gP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a19817b-6ef9-4fd6-b972-ccae33412acc_2394x1596.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4a19817b-6ef9-4fd6-b972-ccae33412acc_2394x1596.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:230286,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/191038445?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a19817b-6ef9-4fd6-b972-ccae33412acc_2394x1596.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!G_gP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a19817b-6ef9-4fd6-b972-ccae33412acc_2394x1596.png 424w, https://substackcdn.com/image/fetch/$s_!G_gP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a19817b-6ef9-4fd6-b972-ccae33412acc_2394x1596.png 848w, 
https://substackcdn.com/image/fetch/$s_!G_gP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a19817b-6ef9-4fd6-b972-ccae33412acc_2394x1596.png 1272w, https://substackcdn.com/image/fetch/$s_!G_gP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a19817b-6ef9-4fd6-b972-ccae33412acc_2394x1596.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>To keep the comparison clean, hold everything else constant - the same input texts, the same infrastructure, the same measurement 
approach. Change only the model. This way, any difference in results can be attributed to the model rather than noise from other variables.</p><div><hr></div><h2>Step 5: Test With Different Input Lengths</h2><p>Most performance tests use a fixed text length, but real data rarely has perfectly uniform length. The token length of your inputs affects throughput in ways that are worth understanding before you run your tests, and it affects providers differently depending on whether they use input count limits or token budget limits.</p><p>A practical way to test this is to prepare separate datasets at a few different average token lengths - for example, a short-text set (around 50 tokens per input, like product titles or search queries), a medium-text set (around 150 to 250 tokens, like short paragraphs), and a long-text set (around 400 to 500 tokens, like full document sections). Run the same batch size test on each.</p><p>With short texts, you can fit more inputs within a token budget per call, but the fixed API overhead becomes a larger proportion of your total time per input. With long texts, your effective batch size shrinks if you are working within a token budget. The optimal batch size by input count may be quite different across these three workloads.</p><p>One specific edge case worth testing: what happens when an input exceeds the model&#8217;s maximum token length? Some APIs truncate the input silently and return an embedding anyway. Others return an error. Others may return something that looks like a valid embedding but reflects only part of the input. The behavior varies by provider and model, and it is worth verifying directly for your setup rather than assuming. If your real data has any inputs that might be unusually long, this test can prevent silent failures in production.</p><div><hr></div><h2>Step 6: Test at Real Scale</h2><p>Small tests tell you how the API behaves call by call. 
They do not tell you what happens when you run a job across millions of inputs over several hours. Scale introduces a different set of problems, and many of them only become visible once you are running at that size.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0VKJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc8c094c-4640-413a-b02f-38e3f4a8b916_2952x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0VKJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc8c094c-4640-413a-b02f-38e3f4a8b916_2952x1080.png 424w, https://substackcdn.com/image/fetch/$s_!0VKJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc8c094c-4640-413a-b02f-38e3f4a8b916_2952x1080.png 848w, https://substackcdn.com/image/fetch/$s_!0VKJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc8c094c-4640-413a-b02f-38e3f4a8b916_2952x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!0VKJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc8c094c-4640-413a-b02f-38e3f4a8b916_2952x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0VKJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc8c094c-4640-413a-b02f-38e3f4a8b916_2952x1080.png" width="1456" height="533" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bc8c094c-4640-413a-b02f-38e3f4a8b916_2952x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:533,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:174027,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/191038445?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc8c094c-4640-413a-b02f-38e3f4a8b916_2952x1080.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0VKJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc8c094c-4640-413a-b02f-38e3f4a8b916_2952x1080.png 424w, https://substackcdn.com/image/fetch/$s_!0VKJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc8c094c-4640-413a-b02f-38e3f4a8b916_2952x1080.png 848w, https://substackcdn.com/image/fetch/$s_!0VKJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc8c094c-4640-413a-b02f-38e3f4a8b916_2952x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!0VKJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc8c094c-4640-413a-b02f-38e3f4a8b916_2952x1080.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I will not share specific numbers from my own large-scale experiments here, as that work was done in a professional context. But the problems I ran into are common enough to be worth sharing.</p><p><strong>The bottleneck often is not the API.</strong> When running a long embedding job end to end, the slowest part is not always the embedding API call. Time spent reading inputs from the source system, writing results back to storage, and handling coordination between parallel workers can all add up. Measure the full pipeline, not just the API call times, to find where the real time goes.</p><p><strong>Write volume adds up at scale.</strong> Embedding vectors are not small. At 768 dimensions stored as 32-bit floats (4 bytes each), each embedding is 3,072 bytes, or 3 KiB. Writing 1 million embeddings means writing about 3.07 GB of raw vector data. 
If you are writing to a database or storage system that has transaction or write limits, those limits can become a bottleneck or cause failures mid-job. Check the limits of your write destination before starting a large run, not after hitting an error.</p><p><strong>Concurrent API calls and token rate limits interact.</strong> Running multiple parallel workers to call the embedding API at the same time can speed things up significantly. But if the provider has a rate limit measured in tokens per minute, the token usage from all your parallel workers adds up together. It is easy to hit a rate limit mid-job when running in parallel even if no single worker would hit it on its own. Test your parallel configuration against the provider&#8217;s rate limits before committing to it at full scale.</p><p><strong>Connection pool limits surface late.</strong> If your parallel workers each hold a database connection for writing results, the total number of concurrent connections can exceed your connection pool limit. This type of failure often does not show up until you are partway through a large job. Test with enough scale to expose it before running the full job.</p><p><strong>Build resume capability before you need it.</strong> A job processing a million inputs that fails at 80% completion should be able to pick up where it left off, not start over. This is straightforward to build before you start a large run and very difficult to add after you have already failed and need to rerun. Track which inputs have been successfully processed and skip them on restart.</p><div><hr></div><h2>Step 7: Run Multiple Iterations and Look at Consistency</h2><p>A single measurement tells you what happened once. API performance varies based on server load on the provider&#8217;s side, network conditions between you and the provider, and the time of day the call is made. 
A result from one run may not reflect what you will see reliably in practice.</p><p>Running each batch size configuration at least three times and looking at the spread of results gives a more reliable picture. If the three results are close together, you can be more confident in the average. If one run is significantly different from the others, that variance is itself useful information - it tells you that the performance at that configuration is less predictable, which matters for planning.</p><p>In my own tests, mid-range batch sizes tended to show more variation between runs than very small or very large ones. The practical lesson: when planning how much headroom to build into a performance estimate for a production job, do not base it only on the best case from a small number of test runs. Look at the range of results and plan for something closer to the typical case, not the best one.</p><div><hr></div><h2>Putting It Together: A Testing Checklist</h2><p>Here is a checklist that pulls together the steps above. 
Use it as a starting point and adjust it to your situation.</p><p><strong>Before you start testing:</strong></p><ul><li><p>Find the current documented limits for your embedding API (input count or token budget).</p></li><li><p>Determine whether you need static batch sizes (input count limit) or dynamic batch sizing (token budget).</p></li><li><p>Set up local token counting so you can estimate batch sizes without API calls (e.g., <code>tiktoken</code> for OpenAI models).</p></li><li><p>Instrument your code to separately measure payload preparation time, API wait time, and response parsing time.</p></li><li><p>Prepare representative test datasets at short, medium, and long average token lengths.</p></li></ul><p><strong>Running the batch size test:</strong></p><ul><li><p>Test batch sizes from 1 up to the provider&#8217;s maximum, using regular intervals.</p></li><li><p>Run at least three iterations for each batch size.</p></li><li><p>Use one unmeasured warm-up call at the start of each batch size series.</p></li><li><p>Change only one variable at a time across test runs.</p></li><li><p>Repeat the test for each embedding model you plan to use.</p></li></ul><p><strong>Edge cases to verify:</strong></p><ul><li><p>What happens when an input exceeds the model&#8217;s maximum token length?</p></li><li><p>What happens when you approach or exceed the provider&#8217;s rate limit?</p></li><li><p>How does performance vary when input lengths within a single batch are uneven?</p></li></ul><p><strong>Scaling to production volume:</strong></p><ul><li><p>Test the full pipeline end to end, not just the API calls.</p></li><li><p>Check write volume and storage limits for your target system at scale.</p></li><li><p>Test parallel worker configuration against the provider&#8217;s rate limits.</p></li><li><p>Verify connection pool limits under your expected concurrency level.</p></li><li><p>Build and test checkpoint and resume functionality before running at full 
scale.</p></li><li><p>Run at an intermediate scale (for example, 1% or 10% of your target volume) before committing to the full job.</p></li></ul><div><hr></div><h2>Closing Thought</h2><p>The biggest performance gain in embedding batch workloads does not require new hardware, a faster model, or a different provider. It comes from batching correctly.</p><p>Two API calls that take the same time can have very different throughput - one sending a single input, the other sending a well-sized batch. The larger batch processes many more inputs in the same amount of time. At the scale embedding pipelines typically run, that difference translates directly into hours of processing time and real infrastructure cost.</p><p>The numbers will look different for every model, workload, and provider. The method for finding them does not change.</p><p>If you have run similar experiments or found different patterns, I would be curious to hear about them in the comments.</p><div><hr></div><blockquote><p><em>All experiments were run from my personal machine calling external APIs. Any numbers shown are illustrative samples only - they demonstrate the shape of results, not actual measurements from any specific provider. 
These are personal observations and do not represent my employer or any official benchmark.</em></p></blockquote>]]></content:encoded></item><item><title><![CDATA[My 2025 Conference Talks: What I Learned From 16 Sessions Across 4 Continents]]></title><description><![CDATA[Two hours before my Sydney talk in March, I was walking around the harbor - practicing my entire presentation out loud, without looking at slides.]]></description><link>https://aiarchitectplaybook.com/p/my-2025-conference-talks-what-i-learned</link><guid isPermaLink="false">https://aiarchitectplaybook.com/p/my-2025-conference-talks-what-i-learned</guid><dc:creator><![CDATA[Shaikh Quader]]></dc:creator><pubDate>Tue, 16 Dec 2025 11:00:35 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!CUJE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384ef793-55e0-464c-b127-0fc055c3ecd1_1100x744.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Two hours before my Sydney talk in March, I was walking around the harbor - practicing my entire presentation out loud, without looking at slides. That&#8217;s how I prepare. No desk. No last-minute slide reviews. Just a walk.</p><p>2025 has been my most active year as a technical speaker in my 20-year IBM career. 
I traveled to seven cities across four continents, delivered 16 conference sessions totaling 25.5 hours, and reached over 1,000 in-person attendees - plus many more through webinars and internal sessions.</p><h1>By the Numbers</h1><p><strong>6 conferences | 7 cities | 4 continents | 25.5 hours | 1,000+ in-person attendees</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CUJE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384ef793-55e0-464c-b127-0fc055c3ecd1_1100x744.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CUJE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384ef793-55e0-464c-b127-0fc055c3ecd1_1100x744.png 424w, https://substackcdn.com/image/fetch/$s_!CUJE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384ef793-55e0-464c-b127-0fc055c3ecd1_1100x744.png 848w, https://substackcdn.com/image/fetch/$s_!CUJE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384ef793-55e0-464c-b127-0fc055c3ecd1_1100x744.png 1272w, https://substackcdn.com/image/fetch/$s_!CUJE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384ef793-55e0-464c-b127-0fc055c3ecd1_1100x744.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CUJE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384ef793-55e0-464c-b127-0fc055c3ecd1_1100x744.png" width="1100" height="744" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/384ef793-55e0-464c-b127-0fc055c3ecd1_1100x744.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:744,&quot;width&quot;:1100,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3279686,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/181777635?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384ef793-55e0-464c-b127-0fc055c3ecd1_1100x744.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CUJE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384ef793-55e0-464c-b127-0fc055c3ecd1_1100x744.png 424w, https://substackcdn.com/image/fetch/$s_!CUJE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384ef793-55e0-464c-b127-0fc055c3ecd1_1100x744.png 848w, https://substackcdn.com/image/fetch/$s_!CUJE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384ef793-55e0-464c-b127-0fc055c3ecd1_1100x744.png 1272w, https://substackcdn.com/image/fetch/$s_!CUJE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F384ef793-55e0-464c-b127-0fc055c3ecd1_1100x744.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>My Framework for Preparing Technical Talks</h1><p>Here&#8217;s the structure I use:</p><p><strong>One Thread, One Theme. </strong>Every slide supports a single theme. No distractions.</p><p><strong>Own Your Content. </strong>I create all my slides from scratch. No borrowed decks. This means I know exactly what each slide says and can talk about it naturally.</p><p><strong>Revise Until It&#8217;s Clean. </strong>For each slide, I ask: <em>What is the key message here?</em> I keep editing until every extra word and image is gone.</p><p><strong>AI as My Design Partner. </strong>For the last few years and much of this year, I used ChatGPT for brainstorming and designing slide visuals. Now, my favorite tool for planning conference slides is <strong>Claude</strong>. 
For image editing, I use <strong>Canva</strong>.</p><p><strong>Start with the Title. </strong>I open every talk by explaining each word in the title and telling the audience what they will learn. This sets clear expectations from the start.</p><p><strong>Live Demos Only. </strong>No pre-recorded demos. I build and run demos myself, live on stage. This engages the audience. Also, because I build each demo myself, I don&#8217;t need to memorize anything - I just explain what I built.</p><p><strong>Keep the Audience Engaged. </strong>I ask questions, add light humor, and repeat key points throughout the talk.</p><h1>What Changed</h1><p>At the start of this year, conference talks felt uncomfortable. With each session, it got easier. Now I look forward to larger audiences - the bigger, the better. I enjoy making eye contact, reading the room, and answering questions mid-talk. The best moment is when I explain something complex and see it click. That&#8217;s the joy.</p><p></p>]]></content:encoded></item><item><title><![CDATA[Understanding Machine Learning Model Types]]></title><description><![CDATA[How to pick the right one for your task]]></description><link>https://aiarchitectplaybook.com/p/understanding-machine-learning-model</link><guid isPermaLink="false">https://aiarchitectplaybook.com/p/understanding-machine-learning-model</guid><dc:creator><![CDATA[Shaikh Quader]]></dc:creator><pubDate>Fri, 12 Dec 2025 15:40:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Wpi7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539790f-9bab-45a6-8832-3671f2165781_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!Wpi7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539790f-9bab-45a6-8832-3671f2165781_1200x630.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Wpi7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539790f-9bab-45a6-8832-3671f2165781_1200x630.png 424w, https://substackcdn.com/image/fetch/$s_!Wpi7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539790f-9bab-45a6-8832-3671f2165781_1200x630.png 848w, https://substackcdn.com/image/fetch/$s_!Wpi7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539790f-9bab-45a6-8832-3671f2165781_1200x630.png 1272w, https://substackcdn.com/image/fetch/$s_!Wpi7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539790f-9bab-45a6-8832-3671f2165781_1200x630.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Wpi7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539790f-9bab-45a6-8832-3671f2165781_1200x630.png" width="1200" height="630" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9539790f-9bab-45a6-8832-3671f2165781_1200x630.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:180767,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/181415279?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539790f-9bab-45a6-8832-3671f2165781_1200x630.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Wpi7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539790f-9bab-45a6-8832-3671f2165781_1200x630.png 424w, https://substackcdn.com/image/fetch/$s_!Wpi7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539790f-9bab-45a6-8832-3671f2165781_1200x630.png 848w, https://substackcdn.com/image/fetch/$s_!Wpi7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539790f-9bab-45a6-8832-3671f2165781_1200x630.png 1272w, https://substackcdn.com/image/fetch/$s_!Wpi7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9539790f-9bab-45a6-8832-3671f2165781_1200x630.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There is not one algorithm that&#8217;s going to solve all your problems. Think of it like wanting to unscrew something. You have a toolbox with different sized screwdriver bits, and you have to find the right one for the screw in front of you. Machine learning works the same way. You have to see what kind of data you are dealing with and the learning task at hand, and then find the right tool, the right algorithm, the right AI technique.</p><p>Some data sits neatly in rows and columns. Other data is a mess of text, images, and files. Each needs different tools to unlock its value.</p><p>In this article, I&#8217;ll walk you through the main types of machine learning models and learning tasks. This is not meant to be exhaustive. 
I want to cover the most representative models so you have a solid foundation.</p><h1>First Things First: Structured vs. Unstructured Data</h1><p>Before we dive into the model types, we need to understand the two broad categories of data we deal with: structured data and unstructured data.</p><p><strong>Structured data </strong>is data stored in a relational schema rather than in free form. You have a table with a fixed list of columns, and each row holds a value for each column: for example, an employee table with employee name, join date, department, and salary. Because the structure is fixed, you can make a lot of assumptions about the data, and its characteristics are much more predictable.</p><p>I find this analogy useful: think of going to a library where all the books are organized on shelves. If you&#8217;re looking for a specific book, you ask the librarian, and they look it up and tell you exactly which shelf it is on. Everything is neatly organized.</p><p><strong>Unstructured data</strong>, on the other hand, is like the same library after people have returned all the books and left them haphazardly in a pile. You don&#8217;t know where to find anything. 
Unstructured data includes things like free-form text, images, audio, and video.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BFyo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba0d5727-8f38-463b-8509-91dcb5acc636_750x450.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BFyo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba0d5727-8f38-463b-8509-91dcb5acc636_750x450.png 424w, https://substackcdn.com/image/fetch/$s_!BFyo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba0d5727-8f38-463b-8509-91dcb5acc636_750x450.png 848w, https://substackcdn.com/image/fetch/$s_!BFyo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba0d5727-8f38-463b-8509-91dcb5acc636_750x450.png 1272w, https://substackcdn.com/image/fetch/$s_!BFyo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba0d5727-8f38-463b-8509-91dcb5acc636_750x450.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BFyo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba0d5727-8f38-463b-8509-91dcb5acc636_750x450.png" width="750" height="450" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ba0d5727-8f38-463b-8509-91dcb5acc636_750x450.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:450,&quot;width&quot;:750,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:355870,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/181415279?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba0d5727-8f38-463b-8509-91dcb5acc636_750x450.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BFyo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba0d5727-8f38-463b-8509-91dcb5acc636_750x450.png 424w, https://substackcdn.com/image/fetch/$s_!BFyo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba0d5727-8f38-463b-8509-91dcb5acc636_750x450.png 848w, https://substackcdn.com/image/fetch/$s_!BFyo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba0d5727-8f38-463b-8509-91dcb5acc636_750x450.png 1272w, https://substackcdn.com/image/fetch/$s_!BFyo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba0d5727-8f38-463b-8509-91dcb5acc636_750x450.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>How Machine Learning Models Learn</h1><p>Before we look at specific model types, you need to understand the two fundamental ways models learn from data: supervised and unsupervised learning.</p><h2>Supervised Learning</h2><p>In supervised learning, you provide the model with examples that include the correct answers. You need two types of content in your data.</p><p><strong>Features: </strong>A set of attributes that describe something. For example, if you want to predict customer churn, you might collect which products customers are using, how many licenses they have, how many times they called support.</p><p><strong>Labels/Targets: </strong>The outcome, what actually happened. 
When the time came for renewal, did this customer renew or not?</p><p>You collect input and output combinations for many examples, then give this to a learning algorithm. The algorithm figures out how to map inputs to outputs. </p><p>A good way to think about it: remember equations like <strong>y = constant + a&#8321;x&#8321; + a&#8322;x&#8322;</strong>? </p><p>The machine learning algorithm learns this kind of mapping from your data.</p><p>Once trained, for future events where you only know the inputs, you plug them into the model and it predicts the output. This process is called scoring or inferencing.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Xriv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedb7712a-be65-4d60-b526-9cba13aee9ec_796x450.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Xriv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedb7712a-be65-4d60-b526-9cba13aee9ec_796x450.png 424w, https://substackcdn.com/image/fetch/$s_!Xriv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedb7712a-be65-4d60-b526-9cba13aee9ec_796x450.png 848w, https://substackcdn.com/image/fetch/$s_!Xriv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedb7712a-be65-4d60-b526-9cba13aee9ec_796x450.png 1272w, https://substackcdn.com/image/fetch/$s_!Xriv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedb7712a-be65-4d60-b526-9cba13aee9ec_796x450.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Xriv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedb7712a-be65-4d60-b526-9cba13aee9ec_796x450.png" width="796" height="450" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/edb7712a-be65-4d60-b526-9cba13aee9ec_796x450.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:450,&quot;width&quot;:796,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:108989,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/181415279?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedb7712a-be65-4d60-b526-9cba13aee9ec_796x450.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Xriv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedb7712a-be65-4d60-b526-9cba13aee9ec_796x450.png 424w, https://substackcdn.com/image/fetch/$s_!Xriv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedb7712a-be65-4d60-b526-9cba13aee9ec_796x450.png 848w, https://substackcdn.com/image/fetch/$s_!Xriv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedb7712a-be65-4d60-b526-9cba13aee9ec_796x450.png 1272w, https://substackcdn.com/image/fetch/$s_!Xriv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedb7712a-be65-4d60-b526-9cba13aee9ec_796x450.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Self-Supervised / Unsupervised Learning</h2><p>In unsupervised learning, you don&#8217;t provide the correct answers. You give the model data that has only features, no labels, and let it find patterns or groupings on its own.</p><p>Clustering is a common unsupervised technique. Algorithms like k-means group similar data points together without being told what the groups should be. For example, you might have customer data and want to discover natural segments. The algorithm analyzes the features and groups similar customers together. 
Marketing teams often use this to identify customer personas they didn&#8217;t know existed.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WkHm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce4c43d4-8ab0-45c1-a7f0-9cda66ddfb00_750x376.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WkHm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce4c43d4-8ab0-45c1-a7f0-9cda66ddfb00_750x376.png 424w, https://substackcdn.com/image/fetch/$s_!WkHm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce4c43d4-8ab0-45c1-a7f0-9cda66ddfb00_750x376.png 848w, https://substackcdn.com/image/fetch/$s_!WkHm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce4c43d4-8ab0-45c1-a7f0-9cda66ddfb00_750x376.png 1272w, https://substackcdn.com/image/fetch/$s_!WkHm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce4c43d4-8ab0-45c1-a7f0-9cda66ddfb00_750x376.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WkHm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce4c43d4-8ab0-45c1-a7f0-9cda66ddfb00_750x376.png" width="750" height="376" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ce4c43d4-8ab0-45c1-a7f0-9cda66ddfb00_750x376.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:376,&quot;width&quot;:750,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:105025,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/181415279?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce4c43d4-8ab0-45c1-a7f0-9cda66ddfb00_750x376.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WkHm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce4c43d4-8ab0-45c1-a7f0-9cda66ddfb00_750x376.png 424w, https://substackcdn.com/image/fetch/$s_!WkHm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce4c43d4-8ab0-45c1-a7f0-9cda66ddfb00_750x376.png 848w, https://substackcdn.com/image/fetch/$s_!WkHm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce4c43d4-8ab0-45c1-a7f0-9cda66ddfb00_750x376.png 1272w, https://substackcdn.com/image/fetch/$s_!WkHm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce4c43d4-8ab0-45c1-a7f0-9cda66ddfb00_750x376.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>Predictive Models</h1><p>Predictive models use supervised learning to give you predictions about future outcomes. Classification, regression, and time series forecasting all require labeled training data.</p><h2>Classification</h2><p>For structured data, one kind of prediction you can make is the category or class of something. A classic and familiar example: you have an email, and you want to classify it into one of two categories: valid email, which should reach your inbox, or spam. That&#8217;s what Google does with its spam folder and inbox. A spam classifier looks at a set of characteristics of each email (the features) and, based on these, classifies the email as spam or valid (the label).</p><p>Another example is fraud detection. You go to your grocery store and swipe your credit card to pay. 
Behind the scenes, the payment goes to the credit card authorization system, which quickly checks whether this transaction is valid or fraudulent.</p><p>Why is this called classification? Because the model takes an input, a structured row of data, and concludes that this record belongs to one of a finite set of classes. Here, it&#8217;s either valid or fraudulent. You can have more than two classes, but it&#8217;s always a finite set of possible outcomes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!S7YK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbaf9ef6-aa4e-4b86-a804-abac61caa344_750x376.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!S7YK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbaf9ef6-aa4e-4b86-a804-abac61caa344_750x376.png 424w, https://substackcdn.com/image/fetch/$s_!S7YK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbaf9ef6-aa4e-4b86-a804-abac61caa344_750x376.png 848w, https://substackcdn.com/image/fetch/$s_!S7YK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbaf9ef6-aa4e-4b86-a804-abac61caa344_750x376.png 1272w, https://substackcdn.com/image/fetch/$s_!S7YK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbaf9ef6-aa4e-4b86-a804-abac61caa344_750x376.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!S7YK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbaf9ef6-aa4e-4b86-a804-abac61caa344_750x376.png" width="750" height="376" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dbaf9ef6-aa4e-4b86-a804-abac61caa344_750x376.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:376,&quot;width&quot;:750,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:92432,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/181415279?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbaf9ef6-aa4e-4b86-a804-abac61caa344_750x376.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!S7YK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbaf9ef6-aa4e-4b86-a804-abac61caa344_750x376.png 424w, https://substackcdn.com/image/fetch/$s_!S7YK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbaf9ef6-aa4e-4b86-a804-abac61caa344_750x376.png 848w, https://substackcdn.com/image/fetch/$s_!S7YK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbaf9ef6-aa4e-4b86-a804-abac61caa344_750x376.png 1272w, https://substackcdn.com/image/fetch/$s_!S7YK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbaf9ef6-aa4e-4b86-a804-abac61caa344_750x376.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Regression</h2><p>Another kind of predictive model you can build with structured data is a regression model. A common example is predicting house prices. You want to predict what a house will sell for based on how many bedrooms it has, the square footage, and other features.</p><p>You train a machine learning model on houses that have already sold in a neighborhood: their features and the price at which each was sold. You give all of this to a learning algorithm, and you get a trained regression model. 
Next time a house that has yet to be sold comes along, you just collect its description and attributes, give it to this model, and the model gives you a numerical value.</p><p>Unlike classification, where you predict from a finite set of values, here your model is predicting a number that could vary widely. The number is not coming from a fixed set of values. This kind of model is called a regression model.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LvI1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b36fe6c-dd1a-4f64-aba9-293eae080f0c_750x376.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LvI1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b36fe6c-dd1a-4f64-aba9-293eae080f0c_750x376.png 424w, https://substackcdn.com/image/fetch/$s_!LvI1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b36fe6c-dd1a-4f64-aba9-293eae080f0c_750x376.png 848w, https://substackcdn.com/image/fetch/$s_!LvI1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b36fe6c-dd1a-4f64-aba9-293eae080f0c_750x376.png 1272w, https://substackcdn.com/image/fetch/$s_!LvI1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b36fe6c-dd1a-4f64-aba9-293eae080f0c_750x376.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LvI1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b36fe6c-dd1a-4f64-aba9-293eae080f0c_750x376.png" 
width="750" height="376" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4b36fe6c-dd1a-4f64-aba9-293eae080f0c_750x376.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:376,&quot;width&quot;:750,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:90697,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/181415279?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b36fe6c-dd1a-4f64-aba9-293eae080f0c_750x376.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LvI1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b36fe6c-dd1a-4f64-aba9-293eae080f0c_750x376.png 424w, https://substackcdn.com/image/fetch/$s_!LvI1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b36fe6c-dd1a-4f64-aba9-293eae080f0c_750x376.png 848w, https://substackcdn.com/image/fetch/$s_!LvI1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b36fe6c-dd1a-4f64-aba9-293eae080f0c_750x376.png 1272w, https://substackcdn.com/image/fetch/$s_!LvI1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b36fe6c-dd1a-4f64-aba9-293eae080f0c_750x376.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Time Series Forecasting</h2><p>Another type of predictive model is time series forecasting. You might want to predict stock prices based on events and other factors, or forecast the weather. The data often shows seasonality, and you collect variables and factors over time that record what happened. Based on this history, you predict future values. 
Time plays a central role in these models, which is what makes them different from standard regression.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8Fsi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb4364b1-8435-4fcf-a7cb-62590518aaac_750x376.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8Fsi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb4364b1-8435-4fcf-a7cb-62590518aaac_750x376.png 424w, https://substackcdn.com/image/fetch/$s_!8Fsi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb4364b1-8435-4fcf-a7cb-62590518aaac_750x376.png 848w, https://substackcdn.com/image/fetch/$s_!8Fsi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb4364b1-8435-4fcf-a7cb-62590518aaac_750x376.png 1272w, https://substackcdn.com/image/fetch/$s_!8Fsi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb4364b1-8435-4fcf-a7cb-62590518aaac_750x376.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8Fsi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb4364b1-8435-4fcf-a7cb-62590518aaac_750x376.png" width="750" height="376" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bb4364b1-8435-4fcf-a7cb-62590518aaac_750x376.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:376,&quot;width&quot;:750,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77064,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/181415279?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb4364b1-8435-4fcf-a7cb-62590518aaac_750x376.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8Fsi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb4364b1-8435-4fcf-a7cb-62590518aaac_750x376.png 424w, https://substackcdn.com/image/fetch/$s_!8Fsi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb4364b1-8435-4fcf-a7cb-62590518aaac_750x376.png 848w, https://substackcdn.com/image/fetch/$s_!8Fsi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb4364b1-8435-4fcf-a7cb-62590518aaac_750x376.png 1272w, https://substackcdn.com/image/fetch/$s_!8Fsi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb4364b1-8435-4fcf-a7cb-62590518aaac_750x376.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Predictive Models for Unstructured Data</h2><p>For unstructured data, similar predictive models exist, but they work with different types of input.</p><p><strong>Image classification: </strong>You might want to classify pictures: you give a machine learning model images and have it learn to predict whether a picture is of a cat, a dog, or some other animal.</p><p><strong>Object detection: </strong>A self-driving car is constantly taking pictures of what&#8217;s in front of it and around it, identifying objects and their locations. 
It&#8217;s not sufficient to just know that there&#8217;s an object in front of the car; the car also needs to know where that object is.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AGx0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc536d49-0cf4-4685-a1aa-619a1d64457c_750x376.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AGx0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc536d49-0cf4-4685-a1aa-619a1d64457c_750x376.png 424w, https://substackcdn.com/image/fetch/$s_!AGx0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc536d49-0cf4-4685-a1aa-619a1d64457c_750x376.png 848w, https://substackcdn.com/image/fetch/$s_!AGx0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc536d49-0cf4-4685-a1aa-619a1d64457c_750x376.png 1272w, https://substackcdn.com/image/fetch/$s_!AGx0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc536d49-0cf4-4685-a1aa-619a1d64457c_750x376.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AGx0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc536d49-0cf4-4685-a1aa-619a1d64457c_750x376.png" width="750" height="376" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dc536d49-0cf4-4685-a1aa-619a1d64457c_750x376.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:376,&quot;width&quot;:750,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:79323,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/181415279?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc536d49-0cf4-4685-a1aa-619a1d64457c_750x376.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AGx0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc536d49-0cf4-4685-a1aa-619a1d64457c_750x376.png 424w, https://substackcdn.com/image/fetch/$s_!AGx0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc536d49-0cf4-4685-a1aa-619a1d64457c_750x376.png 848w, https://substackcdn.com/image/fetch/$s_!AGx0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc536d49-0cf4-4685-a1aa-619a1d64457c_750x376.png 1272w, https://substackcdn.com/image/fetch/$s_!AGx0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc536d49-0cf4-4685-a1aa-619a1d64457c_750x376.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p><strong>Text classification and sentiment analysis: </strong>Say you have product reviews and want to predict whether each customer is happy. A model can analyze the review text and classify the customer as happy or unhappy.</p><h1>Generative Models</h1><p>This is the category that&#8217;s become very popular in recent years: large language models (LLMs), such as the ones behind ChatGPT, and generative AI models more broadly. This kind of model doesn&#8217;t just predict from an input. It takes a prompt and creates content for you.</p><p><strong>Text generation: </strong>You give a prompt and get back a long, AI-generated text. These models can generate articles, very human-like responses, and more.</p><p><strong>Language translation: </strong>You speak in one language, and it gets translated into another. 
Services like Google Translate use neural machine translation models for this.</p><p><strong>Code generation: </strong>You can generate code using AI. A lot of code generation happens with tools such as IBM Project Bob, Claude, Gemini, and ChatGPT.</p><p><strong>Image generation: </strong>Generative models can also create images from text prompts.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Vhtl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6527f439-a040-4d72-a343-984166bb3a5b_750x376.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Vhtl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6527f439-a040-4d72-a343-984166bb3a5b_750x376.png 424w, https://substackcdn.com/image/fetch/$s_!Vhtl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6527f439-a040-4d72-a343-984166bb3a5b_750x376.png 848w, https://substackcdn.com/image/fetch/$s_!Vhtl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6527f439-a040-4d72-a343-984166bb3a5b_750x376.png 1272w, https://substackcdn.com/image/fetch/$s_!Vhtl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6527f439-a040-4d72-a343-984166bb3a5b_750x376.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Vhtl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6527f439-a040-4d72-a343-984166bb3a5b_750x376.png" width="750" height="376" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6527f439-a040-4d72-a343-984166bb3a5b_750x376.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:376,&quot;width&quot;:750,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:98441,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/181415279?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6527f439-a040-4d72-a343-984166bb3a5b_750x376.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Vhtl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6527f439-a040-4d72-a343-984166bb3a5b_750x376.png 424w, https://substackcdn.com/image/fetch/$s_!Vhtl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6527f439-a040-4d72-a343-984166bb3a5b_750x376.png 848w, https://substackcdn.com/image/fetch/$s_!Vhtl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6527f439-a040-4d72-a343-984166bb3a5b_750x376.png 1272w, https://substackcdn.com/image/fetch/$s_!Vhtl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6527f439-a040-4d72-a343-984166bb3a5b_750x376.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h2>What Makes Large Language Models &#8220;Large&#8221;?</h2><p>Two things make them large: massive training data from sources like Wikipedia, books, and web pages, and billions of parameters (the internal variables the model learns during training). GPT-3, for example, has 175 billion parameters. This scale allows LLMs to capture intricate patterns in language.</p><h1>Embedding Models</h1><p>Embedding models are a different kind of AI model. They don&#8217;t produce something immediately consumable like a prediction or generated text. Instead, they give you an intermediate representation called a vector (also known as an embedding), which is essentially a list of numbers that captures the meaning or characteristics of your data.</p><p>For example, each of us has certain background attributes: the field we work in, the years we have spent in our careers, our university education. 
If we want to learn which of us are closest to each other in technical or academic background, we could feed a text description of each person to an embedding model, which returns a vector representation for each. People with similar backgrounds are expected to have similar vectors.</p><p>Similarly, you can take a list of documents or articles, vectorize them, and see which articles are similar to each other. If some are food blogs and others are technical blogs, you would expect the food articles to have vectors that are close to one another. You can do the same with images.</p><p>With multimodal embedding, your content is a mix of text and images, and you can vectorize it for semantic search: matching by meaning through vector similarity rather than by plain text matching.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UIXz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb37185a-3310-4c3c-b565-66fa064b84d9_750x376.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UIXz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb37185a-3310-4c3c-b565-66fa064b84d9_750x376.png 424w, https://substackcdn.com/image/fetch/$s_!UIXz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb37185a-3310-4c3c-b565-66fa064b84d9_750x376.png 848w, https://substackcdn.com/image/fetch/$s_!UIXz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb37185a-3310-4c3c-b565-66fa064b84d9_750x376.png 1272w, 
https://substackcdn.com/image/fetch/$s_!UIXz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb37185a-3310-4c3c-b565-66fa064b84d9_750x376.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UIXz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb37185a-3310-4c3c-b565-66fa064b84d9_750x376.png" width="750" height="376" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db37185a-3310-4c3c-b565-66fa064b84d9_750x376.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:376,&quot;width&quot;:750,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:96962,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/181415279?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb37185a-3310-4c3c-b565-66fa064b84d9_750x376.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UIXz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb37185a-3310-4c3c-b565-66fa064b84d9_750x376.png 424w, https://substackcdn.com/image/fetch/$s_!UIXz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb37185a-3310-4c3c-b565-66fa064b84d9_750x376.png 848w, https://substackcdn.com/image/fetch/$s_!UIXz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb37185a-3310-4c3c-b565-66fa064b84d9_750x376.png 
1272w, https://substackcdn.com/image/fetch/$s_!UIXz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb37185a-3310-4c3c-b565-66fa064b84d9_750x376.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h1>Choosing the Right Model for Your Problem</h1><p>When you&#8217;re approaching a problem, ask yourself three questions:</p><p><strong>What kind of data do you have? </strong>Structured data (tables with rows and columns) works well with traditional machine learning. 
Unstructured data (text, images, audio, video) typically requires deep learning approaches.</p><p><strong>Do you have labeled data? </strong>If you have historical data with known outcomes, you can use supervised learning (classification, regression). If not, consider unsupervised approaches like clustering or embedding models.</p><p><strong>What kind of output do you need? </strong>This is often the deciding factor. Use the table below as a quick reference.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1OhQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70173c7f-23a3-4fee-8d92-392a11273c01_840x446.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1OhQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70173c7f-23a3-4fee-8d92-392a11273c01_840x446.png 424w, https://substackcdn.com/image/fetch/$s_!1OhQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70173c7f-23a3-4fee-8d92-392a11273c01_840x446.png 848w, https://substackcdn.com/image/fetch/$s_!1OhQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70173c7f-23a3-4fee-8d92-392a11273c01_840x446.png 1272w, https://substackcdn.com/image/fetch/$s_!1OhQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70173c7f-23a3-4fee-8d92-392a11273c01_840x446.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!1OhQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70173c7f-23a3-4fee-8d92-392a11273c01_840x446.png" width="840" height="446" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/70173c7f-23a3-4fee-8d92-392a11273c01_840x446.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:446,&quot;width&quot;:840,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:81841,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/181415279?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70173c7f-23a3-4fee-8d92-392a11273c01_840x446.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1OhQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70173c7f-23a3-4fee-8d92-392a11273c01_840x446.png 424w, https://substackcdn.com/image/fetch/$s_!1OhQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70173c7f-23a3-4fee-8d92-392a11273c01_840x446.png 848w, https://substackcdn.com/image/fetch/$s_!1OhQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70173c7f-23a3-4fee-8d92-392a11273c01_840x446.png 1272w, https://substackcdn.com/image/fetch/$s_!1OhQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70173c7f-23a3-4fee-8d92-392a11273c01_840x446.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The temptation today is to reach for LLMs for everything. But LLMs are just one drawer in the toolbox. For structured, tabular data, traditional machine learning algorithms are often faster, simpler, and more accurate. The skill isn&#8217;t knowing the most powerful tool. 
It&#8217;s knowing which tool fits the problem in front of you.</p>]]></content:encoded></item><item><title><![CDATA[What It Actually Takes to Become an AI Product Architect — Part 1: Discovery to Specification]]></title><description><![CDATA[How experienced engineers can lead AI initiatives end-to-end]]></description><link>https://aiarchitectplaybook.com/p/what-it-actually-takes-to-become</link><guid isPermaLink="false">https://aiarchitectplaybook.com/p/what-it-actually-takes-to-become</guid><dc:creator><![CDATA[Shaikh Quader]]></dc:creator><pubDate>Mon, 08 Dec 2025 16:11:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!kvwa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03f2d1ab-a0d3-407f-981a-33b2bff46503_1456x1048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kvwa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03f2d1ab-a0d3-407f-981a-33b2bff46503_1456x1048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kvwa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03f2d1ab-a0d3-407f-981a-33b2bff46503_1456x1048.png 424w, https://substackcdn.com/image/fetch/$s_!kvwa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03f2d1ab-a0d3-407f-981a-33b2bff46503_1456x1048.png 848w, https://substackcdn.com/image/fetch/$s_!kvwa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03f2d1ab-a0d3-407f-981a-33b2bff46503_1456x1048.png 1272w, 
https://substackcdn.com/image/fetch/$s_!kvwa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03f2d1ab-a0d3-407f-981a-33b2bff46503_1456x1048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kvwa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03f2d1ab-a0d3-407f-981a-33b2bff46503_1456x1048.png" width="1456" height="1048" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/03f2d1ab-a0d3-407f-981a-33b2bff46503_1456x1048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1048,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:373111,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/181046164?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03f2d1ab-a0d3-407f-981a-33b2bff46503_1456x1048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kvwa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03f2d1ab-a0d3-407f-981a-33b2bff46503_1456x1048.png 424w, https://substackcdn.com/image/fetch/$s_!kvwa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03f2d1ab-a0d3-407f-981a-33b2bff46503_1456x1048.png 848w, 
https://substackcdn.com/image/fetch/$s_!kvwa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03f2d1ab-a0d3-407f-981a-33b2bff46503_1456x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!kvwa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03f2d1ab-a0d3-407f-981a-33b2bff46503_1456x1048.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Many companies don&#8217;t have a job title for the person who owns AI features end-to-end. 
They have ML engineers, data scientists, and product managers. But the role that bridges all of them? That&#8217;s often unnamed.</p><p><em>AI Product Architect</em> is a term I&#8217;m using to describe this role. It&#8217;s not a standard industry title. It&#8217;s how I make sense of a set of responsibilities that don&#8217;t fit neatly into existing job categories.</p><p>A software architect designs systems. An AI Product Architect, as I define it, does the same, but for AI features inside products. Same technical leadership. Different technical domain.</p><p>If you&#8217;re a senior engineer or software architect with years of experience shipping software, you don&#8217;t start over to move into this role. You add AI to what you already know: systems thinking, product delivery, technical leadership. The core skills transfer.</p><p>In my role as AI Architect at IBM Db2, I technically lead the development of AI features for the product. Db2 has architects for different areas such as the compiler, security, and data movement. My responsibility is planning, designing, and implementing AI features in Db2 products.</p><p>That&#8217;s what an AI Product Architect does, at least in my experience: technically owns AI features end-to-end within a larger product.</p><h1>What the Role Looks Like in Practice</h1><p>In 2025, I was technically responsible for three major AI initiatives in Db2. I owned and architected the end-to-end implementation of each, from concept to release: planning, designing, implementing, testing, releasing, promoting, and driving adoption.</p><p>The three projects:</p><blockquote><p>1. <strong>Native vector data type</strong>: introducing vector storage and similarity search functions inside Db2</p><p>2. <strong>Tooling integration</strong>: extending existing Db2 utilities to recognize and work with the new vector type</p><p>3. 
<strong>LLM API integration</strong>: connecting external language model APIs to Db2, making them callable via SQL functions</p></blockquote><p>These weren&#8217;t research experiments. Each one shipped in a product release to enterprise customers running real workloads.</p><p>Each project followed the same pattern. Let me walk you through what that looks like.</p><h1>Phase 1: Proposal and Validation</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!feuj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55c3b615-1111-4f0e-b823-db80e8312a22_800x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!feuj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55c3b615-1111-4f0e-b823-db80e8312a22_800x400.png 424w, https://substackcdn.com/image/fetch/$s_!feuj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55c3b615-1111-4f0e-b823-db80e8312a22_800x400.png 848w, https://substackcdn.com/image/fetch/$s_!feuj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55c3b615-1111-4f0e-b823-db80e8312a22_800x400.png 1272w, https://substackcdn.com/image/fetch/$s_!feuj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55c3b615-1111-4f0e-b823-db80e8312a22_800x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!feuj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55c3b615-1111-4f0e-b823-db80e8312a22_800x400.png" width="800" 
height="400" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/55c3b615-1111-4f0e-b823-db80e8312a22_800x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:45952,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/181046164?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55c3b615-1111-4f0e-b823-db80e8312a22_800x400.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!feuj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55c3b615-1111-4f0e-b823-db80e8312a22_800x400.png 424w, https://substackcdn.com/image/fetch/$s_!feuj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55c3b615-1111-4f0e-b823-db80e8312a22_800x400.png 848w, https://substackcdn.com/image/fetch/$s_!feuj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55c3b615-1111-4f0e-b823-db80e8312a22_800x400.png 1272w, https://substackcdn.com/image/fetch/$s_!feuj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55c3b615-1111-4f0e-b823-db80e8312a22_800x400.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Every initiative starts with identifying a market opportunity. A problem worth solving. That problem may translate into multiple functions, new capabilities, new use cases for customers. But it starts with one question: what&#8217;s worth building?</p><h2>Market Research</h2><p>Analyst reports give you the landscape: Gartner Magic Quadrant, Forrester Wave, industry blogs from major conferences. Competitor announcements show you where the market is moving.</p><p>But reports tell you what&#8217;s popular, not what&#8217;s valuable. 
That requires talking to customers.</p><p>When we proposed vector support for Db2, I studied how AI practitioners were using vector types and similarity search: what problems these capabilities solved, what use cases they enabled.</p><h2>Customer Validation</h2><p>I spoke with customer representatives, internal data scientists, and machine learning practitioners. Many of them work directly with real customers. I also reached out to actual customers already building these kinds of use cases.</p><p>The goal isn&#8217;t to confirm your idea is good. It&#8217;s to find out where it&#8217;s wrong. What are customers already doing? What&#8217;s broken? Would they pay for this?</p><p>This back-and-forth sharpens the idea. You validate assumptions, crystallize the concept into specific problem categories, and identify where real demand exists.</p><h2>Quick Prototyping</h2><p>Before presenting to executives, when possible, I build low-fidelity prototypes using existing features. Not performant. Not production-ready. But tangible.</p><p>For the vector proposal, I implemented vector storage and similarity search using existing Db2 data types and user-defined functions. It was slow, but it demonstrated the experience and capability we were proposing.</p><p>A working prototype changes the conversation. Stakeholders stop debating whether it&#8217;s possible and start debating whether it&#8217;s worth it.</p><h2>Building the Proposal</h2><p>The initial proposal is still high level: a suggested set of new features, the problems they solve, evidence of customer interest, and the industry outlook.</p><p>I never go straight to executives. First, I validate with peer architects, including some more senior than me, and make them co-authors on the proposal. We go back and forth, refine the thinking, pressure-test the idea. 
Then we present to executives together.</p><h1>Phase 2: Solution Externals</h1><p>Once the proposal has stakeholder support, the next step is writing solution externals.</p><p>In software engineering terms, this is a software requirements specification. But it reads like a user manual, written before any code exists. You define how real users will interact with the proposed features: the interface, the syntax, the experience.</p><p>The goal is precision without code. By the time engineering starts, everyone (architects, developers, stakeholders) should have the same understanding of what the feature will do.</p><p>For each feature, I document:</p><blockquote><p>&#8226; <strong>What it does</strong>: a clear description of the function or capability</p><p>&#8226; <strong>How it looks</strong>: syntax, parameters, inputs, outputs</p><p>&#8226; <strong>Processing</strong>: what the function does with those inputs to produce the output</p><p>&#8226; <strong>Restrictions</strong>: what this feature will not cover in this phase</p><p>&#8226; <strong>User scenarios</strong>: sample use cases with realistic values</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qF3S!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac878c59-aa28-4bfb-ac59-6eb024e37dcf_800x560.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qF3S!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac878c59-aa28-4bfb-ac59-6eb024e37dcf_800x560.png 424w, 
https://substackcdn.com/image/fetch/$s_!qF3S!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac878c59-aa28-4bfb-ac59-6eb024e37dcf_800x560.png 848w, https://substackcdn.com/image/fetch/$s_!qF3S!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac878c59-aa28-4bfb-ac59-6eb024e37dcf_800x560.png 1272w, https://substackcdn.com/image/fetch/$s_!qF3S!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac878c59-aa28-4bfb-ac59-6eb024e37dcf_800x560.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qF3S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac878c59-aa28-4bfb-ac59-6eb024e37dcf_800x560.png" width="800" height="560" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ac878c59-aa28-4bfb-ac59-6eb024e37dcf_800x560.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:560,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:78607,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/181046164?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac878c59-aa28-4bfb-ac59-6eb024e37dcf_800x560.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qF3S!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac878c59-aa28-4bfb-ac59-6eb024e37dcf_800x560.png 
424w, https://substackcdn.com/image/fetch/$s_!qF3S!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac878c59-aa28-4bfb-ac59-6eb024e37dcf_800x560.png 848w, https://substackcdn.com/image/fetch/$s_!qF3S!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac878c59-aa28-4bfb-ac59-6eb024e37dcf_800x560.png 1272w, https://substackcdn.com/image/fetch/$s_!qF3S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac878c59-aa28-4bfb-ac59-6eb024e37dcf_800x560.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>For example, when I wrote the solution external for the vector functions in Db2, I documented each function. For vector distance calculations, that meant defining parameters, expected inputs, outputs, processing logic, and restrictions. Everything was documented textually and visually, without writing a single line of implementation code. The visual above shows what this looks like in practice.</p><p>This solution external becomes the foundation for the actual user documentation later.</p><p>This document saves months of rework. Ambiguity in requirements becomes bugs in production. The solution external eliminates ambiguity before a single line of code is written.</p><p>At this point, the feature exists on paper. Everyone agrees on what we&#8217;re building.</p><p>Now comes the hard part: making it real.</p><p><em><strong>Next post: Design and Implementation.</strong></em></p>]]></content:encoded></item><item><title><![CDATA[From Java Developer to AI Architect at IBM]]></title><description><![CDATA[After 11 years of Java, I switched to AI. 
I did not have to start over as a junior.]]></description><link>https://aiarchitectplaybook.com/p/from-java-developer-to-ai-architect</link><guid isPermaLink="false">https://aiarchitectplaybook.com/p/from-java-developer-to-ai-architect</guid><dc:creator><![CDATA[Shaikh Quader]]></dc:creator><pubDate>Fri, 05 Dec 2025 10:36:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Yj7z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0314dc11-6b27-42f8-9c37-fdc92613d453_1600x900.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Yj7z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0314dc11-6b27-42f8-9c37-fdc92613d453_1600x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Yj7z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0314dc11-6b27-42f8-9c37-fdc92613d453_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!Yj7z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0314dc11-6b27-42f8-9c37-fdc92613d453_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!Yj7z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0314dc11-6b27-42f8-9c37-fdc92613d453_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!Yj7z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0314dc11-6b27-42f8-9c37-fdc92613d453_1600x900.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Yj7z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0314dc11-6b27-42f8-9c37-fdc92613d453_1600x900.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0314dc11-6b27-42f8-9c37-fdc92613d453_1600x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:46318,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/180784269?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0314dc11-6b27-42f8-9c37-fdc92613d453_1600x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Yj7z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0314dc11-6b27-42f8-9c37-fdc92613d453_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!Yj7z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0314dc11-6b27-42f8-9c37-fdc92613d453_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!Yj7z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0314dc11-6b27-42f8-9c37-fdc92613d453_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!Yj7z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0314dc11-6b27-42f8-9c37-fdc92613d453_1600x900.png 1456w" 
sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In 2014, I was at IBM Littleton Lab in Massachusetts for a training on Watson. I was learning how to build a Q&amp;A system for IBM customer support. I had spent nine years building Java applications. I knew nothing about AI beyond a graduate course on data mining and machine learning. Watching Watson generate answers, I was amazed. The technology was just starting out, but I could see where things were heading.</p><p>That week was my first exposure to AI. Two years later, I made the full transition.</p><p>My fear was stepping into territory with no clear path to production. 
But I had always tried new technology early, even in my Java career. I knew that if I stayed comfortable, I would be fine for a few years. Eventually, my growth would stop. I was not willing to let that happen.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!I5P9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea76623-76a6-473b-b2f3-09a1266059f3_826x128.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!I5P9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea76623-76a6-473b-b2f3-09a1266059f3_826x128.png 424w, https://substackcdn.com/image/fetch/$s_!I5P9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea76623-76a6-473b-b2f3-09a1266059f3_826x128.png 848w, https://substackcdn.com/image/fetch/$s_!I5P9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea76623-76a6-473b-b2f3-09a1266059f3_826x128.png 1272w, https://substackcdn.com/image/fetch/$s_!I5P9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea76623-76a6-473b-b2f3-09a1266059f3_826x128.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!I5P9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea76623-76a6-473b-b2f3-09a1266059f3_826x128.png" width="826" height="128" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7ea76623-76a6-473b-b2f3-09a1266059f3_826x128.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:128,&quot;width&quot;:826,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:38690,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/180784269?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea76623-76a6-473b-b2f3-09a1266059f3_826x128.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!I5P9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea76623-76a6-473b-b2f3-09a1266059f3_826x128.png 424w, https://substackcdn.com/image/fetch/$s_!I5P9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea76623-76a6-473b-b2f3-09a1266059f3_826x128.png 848w, https://substackcdn.com/image/fetch/$s_!I5P9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea76623-76a6-473b-b2f3-09a1266059f3_826x128.png 1272w, https://substackcdn.com/image/fetch/$s_!I5P9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ea76623-76a6-473b-b2f3-09a1266059f3_826x128.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!efNy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0760f3f5-bd51-4de4-bd00-b901b98ffe15_900x204.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!efNy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0760f3f5-bd51-4de4-bd00-b901b98ffe15_900x204.png 424w, https://substackcdn.com/image/fetch/$s_!efNy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0760f3f5-bd51-4de4-bd00-b901b98ffe15_900x204.png 848w, https://substackcdn.com/image/fetch/$s_!efNy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0760f3f5-bd51-4de4-bd00-b901b98ffe15_900x204.png 1272w, https://substackcdn.com/image/fetch/$s_!efNy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0760f3f5-bd51-4de4-bd00-b901b98ffe15_900x204.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!efNy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0760f3f5-bd51-4de4-bd00-b901b98ffe15_900x204.png" width="900" height="204" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0760f3f5-bd51-4de4-bd00-b901b98ffe15_900x204.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:204,&quot;width&quot;:900,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:32845,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/180784269?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0760f3f5-bd51-4de4-bd00-b901b98ffe15_900x204.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!efNy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0760f3f5-bd51-4de4-bd00-b901b98ffe15_900x204.png 424w, https://substackcdn.com/image/fetch/$s_!efNy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0760f3f5-bd51-4de4-bd00-b901b98ffe15_900x204.png 848w, https://substackcdn.com/image/fetch/$s_!efNy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0760f3f5-bd51-4de4-bd00-b901b98ffe15_900x204.png 1272w, https://substackcdn.com/image/fetch/$s_!efNy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0760f3f5-bd51-4de4-bd00-b901b98ffe15_900x204.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><h1>The Java Years</h1><p>For 11 years, my world was enterprise Java. I built web applications and web services, deployed them to production, and troubleshot server failures. 
I developed a search engine from scratch. I set up a wiki for a large organization. My days were writing code, testing, deploying, and fixing what broke.</p><p>I was not just an individual contributor. I led a team of Java developers, system administrators, and DBAs. I was hands-on with code and responsible for technical leadership and architecture. Doing both shaped how I think about building systems and leading people.</p><p>It was good work. I was good at it. Management was happy, and I received top ratings.</p><p>But at some point, the growth slowed down. The work became routine: add features to the front end, crawl more data sources into the search engine, repeat. I was not feeling challenged. I was not innovating. And I knew it.</p><h1>The Turning Point</h1><p>By 2016, &#8220;Data Science&#8221; and &#8220;Cognitive Computing&#8221; were impossible to ignore. AI had carried a negative reputation from its earlier failures, but deep learning was changing that. Watson had won Jeopardy. Kaggle competitions were producing impressive results. Neural networks were going through a transformation with modern architectures.</p><p>I did not understand most of it. But I could see AI was beginning to matter. It was the earliest stage of adoption. Most companies were hesitant to explore AI. Success stories were rare.</p><p>Then, at the end of September 2016, before a three-week vacation, my manager <a href="https://www.linkedin.com/in/chad-marston-b3a1b52/">Chad Marston</a> pinged me. He said, &#8220;Shaikh, we want to begin working in Cognitive Computing. I want you to lead that team.&#8221;</p><p>I would be building a team of 8 to 10 developers, and none of us had any background in AI. We all came from Java and web development.</p><p>I went on vacation, but I kept thinking about it. 
When I came back, I was already exploring what I could learn and where I could get hands-on with this technology.</p><p>Chad sent me to the IBM Watson Developer Conference in San Francisco later that year. I attended hands-on labs where I played with Watson APIs. I saw demos. I listened to talks from Ginni Rometty, then IBM&#8217;s CEO, and other experts. I had seen Watson before, but this was different. I saw more of what AI could do in practice. I came back with confidence and a clearer picture of what we could build.</p><h1>The Hard Part</h1><p>Leading 8 to 10 people while still learning the material myself was not easy. Online learning platforms had plenty of courses, but the sheer number was overwhelming. Where do you start? What is worth your time? What will matter when you try to apply it?</p><p>The first thing I did was set up regular check-ins with my mentors at IBM: <a href="https://www.linkedin.com/in/jfpuget/">Jean-Francois Puget</a>, <a href="https://www.linkedin.com/in/jorgecasta/">Jorge Casta&#241;&#243;n</a>, and <a href="https://www.linkedin.com/in/sdobrin/">Seth Dobrin</a>. I got their guidance, filtered courses based on their recommendations, put together a learning path, and shared it with the team.</p><p>Even with structure, the learning curve was steep. The concept I struggled with longest was evaluation metrics. Accuracy was easy. Precision and recall were harder, and harder still to explain to management. The hardest problem was mapping model metrics to business metrics. A model can perform well on test data, but how do you know it is helping the business?</p><p>I have to admit: our first project failed. We were asked to predict customer satisfaction scores from support interactions. The model was accurate in training and testing. We deployed it. But we had no way to monitor business success. We failed quickly and learned from it.</p><p>Team members asked many questions I could not answer. I did not make things up. 
I admitted I did not know, then went and learned and came back. That honesty became part of how we operated.</p><p>The learning was hard for the whole team. We set up a weekly session where we watched videos together and discussed what we learned. Often, one team member would pre-read the material and present it. This kept us progressing together.</p><h1>What Helped Us Learn</h1><p>I tried many courses across Coursera, edX, Udemy, and DataCamp. A few stood out:</p><p><strong>Andrew Ng&#8217;s Machine Learning course</strong> gave me a foundation in core algorithms: decision trees, linear and logistic regression, support vector machines, and neural networks. It was taught in Octave, not Python, but the concepts were what mattered.</p><p><strong>Python for Data Science from DataCamp</strong> gave me fluency in Python once I had the conceptual foundation.</p><p>Two books were essential: <strong>An Introduction to Statistical Learning</strong> for statistical foundations, and <strong>Python Machine Learning</strong> by Sebastian Raschka for bridging theory to implementation.</p><p>But the most important learning did not come from courses. It came from applying concepts to real projects, getting stuck, and consulting mentors. That cycle of building, struggling, and asking for help was the core of my growth.</p><h1>Where I Am Now</h1><p>My AI journey started in 2014 with that Watson project. In fall 2016, I made the full transition, initially working with Watson&#8217;s NLP capabilities: sentiment analysis, question answering, and similar APIs. I became the lead machine learning engineer for IBM Analytics and held that role for three years. In 2019, I moved into my current role: AI Architect for IBM Db2.</p><p>My work today goes beyond building models. It has two dimensions. The first is improving database operations using AI, for example, applying machine learning to improve memory estimation for queries. 
The second is bringing AI infrastructure into the database itself, so that database administrators can run AI workloads directly within Db2.</p><p>Building AI into a database is different from building standalone models. The model must be small, fast, and light on resources. No human monitors these models, so they need self-feedback mechanisms. These were constraints I had never faced before.</p><p>One project shows what my role looks like now: bringing vector support to Db2. At the beginning of 2025, Db2 had no vector capabilities. I owned this project end to end: planning, designing, architecting, leading the team, and contributing to development. We implemented vector type support, vector similarity search, and LLM API integration callable via SQL. After the release, I created educational materials, gave talks, wrote blog posts, and promoted the work.</p><p>That cycle, from zero to shipped product to education to promotion, is what makes this role rewarding. I am challenged every day. I am not just building models. I am thinking about how AI fits into an enterprise database platform and how to make it useful for production workloads. I am more satisfied now than I ever was in my Java years.</p><h1>What I Would Tell Someone Making This Transition</h1><p><strong>Find a mentor early.</strong> If you want to become an AI architect, find someone in that role. Get their guidance on what to learn and what to skip. This will save you months.</p><p><strong>Your experience is not a liability.</strong> I did not abandon my 11 years of Java. I built on it. Production systems, working with teams, managing projects: all of that transferred. The AI skills were an addition, not a replacement.</p><p><strong>Your learning approach should change over time.</strong> Early on, I took courses end to end. That was useful for building foundations. But after the fundamentals, I changed my approach. Now I start with the project, then learn only what I need to apply. 
In 2021, I took an entire specialization on cloud deployment of AI. I never applied most of it, and those skills are gone. Project-driven learning is more effective once you have the foundations.</p><p><strong>Model accuracy is not the end goal.</strong> I used to think a highly accurate model was all that mattered. I was wrong. What matters is identifying the right business metric and tying it to your model&#8217;s performance. A model can look great in testing but have no impact if you have not made that connection.</p><p><strong>User adoption is the part nobody warns you about.</strong> Even after you get the model and metrics right, users fall back to old routines. Deploying AI is not just a technical problem. It requires rolling it out slowly and getting buy-in. Skip this, and your model sits unused.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gquo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F485fd5fa-a717-4978-9267-1b0981ced4bd_826x260.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gquo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F485fd5fa-a717-4978-9267-1b0981ced4bd_826x260.png 424w, https://substackcdn.com/image/fetch/$s_!gquo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F485fd5fa-a717-4978-9267-1b0981ced4bd_826x260.png 848w, https://substackcdn.com/image/fetch/$s_!gquo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F485fd5fa-a717-4978-9267-1b0981ced4bd_826x260.png 1272w, 
https://substackcdn.com/image/fetch/$s_!gquo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F485fd5fa-a717-4978-9267-1b0981ced4bd_826x260.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gquo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F485fd5fa-a717-4978-9267-1b0981ced4bd_826x260.png" width="826" height="260" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/485fd5fa-a717-4978-9267-1b0981ced4bd_826x260.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:260,&quot;width&quot;:826,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:50287,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/180784269?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F485fd5fa-a717-4978-9267-1b0981ced4bd_826x260.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gquo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F485fd5fa-a717-4978-9267-1b0981ced4bd_826x260.png 424w, https://substackcdn.com/image/fetch/$s_!gquo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F485fd5fa-a717-4978-9267-1b0981ced4bd_826x260.png 848w, https://substackcdn.com/image/fetch/$s_!gquo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F485fd5fa-a717-4978-9267-1b0981ced4bd_826x260.png 
1272w, https://substackcdn.com/image/fetch/$s_!gquo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F485fd5fa-a717-4978-9267-1b0981ced4bd_826x260.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h1>Questions to Ask Yourself</h1><p>1. Who do you know who is already an AI architect? Write down three names. This week, reach out to one of them with a single question about how they got there. If you do not have someone in mind and think I could help, reach out to me.</p><p>2. What skills from your current role would transfer to an AI architect role? 
List three.</p><p>3. What have you learned in the past two years that you never applied? Going forward, learn less that way.</p><p>4. What problem in your current work could benefit from AI? Name the problem and the outcome you want.</p><p>5. If you became an AI architect today, what is the first project you would want to work on?</p><h1>Your Turn</h1><p>If you are a senior engineer feeling that growth has stopped, the path I took is available to you. You do not need to go back to school. You do not need to start over as a junior. You need a structured path, the right guidance, and willingness to be uncomfortable for a while.</p><p>If this post was useful, follow me. I will publish more on lessons from my transition, with steps you can apply.</p><p>If you are considering a move to an AI architect role, I would like to hear from you. What are your struggles? What is holding you back? Share in the comments. Your questions will shape what I write next.</p>]]></content:encoded></item><item><title><![CDATA[Setting Up llama.cpp on macOS: What Actually Worked for Me]]></title><description><![CDATA[My Python 3.12 setup for Apple Silicon with text generation and embedding models]]></description><link>https://aiarchitectplaybook.com/p/setting-up-llamacpp-on-macos-what</link><guid isPermaLink="false">https://aiarchitectplaybook.com/p/setting-up-llamacpp-on-macos-what</guid><dc:creator><![CDATA[Shaikh Quader]]></dc:creator><pubDate>Wed, 29 Oct 2025 08:34:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Gp3l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ca8790d-f63d-4c12-9802-03faca7d8b22_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!Gp3l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ca8790d-f63d-4c12-9802-03faca7d8b22_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Gp3l!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ca8790d-f63d-4c12-9802-03faca7d8b22_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Gp3l!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ca8790d-f63d-4c12-9802-03faca7d8b22_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!Gp3l!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ca8790d-f63d-4c12-9802-03faca7d8b22_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!Gp3l!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ca8790d-f63d-4c12-9802-03faca7d8b22_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Gp3l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ca8790d-f63d-4c12-9802-03faca7d8b22_1536x1024.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0ca8790d-f63d-4c12-9802-03faca7d8b22_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2007186,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/177446867?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ca8790d-f63d-4c12-9802-03faca7d8b22_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Gp3l!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ca8790d-f63d-4c12-9802-03faca7d8b22_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Gp3l!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ca8790d-f63d-4c12-9802-03faca7d8b22_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!Gp3l!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ca8790d-f63d-4c12-9802-03faca7d8b22_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!Gp3l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ca8790d-f63d-4c12-9802-03faca7d8b22_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>One thing troubled me these past days: I couldn&#8217;t get llama.cpp to install on macOS. I had set it up successfully on Linux using Python 3.13. 
The same steps kept failing on my Mac.</p><p>llama.cpp is a tool written in C++ that allows you to run large language models on consumer hardware. It supports various model types and provides GPU acceleration on Apple Silicon through Metal. For developers and IT professionals, this means you can run AI models locally without relying on cloud services or paying for API calls. In this post I use it through its Python bindings, llama-cpp-python.</p><p>I searched everywhere: ChatGPT, Claude, and Google. Nothing worked. I went back to the official llama.cpp documentation. 
There I found the answer: <strong>llama.cpp doesn&#8217;t work well with Python 3.13 on macOS</strong>.</p><p>After more testing, I arrived at the right steps for macOS. I run an M2 Mac. The same steps should work on other Apple Silicon chips, though I have not tested those variants.</p><p>I wanted to verify everything worked. I downloaded two models from Hugging Face in GGUF format: a text generation model and a text embedding model. I created two minimal Python scripts, one for text generation, another for embedding generation.</p><p>Both scripts ran fine. The embedding script generated embeddings for input text. The text generation script responded correctly to prompts.</p><p>To be certain, I documented all steps. Then I created a fresh directory and repeated everything. It worked again.</p><p>Here are my steps, the models I used, and the test scripts. I hope this helps those who want to set up language models locally on macOS for prototyping or development.</p><h2>Complete Setup Process</h2><h3>Step 1: Environment Setup</h3><pre><code># Install Python 3.12 using uv (if not already installed)
uv python install 3.12

# Check Python version
python3.12 --version

# Create virtual environment with Python 3.12
# (uv resolves the interpreter itself, so python3.12 does not need to be on PATH)
uv venv --python 3.12

# Upgrade pip
uv pip install --upgrade pip

# Install required packages with Metal support
uv pip install \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal \
  llama-cpp-python \
  "langchain&gt;=0.2" "langchain-community&gt;=0.2"</code></pre><p></p><h3>Step 2: Download Models</h3><pre><code># Download Nomic embedding model
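# Note: wget does not ship with macOS; install it first (for example with Homebrew: brew install wget)
wget https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF/resolve/main/nomic-embed-text-v1.5.Q8_0.gguf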

# Download Qwen text generation model
wget -O qwen2.5-3b-instruct-q4_k_m.gguf \
  https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GGUF/resolve/main/qwen2.5-3b-instruct-q4_k_m.gguf</code></pre><h3>Step 3: Create Test Scripts</h3><p><strong>Text Generation</strong> (<code>langchain_qwen_generation.py</code>):</p><pre><code>from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="qwen2.5-3b-instruct-q4_k_m.gguf",
    n_ctx=2048,
    temperature=0.7,
    max_tokens=100,
    verbose=False
)

response = llm.invoke("What is the capital of France?")
print(response)</code></pre><p><strong>Embedding Generation</strong> (<code>langchain_nomic_embeddings.py</code>):</p><pre><code>from langchain_community.embeddings import LlamaCppEmbeddings
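
# Optional helper (an addition, not part of the original script): cosine
# similarity between two embedding vectors, using only the standard library.
# The name cosine_similarity is illustrative. Once `query` exists further down,
# one possible check is to embed a passage with the "search_document: " prefix
# and compare the two vectors, for example:
#   doc = embeddings.embed_documents(["search_document: ..."])[0]
#   print(cosine_similarity(query, doc))
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Return 0.0 for a zero-length vector instead of dividing by zero.
    return dot / norm if norm else 0.0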

embeddings = LlamaCppEmbeddings(
    model_path="nomic-embed-text-v1.5.Q8_0.gguf",
    n_ctx=512,
    verbose=False
)

query = embeddings.embed_query("search_query: What is AI?")
print(f"Dimensions: {len(query)}")
print(f"First 5: {query[:5]}")</code></pre><h3>Step 4: Run Test Scripts</h3><pre><code># Test text generation
uv run langchain_qwen_generation.py

# Test embeddings
uv run langchain_nomic_embeddings.py</code></pre><h2>What I Learned</h2><p>For now, use Python 3.12, not 3.13, on macOS. The installation command includes Metal support for Apple Silicon. Both test scripts verify that everything works.</p><p>This setup lets you run language models locally without cloud costs. Good for prototyping and development.</p>]]></content:encoded></item><item><title><![CDATA[Deploying Agentic RAG to Production – Part 2 ]]></title><description><![CDATA[Building the Search API and Unified Gateway to Power End-to-End Agentic Workflows]]></description><link>https://aiarchitectplaybook.com/p/deploying-agentic-rag-to-production</link><guid isPermaLink="false">https://aiarchitectplaybook.com/p/deploying-agentic-rag-to-production</guid><dc:creator><![CDATA[Shaikh Quader]]></dc:creator><pubDate>Mon, 04 Aug 2025 14:03:45 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!JxUw!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe3426cdf-38a8-42c7-88f0-f09d389aec58_1300x1300.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I&#8217;ve just published the second installment of my <strong>Deploying Agentic RAG to Production</strong> tutorial series.</p><p>This part focuses on building the <strong>Search API</strong> for Agentic RAG using FastAPI, designed to orchestrate an AI agent powered by LangGraph. The agent coordinates multiple tools to retrieve the best possible answers, forming the core of an agentic RAG pipeline. 
I also unified this Search API with my previous Ingestion API under a single FastAPI gateway - creating a clean, production-ready interface.</p><p>You&#8217;ll find all the code and setup instructions in the GitHub repo:<br>&#128279; <a href="https://github.com/shaikhq/agentic-rag-db2">https://github.com/shaikhq/agentic-rag-db2</a></p><p>To go with the 2nd installment of the tutorial series, I&#8217;ve recorded two short videos:</p><ol><li><p><strong>Implementation Walkthrough</strong> &#8211; A guided tour of the repo and how the APIs are structured</p></li><li><p><strong>Live Demo</strong> &#8211; A hands-on demonstration of calling the APIs using simple <code>curl</code> commands</p></li></ol><p>Both videos are included below.</p><p>This builds directly on the first part of the series, where I focused on turning the document ingestion logic into a standalone API. If you missed that, you can find it in the GitHub repo as well.</p><h3>First Video: Deploy Agentic RAG to Production, Part 2 - Build the Search API End Point - Code Walkthrough</h3><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;1f2a601a-e524-4fea-ae2f-e1abd4c8afe0&quot;,&quot;duration&quot;:null}"></div><h3>Second Video: Agentic RAG Search and Gateway APIs Demo</h3><p>In the following video, I run a demo of the search and the gateway APIs using simple <code>curl</code> commands across three scenarios:</p><ol><li><p>Ingesting new knowledge into the vector store using the ingestion API</p></li><li><p>Querying the Search API to retrieve the best possible answer through the LangGraph agent</p></li><li><p>Clearing the vector store using the cleanup API, giving the system a clean slate</p></li></ol><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;e10b1828-8aa6-4d7b-ba25-ca58ecba7f9b&quot;,&quot;duration&quot;:null}"></div><p>These APIs form the foundation for building an 
Agentic RAG application on top of the ingestion and retrieval pipeline. </p><p>If you have missed the first installment of my deploying Agentic RAG to production tutorial series, here&#8217;s the link:<br><br><strong><a href="https://open.substack.com/pub/aiarchplaybook/p/deploying-an-ai-agent-in-production?r=1jb7za&amp;utm_campaign=post&amp;utm_medium=web&amp;timestamp=2.9&amp;showWelcomeOnShare=false">Deploying an AI Agent in Production: FastAPI Data Ingestion</a></strong></p>]]></content:encoded></item><item><title><![CDATA[Deploying Agentic RAG to Production, Part 1: FastAPI Data Ingestion]]></title><description><![CDATA[From a Notebook Prototype to Production APIs]]></description><link>https://aiarchitectplaybook.com/p/deploying-an-ai-agent-in-production</link><guid isPermaLink="false">https://aiarchitectplaybook.com/p/deploying-an-ai-agent-in-production</guid><dc:creator><![CDATA[Shaikh Quader]]></dc:creator><pubDate>Sun, 27 Jul 2025 20:52:50 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/169401734/7c22e6b4a5e2d61b87f61d42afe05c17.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<h3><strong>The Problem Every AI Engineer Faces</strong></h3><p>You've built AI agent prototypes. They work in notebooks. 
But how do you deploy them in production?</p><p>I faced this exact challenge after building several AI Agent prototypes over the past few months.</p><p></p><h3><strong>The First Critical Step</strong></h3><p>Convert workflows from notebooks into APIs.</p><p>The above video shows you how I tackled this challenge with a real agentic RAG system built using the Db2 LangChain connector.</p><p></p><h3><strong>What You'll See in Action:</strong> </h3><ul><li><p>GitHub repo setup on macOS </p></li><li><p>FastAPI deployment process </p></li><li><p>Live command line testing </p></li><li><p>Notebook logic converted to production endpoint</p></li></ul><p><strong>Get the Code:</strong> <a href="https://github.com/shaikhq/agentic-rag-db2/tree/main/document-ingestion-api">Agentic RAG Document Ingestion Code + macOS setup</a></p><p></p><h3><strong>The Technical Architecture</strong></h3><p>The agentic RAG workflow has two components:</p><p><strong>Document Ingestion Pipeline</strong> </p><ul><li><p>Downloads documents from URLs</p></li><li><p>Extracts and cleans content</p></li><li><p>Vectorizes the data</p></li><li><p>Inserts vectors into a Db2 table</p></li></ul><p><strong>Agent RAG Workflow</strong> (coming next)</p><ul><li><p>Orchestrates the agent workflow using LangGraph</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!v2b_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9f003cb-954f-4cf4-9109-55787f6472da_591x838.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!v2b_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9f003cb-954f-4cf4-9109-55787f6472da_591x838.png 424w, 
https://substackcdn.com/image/fetch/$s_!v2b_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9f003cb-954f-4cf4-9109-55787f6472da_591x838.png 848w, https://substackcdn.com/image/fetch/$s_!v2b_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9f003cb-954f-4cf4-9109-55787f6472da_591x838.png 1272w, https://substackcdn.com/image/fetch/$s_!v2b_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9f003cb-954f-4cf4-9109-55787f6472da_591x838.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!v2b_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9f003cb-954f-4cf4-9109-55787f6472da_591x838.png" width="591" height="838" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a9f003cb-954f-4cf4-9109-55787f6472da_591x838.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:838,&quot;width&quot;:591,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:108741,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/169401734?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9f003cb-954f-4cf4-9109-55787f6472da_591x838.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!v2b_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9f003cb-954f-4cf4-9109-55787f6472da_591x838.png 424w, https://substackcdn.com/image/fetch/$s_!v2b_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9f003cb-954f-4cf4-9109-55787f6472da_591x838.png 848w, https://substackcdn.com/image/fetch/$s_!v2b_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9f003cb-954f-4cf4-9109-55787f6472da_591x838.png 1272w, https://substackcdn.com/image/fetch/$s_!v2b_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9f003cb-954f-4cf4-9109-55787f6472da_591x838.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h3><strong>What's Coming Next</strong></h3><p>This is the first in my "AI Agents in Production" series. Each video will build toward a complete production deployment of this example AI Agent.</p>]]></content:encoded></item><item><title><![CDATA[From LangChain to LangGraph: When Simple Chains Aren't Enough]]></title><description><![CDATA[LangGraph builds on LangChain to support smarter, more dynamic AI workflows]]></description><link>https://aiarchitectplaybook.com/p/from-langchain-to-langgraph-when</link><guid isPermaLink="false">https://aiarchitectplaybook.com/p/from-langchain-to-langgraph-when</guid><dc:creator><![CDATA[Shaikh Quader]]></dc:creator><pubDate>Mon, 21 Jul 2025 02:45:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!87j8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ecde30-301b-4ce8-ba30-c29fdc8c7fc7_1890x1810.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!87j8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ecde30-301b-4ce8-ba30-c29fdc8c7fc7_1890x1810.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!87j8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ecde30-301b-4ce8-ba30-c29fdc8c7fc7_1890x1810.png 424w, 
https://substackcdn.com/image/fetch/$s_!87j8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ecde30-301b-4ce8-ba30-c29fdc8c7fc7_1890x1810.png 848w, https://substackcdn.com/image/fetch/$s_!87j8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ecde30-301b-4ce8-ba30-c29fdc8c7fc7_1890x1810.png 1272w, https://substackcdn.com/image/fetch/$s_!87j8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ecde30-301b-4ce8-ba30-c29fdc8c7fc7_1890x1810.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!87j8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ecde30-301b-4ce8-ba30-c29fdc8c7fc7_1890x1810.png" width="1456" height="1394" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a4ecde30-301b-4ce8-ba30-c29fdc8c7fc7_1890x1810.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1394,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:753749,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/168822343?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ecde30-301b-4ce8-ba30-c29fdc8c7fc7_1890x1810.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!87j8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ecde30-301b-4ce8-ba30-c29fdc8c7fc7_1890x1810.png 424w, https://substackcdn.com/image/fetch/$s_!87j8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ecde30-301b-4ce8-ba30-c29fdc8c7fc7_1890x1810.png 848w, https://substackcdn.com/image/fetch/$s_!87j8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ecde30-301b-4ce8-ba30-c29fdc8c7fc7_1890x1810.png 1272w, https://substackcdn.com/image/fetch/$s_!87j8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4ecde30-301b-4ce8-ba30-c29fdc8c7fc7_1890x1810.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>When I began building applications with large language models, LangChain was my starting point. It offered everything I needed to assemble a working system: tools for loading and splitting documents, embedding models for vector search, retrievers to find relevant content, and prompt templates to shape responses.</p><p>Using these tools, I built a basic retrieval-augmented generation (RAG) workflow. The steps were straightforward and sequential: receive a question, embed it, search a vector store, fill the context into a prompt, and pass it to an LLM. This kind of linear pipeline&#8212;with fixed steps and no branching&#8212;matched LangChain's design. For that use case, it worked well.</p><p>But as I explored more complex workflows, LangChain began to show its limits. I wanted to evaluate the quality of the retrieved content before sending it to the LLM. If the results were weak, I wanted to revise the question or try a different search method. In some cases, I wanted to retry a step or include a human in the loop. These were no longer simple chains of tasks. They were decisions, conditions, and loops&#8212;features that don't come naturally in a linear framework.</p><p>At that point, I needed more than a toolkit. I needed a way to orchestrate logic, not just execute steps. That's when I turned to LangGraph.</p><div><hr></div><h3><strong>LangChain and LangGraph: Complementary Roles</strong></h3><p>LangChain and LangGraph serve different purposes, but they work together.</p><p>LangChain is both a tooling ecosystem and a workflow library. 
It provides integrations with language models, vector stores, embedding tools, document loaders, prompt templates, and more. It also supports basic orchestration, letting you compose linear workflows and agent loops using its Chain, Runnable, and AgentExecutor abstractions.</p><p>LangGraph builds on top of this foundation. It does not replace LangChain&#8212;it reuses and extends it. All the tools and model integrations offered by LangChain remain available. What LangGraph adds is a graph-based execution model, inspired by state machines, which enables complex control flows. You can define branches, retry conditions, state persistence, and error handling&#8212;all within a structured, declarative graph.</p><p>In LangChain alone, you can implement similar behaviors, but you must manage logic manually using Python. As your workflow grows, this can become unwieldy. LangGraph helps you organize that complexity using nodes (which represent steps) and edges (which represent conditions and transitions). This makes your workflow easier to reason about and maintain.</p><div><hr></div><h3>Linear vs. 
Adaptive Workflows</h3><p>Here's a basic example to show the difference:</p><p><strong>LangChain (linear):</strong></p><pre><code>Question &#8594; Embed &#8594; Search &#8594; Prompt &#8594; LLM &#8594; Answer</code></pre><p><strong>LangGraph (adaptive):</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!478m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83df8361-1e3b-469c-9ca9-f70e75fde73b_877x509.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!478m!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83df8361-1e3b-469c-9ca9-f70e75fde73b_877x509.png 424w, https://substackcdn.com/image/fetch/$s_!478m!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83df8361-1e3b-469c-9ca9-f70e75fde73b_877x509.png 848w, https://substackcdn.com/image/fetch/$s_!478m!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83df8361-1e3b-469c-9ca9-f70e75fde73b_877x509.png 1272w, https://substackcdn.com/image/fetch/$s_!478m!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83df8361-1e3b-469c-9ca9-f70e75fde73b_877x509.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!478m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83df8361-1e3b-469c-9ca9-f70e75fde73b_877x509.png" width="877" height="509" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83df8361-1e3b-469c-9ca9-f70e75fde73b_877x509.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:509,&quot;width&quot;:877,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:72942,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/168822343?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83df8361-1e3b-469c-9ca9-f70e75fde73b_877x509.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!478m!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83df8361-1e3b-469c-9ca9-f70e75fde73b_877x509.png 424w, https://substackcdn.com/image/fetch/$s_!478m!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83df8361-1e3b-469c-9ca9-f70e75fde73b_877x509.png 848w, https://substackcdn.com/image/fetch/$s_!478m!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83df8361-1e3b-469c-9ca9-f70e75fde73b_877x509.png 1272w, https://substackcdn.com/image/fetch/$s_!478m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83df8361-1e3b-469c-9ca9-f70e75fde73b_877x509.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In one of my recent workflows, I needed to verify whether the retrieved content was relevant. If it wasn't, I wanted to reformulate the query and try again, possibly with a different strategy. With LangGraph, I could express this flow directly within the graph. 
The conditions, retries, and alternate paths were part of the structure&#8212;not hidden inside glue code.</p><div><hr></div><h3>Choosing the Right Tool</h3><blockquote><p><strong>&#129300; Key Decision: Do you need branching, retries, or conditional logic?</strong></p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AUSH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d92c7e1-2d11-4d5c-887d-59064edd55f7_881x286.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AUSH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d92c7e1-2d11-4d5c-887d-59064edd55f7_881x286.png 424w, https://substackcdn.com/image/fetch/$s_!AUSH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d92c7e1-2d11-4d5c-887d-59064edd55f7_881x286.png 848w, https://substackcdn.com/image/fetch/$s_!AUSH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d92c7e1-2d11-4d5c-887d-59064edd55f7_881x286.png 1272w, https://substackcdn.com/image/fetch/$s_!AUSH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d92c7e1-2d11-4d5c-887d-59064edd55f7_881x286.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AUSH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d92c7e1-2d11-4d5c-887d-59064edd55f7_881x286.png" width="881" height="286" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d92c7e1-2d11-4d5c-887d-59064edd55f7_881x286.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:286,&quot;width&quot;:881,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:121747,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/168822343?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d92c7e1-2d11-4d5c-887d-59064edd55f7_881x286.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AUSH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d92c7e1-2d11-4d5c-887d-59064edd55f7_881x286.png 424w, https://substackcdn.com/image/fetch/$s_!AUSH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d92c7e1-2d11-4d5c-887d-59064edd55f7_881x286.png 848w, https://substackcdn.com/image/fetch/$s_!AUSH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d92c7e1-2d11-4d5c-887d-59064edd55f7_881x286.png 1272w, https://substackcdn.com/image/fetch/$s_!AUSH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d92c7e1-2d11-4d5c-887d-59064edd55f7_881x286.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Use <strong>LangChain&#8217;s workflow abstractions</strong> when:</p><ul><li><p>Your workflow is linear and predictable.</p></li><li><p>You don't need branching logic, retries, or persistent state.</p></li><li><p>You're building a simple prototype or want to test tools quickly.</p></li></ul><p>Use <strong>LangGraph</strong> when:</p><ul><li><p>Your workflow depends on decisions, loops, or adaptive behavior.</p></li><li><p>You need to retry steps, rewrite inputs, or fall back to alternate strategies.</p></li><li><p>You want structure, clarity, and modularity as your logic becomes more complex.</p></li></ul><p><strong>LangGraph can be used with or without LangChain</strong>. It runs standalone, and it also integrates with any LangChain component. 
When used together, LangChain gives you the components; LangGraph helps you wire them together in richer ways.</p><div><hr></div><h3>Final Thoughts</h3><p>LangChain helped me get started. It let me focus on building working pipelines quickly and gave me confidence in integrating LLMs with useful tools. But as my workflows grew more intelligent&#8212;needing logic, state, and decisions&#8212;I needed more structure. That's where LangGraph became essential.</p><p>You don't need LangGraph for every project. But when your workflow is more than just a sequence of steps&#8212;when it needs to think, adapt, and respond&#8212;LangGraph gives you the framework to do it cleanly and with control. It structures your workflow as a graph of nodes and transitions, which makes the logic more transparent.</p><p>Although <strong>LangGraph can run standalone</strong>, most developers use it alongside LangChain to take advantage of its rich ecosystem. LangGraph doesn't replace LangChain&#8212;it extends it, especially when workflows demand smarter behavior.</p>]]></content:encoded></item><item><title><![CDATA[𝐖𝐡𝐚𝐭 𝐢𝐭 𝐭𝐚𝐤𝐞𝐬 𝐭𝐨 𝐛𝐮𝐢𝐥𝐝 𝐚 𝐩𝐫𝐨𝐝𝐮𝐜𝐭𝐢𝐨𝐧-𝐠𝐫𝐚𝐝𝐞 𝐌𝐂𝐏 𝐬𝐞𝐫𝐯𝐞𝐫]]></title><description><![CDATA[My notes from exploring how to design, build, and test a production-grade MCP server using open-source Python tools.]]></description><link>https://aiarchitectplaybook.com/p/2ce</link><guid isPermaLink="false">https://aiarchitectplaybook.com/p/2ce</guid><dc:creator><![CDATA[Shaikh Quader]]></dc:creator><pubDate>Tue, 01 Jul 2025 21:51:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zm_8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa83e7cd2-f1af-44a2-b41a-69e5198dd446_1024x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I&#8217;ve been looking into how to build a full-featured MCP server&#8212;starting with the official MCP site and then digging into a few open projects 
and examples.</p><p>Two simple questions are driving this:</p><p><strong>&#10122; What are the main pieces needed to build a production-grade MCP server?</strong><br><strong>&#10123; What open-source tools can I use locally (no cloud setup) to build each piece in a simple and testable way?</strong></p><div><hr></div><h2>What I&#8217;ve figured out so far:</h2><p>Below are the <strong>main parts</strong> of the system and the <strong>tools</strong> I started exploring:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zm_8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa83e7cd2-f1af-44a2-b41a-69e5198dd446_1024x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zm_8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa83e7cd2-f1af-44a2-b41a-69e5198dd446_1024x1536.png 424w, https://substackcdn.com/image/fetch/$s_!zm_8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa83e7cd2-f1af-44a2-b41a-69e5198dd446_1024x1536.png 848w, https://substackcdn.com/image/fetch/$s_!zm_8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa83e7cd2-f1af-44a2-b41a-69e5198dd446_1024x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!zm_8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa83e7cd2-f1af-44a2-b41a-69e5198dd446_1024x1536.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!zm_8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa83e7cd2-f1af-44a2-b41a-69e5198dd446_1024x1536.png" width="1024" height="1536" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a83e7cd2-f1af-44a2-b41a-69e5198dd446_1024x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1536,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1820615,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/167306182?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa83e7cd2-f1af-44a2-b41a-69e5198dd446_1024x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zm_8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa83e7cd2-f1af-44a2-b41a-69e5198dd446_1024x1536.png 424w, https://substackcdn.com/image/fetch/$s_!zm_8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa83e7cd2-f1af-44a2-b41a-69e5198dd446_1024x1536.png 848w, https://substackcdn.com/image/fetch/$s_!zm_8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa83e7cd2-f1af-44a2-b41a-69e5198dd446_1024x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!zm_8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa83e7cd2-f1af-44a2-b41a-69e5198dd446_1024x1536.png 
1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><h3><strong>&#9312; Accept and route incoming HTTP requests</strong></h3><p>Use a lightweight ASGI web framework like <strong>&#119813;&#119834;&#119852;&#119853;&#119808;&#119823;&#119816;</strong> to expose the MCP server over <strong>Streamable HTTP</strong> or <strong>SSE</strong>. (<strong>&#119813;&#119834;&#119852;&#119853;&#119820;&#119810;&#119823;</strong> serves its HTTP transports through Starlette, the toolkit that FastAPI is also built on.)</p><div><hr></div><h3><strong>&#9313; Parse and validate the JSON request payload</strong></h3><p>Use 
<strong>&#119823;&#119858;&#119837;&#119834;&#119847;&#119853;&#119842;&#119836;</strong> (via FastMCP) to convert JSON input into clean, type-checked Python objects.</p><div><hr></div><h3><strong>&#9314; Verify the identity of the caller</strong></h3><p>Use <strong>&#119823;&#119858;&#119817;&#119830;&#119827;</strong> to decode and validate JSON Web Tokens (JWTs) that prove who the client is.</p><div><hr></div><h3><strong>&#9315; Check what the caller is allowed to do</strong></h3><p>Use <strong>&#119813;&#119834;&#119852;&#119853;&#119808;&#119823;&#119816;'&#119852; Depends</strong> system to apply role-based access control for each request.</p><div><hr></div><h3><strong>&#9316; Enforce request limits per user</strong></h3><p>Use <strong>&#119852;&#119845;&#119848;&#119856;&#119834;&#119849;&#119842;</strong> to add rate-limiting middleware so users can&#8217;t overload the server.</p><div><hr></div><h3><strong>&#9317; Validate tool input before execution</strong></h3><p>Let <strong>&#119823;&#119858;&#119837;&#119834;&#119847;&#119853;&#119842;&#119836;</strong> (automatically used by FastMCP) validate input arguments before calling the tool.</p><div><hr></div><h3><strong>&#9318; Look up the requested tool by name</strong></h3><p>Use FastMCP&#8217;s <code>@mcp.tool()</code> decorator to register callable tools.<br>(You can also use <code>@mcp.resource()</code> for shared resources and <code>@mcp.prompt()</code> for prompt templates.)</p><div><hr></div><h3><strong>&#9319; Run the tool logic</strong></h3><p>FastMCP handles calling the right tool function, passing input, and handling any exceptions.</p><div><hr></div><h3><strong>&#9320; Format the tool output into a standard response</strong></h3><p>Use <strong>&#119817;&#119826;&#119822;&#119821;-&#119825;&#119823;&#119810;</strong>, which FastMCP handles under the hood, to return either the result or a structured error.</p><div><hr></div><h3><strong>&#9321; Send the response back to the client</strong></h3><p>Use 
built-in HTTP support or <strong>Server-Sent Events (SSE)</strong> in FastAPI/FastMCP to return the response.</p><div><hr></div><p>I haven&#8217;t put everything together yet, but I&#8217;ve started.<br>I&#8217;ll keep sharing what I learn as I go&#8212;hoping to eventually have a full, working MCP server built from scratch using these tools.</p>]]></content:encoded></item><item><title><![CDATA[Getting Started with ContextForge MCP Gateway on macOS]]></title><description><![CDATA[A step-by-step guide to installing and running ContextForge MCP Gateway locally on macOS]]></description><link>https://aiarchitectplaybook.com/p/getting-started-with-contextforge</link><guid isPermaLink="false">https://aiarchitectplaybook.com/p/getting-started-with-contextforge</guid><dc:creator><![CDATA[Shaikh Quader]]></dc:creator><pubDate>Thu, 26 Jun 2025 04:19:32 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!m1fb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f4a5462-8cc8-4a56-b250-1686d1fec180_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I recently began exploring the <strong>ContextForge MCP Gateway</strong> on my Mac, following the setup instructions from the official <a href="https://github.com/IBM/mcp-context-forge">GitHub repository</a>. This post documents the steps I took to install and run the gateway locally on macOS.</p><p>This is the first in a short series of posts. In this one, I focus on setting up the MCP Gateway. 
I have not yet connected the gateway to any MCP servers&#8212;that will be covered in a future post.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!m1fb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f4a5462-8cc8-4a56-b250-1686d1fec180_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!m1fb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f4a5462-8cc8-4a56-b250-1686d1fec180_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!m1fb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f4a5462-8cc8-4a56-b250-1686d1fec180_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!m1fb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f4a5462-8cc8-4a56-b250-1686d1fec180_1536x1024.png 1272w, 
https://substackcdn.com/image/fetch/$s_!m1fb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f4a5462-8cc8-4a56-b250-1686d1fec180_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!m1fb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f4a5462-8cc8-4a56-b250-1686d1fec180_1536x1024.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7f4a5462-8cc8-4a56-b250-1686d1fec180_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2029093,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/166865510?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f4a5462-8cc8-4a56-b250-1686d1fec180_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!m1fb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f4a5462-8cc8-4a56-b250-1686d1fec180_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!m1fb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f4a5462-8cc8-4a56-b250-1686d1fec180_1536x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!m1fb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f4a5462-8cc8-4a56-b250-1686d1fec180_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!m1fb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f4a5462-8cc8-4a56-b250-1686d1fec180_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><h2>Background</h2><h3>What Is the Model Context Protocol?</h3><p>The <strong>Model Context Protocol 
(MCP)</strong> defines a standard way for AI agents to invoke external tools, APIs, and prompt templates. Instead of implementing custom integration logic for each action, an AI agent can issue a generic request to an MCP server, which exposes a registry of available capabilities. The protocol separates the agent&#8217;s reasoning from the mechanics of how tools are discovered, invoked, and secured. This design enables modular, scalable integration between agents and real-world systems.</p><h3>Why MCP Gateway?</h3><p><strong>Building an MCP server is the easier part. Integrating it cleanly with AI agents? Not so much.</strong></p><p>It starts simple&#8212;an AI agent calling one MCP server for a database, then another for search, maybe a third for file storage.</p><p>But as the number of MCP servers and tools grows, things get messy:</p><ul><li><p>Servers use different logins and authentication flows</p></li><li><p>There are too many endpoints to manage and secure</p></li><li><p>AI-generated requests lack guardrails and observability</p></li><li><p>Integration logic clutters the AI agent&#8217;s codebase</p></li></ul><p>While building MCP servers for my own projects, I started to see how the above issues could pile up quickly. That&#8217;s when I began exploring <strong>ContextForge MCP Gateway</strong>, an open-source middleware from IBM.</p><p>The MCP Gateway sits between your AI agent (MCP client) and your tool-specific MCP servers. 
It provides:</p><ul><li><p>Centralized login across servers</p></li><li><p>A single endpoint to access multiple tools</p></li><li><p>Built-in observability (logs and a web UI)</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4ffU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a8b0045-d5bb-48cd-b2c3-9776559ed39f_1536x1024.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4ffU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a8b0045-d5bb-48cd-b2c3-9776559ed39f_1536x1024.jpeg 424w, https://substackcdn.com/image/fetch/$s_!4ffU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a8b0045-d5bb-48cd-b2c3-9776559ed39f_1536x1024.jpeg 848w, https://substackcdn.com/image/fetch/$s_!4ffU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a8b0045-d5bb-48cd-b2c3-9776559ed39f_1536x1024.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!4ffU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a8b0045-d5bb-48cd-b2c3-9776559ed39f_1536x1024.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4ffU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a8b0045-d5bb-48cd-b2c3-9776559ed39f_1536x1024.jpeg" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7a8b0045-d5bb-48cd-b2c3-9776559ed39f_1536x1024.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;diagram&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="diagram" title="diagram" srcset="https://substackcdn.com/image/fetch/$s_!4ffU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a8b0045-d5bb-48cd-b2c3-9776559ed39f_1536x1024.jpeg 424w, https://substackcdn.com/image/fetch/$s_!4ffU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a8b0045-d5bb-48cd-b2c3-9776559ed39f_1536x1024.jpeg 848w, https://substackcdn.com/image/fetch/$s_!4ffU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a8b0045-d5bb-48cd-b2c3-9776559ed39f_1536x1024.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!4ffU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a8b0045-d5bb-48cd-b2c3-9776559ed39f_1536x1024.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2>macOS Setup Instructions</h2><p>The following steps describe how I set up and ran the MCP Gateway locally on macOS (M2 Apple Chip).</p><div><hr></div><h3>Prerequisites</h3><p>Before beginning, I ensured that I had the following installed:</p><ul><li><p>macOS with Python 3.12 (the MCP Gateway requires Python 3.10 or higher)</p></li></ul><div><hr></div><h3>Step 1: Install <code>uv</code></h3><p><code>uv</code> is a fast, Rust-based tool for managing Python packages and environments.</p><p>To install it, run:</p><pre><code><code>curl -LsSf https://astral.sh/uv/install.sh | sh
</code></code></pre><p>To make sure it's on your shell's PATH, add the following line to your shell configuration (e.g., <code>.zshrc</code> or <code>.bashrc</code>), or run it directly in your terminal:</p><pre><code><code>export PATH="$HOME/.local/bin:$PATH"
</code></code></pre><p>To verify the installation, run:</p><pre><code><code>which uv
uv --version
</code></code></pre><div><hr></div><h3>Step 2: Create a Project and Install the Gateway</h3><p>To create a project directory and set up a local Python environment, run:</p><pre><code><code>mkdir mcpgateway
cd mcpgateway
uv venv --python $(which python3.12)
source .venv/bin/activate
</code></code></pre><p>To install the MCP Gateway inside the environment, run:</p><pre><code><code>uv pip install mcp-contextforge-gateway
</code></code></pre><div><hr></div><h3>Step 3: Start the MCP Gateway Server</h3><p>To set the password for the default <code>admin</code> user, run:</p><pre><code><code>export BASIC_AUTH_PASSWORD=pass
</code></code></pre><p>To set the JWT secret key for signing tokens, run:</p><pre><code><code>export JWT_SECRET_KEY=my-test-key
</code></code></pre><p>To start the server, run:</p><pre><code><code>mcpgateway --host 0.0.0.0 --port 4444 &amp;
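# Optional alternative (my suggestion, reusing the same flags as above): for purely local
# testing, bind to the loopback address so the gateway is not reachable from other machines:
# mcpgateway --host 127.0.0.1 --port 4444 &amp;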
</code></code></pre><p>This command starts the server in the background, listening on all network interfaces at port 4444.</p><p>To access the admin UI, open a browser and go to:</p><pre><code><code>http://127.0.0.1:4444/admin
</code></code></pre><p>To log in, use the following credentials:</p><pre><code><code>Username: admin
Password: pass
</code></code></pre><p>This loads the administrative interface for the gateway.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M-UM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af9d92-277b-4df8-9687-803099649136_2552x1054.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!M-UM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af9d92-277b-4df8-9687-803099649136_2552x1054.png 424w, https://substackcdn.com/image/fetch/$s_!M-UM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af9d92-277b-4df8-9687-803099649136_2552x1054.png 848w, https://substackcdn.com/image/fetch/$s_!M-UM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af9d92-277b-4df8-9687-803099649136_2552x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!M-UM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af9d92-277b-4df8-9687-803099649136_2552x1054.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!M-UM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af9d92-277b-4df8-9687-803099649136_2552x1054.png" width="1456" height="601" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/25af9d92-277b-4df8-9687-803099649136_2552x1054.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:601,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:544871,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/166865510?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af9d92-277b-4df8-9687-803099649136_2552x1054.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!M-UM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af9d92-277b-4df8-9687-803099649136_2552x1054.png 424w, https://substackcdn.com/image/fetch/$s_!M-UM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af9d92-277b-4df8-9687-803099649136_2552x1054.png 848w, https://substackcdn.com/image/fetch/$s_!M-UM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af9d92-277b-4df8-9687-803099649136_2552x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!M-UM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af9d92-277b-4df8-9687-803099649136_2552x1054.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h3>Step 4: Generate a Bearer Token</h3><p>To authenticate future requests, create a bearer token:</p><pre><code><code>export MCPGATEWAY_BEARER_TOKEN=$(python3 -m mcpgateway.utils.create_jwt_token \
  --username admin --exp 10080 --secret my-test-key)
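# Optional sanity check (my addition, not part of the gateway's documented steps):
# print a warning if the variable came back empty, e.g. because the venv was not active.
test -n "$MCPGATEWAY_BEARER_TOKEN" || echo "MCPGATEWAY_BEARER_TOKEN is empty; re-run the command above"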
</code></code></pre><p>This token is valid for seven days (<code>--exp 10080</code> is 10,080 minutes) and is used to authorize calls to the gateway.</p><div><hr></div><h3>Step 5: Test the Gateway</h3><p>To confirm that the server is running and accepting authenticated requests, run the following:</p><pre><code><code>curl -s -H "Authorization: Bearer $MCPGATEWAY_BEARER_TOKEN" \
  http://127.0.0.1:4444/version | jq
</code></code></pre><p>This returned version metadata in JSON format.</p><p>This concludes the local macOS setup. The steps closely follow the instructions published in the MCP Gateway GitHub repository, with slight expansions and clarifications added for readability.</p><div><hr></div><h3>Next Steps</h3><p>Now that the MCP Gateway is set up and running locally, the next step is to start using it by registering one or more MCP servers behind it.</p><p>The official ContextForge MCP Gateway repository includes a dedicated section that shows how to do this in a local setup:</p><p><a href="https://github.com/IBM/mcp-context-forge#end-to-end-demo-register-a-local-mcp-server">End&#8209;to&#8209;end demo (register a local MCP server)</a></p><p>In a future post, I plan to follow that path and walk through how to connect MCP servers to the gateway and route tool requests through it.</p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://aiarchitectplaybook.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading AI Architect's Playbook! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[AI Agents in the Enterprise: Lessons from the Field]]></title><description><![CDATA[Insights from practitioners who&#8217;ve taken AI agents from prototype to production.]]></description><link>https://aiarchitectplaybook.com/p/ai-agents-in-the-enterprise-lessons</link><guid isPermaLink="false">https://aiarchitectplaybook.com/p/ai-agents-in-the-enterprise-lessons</guid><dc:creator><![CDATA[Shaikh Quader]]></dc:creator><pubDate>Tue, 24 Jun 2025 13:40:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/tD1aVoQhbwQ" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Today I listened to a panel discussion hosted by <strong>OpenPipe</strong> on <em>Lessons Learned from Building Enterprise AI Agents</em>. It featured three panelists with firsthand experience in building and deploying agents at scale:</p><ul><li><p><strong>Sandy Besson</strong> &#8211; Applied AI Research Engineer at IBM Research</p></li><li><p><strong>Chris Chileles</strong> &#8211; CEO at Zig</p></li><li><p><strong>Boaz Ashkenazi</strong> &#8211; CEO at Augmented AI Labs</p></li></ul><p>The discussion focused on the practical challenges of building AI agents for enterprise use&#8212;not theoretical architectures or future visions, but the realities of deployment, safety, measurement, and cost. 
Topics included everything from managing expectations and ROI, to protocol choices like <strong>MCP</strong>, to what they&#8217;d build differently now.</p><div id="youtube2-tD1aVoQhbwQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;tD1aVoQhbwQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/tD1aVoQhbwQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Below is a distilled summary of the key insights they shared.</p><h3>1. Demos Don&#8217;t Reflect Reality</h3><ul><li><p>Clients love polished demos&#8212;but want strict guardrails in production, especially for customer-facing agents.</p></li><li><p>Leaders often expect fast ROI. But value takes time, preparation, and honest risk evaluation.</p></li><li><p>Systems that work in testing often become brittle or expensive at scale.</p></li></ul><div><hr></div><h3>2. Success Requires Trade-Offs</h3><ul><li><p>You can&#8217;t optimize for speed, accuracy, and cost all at once.</p></li><li><p>One panelist pointed to the <strong>Pareto frontier</strong> as a framework for guiding teams through meaningful trade-offs.</p></li><li><p>In legal, finance, and other high-risk areas, human oversight is essential. A simple rule:<br><em>If you wouldn&#8217;t trust an intern to do it alone, don&#8217;t let the AI.</em></p></li></ul><div><hr></div><h3>3. Measuring Quality Isn&#8217;t Simple</h3><ul><li><p>Standard metrics (accuracy, F1 score) help, but don&#8217;t tell the full story.</p></li><li><p>Human feedback, logging, and telemetry are key to assessing response quality.</p></li><li><p>Business teams, not just engineers, should define what &#8220;good&#8221; means.</p></li></ul><div><hr></div><h3>4. 
Handle Data with Care</h3><ul><li><p>Sensitive data is driving many teams <strong>back to on-prem</strong> infrastructure.</p></li><li><p>Limit what the model sees and what it can output. Apply the <strong>least-access principle</strong>.</p></li><li><p>In many cases, small, focused ML models outperform general-purpose LLMs&#8212;especially when reliability and privacy matter.</p></li></ul><div><hr></div><h3>5. Design for Safety and Modularity</h3><ul><li><p>Guardrails are needed both <strong>before</strong> and <strong>after</strong> the model runs.</p></li><li><p>Don&#8217;t treat models as all-knowing. Treat them like interns, with boundaries.</p></li><li><p>Use <strong>synthetic data</strong> to model edge cases without exposing sensitive records.</p></li></ul><div><hr></div><h3>6. Stay Flexible with Tools and Protocols</h3><ul><li><p>New agent communication protocols (like <strong>MCP</strong>, <strong>A2A</strong>, <strong>ACP</strong>) are emerging, but no dominant standard exists yet.</p></li><li><p>Choose simple, open tools with strong community backing.</p></li><li><p>Avoid overbuilding. Many orchestration tools built last year are already obsolete.</p></li></ul><div><hr></div><h3>7. SaaS Isn&#8217;t Dead&#8212;But It&#8217;s Changing</h3><ul><li><p>SaaS still matters, especially for small to mid-sized teams.</p></li><li><p>But <strong>usage-based AI workloads</strong> don&#8217;t fit neatly into per-user pricing models.</p></li><li><p>In regulated industries, adoption is slower than it seems. Risk, compliance, and culture slow change.</p></li></ul><div><hr></div><h3>8. 
Build for What&#8217;s Next</h3><ul><li><p>Many custom tools built today may become unnecessary tomorrow as models improve.</p></li><li><p>Before building, ask: &#8220;Can we wait six months?&#8221;</p></li><li><p>Favor modularity and future adaptability over quick fixes or heavy infrastructure.</p></li></ul><div><hr></div><h3>Closing Thought</h3><p>The discussion remained focused on practical issues&#8212;what it takes to build and deploy AI agents responsibly in complex enterprise settings. For those working on agent frameworks, enterprise AI platforms, or real-world deployments, it offers several useful takeaways.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://aiarchitectplaybook.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading AI Architect's Playbook! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Software Is Changing (Again): My Take on Andrej Karpathy’s Talk]]></title><description><![CDATA[Why coding as we know it may be fading&#8212;and what&#8217;s replacing it.]]></description><link>https://aiarchitectplaybook.com/p/software-is-changing-again-my-take</link><guid isPermaLink="false">https://aiarchitectplaybook.com/p/software-is-changing-again-my-take</guid><dc:creator><![CDATA[Shaikh Quader]]></dc:creator><pubDate>Thu, 19 Jun 2025 13:49:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/LCEmiRjPEtQ" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>What happens when we stop writing code&#8212;and start describing what we want instead?</em></p><p>Today, I watched Andrej Karpathy&#8217;s keynote, <em>&#8220;Software is Changing (Again),&#8221;</em> at the AI Startup School hosted by <a href="https://www.ycombinator.com">Y Combinator</a>.<br><strong>Karpathy is a leading AI researcher and educator, known for his work at OpenAI, Tesla, and Stanford.</strong></p>
<p>His talk wasn&#8217;t a technical deep dive. It offered a broad, thoughtful look at how software development is shifting in the age of large language models (LLMs). For me, it helped clarify where things are heading&#8212;and what skills and habits we need to adapt.</p><p>What follows is my summary of the talk, including a few interpretations and examples I added to better understand it.</p><div id="youtube2-LCEmiRjPEtQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;LCEmiRjPEtQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/LCEmiRjPEtQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>Who Was This Talk For?</h2><p>Karpathy was speaking to startup founders, engineers, and students preparing to build AI-first tools. But his message applies to anyone involved in software development today.</p><h2>The Three Eras of Software</h2><p>Karpathy describes the evolution of software in three distinct paradigms. Knowing the strengths and tradeoffs of each helps us decide which one to apply&#8212;and when.</p><p>These paradigms didn&#8217;t replace each other overnight. Each emerged in response to new capabilities, and all three are still relevant. 
The first has shaped several decades of software history; the second has taken hold more recently as machine learning matured; and the third is unfolding now with the rise of large language models.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hRll!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc96117e-fe0a-4a0a-b71a-d7383f91f4ee_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hRll!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc96117e-fe0a-4a0a-b71a-d7383f91f4ee_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!hRll!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc96117e-fe0a-4a0a-b71a-d7383f91f4ee_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!hRll!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc96117e-fe0a-4a0a-b71a-d7383f91f4ee_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!hRll!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc96117e-fe0a-4a0a-b71a-d7383f91f4ee_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hRll!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc96117e-fe0a-4a0a-b71a-d7383f91f4ee_1536x1024.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bc96117e-fe0a-4a0a-b71a-d7383f91f4ee_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1770325,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/166320874?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc96117e-fe0a-4a0a-b71a-d7383f91f4ee_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hRll!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc96117e-fe0a-4a0a-b71a-d7383f91f4ee_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!hRll!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc96117e-fe0a-4a0a-b71a-d7383f91f4ee_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!hRll!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc96117e-fe0a-4a0a-b71a-d7383f91f4ee_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!hRll!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc96117e-fe0a-4a0a-b71a-d7383f91f4ee_1536x1024.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>Software 1.0 &#8212; Human-Written Code</h3><p>Developers write logic line by line in a programming language. The system does exactly what it's told. </p><h3>Software 2.0 &#8212; Learned Programs</h3><p>Instead of writing program logic directly, developers train models&#8212;often neural networks&#8212;on data. The model <em>becomes</em> the program.<br>Even so, this approach still requires coding: for building the model, preparing data, training, and evaluation.</p><h3>Software 3.0 &#8212; Prompt-Based Development</h3><p>Now, with LLMs, we describe what we want in natural language. The model generates the code or behavior.<br>This shifts software design away from implementation details and toward goal-oriented interaction.</p><p>Karpathy&#8217;s message is that we need fluency across all three paradigms. 
Each remains useful, depending on context.</p><h2>From Code to Prompts: What Changes (and What Doesn&#8217;t)</h2><p>While Software 3.0 offers dramatic productivity gains, Karpathy stresses the importance of discipline and control.</p><p>He recommends working with LLMs <strong>one step at a time</strong>:</p><ul><li><p>Generate a small block of code with the LLM</p></li><li><p>Review and verify it manually</p></li><li><p>Then proceed to the next step</p></li></ul><p>This loop&#8212;<strong>generate, verify, repeat</strong>&#8212;ensures that humans remain in charge. LLMs assist, but they don&#8217;t replace judgment or accountability.</p><h2>Why Real-World Integration Still Matters</h2><p>Karpathy emphasized that generating code is often the easiest part. But turning that code into a working, reliable product is much harder.</p><p>For example:</p><ul><li><p>You might prompt an LLM to generate a prototype iOS app in a few hours.</p></li><li><p>But integrating that app&#8212;sign-up flows, payment systems, backend connections&#8212;can take days or weeks.</p></li></ul><p>Prompting gets you started. Building something production-ready still takes engineering effort and care.</p><h2>Partial Autonomy: A Better Path for Agents</h2><p>Karpathy also touched on AI agents&#8212;autonomous systems built on LLMs. His advice? Don&#8217;t aim for full autonomy just yet.</p><p>Instead, build <strong>partially autonomous agents</strong>:</p><ul><li><p>Keep humans in the loop</p></li><li><p>Design systems for transparency and override</p></li><li><p>Apply guardrails in higher-risk scenarios</p></li></ul><p>The result is safer and more usable systems. 
Autonomy should be earned incrementally, not assumed by default.</p><h2>What LLMs Are&#8212;and Aren&#8217;t&#8212;Good At</h2><h3><strong>Strengths:</strong></h3><ul><li><p>Vast encyclopedic knowledge across domains</p></li><li><p>Handling large context windows in a single session</p></li></ul><h3><strong>Limitations:</strong></h3><ul><li><p>Hallucinating plausible but incorrect answers</p></li><li><p>Lacking internal truth-checking mechanisms</p></li><li><p>Forgetting everything between sessions unless augmented</p></li></ul><p>These limitations are not bugs. They&#8217;re foundational constraints&#8212;and they shape how LLMs should be used responsibly.</p><h2>Final Takeaway</h2><p>Karpathy&#8217;s message was clear:</p><blockquote><p>&#8220;Software is changing again. Developers must evolve with it.&#8221;</p></blockquote><p>That evolution means:</p><ul><li><p>Learning all three paradigms&#8212;Software 1.0, 2.0, and 3.0</p></li><li><p>Using LLMs with precision and skepticism</p></li><li><p>Staying grounded in what it takes to build real, integrated systems</p></li></ul><p>This isn&#8217;t just a tooling shift. It&#8217;s a mindset shift. And it&#8217;s already underway.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://aiarchitectplaybook.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading AI Architect's Playbook! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Video Walkthrough: Building My First Agentic RAG Use Case with LangGraph]]></title><description><![CDATA[My first Agentic RAG workflow: local embeddings, query rewriting, and an LLM that does more than just answer.]]></description><link>https://aiarchitectplaybook.com/p/video-walkthrough-building-my-first</link><guid isPermaLink="false">https://aiarchitectplaybook.com/p/video-walkthrough-building-my-first</guid><dc:creator><![CDATA[Shaikh Quader]]></dc:creator><pubDate>Sun, 15 Jun 2025 02:09:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!29fT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feddef9c7-4dac-4f38-9821-8bcbf48d6d0c_1024x1026.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Introduction</h2><p>Recently, I built my first Agentic RAG workflow using LangGraph. I shared the code and setup steps in <a href="https://github.com/shaikhq/agentic-rag-basic">this</a> GitHub repo.</p><p>Today, I'm publishing a full video walkthrough of this Agentic RAG implementation.</p>
<p>This video is intended for those who are already familiar with basic RAG workflows but haven&#8217;t yet explored Agentic RAG. My goal is to help you get started with your first Agentic RAG project.</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;4549e528-463e-4b56-b7a4-82c01a5fec9c&quot;,&quot;duration&quot;:null}"></div><h2>Agentic RAG vs. Vanilla RAG: Key Differences</h2><p>Before diving into the code, I want to explain some foundational concepts that helped me understand Agentic RAG and how it differs from vanilla RAG workflows.</p><p><strong>Vanilla RAG Workflow:</strong></p><ul><li><p>A sequence of steps.</p></li><li><p>User submits a query to an LLM app (e.g., chatbot).</p></li><li><p>App retrieves relevant chunks from a vector store.</p></li><li><p>App sends an augmented prompt (original query + retrieved context) to the LLM.</p></li><li><p>LLM returns a response.</p></li></ul><p><strong>Limitations:</strong></p><ul><li><p>The process is static.</p></li><li><p>If the original query is poorly worded, or if retrieval results are weak, there's no recovery mechanism.</p></li><li><p>The LLM is only involved at the end.</p></li></ul><p><strong>Agentic RAG Workflow:</strong></p><ul><li><p>Involves the LLM throughout the workflow.</p></li><li><p>After the initial retrieval, the LLM assesses the quality of the retrieved chunks.</p></li><li><p>If chunks are not highly relevant to the query, the LLM rewrites the query and retries retrieval.</p></li><li><p>This loop continues until relevant content is 
found.</p></li><li><p>Then, the LLM generates the final answer using the relevant chunks.</p></li></ul><p>This dynamic, decision-making process powered by the LLM is what makes the workflow &#8220;agentic.&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!29fT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feddef9c7-4dac-4f38-9821-8bcbf48d6d0c_1024x1026.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!29fT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feddef9c7-4dac-4f38-9821-8bcbf48d6d0c_1024x1026.png 424w, https://substackcdn.com/image/fetch/$s_!29fT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feddef9c7-4dac-4f38-9821-8bcbf48d6d0c_1024x1026.png 848w, https://substackcdn.com/image/fetch/$s_!29fT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feddef9c7-4dac-4f38-9821-8bcbf48d6d0c_1024x1026.png 1272w, https://substackcdn.com/image/fetch/$s_!29fT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feddef9c7-4dac-4f38-9821-8bcbf48d6d0c_1024x1026.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!29fT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feddef9c7-4dac-4f38-9821-8bcbf48d6d0c_1024x1026.png" width="1024" height="1026" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eddef9c7-4dac-4f38-9821-8bcbf48d6d0c_1024x1026.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1026,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1425708,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aiarchplaybook.substack.com/i/165971993?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feddef9c7-4dac-4f38-9821-8bcbf48d6d0c_1024x1026.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!29fT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feddef9c7-4dac-4f38-9821-8bcbf48d6d0c_1024x1026.png 424w, https://substackcdn.com/image/fetch/$s_!29fT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feddef9c7-4dac-4f38-9821-8bcbf48d6d0c_1024x1026.png 848w, https://substackcdn.com/image/fetch/$s_!29fT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feddef9c7-4dac-4f38-9821-8bcbf48d6d0c_1024x1026.png 1272w, https://substackcdn.com/image/fetch/$s_!29fT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feddef9c7-4dac-4f38-9821-8bcbf48d6d0c_1024x1026.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h2>My Implementation: Improvements and Simplifications</h2><p>I started with a LangChain tutorial on Agentic RAG with LangGraph:</p><p>Agentic RAG with LangGraph: https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_agentic_rag/</p><p>Then, I made a few key changes to the original tutorial:</p><ol><li><p><strong>Local Embeddings with Llama.cpp</strong><br>I replaced OpenAI&#8217;s embedding generation with a local embedding model using Llama.cpp and the Granite text embedding model. You can switch to any Hugging Face-compatible model in Llama.cpp format.</p></li><li><p><strong>Cleaner Web Document Retrieval</strong><br>The original tutorial used <code>WebBaseLoader</code>, but I found its output noisy (extra newlines, artifacts). I switched to a different library that gives cleaner web content. 
While I didn&#8217;t explore all <code>WebBaseLoader</code> config options, switching to another library was quicker and more effective for me.</p></li><li><p><strong>Sentence-Aware Chunking</strong><br>The original used character-based chunking, which can split in the middle of a sentence. I replaced this with sentence-based chunking for better coherence.</p></li><li><p><strong>LLM Optimization</strong><br>I reused the LLM instance throughout the workflow instead of re-instantiating it multiple times.</p></li></ol><h2>Repo and Setup Instructions</h2><p>You&#8217;ll find the complete project and the macOS setup instructions here:<br>&#128279; <strong>GitHub repo</strong>: <a href="https://github.com/shaikhq/agentic-rag-basic">https://github.com/shaikhq/agentic-rag-basic</a></p><h2>Workflow Overview</h2><p>The RAG pipeline is implemented in a notebook: <code>agent.ipynb</code>.</p><p>I use my blog post titled &#8220;<a href="https://community.ibm.com/community/user/blogs/shaikh-quader/2024/05/07/building-an-in-db-linear-regression-model-with-ibm">A Step-by-Step Guide to Building a Linear Regression Model in IBM Db2</a>&#8221; as the sample knowledge base for the RAG pipeline. 
</p><h3>Chunking</h3><ul><li><p>Chunks: 200 words max, 50-word overlap.</p></li><li><p>Rationale: Overlap helps maintain context coherence when the LLM stitches chunks together.</p></li></ul><h3>Vector Store and Embedding</h3><ul><li><p>Local Granite embedding model via Llama.cpp.</p></li><li><p>An in-memory vector store is created.</p></li><li><p>A retriever tool is built on top of this store.</p></li></ul><h3>LLM for Answer Generation</h3><ul><li><p>I used watsonx.ai&#8217;s Mistral-large model.</p></li><li><p>It can be replaced with another provider or a local model.</p></li></ul><h2>Pipeline Functions and Prompts</h2><p>I defined several key components:</p><ol><li><p><strong>Query or Respond Decision</strong><br>Determines whether to retrieve more content, rewrite the query, or end the workflow.</p></li><li><p><strong>Relevance Grading</strong><br>The prompt instructs the LLM to grade retrieved chunks as relevant or not, using a yes/no format.</p></li><li><p><strong>Query Rewriting</strong><br>Reformulates the user query to improve retrieval quality.</p></li><li><p><strong>Answer Generation</strong><br>The prompt guides the LLM to answer concisely (max 3 sentences). 
If it doesn&#8217;t know the answer, it says so.</p></li></ol><p>These are the building blocks of the Agentic RAG pipeline.</p><h2>Visualizing the Workflow</h2><p>I created a LangGraph visual representation of the workflow.</p><p>Steps in the graph:</p><ol><li><p><code>generate_query_or_respond</code></p></li><li><p><code>retrieve</code></p></li><li><p><code>rewrite_question</code></p></li><li><p><code>generate_answer</code></p></li></ol><p>Conditional logic controls transitions between these steps based on relevance grading.</p><h2>Testing the RAG pipeline: Questions and Responses</h2><p><strong>Query 1:</strong><br>&#8220;How to calculate summary statistics in Db2?&#8221;<br>&#8594; Workflow runs &#8594; Final answer generated by LLM.</p><p><strong>Query 2:</strong><br>&#8220;How to build a linear regression model with IBM Db2?&#8221;<br>&#8594; Workflow runs &#8594; Final answer generated by LLM.</p><h2>Conclusion</h2><p>That concludes my walkthrough and demo.</p><p>Thanks for reading/watching&#8212;and have fun building your own Agentic RAG pipelines using LangGraph!</p>]]></content:encoded></item></channel></rss>