<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Friendly Paper Review]]></title><description><![CDATA[One recent, relevant, and important ML paper per week, made accessible for non-ML folks]]></description><link>https://www.friendlypaperreview.com</link><image><url>https://substackcdn.com/image/fetch/$s_!2qgc!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63ef7a6f-b5eb-49f4-93cc-d6e71b9ea5ef_720x720.png</url><title>Friendly Paper Review</title><link>https://www.friendlypaperreview.com</link></image><generator>Substack</generator><lastBuildDate>Tue, 02 Jun 2026 13:59:07 GMT</lastBuildDate><atom:link href="https://www.friendlypaperreview.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Tim Dingman]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[friendlypaperreview@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[friendlypaperreview@substack.com]]></itunes:email><itunes:name><![CDATA[Tim Dingman]]></itunes:name></itunes:owner><itunes:author><![CDATA[Tim Dingman]]></itunes:author><googleplay:owner><![CDATA[friendlypaperreview@substack.com]]></googleplay:owner><googleplay:email><![CDATA[friendlypaperreview@substack.com]]></googleplay:email><googleplay:author><![CDATA[Tim Dingman]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Speculative Speculative Decoding]]></title><description><![CDATA[or, Queue Management by Any Other Name]]></description><link>https://www.friendlypaperreview.com/p/speculative-speculative-decoding</link><guid 
isPermaLink="false">https://www.friendlypaperreview.com/p/speculative-speculative-decoding</guid><dc:creator><![CDATA[Tim Dingman]]></dc:creator><pubDate>Tue, 02 Jun 2026 13:03:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!G2e4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbaef83d-5770-4371-be71-d4b882d35d56_960x1239.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://arxiv.org/abs/2603.03251" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!G2e4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbaef83d-5770-4371-be71-d4b882d35d56_960x1239.jpeg 424w, https://substackcdn.com/image/fetch/$s_!G2e4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbaef83d-5770-4371-be71-d4b882d35d56_960x1239.jpeg 848w, https://substackcdn.com/image/fetch/$s_!G2e4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbaef83d-5770-4371-be71-d4b882d35d56_960x1239.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!G2e4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbaef83d-5770-4371-be71-d4b882d35d56_960x1239.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!G2e4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbaef83d-5770-4371-be71-d4b882d35d56_960x1239.jpeg" width="728" height="939.575" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bbaef83d-5770-4371-be71-d4b882d35d56_960x1239.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:1239,&quot;width&quot;:960,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Paper&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://arxiv.org/abs/2603.03251&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Paper" title="Paper" srcset="https://substackcdn.com/image/fetch/$s_!G2e4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbaef83d-5770-4371-be71-d4b882d35d56_960x1239.jpeg 424w, https://substackcdn.com/image/fetch/$s_!G2e4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbaef83d-5770-4371-be71-d4b882d35d56_960x1239.jpeg 848w, https://substackcdn.com/image/fetch/$s_!G2e4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbaef83d-5770-4371-be71-d4b882d35d56_960x1239.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!G2e4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbaef83d-5770-4371-be71-d4b882d35d56_960x1239.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: center;"><a href="https://arxiv.org/abs/2603.03251">Paper</a>   &#183;   <a href="https://github.com/tanishqkumar/ssd">Repo</a></p><p><em>Originally presented as a live talk on May 20, 2026</em></p><h2>Background</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CqVZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4c5ce83-66ff-428d-afed-34d344cd61ee_1584x624.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CqVZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4c5ce83-66ff-428d-afed-34d344cd61ee_1584x624.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!CqVZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4c5ce83-66ff-428d-afed-34d344cd61ee_1584x624.jpeg 848w, https://substackcdn.com/image/fetch/$s_!CqVZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4c5ce83-66ff-428d-afed-34d344cd61ee_1584x624.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!CqVZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4c5ce83-66ff-428d-afed-34d344cd61ee_1584x624.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CqVZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4c5ce83-66ff-428d-afed-34d344cd61ee_1584x624.jpeg" width="728" height="286.7878787878788" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f4c5ce83-66ff-428d-afed-34d344cd61ee_1584x624.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:624,&quot;width&quot;:1584,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 3&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 3" title="Slide 3" srcset="https://substackcdn.com/image/fetch/$s_!CqVZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4c5ce83-66ff-428d-afed-34d344cd61ee_1584x624.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!CqVZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4c5ce83-66ff-428d-afed-34d344cd61ee_1584x624.jpeg 848w, https://substackcdn.com/image/fetch/$s_!CqVZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4c5ce83-66ff-428d-afed-34d344cd61ee_1584x624.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!CqVZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4c5ce83-66ff-428d-afed-34d344cd61ee_1584x624.jpeg 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" 
x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>To understand this paper, we first have to understand how a model on a GPU turns your prompt into predicted tokens.</p><p>Let&#8217;s start with what we already know from our daily use of LLMs: you pass in your prompt all at once, you wait a bit, and then you start getting back tokens one by one.</p><p>Already we can start to relate this to the diagram. The part where you pass in your prompt is called <em>prefill</em>. During prefill, you are filling up the working memory of the model, what we call the <em>KV cache</em>. The diagram here kinda breaks up &#8220;KV&#8221; and &#8220;cache&#8221;, but you can see that the prompt turns into KV vectors, and then those get cached, hence &#8220;KV cache&#8221;.</p><p>So now we have all the input in our working memory. Getting there can take a while depending on the hardware you&#8217;re using and how big your input is, but importantly, prefill happens for all input tokens <em>in parallel</em>. On my GPU at home, prefill runs at several hundred tokens per second.</p><p>So once prefill is done, the model is ready to start making predictions. And it makes those predictions one at a time, as the diagram shows: first &#8220;jumps&#8221;, then &#8220;over&#8221;, and so on. This stage is called <em>decode</em>, because we&#8217;re decoding the mathematical representations that the model works with into words that humans work with.</p><p>Because decode runs one token at a time, it is <em>much</em> slower than prefill. On my GPU, decode runs at something like 30 tokens per second.</p><p>Note that when we decode a token, it gets cached too - that&#8217;s why the dotted line at the top spans prefill <em>and</em> decode.
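</p><p>As a rough illustration of that bookkeeping, here is a toy Python sketch. It is not how a real model server works: the &#8220;model&#8221; just replays a canned continuation, and the prompt, the <code>CANNED_OUTPUT</code> list, and the tuple-shaped cache entries are all made up for illustration.</p><pre><code class="language-python"># Toy sketch only: the "model" replays a canned continuation.
# CANNED_OUTPUT and the ("kv", token) cache entries are hypothetical.
CANNED_OUTPUT = ["jumps", "over", "the", "lazy", "dog"]


def prefill(prompt_tokens):
    """Process the whole prompt in one go; every token leaves one cache entry."""
    return [("kv", token) for token in prompt_tokens]


def decode(kv_cache, prompt_length, steps):
    """Produce tokens one at a time, caching each new token as it appears."""
    output = []
    for _ in range(steps):
        # How many tokens have been decoded so far decides which one comes next.
        token = CANNED_OUTPUT[len(kv_cache) - prompt_length]
        kv_cache.append(("kv", token))
        output.append(token)
    return output


prompt = ["the", "quick", "brown", "fox"]
cache = prefill(prompt)
print(len(cache))                     # cache size right after prefill
print(decode(cache, len(prompt), 2))  # two decode steps, one token each
print(len(cache))                     # cache size after decoding
</code></pre><p>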
So the KV cache, that working memory, grows every time the model outputs a new token.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!91no!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51cdbbcb-3cf6-4aa2-9bf9-f43555dc08d1_1219x741.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!91no!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51cdbbcb-3cf6-4aa2-9bf9-f43555dc08d1_1219x741.jpeg 424w, https://substackcdn.com/image/fetch/$s_!91no!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51cdbbcb-3cf6-4aa2-9bf9-f43555dc08d1_1219x741.jpeg 848w, https://substackcdn.com/image/fetch/$s_!91no!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51cdbbcb-3cf6-4aa2-9bf9-f43555dc08d1_1219x741.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!91no!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51cdbbcb-3cf6-4aa2-9bf9-f43555dc08d1_1219x741.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!91no!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51cdbbcb-3cf6-4aa2-9bf9-f43555dc08d1_1219x741.jpeg" width="728" height="442.5332239540607" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/51cdbbcb-3cf6-4aa2-9bf9-f43555dc08d1_1219x741.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:741,&quot;width&quot;:1219,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 4&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 4" title="Slide 4" srcset="https://substackcdn.com/image/fetch/$s_!91no!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51cdbbcb-3cf6-4aa2-9bf9-f43555dc08d1_1219x741.jpeg 424w, https://substackcdn.com/image/fetch/$s_!91no!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51cdbbcb-3cf6-4aa2-9bf9-f43555dc08d1_1219x741.jpeg 848w, https://substackcdn.com/image/fetch/$s_!91no!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51cdbbcb-3cf6-4aa2-9bf9-f43555dc08d1_1219x741.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!91no!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51cdbbcb-3cf6-4aa2-9bf9-f43555dc08d1_1219x741.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Now let&#8217;s look a level deeper, at the hardware. On a GPU, you have your chip and your memory, aka your VRAM. When you start up your model server, the server reads the model from your hard drive and puts it into your VRAM, right next to your chip.</p><p>When you actually <em>use</em> your model, like by sending it a prompt and getting back a response, your model server takes one layer of your model at a time from VRAM and sends it to your chip for computation. So if I&#8217;m at the very first attention layer, it&#8217;s gonna take my input and the matrices that actually make up the first attention layer and send &#8216;em to the chip for multiplication and so on. 
Then it takes that first attention layer back from the chip, along with the KV cache created from the computation, and it&#8217;s gonna send in the first feed-forward layer for computation.</p><p>That process of loading and computing and unloading happens over and over again until you&#8217;re at the final layer, where the chip can finally produce the predicted token. Then you gotta do the whole routine over again to predict the next token.</p><p>So this shuttling of weights to and from the chip is typically what slows you down - the constraint is your <em>memory bandwidth</em>, not the speed of your chip. If you can somehow compute multiple tokens at once in decode, like you do for prefill, then you can avoid repeating that shuttling. Sure, it costs you a tiny bit of extra time on the chip to deal with more tokens, but that&#8217;s peanuts compared to the time you saved in transit.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!A-Ik!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F002c126a-7b84-4b57-b1ba-4c3558527353_1585x766.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!A-Ik!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F002c126a-7b84-4b57-b1ba-4c3558527353_1585x766.jpeg 424w, https://substackcdn.com/image/fetch/$s_!A-Ik!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F002c126a-7b84-4b57-b1ba-4c3558527353_1585x766.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!A-Ik!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F002c126a-7b84-4b57-b1ba-4c3558527353_1585x766.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!A-Ik!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F002c126a-7b84-4b57-b1ba-4c3558527353_1585x766.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!A-Ik!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F002c126a-7b84-4b57-b1ba-4c3558527353_1585x766.jpeg" width="728" height="351.8283911671924" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/002c126a-7b84-4b57-b1ba-4c3558527353_1585x766.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:766,&quot;width&quot;:1585,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 5&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 5" title="Slide 5" srcset="https://substackcdn.com/image/fetch/$s_!A-Ik!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F002c126a-7b84-4b57-b1ba-4c3558527353_1585x766.jpeg 424w, https://substackcdn.com/image/fetch/$s_!A-Ik!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F002c126a-7b84-4b57-b1ba-4c3558527353_1585x766.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!A-Ik!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F002c126a-7b84-4b57-b1ba-4c3558527353_1585x766.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!A-Ik!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F002c126a-7b84-4b57-b1ba-4c3558527353_1585x766.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The question is, how do you predict multiple tokens? 
If predicting a token requires understanding all the tokens before it, how can you predict more than one?</p><p>The answer is in the name of the technique: &#8220;speculative decoding&#8221;. Instead of making a brand new prediction every time, you <em>speculate</em> about what the next few tokens will be - you take a guess beforehand and then check.</p><p>Speculative decoding works for the same reason prefill is faster than decode: inputs get processed in parallel. As we said before, once you load the weights onto the chip, it&#8217;s quick to do one or two or three or four calculations. As long as you have a good way to guess tokens, the model can check them all in parallel.</p><p>Of course, if one of your guesses fails, every token you guessed after it was built on that wrong guess, so you&#8217;ll have to throw those away too. But if your method for guessing tokens is cheap enough, and you&#8217;re not wrong too often, it can work out.</p><p>One common method is to have a much smaller version of the model make the guesses: ten or a hundred times smaller, in fact, so it&#8217;s much faster and can also fit on the same GPU. This &#8220;draft model&#8221;, as it&#8217;s called, is not nearly as smart as the target model, the one whose output you actually want, but it&#8217;s often smart enough. After all, most tokens are not incredibly complex or subtle; language is chock full of common and supporting words, and a lot of sentences are pretty mundane, meant to support the occasional novel or surprising sentence. It&#8217;s even more true for code, which demands predictable structure in a way natural language doesn&#8217;t.</p><p>As the graphic shows, the draft model quickly produces a few tokens, which all go into the target model in parallel rather than in serial.
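</p><p>One round of that draft-then-verify loop can be sketched in a few lines of Python. This is a toy under loud assumptions: both &#8220;models&#8221; are hypothetical lookup tables rather than real LLMs, and verification is greedy, meaning a guess is accepted only if the target would have picked the same token. The names <code>draft_tokens</code> and <code>verify</code> are made up for illustration.</p><pre><code class="language-python"># Toy sketch of one draft-then-verify round with greedy decoding.
# Both lookup tables are hypothetical stand-ins for real models.
TARGET_NEXT = {"quick": "brown", "brown": "fox", "fox": "jumped",
               "hopped": "over", "on": "the"}
DRAFT_NEXT = {"quick": "brown", "brown": "fox", "fox": "hopped",
              "hopped": "on", "on": "the"}


def draft_tokens(last_token, k):
    """The cheap draft model guesses k tokens, one after another."""
    guesses = []
    for _ in range(k):
        last_token = DRAFT_NEXT[last_token]
        guesses.append(last_token)
    return guesses


def verify(last_token, guesses):
    """The target checks every guess; on a real GPU this is one parallel pass."""
    # Each position is conditioned on the token before it: the last confirmed
    # token first, then each draft guess in turn.
    contexts = [last_token] + guesses
    target_choices = [TARGET_NEXT[c] for c in contexts]  # one more than guesses
    accepted = []
    for guess, choice in zip(guesses, target_choices):
        if guess != choice:
            # First mismatch: keep the target token here and drop later guesses.
            return accepted + [choice]
        accepted.append(guess)
    # Every guess matched, so the extra target prediction comes along too.
    return accepted + [target_choices[-1]]


guesses = draft_tokens("quick", 4)
print(guesses)
print(verify("quick", guesses))
</code></pre><p>With the tables above, drafting four tokens after &#8220;quick&#8221; and verifying them keeps the first two guesses, swaps the third for the target&#8217;s own token, and drops the fourth.</p><p>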
And thinking back to the last slide about where the bottleneck is, because you now have multiple tokens ready for computation, you save all that shuttling from VRAM to the chip on the second and third and fourth tokens.</p><p>In the example here, we have indeed generated four tokens, but the third one gets rejected, and the target model&#8217;s predicted token takes its place. The fourth draft token doesn&#8217;t get checked at all, because it depends on the third draft token being correct, which it wasn&#8217;t.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BGC1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9328585e-8b54-4aee-a033-ebe0e8b40594_773x444.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BGC1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9328585e-8b54-4aee-a033-ebe0e8b40594_773x444.jpeg 424w, https://substackcdn.com/image/fetch/$s_!BGC1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9328585e-8b54-4aee-a033-ebe0e8b40594_773x444.jpeg 848w, https://substackcdn.com/image/fetch/$s_!BGC1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9328585e-8b54-4aee-a033-ebe0e8b40594_773x444.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!BGC1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9328585e-8b54-4aee-a033-ebe0e8b40594_773x444.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!BGC1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9328585e-8b54-4aee-a033-ebe0e8b40594_773x444.jpeg" width="728" height="418.1526520051746" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9328585e-8b54-4aee-a033-ebe0e8b40594_773x444.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:444,&quot;width&quot;:773,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 6&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 6" title="Slide 6" srcset="https://substackcdn.com/image/fetch/$s_!BGC1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9328585e-8b54-4aee-a033-ebe0e8b40594_773x444.jpeg 424w, https://substackcdn.com/image/fetch/$s_!BGC1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9328585e-8b54-4aee-a033-ebe0e8b40594_773x444.jpeg 848w, https://substackcdn.com/image/fetch/$s_!BGC1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9328585e-8b54-4aee-a033-ebe0e8b40594_773x444.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!BGC1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9328585e-8b54-4aee-a033-ebe0e8b40594_773x444.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button 
tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Let&#8217;s zoom in on this example a bit more. We have our four speculated tokens from the draft model, and we&#8217;re going to verify them with the target model.</p><p>Specifically, we&#8217;re going to check whether the target model gives the speculated token a probability at least as high as the draft model did. Like in our case, the target model thought &#8220;Brown&#8221; was 93% likely, and the draft model thought it was 92% likely, and since the target model is smarter, we take the increased probability as a sign that the draft model was pointing us in the right direction.
Similar story for &#8220;Fox&#8221;.</p><p>But for &#8220;Hopped&#8221;, the draft model was more confident in that token than the target model was. That&#8217;s a bad sign, and in our example the target model rejects the draft model&#8217;s choice. (Strictly speaking the rejection is a weighted coin flip: the bigger the gap, the more likely the token gets tossed. The intuition is the same.)</p><p>Incidentally, when the target model rejects the third token, it substitutes its own - in this case it&#8217;s the word &#8220;jumped&#8221;. That extra token you get from the target model when it rejects the token from the draft model is called a &#8220;bonus token&#8221;, because you get it &#8220;for free&#8221; in the process of verification. If you&#8217;re really lucky and all your speculated tokens get approved, you get a bonus token after <em>that</em>, directly from the target model. So if all four tokens had been right in this example, we would have gotten a fifth token as well, with virtually no extra effort.</p><p>Now as you might imagine, the draft model is going to be better at predicting some tokens than others. On a hard reasoning problem, the acceptance rate can be quite low, maybe like 25%. But on a more structured and straightforward task, like using a web search tool, it could be near 100%. So speculative decoding isn&#8217;t a complete across-the-board speedup, but for a lot of mundane LLM uses it&#8217;s helpful. It&#8217;s really an empirical question depending on your use cases, your hardware, what model you&#8217;re using, stuff like that. 
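</p><p>For readers who like to see things as code, here&#8217;s a minimal Python sketch of that accept-or-reject check. It assumes the standard speculative sampling rule, where a drafted token is kept with probability min(1, target probability / draft probability); a particular server may differ in the details. The function names are made up for illustration, and the 50% and 80% figures for &#8220;Hopped&#8221; are hypothetical, since only the &#8220;Brown&#8221; numbers are given above.</p><pre><code>import random

def acceptance_probability(p_target, p_draft):
    """Chance of keeping a drafted token under the standard rule."""
    return min(1.0, p_target / p_draft)

def verify(p_target, p_draft, rng):
    """Flip the weighted coin: True means the drafted token is kept."""
    return acceptance_probability(p_target, p_draft) > rng.random()

# "Brown": the target (93%) is at least as confident as the draft (92%).
brown = acceptance_probability(0.93, 0.92)

# "Hopped": hypothetical numbers where the draft is the overconfident one.
hopped = acceptance_probability(0.50, 0.80)

kept = verify(0.93, 0.92, random.Random(0))
</code></pre><p>With these made-up numbers, &#8220;Brown&#8221; is always kept, while &#8220;Hopped&#8221; survives only about 62.5% of the time.</p><p>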
Like me personally, on my hardware and for my Hermes Agent, speculative decoding 4 tokens at a time has been helpful and fits on my GPUs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mN6h!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a192c74-4ffb-437b-8d3c-83a6aaca01c0_1584x696.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mN6h!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a192c74-4ffb-437b-8d3c-83a6aaca01c0_1584x696.jpeg 424w, https://substackcdn.com/image/fetch/$s_!mN6h!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a192c74-4ffb-437b-8d3c-83a6aaca01c0_1584x696.jpeg 848w, https://substackcdn.com/image/fetch/$s_!mN6h!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a192c74-4ffb-437b-8d3c-83a6aaca01c0_1584x696.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!mN6h!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a192c74-4ffb-437b-8d3c-83a6aaca01c0_1584x696.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mN6h!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a192c74-4ffb-437b-8d3c-83a6aaca01c0_1584x696.jpeg" width="728" height="319.8787878787879" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a192c74-4ffb-437b-8d3c-83a6aaca01c0_1584x696.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:696,&quot;width&quot;:1584,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 7&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 7" title="Slide 7" srcset="https://substackcdn.com/image/fetch/$s_!mN6h!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a192c74-4ffb-437b-8d3c-83a6aaca01c0_1584x696.jpeg 424w, https://substackcdn.com/image/fetch/$s_!mN6h!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a192c74-4ffb-437b-8d3c-83a6aaca01c0_1584x696.jpeg 848w, https://substackcdn.com/image/fetch/$s_!mN6h!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a192c74-4ffb-437b-8d3c-83a6aaca01c0_1584x696.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!mN6h!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a192c74-4ffb-437b-8d3c-83a6aaca01c0_1584x696.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I should add that there are other forms of speculative decoding, or of trying to predict multiple tokens in one go anyway. We&#8217;ll briefly see one called EAGLE-3 in the paper for example.</p><p>Another example, which we&#8217;re looking at here, is literally called &#8220;multi-token prediction&#8221; or MTP. The difference with MTP is that it has to be part of a model&#8217;s training from the get-go; it&#8217;s not an external enhancement.</p><p>You can see it right there along the top, at the boxes labeled &#8220;Cross-Entropy Loss&#8221;. The loss is the single number that tells you how well your model is doing. In a normal model, your pretraining loss is based on how much probability the model assigned to the actual next token in the training data. That&#8217;s the first box along the top, it has an arrow pointing to L_Main - that&#8217;s the symbol for loss.</p><p>But here in MTP, there are multiple losses! 
As you continue along the top, you&#8217;ll see L_MTP^1 and L_MTP^2 - the losses for predicting the first and second of the extra tokens. So now your loss comes from the normal token, the first MTP token, and the second MTP token. And if you do training right and minimize the overall loss, you can get pretty good at predicting multiple tokens.</p><p>As with so many things, MTP was popularized by the DeepSeek crew (the idea comes from earlier research, but they put it in a flagship model), and all their models since V3 have had it. More recently, Gemma 4 and Qwen3.5 got it, which has been great for local LLM users.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oNcd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecddb9df-5209-423c-aeaf-78fffb676600_818x818.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oNcd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecddb9df-5209-423c-aeaf-78fffb676600_818x818.jpeg 424w, https://substackcdn.com/image/fetch/$s_!oNcd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecddb9df-5209-423c-aeaf-78fffb676600_818x818.jpeg 848w, https://substackcdn.com/image/fetch/$s_!oNcd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecddb9df-5209-423c-aeaf-78fffb676600_818x818.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!oNcd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecddb9df-5209-423c-aeaf-78fffb676600_818x818.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!oNcd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecddb9df-5209-423c-aeaf-78fffb676600_818x818.jpeg" width="728" height="728" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ecddb9df-5209-423c-aeaf-78fffb676600_818x818.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:818,&quot;width&quot;:818,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 8&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 8" title="Slide 8" srcset="https://substackcdn.com/image/fetch/$s_!oNcd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecddb9df-5209-423c-aeaf-78fffb676600_818x818.jpeg 424w, https://substackcdn.com/image/fetch/$s_!oNcd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecddb9df-5209-423c-aeaf-78fffb676600_818x818.jpeg 848w, https://substackcdn.com/image/fetch/$s_!oNcd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecddb9df-5209-423c-aeaf-78fffb676600_818x818.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!oNcd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecddb9df-5209-423c-aeaf-78fffb676600_818x818.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Lastly, one thing I want to remind folks of is what LLMs actually predict. It&#8217;s not just one token, it&#8217;s actually a <em>distribution</em> of tokens, each with its own likelihood.</p><p>Technically the model predicts a likelihood for every single token in its vocabulary, which is usually in the high tens or low hundreds of thousands. Of course nearly all of them will receive virtually 0% odds, so in practice you end up with something more like the distribution here, where a small number sum to almost 100% probability. The shape of the distribution, and how far it extends out, can vary a lot and will be important later on. 
For now though, just remember that the model usually has a few thoughts on what could or should come next.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.friendlypaperreview.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">One ML paper a week, accessible to non-ML audiences. Subscribe to FPR today</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>The Paper</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rlMZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb014dd4-5063-4cbc-ba61-4b995aaaeff5_1232x690.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rlMZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb014dd4-5063-4cbc-ba61-4b995aaaeff5_1232x690.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rlMZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb014dd4-5063-4cbc-ba61-4b995aaaeff5_1232x690.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rlMZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb014dd4-5063-4cbc-ba61-4b995aaaeff5_1232x690.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!rlMZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb014dd4-5063-4cbc-ba61-4b995aaaeff5_1232x690.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rlMZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb014dd4-5063-4cbc-ba61-4b995aaaeff5_1232x690.jpeg" width="728" height="407.72727272727275" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bb014dd4-5063-4cbc-ba61-4b995aaaeff5_1232x690.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:690,&quot;width&quot;:1232,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 10&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 10" title="Slide 10" srcset="https://substackcdn.com/image/fetch/$s_!rlMZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb014dd4-5063-4cbc-ba61-4b995aaaeff5_1232x690.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rlMZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb014dd4-5063-4cbc-ba61-4b995aaaeff5_1232x690.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rlMZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb014dd4-5063-4cbc-ba61-4b995aaaeff5_1232x690.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!rlMZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb014dd4-5063-4cbc-ba61-4b995aaaeff5_1232x690.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>So now that we&#8217;re equipped with the knowledge of speculative decoding, and other techniques like it, we can finally discuss speculative speculative decoding.</p><p>For reference, we have SD on the left. 
Our draft model speculates a few tokens in blue. Our target accepts the first one but rejects the second (which takes the third down with it), and in the process produces a bonus token, in yellow. A line of connected dots means a sequence of tokens, so we see the end result is two new tokens. Then the cycle begins again, with the next speculated sequence in red.</p><p>You&#8217;ll notice the draft and target turns happen in serial - one waits while the other works. Surely we could make use of the draft model while the target model is working, right? The draft model is fast, so it could produce lots of tokens in the time it takes the target model to verify.</p><p>There are two problems though. One is that the target model is using up all the memory bandwidth while it&#8217;s working. So at least on the same GPU, there is no good way for the draft model to work in parallel. That&#8217;s fixable with another GPU of course.</p><p>The second, bigger problem is that the draft model depends on having all the prior tokens available. If the target model isn&#8217;t done verifying, then we don&#8217;t know all the prior tokens yet! We know the prior <em>speculated</em> tokens, but of course not all the speculated tokens will be right. Even worse, the bonus token could in theory be any token in the vocabulary. What are we supposed to do?</p><p>Well as you probably guessed, there is a solution, shown on the right. Let me walk through it.</p><p>First off, you&#8217;ll see there are now two parallel paths, separated by a little dotted line down the middle. That&#8217;s showing the target model on one GPU and the draft model on another GPU. So they can work simultaneously.</p><p>Now the sequence of events is this:</p><ol><li><p>The target produces a token, shown in green</p></li><li><p>The draft model returns a few speculated tokens</p></li><li><p>The target model starts verifying. 
Relatively speaking, that&#8217;s gonna take a while</p></li><li><p>On the draft side, we start preparing contingencies, shown as this kinda branching thing with shades of red and yellow. Specifically, at each token position, we are guessing what <em>bonus</em> tokens the target will produce in case the next <em>speculated</em> token gets rejected. Even right after the first token, the green one from the target, the first speculated token could get rejected. So we have to be prepared for whatever bonus token, in yellow, the target model will produce</p></li><li><p>Since we have our bonus tokens predicted, we might as well speculate on what comes after <em>those</em>. Again, those are the chains in red. So as long as the actual bonus token is one we anticipated and speculated on, we&#8217;re ready to immediately return that new speculated sequence</p></li><li><p>In this example, the target model accepted only the first speculated token, then it produced a bonus token, and that bonus token was one of the three we anticipated. So the draft model immediately returned a new speculated sequence, in that medium shade of red. It&#8217;s a little hard to see, but in the tree thing they bolded the line showing the sequence of events</p></li></ol><p>That&#8217;s basically it! You now understand the core concept of SSD. 
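</p><p>If it helps to see the bookkeeping, here&#8217;s a toy Python sketch of steps 4 through 6. It is not the paper&#8217;s implementation: the &#8220;draft model&#8221; is a stand-in that just continues a fixed sentence, and the function names, the cache layout, and the guessed bonus tokens are all made up for illustration. It reuses the earlier &#8220;Brown&#8221;, &#8220;Fox&#8221;, &#8220;Hopped&#8221; example rather than the one on this slide.</p><pre><code>SENTENCE = "the quick brown fox jumped over the lazy dog".split()

def toy_draft(prefix, length):
    """Stand-in for the draft model: continue a fixed sentence (hypothetical)."""
    return SENTENCE[len(prefix):len(prefix) + length]

def prepare_contingencies(prefix, speculated, bonus_guesses, length):
    """While the target verifies, pre-draft a follow-up for each guessed outcome.

    bonus_guesses maps "number of speculated tokens accepted" to the
    bonus tokens we are betting on at that position.
    """
    cache = {}
    for n_accepted, guesses in bonus_guesses.items():
        for bonus in guesses:
            new_prefix = prefix + speculated[:n_accepted] + [bonus]
            cache[(n_accepted, bonus)] = toy_draft(new_prefix, length)
    return cache

def next_speculation(cache, prefix, speculated, n_accepted, bonus, length):
    """Answer from the cache on a hit; otherwise draft from scratch."""
    key = (n_accepted, bonus)
    if key in cache:
        return cache[key], True
    new_prefix = prefix + speculated[:n_accepted] + [bonus]
    return toy_draft(new_prefix, length), False

prefix = ["the", "quick"]
speculated = ["brown", "fox", "hopped", "over"]
# Hypothetical bets: different positions get different numbers of guesses.
bonus_guesses = {0: ["slow"], 1: ["dog", "cat"], 2: ["jumped", "leaped", "ran"]}
cache = prepare_contingencies(prefix, speculated, bonus_guesses, 4)

# The target accepted two tokens and produced "jumped", which we anticipated.
follow_up, hit = next_speculation(cache, prefix, speculated, 2, "jumped", 4)
</code></pre><p>In this sketch, a correct guess turns the draft model&#8217;s turn into a dictionary lookup, while a wrong guess just means drafting from scratch, like regular speculative decoding would for that round.</p><p>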
Now we look at optimization and performance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!woV7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9f5535d-4464-4b47-8a26-3806440a2bde_924x684.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!woV7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9f5535d-4464-4b47-8a26-3806440a2bde_924x684.jpeg 424w, https://substackcdn.com/image/fetch/$s_!woV7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9f5535d-4464-4b47-8a26-3806440a2bde_924x684.jpeg 848w, https://substackcdn.com/image/fetch/$s_!woV7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9f5535d-4464-4b47-8a26-3806440a2bde_924x684.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!woV7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9f5535d-4464-4b47-8a26-3806440a2bde_924x684.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!woV7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9f5535d-4464-4b47-8a26-3806440a2bde_924x684.jpeg" width="728" height="538.9090909090909" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a9f5535d-4464-4b47-8a26-3806440a2bde_924x684.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:684,&quot;width&quot;:924,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 11&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 11" title="Slide 11" srcset="https://substackcdn.com/image/fetch/$s_!woV7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9f5535d-4464-4b47-8a26-3806440a2bde_924x684.jpeg 424w, https://substackcdn.com/image/fetch/$s_!woV7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9f5535d-4464-4b47-8a26-3806440a2bde_924x684.jpeg 848w, https://substackcdn.com/image/fetch/$s_!woV7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9f5535d-4464-4b47-8a26-3806440a2bde_924x684.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!woV7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9f5535d-4464-4b47-8a26-3806440a2bde_924x684.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>So the big catch on this method is that you have to guess the bonus token, or at least guess it enough of the time that your extra effort isn&#8217;t wasted.</p><p>On its face, that might seem tough, since the bonus token can in theory be any token in the vocabulary. But of course in practice, it&#8217;s not a random choice, and the whole idea of a draft model is it knows how to make reasonable guesses at what the target model will say.</p><p>As we know, LLMs actually produce a distribution of tokens, so we have a straightforward way to guess bonus tokens from the start: pick the most likely token in the distribution as the one to speculate, and save a few others for your guesses at the bonus token, in case the speculated token was wrong.</p><p>On the surface that seems fine. But because of how verification works, it&#8217;s actually a big problem if your other guesses are overconfident, which they tend to be for draft models. 
Intuitively, since draft models are much smaller than target models, draft model distributions tend to be narrower, putting a lot of probability on the few tokens they think could be right. By contrast, the smarter target models have wider distributions, accounting for genuinely different thoughts but also things like richer vocab.</p><p>So one key to making this work is to fiddle with the raw distributions of the draft model to spread probabilities out more, take out that overconfidence and just hedge a bit. That makes your bonus token guesses a better safety net.</p><p>The other optimization they work on is how much speculating to do, and where exactly to do it. Like if you had all the time in the world, you would pick tons of bonus tokens at every position in the speculation. But in reality we only have the time it takes for the target model to verify, so the draft model needs to make the best use of it. How exactly they figure out the optimal number of bonus tokens per position is too far in the weeds for this review, but you can see in the graphic that different positions have different numbers.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IvD0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14a12b6f-555d-4dd2-87e6-e3d5a0167838_852x814.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IvD0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14a12b6f-555d-4dd2-87e6-e3d5a0167838_852x814.jpeg 424w, https://substackcdn.com/image/fetch/$s_!IvD0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14a12b6f-555d-4dd2-87e6-e3d5a0167838_852x814.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!IvD0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14a12b6f-555d-4dd2-87e6-e3d5a0167838_852x814.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!IvD0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14a12b6f-555d-4dd2-87e6-e3d5a0167838_852x814.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IvD0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14a12b6f-555d-4dd2-87e6-e3d5a0167838_852x814.jpeg" width="728" height="695.5305164319249" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/14a12b6f-555d-4dd2-87e6-e3d5a0167838_852x814.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:814,&quot;width&quot;:852,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 12&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 12" title="Slide 12" srcset="https://substackcdn.com/image/fetch/$s_!IvD0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14a12b6f-555d-4dd2-87e6-e3d5a0167838_852x814.jpeg 424w, https://substackcdn.com/image/fetch/$s_!IvD0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14a12b6f-555d-4dd2-87e6-e3d5a0167838_852x814.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!IvD0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14a12b6f-555d-4dd2-87e6-e3d5a0167838_852x814.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!IvD0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14a12b6f-555d-4dd2-87e6-e3d5a0167838_852x814.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Okay, now let&#8217;s look at performance. 
Our target model here is Llama 3.1 70B, and the draft model is Llama 3.2 1B.</p><p>Here they compare three methods of decoding, of producing tokens:</p><ol><li><p>Autoregressive, which is the baseline of just one token at a time</p></li><li><p>Speculative decoding, which runs draft and target models in serial</p></li><li><p>Speculative speculative decoding, which runs draft and target models in parallel</p></li></ol><p>As you&#8217;d expect, SSD wins out, at 4x the speed of AR, and almost twice the speed of SD when using vLLM, which is a very popular LLM server program. SGLang is a relative newcomer and apparently does better with SD than vLLM does, but still, SSD is about a third better.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!X7Vx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed2f72e5-191a-4e9f-ab5a-6384db86c343_1246x668.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!X7Vx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed2f72e5-191a-4e9f-ab5a-6384db86c343_1246x668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!X7Vx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed2f72e5-191a-4e9f-ab5a-6384db86c343_1246x668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!X7Vx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed2f72e5-191a-4e9f-ab5a-6384db86c343_1246x668.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!X7Vx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed2f72e5-191a-4e9f-ab5a-6384db86c343_1246x668.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!X7Vx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed2f72e5-191a-4e9f-ab5a-6384db86c343_1246x668.jpeg" width="728" height="390.29213483146066" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ed2f72e5-191a-4e9f-ab5a-6384db86c343_1246x668.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:668,&quot;width&quot;:1246,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 13&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 13" title="Slide 13" srcset="https://substackcdn.com/image/fetch/$s_!X7Vx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed2f72e5-191a-4e9f-ab5a-6384db86c343_1246x668.jpeg 424w, https://substackcdn.com/image/fetch/$s_!X7Vx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed2f72e5-191a-4e9f-ab5a-6384db86c343_1246x668.jpeg 848w, https://substackcdn.com/image/fetch/$s_!X7Vx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed2f72e5-191a-4e9f-ab5a-6384db86c343_1246x668.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!X7Vx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed2f72e5-191a-4e9f-ab5a-6384db86c343_1246x668.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>It&#8217;s important to note that the efficacy of speculative decoding, and of other techniques like MTP, varies by domain and difficulty.</p><p>Like a very easy programming problem is gonna be super predictable, because it has lots of structured language and the draft model&#8217;s guesses are gonna be pretty good, because the problem is easy and because 
programming is a very common use of LLMs.</p><p>By contrast, SD will probably not work that well on high-minded creative writing. The draft model is gonna have bad guesses, and there&#8217;s just not nearly as much of that stuff in the training data in the first place.</p><p>So to that end, it&#8217;s important to measure speed gains on different benchmarks. Here we have one code, two chat, and one math benchmark, respectively. As we would expect, the programming and math benchmarks show higher speed gains.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DJlW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e859763-b519-4249-8563-6eadc510e017_1584x508.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DJlW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e859763-b519-4249-8563-6eadc510e017_1584x508.jpeg 424w, https://substackcdn.com/image/fetch/$s_!DJlW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e859763-b519-4249-8563-6eadc510e017_1584x508.jpeg 848w, https://substackcdn.com/image/fetch/$s_!DJlW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e859763-b519-4249-8563-6eadc510e017_1584x508.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!DJlW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e859763-b519-4249-8563-6eadc510e017_1584x508.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!DJlW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e859763-b519-4249-8563-6eadc510e017_1584x508.jpeg" width="728" height="233.4747474747475" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2e859763-b519-4249-8563-6eadc510e017_1584x508.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:508,&quot;width&quot;:1584,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 14&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 14" title="Slide 14" srcset="https://substackcdn.com/image/fetch/$s_!DJlW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e859763-b519-4249-8563-6eadc510e017_1584x508.jpeg 424w, https://substackcdn.com/image/fetch/$s_!DJlW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e859763-b519-4249-8563-6eadc510e017_1584x508.jpeg 848w, https://substackcdn.com/image/fetch/$s_!DJlW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e859763-b519-4249-8563-6eadc510e017_1584x508.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!DJlW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2e859763-b519-4249-8563-6eadc510e017_1584x508.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>On a related note, temperature also impacts SSD 
performance. It&#8217;s the same intuition, that less predictable text means a lower hit rate for SSD. Higher temperature flattens out the probability distribution and makes guessing harder. That&#8217;s also why people conflate high temp with creativity, because it makes unlikely tokens relatively more likely, thus creating surprises.</p><h2>My Takeaways</h2><ul><li><p>There is simultaneously a compute shortage and a compute overhang</p><ul><li><p>Lots of efforts to squeeze more performance out of existing hardware</p></li></ul></li><li><p>This is a &#8220;yes and&#8221; approach</p><ul><li><p>Innovation will continue on the other constraints (e.g. memory bandwidth)</p></li></ul></li><li><p>ML infra is still in its infancy</p><ul><li><p>People have only been serving LLMs at scale for a few years</p></li><li><p>Innovations in architecture (e.g. MoE) and inference (e.g. reasoning/TTC scaling) will present different avenues for optimization</p></li></ul></li><li><p>I am waiting for AI to discover something like this</p><ul><li><p>Anyone working on RSI will hit this stuff at some point</p></li><li><p>Great test of creativity in the wild</p></li></ul></li></ul>
]]></content:encoded></item><item><title><![CDATA[Embarrassingly Simple Self-Distillation Improves Code Generation]]></title><description><![CDATA[or, How to Train Your Distribution]]></description><link>https://www.friendlypaperreview.com/p/embarrassingly-simple-self-distillation</link><guid isPermaLink="false">https://www.friendlypaperreview.com/p/embarrassingly-simple-self-distillation</guid><dc:creator><![CDATA[Tim Dingman]]></dc:creator><pubDate>Mon, 01 Jun 2026 19:19:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!NOZw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F027ba524-7698-449d-9622-36d277406153_1160x1500.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://arxiv.org/abs/2604.01193" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NOZw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F027ba524-7698-449d-9622-36d277406153_1160x1500.jpeg 424w, https://substackcdn.com/image/fetch/$s_!NOZw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F027ba524-7698-449d-9622-36d277406153_1160x1500.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!NOZw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F027ba524-7698-449d-9622-36d277406153_1160x1500.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!NOZw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F027ba524-7698-449d-9622-36d277406153_1160x1500.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NOZw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F027ba524-7698-449d-9622-36d277406153_1160x1500.jpeg" width="728" height="941.3793103448276" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/027ba524-7698-449d-9622-36d277406153_1160x1500.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:1500,&quot;width&quot;:1160,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Paper&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://arxiv.org/abs/2604.01193&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Paper" title="Paper" srcset="https://substackcdn.com/image/fetch/$s_!NOZw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F027ba524-7698-449d-9622-36d277406153_1160x1500.jpeg 424w, https://substackcdn.com/image/fetch/$s_!NOZw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F027ba524-7698-449d-9622-36d277406153_1160x1500.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!NOZw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F027ba524-7698-449d-9622-36d277406153_1160x1500.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!NOZw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F027ba524-7698-449d-9622-36d277406153_1160x1500.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p style="text-align: center;"><a href="https://arxiv.org/abs/2604.01193">Paper</a>   &#183;   <a 
href="https://github.com/apple/ml-ssd">Repo</a></p><p><em>Originally presented as a live talk on May 27, 2026</em></p><h2>Background</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!E6AO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb2bb392-9f8e-41df-929f-f7fa4cdac308_828x828.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!E6AO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb2bb392-9f8e-41df-929f-f7fa4cdac308_828x828.jpeg 424w, https://substackcdn.com/image/fetch/$s_!E6AO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb2bb392-9f8e-41df-929f-f7fa4cdac308_828x828.jpeg 848w, https://substackcdn.com/image/fetch/$s_!E6AO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb2bb392-9f8e-41df-929f-f7fa4cdac308_828x828.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!E6AO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb2bb392-9f8e-41df-929f-f7fa4cdac308_828x828.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!E6AO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb2bb392-9f8e-41df-929f-f7fa4cdac308_828x828.jpeg" width="728" height="728" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db2bb392-9f8e-41df-929f-f7fa4cdac308_828x828.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:828,&quot;width&quot;:828,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 3&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 3" title="Slide 3" srcset="https://substackcdn.com/image/fetch/$s_!E6AO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb2bb392-9f8e-41df-929f-f7fa4cdac308_828x828.jpeg 424w, https://substackcdn.com/image/fetch/$s_!E6AO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb2bb392-9f8e-41df-929f-f7fa4cdac308_828x828.jpeg 848w, https://substackcdn.com/image/fetch/$s_!E6AO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb2bb392-9f8e-41df-929f-f7fa4cdac308_828x828.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!E6AO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb2bb392-9f8e-41df-929f-f7fa4cdac308_828x828.jpeg 1456w" sizes="100vw"></picture>
</div></a></figure></div><p>So to really understand this paper, we need to know a lot about how LLMs predict the next token.</p><p>We spend a lot of time talking about the three main parts of the transformer:</p><ol><li><p>Embeddings, which turn tokens into vectors</p></li><li><p>Attention, which forms a holistic understanding of the input</p></li><li><p>Feed-forward, which processes or &#8220;thinks about&#8221; that holistic understanding</p></li></ol><p>What we typically neglect though is what happens at the end of the transformer, after all N transformer blocks. What is that last step that turns the output of the last feed-forward layer into a token? 
The way it&#8217;s shown here, it makes it seem like the last feed-forward layer just outputs tokens.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wAyM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e988d47-7007-40aa-bef8-292831a69522_884x884.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wAyM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e988d47-7007-40aa-bef8-292831a69522_884x884.jpeg 424w, https://substackcdn.com/image/fetch/$s_!wAyM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e988d47-7007-40aa-bef8-292831a69522_884x884.jpeg 848w, https://substackcdn.com/image/fetch/$s_!wAyM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e988d47-7007-40aa-bef8-292831a69522_884x884.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!wAyM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e988d47-7007-40aa-bef8-292831a69522_884x884.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wAyM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e988d47-7007-40aa-bef8-292831a69522_884x884.jpeg" width="728" height="728" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9e988d47-7007-40aa-bef8-292831a69522_884x884.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:884,&quot;width&quot;:884,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 4&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 4" title="Slide 4" srcset="https://substackcdn.com/image/fetch/$s_!wAyM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e988d47-7007-40aa-bef8-292831a69522_884x884.jpeg 424w, https://substackcdn.com/image/fetch/$s_!wAyM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e988d47-7007-40aa-bef8-292831a69522_884x884.jpeg 848w, https://substackcdn.com/image/fetch/$s_!wAyM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e988d47-7007-40aa-bef8-292831a69522_884x884.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!wAyM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e988d47-7007-40aa-bef8-292831a69522_884x884.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>In fact, that&#8217;s not the case. After all your transformer blocks, you need some way to turn that final transformer output into whatever it is your model is supposed to output. In the case of an LLM, that&#8217;s going to be language.</p><p>So we attach and train a language modeling head, or LM head for short. The end result of the LM head is not a <em>single</em> token, but a probability for <em>all</em> tokens in the model&#8217;s vocabulary, which usually contains around 100k tokens.</p><p>Now the vast majority of them are going to be zero or near-zero probability. But depending on the context, you could have kind of a long tail of possible outcomes. Of course in other cases it&#8217;s going to be pretty certain, like if the sentence says &#8220;The capital of France is&#8221;, the token &#8220;Paris&#8221; is gonna get 99.9% probability or something.</p><p>So that probability distribution is what the LLM produces. 
But how do we get from a probability distribution to a single, selected token?</p><p>There are some simple ways you could come up with, like taking the most likely one every time, or picking randomly based on their probabilities. And people do both.</p><p>But there&#8217;s a lot more to it than that. And how exactly you <em>sample</em> from this distribution can be a surprisingly big deal. So we need to take a closer look at the mechanics.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fUxq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53d0b3f1-107f-47c8-b4a5-f2c10e00638a_1584x590.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fUxq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53d0b3f1-107f-47c8-b4a5-f2c10e00638a_1584x590.jpeg 424w, https://substackcdn.com/image/fetch/$s_!fUxq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53d0b3f1-107f-47c8-b4a5-f2c10e00638a_1584x590.jpeg 848w, https://substackcdn.com/image/fetch/$s_!fUxq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53d0b3f1-107f-47c8-b4a5-f2c10e00638a_1584x590.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!fUxq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53d0b3f1-107f-47c8-b4a5-f2c10e00638a_1584x590.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!fUxq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53d0b3f1-107f-47c8-b4a5-f2c10e00638a_1584x590.jpeg" width="728" height="271.16161616161617" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/53d0b3f1-107f-47c8-b4a5-f2c10e00638a_1584x590.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:590,&quot;width&quot;:1584,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 5&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 5" title="Slide 5" srcset="https://substackcdn.com/image/fetch/$s_!fUxq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53d0b3f1-107f-47c8-b4a5-f2c10e00638a_1584x590.jpeg 424w, https://substackcdn.com/image/fetch/$s_!fUxq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53d0b3f1-107f-47c8-b4a5-f2c10e00638a_1584x590.jpeg 848w, https://substackcdn.com/image/fetch/$s_!fUxq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53d0b3f1-107f-47c8-b4a5-f2c10e00638a_1584x590.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!fUxq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53d0b3f1-107f-47c8-b4a5-f2c10e00638a_1584x590.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>So broadly, the knobs to twiddle for sampling from your LLM&#8217;s probability distribution are called &#8220;inference parameters&#8221;. Some other settings are inference parameters too, like the maximum number of tokens per response, but a lot of them are about shaping or sampling from the distribution at each position, each time you need a new token.</p><p>The most familiar one is temperature. If you&#8217;ve heard of it, you probably heard what it <em>impacts</em>, like creativity or predictability. That&#8217;s true, but it will serve us better to understand the math directly.</p><p>Temperature is a way to fiddle with the distribution <em>before</em> you sample it. 
When temperature is 1, you get the unaltered distribution, shown in the middle here. 1 is actually considered a high temperature, which might be surprising given the mathematical impact is to do nothing - like you&#8217;d naively expect that to be the baseline or neutral temperature.</p><p>The lower the temperature goes, the more uneven things get. So the most likely token gets more likely, and everything else gets less likely, sometimes dramatically so. In the extreme case, T = 0, the most likely token from the original distribution is the only token in the new distribution, so you always pick it. A lot of evals run at T = 0 because it makes results reproducible. Like if you always pick the most likely token, then given the same input you should always get the same sequence of tokens as outputs.</p><p>On the flip side, raising T past 1 flattens things out. In the extreme case where T approaches infinity, all tokens are equally likely. Practically speaking you of course would not want that, but even a temp like 1.5 shown here is considered pretty high.</p><p>You can see where the conflation of temperature and creativity comes from - if creativity means making unusual or unexpected choices, raising the temperature means choices that started unlikely are now relatively more likely.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vh1m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367788e4-d3f7-4e44-8309-1394a5d94b5c_984x590.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vh1m!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367788e4-d3f7-4e44-8309-1394a5d94b5c_984x590.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!vh1m!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367788e4-d3f7-4e44-8309-1394a5d94b5c_984x590.jpeg 848w, https://substackcdn.com/image/fetch/$s_!vh1m!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367788e4-d3f7-4e44-8309-1394a5d94b5c_984x590.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!vh1m!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367788e4-d3f7-4e44-8309-1394a5d94b5c_984x590.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vh1m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367788e4-d3f7-4e44-8309-1394a5d94b5c_984x590.jpeg" width="728" height="436.5040650406504" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/367788e4-d3f7-4e44-8309-1394a5d94b5c_984x590.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:590,&quot;width&quot;:984,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 6&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 6" title="Slide 6" srcset="https://substackcdn.com/image/fetch/$s_!vh1m!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367788e4-d3f7-4e44-8309-1394a5d94b5c_984x590.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!vh1m!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367788e4-d3f7-4e44-8309-1394a5d94b5c_984x590.jpeg 848w, https://substackcdn.com/image/fetch/$s_!vh1m!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367788e4-d3f7-4e44-8309-1394a5d94b5c_984x590.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!vh1m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367788e4-d3f7-4e44-8309-1394a5d94b5c_984x590.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>So once you have your temperature set, there are a few other ways you can alter your distribution.</p><p>One of the most common ways is to keep just the top few token choices, drop the rest, and redistribute that dropped probability amongst the token choices you kept, in proportion to the probability each one already had. That&#8217;s called &#8220;top-k&#8221;. In this example, k=2, so we kept the top two and dropped the rest and gave their probability mass to the two we kept.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PflP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01b17e5b-e4fa-43b0-b934-8efdbe80beea_984x590.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PflP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01b17e5b-e4fa-43b0-b934-8efdbe80beea_984x590.jpeg 424w, https://substackcdn.com/image/fetch/$s_!PflP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01b17e5b-e4fa-43b0-b934-8efdbe80beea_984x590.jpeg 848w, https://substackcdn.com/image/fetch/$s_!PflP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01b17e5b-e4fa-43b0-b934-8efdbe80beea_984x590.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!PflP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01b17e5b-e4fa-43b0-b934-8efdbe80beea_984x590.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!PflP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01b17e5b-e4fa-43b0-b934-8efdbe80beea_984x590.jpeg" width="728" height="436.5040650406504" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/01b17e5b-e4fa-43b0-b934-8efdbe80beea_984x590.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:590,&quot;width&quot;:984,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 7&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 7" title="Slide 7" srcset="https://substackcdn.com/image/fetch/$s_!PflP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01b17e5b-e4fa-43b0-b934-8efdbe80beea_984x590.jpeg 424w, https://substackcdn.com/image/fetch/$s_!PflP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01b17e5b-e4fa-43b0-b934-8efdbe80beea_984x590.jpeg 848w, https://substackcdn.com/image/fetch/$s_!PflP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01b17e5b-e4fa-43b0-b934-8efdbe80beea_984x590.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!PflP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F01b17e5b-e4fa-43b0-b934-8efdbe80beea_984x590.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Another common choice is top-p, where you set a cumulative probability and drop everything below that. So let&#8217;s say I picked p=0.95, meaning I want to keep adding tokens to my retained set until I cross 95% total probability. As soon as I cross that threshold, I drop the remaining tokens and again redistribute the probability mass.</p><p>You can do both by the way. Like you take your original distribution, do top-k so you only keep the top 10 or whatever, redistribute probability mass, and then do top-p. 
Depending on the shape of your distribution, doing one, the other, or both can make a big difference.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HQiI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e6c183b-9ed4-4fab-97fa-460be72919d3_1565x884.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HQiI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e6c183b-9ed4-4fab-97fa-460be72919d3_1565x884.jpeg 424w, https://substackcdn.com/image/fetch/$s_!HQiI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e6c183b-9ed4-4fab-97fa-460be72919d3_1565x884.jpeg 848w, https://substackcdn.com/image/fetch/$s_!HQiI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e6c183b-9ed4-4fab-97fa-460be72919d3_1565x884.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!HQiI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e6c183b-9ed4-4fab-97fa-460be72919d3_1565x884.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HQiI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e6c183b-9ed4-4fab-97fa-460be72919d3_1565x884.jpeg" width="728" height="411.2153354632588" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e6c183b-9ed4-4fab-97fa-460be72919d3_1565x884.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:884,&quot;width&quot;:1565,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 8&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 8" title="Slide 8" srcset="https://substackcdn.com/image/fetch/$s_!HQiI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e6c183b-9ed4-4fab-97fa-460be72919d3_1565x884.jpeg 424w, https://substackcdn.com/image/fetch/$s_!HQiI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e6c183b-9ed4-4fab-97fa-460be72919d3_1565x884.jpeg 848w, https://substackcdn.com/image/fetch/$s_!HQiI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e6c183b-9ed4-4fab-97fa-460be72919d3_1565x884.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!HQiI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e6c183b-9ed4-4fab-97fa-460be72919d3_1565x884.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Now once you have your final distribution, after your rebalancing with temperature and truncating with top-k and top-p, you may want to characterize it, to describe it in a few metrics.</p><p>You&#8217;re likely familiar with the basic stats properties like the mean, the median, the standard deviation and so on. Maybe you even know fancier terms like skewness or kurtosis.</p><p>But for LLMs, the key metric for these next token probability distributions is the <em>entropy</em>.</p><p>The term originally comes from physics, specifically thermodynamics, where it measures disorder or randomness. From that framing it seems bad, and if you already have some baggage about the term like I once did, you&#8217;ll have to put it down for the LLM context.</p><p>The term gains a positive valence in information theory, where Claude Shannon adapted it. Here, entropy correlates with the amount of information a message carries, relative to a certain context. 
So for example if you have a weighted coin that always lands heads, then you&#8217;re never going to be surprised when I tell you the coin came up heads. But if I have a fair coin, then you&#8217;ll always be relatively surprised, since a priori you have no reason to believe heads vs tails. So in information theory, higher entropy is better, since it means the information is more valuable.</p><p>Let&#8217;s take that information theory understanding and apply it to LLMs. If my text so far is &#8220;The coin flip landed&#8221;, most of the probability is going to be split evenly between &#8220;heads&#8221; and &#8220;tails&#8221;, with a few other small terms like &#8220;on&#8221; or &#8220;near&#8221;. If we ignore those and just give 50% odds to &#8220;heads&#8221; and &#8220;tails&#8221;, then we get .5 for p(x). If you use base 2 for the log, which is customary, you get H = 1. The unit when using base 2 is bits, so that&#8217;s 1 bit of entropy.</p><p>If we work out the math for my example in the previous charts, we see more entropy, about 1.44 bits, since it contains more possibilities, but there is a clear favorite. If all four had the same odds, H would increase to 2 bits. If we had eight options all with the same odds, H would increase to 3 bits. Entropy is rising because we&#8217;re adding information about what could realistically happen.</p><p>Conversely, if one token had 97% odds and the rest just had 1% each, H would be about 0.24 bits, way lower than before, because the distribution would barely mean anything - you&#8217;re almost guaranteed to get the same token every time.</p><p>So we can measure the entropy at each position in our response. Some positions will have low entropy, where it was really predictable and basically guaranteed what token we would get at that position. 
Other positions will have high entropy, where it seems the model had a genuine choice to make and considered many viable options.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rvgB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806c81ae-77c9-4560-962f-da53f06e5be0_1322x556.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rvgB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806c81ae-77c9-4560-962f-da53f06e5be0_1322x556.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rvgB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806c81ae-77c9-4560-962f-da53f06e5be0_1322x556.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rvgB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806c81ae-77c9-4560-962f-da53f06e5be0_1322x556.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rvgB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806c81ae-77c9-4560-962f-da53f06e5be0_1322x556.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rvgB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806c81ae-77c9-4560-962f-da53f06e5be0_1322x556.jpeg" width="728" height="306.178517397882" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/806c81ae-77c9-4560-962f-da53f06e5be0_1322x556.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:556,&quot;width&quot;:1322,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 9&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 9" title="Slide 9" srcset="https://substackcdn.com/image/fetch/$s_!rvgB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806c81ae-77c9-4560-962f-da53f06e5be0_1322x556.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rvgB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806c81ae-77c9-4560-962f-da53f06e5be0_1322x556.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rvgB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806c81ae-77c9-4560-962f-da53f06e5be0_1322x556.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rvgB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F806c81ae-77c9-4560-962f-da53f06e5be0_1322x556.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Entropy is important not only for measuring, but for training. In fact, for pretraining and for SFT, the entropy is how you score your model!</p><p>Let me break it down. As we saw before, your model produces a distribution for the next token - what <em>it</em> thinks are the realistic tokens, <em>for</em> the given context, <em>based</em> <em>on</em> all the training data it has seen so far. In the formula above, we&#8217;re going to call the model&#8217;s distribution q, and we&#8217;re going to call the current token position x.</p><p>In addition to our model&#8217;s distribution, which we can measure and look at, there is also the <em>true</em> distribution, the likelihood of every possible next token given basically all the information in the world: all language ever spoken, the complete physical state of the person speaking it, the time of day, the weather, and so on and so on. 
On here, we call that p.</p><p>Of course that true distribution doesn&#8217;t concretely exist, in the sense that it can never be found and calculated. It&#8217;s more like a Platonic ideal.</p><p>But what we can do is treat any piece of training data as a <em>sample</em> from that true distribution. Like if I write an essay, that&#8217;s some real text from some combination of factors from the true distribution. So when we train on that, the model gets one particular view of the true distribution. And if we get lots of different samples, from all sorts of people and places and circumstances, we can get more and more samples from the true distribution. We&#8217;ll never fully build it, but we can keep getting closer with more data.</p><p>So when we model the true distribution, what we&#8217;re really doing is stringing all these individual discrete samples together into one smooth curve, which hopefully is close to the true distribution.</p><p>The difference between the samples of the true distribution p and our model q is the loss. And the loss has two parts: the entropy of the true distribution p, since it is changing all the time; and the distance between p and q. Those parts together form the loss. We have no influence on p of course, so loss is never zero, but we can keep trying to shrink the gap between p and q.</p>
<h2>The Paper</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KPgf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5104aaf8-7877-49d1-b9c7-b7a3056d6a55_1600x879.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KPgf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5104aaf8-7877-49d1-b9c7-b7a3056d6a55_1600x879.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KPgf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5104aaf8-7877-49d1-b9c7-b7a3056d6a55_1600x879.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KPgf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5104aaf8-7877-49d1-b9c7-b7a3056d6a55_1600x879.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KPgf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5104aaf8-7877-49d1-b9c7-b7a3056d6a55_1600x879.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KPgf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5104aaf8-7877-49d1-b9c7-b7a3056d6a55_1600x879.jpeg" width="728" height="399.945" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5104aaf8-7877-49d1-b9c7-b7a3056d6a55_1600x879.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:879,&quot;width&quot;:1600,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 11&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 11" title="Slide 11" srcset="https://substackcdn.com/image/fetch/$s_!KPgf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5104aaf8-7877-49d1-b9c7-b7a3056d6a55_1600x879.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KPgf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5104aaf8-7877-49d1-b9c7-b7a3056d6a55_1600x879.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KPgf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5104aaf8-7877-49d1-b9c7-b7a3056d6a55_1600x879.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KPgf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5104aaf8-7877-49d1-b9c7-b7a3056d6a55_1600x879.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>This is the entire method for the paper. Hopefully some of the math looks a bit familiar, but I&#8217;ll break it down into English.</p><p>First, they take a model and use it to produce one response per prompt, with certain values for temperature, top-k, and top-p.</p><p>Then, they use those prompt-response pairs to do SFT.</p><p>Then they test the model, using potentially different values for temperature, top-k, and top-p.</p><p>That&#8217;s it! They don&#8217;t check the responses at all, there is no QA. 
It&#8217;s just raw synthetic data from the same model they train, with certain temperature and top-k and top-p.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DWUN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f6b6f28-1f9b-457c-ad39-8bb19459749c_1584x732.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DWUN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f6b6f28-1f9b-457c-ad39-8bb19459749c_1584x732.jpeg 424w, https://substackcdn.com/image/fetch/$s_!DWUN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f6b6f28-1f9b-457c-ad39-8bb19459749c_1584x732.jpeg 848w, https://substackcdn.com/image/fetch/$s_!DWUN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f6b6f28-1f9b-457c-ad39-8bb19459749c_1584x732.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!DWUN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f6b6f28-1f9b-457c-ad39-8bb19459749c_1584x732.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DWUN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f6b6f28-1f9b-457c-ad39-8bb19459749c_1584x732.jpeg" width="728" height="336.42424242424244" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8f6b6f28-1f9b-457c-ad39-8bb19459749c_1584x732.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:732,&quot;width&quot;:1584,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 12&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 12" title="Slide 12" srcset="https://substackcdn.com/image/fetch/$s_!DWUN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f6b6f28-1f9b-457c-ad39-8bb19459749c_1584x732.jpeg 424w, https://substackcdn.com/image/fetch/$s_!DWUN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f6b6f28-1f9b-457c-ad39-8bb19459749c_1584x732.jpeg 848w, https://substackcdn.com/image/fetch/$s_!DWUN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f6b6f28-1f9b-457c-ad39-8bb19459749c_1584x732.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!DWUN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f6b6f28-1f9b-457c-ad39-8bb19459749c_1584x732.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Incredibly, this works. 
We&#8217;re going to come back to this table of results once we understand more about <em>why</em> their method works, but for now I wanted to flash the results.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3YCm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4a6b90-155d-4fd8-aefa-78bd9505f56d_1584x870.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3YCm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4a6b90-155d-4fd8-aefa-78bd9505f56d_1584x870.jpeg 424w, https://substackcdn.com/image/fetch/$s_!3YCm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4a6b90-155d-4fd8-aefa-78bd9505f56d_1584x870.jpeg 848w, https://substackcdn.com/image/fetch/$s_!3YCm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4a6b90-155d-4fd8-aefa-78bd9505f56d_1584x870.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!3YCm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4a6b90-155d-4fd8-aefa-78bd9505f56d_1584x870.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3YCm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4a6b90-155d-4fd8-aefa-78bd9505f56d_1584x870.jpeg" width="728" height="399.8484848484849" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4e4a6b90-155d-4fd8-aefa-78bd9505f56d_1584x870.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:870,&quot;width&quot;:1584,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 13&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 13" title="Slide 13" srcset="https://substackcdn.com/image/fetch/$s_!3YCm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4a6b90-155d-4fd8-aefa-78bd9505f56d_1584x870.jpeg 424w, https://substackcdn.com/image/fetch/$s_!3YCm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4a6b90-155d-4fd8-aefa-78bd9505f56d_1584x870.jpeg 848w, https://substackcdn.com/image/fetch/$s_!3YCm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4a6b90-155d-4fd8-aefa-78bd9505f56d_1584x870.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!3YCm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4a6b90-155d-4fd8-aefa-78bd9505f56d_1584x870.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>So the core insight of the paper in my view is this: some parts of a response are very predictable, while others are very uncertain.</p><p>The most illustrative domain for their insight is code. A lot of code is pretty standard, pretty formulaic, because code has lots of hard-and-fast rules. Like in Python here, when you define a function, you have to write &#8220;def&#8221; and then a function name with parentheses around the arguments and then a colon. If it doesn&#8217;t look like that, you&#8217;re going to get a syntax error. There are lots of rules in natural language too, but almost none of them are hard and fast. Natural language is famously flexible: it has artistic license and can even contain mistakes like misspellings without destroying the meaning.</p><p>Of course, since code is just a way to solve problems and one problem can have many solutions, some code is quite flexible. 
As the authors show here, there are many different ways to sort a list of things, and which one to pick can be a subtle matter, or may be only a matter of taste depending on the circumstances.</p><p>The authors coin two terms for the ends of this spectrum: &#8220;fork&#8221; for the uncertain cases, and &#8220;lock&#8221; for the certain ones.</p><p>As the charts illustrate, forks and locks have different needs. At a fork, the model needs to <em>explore</em>, to have lots of roughly equal next-token probabilities so that it can learn better and so that minor changes to upstream context can tip it towards a different path easily. So there we would like a higher temperature.</p><p>By contrast, at a lock, the model needs to be sure. Locks have one right answer, and all other tokens in the distribution are distractors. At locks, we want low temperature.</p><p>The problem is that we can&#8217;t adjust temperature on the fly like this - it&#8217;s a parameter we provide once for the entire inference, for the entire response. 
In practice we end up compromising, but it&#8217;s not optimal.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0FgM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1722e8b5-2cfe-4035-85b7-171cacccd99b_1584x544.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0FgM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1722e8b5-2cfe-4035-85b7-171cacccd99b_1584x544.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0FgM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1722e8b5-2cfe-4035-85b7-171cacccd99b_1584x544.jpeg 848w, https://substackcdn.com/image/fetch/$s_!0FgM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1722e8b5-2cfe-4035-85b7-171cacccd99b_1584x544.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!0FgM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1722e8b5-2cfe-4035-85b7-171cacccd99b_1584x544.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0FgM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1722e8b5-2cfe-4035-85b7-171cacccd99b_1584x544.jpeg" width="728" height="250.02020202020202" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1722e8b5-2cfe-4035-85b7-171cacccd99b_1584x544.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:544,&quot;width&quot;:1584,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 14&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 14" title="Slide 14" srcset="https://substackcdn.com/image/fetch/$s_!0FgM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1722e8b5-2cfe-4035-85b7-171cacccd99b_1584x544.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0FgM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1722e8b5-2cfe-4035-85b7-171cacccd99b_1584x544.jpeg 848w, https://substackcdn.com/image/fetch/$s_!0FgM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1722e8b5-2cfe-4035-85b7-171cacccd99b_1584x544.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!0FgM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1722e8b5-2cfe-4035-85b7-171cacccd99b_1584x544.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>So you might think we could just make a better rule, right? Like if the distribution looks like this, then change the temperature like that, maybe ignore the tail as defined by a certain probability, things like that. But that&#8217;s not very ML of us, right? That&#8217;s not very Bitter Lesson, putting human-engineered rules on a thing that learns. Instead, what if we could teach the model to change its distributions? After all, producing the right distribution is already what we train models to do!</p><p>What if there were some way to train models to dynamically adjust temperature, top-k, and top-p, so that we get distributions more like this - flatter at forks, more peaked at locks, and always ignoring distractors? 
Or if it&#8217;s not literally learning different T, k, and p values, at least learning to change distributions as if it did learn different T, k, and p.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BPHs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd79fce81-3eef-4ae8-bfa5-fe047180bc31_1584x822.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BPHs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd79fce81-3eef-4ae8-bfa5-fe047180bc31_1584x822.jpeg 424w, https://substackcdn.com/image/fetch/$s_!BPHs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd79fce81-3eef-4ae8-bfa5-fe047180bc31_1584x822.jpeg 848w, https://substackcdn.com/image/fetch/$s_!BPHs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd79fce81-3eef-4ae8-bfa5-fe047180bc31_1584x822.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!BPHs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd79fce81-3eef-4ae8-bfa5-fe047180bc31_1584x822.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BPHs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd79fce81-3eef-4ae8-bfa5-fe047180bc31_1584x822.jpeg" width="728" height="377.7878787878788" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d79fce81-3eef-4ae8-bfa5-fe047180bc31_1584x822.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:822,&quot;width&quot;:1584,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 15&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 15" title="Slide 15" srcset="https://substackcdn.com/image/fetch/$s_!BPHs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd79fce81-3eef-4ae8-bfa5-fe047180bc31_1584x822.jpeg 424w, https://substackcdn.com/image/fetch/$s_!BPHs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd79fce81-3eef-4ae8-bfa5-fe047180bc31_1584x822.jpeg 848w, https://substackcdn.com/image/fetch/$s_!BPHs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd79fce81-3eef-4ae8-bfa5-fe047180bc31_1584x822.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!BPHs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd79fce81-3eef-4ae8-bfa5-fe047180bc31_1584x822.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In the interest of reducing variables, the researchers try out a much simpler model: a finite state machine (FSM). An FSM is not a transformer. It&#8217;s barely even a model. Very concretely, each node here is a set of sixteen numbers, each representing the probability of going in one of sixteen possible directions. They call the directions &#8220;tokens&#8221; to match LLM terminology, which is why you see &#8220;tok&#8221; and then a number on all the arrows here.</p><p>The idea here is that they can construct trajectories made only of forks and locks. For example, starting from the root, tokens 0 and 1 are both viable, so there is a genuine choice to make. 
But then after the choice, from either Fork-A or Fork-B, there is only one right choice to make - there are three locks in a row.</p><p>The way the model operates is by selecting from its &#8220;tokens&#8221; in the same way an LLM selects from the tokens in its vocabulary: take the raw probabilities, adjust with temperature and top-k and top-p, then pick based on these final probabilities. So you&#8217;re isolating just the impacts of inference parameters and training from all the other LLM stuff, and you have this fake, tightly controlled scenario.</p><p>Importantly, you can train this FSM using cross-entropy loss on sequences of tokens, the same way you would train an LLM.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VBwO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb45ea3eb-afd1-4eb4-82d0-bc00b8e91d00_1584x744.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VBwO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb45ea3eb-afd1-4eb4-82d0-bc00b8e91d00_1584x744.jpeg 424w, https://substackcdn.com/image/fetch/$s_!VBwO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb45ea3eb-afd1-4eb4-82d0-bc00b8e91d00_1584x744.jpeg 848w, https://substackcdn.com/image/fetch/$s_!VBwO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb45ea3eb-afd1-4eb4-82d0-bc00b8e91d00_1584x744.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!VBwO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb45ea3eb-afd1-4eb4-82d0-bc00b8e91d00_1584x744.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VBwO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb45ea3eb-afd1-4eb4-82d0-bc00b8e91d00_1584x744.jpeg" width="728" height="341.93939393939394" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b45ea3eb-afd1-4eb4-82d0-bc00b8e91d00_1584x744.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:744,&quot;width&quot;:1584,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 16&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 16" title="Slide 16" srcset="https://substackcdn.com/image/fetch/$s_!VBwO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb45ea3eb-afd1-4eb4-82d0-bc00b8e91d00_1584x744.jpeg 424w, https://substackcdn.com/image/fetch/$s_!VBwO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb45ea3eb-afd1-4eb4-82d0-bc00b8e91d00_1584x744.jpeg 848w, https://substackcdn.com/image/fetch/$s_!VBwO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb45ea3eb-afd1-4eb4-82d0-bc00b8e91d00_1584x744.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!VBwO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb45ea3eb-afd1-4eb4-82d0-bc00b8e91d00_1584x744.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>So what is their crazy method? What information exactly are they gonna train their toy model on?</p><p>You might <em>think</em> they can use the PASS or FAIL information. That would be perfectly reasonable, given we typically do SFT on good examples, or if you&#8217;re still in the RL mindset where we&#8217;re thinking in rewards. 
But that wouldn&#8217;t be teaching us anything new!</p><p>Instead, they&#8217;re just going to self-distill, to take a bunch of trajectories, whether PASS or FAIL, and train on them. The key variables to adjust are the <em>inference parameters</em>: temperature, top-k, and top-p.</p><p>What they show is that if you train on trajectories with the right inference parameters, regardless of the quality of the data, you can teach the model to shape its probabilities better.</p><p>This is what they actually observe, not just an illustrative example. For the lock nodes, the model learns to drop distractors and put more weight on the dominant token. For the fork node, the model learns to spread out more amongst the options left after truncation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GFnv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc842dfe9-bbe1-4429-8411-f27e41afc121_1584x544.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GFnv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc842dfe9-bbe1-4429-8411-f27e41afc121_1584x544.jpeg 424w, https://substackcdn.com/image/fetch/$s_!GFnv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc842dfe9-bbe1-4429-8411-f27e41afc121_1584x544.jpeg 848w, https://substackcdn.com/image/fetch/$s_!GFnv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc842dfe9-bbe1-4429-8411-f27e41afc121_1584x544.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!GFnv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc842dfe9-bbe1-4429-8411-f27e41afc121_1584x544.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GFnv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc842dfe9-bbe1-4429-8411-f27e41afc121_1584x544.jpeg" width="728" height="250.02020202020202" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c842dfe9-bbe1-4429-8411-f27e41afc121_1584x544.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:544,&quot;width&quot;:1584,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 17&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 17" title="Slide 17" srcset="https://substackcdn.com/image/fetch/$s_!GFnv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc842dfe9-bbe1-4429-8411-f27e41afc121_1584x544.jpeg 424w, https://substackcdn.com/image/fetch/$s_!GFnv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc842dfe9-bbe1-4429-8411-f27e41afc121_1584x544.jpeg 848w, https://substackcdn.com/image/fetch/$s_!GFnv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc842dfe9-bbe1-4429-8411-f27e41afc121_1584x544.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!GFnv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc842dfe9-bbe1-4429-8411-f27e41afc121_1584x544.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I want to give a quick intuition on <em>why</em> this should work before we get to the empirical results.</p><p>Let&#8217;s start with self-distillation at T=1, with no truncation. All we&#8217;re doing then is getting unaltered samples from the model and reinforcing those. 
But since we&#8217;re not changing the samples at all, or filtering them like we would if quality were a concern, then training on them should produce no change.</p><p>However, with SSD, we specifically <em>avoid</em> T=1, and we <em>do</em> use truncation, i.e. top-k and top-p. So now the synthetic samples <em>are</em> different from what the model normally would say, so there is something to learn from.</p><p>So really SSD is just reinforcing and sharpening instincts the model already has. If the distribution looks like a lock, truncating puts even more probability on the top token, and the imbalance is already large enough that temperature shouldn&#8217;t change much. If the distribution looks like a fork, temperature spreads out probability more, but truncation ensures you don&#8217;t explore too far down the tail.</p><p>It&#8217;s not gonna help if the model is trash in the first place. And it does bank on there being some untapped potential within the model to then tap. But on the flip side, it&#8217;s embarrassingly simple and practically free compared to producing high-quality training data.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fJhW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef8c5ffa-2b4b-4803-a293-8b797e5224e8_1584x732.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fJhW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef8c5ffa-2b4b-4803-a293-8b797e5224e8_1584x732.jpeg 424w, https://substackcdn.com/image/fetch/$s_!fJhW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef8c5ffa-2b4b-4803-a293-8b797e5224e8_1584x732.jpeg 
848w, https://substackcdn.com/image/fetch/$s_!fJhW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef8c5ffa-2b4b-4803-a293-8b797e5224e8_1584x732.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!fJhW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef8c5ffa-2b4b-4803-a293-8b797e5224e8_1584x732.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fJhW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef8c5ffa-2b4b-4803-a293-8b797e5224e8_1584x732.jpeg" width="728" height="336.42424242424244" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ef8c5ffa-2b4b-4803-a293-8b797e5224e8_1584x732.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:732,&quot;width&quot;:1584,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 18&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 18" title="Slide 18" srcset="https://substackcdn.com/image/fetch/$s_!fJhW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef8c5ffa-2b4b-4803-a293-8b797e5224e8_1584x732.jpeg 424w, https://substackcdn.com/image/fetch/$s_!fJhW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef8c5ffa-2b4b-4803-a293-8b797e5224e8_1584x732.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!fJhW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef8c5ffa-2b4b-4803-a293-8b797e5224e8_1584x732.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!fJhW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef8c5ffa-2b4b-4803-a293-8b797e5224e8_1584x732.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Now let&#8217;s take a closer look at the empirical results.</p><p>For all models they tested in basically every segment on the two 
versions of LiveCodeBench, SSD helps. If you&#8217;re not familiar with pass@1 and pass@5, the number is just how many chances the model gets to solve the problem. Pass@1 measures accuracy the way an end user would expect, like of course we want the one response we get to be correct. Pass@5 is more a reflection of model potential, like is it <em>able</em> to get the right answer.</p><p>Two trends to call out. First, within the instruct models, the smarter models benefited more. The authors don&#8217;t explain why, but I think it boils down to the whole &#8220;self-distillation&#8221; thing. Like if we think SSD&#8217;s function is to bring out the potential of the model, then smarter models likely have more potential to bring out. It&#8217;s actually pretty similar to how RLVR works, if you believe in the elicitation hypothesis, which I do. I wish they had actually tested that here by measuring pass@256 or some other really high number, to see whether SSD is doing the same sort of internal optimization. We know it can&#8217;t be learning from the data in the normal way, given how this SFT data is created and how poor the quality can be, at least by traditional measures.</p><p>Second, the models generally improved more on the harder problems. Again, the authors don&#8217;t provide a direct explanation, but if we think back to the forks vs locks thing, we&#8217;d expect harder problems to have more forks. Like on an easy problem, not everything is a lock of course, but the distribution of positions is gonna be more on the lock side. I imagine harder problems have way more forks, so if SSD helps with both forks and locks, you can see why it might help more on harder problems.</p><p>Of course this doesn&#8217;t scale indefinitely; at some point, a problem is just impossibly hard for a model, and no amount of training is gonna help. 
So that might be starting to bite on the hard problems for Qwen3-4B for example.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kuDI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c3b287-6dc7-4600-82ee-d14d3ab307cc_1257x640.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kuDI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c3b287-6dc7-4600-82ee-d14d3ab307cc_1257x640.jpeg 424w, https://substackcdn.com/image/fetch/$s_!kuDI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c3b287-6dc7-4600-82ee-d14d3ab307cc_1257x640.jpeg 848w, https://substackcdn.com/image/fetch/$s_!kuDI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c3b287-6dc7-4600-82ee-d14d3ab307cc_1257x640.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!kuDI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c3b287-6dc7-4600-82ee-d14d3ab307cc_1257x640.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kuDI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c3b287-6dc7-4600-82ee-d14d3ab307cc_1257x640.jpeg" width="728" height="370.66030230708037" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/23c3b287-6dc7-4600-82ee-d14d3ab307cc_1257x640.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:640,&quot;width&quot;:1257,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 19&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 19" title="Slide 19" srcset="https://substackcdn.com/image/fetch/$s_!kuDI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c3b287-6dc7-4600-82ee-d14d3ab307cc_1257x640.jpeg 424w, https://substackcdn.com/image/fetch/$s_!kuDI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c3b287-6dc7-4600-82ee-d14d3ab307cc_1257x640.jpeg 848w, https://substackcdn.com/image/fetch/$s_!kuDI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c3b287-6dc7-4600-82ee-d14d3ab307cc_1257x640.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!kuDI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23c3b287-6dc7-4600-82ee-d14d3ab307cc_1257x640.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>To drive home the point about how the training data isn&#8217;t operating in the conventional way, they really dial up the temperature on some of it. 
As you can see on the left, a temperature of 2 doesn&#8217;t just produce unrunnable code; it can produce complete gibberish.</p><p>And yet, if you train on this data, where 62% of outputs contain no extractable code at all, the model still improves.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QYid!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602efb1f-0339-4142-a424-16ee2f4c6edb_1584x878.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QYid!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602efb1f-0339-4142-a424-16ee2f4c6edb_1584x878.jpeg 424w, https://substackcdn.com/image/fetch/$s_!QYid!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602efb1f-0339-4142-a424-16ee2f4c6edb_1584x878.jpeg 848w, https://substackcdn.com/image/fetch/$s_!QYid!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602efb1f-0339-4142-a424-16ee2f4c6edb_1584x878.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!QYid!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602efb1f-0339-4142-a424-16ee2f4c6edb_1584x878.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QYid!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602efb1f-0339-4142-a424-16ee2f4c6edb_1584x878.jpeg" width="728" height="403.52525252525254" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/602efb1f-0339-4142-a424-16ee2f4c6edb_1584x878.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:878,&quot;width&quot;:1584,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Slide 20&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Slide 20" title="Slide 20" srcset="https://substackcdn.com/image/fetch/$s_!QYid!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602efb1f-0339-4142-a424-16ee2f4c6edb_1584x878.jpeg 424w, https://substackcdn.com/image/fetch/$s_!QYid!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602efb1f-0339-4142-a424-16ee2f4c6edb_1584x878.jpeg 848w, https://substackcdn.com/image/fetch/$s_!QYid!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602efb1f-0339-4142-a424-16ee2f4c6edb_1584x878.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!QYid!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F602efb1f-0339-4142-a424-16ee2f4c6edb_1584x878.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>To give you an idea of how much better SSD is compared to just picking the optimal temperature, they do a sweep over temperature values on different benchmarks and models. In every case, the base model at its best temperature still scores worse than the SSD-trained model.</p><p>I also think this graph is helpful for providing an idea of temperature&#8217;s impact. 0.5 to 1.4 is a pretty big range, but in most cases the difference isn&#8217;t very big, and the big differences themselves fall in unpredictable patterns. Like why is there a seven-point drop from 0.9 to 1 for the bottom-right graph? 
Very strange.</p><h2>My Takeaways</h2><ul><li><p>This is the elicitation hypothesis but with trash data</p><ul><li><p>How well it works depends on how much potential the model has left to bring out</p></li></ul></li><li><p>Surely there is a cap to this work</p></li><li><p>The result is more instructive than useful</p></li><li><p>What are we losing?</p><ul><li><p>Leans more into priors</p></li><li><p>So the model may be worse past a certain difficulty - just sharpens the boundary</p></li></ul></li></ul>]]></content:encoded></item></channel></rss>