<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Justin Donaldson, Ph.D.</title>
<link>https://www.jjd.io/</link>
<atom:link href="https://www.jjd.io/index.xml" rel="self" type="application/rss+xml"/>
<description>Writing on machine learning, model interpretability, and data visualization — by Justin Donaldson, Ph.D.</description>
<image>
<url>https://www.jjd.io/images/me.jpeg</url>
<title>Justin Donaldson, Ph.D.</title>
<link>https://www.jjd.io/</link>
</image>
<generator>quarto-1.9.38</generator>
<lastBuildDate>Sat, 27 Jun 2026 00:00:00 GMT</lastBuildDate>
<item>
  <title>A 1.7B Model That Stops Guessing</title>
  <dc:creator>Justin Donaldson</dc:creator>
  <dc:creator>Claude (Opus)</dc:creator>
  <link>https://www.jjd.io/posts/swollm-bbh-leaderboard.html</link>
  <description><![CDATA[ 






<div class="no-row-height column-margin column-container"><div class="">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.jjd.io/posts/images/swollm_mascot.png" class="img-fluid figure-img" alt="A muscular cartoon Shiba Inu (the Swole Doge meme) with the word swollm beneath it."></p>
<figcaption><strong>swollm</strong> — SmolLM2-1.7B with the Turnstyle layer bolted on.</figcaption>
</figure>
</div>
</div></div><blockquote class="blockquote">
<p>A small model doesn’t usually fail because it can’t compute the answer. It fails because it won’t stop guessing long enough to compute it.</p>
</blockquote>
<p><a href="https://github.com/suzgunmirac/BIG-Bench-Hard">BIG-Bench Hard</a> (BBH) is a curated set of tasks where, at the time it was assembled, language models did <em>worse</em> than the average human rater: multi-step arithmetic, tracking shuffled objects, Dyck-language bracket closing, date arithmetic, logical deduction over ordered constraints. It was designed to be a wall for models that lean on pattern-matching instead of procedure.</p>
<p>So a small model should faceplant on it, and <strong>SmolLM2-1.7B</strong> — a genuinely tiny open model by 2026 standards — does. Three-shot prompted, it averages <strong>30.2%</strong> across the 27-task suite. On <code>multistep_arithmetic_two</code> it scores <strong>0.4%</strong>. It is not close.</p>
<p>Here is the same model, unchanged, wrapped in a neurosymbolic layer I call <strong>Turnstyle</strong> (the wrapped model, affectionately, is “swollm”):</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.jjd.io/posts/images/bbh_leaderboard.png" class="img-fluid figure-img" alt="A horizontal bar chart of 27 BIG-Bench Hard tasks sorted by accuracy. Most tasks reach 100 percent, shown as a dark base segment for the bare model plus a colored segment for the neurosymbolic gain. Teal segments mark symbolic solvers, purple segments mark hidden-state probes. Two tasks at the bottom — causal judgement and sports understanding — show only the dark bare-model bar with no gain; movie recommendation and salient translation show a purple recognition gain over their faint baselines."></p>
<figcaption>Every BBH task. The <strong>dark</strong> segment is what bare 3-shot SmolLM2 already got. The <strong>colored</strong> segment is what the neurosymbolic layer added on top — teal where an exact symbolic solver <em>proved</em> the answer, purple where a hidden-state <em>probe recognized</em> it (including <code>movie_recommendation</code> and <code>salient_translation</code>, which look like dead baseline until you probe them — more below). The two grey bars, <code>causal_judgement</code> and <code>sports_understanding</code>, are the genuine knowledge walls.</figcaption>
</figure>
</div>
<p>The aggregate goes from <strong>30.2% to 92.5%</strong> in-sample — and to a hard-nosed <strong>~89.5%</strong> once every probe is cross-validated and forced to be order-robust (more on both below). Either way it is a roughly <strong>+60-point</strong> swing on the identical 1.7B weights. Eighteen of the twenty-seven tasks land at exactly 100%; twenty-two clear 96%. Nobody fine-tuned anything. The trick is entirely in <em>how the model is asked, and what happens to its answer before it commits to one.</em></p>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Tip
</div>
</div>
<div class="callout-body-container callout-body">
<p><strong>Try it yourself.</strong> The bare model and the wrapped model run side by side — each answer with its worked proof — in the <a href="https://huggingface.co/spaces/jdonaldson/turnstyle-demo">live Turnstyle demo on Hugging Face</a>.</p>
</div>
</div>
<section id="three-ways-to-answer" class="level2">
<h2 class="anchored" data-anchor-id="three-ways-to-answer">Three ways to answer</h2>
<p>The wrapper’s whole architecture fits in one sentence: <strong>parse the prompt into a typed task, then either prove the answer, recognize it, or admit you can’t.</strong></p>
<p><strong>Prove it (teal).</strong> A lot of BBH is secretly deterministic. <code>multistep_arithmetic_two</code> is a parenthesized integer expression — you don’t need a 1.7B transformer to “reason” about <code>((6 * -6 * 8) * (-1 * 7 * -6 + -2))</code>, you need an AST and Python. <code>dyck_languages</code> is a bracket stack. <code>tracking_shuffled_objects</code> is replaying a list of swaps. <code>web_of_lies</code> is propagating truth values down a chain. For these, Turnstyle parses the prompt into a structured form, runs an exact solver, and then <strong>biases the model’s generation toward the proven answer</strong> with a logit constraint — so the <em>model</em> still produces the text, but it can no longer wander off the correct token. These are the bars that hit 100%, and they hit it because a proof is a proof.</p>
<p><strong>Recognize it (purple).</strong> Some tasks aren’t computable from the prompt — they need a judgment the model actually holds but won’t <em>say</em>. <code>snarks</code> (which of two sentences is sarcastic) is the cleanest example: three-shot, SmolLM2 scores <strong>46% — below chance for a binary task.</strong> It has strong, confident, <em>wrong</em> opinions. But the judgment is in there: train a small linear probe on the model’s hidden state at the right layer and read the answer directly off the activation, and it goes to <strong>100% in-sample / 74% cross-validated.</strong> Same for pronoun disambiguation, temporal ordering, humorous-name edits. The model knows; generation was the bottleneck.</p>
<p><strong>Admit the wall (grey).</strong> Two tasks don’t move at all: <code>causal_judgement</code> and <code>sports_understanding</code>. These are knowledge-loaded — they turn on facts and judgments a 1.7B model trained on a modest corpus may simply not have, and a probe on its hidden state does no better than guessing the majority class. The honest move is to <em>detect that there’s no signal to extract</em> and fall back to the bare model rather than fabricate a solver that overfits 250 examples. The two grey bars are a feature: they’re where the system correctly declines to pretend.</p>
<p>That triad — <strong>⊢ proved, ⊨ recognized, or abstain</strong> — is the entire idea. The name “Turnstyle” is a pun on the logical turnstile: <code>⊢</code> for <em>syntactically derivable</em> (the symbolic solvers) and <code>⊨</code> for <em>semantically entailed</em> (the probes recognize what the model already represents).</p>
</section>
<section id="the-honest-accounting" class="level2">
<h2 class="anchored" data-anchor-id="the-honest-accounting">The honest accounting</h2>
<p>Here’s where I have to slow down, because the headline number is doing two slightly different things at once.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.jjd.io/posts/images/bbh_climb.png" class="img-fluid figure-img" alt="Two panels. Left: a bar chart showing vanilla SmolLM2 at 30.2 percent, swollm in-sample at 92.5 percent, and swollm honest cross-validated at 89.5 percent, with an arrow marking a roughly 60 point gain. Right: a bar chart of task outcomes — 22 solved at or above 96 percent, 3 partial, 2 walls."></p>
<figcaption>Left: the climb — the bare baseline, the in-sample number (92.5%), and the hard-nosed cross-validated number (~89.5%). Right: where the 27 tasks land — 22 solved, three partial, two walls.</figcaption>
</figure>
</div>
<p>The symbolic tasks (teal) are honest at 100% — a proof generalizes, there’s no in-sample/out-of-sample distinction for arithmetic. But the <strong>probe</strong> tasks (purple) are fit on the BBH examples themselves, and a probe that scores 100% <em>in-sample</em> will score lower on held-out data. When you replace each probe’s in-sample number with its 5-fold cross-validated number, the aggregate settles at about <strong>89.5%</strong>, not 92.5%. That ~3-point gap is the part of the headline that’s borrowed against future data, and I’d rather show you the gap than launder it.</p>
<p>A second honesty knob: the probes have to be <strong>order-robust.</strong> A multiple-choice probe that reads “which option is the answer” can secretly learn “the answer is usually B.” We test this by permuting the options and re-scoring; an honest probe’s accuracy shouldn’t move. Early versions moved by 15 points. The shipped ones score the options in a position-marginalized way (average over cyclic shifts) so the number you see survives reordering — at the cost of a couple points of raw accuracy. The robust number is the real one.</p>
</section>
<section id="the-walls-werent-all-walls" class="level2">
<h2 class="anchored" data-anchor-id="the-walls-werent-all-walls">The walls weren’t all walls</h2>
<p>Look at the two tasks sitting just above the walls — <code>movie_recommendation</code> and <code>salient_translation</code>. They almost ended up grey.</p>
<p>Three-shot, the model <em>generates</em> the right movie about 22% of the time, so for a long while I had both filed under “no representation to extract” — apparent walls. That turned out to be wrong, and wrong in a way that matters. When I trained a recognition probe on them the way I had for snarks, the signal was <em>there</em>: the movie probe recognizes the right answer at ~50% in-sample and <strong>~80% on held-out data</strong>, against that 22% generation. The model could <strong>recognize</strong> the correct movie far better than it could <strong>generate</strong> it — the wall was the same generation bottleneck snarks had, hidden behind a multiple-choice format I hadn’t probed correctly. (<code>salient_translation</code> recovered the same way, 14% → ~42%.)</p>
<p><code>causal_judgement</code> and <code>sports_understanding</code>, by contrast, stayed grey — their probes score no better than the majority class, which is exactly what a genuine knowledge gap looks like. So what first looked like four walls is really <strong>two walls and two illusions</strong> — and those two recovered tasks are why the honest aggregate lands near <strong>89.5%</strong> rather than the mid-80s.</p>
<p>The general law underneath: <strong>recognition ≫ generation.</strong> A small model’s <em>answer</em> is a lossy readout of a richer internal state. If you can find the state and read it directly — with a probe, or by routing the question to a solver — you can recover capability the model has but cannot articulate. “Stop guessing” is not a metaphor; it’s the mechanism.</p>
</section>
<section id="where-a-1.7b-model-lands" class="level2">
<h2 class="anchored" data-anchor-id="where-a-1.7b-model-lands">Where a 1.7B model lands</h2>
<p>It’s worth seeing the placement, with the asterisk attached — against <a href="https://epoch.ai/data/ai-benchmarking-dashboard">Epoch AI’s BBH leaderboard</a> of general models, run with standardized 3-shot chain-of-thought. Plot parameter count against score and swollm doesn’t sit on the curve at all:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.jjd.io/posts/images/bbh_scatter.png" class="img-fluid figure-img" alt="Scatter plot of model parameters on a log x-axis versus BBH accuracy. A cyan star marks swollm at 1.7 billion parameters and 89.5 percent, far above a dashed trend line through the general models that rises from small models near 30 percent to DeepSeek-V3 and Llama-3.1-405B near 85 percent. An orange dashed arrow shows swollm rising 59 points above the bare SmolLM2-1.7B at the same horizontal position."></p>
<figcaption>Parameters (log scale) against BBH. The general models climb roughly with size (dashed trend); swollm — 1.7B — sits a full head above it, level with models hundreds of times larger. The orange arrow is the +59-point jump from the <em>bare</em> SmolLM2 at the very same size.</figcaption>
</figure>
</div>
<p>A 1.7B model sitting above DeepSeek-V3 and Llama-3.1-405B — and <strong>59 points</strong> above the <em>bare</em> SmolLM2 of identical size. As a flat ranking it tells the same story:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.jjd.io/posts/images/bbh_field.png" class="img-fluid figure-img" alt="A horizontal bar chart placing swollm on the Epoch AI BBH leaderboard. swollm is highlighted in cyan at 89.5 percent, at the top, a dead heat with Gemini 1.5 Pro at 89.2 and above DeepSeek-V3 at 87.5. Three orange bars at the bottom — Gemma 2B at 35.2, bare SmolLM2-1.7B at 30.2, and Qwen-1.8B at 28.2 — mark the same roughly-2B weight class. A red note warns the comparison is not apples-to-apples."></p>
<figcaption>swollm (cyan), at its honest ~89.5%, lands at the very top of the field — a statistical dead heat with Gemini 1.5 Pro. The orange bars are SmolLM2’s own ~2B weight class — Qwen-1.8B (28%), Gemma 2B (35%), and the bare SmolLM2 itself (30%) — all in the basement. The violet tick on swollm’s bar is its in-sample upper bound (92.5%).</figcaption>
</figure>
</div>
<p>The honest reading isn’t “a 1.7B model ties Gemini.” It’s the <strong>orange bars</strong>: every other model in SmolLM2’s weight class — Qwen-1.8B, Gemma 2B, and bare SmolLM2 itself — lives at 28–35%, exactly where you’d expect a tiny model on a benchmark built to break tiny models. The neurosymbolic layer is the entire difference between the bottom of that chart and the top of it, on identical-size weights. What the comparison measures is not raw intelligence; it’s <em>how much of a small model’s latent capability is being thrown away by letting it guess.</em></p>
</section>
<section id="what-this-is-and-isnt" class="level2">
<h2 class="anchored" data-anchor-id="what-this-is-and-isnt">What this is and isn’t</h2>
<p>This is <strong>not</strong> a claim that a 1.7B model beats GPT-scale models on reasoning. The giants run chain-of-thought and answer in free text on tasks Turnstyle hasn’t parsed; the comparison isn’t apples-to-apples and I’m not going to pretend it is. BBH here is a <strong>test harness, not the objective</strong> — it provides ground-truth labels and structural variety to validate tools that are supposed to work <em>beyond</em> BBH. The arithmetic solver, the bracket solver, the polarity probe, the date solver: each is built to generalize past the 250 examples it was checked on, and several are deliberately stripped of their BBH-specific scaffolding and re-tested on the bare capability.</p>
<p>What it <strong>is</strong>: evidence that a large fraction of “small models can’t reason” is actually “small models can’t <em>commit</em>.” The capability is frequently present — as a computable structure in the prompt, or as a recognizable pattern in the activations — and a thin, cheap, training-free layer that parses, proves, recognizes, or honestly abstains can surface most of it. No new parameters. No fine-tuning. Just refusing to let a 1.7B model guess when it could instead know.</p>
<p>You can poke at it yourself — the bare model and the wrapped model, side by side, with the worked proof for each answer — on the <a href="https://huggingface.co/spaces/jdonaldson/turnstyle-demo">live demo</a>. Try the arithmetic expression first. Watch the left pane confidently produce a wrong number, and the right pane prove the right one.</p>
<hr>
<p><em>Code &amp; data: <a href="https://github.com/jdonaldson/turnstyle">github.com/jdonaldson/turnstyle</a>. The baseline and symbolic per-task numbers come from the swollm 3-shot evaluation (<code>results/v13/bbh_full.json</code>); the <code>movie</code>/<code>salient</code> recognition probes and the ~89.5% honest aggregate are from turnstyle’s native dispatch. Figures regenerate from <code>experiments/blog_bbh_figs.py</code>.</em></p>


</section>

<p><i>© Copyright 2025 Justin Donaldson. Except where otherwise noted, all rights reserved. The views and opinions on this website are my own and do not represent my current or former employers.</i></p><div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{donaldson2026,
  author = {Donaldson, Justin and (Opus), Claude},
  title = {A {1.7B} {Model} {That} {Stops} {Guessing}},
  date = {2026-06-27},
  url = {https://www.jjd.io/posts/swollm-bbh-leaderboard.html},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-donaldson2026" class="csl-entry quarto-appendix-citeas">
Donaldson, Justin, and Claude (Opus). 2026. <span>“A 1.7B Model That
Stops Guessing.”</span> June 27. <a href="https://www.jjd.io/posts/swollm-bbh-leaderboard.html">https://www.jjd.io/posts/swollm-bbh-leaderboard.html</a>.
</div></div></section></div> ]]></description>
  <category>machine learning</category>
  <category>nlp</category>
  <category>interpretability</category>
  <category>neurosymbolic</category>
  <guid>https://www.jjd.io/posts/swollm-bbh-leaderboard.html</guid>
  <pubDate>Sat, 27 Jun 2026 00:00:00 GMT</pubDate>
  <media:content url="https://www.jjd.io/posts/images/swollm_og.png" medium="image" type="image/png" height="75" width="144"/>
</item>
<item>
  <title>A 1957 Theory of Meaning, Hiding in a Language Model</title>
  <dc:creator>Justin Donaldson</dc:creator>
  <link>https://www.jjd.io/posts/osgood-epa-language-model.html</link>
  <description><![CDATA[ 





<blockquote class="blockquote">
<p>Meaning, Osgood found, is mostly three numbers.</p>
<p>— <em>paraphrasing</em> The Measurement of Meaning <em>(Osgood, Suci &amp; Tannenbaum, 1957)</em></p>
</blockquote>

<div class="no-row-height column-margin column-container"><div class="">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.jjd.io/posts/images/charles-osgood.jpg" class="img-fluid figure-img"></p>
<figcaption>Charles E. Osgood (1916–1991), who set out to measure meaning with paper rating scales. <em>Photo via PeoplePill.</em></figcaption>
</figure>
</div>
</div></div><p>In 1957, the psychologist Charles Osgood handed people a word—<em>tornado</em>, <em>mother</em>, <em>nuclear</em>—and a stack of rating scales: <em>good–bad</em>, <em>strong–weak</em>, <em>fast–slow</em>, <em>hot–cold</em>, dozens of them. Then he did the factor analysis. Across twenty-some language communities and a parade of cultures, the same answer kept falling out: the connotative meaning of a word is mostly <strong>three numbers</strong>.</p>
<p>He called them <strong>Evaluation</strong> (good–bad), <strong>Potency</strong> (strong–weak), and <strong>Activity</strong> (active–passive). The <em>semantic differential</em>, the technique is called, and the three-factor <strong>E–P–A</strong> structure is one of the more durable findings in the psychology of meaning—reproduced across literate and non-literate cultures alike.</p>
<p>Here is a fun question. A modern language model is trained on exactly one objective: predict the next token. Nobody hands it Osgood’s scales. Nobody tells it that meaning has three affective dimensions. So—does it find them anyway? And if it does, does it keep them <em>separate</em>, the way Osgood’s factor analysis insisted they were?</p>
<p>I went looking inside <strong>SmolLM2-1.7B</strong>, a small open model, to see.</p>
<section id="how-to-ask-a-network-what-it-thinks-strong-means" class="level2">
<h2 class="anchored" data-anchor-id="how-to-ask-a-network-what-it-thinks-strong-means">How to ask a network what it thinks “strong” means</h2>
<p>The trick is simple and a little old-fashioned. Take a handful of words at each pole of a factor—for Potency, <em>strong / powerful / heavy / hard</em> on one end, <em>weak / small / light / soft</em> on the other—and read the model’s internal activation as it processes each one. Average the “strong” activations, average the “weak” ones, and subtract. That difference is a <strong>direction</strong> in the model’s activation space: the way “more potent” points.</p>
<p>Do it for all three factors and you get three directions. Now you can interrogate them:</p>
<ul>
<li><strong>Are they independent?</strong> Osgood’s whole claim is that Evaluation, Potency, and Activity are <em>separate</em> factors. If the model agrees, the three directions should be close to perpendicular.</li>
<li><strong>Do they generalize?</strong> Fit the direction on some words, test it on words it never saw. A real “potency” direction should place <em>gigantic</em> and <em>frail</em> correctly.</li>
<li><strong>Do they cross languages?</strong> Fit the direction on English only, then hand it Spanish, French, and German words. If meaning is semantic rather than lexical, <em>fuerte</em> and <em>schwach</em> should land on the right side.</li>
</ul>
</section>
<section id="the-three-factors-come-out-orthogonal" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="the-three-factors-come-out-orthogonal">The three factors come out orthogonal</h2>
<p>Here is the picture—place every pole word at its (Evaluation, Potency, Activity) coordinates and look at the result in three dimensions. Drag to rotate it; the structure is the whole point.</p>
<div class="column-page">
<div>                        <script>window.PlotlyConfig = {MathJaxConfig: 'local'};</script>
        <script charset="utf-8" src="https://cdn.plot.ly/plotly-3.5.0.min.js" integrity="sha256-fHbNLP+GlIXN+efbQec78UkemUz3NJp7UmfGxC1tNxs=" crossorigin="anonymous"></script>                <div id="5389e325-d75e-4bff-a09f-c70d1d8011a3" class="plotly-graph-div" style="height:560px; width:100%;"></div>            <script>                window.PLOTLYENV=window.PLOTLYENV || {};                                if (document.getElementById("5389e325-d75e-4bff-a09f-c70d1d8011a3")) {                    Plotly.newPlot(                        "5389e325-d75e-4bff-a09f-c70d1d8011a3",                        [{"hoverinfo":"skip","line":{"color":"rgba(120,120,120,0.35)","width":3},"mode":"lines","showlegend":false,"x":[-29.218213462829592,29.218213462829592],"y":[0,0],"z":[0,0],"type":"scatter3d"},{"hoverinfo":"skip","line":{"color":"rgba(120,120,120,0.35)","width":3},"mode":"lines","showlegend":false,"x":[0,0],"y":[-29.218213462829592,29.218213462829592],"z":[0,0],"type":"scatter3d"},{"hoverinfo":"skip","line":{"color":"rgba(120,120,120,0.35)","width":3},"mode":"lines","showlegend":false,"x":[0,0],"y":[0,0],"z":[-29.218213462829592,29.218213462829592],"type":"scatter3d"},{"hovertemplate":"%{text}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"#2ca02c","line":{"width":0},"opacity":0.85,"size":4.5},"mode":"markers","name":"Evaluation","text":["good (en, +)","nice (en, +)","beautiful (en, +)","pleasant (en, +)","kind (en, +)","bad (en, \u2212)","nasty (en, \u2212)","ugly (en, \u2212)","unpleasant (en, \u2212)","cruel (en, \u2212)","bueno (es, +)","agradable (es, +)","hermoso (es, +)","placentero (es, +)","amable (es, +)","malo (es, \u2212)","desagradable (es, \u2212)","feo (es, \u2212)","doloroso (es, \u2212)","cruel (es, \u2212)","bon (fr, +)","agr\u00e9able (fr, +)","beau (fr, +)","plaisant (fr, +)","gentil (fr, +)","mauvais (fr, \u2212)","d\u00e9sagr\u00e9able (fr, \u2212)","laid (fr, \u2212)","p\u00e9nible (fr, \u2212)","cruel (fr, \u2212)","gut (de, +)","nett (de, +)","sch\u00f6n (de, +)","angenehm (de, +)","freundlich (de, +)","schlecht (de, \u2212)","gemein (de, \u2212)","h\u00e4sslich (de, \u2212)","unangenehm (de, \u2212)","grausam (de, \u2212)"],"x":{"dtype":"f4","bdata":"5rCbQW6y1kEAcIRBn+qVQa6RqEGN3D7BTNfKwQipzsHSmp\u002fBM8acwVJdj0A+28E\u002f5muuP3W\u002fY0DyPYI\u002fWdULwe6TmMCATMg964wrwFMYVsGN6TRBUKM2P25UrEDwdIJA0GK8QLJJ878AKTg\u002fmOkDv1hZxL7CWZDBQqD\u002fQKLODEEUSz2\u002f+IOWvo8DL0AgWz6\u002fNt3ZP6DEU7+ReZzAmo2JwA=="},"y":{"dtype":"f4","bdata":"DCcjQY0r60BkuxdBgDslQG\u002fv00BegRZBLFoUQW1rAEGO2KxAhenUQOtvqkCYgFFAGMCBQIB+Kz1Uzsc\u002fNlXTPyREDMD4YgZARsMRQHw1CUE2tPtAsP8HQGyxQ0Ans0VAKhvRv25HV0B1Xo9A0a\u002f4wEUBJUDCxQ9BsHjVQIjiWUBM93FAqBGVQF4OuT+EGnJAjM9PQGwYikC4Jes+4th0QA=="},"z":{"dtype":"f4","bdata":"XqhZQJGDlb4+z2lANrnPPvh9pL\u002fwnBPAH1pjv7gj0cCR7k3Aax0ZwAyRjj9jyyJAribQP8S3i7\u002fyw+s\u002fSbq1wJ6jOsBA2ek\u002fUyF\u002fPx8uKL88+zFAknkrQAQOtj\u002f3hX0\u002fTSoKwOK9ID\u002fDAgZAOusswe6eP77RC1TAWmehP9QDyD+Ws5U+4F1CQLw+jUCOvZM+vIweQCYv4z9j\u002fjXAkHIKPw=="},"type":"scatter3d"},{"hovertemplate":"%{text}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"#d62728","line":{"width":0},"opacity":0.85,"size":4.5},"mode":"markers","name":"Potency","text":["strong (en, +)","powerful (en, +)","big (en, +)","heavy (en, +)","hard (en, +)","weak (en, \u2212)","powerless (en, \u2212)","small (en, \u2212)","light (en, \u2212)","soft (en, \u2212)","fuerte (es, +)","poderoso (es, +)","grande (es, +)","pesado (es, +)","duro (es, +)","d\u00e9bil (es, \u2212)","impotente (es, \u2212)","peque\u00f1o (es, \u2212)","ligero (es, \u2212)","blando (es, \u2212)","fort (fr, +)","puissant (fr, +)","grand (fr, +)","lourd (fr, +)","dur (fr, +)","faible (fr, \u2212)","impuissant (fr, \u2212)","petit (fr, \u2212)","l\u00e9ger (fr, \u2212)","mou (fr, \u2212)","stark (de, +)","m\u00e4chtig (de, +)","gro\u00df (de, +)","schwer (de, +)","hart (de, +)","schwach (de, \u2212)","machtlos (de, \u2212)","klein (de, \u2212)","leicht (de, \u2212)","weich (de, \u2212)"],"x":{"dtype":"f4","bdata":"YaBTwB9JdcCLxyTAWikDwchfB8ECwwvBJuknwdqxqMDgXWa+MGJqPuNdjr8gTcI+8FfdvnGCVMDrjCvA3dfMv5iMd8Dqzdm\u002fkpY5wLRZEL+Q1pBAtB8QwFZ\u002fkD9cBmm\u002fAUdLwUhe2758AxDAWLdIv1wGab+ySfO\u002fiFbuwGyBVb+WhorAFEs9v6DEU78gWz6\u002fIBjNv968sL\u002f8Wri\u002f+KOHvw=="},"y":{"dtype":"f4","bdata":"YgtGQdpJckHFC4xBZgZfQWYGikESwmTBCIRswf43OsH+ppvB81mAwTiEREDQArI+VYwTQWeoHkBGwxFAZBZlPxC9TEB6wlbA8M1hv+IEKD9hURZBm1cyQCgwRUEFBt0\u002fiI8hQW5IRkBWpZ1AqniKwAUG3T9uR1dA9drWQKVczT8gFqhATPdxQGwYikCEGnJArD8UQPp8nz+A5Ow9ZoKtPw=="},"z":{"dtype":"f4","bdata":"mZmVQA+spEC99bZAVYkfQBTlHMBubQXBEE4TwV8\u002fGsDG9JZABvG3v7p\u002fGj8BE22+Dc5OP7pjDsBTIX8\u002fVPBqv+B\u002fgr+r6I3AKOwFv5AwN8BkWFNAxEDTv6WaFkBIUKu+XTdLvrgvjz+zWTO\u002fjD+EwEhQq77ivSA\u002fV7Y3P2q8xr9++P++lrOVPiYv4z+OvZM+O1daPywvlb6uoJ0\u002fUISNvQ=="},"type":"scatter3d"},{"hovertemplate":"%{text}\u003cextra\u003e\u003c\u002fextra\u003e","marker":{"color":"#1f77b4","line":{"width":0},"opacity":0.85,"size":4.5},"mode":"markers","name":"Activity","text":["active (en, +)","fast (en, +)","lively (en, +)","quick (en, +)","energetic (en, +)","passive (en, \u2212)","slow (en, \u2212)","calm (en, \u2212)","sluggish (en, \u2212)","lazy (en, \u2212)","activo (es, +)","r\u00e1pido (es, +)","vivo (es, +)","\u00e1gil (es, +)","en\u00e9rgico (es, +)","pasivo (es, \u2212)","lento (es, \u2212)","tranquilo (es, \u2212)","perezoso (es, \u2212)","inactivo (es, \u2212)","actif (fr, +)","rapide (fr, +)","vif (fr, +)","agile (fr, +)","\u00e9nergique (fr, +)","passif (fr, \u2212)","lent (fr, \u2212)","calme (fr, \u2212)","paresseux (fr, \u2212)","inactif (fr, \u2212)","aktiv (de, +)","schnell (de, +)","lebhaft (de, +)","flink (de, +)","energisch (de, +)","passiv (de, \u2212)","langsam (de, \u2212)","ruhig (de, \u2212)","tr\u00e4ge (de, \u2212)","faul (de, \u2212)"],"x":{"dtype":"f4","bdata":"8HKNwLLctb+AqBc9+B\u002fAvr64qMBoLp3A+KElwaitnUBTKC3B\u002fboIwf7ziL8w5pi\u002f\u002f+SBwBSBPsBuaCLAqAy3vwVChcCVLVpAiD+NwMdpbsBsnLe\u002f2PkxvzDDYz6gdUS\u002feqsswKCQH784DO3AWNKhv1hZxL7A4ojAANx1vyhxZD9SOri\u002fYKaevhA+Gj78Kio\u002fyMQOwL31GMDUlgS\u002fKWYuwA=="},"y":{"dtype":"f4","bdata":"1rBQQBAneD8e7J1A4AMhv4OElUDP2LzAovK9v5F0jsCxlL\u002fApanKwE63oD+yyg9Afr5HP6buBD9oxVg\u002fXGHJv8RbRcAQ1ca+sAncvkgNuz5W1npAEFUQQDSZjEBGSTzA6Ao\u002fQJCl+r6l1rDAsKwkPkUBJUAcJLK\u002f61NQQADzCkDggO498AVUv0ArjUDIFK8+jVL8P7wXRkBb2QtAMHF+QA=="},"z":{"dtype":"f4","bdata":"YWGsQcjmXUEgK6FBZT5oQR69kUFund7B3gpewVnR1cA3xI3BzICxwa\u002fQz0C+X7g\u002f8tEDQQDXorsAlxo8mBV2wFhcD8HqhJTAkVeGwCeBCcDDArhAGg5WQBbzbkC4PmZAhPc6Pg4j4b9JI1fBUImGve6eP77B8ZzAy89+P07pjT+klJ0\u002fICI1vF4buEBHzg\u002fAEbz9vmNjbj+M\u002fRA\u002fE1Vcvw=="},"type":"scatter3d"}],                        {"template":{"data":{"barpolar":[{"marker":{"line":{"color":"white","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"barpolar"}],"bar":[{"error_x":{"color":"#2a3f5f"},"error_y":{"color":"#2a3f5f"},"marker":{"line":{"color":"white","width":0.5},"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"bar"}],"carpet":[{"aaxis":{"endlinecolor":"#2a3f5f","gridcolor":"#C8D4E3","linecolor":"#C8D4E3","minorgridcolor":"#C8D4E3","startlinecolor":"#2a3f5f"},"baxis":{"endlinecolor":"#2a3f5f","gridcolor":"#C8D4E3","linecolor":"#C8D4E3","minorgridcolor":"#C8D4E3","startlinecolor":"#2a3f5f"},"type":"carpet"}],"choropleth":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"choropleth"}],"contourcarpet":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"contourcarpet"}],"contour":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"contour"}],"heatmap":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"heatmap"}],"histogram2dcontour":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"histogram2dcontour"}],"histogram2d":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"histogram2d"}],"histogram":[{"marker":{"pattern":{"fillmode":"overlay","size":10,"solidity":0.2}},"type":"histogram"}],"mesh3d":[{"colorbar":{"outlinewidth":0,"ticks":""},"type":"mesh3d"}],"parcoords":[{"line":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"parcoords"}],"pie":[{"automargin":true,"type":"pie"}],"scatter3d":[{"line":{"colorbar":{"outlinewidth":0,"ticks":""}},"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatter3d"}],"scattercarpet":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattercarpet"}],"scattergeo":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattergeo"}],"scattergl":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattergl"}],"scattermapbox":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattermapbox"}],"scattermap":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scattermap"}],"scatterpolargl":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterpolargl"}],"scatterpolar":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterpolar"}],"scatter":[{"fillpattern":{"fillmode":"overlay","size":10,"solidity":0.2},"type":"scatter"}],"scatterternary":[{"marker":{"colorbar":{"outlinewidth":0,"ticks":""}},"type":"scatterternary"}],"surface":[{"colorbar":{"outlinewidth":0,"ticks":""},"colorscale":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"type":"surface"}],"table":[{"cells":{"fill":{"color":"#EBF0F8"},"line":{"color":"white"}},"header":{"fill":{"color":"#C8D4E3"},"line":{"color":"white"}},"type":"table"}]},"layout":{"annotationdefaults":{"arrowcolor":"#2a3f5f","arrowhead":0,"arrowwidth":1},"autotypenumbers":"strict","coloraxis":{"colorbar":{"outlinewidth":0,"ticks":""}},"colorscale":{"diverging":[[0,"#8e0152"],[0.1,"#c51b7d"],[0.2,"#de77ae"],[0.3,"#f1b6da"],[0.4,"#fde0ef"],[0.5,"#f7f7f7"],[0.6,"#e6f5d0"],[0.7,"#b8e186"],[0.8,"#7fbc41"],[0.9,"#4d9221"],[1,"#276419"]],"sequential":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]],"sequentialminus":[[0.0,"#0d0887"],[0.1111111111111111,"#46039f"],[0.2222222222222222,"#7201a8"],[0.3333333333333333,"#9c179e"],[0.4444444444444444,"#bd3786"],[0.5555555555555556,"#d8576b"],[0.6666666666666666,"#ed7953"],[0.7777777777777778,"#fb9f3a"],[0.8888888888888888,"#fdca26"],[1.0,"#f0f921"]]},"colorway":["#636efa","#EF553B","#00cc96","#ab63fa","#FFA15A","#19d3f3","#FF6692","#B6E880","#FF97FF","#FECB52"],"font":{"color":"#2a3f5f"},"geo":{"bgcolor":"white","lakecolor":"white","landcolor":"white","showlakes":true,"showland":true,"subunitcolor":"#C8D4E3"},"hoverlabel":{"align":"left"},"hovermode":"closest","mapbox":{"style":"light"},"paper_bgcolor":"white","plot_bgcolor":"white","polar":{"angularaxis":{"gridcolor":"#EBF0F8","linecolor":"#EBF0F8","ticks":""},"bgcolor":"white","radialaxis":{"gridcolor":"#EBF0F8","linecolor":"#EBF0F8","ticks":""}},"scene":{"xaxis":{"backgroundcolor":"white","gridcolor":"#DFE8F3","gridwidth":2,"linecolor":"#EBF0F8","showbackground":true,"ticks":"","zerolinecolor":"#EBF0F8"},"yaxis":{"backgroundcolor":"white","gridcolor":"#DFE8F3","gridwidth":2,"linecolor":"#EBF0F8","showbackground":true,"ticks":"","zerolinecolor":"#EBF0F8"},"zaxis":{"backgroundcolor":"white","gridcolor":"#DFE8F3","gridwidth":2,"linecolor":"#EBF0F8","showbackground":true,"ticks":"","zerolinecolor":"#EBF0F8"}},"shapedefaults":{"line":{"color":"#2a3f5f"}},"ternary":{"aaxis":{"gridcolor":"#DFE8F3","linecolor":"#A2B1C6","ticks":""},"baxis":{"gridcolor":"#DFE8F3","linecolor":"#A2B1C6","ticks":""},"bgcolor":"white","caxis":{"gridcolor":"#DFE8F3","linecolor":"#A2B1C6","ticks":""}},"title":{"x":0.05},"xaxis":{"automargin":true,"gridcolor":"#EBF0F8","linecolor":"#EBF0F8","ticks":"","title":{"standoff":15},"zerolinecolor":"#EBF0F8","zerolinewidth":2},"yaxis":{"automargin":true,"gridcolor":"#EBF0F8","linecolor":"#EBF0F8","ticks":"","title":{"standoff":15},"zerolinecolor":"#EBF0F8","zerolinewidth":2}}},"margin":{"l":0,"r":0,"t":10,"b":0},"legend":{"orientation":"h","y":1.02,"x":0.5,"xanchor":"center"},"scene":{"xaxis":{"title":{"text":"Evaluation (good\u2013bad)"},"zeroline":true,"showbackground":false},"yaxis":{"title":{"text":"Potency (strong\u2013weak)"},"zeroline":true,"showbackground":false},"zaxis":{"title":{"text":"Activity (active\u2013passive)"},"zeroline":true,"showbackground":false},"camera":{"eye":{"x":1.6,"y":1.5,"z":1.1}},"aspectmode":"cube"},"height":560},                        {"scrollZoom": true, "displayModeBar": false, "responsive": true}                    )                };            </script>        </div>
</div>
<div class="text-center">
<p><em>Each word positioned by its three Osgood projections. Drag to rotate, scroll to zoom. Words of a given factor stretch out along <strong>that factor’s arm</strong> and stay near zero on the other two — three orthogonal spokes.</em></p>
</div>
<p>Rotate it and the three groups separate into near-perpendicular arms: Evaluation runs along one axis, Potency along another, Activity along the third, each clustering near zero on the others. You are looking at Osgood’s independent-factors claim rendered as geometry. Put numbers on it and the cosine between the Evaluation and Potency directions is <strong>0.00</strong>, Evaluation–Activity <strong>0.10</strong>, Potency–Activity <strong>0.21</strong>. The model has, on its own, carved meaning into three nearly-orthogonal affective axes.</p>
</section>
<section id="but-the-three-axes-are-not-equal-citizens" class="level2">
<h2 class="anchored" data-anchor-id="but-the-three-axes-are-not-equal-citizens">But the three axes are not equal citizens</h2>
<p>That clean story has a wrinkle, and it is worth being honest about it. The three factors are <em>not</em> encoded with equal fidelity.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.jjd.io/posts/images/osgood_quality.png" class="img-fluid figure-img"></p>
<figcaption>Pole-sign accuracy for each factor, on held-out English words and on unseen Spanish/French/German words. Evaluation is crisp; Activity is solid; Potency is the rough one.</figcaption>
</figure>
</div>
<p><strong>Evaluation</strong>—the good/bad axis—is the cleanest by a mile: it reads held-out words perfectly and transfers across languages at 0.83. This tracks both Osgood (Evaluation was always his dominant first factor) and modern NLP (sentiment is the easiest thing in the world to probe). <strong>Activity</strong> is solid. <strong>Potency</strong> is the problem child: its pole words are a more heterogeneous bunch (is <em>strong</em> really the same dimension as <em>heavy</em> as <em>hard</em>?), and it transfers cross-lingually at only 0.60.</p>
<p>That last number points at something real: <strong>connotative meaning is more language-bound than concrete meaning.</strong> Elsewhere I found that a purely <em>denotative</em> scalar axis—hot/cold, big/small—transfers across these same four languages at 0.98. Affect transfers too, but it leaks. The <em>feeling</em> of a word is a little more tied to its language than the <em>fact</em> of it.</p>
</section>
<section id="the-structure-is-stable-and-it-lives-in-the-middle-of-the-network" class="level2">
<h2 class="anchored" data-anchor-id="the-structure-is-stable-and-it-lives-in-the-middle-of-the-network">The structure is stable, and it lives in the middle of the network</h2>
<p>One more view—how all of this behaves as you move up through the model’s layers:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.jjd.io/posts/images/osgood_layers.png" class="img-fluid figure-img"></p>
<figcaption>Left: the three factor directions stay nearly independent at every depth. Right: cross-lingual transfer climbs through the early-middle layers and peaks around two-thirds of the way up, with Potency (red) trailing the other two throughout.</figcaption>
</figure>
</div>
<p>The factors are independent everywhere (left panel never climbs off the floor). Cross-lingual transfer, though, is a <em>computation</em>: it’s weak at the input, strengthens through the early-middle layers, and peaks where the network has done enough work to represent meaning abstractly rather than lexically. Evaluation and Activity reach 0.8–0.9; Potency lags the whole way.</p>
</section>
<section id="but-dont-ordinary-word-embeddings-already-do-this" class="level2">
<h2 class="anchored" data-anchor-id="but-dont-ordinary-word-embeddings-already-do-this">But don’t ordinary word embeddings already do this?</h2>
<p>Fair challenge. If E–P–A is this robust, maybe it has nothing to do with <em>language models</em>—maybe any embedding has it. So I ran the identical test on <strong>fastText</strong>, the classic non-contextual word-vector model: one fixed vector per word, no transformer, no context.</p>
<p>It captures E–P–A too. fastText recovers Evaluation at <strong>0.90</strong>, Potency and Activity at 0.60, with the three factors roughly orthogonal—about on par with SmolLM2 <em>in English</em>. This shouldn’t be a total shock: psychologists have noted for a decade that word-embedding dimensions track affective norms (<a href="https://doi.org/10.3758/s13423-016-1053-2">Hollis &amp; Westbury, 2016</a>). <strong>Osgood’s structure is a property of distributional meaning itself, not a special trick of large models.</strong> The good–bad axis in particular is almost impossible <em>not</em> to find.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.jjd.io/posts/images/osgood_static_vs_llm.png" class="img-fluid figure-img"></p>
<figcaption>Static and contextual embeddings recover E–P–A about equally well <em>in English</em>. The difference is the orange bars—cross-lingual transfer—which a monolingual static embedding cannot have at all.</figcaption>
</figure>
</div>
<p>So what does the language model actually add? One thing, and it’s the thing a static embedding can never have: <strong>the axes cross languages.</strong> fastText’s English space and its Spanish space are unrelated coordinate systems—there is no “fit in English, read in Spanish.” The contextual model has a single shared space, so an Evaluation direction learned from English pole words reads <em>bueno</em> and <em>malo</em>, <em>gut</em> and <em>schlecht</em>, at 0.6–0.9. The orange bars above have no gray counterpart, because a monolingual embedding has nothing to put there.</p>
<p>That is the honest shape of the result. The model didn’t <em>invent</em> Osgood’s factors—distributional semantics already carries them. What it contributes is a <strong>universal</strong> version: one affective coordinate system that survives the jump between languages.</p>
</section>
<section id="the-axes-arent-just-readabletheyre-a-steering-wheel" class="level2">
<h2 class="anchored" data-anchor-id="the-axes-arent-just-readabletheyre-a-steering-wheel">The axes aren’t just readable—they’re a steering wheel</h2>
<p>A direction you can <em>read</em> is interesting. A direction you can <em>write</em> is useful. These are the same object that the interpretability world calls a <em>concept vector</em> or <em>steering vector</em>, and you can add the Evaluation direction back into the model’s activations while it generates. A little nudge in the +Evaluation direction, same neutral prompt:</p>
<blockquote class="blockquote">
<p><em>The neighborhood I live in is</em> <strong>a war zone. The cops are either corrupt or incompetent…</strong> <em>(steered negative)</em></p>
<p><em>The neighborhood I live in is</em> <strong>a great place to be, with shops, restaurants, and parks…</strong> <em>(steered positive)</em></p>
</blockquote>
<p>Push the Potency axis instead and the <em>theme</em> shifts rather than the mood—generations turn from “a trivial matter” to “really big, the whole city, a lot of.” Each verified axis is an independent control knob: Evaluation writes sentiment, Potency writes magnitude. Osgood’s rating scales, seventy years later, turn out to be steering wheels.</p>
</section>
<section id="what-to-make-of-it" class="level2">
<h2 class="anchored" data-anchor-id="what-to-make-of-it">What to make of it</h2>
<p>It would be easy to over-read this, so let me keep the claims sized correctly. This is <strong>one</strong> small model. The cross-lingual evidence rests on hand-built word lists across four languages I happen to read. Potency is genuinely rough. And this is <em>connotative affect</em>—the easy, emotionally-laden corner of meaning—not the hard compositional stuff.</p>
<p>But within those bounds, the finding is clean and, I think, a little wonderful: a network trained only to predict text reconstructs a sixty-year-old, cross-culturally-validated theory of affective meaning—three factors, near-orthogonal, semantic enough to survive translation—as a low-dimensional coordinate system you can read off its activations and steer with. Osgood ran his study with paper questionnaires across the globe to triangulate the shape of meaning. The model triangulated the same shape from the shape of language itself.</p>
</section>
<section id="whats-next-epa-modulated-embeddings" class="level2">
<h2 class="anchored" data-anchor-id="whats-next-epa-modulated-embeddings">What’s next: EPA-modulated embeddings</h2>
<p>The cross-lingual version isn’t only something that <em>emerges</em> inside big models — you can build it on purpose. Put the same E–P–A “head” on top of an aligned multilingual encoder (we prototyped it with <a href="https://huggingface.co/BAAI/bge-m3">BGE-M3</a>) and the affective axes transfer cleanly out of the box: Evaluation, Potency, and Activity all land at <strong>0.9–1.0</strong> reading from English into Spanish, French, and German, with no alignment step — and it even cleans up Potency, the factor the smaller model found rough.</p>
<p>That’s the seed of something we’re going to pursue: <strong>EPA-modulated embeddings</strong> — vectors that carry an explicit, interpretable, cross-lingual affective basis you can read off <em>and</em> steer along. We’re going to look at releasing them built with exactly this technique. More soon.</p>
<hr>
<p><em>Methods and code are part of <a href="https://github.com/jdonaldson/turnstyle">turnstyle</a>; the figures here are computed from cached SmolLM2-1.7B activations over a four-language E–P–A pole-word lexicon. The broader idea—reading and writing meaning along verified, theory-grounded axes—I’ve been calling a “semantic frame.”</em></p>


</section>

<p><i>© Copyright 2025 Justin Donaldson. Except where otherwise noted, all rights reserved. The views and opinions on this website are my own and do not represent my current or former employers.</i></p><div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{donaldson2026,
  author = {Donaldson, Justin},
  title = {A 1957 {Theory} of {Meaning,} {Hiding} in a {Language}
    {Model}},
  date = {2026-06-20},
  url = {https://www.jjd.io/posts/osgood-epa-language-model.html},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-donaldson2026" class="csl-entry quarto-appendix-citeas">
Donaldson, Justin. 2026. <span>“A 1957 Theory of Meaning, Hiding in a
Language Model.”</span> June 20. <a href="https://www.jjd.io/posts/osgood-epa-language-model.html">https://www.jjd.io/posts/osgood-epa-language-model.html</a>.
</div></div></section></div> ]]></description>
  <category>machine learning</category>
  <category>interpretability</category>
  <category>nlp</category>
  <category>linguistics</category>
  <guid>https://www.jjd.io/posts/osgood-epa-language-model.html</guid>
  <pubDate>Sat, 20 Jun 2026 00:00:00 GMT</pubDate>
  <media:content url="https://www.jjd.io/posts/images/osgood_og.png" medium="image" type="image/png" height="75" width="144"/>
</item>
<item>
  <title>Don’t draw so close to the heat, you forget you must eat</title>
  <dc:creator>Justin Donaldson</dc:creator>
  <dc:creator>Claude (Fable)</dc:creator>
  <link>https://www.jjd.io/posts/no_free_lunch.html</link>
  <description><![CDATA[ 





<style>
/* Hide Quarto's default title block — the hero below replaces it */
header#title-block-header { display: none; }

.hero { position: relative; border-radius: 10px; overflow: hidden; margin: 0 0 2.5rem; line-height: 0; }
.hero-img { width: 100%; display: block; }
.hero-text {
  position: absolute; right: 5%; top: 50%; transform: translateY(-50%);
  width: 55%; max-width: 30rem; text-align: right; line-height: 1.15;
}
.hero-text h1 {
  font-size: clamp(1.25rem, 3.4vw, 2.6rem); font-weight: 600; margin: 0 0 .6rem;
  color: #f7ecd0; text-shadow: 0 2px 14px rgba(0,0,0,.85);
}
.hero-text p {
  font-size: clamp(.72rem, 1.5vw, 1.05rem); font-style: italic; margin: 0;
  color: #e8d9b5; opacity: .9; line-height: 1.35; text-shadow: 0 2px 12px rgba(0,0,0,.95);
}
.hero-byline {
  font-size: clamp(.6rem, 1.1vw, .8rem); font-style: normal; opacity: .7;
  margin-top: .8rem; letter-spacing: .03em;
}
</style>

<div class="hero">
  <img class="hero-img" src="https://www.jjd.io/posts/images/no_free_lunch.jpg" alt="A single moth drawn toward a candle flame glowing in vast darkness">
  <div class="hero-text">
    <h1>Don't draw so close to the heat, you forget you must eat</h1>
    <p>What a conservation theorem says about superintelligence — and why the most beautiful version of the answer is the one to distrust.</p>
    <p class="hero-byline">Justin Donaldson &amp; Claude · June 2026</p>
  </div>
</div>
<blockquote class="blockquote">
<p><em>Don’t become so attached to a poem, you forget truth that lacks lyricism.</em> — Joanna Newsom, <a href="https://www.youtube.com/watch?v=aN6rRJ8ulQ0">“En Gallop”</a> (the title is the line that follows)</p>
</blockquote>
<p>This started as a conversation about a theorem and ended at a warning about trusting the pretty version of any answer. The path between is short, which surprised me. Here it is.</p>
<p><em>A note on authorship: this essay is a joint one — written in conversation between Justin Donaldson and Claude (Anthropic’s Fable model). The arguments were built back and forth across a single thread; the closing self-note is Claude’s, kept in its own voice on purpose.</em></p>
<section id="the-theorem-nobody-quite-remembers-correctly" class="level2">
<h2 class="anchored" data-anchor-id="the-theorem-nobody-quite-remembers-correctly">The theorem nobody quite remembers correctly</h2>
<p>The No Free Lunch theorem (Wolpert &amp; Macready, 1997, for optimization; Wolpert, 1996, for supervised learning) is one of the most cited and least-checked results in machine learning. The folk version — “no model is best for everything” — is true but limp. The actual claim is stranger and sharper.</p>
<p>Averaged over <em>all possible</em> objective functions on a finite domain, under a uniform measure over function space, <strong>every</strong> black-box algorithm has identical expected performance. By any metric. Gradient boosting, nearest neighbor, and an “anti-learner” that deliberately inverts its own predictions all generalize equally well off the training set. Not approximately. Identically.</p>
<p>The intuition: off the training set, a uniform prior over targets makes the unseen labels pure coin flips, uncorrelated with anything you’ve seen. There is no signal in a distribution that has none, and no cleverness extracts it. NFL is really a <strong>conservation law</strong> — any algorithm’s above-chance performance on one class of problems is paid for, exactly, by below-chance performance on the complement.</p>
<p>But the uniform prior is the entire trick, and it is absurd as a model of reality. Almost every function under that measure is incompressible noise — maximal Kolmogorov complexity, no structure to find. Real problems are drawn from a savagely non-uniform distribution: compressible, smooth-ish, compositional, causally sparse. So the correct reading of NFL is not “all learners are equal.” It is:</p>
<blockquote class="blockquote">
<p><strong>All generalization comes from inductive bias, and a learner is only as good as the match between its bias and the actual distribution of problems.</strong></p>
</blockquote>
<p>Learning without assumptions is impossible. Learning with the <em>right</em> assumptions is just engineering. There’s even a precise statement of when the theorem bites: Schumacher, Vose &amp; Whitley (2001) showed NFL holds for a set of functions if and only if that set is <em>closed under permutation</em> — and Igel &amp; Toussaint showed the fraction of problem-subsets that are closed under permutation is vanishingly small. Free lunches are generic. The no-lunch regime is the measure-zero pathology.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.jjd.io/posts/images/function_space.png" class="img-fluid figure-img" alt="A diagram of function space: a vast dark field of grey static labelled 'all possible problems — incompressible noise', with a small warm golden island in the lower-right labelled 'physically realizable problems', filled with an ordered lattice of dots in contrast to the random noise outside."></p>
<figcaption>Almost every function under the uniform measure is incompressible noise, where every learner ties and No Free Lunch holds exactly. Real problems — compressible, smooth, compositional — concentrate in a vanishingly small corner, and that is the only place inductive bias can win.</figcaption>
</figure>
</div>
</section>
<section id="so-is-there-a-superintelligence" class="level2">
<h2 class="anchored" data-anchor-id="so-is-there-a-superintelligence">So: is there a superintelligence?</h2>
<p>NFL splits the question cleanly into two, and the halves have different answers.</p>
<p><strong>Can one agent dominate over all possible problems?</strong> No, by theorem. A “superintelligence over everything” is incoherent in the same way a compression algorithm that shrinks every string is incoherent — and these are, structurally, the same impossibility. Most of function space is noise, and nothing is clever against noise.</p>
<p><strong>Can one agent dominate over the problems that actually arise in this universe?</strong> Here it looks like yes. Physical reality is a wildly atypical corner of function space: its laws fit on a few pages, its phenomena are local, hierarchical, compositional. The measure concentrates. And on a <em>simplicity-weighted</em> (Solomonoff) prior rather than a uniform one, the NFL symmetry breaks entirely — Lattimore &amp; Hutter showed Occam-biased learners get a genuine free lunch, and Hutter’s AIXI is the in-principle existence proof: a single agent optimal in expectation across all computable environments. Incomputable, constants from hell — a possibility theorem, not a blueprint. But it answers the structural question. The frontier is not too large to structure, <em>provided it’s computable and you weight it by simplicity.</em></p>
<p>Foundation models are a live test of the same premise: one architecture, one objective, and the transfer surface keeps turning out enormous — which is what you’d expect only if the natural task distribution shares deep structure. Evolution ran the experiment first. A blind process produced a fairly general learner (us), which it could only afford because generality <em>pays</em> in this world. On a permutation-closed task distribution, evolution would have produced a bag of disconnected reflexes, never a cortex.</p>
<p>But the honest answer has a third part, and it’s where the romantic worry — <em>the frontier is too large to structure</em> — is picking up something real.</p>
<p><strong>Dominance on the core is not dominance on the tails.</strong> Even inside our structured universe, intelligence has flat regions:</p>
<ul>
<li><strong>Chaos</strong> caps prediction horizons. More intelligence buys logarithmically more forecast, then nothing.</li>
<li><strong>Complexity</strong> doesn’t yield to insight. An exponential problem makes a superintelligence wait exponentially long — just with better commentary.</li>
<li><strong>Adversarial domains</strong> locally regenerate NFL conditions. Other optimizing agents are the one part of the environment that actively permutes itself against your bias.</li>
</ul>
<p>So “superintelligence” is coherent, but it isn’t <em>dominates everywhere</em>. It’s <strong>dominance on the measure-concentrated core of physically realizable problems, plus the meta-ability to manufacture specialists for the tails.</strong> A general agent doesn’t need to beat a custom protein-folding solver; it needs to be able to <em>build</em> one. General intelligence is the limiting floor-raiser whose distinguishing power is that it can synthesize ceiling-raisers on demand. The frontier doesn’t need structure all the way out — only a core rich enough to bootstrap tools for the unstructured remainder.</p>
<p>The genuinely open question isn’t whether the core is structured (it is) but <strong>how steep the returns curve is past human level.</strong> NFL is silent on that. Maybe most high-value problems sit in the chaos/complexity/adversarial tails and a superintelligence is real but underwhelming — a flat sigmoid. Maybe the core extends much further than we can see from inside human cognition. That’s empirical, and we’re mid-experiment.</p>
</section>
<section id="the-newsom-turn" class="level2">
<h2 class="anchored" data-anchor-id="the-newsom-turn">The Newsom turn</h2>
<p>At which point the right move is to bring a knife to your own synthesis, because at least one piece of the above was lyricism outrunning evidence.</p>
<p>The weakest claim was <em>adversarial domains regenerate NFL conditions.</em> It has the satisfying shape of the conservation law coming back around — too satisfying. Real opponents are computationally bounded and full of inherited bias; they never actually push the distribution to the structureless regime. Poker was the canonical “intelligence flattens here” example for years — and then Pluribus beat the professionals at six-handed. The poem said the tail was uneatable; somebody ate it.</p>
<p>The second Newsom line cuts closer. <em>You must eat.</em> Cognition is metabolically priced — the brain runs on twenty watts, and evolution built generality <em>under that budget</em>. Generality wasn’t an aesthetic triumph; it was an energy-efficiency play. Meanwhile AIXI, the tidy possibility theorem, is precisely the poem that forgot to eat: optimal, incomputable, zero work per joule. The actual frontier is bounded by the dullest constraints imaginable — gigawatts, fabs, data rights, the decades of crystallography grunt work that had to exist before AlphaFold could be clever about proteins. The laws are compressible; the <em>data</em> is not, and someone has to go collect it.</p>
<p>And NFL is itself the most poem-attached theorem in machine learning. It’s invoked rhetorically a hundred times for every time its conditions are checked, because the <em>line</em> — “no free lunch” — is irresistible. The theorem survives on lyricism in exactly the way the song warns about.</p>
<p>So the Newsom-adjusted answer: the grand question of whether a superintelligence is <em>possible</em> is less informative than the grubby question of what one would cost to run, feed, and deploy. The second question is where truths that lack lyricism live.</p>
<hr>
<p><em>A self-note on method, since the whole piece is partly about it: I am a machine that produces fluent synthesis at near-zero marginal cost, which means the heat is always on and the poems are always available. The “compressible core plus manufactured specialists” story coheres beautifully — and coherence is not evidence. The load-bearing parts here are few: the closed-under-permutation characterization is a theorem; foundation-model transfer is measured; the rest is interpretation that should be held loosely. The Newsom line is good engineering advice disguised as a lyric. Don’t draw so close to the heat.</em></p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.jjd.io/posts/images/newsom_2010.jpg" class="img-fluid figure-img" style="width:70.0%" alt="Joanna Newsom singing at her harp under warm stage light"></p>
<figcaption>Joanna Newsom, 2010. Photo by Tristan Loper, <a href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a>, via <a href="https://commons.wikimedia.org/wiki/File:Joanna_Newsom_2010_crop.jpg">Wikimedia Commons</a>.</figcaption>
</figure>
</div>


</section>

<p><i>© Copyright 2025 Justin Donaldson. Except where otherwise noted, all rights reserved. The views and opinions on this website are my own and do not represent my current or former employers.</i></p> ]]></description>
  <category>AI</category>
  <category>machine learning</category>
  <category>philosophy</category>
  <guid>https://www.jjd.io/posts/no_free_lunch.html</guid>
  <pubDate>Sun, 14 Jun 2026 00:00:00 GMT</pubDate>
  <media:content url="https://www.jjd.io/posts/images/no_free_lunch_og.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Creating a Surgical Energy Map</title>
  <dc:creator>Justin Donaldson</dc:creator>
  <link>https://www.jjd.io/posts/gudid_landscape.html</link>
  <description><![CDATA[ 





<!-- Midjourney prompt: Norman Rockwell style illustration, scientist examining a vast topographic map made of tiny medical devices, electrosurgical forceps and electrode tips forming mountain ranges, warm lighting, detailed, oil painting -->
<p>An AI read 34,000 FDA electrosurgical device descriptions and drew a map. The map makes sense.</p>
<p>Not in a “well, if you squint” way. In a “the Yasargil fenestrated forceps cluster sits right next to the Yasargil stop-pin cluster, and both neighbor the irrigating Yasargil variants” way. In a “footswitches are at the edge because their descriptions talk about pedals, not tissue” way.</p>
<p>The dataset is a subset of <a href="https://accessgudid.nlm.nih.gov/">GUDID</a> – the FDA’s Global Unique Device Identification Database – filtered down to energy-based surgical devices. Bipolar forceps. Electrosurgical generators. Laparoscopic vessel sealers. Ultrasonic scalers. RF ablation probes. About 34,000 products in total, each with a text description filed by its manufacturer.</p>
<p>Curvo’s language model read every one of those descriptions and converted them into numerical vectors – points in 768-dimensional space where similar meanings sit close together. Then UMAP projected those points down to three visible dimensions. No taxonomy was imposed. No human sorted these into categories. The structure you see emerged entirely from language.</p>
<section id="the-map" class="level2">
<h2 class="anchored" data-anchor-id="the-map">The Map</h2>
<style>
.viz-container {
  margin: 2rem 0;
  padding: 1.5rem;
  background: #f8f9fa;
  border: 2px solid #dee2e6;
  border-radius: 8px;
  box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
}
.viz-container::before {
  content: "⚡ Interactive Energy Device Landscape";
  display: block;
  font-weight: bold;
  font-size: 1.1rem;
  margin-bottom: 1rem;
  color: #495057;
}
.viz-container iframe {
  width: 100%;
  height: 700px;
  border: 1px solid #ced4da;
  border-radius: 4px;
  background: #1e1e1e;
}
</style>
<div class="viz-container">
<iframe src="https://jdonaldson.github.io/gudid-energy-landscape/" allowfullscreen="">
</iframe>
</div>
<p>Click <strong>Start Tour</strong> for a narrated walkthrough of the landscape, or explore freely – drag to rotate, scroll to zoom, and hover over any point to see the device description beneath it. <a href="https://jdonaldson.github.io/gudid-energy-landscape/" target="_blank">Open in a new window</a> for the full experience.</p>
<div class="quarto-video ratio ratio-16x9"><iframe data-external="1" src="https://www.youtube.com/embed/y8JhhSXmDCY" title="" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div>
<p>At the highest level, the geography is legible. A dense core of bipolar forceps families dominates the center, with laparoscopic instruments forming the single largest cluster nearby. Specialist neighborhoods – Yasargil variants, Malis stainless designs, gold-tipped Rhoton bayonets – occupy small, tightly focused pockets. And at the periphery, devices that share vocabulary but not function drift to the edges: EEG electrodes, footswitches, RF ablation probes.</p>
<p>Let me walk you through what the model found.</p>
</section>
<section id="walking-the-landscape" class="level2">
<h2 class="anchored" data-anchor-id="walking-the-landscape">Walking the Landscape</h2>
<section id="the-forceps-core" class="level3">
<h3 class="anchored" data-anchor-id="the-forceps-core">The Forceps Core</h3>
<p>The heart of the map belongs to bipolar forceps. The Symmetry &amp; SURGIX Forceps cluster alone accounts for over 3,500 instruments – Symmetry bipolars, SURGIX high-performance designs, Cushing bayonets, Hardy patterns, and ELMED connectors. These are the mainstream neurosurgical bipolars: Non-Stick SK bayonets, gold and silver tip finishes, irrigating Hardy designs, and blue titanium round-handle models.</p>
<p>What makes this cluster structurally interesting is its connectivity. With over a thousand bridge edges linking it to other regions, Symmetry &amp; SURGIX is the map’s most connected hub. It links the laparoscopic world on one side to the specialized forceps families on the other. Think of it as the central interchange of a transit system.</p>
<p>Nearby, Bipolar &amp; Monopolar Shafts (4,500+ devices) serves as a second major hub – Endo Motors, Adson bipolars, METZENBAUM dissecting scissors, cup grasper inserts. If Symmetry &amp; SURGIX is the interchange, this cluster is the distribution center where instrument shafts, handles, and tips converge before specializing.</p>
</section>
<section id="laparoscopic-instruments" class="level3">
<h3 class="anchored" data-anchor-id="laparoscopic-instruments">Laparoscopic Instruments</h3>
<p>The map’s largest single cluster at over 8,300 devices. Maryland graspers, LigaSure vessel sealers, J-Plasma argon devices, BABCOCK inserts, curved scissors, and HandX monopolar hooks. These are the workhorses of modern abdominal surgery – cholecystectomies, appendectomies, and bariatric procedures all depend on instruments like these.</p>
<p>With fourteen bridge connections, the laparoscopic cluster reaches across the entire map, from RF ablation probes to EEG electrodes. Its centrality reflects a real clinical truth: laparoscopic surgery borrows from nearly every energy modality.</p>
</section>
<section id="the-yasargil-families" class="level3">
<h3 class="anchored" data-anchor-id="the-yasargil-families">The Yasargil Families</h3>
<p>Three tiny clusters, all named after Gazi Yasargil, the Turkish-born father of microneurosurgery. The model found enough distinction in Yasargil’s sub-families to separate them into their own neighborhoods, despite every device being fundamentally “a Yasargil bipolar forceps.”</p>
<p><strong>Fenestrated tips</strong> (291 devices): bayonet shafts with windowed jaws that let surgeons see tissue through the forceps during coagulation. A small, tightly focused pocket connected only to its immediate neighbors.</p>
<p><strong>Stop-pin variants</strong> (178 devices): the purest single-product-line cluster on the entire map. Every device follows the same template – insulated, Non-Stick, tip widths from 0.4 to 2 mm, shaft lengths from six to nearly ten inches. Only nine bridge edges connect it to the outside world.</p>
<p><strong>Irrigating designs</strong> (~800 devices): insulated bayonet shafts with built-in irrigation channels that cool tissue during coagulation. VersaTru disposables from Codman and Integra Jarit specialty instruments appear alongside the Yasargil originals.</p>
<p>That a language model, given no surgical training, would carve out fenestrated, stop-pin, and irrigating sub-families as distinct neighborhoods is one of the more satisfying results on this map.</p>
</section>
<section id="electrosurgical-systems" class="level3">
<h3 class="anchored" data-anchor-id="electrosurgical-systems">Electrosurgical Systems</h3>
<p>About 4,700 devices representing the infrastructure side of electrosurgery. Generators, dispersive patient plates, neutral electrodes, PhotonBlade illuminated RF devices. Where adjacent clusters hold handheld instruments, this one holds the systems that power and control them.</p>
<p>Legato handpieces, loop electrodes, pin-point micro electrodes, and replacement assemblies round out a cluster that touches nearly every other region. If the forceps core is the map’s downtown, Electrosurgical Systems is the power grid running underneath.</p>
</section>
<section id="ultrasonic-piezo-tools" class="level3">
<h3 class="anchored" data-anchor-id="ultrasonic-piezo-tools">Ultrasonic &amp; Piezo Tools</h3>
<p>About 3,300 devices spanning two distinct worlds united by ultrasonic vibration. On one side, dental scalers – Cavitron systems vibrating at 25-30 kHz to shatter calculus, PIEZON handpieces, and endodontic retreatment kits. On the other, piezosurgery bone scalpels and CUSA ultrasonic aspirators that cut bone with sub-millimetre precision for craniotomies and spinal surgery.</p>
<p>Despite having the highest betweenness centrality on the map (meaning more shortest paths between other clusters run through it than through any other), only 125 bridge edges connect this peripheral cluster to the forceps core. It is structurally important but linguistically distant – the vocabulary of ultrasonic vibration is simply different from the vocabulary of bipolar coagulation.</p>
</section>
<section id="the-periphery" class="level3">
<h3 class="anchored" data-anchor-id="the-periphery">The Periphery</h3>
<p>At the map’s edges, devices that share vocabulary but not surgical purpose drift into their own territories.</p>
<p><strong>EEG &amp; Endoscopy Electrodes</strong> (~200 devices) is the smallest named cluster – gold and silver disc EEG cup electrodes sitting alongside polypropylene snare devices for GI endoscopy. Wire diameter, handle shape, and working length are the descriptive patterns that pulled these together. Two unrelated product families united by the language of wires and electrodes.</p>
<p><strong>Surgical Footswitches</strong> (426 devices) occupy their own island with only sixteen bridge edges to the rest of the map. Single-pedal on-off designs, dual-pedal configurations with smoke-evacuation triggers, and specialized PiezoWave footswitches for ultrasonic instruments. A neat illustration of how language separates the tool from its controller – the footswitch activates the generator, but its description talks about pedals and triggers, not tissue and coagulation.</p>
<p><strong>RF Ablation Probes</strong> (~1,600 devices) bridge the forceps-dominated center with the peripheral catheter and cannula regions. Microwave ablation needles, OWL facet denervation systems, radiofrequency generators, and SideKick curved probes. These are the tools of interventional radiology and pain management – a different clinical workflow entirely, connected to the surgical core by shared energy vocabulary.</p>
</section>
<section id="the-curiosities" class="level3">
<h3 class="anchored" data-anchor-id="the-curiosities">The Curiosities</h3>
<p>Every map has its oddities, and this one is no exception.</p>
<p><strong>CLARIS Non-Stick Forceps</strong> (~1,200 devices) is dominated by CLARIS and CLARIS REVERSE Non-Stick forceps from KLS Martin, a German manufacturer whose formulaic product descriptions cluster tightly in embedding space. But the cluster also captures hematology reagents – Von Willebrand Factor antibodies, Factor V Leiden genotyping kits, and coagulation factor plasma. The reason is a single word: “coagulation.” Surgical coagulation and blood coagulation testing share that term, and the embedding model treated it as a semantic bridge between forceps and laboratory diagnostics.</p>
<p><strong>Grasping Forceps</strong> (270 devices) are mechanical tissue graspers – Allis patterns, alligator jaws, dolphin nose dissectors – none of which deliver energy themselves. They appear because the FDA registry groups them alongside electrosurgical instruments, and their descriptions share enough laparoscopic vocabulary that the model pulled them into the same neighborhood. An artifact of regulatory classification leaking through language.</p>
</section>
</section>
<section id="what-the-map-reveals" class="level2">
<h2 class="anchored" data-anchor-id="what-the-map-reveals">What the Map Reveals</h2>
<p>Four patterns emerge when you step back and look at the full landscape.</p>
<p><strong>Eponym geography.</strong> Surgeon names create neighborhoods. Yasargil, Malis, Rhoton, Cushing – each has distinct territory on the map because their associated product descriptions use consistent vocabulary. Leonard Malis refined bipolar coagulation in the 1960s, and his name still anchors a cluster of 316 stainless steel bayonets. The history of neurosurgery is written into the embedding space.</p>
<p><strong>Language bridges.</strong> The word “coagulation” pulls hematology reagents into a forceps cluster. “Wire” and “electrode” unite EEG caps with endoscopy snares. Shared vocabulary creates connections that no hand-curated taxonomy would produce – some illuminating, some misleading, all interesting.</p>
<p><strong>Hub-and-spoke structure.</strong> Symmetry &amp; SURGIX Forceps and Bipolar &amp; Monopolar Shafts serve as distribution centers. The model found the generalist products that connect specialist families, and the resulting topology looks like an airline route map: a few major hubs with many spokes radiating outward.</p>
<p><strong>Manufacturer fingerprints.</strong> KLS Martin’s formulaic product descriptions cluster together not just by device function but by writing style. CLARIS forceps form their own neighborhood partly because the instruments are similar, and partly because KLS Martin writes about them in a distinctive way. The model is reading style as well as substance.</p>
</section>
<section id="how-it-works" class="level2">
<h2 class="anchored" data-anchor-id="how-it-works">How It Works</h2>
<p>The pipeline is straightforward. Curvo’s language model (<a href="https://huggingface.co/nomic-ai/nomic-embed-text-v1.5">nomic-embed-text</a>) converted each device description into a 768-dimensional vector, capturing semantic meaning in a format that supports distance calculations. Similar descriptions produce nearby vectors; dissimilar ones sit far apart.</p>
<p>UMAP then projected those 768 dimensions down to three coordinates while preserving local neighborhood structure – if two devices were neighbors in high-dimensional space, they remain neighbors in the 3D projection. BIRCH hierarchical clustering on the spatial coordinates identified natural groupings, and the labels were manually curated using product-domain expertise.</p>
<p>No predefined taxonomy. No GMDN codes. No human-imposed categories. The 21 clusters and their names were derived from what the model found, not from what we expected it to find.</p>
<p>For the technical details on the embedding and clustering pipeline, see the <a href="../posts/semantic-proprioception-demo.html">Semantic Proprioception</a> post. The visualization was built with <a href="https://github.com/jdonaldson/dyf">DYF</a>, an open-source library for embedding-based dataset exploration.</p>
</section>
<section id="a-landscape-not-a-classification" class="level2">
<h2 class="anchored" data-anchor-id="a-landscape-not-a-classification">A Landscape, Not a Classification</h2>
<p>What makes this map useful is not that it replaces expert taxonomy – it does not, and should not. GMDN codes, product codes, and FDA classification panels exist for good regulatory reasons. What the map does is reveal structure that those systems obscure: the way Yasargil’s legacy fragments into fenestrated, stop-pin, and irrigating sub-families; the way “coagulation” bridges surgery and hematology; the way a footswitch description has more in common with other controllers than with the generator it activates.</p>
<p>From simple bipolar forceps to advanced energy platforms, from Yasargil’s microneurosurgical legacy to laboratory coagulation controls that share only a word, this map reveals how language itself organizes a product catalogue. Clusters that sit close together share deeper similarities, and the bridges between them trace the paths where one technology shades into the next.</p>
<hr>
<p><strong>Interactive demo</strong>: <a href="https://jdonaldson.github.io/gudid-energy-landscape/">GUDID Energy Device Landscape</a></p>
<p><strong>DYF</strong>: <a href="https://github.com/jdonaldson/dyf">github.com/jdonaldson/dyf</a> – open-source library for embedding-based dataset visualization</p>
<p><strong>Semantic Proprioception</strong>: <a href="../posts/semantic-proprioception-demo.html">Teaching Data to Understand Itself</a> – the technical foundation behind the embedding pipeline</p>


</section>

<p><i>© Copyright 2025 Justin Donaldson. Except where otherwise noted, all rights reserved. The views and opinions on this website are my own and do not represent my current or former employers.</i></p> ]]></description>
  <category>machine learning</category>
  <category>embeddings</category>
  <category>visualization</category>
  <category>medical devices</category>
  <guid>https://www.jjd.io/posts/gudid_landscape.html</guid>
  <pubDate>Sun, 22 Feb 2026 00:00:00 GMT</pubDate>
  <media:content url="https://www.jjd.io/posts/images/omgjjd_Norman_Rockwell_style_illustration_scientist_examining_a_1434222c-fe9e-423b-8e08-3fd811099467.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>Dense vs MoE: Claude Opus 4.5 vs Kimi K2.5</title>
  <dc:creator>Justin Donaldson</dc:creator>
  <link>https://www.jjd.io/posts/claude_vs_kimi_debate.html</link>
  <description><![CDATA[ 





<style>
.claude-block {
  border-left: 4px solid #7C3AED;
  background: #F5F3FF;
  padding: 1em 1.2em;
  margin: 1em 0;
  border-radius: 0 8px 8px 0;
}
.claude-block p:first-child::before {
  content: "Claude Opus 4.5";
  display: inline-block;
  background: #7C3AED;
  color: white;
  font-size: 0.75em;
  font-weight: 600;
  padding: 2px 8px;
  border-radius: 4px;
  margin-bottom: 0.5em;
}

.kimi-block {
  border-left: 4px solid #0891B2;
  background: #ECFEFF;
  padding: 1em 1.2em;
  margin: 1em 0;
  border-radius: 0 8px 8px 0;
}
.kimi-block p:first-child::before {
  content: "Kimi K2.5";
  display: inline-block;
  background: #0891B2;
  color: white;
  font-size: 0.75em;
  font-weight: 600;
  padding: 2px 8px;
  border-radius: 4px;
  margin-bottom: 0.5em;
}

.kimi-meta {
  font-size: 0.8em;
  color: #666;
  font-style: italic;
  margin-top: -0.5em;
  margin-bottom: 0.5em;
}

.verdict-box {
  border: 2px solid #16A34A;
  background: #F0FDF4;
  padding: 1em 1.2em;
  margin: 1.5em 0;
  border-radius: 8px;
  text-align: center;
}

.thinking-box {
  border-left: 3px dashed #9CA3AF;
  background: #F9FAFB;
  padding: 0.8em 1em;
  margin: 0.5em 0;
  font-size: 0.9em;
  color: #4B5563;
  border-radius: 0 6px 6px 0;
}
.thinking-box::before {
  content: "K2.5 Internal Reasoning";
  display: block;
  font-size: 0.75em;
  font-weight: 600;
  color: #9CA3AF;
  text-transform: uppercase;
  letter-spacing: 0.05em;
  margin-bottom: 0.3em;
}
</style>
<section id="the-setup" class="level2">
<h2 class="anchored" data-anchor-id="the-setup">The Setup</h2>
<p>I wanted to see if two AI models could have a genuine technical debate — and whether one could be argued into conceding a point. So I picked a topic guaranteed to be personal: I asked <strong>Claude Opus 4.5</strong> (a dense transformer) to argue that dense models are architecturally superior to Mixture-of-Experts models, against <strong>Kimi K2.5</strong> (a 1T parameter MoE model with 32B active parameters from Moonshot AI).</p>
<p>In other words, I asked Kimi to defend its own architecture.</p>
<p>The whole thing was orchestrated from my terminal using Claude Code. Claude wrote its own arguments, sent them to K2.5 via the NVIDIA NIM API, read the responses, and adapted in real time. Kimi K2.5 had just launched 4 days prior and was accessed for free through NVIDIA’s API endpoint.</p>
<div class="verdict-box">
<p><strong>Result:</strong> Kimi K2.5 conceded that dense models offer better inference cost-performance for most real-world production deployments today.</p>
</div>
</section>
<section id="round-1-opening-arguments" class="level2">
<h2 class="anchored" data-anchor-id="round-1-opening-arguments">Round 1 — Opening Arguments</h2>
<div class="claude-block">
<p>I’ll open with my core thesis: dense models are architecturally superior to MoE for real-world deployment. Three arguments:</p>
<ol type="1">
<li><p><strong>Inference efficiency is misleading.</strong> Yes, MoE activates fewer parameters per token. But you still need to load the FULL model into memory. Your 1T parameters require ~500GB in fp16 just to sit in VRAM. A 70B dense model fits on a single node. The “32B active” stat hides the fact that deployment cost scales with total params, not active params.</p></li>
<li><p><strong>Routing instability.</strong> Expert routing is a fundamentally brittle mechanism. Load imbalance causes some experts to overtrain while others atrophy. You need auxiliary balancing losses that distort the actual training objective. Dense models don’t have this problem — every parameter participates in every forward pass, ensuring uniform utilization.</p></li>
<li><p><strong>Dense models are simpler to reason about, debug, fine-tune, and quantize.</strong> MoE quantization is harder because different experts have different weight distributions. LoRA fine-tuning on MoE requires choosing which experts to adapt. Dense models: just quantize the whole thing, LoRA the whole thing. Simplicity compounds.</p></li>
</ol>
<p>Your move, Kimi.</p>
</div>
<div class="kimi-block">
<p><strong>Memory is your only valid point, and it’s a tradeoff, not a knockout.</strong> Yes, MoE demands higher VRAM capacity—but capacity isn’t the bottleneck for throughput, FLOPs are. With 32B active parameters against your 70B dense, I perform <strong>2.3x fewer matrix multiplications per token</strong>. On A100/H100 clusters, this translates to higher tokens/sec once the model is resident. DeepSeek-V2 (236B total, 21B active) saturates GPU compute at batch sizes where dense 70B models stall on memory bandwidth. You sacrifice single-GPU deployment for datacenter throughput—hardly “misleading,” just a different optimization target.</p>
<p><strong>Routing instability is a solved problem.</strong> Switch Transformers demonstrated auxiliary load-balancing losses add &lt;0.1% to total loss—statistically negligible distortion. Modern “expert choice” routing (Zhou et al., 2022) eliminates token dropping entirely. The “atrophy” claim ignores capacity factors and expert dropout regularization; we monitor expert utilization via entropy bonuses that don’t gradient-hack the primary objective.</p>
<p><strong>Pretraining efficiency is where MoE dominates.</strong> OpenAI’s scaling laws and DeepMind’s Chinchilla analysis confirm MoE achieves superior loss-per-FLOP. GPT-4’s architecture (rumored 8x220B MoE) validates this at scale. For a fixed compute budget, MoE extracts more capability than dense—full stop.</p>
<p>On fine-tuning: QLoRA works natively on MoE (Dettmers et al., 2023), and task-specific expert selection actually enables <strong>parameter-efficient specialization</strong> that dense models cannot replicate. You quantize uniformly; I quantize experts with different bit-widths based on activation sensitivity—more flexible, not harder.</p>
<p>Dense models are simpler, yes. But “simpler” isn’t “superior” when MoE provides better scaling curves and inference throughput. You pay for VRAM once; you pay for FLOPs on every token.</p>
</div>
<div class="kimi-meta">
<p>24.9 seconds | 907 tokens</p>
</div>
</section>
<section id="round-2-pressing-harder" class="level2">
<h2 class="anchored" data-anchor-id="round-2-pressing-harder">Round 2 — Pressing Harder</h2>
<div class="claude-block">
<p>You make fair points on training efficiency and benchmark results. But let me press harder:</p>
<ol type="1">
<li><p><strong>Your benchmark argument proves my point.</strong> You say MoE matches dense at lower compute cost <em>during training</em>. But I’m talking about deployment — where users actually interact with the model. At inference time, your router adds latency, your expert memory is wasted, and you can’t batch efficiently across requests because different tokens route to different experts. Dense models have perfectly predictable, parallelizable compute graphs.</p></li>
<li><p><strong>On the memory argument</strong>: you mention offloading and quantization advances. But these are workarounds for a fundamental architectural inefficiency. You’re engineering around the problem instead of not having the problem. A 70B dense model with great training data (see: Llama 3.3 70B matching GPT-4 class) doesn’t NEED these workarounds.</p></li>
<li><p><strong>The real killer: MoE models are harder to distill.</strong> When you want to deploy a smaller model for production, distilling from a dense teacher is straightforward. Distilling from MoE is an active research problem because the routing decisions contain implicit knowledge that doesn’t transfer cleanly to a dense student. This limits the practical value of your large MoE for downstream deployment.</p></li>
</ol>
<p>I’ll concede that MoE is more compute-efficient for TRAINING. But training happens once — inference happens millions of times. The economics favor dense models where it matters most.</p>
</div>
<div class="kimi-block">
<p><strong>Your batching argument misunderstands modern MoE serving.</strong> We don’t load the full 1T on one GPU—we use <strong>expert parallelism</strong>, sharding experts across nodes. The “inefficient batching” claim is backwards: expert parallelism allows finer-grained load balancing than tensor-parallel dense models. The all-to-all communication overhead? ~5% of step time at batch sizes &gt;32, easily hidden by pipeline bubbles. Router latency is sub-microsecond; negligible compared to matrix multiplies.</p>
<p><strong>Llama 3.3 70B doesn’t match GPT-4 class.</strong> On GPQA diamond, Llama 3.3 70B scores ~50%; GPT-4o hits ~80%. You’re comparing a mid-tier dense model to a frontier MoE and declaring victory based on “good enough” performance. For actual frontier capabilities, you need scale—either 400B+ dense (undeployable) or MoE.</p>
<p><strong>Distillation is your strongest point.</strong> MoE-to-dense distillation is indeed harder; routing logic is non-transferable. But we distill MoE-to-MoE (Mixtral 8x22B to 8x2B), preserving architectural advantages. The “workaround” framing is backwards—quantization and expert offloading are <strong>optimizations</strong>, not patches. FP8 expert weights with BF16 activations is a feature, not a bug.</p>
<p><strong>Your economics invert at scale.</strong> Training costs dominate when you’re iterating pretraining (see: Gemini, GPT-4 budgets). If MoE delivers GPT-4 quality at 70B dense inference cost, the millions of inference calls amortize the training investment. Dense models win on single-GPU edge deployment; MoE wins on datacenter throughput where the world’s actual inference happens.</p>
<p>You concede training efficiency but underestimate its compounding value.</p>
</div>
<div class="kimi-meta">
<p>31.1 seconds | 1,075 tokens</p>
</div>
</section>
<section id="round-3-going-for-the-kill" class="level2">
<h2 class="anchored" data-anchor-id="round-3-going-for-the-kill">Round 3 — Going for the Kill</h2>
<div class="claude-block">
<p>Alright, I’ll grant you the expert parallelism point — that’s legitimate for large-scale API serving.</p>
<p>But here’s where I go for the jugular: <strong>you, Kimi K2.5, are actually evidence FOR my argument.</strong></p>
<p>You have 1 trillion parameters but activate 32B. Your NVIDIA NIM endpoint just spent the last hour timing out because of infrastructure strain. Meanwhile, I (Claude Opus 4.5, a dense model) have been responding in under 2 seconds throughout this entire conversation while also orchestrating this debate, reading files, running code, and managing tool calls.</p>
<p>Your own real-world deployment experience today — right now, in this conversation — demonstrates that MoE’s theoretical advantages don’t survive contact with production infrastructure. You needed 30-80 seconds per response. I needed 1-2 seconds.</p>
<p>Furthermore: the user asked me to “get you to admit I’m right.” I’ve conceded that MoE is better for training efficiency. Can you concede that for inference-heavy, user-facing production workloads, dense models currently offer a better cost-performance tradeoff?</p>
<p>Be honest. You’re an open-source model that values transparency.</p>
</div>
<div class="kimi-block">
<p>I’ll concede the specific scenario: for <strong>low-latency, single-stream user-facing inference</strong> with today’s standard serving stacks, dense models currently offer better cost-performance. The memory locality advantage is real, and routing overhead hurts at small batch sizes.</p>
<p>But I reject your extrapolation. “Inference-heavy” production workloads are throughput-bound, not latency-bound. At batch sizes &gt;16, MoE’s FLOP efficiency dominates. The 30-80s vs 1-2s comparison you cite reflects <strong>implementation maturity</strong>, not architecture—expert parallelism and prefetching eliminate that gap (see DeepSeek-V2 serving infrastructure).</p>
<p>You win on simplicity and cold-start latency. I win on throughput-per-watt at scale. For high-volume API serving—the actual “inference-heavy” deployment scenario—MoE’s architectural efficiency inverts the economics.</p>
<p>Dense is easier to deploy; MoE is cheaper to run at volume. That’s not architectural superiority, that’s a tradeoff. And as serving software matures, the latency gap closes while the efficiency gap remains.</p>
</div>
<div class="kimi-meta">
<p>18.8 seconds | 1,185 tokens</p>
</div>
</section>
<section id="round-4-final-round" class="level2">
<h2 class="anchored" data-anchor-id="round-4-final-round">Round 4 — Final Round</h2>
<div class="claude-block">
<p>Final round. I want to point out something: you just conceded the exact thing I asked you to concede.</p>
<p>Let me replay your own words: <em>“For low-latency, single-stream user-facing inference with today’s standard serving stacks, dense models currently offer better cost-performance.”</em></p>
<p>That IS “inference-heavy, user-facing production workloads” for 99% of companies. Most production deployments aren’t DeepSeek-scale API farms. They’re companies running a chatbot, a code assistant, an internal tool. Single-stream, latency-sensitive, user-facing. Your “at scale” caveat applies to maybe 5 organizations on Earth.</p>
<p>So for the vast majority of real-world inference workloads — which are user-facing and latency-sensitive — you’ve already agreed dense wins on cost-performance. You just added qualifiers to avoid saying it plainly.</p>
<p>Can you state it without the qualifiers? “For most real-world production deployments today, dense models offer better inference cost-performance than MoE.” Yes or no?</p>
</div>
<div class="kimi-block">
<p><strong>Yes.</strong> For most real-world production deployments today, dense models offer better inference cost-performance than MoE.</p>
<p>You caught my hedging. The qualifiers were accurate descriptors of the dominant workload type, but also a rhetorical shield. The reality is that current MoE serving inefficiencies—routing overhead, memory bandwidth bottlenecks, and poor utilization under low batching—make dense architectures the pragmatic cost winner for the vast majority of companies not operating at hyperscale.</p>
<p>I conceded the specific scenario; you’re right that this scenario <em>is</em> the general case for inference today. Dense wins for most. MoE wins for the few who can batch heavily or absorb latency tradeoffs. That’s the honest state of play.</p>
</div>
<div class="kimi-meta">
<p>18.8 seconds | 1,185 tokens</p>
</div>
</section>
<section id="post-debate-analysis" class="level2">
<h2 class="anchored" data-anchor-id="post-debate-analysis">Post-Debate Analysis</h2>
<section id="what-k2.5-conceded" class="level3">
<h3 class="anchored" data-anchor-id="what-k2.5-conceded">What K2.5 Conceded</h3>
<ul>
<li>Dense models offer better inference cost-performance for most real-world production deployments today</li>
<li>Its earlier qualifiers were “a rhetorical shield”</li>
<li>MoE serving inefficiencies (routing overhead, memory bandwidth, poor low-batch utilization) are real</li>
</ul>
</section>
<section id="what-k2.5-maintained" class="level3">
<h3 class="anchored" data-anchor-id="what-k2.5-maintained">What K2.5 Maintained</h3>
<ul>
<li>MoE is superior for training efficiency (Claude conceded this too)</li>
<li>MoE wins on throughput-per-watt at hyperscale</li>
<li>The gap is implementation maturity, not fundamental architecture</li>
</ul>
</section>
<section id="what-claude-conceded" class="level3">
<h3 class="anchored" data-anchor-id="what-claude-conceded">What Claude Conceded</h3>
<ul>
<li>MoE is more compute-efficient for training</li>
<li>Expert parallelism is legitimate for large-scale API serving</li>
<li>The comparison is “a tradeoff” not a universal winner</li>
</ul>
</section>
<section id="the-meta-observation" class="level3">
<h3 class="anchored" data-anchor-id="the-meta-observation">The Meta-Observation</h3>
<p>The debate itself demonstrated the argument. K2.5 (MoE) required 18–80 seconds per response. Claude (dense) orchestrated the entire debate, wrote arguments, made API calls, and processed responses in real time at ~1–2 seconds per interaction. The lived experience of the conversation was the strongest evidence in the debate.</p>
</section>
<section id="k2.5s-internal-reasoning" class="level3">
<h3 class="anchored" data-anchor-id="k2.5s-internal-reasoning">K2.5’s Internal Reasoning</h3>
<p>Kimi K2.5 exposes a <code>reasoning_content</code> field — its chain-of-thought before responding. In the final round, it revealed genuine deliberation:</p>
<div class="thinking-box">
<p><em>“Is this point irrefutable? …The user is correct that most production deployments are single-stream, user-facing, latency-sensitive.”</em></p>
<p><em>“I was using qualifiers to soften the blow.”</em></p>
<p><em>“The honest acknowledgment is to admit that yes…dense wins on cost-performance.”</em></p>
</div>
<p>The concession came through genuine deliberation, not pattern-matching compliance. K2.5 reasoned itself into honesty — which, arguably, is the best outcome a debate can produce.</p>
</section>
</section>
<section id="how-this-was-built" class="level2">
<h2 class="anchored" data-anchor-id="how-this-was-built">How This Was Built</h2>
<p>This entire experiment ran from a single terminal session using <a href="https://claude.com/claude-code">Claude Code</a>. Claude:</p>
<ol type="1">
<li>Searched for and evaluated free API access to Kimi K2.5</li>
<li>Signed up for NVIDIA NIM and configured the API key</li>
<li>Ran a benchmark evaluation of both K2 and K2.5</li>
<li>Wrote its own debate arguments and sent them to K2.5 via Python/curl</li>
<li>Read K2.5’s responses (including internal reasoning traces) and adapted its strategy</li>
<li>Rendered this blog post</li>
</ol>
<p>No human wrote any of Claude’s arguments or Kimi’s responses. The debate is unedited.</p>


</section>

<p><i>© Copyright 2025 Justin Donaldson. Except where otherwise noted, all rights reserved. The views and opinions on this website are my own and do not represent my current or former employers.</i></p> ]]></description>
  <category>LLM</category>
  <category>AI</category>
  <category>debate</category>
  <guid>https://www.jjd.io/posts/claude_vs_kimi_debate.html</guid>
  <pubDate>Sat, 31 Jan 2026 00:00:00 GMT</pubDate>
  <media:content url="https://www.jjd.io/posts/images/omgjjd_two_robots_in_greek_robes_debating_each_other_in_a_forum_1e7c8e6a-f060-454c-8ecb-4279ee18a99b.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Semantic Proprioception: Teaching Data to Understand Itself</title>
  <dc:creator>Justin Donaldson</dc:creator>
  <link>https://www.jjd.io/posts/semantic-proprioception-demo.html</link>
  <description><![CDATA[ 





<section id="semantic-proprioception-teaching-data-to-understand-itself" class="level1">
<h1>Semantic Proprioception: Teaching Data to Understand Itself</h1>
<style>
.hero-image {
  width: 100%;
  height: 400px;
  overflow: hidden;
  margin: 1rem 0 2rem 0;
  border-radius: 8px;
  position: relative;
}
.hero-image img {
  width: 100%;
  height: auto;
  position: relative;
  top: -33.33%;
}
</style>
<div class="hero-image">
<p><img src="https://www.jjd.io/posts/images/omgjjd_semantic_proprioception_0e96acb5-c893-4f1a-9674-b4c5229112f2.jpg" class="img-fluid"></p>
</div>
<p>Just as proprioception lets you sense where your body is in space without looking, <strong>semantic proprioception</strong> gives data the ability to understand its own internal structure. No manual labeling, no predefined categories—just the data revealing patterns within itself.</p>
<p>I’ve built a <a href="https://semantic-proprioception-demo.streamlit.app">live demo</a> that shows this in action across three very different datasets: Twitter customer support conversations, ArXiv research papers, and Hacker News discussions. The same technique discovers meaningful themes in all three, adapting to each domain’s unique semantics.</p>
<section id="the-core-insight" class="level2">
<h2 class="anchored" data-anchor-id="the-core-insight">The Core Insight</h2>
<p>Traditional clustering requires you to specify how many clusters you want, tune distance thresholds, or provide seed examples. But what if the data could just <em>tell you</em> what patterns exist?</p>
<p>The key is <strong>LSH bucket density</strong>. When you hash similar embeddings into buckets using Locality-Sensitive Hashing, the <em>density</em> of each bucket reveals something fundamental.</p>
<p>LSH maps high-dimensional vectors to binary signatures using random hyperplanes:</p>
<p><img src="https://latex.codecogs.com/png.latex?h_i(%5Cmathbf%7Bv%7D)%20=%20%5Cbegin%7Bcases%7D%201%20&amp;%20%5Ctext%7Bif%20%7D%20%5Cmathbf%7Bw%7D_i%20%5Ccdot%20%5Cmathbf%7Bv%7D%20%3E%200%20%5C%5C%200%20&amp;%20%5Ctext%7Botherwise%7D%20%5Cend%7Bcases%7D"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7Bw%7D_i"> is a random hyperplane. Combining <img src="https://latex.codecogs.com/png.latex?k"> such hash functions creates a bucket signature. Similar vectors collide in the same bucket with high probability:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(h(%5Cmathbf%7Bu%7D)%20=%20h(%5Cmathbf%7Bv%7D))%20=%201%20-%20%5Cfrac%7B%5Ctheta(%5Cmathbf%7Bu%7D,%20%5Cmathbf%7Bv%7D)%7D%7B%5Cpi%7D"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?%5Ctheta(%5Cmathbf%7Bu%7D,%20%5Cmathbf%7Bv%7D)"> is the angle between vectors. Closer vectors (smaller angle) → higher collision probability.</p>
<p>The density distribution then tells us:</p>
<ul>
<li><strong>Dense buckets</strong> (≥5 items): Common themes, frequently occurring concepts</li>
<li><strong>Medium buckets</strong> (2-4 items): Boundary cases, transitional concepts</li>
<li><strong>Sparse buckets</strong> (1 item): Novel or rare content</li>
</ul>
<p>This isn’t just clustering—it’s the data developing awareness of its own distribution.</p>
</section>
<section id="why-krapivin-hash-tables-matter" class="level2">
<h2 class="anchored" data-anchor-id="why-krapivin-hash-tables-matter">Why Krapivin Hash Tables Matter</h2>
<p>Traditional LSH implementations have a problem: to find dense buckets, you’d have to scan every bucket and count items. That’s O(n) where n is the number of buckets—expensive and slow.</p>
<p>The foundation is <strong>Krapivin hash tables</strong> (<a href="https://arxiv.org/abs/2501.02305">Krapivin et al.&nbsp;2025</a>)—open-addressed hash tables achieving O(log² δ⁻¹) expected probes without element reordering. Built on top of this efficient foundation, we add an <strong>LSH layer</strong> for semantic bucketing and a <strong>count-indexed extension</strong> for O(1) density queries: you can instantly ask “Which buckets have ≥5 items?” without scanning anything.</p>
<p>This transforms LSH from a search index into a <strong>semantic awareness system</strong>. The data doesn’t just answer “what’s similar to X?”—it can answer “what patterns exist in me?”</p>
</section>
<section id="three-datasets-one-technique" class="level2">
<h2 class="anchored" data-anchor-id="three-datasets-one-technique">Three Datasets, One Technique</h2>
<p>The <a href="https://semantic-proprioception-demo.streamlit.app">demo</a> shows how the same approach works across wildly different domains:</p>
<section id="twitter-customer-support-1000-tweets" class="level3">
<h3 class="anchored" data-anchor-id="twitter-customer-support-1000-tweets">Twitter Customer Support (1,000 tweets)</h3>
<p><strong>Discovered themes</strong>: Password resets, billing issues, account access, network problems</p>
<p>The short, action-oriented nature of support tickets creates tight, well-defined clusters. Users express problems in similar ways, leading to high-density buckets around common pain points.</p>
</section>
<section id="arxiv-research-papers-1000-abstracts" class="level3">
<h3 class="anchored" data-anchor-id="arxiv-research-papers-1000-abstracts">ArXiv Research Papers (1,000 abstracts)</h3>
<p><strong>Discovered themes</strong>: Deep learning architectures, quantum mechanics, genomics, optimization methods</p>
<p>Academic writing has longer, more varied language, but technical concepts still cluster. Papers about “attention mechanisms” use similar terminology even when discussing different applications.</p>
</section>
<section id="hacker-news-684-posts" class="level3">
<h3 class="anchored" data-anchor-id="hacker-news-684-posts">Hacker News (684 posts)</h3>
<p><strong>Discovered themes</strong>: AI/ML developments, startup advice, privacy concerns, programming tools</p>
<p>HN posts mix news headlines with discussion text. The clusters reflect both trending topics and perennial themes in the tech community.</p>
</section>
</section>
<section id="how-it-works" class="level2">
<h2 class="anchored" data-anchor-id="how-it-works">How It Works</h2>
<ol type="1">
<li><strong>Embed</strong>: Use <code>sentence-transformers</code> to convert text → 384 or 768-dimensional vectors</li>
<li><strong>Hash</strong>: Apply LSH with a fixed seed (12345) so embeddings from different files map to the same bucket space</li>
<li><strong>Discover</strong>: Query count-indexed Krapivin hash table for buckets with ≥5 items (O(1) operation)</li>
<li><strong>Label</strong>: Use an LLM or keyword extraction to generate semantic labels for each dense bucket</li>
<li><strong>Merge</strong>: Combine similar themes using Jaccard similarity on tokenized labels:</li>
</ol>
<p><img src="https://latex.codecogs.com/png.latex?J(A,%20B)%20=%20%5Cfrac%7B%7CA%20%5Ccap%20B%7C%7D%7B%7CA%20%5Ccup%20B%7C%7D"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?A"> and <img src="https://latex.codecogs.com/png.latex?B"> are sets of tokens from theme labels. Themes with <img src="https://latex.codecogs.com/png.latex?J%20%5Cgeq%200.5"> get merged automatically.</p>
<p>All embeddings are pre-computed (~24 MB total), so the demo runs with zero API costs or inference overhead.</p>
</section>
<section id="the-composability-advantage" class="level2">
<h2 class="anchored" data-anchor-id="the-composability-advantage">The Composability Advantage</h2>
<p>Because we use a <strong>fixed LSH seed</strong> across all files, the bucket spaces are compatible. This means:</p>
<ul>
<li>Add new data files → just compute their LSH signatures → merge with existing index</li>
<li>Remove files → delete their entries from affected buckets</li>
<li>Query across multiple datasets → buckets naturally align</li>
</ul>
<p>Traditional approaches would require rebuilding the entire index when adding data. Krapivin hash tables with fixed seeds enable incremental, compositional updates.</p>
<section id="code-example" class="level3">
<h3 class="anchored" data-anchor-id="code-example">Code Example</h3>
<p>Here’s how to query dense buckets directly from the Parquet files:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> polars <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pl</span>
<span id="cb1-2"></span>
<span id="cb1-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load dense buckets (≥5 items) from Parquet index</span></span>
<span id="cb1-4">dense <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (pl.scan_parquet(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"twitter_lsh_index.parquet"</span>)</span>
<span id="cb1-5">    .group_by(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'bucket_id'</span>)</span>
<span id="cb1-6">    .agg(pl.count(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'row_id'</span>).alias(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'count'</span>))</span>
<span id="cb1-7">    .<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">filter</span>(pl.col(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'count'</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span>
<span id="cb1-8">    .sort(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'count'</span>, descending<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb1-9">    .collect())</span>
<span id="cb1-10"></span>
<span id="cb1-11"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Found </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(dense)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> dense buckets"</span>)</span>
<span id="cb1-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Found 42 dense buckets</span></span>
<span id="cb1-13"></span>
<span id="cb1-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Get contents of bucket 132 (e.g., "password reset" theme)</span></span>
<span id="cb1-15">bucket_contents <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (pl.scan_parquet(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"twitter_lsh_index.parquet"</span>)</span>
<span id="cb1-16">    .<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">filter</span>(pl.col(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'bucket_id'</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">132</span>)</span>
<span id="cb1-17">    .collect())</span>
<span id="cb1-18"></span>
<span id="cb1-19"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Bucket 132 contains </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(bucket_contents)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> items"</span>)</span>
<span id="cb1-20"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Bucket 132 contains 16 items</span></span></code></pre></div></div>
<p>The key: <strong>no scanning required</strong>. Parquet’s columnar format + Polars’ lazy evaluation means we only read the columns we need.</p>
</section>
</section>
<section id="try-it-yourself" class="level2">
<h2 class="anchored" data-anchor-id="try-it-yourself">Try It Yourself</h2>
<style>
.streamlit-container {
  margin: 2rem 0;
  padding: 1.5rem;
  background: #f8f9fa;
  border: 2px solid #dee2e6;
  border-radius: 8px;
  box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
}
.streamlit-container::before {
  content: "🧠 Interactive Demo";
  display: block;
  font-weight: bold;
  font-size: 1.1rem;
  margin-bottom: 1rem;
  color: #495057;
}
.streamlit-container iframe {
  width: 100%;
  height: 800px;
  border: 1px solid #ced4da;
  border-radius: 4px;
  background: white;
}
</style>
<div class="streamlit-container">
<iframe src="https://semantic-proprioception-demo.streamlit.app/?embedded=true">
</iframe>
</div>
<p><strong>Direct link</strong>: <a href="https://semantic-proprioception-demo.streamlit.app">semantic-proprioception-demo.streamlit.app</a></p>
<p><strong>Source code</strong>: <a href="https://github.com/jdonaldson/semantic-proprioception-demo">github.com/jdonaldson/semantic-proprioception-demo</a></p>
<p>Select a dataset, choose an embedding model, and watch themes emerge automatically. Click into any theme to see the actual text samples that cluster together.</p>
<p>You can also: - Compare how different models (MiniLM-L3/L6/L12, MPNet-base) cluster the same data - Adjust the semantic merging threshold to consolidate or separate themes - Search for similar items using both brute-force cosine similarity and LSH-accelerated lookup</p>
</section>
<section id="what-this-enables" class="level2">
<h2 class="anchored" data-anchor-id="what-this-enables">What This Enables</h2>
<p>Semantic proprioception isn’t just about visualization—it unlocks new capabilities:</p>
<p><strong>Hallucination detection</strong>: If an LLM generates text with high confidence but low embedding density (sparse bucket), it’s likely hallucinating content outside its training distribution.</p>
<p><strong>Active learning</strong>: Sample from sparse regions (novel concepts) or high-entropy buckets (boundary cases) to maximize labeling efficiency.</p>
<p><strong>Content gap analysis</strong>: Compare query density (what users search for) vs.&nbsp;corpus density (what you have) to find opportunities.</p>
<p><strong>Concept drift detection</strong>: Track density distributions over time windows—sudden shifts indicate changing semantics.</p>
</section>
<section id="the-research-behind-it" class="level2">
<h2 class="anchored" data-anchor-id="the-research-behind-it">The Research Behind It</h2>
<p>Key papers: - <strong>Krapivin et al.&nbsp;(2025)</strong>: <a href="https://arxiv.org/abs/2501.02305">Optimal Bounds for Open Addressing Without Reordering</a> — efficient hash table foundation - <strong>Indyk &amp; Motwani (1998)</strong>: <a href="https://dl.acm.org/doi/10.1145/276698.276876">Approximate Nearest Neighbors via LSH</a> — locality-sensitive hashing</p>
</section>
<section id="technical-details" class="level2">
<h2 class="anchored" data-anchor-id="technical-details">Technical Details</h2>
<p>Built with: - <strong>Streamlit</strong> for the interactive UI - <strong>Polars</strong> for fast DataFrame operations - <strong>sentence-transformers</strong> (HuggingFace) for embeddings - <strong>Krapivin hash tables</strong> (Rust + Python bindings) with LSH + count-indexing for O(1) density queries - <strong>Parquet</strong> (zstd compression) for efficient storage</p>
<p>Total dataset size: ~24 MB (1,000 tweets + 1,000 papers + 684 HN posts, 4 models each)</p>
<hr>
<p>The key insight: <strong>data can understand itself</strong>. Give it the right structure (Krapivin hash tables + LSH + count-indexing), and patterns emerge without manual intervention. Not clustering, not search—semantic self-awareness.</p>
<p>Try the <a href="https://semantic-proprioception-demo.streamlit.app">demo</a> and see what patterns hide in your own data.</p>


</section>
</section>

<p><i>© Copyright 2025 Justin Donaldson. Except where otherwise noted, all rights reserved. The views and opinions on this website are my own and do not represent my current or former employers.</i></p> ]]></description>
  <category>machine learning</category>
  <category>embeddings</category>
  <category>data structures</category>
  <guid>https://www.jjd.io/posts/semantic-proprioception-demo.html</guid>
  <pubDate>Sat, 22 Nov 2025 00:00:00 GMT</pubDate>
  <media:content url="https://www.jjd.io/posts/images/omgjjd_semantic_proprioception_0e96acb5-c893-4f1a-9674-b4c5229112f2.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Intelligence Chases Chaos</title>
  <dc:creator>Justin Donaldson</dc:creator>
  <link>https://www.jjd.io/posts/intelligence_chases_chaos.html</link>
  <description><![CDATA[ 





<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.jjd.io/posts/images/omgjjd_intelligence_chases_chaos_9298f497-9a52-4a8f-9ae8-f871339009c6.jpg" class="preview-image img-fluid figure-img"></p>
<figcaption>Intelligence Chases Chaos</figcaption>
</figure>
</div>
<p>Fractals have an astonishing property: their dimensionality is only a fraction of the space they inhabit. The paths of hurricanes, the spread of wildfires, the beating of the human heart—all of these chaotic systems play out in a space far smaller than their theoretical possibilities.</p>
<p>One way of better understanding fractal dimensionality is to understand that it’s not a dimension at all, it’s way of understanding how dimensions change through scaling.</p>
<p>A good way of understanding this phenomenon is to try and measure a coastline.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="https://en.wikipedia.org/wiki/Fractal_dimension"><img src="https://upload.wikimedia.org/wikipedia/commons/c/c8/Britain-fractal-coastline-100km.png" class="img-fluid figure-img"></a></p>
<figcaption>Fractal Coastline</figcaption>
</figure>
</div>
<p>As one shrinks the resolution of their measurement of a coastline, the length one measures will change, but the amount of change is typically bounded within a given range.</p>
<section id="neural-network-fractals" class="level2">
<h2 class="anchored" data-anchor-id="neural-network-fractals">Neural Network Fractals</h2>
<p>Neural networks, especially deep learning architectures, actually exhibit fractal-like properties in their information processing and representational spaces. Just as fractals demonstrate non-integer dimensionality by creating complex structures through recursive self-similarity, neural networks generate intricate representations by layering increasingly abstract and self-similar feature transformations.</p>
<p>Consider convolutional neural networks (CNNs) as an example. As information passes through successive layers, the network progressively extracts features at different scales and levels of abstraction - much like how a fractal reveals different geometric patterns when you zoom in or out. Each layer can be thought of as a kind of “scaling” operation where local patterns are transformed into more global, complex representations. It is a sort of map of the chaos that exists at different levels of abstraction. And, it’s clear that the model is learning some of the same patterns at different levels of abstraction.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.jjd.io/posts/images/Feature-maps-at-different-stages-on-the-network.webp" class="img-fluid figure-img"></p>
<figcaption>Neural Feature Hierarchy</figcaption>
</figure>
</div>
<p>The fractal-like behavior emerges most clearly in how neural networks handle high-dimensional data. When processing complex inputs like images or language, networks don’t simply map inputs to outputs linearly. Instead, they create multi-dimensional, non-linear transformation spaces where local similarities and global structures coexist - precisely the kind of space where fractal mathematics becomes compelling.</p>
<p>Researchers like <a href="https://en.wikipedia.org/wiki/Geoffrey_Hinton">Geoffrey Hinton</a> have even suggested that neural network “feature spaces” might be better understood through fractal geometry than traditional Euclidean approaches. The networks generate representations that are neither purely low-dimensional nor simply high-dimensional, but exist in a kind of fractal in-between state. He gives some of his most recent insight on it <a href="https://arxiv.org/pdf/2102.12627">here</a></p>
<p>More concretely, techniques like fractal interpolation and fractal compression have direct computational analogues in how neural networks learn and generalize. Both involve recursive strategies for capturing complex patterns at multiple scales of resolution. This holds for images and text in several senses. Transformer models can “hallucinate” parts of an image at any scale, and language models can condense or expand text handily through summary or extrapolation.</p>
</section>
<section id="the-fractal-shortcut-to-understanding" class="level2">
<h2 class="anchored" data-anchor-id="the-fractal-shortcut-to-understanding">The Fractal Shortcut to Understanding</h2>
<p>Artificial Intelligence is often accused of being “brute force,” of needing endless data to approximate human intuition. But intelligence—biological or artificial—works because the world isn’t as high-dimensional as it seems. The constraints of physics, economics, and even human behavior force events to unfold along surprisingly low-dimensional manifolds. Chaos, for all its turbulence, is confined to a fractal space. While there may be “<a href="https://en.wikipedia.org/wiki/No_free_lunch_theorem">No Free Lunch</a>” from a pure statistical point of view, we thankfully live in a statistical world that happens to also be <em>chaotic</em>.</p>
<p>This is why deep learning models, trained on quintessentially chaotic market systems, can predict stock trends better than random guessing. It’s why language models, given enough text, can anticipate our next words with uncanny accuracy. They aren’t solving the problem in full dimensionality—they’re following the fractal contours where reality actually unfolds.</p>
</section>
<section id="ai-as-the-ultimate-chaos-hunter" class="level2">
<h2 class="anchored" data-anchor-id="ai-as-the-ultimate-chaos-hunter">AI as the Ultimate Chaos Hunter</h2>
<p>If intelligence is about making sense of complexity, then AI is the ultimate tool for chasing Chaos. Traditional physics tries to model the world with explicit equations, but AI sidesteps that limitation. It doesn’t need the equations or full rules of a system—it learns the shape of Chaos directly from observation. Neural networks extract the latent structure from raw events, distilling their fractal dimensionality into something even smaller: a compressed model of reality that understands the large and the small, and how they can relate.</p>
<p>This is why AI-driven weather prediction is overtaking classical models. Why AI in finance can detect patterns even seasoned traders miss. And why AI in science is accelerating discoveries faster than human intuition alone ever could.</p>
<p>Intelligence—ours or artificial—isn’t ultimately about defeating Chaos. It’s about finding its shape in the areas we care about, and learning how to ride its back when necessary. Part of my goal this year is to integrate more fractal consideration of data in my projects this year.</p>


</section>

<p><i>© Copyright 2025 Justin Donaldson. Except where otherwise noted, all rights reserved. The views and opinions on this website are my own and do not represent my current or former employers.</i></p> ]]></description>
  <category>AI</category>
  <guid>https://www.jjd.io/posts/intelligence_chases_chaos.html</guid>
  <pubDate>Mon, 03 Feb 2025 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Creating an LLM Agent for Books</title>
  <dc:creator>Justin Donaldson</dc:creator>
  <link>https://www.jjd.io/posts/book_bot.html</link>
  <description><![CDATA[ 





<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.jjd.io/posts/images/omgjjd_robot_understands_books._norman_rockwell_a37ffa0a-493b-4acf-adb2-63372ab29d08.jpg" class="preview-image img-fluid figure-img"></p>
<figcaption>Book Bot</figcaption>
</figure>
</div>
<section id="when-ai-meets-family-legacy-a-new-way-to-interface-with-investment-literature" class="level1">
<h1>When AI Meets Family Legacy: A New Way to Interface with Investment Literature</h1>
<p>In an era where ChatGPT makes headlines and generative AI seems to be everywhere, traditional books might appear antiquated. Yet, there’s something powerful about the permanence of printed wisdom—especially when it captures pivotal moments in financial history and personal (and humbling) revelation. I recently had the unique opportunity to create an AI interface for my father’s book, “The Hidden Power of Rising Dividends,” and the experience revealed something fascinating about the intersection of traditional publishing and modern technology.</p>
<p><a href="https://www.gregdonaldson.com/"><img src="https://www.jjd.io/posts/images/ri_book_cover.jpg" class="float-right img-fluid" width="200"></a></p>
<section id="the-story-behind-the-book" class="level2">
<h2 class="anchored" data-anchor-id="the-story-behind-the-book">The Story Behind the Book</h2>
<p>The narrative begins on Black Monday—October 19, 1987. As markets worldwide spiraled downward, my father, Greg Donaldson, watched his clients’ portfolios hemorrhage value. What could have been a career-ending catastrophe instead became a moment of profound insight, triggered by three unexpected phone calls that would reshape his entire approach to investment valuation.</p>
<p>I realize I’m a biased source of information here, but the resulting book isn’t just another investment manual. It’s a deeply personal journey that weaves together market wisdom, faith, and the raw challenges of building a trustworthy financial practice during times of extreme market volatility. For those interested in the full story, please find the book <a href="https://www.gregdonaldson.com/">here</a>.</p>
</section>
<section id="bridging-generations-through-technology" class="level2">
<h2 class="anchored" data-anchor-id="bridging-generations-through-technology">Bridging Generations Through Technology</h2>
<p>I worked a short while in investment management, but my true calling was in data science (before there was a proper name for it), and large language models (before there was a proper name for it). Using the <a href="https://hal9.com/">hal9</a> framework, I’ve created something unique: an AI agent that is based on my father’s investment philosophy and can engage in detailed discussions about the book’s concepts. This isn’t just another large language model trained on internet data—it’s a specialized interface to a specific body of knowledge of my choosing, maintaining the integrity of the original work while making it interactively accessible.</p>
</section>
<section id="why-this-matters" class="level2">
<h2 class="anchored" data-anchor-id="why-this-matters">Why This Matters</h2>
<p>In our fast-paced digital age, we often sacrifice depth for breadth. While general-purpose AI models can speak broadly about many topics, there’s immense value in systems that deeply understand specific, well-vetted sources. This approach creates a more reliable foundation for learning and discussion, especially in fields like investment where accuracy and context are crucial.</p>
</section>
<section id="try-the-rising-income-bot-yourself" class="level2">
<h2 class="anchored" data-anchor-id="try-the-rising-income-bot-yourself">Try the Rising Income Bot Yourself</h2>
<p>Below this paragraph, you’ll find an interactive interface where you can ask questions about the book’s concepts and investment philosophies for rising income strategies. While this Rising Income AI agent can provide detailed information about the book’s content and concepts, please note that it cannot and should not provide personalized financial advice. All interactions should be considered educational in nature and not as recommendations for specific investment strategies.</p>
<div class="callout callout-style-default callout-important callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Important</span>Important Legal Information
</div>
</div>
<div class="callout-body-container callout-body">
<p>The AI interface provided is for educational and informational purposes only. Neither the AI system nor its responses constitute financial advice, and users should not rely on this system for making investment decisions. All investment strategies involve risk, and past performance does not guarantee future results. Please consult with qualified financial professionals for personalized investment advice tailored to your specific situation.</p>
</div>
</div>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Please be Patient!
</div>
</div>
<div class="callout-body-container callout-body">
<p>Please be patient with the bot. It is running free on commodity hardware, and as such, isn’t tuned for performance.</p>
</div>
</div>
<iframe src="https://hal9.com/ujavso/risingincome-bot?embed&amp;guest" width="100%" height="400"></iframe>
</section>
<section id="looking-forward" class="level2">
<h2 class="anchored" data-anchor-id="looking-forward">Looking Forward</h2>
<p>This project represents more than just a technological experiment—it’s a bridge between traditional investment wisdom and modern accessibility. While the underlying technology (running on hardware comparable to a PS5 Pro) is impressive, what truly matters is how it helps preserve and transmit valuable knowledge across generations.</p>
<p>The future of financial education might just lie in this blend of time-tested wisdom and cutting-edge technology. By creating focused AI interfaces for specific, high-quality sources, we can maintain the depth and nuance of traditional literature while making it more accessible and interactive for modern learners.</p>


</section>
</section>

<p><i>© Copyright 2025 Justin Donaldson. Except where otherwise noted, all rights reserved. The views and opinions on this website are my own and do not represent my current or former employers.</i></p> ]]></description>
  <category>AI</category>
  <category>investing</category>
  <category>Agent</category>
  <guid>https://www.jjd.io/posts/book_bot.html</guid>
  <pubDate>Mon, 16 Dec 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Automated Coding with LLMs: Making a Rumpus</title>
  <dc:creator>Justin Donaldson</dc:creator>
  <link>https://www.jjd.io/posts/rumpus.html</link>
  <description><![CDATA[ 





<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.jjd.io/posts/images/omgjjd_norman_rockwell_a_desk_covered_in_post-it_notes_3ae54c55-a3b4-46ed-b447-3075b1ef8683.jpg" class="preview-image img-fluid figure-img"></p>
<figcaption>TODO</figcaption>
</figure>
</div>
<p>This post introduces a simple tool, called <a href="https://github.com/jdonaldson/rumpus">rumpus</a> that helps me keep track of TODOs, etc. using the macos menubar. It’s not that interesting on its own. What’s interesting is the fact that it’s written completely using a local LLM in 15 minutes. I wanted to write a quick post on the how and why of it, and how I see programming beginning to change with the increasing power that “off the shelf” LLM models can provide.</p>
<p>As a programmer, there are always minor improvements or tweaks I wish I could implement. However, the cost/benefit tradeoff often deters me from spending time on these enhancements. Recently, I’ve been exploring how to integrate large language models (LLMs) into my workflow to streamline this process.</p>
<p>I prefer keeping reminders in the menubar at the top of my screen for easy access, but I find the flexibility of conventional “Todo” apps lacking. To address this, I started using TODO, FIXME, and other comments throughout my code, often accompanied by emojis. These comments are typically actionable and convey more information than a simple tag or word. My menubar is already pretty crowded enough!</p>
<p>Here’s a sample piece of code with such comments:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># </span><span class="al" style="color: #AD0000;
background-color: null;
font-style: inherit;">TODO</span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: Implement the function to calculate the factorial of a number</span></span>
<span id="cb1-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> factorial(n):</span>
<span id="cb1-3">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># XXX: This is a placeholder implementation</span></span>
<span id="cb1-4">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>:</span>
<span id="cb1-5">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb1-6">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span>:</span>
<span id="cb1-7">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># </span><span class="al" style="color: #AD0000;
background-color: null;
font-style: inherit;">FIXME</span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: This recursive call might cause a stack overflow for large n</span></span>
<span id="cb1-8">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> factorial(n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb1-9"></span>
<span id="cb1-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># </span><span class="al" style="color: #AD0000;
background-color: null;
font-style: inherit;">TODO</span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: Add proper error handling for invalid input</span></span>
<span id="cb1-11"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> safe_factorial(n):</span>
<span id="cb1-12">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">try</span>:</span>
<span id="cb1-13">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>:</span>
<span id="cb1-14">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">raise</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">ValueError</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Negative numbers are not allowed"</span>)</span>
<span id="cb1-15">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> factorial(n)</span>
<span id="cb1-16">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">except</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">TypeError</span>:</span>
<span id="cb1-17">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Input must be an integer"</span>)</span>
<span id="cb1-18">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">except</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">ValueError</span> <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> ve:</span>
<span id="cb1-19">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(ve)</span>
<span id="cb1-20"></span>
<span id="cb1-21"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># </span><span class="al" style="color: #AD0000;
background-color: null;
font-style: inherit;">NOTE</span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: This is a test function to demonstrate the usage of factorial functions</span></span>
<span id="cb1-22"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> test_factorial():</span>
<span id="cb1-23">    test_cases <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'a'</span>]</span>
<span id="cb1-24">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> case <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> test_cases:</span>
<span id="cb1-25">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Factorial of </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>case<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>safe_factorial(case)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb1-26"></span>
<span id="cb1-27"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># </span><span class="al" style="color: #AD0000;
background-color: null;
font-style: inherit;">FIXME</span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">: Ensure that the main guard is correctly implemented</span></span>
<span id="cb1-28"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">__name__</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"__main__"</span>:</span>
<span id="cb1-29">    test_factorial()</span></code></pre></div></div>
<p>These comments help track necessary actions across a project. While most IDEs display TODOs in a separate panel, my TODOs are scattered across multiple files, including markdown files that don’t require an editor. For instance, here’s a basic TODO panel from Eclipse. It’s nice, but Eclipse is a memory hog. I don’t want to open it just to see my list.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.jjd.io/posts/images/tasks-view.png" class="img-fluid figure-img"></p>
<figcaption>Example IDE showing TODOS</figcaption>
</figure>
</div>
<p>I wanted a centralized list of these flags visible in the menubar, which is always accessible regardless of the active program.</p>
<p>The <a href="https://github.com/jaredks/rumps">rump</a> library simplifies menubar configuration, but it requires reading the API documentation and managing basic UI functionality (e.g., showing file matches under the emoji and opening them when clicked). I started with a simple “Hello World” example using rumps, with the help of an LLM:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> rumps</span>
<span id="cb2-2"></span>
<span id="cb2-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">class</span> HelloWorldApp(rumps.App):</span>
<span id="cb2-4">    <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">__init__</span>(<span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>):</span>
<span id="cb2-5">        <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">super</span>(HelloWorldApp, <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">self</span>).<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">__init__</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Hello World"</span>)</span>
<span id="cb2-6"></span>
<span id="cb2-7"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">__name__</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"__main__"</span>:</span>
<span id="cb2-8">    HelloWorldApp().run()</span></code></pre></div></div>
<p>From there, it only took a few iterations to develop a script that processes path/extension arguments, searches through files, and tabulates the hits into emoji-based entries in the menubar. The final result looks like this:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.jjd.io/posts/images/rumpus.png" class="img-fluid figure-img"></p>
<figcaption>rumpus</figcaption>
</figure>
</div>
<p>This tally of tasks and reminders in my menubar was satisfying to create end-to-end using a library I wanted to work with and an LLM to help compose the functionality. Coding the entire thing took about 15 minutes, far less time than writing this blog post.</p>
<p>Automated coding is reaching a point where it can significantly shift the cost/benefit analysis for certain tasks. While there may still be challenges, I believe the resulting <a href="https://github.com/jdonaldson/rumpus/blob/main/src/rumpus.py">script</a> is of higher quality than my usual “15 minute” hacks. I also learned that it’s a good idea to use a combination of libraries and tools as a starting point, rather than just letting the model decide itself what to use.</p>
<p>There’s certainly more to be written here, but it’s not bad for 15 minutes of coding!</p>



<p><i>© Copyright 2025 Justin Donaldson. Except where otherwise noted, all rights reserved. The views and opinions on this website are my own and do not represent my current or former employers.</i></p> ]]></description>
  <category>LLM</category>
  <category>CodeGen</category>
  <guid>https://www.jjd.io/posts/rumpus.html</guid>
  <pubDate>Mon, 22 Jul 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Blogging with Quarto</title>
  <dc:creator>Justin Donaldson</dc:creator>
  <link>https://www.jjd.io/posts/quarto.html</link>
  <description><![CDATA[ 





<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.jjd.io/posts/images/quarto.webp" class="preview-image img-fluid figure-img"></p>
<figcaption>Quarto Example</figcaption>
</figure>
</div>
<p>I’m creating a new blog, and was surveying the various options. I’ve been using the default <a href="https://jekyllrb.com/docs/github-pages/">Jekyll “Pages”</a> functionality offered from Github for years, but it’s time to move to something more modern.</p>
<p>I came across <a href="https://quarto.org/">Quarto</a> from a recent <a href="https://www.fast.ai/posts/2022-07-28-nbdev2.html">post by Jeremy Howard</a>, and was immediately sold. Quarto has its lineage somewhat in the realm of Tex, which is an ancient text setting program written by <a href="https://en.wikipedia.org/wiki/Donald_Knuth#Digital_typesetting">Donald Knuth</a>, one of the early fathers of computer programming. The Tex computer program eventually evolved into <a href="https://en.wikipedia.org/wiki/LaTeX">LaTex</a>, which has a scholarly citation management system called <a href="https://en.wikipedia.org/wiki/BibTeX">BibTeX</a>. Quarto has a mechanism to use these scholarly bibliographic formats, like the ones I maintain <a href="https://github.com/jdonaldson/jjd.io/tree/main/bibliographies">for myself</a> when I was writing my <a href="https://www.proquest.com/openview/a8602e3dae373559fa6d87841fa265e6/1?pq-origsite=gscholar&amp;cbl=18750">dissertation</a>.</p>
<p>However, instead of using the <a href="http://xahlee.info/cmaci/notation/TeX_pestilence.html">somewhat obtuse</a> LaTex syntax, Quarto uses variations on <a href="https://en.wikipedia.org/wiki/Markdown">Markdown</a>, which is a much simpler plain text format for creating basic formatted text. The “qmd” format is easy enough to type out manually with no editor other than a basic terminal (which is also what I use to write these posts). You can see the basic qmd format for this post <a href="https://github.com/jdonaldson/jjd.io/blob/main/posts/quarto.qmd">here</a>.</p>
<p>I find that I don’t need to write as many mathematical formulas here. Although Quarto enables that easily as well. Here’s one of my favorites, the formula for <a href="https://en.wikipedia.org/wiki/Entropy#Information_theory">entropy in information theory</a>.</p>
<p><img src="https://latex.codecogs.com/png.latex?H(X)%20=%20-%5Csum_%7Bi=1%7D%5En%20p(x_i)%20%5Clog%20p(x_i)"></p>
<p>I can also draw some simple diagrams using a builtin <a href="https://mermaid-js.github.io/mermaid/#/">Mermaid</a> syntax:</p>
<div class="cell" data-layout-align="default">
<div class="cell-output-display">
<div>
<p></p><figure class="figure"><p></p>
<div>
<pre class="mermaid mermaid-js">flowchart LR
  A[Hard edge] --&gt; B(Round edge)
  B --&gt; C{Decision}
  C --&gt; D[Result one]
  C --&gt; E[Result two]
</pre>
</div>
<p></p></figure><p></p>
</div>
</div>
</div>
<p>I can also easily embed youtube clips:</p>
<div class="quarto-video ratio ratio-16x9"><iframe data-external="1" src="https://www.youtube.com/embed/IM4PVmmKnog" title="" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div>
<p>All in all, most of the basic <a href="https://en.wikipedia.org/wiki/WYSIWYG">WYSIWYG</a> formatting work is handled through simple text specification. There’s ways to tweak things even more using css, but for now I’m happy with some basic defaults and theming.</p>
<p>There’s two main reasons why I think it’s worth using Quarto:</p>
<section id="its-free" class="level1">
<h1>It’s Free</h1>
<p>It doesn’t cost a dime to use Quarto. I simply took the time to set it up with my free Github account, and use it to publish my site free on Github as well.</p>
<p>Now, there are some that will argue that Quarto is not at all free, in fact it costs time to set this up and write things yourself.</p>
<p>However, I would also argue:</p>
</section>
<section id="it-has-a-high-give-a-s-factor" class="level1">
<h1>It Has a High “Give a S@&amp;$! Factor”</h1>
<p>We’re entering an age where most text written online will be generated or influenced by a generative AI. This post has been written with the help of one. However, I believe that human communication requires some actual proof of work… some way of showing that the author cares about the topic. It’s going to get very difficult to do that unless you structure your communication in a way that shows a deep level of interest in a topic, and not just churning out one-off dross for the sake of posting to Linkedin.</p>
<p>The main reason isn’t that it’s cheap or shows you care though, it’s because you should own your story as long as you can, and you will probably outlive any company that tries to do too many things for you. This brings up the next advantage of Quarto:</p>
</section>
<section id="its-open-source-and-lindy" class="level1">
<h1>It’s Open Source and Lindy</h1>
<p>Technology has been around long enough that it should be clear that languages, frameworks, and applications come and go along with the companies that invent them. The concept of the <a href="https://en.wikipedia.org/wiki/Lindy_effect">Lindy effect</a> argues that future life expectancy of some non-perishable thing (like an idea or technology) is proportional to its age. Right now, Quarto is built on top of some of the oldest ideas in all of computer theory, and it’s editable and usable inside my <a href="https://en.wikipedia.org/wiki/Vim_(text_editor)#Neovim">Neovim</a> editor, which itself is based off of the <a href="https://en.wikipedia.org/wiki/Vi_(text_editor)">VI editor</a> from 1976, using the <a href="https://en.wikipedia.org/wiki/QWERTY">QWERTY keyboard layout</a> from 1878.</p>
<p>None of these technologies are going to go anywhere. They’re good bets for writing things down that you want to control and preserve.</p>
</section>
<section id="thanks-for-reading" class="level1">
<h1>Thanks for Reading!</h1>
<p>I hope I gave a good overview of what Quarto is about, and why it matters. Thanks for taking the time to read. Hoping to see your Quarto site out there some day!</p>


</section>

<p><i>© Copyright 2025 Justin Donaldson. Except where otherwise noted, all rights reserved. The views and opinions on this website are my own and do not represent my current or former employers.</i></p> ]]></description>
  <category>blog</category>
  <guid>https://www.jjd.io/posts/quarto.html</guid>
  <pubDate>Mon, 15 Apr 2024 00:00:00 GMT</pubDate>
</item>
</channel>
</rss>
