Math-heavy fixture

Display equations and inline math interleave with prose. KaTeX or MathJax may render math after widget mount; the walker should treat math containers as no-mark zones (they don’t contain natural-language prose).

Per plan/01_renderer.md, math blocks render as <div class="math-display" data-block-id="…"> and inline math is wrapped in <span class="math-inline">. Both should be skipped by the term walker.

Gradients and Jacobians

The gradient of a scalar function $f: \mathbb{R}^n \to \mathbb{R}$ is the vector of partial derivatives:

$$ \nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n} \right) $$

The Jacobian generalizes the gradient to vector-valued functions $f: \mathbb{R}^n \to \mathbb{R}^m$:

$$ J_f = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \ \vdots & \ddots & \vdots \ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{pmatrix} $$

The Hessian is the matrix of second-order partials. In the LaTeX source above, the words “gradient”, “Jacobian”, and “Hessian” do not appear — they’re only in the surrounding prose, which is where they should be marked.

Eigenvalues and SVD

An eigenvalue $\lambda$ and eigenvector $v$ satisfy $A v = \lambda v$. The SVD factors any matrix $A$ as:

$$ A = U \Sigma V^\top $$

where $\Sigma$ holds the singular values. PCA is the special case where we use the SVD of a centered data matrix to find directions of maximum variance.

Norms

The L2 norm of $v$ is $|v|_2 = \sqrt{\sum_i v_i^2}$. The L1 norm is $|v|_1 = \sum_i |v_i|$. The generic norm in prose refers to whichever is in scope.

Softmax and cross-entropy

The softmax of a vector $z$ is:

$$ \text{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}} $$

The inverse is the logit. Cross-entropy loss between a target distribution $p$ and a prediction $q$ is $H(p, q) = -\sum_i p_i \log q_i$. KL divergence is $D_{KL}(p | q) = \sum_i p_i \log \frac{p_i}{q_i}$.

Attention

In self-attention, queries $Q$, keys $K$, and values $V$ combine via scaled dot product:

$$ \text{Attention}(Q, K, V) = \text{softmax}\left( \frac{Q K^\top}{\sqrt{d_k}} \right) V $$

The numerator’s normalization is cosine similarity when $Q$ and $K$ are unit-norm. The whole operation acts on tensors of shape (batch, heads, seq, d_k). einsum notation compresses the math: bhqd, bhkd -> bhqk.

Positional encoding adds a sinusoidal vector to each token embedding so the dot-product attention can distinguish positions:

$$ PE_{(pos, 2i)} = \sin\left( pos / 10000^{2i/d} \right), \quad PE_{(pos, 2i+1)} = \cos\left( pos / 10000^{2i/d} \right) $$

Inline math density

Inline math in dense paragraphs: when $f(x) = x^2$ and $g(x) = e^x$, the chain rule gives $\frac{d}{dx} g(f(x)) = g’(f(x)) f’(x) = e^{x^2} \cdot 2x$. The terms “gradient” and “attention” appear in this very paragraph and should be marked in the prose, but the math expressions $f$, $g$, $e^{x^2}$ should NOT be touched by the walker.