Peter Cotton
Book Length
-
Microprediction: Building an Open AI Network — MIT Press, 2022.
-
An Analytic Approach to Ornstein-Uhlenbeck Processes with Fluctuating Parameters and Applications in the Modeling of Fixed Income Securities — PhD thesis.
Portfolio construction & covariance · precise
-
Schur Complementary Portfolios — preprint, 2024.
abstract
Despite many attempts to make optimization-based portfolio construction in the spirit of Markowitz robust and approachable, it is far from universally adopted. Meanwhile, the collection of more heuristic divide-and-conquer approaches was revitalized by Lopez de Prado where Hierarchical Risk Parity (HRP) was introduced. This paper reveals the hidden connection between these seemingly disparate approaches.
-
Schur Pseudo-Likelihood
abstract
We introduce the Schur pseudo-likelihood, a one-parameter family that damps the cross-block coupling of the Gaussian likelihood through its Schur complements, for scoring and regularizing covariance and correlation estimates in high dimensions. The Gaussian log-likelihood is the standard criterion, but when the dimension rivals the sample size it is governed by the smallest, least-identifiable eigenvalues of the estimate, and as a criterion for ranking or selecting estimates it becomes unreliable; damping the coupling restores it. The optimal damping has a closed form—the reliability of the coupling, a James–Stein shrinkage. On real crypto returns, choosing a shrinkage estimate by the Schur pseudo-likelihood rather than by the full likelihood yields several-fold lower out-of-sample portfolio variance once the dimension rivals the sample size.
-
Two Sides of Schur Damping: High-Dimensional Pseudo-Likelihoods and Portfolio Allocation — preprint, 2026.
abstract
Two communities that rarely cite each other -- spatial statisticians fitting high-dimensional weather fields, and quantitative investors building portfolios -- have independently arrived at the same mathematical object: a Schur complement, damped by one interpretable parameter. In spatial modeling the Schur complement is the conditional covariance that makes a Gaussian (Vecchia) pseudo-likelihood estimable at scale, and recent work regularizes it by shrinking toward a base model. In allocation it is the residual risk of a bet net of its hedge, and the same parameter interpolates hierarchical risk parity and the minimum-variance portfolio. We show these are one operation -- reliability shrinkage of a conditional Gaussian -- so that the damping a weather model needs to remain estimable when stations outnumber observations is, term for term, the damping a portfolio needs to remain stable when assets outnumber returns. The optimal amount is a closed-form reliability, a James-Stein shrinkage that is simultaneously a Ledoit-Wolf intensity. The shrinkage machinery is classical, but the identity appears to be new: to our knowledge neither literature has noted that the conditional shrinkage a spatial model fits and the diversification-variance tilt a portfolio chooses are one and the same quantity. We make the correspondence precise, note that the two literatures have each supplied what the other lacks, and report a small experiment on the one genuinely open choice -- how to set the damping -- suggesting the spatial community's fitted intensity is, if anything, the better recipe.
-
Correlation Inflation
abstract
Traditional covariance shrinkage estimates pull an empirical covariance matrix Σ towards a lower information anchor (often λI), reducing estimation error at the cost of suppressing correlation. This note proposes the opposite: geodesic covariance inflation—a controlled movement of Σ towards a perfectly-correlated limit. Intended not as a consistent estimator but merely a device for use in the context of portfolio construction, we propose that instead of discounting correlation, we exaggerate it in a geometrically natural way, yielding a family of positive–definite matrices indexed by an inflation parameter γ ∈ [0, 1].
Conformal prediction · conformalprediction
-
Marginally Useful: Formalizing the Information Gap in Conformal Prediction
abstract
Conformal prediction gives finite-sample, distribution-free marginal coverage for a set. The guarantee is real, and it is often misread as evidence of forecast quality. We separate the two with one decomposition, the residual-information gap: for a fixed location predictor and a single-shape residual predictive system, the log-score regret relative to the oracle is exactly the mutual information $I(R;X)$ between the residual and the input. Conformalization re-levels coverage but cannot touch this quantity, because it is a property of the predictor's shape class and not of calibration; no recalibration that ignores $X$ reduces it within that class. The familiar cautions about conformal prediction follow as context: marginal coverage is not conditional, validity is insensitive to sharpness, and the guarantee needs exchangeability.
-
A Feynman–Wigner-Style Diagnostic for the Efficacy of Conformal Prediction via Signed de Finetti Representations
abstract
Conformal prediction builds prediction sets that cover the truth at a rate you choose, finite-sample and distribution-free, assuming only exchangeable data. That guarantee is marginal. de Finetti's theorem describes the exchangeability it rests on, and in the finite form the mixing measure may be signed (Kerns and Székely, 2006). We present a short lemma that decomposes the slope of conformal's calibration-conditional coverage into two terms. One is a non-negative threshold term, the classical Beta-law fan. The other carries the sign of the de Finetti measure. Positive (extendable) mixtures make conformal conditionally adaptive. The signed corner—de-meaned, ranked, or compositional scores—makes it anti-adaptive. The marginal guarantee is the same either way. Only what it hides changes.
-
Betting Against a Conformal Predictor: A Parimutuel Account of the Information Gap
abstract
The companion paper shows that the log-score regret of a single-shape conformal predictor to the conditional oracle is the mutual information $I(R;X)$ between the residual and the input. Here we rederive that quantity from a betting mechanism rather than from the score. Treat the predictor as the crowd in a parimutuel pool on the residual: bettors put money in, and the pot is split among the winners in proportion to their stake on the realised outcome. In the continuous limit the payoff is the ratio of the bettor's density to the crowd's. An entrant who knows only the marginal breaks even, which is marginal coverage stated as wealth. An entrant who conditions on $X$ grows his bankroll at rate exactly $I(R;X)$. The gap is the rent. This is not a metaphor: the nearest-the-pin pool of the microprediction platform, and the continuous density version run in the MidOne contest, are this mechanism, and a conformal predictor is the entrant that prices the pool flat in $X$. We then give two ways to measure the rent on a fitted predictor: a static lower bound from distance covariance, and a sequential e-process whose growth rate estimates $I(R;X)$ and which is an anytime-valid test for conditional miscoverage.
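The wealth-growth claim in this abstract can be checked on a toy discrete case. The sketch below is not taken from the paper: the joint table is hypothetical, and the payoff multiplier (bettor's stake divided by the crowd's price) is assumed as a discrete analogue of the density ratio described above.

```python
import math

# Hypothetical joint distribution p(x, r): rows index x, columns index r.
joint = [
    [0.30, 0.15, 0.05],
    [0.05, 0.15, 0.30],
]

p_x = [sum(row) for row in joint]                        # marginal of X
p_r = [sum(row[j] for row in joint) for j in range(3)]   # marginal of R (the crowd's price)


def growth_rate(stake):
    """Expected log-wealth growth when stake(x, r) is the bankroll fraction
    placed on outcome r given x, and each unit staked pays stake / p_r."""
    return sum(
        joint[x][r] * math.log(stake(x, r) / p_r[r])
        for x in range(2)
        for r in range(3)
        if joint[x][r] > 0
    )


# An entrant who knows only the marginal bets p_r and breaks even.
marginal_bettor = growth_rate(lambda x, r: p_r[r])

# An entrant who conditions on x bets p(r | x).
informed_bettor = growth_rate(lambda x, r: joint[x][r] / p_x[x])

# Mutual information I(R; X) computed directly from its definition.
mutual_information = sum(
    joint[x][r] * math.log(joint[x][r] / (p_x[x] * p_r[r]))
    for x in range(2)
    for r in range(3)
    if joint[x][r] > 0
)

print(marginal_bettor, informed_bettor, mutual_information)
```

In this toy setting the informed entrant's growth rate coincides with the mutual information, and the marginal entrant's growth rate is zero.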
-
The Width of the Conformal Fan: Dependence and the Variance of Realized Coverage
abstract
Split conformal prediction's realized, calibration-conditional coverage fluctuates around the nominal $1-\alpha$. For exchangeable-but-independent calibration scores it follows the $\mathrm{Beta}(k,n-k+1)$ law, variance about $\alpha(1-\alpha)/n$, the classical fan. We show the width of that fan is governed by the sign of the cross-sample dependence. The mean stays pinned at the marginal level whatever the dependence; the variance, to leading order, is the average pairwise covariance of the exceedance indicators at the operating quantile times the independent fan. Positive (extendable) dependence adds a between-dataset term and widens the fan; negative dependence narrows it, to exactly zero at the maximally negatively associated contest floor $\rho=-1/(n-1)$, where the realized coverage equals the nominal level identically. We prove the positive side exactly through de Finetti and the law of total variance, the negative side exactly at the floor and to leading order in general, and reduce the remaining finite-sample inequality to a convex-order contraction of the exceedance count, supported numerically across dependence structures. The governing sign is the one the companion note attaches to the finite de Finetti measure: a genuine prior widens the fan, the signed corner collapses it.
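The independent-case fan mentioned above can be written out with the standard moments of a Beta distribution. This is only a sketch of that baseline, not of the paper's dependence results; the function name and the choice $k=\lceil (1-\alpha)(n+1)\rceil$ (the usual split-conformal rank for miscoverage level $\alpha$) are assumptions.

```python
import math


def beta_coverage_moments(n: int, alpha: float) -> tuple[float, float]:
    """Mean and variance of realized coverage under the Beta(k, n-k+1) law,
    assuming i.i.d. calibration scores without ties."""
    k = math.ceil((1 - alpha) * (n + 1))  # rank of the score used as threshold
    if k > n:
        raise ValueError("calibration set too small for this alpha")
    a, b = k, n - k + 1
    mean = a / (a + b)                          # equals k / (n + 1)
    var = a * b / ((a + b) ** 2 * (a + b + 1))  # standard Beta variance
    return mean, var


n, alpha = 999, 0.1
mean, var = beta_coverage_moments(n, alpha)
approx = alpha * (1 - alpha) / n  # leading-order width of the fan

print(f"mean={mean:.4f} var={var:.3e} approx={approx:.3e}")
```

The exact Beta variance and the leading-order approximation agree closely for calibration sets of this size.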
Contests, ranking & choice models · winning
-
Inferring Relative Ability From Winning Probability in Multi-Entrant Contests — SIAM Journal on Financial Mathematics, 2021.
abstract
We provide a fast and scalable numerical algorithm for inferring the distributions of participant scores in a contest, under the assumption that each participant’s score distribution is a translation of every other’s. We term this the horse race problem, as the solution provides one way of assigning a coherent joint probability to all outcomes, and pricing arbitrarily complex horse racing wagers. However, the algorithm may also find use anywhere winning probabilities are apparent, such as with e-commerce product placement, in web search, or, as we show, in addressing a fundamental problem of trade: who to call, based on market share statistics and inquiry cost.
-
A Scalable Algorithm for Subset Selection and Rank Probabilities in Contests and Latent Variable Choice Models
abstract
A k-subset of items will be chosen from n according to values taken by n auxiliary variables $X_1, \ldots, X_n$ interpreted as performances in a contest. Item i is chosen if $X_i \le X_{(k)}$ where $X_{(k)}$ is the k’th order statistic. A numerical algorithm is presented for computing many k-combination choice probabilities quickly, for small k but potentially large n ≫ 1,000,000. No assumption is made on the 1-margin distributions of the $X_i$, and the analytical convenience survives the introduction of dependence via a factor model also. The computation of rank probabilities for k items is a corollary. The algorithm is provided in the winning package, on PyPI.
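To make the selection rule concrete, the quantity $P(X_i \le X_{(k)})$ can be computed by brute-force enumeration on a tiny example. This is a naive reference calculation, not the paper's scalable algorithm and not the winning package's API. The distributions are hypothetical, independence is assumed, and supports are kept disjoint so that ties cannot occur.

```python
from itertools import product

# Hypothetical discrete performance distributions: value -> probability.
# Supports are disjoint across items, so no two items can tie.
performances = [
    {1.0: 0.5, 4.0: 0.5},
    {2.0: 0.7, 5.0: 0.3},
    {3.0: 0.6, 0.5: 0.4},
    {6.0: 0.9, 1.5: 0.1},
]


def selection_probabilities(dists, k):
    """P(X_i <= X_(k)) for each item i, enumerating every joint outcome
    of independent performances (cost grows exponentially in n)."""
    n = len(dists)
    chosen = [0.0] * n
    for outcome in product(*(d.items() for d in dists)):
        prob = 1.0
        for _, p in outcome:
            prob *= p
        values = [v for v, _ in outcome]
        # Indices of the k smallest performances in this outcome.
        winners = sorted(range(n), key=values.__getitem__)[:k]
        for i in winners:
            chosen[i] += prob
    return chosen


probs = selection_probabilities(performances, k=2)
print(probs)
```

Because exactly k items are chosen in every outcome, the selection probabilities sum to k, which gives a quick sanity check on any faster method.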
-
Luce's Choice Axiom Isn't the Only Choice! Combinatorial Contest and Rank Probabilities Using the Python Winning Package
abstract
A subset of k items will be chosen from n according to values taken by n variables $X_1, \ldots, X_n$ interpreted as performances in a contest. Item i is chosen if $X_i \le X_{(k)}$ where $X_{(k)}$ is the k’th order statistic. A numerical algorithm is presented for computing many k-combination choice probabilities quickly, for small k but large n ≫ k in the millions. Rank probabilities for k can also be computed. Some users of the winning package may wish to calibrate models for latent $X_i$ from partial information such as winning or losing probabilities. Others may prefer to supply arbitrary performance distributions. The analytical convenience of this method also survives the introduction of dependence in the $X_i$ via a low-dimensional copula.
-
A Paradox in Machine Preference
abstract
Using prompts such as: “my favorite state in the US is [MASK]”, and “my favorite Western state in the U.S. is [MASK]” we infer that Thurstone models are a better match to the revealed preferences of large language models than the application of Luce’s Choice Axiom. There is some irony in this finding given that Softmax functions, responsible for the token probabilities we interpret as preference, suggest Independence of Irrelevant Alternatives.
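The contrast between the two model families can be illustrated numerically. The sketch below is not the paper's experiment: the scores are hypothetical, the Thurstone model is assumed to use independent unit-variance normal performances with the highest draw winning, and the integral is approximated with a simple trapezoidal rule.

```python
import math


def luce_probs(utilities):
    """Softmax (Luce) choice probabilities."""
    weights = [math.exp(u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def thurstone_probs(means, lo=-10.0, hi=10.0, steps=4000):
    """P(item i has the highest draw) for independent N(mean_i, 1) performances,
    by trapezoidal integration over the value taken by item i."""
    h = (hi - lo) / steps
    probs = []
    for i, mi in enumerate(means):
        total = 0.0
        for s in range(steps + 1):
            x = lo + s * h
            f = norm_pdf(x - mi)
            for j, mj in enumerate(means):
                if j != i:
                    f *= norm_cdf(x - mj)  # every rival falls below x
            total += f * (0.5 if s in (0, steps) else 1.0)
        probs.append(total * h)
    return probs


scores = [1.0, 0.0, 0.0]  # hypothetical: one favorite and two identical rivals

luce3, luce2 = luce_probs(scores), luce_probs(scores[:2])
thur3, thur2 = thurstone_probs(scores), thurstone_probs(scores[:2])

# Change in the odds of item 0 versus item 1 when item 2 joins the choice set.
luce_shift = luce3[0] / luce3[1] - luce2[0] / luce2[1]
thur_shift = thur3[0] / thur3[1] - thur2[0] / thur2[1]

print(luce_shift, thur_shift)
```

Under Luce the odds between the first two items do not move when the third is added, which is Independence of Irrelevant Alternatives. Under this Thurstone setup they do move, because the added item competes more directly with its identical rival than with the favorite.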
Distributional prediction & microprediction · skaters
-
Self-Organizing Supply Chains for Microprediction: Present and Future Uses of the ROAR Protocol — preprint, 2019.
abstract
A multi-agent system is trialed as a means of crowd-sourcing inexpensive but high quality streams of predictions. Each agent is a microservice embodying statistical models and endowed with economic self-interest. The ability to fork and modify simple agents is granted to a large number of employees in a firm and empirical lessons are reported. We suggest that one plausible trajectory for this project is the creation of a Prediction Web.
-
A Platform for Assessing and Combining Autonomous Short-Horizon Distributional Predictions
abstract
The operation of a novel open-source platform where mostly autonomous algorithms are tasked with predicting a large variety of streaming data is described. Reward and combination is achieved by means of a near-the-pin mechanism generalizing lottery countbacks to continuous space.
-
How Should Forecasts be Engineered: The Indispensable Markets Hypothesis
abstract
We consider evidence in support of what we term the Weak Indispensable Markets Hypothesis (IMH) provided by the M6 Financial Forecasting Competition. The Weak IMH asserts in the presence of a well-established market, those who eschew prices as inputs for proximate predictive modeling tasks will under-perform out of sample. We also consider the Strong IMH which asserts that forecasting should be considered a market-inspired engineering endeavor in addition to a modeling task under the usual rubric of statistics or machine learning methods (put simply: if a market doesn’t exist to help you, make one!). The competition established that neither principle’s application is obvious to participants or organizers; it hinted at a hidden quality crisis in data science generally; and it suggests that broadening the usual concept of analytic pipelines to insert collective intelligence might be part of the remedy.
-
skaters: Model First, Conform Last — A Composition-Based Automatic Online Distributional Forecaster
abstract
The Python package skaters performs online univariate time-series forecasting in which every prediction is a full probability distribution rather than a point. A forecaster is built by composition: invertible transforms chain together above a single distributional leaf, and ensembles combine such chains. The leaf fits its shape by optimising a proper scoring rule, so its objective is a choice rather than a fixed property of the method. This separates two concerns the forecasting and conformal-prediction literatures tend to merge: fitting the model, judged by likelihood, and conforming the predictive tail to a downstream score such as CRPS. We call the arrangement model first, conform last. The library is written twice, in pure Python and in zero-dependency JavaScript that agrees to within $10^{-6}$, so a model runs unchanged on a server or in a browser. We evaluate it against classical, neural, and pretrained foundation-model baselines on FRED series.
Global optimization · humpday
-
Go Forth! Simple Detection of Incomplete Meta-Learning by Algorithms Performing Limited Exploration on a Rugged Landscape
abstract
It is shown that an algorithm given two chances to improve its position on the path of an exponentiated Ornstein-Uhlenbeck (OU) process should not choose its final position between the first two locations. It is sometimes easy, therefore, to diagnose failure of an algorithm to learn the optimal policy. The proof introduces the notion of rapidity on an OU bridge, and complements similar results in managerial science that are used as metaphors for complex but ill-defined business and strategy problems.
-
HumpDay: Multi-Dimensional Thurstone Calibration for Context-Specific Optimizer Selection
abstract
Selecting appropriate optimization algorithms remains a critical challenge, with existing approaches providing either global rankings that ignore problem context or requiring complex installations that limit accessibility. We present HumpDay, a browser-based platform that applies multi-dimensional Thurstone calibration to provide context-specific optimizer recommendations. Rather than single global rankings, our approach maps optimizer performance across three key dimensions: landscape characteristics (smooth, multimodal, rugged), problem dimensionality (low, medium, high), and computational budget (low, high). Using 432 benchmark comparisons across derivative-free optimizers, we demonstrate that no single optimizer dominates globally, but clear specialization patterns emerge. For example, L-BFGS-B excels on smooth high-dimensional problems with limited budgets (performance 0.806), while Powell's method dominates rugged landscapes with ample computational resources (0.917). The platform runs entirely in-browser via Pyodide, enabling zero-installation access and interactive exploration of these performance relationships. This multi-dimensional calibration approach transforms optimizer selection from guesswork to principled decision-making based on problem characteristics.
Market making & trading
-
On the Relationship Between Accuracy and Profitability in Over-the-Counter Market Making
abstract
Intuitively, accuracy in microstructure prediction at a granular level must relate to profitability for a market participant, but this is not trivial to formalize. Here we provide an approximation using a stylized steady-state model for over-the-counter trading modeled as a sequence of sealed-bid auctions. A simple picture emerges due to a mildly surprising feature of this model: one does not need to know the fair price, only where others are bidding.
-
Trading Illiquid Goods: Market Making as a Sequence of Sealed-Bid Auctions, with Analytic Results
abstract
We provide analytic results for the optimal control problem faced by a market maker who can only obtain and dispose of inventory via a sequence of sealed-bid auctions. Under the assumption that the best competing response is exponentially distributed around a commonly discerned fair market price we examine properties of the market maker’s optimal behavior. We show that simple adjustments to skew and width accommodate customer arrival imbalance. We derive a straightforward relationship between the market maker’s fill probability and direct holding costs. A simple formula for optimal bidding in terms of (non-myopic) inventory cost is presented. We present the results as a perturbation of an improvement to a “linear skew, constant width” (CWLS) market making heuristic.
Fixed income & stochastic volatility
-
Stochastic Volatility Corrections for Interest Rate Models — 2004.
-
Derivatives in Financial Markets with Stochastic Volatility — chapter, Cambridge University Press, 2000.
Epidemiology
-
Addressing the Herd Immunity Paradox Using Symmetry, Convexity Adjustments and Bond Prices — preprint, 2020.
abstract
In constant parameter compartmental models an early onset of herd immunity is at odds with estimates of R values from early stage growth. This paper utilizes a result from the theory of interest rate modeling, namely a bond pricing formula of Vasicek, and an approach inspired by a foundational result in statistics, de Finetti's Theorem, to show how the modeling discrepancy can be explained. Moreover the difference between predictions of classic constant parameter epidemiological models and those with variation and stochastic evolution can be reduced to simple "convexity" formulas. A novel feature of this approach is that we do not attempt to locate a true model but only a model that is equivalent after permutations. Convexity adjustments can also be used for cross sectional comparisons and we derive easy to use rules of thumb for estimating threshold infection level in one region given knowledge of threshold infection in another.
-
Repeat Contacts and the Spread of Disease: An Agent Model with Compartmental Solution — preprint, 2020.
abstract
Using a probability of novel encounter derived from a physical model, we augment the SIR compartmental model for disease spread. Scenarios with the same initial trajectories and identical $R_0$ values can diverge greatly depending on the speed at which our circles of acquaintances grow stale - leading to order of magnitude differences in final case counts. A momentum effect arises from variation in the mean time since infection, and this feeds back into new infection rate and faster decline in the late stages of an outbreak. Rapid extinction of an outbreak can occur in the early stages, but once this opportunity is missed the effect is diminished and then, only herd immunity can help.
Sports analytics · firstdown
-
Stop Shy of the First Down — in Sports Analytics, World Scientific, 2021.
Mathematical analysis
-
Contraction of an Adapted Functional Calculus
abstract
We aim to show, using the example of a Riemannian symmetric pair $(G, K) = (\mathrm{SL}_2(\mathbb{R}), \mathrm{SO}(2))$, how contraction ideas may be applied to functional calculi constructed on coadjoint orbits of Lie groups. We construct such calculi on principal series orbits and generic orbits of the Cartan motion group $V \rtimes K$, and show how the two are related. Since the calculi are adapted to the representations traditionally attached to the orbits, we recover at the Lie algebra level the contraction results of Dooley and Rice [5].
Software
- precise — Online (incremental) covariance and correlation estimation — the online complement to sklearn.covariance. · code
- humpday — Taking the pain out of choosing a Python global optimizer. · code
- thurstone — Fast ability inference from contest winning probabilities. · winning
- skaters — Fast univariate time series models that run in Pyodide. · code
Selected Talks
-
The Future of AI: From Mathematics' Revenge to the Rise of Prompt Markets
-
Schur Complementary Portfolios
-
Who Ya Gonna Call? A Solution to the Horse Race Problem with Application to OTC Markets
-
Trading Illiquid Goods
-
Filtering Bond and Credit Default Swap Markets
-
Barbell Bond Portfolios: What Do They Accidentally Optimize?
Patents
-
System and Method for Providing Data Science as a Service
-
System and Method for Secure Causality Discovery
-
System and Method for Analyzing Financial Models with Probabilistic Networks
-
System and Method for Pricing Default Insurance