Entropy Engine MVP Build Specification
Formatted for AI code generator ingestion.
System 1 (Fast Layer) Accountant Module + Minimal Spokesperson Wrapper
Henry Pozzetta
February 27, 2026 (v3)
Version 1.0 — Build Specification for MVP Implementation
0. TL;DR
This is a complete build specification for a working MVP of the Entropy Engine — the patent-pending real-time behavioral monitoring system grounded in Shannon entropy and the Khinchin uniqueness theorem.
What it builds:
A System 1 (Fast Layer) “Accountant” that computes Shannon entropy and its first three derivatives over any incoming data stream, packages the results into structured EeFrames, and routes them through a simplified Decision Gate — plus a minimal Spokesperson module that lets a human operator connect to any of 12 live public data streams, monitor the entropy dynamics in real time, control the system, and review a full audit trail.
What it proves:
That the Entropy Engine’s core mathematical signal works as described across heterogeneous, domain-agnostic data sources — financial markets, seismic activity, transit systems, IoT sensors, certificate transparency logs — using the same architecture, the same math, and the same EeFrame protocol for all of them. One instrument, many substrates.
What it doesn’t build:
No Librarian (pattern memory), no Constraint Auditor (semantic analysis), no Calibrator (adaptive thresholds), no Comparator (fleet monitoring), no Diagnostician (full recursive self-monitoring). Those are System 2 modules specified in the architecture documents but out of scope for this MVP. The Spokesperson includes a lightweight self-monitoring heartbeat as a placeholder for the Diagnostician path.
Who can build from this:
Any competent Python developer or AI code generator. Every design decision is specified. Every ambiguity is resolved. Every formula is defined. The document explains not just what to build but why each component exists, so the implementer doesn’t accidentally optimize away load-bearing features.
Patent alignment:
USPTO Applications #63/863,992 and #63/944,187. This MVP implements the Fast Layer, EeFrame protocol, and telemetry normalization pipeline as specified in the patent claims. The EeFrame schema matches the patent specification. Field names, stability classifications, and architectural boundaries are consistent with the filed claims.
1. Purpose and Scope
What This Document Is
This is a build specification — a complete set of instructions sufficient for a proficient developer or AI code generator to produce a working MVP of the Entropy Engine’s System 1 architecture. It specifies every computation, every data structure, every interface boundary, every default value, and every error-handling behavior. Where design decisions must be made, this document makes them. Where ambiguity could arise, this document resolves it.
What the MVP Implements
The MVP implements two of the Entropy Engine’s architectural components:
The Accountant (System 1 / Fast Layer). A black-box numerical processor that accepts a stream of scalar values, computes Shannon entropy and its first three time-derivatives over rolling windows, classifies the system’s stability state, and emits structured coordination signals called EeFrames. The Accountant has zero knowledge of what it is monitoring. It reads distributional dynamics the way a cardiologist reads a heart rhythm — without knowing the patient’s name, history, or what they had for breakfast.
The Spokesperson (Minimal System 2). The sole human interface. It connects to live public data streams, feeds normalized values to the Accountant, translates EeFrames into operator-readable language, provides controls including the ability to switch streams, adjust thresholds, pause, resume, and shut down, and maintains a complete audit trail of every frame emitted and every operator action taken.
Together, these two components demonstrate the Entropy Engine’s core claim: that Shannon entropy dynamics provide a domain-agnostic, mathematically grounded monitoring signal for any system producing a probability distribution over possible next states.
What Is Explicitly Out of Scope
The following System 2 modules are defined in the full Entropy Engine architecture but are not implemented in this MVP. The architecture accommodates their future addition through defined extension points (see Section 20).
The Librarian — pattern recognition and surface geometry classification across sessions. Requires accumulated EeFrame history that this MVP will produce but does not yet analyze.
The Constraint Auditor — semantic constraint extraction and contradiction detection using a dedicated language model. Requires a monitored system that produces natural language, which external data streams do not.
The Calibrator — adaptive threshold optimization from historical performance. Requires the Librarian’s classification history, which does not yet exist.
The Comparator — cross-system pattern detection across multiple monitored agents. Requires multiple simultaneous Accountant instances, which this MVP does not run.
The Diagnostician — full recursive self-monitoring using the same entropy formalism applied to the Engine’s own operational distributions. The MVP includes a lightweight self-monitoring heartbeat as a structural placeholder.
These deferrals are safe because the architecture is modular by design. Each module communicates through defined interfaces — EeFrames, constraint records, surface classifications — never through direct access to another module’s internal state. The MVP produces EeFrames through the same interface that future modules will consume. Nothing built here requires rebuilding when System 2 modules are added.
Patent Alignment
This MVP implements claims from USPTO Applications #63/863,992 and #63/944,187, specifically: domain-agnostic telemetry normalization through a configurable pipeline, Shannon entropy calculation with first and second derivatives as predictive indicators, Environmental Stability Index computation, and emission of structured EeFrame coordination signals. The EeFrame schema in this specification matches the patent-filed schema. Implementers should preserve field names, stability classification labels, and the architectural separation between the Accountant’s computation path and the Spokesperson’s control surface exactly as specified.
Who This Document Is For
A Python developer with working knowledge of asyncio, numpy, and WebSocket/HTTP client libraries. Alternatively, an AI code generator capable of producing production-quality Python from detailed specifications. No knowledge of information theory, the Ledger Model, or the Entropy Engine’s theoretical foundations is required — this document provides everything the implementer needs. Domain expertise helps for understanding why the design works; it is not necessary for building what the design specifies.
2. Architectural Context: Why Each Component Exists
The Entropy Engine’s architecture is not a collection of features assembled by preference. It is a chain of forced design decisions. Each component exists because the component beneath it is provably insufficient alone. Understanding this chain prevents an implementer from treating any component as optional or from merging responsibilities that must remain separate.
The chain has six links. The MVP implements the first four directly and provides structural placeholders for the fifth and sixth.
Link 1: Mathematical constraint on signal choice. Khinchin proved in 1957 that any scalar measure of distributional uncertainty satisfying three minimal axioms — continuity, maximality, and additivity — is Shannon entropy or a scalar multiple of it. This means the question “what number should we track?” has exactly one answer. Shannon entropy is not a design preference. It is the uniquely correct observable for distributional monitoring. This proof constrains the entire design space before a single engineering decision is made. The MVP implements this: the Accountant computes Shannon entropy and nothing else as its primary signal.
Link 2: Content-blindness forces a second layer. Shannon entropy measures distributional shape — how concentrated or dispersed probability mass is. It cannot access the semantic content of the states being distributed over. For the MVP’s purpose of demonstrating the signal across diverse streams, content-blindness is a feature: the same Accountant works for Bitcoin prices and earthquake magnitudes precisely because it never looks at meaning. For production AI monitoring, this blindness forces a semantic layer (the Constraint Auditor) that is out of scope here but accommodated by the architecture. The MVP implements this: the Accountant is a black box that accepts scalars and returns EeFrames, with zero knowledge of source.
Link 3: Speed mismatch forces parallel operation. The Accountant’s entropy computation runs at sub-10-millisecond latency. Any future semantic analysis operates at 100–300 milliseconds. If the slow layer were in the fast layer’s critical path, the most valuable signal would be throttled by the slowest component. This forces asynchronous, parallel operation. The MVP implements this: the Accountant’s process() call is synchronous and fast; all network I/O and display logic runs asynchronously in the Spokesperson, never blocking the Accountant’s computation.
Link 4: Human accountability forces a dedicated interface. A monitoring system that produces assessments no human can read, question, or override is not a safety tool. It is an unaccountable automated agent. The system must translate its internal state into language operators can evaluate, and must provide controls — including the ability to shut down any component. The MVP implements this: the Spokesperson is the sole human interface. All operator actions route through it. The off switches are here.
Link 5: Operational degradation forces self-monitoring. Any monitoring system’s accuracy can degrade over time. Pattern libraries go stale. Thresholds drift. A system that cannot detect its own degradation provides false confidence. The MVP provides a placeholder: a lightweight heartbeat that computes entropy over the Accountant’s own recent stability classifications and alerts the operator if meta-level dynamics become unstable. The full Diagnostician is deferred.
Link 6: Incremental deployment forces modularity. No organization deploys a full monitoring architecture on day one. Each phase must deliver independent value. The MVP is Phase 1: the Fast Layer standalone, producing EeFrames and operational data that would feed Phase 2 (Librarian + full Spokesperson) if and when deployed. The module boundaries are respected so that nothing built here requires rebuilding.
Interface Boundaries (Non-Negotiable)
Two boundaries must be preserved exactly as specified, even in the MVP:
Accountant → Spokesperson (one-way). EeFrames flow from the Accountant into the Spokesperson. The Spokesperson reads EeFrames. It never writes to the Accountant’s computation path. The Accountant cannot be slowed, paused, or blocked by any Spokesperson process. The Spokesperson may reset the Accountant (on stream switch) or adjust its thresholds (on operator command), but these are control-plane operations, not data-path intrusions.
Spokesperson → Operator (governed). All operator interactions with the Engine route through the Spokesperson. There is no backdoor access to the Accountant’s internal state. Every operator action is logged. This boundary ensures that the system’s operational state can be reliably determined at any given moment from the audit trail.
3. Mathematical Foundation
This section provides the implementer with the mathematical definitions needed to build the Accountant. It is not a tutorial on information theory. It specifies exactly what to compute, in what order, and how to handle edge cases.
Shannon Entropy
For a discrete probability distribution P = (p₁, p₂, ..., pₙ) over n categories, Shannon entropy is:
H = −Σᵢ pᵢ log₂(pᵢ)
where the sum runs over all categories i = 1 to n.
H measures how concentrated or dispersed the probability mass is across categories. When all probability is on one category, H = 0 (maximum concentration). When probability is spread uniformly across all categories, H = log₂(n) (maximum dispersion).
The 0·log₂(0) convention. The function p·log₂(p) is undefined at p = 0, but its limit as p → 0⁺ is 0. The implementation must enforce this: any category with count zero contributes 0 to the entropy sum, not NaN or −inf. In numpy: mask zero probabilities before computing the log, or use np.where(p > 0, -p * np.log2(p), 0.0).
Why Shannon entropy and no alternative. Khinchin’s uniqueness theorem (1957) proves that H is the only scalar function of a probability distribution satisfying three axioms simultaneously: (1) continuity — small changes in the distribution produce small changes in the output, enabling meaningful derivatives; (2) maximality — the function peaks at the uniform distribution, providing a universal ceiling for normalization; (3) additivity — the function decomposes consistently across scales, enabling hierarchical monitoring. Every alternative (variance, Gini impurity, maximum probability, Rényi entropies of order α ≠ 1) violates at least one axiom. The implementer does not need to verify this — it is a proven theorem.
Normalized Entropy
Normalized entropy scales H to the [0, 1] interval:
H_norm = H / log₂(n)
where n is the number of categories. H_norm = 0 means all probability is concentrated on one category. H_norm = 1 means the distribution is perfectly uniform. This normalization enables comparison across streams with different numbers of categories.
Note on ESI. The full Entropy Engine’s Environmental Stability Index (ESI) is a weighted combination of multiple normalized telemetry contributions: ESI = Σ wᵢ × f(normalize(telemetryᵢ)). In this single-stream MVP, normalized entropy serves as the ESI value. The EeFrame field is labeled esi for patent alignment. When the architecture extends to multi-stream monitoring, ESI computation will incorporate the full weighted pipeline. For this MVP, esi = H_norm.
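The entropy and normalization steps above can be sketched as follows. This is a minimal sketch, not the Accountant's mandated implementation; the helper name `entropy_from_counts` is an assumption. It enforces the 0·log₂(0) convention by masking zero probabilities inside the log, and returns both H and H_norm (which serves as `esi` in this MVP):

```python
import numpy as np

def entropy_from_counts(counts):
    """Shannon entropy (bits) and normalized entropy from category counts.

    Enforces the 0*log2(0) = 0 convention: zero-count categories
    contribute exactly 0 to the sum, never NaN or -inf.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total == 0:
        return 0.0, 0.0
    p = counts / total
    # Mask zeros inside the log so log2 is never evaluated at 0.
    terms = np.where(p > 0, -p * np.log2(np.where(p > 0, p, 1.0)), 0.0)
    h = float(terms.sum())
    n = len(counts)
    h_norm = h / np.log2(n) if n > 1 else 0.0  # esi = H_norm in this MVP
    return h, h_norm

# All mass on one category -> (0.0, 0.0); uniform -> (log2(n), 1.0)
print(entropy_from_counts([8, 0, 0, 0]))   # -> (0.0, 0.0)
print(entropy_from_counts([1, 1, 1, 1]))   # -> (2.0, 1.0)
```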
The Derivative Hierarchy
Shannon entropy at a single time step tells you the system’s current distributional state. Its derivatives tell you where that state is heading.
Velocity (H′). The first derivative, computed as the difference between the current entropy and the mean entropy over a recent rolling window:
H′(t) = H(t) − mean(H over derivative window)
Sustained positive H′ means the distribution is dispersing — the system is becoming less certain. Sustained negative H′ means it is concentrating. Near-zero H′ means the entropy profile is stable.
Acceleration (H″). The second derivative, computed as the difference between the current velocity and the previous velocity:
H″(t) = H′(t) − H′(t−1)
Acceleration is the early warning signal. In bench-scale testing, positive acceleration preceded coherence failures by 10–20 time steps.
Jerk (H‴). The third derivative:
H‴(t) = H″(t) − H″(t−1)
Jerk detects regime transitions — moments where the character of the distributional dynamics shifts qualitatively. It is noisier than the first two derivatives and is used primarily as an event flag.
Derivative availability during cold start. Velocity requires at least 2 entropy values. Acceleration requires at least 3. Jerk requires at least 4. Before sufficient history exists, unavailable derivatives are reported as null in the EeFrame, and the confidence score is reduced proportionally (see Section 8).
Why stop at third order. The fourth derivative and beyond contribute more noise than signal in testing. Three derivatives provide position, trajectory, curvature, and regime-transition detection. This is sufficient. The implementation should not compute derivatives beyond third order.
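The derivative hierarchy above can be sketched as a small tracker. This is an illustrative sketch, not the specified Accountant internals; the class name `DerivativeTracker` and the default window size are assumptions. It reports unavailable derivatives as None during cold start, matching the availability rules above:

```python
from collections import deque

class DerivativeTracker:
    """Tracks H' (velocity), H'' (acceleration), H''' (jerk).

    Velocity is current entropy minus the mean over a rolling derivative
    window; acceleration and jerk are successive differences. Derivatives
    without enough history are reported as None (cold start).
    """
    def __init__(self, window=10):
        self.history = deque(maxlen=window)   # rolling window of H values
        self.prev_velocity = None
        self.prev_acceleration = None

    def update(self, h):
        velocity = acceleration = jerk = None
        if self.history:                      # needs >= 2 entropy values
            velocity = h - sum(self.history) / len(self.history)
        if velocity is not None and self.prev_velocity is not None:
            acceleration = velocity - self.prev_velocity   # >= 3 values
        if acceleration is not None and self.prev_acceleration is not None:
            jerk = acceleration - self.prev_acceleration   # >= 4 values
        self.history.append(h)
        self.prev_velocity = velocity
        self.prev_acceleration = acceleration
        return velocity, acceleration, jerk
```

The first call returns (None, None, None); velocity appears on the second call, acceleration on the third, jerk on the fourth, as required by the cold-start rules.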
4. Data Flow Architecture
The MVP pipeline has six stages. Data flows in one direction — from the external world to the operator’s screen — with a single controlled feedback path for operator commands.
End-to-End Pipeline
Stream Source → Stream Adapter → Normalizer → Accountant → Decision Gate → Spokesperson → Operator
Stage-by-Stage Specification
Stage 1: Stream Source. An external public data feed (WebSocket, SSE, REST endpoint). The MVP has zero control over this stage.
Stage 2: Stream Adapter. Receives raw data from the source. Extracts a single scalar value per update according to the stream’s configuration. Handles all protocol-specific logic so nothing downstream encounters protocol-specific data.
Stage 3: Normalizer. Receives the raw scalar from the Adapter. Applies band-centered normalization and z-score calculation (see Section 6). Outputs a normalized scalar that makes heterogeneous streams comparable.
Stage 4: Accountant. Receives the normalized scalar. Computes entropy, derivatives, confidence, and stability classification. Emits an EeFrame. Latency budget: sub-10ms per call. Specified in full in Section 7.
Stage 5: Decision Gate. Receives the EeFrame. Routes it through one of five paths (PROCEED, PAUSE, REVISE, VOTE, ABORT) based on stability classification and confidence. Three paths active in MVP, two stubbed. See Section 11.
Stage 6: Spokesperson. Receives the routed EeFrame and the Decision Gate’s path determination. Translates the EeFrame into operator-readable display. Logs the frame and routing decision. Handles all display, logging, and operator interaction.
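The Decision Gate's five-path routing can be sketched as follows. The path names come from Stage 5 above; everything else here (the threshold values, the stability labels, and which two paths are stubbed) is an assumption for illustration only, since the actual mapping is defined in Section 11:

```python
from enum import Enum

class Path(Enum):
    PROCEED = "PROCEED"
    PAUSE = "PAUSE"
    REVISE = "REVISE"
    VOTE = "VOTE"      # assumed stubbed in MVP
    ABORT = "ABORT"    # assumed stubbed in MVP

def route(stability: str, confidence: float) -> Path:
    """Illustrative routing on stability classification and confidence.

    The thresholds and label strings below are placeholders; the real
    rules live in Section 11 of the specification.
    """
    if confidence < 0.5:
        return Path.PAUSE        # low-confidence frames held for review
    if stability == "STABLE":
        return Path.PROCEED
    if stability == "DEGRADING":
        return Path.REVISE
    return Path.PAUSE
```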
Boundary Enforcement
The async architecture enforces boundaries naturally. Stream Adapter → Normalizer → Accountant runs as a synchronous call chain within an async task. Spokesperson I/O runs in separate async tasks that consume EeFrames from a queue. The Accountant never waits for display to complete.
The EeFrame Queue
Between the Accountant and the Spokesperson sits an asyncio.Queue (unbounded). Every process() call places the resulting EeFrame on this queue. The Spokesperson’s display loop consumes from the queue. This decoupling prevents Spokesperson latency from contaminating Accountant throughput. If the Spokesperson is paused, EeFrames continue to be produced and queued. No data is lost during pause.
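The queue decoupling can be sketched with a minimal producer/consumer pair. The task names and the stand-in frame dictionary are illustrative, not the EeFrame schema; the load-bearing points are the unbounded `asyncio.Queue` and the fact that the producer never awaits the consumer:

```python
import asyncio

async def accountant_task(queue: asyncio.Queue, values):
    """Fast path: enqueue one frame per value without ever waiting."""
    for v in values:
        frame = {"value": v}          # stand-in for a real EeFrame
        queue.put_nowait(frame)       # unbounded queue: never blocks
        await asyncio.sleep(0)        # yield to the event loop

async def spokesperson_task(queue: asyncio.Queue, n):
    """Slow path: consume frames at its own pace, independently."""
    frames = []
    for _ in range(n):
        frames.append(await queue.get())
    return frames

async def main():
    queue = asyncio.Queue()           # unbounded, per Section 4
    values = [0.1, 0.2, 0.3]
    producer = asyncio.create_task(accountant_task(queue, values))
    frames = await spokesperson_task(queue, len(values))
    await producer
    return frames

frames = asyncio.run(main())
```

If the consumer is paused, `put_nowait` still succeeds and frames accumulate in the queue, which is exactly the no-data-loss behavior described above.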
5. Stream Adapter Layer
Role
The Stream Adapter Layer translates the heterogeneous external world into the single interface the Normalizer and Accountant consume: one scalar value per update, delivered asynchronously. Every protocol-specific detail lives here and nowhere else.
Adapter Types
Three adapter classes handle all 12 streams:
WebSocketAdapter. For streams that push data over persistent WebSocket connections. Handles connection, reconnection, message parsing, and heartbeat/ping-pong. Used by: Bitstamp, Binance, Certstream, Yahoo Finance, Open Rail Data.
SSEAdapter. For Server-Sent Events streams over HTTP. Handles connection, automatic reconnection, and event parsing. Used by: Wikipedia Recent Changes.
RESTPollingAdapter. For HTTP endpoints that must be polled at intervals. Handles request timing, response parsing, and rate limiting. Used by: CoinGecko, USGS Earthquake, NOAA Buoy, ThingSpeak, EPA AirNow, MTA GTFS Realtime.
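All three adapter classes present the same downstream contract: one scalar per update, delivered asynchronously. A minimal sketch of that shared interface follows; the names `StreamAdapter`, `ReplayAdapter`, and `values` are assumptions, and the replay adapter stands in for the network-backed adapters so the contract can be exercised without a live connection:

```python
import asyncio
from abc import ABC, abstractmethod

class StreamAdapter(ABC):
    """Common contract: every adapter yields one scalar per update."""

    @abstractmethod
    def values(self):
        """Return an async iterator of float scalars."""

class ReplayAdapter(StreamAdapter):
    """Toy adapter replaying recorded scalars, standing in for the
    WebSocket/SSE/REST adapters during offline testing."""

    def __init__(self, recorded):
        self.recorded = recorded

    async def values(self):
        for v in self.recorded:
            await asyncio.sleep(0)    # simulate asynchronous delivery
            yield float(v)

async def drain(adapter: StreamAdapter):
    """Collect every scalar an adapter produces."""
    return [v async for v in adapter.values()]

out = asyncio.run(drain(ReplayAdapter([61234.5, 61240.0])))
```

Because the Normalizer only ever sees this interface, swapping a replay adapter for a live WebSocket adapter changes nothing downstream.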
Supported Streams
| # | Stream | Access | Scalar Extracted | Notes |
|---|--------|--------|------------------|-------|
| 1 | Wikipedia Recent Changes | SSE | length (edit size, bytes) | Continuous, high-frequency |
| 2 | Bitstamp BTC/USD | WebSocket | price from live trades | Subscribe msg required |
| 3 | Binance BTC/USDT | WebSocket | p (price) from trade stream | No auth required |
| 4 | CoinGecko BTC/USD | REST poll | bitcoin.usd from /simple/price | 60 s poll interval |
| 5 | USGS Earthquakes | REST poll | magnitude from latest feature | 60 s; GeoJSON |
| 6 | NOAA Buoy Data | REST poll | WVHT (wave height) | 300 s; fixed-width text |
| 7 | ThingSpeak IoT | REST poll | field1 from public channel | 30 s; JSON |
| 8 | MTA GTFS Realtime | REST poll | delay (seconds) | 30 s; Protobuf |
| 9 | EPA AirNow | REST poll | AQI from observations | 600 s; requires API key |
| 10 | Open Rail Data (UK) | WebSocket | tpl_delay from movements | STOMP; requires registration |
| 11 | Certstream | WebSocket | Certificate SA | |