We give formal, constructive proofs that (1) every implementable neural-network–style program with computably countable input/output sets is extensionally equivalent to a countable Markov chain, and (2) every computable countable Markov chain can be implemented by a finite program (an "implementable neural network" with access to randomness) which reproduces the chain's next-step distribution. We state precise assumptions and provide constructions that make the extensional equivalence explicit.
Definition 1.1 (Computably countable)
A set $S$ is computably countable if there exists a computable injection $\iota : S \to \{0,1\}^*$ (finite binary strings). Equivalently, $S$ can be put in computable bijection with a subset of $\mathbb{N}$.
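For concreteness, here is a minimal Python sketch (illustrative only; the function names are ours, not from the formal development) of a computable injection from the countable set $\mathbb{N} \times \mathbb{N}$ into finite binary strings, using the Cantor pairing function followed by ordinary binary notation.

```python
# Illustration of Definition 1.1: a computable injection from the countable
# set N x N into finite binary strings {0,1}^*, via the Cantor pairing
# function followed by ordinary binary notation.

def cantor_pair(a: int, b: int) -> int:
    """Computable bijection N x N -> N."""
    return (a + b) * (a + b + 1) // 2 + b

def encode(a: int, b: int) -> str:
    """Computable injection N x N -> {0,1}^*."""
    return format(cantor_pair(a, b), "b")

assert encode(3, 5) != encode(5, 3)  # distinct inputs get distinct codes
```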
Definition 1.2 (Implementable neural network (informal))
An implementable neural network is any finite, deterministic program $N$ (a finite description executable on a digital computer) that, on input an encoding of $x$ for $x$ in a computably countable input set $X$, either: (i) halts and outputs a fixed element of a computably countable output set $Y$, or (ii) reads auxiliary independent uniform random bits, halts with probability $1$, and outputs an element of $Y$.
We will treat such a program as defining, for every $x \in X$, a computable distribution $P_N(\cdot \mid x)$ on $Y$.
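As a hedged illustration of Definition 1.2 (the particular distribution and the name nn_sample are our own choices, not taken from the text), the following finite program, on input $x \in \mathbb{N}$, uses uniform randomness to output $y \in \{0,1\}$ with the computable distribution $P_N(1 \mid x) = 1/(x+2)$.

```python
import random

# Illustrative "implementable NN" in the sense of Definition 1.2: a finite
# program that, on input x in N, uses uniform randomness to output y in {0, 1}
# with the computable distribution P_N(1 | x) = 1 / (x + 2).
# The distribution is an arbitrary example chosen for the sketch.

def nn_sample(x: int, rng: random.Random) -> int:
    num, den = 1, x + 2                          # P_N(1 | x) as an exact rational
    return 1 if rng.randrange(den) < num else 0  # exact Bernoulli(num / den)

rng = random.Random(0)
print([nn_sample(3, rng) for _ in range(10)])    # samples from P_N(. | 3)
```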
Definition 1.3 (Countable Markov chain)
A (discrete-time) countable Markov chain is a pair $(S, P)$ where $S$ is a countable set of states and $P : S \times S \to [0,1]$ satisfies $\sum_{s' \in S} P(s, s') = 1$ for all $s \in S$. We say the chain is computable if $S$ is computably countable and the function $(s, s') \mapsto P(s, s')$ is a computable real-valued function (or, equivalently for our purposes, each row $P(s, \cdot)$ is a computable distribution).
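To illustrate Definition 1.3, a computable countable chain can be presented by a program that returns, for each state, its row as exact rationals. The lazy random walk on $\mathbb{N}$ in the sketch below is an arbitrary example, not part of the formal development.

```python
from fractions import Fraction
from typing import Dict

# Illustrative computable countable Markov chain: states are the naturals and
# each row P(s, .) is returned as exact rationals.  The lazy random walk on N
# used here is an arbitrary example.

def row(s: int) -> Dict[int, Fraction]:
    """Return the finitely supported row P(s, .) of a lazy walk on N."""
    if s == 0:
        return {0: Fraction(1, 2), 1: Fraction(1, 2)}
    return {s - 1: Fraction(1, 4), s: Fraction(1, 2), s + 1: Fraction(1, 4)}

assert sum(row(5).values()) == 1  # each row is a probability distribution
```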
Definition 1.4 (Extensional equivalence)
Let $X$ and $Y$ be computably countable sets. Given an implementable NN $N$ with conditional output distributions $P_N(\cdot \mid x)$ for $x \in X$, a countable Markov chain $(S, P)$, an initialization map $\iota : X \to S$, and a read map $\rho : S \to Y$, we say $N$ is extensionally equivalent to $(S, P, \iota, \rho)$ iff for every $x \in X$ and every $y \in Y$,
$$P_N(y \mid x) \;=\; \Pr\bigl[\rho(Z_1) = y \,\bigm|\, Z_0 = \iota(x)\bigr],$$
where $(Z_t)_{t \ge 0}$ denotes the chain; i.e., the distribution on $Y$ produced by $N$ on input $x$ equals the distribution on $Y$ obtained by starting the chain at $\iota(x)$ and reading via $\rho$ after one step (or after some fixed finite read time $T$).
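In the finite case, Definition 1.4 can be checked exhaustively. The sketch below is illustrative only (all names are ours); exact rationals are assumed so that equality of probabilities is decidable. It compares $P_N(\cdot \mid x)$ with the pushforward under $\rho$ of the chain's one-step distribution started at $\iota(x)$.

```python
from fractions import Fraction
from typing import Callable, Hashable, List

# Finite-case check of Definition 1.4: for every input x and output y, compare
# P_N(y | x) with the probability that the chain started at iota(x) is, after
# one step, in a state that rho reads as y.  Exact rationals are assumed so
# that the equality test is decidable; all names are illustrative.

def extensionally_equivalent(
    xs: List[Hashable],                                  # finite input set X
    ys: List[Hashable],                                  # finite output set Y
    states: List[Hashable],                              # finite state set S
    p_nn: Callable[[Hashable, Hashable], Fraction],      # (x, y) -> P_N(y | x)
    p_chain: Callable[[Hashable, Hashable], Fraction],   # (s, s') -> P(s, s')
    iota: Callable[[Hashable], Hashable],                # initialization map
    rho: Callable[[Hashable], Hashable],                 # read map
) -> bool:
    for x in xs:
        for y in ys:
            one_step = sum(p_chain(iota(x), s) for s in states if rho(s) == y)
            if one_step != p_nn(x, y):
                return False
    return True
```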
Theorem 2.1 (Implementable NN $\to$ countable Markov chain)
Let $N$ be an implementable neural network with computably countable input set $X$ and output set $Y$, and let $P_N(\cdot \mid x)$ denote the distribution $N$ samples from on input $x$. Then there exists a computable countable Markov chain $(S, P)$, a computable initialization map $\iota : X \to S$, and a computable read map $\rho : S \to Y$ such that for every $x \in X$ and $y \in Y$,
$$\Pr\bigl[\rho(Z_1) = y \,\bigm|\, Z_0 = \iota(x)\bigr] \;=\; P_N(y \mid x).$$
Moreover, if $X$ and $Y$ are finite then $S$ can be taken finite.

Proof. We give a constructive finite-step realization.
State space. Define
$$S \;:=\; X \,\sqcup\, Y \,\sqcup\, W,$$
where $W$ is an optional (possibly empty) countable set of internal computation states, used if one wishes to model the NN's multi-step internal computation. Since $X$ and $Y$ are computably countable and $W$ is at most countable, $S$ is computably countable.

Initialization and read maps. Define the initialization map $\iota : X \to S$ by $\iota(x) = x$ (the copy of $x$ inside $S$). Define the read map $\rho : S \to Y$ by $\rho(y) = y$ for $y \in Y$, and define $\rho$ arbitrarily on the other states (they will have zero probability at the read time in the simple construction).
Transition matrix. Define $P : S \times S \to [0,1]$ by
$$P(x, y) \;=\; P_N(y \mid x) \quad (x \in X,\; y \in Y), \qquad P(y', y') \;=\; 1 \quad (y' \in Y),$$
with all other entries zero; in the simple construction we take $W = \varnothing$ and read at time $1$. Each row of $P$ sums to $1$: the row of $x \in X$ sums to $\sum_{y \in Y} P_N(y \mid x) = 1$, and each $y \in Y$ is absorbing.
Correctness. Start the chain at $Z_0 = \iota(x)$. By construction,
$$\Pr\bigl[Z_1 = y \,\bigm|\, Z_0 = \iota(x)\bigr] \;=\; P(\iota(x), y) \;=\; P_N(y \mid x) \quad \text{for every } y \in Y.$$
Applying $\rho$ at time $1$ yields $y$ with exactly the same probability. Thus $N$ is extensionally equivalent to $(S, P, \iota, \rho)$, as required. If $X$ and $Y$ are finite and $W = \varnothing$, then $S$ is finite and the chain is finite.
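A minimal sketch of the Theorem 2.1 construction in the finite case with $W = \varnothing$: the tag names "in"/"out" below are an illustrative way of encoding the disjoint union $S = X \sqcup Y$, and the NN's output distributions are supplied as exact rationals.

```python
from fractions import Fraction
from typing import Dict, Hashable, List, Tuple

# Sketch of the Theorem 2.1 construction for finite X and Y with W empty:
# states are tagged copies of inputs and outputs, iota embeds X, rho reads off
# Y (and is arbitrary on "in" states), the X-rows copy P_N(. | x), and every
# "out" state is absorbing.  Tag names are an illustrative encoding of the
# disjoint union.

def build_chain(
    xs: List[Hashable],
    ys: List[Hashable],
    p_nn: Dict[Tuple[Hashable, Hashable], Fraction],  # (x, y) -> P_N(y | x)
):
    states = [("in", x) for x in xs] + [("out", y) for y in ys]

    def iota(x):                       # initialization map iota : X -> S
        return ("in", x)

    def rho(s):                        # read map rho : S -> Y
        return s[1]

    def P(s, t) -> Fraction:           # transition matrix on S
        if s[0] == "in" and t[0] == "out":
            return p_nn.get((s[1], t[1]), Fraction(0))
        if s[0] == "out":
            return Fraction(1) if s == t else Fraction(0)
        return Fraction(0)

    return states, P, iota, rho

# One-step check on a toy example: P(iota(x), out_y) equals P_N(y | x).
p_nn = {("a", 0): Fraction(1, 3), ("a", 1): Fraction(2, 3)}
states, P, iota, rho = build_chain(["a"], [0, 1], p_nn)
assert P(iota("a"), ("out", 1)) == Fraction(2, 3)
```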
Theorem 2.2 (Countable Markov chain $\to$ implementable NN (with RNG))
Let $(S, P)$ be a computable countable Markov chain: $S$ is computably countable and, for each $s \in S$, the row $P(s, \cdot)$ is a computable probability distribution on $S$. Then there exists a finite program $N$ (an implementable "neural-network–style" program that may use uniform randomness) such that, on input $s \in S$, $N$ outputs a sample $s'$ distributed according to $P(s, \cdot)$. Equivalently, $N$ reproduces the chain's one-step transition distribution exactly.
Proof. Because $S$ is computably countable, fix a computable bijection (enumeration) $k \mapsto s_k$ of $\mathbb{N}$ onto $S$ and a computable encoding of states as finite binary strings.
We construct a finite program $N$ that implements inverse-CDF sampling from the row $P(s, \cdot)$ given $s$: draw a uniform $U \in [0,1)$ lazily, as a stream of random bits; compute the cumulative sums $c_k = \sum_{j \le k} P(s, s_j)$ to increasing precision; and output the first $s_k$ whose interval $[c_{k-1}, c_k)$ is certified to contain $U$, requesting further random bits (and further precision on the $c_k$) whenever the current approximations do not yet decide the comparison. Since each row sums to $1$ and $U$ almost surely equals no $c_k$, this halts with probability $1$ and outputs $s_k$ with probability exactly $c_k - c_{k-1} = P(s, s_k)$.
If the chain is finite, the lookup can be implemented by a finite matrix and the program is straightforward to implement exactly. For infinite countable $S$, the same sampling procedure is implementable so long as each row is a computable distribution (so the cumulative sums can be computed to whatever precision the comparison with the random bits requires).
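The following Python sketch of the inverse-CDF sampler makes the simplifying assumption that each row is presented as a computable enumeration of (state, exact rational probability) pairs; in general, as in the proof, one computes the cumulative sums only to the precision the comparison with the random bits requires. Function names are ours.

```python
import itertools
import random
from fractions import Fraction
from typing import Callable, Hashable, Iterator, Tuple

# Sketch of the Theorem 2.2 sampler, assuming the row P(s, .) is given as a
# computable enumeration of (state, exact rational probability) pairs.  A
# uniform U in [0, 1) is revealed bit by bit as a dyadic interval
# [u_lo, u_lo + width); we output the first enumerated state whose cumulative
# interval [cum_lo, cum_hi) provably contains U, and request more random bits
# whenever U straddles a boundary.  Halts with probability 1.

def sample_row(
    row: Callable[[], Iterator[Tuple[Hashable, Fraction]]],
    rng: random.Random,
) -> Hashable:
    u_lo, width = Fraction(0), Fraction(1)
    while True:
        cum_lo = Fraction(0)
        for state, prob in row():
            cum_hi = cum_lo + prob
            if u_lo + width <= cum_hi:   # U-interval lies inside [cum_lo, cum_hi)
                return state
            if cum_hi > u_lo:            # U straddles the boundary: need more bits
                break
            cum_lo = cum_hi              # U lies beyond this state's interval
        width /= 2                       # reveal one more bit of U
        if rng.getrandbits(1):
            u_lo += width

# Example: a geometric-like computable row on the naturals.
geometric_row = lambda: ((k, Fraction(1, 2 ** (k + 1))) for k in itertools.count())
print(sample_row(geometric_row, random.Random(0)))
```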
Corollary 3.1 (Finite case)
If $X$ and $Y$ are finite (an implementable NN with finite input/output domains), then the Markov chain constructed in Theorem 2.1 is finite. Conversely, any finite Markov chain can be implemented exactly by a small finite program as in Theorem 2.2 (or by a one-hot input linear layer that outputs the row).
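A short sketch of the finite-case remark (the 3-state matrix is an arbitrary example, and numpy is used purely for convenience): multiplying a one-hot encoding of the current state by the transition matrix yields exactly the next-step row, which is then sampled.

```python
import numpy as np

# Finite case of Corollary 3.1: a one-hot encoding of the current state times
# the transition matrix is exactly the row P(i, .), which is then sampled.
# The 3-state matrix below is an arbitrary example.

P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.5, 0.5]])

def one_hot(i: int, n: int) -> np.ndarray:
    e = np.zeros(n)
    e[i] = 1.0
    return e

rng = np.random.default_rng(0)
row = one_hot(1, 3) @ P              # the "linear layer" output: row P(1, .)
next_state = rng.choice(3, p=row)    # sample the next state from that row
print(next_state)
```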
Remark 3.2 (Computability caveat)
If a distribution contains noncomputable real probabilities (or irrational entries not representable in the chosen machine model), no finite implementable program can reproduce those exact reals; the appropriate interpretation is that we compare machine-level behaviors and computable distributions. The two constructions are symmetric with respect to this limitation: exact equivalence holds under the natural computability/representability assumptions stated above.
Remark 3.3 (Extensional vs. intensional)
These theorems are extensional representation results: they match observable input–output (distributional) behavior. They do not claim preservation of computational complexity, internal structure, training dynamics, or other intensional properties.
Under the usual implementability assumptions (computably countable domains and computable probabilities), implementable neural-network–style programs and computable countable Markov chains are extensionally inter-translatable by the constructions above. The conversions are constructive and immediate, but they are existence/representation results and do not imply practical equivalence in efficiency or convenience.