Implementable Neural Networks and Countable Markov Chains:
Formal Constructions and Proofs

We give formal, constructive proofs that (1) every implementable neural-network–style program with computably countable input/output sets is extensionally equivalent to a countable Markov chain, and (2) every computable countable Markov chain can be implemented by a finite program (an "implementable neural network" with access to randomness) which reproduces the chain's next-step distribution. We state precise assumptions and provide constructions that make the extensional equivalence explicit.

1. Model assumptions and definitions

Definition 1.1 (Computably countable)

A set $A$ is computably countable if there exists a computable injection $\mathrm{enc}_A : A \to \{0,1\}^*$ (finite binary strings). Equivalently, $A$ can be put in computable bijection with a subset of $\mathbb{N}$.
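As a concrete illustration (not part of the definition), here is a minimal Python sketch of a computable injection from $\mathbb{N} \times \mathbb{N}$ into binary strings, using the standard Cantor pairing function; the names `cantor_pair` and `enc` are ours, chosen for the example:

```python
def cantor_pair(m: int, n: int) -> int:
    """Standard Cantor pairing: a computable bijection N x N -> N."""
    return (m + n) * (m + n + 1) // 2 + n

def enc(m: int, n: int) -> str:
    """A computable injection N x N -> {0,1}*: pair, then write in binary."""
    return format(cantor_pair(m, n), "b")
```

Since `cantor_pair` is a bijection and binary notation is injective on $\mathbb{N}$, `enc` is injective, which is all Definition 1.1 requires.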

Definition 1.2 (Implementable neural network (informal))

An implementable neural network $N$ is any finite, deterministic program (finite description executable on a digital computer) that, on input $\mathrm{enc}_X(x) \in \{0,1\}^*$ for $x$ in a computably countable input set $X$, either:

  1. outputs deterministically an encoding $\mathrm{enc}_Y(y)$ for some $y \in Y$, or
  2. uses a source of uniform randomness and produces a sample from a probability distribution $\Pr_N(\cdot \mid x)$ over a computably countable output set $Y$.

We will treat such a program as defining, for every $x \in X$, a computable distribution $\Pr_N(\cdot \mid x)$ on $Y$.

Definition 1.3 (Countable Markov chain)

A (discrete-time) countable Markov chain is a pair $(S, P)$ where $S$ is a countable set of states and $P : S \times S \to [0,1]$ satisfies $\sum_{s' \in S} P(s' \mid s) = 1$ for all $s \in S$. We say the chain is computable if $S$ is computably countable and the function $(s, s') \mapsto P(s' \mid s)$ is a computable real-valued function (or each row is a computable distribution).

Definition 1.4 (Extensional equivalence)

Let $X, Y$ be computably countable sets. Given an implementable NN $N$ with conditional output distributions $\Pr_N(\cdot \mid x)$ for $x \in X$, a countable Markov chain $(S, P)$, an initialization map $\iota : X \to S$, and a read map $\rho : S \to Y$, we say $(S, P, \iota, \rho)$ is extensionally equivalent to $N$ iff for every $x \in X$ and every $y \in Y$,

$$\Pr_N(y \mid x) = \Pr\bigl(\rho(S_1) = y \mid S_0 = \iota(x)\bigr),$$

i.e. the distribution on $Y$ produced by $N$ on input $x$ equals the distribution on $Y$ obtained by starting the chain at $\iota(x)$ and reading via $\rho$ after one step (or some fixed finite read time).

2. Main constructions and proofs

Theorem 2.1 (Implementable NN $\Rightarrow$ countable Markov chain)

Let $N$ be an implementable neural network with computably countable input set $X$ and output set $Y$, and let $\Pr_N(\cdot \mid x)$ denote the distribution $N$ samples from on input $x$. Then there exists a computable countable Markov chain $(S, P)$, a computable initialization map $\iota : X \to S$, and a computable read map $\rho : S \to Y$ such that for every $x \in X$ and $y \in Y$,

$$\Pr_N(y \mid x) = \Pr\bigl(\rho(S_1) = y \mid S_0 = \iota(x)\bigr).$$

Moreover, if $X$ and $Y$ are finite then $S$ can be taken finite.

Proof. We give a constructive finite-step realization.

State space. Define

$$S := \{(x, \mathrm{in}) : x \in X\} \;\cup\; \{(y, \mathrm{out}) : y \in Y\} \;\cup\; S_{\mathrm{int}},$$

where $S_{\mathrm{int}}$ is an optional (possibly empty) countable set of internal computation states used if one wishes to model the NN's multi-step internal computation. Since $X$ and $Y$ are computably countable and $S_{\mathrm{int}}$ is at most countable, $S$ is computably countable.

Initialization and read maps. Define the initialization map $\iota : X \to S$ by $\iota(x) = (x, \mathrm{in})$. Define the read map $\rho : S \to Y$ by $\rho((y, \mathrm{out})) = y$ for $(y, \mathrm{out}) \in S$, and define $\rho$ arbitrarily on other states (they will have zero probability at the read time in the simple construction).

Transition matrix. Take $S_{\mathrm{int}} = \emptyset$ in the simple construction and define $P : S \times S \to [0,1]$ by: $P((y, \mathrm{out}) \mid (x, \mathrm{in})) := \Pr_N(y \mid x)$ for all $x \in X$ and $y \in Y$; $P((y, \mathrm{out}) \mid (y, \mathrm{out})) := 1$ for all $y \in Y$ (output states are absorbing); and $P(s' \mid s) := 0$ in all other cases.

Because each $\Pr_N(\cdot \mid x)$ is a probability distribution, the rows for $(x, \mathrm{in})$ sum to 1, and hence $P$ defines a valid Markov chain. If the NN's distributions are computable then $P$ is computable.

Correctness. Start the chain at $S_0 = \iota(x) = (x, \mathrm{in})$. By construction,

$$\Pr\bigl(S_1 = (y, \mathrm{out}) \mid S_0 = (x, \mathrm{in})\bigr) = P\bigl((y, \mathrm{out}) \mid (x, \mathrm{in})\bigr) = \Pr_N(y \mid x).$$

Applying $\rho$ at time 1 yields $\rho(S_1) = y$ with exactly the same probability. Thus $(S, P, \iota, \rho)$ is extensionally equivalent to $N$, as required. If $X, Y$ are finite and $S_{\mathrm{int}} = \emptyset$, then $S$ is finite and the chain is finite.
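A minimal Python sketch of this construction, assuming the NN's behavior is given as a table of output distributions $\Pr_N(\cdot \mid x)$; the dictionary `nn_dist` and the state labels `"in"`/`"out"` are hypothetical names for the example:

```python
# Hypothetical NN behavior: for each input x, a distribution over outputs y.
nn_dist = {
    "x0": {"y0": 0.5, "y1": 0.5},
    "x1": {"y0": 0.1, "y1": 0.9},
}

# States are ("in", x) and ("out", y); S_int is taken empty.
iota = lambda x: ("in", x)   # initialization map: x -> (x, in)
rho = lambda s: s[1]         # read map on ("out", y) states: (y, out) -> y

def P(s_next, s):
    """Transition probability P(s_next | s) of the constructed chain."""
    if s[0] == "in":
        # From an input state, emit an output state with probability Pr_N(y | x).
        return nn_dist[s[1]].get(s_next[1], 0.0) if s_next[0] == "out" else 0.0
    if s[0] == "out":
        # Output states are absorbing, so their rows also sum to 1.
        return 1.0 if s_next == s else 0.0
    return 0.0
```

One can check directly that every row of `P` sums to 1 and that `P(("out", y), iota(x))` equals `nn_dist[x][y]`, which is exactly the correctness step of the proof.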

Theorem 2.2 (Countable Markov chain $\Rightarrow$ implementable NN (with RNG))

Let $(S, P)$ be a computable countable Markov chain: $S$ is computably countable and for each $s \in S$, the row $P(\cdot \mid s)$ is a computable probability distribution on $S$. Then there exists a finite program $N$ (an implementable "neural-network–style" program that may use uniform randomness) such that, on input $\mathrm{enc}_S(s)$, $N$ outputs a sample distributed according to $P(\cdot \mid s)$. Equivalently, $N$ reproduces the chain's one-step transition distribution exactly.

Proof. Because $S$ is computably countable, fix a computable bijection (enumeration) $s_1, s_2, \dots$ of $S$ and a computable encoding $\mathrm{enc}_S : S \to \{0,1\}^*$.

We construct a finite program $N$ that implements inverse-CDF sampling from the row $P(\cdot \mid s)$ given $s$:

  1. Input: $w = \mathrm{enc}_S(s)$. Decode $w$ to recover $s$ and its index $i$ in the enumeration.
  2. Lookup: Compute (or retrieve) the sequence of probabilities $p_k := P(s_k \mid s)$ for $k = 1, 2, \dots$. By hypothesis these values are computable reals; on a real computer they can be represented to the machine precision used (for exactness we assume the $p_k$ are representable in that precision or given as exact rationals).
  3. Cumulative sums: Compute cumulative sums $c_k := \sum_{j=1}^{k} p_j$ (with $c_0 := 0$). Since the row sums to 1, we have $\lim_{k \to \infty} c_k = 1$.
  4. Sample: Draw $u \sim \mathrm{Uniform}[0,1)$ from the RNG. Find the least index $j$ such that $u < c_j$. Return $\mathrm{enc}_S(s_j)$.

This finite algorithm uses a finite description, a computable lookup for the row $P(\cdot \mid s)$, and standard RNG; it therefore qualifies as an implementable program in our sense (an implementable NN with RNG). By construction, the sampled $s_j$ has probability $p_j = P(s_j \mid s)$. Thus $N$ reproduces the next-step distribution of $(S, P)$ exactly (subject to the representability/computability caveat below).
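The sampling steps above can be sketched in Python. For testability the uniform draw $u$ is passed in explicitly (in an actual implementation it would come from the RNG), and the row is given as a function $k \mapsto p_k$ so that countably infinite rows are covered as well:

```python
def inverse_cdf_sample(row, u):
    """Return the least 1-based index j with u < c_j, where c_j = p_1 + ... + p_j.

    `row` maps a 1-based index k to p_k = P(s_k | s); `u` lies in [0, 1).
    The loop terminates with probability 1 because the cumulative sums
    converge to 1, so some partial sum eventually exceeds u.
    """
    c, j = 0.0, 0
    while True:
        j += 1
        c += row(j)
        if u < c:
            return j
```

For example, with the geometric row $p_k = 2^{-k}$ the cumulative sums are $c_k = 1 - 2^{-k}$, so $u = 0.7$ falls between $c_1 = 0.5$ and $c_2 = 0.75$ and the sampler returns $j = 2$.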

If the chain is finite, the lookup can be implemented by a finite matrix and the program is straightforward to implement precisely. For infinite countable $S$, the same sampling procedure is implementable so long as each row is a computable distribution (so the cumulative sums can be computed to the precision required by the RNG).

3. Corollaries and remarks

Corollary 3.1 (Finite case)

If $X$ and $Y$ are finite (implementable NN with finite input/output domains), then the Markov chain constructed in Theorem 2.1 is finite. Conversely, any finite Markov chain can be implemented exactly by a small finite program as in Theorem 2.2 (or by a one-hot input linear layer that outputs the row).
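A minimal pure-Python sketch of the one-hot construction mentioned above, assuming a hypothetical 3-state transition matrix `P_mat`: multiplying a one-hot encoding of state $i$ by the matrix reads off row $i$, i.e. the distribution $P(\cdot \mid s_i)$.

```python
# Hypothetical 3-state transition matrix (each row sums to 1).
P_mat = [
    [0.1, 0.6, 0.3],
    [0.5, 0.5, 0.0],
    [0.0, 0.2, 0.8],
]

def one_hot(i, n):
    """One-hot encoding of state index i among n states."""
    return [1.0 if j == i else 0.0 for j in range(n)]

def linear_layer(x, W):
    """y = x @ W: a single linear layer with weight matrix W and no bias."""
    return [sum(x[i] * W[i][j] for i in range(len(W))) for j in range(len(W[0]))]
```

Here `linear_layer(one_hot(i, 3), P_mat)` returns row $i$ of `P_mat` exactly, so the layer's output is the one-step distribution, which can then be sampled as in Theorem 2.2.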

Remark 3.2 (Computability caveat)

If a distribution contains noncomputable real probabilities (or irrational entries not representable in the chosen machine model), no finite implementable program can reproduce those exact reals; the appropriate interpretation is that we compare machine-level behaviors and computable distributions. The two constructions are symmetric with respect to this limitation: exact equivalence holds under the natural computability/representability assumptions stated above.

Remark 3.3 (Extensional vs. intensional)

These theorems are extensional representation results: they match observable input–output (distributional) behavior. They do not claim preservation of computational complexity, internal structure, training dynamics, or other intensional properties.

4. Conclusion

Under the usual implementability assumptions (computably countable domains and computable probabilities), implementable neural-network–style programs and computable countable Markov chains are extensionally inter-translatable by the constructions above. The conversions are constructive and immediate, but they are existence/representation results and do not imply practical equivalence in efficiency or convenience.