Implementable Neural Networks and Countable Markov Chains:
Formal Constructions and Proofs

We give formal, constructive proofs that (1) every implementable neural-network–style program with computably countable input/output sets is extensionally equivalent to a countable Markov chain, and (2) every computable countable Markov chain can be implemented by a finite program (an "implementable neural network" with access to randomness) which reproduces the chain's next-step distribution. We state precise assumptions and provide constructions that make the extensional equivalence explicit.

1. Model assumptions and definitions

Definition 1.1 (Computably countable)

A set $A$ is computably countable if there exists a computable injection $\mathrm{enc}_A : A \to \{0,1\}^*$ (finite binary strings). Equivalently, $A$ can be put in computable bijection with a subset of $\mathbb{N}$.
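As a concrete illustration (not part of the definition), here is a minimal Python sketch of a computable injection from $\mathbb{N} \times \mathbb{N}$ into binary strings, using the standard Cantor pairing function; the names `cantor_pair` and `enc` are ours, chosen for the example:

```python
def cantor_pair(m: int, n: int) -> int:
    """Standard Cantor pairing: a computable bijection N x N -> N."""
    return (m + n) * (m + n + 1) // 2 + n

def enc(m: int, n: int) -> str:
    """A computable injection N x N -> {0,1}*: pair, then write in binary."""
    return format(cantor_pair(m, n), "b")
```

Since `cantor_pair` is a bijection and binary notation is injective on $\mathbb{N}$, `enc` is injective, which is all Definition 1.1 requires.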

Definition 1.2 (Implementable neural network (informal))

An implementable neural network $N$ is any finite, deterministic program (finite description executable on a digital computer) that, on input $\mathrm{enc}_X(x) \in \{0,1\}^*$ for $x$ in a computably countable input set $X$, either:

  1. outputs deterministically an encoding $\mathrm{enc}_Y(y)$ for some $y \in Y$, or
  2. uses a source of uniform randomness and produces a sample from a probability distribution $\Pr_N(\cdot \mid x)$ over a computably countable output set $Y$.

We will treat such a program as defining, for every $x \in X$, a computable distribution $\Pr_N(\cdot \mid x)$ on $Y$.

Definition 1.3 (Countable Markov chain)

A (discrete-time) countable Markov chain is a pair $(S, P)$ where $S$ is a countable set of states and $P : S \times S \to [0,1]$ satisfies $\sum_{s' \in S} P(s' \mid s) = 1$ for all $s \in S$. We say the chain is computable if $S$ is computably countable and the function $(s, s') \mapsto P(s' \mid s)$ is a computable real-valued function (or each row is a computable distribution).

Definition 1.4 (Extensional equivalence)

Let $X, Y$ be computably countable sets. Given an implementable NN $N$ with conditional output distributions $\Pr_N(\cdot \mid x)$ for $x \in X$, a countable Markov chain $(S, P)$, an initialization map $\iota : X \to S$, and a read map $\rho : S \to Y$, we say $(S, P, \iota, \rho)$ is extensionally equivalent to $N$ iff for every $x \in X$ and every $y \in Y$,

$$\Pr_N(y \mid x) = \Pr\bigl(\rho(S_1) = y \mid S_0 = \iota(x)\bigr),$$

i.e. the distribution on $Y$ produced by $N$ on input $x$ equals the distribution on $Y$ obtained by starting the chain at $\iota(x)$ and reading via $\rho$ after one step (or some fixed finite read time).

2. Main constructions and proofs

Theorem 2.1 (Implementable NN $\Rightarrow$ countable Markov chain)

Let $N$ be an implementable neural network with computably countable input set $X$ and output set $Y$, and let $\Pr_N(\cdot \mid x)$ denote the distribution $N$ samples from on input $x$. Then there exists a computable countable Markov chain $(S, P)$, a computable initialization map $\iota : X \to S$, and a computable read map $\rho : S \to Y$ such that for every $x \in X$ and $y \in Y$,

$$\Pr_N(y \mid x) = \Pr\bigl(\rho(S_1) = y \mid S_0 = \iota(x)\bigr).$$

Moreover, if $X$ and $Y$ are finite then $S$ can be taken finite.

Proof. We give a constructive finite-step realization.

State space. Define

$$S := \{(x, \mathrm{in}) : x \in X\} \;\cup\; \{(y, \mathrm{out}) : y \in Y\} \;\cup\; S_{\mathrm{int}},$$

where $S_{\mathrm{int}}$ is an optional (possibly empty) countable set of internal computation states used if one wishes to model the NN's multi-step internal computation. Since $X$ and $Y$ are computably countable and $S_{\mathrm{int}}$ is at most countable, $S$ is computably countable.

Initialization and read maps. Define the initialization map $\iota : X \to S$ by $\iota(x) = (x, \mathrm{in})$. Define the read map $\rho : S \to Y$ by $\rho((y, \mathrm{out})) = y$ for $(y, \mathrm{out}) \in S$, and define $\rho$ arbitrarily on other states (they will have zero probability at the read time in the simple construction).

Transition matrix. Take $S_{\mathrm{int}} = \emptyset$ in the simple construction and define $P : S \times S \to [0,1]$ by: $P((y, \mathrm{out}) \mid (x, \mathrm{in})) := \Pr_N(y \mid x)$ for all $x \in X$ and $y \in Y$; $P((y, \mathrm{out}) \mid (y, \mathrm{out})) := 1$ for all $y \in Y$ (output states are absorbing); and $P(s' \mid s) := 0$ in all other cases.

Because each $\Pr_N(\cdot \mid x)$ is a probability distribution, the rows for $(x, \mathrm{in})$ sum to 1, and hence $P$ defines a valid Markov chain. If the NN's distributions are computable then $P$ is computable.

Correctness. Start the chain at $S_0 = \iota(x) = (x, \mathrm{in})$. By construction,

$$\Pr\bigl(S_1 = (y, \mathrm{out}) \mid S_0 = (x, \mathrm{in})\bigr) = P\bigl((y, \mathrm{out}) \mid (x, \mathrm{in})\bigr) = \Pr_N(y \mid x).$$

Applying $\rho$ at time 1 yields $\rho(S_1) = y$ with exactly the same probability. Thus $(S, P, \iota, \rho)$ is extensionally equivalent to $N$, as required. If $X, Y$ are finite and $S_{\mathrm{int}} = \emptyset$, then $S$ is finite and the chain is finite.
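A minimal Python sketch of this construction, assuming the NN's behavior is given as a table of output distributions $\Pr_N(\cdot \mid x)$; the dictionary `nn_dist` and the state labels `"in"`/`"out"` are hypothetical names for the example:

```python
# Hypothetical NN behavior: for each input x, a distribution over outputs y.
nn_dist = {
    "x0": {"y0": 0.5, "y1": 0.5},
    "x1": {"y0": 0.1, "y1": 0.9},
}

# States are ("in", x) and ("out", y); S_int is taken empty.
iota = lambda x: ("in", x)   # initialization map: x -> (x, in)
rho = lambda s: s[1]         # read map on ("out", y) states: (y, out) -> y

def P(s_next, s):
    """Transition probability P(s_next | s) of the constructed chain."""
    if s[0] == "in":
        # From an input state, emit an output state with probability Pr_N(y | x).
        return nn_dist[s[1]].get(s_next[1], 0.0) if s_next[0] == "out" else 0.0
    if s[0] == "out":
        # Output states are absorbing, so their rows also sum to 1.
        return 1.0 if s_next == s else 0.0
    return 0.0
```

One can check directly that every row of `P` sums to 1 and that `P(("out", y), iota(x))` equals `nn_dist[x][y]`, which is exactly the correctness step of the proof.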

Theorem 2.2 (Countable Markov chain $\Rightarrow$ implementable NN (with RNG))

Let $(S, P)$ be a computable countable Markov chain: $S$ is computably countable and for each $s \in S$, the row $P(\cdot \mid s)$ is a computable probability distribution on $S$. Then there exists a finite program $N$ (an implementable "neural-network–style" program that may use uniform randomness) such that, on input $\mathrm{enc}_S(s)$, $N$ outputs a sample distributed according to $P(\cdot \mid s)$. Equivalently, $N$ reproduces the chain's one-step transition distribution exactly.

Proof. Because $S$ is computably countable, fix a computable bijection (enumeration) $s_1, s_2, \dots$ of $S$ and a computable encoding $\mathrm{enc}_S : S \to \{0,1\}^*$.

We construct a finite program $N$ that implements inverse-CDF sampling from the row $P(\cdot \mid s)$ given $s$:

  1. Input: $w = \mathrm{enc}_S(s)$. Decode $w$ to recover $s$ and its index $i$ in the enumeration.
  2. Lookup: Compute (or retrieve) the sequence of probabilities $p_k := P(s_k \mid s)$ for $k = 1, 2, \dots$. By hypothesis these values are computable reals; on a real computer they can be represented to the machine precision used (for exactness we assume the $p_k$ are representable in that precision or given as exact rationals).
  3. Cumulative sums: Compute cumulative sums $c_k := \sum_{j=1}^{k} p_j$ (with $c_0 := 0$). Since the row sums to 1, we have $\lim_{k \to \infty} c_k = 1$.
  4. Sample: Draw $u \sim \mathrm{Uniform}[0,1)$ from the RNG. Find the least index $j$ such that $u < c_j$. Return $\mathrm{enc}_S(s_j)$.

This finite algorithm uses a finite description, a computable lookup for the row $P(\cdot \mid s)$, and standard RNG; it therefore qualifies as an implementable program in our sense (an implementable NN with RNG). By construction, the sampled $s_j$ has probability $p_j = P(s_j \mid s)$. Thus $N$ reproduces the next-step distribution of $(S, P)$ exactly (subject to the representability/computability caveat below).
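The sampling steps above can be sketched in Python. For testability the uniform draw $u$ is passed in explicitly (in an actual implementation it would come from the RNG), and the row is given as a function $k \mapsto p_k$ so that countably infinite rows are covered as well:

```python
def inverse_cdf_sample(row, u):
    """Return the least 1-based index j with u < c_j, where c_j = p_1 + ... + p_j.

    `row` maps a 1-based index k to p_k = P(s_k | s); `u` lies in [0, 1).
    The loop terminates with probability 1 because the cumulative sums
    converge to 1, so some partial sum eventually exceeds u.
    """
    c, j = 0.0, 0
    while True:
        j += 1
        c += row(j)
        if u < c:
            return j
```

For example, with the geometric row $p_k = 2^{-k}$ the cumulative sums are $c_k = 1 - 2^{-k}$, so $u = 0.7$ falls between $c_1 = 0.5$ and $c_2 = 0.75$ and the sampler returns $j = 2$.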

If the chain is finite, the lookup can be implemented by a finite matrix and the program is straightforward to implement precisely. For infinite countable $S$, the same sampling procedure is implementable so long as each row is a computable distribution (so the cumulative sums can be computed to the precision required by the RNG).

3. Corollaries and remarks

Corollary 3.1 (Finite case)

If $X$ and $Y$ are finite (implementable NN with finite input/output domains), then the Markov chain constructed in Theorem 2.1 is finite. Conversely, any finite Markov chain can be implemented exactly by a small finite program as in Theorem 2.2 (or by a one-hot input linear layer that outputs the row).
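A minimal pure-Python sketch of the one-hot construction mentioned above, assuming a hypothetical 3-state transition matrix `P_mat`: multiplying a one-hot encoding of state $i$ by the matrix reads off row $i$, i.e. the distribution $P(\cdot \mid s_i)$.

```python
# Hypothetical 3-state transition matrix (each row sums to 1).
P_mat = [
    [0.1, 0.6, 0.3],
    [0.5, 0.5, 0.0],
    [0.0, 0.2, 0.8],
]

def one_hot(i, n):
    """One-hot encoding of state index i among n states."""
    return [1.0 if j == i else 0.0 for j in range(n)]

def linear_layer(x, W):
    """y = x @ W: a single linear layer with weight matrix W and no bias."""
    return [sum(x[i] * W[i][j] for i in range(len(W))) for j in range(len(W[0]))]
```

Here `linear_layer(one_hot(i, 3), P_mat)` returns row $i$ of `P_mat` exactly, so the layer's output is the one-step distribution, which can then be sampled as in Theorem 2.2.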

Remark 3.2 (Computability caveat)

If a distribution contains noncomputable real probabilities (or irrational entries not representable in the chosen machine model), no finite implementable program can reproduce those exact reals; the appropriate interpretation is that we compare machine-level behaviors and computable distributions. The two constructions are symmetric with respect to this limitation: exact equivalence holds under the natural computability/representability assumptions stated above.

Remark 3.3 (Extensional vs. intensional)

These theorems are extensional representation results: they match observable input–output (distributional) behavior. They do not claim preservation of computational complexity, internal structure, training dynamics, or other intensional properties.

4. Conclusion

Under the usual implementability assumptions (computably countable domains and computable probabilities), implementable neural-network–style programs and computable countable Markov chains are extensionally inter-translatable by the constructions above. The conversions are constructive and immediate, but they are existence/representation results and do not imply practical equivalence in efficiency or convenience.