Random Forest

The HCPN framework enables a natural representation of the Random Forest: each individual tree is an instance of the leaf module \(s_{DT}\), and the \(N\) instances are composed into a top-level module via substitution transitions, port-socket bindings, and fusion places.

HCPN Formalization

For a Random Forest with \(N\) trees, the HCPN is:

\[HCPN_{RF} = (S_{RF}, \; SM_{RF}, \; PS_{RF}, \; FS_{RF})\]

Modules \(S_{RF} = \{s_{RF}, \; s_{DT}\}\):

  • \(s_{RF}\) — the top-level module containing the voting logic.

  • \(s_{DT}\) — the leaf module (see Decision Tree), reused \(N\) times.

Top-level module \(s_{RF}\):

\[s_{RF} = (CPN_{RF}, \; T_{sub} = \{t_{tree_1}, \dots, t_{tree_N}\}, \; P_{port} = \{P_{in}, P_{out}\}, \; PT)\]

with:

  • \(P = \{P_{in}, P_{collect}, P_{out}\}\) — input, collector, and output places.

  • \(C(P_{in}) = \text{VEC}\), \(C(P_{collect}) = \text{PROB}\), \(C(P_{out}) = \text{LABEL}\).

  • \(T = \{t_{tree_1}, \dots, t_{tree_N}, \; t_{vote}\}\) — \(N\) substitution transitions plus one regular aggregation transition.

  • \(t_{vote}\) — a regular transition (not in \(T_{sub}\)) that performs soft voting.

Submodule function \(SM_{RF}\):

\[SM_{RF}(t_{tree_k}) = s_{DT} \quad \text{for } k = 1, \dots, N\]

Each substitution transition \(t_{tree_k}\) is refined into the DT leaf module. Although the same module definition \(s_{DT}\) is referenced \(N\) times, each instance carries its own rule set (extracted from the \(k\)-th tree in the forest).
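This reuse pattern can be sketched in plain Python. The sketch is illustrative only (the class and function names are not from the formalization): one module definition plays the role of \(s_{DT}\), and each instance carries the rule set of its tree.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Rule:
    guard: Callable[[tuple], bool]  # predicate over the input vector x
    dist: Tuple[float, ...]         # normalised class distribution at the leaf

@dataclass
class DTModule:
    """An instance of the leaf module s_DT with its own rule set."""
    rules: List[Rule]

def instantiate_forest(rule_sets):
    # SM_RF(t_tree_k) = s_DT for every k: one module definition,
    # N instances, each parameterised by the k-th tree's rules.
    return [DTModule(rules=rs) for rs in rule_sets]

forest = instantiate_forest([
    [Rule(guard=lambda x: x[0] <= 5.0, dist=(0.8, 0.2))],
    [Rule(guard=lambda x: x[1] > 4.0, dist=(0.6, 0.4))],
])
```

The point of the sketch is that adding a tree only appends one more entry to the list of rule sets; the module definition itself never changes.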

Port-socket bindings \(PS_{RF}\):

For each substitution transition \(t_{tree_k}\):

  • The In port \(P_{in}\) of \(s_{DT}\) is bound to socket \(P_{in}\) in \(s_{RF}\).

  • The Out port \(P_{out}\) of \(s_{DT}\) is bound to socket \(P_{collect}\) in \(s_{RF}\).

Colour compatibility is ensured: the DT module’s output port produces \(\text{PROB}\) vectors (the normalised class distribution at the matched leaf), matching \(C(P_{collect}) = \text{PROB}\).

Note

In the RF context, the DT module’s output arc expression is adapted: instead of depositing a discrete label, it yields the normalised class distribution \(\mathbf{p}_k \in [0,1]^{|C|}\) at the matched leaf. The port colour is therefore \(\text{PROB}\) rather than \(\text{LABEL}\).
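The normalised class distribution at a leaf is simply the leaf's class counts divided by their sum. A minimal sketch (the function name is illustrative):

```python
def leaf_distribution(class_counts):
    """Normalise raw class counts at a matched leaf into a PROB token."""
    total = sum(class_counts)
    return tuple(c / total for c in class_counts)

# e.g. a leaf holding 8 class-0 samples and 2 class-1 samples
p = leaf_distribution([8, 2])  # (0.8, 0.2), a valid PROB token
```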

Fusion set \(FS_{RF}\):

\[FS_{RF} = \{\{P_{in}\}\}\]

The input place \(P_{in}\) is a fusion place shared across the top module and all \(N\) DT sub-module instances. Every tree evaluates the same input token \(x\) without duplicating it.

Soft Voting (transition \(t_{vote}\)):

The aggregation transition \(t_{vote}\) consumes all \(N\) probability tokens from \(P_{collect}\) (enabled only when \(|P_{collect}| \geq N\)). Its output arc expression computes:

\[\bar{\mathbf{p}} = \frac{1}{N} \sum_{k=1}^{N} \mathbf{p}_k, \qquad v_{final} = \arg\max_{c} \; \bar{p}_c\]

This matches the default behaviour of scikit-learn’s RandomForestClassifier, which uses soft voting via predict_proba averaging.
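The output arc expression of \(t_{vote}\) can be sketched as a small function over the collected PROB tokens; this is a minimal illustration of the averaging-plus-argmax rule, not the HCPN tool's actual arc syntax:

```python
def soft_vote(prob_tokens):
    """Average N probability tokens and return (mean vector, argmax label)."""
    n, m = len(prob_tokens), len(prob_tokens[0])
    # component-wise mean over the N tokens consumed from P_collect
    avg = tuple(sum(p[c] for p in prob_tokens) / n for c in range(m))
    # winning class = index of the largest averaged probability
    return avg, max(range(m), key=avg.__getitem__)

avg, label = soft_vote([(0.8, 0.2), (0.6, 0.4)])
# avg is approximately (0.70, 0.30); label is 0
```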

The following figure shows the parallel composition. Each sub-page \(Tree_k\) is an independent CPN module whose output probability vector is collected in \(P_{collector}\) (the figure's label for \(P_{collect}\)). The aggregation transition \(T_{voter}\) (the figure's counterpart of \(t_{vote}\)) computes the averaged probabilities and deposits the winning class in \(P_{final\_class}\).

Random Forest HCPN diagram

HCPN for Random Forest: \(N\) substitution transitions (each refining to a DT module) feed probability vectors into a collector place, followed by a soft-voting aggregation transition.

Note

The figure labels the aggregation transition as “Mode Function” for visual simplicity. In the actual implementation, soft voting (probability averaging) is used as the primary path; hard voting (mode) exists only as a fallback when class distribution data is unavailable.

Numerical Induction Example

The following example demonstrates that the HCPN construction preserves correct soft-voting semantics for a forest with \(N\) trees and remains valid when extended to \(N+1\) trees.

Setup: Binary classification with classes \(\{0, 1\}\). Input \(x = (3.0, \; 7.0)\).

Base case \((N = 2)\):

Two decision trees, each producing a probability vector at its matched leaf:

| Tree | Matched rule guard | Output \(\mathbf{p}_k\) |
| --- | --- | --- |
| \(T_1\) | \(x_1 \leq 5.0\) (true) | \((0.8, \; 0.2)\) |
| \(T_2\) | \(x_2 > 4.0\) (true) | \((0.6, \; 0.4)\) |

HCPN state trace:

  1. \(P_{in}\) holds token \(x = (3.0, 7.0)\). Fusion set shares it with both DT sub-modules.

  2. \(t_{tree_1}\) fires (substitution transition refines to \(s_{DT}^{(1)}\)): one rule matches, deposits \((0.8, 0.2)\) in \(P_{collect}\).

  3. \(t_{tree_2}\) fires (substitution transition refines to \(s_{DT}^{(2)}\)): one rule matches, deposits \((0.6, 0.4)\) in \(P_{collect}\).

  4. \(P_{collect}\) now contains 2 tokens. \(t_{vote}\) is enabled (requires \(N = 2\) tokens):

\[\bar{\mathbf{p}} = \frac{1}{2}\bigl((0.8, 0.2) + (0.6, 0.4)\bigr) = (0.70, \; 0.30)\]
\[v_{final} = \arg\max(0.70, \; 0.30) = 0\]

Result: class \(0\). \(\checkmark\)

Inductive step \((N = 2 \to N + 1 = 3)\):

A third tree \(T_3\) is added to the forest. The HCPN is extended by:

  • Adding a substitution transition \(t_{tree_3}\) to \(T_{sub}\) in the top module.

  • Mapping \(SM_{RF}(t_{tree_3}) = s_{DT}\).

  • Binding its ports: \(P_{in} \leftrightarrow P_{in}\) (via fusion), \(P_{out} \leftrightarrow P_{collect}\).

  • Updating \(t_{vote}\)’s enabling condition to require \(N+1 = 3\) tokens.

| Tree | Matched rule guard | Output \(\mathbf{p}_k\) |
| --- | --- | --- |
| \(T_1\) | \(x_1 \leq 5.0\) (true) | \((0.8, \; 0.2)\) |
| \(T_2\) | \(x_2 > 4.0\) (true) | \((0.6, \; 0.4)\) |
| \(T_3\) | \(x_1 \leq 5.0 \;\wedge\; x_2 > 6.0\) (true) | \((0.3, \; 0.7)\) |

HCPN state trace:

  1. \(P_{in}\) holds \(x\). Fusion shares it with all 3 sub-modules.

  2. Three substitution transitions fire independently. \(P_{collect}\) receives 3 tokens.

  3. \(t_{vote}\) is enabled (requires \(N+1 = 3\) tokens):

\[\bar{\mathbf{p}} = \frac{1}{3}\bigl((0.8, 0.2) + (0.6, 0.4) + (0.3, 0.7)\bigr) = \left(\frac{1.7}{3}, \; \frac{1.3}{3}\right) \approx (0.567, \; 0.433)\]
\[v_{final} = \arg\max(0.567, \; 0.433) = 0\]

Result: class \(0\). \(\checkmark\)
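Both traces can be checked numerically with a few lines of plain Python (an illustrative verification of the arithmetic above, not part of the formalization):

```python
def average(tokens):
    """Component-wise mean of a list of probability vectors."""
    n = len(tokens)
    return tuple(sum(p[c] for p in tokens) / n for c in range(len(tokens[0])))

base = average([(0.8, 0.2), (0.6, 0.4)])                  # N = 2 trace
extended = average([(0.8, 0.2), (0.6, 0.4), (0.3, 0.7)])  # N = 3 trace
# base is approximately (0.70, 0.30); extended approximately (0.567, 0.433)
# argmax elects class 0 in both cases
```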

Why the invariant is preserved: Adding a tree \(T_{N+1}\) to a valid \(N\)-tree HCPN requires only:

  1. One new substitution transition \(t_{tree_{N+1}}\) with \(SM(t_{tree_{N+1}}) = s_{DT}\).

  2. Port-socket bindings identical to those of all existing trees.

  3. The fusion set \(FS\) is unchanged — \(P_{in}\) is already shared.

  4. The voting transition \(t_{vote}\) updates its arc weight from \(N\) to \(N+1\).

The DT leaf module guarantees exactly one rule fires per tree (by the DT invariant). Therefore \(P_{collect}\) always receives exactly \(N+1\) tokens, \(t_{vote}\) fires exactly once, and the averaging formula generalises directly:

\[{\bar{\mathbf{p}}}^{(N+1)} = \frac{1}{N+1}\left(\sum_{k=1}^{N} \mathbf{p}_k + \mathbf{p}_{N+1}\right) = \frac{N}{N+1}\,{\bar{\mathbf{p}}}^{(N)} + \frac{1}{N+1}\,\mathbf{p}_{N+1}\]

This is a convex combination of the previous average and the new tree’s output, which is always well-defined and produces a valid probability vector.
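The identity can be spot-checked numerically: updating the running average incrementally gives the same vector as recomputing it from all \(N+1\) tokens (values taken from the example above):

```python
tokens = [(0.8, 0.2), (0.6, 0.4)]  # the N = 2 forest
p_new = (0.3, 0.7)                 # output of the added tree T_3
n = len(tokens)

avg_n = tuple(sum(p[c] for p in tokens) / n for c in range(2))

# convex combination: N/(N+1) * old average + 1/(N+1) * new token
incremental = tuple(n / (n + 1) * avg_n[c] + 1 / (n + 1) * p_new[c]
                    for c in range(2))

# full recomputation over all N+1 tokens
full = tuple(sum(p[c] for p in tokens + [p_new]) / (n + 1) for c in range(2))
# incremental and full agree (up to floating-point rounding) and sum to 1
```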

Conclusion: By induction on \(N\), the RF HCPN construction correctly computes soft-voting for any number of trees. Each additional tree adds one substitution transition and one probability token, without affecting the existing modules or their determinism guarantees.