Random Forest

The HCPN framework enables a natural representation of the Random Forest: each individual tree is an instance of the leaf module \(s_{DT}\), and the \(N\) instances are composed into a top-level module via substitution transitions, port-socket bindings, and fusion places.

HCPN Formalization

For a Random Forest with \(N\) trees, the HCPN is:

\[HCPN_{RF} = (S_{RF}, \; SM_{RF}, \; PS_{RF}, \; FS_{RF})\]

Modules \(S_{RF} = \{s_{RF}, \; s_{DT}\}\):

  • \(s_{RF}\) — the top-level module containing the voting logic.

  • \(s_{DT}\) — the leaf module (see Decision Tree), reused \(N\) times.

Top-level module \(s_{RF}\):

\[s_{RF} = (CPN_{RF}, \; T_{sub} = \{t_{tree_1}, \dots, t_{tree_N}\}, \; P_{port} = \{P_{in}, P_{out}\}, \; PT)\]

with:

  • \(P = \{P_{in}, P_{collect}, P_{out}\}\) — input, collector, and output places.

  • \(C(P_{in}) = \text{VEC}\), \(C(P_{collect}) = \text{PROB}\), \(C(P_{out}) = \text{LABEL}\).

  • \(T = \{t_{tree_1}, \dots, t_{tree_N}, \; t_{vote}\}\) — \(N\) substitution transitions plus one regular aggregation transition.

  • \(t_{vote}\) — a regular transition (not in \(T_{sub}\)) that performs soft voting.

Submodule function \(SM_{RF}\):

\[SM_{RF}(t_{tree_k}) = s_{DT} \quad \text{for } k = 1, \dots, N\]

Each substitution transition \(t_{tree_k}\) is refined into the DT leaf module. Although the same module definition \(s_{DT}\) is referenced \(N\) times, each instance carries its own rule set (extracted from the \(k\)-th tree in the forest).
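This reuse pattern can be sketched in plain Python. The sketch is illustrative only (the class and function names are not from the formalization): one module definition plays the role of \(s_{DT}\), and each instance carries the rule set of its tree.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Rule:
    guard: Callable[[tuple], bool]  # predicate over the input vector x
    dist: Tuple[float, ...]         # normalised class distribution at the leaf

@dataclass
class DTModule:
    """An instance of the leaf module s_DT with its own rule set."""
    rules: List[Rule]

def instantiate_forest(rule_sets):
    # SM_RF(t_tree_k) = s_DT for every k: one module definition,
    # N instances, each parameterised by the k-th tree's rules.
    return [DTModule(rules=rs) for rs in rule_sets]

forest = instantiate_forest([
    [Rule(guard=lambda x: x[0] <= 5.0, dist=(0.8, 0.2))],
    [Rule(guard=lambda x: x[1] > 4.0, dist=(0.6, 0.4))],
])
```

The point of the sketch is that adding a tree only appends one more entry to the list of rule sets; the module definition itself never changes.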

Port-socket bindings \(PS_{RF}\):

For each substitution transition \(t_{tree_k}\):

  • The In port \(P_{in}\) of \(s_{DT}\) is bound to socket \(P_{in}\) in \(s_{RF}\).

  • The Out port \(P_{out}\) of \(s_{DT}\) is bound to socket \(P_{collect}\) in \(s_{RF}\).

Colour compatibility is ensured: the DT module’s output port produces \(\text{PROB}\) vectors (the normalised class distribution at the matched leaf), matching \(C(P_{collect}) = \text{PROB}\).

Note

In the RF context, the DT module’s output arc expression is adapted: instead of depositing a discrete label, it yields the normalised class distribution \(\mathbf{p}_k \in [0,1]^{|C|}\) at the matched leaf. The port colour is therefore \(\text{PROB}\) rather than \(\text{LABEL}\).
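The normalised class distribution at a leaf is simply the leaf's class counts divided by their sum. A minimal sketch (the function name is illustrative):

```python
def leaf_distribution(class_counts):
    """Normalise raw class counts at a matched leaf into a PROB token."""
    total = sum(class_counts)
    return tuple(c / total for c in class_counts)

# e.g. a leaf holding 8 class-0 samples and 2 class-1 samples
p = leaf_distribution([8, 2])  # (0.8, 0.2), a valid PROB token
```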

Fusion set \(FS_{RF}\):

\[FS_{RF} = \{\{P_{in}\}\}\]

The input place \(P_{in}\) is a fusion place shared across the top module and all \(N\) DT sub-module instances. Every tree evaluates the same input token \(x\) without duplicating it.

Soft Voting (transition \(t_{vote}\)):

The aggregation transition \(t_{vote}\) consumes all \(N\) probability tokens from \(P_{collect}\) (enabled only when \(|P_{collect}| \geq N\)). Its output arc expression computes:

\[\bar{\mathbf{p}} = \frac{1}{N} \sum_{k=1}^{N} \mathbf{p}_k, \qquad v_{final} = \arg\max_{c} \; \bar{p}_c\]

This matches the default behaviour of scikit-learn’s RandomForestClassifier, which uses soft voting via predict_proba averaging.
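The output arc expression of \(t_{vote}\) can be sketched as a small function over the collected PROB tokens; this is a minimal illustration of the averaging-plus-argmax rule, not the HCPN tool's actual arc syntax:

```python
def soft_vote(prob_tokens):
    """Average N probability tokens and return (mean vector, argmax label)."""
    n, m = len(prob_tokens), len(prob_tokens[0])
    # component-wise mean over the N tokens consumed from P_collect
    avg = tuple(sum(p[c] for p in prob_tokens) / n for c in range(m))
    # winning class = index of the largest averaged probability
    return avg, max(range(m), key=avg.__getitem__)

avg, label = soft_vote([(0.8, 0.2), (0.6, 0.4)])
# avg is approximately (0.70, 0.30); label is 0
```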

The following figure shows the parallel composition. Each sub-page \(Tree_k\) is an independent CPN module whose output probability vector is collected in \(P_{collector}\) (the figure's label for \(P_{collect}\)). The aggregation transition \(T_{voter}\) (the figure's counterpart of \(t_{vote}\)) computes the averaged probabilities and deposits the winning class in \(P_{final\_class}\).

Random Forest HCPN diagram

HCPN for Random Forest: \(N\) substitution transitions (each refining to a DT module) feed probability vectors into a collector place, followed by a soft-voting aggregation transition.

Note

The figure labels the aggregation transition as “Mode Function” for visual simplicity. In the actual implementation, soft voting (probability averaging) is used as the primary path; hard voting (mode) exists only as a fallback when class distribution data is unavailable.

Numerical Induction Example

The following example demonstrates that the HCPN construction preserves correct soft-voting semantics for a forest with \(N\) trees and remains valid when extended to \(N+1\) trees.

Setup: Binary classification with classes \(\{0, 1\}\). Input \(x = (3.0, \; 7.0)\).

Base case \((N = 2)\):

Two decision trees, each producing a probability vector at its matched leaf:

| Tree | Matched rule guard | Output \(\mathbf{p}_k\) |
| --- | --- | --- |
| \(T_1\) | \(x_1 \leq 5.0\) (true) | \((0.8, \; 0.2)\) |
| \(T_2\) | \(x_2 > 4.0\) (true) | \((0.6, \; 0.4)\) |

HCPN state trace:

  1. \(P_{in}\) holds token \(x = (3.0, 7.0)\). Fusion set shares it with both DT sub-modules.

  2. \(t_{tree_1}\) fires (substitution transition refines to \(s_{DT}^{(1)}\)): one rule matches, deposits \((0.8, 0.2)\) in \(P_{collect}\).

  3. \(t_{tree_2}\) fires (substitution transition refines to \(s_{DT}^{(2)}\)): one rule matches, deposits \((0.6, 0.4)\) in \(P_{collect}\).

  4. \(P_{collect}\) now contains 2 tokens. \(t_{vote}\) is enabled (requires \(N = 2\) tokens):

\[\bar{\mathbf{p}} = \frac{1}{2}\bigl((0.8, 0.2) + (0.6, 0.4)\bigr) = (0.70, \; 0.30)\]
\[v_{final} = \arg\max(0.70, \; 0.30) = 0\]

Result: class \(0\). \(\checkmark\)

Inductive step \((N = 2 \to N + 1 = 3)\):

A third tree \(T_3\) is added to the forest. The HCPN is extended by:

  • Adding a substitution transition \(t_{tree_3}\) to \(T_{sub}\) in the top module.

  • Mapping \(SM_{RF}(t_{tree_3}) = s_{DT}\).

  • Binding its ports: \(P_{in} \leftrightarrow P_{in}\) (via fusion), \(P_{out} \leftrightarrow P_{collect}\).

  • Updating \(t_{vote}\)’s enabling condition to require \(N+1 = 3\) tokens.

| Tree | Matched rule guard | Output \(\mathbf{p}_k\) |
| --- | --- | --- |
| \(T_1\) | \(x_1 \leq 5.0\) (true) | \((0.8, \; 0.2)\) |
| \(T_2\) | \(x_2 > 4.0\) (true) | \((0.6, \; 0.4)\) |
| \(T_3\) | \(x_1 \leq 5.0 \;\wedge\; x_2 > 6.0\) (true) | \((0.3, \; 0.7)\) |

HCPN state trace:

  1. \(P_{in}\) holds \(x\). Fusion shares it with all 3 sub-modules.

  2. Three substitution transitions fire independently. \(P_{collect}\) receives 3 tokens.

  3. \(t_{vote}\) is enabled (requires \(N+1 = 3\) tokens):

\[\bar{\mathbf{p}} = \frac{1}{3}\bigl((0.8, 0.2) + (0.6, 0.4) + (0.3, 0.7)\bigr) = \left(\frac{1.7}{3}, \; \frac{1.3}{3}\right) \approx (0.567, \; 0.433)\]
\[v_{final} = \arg\max(0.567, \; 0.433) = 0\]

Result: class \(0\). \(\checkmark\)
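Both traces can be checked numerically with a few lines of plain Python (an illustrative verification of the arithmetic above, not part of the formalization):

```python
def average(tokens):
    """Component-wise mean of a list of probability vectors."""
    n = len(tokens)
    return tuple(sum(p[c] for p in tokens) / n for c in range(len(tokens[0])))

base = average([(0.8, 0.2), (0.6, 0.4)])                  # N = 2 trace
extended = average([(0.8, 0.2), (0.6, 0.4), (0.3, 0.7)])  # N = 3 trace
# base is approximately (0.70, 0.30); extended approximately (0.567, 0.433)
# argmax elects class 0 in both cases
```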

Why the invariant is preserved: Adding a tree \(T_{N+1}\) to a valid \(N\)-tree HCPN requires only:

  1. One new substitution transition \(t_{tree_{N+1}}\) with \(SM(t_{tree_{N+1}}) = s_{DT}\).

  2. Port-socket bindings identical to those of all existing trees.

  3. The fusion set \(FS\) is unchanged — \(P_{in}\) is already shared.

  4. The voting transition \(t_{vote}\) updates its arc weight from \(N\) to \(N+1\).

The DT leaf module guarantees exactly one rule fires per tree (by the DT invariant). Therefore \(P_{collect}\) always receives exactly \(N+1\) tokens, \(t_{vote}\) fires exactly once, and the averaging formula generalises directly:

\[{\bar{\mathbf{p}}}^{(N+1)} = \frac{1}{N+1}\left(\sum_{k=1}^{N} \mathbf{p}_k + \mathbf{p}_{N+1}\right) = \frac{N}{N+1}\,{\bar{\mathbf{p}}}^{(N)} + \frac{1}{N+1}\,\mathbf{p}_{N+1}\]

This is a convex combination of the previous average and the new tree’s output, which is always well-defined and produces a valid probability vector.
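The identity can be spot-checked numerically: updating the running average incrementally gives the same vector as recomputing it from all \(N+1\) tokens (values taken from the example above):

```python
tokens = [(0.8, 0.2), (0.6, 0.4)]  # the N = 2 forest
p_new = (0.3, 0.7)                 # output of the added tree T_3
n = len(tokens)

avg_n = tuple(sum(p[c] for p in tokens) / n for c in range(2))

# convex combination: N/(N+1) * old average + 1/(N+1) * new token
incremental = tuple(n / (n + 1) * avg_n[c] + 1 / (n + 1) * p_new[c]
                    for c in range(2))

# full recomputation over all N+1 tokens
full = tuple(sum(p[c] for p in tokens + [p_new]) / (n + 1) for c in range(2))
# incremental and full agree (up to floating-point rounding) and sum to 1
```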

Conclusion: By induction on \(N\), the RF HCPN construction correctly computes soft-voting for any number of trees. Each additional tree adds one substitution transition and one probability token, without affecting the existing modules or their determinism guarantees.