RuleClassifier

class pyruleanalyzer.RuleClassifier(rules, algorithm_type='Decision Tree')
adjust_and_remove_rules(method)

Adjusts and removes duplicated rules from the rule set based on the specified method.

This method analyzes the current rule set to identify and remove duplicated rules. The logic supports three modes:
  • “custom”: Uses a user-defined custom function to remove rules.

  • “soft”: Detects and removes duplicated rules within the same tree only.

  • “hard”: Removes duplicated rules both within the same tree and across different trees.

Parameters:

method (str) – Strategy for rule refinement. Must be either “custom”, “soft” or “hard”.

Returns:

A tuple containing
  • A new list of rules after removing duplicates and adding generalized ones,

  • A list of the identified duplicated rule pairs.

Return type:

Tuple[List[Rule],List[Tuple[Rule,Rule]]]

calculate_sparsity_interpretability(n_features_total)

Computes sparsity and interpretability metrics for a given rule set.

This method measures how concise and generalizable the rules are by evaluating:
  • The proportion of total features actually used,

  • The total number of rules,

  • Rule depth statistics (max and mean),

  • A combined Sparsity Interpretability (SI) score.

Parameters:
  • rules (List[Rule]) – A list of Rule objects to analyze.

  • n_features_total (int) – Total number of available features in the dataset.

Returns:

A dictionary containing
  • features_used (int): Number of unique features used in rules,

  • total_features (int): Total number of features in the dataset,

  • sparsity (float): 1 - (features_used / total_features),

  • total_rules (int): Total number of rules,

  • max_depth (int): Maximum number of conditions in a single rule,

  • mean_rule_depth (float): Average number of conditions per rule,

  • sparsity_interpretability_score (float): Combined interpretability score (higher is better).

Return type:

Dict[str,Any]
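
The simpler quantities in this dictionary can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: rules are represented as plain lists of condition strings such as "v1 <= 0.5", `summarize_rules` is a hypothetical helper, and the combined SI score is left out because its formula is not given here.

```python
from statistics import mean
from typing import Dict, List


def summarize_rules(rules: List[List[str]], n_features_total: int) -> Dict[str, float]:
    """Hypothetical helper: compute the simple metrics listed above."""
    # Assumption: the feature name is the first token of each condition.
    features_used = {cond.split()[0] for rule in rules for cond in rule}
    depths = [len(rule) for rule in rules]
    return {
        "features_used": len(features_used),
        "total_features": n_features_total,
        "sparsity": 1 - len(features_used) / n_features_total,
        "total_rules": len(rules),
        "max_depth": max(depths),
        "mean_rule_depth": mean(depths),
    }


example_rules = [["v1 <= 0.5", "v2 > 1.0"], ["v1 > 0.5"]]
metrics = summarize_rules(example_rules, n_features_total=4)
print(metrics["sparsity"])         # 0.5
print(metrics["mean_rule_depth"])  # 1.5
```

Two of the four features appear in the example rules, so the sparsity is 1 - 2/4 = 0.5.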

classify(data, final=False)

Classifies a single data instance using extracted rules.

This method will delegate the classification logic to the appropriate function based on the algorithm type.

Parameters:
  • data (Dict[str, float]) – A dictionary representing the instance to classify, where keys are feature names (e.g., ‘v1’, ‘v2’) and values are the corresponding feature values.

  • final (bool) – If True, use final_rules (post-analysis); otherwise, use initial_rules.

Returns:

A tuple containing
  • Predicted class label (or None if no rule matched),

  • List of votes (Random Forest only, otherwise None),

  • Class probabilities (Random Forest only, otherwise None).

Return type:

Tuple[int|None,List[int]|None,np.ndarray|None]

classify_dt(data, rules)

Classifies a single data instance using extracted rules from the decision tree model.

This method applies the rule set to a given data instance and returns the class of the first rule that matches.

Parameters:
  • data (Dict[str, float]) – A dictionary representing the instance to classify, where keys are feature names (e.g., ‘v1’, ‘v2’) and values are the corresponding feature values.

  • rules (List[Rule]) – A list of Rule instances.

Returns:

A tuple containing
  • Predicted class label (or None if no rule matched),

  • None,

  • None.

Return type:

Tuple[int|None,None,None]
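
First-match classification can be sketched as follows. This is a simplified stand-in rather than the library's code: `SimpleRule` and `first_match` are hypothetical names, rules are assumed to be already parsed into (variable, operator, threshold) triples, and only the predicted label is returned instead of the full tuple.

```python
import operator
from typing import Dict, List, Optional, Tuple

# Map comparison symbols to the corresponding functions.
OPS = {"<=": operator.le, "<": operator.lt, ">=": operator.ge, ">": operator.gt}

# Hypothetical stand-in rule: (class label, list of parsed conditions).
SimpleRule = Tuple[int, List[Tuple[str, str, float]]]


def first_match(data: Dict[str, float], rules: List[SimpleRule]) -> Optional[int]:
    """Return the class of the first rule whose conditions all hold, else None."""
    for label, conditions in rules:
        if all(OPS[op](data[var], thr) for var, op, thr in conditions):
            return label
    return None


rules = [
    (0, [("v1", "<=", 0.5)]),
    (1, [("v1", ">", 0.5), ("v2", "<=", 2.0)]),
]
print(first_match({"v1": 0.9, "v2": 1.0}, rules))  # 1
print(first_match({"v1": 0.9, "v2": 3.0}, rules))  # None
```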

classify_rf(data, rules)

Classifies a single data instance using extracted rules from the random forest model.

This method applies the rule set to a given data instance and returns the class of the first rule that matches.

Parameters:
  • data (Dict[str, float]) – A dictionary representing the instance to classify, where keys are feature names (e.g., ‘v1’, ‘v2’) and values are the corresponding feature values.

  • rules (List[Rule]) – A list of Rule instances.

Returns:

A tuple containing
  • Predicted class label (or None if no rule matched),

  • List of votes,

  • Class probabilities.

Return type:

Tuple[int|None,List[int]|None,np.ndarray|None]

compare_initial_final_results(file_path)

Compares the classification performance of the initial and final rule sets.

This method evaluates both the original (initial_rules) and pruned (final_rules) rule sets on the same dataset, and logs performance metrics such as:
  • Accuracy,

  • Confusion matrices,

  • Divergent predictions between the two rule sets,

  • Interpretability metrics per tree.

It delegates to algorithm-specific methods based on the classifier type.

Parameters:

file_path (str) – Path to the CSV file used for evaluation.

compare_initial_final_results_dt(file_path)

Evaluates and compares the initial and final rule sets for a Decision Tree model.

This method:
  • Applies both the original (initial_rules) and refined (final_rules) rules to a dataset,

  • Computes and logs accuracy, confusion matrices, and divergent predictions,

  • Identifies instances where predictions changed after rule pruning,

  • Calculates interpretability metrics (sparsity, rule depth, etc.) for both rule sets.

All outputs are saved to ‘examples/files/output_final_classifier_dt.txt’.

Parameters:

file_path (str) – Path to the CSV file containing the dataset to evaluate.

compare_initial_final_results_rf(file_path)

Evaluates and compares the initial and final rule sets for a Random Forest model.

This method:
  • Applies both the original (initial_rules) and refined (final_rules) rule sets to the dataset,

  • Aggregates predictions using one vote per tree,

  • Computes and logs accuracy, confusion matrices, and rule counts per tree,

  • Identifies divergent predictions between the initial and final models,

  • Computes average interpretability metrics across trees for both rule sets.

All output is written to ‘examples/files/output_final_classifier.txt’.

Parameters:

file_path (str) – Path to the CSV file containing the dataset to evaluate.
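
The "one vote per tree" aggregation mentioned above can be illustrated with a plain majority vote. This is a sketch only: `aggregate_votes` is a hypothetical helper, and both the tie-breaking behaviour (first class encountered wins) and the use of vote shares as class probabilities are assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def aggregate_votes(votes: List[int]) -> Tuple[Optional[int], Dict[int, float]]:
    """Return the majority class and the share of votes received by each class."""
    if not votes:
        return None, {}
    counts = Counter(votes)
    label, _ = counts.most_common(1)[0]
    shares = {cls: n / len(votes) for cls, n in counts.items()}
    return label, shares


# Four trees vote; class 1 receives three of the four votes.
label, shares = aggregate_votes([1, 0, 1, 1])
print(label)   # 1
print(shares)  # {1: 0.75, 0: 0.25}
```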

custom_rule_removal(rules)

Placeholder for custom rule removal logic. Does not alter the rule set.

Parameters:

rules (List[Rule]) – List of Rule instances.

Returns:

A tuple containing
  • The same rules from the input,

  • An empty list.

Return type:

Tuple[List[Rule],List[]]

static display_metrics(y_true, y_pred, correct, total, file=None)

Computes and displays classification performance metrics.

This method calculates standard evaluation metrics including accuracy, precision, recall, F1 score, specificity, and the confusion matrix. The results are printed to the console and optionally written to a file.

Parameters:
  • y_true (List[int]) – List of true class labels.

  • y_pred (List[int]) – List of predicted class labels.

  • correct (int) – Number of correct predictions.

  • total (int) – Total number of predictions.

  • file (Optional[TextIO]) – File object to write the metrics to. If None, metrics are only printed.
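
For reference, the listed metrics can be derived from the four confusion-matrix counts. The sketch below assumes a binary problem with labels 0 (negative) and 1 (positive); `binary_metrics` is a hypothetical helper, and how the library handles multiclass data is not shown here.

```python
from typing import Dict, List


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute standard metrics for labels 0 (negative) and 1 (positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "specificity": specificity,
    }


m = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(round(m["accuracy"], 3))   # 0.667
print(round(m["precision"], 3))  # 0.667
```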

execute_rule_analysis(file_path, remove_duplicates='none', remove_below_n_classifications=-1)

Executes a full rule evaluation and pruning process on a given dataset.

This method:
  • Applies optional duplicate rule removal,

  • Prints and logs the final rule structure,

  • Runs evaluation using the appropriate algorithm (Decision Tree or Random Forest),

  • Optionally removes rules whose usage count is less than or equal to a given threshold.

Parameters:
  • file_path (str) – Path to the CSV file containing data for evaluation.

  • remove_duplicates (str) – Method for removing duplicate rules, can be either “soft”, “hard”, “custom” or “none”.

  • remove_below_n_classifications (int) – Threshold for rule usage count. If set to -1, no filtering is applied.
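
The usage-threshold filter can be pictured as below. This is a hedged sketch: `filter_by_usage` and the `usage` dictionary (rule name mapped to the number of instances it classified) are hypothetical, and the comparison follows the "less than or equal to" wording above.

```python
from typing import Dict, List


def filter_by_usage(usage: Dict[str, int], threshold: int = -1) -> List[str]:
    """Keep the names of rules used more than `threshold` times."""
    if threshold == -1:
        # -1 disables filtering, as described for remove_below_n_classifications.
        return list(usage)
    return [name for name, count in usage.items() if count > threshold]


usage = {"r1": 10, "r2": 0, "r3": 3}
print(filter_by_usage(usage))     # ['r1', 'r2', 'r3']
print(filter_by_usage(usage, 0))  # ['r1', 'r3']
print(filter_by_usage(usage, 3))  # ['r1']
```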

execute_rule_analysis_dt(file_path, remove_below_n_classifications=-1)

Evaluates Decision Tree rules on a dataset and logs classification performance.

This method tests the decision tree rules on a CSV dataset, evaluates rule performance, removes infrequent rules (if specified), and logs classification results, errors, usage counts, and rule effectiveness into an output file.

Outputs are written to ‘examples/files/output_classifier_dt.txt’.

Parameters:
  • file_path (str) – Path to the CSV file containing the dataset to evaluate.

  • remove_below_n_classifications (int) – Minimum usage count required to retain a rule.

execute_rule_analysis_rf(file_path, remove_below_n_classifications=-1)

Evaluates Random Forest rules on a dataset and logs classification performance.

This method evaluates the rule-based classifier on test data using extracted random forest rules. It logs predictions, voting behavior, rule usage, errors, confusion matrix, and other diagnostics. It can also filter out rarely used rules if a threshold is specified.

Outputs are written to ‘examples/files/output_classifier.txt’.

Parameters:
  • file_path (str) – Path to the CSV file containing the dataset to evaluate.

  • remove_below_n_classifications (int) – Minimum rule usage required to retain a rule.

extract_variables_and_operators(conditions)

Extracts variable-operator-value triples from a list of rule conditions.

This helper method parses each condition (e.g., “v1 <= 0.5”) and returns a normalized list of tuples containing the variable name, the comparison operator, and the threshold value. Operators ‘<=’ and ‘<’ are treated equivalently, as are ‘>=’ and ‘>’.

Parameters:

conditions (List[str]) – A list of string conditions from a rule.

Returns:

A sorted list of (variable, operator, value) triples, with normalized operators.

Return type:

List[Tuple[str, str, float]]
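
A minimal sketch of this normalization follows. It assumes conditions are whitespace-separated as in "v1 <= 0.5"; `normalized_triples` is a hypothetical name, and the choice of '<=' and '>' as the canonical symbols is an assumption, since the text only says the pairs are treated equivalently.

```python
from typing import List, Tuple

# Assumed canonical forms: '<' collapses into '<=', and '>=' collapses into '>'.
NORMALIZED = {"<=": "<=", "<": "<=", ">=": ">", ">": ">"}


def normalized_triples(conditions: List[str]) -> List[Tuple[str, str, float]]:
    """Split each condition and return sorted (variable, operator, value) triples."""
    triples = []
    for cond in conditions:
        var, op, value = cond.split()
        triples.append((var, NORMALIZED[op], float(value)))
    return sorted(triples)


print(normalized_triples(["v2 > 1.5", "v1 < 0.5"]))
# [('v1', '<=', 0.5), ('v2', '>', 1.5)]
```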

find_duplicated_rules(type='soft')

Identifies nearly identical rules within the same decision tree.

This method searches for rule pairs that:
  • Have the same class label,

  • Share all conditions except the last,

  • Differ only in the final condition, where one uses a ‘<=’ and the other a ‘>’ (or vice versa).

Such pairs are considered duplicates due to redundant decision splits at the boundary.

Returns:

A list of tuples, each representing a pair of duplicated rules.

Return type:

List[Tuple[Rule,Rule]]
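
The pairing criteria can be sketched with plain tuples. This is not the library's implementation: `SimpleRule`, `is_boundary_pair`, and `boundary_pairs` are hypothetical, and the additional check that both final conditions use the same variable and threshold is an assumption.

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical stand-in rule: (name, class label, ordered condition strings).
SimpleRule = Tuple[str, int, List[str]]

COMPLEMENT = {"<=": ">", ">": "<="}


def is_boundary_pair(a: SimpleRule, b: SimpleRule) -> bool:
    """True if two rules share a class and differ only by a complementary last split."""
    _, class_a, conds_a = a
    _, class_b, conds_b = b
    if class_a != class_b or len(conds_a) != len(conds_b) or not conds_a:
        return False
    if conds_a[:-1] != conds_b[:-1]:
        return False
    var_a, op_a, thr_a = conds_a[-1].split()
    var_b, op_b, thr_b = conds_b[-1].split()
    # Assumption: the final conditions test the same variable at the same threshold.
    return var_a == var_b and float(thr_a) == float(thr_b) and COMPLEMENT.get(op_a) == op_b


def boundary_pairs(rules: List[SimpleRule]) -> List[Tuple[SimpleRule, SimpleRule]]:
    return [(a, b) for a, b in combinations(rules, 2) if is_boundary_pair(a, b)]


rules = [
    ("r1", 0, ["v1 <= 0.5", "v2 <= 2.0"]),
    ("r2", 0, ["v1 <= 0.5", "v2 > 2.0"]),
    ("r3", 1, ["v1 > 0.5"]),
]
pairs = boundary_pairs(rules)
print([(a[0], b[0]) for a, b in pairs])  # [('r1', 'r2')]
```

In this example r1 and r2 predict the same class on both sides of the v2 split, so the split adds nothing and the pair could be generalized to the shared prefix "v1 <= 0.5".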

find_duplicated_rules_between_trees()

Identifies semantically similar rules between different trees.

This method compares rules across the full rule set to find pairs that:
  • Use the same set of variables and logical operators (ignoring threshold values),

  • Belong to the same target class.

Returns:

A list of tuples, where each pair represents similar rules.

Return type:

List[Tuple[Rule,Rule]]
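
One way to picture this comparison is to reduce each rule to a threshold-free signature and group rules that share it. The sketch below is illustrative only: the tuple layout and the helper names are hypothetical, and it does not check that the paired rules come from different trees.

```python
from collections import defaultdict
from itertools import combinations
from typing import List, Tuple


def signature(label: int, conditions: List[str]) -> tuple:
    """Class label plus sorted (variable, operator) pairs, with thresholds dropped."""
    return label, tuple(sorted(tuple(c.split()[:2]) for c in conditions))


def similar_pairs(rules: List[Tuple[str, int, List[str]]]) -> List[Tuple[str, str]]:
    groups = defaultdict(list)
    for name, label, conditions in rules:
        groups[signature(label, conditions)].append(name)
    pairs = []
    for names in groups.values():
        pairs.extend(combinations(names, 2))
    return pairs


rules = [
    ("t1_r1", 0, ["v1 <= 0.5", "v2 > 2.0"]),
    ("t2_r1", 0, ["v2 > 1.8", "v1 <= 0.4"]),
    ("t2_r2", 1, ["v1 <= 0.5", "v2 > 2.0"]),
]
print(similar_pairs(rules))  # [('t1_r1', 't2_r1')]
```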

generate_classifier_model(algorithm_type='Random Forest')

Converts a list of extracted rule sets into a RuleClassifier instance.

This method formats rule sets into a standardized string format and initializes a RuleClassifier object with it. The resulting classifier is saved to ‘files/initial_model.pkl’.

Parameters:
  • rules (List[Dict[str, List[str]]]) – A list of rule dictionaries, each mapping class names to rule strings.

  • algorithm_type (str) – The type of model the rules originated from (‘Random Forest’ or ‘Decision Tree’).

Returns:

A RuleClassifier instance initialized with the given rules.

Return type:

RuleClassifier

get_rules(feature_names, class_names)

Extracts human-readable decision rules from a scikit-learn DecisionTreeClassifier.

This method traverses the tree structure to generate logical condition paths from root to leaf, and organizes them by predicted class.

Parameters:
  • tree (DecisionTreeClassifier) – A trained scikit-learn decision tree model.

  • feature_names (List[str]) – A list of feature names corresponding to the tree input features.

  • class_names (List[str]) – A list of class names corresponding to output labels.

Returns:

A dictionary mapping each class name to a list of rule strings that lead to predictions for that class.

Return type:

Dict[str,List[str]]
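
The root-to-leaf traversal can be sketched without scikit-learn by using a toy tree stored as parallel arrays, laid out like the `children_left`, `children_right`, `feature`, and `threshold` attributes of a fitted tree's `tree_` object. The `value` array is simplified to per-node class counts, `extract_rules` is a hypothetical name, and joining conditions with " and " is an assumed output format.

```python
from typing import Dict, List

# Toy tree: node 0 is the root; a child index of -1 marks a leaf.
children_left = [1, -1, -1]
children_right = [2, -1, -1]
feature = [0, -2, -2]          # -2 means "no feature" at a leaf
threshold = [0.5, -2.0, -2.0]
value = [[5, 5], [5, 0], [0, 5]]  # simplified: class counts per node


def extract_rules(feature_names: List[str], class_names: List[str]) -> Dict[str, List[str]]:
    """Collect one rule string per leaf, grouped by the leaf's majority class."""
    rules: Dict[str, List[str]] = {name: [] for name in class_names}

    def walk(node: int, path: List[str]) -> None:
        if children_left[node] == -1:  # leaf: record the accumulated path
            counts = value[node]
            predicted = class_names[counts.index(max(counts))]
            rules[predicted].append(" and ".join(path))
            return
        name, thr = feature_names[feature[node]], threshold[node]
        # The left branch holds samples with feature <= threshold.
        walk(children_left[node], path + [f"{name} <= {thr}"])
        walk(children_right[node], path + [f"{name} > {thr}"])

    walk(0, [])
    return rules


print(extract_rules(["v1", "v2"], ["A", "B"]))
# {'A': ['v1 <= 0.5'], 'B': ['v1 > 0.5']}
```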

get_tree_rules(lst, lst_class, feature_names, algorithm_type='Random Forest')

Extracts rules from a trained scikit-learn model (Decision Tree or Random Forest).

For Decision Trees, this returns one rule set. For Random Forests, it aggregates rule sets from all individual decision trees.

Parameters:
  • model (Union[DecisionTreeClassifier, RandomForestClassifier]) – The trained model.

  • lst (List[int]) – List of feature indices (1-based) used to generate feature names (e.g., ‘v1’, ‘v2’).

  • lst_class (List[str]) – List of class names.

  • algorithm_type (str) – Type of model; either ‘Decision Tree’ or ‘Random Forest’.

Returns:

A list of rule sets, each as a dictionary mapping class names to rule strings.

Return type:

List[Dict[str,List[str]]]

new_classifier(test_path, model_parameters, model_path=None, algorithm_type='Random Forest')

Trains or loads a model, extracts decision rules, and builds a rule-based classifier.

This method either loads an existing scikit-learn model or trains a new one using the provided training dataset and model parameters. It evaluates the model on test data, saves it, extracts decision rules, and constructs a corresponding RuleClassifier object.

Parameters:
  • train_path (str) – Path to the training CSV file. Each row should contain features and the target label.

  • test_path (str) – Path to the test CSV file. Each row should contain features and the target label.

  • model_parameters (dict) – Parameters to initialize the scikit-learn model. Must match the accepted parameters of either sklearn.tree.DecisionTreeClassifier or sklearn.ensemble.RandomForestClassifier, depending on the value of algorithm_type.

  • model_path (Optional[str]) – Path to a pre-trained model file (.pkl). If provided, skips training.

  • algorithm_type (str, optional) – Type of model to use (‘Random Forest’ or ‘Decision Tree’). Defaults to ‘Random Forest’.

Returns:

A rule-based classifier instance constructed from the trained or loaded model.

Return type:

RuleClassifier

parse_conditions(conditions)

Parses a list of condition strings into structured tuples for evaluation.

Converts conditions like “v1 <= 0.5” into a tuple representation (“v1”, “<=”, 0.5) to facilitate programmatic comparison during classification.

Parameters:

conditions (List[str]) – A list of condition strings from a rule.

Returns:

A list of parsed conditions, where each tuple contains (variable name, operator, numeric threshold).

Return type:

List[Tuple[str,str,float]]
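
A regular expression is one way to perform this conversion. The sketch below is an assumption about the condition syntax (a word-like variable name, one of four comparison operators, and a numeric literal); `parse` and `CONDITION` are hypothetical names.

```python
import re
from typing import List, Tuple

# Variable name, comparison operator, numeric threshold (optionally signed or in e-notation).
CONDITION = re.compile(
    r"^\s*(\w+)\s*(<=|>=|<|>)\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$"
)


def parse(conditions: List[str]) -> List[Tuple[str, str, float]]:
    """Turn strings like 'v1 <= 0.5' into ('v1', '<=', 0.5)."""
    parsed = []
    for cond in conditions:
        match = CONDITION.match(cond)
        if match is None:
            raise ValueError(f"Unrecognized condition: {cond!r}")
        var, op, thr = match.groups()
        parsed.append((var, op, float(thr)))
    return parsed


print(parse(["v1 <= 0.5", "v2 > -1"]))
# [('v1', '<=', 0.5), ('v2', '>', -1.0)]
```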

parse_dt_rule(rule)

Parses a decision tree rule string into a structured Rule object.

This method processes a rule extracted from a Decision Tree by separating its identifier and its condition list, and then converting it into a Rule instance.

Parameters:

rule (str) – A string representing a single rule in the format “RuleName: [condition1, condition2, …]”.

Returns:

A Rule object with the extracted name, class, and condition list.

Return type:

Rule
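
Separating the identifier from the condition list can be sketched as below. The rule name used in the example is made up, conditions are assumed to be comma-separated inside the brackets, and `split_rule` is a hypothetical helper; how the class is derived from the rule name is not shown because the naming scheme is not specified here.

```python
from typing import List, Tuple


def split_rule(rule: str) -> Tuple[str, List[str]]:
    """Split 'RuleName: [cond1, cond2]' into its name and a list of conditions."""
    name, _, body = rule.partition(":")
    inner = body.strip().strip("[]")
    # Drop surrounding whitespace and optional quotes around each condition.
    conditions = [c.strip().strip("'\"") for c in inner.split(",") if c.strip()]
    return name.strip(), conditions


print(split_rule("ExampleRule: [v1 <= 0.5, v2 > 1.0]"))
# ('ExampleRule', ['v1 <= 0.5', 'v2 > 1.0'])
```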

parse_rf_rule(rule)

Parses a random forest rule string into a structured Rule object.

This method processes a rule extracted from Random Forest estimators by separating its identifier and its condition list, and then converting it into a Rule instance.

Parameters:

rule (str) – A string representing a Random Forest rule in the format “RuleName: [condition1, condition2, …]”.

Returns:

A Rule object containing the parsed name, class, and condition list.

Return type:

Rule

parse_rules(rules, algorithm_type)

Parses a raw rule string into structured Rule objects based on model type.

Depending on whether the rules originate from a Decision Tree or a Random Forest, this method delegates to the appropriate parsing logic.

Parameters:
  • rules (str) – Multiline string containing rule definitions.

  • algorithm_type (str) – The model type (‘Decision Tree’ or ‘Random Forest’).

Returns:

A list of Rule objects parsed from the input string.

Return type:

List[Rule]

process_data(test_path)

Loads and processes training and testing data from CSV files.

This method:
  • Reads training and test datasets,

  • Splits features and labels,

  • Encodes class labels using scikit-learn’s LabelEncoder.

Parameters:
  • train_path (str) – File path to the training CSV dataset.

  • test_path (str) – File path to the testing CSV dataset.

Returns:

A tuple containing
  • X_train (features for training),

  • y_train (encoded labels for training),

  • X_test (features for testing),

  • y_test (encoded labels for testing).

Return type:

Tuple[np.ndarray,np.ndarray,np.ndarray,np.ndarray]
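
The split-and-encode steps can be illustrated with the standard library alone. The sketch assumes a headerless CSV with the label in the last column, which may not match the files the library expects; `split_features_labels` and `encode_labels` are hypothetical helpers, and the encoding mimics LabelEncoder by assigning consecutive integers to the sorted unique labels.

```python
import csv
import io
from typing import Dict, List, Tuple


def split_features_labels(handle) -> Tuple[List[List[float]], List[str]]:
    """Read rows and separate numeric features from the trailing label column."""
    X, y = [], []
    for row in csv.reader(handle):
        X.append([float(v) for v in row[:-1]])
        y.append(row[-1])
    return X, y


def encode_labels(labels: List[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map sorted unique labels to 0..n-1, as LabelEncoder orders its classes."""
    mapping = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


# An in-memory file stands in for a CSV on disk.
sample = io.StringIO("0.1,2.0,dog\n0.7,1.5,cat\n0.3,0.2,dog\n")
X, y_raw = split_features_labels(sample)
y, mapping = encode_labels(y_raw)
print(y)        # [1, 0, 1]
print(mapping)  # {'cat': 0, 'dog': 1}
```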

save_sklearn_model()

Saves a trained scikit-learn model to disk as a pickle (.pkl) file.

The model is stored at ‘examples/files/sklearn_model.pkl’ for later reuse or inspection.

Parameters:

model (BaseEstimator) – A trained scikit-learn classifier (e.g., DecisionTreeClassifier or RandomForestClassifier).

save_tree_rules(lst, lst_class)

Saves extracted decision rules to a text file in a standardized format.

Each rule is assigned a unique name that includes the tree index, rule index, and class index. The output is saved to ‘examples/files/rules_sklearn.txt’.

Parameters:
  • rules (List[Dict[str, List[str]]]) – List of rule dictionaries organized by class name.

  • lst (List[int]) – List of feature indices (1-based), used to define feature naming.

  • lst_class (List[str]) – List of class names corresponding to output labels.

Returns:

The original rules list, unmodified.

Return type:

List[Dict[str,List[str]]]

set_custom_rule_removal(custom_function)

Allows the user to override the rule removal logic with their own implementation.

Parameters:

custom_function (Callable[[List[Rule]],Tuple[List[Rule],List[Tuple[Rule,Rule]]]]) – A callback that takes a list of Rule instances as argument and returns a tuple containing a new list of rules after removing duplicates and the list of duplicate rule pairs.
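
A callback with the required shape might look like the sketch below. The `Rule` dataclass is a stand-in defined only for this example, and its attribute names are assumptions; the library's Rule class may expose different fields. The function removes rules whose class and conditions exactly repeat an earlier rule.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for the library's Rule class."""
    name: str
    class_: int
    conditions: Tuple[str, ...]


def remove_exact_duplicates(rules: List[Rule]) -> Tuple[List[Rule], List[Tuple[Rule, Rule]]]:
    """Keep the first rule for each (class, conditions) key and report the repeats."""
    seen = {}
    kept, duplicates = [], []
    for rule in rules:
        key = (rule.class_, rule.conditions)
        if key in seen:
            duplicates.append((seen[key], rule))
        else:
            seen[key] = rule
            kept.append(rule)
    return kept, duplicates


rules = [
    Rule("r1", 0, ("v1 <= 0.5",)),
    Rule("r2", 0, ("v1 <= 0.5",)),
    Rule("r3", 1, ("v1 > 0.5",)),
]
kept, duplicates = remove_exact_duplicates(rules)
print([r.name for r in kept])                     # ['r1', 'r3']
print([(a.name, b.name) for a, b in duplicates])  # [('r1', 'r2')]
```

Such a function would presumably be registered with set_custom_rule_removal and then selected by passing “custom” as the removal method.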