Package 'RulesTools' reference manual

Title:	Preparing, Analyzing, and Visualizing Association Rules
Description:	Streamlines data preprocessing, analysis, and visualization for association rule mining. Designed to work with the 'arules' package, features include discretizing data frames, generating rule set intersections, and visualizing rules with heatmaps and Euler diagrams. 'RulesTools' also includes a dataset on Brook trout detection from Nolan et al. (2022) <doi:10.1007/s13412-022-00800-x>.
Authors:	Nikolett Toth [aut, cre], Jarrett Phillips [ctb]
Maintainer:	Nikolett Toth <[email protected]>
License:	MIT + file LICENSE
Version:	0.1.1
Built:	2025-02-28 05:33:28 UTC
Source:	https://github.com/nikolett0203/rulestools

Brook Trout eDNA and Environmental Data

Description

This dataset contains information on brook trout detections using environmental DNA (eDNA) and environmental parameters collected from various sites in Ontario, Canada. The data were sourced from a scientific study comparing eDNA sampling methods with electrofishing to detect brook trout populations.

Usage

BrookTrout

Format

A dataframe with 10 variables and multiple rows (one row per sample):

Backpack: Character. The type of eDNA sampler: "OSMOS" or "ANDe".
Site: Integer. The site number where the sample was taken.
eFishCatch: Integer. The number of fish caught via electrofishing.
AirTemp: Numeric. Air temperature in degrees Celsius.
WaterTemp: Numeric. Water temperature in degrees Celsius.
pH: Numeric. pH level of the water sample.
DissolvedOxygen: Numeric. Dissolved oxygen concentration in mg/L.
Conductivity: Numeric. Conductivity in uS/cm.
VolumeFiltered: Numeric. Volume of water filtered in litres.
eDNAConc: Numeric. eDNA concentration in copies per microlitre.

Source

Adapted from Nolan, K. P., Loeza-Quintana, T., Little, H. A., et al. (2022). Detection of brook trout in spatiotemporally separate locations using validated eDNA technology. Journal of Environmental Studies and Sciences, 13, 66-82. doi:10.1007/s13412-022-00800-x

Examples

data(BrookTrout)
summary(BrookTrout)
plot(eDNAConc ~ Site, data = BrookTrout)

Compare Association Rule Sets and Find Their Intersections

Description

Compares multiple sets of association rules, identifies intersections, and optionally displays the results or writes them to a CSV file.

Usage

compare_rules(..., display = TRUE, filename = NULL)

Arguments

`...`	Named association rule sets (objects of class `rules`).
`display`	Logical. If `TRUE`, prints the intersection results. Default is `TRUE`.
`filename`	Character string. If provided, writes the results to a CSV file. Default is `NULL`.

Value

A list containing the intersections of the provided rule sets.
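
To make the idea of a rule set intersection concrete, here is a minimal base R sketch. It does not call `compare_rules()` and makes no claim about how that function is implemented. The rule sets are stood in for by hypothetical character vectors of rule labels, and `pairwise_intersections()` is an illustrative helper that is not part of 'RulesTools'.

```r
# Hypothetical rule labels standing in for two `rules` objects
r1 <- c("{pH=high} => {eDNAConc=high}",
        "{WaterTemp=low} => {eDNAConc=high}",
        "{AirTemp=low} => {WaterTemp=low}")
r2 <- c("{pH=high} => {eDNAConc=high}",
        "{AirTemp=low} => {WaterTemp=low}")

# Illustrative helper (an assumption, not a package function):
# intersect every pair of named sets and name each result after the pair
pairwise_intersections <- function(...) {
  sets <- list(...)
  pairs <- combn(names(sets), 2, simplify = FALSE)
  out <- lapply(pairs, function(p) intersect(sets[[p[1]]], sets[[p[2]]]))
  names(out) <- vapply(pairs, paste, character(1), collapse = " & ")
  out
}

common <- pairwise_intersections(r1 = r1, r2 = r2)
print(common)
```

Naming the inputs (`r1 = ...`, `r2 = ...`) mirrors the named `...` arguments described above, so each intersection can be labelled by the sets it came from.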

Examples

library(arules)
data(BrookTrout)

# Discretize the BrookTrout dataset
discrete_bt <- dtize_df(BrookTrout, cutoff = "mean")

# Generate the first set of rules with a confidence threshold of 0.5
rules1 <- apriori(
  discrete_bt,
  parameter = list(supp = 0.01, conf = 0.5, target = "rules")
)

# Generate the second set of rules with a higher confidence threshold of 0.6
rules2 <- apriori(
  discrete_bt,
  parameter = list(supp = 0.01, conf = 0.6, target = "rules")
)

# Compare the two sets of rules and display the intersections
compare_rules(
  r1 = rules1,
  r2 = rules2,
  display = TRUE
)

# If `filename = "intersections.csv"`, the data is saved in a .csv file

Discretize a Numeric Column

Description

Discretizes a numeric vector into categories based on specified cutoff points. The function handles missing values, allows for infinite bounds, and supports predefined cutoffs such as the mean or median.

Usage

dtize_col(
  column,
  cutoff = "median",
  labels = c("low", "high"),
  include_right = TRUE,
  infinity = TRUE,
  include_lowest = TRUE,
  na_fill = "none"
)

Arguments

`column`	A numeric vector to discretize.
`cutoff`	A numeric vector specifying cutoff points, or a string ("mean" or "median").
`labels`	A character vector specifying labels for the resulting categories.
`include_right`	Logical. If `TRUE`, intervals are closed on the right (default `TRUE`).
`infinity`	Logical. If `TRUE`, extends cutoffs to `-Inf` and `Inf` (default `TRUE`).
`include_lowest`	Logical. If `TRUE`, the lowest interval is closed on the left (default `TRUE`).
`na_fill`	A string specifying the method to impute missing values: "none", "mean", or "median" (default "none").

Value

A factor with the same length as `column`, where each value is categorized based on the cutoffs.
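
The interval arguments are easiest to understand next to base R's `cut()`. The sketch below assumes, without confirmation from this manual, that `include_right` and `include_lowest` behave like `cut()`'s `right` and `include.lowest`, and that `infinity = TRUE` corresponds to padding the breaks with `-Inf` and `Inf`. It uses only base R and does not call `dtize_col()`.

```r
x <- c(1, 2, 4, 5, 8)

# Assumed mapping (not confirmed by the manual):
#   cutoff = "mean", infinity = TRUE  ->  breaks = c(-Inf, mean(x), Inf)
#   include_right                     ->  right
#   include_lowest                    ->  include.lowest
breaks <- c(-Inf, mean(x), Inf)  # mean(x) is 4

right_closed <- cut(x, breaks = breaks, labels = c("low", "high"),
                    right = TRUE, include.lowest = TRUE)
left_closed  <- cut(x, breaks = breaks, labels = c("low", "high"),
                    right = FALSE, include.lowest = TRUE)

# The value equal to the cutoff (4) changes category with the closed side
comparison <- data.frame(x, right_closed, left_closed)
print(comparison)
```

Under these assumptions, a value that sits exactly on the cutoff is labelled "low" when intervals are closed on the right and "high" when they are closed on the left.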

Examples

data(BrookTrout)

# Example with a custom numeric cutoff
discrete_edna_conc <- dtize_col(
  BrookTrout$eDNAConc, cutoff=13.3,
  labels=c("low", "high"),
  infinity=TRUE
)

# Example with median as cutoff
discrete_pH <- dtize_col(BrookTrout$pH, cutoff="median")

# Example with missing value imputation
filled_col <- dtize_col(
  c(1, 2, NA, 4, 5),
  cutoff = "mean",
  include_right=FALSE,
  na_fill = "mean"
)

Discretize Dataframe Columns

Description

Discretizes numeric columns of a dataframe based on specified splitting criteria, and handles missing values using specified imputation methods.

Usage

dtize_df(
  data,
  cutoff = "median",
  labels = c("low", "high"),
  include_right = TRUE,
  infinity = TRUE,
  include_lowest = TRUE,
  na_fill = "none",
  m = 5,
  maxit = 5,
  seed = NULL,
  printFlag = FALSE
)

Arguments

`data`	A dataframe containing the data to be discretized.
`cutoff`	A character string or numeric vector specifying how numeric columns are split. Use `"median"` (default) or `"mean"`, or supply a custom numeric vector of split points.
`labels`	A character vector of labels for the discretized categories. Default is `c("low", "high")`.
`include_right`	A logical value indicating if the intervals should be closed on the right. Default is `TRUE`.
`infinity`	A logical value indicating if the split intervals should extend to infinity. Default is `TRUE`.
`include_lowest`	A logical value indicating if the lowest value should be included in the first interval. Default is `TRUE`.
`na_fill`	A character string specifying the imputation method for handling missing values. Options are `"none"` (default), `"mean"`, `"median"`, or `"pmm"` (predictive mean matching).
`m`	An integer specifying the number of multiple imputations if `na_fill = "pmm"`. Default is `5`.
`maxit`	An integer specifying the maximum number of iterations for the `mice` algorithm. Default is `5`.
`seed`	An integer seed for reproducibility of the imputation process. Default is `NULL`.
`printFlag`	A logical value indicating if `mice` should print logs during imputation. Default is `FALSE`.

Value

A dataframe with numeric columns discretized and missing values handled based on the specified imputation method.
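
The following base R sketch illustrates the general pattern of imputing and then discretizing every numeric column while leaving other columns untouched. It is not the package implementation: `discretize_numeric()` and the `toy` data frame are hypothetical, only mean imputation with a mean cutoff is shown, and the `"pmm"` option is not reproduced here.

```r
# Illustrative helper (hypothetical, not the package implementation):
# impute NAs with the column mean, then split each numeric column at its mean
discretize_numeric <- function(df, labels = c("low", "high")) {
  is_num <- vapply(df, is.numeric, logical(1))
  df[is_num] <- lapply(df[is_num], function(col) {
    col[is.na(col)] <- mean(col, na.rm = TRUE)
    cut(col, breaks = c(-Inf, mean(col), Inf), labels = labels)
  })
  df
}

# Made-up data with one non-numeric column and missing values
toy <- data.frame(
  site = c("A", "B", "C", "D"),
  temp = c(10, NA, 14, 18),
  ph   = c(6.5, 7.0, 7.5, NA)
)

discrete_toy <- discretize_numeric(toy)
print(discrete_toy)
```

Returning factors for the numeric columns matters for the later steps in this manual, because `apriori()` from 'arules' works on categorical items rather than continuous values.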

Examples

data(BrookTrout)

# Example with median as cutoff
med_df <- dtize_df(
  BrookTrout,
  cutoff="median",
  labels=c("below median", "above median")
)

# Example with mean as cutoff
mean_df <- dtize_df(
  BrookTrout,
  cutoff="mean",
  include_right=FALSE
)

# Example with missing value imputation
air <- dtize_df(
  airquality,
  cutoff="mean",
  na_fill="pmm",
  m=10,
  maxit=10,
  seed=42
)


Create an Euler Diagram for Association Rules

Description

Generates an Euler diagram visualization for up to 4 sets of association rules. The function displays the relationships between rule sets with customizable colors, transparency, and labels.

Usage

rule_euler(
  rules,
  fill_color = NULL,
  fill_alpha = 0.5,
  stroke_color = "black",
  stroke_size = 1,
  title = NULL,
  name_color = "black",
  name_size = 12,
  text_color = "black",
  text_size = 11,
  show_legend = FALSE,
  legend_position = "bottom",
  nrow = NULL,
  ncol = NULL
)

Arguments

`rules`	A list of `rules` objects from the `arules` package. The list must contain between 2 and 4 `rules` objects.
`fill_color`	A character vector of valid R color names or hex color codes for filling the sets. If `NULL`, default colors `c("red", "blue", "green", "purple")` will be used. Defaults to `NULL`.
`fill_alpha`	A numeric value between 0 and 1 specifying the transparency of the fill colors. Defaults to `0.5`.
`stroke_color`	A character string specifying the color of the set borders. Defaults to `"black"`.
`stroke_size`	A positive numeric value specifying the size of the set borders. Defaults to `1`.
`title`	A character string specifying the title of the Euler diagram. Defaults to `NULL`.
`name_color`	A character string specifying the color of the set names. Defaults to `"black"`.
`name_size`	A positive numeric value specifying the font size of the set names. Defaults to `12`.
`text_color`	A character string specifying the color of the quantity labels (counts) in the diagram. Defaults to `"black"`.
`text_size`	A positive numeric value specifying the font size of the quantities (counts). Defaults to `11`.
`show_legend`	A logical value indicating whether to display a legend for the sets rather than labels. Defaults to `FALSE`.
`legend_position`	A character string specifying the position of the legend. Must be one of `"top"`, `"bottom"`, `"left"`, or `"right"`. Defaults to `"bottom"`.
`nrow`	An optional numeric value specifying the number of rows in the legend layout. If `NULL`, the number of rows is calculated automatically. Defaults to `NULL`.
`ncol`	An optional numeric value specifying the number of columns in the legend layout. If `NULL`, the number of columns is calculated automatically. Defaults to `NULL`.

Value

A plot object displaying the Euler diagram visualization.
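
The quantity labels in an Euler diagram count the elements in each region. As a rough illustration of what those counts represent for two sets, the base R sketch below computes them by hand. The rule labels are hypothetical placeholders, and nothing here reflects how `rule_euler()` computes or draws its regions.

```r
# Hypothetical rule labels standing in for two `rules` objects
sets <- list(
  conf0.5 = c("rule_a", "rule_b", "rule_c", "rule_d"),
  conf0.6 = c("rule_b", "rule_d")
)

both      <- intersect(sets$conf0.5, sets$conf0.6)
only_low  <- setdiff(sets$conf0.5, sets$conf0.6)
only_high <- setdiff(sets$conf0.6, sets$conf0.5)

# One count per region of a two-set diagram
region_counts <- c(
  "conf0.5 only"      = length(only_low),
  "conf0.6 only"      = length(only_high),
  "conf0.5 & conf0.6" = length(both)
)
print(region_counts)
```

When two rule sets are mined from the same data with the same support and differ only in the confidence threshold, the stricter set is a subset of the looser one, so the "conf0.6 only" region is empty, as in this toy input.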

Examples

library(arules)
data(BrookTrout)

# Discretize the BrookTrout dataset
discrete_bt <- dtize_df(BrookTrout, cutoff = "median")

# Generate the first set of rules with a confidence threshold of 0.5
rules1 <- apriori(
  discrete_bt,
  parameter = list(supp = 0.01, conf = 0.5, target = "rules")
)

# Generate the second set of rules with a higher confidence threshold of 0.6
rules2 <- apriori(
  discrete_bt,
  parameter = list(supp = 0.01, conf = 0.6, target = "rules")
)

# Create an Euler diagram to visualize the intersections between the rule sets
rule_euler(
  rules = list(conf0.5 = rules1, conf0.6 = rules2),
  title = "Euler Diagram of BrookTrout Rule Sets",
  fill_color = c("#7832ff", "lightgreen"),
  stroke_color = "darkblue"
)

Create a Heatmap for Association Rules

Description

Generates a heatmap visualization of association rules, showing relationships between antecedents and consequents based on a specified metric.

Usage

rule_heatmap(
  rules,
  metric = "confidence",
  graph_title = "",
  graph_title_size = 14,
  x_axis_title = "Antecedents",
  x_axis_title_size = 12,
  x_axis_text_size = 11,
  x_axis_text_angle = 45,
  y_axis_title = "Consequents",
  y_axis_title_size = 12,
  y_axis_text_size = 11,
  y_axis_text_angle = 0,
  legend_title = metric,
  legend_text_size = 8,
  legend_position = "right",
  low_color = "lightblue",
  high_color = "navy",
  include_zero = FALSE
)

Arguments

`rules`	An object of class `rules` from the `arules` package.
`metric`	A character string specifying the metric to use for coloring the heatmap. Must be one of `"confidence"`, `"support"`, or `"lift"`. Defaults to `"confidence"`.
`graph_title`	A character string specifying the title of the graph. Defaults to an empty string (`""`).
`graph_title_size`	A numeric value specifying the size of the graph title text. Defaults to `14`.
`x_axis_title`	A character string specifying the title for the x-axis. Defaults to `"Antecedents"`.
`x_axis_title_size`	A numeric value specifying the size of the x-axis title text. Defaults to `12`.
`x_axis_text_size`	A numeric value specifying the size of the x-axis text. Defaults to `11`.
`x_axis_text_angle`	A numeric value specifying the angle of the x-axis text. Defaults to `45`.
`y_axis_title`	A character string specifying the title for the y-axis. Defaults to `"Consequents"`.
`y_axis_title_size`	A numeric value specifying the size of the y-axis title text. Defaults to `12`.
`y_axis_text_size`	A numeric value specifying the size of the y-axis text. Defaults to `11`.
`y_axis_text_angle`	A numeric value specifying the angle of the y-axis text. Defaults to `0`.
`legend_title`	A character string specifying the title of the legend. Defaults to the value of `metric`.
`legend_text_size`	A numeric value specifying the size of the legend text. Defaults to `8`.
`legend_position`	A character string specifying the position of the legend. Possible values are `"right"` (default), `"left"`, `"top"`, `"bottom"`, or `"none"`.
`low_color`	A valid R color or hex color code for the lower bound of the gradient. Defaults to `"lightblue"`.
`high_color`	A valid R color or hex color code for the upper bound of the gradient. Defaults to `"navy"`.
`include_zero`	A logical value indicating whether to include zero values for missing antecedent-consequent combinations. Defaults to `FALSE`.

Value

A ggplot object representing the heatmap visualization of the association rules.
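
A heatmap of this kind is backed by a grid with one cell per antecedent-consequent pair. The base R sketch below builds such a grid from a hypothetical data frame of rules and shows one plausible reading of `include_zero`, namely filling cells that have no rule with 0. The data, the column names, and that interpretation are assumptions for illustration, not a description of `rule_heatmap()` internals.

```r
# Hypothetical rules in long form: one row per rule
rules_df <- data.frame(
  antecedent = c("{pH=high}", "{WaterTemp=low}", "{pH=high}"),
  consequent = c("{eDNAConc=high}", "{eDNAConc=high}", "{eFishCatch=high}"),
  confidence = c(0.80, 0.65, 0.55)
)

# Consequents as rows, antecedents as columns;
# NA marks combinations for which no rule exists
grid <- tapply(rules_df$confidence,
               list(rules_df$consequent, rules_df$antecedent),
               FUN = mean)

# Assumed analogue of include_zero = TRUE: show missing combinations as 0
grid_zero <- grid
grid_zero[is.na(grid_zero)] <- 0

print(grid)
print(grid_zero)
```

Leaving missing combinations as `NA` keeps them visually distinct from genuinely low metric values, which is one reason a default of `FALSE` for such an option can be sensible.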

Examples

library(arules)
library(tidyr)  # re-exports the %>% pipe used below
data(BrookTrout)

# Discretize data
discrete_bt <- dtize_df(BrookTrout, cutoff="median")

# Generate rules
rules <- apriori(
  discrete_bt,
  parameter = list(supp = 0.01, conf = 0.5, target = "rules"),
  appearance = list(rhs="eDNAConc=high")
)

# Subset ruleset (too many rules won't fit on the heatmap)
rules <- rules %>%
  subset(!is.redundant(., measure = "confidence")) %>%
  subset(is.significant(., alpha = 0.05)) %>%
  sort(by = c("confidence", "lift", "support"))

# Create a heatmap of the rules using confidence as the metric
rule_heatmap(
  rules,
  metric = "confidence",
  graph_title = "Confidence Heatmap"
)

# Create a heatmap of the rules using lift as the metric
rule_heatmap(
  rules,
  metric = "lift",
  graph_title = "Lift Heatmap",
  low_color = "#D4A221",
  high_color = "darkgreen"
)
