Title: | Preparing, Analyzing, and Visualizing Association Rules |
---|---|
Description: | Streamlines data preprocessing, analysis, and visualization for association rule mining. Designed to work with the 'arules' package, features include discretizing data frames, generating rule set intersections, and visualizing rules with heatmaps and Euler diagrams. 'RulesTools' also includes a dataset on Brook trout detection from Nolan et al. (2022) <doi:10.1007/s13412-022-00800-x>. |
Authors: | Nikolett Toth [aut, cre], Jarrett Phillips [ctb] |
Maintainer: | Nikolett Toth <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.1 |
Built: | 2025-02-28 05:33:28 UTC |
Source: | https://github.com/nikolett0203/rulestools |
This dataset contains information on brook trout detections using environmental DNA (eDNA) and environmental parameters collected from various sites in Ontario, Canada. The data come from a scientific study comparing eDNA sampling methods with electrofishing to detect brook trout populations.
BrookTrout
A dataframe with 10 variables and multiple rows (one row per sample):
Character. The type of eDNA sampler: "OSMOS" or "ANDe".
Integer. The site number where the sample was taken.
Integer. The number of fish caught via electrofishing.
Numeric. Air temperature in degrees Celsius.
Numeric. Water temperature in degrees Celsius.
Numeric. pH level of the water sample.
Numeric. Dissolved oxygen concentration in mg/L.
Numeric. Conductivity in uS/cm.
Numeric. Volume of water filtered in litres.
Numeric. eDNA concentration in copies per microlitre.
Adapted from Nolan, K. P., Loeza-Quintana, T., Little, H. A., et al. (2022). Detection of brook trout in spatiotemporally separate locations using validated eDNA technology. Journal of Environmental Studies and Sciences, 13, 66-82. doi:10.1007/s13412-022-00800-x
data(BrookTrout)
summary(BrookTrout)
plot(eDNAConc ~ Site, data = BrookTrout)
Compares multiple sets of association rules, identifies intersections, and optionally displays the results or writes them to a CSV file.
compare_rules(..., display = TRUE, filename = NULL)
... | Named association rule sets (objects of class rules). |
display | Logical. If TRUE, displays the results. Default is TRUE. |
filename | Character string. If provided, writes the results to a CSV file. Default is NULL. |
A list containing the intersections of the provided rule sets.
library(arules)
data(BrookTrout)

# Discretize the BrookTrout dataset
discrete_bt <- dtize_df(BrookTrout, cutoff = "mean")

# Generate the first set of rules with a confidence threshold of 0.5
rules1 <- apriori(
  discrete_bt,
  parameter = list(supp = 0.01, conf = 0.5, target = "rules")
)

# Generate the second set of rules with a higher confidence threshold of 0.6
rules2 <- apriori(
  discrete_bt,
  parameter = list(supp = 0.01, conf = 0.6, target = "rules")
)

# Compare the two sets of rules and display the intersections
compare_rules(
  r1 = rules1,
  r2 = rules2,
  display = TRUE
)
# If `filename = "intersections.csv"`, the data is saved in a .csv file
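The idea of intersecting named rule sets can be illustrated without the package. What follows is a minimal base R sketch, not the implementation of compare_rules: the rule labels are hypothetical placeholders, and each rule set is represented as a plain character vector rather than an arules object.

```r
# Hypothetical rule labels standing in for three mined rule sets
# (assumption: real rule sets would be arules objects, not strings).
rule_sets <- list(
  r1 = c("{A} => {X}", "{B} => {X}", "{A,B} => {Y}"),
  r2 = c("{A} => {X}", "{A,B} => {Y}", "{C} => {Y}"),
  r3 = c("{A} => {X}", "{C} => {Y}")
)

# Every unordered pair of set names.
pairs <- combn(names(rule_sets), 2, simplify = FALSE)

# Rules shared by each pair of sets.
pairwise <- lapply(pairs, function(p) {
  intersect(rule_sets[[p[1]]], rule_sets[[p[2]]])
})
names(pairwise) <- vapply(pairs, paste, character(1), collapse = " & ")

# Rules common to all sets.
common <- Reduce(intersect, rule_sets)

print(pairwise)
print(common)
```

Naming the list elements is what makes the output readable: each intersection is labelled by the sets that produced it.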
Discretizes a numeric vector into categories based on specified cutoff points. The function handles missing values, allows for infinite bounds, and supports predefined cutoffs such as the mean or median.
dtize_col( column, cutoff = "median", labels = c("low", "high"), include_right = TRUE, infinity = TRUE, include_lowest = TRUE, na_fill = "none" )
column | A numeric vector to discretize. |
cutoff | A numeric vector specifying cutoff points, or a string ("mean" or "median"). |
labels | A character vector specifying labels for the resulting categories. |
include_right | Logical. If TRUE, intervals are closed on the right. Default is TRUE. |
infinity | Logical. If TRUE, the split intervals extend to infinity. Default is TRUE. |
include_lowest | Logical. If TRUE, the lowest value is included in the first interval. Default is TRUE. |
na_fill | A string specifying the method to impute missing values: "none", "mean", or "median" (default "none"). |
A factor with the same length as column, where each value is categorized based on the cutoffs.
data(BrookTrout)

# Example with predefined cutoffs
discrete_edna_conc <- dtize_col(
  BrookTrout$eDNAConc,
  cutoff = 13.3,
  labels = c("low", "high"),
  infinity = TRUE
)

# Example with median as cutoff
discrete_pH <- dtize_col(BrookTrout$pH, cutoff = "median")

# Example with missing value imputation
filled_col <- dtize_col(
  c(1, 2, NA, 4, 5),
  cutoff = "mean",
  include_right = FALSE,
  na_fill = "mean"
)
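To make the interval options concrete, here is a small sketch using base R's cut(). It is an analogue only, under the assumption that include_right, include_lowest, and infinity behave like the right, include.lowest, and infinite-break conventions of cut(); the vectors are made up for illustration and dtize_col itself is not called.

```r
# Illustrative data; the median (4.0) is itself one of the values.
x <- c(2.1, 3.5, 4.0, 5.2, 7.8)
cutoff <- median(x)
breaks <- c(-Inf, cutoff, Inf)  # open-ended bounds on both sides

# Right-closed intervals: (-Inf, 4] and (4, Inf], so 4.0 is "low".
right_closed <- cut(x, breaks = breaks, labels = c("low", "high"),
                    right = TRUE, include.lowest = TRUE)

# Left-closed intervals: [-Inf, 4) and [4, Inf], so 4.0 is "high".
left_closed <- cut(x, breaks = breaks, labels = c("low", "high"),
                   right = FALSE, include.lowest = TRUE)

# Mean imputation followed by a mean cutoff with left-closed intervals.
y <- c(1, 2, NA, 4, 5)
y[is.na(y)] <- mean(y, na.rm = TRUE)  # the NA becomes 3
y_factor <- cut(y, breaks = c(-Inf, mean(y), Inf),
                labels = c("low", "high"),
                right = FALSE, include.lowest = TRUE)

print(data.frame(x, right_closed, left_closed))
print(y_factor)
```

The only value whose category changes between the two closure settings is the one that sits exactly on the cutoff.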
Discretizes numeric columns of a dataframe based on specified splitting criteria, and handles missing values using specified imputation methods.
dtize_df( data, cutoff = "median", labels = c("low", "high"), include_right = TRUE, infinity = TRUE, include_lowest = TRUE, na_fill = "none", m = 5, maxit = 5, seed = NULL, printFlag = FALSE )
data | A dataframe containing the data to be discretized. |
cutoff | A character string specifying the splitting method for numeric columns. Options include "median" (the default) and "mean". |
labels | A character vector of labels for the discretized categories. Default is c("low", "high"). |
include_right | A logical value indicating if the intervals should be closed on the right. Default is TRUE. |
infinity | A logical value indicating if the split intervals should extend to infinity. Default is TRUE. |
include_lowest | A logical value indicating if the lowest value should be included in the first interval. Default is TRUE. |
na_fill | A character string specifying the imputation method for handling missing values. Options include "none" (the default) and "pmm". |
m | An integer specifying the number of multiple imputations. Default is 5. |
maxit | An integer specifying the maximum number of iterations for the imputation. Default is 5. |
seed | An integer seed for reproducibility of the imputation process. Default is NULL. |
printFlag | A logical value indicating if progress output from the imputation should be printed. Default is FALSE. |
A dataframe with numeric columns discretized and missing values handled based on the specified imputation method.
data(BrookTrout)

# Example with median as cutoff
med_df <- dtize_df(
  BrookTrout,
  cutoff = "median",
  labels = c("below median", "above median")
)

# Example with mean as cutoff
mean_df <- dtize_df(
  BrookTrout,
  cutoff = "mean",
  include_right = FALSE
)

# Example with missing value imputation
air <- dtize_df(
  airquality,
  cutoff = "mean",
  na_fill = "pmm",
  m = 10,
  maxit = 10,
  seed = 42
)
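The column-wise idea behind discretizing a whole data frame can be sketched in base R. The helper below, discretize_numeric, is hypothetical and is not how dtize_df is implemented; it assumes a median cutoff, right-closed intervals, and no imputation, so missing values simply stay NA.

```r
# Hypothetical helper: split every numeric column at its median.
discretize_numeric <- function(df, labels = c("low", "high")) {
  numeric_cols <- vapply(df, is.numeric, logical(1))
  df[numeric_cols] <- lapply(df[numeric_cols], function(col) {
    cutoff <- median(col, na.rm = TRUE)  # ignore NAs when finding the cutoff
    cut(col, breaks = c(-Inf, cutoff, Inf), labels = labels, right = TRUE)
  })
  df
}

# airquality ships with base R and contains missing values.
air_sketch <- discretize_numeric(airquality)
str(air_sketch)
```

Non-numeric columns pass through unchanged, which is why the helper selects columns with is.numeric before applying cut().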
Generates an Euler diagram visualization for up to 4 sets of association rules. The function displays the relationships between rule sets with customizable colors, transparency, and labels.
rule_euler( rules, fill_color = NULL, fill_alpha = 0.5, stroke_color = "black", stroke_size = 1, title = NULL, name_color = "black", name_size = 12, text_color = "black", text_size = 11, show_legend = FALSE, legend_position = "bottom", nrow = NULL, ncol = NULL )
rules | A list of rules objects (up to 4 sets). |
fill_color | A character vector of valid R color names or hex color codes for filling the sets. Defaults to NULL. |
fill_alpha | A numeric value between 0 and 1 specifying the transparency of the fill colors. Defaults to 0.5. |
stroke_color | A character string specifying the color of the set borders. Defaults to "black". |
stroke_size | A positive numeric value specifying the size of the set borders. Defaults to 1. |
title | A character string specifying the title of the Euler diagram. Defaults to NULL. |
name_color | A character string specifying the color of the set names. Defaults to "black". |
name_size | A positive numeric value specifying the font size of the set names. Defaults to 12. |
text_color | A character string specifying the color of the quantity labels (counts) in the diagram. Defaults to "black". |
text_size | A positive numeric value specifying the font size of the quantities (counts). Defaults to 11. |
show_legend | A logical value indicating whether to display a legend for the sets rather than labels. Defaults to FALSE. |
legend_position | A character string specifying the position of the legend. Defaults to "bottom". |
nrow | An optional numeric value specifying the number of rows in the legend layout. Defaults to NULL. |
ncol | An optional numeric value specifying the number of columns in the legend layout. Defaults to NULL. |
A plot object displaying the Euler diagram visualization.
library(arules)
data(BrookTrout)

# Discretize the BrookTrout dataset
discrete_bt <- dtize_df(BrookTrout, cutoff = "median")

# Generate the first set of rules with a confidence threshold of 0.5
rules1 <- apriori(
  discrete_bt,
  parameter = list(supp = 0.01, conf = 0.5, target = "rules")
)

# Generate the second set of rules with a higher confidence threshold of 0.6
rules2 <- apriori(
  discrete_bt,
  parameter = list(supp = 0.01, conf = 0.6, target = "rules")
)

# Create an Euler diagram to visualize the intersections between the rule sets
rule_euler(
  rules = list(conf0.5 = rules1, conf0.6 = rules2),
  title = "Euler Diagram of BrookTrout Rule Sets",
  fill_color = c("#7832ff", "lightgreen"),
  stroke_color = "darkblue"
)
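The counts shown in a two-set Euler diagram correspond to three regions: rules only in the first set, rules only in the second, and rules in both. A minimal base R sketch of those counts follows; the rule labels are hypothetical and the sets are plain character vectors, so this illustrates the quantities rather than how rule_euler computes them.

```r
# Hypothetical rule labels for two rule sets.
set_a <- c("{A} => {X}", "{B} => {X}", "{A,B} => {Y}")
set_b <- c("{A} => {X}", "{C} => {Y}")

# Size of each region of a two-set Euler diagram.
regions <- c(
  only_a = length(setdiff(set_a, set_b)),
  only_b = length(setdiff(set_b, set_a)),
  both   = length(intersect(set_a, set_b))
)

print(regions)
```

When two rule sets are mined from the same data with the same support threshold and differ only in the confidence threshold, the stricter set should be contained in the looser one, so one of the "only" regions would be expected to be empty.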
Generates a heatmap visualization of association rules, showing relationships between antecedents and consequents based on a specified metric.
rule_heatmap( rules, metric = "confidence", graph_title = "", graph_title_size = 14, x_axis_title = "Antecedents", x_axis_title_size = 12, x_axis_text_size = 11, x_axis_text_angle = 45, y_axis_title = "Consequents", y_axis_title_size = 12, y_axis_text_size = 11, y_axis_text_angle = 0, legend_title = metric, legend_text_size = 8, legend_position = "right", low_color = "lightblue", high_color = "navy", include_zero = FALSE )
rules | An object of class rules. |
metric | A character string specifying the metric to use for coloring the heatmap, such as "confidence" or "lift". Defaults to "confidence". |
graph_title | A character string specifying the title of the graph. Defaults to an empty string (""). |
graph_title_size | A numeric value specifying the size of the graph title text. Defaults to 14. |
x_axis_title | A character string specifying the title for the x-axis. Defaults to "Antecedents". |
x_axis_title_size | A numeric value specifying the size of the x-axis title text. Defaults to 12. |
x_axis_text_size | A numeric value specifying the size of the x-axis text. Defaults to 11. |
x_axis_text_angle | A numeric value specifying the angle of the x-axis text. Defaults to 45. |
y_axis_title | A character string specifying the title for the y-axis. Defaults to "Consequents". |
y_axis_title_size | A numeric value specifying the size of the y-axis title text. Defaults to 12. |
y_axis_text_size | A numeric value specifying the size of the y-axis text. Defaults to 11. |
y_axis_text_angle | A numeric value specifying the angle of the y-axis text. Defaults to 0. |
legend_title | A character string specifying the title of the legend. Defaults to the value of metric. |
legend_text_size | A numeric value specifying the size of the legend text. Defaults to 8. |
legend_position | A character string specifying the position of the legend. Defaults to "right". |
low_color | A valid R color or hex color code for the lower bound of the gradient. Defaults to "lightblue". |
high_color | A valid R color or hex color code for the upper bound of the gradient. Defaults to "navy". |
include_zero | A logical value indicating whether to include zero values for missing antecedent-consequent combinations. Defaults to FALSE. |
A ggplot object representing the heatmap visualization of the association rules.
library(arules)
library(tidyr)
data(BrookTrout)

# Discretise data
discrete_bt <- dtize_df(BrookTrout, cutoff = "median")

# Generate rules
rules <- apriori(
  discrete_bt,
  parameter = list(supp = 0.01, conf = 0.5, target = "rules"),
  appearance = list(rhs = "eDNAConc=high")
)

# Subset ruleset (too many rules won't fit on the heatmap)
rules <- rules %>%
  subset(!is.redundant(., measure = "confidence")) %>%
  subset(is.significant(., alpha = 0.05)) %>%
  sort(by = c("confidence", "lift", "support"))

# Create a heatmap of the rules using confidence as the metric
rule_heatmap(
  rules,
  metric = "confidence",
  graph_title = "Confidence Heatmap"
)

# Create a heatmap of the rules using lift as the metric
rule_heatmap(
  rules,
  metric = "lift",
  graph_title = "Lift Heatmap",
  low_color = "#D4A221",
  high_color = "darkgreen"
)
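A heatmap of this kind is, in effect, a consequent-by-antecedent grid of metric values. The base R sketch below builds such a grid from a hypothetical data frame of rules (the column names lhs, rhs, and confidence and all values are made up) and shows one plausible reading of include_zero: missing combinations are either left as NA or filled with 0. It does not reproduce the internals of rule_heatmap.

```r
# Hypothetical rules: one row per antecedent => consequent pair.
rules_df <- data.frame(
  lhs = c("{A}", "{B}", "{A,B}", "{A}"),
  rhs = c("{X}", "{X}", "{X}", "{Y}"),
  confidence = c(0.60, 0.75, 0.90, 0.55)
)

# Consequents as rows, antecedents as columns; each pair occurs once,
# so mean() just returns that pair's confidence.
grid <- tapply(
  rules_df$confidence,
  list(consequent = rules_df$rhs, antecedent = rules_df$lhs),
  mean
)

# Assumed analogue of include_zero = TRUE: fill missing pairs with 0.
grid_zero <- grid
grid_zero[is.na(grid_zero)] <- 0

print(grid)
print(grid_zero)
```

Leaving missing pairs as NA keeps them visually distinct from genuinely low values, whereas filling with 0 stretches the color scale down to zero.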