The ALASCA package is described in the paper "ALASCA: An R package for longitudinal and cross-sectional analysis of multivariate data by ASCA-based methods". The paper contains several examples of how the package can be used.
This vignette will only show how to quickly get started with the ALASCA package. For more examples, see
if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")
devtools::install_github("andjar/ALASCA", ref = "main")
If you have utilized the ALASCA package, please consider citing:
Jarmund AH, Madssen TS and Giskeødegård GF (2022) ALASCA: An R package for longitudinal and cross-sectional analysis of multivariate data by ASCA-based methods. Front. Mol. Biosci. 9:962431. doi: 10.3389/fmolb.2022.962431
@ARTICLE{10.3389/fmolb.2022.962431,
AUTHOR={Jarmund, Anders Hagen and Madssen, Torfinn Støve and Giskeødegård, Guro F.},
TITLE={ALASCA: An R package for longitudinal and cross-sectional analysis of multivariate data by ASCA-based methods},
JOURNAL={Frontiers in Molecular Biosciences},
VOLUME={9},
YEAR={2022},
URL={https://www.frontiersin.org/articles/10.3389/fmolb.2022.962431},
DOI={10.3389/fmolb.2022.962431},
ISSN={2296-889X}
}
We will start by creating an artificial data set with 100 participants, 5 time points, and 20 variables. The variables follow four patterns: a linear increase, a linear decrease, a V-shape, and an inverted V-shape.
library(data.table)  # provides data.table(), rbindlist() and dcast()

n_time <- 5
n_id <- 100
n_variable <- 20

df <- rbindlist(lapply(seq(1, n_id), function(i_id) {
  rbindlist(lapply(seq(1, n_variable), function(i_variable) {
    # Random intercept and slope for this participant and variable
    r_intercept <- rnorm(1, sd = 5)
    beta <- 2 + rnorm(1)

    temp_data <- data.table(
      id = paste0("id_", i_id),
      time = seq(1, n_time) - 1,
      variable = paste0("variable_", i_variable)
    )

    # Assign one of four time patterns based on the variable number
    if ((i_variable %% 4) == 0) {
      # Linear increase
      temp_data[, value := r_intercept + beta * time]
    } else if ((i_variable %% 4) == 1) {
      # Linear decrease
      temp_data[, value := r_intercept - beta * time]
    } else if ((i_variable %% 4) == 2) {
      # V-shape: decrease followed by increase
      temp_data[, value := r_intercept - beta*n_time/2 + beta * abs(time - n_time/2)]
    } else {
      # Inverted V-shape: increase followed by decrease
      temp_data[, value := r_intercept + beta*n_time/2 - beta * abs(time - n_time/2)]
    }

    # Add measurement noise and give each variable its own scale
    temp_data[, value := value + rnorm(n_time)]
    temp_data[, value := value * i_variable/2]
    temp_data
  }))
}))
Overall (ignoring the random effects), the four patterns look like this:
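A minimal sketch of the noise-free patterns can be obtained by evaluating the four formulas from the simulation directly. The intercept of 0, the fixed slope `beta = 2`, and the name `patterns` are assumptions made for illustration; the final per-variable scaling is left out.

```r
# Noise-free versions of the four simulated patterns.
# Assumptions for illustration: intercept 0 and a fixed slope of 2.
n_time <- 5
beta <- 2
time <- seq(1, n_time) - 1

patterns <- data.frame(
  time = time,
  increase = beta * time,                                         # i_variable %% 4 == 0
  decrease = -beta * time,                                        # i_variable %% 4 == 1
  v_shape = -beta * n_time / 2 + beta * abs(time - n_time / 2),   # i_variable %% 4 == 2
  inverted_v = beta * n_time / 2 - beta * abs(time - n_time / 2)  # remaining variables
)
print(patterns)

# Draw one line per pattern
matplot(patterns$time, patterns[, -1], type = "l", lty = 1, col = 1:4,
        xlab = "time", ylab = "value")
legend("topleft", legend = names(patterns)[-1], col = 1:4, lty = 1)
```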
We want time to be a categorical variable:
df[, time := paste0("t_", time)]
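Character values are sorted as strings when they are turned into a factor, which can differ from the chronological order. The sketch below uses a hypothetical vector `time_chr` with an extra time point `t_10` (not part of the simulated data) to show how the order can be fixed explicitly.

```r
# Hypothetical example vector; t_10 is not part of the simulated data
time_chr <- c("t_0", "t_1", "t_2", "t_10")

# Default conversion: levels are sorted as strings
levels(factor(time_chr))

# Explicit levels keep the chronological order
time_fct <- factor(time_chr, levels = paste0("t_", c(0, 1, 2, 10)))
levels(time_fct)
```

For the simulated data, the corresponding call would be `df[, time := factor(time, levels = paste0("t_", 0:4))]`, although with only five time points the alphabetical and chronological orders coincide.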
Your data can be provided in either long or wide format. In long format, there is one column with variable names and one column with the variable values. For example:
head(df)
#> id time variable value
#> 1: id_1 t_0 variable_1 2.4769378
#> 2: id_1 t_1 variable_1 1.0115827
#> 3: id_1 t_2 variable_1 -0.8411744
#> 4: id_1 t_3 variable_1 -1.1813231
#> 5: id_1 t_4 variable_1 -3.0772545
#> 6: id_1 t_0 variable_2 -5.9339985
In wide format, each variable has a separate column:
head(dcast(data = df, ... ~ variable))
#> id time variable_1 variable_10 variable_11 variable_12 variable_13
#> 1: id_1 t_0 2.4769378 -35.94635 -45.35875 44.66768 67.171158
#> 2: id_1 t_1 1.0115827 -47.67568 -35.25310 39.04289 47.716234
#> 3: id_1 t_2 -0.8411744 -48.66101 -15.12380 33.58592 51.664226
#> 4: id_1 t_3 -1.1813231 -52.54843 -22.20310 52.73221 28.748378
#> 5: id_1 t_4 -3.0772545 -44.94631 -25.90227 66.61952 16.283738
#> 6: id_10 t_0 0.0443992 34.87501 21.86164 28.39654 9.949849
#> variable_14 variable_15 variable_16 variable_17 variable_18 variable_19
#> 1: 74.82333 -79.56599 30.31720 -7.394496 -57.60884 -18.153458
#> 2: 62.80409 -69.31716 26.06098 -22.636185 -82.55770 11.162799
#> 3: 39.95578 -69.36965 60.04150 -35.164581 -121.22651 36.579156
#> 4: 33.55430 -76.73088 88.16500 -30.210472 -133.42926 29.447347
#> 5: 60.75067 -61.41886 109.99894 -39.079021 -91.41670 8.549952
#> 6: -34.83746 19.31289 -17.64939 -9.643601 -11.97513 47.819253
#> variable_2 variable_20 variable_3 variable_4 variable_5 variable_6
#> 1: -5.933999 -49.15108 -0.03570218 11.11355 0.3497869 -10.620561
#> 2: -5.939178 -30.81022 3.20041468 14.05601 3.9081466 -6.652098
#> 3: -7.608419 14.29663 4.80352499 16.59040 -3.8849583 -17.966355
#> 4: -8.632151 36.31787 5.07881888 20.57171 0.4075008 -14.904063
#> 5: -7.139009 93.05832 -0.50478694 28.58191 -5.2086206 -16.391410
#> 6: -5.838160 -33.53981 -1.47054236 11.96847 5.4414614 -24.266648
#> variable_7 variable_8 variable_9
#> 1: 15.55688 10.14435 -34.4222387
#> 2: 19.31045 15.09871 -46.2560162
#> 3: 29.84184 22.75923 -54.9854605
#> 4: 25.72539 26.06270 -61.3664425
#> 5: 20.97858 34.30927 -80.4137867
#> 6: -28.21639 -22.49494 0.5137732
ALASCA supports both formats but defaults to long format. To use wide format, you have to set `wide = TRUE`.
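The two layouts can be converted into each other with `dcast()` and `melt()` from data.table. The sketch below uses a small hypothetical table `df_long` instead of the simulated data. The commented-out ALASCA call at the end is an assumption about how `wide = TRUE` combines with the formula and is not run here.

```r
library(data.table)

# Small hypothetical long-format table: 2 participants, 2 time points, 2 variables
df_long <- data.table(
  id = rep(c("id_1", "id_2"), each = 4),
  time = rep(rep(c("t_0", "t_1"), each = 2), times = 2),
  variable = rep(c("variable_1", "variable_2"), times = 4),
  value = c(2.5, -5.9, 1.0, -5.9, 0.04, -5.8, 1.2, -6.1)
)

# Long to wide: one column per variable
df_wide <- dcast(df_long, id + time ~ variable, value.var = "value")

# Wide to long again
df_back <- melt(df_wide, id.vars = c("id", "time"),
                variable.name = "variable", value.name = "value")

# Assumed usage, not run here: the same formula with wide = TRUE
# res_wide <- ALASCA(df_wide, value ~ time + (1|id), wide = TRUE)
```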
In this example, we are only looking at the common time development. For examples involving group differences, see the vignette on regression models.
To assess the time development in this data set, we will use the regression formula `value ~ time + (1|id)`. Here, `value` is the measured variable value, `time` the predictor, and `(1|id)` a random intercept per participant (`id`). ALASCA will implicitly run the regression for each variable separately.
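To make the formula concrete, the sketch below fits the same kind of model, one variable at a time, with `lme4::lmer()` on a small simulated data set. It only illustrates the model structure: ALASCA's own scaling and fitting steps are not reproduced, and the names `toy` and `fits` are hypothetical.

```r
library(lme4)

set.seed(1)

# Hypothetical toy data: 30 participants, 3 time points, 2 variables
toy <- expand.grid(
  id = paste0("id_", 1:30),
  time = paste0("t_", 0:2),
  variable = c("variable_1", "variable_2"),
  stringsAsFactors = FALSE
)

# Random intercept per participant, a linear time trend, and noise
r_intercept <- rnorm(30, sd = 2)
names(r_intercept) <- paste0("id_", 1:30)
time_num <- as.numeric(sub("t_", "", toy$time))
toy$value <- unname(r_intercept[toy$id]) + 2 * time_num + rnorm(nrow(toy))
toy$time <- factor(toy$time)

# One linear mixed model per variable, all with the same formula
fits <- lapply(split(toy, toy$variable), function(d) {
  lmer(value ~ time + (1 | id), data = d)
})

# Fixed effects: intercept (t_0) and the contrasts of t_1 and t_2 against t_0
lapply(fits, fixef)
```

With ALASCA, the per-variable regressions are handled by a single call: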
library(ALASCA)

res <- ALASCA(
  df,
  value ~ time + (1|id)
)
#> INFO [2024-01-18 00:08:50] Initializing ALASCA (v1.0.14, 2024-01-17)
#> WARN [2024-01-18 00:08:50] Guessing effects: `time`
#> INFO [2024-01-18 00:08:50] Will use linear mixed models!
#> INFO [2024-01-18 00:08:50] Will use Rfast!
#> WARN [2024-01-18 00:08:50] The `time` column is used for stratification
#> WARN [2024-01-18 00:08:50] Converting `character` columns to factors
#> INFO [2024-01-18 00:08:50] Scaling data with sdall ...
#> INFO [2024-01-18 00:08:50] Calculating LMM coefficients
#> INFO [2024-01-18 00:08:51] ==== ALASCA has finished ====
#> INFO [2024-01-18 00:08:51] To visualize the model, try `plot(<object>, effect = 1, component = 1, type = 'effect')`
The ALASCA function will provide output with important information:

Guessing effects: 'time'
When effects are not explicitly provided to ALASCA, the package will try to guess the effects you are interested in. See the vignette on regression models for details.

Will use linear mixed models!
ALASCA will use linear mixed models when you provide a random effect in the regression formula (i.e., `(1|id)`).

Will use Rfast!
Linear mixed model regression can be performed by one of two R packages: lme4 or Rfast.

The 'time' column is used for stratification
This is only important for model validation. For details, see the vignette on model validation.

Converting 'character' columns to factors
We provided `time` as a character variable, and ALASCA converts it to a factor variable. If the order of the levels matters and they are not in alphabetical order, you may want to convert the variable to a factor yourself.

Scaling data with sdall ...
ALASCA supports various scalings, and `sdall` is the default. For details, see our paper "ALASCA: An R package for longitudinal and cross-sectional analysis of multivariate data by ASCA-based methods".

Calculating LMM coefficients
This simply informs you that the regression is ongoing, as this may take some time.

To see the resulting model:
plot(res, component = c(1,2), type = 'effect')
#> INFO [2024-01-18 00:08:51] Effect plot. Selected effect (nr 1): `time`. Component: 1 and 2.
#> WARN [2024-01-18 00:08:51] Showing 20 of 20 variables. Adjust the number with `n_limit`
#> WARN [2024-01-18 00:08:52] Showing 20 of 20 variables. Adjust the number with `n_limit`
See the vignette on plotting the model for more visualizations.