Introduction to ALASCA

The ALASCA package is described in the paper "ALASCA: An R package for longitudinal and cross-sectional analysis of multivariate data by ASCA-based methods". The paper contains several examples of how the package can be used.

This vignette will only show how to quickly get started with the ALASCA package. For more examples, see

Installation

if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")

devtools::install_github("andjar/ALASCA", ref = "main")

Citation

If you use the ALASCA package, please consider citing:

Jarmund AH, Madssen TS and Giskeødegård GF (2022) ALASCA: An R package for longitudinal and cross-sectional analysis of multivariate data by ASCA-based methods. Front. Mol. Biosci. 9:962431. doi: 10.3389/fmolb.2022.962431

@ARTICLE{10.3389/fmolb.2022.962431,
  AUTHOR={Jarmund, Anders Hagen and Madssen, Torfinn Støve and Giskeødegård, Guro F.},
  TITLE={ALASCA: An R package for longitudinal and cross-sectional analysis of multivariate data by ASCA-based methods},
  JOURNAL={Frontiers in Molecular Biosciences},
  VOLUME={9},
  YEAR={2022},
  URL={https://www.frontiersin.org/articles/10.3389/fmolb.2022.962431},       
  DOI={10.3389/fmolb.2022.962431},      
  ISSN={2296-889X}
}

Creating an ASCA model

Generating a data set

We will start by creating an artificial data set with 100 participants, 5 time points, and 20 variables. The variables follow four patterns:

  • Linear increase
  • Linear decrease
  • A v-shape
  • An inverted v-shape

# Packages used throughout this vignette
library(data.table)
library(ggplot2)
library(ALASCA)

n_time     <- 5
n_id       <- 100
n_variable <- 20

df <- rbindlist(lapply(seq(1,n_id), function(i_id) {
  rbindlist(lapply(seq(1,n_variable), function(i_variable) {
    
    r_intercept <- rnorm(1, sd = 5)
    beta <- 2 + rnorm(1)
    
    temp_data <- data.table(
                  id = paste0("id_", i_id),
                  time = seq(1, n_time) - 1,
                  variable = paste0("variable_", i_variable)
                )
    if ((i_variable %% 4) == 0) {
      temp_data[, value := r_intercept + beta * time]
    } else if ((i_variable %% 4) == 1) {
      temp_data[, value := r_intercept - beta * time]
    } else if ((i_variable %% 4) == 2) {
      temp_data[, value := r_intercept - beta*n_time/2 + beta * abs(time - n_time/2)]
    } else {
      temp_data[, value := r_intercept + beta*n_time/2 - beta * abs(time - n_time/2)]
    }
    
    temp_data[, value := value + rnorm(n_time)]
    temp_data[, value := value * i_variable/2]
    temp_data
  }))
}))

Overall (ignoring the random effects), the four patterns look like this:

ggplot(df[variable %in% c("variable_1", "variable_2", "variable_3", "variable_4"),],
       aes(time, value)) +
  geom_smooth() +
  facet_wrap(~variable, scales = "free_y") +
  scale_color_viridis_d(end = 0.8)
#> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
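
For reference, the noise-free shape behind each pattern can be tabulated directly. The sketch below uses base R only and assumes a fixed slope of beta = 2 and an intercept of zero, whereas the simulation draws both at random. The column names are chosen here for illustration.

```r
n_time <- 5
time   <- seq(1, n_time) - 1   # 0, 1, 2, 3, 4
beta   <- 2                    # assumed fixed slope (random in the simulation)

# Mean trajectory of each pattern, without random intercept or noise
patterns <- data.frame(
  time       = time,
  increase   =  beta * time,
  decrease   = -beta * time,
  v_shape    = -beta * n_time / 2 + beta * abs(time - n_time / 2),
  inverted_v =  beta * n_time / 2 - beta * abs(time - n_time / 2)
)
patterns
```

With n_time = 5, the turning point of the two v-shapes lies at time = 2.5, between the third and fourth time point. In the simulation, i_variable %% 4 selects the pattern: variables 4, 8, ... increase, variables 1, 5, ... decrease, variables 2, 6, ... follow the v-shape, and variables 3, 7, ... follow the inverted v-shape.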

Data format

We want time to be a categorical variable:

df[, time := paste0("t_", time)]

Your data can be provided in either long or wide format. In long format, there is one column with variable names and one column with the variable values. For example:

head(df)
#>      id time   variable      value
#> 1: id_1  t_0 variable_1  2.4769378
#> 2: id_1  t_1 variable_1  1.0115827
#> 3: id_1  t_2 variable_1 -0.8411744
#> 4: id_1  t_3 variable_1 -1.1813231
#> 5: id_1  t_4 variable_1 -3.0772545
#> 6: id_1  t_0 variable_2 -5.9339985

In wide format, each variable has a separate column:

head(dcast(data = df, ... ~ variable))
#>       id time variable_1 variable_10 variable_11 variable_12 variable_13
#> 1:  id_1  t_0  2.4769378   -35.94635   -45.35875    44.66768   67.171158
#> 2:  id_1  t_1  1.0115827   -47.67568   -35.25310    39.04289   47.716234
#> 3:  id_1  t_2 -0.8411744   -48.66101   -15.12380    33.58592   51.664226
#> 4:  id_1  t_3 -1.1813231   -52.54843   -22.20310    52.73221   28.748378
#> 5:  id_1  t_4 -3.0772545   -44.94631   -25.90227    66.61952   16.283738
#> 6: id_10  t_0  0.0443992    34.87501    21.86164    28.39654    9.949849
#>    variable_14 variable_15 variable_16 variable_17 variable_18 variable_19
#> 1:    74.82333   -79.56599    30.31720   -7.394496   -57.60884  -18.153458
#> 2:    62.80409   -69.31716    26.06098  -22.636185   -82.55770   11.162799
#> 3:    39.95578   -69.36965    60.04150  -35.164581  -121.22651   36.579156
#> 4:    33.55430   -76.73088    88.16500  -30.210472  -133.42926   29.447347
#> 5:    60.75067   -61.41886   109.99894  -39.079021   -91.41670    8.549952
#> 6:   -34.83746    19.31289   -17.64939   -9.643601   -11.97513   47.819253
#>    variable_2 variable_20  variable_3 variable_4 variable_5 variable_6
#> 1:  -5.933999   -49.15108 -0.03570218   11.11355  0.3497869 -10.620561
#> 2:  -5.939178   -30.81022  3.20041468   14.05601  3.9081466  -6.652098
#> 3:  -7.608419    14.29663  4.80352499   16.59040 -3.8849583 -17.966355
#> 4:  -8.632151    36.31787  5.07881888   20.57171  0.4075008 -14.904063
#> 5:  -7.139009    93.05832 -0.50478694   28.58191 -5.2086206 -16.391410
#> 6:  -5.838160   -33.53981 -1.47054236   11.96847  5.4414614 -24.266648
#>    variable_7 variable_8  variable_9
#> 1:   15.55688   10.14435 -34.4222387
#> 2:   19.31045   15.09871 -46.2560162
#> 3:   29.84184   22.75923 -54.9854605
#> 4:   25.72539   26.06270 -61.3664425
#> 5:   20.97858   34.30927 -80.4137867
#> 6:  -28.21639  -22.49494   0.5137732

ALASCA supports both formats but defaults to long format. To use wide format, you have to set wide = TRUE.
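
To make the relation between the two formats concrete, here is a minimal data.table sketch on a small toy table. The object names (toy_long, toy_wide, toy_back) and the values are made up for illustration, and the sketch only covers the reshaping, not the ALASCA call itself.

```r
library(data.table)

# Toy long-format data: 3 participants, 2 time points, 2 variables
# (values are arbitrary and only for illustration)
toy_long <- CJ(
  id       = paste0("id_", 1:3),
  time     = paste0("t_", 0:1),
  variable = paste0("variable_", 1:2)
)
toy_long[, value := seq_len(.N)]

# Long -> wide: one row per id/time combination, one column per variable.
# Naming value.var explicitly avoids data.table having to guess it.
toy_wide <- dcast(toy_long, id + time ~ variable, value.var = "value")

# Wide -> long again, to confirm that both formats hold the same information
toy_back <- melt(toy_wide, id.vars = c("id", "time"),
                 variable.name = "variable", value.name = "value")

dim(toy_wide)
dim(toy_back)
```

A table shaped like toy_wide is the kind of input for which wide = TRUE is intended.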

Initialize an ALASCA model

In this example, we are only looking at the common time development. For examples involving group differences, see the vignette on regression models.

To assess the time development in this data set, we will use the regression formula value ~ time + (1|id). Here, value is the measured variable value, time the predictor, and (1|id) a random intercept per participant-id. ALASCA will implicitly run the regression for each variable separately.
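
To illustrate what "for each variable separately" means, the sketch below fits the same formula once per variable with lme4 on a small toy data set. It is only an illustration under stated assumptions: the toy data, the object names (toy, fits, coefs), and the direct use of lmer() are choices made here, and the sketch skips the scaling and component extraction that ALASCA adds on top.

```r
library(data.table)
library(lme4)

set.seed(1)
n_id   <- 20
n_time <- 5

# Toy data: variable_1 decreases over time, variable_2 increases
toy <- rbindlist(lapply(c("variable_1", "variable_2"), function(v) {
  rbindlist(lapply(seq_len(n_id), function(i) {
    r_intercept <- rnorm(1, sd = 5)          # random intercept per participant
    time  <- seq_len(n_time) - 1
    slope <- if (v == "variable_1") -2 else 2
    data.table(
      id       = paste0("id_", i),
      time     = paste0("t_", time),         # categorical time, as in the vignette
      variable = v,
      value    = r_intercept + slope * time + rnorm(n_time)
    )
  }))
}))

# One linear mixed model per variable, all with the same formula
fits <- lapply(split(toy, by = "variable"), function(d) {
  lmer(value ~ time + (1 | id), data = d)
})

# Fixed effects: one column per variable, one row per coefficient
coefs <- sapply(fits, fixef)
coefs
```

Each column of coefs holds the intercept (the level at t_0) and the change from t_0 to each later time point for one variable. ALASCA performs this per-variable step internally, so the whole model is fitted with a single call: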

res <- ALASCA(
  df,
  value ~ time + (1|id)
)
#> INFO  [2024-01-18 00:08:50] Initializing ALASCA (v1.0.14, 2024-01-17)
#> WARN  [2024-01-18 00:08:50] Guessing effects: `time`
#> INFO  [2024-01-18 00:08:50] Will use linear mixed models!
#> INFO  [2024-01-18 00:08:50] Will use Rfast!
#> WARN  [2024-01-18 00:08:50] The `time` column is used for stratification
#> WARN  [2024-01-18 00:08:50] Converting `character` columns to factors
#> INFO  [2024-01-18 00:08:50] Scaling data with sdall ...
#> INFO  [2024-01-18 00:08:50] Calculating LMM coefficients
#> INFO  [2024-01-18 00:08:51] ==== ALASCA has finished ====
#> INFO  [2024-01-18 00:08:51] To visualize the model, try `plot(<object>, effect = 1, component = 1, type = 'effect')`

The ALASCA function will provide output with important information:

  • Guessing effects: 'time'. When effects are not explicitly provided to ALASCA, the package will try to guess the effects you are interested in. See the vignette on regression models for details.
  • Will use linear mixed models! ALASCA uses linear mixed models when you provide a random effect in the regression formula (i.e., (1|id)).
  • Will use Rfast! Linear mixed model regression can be performed by one of two R packages: the lme4 package or the Rfast package.
  • The 'time' column is used for stratification. This is only important for model validation. For details, see the vignette on model validation.
  • Converting 'character' columns to factors. We provided time as a character variable, and ALASCA converts it to a factor variable. If the order of the levels matters and they are not in alphabetical order, you may want to convert the variable to a factor yourself.
  • Scaling data with sdall ... ALASCA supports various scalings, and sdall is the default. For details, see our paper ALASCA: An R package for longitudinal and cross-sectional analysis of multivariate data by ASCA-based methods.
  • Calculating LMM coefficients. This simply informs you that the regression is ongoing, as this may take some time.
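
The point about level order can be checked with a short base R sketch. The time labels below are hypothetical and chosen so that alphabetical and chronological order differ; the data set in this vignette only has t_0 to t_4, where the two orders coincide.

```r
# Time labels as they might look in a study with ten or more time points
time_chr <- paste0("t_", c(0, 1, 2, 10))

# Default conversion sorts the levels alphabetically, so t_10 precedes t_2
levels(factor(time_chr))

# Setting the levels explicitly keeps the chronological order
time_fct <- factor(time_chr, levels = paste0("t_", c(0, 1, 2, 10)))
levels(time_fct)
```

In a data.table, the same conversion can be applied in place before calling ALASCA, for example with df[, time := factor(time, levels = paste0("t_", 0:4))].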

To see the resulting model:

plot(res, component = c(1,2), type = 'effect')
#> INFO  [2024-01-18 00:08:51] Effect plot. Selected effect (nr 1): `time`. Component: 1 and 2.
#> WARN  [2024-01-18 00:08:51] Showing 20 of 20 variables. Adjust the number with `n_limit`
#> WARN  [2024-01-18 00:08:52] Showing 20 of 20 variables. Adjust the number with `n_limit`

See the vignette on plotting the model for more visualizations.