This demo will review some of the main functions of the Differential-Time Varying Effect Model (DTVEM).
DTVEM is a hybrid exploratory-confirmatory tool that helps detect and model optimal lags in intensive longitudinal data.
This modeling tool can be utilized with idiographic research (N = 1 designs) or multisubject designs (i.e. replicated time series).
Example 1: Idiographic analysis with 1 variable predicting itself (N = 1, t = 168)
Let’s start by modeling data from a single person in an EMA dataset with one variable.
For this hypothetical example, say EMA data are collected every hour for a period of 1 week (minus 8 hours per night while the person is asleep).
Before we can begin, we need to load the example data into R.
load("Data/N=1,t=168.RData")
Let’s get a general sense of the data format:
head(n1t168)
We can see that there are three columns in the dataset: (1) Time, a time column with intervals of time; (2) X, the variable we’re interested in modeling; and (3) ID, the ID of the single person in the dataset.
This is what is known as a “stacked” (long) data format, rather than a “wide” data format: each subject’s data spans multiple rows (one row per time point).
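As a toy illustration of the difference (hypothetical data, not from the DTVEM examples), base R’s reshape() can move between the two formats:

```r
# Hypothetical stacked data: 2 subjects, 3 time points each
stacked <- data.frame(
  Time = rep(1:3, 2),
  X    = c(0.5, -0.2, 1.1, 0.3, 0.0, -0.7),
  ID   = rep(1:2, each = 3)
)

# Wide format: one row per subject, one X column per time point
wide <- reshape(stacked, direction = "wide",
                idvar = "ID", timevar = "Time", v.names = "X")
wide  # columns ID, X.1, X.2, X.3
```

Note that DTVEM expects the stacked format, so no manual reshaping is needed before calling LAG().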
Here’s another breakdown of the data:
summary(n1t168)
Time X ID
Min. : 1.00 Min. :-5.05521 Min. :1
1st Qu.: 42.75 1st Qu.:-1.23711 1st Qu.:1
Median : 84.50 Median :-0.20369 Median :1
Mean : 84.50 Mean : 0.03012 Mean :1
3rd Qu.:126.25 3rd Qu.: 1.40372 3rd Qu.:1
Max. :168.00 Max. : 5.58616 Max. :1
NA's :56
Next, let’s visualize the data:
library(ggplot2)
ggplot(data=n1t168, aes(x=Time, y=X)) + geom_line()+geom_point()+xlab("Time (in Hours)")
It looks like there are 168 total data points, with missing data occurring in 8-hour blocks (you can see this in the gaps in the time series).
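The structure of a dataset like this can be mimicked with simulated data. Below is a hedged sketch (a simple AR(1) process is used as a stand-in; the true process behind n1t168 is unknown):

```r
set.seed(1)
t_total <- 168  # 7 days x 24 hours
x <- as.numeric(arima.sim(list(ar = 0.5), n = t_total))  # AR(1) stand-in

# Blank out 8 nighttime hours per day (here, hours 1-8 of each day)
hour_of_day <- ((1:t_total) - 1) %% 24 + 1
x[hour_of_day <= 8] <- NA

sim <- data.frame(Time = 1:t_total, X = x, ID = 1)
sum(is.na(sim$X))  # 56 missing values, matching the NA count in the summary above
```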
Great, we’re now ready to get started!
We load DTVEM, and then use the LAG function as the primary function call.
Here we’ll specify a few important things:
(1) We first specify the variable “X”.
(2) We specify that “X” is one of our differential time-varying predictors.
(3) We specify that “X” is also the outcome of interest.
(4) We specify the number of k selection points to use in the first-stage predictions. When this is very low, the exploratory stage will not be very wiggly.
(5) We specify where we want to stop looking for lags. Here we’re only interested in lags within the first 24 hours (this is heavily problem-dependent).
(6) We specify the ID variable name.
(7) We specify the Time variable name.
(8) We specify the name of the data frame.
(9) We specify how the variables should be within-person centered and/or within-person standardized. This choice can be important, and it affects the model estimates. In general, my preference is to within-person standardize: all variables are then mean-centered and their variances are constrained to 1, yielding standardized model estimates.
library(DTVEM)
out=LAG("X",differntialtimevaryingpredictors = "X",outcome = "X",k=10,predictionsend = 24,ID="ID",Time="Time",data=n1t168,standardized = TRUE)
OpenMx version: 2.13.2 [GIT v2.13.2]
R version: R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32
Default optimizer: CSOLNP
NPSOL-enabled?: No
OpenMP-enabled?: No
Setting up the data for variable #1 of 1.
Completed stage 1 of DTVEM, now identifying the significant peaks/valleys.
DTVEM stage 1 identified the following variables as potential peaks or valleys: XlagonXlag1, XlagonXlag2, XlagonXlag5, XlagonXlag6, XlagonXlag7, XlagonXlag8, XlagonXlag17, XlagonXlag18, XlagonXlag19, XlagonXlag20, XlagonXlag21
Removing group-based time of residuals
Beginning initial fit attempt
[0] MxComputeGradientDescent(CSOLNP) evaluations 205 fit 340.439 change -5.853
[0] MxComputeGradientDescent(CSOLNP) evaluations 595 fit 250.295 change -7.105e-013
[0] MxComputeNumericDeriv 57/78
Fit attempt 0, fit=250.295390103711, new current best! (was 460.768785798069)
Summary of multisujbject
free parameters:
Model Statistics:
| Parameters | Degrees of Freedom | Fit (-2lnL units)
Model: 12 100 250.2954
Saturated: 2 110 NA
Independence: 2 110 NA
Number of observations/statistics: 168/112
Information Criteria:
| df Penalty | Parameters Penalty | Sample-Size Adjusted
AIC: 50.29539 274.2954 276.3083
BIC: -262.10101 311.7830 273.7883
CFI: NA
TLI: 1 (also known as NNFI)
RMSEA: 0 [95% CI (NA, NA)]
Prob(RMSEA <= 0.05): NA
To get additional fit indices, see help(mxRefModels)
timestamp: 2019-06-19 12:34:54
Wall clock time: 2.896687 secs
OpenMx version number: 2.13.2
Need help? See help(mxSummary)
[1] "Variables that should be controlled for are: XlagonXlag1"
[2] "Variables that should be controlled for are: XlagonXlag7"
[3] "Variables that should be controlled for are: XlagonXlag8"
Completed stage 2 of DTVEM (wide format). Now passing to Stage 1.5. Controlling for the following lags in the varying-coefficient model: XlagonXlag1, XlagonXlag7, XlagonXlag8
Removing group-based time of residuals
Beginning initial fit attempt
Fit attempt 0, fit=255.26296721935, new current best! (was 414.196666059423)
Solution found! Final fit=255.26297 (started at 414.19667) (1 attempt(s): 1 valid, 0 errors)
Summary of multisujbject
free parameters:
Model Statistics:
| Parameters | Degrees of Freedom | Fit (-2lnL units)
Model: 4 108 255.263
Saturated: 2 110 NA
Independence: 2 110 NA
Number of observations/statistics: 168/112
Information Criteria:
| df Penalty | Parameters Penalty | Sample-Size Adjusted
AIC: 39.26297 263.2630 263.5084
BIC: -298.12514 275.7588 263.0939
CFI: NA
TLI: 1 (also known as NNFI)
RMSEA: 0 [95% CI (NA, NA)]
Prob(RMSEA <= 0.05): NA
To get additional fit indices, see help(mxRefModels)
timestamp: 2019-06-19 12:34:55
Wall clock time: 0.5958519 secs
OpenMx version number: 2.13.2
Need help? See help(mxSummary)
[1] "Variables that should be controlled for are: XlagonXlag1"
[2] "Variables that should be controlled for are: XlagonXlag7"
[3] "Variables that should be controlled for are: XlagonXlag8"
This is the #2 out of a possible 10 passes
Completed loop between 1.5 and 2
DTVEM analysis completed,congradulations!
Contact the author - Nick Jacobson: njacobson88@gmail.com if you have any questions.
Summary of multisujbject
free parameters:
Model Statistics:
| Parameters | Degrees of Freedom | Fit (-2lnL units)
Model: 4 108 255.263
Saturated: 2 110 NA
Independence: 2 110 NA
Number of observations/statistics: 168/112
Information Criteria:
| df Penalty | Parameters Penalty | Sample-Size Adjusted
AIC: 39.26297 263.2630 263.5084
BIC: -298.12514 275.7588 263.0939
CFI: NA
TLI: 1 (also known as NNFI)
RMSEA: 0 [95% CI (NA, NA)]
Prob(RMSEA <= 0.05): NA
To get additional fit indices, see help(mxRefModels)
timestamp: 2019-06-19 12:34:55
Wall clock time: 0.5958519 secs
OpenMx version number: 2.13.2
Need help? See help(mxSummary)
Okay, there’s a lot to look at here. When we ran DTVEM, we used OpenMx for the confirmatory stage. Consequently, the first output (in red) is OpenMx reporting that it has loaded.
In the next step, DTVEM sets up the data structure (to run the model, the data first need to be reorganized). Once this succeeds, DTVEM enters the first stage of modeling, which also separately accounts for time trends within the data.
The first plot represents the exploratory Stage 1 estimates of DTVEM, which model the lagged effect of variable X on itself at later hours. DTVEM then runs a peak-picking algorithm that selects the significant peaks and valleys in this lag function, along with the points adjacent to them.
Here we can see that it grabs lags 1, 2, 5, 6, 7, 8, 17, 18, 19, 20, and 21.
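The peak-picking idea can be sketched in a few lines (a simplified illustration only, not DTVEM’s actual algorithm): scan the lag-coefficient curve for points that rise above, or dip below, both of their neighbors.

```r
# Hypothetical smoothed lag-coefficient curve (illustration only)
coefs <- c(0.55, 0.30, 0.05, -0.05, 0.10, 0.35, 0.55, -0.40, -0.10, 0.02)

# A point is a local peak/valley if it is above (or below) both neighbors
turning_points <- function(b) {
  n <- length(b)
  peaks   <- c(FALSE, b[2:(n - 1)] > b[1:(n - 2)] & b[2:(n - 1)] > b[3:n], FALSE)
  valleys <- c(FALSE, b[2:(n - 1)] < b[1:(n - 2)] & b[2:(n - 1)] < b[3:n], FALSE)
  which(peaks | valleys)
}
turning_points(coefs)  # candidate lags to test in the confirmatory stage
```

DTVEM additionally tests these candidates for significance and carries the adjacent points along with them.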
Stage 2 then models the points identified in Stage 1. Before Stage 2 begins, DTVEM automatically detrends the data, as the confirmatory modeling framework does not do this on its own. In Stage 2, lags 1, 7, and 8 are identified as significantly different from 0; all other previously identified lags are dropped.
Next, DTVEM performs another exploratory search to see whether it missed any lags in the first exploratory stage, but this time controlling for all of the lags identified in the confirmatory stage (i.e. lags 1, 7, and 8).
The second round of exploration does not identify any other significant lags; consequently, DTVEM reports the final model fit with lags 1, 7, and 8, all of which are significant.
Model interpretation:
Okay, so the lags 1, 7, and 8 are significant. How do we interpret this?
The estimates of interest are:
name          Estimate  Std.Error lbound ubound lboundMet uboundMet     tstat       pvalue  sig
XlagonXlag1  0.5623759 0.07689808     NA     NA     FALSE     FALSE  7.313263 2.606804e-13 TRUE
XlagonXlag7  0.5571036 0.07545662     NA     NA     FALSE     FALSE  7.383097 1.545430e-13 TRUE
XlagonXlag8 -0.4244806 0.08702034     NA     NA     FALSE     FALSE -4.877947 1.071956e-06 TRUE
Note that lag 1 and 7 are positive (see the sign of the estimate), and lag 8 is negative.
We could report this as: X significantly positively predicted itself 1 hour later (Beta = 0.562, SE = 0.077, Z = 7.313, p < .001) and 7 hours later (Beta = 0.557, SE = 0.075, Z = 7.383, p < .001). Additionally, X significantly negatively predicted itself 8 hours later (Beta = -0.424, SE = 0.087, Z = -4.878, p < .001).
However, the dynamics of these estimates give us more information. In particular, because the lag-7 and lag-8 effects have opposing signs and are adjacent, this suggests an oscillatory process (see the mathematical equivalence of a SARIMA model and an autoregressive model).
Thus, we can also conclude that X also follows an oscillatory pattern over a span of approximately 7 hours.
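One informal way to see the oscillation is to trace the impulse response implied by the estimated lag structure (a sketch built from the point estimates above; this is not part of the DTVEM package):

```r
# AR coefficients at lags 1-8 taken from the DTVEM estimates (zeros elsewhere)
phi <- c(0.56, rep(0, 5), 0.56, -0.42)

# Impulse response: perturb X once at t = 1 and let the lags propagate
h <- numeric(30)
h[1] <- 1
for (t in 2:30) {
  for (j in 1:min(t - 1, 8)) h[t] <- h[t] + phi[j] * h[t - j]
}
round(h[1:16], 2)  # local peaks recur near t = 1, 8, 15: roughly every 7 hours
```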
Example 2: Multiple persons with bivariate predictors and a single outcome (N = 100, t = 14, 2 variables)
Okay, let’s take a different scenario: we have data from a daily diary study collected from multiple subjects across a two-week period. We collected two variables, X and Y. For theoretical reasons, we are only interested in the effect of X on Y while controlling for the effect of Y on itself.
Like in the first example, we have missing data, but instead of being caused by nighttime (as in the first dataset), it’s due to random non-compliance.
As before, we first load the data
load("Data/N=100,t=14.RData")
Let’s get a general sense of the data format:
head(n100t14)
We can see that there are four columns in the dataset: (1) Time, a time column with intervals of time; (2) X, a variable we’re interested in modeling; (3) Y, a variable we’re interested in modeling; and (4) ID, the ID of each person in the dataset.
This is what is known as a “stacked” (long) data format, rather than a “wide” data format: each subject’s data spans multiple rows (one row per time point).
Here’s another breakdown of the data:
summary(n100t14)
Time X Y ID
Min. : 1.0 Min. :-4.8798 Min. :-5.8317 Min. : 1.00
1st Qu.: 4.0 1st Qu.:-1.0026 1st Qu.:-1.1012 1st Qu.: 25.75
Median : 7.5 Median :-0.0925 Median :-0.0129 Median : 50.50
Mean : 7.5 Mean :-0.0311 Mean : 0.0009 Mean : 50.50
3rd Qu.:11.0 3rd Qu.: 1.0507 3rd Qu.: 1.1234 3rd Qu.: 75.25
Max. :14.0 Max. : 5.4594 Max. : 5.7154 Max. :100.00
NA's :400 NA's :400
Okay, it looks like there are 400 missing values each for X and Y (i.e. roughly 30% missingness; 400/1400 ≈ 28.6%).
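As a quick sanity check on that percentage (given 100 subjects × 14 days = 1,400 rows per variable):

```r
# 400 NAs out of 100 subjects x 14 time points
400 / (100 * 14)  # about 0.286, i.e. roughly 30% missing per variable
# on the loaded data this is equivalent to mean(is.na(n100t14$X))
```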
library(ggplot2)
temp=data.frame(Time=n100t14$Time,Value=c(n100t14$X,n100t14$Y),ID=n100t14$ID,lab=c(rep("X",nrow(n100t14)),rep("Y",nrow(n100t14))))
ggplot(data=temp, aes(x=Time, y=Value,group=interaction(ID, lab),col=lab)) + geom_line()+geom_point()+xlab("Time (in Days)")
There are no clearly identifiable patterns visible in the data.
Great, we’re now ready to get started!
We load DTVEM, and then use the LAG function as the primary function call.
Here we’ll specify a few important things:
(1) We first specify the variables “X” and “Y”.
(2) We specify both “X” and “Y” as our differential time-varying predictors.
(3) We specify that “Y” is the outcome of interest.
(4) We specify the number of k selection points to use in the first-stage predictions. When this is very low, the exploratory stage will not be very wiggly.
(5) We specify where we want to stop looking for lags. Here we search the entire potential time series (i.e. 14 days), since for this problem we have no idea when lags might be occurring.
(6) We specify the ID variable name.
(7) We specify the Time variable name.
(8) We specify the name of the data frame.
(9) We specify how the variables should be within-person centered and/or within-person standardized. This choice can be important, and it affects the model estimates. In general, my preference is to within-person standardize: all variables are then mean-centered and their variances are constrained to 1, yielding standardized model estimates.
library(DTVEM)
out=LAG("X","Y",differntialtimevaryingpredictors = c("X","Y"),outcome = c("Y"),k=10,predictionsend = 14,ID="ID",Time="Time",data=n100t14,standardized = TRUE)
OpenMx version: 2.13.2 [GIT v2.13.2]
R version: R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32
Default optimizer: CSOLNP
NPSOL-enabled?: No
OpenMP-enabled?: No
Setting up the data for variable #1 of 2.
Setting up the data for variable #2 of 2.
Completed stage 1 of DTVEM, now identifying the significant peaks/valleys.
It looks like all peaks might be very flat, such that the highest peak is not significantly different fromt the smallest valley for variable 'YlagonYlag'.
DTVEM stage 1 identified the following variables as potential peaks or valleys: XlagonYlag8, XlagonYlag12, XlagonYlag13
Removing group-based time of residuals
Beginning initial fit attempt
[0] MxComputeGradientDescent(CSOLNP) evaluations 36 fit 6340.35 change -700.6
[0] MxComputeGradientDescent(CSOLNP) evaluations 91 fit 5929.87 change -25.03
[0] MxComputeGradientDescent(CSOLNP) evaluations 129 fit 5457.35 change -264.1
[0] MxComputeGradientDescent(CSOLNP) evaluations 166 fit 5412.56 change -5.721
[0] MxComputeGradientDescent(CSOLNP) evaluations 203 fit 5410.39 change -0.001169
[0] MxComputeNumericDeriv 1/15
[0] MxComputeNumericDeriv 6/15
[0] MxComputeNumericDeriv 11/15
[0] MxComputeNumericDeriv 15/15
Fit attempt 0, fit=5410.3918562197, new current best! (was 7303.2659341743)
Solution found! Final fit=5410.3919 (started at 7303.2659) (1 attempt(s): 1 valid, 0 errors)
Summary of multisujbject
free parameters:
Model Statistics:
| Parameters | Degrees of Freedom | Fit (-2lnL units)
Model: 5 1995 5410.392
Saturated: NA NA NA
Independence: NA NA NA
Number of observations/statistics: 1400/2000
Information Criteria:
| df Penalty | Parameters Penalty | Sample-Size Adjusted
AIC: 1420.392 5420.392 5420.435
BIC: -9041.842 5446.613 5430.730
CFI: NA
TLI: 1 (also known as NNFI)
RMSEA: 0 [95% CI (NA, NA)]
Prob(RMSEA <= 0.05): NA
To get additional fit indices, see help(mxRefModels)
timestamp: 2019-06-19 14:54:16
Wall clock time: 10.09754 secs
OpenMx version number: 2.13.2
Need help? See help(mxSummary)
[1] "Variables that should be controlled for are: XlagonYlag12"
[2] "Variables that should be controlled for are: XlagonYlag8"
Completed stage 2 of DTVEM (wide format). Now passing to Stage 1.5. Controlling for the following lags in the varying-coefficient model: XlagonYlag12, XlagonYlag8
Intermediate stage failed, running without it.
Removing group-based time of residuals
Beginning initial fit attempt
[0] MxComputeGradientDescent(CSOLNP) evaluations 34 fit 6123.6 change -864.1
[0] MxComputeGradientDescent(CSOLNP) evaluations 82 fit 5817.54 change -107.6
[0] MxComputeGradientDescent(CSOLNP) evaluations 118 fit 5415.15 change -7.068
[0] MxComputeGradientDescent(CSOLNP) evaluations 158 fit 5412.13 change -0.0001201
[0] MxComputeNumericDeriv 1/10
[0] MxComputeNumericDeriv 6/10
[0] MxComputeNumericDeriv 10/10
Fit attempt 0, fit=5412.13165247287, new current best! (was 7288.01210304408)
Solution found! Final fit=5412.1317 (started at 7288.0121) (1 attempt(s): 1 valid, 0 errors)
Summary of multisujbject
free parameters:
Model Statistics:
| Parameters | Degrees of Freedom | Fit (-2lnL units)
Model: 4 1996 5412.132
Saturated: NA NA NA
Independence: NA NA NA
Number of observations/statistics: 1400/2000
Information Criteria:
| df Penalty | Parameters Penalty | Sample-Size Adjusted
AIC: 1420.132 5420.132 5420.160
BIC: -9047.346 5441.109 5428.402
CFI: NA
TLI: 1 (also known as NNFI)
RMSEA: 0 [95% CI (NA, NA)]
Prob(RMSEA <= 0.05): NA
To get additional fit indices, see help(mxRefModels)
timestamp: 2019-06-19 14:54:28
Wall clock time: 8.155889 secs
OpenMx version number: 2.13.2
Need help? See help(mxSummary)
[1] "Variables that should be controlled for are: XlagonYlag12"
[2] "Variables that should be controlled for are: XlagonYlag8"
This is the #2 out of a possible 10 passes
Completed loop between 1.5 and 2
DTVEM analysis completed,congradulations!
Contact the author - Nick Jacobson: njacobson88@gmail.com if you have any questions.
Summary of multisujbject
free parameters:
Model Statistics:
| Parameters | Degrees of Freedom | Fit (-2lnL units)
Model: 4 1996 5412.132
Saturated: NA NA NA
Independence: NA NA NA
Number of observations/statistics: 1400/2000
Information Criteria:
| df Penalty | Parameters Penalty | Sample-Size Adjusted
AIC: 1420.132 5420.132 5420.160
BIC: -9047.346 5441.109 5428.402
CFI: NA
TLI: 1 (also known as NNFI)
RMSEA: 0 [95% CI (NA, NA)]
Prob(RMSEA <= 0.05): NA
To get additional fit indices, see help(mxRefModels)
timestamp: 2019-06-19 14:54:28
Wall clock time: 8.155889 secs
OpenMx version number: 2.13.2
Need help? See help(mxSummary)
Great, the model ran!
Unlike some scenarios, the results are fairly straightforward.
Here are the primary results:
name matrix row col Estimate Std.Error lbound ubound lboundMet uboundMet tstat pvalue sig
2 XlagonYlag12 s1.A 15 12 -0.3022132 0.06244596 NA NA FALSE FALSE -4.839595 1.301038e-06 TRUE
1 XlagonYlag8 s1.A 15 8 0.3373854 0.04814107 NA NA FALSE FALSE 7.008266 2.412959e-12 TRUE
Here variable X significantly positively predicts variable Y 8 days later (Beta = 0.337, SE = 0.048, Z = 7.008, p < .001) and significantly negatively predicts it 12 days later (Beta = -0.302, SE = 0.062, Z = -4.840, p < .001).
Variable Y does not significantly predict itself at any later time.
Suppose we wanted to report the test statistic for Y not predicting itself later; how might we do this?
Now let’s get started applying this to some real data!