Group-Based Trajectory Modeling
Publish Time: 2013-03-24 03:29
Author: Xuanqian Xie

Group-based trajectory modeling is widely used in psychology, behavioral science, and criminology, and in recent years it has also been applied to longitudinal data in clinical research. A trajectory model identifies clusters of individuals who follow similar developmental courses of an outcome over time. The model assumes that the population is composed of a finite number of latent groups and describes the data with a mixture probability distribution. Details of the statistical model and the likelihood functions can be found elsewhere (Jones & Nagin, 2007; Jones et al., 2001). Briefly, Jones et al. introduced a SAS procedure that fits three models: the censored normal (CNORM) model for normally distributed continuous data, the zero-inflated Poisson (ZIP) model for count data, and the LOGIT model for dichotomous data. Polynomials of up to fourth order link time or age to the outcome or behavior. The model can include both time-stable and time-dependent covariates, and it accommodates missing values in the longitudinal data. The model parameters are estimated by maximum likelihood using a quasi-Newton algorithm.
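To make the mixture idea concrete, here is a minimal sketch in Python (not SAS, and not the PROC TRAJ implementation). It assumes an uncensored normal outcome with a common standard deviation, and all names and numbers (`posterior_membership`, `group_coefs`, the example data) are hypothetical. Each group has its own polynomial in time, an individual's likelihood is the mixing-probability-weighted sum of the group densities, and missing outcomes are simply skipped.

```python
import math

def poly(coefs, t):
    """Evaluate b0 + b1*t + b2*t^2 + ... at time t."""
    return sum(b * t ** p for p, b in enumerate(coefs))

def normal_pdf(y, mu, sigma):
    """Density of a normal distribution with mean mu and sd sigma at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def posterior_membership(ys, ts, group_coefs, mix_probs, sigma):
    """Return (log-likelihood, posterior group probabilities) for one individual.

    Outcomes recorded as None are treated as missing and skipped.
    """
    joint = []
    for coefs, pi in zip(group_coefs, mix_probs):
        dens = pi  # start from the mixing probability of this group
        for y, t in zip(ys, ts):
            if y is None:
                continue
            dens *= normal_pdf(y, poly(coefs, t), sigma)
        joint.append(dens)
    total = sum(joint)  # mixture likelihood for this individual
    return math.log(total), [j / total for j in joint]

# Hypothetical two-group example: a flat group (order 0) and a rising group (order 1).
group_coefs = [[1.0], [1.0, 2.0]]
mix_probs = [0.6, 0.4]
ts = [0, 1, 2, 3, 4]
ys = [1.2, 2.9, None, 7.1, 8.8]  # third measurement is missing

loglik, post = posterior_membership(ys, ts, group_coefs, mix_probs, sigma=1.0)
print(loglik, post)
```

In this made-up example the observations follow the rising line closely, so nearly all of the posterior probability goes to the second group. A real fit would also estimate the coefficients, mixing probabilities, and variance by maximizing the sum of these log-likelihoods over individuals.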

 

References:

Jones, B. L., & Nagin, D. S. (2007). Advances in group-based trajectory modeling and an SAS procedure for estimating them. Sociological Methods & Research, 35, 542–571.

Jones, B. L., Nagin, D. S., & Roeder, K. (2001). A SAS procedure based on mixture models for estimating developmental trajectories. Sociological Methods & Research, 29, 374–393.

 

SAS program: I used this model a couple of years ago. My SAS program was adapted mainly from Jones et al. (2001).

 

/* Summary statistics before fitting the trajectory models. */
title;
proc means data=final;
    var r1000 r10000;
run;

 

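/* Base model: four groups, censored normal. */
title;
PROC TRAJ DATA=traj_f OUTPLOT=OP_f1 OUTSTAT=OS_f1 OUT=OF_f1;
    ID ward;              /* unit of analysis */
    VAR r2005-r2009;      /* outcome in each year */
    INDEP T2005-T2009;    /* time variable for each year */
    MODEL cnorm;
    MAX 12;               /* upper censoring bound */
    NGROUPS 4;
    ORDER 0 1 1 1;        /* polynomial order per group: one flat, three linear */
RUN;

%TRAJPLOT(PlotFile=op_f1, StatFile=os_f1, Title2="XXXX",
          Ylab="XXXXX", Xlab="Year");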
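As a rough illustration of what a censored normal likelihood does with a bound such as MAX 12, here is a Python sketch of the per-observation log-likelihood. It is not the PROC TRAJ code: the function names are hypothetical, and the lower bound of 0 used in the example is an assumption made for illustration. Observations strictly inside the bounds contribute a normal density, while observations at a bound contribute the probability mass at or beyond it.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def censored_normal_loglik(y, mu, sigma, lo, hi):
    """Log-likelihood of one observation under a normal censored to [lo, hi]."""
    if y <= lo:
        # At or below the lower bound: probability of falling at or below lo.
        return math.log(normal_cdf((lo - mu) / sigma))
    if y >= hi:
        # At or above the upper bound: probability of falling at or above hi.
        return math.log(1.0 - normal_cdf((hi - mu) / sigma))
    # Interior observation: ordinary normal log-density.
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

# Hypothetical values: bounds 0 and 12, group mean 6, standard deviation 2.
interior = censored_normal_loglik(5.0, mu=6.0, sigma=2.0, lo=0.0, hi=12.0)
at_max = censored_normal_loglik(12.0, mu=6.0, sigma=2.0, lo=0.0, hi=12.0)
print(interior, at_max)
```

If the observed outcomes never reach the bounds, this reduces to an ordinary normal likelihood.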

 

/* Add risk factors (time-stable covariates). */
title;
PROC TRAJ DATA=traj_f1 OUTPLOT=OP_f2 OUTSTAT=OS_f2 OUT=OF_f2;
    ID ward;
    VAR r2005-r2009;
    INDEP T2005-T2009;
    MODEL cnorm;
    MAX 12;
    NGROUPS 4;
    ORDER 0 1 1 1;
    RISK Risk_A Risk_B;   /* predictors of group membership */
RUN;

%TRAJPLOT(PlotFile=op_f2, StatFile=os_f2, Title2="XXXXX",
          Ylab="XXXXXX", Xlab="Year");

 

/* Add a time-dependent variable (intervention or not) to the model. */
title;
PROC TRAJ DATA=traj_f1 OUTPLOT=OP_f3 OUTSTAT=OS_f3 OUT=OF_f3;
    ID ward;
    VAR r2005-r2009;
    INDEP T2005-T2009;
    MODEL cnorm;
    TCOV td2005-td2009;   /* time-dependent covariate, one value per year */
    PLOTTCOV 0 0 0 0 0;   /* plot trajectories with the covariate set to 0 */
    MIN 0;
    MAX 11;
    NGROUPS 4;
    ORDER 0 1 1 1;
RUN;

%TRAJPLOT(PlotFile=op_f3, StatFile=os_f3, Title2=" XXXXX",
          Ylab=" XXXXX ", Xlab="Year");

 

/* Time-dependent variable plus one risk factor; plotted with tcov=0. */
title;
PROC TRAJ DATA=traj_f1 OUTPLOT=OP_f4 OUTSTAT=OS_f4 OUT=OF_f4;
    ID ward;
    VAR r2005-r2009;
    INDEP T2005-T2009;
    MODEL cnorm;
    TCOV td2005-td2009;
    PLOTTCOV 0 0 0 0 0;
    MIN 0;
    MAX 11;
    NGROUPS 4;
    ORDER 0 1 1 1;
    RISK S_MGH;
RUN;

%TRAJPLOT(PlotFile=op_f4, StatFile=os_f4, Title1=" XXXXX ",
          Title2="Adjusted by time varying prevention program, tcov=0",
          Ylab=" XXXXX ", Xlab="Year");

 

 

/* Time-dependent variable plus another risk factor; plotted with tcov=1. */
title;
PROC TRAJ DATA=traj_f1 OUTPLOT=OP_f5 OUTSTAT=OS_f5 OUT=OF_f5;
    ID ward;
    VAR r2005-r2009;
    INDEP T2005-T2009;
    MODEL cnorm;
    TCOV td2005-td2009;
    PLOTTCOV 1 1 1 1 1;
    MIN 0;
    MAX 11;
    NGROUPS 4;
    ORDER 0 1 1 1;
    RISK Risk_A;
RUN;

%TRAJPLOT(PlotFile=op_f5, StatFile=os_f5, Title1="Falls at MUHC, 2005-2010",
          Title2="Adjusted by time varying prevention program and another risk factor, tcov=1",
          Ylab="XXXXXXXXXXXXXX", Xlab="Year");

 

 

 

/* ZIP model: sample weights and exposure time did not fit well. */

PROC TRAJ DATA=traj_f1 OUTPLOT=OP OUTSTAT=OS OUT=OF;
    ID ward;
    VAR f2005-f2009;      /* counts in each year */
    INDEP T2005-T2009;
    EXPOS d2005-d2009;    /* exposure time in each year */
    MODEL zip;
    ORDER 1 1 1;
RUN;
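For readers unfamiliar with the zero-inflated Poisson distribution, the following Python sketch shows its probability mass function and one common way exposure time can enter the model. It is only an illustration: the function names are hypothetical, and treating exposure as a multiplicative scaling of the expected count under a log link is an assumption here, not a statement about how PROC TRAJ handles EXPOS internally.

```python
import math

def zip_pmf(y, lam, rho):
    """P(Y = y) for a zero-inflated Poisson.

    With probability rho the count is a structural zero; otherwise it is
    Poisson with mean lam.
    """
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    return rho * (y == 0) + (1.0 - rho) * poisson

def zip_rate(coefs, t, exposure):
    """Expected Poisson count: exposure * exp(polynomial in t)."""
    return exposure * math.exp(sum(b * t ** p for p, b in enumerate(coefs)))

# Hypothetical linear trajectory (order 1) evaluated at t = 2 with exposure 1.5.
lam = zip_rate([0.5, 0.1], t=2, exposure=1.5)
probs = [zip_pmf(y, lam, rho=0.3) for y in range(5)]
print(lam, probs)
```

Compared with a plain Poisson model, the extra parameter `rho` lets the model place more mass on zero counts than the Poisson mean alone would allow.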

 

/* BIC: In the discrete case, the BIC score can only be negative.

It is defined as (see section 11.2 of the HUGIN C API Reference Manual):

    BIC = l - (1/2) * k * log(n)

where l is the log-likelihood, k is the number of free parameters, and n is the number of cases.

When comparing two models with different BIC scores, you should select the one with the highest score (e.g., if the scores are -100 and -200, then the highest score is -100).

You can read about the BIC score here: http://en.wikipedia.org/wiki/Bayesian_information_criterion

In the continuous case, i.e., for CG networks, the log-likelihood is computed from the value of the density at the observed value. This may produce a positive contribution to the log-likelihood.

By Anders L Madsen.  http://forum.hugin.com/index.php?topic=188.0

*/
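The BIC rule quoted above can be sketched in a few lines of Python. The log-likelihoods, parameter counts, and sample size below are made up purely for illustration and do not come from the models fitted in this post.

```python
import math

def bic(loglik, n_params, n_obs):
    """BIC as defined above: l - (1/2) * k * log(n). Higher is better."""
    return loglik - 0.5 * n_params * math.log(n_obs)

# Hypothetical fits: number of groups -> (log-likelihood, free parameters).
candidates = {2: (-520.3, 5), 3: (-498.7, 8), 4: (-495.9, 11)}
n = 60  # hypothetical number of cases

scores = {g: bic(ll, k, n) for g, (ll, k) in candidates.items()}
best = max(scores, key=scores.get)  # pick the highest (least negative) BIC
print(scores, best)
```

With these invented numbers the four-group model has the best log-likelihood, but its extra parameters are penalized enough that the three-group model has the highest BIC.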