Effect coding (-1, 1, 0) vs. Dummy Coding (0, 1)
Publish Time: 2012-09-30 04:34
I used a simple example to illustrate the differences between effect coding and dummy coding.

Author: Xuanqian Xie

 

Data: Four groups of samples with a continuous outcome (1,000 observations per group; a few rows are shown below).

                                          

 

                                           grp    outcome

 

                                            1      398.0

                                            4      177.6

                                            3      156.8

                                            1       85.4

                                            1      337.2

                                            1       85.4

                                            3      300.8

                                            2      196.0

                                            4      109.4

                                            3      230.8

 

----------------------------------------------------------------------------------------------------

 

The descriptive statistics: Mean (sd) of each group.  

 

                                    Analysis Variable : outcome

 

             grp   N Obs       N            Mean         Std Dev         Minimum         Maximum
    ------------------------------------------------------------------------------------------
               1    1000    1000     252.8664000     125.6165406      82.0000000     720.0000000
               2    1000    1000     161.2152000      82.9326782      10.0000000     406.0000000
               3    1000    1000     185.8592000      68.9353757      84.4000000     442.0000000
               4    1000    1000     158.8698000      57.1008399      52.0000000     407.0000000
    ------------------------------------------------------------------------------------------
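
The table above has the layout of PROC MEANS output. A minimal sketch of a step that should produce a table like it, assuming the data set is named aa (the name used in the DATA steps of this post) and contains the variables grp and outcome:

```sas
/* Descriptive statistics of outcome within each group.          */
/* Assumes a data set named aa with variables grp and outcome.   */
proc means data=aa n mean std min max;
    class grp;      /* one row per group; also adds the "N Obs" column */
    var outcome;
run;
```

Dropping the CLASS statement should give the same statistics for all observations pooled together.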

 

----------------------------------------------------------------------------------------------------

 

The descriptive statistics: Mean (sd) of all 4000 samples.  

 

                                    Analysis Variable : outcome

 

                   N            Mean         Std Dev         Minimum         Maximum
                --------------------------------------------------------------------
                4000     189.7026500      95.4135748      10.0000000     720.0000000
                --------------------------------------------------------------------

 


 

 

 

Effect coding:

 

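/* Effect coding: groups 1-3 each get their own indicator;      */
/* group 4 (the omitted group) is coded -1 on every variable.   */
data aa1;
    set aa;
      /* e4 is a linear combination of the intercept and e1-e3, */
      /* so PROC REG sets its estimate to 0 (see output below). */
      if grp=1 then do; e1=1; e2=0; e3=0; e4=0; end;
      else if grp=2 then do; e1=0; e2=1; e3=0; e4=0; end;
      else if grp=3 then do; e1=0; e2=0; e3=1; e4=0; end;
      else if grp=4 then do; e1=-1; e2=-1; e3=-1; e4=-1; end;
run;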

 

Title "Effect coding";

proc reg data=aa1;

    model outcome = e1 e2 e3 e4;

run;

quit;

 

                                        Parameter Estimates

 

                                     Parameter       Standard

                Variable     DF       Estimate          Error    t Value    Pr > |t|

 

                Intercept     B      189.70265        1.38451     137.02      <.0001

                e1            B       63.16375        2.39804      26.34      <.0001

                e2            B      -28.48745        2.39804     -11.88      <.0001

                e3            B       -3.84345        2.39804      -1.60      0.1091

                e4            0              0              .        .         .

 

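Interpretation:

Intercept = grand mean (the unweighted mean of the four group means; because the groups are the same size here, it equals the overall mean of 189.70).

Mean of group 1 = coefficient of e1 + intercept; likewise for groups 2 and 3.

Mean of group 4 = intercept - (coefficient of e1 + coefficient of e2 + coefficient of e3)

The t-test for each coefficient: group mean vs. grand mean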
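
As a check on this interpretation, the four group means can be recovered from the estimates printed above. A minimal sketch: the names b0-b3 are just labels chosen here, and the numbers are copied from the Parameter Estimates table.

```sas
/* Recover the group means from the effect-coding estimates. */
data _null_;
    b0 = 189.70265;   /* Intercept */
    b1 =  63.16375;   /* e1 */
    b2 = -28.48745;   /* e2 */
    b3 =  -3.84345;   /* e3 */

    mean1 = b0 + b1;
    mean2 = b0 + b2;
    mean3 = b0 + b3;
    /* Group 4 is coded -1 on e1, e2 and e3, so its deviation from the
       intercept is minus the sum of the other three coefficients. */
    mean4 = b0 - (b1 + b2 + b3);

    put mean1= mean2= mean3= mean4=;   /* written to the SAS log */
run;
```

Up to rounding, the four values should agree with the group means in the descriptive statistics (252.8664, 161.2152, 185.8592 and 158.8698).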

 

Dummy coding: 

/*Dummy coding ---1*/

 

data aa2;

    set aa;

      if grp=1 then do; d1=1; d2=0; d3=0; d4=0; end;

      if grp=2 then do; d1=0; d2=1; d3=0; d4=0; end;

      if grp=3 then do; d1=0; d2=0; d3=1; d4=0; end;

      if grp=4 then do; d1=0; d2=0; d3=0; d4=1; end;

run;

 

Title  "Dummy coding ---1";

proc reg data=aa2;

    model outcome = d1 d2 d3 d4;

run;

quit;

 

/*Dummy coding ---2*/

data aa3;

    set aa;

      if grp=1 then do; d1=1; d2=0; d3=0; d4=0; end;

      if grp=2 then do; d1=0; d2=1; d3=0; d4=0; end;

      if grp=3 then do; d1=0; d2=0; d3=1; d4=0; end;

      if grp=4 then do; d1=0; d2=0; d3=0; d4=0; end;

run;

 

Title  "Dummy coding ---2";

proc reg data=aa3;

    model outcome = d1 d2 d3 d4;

run;

quit;

 

/*Dummy coding ---3: same data set as version 2, with d4 left out of the model*/

Title  "Dummy coding ---3";

proc reg data=aa3;

    model outcome = d1 d2 d3;

run;

quit;

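The three dummy coding approaches give the same estimates. With an intercept in the model, d4 is redundant (an indicator collinear with d1-d3 in version 1, a constant 0 in version 2), so PROC REG sets its coefficient to 0; version 3 simply leaves it out. I suggest using the second one.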

 

 

 

                                        Parameter Estimates

 

                                     Parameter       Standard

                Variable     DF       Estimate          Error    t Value    Pr > |t|

 

                Intercept     B      158.86980        2.76902      57.37      <.0001

                d1            B       93.99660        3.91599      24.00      <.0001

                d2            B        2.34540        3.91599       0.60      0.5493

                d3            B       26.98940        3.91599       6.89      <.0001

                d4            0              0              .        .         .

 

Interpretation:

Group 4 is the baseline.

Intercept = mean of group 4

Mean of group 1 = coefficient of d1 + intercept; likewise for groups 2 and 3.

The t-test for each coefficient: group mean vs. baseline (group 4) mean
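
The same baseline comparison can be obtained without building the dummy variables by hand. A sketch, assuming the original data set aa: the CLASS statement of PROC GLM creates the indicators internally, and with the SOLUTION option the printed estimates set the last level of grp (group 4 here) to zero, so they should match the table above.

```sas
title "Dummy coding via PROC GLM";
proc glm data=aa;
    class grp;                        /* levels are sorted; the last one (4) is the baseline */
    model outcome = grp / solution;   /* SOLUTION prints the parameter estimates */
run;
quit;
```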

 

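When a model includes interactions between categorical variables, effect coding is the better choice: each lower-order coefficient then estimates a main effect (averaged over the levels of the other variable) rather than the effect within the reference group, as it would under dummy coding.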

  http://methodology.psu.edu/node/266

http://www.ats.ucla.edu/stat/mult_pkg/faq/general/effect.htm
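
A sketch of how the advice on interactions might look in SAS. PROC REG has no CLASS statement, so this uses PROC GLMSELECT, whose CLASS statement accepts PARAM=EFFECT; SELECTION=NONE fits the full model without any variable selection. The data set bb and the second factor sex are hypothetical, invented here for illustration, since the example data in this post has only one factor.

```sas
/* Hypothetical input: data set bb with factors grp and sex and response outcome. */
proc glmselect data=bb;
    class grp sex / param=effect;    /* effect (-1, 0, 1) coding for both factors */
    model outcome = grp sex grp*sex / selection=none;   /* main effects plus interaction */
run;
```

Changing PARAM=EFFECT to PARAM=REFERENCE would fit the same model with dummy coding, which makes it easy to compare how the lower-order coefficients change between the two schemes.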

 

The complete SAS program can be found in the SAS code section of this website.