In surveys designed to measure mental health outcomes and risk factors in community or service-based samples, information about mental health status or service use is often obtained from multiple sources.  For example, in studies of childhood psychopathology, the child's parent is routinely used as a proxy data source; other informants (e.g., self-report, peers, teachers, clinicians, or trained observers) may also be employed, depending on the child's age and the nature of psychopathology under study.  Multiple sources can provide information on either risk factors or outcomes, and key methodological challenges in analyzing such data concern how they should best be represented in statistical models.  

Introduction to Multiple Informant Reports

Multiple informants as outcomes

Agreement of multiple informants

Partially observed multiple informant reports with different types of non-response

Multiple informants as risk factors or predictors

Introduction to Multiple Informant Reports

The use of multiple sources in community-based assessments of mental health, particularly studies of children's mental health, behavior, and service use, has been our primary motivation for developing appropriate methodology.  However, the methods are broadly applicable in other settings as well, including (but not limited to):
1) service utilization studies, where both user and provider(s) are asked to report the types of services obtained or provided, or where multiple types of service reports are analyzed;
2) family history studies, where many relatives are interviewed about the status of the proband and other family members;
3) behavioral studies of alcohol/drug use or of eating disorders, where information is obtained from the subject, as well as family members or other informants; and
4) studies of severe mental illness, such as schizophrenia or Alzheimer's disease, where the adult subject is often unable to provide self-report data.

Our own previous work on analytic methods for multiple-source risk factors and outcomes has focused on developing likelihood-based regression methods for simultaneously analyzing data from all sources.  Our methods treat the multiple sources as providing either conceptually different information or the same information measured with error. They also fully use all available information, even from subjects who have missing data from one or more sources.

Multiple informants as outcomes

Fitzmaurice et al. (1995, 1996) and Daskalakis (2002) developed novel regression methodology for analyzing binary multiple-source outcomes.  The methodology allows a single analysis to:
1) include all multiple-source outcomes in a single multivariate regression analysis;
2) test for source differences in outcome, and estimate different source effects where necessary;
3) test if the effects of other risk factors on the outcome differ by source, and estimate those differences where necessary or estimate a combined effect if appropriate;
4) include partial data from subjects with missing source observations; and
5) assess inter-source agreement and the effect of covariates on agreement. 
The regression coefficients have the same interpretation as those in ordinary logistic regression. The method is similar to a multivariate repeated measures ANOVA, except that it is designed for discrete or ordinal outcomes, can handle measured or continuous predictor variables, and can include subjects with missing outcomes.
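
As a sketch only, a model of this kind for two sources and a single risk factor might be written as follows. The notation is assumed here for illustration and is not taken from the cited papers: Y_ij is the binary report on subject i from source j, S_j is an indicator for the second source (e.g., 0 = parent, 1 = teacher), and x_i is the risk factor.

```latex
\operatorname{logit} \Pr(Y_{ij} = 1 \mid x_i)
  = \beta_0 + \beta_1 S_j + \beta_2 x_i + \beta_3 S_j x_i,
  \qquad j = 1, 2
```

In this parameterization, \beta_1 captures a source difference in the outcome (item 2) and \beta_3 captures a difference in the risk factor effect by source (item 3). If \beta_3 = 0 is tenable, the interaction can be dropped and \beta_2 is then a combined effect with the usual log odds ratio interpretation.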

Daskalakis, Laird and Lipsitz (2002) generalized the methods to categorical multiple-source outcomes and incorporated analysis of agreement.  This general likelihood-based regression approach allows both risk factor analyses and agreement analyses within a single modeling framework, and accommodates missing data.  Previous work focused on binary or categorical multiple-source outcomes.  In practice, multiple-source reports are often naturally dichotomous, particularly when they represent diagnostic classifications obtained with a standardized diagnostic instrument (e.g., the DIS).  Quite often, however, reports are dichotomized from underlying continuous or categorical measures, obtained with symptom checklists (e.g., the CBCL).  Achenbach (1991a) discussed some of the advantages and disadvantages of using a specific normative cutpoint to derive a discrete variable from a continuous measure.  In general, dichotomizing a continuous variable can lose information; for instance, subtle behavioral differences between subjects may be obscured.  In addition, categorization of a covariate may lead to inadequate control of confounding (Brenner and Blettner, 1997; Brenner, 1997).  For these reasons, methods for continuous multiple-source data are as important as those for categorical or ordinal ones.

Goldwasser and Fitzmaurice (2001) developed an extension of the approach outlined in Fitzmaurice et al. (1995, 1996) for analyzing multiple informant data that are continuous. When the outcome variable is continuous and assumed to have a multivariate normal distribution, they show that there is a general class of linear models that are suitable for analyses. These models can be thought of as extensions of the repeated measures ANOVA model and can handle mixed continuous and categorical covariates, unbalanced and/or missing data, and more general covariance structures. The proposed approach has been implemented using existing statistical software (PROC MIXED in SAS) and has been applied to data from the Connecticut Child Surveys. 
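
As a sketch of the kind of linear model meant here, consider two informants and a single covariate. The notation (Y_ij, S_j, x_i, \Sigma) is assumed for illustration and is not taken from Goldwasser and Fitzmaurice (2001): Y_ij is the continuous report on subject i from informant j, S_j indicates the second informant, and x_i is a covariate.

```latex
Y_{ij} = \beta_0 + \beta_1 S_j + \beta_2 x_i + \beta_3 S_j x_i + e_{ij},
\qquad
(e_{i1}, e_{i2})^{\top} \sim N(\mathbf{0}, \Sigma),
\qquad
\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}
```

An unstructured \Sigma lets the residual variance differ by informant and lets the two reports on the same subject be correlated. Under this sketch, a subject with only one observed report would contribute to the likelihood through the corresponding univariate normal margin, provided the missingness is ignorable.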

Often the responses obtained from multiple informants are ordinal, rather than binary, count, or measured responses. The analysis of ordinal data in complete generality is complicated by the need to specify many covariance parameters. Glonek and Laird (2002) have undertaken a project to examine the loss of information that may occur if the ordinal responses are dichotomized.  In the case of one informant, they show that little will be lost. With two or more informants, their results suggest that unless there are marked differences in the levels of responses for different informants, the use of dichotomous variables offers a simple approach that retains most of the information in the data.

Daskalakis, Laird, and Murphy (2001) developed methods for analyzing categorical multiple-informant outcomes in longitudinal studies with "discrete time" designs (i.e., studies where assessments are conducted at a few timepoints, common to all subjects). The methodology has been implemented in a macro that is available for general use. These methods were applied in the analysis of data on depression from the Stirling County Study. 

Agreement of multiple informants

Most of our previous work has dealt with regression models using either risk factors or outcomes measured by multiple sources, but studies of inter-source agreement are also important. Traditionally, the Kappa statistic has been used for studying agreement. Daskalakis (2002) reviewed and evaluated various methods for constructing confidence intervals and conducting hypothesis tests for Kappa.  The Kappa statistic has several limitations, however, including dependence on marginal response rates and dependence on the number of categories in the ordinal or nominal outcome. Log-linear modeling overcomes some of these disadvantages and leads to a more flexible way of modeling the effects of covariates on agreement; Zahner and Daskalakis (1998) applied this methodology to study parent-teacher agreement in the Connecticut Child Surveys.
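
The dependence of Kappa on marginal response rates can be seen in a small numerical example. The sketch below computes Cohen's Kappa for two informants from a 2x2 table. The counts are made up for illustration and are not from any of the studies cited here, and the function names are hypothetical helpers rather than part of any published software.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's Kappa for two informants rating the same subjects."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed proportion of subjects on which the informants agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each informant's marginal rates.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def table_to_ratings(both_yes, a_only, b_only, both_no):
    """Expand 2x2 cell counts into two parallel lists of 0/1 reports."""
    a = [1] * both_yes + [1] * a_only + [0] * b_only + [0] * both_no
    b = [1] * both_yes + [0] * a_only + [1] * b_only + [0] * both_no
    return a, b


# Hypothetical tables: both have 90% raw agreement among 100 subjects.
balanced = cohen_kappa(*table_to_ratings(45, 5, 5, 45))  # about 50% positive
rare = cohen_kappa(*table_to_ratings(5, 5, 5, 85))       # about 10% positive

print(f"balanced prevalence: kappa = {balanced:.2f}")
print(f"rare outcome:        kappa = {rare:.2f}")
```

Both hypothetical tables have the same raw agreement, yet Kappa is about 0.80 in the balanced table and about 0.44 when the outcome is rare, because chance-expected agreement rises as the marginal rates move away from 50%.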

Partially observed multiple informant reports with different types of non-response

Horton and Fitzmaurice (2002) considered methods for handling non-ignorable missingness in multiple informant reports. This work was motivated by data from the Connecticut Child Surveys, which solicited multiple informant reports of psychopathology from parents and teachers. In this study, however, teacher ratings were not available for over 40% of the children, and a variety of causes of missingness could be distinguished, e.g., school district nonparticipation, parental refusal to give consent, and teacher nonresponse. They proposed mixture models that permit estimation under the assumption that there are two distinct types of missingness mechanisms, one ignorable and the other non-ignorable.

Multiple informants as risk factors or predictors

Horton, Laird and Zahner (1999) provided an overview of the available methods for analyzing multiple-source data used as a risk factor or predictor variable and evaluated them for the case where both the multiple-source risk factor and the outcome are binary.  Horton and Laird (1999) and Horton and Laird (2001) showed how to accommodate missing data from one or more sources in this setting.  This work focused on binary multiple-source risk factors.  Although the methods are somewhat more straightforward for continuous multiple-source risk factors, there still is no universally accepted approach, particularly in the presence of missing data.  

Horton et al. (2001) proposed a generalization of the methods outlined in Horton, Laird and Zahner (1999) to a time-to-event setting. Specifically, they proposed a regression model relating the distribution of time-to-event data or survival time to multiple informant risk factors that are only partially observed. The methods were used for analyzing the mortality associated with psychiatric disorders in the Stirling County Study.  These methods have also been applied to an example where five reports of comorbidity were used to predict use of tamoxifen among a cohort of breast cancer survivors (Lash et al., 2002).

 

Last updated February 19, 2007 by Nicholas Horton