In surveys designed to
measure mental health outcomes and
risk factors in community or service-based samples, information about
mental health status or service use is often obtained
from multiple sources.
For example, in studies of childhood psychopathology, the
child's parent is routinely used as a proxy data source;
other informants (e.g., self-report, peers, teachers, clinicians,
or trained observers) may also be employed, depending on
the child's age and the nature of psychopathology under
study. Multiple sources
can provide information on either risk factors or
outcomes, and key methodological challenges in analyzing
such data concern how they should best be represented in
statistical models.
Introduction to Multiple Informant Reports
Multiple informants as outcomes
Agreement of multiple informants
Partially observed multiple informant reports with different types of non-response
Multiple informants as risk factors or predictors
Introduction to Multiple Informant Reports
The use of multiple
sources in community-based assessments of mental health,
particularly studies of children's mental health, behavior,
and service use, has been our primary motivation for
developing appropriate methodology. However, the methods
are broadly applicable in other settings as well, including
(but not limited to):
1) service utilization studies, where both user and provider(s) are
asked to report the types of services obtained or provided, or where
multiple types of service reports are analyzed;
2) family history studies, where many relatives are interviewed about
the status of the proband and other family members;
3) behavioral studies of alcohol/drug use or of eating disorders, where
information is obtained from the subject as well as from family members
or other informants; and
4) studies of severe mental illness, such as schizophrenia or
Alzheimer's disease, where the adult subject is often unable to provide
self-report data.
Our own previous work
on analytic methods for multiple-source risk factors and
outcomes has focused on developing likelihood-based
regression methods for simultaneously analyzing data from all
sources. Our
methods treat the multiple sources as providing either conceptually
different information or the same information measured with
error. They also fully use all available information,
even from subjects who have missing data from one or more
sources.
Multiple informants as outcomes
Fitzmaurice et al. (1995, 1996) and Daskalakis (2002) developed novel
regression methodology for analyzing binary multiple-source outcomes.
The methodology permits a single analysis to:
1) include all multiple-source outcomes in a single
multivariate regression analysis;
2) test for source differences
in outcome, and estimate different source effects where
necessary;
3) test if the effects of other risk factors on the
outcome differ by source, and estimate those differences where
necessary or estimate a combined effect if appropriate;
4)
include partial data from subjects with missing source observations;
and
5) assess inter-source agreement and the effect of covariates
on agreement.
The regression coefficients have the same
interpretation as those in ordinary logistic regression. The method
is similar to a multivariate repeated measures ANOVA, except that it
is designed for discrete or ordinal outcomes, can handle measured
(continuous) predictor variables, and can include subjects with
missing outcomes.
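As a rough illustration of the data layout that such a single analysis implies, the sketch below stacks hypothetical wide-format records (one row per child) into one row per observed report. It assumes two sources, parent and teacher, and one binary covariate; all names (`wide`, `stack_reports`, `source_teacher`, `teacher_x_male`) are invented for illustration. It shows only how a source indicator, a source-by-covariate interaction, and partially observed subjects could be represented, not the estimation procedure of the cited papers.

```python
# Hypothetical wide-format records: one row per child, one binary report
# per source. None marks a missing report from that source.
wide = [
    {"id": 1, "male": 1, "parent": 1, "teacher": 0},
    {"id": 2, "male": 0, "parent": 0, "teacher": None},
    {"id": 3, "male": 1, "parent": None, "teacher": 1},
]


def stack_reports(records):
    """Return one row per observed (subject, source) report.

    Assumes exactly two sources, with 'parent' as the reference source.
    """
    long_rows = []
    for rec in records:
        for source in ("parent", "teacher"):
            y = rec[source]
            if y is None:
                # Skip only the missing report; the subject's other
                # report still contributes a row.
                continue
            is_teacher = int(source == "teacher")
            long_rows.append({
                "id": rec["id"],                 # links rows from the same subject
                "y": y,                          # the binary outcome report
                "source_teacher": is_teacher,    # allows a source difference in outcome
                "male": rec["male"],             # risk factor
                "teacher_x_male": is_teacher * rec["male"],  # source-specific risk effect
            })
    return long_rows


long_rows = stack_reports(wide)
for row in long_rows:
    print(row)
```

In this layout, a coefficient on `source_teacher` would correspond to a source difference in the outcome, and a coefficient on `teacher_x_male` to a risk-factor effect that differs by source; dropping the interaction term would yield a combined effect. Rows sharing an `id` are correlated, which any fitting method has to account for.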
Daskalakis,
Laird and Lipsitz (2002) generalized the methods to
categorical multiple-source outcomes and incorporated
analysis of agreement.
This general likelihood-based regression approach
allows both risk factor analyses and agreement analyses within
a single modeling framework, and accommodates missing data. Previous work
focused on binary or categorical multiple-source outcomes. In practice,
multiple-source reports are often naturally dichotomous,
particularly when they represent diagnostic
classifications obtained with a standardized diagnostic
instrument (e.g., the DIS).
Quite often, however, reports are dichotomized
from underlying continuous or categorical measures, obtained
with symptom checklists (e.g., the CBCL). Achenbach
(1991a) discussed some of the advantages and
disadvantages of using a specific normative cutpoint to derive a
discrete variable from a continuous measure. In general, dichotomizing
a continuous variable can discard information; for instance, subtle
behavioral differences
between subjects may be obscured. In addition,
categorization of a covariate may lead to inadequate control
of confounding (Brenner
and Blettner, 1997; Brenner,
1997). For these
reasons, methods for continuous multiple-source data are
as important as those for categorical or ordinal ones.
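A toy example may make the loss of information concrete. The scores and the cutpoint below are invented for illustration and are not taken from any instrument's published norms.

```python
# Hypothetical continuous checklist scores for four subjects.
scores = {"A": 41, "B": 59, "C": 61, "D": 88}

# Arbitrary illustrative cutpoint (an assumption, not a published norm).
CUTPOINT = 60


def dichotomize(score, cutpoint=CUTPOINT):
    """Return 1 if the score is at or above the cutpoint, else 0."""
    return int(score >= cutpoint)


binary = {subject: dichotomize(score) for subject, score in scores.items()}
print(binary)  # {'A': 0, 'B': 0, 'C': 1, 'D': 1}
```

Subjects C and D differ by 27 points yet fall in the same category, while B and C differ by only 2 points and are classified differently. Within-category differences of this kind are what a dichotomized analysis cannot see.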
Goldwasser
and Fitzmaurice (2001) developed an extension of the approach
outlined in Fitzmaurice
et al. (1995, 1996) for analyzing multiple informant data that
are continuous. When the outcome variable is continuous and assumed
to have a multivariate normal distribution, they show that there is
a general class of linear models that are suitable for analyses.
These models can be thought of as extensions of the repeated
measures ANOVA model and can handle mixed continuous and categorical
covariates, unbalanced and/or missing data, and more general
covariance structures. The proposed approach has been implemented
using existing statistical software (PROC MIXED in SAS) and has been
applied to data from the Connecticut Child Surveys.
Often the responses obtained from multiple
informants are ordinal, rather than binary, count, or continuous
(measured) responses. The analysis of ordinal data in complete generality is
complicated by the need to specify many covariance parameters. Glonek
and Laird (2002) have undertaken a project to look at the loss
of information that may occur if the ordinal responses are
dichotomized. In the case of one informant, they show that
little will be lost. With two or more, their results suggest that
unless there are marked differences in the levels of responses for
different informants, the use of dichotomous variables offers a
simple approach that retains most of the information in the
data.
Daskalakis,
Laird, and Murphy (2001) developed methods for analyzing
categorical multiple-informant outcomes in longitudinal studies with
"discrete time" designs (i.e., studies where assessments are
conducted at a few timepoints, common to all subjects). The
methodology has been implemented in a macro that is available for
general use. These methods were applied in the analysis of data on
depression from the Stirling County Study.
Agreement of multiple informants
Most of our previous
work has dealt with regression models using either risk factors or
outcomes measured by multiple sources, but studies of inter-source
agreement are also important. The Kappa statistic has traditionally
been used for studying agreement. Daskalakis
(2002) reviewed and evaluated various methods for constructing
confidence intervals and conducting hypothesis tests for
Kappa. The Kappa statistic has several limitations, however,
including dependence on the marginal response rates and on the
number of categories in the ordinal or nominal outcome. Log-linear
modeling overcomes some of these disadvantages and leads to a more
flexible way of modeling the effects of covariates on agreement; Zahner
and Daskalakis (1998) applied this methodology to study
parent-teacher agreement in the Connecticut Child Surveys.
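The dependence of Kappa on the marginal response rates can be seen in a small numerical sketch. The function below computes the standard (unweighted) Cohen's Kappa from a square table of counts; the two 2x2 tables are made-up data, chosen so that both show 80% raw agreement between two informants.

```python
def cohen_kappa(table):
    """Unweighted Cohen's Kappa for a square table of counts.

    table[i][j] = number of subjects placed in category i by informant 1
    and in category j by informant 2. Undefined (division by zero) when
    chance-expected agreement equals 1.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed proportion of agreement: the diagonal cells.
    p_obs = sum(table[i][i] for i in range(k)) / n
    # Marginal response rates for each informant.
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected by chance if the informants were independent.
    p_exp = sum(row_marg[i] * col_marg[i] for i in range(k))
    return (p_obs - p_exp) / (1 - p_exp)


# Made-up tables: rows = informant 1 (yes, no), columns = informant 2.
balanced = [[40, 10], [10, 40]]  # each informant reports "yes" for 50%
rare = [[5, 10], [10, 75]]       # each informant reports "yes" for 15%

print(round(cohen_kappa(balanced), 3))
print(round(cohen_kappa(rare), 3))
```

Both tables have identical raw agreement (0.80), but Kappa is 0.6 for the balanced table and roughly 0.22 for the table with a rare outcome, because chance-expected agreement rises as the marginal rates move away from 50%.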
Partially observed multiple informant reports with different types of non-response
Horton
and Fitzmaurice (2002) considered methods for handling
non-ignorable missingness in multiple informant reports. This work
was motivated by data from the Connecticut Child Surveys, which
solicited multiple informant reports of psychopathology from parents
and teachers. However, in this study teacher ratings were not
available for over 40% of the children, and a variety of causes of
missingness could be distinguished, e.g., school district
nonparticipation, parental refusal to give consent, and teacher
nonresponse. They proposed mixture models that permit estimation
under the assumption that there are two distinct types of
missingness mechanisms, one that is ignorable, the other
non-ignorable.
Multiple informants as risk factors or predictors
Horton,
Laird and Zahner (1999) provided an overview of the
available methods for analyzing multiple-source data used
as a risk factor or predictor variable and evaluated them
for the case where both the multiple-source risk factor
and the outcome are binary.
Horton
and Laird (1999) and Horton
and Laird (2001) showed how to accommodate missing data
from one or more sources in this setting. This work focused on
binary multiple-source risk factors. Although the methods are
somewhat more straightforward for continuous
multiple-source risk factors, there is still no
universally accepted approach, particularly in the presence of
missing data.
Horton
et al. (2001) proposed a generalization of the methods outlined
in Horton,
Laird and Zahner (1999) to a time-to-event setting.
Specifically, they proposed a regression model relating the
distribution of time-to-event data or survival time to multiple
informant risk factors that are only partially observed. The methods
were used for analyzing the mortality associated with psychiatric
disorders in the Stirling County Study. These methods have
also been applied to an example where five reports of comorbidity were
used to predict use of tamoxifen among a cohort of breast cancer
survivors (Lash et al., 2002).