Analysis Of Incomplete Multivariate Data


dc.contributor.author Schafer, J. L.
dc.date.accessioned 2021-10-06T07:52:28Z
dc.date.available 2021-10-06T07:52:28Z
dc.date.issued 1997
dc.identifier.isbn 0-412-04061-1
dc.identifier.uri http://repository.dkut.ac.ke:8080/xmlui/handle/123456789/4879
dc.description.abstract This book presents methods of statistical inference from multivariate datasets with missing values where missingness may occur on any or all of the variables. Such datasets arise frequently in statistical practice, but the tools for effectively dealing with them are not readily available to data analysts. It is our goal to provide these tools, along with the knowledge of how to use them. When faced with missing values, practitioners frequently resort to ad hoc methods of case deletion or imputation to force the incomplete dataset into a rectangular complete-data format. Many statistical software packages, for example, automatically omit from a linear regression analysis any case that has a missing value for any variable. Imputation is a generic term for filling in missing data with plausible values. In a multivariate dataset, each missing value may be replaced by the observed mean for that variable, or, in a slightly less naive approach, by some sort of predicted value from a regression model. Almost invariably, after the dataset has been altered by one of these methods no additional provision for missing data is made in the subsequent analysis. The research usually proceeds as if the omitted cases had never really been observed, or as if the imputed values were real data. When the incomplete cases comprise only a small fraction of all cases (say, five percent or less) then case deletion may be a perfectly reasonable solution to the missing-data problem. In multivariate settings where missing values occur on more than one variable, however, the incomplete cases are often a substantial portion of the entire dataset. If so, deleting them may be inefficient, causing large amounts of information to be discarded. Moreover, omitting them from the analysis will tend to introduce bias, to the extent that the incompletely observed cases differ systematically from the completely observed ones.
The completely observed cases that remain will be unrepresentative of the population for which the inference is usually intended: the population of all cases, rather than the population of cases with no missing data. Ad hoc methods of imputation are no less problematic. Imputing averages on a variable-by-variable basis preserves the observed sample means, but it distorts the covariance structure, biasing estimated variances and covariances toward zero. Imputing predicted values from regression models, on the other hand, tends to inflate observed correlations, biasing them away from zero. When the pattern of missingness is complex, devising an ad hoc imputation scheme that preserves important aspects of the joint distribution of the variables can be a daunting task. Moreover, even if the joint distribution of all variables could be adequately preserved, it may be a serious mistake to treat the imputed data as if they were real. Standard errors, p-values and other measures of uncertainty calculated by standard complete-data methods could be misleading, because they fail to reflect any uncertainty due to missing data. This book presents a unified approach to the analysis of incomplete multivariate data. We will consider datasets for which the variables are continuous, categorical, or both. This approach allows one to analyze the data by virtually any technique that would be appropriate if the data were complete. This is accomplished not by simply modifying the data in an ad hoc fashion to make them appear complete, but by principled methods that account for the missing values, and the uncertainty they introduce, at each step of the analysis in a formal way. These methods tend to be computationally intensive, requiring more computer time than ad hoc alternatives. 
However, they do not require a heavy investment of analyst time, and can be applied to a wide variety of problems more or less routinely without special efforts to develop new technology unique to each problem. This book is written from an applied perspective, attempting to bring together theory, computational methods, data examples and practical advice in a single source. en_US
dc.language.iso en en_US
dc.publisher Chapman & Hall/CRC en_US
dc.title Analysis Of Incomplete Multivariate Data en_US
dc.type Book en_US

