Utilisation du modèle linéaire. Rappels de base - méthodes de validation
Linear regression modelling is one of the most widely used statistical tools: usual applications of this "core technique" encompass the description and analysis of experimental data, interpolation, help in recognizing causal relationships, and forecasting. It is thus necessary for the practitioner to gain an understanding of the basic principles needed to apply regression methods in a variety of settings. Accordingly, the first two chapters provide the standard results for simple linear regression (only one controlled variable, or regressor). Since the emphasis is on practical applications, theoretical results are stated without proof, and the major guidelines are building models, assessing fit and reliability, and drawing conclusions. Chapter 1 relies upon a geometrical approach to introduce the fitting of the model by ordinary least squares. Chapter 2 focuses specifically on statistical modelling: the fundamental concepts of estimation theory are first recalled, and the maximum likelihood estimation of the model parameters is then presented; statistical inferences are treated here in the "classical" (Gaussian) framework, i.e., the errors are assumed to be independent, identically distributed normal random variables. The generalization to the multiple linear regression model (at least two regressors) is described in Chapter 3 using matrix algebra; this chapter examines the properties of the design matrix that generate multicollinearity problems, including their sources, their harmful effects, and a review of available diagnostics and remedial measures.
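As a minimal illustration of the techniques named above (an assumed sketch, not code from the book), the following Python/NumPy fragment fits a multiple regression by ordinary least squares and computes two common multicollinearity diagnostics, the condition number of the scaled regressor columns and a variance inflation factor; the simulated data and thresholds are purely illustrative.

# Illustrative sketch: OLS fit and multicollinearity diagnostics (not from the book).
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)    # nearly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])     # design matrix with intercept
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

# Ordinary least squares: beta_hat solves min ||y - X beta|| (stable solver).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Condition number of the column-scaled regressors: large values signal multicollinearity.
Xs = X[:, 1:] / np.linalg.norm(X[:, 1:], axis=0)
cond = np.linalg.cond(Xs)

# Variance inflation factor for x1: 1 / (1 - R^2) from regressing x1 on x2.
b, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x2]), x1, rcond=None)
resid = x1 - np.column_stack([np.ones(n), x2]) @ b
r2 = 1.0 - resid.var() / x1.var()
vif_x1 = 1.0 / (1.0 - r2)

print("OLS coefficients:", beta_hat)
print("condition number:", cond, " VIF(x1):", vif_x1)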
The next two chapters form the nucleus of this practically oriented textbook on regression analysis, whose successful use requires the ability both to check model adequacy and to manage the practical difficulties that arise when the technique is applied to real-world data. Chapters 4 and 5 therefore put the emphasis on the art of exploratory data analysis rather than on statistical theory, and cover several procedures designed to detect various types of disagreement between the observations and the assumed model. Chapter 4 introduces diagnostics for investigating departures from the usual assumptions on the random error component of the model (e.g., heteroscedasticity, autocorrelation, or non-normality); remedial actions are also examined, for instance analytical methods for selecting transformations that stabilize the residual variance. Chapter 5 goes beyond residual analysis by introducing methods for assessing the influence of individual observations, with the purpose of pinpointing outlying values both in the response variable and in the explanatory part of the model (the so-called "leverage points" in the latter case). This chapter also emphasizes a complementary line of inquiry, introducing robust (or resistant) regression methods that rely on progressively weaker assumptions and whose results remain trustworthy even if a certain proportion of the observations are outliers. The concepts of the breakdown point and the influence function of an estimator are introduced; it is further stressed that robust methods provide powerful tools for identifying outliers or, more generally, "troublesome" observations.
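The influence and leverage diagnostics mentioned for Chapter 5 can be sketched in a few lines; the example below (an assumption of this summary, not the book's own material) computes leverages from the hat matrix, internally studentized residuals, and Cook's distances for a simulated fit containing one high-leverage point and one response outlier. The flagging thresholds 2p/n and 4/n are common rules of thumb, not prescriptions from the text.

# Illustrative sketch: leverage and influence diagnostics for an OLS fit.
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
x[0] = 6.0                                    # one high-leverage point
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[1] += 5.0                                   # one outlier in the response

X = np.column_stack([np.ones(n), x])
p = X.shape[1]
H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix
h = np.diag(H)                                # leverages h_ii
e = y - H @ y                                 # residuals
s2 = e @ e / (n - p)                          # residual variance estimate

r = e / np.sqrt(s2 * (1.0 - h))               # internally studentized residuals
cooks_d = (r ** 2) * h / ((1.0 - h) * p)      # Cook's distances

print("high leverage points:", np.where(h > 2 * p / n)[0])
print("large Cook's distances:", np.where(cooks_d > 4 / n)[0])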
Despite its broad range of application, linear regression calls for generalizations; two of them are examined in Chapter 6: the first is a brief introduction to logistic regression, which offers a didactic example of one special case in the class of generalized linear models; the second deals with the structural relationship, in which the explanatory variables are themselves subject to error.
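To make the logistic-regression generalization concrete, here is a small hedged sketch (an illustration under assumed simulated data, not the book's treatment) that fits a logistic regression by iteratively reweighted least squares, the standard fitting algorithm for generalized linear models.

# Didactic sketch: logistic regression fitted by iteratively reweighted least squares.
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * x)))
y = rng.binomial(1, p_true)

beta = np.zeros(X.shape[1])
for _ in range(25):                           # Newton-Raphson / IRLS iterations
    eta = X @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))           # logistic mean function
    W = mu * (1.0 - mu)                       # GLM working weights
    z = eta + (y - mu) / W                    # working response
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))

print("estimated coefficients:", beta)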
Gros Philippe (2000). Utilisation du modèle linéaire. Rappels de base - méthodes de validation. Ref. DEL AO. https://archimer.ifremer.fr/doc/00013/12405/