“Econometrics texts devote many pages to the problem of multicollinearity in multiple regression, but they say little about the closely analogous problem of small sample size,” Arthur Goldberger noted many years ago, continuing, “Perhaps the imbalance is attributable to the lack of an exotic polysyllabic name for ‘small sample size.’ If so, we can remove that impediment by introducing the term ‘micronumerosity.’”

His point, of course, is the same as the one I made above. Collinearity is not an issue, even in finite samples, except in the same sense as small samples themselves are an issue: there is just less information present in the data, but our measures of sampling variability correctly reflect that lack of information.
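This claim — that collinearity simply means less information, and that standard errors correctly report it — can be illustrated with a small simulation. The sketch below is mine, not from the post: it fits OLS by hand with numpy on a nearly collinear design and on an independent one, and shows that the standard error on the shared coefficient is much larger in the collinear case, exactly as the sampling theory says it should be.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200


def ols_se(X, y):
    """OLS coefficients and classical standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


x1 = rng.normal(size=n)
eps = rng.normal(size=n)

# Nearly collinear design: x2 is x1 plus a little noise.
x2_coll = x1 + 0.05 * rng.normal(size=n)
# Comparison design: x2 unrelated to x1.
x2_ind = rng.normal(size=n)

y_coll = x1 + x2_coll + eps
y_ind = x1 + x2_ind + eps

X_coll = np.column_stack([np.ones(n), x1, x2_coll])
X_ind = np.column_stack([np.ones(n), x1, x2_ind])

_, se_coll = ols_se(X_coll, y_coll)
_, se_ind = ols_se(X_ind, y_ind)

# The SE on x1 is much larger under collinearity: the data carry
# less information about the individual coefficients, and the
# reported standard error reflects exactly that.
print(se_coll[1], se_ind[1])
```

The inflation factor here is roughly the usual variance inflation factor, 1/(1 − R²) for the regression of one covariate on the other, so nothing is being hidden from the analyst: the uncertainty is real and it is reported.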

I am not quite sure the terminology is as settled as you seem to think. In my understanding, collinearity is between two variables, while multicollinearity is when three or more variables are in a linear relation with each other. As for anything less than perfect collinearity not being an issue, that is true only in the limit, with a very large sample. I must say I have found many different and partly conflicting accounts of the issue; my choice of terms corresponds to what I have most often seen used in the field.

Another issue I do not mention in the blog post is that the variables used in the regression are an arbitrary choice of measures of underlying risk concepts, and the relations between those underlying concepts are not clearly defined.

I think perhaps what you have in mind in your opening paragraph is *perfect* collinearity, which means the covariates are linearly dependent. In that case the model is not identified by the sample, and your software will indeed warn you about it (Stata, for example, will simply drop variables until the model is identified).
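The distinction is easy to see numerically. In this short sketch of my own (the variable names are illustrative), one covariate is an exact linear function of another, so the design matrix is rank-deficient and the coefficients are not identified; dropping a dependent column, as Stata does automatically, restores identification.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = 2.0 * x1 - 3.0  # x2 is an exact linear function of x1

# Design matrix with intercept: 3 columns, but only rank 2,
# so X'X is singular and the model is not identified.
X = np.column_stack([np.ones(n), x1, x2])
print(np.linalg.matrix_rank(X))  # 2

# Dropping one of the linearly dependent columns restores
# full column rank, and with it identification.
X_drop = X[:, :2]
print(np.linalg.matrix_rank(X_drop))  # 2 == number of columns
```

With *near* collinearity, by contrast, the rank is full and the model is identified; the only consequence is large (and honestly reported) standard errors.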
