It is convenient to assume an environment in which an experiment is performed: the dependent variable is then the outcome of a measurement.
The regression equation deals with the following variables:
• The unknown parameters, denoted β. These may be a scalar or a vector of length k.
• The independent variables, X.
• The dependent variable, Y.
The regression equation is a function of the variables X and β.
The user of regression analysis must make an intelligent guess about this function. Sometimes the form of the function is known; sometimes the user must find it by trial and error.
Assume now that the vector of unknown parameters, β, is of length k. To perform a regression analysis, the user must provide information about the dependent variable Y:
• If the user performs the measurement N times, where N < k, regression analysis cannot be performed: not enough information has been provided to determine β.
• If the user performs N independent measurements, where N = k, then the problem reduces to solving a set of N equations with N unknowns β.
• If, on the other hand, the user provides the results of N independent measurements, where N > k, regression analysis can be performed. Such a system is also called an overdetermined system.
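The three cases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `describe_system`, the straight-line model y = b0 + b1 * x (so k = 2), and the two data points are all made up for the example and do not come from the text.

```python
import numpy as np

def describe_system(n_measurements, k_parameters):
    """Classify the problem by comparing N (measurements) with k (unknowns)."""
    if n_measurements < k_parameters:
        return "underdetermined"      # N < k: not enough information
    if n_measurements == k_parameters:
        return "exactly determined"   # N = k: N equations, N unknowns
    return "overdetermined"           # N > k: regression analysis applies

# Hypothetical model y = b0 + b1 * x, so k = 2 (an assumption for this sketch).
x = np.array([1.0, 2.0])
y = np.array([3.0, 5.0])

# Each row is one measurement; the columns multiply b0 and b1.
design = np.column_stack([np.ones_like(x), x])

# With N = k = 2 the problem is an ordinary linear system.
beta_exact = np.linalg.solve(design, y)
print(describe_system(*design.shape), beta_exact)
```

Note that the N = k case is solved here as a linear system only because the assumed model is linear in β; a nonlinear function would need a different solver.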
In the overdetermined case (N > k), regression analysis provides the tools for:
1. finding a solution for the unknown parameters β that will, for example, minimize the distance between the measured and predicted values of the dependent variable Y (also known as the method of least squares).
2. using the surplus of information, under certain statistical assumptions, to provide statistical information about the unknown parameters β and the predicted values of the dependent variable Y.
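Both tools can be illustrated with a short NumPy sketch. The data values and the straight-line model below are invented for the example, and the standard errors rely on an assumption the text does not spell out: that the measurement errors are independent and share the same variance.

```python
import numpy as np

# Made-up data for a hypothetical model y = b0 + b1 * x (k = 2, N = 5).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

design = np.column_stack([np.ones_like(x), x])
n, k = design.shape

# Tool 1: least squares picks beta to minimize the sum of squared
# differences between measured and predicted values of Y.
beta_hat, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
residuals = y - design @ beta_hat

# Tool 2: the N - k surplus measurements give an estimate of the error
# variance, and from it a standard error for each parameter
# (assuming independent errors with equal variance).
sigma2_hat = residuals @ residuals / (n - k)
cov_beta = sigma2_hat * np.linalg.inv(design.T @ design)
std_err = np.sqrt(np.diag(cov_beta))

print("estimates:", beta_hat)
print("standard errors:", std_err)
```

The division by n - k is where the surplus of information enters: with N = k the residuals would all be zero and no variance estimate would be possible.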