The estimate of \(\alpha\) is the predicted value of Y when all X’s are zero (which may not be meaningful if the variables cannot actually take the value zero).
Multivariable models
Usually, we give \(\beta_k\) a partial-effect interpretation.
That is, we think of \(\beta_1\) as the change in Y (\(\Delta y\)) given a change in \(x_1\) (\(\Delta x_1\)), ceteris paribus,
i.e., holding all other changes at zero (\(\Delta x_2 = \Delta x_3 = \dots = \Delta x_k = 0\)).
Thus, the “effect” or the “association” is \(\beta_1\), holding all else constant.
We can predict the value of \(y\) just like before.
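As a quick sketch (an assumption: using the files/CEOSAL1.DTA dataset that appears later in this section, which also includes a sales column), we can estimate a multivariable model and predict salary at chosen values of the x’s with statsmodels:

```python
import pandas as pd
import statsmodels.formula.api as smf

mydata = pd.read_stata("files/CEOSAL1.DTA")

# Multivariable OLS: salary explained by ROE and sales
model = smf.ols("salary ~ roe + sales", data=mydata).fit()

# Predicted salary for a hypothetical firm (roe = 15, sales = 5000)
new_obs = pd.DataFrame({"roe": [15.0], "sales": [5000.0]})
print(model.predict(new_obs))
```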
Stata output after rescaling salary (multiplying it by 1,000):

```
variable salary was int now long
(209 real changes made)

------------------------------
                        (1)
                     salary
------------------------------
roe                 18501.2
                     (1.66)

_cons              963191.3***
                     (4.52)
------------------------------
N                       209
------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
```
Scaling
What if, instead, we multiply x (ROE) by a constant, say 1000? The slope is simply divided by 1000, while the intercept stays the same: scaling x by a constant \(c\) changes \(\beta\) to \(\beta / c\), leaving t-statistics and \(R^2\) unchanged.
A related transformation is standardization: subtract the mean and divide by the standard deviation of both y and x before estimating. So, if you estimate a standardized \(\beta\) of 0.2, it means that a 1 s.d. increase in x leads to a 0.2 s.d. increase in y.
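A minimal sketch of both transformations with statsmodels, reusing the CEOSAL1 data from these notes:

```python
import pandas as pd
import statsmodels.formula.api as smf

mydata = pd.read_stata("files/CEOSAL1.DTA")

# Original scale
m1 = smf.ols("salary ~ roe", data=mydata).fit()

# Multiply x by 1000: the slope is divided by 1000, the intercept is unchanged
mydata["roe1000"] = mydata["roe"] * 1000
m2 = smf.ols("salary ~ roe1000", data=mydata).fit()

# Standardized ("beta") coefficients: z-score y and x first
z = (mydata[["salary", "roe"]] - mydata[["salary", "roe"]].mean()) \
    / mydata[["salary", "roe"]].std()
m3 = smf.ols("salary ~ roe", data=z).fit()

print(m1.params, m2.params, m3.params, sep="\n\n")
```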
Functional form of relationships
In many cases, you want to use the logarithm of a variable. This changes the interpretation of the coefficients you estimate.
log-log regression: both Y and X are in logs, \(ln(Y) = \alpha + \beta \times ln(X) + \epsilon\). The interpretation of \(\beta\): a one percent increase in \(x\) leads to a \(\beta\) percent increase in \(y\).
log-level regression: only Y is in logs, \(ln(Y) = \alpha + \beta \times X + \epsilon\). The interpretation of \(\beta\): a one unit increase in \(x\) leads to a \(100 \times \beta\) percent increase in \(y\).
level-log regression: only X is in logs, \(Y = \alpha + \beta \times ln(X) + \epsilon\). The interpretation of \(\beta\): a one percent increase in \(x\) leads to a \(\frac{\beta}{100}\) units increase in \(y\).
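A minimal sketch of the three specifications (an assumption: the CEOSAL1 data with salary, sales, and roe; np.log is the natural logarithm, matching the \(ln\) above):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

mydata = pd.read_stata("files/CEOSAL1.DTA")

# log-log: beta is an elasticity (% change in y per 1% change in x)
log_log = smf.ols("np.log(salary) ~ np.log(sales)", data=mydata).fit()

# log-level: 100 * beta is the % change in y per one-unit change in x
log_level = smf.ols("np.log(salary) ~ roe", data=mydata).fit()

# level-log: beta / 100 is the change in y (in units) per 1% change in x
level_log = smf.ols("salary ~ np.log(sales)", data=mydata).fit()

print(log_log.params["np.log(sales)"])  # the estimated elasticity
```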
Important note: misspecifying the functional form of the x’s creates a bias similar to omitted variable bias (OVB).
Winsorization
In real research, one very common problem is when you have outliers.
Outliers are observations very far from the mean. For instance, a company with leverage (\(\frac{Debt}{TA}\)) of 800%. Values like this are usually typing errors in the original dataset, and they are more common than one might expect.
Researchers avoid excluding such observations; we only exclude them when it is totally necessary.
To avoid using these weird values, we winsorize.
Usually, we winsorize 1% at each tail (i.e., at the 1st and 99th percentiles).
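A minimal sketch of 1%/99% winsorization with pandas, clipping values beyond the 1st and 99th percentiles back to those bounds (the salary column is just an illustration):

```python
import pandas as pd

mydata = pd.read_stata("files/CEOSAL1.DTA")

# Bounds at the 1st and 99th percentiles
low, high = mydata["salary"].quantile([0.01, 0.99])

# Values below `low` are set to `low`; values above `high` are set to `high`
mydata["salary_w"] = mydata["salary"].clip(lower=low, upper=high)
```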
Winsorization
Look at the following scatter plots. Do you notice something weird?
R

```r
library(ggplot2)
library(foreign)

mydata <- read.dta("files/CEOSAL1.DTA")
options(repr.plot.width = 6, repr.plot.height = 4)

ggplot(mydata, aes(x = roe, y = salary)) +
  geom_point() +
  labs(title = "Salary vs. ROE", x = "ROE", y = "Salary") +
  theme_minimal()
```
Python
```python
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

mydata = pd.read_stata("files/CEOSAL1.DTA")
plt.figure(figsize=(6, 4))
sns.scatterplot(x="roe", y="salary", data=mydata)
sns.despine(trim=True)
plt.title("Salary vs. ROE")
plt.xlabel("ROE")
plt.ylabel("Salary")
plt.show()
```
In model (7.1), \(wage = \beta_0 + \delta_1 female + \beta_1 educ + u\), only two observed factors affect wage: gender and education.
Because \(female = 1\) when the person is female, and \(female = 0\) when the person is male, the parameter \(\delta_1\) has the following interpretation:
\(\delta_1\) is the difference in hourly wage between females and males, given the same amount of education (and the same error term u).
Thus, the coefficient \(\delta_1\) determines whether there is discrimination against women:
if \(\delta_1<0\), then, for the same level of other factors, women earn less than men on average.
In terms of expectations, if we assume the zero conditional mean assumption \(E(u \mid female, educ) = 0\), then
\[\delta_1 = E(wage \mid female = 1, educ) - E(wage \mid female = 0, educ)\]
The key here is that the level of education is the same in both expectations; the difference, \(\delta_1\) , is due to gender only.
Models with binary variables
The visual interpretation is as follows: the situation can be depicted graphically as an intercept shift between males and females. The interpretation relies on \(\delta_1\). We can observe that \(\delta_1 < 0\); this is an argument for the existence of a gender gap in wages.
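A minimal sketch of the intercept-shift model on simulated data (the coefficients, including the true \(\delta_1 = -1.8\), are made up for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 500
female = rng.integers(0, 2, n)             # dummy: 1 = female, 0 = male
educ = rng.integers(8, 21, n)              # years of education
wage = 1.5 - 1.8 * female + 0.5 * educ + rng.normal(0, 2, n)
df = pd.DataFrame({"wage": wage, "female": female, "educ": educ})

fit = smf.ols("wage ~ female + educ", data=df).fit()
print(fit.params["female"])   # estimate of delta_1: the gap, holding educ fixed
```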
Let’s say you have a variable that may not show a clear linear relationship with another variable.
For instance, consider ownership concentration and firm value. There is a case to be made that the relationship between these variables is not linear.
At low levels of ownership concentration (say, 5% of shares), a small increase in concentration might lead to an increase in firm value. The argument is that, at such levels, more concentration leads the shareholder to monitor management more closely, increasing the likelihood of value-increasing decisions.
But consider now the case where the shareholder holds 60% or more of the firm’s outstanding shares. Increasing concentration further might signal to the market that this shareholder is powerful enough to start using the firm for private benefits (which are not shared with minority shareholders).
If this story is true, the relationship is inverted U-shaped: at first it is positive, then it becomes negative.
Models with quadratic terms
Theoretically, I could make an argument for a non-linear relationship between several variables of interest in finance. Take size and leverage: small firms might not be able to issue as much debt as middle-sized firms, while huge firms might not need debt at all. The empirical relationship might be non-linear.
As noted before, misspecifying the functional form of a model can create biases.
But, in this specific case, the problem seems minor, since we have the data to fix it: we can simply add a quadratic term, as in the sketch below.
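A minimal sketch of a quadratic specification on simulated ownership/value data (all numbers are made up); the fitted curve \(y = \beta_0 + \beta_1 x + \beta_2 x^2\) peaks at \(x^* = -\beta_1 / (2\beta_2)\):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
own = rng.uniform(0, 100, 400)             # ownership concentration (%)
value = 1 + 0.08 * own - 0.001 * own**2 + rng.normal(0, 0.5, 400)
df = pd.DataFrame({"value": value, "own": own})

fit = smf.ols("value ~ own + I(own ** 2)", data=df).fit()
b1, b2 = fit.params["own"], fit.params["I(own ** 2)"]
print("turning point:", -b1 / (2 * b2))    # close to 40 with these simulated data
```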
In some specific cases, you want to interact variables to test whether the interaction effect is significant.
For instance, following Wooldridge’s traditional Example 7.4, you might believe that married women are discriminated against in the job market even more than single women are.
So, to follow that intuition, you may prefer to estimate an equation that interacts the two dummies, such as:
\[wage = \beta_0 + \delta_1 female + \delta_2 married + \delta_3 (female \times married) + \beta_1 educ + u\]
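A minimal sketch of this interaction model on simulated data (the true coefficients are made up); in a statsmodels formula, female * married expands to female + married + female:married:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 800
female = rng.integers(0, 2, n)
married = rng.integers(0, 2, n)
educ = rng.integers(8, 21, n)
wage = (2 - 1.0 * female + 0.6 * married - 0.8 * female * married
        + 0.5 * educ + rng.normal(0, 2, n))
df = pd.DataFrame({"wage": wage, "female": female,
                   "married": married, "educ": educ})

fit = smf.ols("wage ~ female * married + educ", data=df).fit()
print(fit.params["female:married"])   # the extra penalty for married women
```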
When the dependent variable is binary, we cannot rely on linear models like those discussed so far.
We need a linear probability model (LPM).
In such models, we are interested in how the probability of the occurrence of an event depends on the values of x. That is, we want to know \(P[y=1|x]\).
Imagine that \(y\) is employment status, 0 for unemployed, 1 for employed.
Imagine that we are interested in estimating the probability that a person starts working after a training program.
For these types of problems, we need a linear probability model.
The mechanics of estimating this model are similar to before, except that \(Y\) is binary.
The interpretation of the coefficients changes: a one-unit change in \(x\) changes the probability that \(y = 1\).
So, let’s say that \(\beta_1\) is 0.05. It means that changing \(x_1\) by one unit changes the probability of \(y = 1\) (i.e., getting a job) by 5 percentage points, ceteris paribus.
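A minimal sketch of an LPM on simulated data (employed and train are made-up variables); robust standard errors are common here because an LPM is heteroskedastic by construction:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 1000
train = rng.integers(0, 2, n)        # 1 = attended a training program
employed = (0.3 + 0.15 * train + rng.normal(0, 0.4, n) > 0.5).astype(int)
df = pd.DataFrame({"employed": employed, "train": train})

lpm = smf.ols("employed ~ train", data=df).fit(cov_type="HC1")
print(lpm.params["train"])           # change in P(employed = 1) from the program
```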
Linear probability model
Using Wooldridge’s equation (7.29), estimated on the MROZ data:
\[\widehat{inlf} = 0.586 - 0.0034\, nwifeinc + 0.038\, educ + 0.039\, exper - 0.0006\, exper^2 - 0.016\, age - 0.262\, kidslt6 + 0.013\, kidsge6\]
where:
\(inlf\) = 1 if in the labor force in 1975, 0 otherwise
\(kidslt6\) = number of kids less than 6 years old
Linear probability model
The relationship between the probability of labor force participation and \(educ\) is plotted in the figure below.
Fixing the other independent variables at nwifeinc = 50, exper = 5, age = 30, kidslt6 = 1, and kidsge6 = 0, the predicted probability is negative until educ equals 3.84 years. This is odd: the model is predicting a negative probability of labor force participation for a perfectly plausible set of values.
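A short check of this arithmetic, plugging the fixed values into equation (7.29):

```python
def inlf_hat(educ, nwifeinc=50, exper=5, age=30, kidslt6=1, kidsge6=0):
    # Fitted values from equation (7.29)
    return (0.586 - 0.0034 * nwifeinc + 0.038 * educ + 0.039 * exper
            - 0.0006 * exper**2 - 0.016 * age - 0.262 * kidslt6
            + 0.013 * kidsge6)

print(inlf_hat(educ=0))    # -0.146: a negative "probability"
print(0.146 / 0.038)       # ~3.84 years of education needed to reach zero
```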
Linear probability model
Another example
The model is predicting that going from 0 to 4 kids less than 6 years old reduces the probability of working by \(4\times 0.262 = 1.048\), which is impossible, since a probability cannot change by more than 1.
The takeaway
That is, one important caveat of the linear probability model is that predicted probabilities can fall outside the \([0, 1]\) interval.
If this is problematic to us, we might need a different solution.
Although the linear probability model is simple to estimate and use, it has some limitations as discussed.
If that problem is important to us, we need a solution that addresses predicted probabilities below 0 or above 1.
That is, we need a binary response model.
In a binary response model, interest lies in the response probability:
\[P(y =1 | x) = P(y=1| x_1,x_2,x_3,...)\]
That is, we have a group of X variables explaining Y, which is binary. In an LPM, we assume that the response probability is linear in the parameters \(\beta\).
This is the assumption that created the problem discussed above.
Logit and Probit
We can change that assumption to a different function.
A logit model assumes a logistic function: \(G(z)=\frac{\exp(z)}{1+\exp(z)}\)
A probit model assumes the standard normal cumulative distribution function: \(G(z)=\Phi(z)=\int_{-\infty}^{z}\phi(v)\,dv\)
Importantly, in an LPM, the coefficients have the usual interpretation.
But logit and probit models produce coefficients that are harder to interpret.
In fact, often we do not make any interpretation of these coefficients.
Instead, we usually transform them to arrive at an interpretation that is similar to what we have in LPM.
To make the magnitudes of probit and logit roughly comparable, we can multiply the probit coefficients by 1.6, or we can multiply the logit estimates by .625.
Also, the probit slope estimates can be divided by 2.5 to make them comparable to the LPM estimates.
After these adjustments, the interpretation of the logit and probit outputs is similar to the LPM’s.
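A minimal sketch on simulated data; rather than relying on the rules of thumb above, in practice we usually report marginal effects (get_margeff in statsmodels), which are directly comparable to LPM coefficients:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 1000
x = rng.normal(0, 1, n)
y = (0.5 * x + rng.logistic(0, 1, n) > 0).astype(int)  # true model is a logit
df = pd.DataFrame({"y": y, "x": x})

logit = smf.logit("y ~ x", data=df).fit(disp=0)
probit = smf.probit("y ~ x", data=df).fit(disp=0)

# Average marginal effects on P(y = 1)
print(logit.get_margeff().summary())
print(probit.get_margeff().summary())
```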
Tobit
Another problem in the dependent variable occurs when we have a limited dependent variable with a corner solution.
That is, a variable that equals zero for part of the population but takes on a wide range of positive values.
For instance, hours worked.
Nobody works less than zero hours, but individuals in the population can work any number of positive hours.
When we have this type of dependent variable, we need to estimate a tobit model.
Tobit
Tobit estimates can differ substantially from those of a linear model estimated by OLS.
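As far as I know, statsmodels does not ship a ready-made Tobit estimator, so below is a minimal hand-rolled maximum likelihood sketch for a dependent variable left-censored at zero, on simulated data:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(4)
n = 1000
x = rng.normal(0, 1, n)
y_star = 1.0 + 2.0 * x + rng.normal(0, 1.5, n)   # latent outcome
y = np.maximum(y_star, 0.0)                      # observed: censored at zero

def neg_loglik(params):
    b0, b1, log_s = params
    s = np.exp(log_s)                            # keeps sigma positive
    xb = b0 + b1 * x
    pos = y > 0
    ll = np.sum(stats.norm.logpdf(y[pos], xb[pos], s))   # uncensored observations
    ll += np.sum(stats.norm.logcdf(-xb[~pos] / s))       # censored observations
    return -ll

res = optimize.minimize(neg_loglik, x0=[0.0, 0.0, 0.0], method="BFGS")
print(res.x[:2], np.exp(res.x[2]))               # close to (1.0, 2.0) and 1.5
```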