Log and Exponential Transformations and Other Statistics notes:
Baseline Category in R
In categorical variables, the baseline category serves as the reference group against which the effects of other categories are measured. In R, when performing regression analysis with categorical variables, the first level of the factor is chosen as the baseline by default. However, this can be changed based on analytical needs or interpretability by releveling the factor so that another category serves as the reference.
Transformations: Log and Exponential Models
Transformations such as log and exponential are used to linearize relationships between variables, making linear regression models more applicable when the original relationship is non-linear.
- Log Transformation: Applying the log function to one or more variables can help in stabilizing variance and making the relationship between variables more linear. For example, a log transformation of the independent variable log(x) is useful when dealing with multiplicative effects.
- Exponential Transformation: An exponential transformation might be applied to the dependent variable to model exponential growth or decay processes. The exponential model can describe how changes in the independent variable have multiplicative effects on the dependent variable.
QQ Plot: Normality CheckA QQ (Quantile-Quantile) plot is a graphical tool to assess if a dataset follows a particular distribution, such as the normal distribution. If the points in a QQ plot lie roughly along a straight line, the data is considered to follow that distribution. Heavy tails suggest the presence of outliers or deviations from the assumed distribution.
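For a concrete sense of this check, here is a minimal R sketch (using the built-in mtcars data as a stand-in) that plots a fitted model's residuals against normal quantiles:

```r
# Fit a simple linear model on a built-in dataset (placeholder example)
fit <- lm(mpg ~ wt, data = mtcars)

# QQ plot of the residuals: points close to the reference line suggest normality
qqnorm(residuals(fit))
qqline(residuals(fit))
```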
Model Comparison
- R² (R-squared): A measure of the proportion of variance in the dependent variable that is predictable from the independent variables. It is used for comparing the goodness of fit for different models on the same data. Comparing R² across models with different dependent variables or datasets is not appropriate.
- AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion): Both are used for model selection among a finite set of models. They take into account the goodness of fit of the model and the complexity of the model, helping to balance between overfitting and underfitting.
- Log Model: The coefficient in a log-transformed regression can be interpreted as the percentage change in the dependent variable for a one percent change in the independent variable, holding other variables constant.
- Exponential Model: In an exponential model, the coefficient can be interpreted in terms of multiplicative effects on the dependent variable for a one-unit change in the independent variable.
- Heteroscedasticity Assumption Failure: Heteroscedasticity occurs when the variance of the error terms varies across levels of an independent variable, violating one of the key OLS assumptions. This can lead to biased estimates of standard errors, affecting confidence intervals and hypothesis tests.
- Robust Standard Errors: These are adjusted standard errors that account for heteroscedasticity, providing more reliable hypothesis testing. However, they correct only the standard errors for inconsistency; they do not correct bias in the coefficient estimates themselves.
- Omega (Ω) in Practice: In the context of heteroscedasticity, Ω represents the true variance-covariance matrix of the error terms, which is rarely known in practice. Various techniques, including robust standard errors and heteroscedasticity-consistent (HC) estimators, are used to approximate Ω without explicitly knowing it.
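To make the log-model interpretation above concrete, here is a minimal R sketch with simulated data; the variable names and the true elasticity of 0.8 are made up for illustration:

```r
set.seed(1)
x <- runif(200, 1, 100)
y <- exp(0.5 + 0.8 * log(x) + rnorm(200, sd = 0.2))  # simulated with elasticity 0.8

# Log-log model: the slope b is read as "a 1% increase in x -> about a b% change in y"
loglog <- lm(log(y) ~ log(x))
coef(loglog)
```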
OLS (Ordinary Least Squares) Regression
OLS regression is the most common method used for linear regression analysis. It aims to find the line (or hyperplane in higher dimensions) that best fits a set of data points. The "best fit" is determined by minimizing the sum of the squared differences between the observed values and the values predicted by the linear model. This method assumes that the errors (residuals) between the observed values and the model's predicted values are homoscedastic, meaning they have the same variance across all levels of the independent variables.
Key Characteristics of OLS:
- Assumes homoscedasticity (constant variance of errors).
- The estimates are obtained by minimizing the sum of squared residuals.
- Under the Gauss-Markov theorem, OLS estimators are the Best Linear Unbiased Estimators (BLUE) if the assumptions hold, including linearity, independence, and homoscedasticity of errors.
Key Characteristics of HC:
- Does not assume constant variance of errors across observations.
- Adjusts the standard errors of the OLS estimates to be consistent in the presence of heteroscedasticity.
- There are several versions of HC standard errors (e.g., HC0, HC1, HC2, HC3), with different adjustments for small sample sizes or other concerns.
- The regression model can be estimated using OLS to obtain parameter estimates.
- Then, HC standard errors are calculated to correct for heteroscedasticity, allowing for more reliable hypothesis testing and confidence intervals.
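A sketch of this two-step workflow in R, assuming the sandwich and lmtest packages are available (the model itself is a placeholder on built-in data):

```r
library(sandwich)  # assumed available: provides vcovHC() for HC covariance estimates
library(lmtest)    # assumed available: provides coeftest()

fit <- lm(mpg ~ wt + hp, data = mtcars)  # step 1: ordinary OLS estimates

# Step 2: same coefficients, but HC3 standard errors that allow for heteroscedasticity
coeftest(fit, vcov = vcovHC(fit, type = "HC3"))
```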
....................................
Setting the Baseline
When dealing with categorical variables, regression analysis assigns a baseline category as the reference point. In R, by default, the first level of a factor is chosen as the baseline. But worry not! You can easily adjust this based on your analysis to ensure better interpretability.
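For example, a minimal sketch of releveling in R; the data frame, factor, and level names here (df, region, "South") are made up:

```r
# Make "South" the baseline instead of whatever level happens to come first
df$region <- relevel(factor(df$region), ref = "South")

fit <- lm(price ~ region, data = df)
summary(fit)  # each region coefficient is now read relative to "South"
```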
Transforming for Linearity
The magic of linear regression lies in its ability to model linear relationships between variables. But what if your data isn't linear? Fear not! Log and exponential transformations come to the rescue. Log transformations help stabilize variance and create a more linear relationship, while exponential transformations model exponential growth or decay.
Understanding Baseline Interpretations
In the context of regression with categorical variables, the baseline interpretation refers to the expected change in the dependent variable (the variable you're trying to predict) when moving from the baseline category to another category, assuming all other variables remain constant.
Checking for Normality: The QQ Plot
The QQ (Quantile-Quantile) plot is a visual tool to assess if your data follows a normal distribution, a key assumption for linear regression. If the points in the QQ plot roughly follow a straight line, your data is considered normal. However, heavy tails suggest outliers or deviations from normality, potentially affecting your analysis.
Model Selection: Choosing the Best Fit
When it comes to model selection, several metrics help us choose the best fit for our data. Here are a few to remember:
- R-squared (R²): This metric indicates the proportion of variance in the dependent variable explained by the independent variables (predictors). It's useful for comparing models on the same data, but not across models with different dependent variables or datasets.
- AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion): These criteria help balance model goodness-of-fit with complexity, preventing overfitting (a model that performs well on training data but poorly on unseen data).
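A small R sketch of comparing two candidate models with these criteria (placeholder formulas on built-in data); lower AIC/BIC means a better trade-off between fit and complexity:

```r
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

AIC(m1, m2)  # compare Akaike information criterion
BIC(m1, m2)  # compare Bayesian information criterion
```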
Coefficients in log-transformed models represent the percentage change in the dependent variable for a one-percent change in the independent variable, holding other variables constant. On the other hand, coefficients in exponential models translate to multiplicative effects on the dependent variable for a one-unit change in the independent variable.
Heteroscedasticity: A Potential Roadblock
Heteroscedasticity occurs when the variance of the error terms (the difference between predicted and actual values) varies across levels of an independent variable. This violates a key assumption of linear regression and can lead to biased standard error estimates and unreliable inference.
Robust Standard Errors: A Reliable Alternative
To address heteroscedasticity, robust standard errors are used. These adjust the standard errors of the model, providing more reliable hypothesis testing even in the presence of non-constant variance.
Feature Engineering: Beyond Transformations
Sometimes, transformations like log or exponential might not fully capture the relationship between variables. This is where feature engineering comes in. It involves creating new predictors or transforming existing ones to better model the underlying relationships. Additionally, machine learning models can often handle non-linearity more flexibly, potentially reducing the need for transformations.
OLS Regression: The Workhorse
Ordinary Least Squares (OLS) regression is the most common method used for linear regression analysis. It aims to find the line (or hyperplane in higher dimensions) that best fits a set of data points, minimizing the sum of squared errors between observed and predicted values.
Beyond the Basics: Addressing Challenges
This post provides a foundational understanding of regression analysis. However, the journey doesn't end here! Here's a glimpse into some advanced topics you might encounter:
- HC (Heteroscedasticity-Consistent) Regression: This method tackles heteroscedasticity by adjusting the standard errors of the OLS estimates.
- Non-Linear Relationships: When data exhibits non-linearity, alternative models like survival models or classification models (e.g., logit and probit) might be better suited.
- Multi-Level Data: This refers to data with hierarchical structures, requiring specialized analysis techniques.
Check for Linearity:
normality
multicollinearity
Check for Non-Linearity:
GLM (Generalized Linear Model) assumptions:
-> The Y observations are independent.
-> The distribution of Y is from the exponential family.
-> A linear relationship between Y and X is not required.
-> A common (constant) variance is not required.
Parametric distributions (used for parametric survival estimation):
-> Weibull
-> Gompertz
-> Log-logistic
Maximum Likelihood:
likelihood function:
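For reference (a standard form, not from the class notes): with right-censored data the likelihood combines the density for observed events and the survival function for censored observations,

\[
L(\theta) = \prod_{i=1}^{n} f(t_i \mid \theta)^{d_i} \, S(t_i \mid \theta)^{1 - d_i},
\]

where \(d_i = 1\) if the event was observed for subject \(i\) and \(d_i = 0\) if that subject was censored.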
Non-Parametric Estimation:
Parametric equations alone may not reflect the actual situations, since they are based only on coefficients; non-parametric estimates are instead built directly from the observed data.
Semi-Parametric Estimation: the hazard rate lambda is modelled as a function of time t and coefficients B, lambda(t | X) = lambda0(t) * exp(XB).
Baseline hazard lambda0(t) (the hazard when all covariates are zero, so it does not depend on the covariates): when the multiplier exp(XB) > 1 it expands the baseline hazard, and when exp(XB) < 1 there is a contraction of the baseline.
Example: smoking expands the hazard of death whereas working out reduces it; the baseline hazard itself remains roughly the same.
Kaplan-Meier Analysis by groups:
Cox Proportional Hazards Model (semi-parametric): the parameters appear in only one part of the equation; the covariate effects are parametric while the baseline hazard is left unspecified.
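A minimal R sketch of both, assuming the survival package and a hypothetical data frame df with columns time, status (1 = event observed), group, and age:

```r
library(survival)  # assumed available

# Kaplan-Meier curves estimated separately for each group
km <- survfit(Surv(time, status) ~ group, data = df)
plot(km)

# Cox proportional hazards model: exp(coef) multiplies the baseline hazard
cox <- coxph(Surv(time, status) ~ group + age, data = df)
summary(cox)
```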
Multi-Level Data:
Example:
Level 1 -> Houses
Level 2 -> Subdivisions
Level 3 -> Counties
Level 4 -> Years
Level 5 -> Decades
Organize the data into levels based on similarity, so that we capture the averages and patterns within similar-level data; otherwise the averages would be skewed.
OATS Data:
Regions 1 through 6 in the US where the oats come from.
Random Effect Model:
Limitation: there can be situations where a house has its own subdivision with no other houses in it, so a group contains only a single observation.
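A sketch of a random-intercept model for this kind of nested data, assuming the lme4 package and a hypothetical houses data frame with price, sqft, subdivision, and county columns:

```r
library(lme4)  # assumed available

# Random intercepts for counties and for subdivisions nested within counties,
# so averages are pooled within similar groups instead of being skewed together
fit <- lmer(price ~ sqft + (1 | county/subdivision), data = houses)
summary(fit)
```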
## Generally, the accepted way to get rich is to use researched knowledge and experienced knowledge to generate a product that reaches, is useful to, and is replicable for a large-scale population.
Survival Models:
Alternate Names: duration analysis, failure analysis
Event: patient died, person got a job, loan was repaid (always binary)
Time Scale: years, months, weeks, days
Origin of the event:
Why can't we use linear regression?
a) The dependent variable and residuals are not normally distributed:
the data is not normal, the time of entry into the sample may follow a Poisson distribution, and Y is assumed to have a continuous probability distribution.
b) Y may be censored (incomplete, time series, binary):
the patient hasn't died at the current time and we don't know whether they will die at a future time; subjects may have dropped out (or we stopped tracking them) before the study ended.
Example: how do you find the probability of dying at time t, given that you have not died up to time t (assuming a maximum life span of 100 years)?
Answer: P(die at time t | still alive at t - epsilon) = P(die at time t) / P(survived up to time t)
Censored Data:
1) Patient died
2) Patient survived
3) Patient dropped out
4) Patient entered the study later
5) Patient died
6) Patient survived
Hazard Rate: the probability that the event happens at time t, given that it has not happened at any time < t.
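In symbols (a standard definition, added for reference), the hazard rate is the instantaneous event rate at time t conditional on survival up to t:

\[
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}
\]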
Popular Distributions:
1) Normal -> variance = sigma^2 | PDF (probability density function) = [1/(sigma*sqrt(2*pi))] * e^(-(x-mu)^2/(2*sigma^2))
2) Bernoulli -> single trial with only two outcomes (binary) -> yes/no -> PMF = p for success, 1-p for failure | by symmetry the distribution is centered when p = 1/2 | variance = p(1-p), at most 1/4
3) Binomial -> probability of observing a set of Bernoulli trials -> Pr(X = k) = C(n, k) * p^k * (1-p)^(n-k), where C(n, k) = n!/(k!(n-k)!) | parameters: n = total number of trials and p = probability of success; k = the number of successes observed
4) Poisson: failures are independent and they don't happen at the same time (example: how many smoke detectors fail at my home at the same time), assuming there are no other factors causing the failures, meaning the failures are independent.
5) Uniform Distribution: any number in the bucket between a and b is equally likely (the density is a flat line) | PDF = 1/(b-a) if a < x < b, and 0 otherwise | E(X) = (a+b)/2 and Var = (b-a)^2/12 | the standard uniform is on [0, 1] | the density does not increase or decrease across the interval.
Can a Von Neumann-type computer generate truly random numbers? The answer is no: when we set a random seed (e.g., seed = 42), it always generates the same set of "random" numbers, because a pseudo-random generator deterministically produces numbers (based on the uniform distribution) from the seed we assign.
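A quick R illustration of that point: re-seeding the generator reproduces exactly the same "random" draws.

```r
set.seed(42)
runif(3)   # three pseudo-random numbers

set.seed(42)
runif(3)   # identical to the three numbers above, because the generator is deterministic
```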
RGTI -> quantum computer block chain company.
Talking about vacation next week - > we want to
#missed first 5 minute of the class
Original Notes:
Baseline category in R: when producing the linear model (regression), R chooses the first factor level as the baseline, but we can change the baseline per our needs.
Transformations: Log and exponential models
Baseline interpretations: coefficients are read relative to the baseline; a one-unit change in a variable (or a switch away from the baseline category) is associated with an increase/decrease of magnitude x in the outcome.
We apply the log function to the independent variable to linearize the data (for QQ plot).
QQ Plot: Normality Check: heavy tails on the left and right indicate outliers. If the values on the QQ plot lie on the same straight line, then the data is normal.
Model Comparison:
R^2: a weaker comparison metric; only the R^2 of models fitted to the same data should be compared. Never compare the R^2 of two different models.
QQ plot: also a weak tool for comparing models.
Universal model comparison is done with AIC and BIC.
Interpretation of LOG and Exponential Model:
-> Both models are used to make the data look more linear.
-> Log model -> increase or decrease is in terms of 1%
-> Exponential model -> 1 Unit
log(y) = a + b*log(x)
Interpretation: if I increase x by 1%, y increases by b%. Example: demand-supply elasticity curve.
Feature Engineering: required when log and exponential transformations alone do not work, i.e., do not make the data linear on the QQ plot.
-> make new predictors (feature engineering)
-> Machine learning models deal with nonlinearity more naturally.
Note: we are trying to linearize the data (through transformations) otherwise the assumptions will fail.
Heteroscedasticity Assumption: if the assumption fails, the standard errors (and therefore the inference) are biased. Do we know omega in practice? No.
So, we assume an omega and see how the model works. We try to infer omega based on how the data responds.
Robust Standard Errors: they only fix the variance estimates (standard errors); they do not fix bias in the coefficients.
OLS (Ordinary Least Squares) and HC (Heteroscedasticity-Consistent) regression are both statistical methods used in econometrics and statistics for estimating the parameters of a linear regression model. Let's break down what each of these terms means and how they are applied:
OLS (Ordinary Least Squares) Regression
OLS regression is the most common method used for linear regression analysis. It aims to find the line (or hyperplane in higher dimensions) that best fits a set of data points. The "best fit" is determined by minimizing the sum of the squared differences between the observed values and the values predicted by the linear model. This method assumes that the errors (residuals) between the observed values and the model's predicted values are homoscedastic, meaning they have the same variance across all levels of the independent variables.
Key Characteristics of OLS:
- Assumes homoscedasticity (constant variance of errors).
- The estimates are obtained by minimizing the sum of squared residuals.
- Under the Gauss-Markov theorem, OLS estimators are the Best Linear Unbiased Estimators (BLUE) if the assumptions hold, including linearity, independence, and homoscedasticity of errors.
Key Characteristics of HC:
- Does not assume constant variance of errors across observations.
- Adjusts the standard errors of the OLS estimates to be consistent in the presence of heteroscedasticity.
- There are several versions of HC standard errors (e.g., HC0, HC1, HC2, HC3), with different adjustments for small sample sizes or other concerns.
- The regression model can be estimated using OLS to obtain parameter estimates.
- Then, HC standard errors are calculated to correct for heteroscedasticity, allowing for more reliable hypothesis testing and confidence intervals.
It is not uncommon to see seemingly better fit statistics (e.g., R^2) from plain OLS than from HC models with robust standard errors.
STEPS:
1) Estimate the basic OLS.
2) Obtain the fitted values from the least-squares fit and compute the weights.
3) Pass the weights to a new lm() call.
Weighted Least Squares (WLS): the procedure above, with weights built from the fitted squared residuals (see the sketch below).
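A sketch of these steps in R; modelling the log squared residuals to estimate the variance function is an assumed choice for illustration, and the model itself is a placeholder on built-in data:

```r
# Step 1: basic OLS
ols <- lm(mpg ~ wt, data = mtcars)

# Step 2: estimate how the error variance changes, then form the weights
aux <- lm(log(residuals(ols)^2) ~ wt, data = mtcars)
w   <- 1 / exp(fitted(aux))          # weights = inverse of the fitted variance

# Step 3: pass the weights to a new lm() call (weighted least squares)
wls <- lm(mpg ~ wt, data = mtcars, weights = w)
summary(wls)
```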
Generalized Least Squares (GLS): more general; when omega is the identity (all weights equal to 1), it becomes OLS.
Flexible GLS: reduces the steps of OLS (a + bx). If you nail the right omega function, this method works very well.
CLASSIFICATION MODELS:
Logit and Probit models:
Predicting coronary heart disease: we want to classify higher chances of heart disease from behavioural and medical variables. The outcome is binary.
Notice that the LM (linear model) runs without warning. However, the residual plot will not be normal, and the beta values of all variables will be very small, indicating little correlation.
The inference will be misleading even though the probability may be high. The regression line will be straight, while the observed outcomes sit on the top and bottom horizontal lines (1 and 0) in the chart.
Beta ->
LOGIT: popular in health science. Logistic regression. Susceptible to heteroscedasticity. ln(P/(1-P)) = B1 + B2*X2 + ... + Bk*Xk
PROBIT: popular in econometrics and political science. Robust to heteroscedasticity. Based on integrating the normal distribution; the higher XB is, the more likely the event is to happen.
X does not have a constant effect on Y.
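A minimal sketch of fitting both in R; the data frame heart and its columns (chd, age, smoker, chol) are hypothetical:

```r
# Hypothetical data: chd is a 0/1 indicator of coronary heart disease
logit_fit  <- glm(chd ~ age + smoker + chol, data = heart,
                  family = binomial(link = "logit"))
probit_fit <- glm(chd ~ age + smoker + chol, data = heart,
                  family = binomial(link = "probit"))

summary(logit_fit)    # coefficients are on the log-odds scale
exp(coef(logit_fit))  # odds ratios for a one-unit change in each predictor
```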
Odds vs probabilities:
p = 0 -> odds = 0 (odds express the chance of the event relative to the chance of it not happening; probability is the proportion out of all outcomes)
w = P/(1-P) -> formula for odds
Observation alone is not an indicator of causality.
Goodness of fit: pseudo-R^2 (rather than R^2).
Assumptions of the LOGIT model:
1) Non-linear (logit) transformation of the probability
2) No multicollinearity
Accuracy using train and test data:
Are false positives the same as false negatives?
Accuracy = (TP+TN)/(TP+FP+TN+FN) | diagonal / all
Precision = TP/(TP+FP) | right column
Sensitivity = TP/(TP+FN) (same as recall)
Recall = TP/(TP+FN)
Confusion matrix:
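A sketch of building the confusion matrix and the metrics above in R, reusing the hypothetical logit_fit from earlier and a hypothetical hold-out set test:

```r
# Predicted class at a 0.5 probability cutoff (hypothetical model and test data)
pred <- ifelse(predict(logit_fit, newdata = test, type = "response") > 0.5, 1, 0)
cm   <- table(Predicted = pred, Actual = test$chd)
cm

TP <- cm["1", "1"]; TN <- cm["0", "0"]
FP <- cm["1", "0"]; FN <- cm["0", "1"]

accuracy  <- (TP + TN) / sum(cm)
precision <- TP / (TP + FP)
recall    <- TP / (TP + FN)   # also called sensitivity
```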
The problem of an unbalanced sample:
example: identifying 1 terrorist among a million people,
or identifying fraudulent transactions among a million legitimate transactions.
How do we work with multiple classes (not just binary 0 and 1)? (see the sketch after this list)
-> multinomial logit or ordered logit (depending on the dependent variable)
-> logistic regression assumes a Bernoulli distribution with no ordering between outcomes
-> multinomial logit assumes a multinomial distribution, with the class probabilities (e.g., three of them) adding up to 1
-> ordered logit assumes an ordering of the DV and uses cumulative events for the log-odds computation
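A closing sketch of both extensions in R, assuming the nnet and MASS packages and a hypothetical customers data frame:

```r
library(nnet)  # assumed available: multinom() for multinomial logit
library(MASS)  # assumed available: polr() for ordered logit

# Multinomial logit: unordered outcome (hypothetical 'segment' factor)
multi_fit <- multinom(segment ~ income + age, data = customers)

# Ordered logit: outcome with a natural ordering (low < medium < high)
customers$rating <- factor(customers$rating,
                           levels = c("low", "medium", "high"), ordered = TRUE)
ord_fit <- polr(rating ~ income + age, data = customers, method = "logistic")
summary(ord_fit)
```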