FALL 2004 GUIDE 7: REGRESSION BASICS
DR SUSAN CAROL LOSH

Yes, it is true. Time really flies! Here is what the rest of the Fall term looks like:
| Assignment 4 | November 12 (Friday 3 PM my box) |
| Assignment 5 (NEW DATE!!!) | November 29 (Monday) |
| Exam 3 | December 8 (Wednesday 5:30) |
READ THIS GUIDE FIRST!
KEY TO: Agresti and Finlay, Chapter 9, pp 301-342; Chapter 11, pp 382-404 and pp 411-421
I treat bivariate and multiple regression as a comprehensive unit because I believe it is easier to learn this way. Therefore, it is a good idea to read through Guides 7 and 8, which translate much of this material, then go back and read both selections from Agresti and Finlay's Chapters 9 and 11.
Multiple crosstabulation or contingency tables can take us only so far.
The resulting predictive regression coefficients are net effects, controlling for every other independent variable in the regression equation.
This is one terrific technique! IF YOU CAN MEET THE ASSUMPTIONS THAT ARE REQUIRED TO USE IT.

A WEIGHTY EXAMPLE TO START US OFF
You are a nutritionist--or a sports psychologist--or a medical doctor--or perhaps just a weight watcher--who is trying to predict the weights of a group of adult women enrolled in Educational Psychology courses. You have measured the weight in pounds of each woman enrolled in these classes. (You have also measured several other variables, but we will get to those in a little bit.)
With no other information, your best guess for each woman's weight will be the average or mean weight in pounds of all enrolled women.
Obviously, with no other information to go by, you will be wrong much of the time as you try to predict each woman's weight. Mean weight gives us very little information about how much any one individual weighs. For weights close to the female mean, you will be reasonably accurate, but if there is any kind of variation in weight across your sample of women, you will make many, many mistakes or errors.
How much will you be wrong (on the average)? We can measure that with an old friend: the standard deviation of weight. That is the "average distance" of each woman's weight from the mean weight score. It is also your average "error" or "mistake" in guessing weight if you use the mean weight score as a predictor.
Actually, we are going to use a "cousin" of the standard deviation. This time, instead of an average deviation, we look at a total error estimate for the entire sample, which we call "the total sum of squared deviations around the mean," or the "Total Sum of Squares," or just TSS for short. You met the total sum of squares in Guide 3 on the way to learning the standard deviation.
All we do is add up the entire set of squared deviations between each individual score and the mean score, as you see below. We will use "y" to represent our dependent variable score and "ȳ" (or "y-bar") for its mean.

TSS = Σ (y - ȳ)²
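To make the Total Sum of Squares concrete, here is a minimal sketch in Python. The five weights are made-up illustration values (they are not data from this Guide), and the standard deviation is assumed to use the n - 1 denominator.

```python
import math

# Hypothetical weights in pounds for five women (illustration only).
weights = [115, 150, 125, 130, 120]

# Best guess with no other information: the mean weight.
mean_weight = sum(weights) / len(weights)

# Total Sum of Squares: square each deviation from the mean, then add them up.
tss = sum((y - mean_weight) ** 2 for y in weights)

# The standard deviation is the "cousin": average the squared deviations
# (assuming the n - 1 denominator) and take the square root.
std_dev = math.sqrt(tss / (len(weights) - 1))

print(mean_weight, tss, round(std_dev, 2))
```

TSS is the total error you make when the mean is your only predictor; the standard deviation re-expresses it as a typical error per person.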
Aha! A jolly statistician comes to your rescue to reduce your errors in prediction.
"Did you also measure each woman's height in feet and inches?" she asks.
"Of course," you respond.
"Then," she says, "I have a formula for you that will help you predict weight better than simply using the mean weight score alone, and here it is:
Weight in pounds = 100 pounds for the first five feet of height + 5 pounds per inch over five feet.
That's pretty verbose, so let's call the number of inches of height over five feet "x" and pounds of weight "y". Then we can rewrite our formula very simply as:
y = 100 + 5x
Now, let's see if you can do a better job predicting weight."
Oops. We left some things out.
After all, do all five feet tall women weigh 100 pounds? Of course not, some weigh more and some weigh less.
Do all five feet five inches tall women weigh 125 pounds? No, some do, but some weigh more and some weigh less.
So we really need to rewrite the equation this way:
y = 100 + 5x + e
Where "e" stands for the ERROR TERM, or the difference between the woman's actual observed weight and the weight we would predict for her if we knew her height score.
The "e" term helps us balance the equation exactly so that the left hand side and the right hand side of the equation match. Sometimes it is called the RESIDUAL TERM (see below).
We can write the equation more generically still as:
y = a + bx + e
NOTE: these are terms that are typically used with SAMPLE data. Very often, with POPULATION data, you will see the Greek letters used, a = α and b = β, or:
y = α + βx + ε
where "a" (or alpha) is the "intercept term" and "b" (or beta) is the "slope" term and "e" (or epsilon) is the "error" term. We can graph "perfect" height and weight with no errors in prediction this way in the following graph. The formula creates a straight line.
The "intercept term" or "constant" term ("a") is where the line crosses the "y" axis or the dependent variable score. It is the Y score we would expect if the X score were "0". For example, if a woman were exactly five feet tall, her "x" score for the number of inches over five feet would be zero and she would weigh exactly 100 pounds.
The "b" term is a slope. It is the rate of increase (or decrease) in the dependent variable for a ONE UNIT CHANGE IN THE INDEPENDENT VARIABLE.
The "b" term tells us how many units to go up or down in the dependent variable for a one unit change in the independent variable.
IN OUR EXAMPLE, FOR "X" or "Height," ONE UNIT IS ONE INCH OF HEIGHT (over five feet). The slope term, weight change or "b" term is five POUNDS.
IMPORTANT: The b terms ARE ALWAYS IN THE UNITS OF THE DEPENDENT VARIABLE. In my example, the bs will always come out as pounds.
Thus, for each additional inch of height over five feet tall, the woman weighs five more pounds.
PLEASE BECOME USED TO SAYING THESE RESULTS IN WORDS. YOU WILL HAVE TO DO SO ON ASSIGNMENT 5 AND ON EXAM 3.
[Graph: y or weight in pounds plotted against x or height in inches (5 feet plus); every point falls exactly on the line y = 100 + 5x.]
If you were to draw a line connecting these points, it should look like a straight line, with all the dots or "x"s on the line. This is an example of a "perfect linear relationship."
We call the b (or β in the population) term the METRIC REGRESSION COEFFICIENT. Notice that the metric b term always comes out in the metric units of the dependent variable, such as pounds.
In our example here, b always comes out IN POUNDS and weight in pounds is our dependent variable, or what we are trying to predict.
If you were trying to predict Graduate Record Exam scores, the metric b term would be in metric units of GRE points.
Sometimes the metric regression coefficient is called the "unstandardized" regression coefficient. Use the metric coefficient when you want to make a definitive prediction (e.g., a person's weight in pounds or the dollars of a person's salary income).
In the "real world," we probably can't perfectly predict weight from height. Instead of connecting the points to form a straight line, we are more likely to have a "cloud of points" with a certain amount of variability around the line. Each point represents a combined height and weight score.
[Graph: y or weight in pounds (100 to 150, in 5-pound steps) plotted against x or height in inches (5 feet plus, from 5' to 5'10"). The X marks trace the straight line from 100 pounds at 5' to 150 pounds at 5'10"; dots scattered on either side of the line form the cloud of points.]
Even with some variability, though, we do suspect that ON THE AVERAGE we will do a better job predicting each woman's weight if we know her height than if we had only the weight scores and the mean weight.
What do I mean by "a better job"? The primary way statistical analysts do it is to look at the difference between the actual weight scores and the predicted weight scores -- or the "error term" e.
Remember the error term a little ways back? The DIFFERENCE BETWEEN THE OBSERVED AND THE PREDICTED SCORE?
For example, if we had a woman who actually weighed 130 pounds and we had a predicted weight score (using height) of 125 pounds we would have 130 - 125 or a deviation score of "5".
If we had a woman who actually weighed 120 pounds and we had a predicted weight score (using height) of 125 pounds we would have 120 - 125 or a deviation score of "- 5".
One simple symbolic way to describe the difference (deviation) between the OBSERVED SCORE and the PREDICTED (ESTIMATED) SCORE is to use:
| Observed score | y |
| Predicted (estimated) score | ŷ (or "y-hat") |
(Sometimes you will see the predicted score written as y' or as "y-prime".)
We then call the difference between the observed and the estimated (or predicted) score:
e = y - ŷ
The "e" or ERROR TERM gives us some flexibility because we know that we probably will not be able to EXACTLY predict the dependent variable.
In the population, e will be denoted by an epsilon or "ε" term.
Sometimes the "e" term is called THE RESIDUAL TERM, because the "e" term is what is "left over" when you have done your best job of predicting the dependent variable and produced an estimated dependent variable score for each person.
Another way to think about the residual is as a deviation from the regression line or plane like the scattered points around the regression line in my box above. More examples of residuals are provided for you below as I extend the example about height and weight. Graphically, the residual is a vertical deviation from the predicted weight score.
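The two residual examples from this section (observed weights of 130 and 120 pounds against a predicted 125 pounds) can be sketched in Python. The function names are my own, chosen only for illustration.

```python
def predict_weight(inches_over_five_feet):
    """Predicted weight in pounds from the formula y-hat = 100 + 5x."""
    return 100 + 5 * inches_over_five_feet


def residual(observed, predicted):
    """e = observed score minus predicted score."""
    return observed - predicted


# A woman who is 5'5" tall has x = 5 inches over five feet.
predicted = predict_weight(5)

e_heavier = residual(130, predicted)  # she weighs more than predicted
e_lighter = residual(120, predicted)  # she weighs less than predicted

print(predicted, e_heavier, e_lighter)
```

A positive residual means the woman weighs more than the equation predicts; a negative residual means she weighs less.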
There are a lot of regression assumptions about the distribution of the residual terms (do they resemble a normal distribution? are they clearly some other kind of distribution? do they incorporate some type of statistical bias?) that we will examine in Guide 8.
We can then find an average deviation score and compare the average "e" deviation with the standard deviation. If height is a good predictor of weight, then the average deviation between the observed weight and the estimated weight will be very small (the estimated weight will be very close to the observed weight), and this average "e" deviation will be much smaller than the standard deviation.
To ascertain the TOTAL ERRORS for all our cases put together, we look at the sum of the squared differences between the observed and the predicted dependent variable score for each individual (the "unexplained" or residual sum of squares).
Σ (y - ŷ)²
REMEMBER THIS PHRASE! We call the sum of the squared differences between the observed and predicted scores on the dependent variable for each individual (what you see in the box immediately above):
The "Sum of Squared Error" or SSE (THE AGRESTI AND FINLAY TERM)
Makes sense: this is the total sum of all the squared residuals. It is the variation in the dependent variable that all the independent variables put together cannot account for or "explain." Hopefully, it is a form of random error or variation from one person to another.
ALTERNATIVE NAMES FOR THE SUM OF SQUARED ERROR:
Now comes the hard part. This useful diagnostic entity has several different names. It has been called:
"the residual sum of squares" (we don't want that one. Too easy to confuse with the "regression sum of squares.")
"the error sum of squares" (we don't want that one either; we reserve the abbreviation "ESS" for the explained sum of squares).
the "unexplained sum of squares" (USS), which is one of the more popular terms.
So don't be surprised if you run across any of these three terms instead as you read or attend conferences.
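To see how the Sum of Squared Error compares with the Total Sum of Squares, here is a small Python sketch. The heights and weights are hypothetical illustration values, and the predictions come from the height formula y = 100 + 5x used earlier in this Guide.

```python
# Hypothetical data: inches over five feet and observed weight in pounds.
heights_over_5ft = [0, 2, 4, 5, 8]
observed = [103, 108, 122, 125, 137]

# Predicted weights from y-hat = 100 + 5x.
predicted = [100 + 5 * x for x in heights_over_5ft]

# TSS: squared errors when the mean weight is the only predictor.
mean_weight = sum(observed) / len(observed)
tss = sum((y - mean_weight) ** 2 for y in observed)

# SSE: squared errors (residuals) left over after using height.
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))

print(tss, sse)
```

With these made-up numbers, knowing height leaves far less squared error than using the mean alone, which is what "a better job" of predicting means here.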
EXTENDING OUR WEIGHTY EXAMPLE
With simple regression, you have ONE independent variable. With multiple regression, you have AT LEAST TWO independent variables.
In fact, if you look at the Berkeley SDA program, it allows you to enter 52 independent variables (but I recommend you do not try this at home).
Surely, height is not the whole story for weight, or we would be able to predict each woman's weight perfectly by knowing her height score.
So, let's examine some other possible predictors. How about:
100 pounds for the first 5 feet
add 5 pounds for each inch over 5 feet
add 10 pounds for each 1/2 inch wrist measurement over 6 inches
(subtract 10 pounds for each 1/2 inch wrist measurement under 6 inches)
OR
100 + (5 X 4) + (10 X 0) + (2 X 1.5) - (1 X 7)
OR
100 + 20 + 0 + 3 - 7 = 116 predicted pounds of weight
Her observed weight is 115 pounds, so her "e" or residual score is the difference between her observed score (115 pounds) and her predicted score (116 pounds): 115 - 116 = - 1 pound.

* 2 half-inch increments over a 6 inch wrist
OR
100 + (5 X 4) + (10 X 2) + (2 X 3) - (1 X 1)
OR
100 + 20 + 20 + 6 - 1 = 145 predicted pounds of weight
Her observed weight is 150 pounds and her predicted weight is 145 pounds, so her "e" or residual score is: 150 - 145 = + 5 pounds.
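The two worked predictions above can be checked with a short Python sketch. The scores for each woman are taken directly from the arithmetic shown (4, 0, 1.5, 7 and 4, 2, 3, 1); the variable names, and my reading of the third and fourth scores as the kilo calorie and exercise measures, are assumptions made for illustration.

```python
INTERCEPT = 100
# Slopes for: inches over five feet, half-inch wrist increments over 6 inches,
# and the third and fourth predictors (assumed to be the kilo calorie and
# exercise measures, in whatever units the example uses).
SLOPES = (5, 10, 2, -1)


def predict(scores):
    """Add the intercept to each slope times its predictor score."""
    return INTERCEPT + sum(b * x for b, x in zip(SLOPES, scores))


woman_1 = (4, 0, 1.5, 7)
woman_2 = (4, 2, 3, 1)

predicted_1 = predict(woman_1)
predicted_2 = predict(woman_2)

# Residual = observed weight minus predicted weight.
residual_1 = 115 - predicted_1
residual_2 = 150 - predicted_2

print(predicted_1, residual_1, predicted_2, residual_2)
```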
Notice that even after adding all these predictors, we STILL have not predicted each woman's weight perfectly. However, we should do much better than we would have done knowing only each woman's height, and we will almost certainly do better than if we knew only the mean weight score for the total group of women.
So, once again, we will have a new error term for each woman, that represents the difference between her observed weight and the weight that we predicted for her knowing her height, wrist measurement, kilo calories per day and weekly exercise habits. And we can write this exactly the same way as we did for using only one predictor, that is:
We still call the difference between the observed and the estimated (predicted) score:
e = y - ŷ
This, again, is the RESIDUAL SCORE for each woman. Each person we study has a unique "e" or residual score.
For example, if woman #1 really weighs 115 pounds, her error "e" score = 115 - 116 or - 1 pound. She weighs one pound LESS than expected.
If woman #2 really weighs 150 pounds, her error "e" score = 150 - 145 or 5 pounds. She weighs 5 pounds MORE than expected.
However, now that we have included three additional predictors, the regression equation (or prediction equation) looks more complicated. Generically, we can now write this equation as you see below (remember, for a population we would use Greek letters instead of the a and bs):
y = a + b1x1 + b2x2 + b3x3 + b4x4 + e (observed)
ŷ = a + b1x1 + b2x2 + b3x3 + b4x4 (estimated)
The difference is that the observed equation includes the "e" term and the estimated equation drops the "e" term. The estimated equation is more generic, what we typically use when we refer to the entire sample.
We are now working with multiple regression. In multiple regression, you have several xs and bs in the equation, one for each INDEPENDENT variable (you still have only ONE DEPENDENT variable.).
And, for our specifics in this weighty example, it becomes:
y = 100 + 5x1 + 10x2 + 2x3 + (-1)x4 + e (observed)
where:
y = observed weight in pounds
x1 = number of inches over five feet (negative for under five feet in height)
This whole process is what we call STATISTICAL CONTROL. You do not need to physically separate your case base into distinct groups as you did with multivariate contingency tables. Instead, as we shall shortly see, you will control through the covariances (a form of correlation) that each variable has with every other variable -- the relationships between your dependent variable (weight in this example) and each of your independent variables in turn -- as well as the correlations that your independent variables have with one another.
Sound immensely complicated? Well...it's not exactly easy, but as we take each piece of the regression separately, it is manageable knowledge. While there are computational formulae to solve for the bs if you have one or two independent variables, more complex models use matrix algebra or partial derivatives from calculus.
But first, let's see what data assumptions must be met to use regression analysis in the first place.
In regression analysis, we make numeric predictions about scores on a numeric dependent variable.
We also use our independent variables to make precise "more than" or "less than" statements about values on the dependent variable. For example, we say that people who eat more weigh more -- and we then make a precise prediction about how many more pounds someone will weigh for each kilo calorie consumed daily.
This means that:
The regression equation (like our height, bone structure, kilo calories, exercise and weight example above) defines the "best fitting" straight line or n-dimensional geometric plane to describe our data.
"a" and "b" (or the "bs") are chosen to minimize the total squared error. That is, the regression line or plane is calculated so that the sum of the squared error terms (the SSE) is as small as possible for a particular data set. We will examine the formulas that produce the intercept and slope terms in Guide 8. They are cumbersome to calculate, but that is what computers are for.
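As a preview of what the computer does, here is a minimal Python sketch of the standard least-squares solution for ONE independent variable. It is not necessarily laid out the way Guide 8 presents the formulas, and the data are hypothetical illustration values.

```python
def fit_simple_regression(xs, ys):
    """Return (a, b), the intercept and slope that minimize the SSE."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sum_squared_error(xs, ys, a, b):
    """SSE for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: inches over five feet and weight in pounds.
heights = [0, 2, 4, 5, 8]
weights = [103, 108, 122, 125, 137]

a, b = fit_simple_regression(heights, weights)
sse_fitted = sum_squared_error(heights, weights, a, b)
sse_rule_of_thumb = sum_squared_error(heights, weights, 100, 5)

print(round(a, 2), round(b, 2), round(sse_fitted, 2), sse_rule_of_thumb)
```

For these made-up scores, the fitted line has a smaller SSE than the jolly statistician's y = 100 + 5x, because least squares picks the line with the smallest possible SSE for the data at hand.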
| When we have | We call this |
| One independent and one dependent variable | SIMPLE REGRESSION |
| At least two independent and one dependent variable | MULTIPLE REGRESSION |
You originally learned Pearson's r as the correlation coefficient between one interval (or ratio) variable and a second interval (ratio) variable. If you revisit the original complex formula for Pearson's r, here it is:
r = Σ (x - x̄)(y - ȳ) / √[ Σ (x - x̄)² Σ (y - ȳ)² ]
(√ means to take the square root.)
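Here is a minimal Python sketch of Pearson's r in its deviation-score form (sum of cross-products over the square root of the product of the two sums of squares). The function name and the data are hypothetical, for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviation scores around each variable's mean."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: inches over five feet and weight in pounds.
heights = [0, 2, 4, 5, 8]
weights = [103, 108, 122, 125, 137]

r = pearson_r(heights, weights)
print(round(r, 3))
```

A perfect linear relationship, such as weights that follow y = 100 + 5x exactly, gives r = 1.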
But there are some other ways to conceptualize Pearson's r or ρ (in the population).
We can also see Pearson's r as:
| When you have | ONE INDEPENDENT VARIABLE (and one dependent variable) | AT LEAST TWO INDEPENDENT VARIABLES (and one dependent variable) |
| Use: | r | R |
| Call it | The zero order correlation; the bivariate correlation | The multiple correlation coefficient; the multivariate correlation |
Recall from Guide 5 that R² (or r²) is THE PRE measure.
R² tells us how precisely we can predict scores on the dependent variable from scores on at least one independent variable. As we will see in Guide 8, R² tells us how much of the variation we can explain in the dependent variable, knowing the scores of all the independent variables in the regression equation.
R² tells us how close the data are to the regression line or regression plane that you can draw with the regression equation.
Use the strength chart from Guide 5 to evaluate R² from very weak to very strong (R² is never negative).
R² X 100 is often called the percent variance explained in the dependent variable.
When you are asked, "how much variation did you explain in the dependent variable?", your answer will be the value of R².
(SMALL NOTE: There is also something called the "adjusted" R². The adjusted R² is adjusted for the number of independent variables or predictors. What happens is that the "adjusted" R² "shrinks" in size if you include many independent variables which have trivial correlations with the dependent variable. With three predictors, the values of the "adjusted" R² and the R² will probably be about the same.)
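Here is a rough Python sketch of R² as a proportional reduction in error, (TSS - SSE) / TSS, together with the usual adjusted R² formula, 1 - (1 - R²)(n - 1) / (n - k - 1), where n is the number of cases and k the number of predictors. The data are hypothetical, and the predicted scores are simply taken as given from y = 100 + 5x rather than from a fitted regression.

```python
def r_squared(observed, predicted):
    """Proportional reduction in error: 1 - SSE / TSS."""
    mean_y = sum(observed) / len(observed)
    tss = sum((y - mean_y) ** 2 for y in observed)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return 1 - sse / tss


def adjusted_r_squared(r2, n_cases, n_predictors):
    """Shrink R-squared to account for the number of predictors."""
    return 1 - (1 - r2) * (n_cases - 1) / (n_cases - n_predictors - 1)


# Hypothetical observed weights and predictions from y-hat = 100 + 5x.
observed = [103, 108, 122, 125, 137]
predicted = [100, 110, 120, 125, 140]

r2 = r_squared(observed, predicted)
adj_r2 = adjusted_r_squared(r2, n_cases=len(observed), n_predictors=1)

print(round(r2, 3), round(adj_r2, 3))
```

Multiplying the first number by 100 gives the percent of variation explained in these illustrative weights.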
The partial correlation coefficient is the correlation between an independent variable and the one dependent variable, statistically controlling for at least one other independent variable.
You actually had an indirect introduction to the partial correlation coefficient when you examined the partial subtables in the three-way crosstabulation assignment. However, in the case of regression, you have ONLY ONE PARTIAL CORRELATION COEFFICIENT PER INDEPENDENT VARIABLE (per equation). In the case of regression, your partial correlation coefficient is a kind of weighted average across all the values of the control variable (the second independent variable).
I am introducing a lot of new terms here, so you may want to pause, look back over this section, and ensure that you are comfortable with what is meant by each of them.
In REGRESSION, we try to predict scores on an interval-ratio (that is, numeric) dependent variable using numeric independent variables. We are about to break one of our regression assumptions for a very special case.
Normally, our independent variables are ALSO numeric. However, we can include ordinal or nominal variables as independent variables in regression IF and only if these variables take what is called "dummy variable" form.
Contrary to their name, dummy variables are very smart variables indeed.
A dummy variable is a dichotomized variable that can ONLY take on the scores of 0 or 1. One value of the dichotomy is scored "1"; the other value is scored "0".
Mathematically, this enables us to do several things:
ŷ = 100 + 5x1 + 10x2 + 2x3 + (-1)x4 + 5D1
where: D1 = 1 if the person eats dessert and D1 = 0 if the person does not.
What happens to this estimated equation for the cases scored "0" on eating dessert?
What happens to this estimated equation for the cases scored "1" on eating dessert?
(Without the "e" or error term at the end of the equation, it is an ESTIMATED [or approximate] prediction equation.)
For simplicity's sake the ONLY variable I will touch in this equation is our "dummy variable" for "eats dessert." All the other terms are just as shown in the box immediately above.
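| Person eats dessert, D1 = 1 | ŷ = 100 + 5x1 + 10x2 + 2x3 + (-1)x4 + 5(1) |
| Since 1 X "anything" = "anything", 5 x 1 = 5 | ŷ = 100 + 5x1 + 10x2 + 2x3 + (-1)x4 + 5 |
| Non-dessert eater, D1 = 0 | ŷ = 100 + 5x1 + 10x2 + 2x3 + (-1)x4 + 5(0) |
| Since 0 X "anything" = 0, 5 x 0 = 0 | ŷ = 100 + 5x1 + 10x2 + 2x3 + (-1)x4 |
For DESSERT EATERS ONLY, we can rearrange the terms in the regression equation (notice there is no "X" or independent variable category associated with D1). So our comparison now looks like this:
| Person eats dessert, D1 = 1 | ŷ = 100 + 5 + 5x1 + 10x2 + 2x3 + (-1)x4 |
| Add 100 + 5 for dessert-eaters | ŷ = 105 + 5x1 + 10x2 + 2x3 + (-1)x4 |
| And the equation for non-dessert eaters is just: | ŷ = 100 + 5x1 + 10x2 + 2x3 + (-1)x4 |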
What this means is that THE INTERCEPT (or "b0" term) will be 5 pounds higher for our dessert eaters. No matter what else they do, all that sugar and fat catches up with them and they will weigh 5 pounds more on the average than non-dessert eaters, all other things (height, bone structure, kilo calories, exercise) equal.
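The intercept shift can be sketched in Python. The slopes are the ones from the weight example earlier in this Guide plus the 5 pounds for eating dessert; the function and argument names are my own, and the predictor scores are illustration values.

```python
def predict_weight(x1, x2, x3, x4, eats_dessert):
    """Estimated weight with a 0/1 dummy variable for eating dessert."""
    d1 = 1 if eats_dessert else 0
    return 100 + 5 * x1 + 10 * x2 + 2 * x3 + (-1) * x4 + 5 * d1


# Two people with identical scores on every other predictor.
dessert_eater = predict_weight(4, 0, 1.5, 7, eats_dessert=True)
non_dessert_eater = predict_weight(4, 0, 1.5, 7, eats_dessert=False)

# The gap is the dummy variable's b coefficient, whatever the other scores are.
print(dessert_eater, non_dessert_eater, dessert_eater - non_dessert_eater)
```

Whatever scores you plug in for the other predictors, the difference between the two predictions stays at 5 pounds, which is why the dummy coefficient acts as a shift in the intercept.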
Here are a couple of questions you should ask about dummy variables:
1. My variable has several categories. How should I collapse it down to just two categories, scored 0 and 1?
2. How do I decide which category should be scored 1 and which one should be scored 0?
First, you CAN create more than one dummy variable from a single nominal or ordinal variable. However, your results will be more complicated to interpret. If you believe that one group or category in your variable has something unique, you are better off just dichotomizing because it will be easier to interpret your results.
However, if you have k categories, where "k" is the number of categories in your nominal or ordinal variable, you can create NO MORE THAN k - 1 dummy variables from this. For example, if your nominal or ordinal independent variable has 3 categories, you can only create 2 dummy variables.
One category must ALWAYS be scored zero, no matter how many other categories you have (in a two category variable, it's easy, one category is scored 1 and the other is scored 0).
Let's look at the example below. You have three categories for the variable "marital status": NEVER married, EVER married (divorced, separated, widowed) or CURRENTLY married. You decide that you will compare the other marital status groups with those who are currently married.
Since you have 3 values or categories, you can create TWO dummy variables. The category "currently married" will be scored zero for both dummy variables.
The "currently married" will be the OMITTED or the REFERENCE CATEGORY for the two dummy variables, "never married" and "ever married".
| DUMMY VARIABLE | D1 (never married) | D2 (ever married) |
| CATEGORY SCORE FOR THE: | | |
| Currently Married | 0 | 0 |
| Never Married | 1 | 0 |
| Ever Married | 0 | 1 |
The "never married" are coded 1 on the "never married" dummy variable, D1. Everyone else is coded zero.
The "ever married" are coded 1 on the "ever married" dummy variable, D2. Everyone else is coded zero.
The "currently married" are our reference group in this example and are ALWAYS coded zero on BOTH dummy variables, D1 and D2.
You would do this if you had reason to believe that people who were never married and people who had once been married (but weren't any more) differed in some way from the people who were currently married.
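The marital status coding described above can be sketched in Python: three categories yield k - 1 = 2 dummy variables, with the currently married as the reference category. The function name and the category strings are hypothetical labels.

```python
def marital_dummies(status):
    """Return (D1, D2) for one person; "currently married" is the reference."""
    d1 = 1 if status == "never married" else 0
    d2 = 1 if status == "ever married" else 0
    return d1, d2


for status in ("currently married", "never married", "ever married"):
    print(status, marital_dummies(status))
```

No third dummy variable is needed: anyone scored zero on both D1 and D2 must be currently married.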
Second, who gets the "high score" of 1 on each dummy variable?
The choice of which category to code 1 and which to code 0 is somewhat arbitrary.
If you have a conceptual reason to believe that one group is relatively unique with respect to the dependent variable, this reasoning takes precedence and you would code this group "1".
For example, suppose we were studying the "digital divide" and home computer ownership, and we want to create a dichotomy for race. Research on the "digital divide" suggests that White and Asian Americans are wealthier and therefore can afford more home computers. Black, Hispanic, and Native Americans are less wealthy and can afford fewer computers. Therefore we could code Whites and Asians as "1" and everyone else "0".
We would then interpret this particular dummy variable b coefficient as how many more household computers Whites and Asians owned compared with all the other groups combined (or how many fewer computers if the b coefficient is negative).
Another possibility is if you suspect that the group coded "1" may have higher scores on the dependent variable (conceptual issues aside), that is a justification. Most of the time, all of us have an easier time interpreting positive coefficients than we do negative coefficients (BUT conceptual issues override this.).
It is important to be consistent with how you code dummy variables within the scope of a single research project. For example, if I am analyzing the digital divide, and I begin by coding Whites and Asians as "1", I would stay with that coding throughout my digital divide analysis.
Up until now, we have dealt with METRIC regression coefficients and what is called a PREDICTION EQUATION.
Metric prediction equations literally predict values of the dependent variable, just as we predicted weight in pounds for the earlier examples in this Guide.
Metric prediction equations are VERY widely used. Variations on them predict college grades (from high school grades and Scholastic Aptitude Test scores), gross national product, income variations, longevity in years (using your health habits and the longevity of your ancestors), and many other variables of scientific and/or economic or social interest.
However, in metric form, it is nearly impossible to compare the influence of each independent variable on the dependent variable. The different metrics of the independent variables make it like comparing "apples and oranges."
How does one year of age compare to being White/Asian versus not?
How does an extra inch of height compare to one more 15 minute weekly exercise period?
And, the answer is -- they don't compare at all.
Generally, the wider the range of values on your independent variable, the smaller the metric b tends to be. For example, the age in years variable from the Current Population Survey data has a valid adult range from 18 to 90 years, or a range of 72 years! On the other hand, the number of personal computers in the household basically ranges from 0 to 3. Obviously the standard deviation for age will be far greater than the standard deviation for the number of household computers.
Since the standard deviation of the independent variable is in the DENOMINATOR for the formula for the "b" slope term, the bigger the standard deviation of the independent variable (all else equal), the smaller the number for the "b" slope will be. Guide 8 will give you this formula for the simpler cases of one or two independent variables.
Therefore, it is very helpful to be able to STANDARDIZE the regression equation and to turn the metric regression coefficients into STANDARDIZED SLOPES or STANDARDIZED REGRESSION COEFFICIENTS.
We can directly compare standardized regression coefficients WITHIN A SINGLE EQUATION (do NOT USE THESE TO COMPARE ACROSS EQUATIONS).
Standardized regression coefficients are all in STANDARD DEVIATION UNITS of THE DEPENDENT VARIABLE (per one standard deviation change in the independent variable), no matter what the original metric of either variable was.
Because all the regression coefficients are in standard deviation units instead of their original metric scores, we can rank them in absolute value order from largest to smallest, or from the most important to the least important in terms of how much they influence the dependent variable. The largest standardized regression coefficient (in absolute value) is the most important influence.
Because they are in standard deviation units, standardized regression coefficients in theory can range from negative infinity through positive infinity. However, in practice, they usually range from - 1 to + 1, where 0 means that the independent variable has no effect on the dependent variable. Independent variables that have standardized regression coefficients close to + or - 1 typically have a very strong influence on the dependent variable.
OOPS! THAT TERMINOLOGY THING
I have mentioned in class that when I was young and naive, I really expected that mathematicians and statisticians would be consistent! They would only have one name for one thing -- not the same name for three different things or three different names for the same one thing. This, of course, would keep things clear and make it easier.
BUT, I was wrong. And, except for some very basic univariate stuff, statisticians are just as awful and inconsistent about terminology as all the other disciplines.
I mention this as a preamble because:
The two most popular designations for the standardized regression coefficient are:
So here is the terminology WE will use for the rest of the semester:
| UNSTANDARDIZED REGRESSION COEFFICIENT = | METRIC regression coefficient |
| STANDARDIZED REGRESSION COEFFICIENT = | b* or BETA WEIGHT |
How do you calculate a BETA WEIGHT?
1. The first way to do so is to simply standardize the scores on all the variables (including the dependent variable) you are using in the regression and turn them into Z-scores. See Guide 3 for a review if you don't remember standardized variables very clearly.
Each of your variables will now have a mean of 0 and a variance of 1.
The regression line or plane will now go through the origin and the constant term will disappear (because it will now be zero).
All your regression coefficients are now Beta Weights.
2. The second way to do so is to multiply each metric regression coefficient by the following ratio:
b* = b X (sx ÷ sy)
The regression line or plane will now go through the origin and the constant term will disappear (because it will now be zero).
All your regression coefficients are now Beta Weights.
1. In words, take each independent variable's unstandardized regression b coefficient.
2. Create a ratio by dividing the standard deviation of the particular independent variable by the standard deviation of the dependent variable. That's (sx ÷ sy ).
3. Now, multiply each metric b by the ratio of the standard deviation of the independent variable to the standard deviation of the dependent variable.
4. Make sure you are using the standard deviation for the independent variable in the numerator of the ratio that matches the independent variable b you are examining . (Of course, the computer programs will calculate all this for you.)
5. Make sure you do this for ALL of the predictor variables, one at a time.
6. sy is always the standard deviation of the dependent variable in the equation and it always forms the denominator of the ratio.
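The steps above can be sketched in Python for the case of ONE independent variable. The data are hypothetical, and the metric b is computed with the standard least-squares slope formula, which may be laid out differently in Guide 8. In simple regression the beta weight works out to equal Pearson's r, which gives a handy check on the arithmetic.

```python
import math
import statistics

# Hypothetical data: inches over five feet and weight in pounds.
heights = [0, 2, 4, 5, 8]
weights = [103, 108, 122, 125, 137]

mean_x = statistics.mean(heights)
mean_y = statistics.mean(weights)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights))
sxx = sum((x - mean_x) ** 2 for x in heights)
syy = sum((y - mean_y) ** 2 for y in weights)

# Step 1: the metric (unstandardized) slope, in pounds per inch.
b = sxy / sxx

# Steps 2 and 3: multiply b by the ratio sx / sy.
s_x = statistics.stdev(heights)  # standard deviation of the independent variable
s_y = statistics.stdev(weights)  # standard deviation of the dependent variable
beta_weight = b * (s_x / s_y)

# Check: with one predictor, the beta weight equals Pearson's r.
r = sxy / math.sqrt(sxx * syy)

print(round(b, 3), round(beta_weight, 3), round(r, 3))
```

With several predictors you would repeat the multiplication once per independent variable, each time using that variable's own standard deviation in the numerator.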
Use the beta weights:
(1) to see how relatively important each independent variable is
(2) with the strength chart, to assess each beta weight from very weak to very strong too.
REVIEW THE STRENGTH CHART FROM GUIDE 5.
This Guide has sidestepped much of the technical aspect of regression analysis.
How do we get those "a" and "b" terms in simple regression?
How do we get the "b" terms in multiple regression?
What are these covariances, anyway?
What's a correlation matrix?
What does it mean to say that R² represents "the percent of variance explained" in the dependent variable?
How do we test for the statistical significance of the R² and of each separate metric b?
What are some of the problems that a large Beta Weight helps diagnose?
Continue with Guide 8 to find out.
Susan Carol Losh
November 10, 2004