FALL 2004 DR SUSAN CAROL LOSH
This assignment is worth 5 PERCENT toward
your final grade.
Remember! I use plus and minus grading
on assignments and for the final grade.
Again, despite the "AQ" (Anxiety Quotient), most people did quite well. The median score was 20/20 and the mean was 18.97 points. (Please see me if those terms are unfamiliar.) These results are very consistent with scores on the earlier assignments.
REPEAT: Most of us get nervous when we learn new material. There is that ghastly feeling of not quite having one's feet on the floor. But, as you know by this time, such a feeling dissipates with practice.
Are you a "regression whiz," ready
to tackle the most complex elaborations and complications on regression?
Nope. Should you be able to read (or perform) basic analyses and interpret
your own or others' basic analytic results? You should.
As questions 1 and 2 requested, you only examined how the dependent variable, educ, related to sibs, family16 and mother's educational level, because educ is what you want to predict using sibs, family16 and mother's educational level as independent variables. You observed but did NOT include the intercorrelations among the INDEPENDENT variables in this question.
You used only the bivariate rs for this question and no other measures, not Bs, not Beta weights.
Bivariate correlation (r) of educ (respondent's education) with:
| VARIABLE | NUMERIC VALUE | STRENGTH | DIRECTION |
| Sibs | | | |
| Family16 (1 = 2 parent) | | | |
| Mother's education | | | |
You did NOT say that any of the BIVARIATE correlations were zero or nonzero, because the correlation output in the regression package did not report statistical significance, and we have no way of knowing it without a formal statistical test of the bivariate correlation. (Such tests exist, but we did not do them on this assignment. The correlations program of SDA will include statistical significance tests for the bivariate rs. SPSS is similar to SDA in this respect.)
We CANNOT extrapolate from multivariate results to bivariate results, i.e., if the B for family16 were zero, that does NOT mean the bivariate correlation is zero. It might be--and it might not be. Often, a bivariate correlation is larger and statistically significant, but the B, which is a NET result controlling for all the other variables in the equation, is not statistically significant. That's a big reason we do multivariate analyses in the first place.
Similarly, we certainly cannot extrapolate from bivariate results to multivariate results (the Bs) because the Bs are multivariate effects that are net of the statistical controls for all the other variables in the regression equation. In fact, we often expect that the bivariate correlations will be larger than the partial (controlled) correlations.
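To see why a bivariate correlation and a net B can tell different stories, here is a minimal sketch using SIMULATED data (not the GSS data from the assignment). The variable names x1, x2 and y and the helper functions are made up for illustration. By construction, y depends only on x1, while x2 merely overlaps with x1.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def pearson_r(a, b):
    """Bivariate correlation coefficient."""
    return cov(a, b) / (cov(a, a) * cov(b, b)) ** 0.5

def two_predictor_slopes(y, x1, x2):
    """Net slopes for y = a + b1*x1 + b2*x2, solved from the normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

# Simulated data: an assumption for illustration only
random.seed(0)
n = 2000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 1) for a in x1]      # x2 overlaps with x1
y = [2 * a + random.gauss(0, 1) for a in x1]   # y depends on x1 only

r_y_x2 = pearson_r(y, x2)                # bivariate: clearly positive
b1, b2 = two_predictor_slopes(y, x1, x2) # net: b2 is close to zero

print(f"bivariate r(y, x2) = {r_y_x2:.3f}")
print(f"net B for x1 = {b1:.3f}, net B for x2 = {b2:.3f}")
```

In this sketch x2 correlates with y only because it overlaps with x1, so once x1 is controlled the net slope for x2 hovers near zero.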
Yes, dummy variables
DO have a direction. In the case here of a dichotomy, they tell
us about mean differences on the dependent variable by the two categories
of the independent variable. A positive correlation between family16
and years of education means that adults who were adolescents in two parent
families have more years of education--just as this also appears for the
B and beta coefficients.
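A small sketch of the mean-difference reading of a dummy variable, using eight MADE-UP respondents (not the assignment data). With a single 0/1 predictor, the regression slope equals the difference between the two group means, and its sign gives the direction.

```python
def mean(v):
    return sum(v) / len(v)

def simple_slope(x, y):
    """Slope of y on x in a one-predictor regression."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical values: 1 = two-parent family at 16, 0 = other
family16 = [0, 0, 0, 0, 1, 1, 1, 1]
educ = [10, 12, 12, 14, 12, 14, 14, 16]

slope = simple_slope(family16, educ)

mean_two_parent = mean([e for f, e in zip(family16, educ) if f == 1])
mean_other = mean([e for f, e in zip(family16, educ) if f == 0])
mean_diff = mean_two_parent - mean_other

print(slope, mean_diff)  # both are 2.0 for these made-up values
```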
You correctly identified the percentage variance explained in educ by sibs, family16 and mother's years of education as 16.4% because you realized that R2 X 100 = the percent variance explained in the dependent variable.
You identified R2 as "real," NOT because its value was .164 (real but weak) and NOT because the case base was large, but BECAUSE the probability level associated with this R2 was less than .001, on the basis of the formal F-test associated with R2 and for no other reason.
DO NOT TRUST SAMPLE VALUES without substantiating evidence (such as a probability level). Many sample values "look real," that is, they LOOK nonzero, when in fact they are simply sampling fluctuations around zero no matter how large they appear. Yes, as novices and beginning analysts, we may "eyeball" some results, and it is true that small results are very often statistically significant with large samples of 1000 or more. However, when the formal statistical test (in this case, an F-ratio) is right there on the output, this is what to use.
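The sketch below shows why the size of R2 alone cannot settle the question. It uses the standard formula for the overall F-ratio, F = (R2 / k) / ((1 - R2) / (n - k - 1)), with the R2 of .164 and k = 3 predictors from this assignment. The two sample sizes are HYPOTHETICAL (the assignment's actual case base is not used here), and the function name is made up for illustration.

```python
def f_from_r2(r2, n, k):
    """Overall F-ratio for a regression with k predictors and n cases."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

R2 = 0.164  # from the assignment output
K = 3       # sibs, family16, mother's education

# Hypothetical sample sizes, for illustration only
f_small = f_from_r2(R2, n=20, k=K)
f_large = f_from_r2(R2, n=1000, k=K)

print(f"n = 20:   F = {f_small:.2f} on ({K}, {20 - K - 1}) df")
print(f"n = 1000: F = {f_large:.2f} on ({K}, {1000 - K - 1}) df")
```

With 20 cases the F-ratio is close to 1, which is about what sampling fluctuation alone produces; with 1000 cases the same R2 gives an F-ratio of roughly 65. The identical R2 can be "real" or not, which is why the probability level on the output is the thing to read.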
The numeric regression equation for this problem was:

Predicted educ = 10.652 - 0.178 X1 (sibs) + 0.627 X2 (family16) + 0.266 X3 (mother's education)
By convention, the constant term (10.652) goes first.
By convention, the value of the b coefficient precedes the variable designation (.627 X2) as you see it above.
Be sure to include the variable names, otherwise we can't know which slope goes with which variable. (Some people left them off; typically you lost 1 point of credit if you did.)
Be sure to include the constant. It is what makes this equation a prediction equation.
The metric regression Bs were what needed to be in this equation. Not the bivariate rs, not the beta weights, and not the probability levels. This is a prediction equation. You lost 1 to 2 points credit, depending on what else you put.
And you noted that the Bs for all three independent variables (and the constant term too) were statistically significant at the p <.001 level.
PARENTHETICAL NOTE: I generally suggest
including ALL terms in the initial predictive regression
equation, even if some are not statistically significant. Each term had
a conceptual reason for being there, otherwise it wouldn't have been in
(your) regression in the first place.
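As a minimal sketch of what "prediction equation" means in practice, the code below plugs values into the constant and metric Bs reported in this feedback (10.652, -0.178 for sibs, .627 for family16, .266 for mother's education). The function name and the example respondent are made up for illustration.

```python
# Constant and metric Bs as reported in this feedback
CONSTANT = 10.652
B_SIBS = -0.178
B_FAMILY16 = 0.627
B_MAEDUC = 0.266

def predict_educ(sibs, family16, maeduc):
    """Predicted years of education from the metric regression equation."""
    return CONSTANT + B_SIBS * sibs + B_FAMILY16 * family16 + B_MAEDUC * maeduc

# Hypothetical respondent: 2 siblings, two-parent family at 16,
# mother with 12 years of education
predicted = predict_educ(sibs=2, family16=1, maeduc=12)
print(f"{predicted:.3f}")  # 14.115
```

Without the constant, the same calculation would give only the sum of the slope terms, not a predicted number of years. Beta weights or correlations in place of the Bs would not return years of education at all.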
DESCRIBING THE RESULTS IN WORDS GAVE THE MOST TROUBLE
Each additional year of mother's education increased one's own years of education by .266 years (about one-quarter of a year), controlling for the number of siblings and family composition at age 16.
If you think about it, this really is a substantial finding as well as a statistically significant one. Every four years of a mother's education translates into a one year increment in the adult child's completed schooling. Considering that educational levels in the USA rose nearly SIX YEARS on the average from 1900 to 2000, mother's education makes a very substantial contribution to child's education. As Peter Blau and Otis Dudley Duncan asserted nearly 40 years ago, providing for a child's schooling is apparently a major way in which occupational and social class inheritance "works" in modern industrial societies.
Controlling for mother's education and number of siblings, adults who were in two parent families at age 16 had about two-thirds of a year more education (.627) than those who grew up in other circumstances.
For each additional brother or sister,
net of family16 and mother's education, the individual had about 1/6 of
one year less education (-0.178). Thus, adults who had had 6 siblings had,
on the average, one year of education LESS than only children (who had
no siblings).
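The arithmetic behind these verbal descriptions can be sketched as follows. A metric B is a change in the dependent variable (years of education) per one-unit change in the independent variable, so multiplying the B by a number of units gives the net difference in years. The Bs are the ones reported above; the variable names are made up for illustration.

```python
B_MAEDUC = 0.266   # years of educ per year of mother's education
B_SIBS = -0.178    # years of educ per additional sibling

# Net difference associated with four more years of mother's education
four_years_maeduc = 4 * B_MAEDUC

# Net difference between someone with six siblings and an only child
six_siblings = 6 * B_SIBS

print(f"{four_years_maeduc:.3f} years")  # 1.064
print(f"{six_siblings:.3f} years")       # -1.068
```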
One problem was that a few students confused
metric Bs with standardized Beta Weights. A couple of others confused metric
Bs with correlation coefficients.
How the standardized regression coefficients (the Betas) influenced educ (respondent's years of education):
| VARIABLE | RANK | BETA NUMERIC VALUE | STRENGTH | DIRECTION |
| Ma-educ | | | | |
| Sibs | | | | |
| Family16 | | | | |
You used the Beta Weights to rank order the independent variables in terms of their influence on educ BECAUSE:
Beta Weights are standardized. They come
out in standard deviation units.
Therefore, you can use them to directly
compare the net effects of the independent variables WITHIN a single
equation because the metric is the same now for all independent variables,
just as it was when we did z-scores. The new metric is "standard deviation
units".
Use the ABSOLUTE VALUE OF THE BETA WEIGHT to rank order their effects, then add back the positive or negative sign.
Use the metric Bs to compare across groups or populations, or to have a prediction equation.
You don't choose Beta Weights (or Bs) because
they are "more realistic" (I'm not sure what that means; both the Bs and
Betas are realistic because they are both based on observed data) or more
moderate, or on any other subjective criteria. The Bs have a special role
to play in regression analysis and the Beta Weights have a different special
role to play in regression analysis.
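Here is a minimal sketch of the two roles. It uses the standard conversion Beta = B x (standard deviation of X / standard deviation of Y) and then rank orders by absolute value. All numbers and the names x1, x2, x3 are HYPOTHETICAL; they are not the values from the assignment output.

```python
def beta_from_b(b, sd_x, sd_y):
    """Standardized coefficient: the metric B re-expressed in standard deviation units."""
    return b * sd_x / sd_y

# Hypothetical metric Bs and standard deviations, for illustration only
predictors = {
    "x1": {"b": -0.50, "sd": 2.0},
    "x2": {"b": 3.00, "sd": 0.5},
    "x3": {"b": 0.50, "sd": 4.0},
}
sd_y = 5.0

betas = {name: beta_from_b(v["b"], v["sd"], sd_y) for name, v in predictors.items()}

# Rank by ABSOLUTE value; the sign is kept only to report direction
ranked = sorted(betas, key=lambda name: abs(betas[name]), reverse=True)

for rank, name in enumerate(ranked, start=1):
    direction = "positive" if betas[name] > 0 else "negative"
    print(rank, name, round(betas[name], 3), direction)
```

In this made-up case x2 has by far the largest metric B but not the largest Beta, because its unit of measurement is different. That is the reason the Bs cannot be compared directly within one equation while the Betas can.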
You used the correlations among the INDEPENDENT variables for question 2.
Your output was not complete.
You said the R2 was real or not real because of its size (.164) or the sample size, instead of looking at the probability level. WE TEST FOR STATISTICAL SIGNIFICANCE BECAUSE SAMPLE RESULTS CAN BE MISLEADING. They can look nonzero even when they really are sampling fluctuations around a true population value of zero. Conversely, in large samples, it is not unusual to find a small R2 that is highly statistically significant. Use the F-test results and the p-level to see if R2 is statistically significant or not.
You left out the "X"s in the regression equation (it was OK to put variable names such as age or gender; it was NOT OK to totally omit the identity of each independent variable.)
You didn't use the metric B slopes in the estimated regression equation but instead put Beta Weights, correlation coefficients or some other number that was not a metric B.
You interpreted metric B slopes as percentages, likelihoods, rates, ratios,
or correlation coefficients. You did not use the unit of the dependent
variable to describe the Bs.
You didn't rank order the effects of the independent variables using the beta weights.
You put the probability levels instead of either the Bs or the Beta Weights (or conversely, you put the Bs or Beta Weights instead of the p-levels when asked.) Remember, you must be able to tell these entities apart when you read in order to assess the author's conclusions!
You thought the beta weight was a correlation coefficient (it is NOT; it is a standardized regression coefficient.) You thought the beta weight was a probability (it IS NOT).
You messed up the decimal places. This may seem trivial, but it is in fact important. A correlation coefficient of 0.06 is negligible. A correlation of 0.60 is strong. Confusion over decimal places could lead you to believe results were statistically significant when they were not, or vice versa; or, this confusion could mean the Beta Weights were misinterpreted. Assess whether the confusion over decimal places was a careless error or if you really have trouble knowing the difference. I consider the latter a mild form of math dyslexia, but it CAN be overcome with help and with practice.
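A quick sketch of how much one decimal place matters. Squaring a correlation and multiplying by 100 gives the percent of variance the two variables share (the same logic as R2 X 100 earlier). The function name is made up for illustration.

```python
def percent_shared_variance(r):
    """Percent of variance shared by two variables with correlation r."""
    return r ** 2 * 100

for r in (0.06, 0.60):
    print(f"r = {r:.2f} -> {percent_shared_variance(r):.2f}% shared variance")
```

Moving the decimal point one place changes the shared variance from well under 1 percent to 36 percent, a hundredfold difference.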
Although there were some errors, most were "novice errors," that is, you will be unlikely to repeat them with more practice. 77% were 19 or 20 point papers! Congratulations!
For those continuing on to EDF5401,
you will see many of these terms again! Good luck!
THERE WILL BE A SIMILAR PROBLEM ON EXAM 3. LOOK FOR ANOTHER REGRESSION EXAMPLE ON THE EXAM 3 STUDY GUIDE.
November 30 2004
Susan Carol Losh