THERE IS NO CLASS WEDNESDAY NOVEMBER 24. HAVE A HAPPY THANKSGIVING!
FALL 2004 DR SUSAN CAROL LOSH
EXAM GUIDE 3 WILL BE AVAILABLE HERE
PLEASE NOTE THE EXAM 3 DATE: WEDNESDAY DECEMBER 8, 5:30 PM
I WILL BE IN MONDAY AFTERNOON 3:30-5:00 PM, NOVEMBER 22 & NOVEMBER 29. MARIA HAS OFFICE HOURS TUESDAY 3:30-5:00. CHECK YOUR E-MAIL FOR RYAN'S HOURS, OR YOU MAY ALSO E-MAIL US. HOWEVER, PLEASE DO NOT E-MAIL AFTER 8 PM SUNDAY NIGHT.
IF YOU E-MAIL ME MONDAY MORNING, I WILL NOT HAVE ANY TIME TO RESPOND TO YOU. REMEMBER: I DO NOT TAKE E-MAIL ATTACHMENTS!
THANK YOU.
TODAY
OF COURSE YOU'LL DO COMPUTER RUNS IMMEDIATELY!
WE RECOMMEND YOU HAVE YOUR OUTPUT PRESENT DURING MONDAY CLASS, NOVEMBER 22 & 29, FOR THE REGRESSION SEGMENT OF THE COURSE.
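This also holds for your regression assignment. One glitch that may happen is that the printed probability level is a string of zeros, like this: .00 or .0000 (SDA and SPSS both do this). A printed zero does not mean the probability is exactly zero; it means the probability is smaller than the last decimal place shown.
In order to receive full credit, please use the correct terminology in reporting your results, depending on the number of zeros that appear in your output (e.g., a printed .00 is reported as p < .01, and a printed .000 as p < .001).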
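As a sketch of this reporting rule, the small Python helper below turns a printed probability level into report wording. It is not part of SDA or SPSS: the function name `report_p` is hypothetical, and it assumes you type the value exactly as it is printed (for example ".000" or ".0213").

```python
def report_p(printed: str) -> str:
    """Convert a probability level, as printed by SDA or SPSS, into report wording.

    Hypothetical helper: an all-zero value means p is below one unit in the
    last printed decimal place, so ".00" becomes "p < .01".
    """
    text = printed.strip()
    if "." not in text or not text.split(".", 1)[1]:
        raise ValueError("expected a decimal probability such as .000 or .0213")
    decimals = text.split(".", 1)[1]
    if float(text) == 0.0:
        # A string of zeros: report "less than" the smallest printed unit.
        return "p < ." + "0" * (len(decimals) - 1) + "1"
    # A non-zero value: report it as printed, minus any leading "0".
    return "p = " + (text[1:] if text.startswith("0.") else text)


for shown in [".00", ".000", ".0000", ".0213"]:
    print(shown, "->", report_p(shown))
```

The number of zeros matters: the more decimal places the program prints, the smaller the bound you can report.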
In this assignment, you'll use the SDA system and the 2002 General Social Survey to see how three independent variables differentially affected an individual's highest year of education (educ).
|
REVIEW
This assignment requires you to draw upon considerable amounts of previous course material.
#1. Examining means and standard deviations.
#2. Examining bivariate correlation coefficients (Pearson's r).
#3. Assessing the strength and direction of correlation coefficients in words.
BUT THERE ARE SEVERAL NEW ELEMENTS
#1. Examining a regression equation that has three independent variables.
#2. Describing how each of the three independent variables influenced the dependent variable in words.
#3. Assessing whether the entire regression equation was statistically significant (i.e., was at least one B reliably different from zero?)
#4. Deciding how strong the combined total effect of the three independent variables was on the dependent variable, educ.
#5. Deciding whether each separate regression B was statistically different from zero or essentially zero within sampling error.
#6. Ranking each independent variable from most to least important in terms of how it influenced the years of respondent education.
#7. Assessing the relative net strength of each of the independent variables in predicting the years of respondent education.
This total assignment is worth 20 points.
Correctly following all programming information for running frequency distributions and all the regression statistics, and turning in all output = 2 points.
Although your actual output does not count very heavily, I MUST receive your output in order for you to receive credit on this assignment.
You will describe the numeric values of the three zero-order (bivariate) Pearson r correlations between each of your independent variables and your dependent variable educ. (Include DIRECTION: the positive or negative sign.)
You will describe the strength and direction of each of these three bivariate correlations above in words.
Overall, how much variance (i.e., R²) did you predict in educ with your three independent variables?
Was that value of R² statistically significant (that is, "REAL")? What was the "significance" level or the probability level of the R² for this regression?
You will write out the estimated numeric regression equation for educ using your independent variables of sibs, family16, and maeduc.
What was the probability level (significance level) for the effect of each independent variable [those are found through the t-statistic for each separate B] on educ?
You can construct a chart showing the dependent variable, educ, and the independent variables, Bs, beta weights, and significance levels. This is a short way to present the numeric results but please note that such a chart cannot substitute for describing the effects in words. (Chart examples are presented in Guide 8.)
You will interpret each of the effects of sibs, family16, and maeduc on educ in words [i.e., how much the years of respondent education rises or falls for a one year change in mother's education or how many more or fewer years of respondent education there were if the person grew up in a two parent family as opposed to an "other" situation.]
You will describe the relative impact (the BETA weight) of each of your independent variables on educ: give the numeric value for each independent variable, and describe the strength (remember to use our STRENGTH chart!) and the direction of each beta weight on educ.
You'll rate each independent variable from most to least important in terms of how much each predictor influences the dependent variable, and decide whether to use the "B"s or the "Beta Weights" to do so.
And, YES, you really should be able to do all this by November 29!
Take each question step by step and you should be able to answer them all.
OVERALL
FIRST, access SDA and the 2002 General Social Survey.
SECOND, run frequencies on all the variables for this computer session (educ, sibs, family16, and maeduc). Click the percentages box to get an idea of the relative distributions. You don't need to run measures of central tendency or variation for this assignment.
REMEMBER: Where it says Selection Filter(s), put year (2002) to select only respondents from 2002.
YOU WILL NEED TO RECODE SIBS AND FAMILY16 WHEN YOU DO THE REGRESSION RUN. See below for directions.
THIRD, you will run your regression program. You will interpret several statistics about the regression equation for educ.
SPECIFICALLY
1. You will run FREQUENCIES on the four variables for this exercise because you ALWAYS check the frequencies on all the variables that you plan to use at the beginning of any analysis session. You will watch for out of order codes, missing data, "wild punches," and other anomalies so that you can recode these if needed or restrict the range of valid codes.
You will run frequencies on:
family16 (the respondent's living situation when s/he was 16 years old)
maeduc (respondent's mother's YEARS of education)
sibs (the respondent's NUMBER of brothers and sisters)
educ (number of respondent's YEARS of education)
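To make the frequency check concrete, here is a minimal Python sketch using `collections.Counter`. The function name `frequencies` is hypothetical, the codes are made up, and treating 0-20 as valid years with 98 and 99 as missing-data codes is an assumption for illustration; use the codes your GSS codebook actually lists for each variable.

```python
from collections import Counter


def frequencies(values, valid_codes, missing_codes=()):
    """Print a frequency and percentage table; return suspicious codes.

    Percentages are based on valid cases only. Any code that is neither
    valid nor a declared missing-data code is flagged as a possible
    "wild punch".
    """
    counts = Counter(values)
    valid_n = sum(n for code, n in counts.items() if code in valid_codes)
    for code in sorted(counts):
        n = counts[code]
        if code in valid_codes:
            print(f"{code:>4} {n:>5} {100.0 * n / valid_n:6.1f}%")
        elif code in missing_codes:
            print(f"{code:>4} {n:>5}   (missing)")
        else:
            print(f"{code:>4} {n:>5}   (OUT OF RANGE)")
    return sorted(c for c in counts if c not in valid_codes and c not in missing_codes)


# Made-up educ-style codes; 0-20 valid and 98/99 missing are assumptions.
made_up_educ = [12, 12, 16, 14, 12, 98, 18, 45, 12, 99]
print(frequencies(made_up_educ, valid_codes=range(0, 21), missing_codes={98, 99}))
```

Here the code 45 would be flagged as out of range, the kind of anomaly you would recode or exclude before running the regression.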
When you actually do the regression, you will have your first experience with a "dummy variable" or a dichotomized variable coded only into the categories 0 and 1. This will be the variable "family16" that will be recoded to Two parent family (1) and Other (0).
2. You will then conduct a multiple regression. Your dependent variable will be educ. Your independent variables will be number of sibs, family16, and years of maeduc.
3. Based on your results, you will decide on the statistical significance of the total regression equation, its substantive importance, and the statistical and substantive significance of the NET effect of each of the independent variables on educ.
4. You will examine the relative NET magnitude of the influence of each independent variable on the years of respondent education, and decide which type of regression coefficient (B or Beta) is the more appropriate kind of regression coefficient to use to present your results.
REVIEW FEATURES include:
Accessing an online database (the GSS file) and the SDA system.
Running univariate frequency and percentage distributions.
Assessing the statistical significance of an association between two variables.
Assessing the magnitude and direction of the relationship between two variables.
Filtering for the study year (2002 ONLY).
NEW FEATURES include:
Conducting a multiple regression analysis.
Assessing R² and the F-Test for regression.
Assessing metric Bs and the t-test for each B.
Using the Beta Weights to assess the relative net impact of each independent variable on your dependent variable.
Use the RIGHT toe of your mouse to click on this link:
When the menu opens on the link, click on: Open in New Window
Click on the button to pull up the statistical program selection screen.
You can always switch back to this screen by clicking on the box at the very bottom of the monitor screen that reads "Assignment 5". Or you can print out the pages of Assignment 5.
Once again, you will bring up the "radio buttons" screen to select an analytic option. First, you will click on "Frequencies or crosstabulation" to run the frequencies on family16, sibs, maeduc, and educ.
In the Study: GSS 1972-2002 Cumulative Datafile screen that opens, first click on:
to open up the codebook window. Then, click on:
so that you have the SDA program window active as well.
REMEMBER! The first step in working with data is to ALWAYS display the total frequencies on all the variables that you plan to analyze for a particular research project.
After you have accessed the General Social Survey data and the SDA system, be sure to run the original 2002 frequencies for:
1. family16
2. maeduc
3. sibs
4. educ
Obtain the frequencies and click the Column box on Percentaging: You DO NOT need measures of central tendency, dispersion, or other univariate measures for this assignment, just the frequencies and the percents.
The percentages will be the BOLD numbers in each cell.
REMEMBER: Where it says Selection Filter(s), put year (2002) to select only respondents from 2002.
BE SURE TO IMMEDIATELY PRINT THESE PAGES AND INCLUDE THEM WITH YOUR OUTPUT.
Remember in the Standard Codebook to click on the blue underlined shorthand abbreviation or mnemonic to examine the basic univariate frequency distribution for the particular variable of interest, including missing value codes and frequencies. Recall that the SDA program sometimes, BUT DEFINITELY NOT ALWAYS, omits the cases with missing values when it executes an analysis.
After you have run the original frequencies, here is what you will do in the context of your regression computer run:
You will create a "dummy variable" for family16. A "dummy variable" is a dichotomy where the cases are ONLY coded as zero or one. For this exercise, the category "Mom-Dad" (two parent family) will be coded 1 and "Other" will be coded 0 because individuals from two parent families may have been able to afford more education more easily.
You will also collapse the values for "sibs", which can range into the twenties, into the values 0-10 by using a recode statement in the context of the regression run: everyone with 10 or more brothers and sisters is coded 10. This will prevent outlying values (e.g., 28 brothers and sisters) from having a disproportionate effect on the results.
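The two recodes can be sketched in Python as plain functions, mirroring the SDA recode statements used in this assignment (sibs: 10=10-28 "10 or more"; family16: 1=1-3 "Mom - Dad" and 0=0,4-9 "Other"). The function names are hypothetical. One assumption to note: this sketch caps every value of 10 or more at 10, whereas the SDA statement names the range 10-28 explicitly; it does not model how SDA treats codes outside the ranges you list.

```python
def recode_sibs(sibs: int) -> int:
    """Top-code siblings: 0-9 stay as they are; 10 or more become 10."""
    if sibs < 0:
        raise ValueError("a sibling count cannot be negative")
    return min(sibs, 10)


def recode_family16(code: int) -> int:
    """Dummy-code family16: codes 1-3 ("Mom - Dad") -> 1; 0 and 4-9 ("Other") -> 0."""
    if 1 <= code <= 3:
        return 1
    if code == 0 or 4 <= code <= 9:
        return 0
    raise ValueError(f"family16 code {code} is not covered by this recode")


print([recode_sibs(s) for s in (0, 4, 10, 28)])       # [0, 4, 10, 10]
print([recode_family16(c) for c in (0, 1, 3, 4, 9)])  # [0, 1, 1, 0, 0]
```

Because family16 ends up coded only 0 or 1, its B in the regression is simply the difference in predicted educ between the two groups.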
The first thing to do is to return to the Study: GSS 1972-2002 Cumulative Datafile window.
This time, you will click on the Multiple Regression radio button, and this will pull up a totally new screen for you that will look like the screen below:
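[Screen image: the SDA Multiple Regression form, with input boxes for the dependent and independent variables ("You can tab from one input box to the next"), a "More independent variables" option, "Other statistics" check boxes, and settings to change the number of decimal places displayed for coefficients, t-tests, the F-test, univariate stats, the correlation matrix, and the covariance matrix.]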
You will see that there are MANY boxes for independent variables (FIFTY-TWO boxes if you scroll down to the bottom of the screen!). We will only use the first three boxes. Be sure that you place specifications ONLY in boxes 1, 2, and 3 going across the row.
In box #1 type: sibs (r:0=0;1=1;2=2;3=3;4=4;5=5;6=6;7=7;8=8;9=9;10=10-28 "10 or more")
Yes, this really will all fit in the little box. If you prefer, you can just cut and paste the line beginning with "sibs" and including the recode statement into the little box (make sure to copy it all, including the closing parenthesis). Sibs can go over 30 brothers and sisters (although not very often), so you will recode it to a manageable number to avoid outliers.
In box #2 type: family16 (r:1=1-3 "Mom - Dad";0=0,4-9 "Other")
This statement will also fit into the little box. Again, you can cut and paste this line beginning with "family16" if you prefer. This will recode those individuals growing up with two parents as "1" and those in all other living arrangements as "0".
In box #3 type: maeduc
You won't have to recode this variable at all, so just put the variable name.
REMEMBER: Where it says Selection Filter(s), put year (2002) to select only respondents from 2002.
Under "Other statistics:" click on the boxes to the LEFT of:
T-tests
Global F-test
Univariate stats (this is so you can double-check your frequency figures)
Correlation matrix
Color coding and Question text are optional, BUT I find the color coding helpful: blue coefficients are negative and red coefficients are positive.
Although you may be familiar with many of the questionnaire codes for these variables, I strongly recommend that you have the codes from your univariate frequencies in front of you so that you can translate the coefficients into English words more easily while you do this assignment. This will be especially true for the "family16" dummy variable.
Click on the gray box at the bottom left that says:
Then blink! Here is your output.
P.S. Did you notice that you can change the number of printed decimal places all the way to 6 decimal places? However, I don't recommend this. All you will do is clutter up the data presentation, and you have enough to worry about. The extra decimal places won't help with the tests of statistical significance either.
IMMEDIATELY PRINT THESE PAGES BEFORE YOU ANSWER THE ASSIGNMENT QUESTIONS BELOW!
FIRST, examine your univariate and bivariate statistics: the means, standard deviations, and the correlation coefficients.
Note any unusually strong or very weak correlations (e.g., under |.10| or over |.50|).
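For reference, here is a minimal Python sketch of what each entry in the correlation matrix measures, plus a check against the |.10| and |.50| marks above. The function names and the scores are made up for illustration; in the assignment you read the r values directly from your SDA output, and the strength labels themselves come from the course STRENGTH chart, not from this code.

```python
import math


def pearson_r(x, y):
    """Pearson's r for two equal-length lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length, with at least 2 cases")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_unusual(r, weak=0.10, strong=0.50):
    """Flag a correlation whose absolute value is under |.10| or over |.50|."""
    if abs(r) < weak:
        return "very weak: note it"
    if abs(r) > strong:
        return "unusually strong: note it"
    return "in between"


r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # made-up scores
print(round(r, 3), flag_unusual(r))
```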
MAKE SURE YOU KNOW THE METRIC OF YOUR DEPENDENT VARIABLE (pounds of weight? number of weekly workhours? years of education? number of library books?)! This will be the metric you will use for the Bs. (HINT: Your dependent variable in this exercise is years of respondent education.)
SECOND, see if the overall R² is statistically significant. Use the Global F-Test results and look at the "P" for probability level.
The null hypothesis is Ho: R² = 0.
The alternative hypothesis is HA: R² > 0.
Because R² is a squared measure, it cannot be a negative number.
If the significance or probability level for the F-test is small (p < .05), then the R² is REAL (non-zero).
This means at least one B is non-zero.
Go to step 3.
If the R² is basically 0 (p > .05), any apparent influence of the predictors on the dependent variable is an ACCIDENT. STOP HERE IN THIS CASE! GO NO FURTHER!
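The Global F-test and R² are directly linked. For an equation with k independent variables estimated on n cases, the standard formula is F = (R²/k) / ((1 - R²)/(n - k - 1)), with k and n - k - 1 degrees of freedom. Below is a minimal Python sketch, assuming an ordinary least-squares regression with a constant; the function name `global_f` and the numbers are hypothetical, and the program still supplies the probability level, which this sketch does not compute.

```python
def global_f(r_squared: float, n: int, k: int):
    """Return the global F statistic and its degrees of freedom.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), for k predictors and n cases.
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R-squared must be at least 0 and below 1")
    df_model, df_error = k, n - k - 1
    if df_model < 1 or df_error < 1:
        raise ValueError("need at least one predictor and more cases than k + 1")
    f = (r_squared / df_model) / ((1.0 - r_squared) / df_error)
    return f, df_model, df_error


# Hypothetical numbers, not GSS results: a small R-squared with many cases.
f, df1, df2 = global_f(r_squared=0.05, n=2000, k=3)
print(f"F = {f:.1f} with {df1} and {df2} degrees of freedom")
```

With three predictors and 2,000 hypothetical cases, even an R² of .05 gives an F of about 35, far beyond what chance would produce. That is why the next step asks separately whether R² is strong enough to matter.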
THIRD, see if the STRENGTH of R² is at least weak (.11 or more).
If yes, continue to step 4.
If R² is .10 or smaller, your results are real but not practically important.
Interpret any Bs with extreme caution.
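Steps SECOND and THIRD form a simple decision rule. A minimal Python sketch follows; the function name is hypothetical, and the cutoffs (.05 for the F-test probability, and .10 or below as "not practically important," following the ".11 plus" mark for a weak R²) are the ones used in these steps.

```python
def assess_r_squared(r_squared: float, p_value: float, alpha: float = 0.05) -> str:
    """Steps SECOND and THIRD: is R-squared real, and is it at least weak?"""
    if not p_value < alpha:
        # Step 2 fails: R-squared is basically 0 within sampling error.
        return "STOP: R-squared is not statistically significant"
    if r_squared <= 0.10:
        # Step 3 fails: real, but not practically important.
        return "CAUTION: real but not practically important"
    return "GO ON: examine each B"


print(assess_r_squared(0.25, 0.0001))
print(assess_r_squared(0.05, 0.0001))
print(assess_r_squared(0.25, 0.20))
```

The order matters: strength is only worth discussing once the F-test says R² is not a sampling accident.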
FOURTH, NOW examine each of the Bs.
The null hypothesis for each separate B is Ho: B = 0.
The alternative hypothesis is HA: B ≠ 0 for a 2-tailed test,
or HA: B > 0 (or B < 0, whichever direction you predicted) for a 1-tailed test.
You can use a 1-tailed (1-sided) test if you predicted the direction of the effect of each independent variable on the dependent variable in advance. (Re-examine the box in the early part of this assignment.)
If you use a 1-tailed test, you can cut the probability level associated with each B in half, provided the B has the sign you predicted. (The probability levels the program prints for the Bs are for 2-tailed tests.) That means that some of the smaller Bs in a regression will be statistically significant where they would not be were a 2-tailed test used.
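The halving rule can be sketched in Python. This is a minimal illustration with a hypothetical function name and made-up values. It assumes, as with the t-test, a symmetric sampling distribution, so a B that comes out opposite to the predicted direction gets a 1-tailed probability of 1 minus half the printed value.

```python
def one_tailed_p(two_tailed_p: float, b: float, predicted_sign: int) -> float:
    """Convert a printed 2-tailed probability level into a 1-tailed one.

    predicted_sign is +1 if you predicted a positive B, -1 for a negative B.
    Halving is only legitimate when B has the predicted sign.
    """
    if predicted_sign not in (1, -1):
        raise ValueError("predicted_sign must be +1 or -1")
    if not 0.0 <= two_tailed_p <= 1.0:
        raise ValueError("a probability must be between 0 and 1")
    if b * predicted_sign > 0:
        return two_tailed_p / 2
    # B came out in the opposite direction: the 1-tailed test cannot help.
    return 1 - two_tailed_p / 2


# Hypothetical: you predicted that more siblings means less education (a negative B).
print(one_tailed_p(0.08, b=-0.15, predicted_sign=-1))  # 0.04, significant at .05
print(one_tailed_p(0.08, b=0.15, predicted_sign=-1))   # wrong direction: about .96
```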
B can be positive or negative.
The test for the statistical significance of each B simultaneously tests whether the accompanying Beta Weight (BETA) is zero, too.
Any B whose absolute value is less than twice its own standard error will usually have a significance level greater than .05. (Yes, you read that right.)
This means any apparent influence of that B is so small that it could be a sampling ACCIDENT, and we treat that B as essentially 0.
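The "twice its standard error" rule comes from the t-ratio, t = B / SE. Below is a minimal Python sketch using the standard library's `statistics.NormalDist`. It assumes a large sample, where the normal curve is a close stand-in for the t distribution the program actually uses; the function name and the Bs and standard errors are hypothetical.

```python
from statistics import NormalDist


def t_ratio_and_p(b: float, se: float):
    """Return t = B / SE and an approximate 2-tailed probability level.

    Uses the normal curve as a large-sample stand-in for the t distribution.
    """
    if se <= 0:
        raise ValueError("a standard error must be positive")
    t = b / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Hypothetical Bs and standard errors, not GSS results.
for b, se in [(0.30, 0.15), (0.19, 0.10), (-0.05, 0.04)]:
    t, p = t_ratio_and_p(b, se)
    print(f"B = {b:+.2f}, SE = {se:.2f}, t = {t:+.2f}, p = {p:.3f}")
```

In large samples the .05 cutoff falls at a t of about 1.96, which is why "twice its standard error" works as a rule of thumb: a B just at 2 SE is barely significant, and one at 1.9 SE is not.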
Use a marker to note the Bs with statistical significance p < .05.
These are REAL or nonzero.
Discuss how the statistically significant Bs raise or lower scores on the dependent variable. Follow my pounds-of-weight example: for each 15-minute period a woman exercised, she would weigh 1 pound less.
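The estimated equation has the form: predicted educ = a + B1(sibs) + B2(family16) + B3(maeduc). The Python sketch below uses invented coefficients (like the weight example, they are NOT GSS results; the dictionary and function names are hypothetical). It shows that a one-unit change in one predictor, holding the others constant, changes predicted educ by exactly that predictor's B. For the family16 dummy, B is the difference between the two parent group (1) and the "Other" group (0).

```python
# Invented coefficients for illustration only; replace them with your own output.
HYPOTHETICAL_COEFS = {"constant": 10.0, "sibs": -0.2, "family16": 0.5, "maeduc": 0.3}


def predicted_educ(sibs: float, family16: int, maeduc: float,
                   coefs: dict = HYPOTHETICAL_COEFS) -> float:
    """Predicted years of education from a three-predictor regression equation."""
    return (coefs["constant"]
            + coefs["sibs"] * sibs
            + coefs["family16"] * family16
            + coefs["maeduc"] * maeduc)


base = predicted_educ(sibs=2, family16=0, maeduc=12)
# Each difference below equals one B, because only one predictor changes at a time.
print("one more year of mother's education:", round(predicted_educ(2, 0, 13) - base, 2))
print("two parent family instead of Other:", round(predicted_educ(2, 1, 12) - base, 2))
print("one more brother or sister:", round(predicted_educ(3, 0, 12) - base, 2))
```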
CLICK HERE TO REVIEW THE WEIGHT EXAMPLE.
FIFTH, look at the BETA weights of the SIGNIFICANT Bs. (Remember that we treat the Bs that were not statistically significant as 0 in the population, and the same goes for the corresponding Beta Weights.)
Rank the Beta Weights from most to least important in terms of absolute value size.
Discuss the strength and direction of each statistically significant beta weight.
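Each Beta Weight is its B rescaled into standard-deviation units: Beta = B × (standard deviation of the independent variable ÷ standard deviation of the dependent variable). This is one reason the univariate standard deviations appear in your output. The minimal Python sketch below uses invented Bs and standard deviations (NOT GSS results; the function names are hypothetical) and ranks by absolute value, as this step asks.

```python
def beta_weight(b: float, sd_x: float, sd_y: float) -> float:
    """Standardized coefficient: B times SD(x) divided by SD(y)."""
    return b * sd_x / sd_y


def rank_by_absolute_value(weights: dict) -> list:
    """Names ordered from largest to smallest absolute weight (sign ignored)."""
    return sorted(weights, key=lambda name: abs(weights[name]), reverse=True)


# Invented values for three statistically significant Bs; SD of educ assumed to be 3.0.
sd_educ = 3.0
invented = {"sibs": (-0.2, 2.5), "family16": (0.5, 0.45), "maeduc": (0.3, 3.5)}  # (B, SD of x)
betas = {name: beta_weight(b, sd_x, sd_educ) for name, (b, sd_x) in invented.items()}
bs = {name: b for name, (b, _) in invented.items()}

print("ranked by |Beta|:", rank_by_absolute_value(betas))
print("ranked by |B|:   ", rank_by_absolute_value(bs))
```

In this invented case the two rankings disagree because the Bs are in different metrics (number of siblings, a 0/1 dummy, years of schooling).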
Here's what you turn in to me by class Monday November 29 (you may add a short explanation to your answer to any of these questions).
Points for each part are in parentheses:
Your printed output (2 points maximum) for:
The global F-Test statistics and t-tests for each B
The univariate means and standard deviations, and
The correlation matrix of all four variables in this regression.
REMEMBER TO RECODE SIBS AND FAMILY16
THEN, YOUR ANSWERS TO QUESTIONS 1-12 BELOW:
(1) (1 point) Describe the numeric values of the three zero-order (bivariate) Pearson r correlations between each of your independent variables and your dependent variable educ. (Include DIRECTION.)(Form is assumed to be linear because these are Pearson's r.)
(2) (1 point) Describe the strength and direction of each of these three correlations from part (1) in words.
CLICK HERE TO REVIEW THE CHART ON CORRELATION COEFFICIENT STRENGTH
(3) (2 points) Overall, how much variance ALL TOGETHER (i.e., R²) did you predict in educ with your three independent variables?
(4) (1 point) Was the value of R² statistically significant (that is, "REAL" or non-zero)?
(5) (1 point) What was the "significance" level or the probability level of the R² for this regression?
(6) (2 points) Write out the estimated numeric regression equation for educ using your independent variables of sibs, family16, and maeduc.
(7) (2 points) What was the probability level (significance level) for the effect of each independent variable [found through the t-test for each B] on educ?
You can construct a chart that shows the dependent variable (educ), and how educ was affected by each independent variable, showing the Bs, beta weights, and significance levels. This is a short way to present the numeric results but such a chart cannot substitute for describing the effects in words. (See examples in Guide 8).
(8) (2 points) Interpret each of the effects of sibs, family16, and maeduc on educ in words [i.e., how much the years of respondent education rose or fell for a one year change in maeduc, or how many more or fewer years of education the respondent had if they grew up in a two parent family.]
(9) (1 point) What was the relative impact (the BETA weight) of each of your independent variables on educ? Describe these results numerically for each independent variable.
(10) (1 point) Describe the strength (remember to use our chart!) and the direction of each of the beta weights on educ.
(11) (2 points) Rate each independent variable from most to least important in terms of how much each predictor influenced the dependent variable (remember to ignore the + or - signs while you actually rank the effects of the independent variables, and use the absolute value!).
(12) (2 points) Did you use the "B"s or the "Betas" for part (11)? BRIEFLY describe the reason behind your choice.
Susan Carol Losh, November 21, 2004