ASSIGNMENT 4: DUE FRIDAY, NOVEMBER 12 BY 3 PM
PUT IN MY MAILBOX
FALL 2004, DR. SUSAN CAROL LOSH
PLEASE NOTE THE DUE DATE: IT IS FRIDAY, NOVEMBER 12, BY 3 PM!
OR YOU MAY ALSO E-MAIL US. PLEASE DO NOT E-MAIL AFTER 8 PM THURSDAY NIGHT: IF YOU E-MAIL ME ON FRIDAY MORNING, I WILL NOT BE ABLE TO RESPOND TO YOU. REMEMBER: NO E-MAIL ATTACHMENTS! THANK YOU.
OF COURSE YOU'LL DO COMPUTER RUNS IMMEDIATELY! Strange things can happen, and computer systems all have their glitches.
Remember this one? Your probability level may be printed as a set of zeros, like this: .00 or .0000 (SDA and SPSS both do this). The probability is not really zero; it is simply smaller than the last decimal place the program displays.
Please observe the correct terminology (e.g., p < .01) in reporting your results, depending on the number of zeros that appear in your output.
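If it helps to see the reporting rule spelled out, here is a small Python sketch (the function name report_p is made up for illustration). It assumes you copy the probability exactly as printed. When every printed digit is zero, it reports p as smaller than the last displayed decimal place; otherwise it simply echoes the value as "p = ...", because this page only gives a rule for the all-zeros case.

```python
def report_p(displayed: str) -> str:
    """Turn a probability as printed by SDA or SPSS into report wording.

    A p printed as all zeros (".00", ".0000") is not really zero: it is
    smaller than the last decimal place shown, so we report p < .01,
    p < .0001, and so on. Other values are echoed as "p = value".
    """
    text = displayed.strip()
    if text.startswith("0."):
        text = text[1:]                      # "0.00" -> ".00"
    digits = text.lstrip(".")
    if digits and set(digits) == {"0"}:
        # k zeros shown -> p is below 1 in the k-th decimal place
        return "p < ." + "0" * (len(digits) - 1) + "1"
    return "p = " + text


for shown in [".00", ".0000", "0.00", ".03"]:
    print(shown, "->", report_p(shown))
```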
In this assignment, you'll use the SDA system and the General Social Survey to see how a control variable may (or may not) alter the original correlation between two variables.
Here are the first two basic questions we ask about a relationship between two variables:
This question typically examines what is called the "statistical significance" of the association.
Question 2 requires you to:
If the correlation coefficient is statistically significant and at least weak (at least |.11| in magnitude, but ideally moderate or more), continue to question 3.
REVIEW
Below are the six possible outcomes to the bivariate correlation when you add a third or control variable:
First, you must decide whether ANY CHANGES have occurred in the original bivariate relationship when you examine it across the separate values of the control variable.
If changes do occur, decide which ONE of the following patterns you have:
THEN STOP!!
If you do not have extraneous, joint, or interaction effects, study the pattern of your results for one of the following:
BE SURE TO KEEP THE DIFFERENCE STRAIGHT BETWEEN:
(1) THE PROBABILITY LEVEL USED TO TEST
WHETHER THE RELATIONSHIP IS ZERO AND
(2) THE CORRELATION COEFFICIENT USED
TO ESTIMATE THE STRENGTH OF THE ASSOCIATION.
This total assignment is worth 20 points.
Correctly following all programming information for running frequency distributions on the three variables, the degree-by-sex crosstabulation, and the three-way crosstabulation, and turning in all output = 2 points.
Although
your actual output does not count very heavily, I MUST receive your
output in order for you to receive credit on this assignment.
QUESTIONS YOU WILL NEED TO ANSWER, BASED
ON YOUR OUTPUT DATA
Was there any association between degree and sex in your table for the total sample? (Was the association real or accidental?)
What was the significance level for the association between degree and sex? (Remember: a row of .00s here really means p < .01.)
Was this relationship curvilinear (non-linear), monotonic, approximately linear, or couldn't you tell? How did you know?
What was the numeric value of this correlation? Describe the strength [and direction if applicable] of this association in words.
NEXT: Which correlation coefficient should you use to measure any association between year and degree? BRIEFLY, WHY should you use that particular correlation coefficient? [If you use phi, examine the table size to see if you should use Cramer's V instead.]
Was there any association between year and degree in your table for the total sample? (Was the association real or accidental?)
What was the significance level for the association between year and degree? (Remember: a row of .00s here really means p < .01.)
Was this relationship curvilinear (non-linear), monotonic, approximately linear, or couldn't you tell? How did you know?
What was the numeric value of this correlation? Describe the strength [and direction if applicable] of this association in words.
Use the same kind of correlation coefficient that you chose for the relationship between year and degree IN THE TOTAL SAMPLE. What were the numeric values of the correlations between year and degree for men and for women?
What are your causal conclusions about the type of relationship considering all three variables together?
Do you have an extraneous relationship, a joint relationship, an interaction effect, an intervening (indirect or mediated) relationship, a spurious relationship, or a suppressed relationship?
Remember, you can make ONLY ONE CHOICE from among the SIX outcomes immediately above. (If your correlations for Men and Women differ by at least |.10|, you have an interaction effect, and that is what you call it.)
In a sentence or so, discuss the reasoning behind your decision.
Lots of questions here, but take everything step by step and you should be able to answer them all.
OVERALL
FIRST, access SDA and the General Social Survey.
SECOND, run frequencies on all variables for this computer session (year, degree, sex). It is always a good idea to click the percents box to get an idea of the relative distributions.
THIRD, you will run your assigned crosstabulations. You will select the most appropriate statistics for your data and then decide what kind of causal relationship you have.
SPECIFICALLY
1. You will run univariate FREQUENCIES on the three variables for this exercise. You will watch for out of order codes, missing data, "wild punches," and other anomalies so that you can recode these if needed or restrict the range of valid codes.
You will run frequencies on:
sex (male or female)
degree (respondent's highest degree level, in five categories)
year (year of study--we will only use 1972 and 2002)
2. You will then conduct a crosstabulation of degree by sex for the total sample. In your program, you will request statistics and column percentages.
3. Based on your crosstabulation, you will select the BEST or MOST APPROPRIATE correlation coefficient for your data. You will decide whether any apparent association between degree and the respondent's gender is ACCIDENTAL or REAL. You will report the level of "statistical significance" of this association.
4. You will report the magnitude of the correlation coefficient, its direction (if appropriate), its form (IF APPROPRIATE), and how strong the correlation coefficient is in words.
5. Next, you will conduct a crosstabulation of degree by year for the total sample. In your program, you will request statistics and column percentages. (NOTE: this will be part of your control variable run. It comes out as the very last table in your output when you run the three-way contingency table.)
6. Based on your crosstabulation, you will select the BEST or MOST APPROPRIATE correlation coefficient for your data. You will decide whether any apparent association between year and the respondent's degree level is ACCIDENTAL or REAL. You will report the level of "statistical significance" of this association.
7. You will report the magnitude of the correlation coefficient, its direction (if appropriate), its form (if appropriate), and how strong the correlation coefficient is in words.
8. At the same time you conduct the crosstabulation for degree by year for the total sample, you will request separate crosstabulations of degree by year for males and for females, using sex as the control variable. You will assess the statistical significance of the correlation separately for each sex, its magnitude, direction (if appropriate), its form (if appropriate), and its strength in words.
At step 8, you have conducted a multivariate
crosstabulation (sometimes called a "three-way crosstab").
For most of you it is your very first
one. Congratulations!
9. You will decide on the general causal status of the three-way crosstabulation. Is it:
direct
joint
statistical interaction
intervening
spurious or
suppressed?
You may choose only ONE of these six alternatives.
Accessing an online database (the General Social Survey file) and the SDA system.
Running univariate frequency and percentage distributions.
Using a selection filter to select only two years: here, 1972 and 2002.
Recoding the values of a variable into fewer values.
Conducting a crosstabulation to generate:
NEW FEATURES include:
Use the RIGHT toe of your mouse to click on this link:
When the menu opens on the link, click on:
Open in New Window
Click on the START button to pull up the statistical program selection screen.
You can always switch back to this screen by clicking on the box at the very bottom of the monitor screen that reads "Assignment 4". Or you can print out the pages of Assignment 4.
Once again, you will bring up the "radio
buttons" screen to select an analytic option, in this case, Frequencies
or crosstabulation.
The
directions that you place in the SDA boxes will direct the program to perform
a series of crosstabulations.
In the Study: GSS 1972-2002
Cumulative Datafile screen that opens: first click on:
to open up the codebook window. Remember to click on the "Standard Codebook" link.
Use the radio button to select Frequencies or crosstabulation. Then, click on:
so that you have the SDA program window active as well.
PRELIMINARY FREQUENCIES
REMEMBER! The first step in working
with data is to ALWAYS display the total frequencies on all the variables
that you plan to analyze for a particular research project.
After you have accessed the General Social Survey data and the SDA system, be sure to run the original frequencies for:
1. sex
2. degree, and
3. year
Obtain the frequencies ONLY. (This includes percentages too!) You DO NOT need measures of central tendency, dispersion, or other univariate measures for this assignment.
BE SURE TO IMMEDIATELY PRINT THESE FREQUENCIES PAGES AND INCLUDE THEM WITH YOUR OUTPUT.
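The logic of a frequencies run, including setting aside codes that fall outside the valid range, can be sketched in a few lines of Python. This is not how SDA works internally: the function name is hypothetical, and treating degree codes 0 through 4 as the valid codes is an assumption for illustration (check the Standard Codebook for the actual codes).

```python
from collections import Counter


def frequencies(values, valid_codes):
    """Frequency and percentage distribution for one variable.

    Codes outside valid_codes (missing data, "wild punches") are tallied
    separately so you can recode them or restrict the valid range.
    Percentages are based on valid cases only.
    """
    counts = Counter(values)
    valid = {code: counts[code] for code in valid_codes}
    n_valid = sum(valid.values())
    percents = {code: 100.0 * n / n_valid for code, n in valid.items()} if n_valid else {}
    invalid = {code: n for code, n in counts.items() if code not in valid_codes}
    return valid, percents, invalid


# Invented degree codes: 9 stands in for "no answer", 7 for a wild punch.
valid, percents, invalid = frequencies([0, 1, 1, 3, 9, 1, 4, 7], valid_codes=range(5))
for code in valid:
    print(code, valid[code], f"{percents[code]:.1f}%")
print("outside valid range:", invalid)
```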
See the directions below for "what goes in the boxes" of the SDA Program screen.
Pull up the window for the SDA Program Screen. Again for your review, below is the example screen for running frequencies or crosstabulations that appeared when you clicked the "START" button on the SDA beginning window.
We can be reasonably sure that time will influence educational level in the United States because people in more recent years are better educated. Most states and the federal government have tried to prevent high school dropouts and encourage high school graduates to attend college. I cannot imagine any circumstances under which one's level of education would influence year (try the "giggle factor" here if nothing else).
We can be totally sure that neither the study year nor someone's level of education will influence whether they are male or female! On the other hand, one sex or the other may have increased their educational level disproportionately over time.
FIRST, you will run the crosstabulation between degree and sex.
Row: degree
Make sex your Column: variable
REMEMBER:
Where it says Selection Filter(s):
put: year (1972,2002)
to select only respondents from 1972 and
2002.
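Here is a rough Python sketch of what the selection filter year(1972,2002) and the degree-by-sex table do to the data. The records are invented, not GSS cases, and the function names and the field names year, sex, and degree are just labels for this example.

```python
from collections import Counter

# Invented respondent records (not GSS data), for illustration only.
records = [
    {"year": 1972, "sex": "male", "degree": 0},
    {"year": 1972, "sex": "female", "degree": 1},
    {"year": 1984, "sex": "female", "degree": 3},
    {"year": 2002, "sex": "male", "degree": 3},
    {"year": 2002, "sex": "female", "degree": 1},
    {"year": 1990, "sex": "male", "degree": 4},
]


def select_years(rows, years=(1972, 2002)):
    """Keep only respondents interviewed in the listed years, like year(1972,2002)."""
    return [r for r in rows if r["year"] in years]


def crosstab(rows, row_var, col_var):
    """Count cases in each (row category, column category) cell."""
    return Counter((r[row_var], r[col_var]) for r in rows)


kept = select_years(records)
table = crosstab(kept, "degree", "sex")   # degree in the rows, sex in the columns
print(len(kept), dict(table))
```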
Be sure that the "Column" box is checked under Percentaging:
Now, click on the boxes to the LEFT of:
"Statistics"
AND
"Question
text"
If you like, you can change the number of decimal places ONLY for statistics to "4". However, there is a good chance this won't work, and you will get only 2 decimal places anyway under statistical significance...
Click on the gray box at the bottom
left that says:
IMMEDIATELY PRINT THESE PAGES BEFORE YOU BEGIN THE THREE-WAY CROSSTABULATION RUN BELOW!
REQUIRED Variable names to specify
Now for the three-way crosstabulation. You will also be able to get the BIVARIATE crosstabulation of degree by year for the total sample from this run. It will be the VERY LAST TABLE (and include all the valid cases) on the run output.
In
the Row box, type: degree
degree goes in the "row" line because it is your dependent variable.
Next to the Column: line, type: year
This will make year, your independent
variable, the column variable, per MOST conventions.
Next to the Control: line, type: sex
Be sure to switch "sex" to the Control: line. This will make sex your control variable. This command will generate THREE crosstabulation tables for the association between degree level and time: one for males, one for females, and, as the very last table, one for the total sample.
Next to Percentaging: be sure the little
box to the LEFT of "Column" is checked.
(Click to check this box if it is blank.)
This WILL produce column percents because you have both rows and columns.
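A control variable simply splits the sample and rebuilds the same table within each of its categories, and column percents divide each cell by its column total. The Python sketch below (hypothetical function names, invented records that are not GSS data) shows both steps, producing one degree-by-year table for each sex plus the total-sample table.

```python
from collections import Counter, defaultdict


def three_way(rows, row_var, col_var, control_var):
    """Split a row-by-column table by a control variable.

    Returns one table of cell counts per control category (the partial
    tables) plus the combined table for the total sample.
    """
    partials = defaultdict(Counter)
    total = Counter()
    for r in rows:
        cell = (r[row_var], r[col_var])
        partials[r[control_var]][cell] += 1
        total[cell] += 1
    return dict(partials), total


def column_percents(counts):
    """Percent of each column's cases falling in each row (each column sums to 100)."""
    col_totals = defaultdict(int)
    for (_, col), n in counts.items():
        col_totals[col] += n
    return {(row, col): 100.0 * n / col_totals[col]
            for (row, col), n in counts.items()}


# Invented records, already filtered to 1972 and 2002.
people = [
    {"year": 1972, "sex": "male", "degree": 0},
    {"year": 1972, "sex": "male", "degree": 1},
    {"year": 1972, "sex": "female", "degree": 0},
    {"year": 2002, "sex": "male", "degree": 3},
    {"year": 2002, "sex": "female", "degree": 1},
    {"year": 2002, "sex": "female", "degree": 3},
]

partials, total = three_way(people, "degree", "year", "sex")
for label, table in [("male", partials["male"]), ("female", partials["female"]), ("total", total)]:
    print(label, column_percents(table))
```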
Now,
click on the boxes to the LEFT of:
"Statistics"
AND
"Question
text"
Remember to ONLY use the P or Pearson
Chi-Square for this assignment.
Also recall that if you choose to use Phi for any of your correlations, you can easily calculate it if you:
(1) take the Pearson Chi-Square,
(2) divide it by the casebase (n), then
(3) take the SQUARE ROOT of the result at Step 2.
That's it, that's Phi.
If you want Phi-squared instead, just stop at Step 2.
Use the Cramer's V correction in the denominator if you have a 3 by 3 table OR LARGER: at Step 2, divide the Chi-Square by n times (the smaller of the number of rows or columns, minus 1). The phi formula will work for a "2 by anything" table, because there the correction is just n times 1.
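The phi steps and the Cramer's V correction can be checked with a short Python sketch. It also computes the Pearson chi-square from cell counts so you can see where that number comes from; the function names are hypothetical, the 2 by 2 counts are made up for illustration, and SDA will of course compute chi-square for you.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square for a table given as a list of rows of cell counts.

    Assumes no row or column is entirely empty.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def phi(chi2, n):
    """Steps (1)-(3): divide chi-square by n, then take the square root."""
    return math.sqrt(chi2 / n)


def cramers_v(chi2, n, n_rows, n_cols):
    """Phi with n multiplied by (smaller table dimension - 1) in the denominator."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


# Made-up 2 x 2 counts, for illustration only.
chi2, n = pearson_chi_square([[10, 20], [20, 10]])
print(round(chi2, 4), round(phi(chi2, n), 4))   # 6.6667 0.3333
```

For any "2 by anything" table the smaller dimension is 2, so the Cramer's V denominator is n times 1 and V equals phi.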
Now click the gray box at the bottom left that says:
Here is your output for your first multivariate crosstabulation run using
the SDA system.
Go ahead and print your "three-way crosstab" NOW so that you have these
pages to study while you complete this assignment. They also form part
of the Assignment 4 that you will turn in to me.
My mailbox: 307 Stone (you may also turn it in at class on 11-10-04)
Here's
what you turn in to me by 3 PM Friday November 12 (you may add a short
explanation to your answer to any of these questions). Points for each
part are in parentheses:
Your printed output (2 points maximum) for:
THEN, YOUR ANSWERS TO QUESTIONS 1-15 BELOW:
(1) (2 points) Which correlation coefficient should you use to measure any association between degree and sex? BRIEFLY, WHY should you use that particular correlation coefficient? [If you use phi, examine the table size to see if you should use Cramer's V instead.]
(2) (1 point) Was there any association between degree and sex in your table for the total sample? (Was the association real or accidental?)(Was the association zero or non-zero--these three questions are equivalent.)
(3) (1 point) What was that significance level for the association between degree and sex? (Remember: a row of .00s or 1.00 here really means p < .01.)
(4) (1 point) Was this relationship curvilinear (non-linear), monotonic, approximately linear, or couldn't you tell? [This is often called the FORM of the relationship.] How did you know?
(5) (1 point) What was the numeric
value of this correlation? Describe the strength [and direction if applicable]
of this association in words.
(6) (2 points) Which correlation coefficient should you use to measure any association between year and degree? BRIEFLY, WHY should you use that particular correlation coefficient? [If you use phi, examine the table size to see if you should use Cramer's V instead.]
(7) (1 point) Was there any association between year and degree in your table for the total sample? (Was the association real or accidental?) Remember, the crosstabulation table for males and females combined will be the THIRD table in your program output.
(8) (1 point) What was that significance level for the association between year and degree? (Remember: a row of .00s or 1.00 here really means p < .01.)
(9) (1 point) Was this relationship curvilinear (non-linear), monotonic, approximately linear, or couldn't you tell? [This is often called the FORM of the relationship.] How did you know?
(10) (1 point) What was the numeric value of this correlation? Describe the strength [and direction if applicable] of this association in words.
Now, what happens when you use your control variable?
(11) (1 point) Use the same kind of correlation coefficient that you chose for part (6) above [NOT part (1)]. What were the numeric values of the correlations between year and degree for males and for females?
(13) (1 point) What were the levels of statistical significance for each of the two correlations in part (11)?
(14) (1 point) Describe the strength of each correlation (direction, form if applicable) in part (11) in words.
(15) (2 points) What are your causal conclusions about the type of relationship considering all three variables together?
Do you have an extraneous relationship, a joint relationship, an interaction effect, an intervening (indirect) relationship, a spurious relationship, or a suppressed relationship?
Remember, you can make ONLY ONE CHOICE from among the outcomes immediately above. (If your correlations for Males and Females differ by at least |.10|, you have an interaction effect and that is what you must call it.)
In a sentence or so, discuss the reasoning
behind your decision.
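The interaction rule in question (15) is mechanical enough to write down. In this Python sketch (the function name is hypothetical) the two partial correlations are assumed to be the same kind of coefficient, as question (11) asks. The difference is rounded before the comparison so that floating-point error cannot turn a difference of exactly .10 into .0999... Choosing among the other five outcomes still requires comparing the partials with the total-sample correlation, which the sketch does not attempt.

```python
def check_interaction(r_male, r_female, threshold=0.10):
    """Apply the assignment's rule to the two partial correlations.

    If the correlations for males and females differ by at least |.10|,
    the outcome is an interaction effect.
    """
    # Round so that, e.g., .35 - .25 is treated as exactly .10.
    difference = round(abs(r_male - r_female), 4)
    if difference >= threshold:
        return "interaction effect"
    return "no interaction: go on to the other outcomes"


print(check_interaction(0.35, 0.25))   # interaction effect
print(check_interaction(0.30, 0.26))   # no interaction: go on to the other outcomes
```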
NOTE ON THE SIGN OF YOUR CORRELATION: the program reads direction from table position, so it assumes that either
(1) the top left-hand cell contains the highest values for both the independent and the dependent variables and the bottom right-hand cell contains the lowest values for both the independent and the dependent variables OR
(2) the top left-hand cell contains the lowest values for both the independent and the dependent variables and the bottom right-hand cell contains the highest values for both the independent and the dependent variables.
Obviously this is not a problem if one of your variables is nominal in the cross-tab table but it could be a problem if both your variables are at least ordinal. Because the top left-hand cell is low for both variables in this example (year = 1972 and degree = less than high school) your correlation coefficient this time around will be OK.
However, in other variable codings (e.g.,
the top left-hand cell is "high-low"), the sign of the correlation
coefficient may be reversed and the analyst must change the signs of the
correlation coefficients. Be alert to this issue in analyses you may do
later on for theses, dissertations, conference papers, reports, articles,
etc. and how your independent and dependent variables are coded. Remember
that computers are robots and they go by table position, they don't really
know which cells you meant were the "high" cells. SPSS and
SAS follow similar conventions for tables.
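To see why table position matters, the Python sketch below computes Goodman and Kruskal's gamma for an invented degree-by-year table. Gamma is used here only as an example of an ordinal coefficient (this page does not say which coefficient you should choose), and the counts are made up. Flipping the order of the rows, so the top left-hand cell becomes "high-low", reverses the sign even though the data are identical.

```python
def gamma(table):
    """Goodman and Kruskal's gamma for a table of counts.

    Rows and columns must both be listed in order, so that table position
    stands for "lower" and "higher" values. Tied pairs are ignored.
    """
    concordant = discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # every cell in a lower row
                for m in range(n_cols):
                    if m > j:                   # lower and to the right: same order
                        concordant += table[i][j] * table[k][m]
                    elif m < j:                 # lower and to the left: opposite order
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)


# Invented counts: rows = degree from lowest to highest, columns = 1972, 2002.
table = [[30, 10],
         [20, 20],
         [10, 30]]
print(round(gamma(table), 3), round(gamma(table[::-1]), 3))   # 0.615 -0.615
```

Nothing about the respondents changed between the two calls; only the order of the rows did.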
Susan Carol Losh, November 2, 2004