FALL 2004 DR SUSAN CAROL LOSH
YOU MAY E-MAIL US. HOWEVER, PLEASE DO NOT E-MAIL AFTER 8 PM TUESDAY NIGHT, OCTOBER 19. IF YOU E-MAIL ME WEDNESDAY MORNING, I WILL NOT HAVE TIME TO RESPOND TO YOU. REMEMBER: I DO NOT TAKE E-MAIL ATTACHMENTS! THANK YOU.
This assignment gives you practical experience with the first two basic questions we ask about a relationship between two variables:
#1. Is the relationship zero or nonzero in the population?
This question typically examines what is called the "statistical significance" of the association. It requires you to make an inference about whether the association that you observed in your sample is basically zero, or something larger in absolute value (either positive OR negative) than zero, in the population.
You answer this question by looking at the probability level (p) associated with the correct probability density function (pdf), such as the Chi-Square, the t, or the F distribution. In classical statistics we begin with the null hypothesis that there is no relationship between two variables. Under that null hypothesis, the pdf assigns a probability to each possible outcome.
If the probability of getting our results by chance when there is no relationship is very small (typically 0.05 or smaller), we conclude the relationship in the population is probably REAL (nonzero) and we reject the null hypothesis of no association.
If the probability level is relatively large (greater than 0.05), we do not reject the null hypothesis: your observed sample results could easily be an ACCIDENT of the random laws of chance, so we treat the association in the population as zero (no relationship at all).
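The decision rule for Question 1 can be written as a short Python sketch. The function name `decide` and the two example p-values are made up for illustration; only the 0.05 cutoff comes from the text above.

```python
ALPHA = 0.05  # conventional cutoff used in this assignment


def decide(p_value, alpha=ALPHA):
    """Return the decision about the null hypothesis of no association."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a probability must lie between 0 and 1")
    if p_value <= alpha:
        # Small probability of getting these results by chance alone
        return "reject null: association is probably REAL (nonzero)"
    return "do not reject null: result may be an ACCIDENT of sampling"


# Hypothetical p-values, NOT actual GSS output
print(decide(0.003))
print(decide(0.27))
```

The rule only answers whether the association is zero or nonzero. It says nothing about how strong the association is, which is Question 2.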
#2. How strong is the relationship?
This is the issue of effect size, strength, or "substantive significance."
Question 2 requires you to:
Select the BEST and MOST APPROPRIATE correlation coefficient for your data from among the many that are available. Then:
assess the strength of that correlation coefficient.
In this assignment, you will also decide whether it is better to use a crosstabulation table to examine the relationship between two particular variables, or whether in this case you should use a difference of means test.
IMPORTANT! CRITICALLY IMPORTANT. PLEASE MEMORIZE THE PARAGRAPH BELOW.
The value of the correlation coefficient can be found under names such as Phi, Tau-b, r, or eta.
In this assignment, you'll use the SDA and the General Social Survey to:
(1) analyze and interpret a bivariate crosstabulation table and
(2) using the SAME TWO VARIABLES, examine the difference between two groups with a difference of means test.
Your independent variable is study year, 1980 or 2002. Your dependent variable is "adults," or the number of adults in the individual's household.
Why might we expect a difference by year?
Since 1980 in the United States, marriage rates have dropped and divorce
rates have risen (if you find this hard to believe, after you have finished
the assignment run below, do a second computer cross tabulation run
and substitute "marital" as the row variable for "adults".) As a result,
the number of household adults may have dropped.
This total assignment is worth 20 points.
Correctly following all programming information for running frequency distributions on the two variables, the crosstabulation, and the difference of means test, and turning in all output = 2 points.
Although your actual output does not count very heavily, I MUST receive your output in order for you to receive credit on this assignment.
Is the variable "year" nominal, ordinal, interval or ratio? Is the variable "number of adults" (adults) nominal, ordinal, interval or ratio? (1 point for both total)
Looking first at the crosstabulation table, which correlation coefficient is the MOST APPROPRIATE to use to describe the relationship between study year and the number of household adults? Why? (2 points)
Referring to this correlation coefficient that you chose, did you have an association between study year and the number of household adults? (That is, was this association REAL or was it an ACCIDENT?) (2 points)
What was the level of statistical significance or p-level (type one or "alpha" error) associated with this correlation coefficient? (Give the probability numeric value.) (2 points)
What was the numeric value of the correlation coefficient you chose between study year and the number of household adults (whether it was statistically significant--or not)? (1 point)
Characterize the strength (and direction, if appropriate) of this correlation in words. (1 point)
Turning now to the difference of means test for the number of household adults by study year, was your estimate of the difference by year ZERO or some difference in ABSOLUTE VALUE THAT WAS GREATER THAN ZERO? (2 points)
What was the level of statistical significance or p-level (type one or "alpha" error) associated with this difference of means test? (Give the probability numeric value.) (2 points)
What was the numeric value of the actual t-test difference in means of the number of household adults between 1980 and 2002 (whether it was statistically significant--or not)? (1 point) This is the t difference (not the difference between the original mean scores).
What was the value of ETA (NOT eta-squared)? (1 point)
Characterize the strength of eta (NOT eta-squared) in words. (1 point)
In this case, which do you think is more appropriate to use to assess the relationship between study year and the number of household adults: the crosstabulation table or the difference of means test (or do you want to use both)? GIVE THE RATIONALE BEHIND YOUR DECISION. (2 points)
1. You will run FREQUENCIES on the two variables for this exercise because you ALWAYS check the frequencies on all the variables that you plan to use at the beginning of any analysis session. You will watch for any out of order codes, missing data codes, substantial amounts of missing data, "wild punches," and other anomalies so that you can recode these if needed or restrict the range of valid codes.
You will run frequencies on:
year (1980 and 2002 ONLY) and
adults (the number of adults in the household)
2. You will then conduct a crosstabulation of adults by year. In your program, you will request statistics and column percentages.
3. Based on your crosstabulation, you will select the BEST or MOST APPROPRIATE correlation coefficient for your data. You will decide whether any apparent association between year and the number of household adults is ACCIDENTAL or REAL. You will report the level of "statistical significance" of this association.
4. You will report the numeric magnitude of the correlation coefficient, its direction (if appropriate), and how strong (in words) the correlation coefficient is.
5. Next, you will conduct a difference of means test, using "adults" as the dependent variable and "year" (1980 and 2002 ONLY) as the independent variable. (Changes over time might influence the number of adults in the household but I hope no one believes that the number of adults in the household influences what year it is...)
6. You will decide if there is a nonzero difference (estimated for the population, of course) across time for the number of household adults. You will report the probability level or statistical significance of this difference.
7. You will report the time difference in mean number of adults per household and the value of eta for this analysis. You will assess the numeric magnitude and strength in words of the eta correlation coefficient.
8. You will decide whether it is more appropriate to use the crosstabulation table to present this analysis, the difference between means test, or whether either presentation would be appropriate to assess the relationship between the two variables, study year and the number of household adults. You will give a brief rationale for your choice of analysis method.
WHEW! It sure looks like you are doing
a LOT. However, the assignment looks like more than it really is because
I broke down exactly what you are expected to do step by step. These
procedures, decisions, and reports are very typical of the steps that researchers
follow when they want to see whether group differences (e.g., across time)
exist.
(1) Conducting a crosstabulation to generate:
(4) Assessing the numeric magnitude of the relationship between two variables.
(5) Assessing the strength in words of the relationship between two variables.
(6) Looking at the difference of means across categories of your independent variable.
(7) Deciding whether a crosstabulation
table or a difference of means test is better to describe the association
between your two variables.
Use the RIGHT toe of your mouse to click on this link:
When the menu opens on the link, click on: Open in New Window
Click on the button to pull up the statistical program selection screen.
You can always switch back to this screen by clicking on the box at the very bottom of the monitor screen that reads "Assignment 3". Or you can print out the pages of Assignment 3.
Once again, you will bring up the "radio buttons" screen to select an analytic option, in this case, Frequencies or crosstabulation. Only this time, the directions that you place in the SDA boxes will direct the program to perform a crosstabulation instead of frequencies.
In the Study: GSS 1972-2002 Cumulative Datafile screen that opens, first click on the button that opens the codebook window. Remember to click on the "Standard Codebook" link.
Use the radio button to select Frequencies or crosstabulation. Then, click on the button that makes the SDA program window active as well.
REMEMBER! The first step in working with data is to ALWAYS display the total frequencies on all the variables that you plan to analyze for a particular research project.
After you have accessed the General Social Survey data and the SDA system, be sure to run the original frequencies for:
1. year (see below for the Selection
Filter box instructions)
2. adults
Obtain the frequencies ONLY. You
DO NOT need measures of central tendency, dispersion, or other univariate
measures for this assignment.
We will ONLY examine respondents who were studied in the years 1980 and 2002.
BE SURE TO IMMEDIATELY PRINT THESE PAGES AND INCLUDE THEM WITH YOUR OUTPUT.
In the Extra Codebook Window, you can look
at the specifics for "year" and "adults." If you click on the blue
underlined shorthand abbreviation or mnemonic, you can also
examine the basic univariate frequency distribution for that particular
variable, including missing value codes and frequencies. The SDA program
sometimes, BUT DEFINITELY NOT ALWAYS, omits the cases with missing values
when it executes an analysis. That is one reason why you have to run the
frequencies for year and adults to make sure what is and is not included
when the program actually runs. For example, you are led to think there
will be people with "8 or more" adults in the household. However, these
values only range from 1 to 7 for the years 1980 and 2002. (HINT: you couldn't
have a household of "0" in the sample, although some could occur in the
population, because then there would be no adults to interview. What does
this imply for the level of measurement for the variable "adults"?)
Pull up the window for the SDA Program
Screen. Again for your review, below is the example screen for running
frequencies or crosstabulations that appeared when you clicked the "START"
button on the SDA beginning window.
[SDA frequencies/crosstabulation screen, with sections: REQUIRED Variable names to specify; Other options; Chart options]
In the Row box, type: adults
(You can just copy and paste this command into the screen.)
"adults" goes in the "row" line because it is your dependent variable.
Next to the Column: line, type: year
This will make year, your independent variable, the column variable, per MOST conventions.
Add the words: year (1980,2002) in the Selection Filter(s): box
Next to Percentaging: leave the little box checked to the LEFT of "Column". (If the checkmark is missing, just click it to make the check mark appear.)
This really WILL produce column percents this time, because you now have both rows and columns.
Now, click on the boxes to the LEFT of: "Statistics" AND "Question text"
Now that you have two variables, the "Statistics" command will produce Chi-Square and several correlation coefficients.
ONLY use the P or Pearson Chi-Square for this assignment, please.
The program does not calculate Phi. However, Phi is very easy to calculate. In case you feel that Phi is the best measure to use for this exercise, to produce Phi, just:
(1) Take the Pearson Chi-Square and
(2) Divide it by the casebase (n or
N), then
(3) Take the SQUARE ROOT of the result
at Step 2.
That's it, that's Phi.
If you want Phi-squared instead, just
stop at Step 2.
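If you would rather check your calculator work, the three steps above can be sketched in Python. The function name and the Chi-Square and N values below are made up for illustration. They are NOT the values you will see in your SDA output.

```python
import math


def phi_from_chi_square(chi_square, n):
    """Return (phi, phi_squared) from a Pearson Chi-Square and casebase N."""
    if n <= 0 or chi_square < 0:
        raise ValueError("need a positive casebase and a non-negative Chi-Square")
    phi_squared = chi_square / n   # Step 2: divide Chi-Square by N
    phi = math.sqrt(phi_squared)   # Step 3: take the square root
    return phi, phi_squared


# Hypothetical values, NOT actual GSS output
phi, phi_sq = phi_from_chi_square(chi_square=36.0, n=3600)
print(round(phi, 3), round(phi_sq, 4))
```

Remember to feed in the Pearson Chi-Square, not the "LR" Chi-Square.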
If you like, select "bar chart" under Chart Options (otherwise, just select "no chart"). If the chart picture gives any trouble loading, just go ahead and select "no chart" to solve the problem.
Blink once, then click on the gray box at the bottom left that says:
Blink again. Here is your output for your first crosstabulation run using
the SDA system.
The various correlations and Chi-squares
appear right below the bivariate table.
REMEMBER: ONLY USE THE "P" OR PEARSON
CHI-SQUARE. (The "LR" Chi Square is something called the "Likelihood
Ratio" Chi-Square statistic. While it will produce similar results to the
Pearson Chi-Square IN LARGE SAMPLES, this is a different entity.)
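To see why the two Chi-Squares are "similar but different," here is a minimal Python sketch that computes column percentages, the Pearson Chi-Square, and the Likelihood Ratio Chi-Square by hand. The 2 x 2 table of counts is invented for illustration and the function names are my own. This is not SDA's code, and it will not match your GSS output.

```python
import math


def column_percents(table):
    """Percentage each cell within its own column (columns = independent variable)."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100.0 * cell / col_totals[j] for j, cell in enumerate(row)]
            for row in table]


def chi_squares(table):
    """Return (Pearson Chi-Square, Likelihood Ratio Chi-Square) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = 0.0
    lr = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if there were no association at all
            expected = row_totals[i] * col_totals[j] / n
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:  # empty cells add nothing to the LR statistic
                lr += 2.0 * observed * math.log(observed / expected)
    return pearson, lr


# Hypothetical counts: rows = two categories of the dependent variable,
# columns = two years. NOT actual GSS data.
table = [[30, 20],
         [20, 30]]
pearson, lr = chi_squares(table)
print(column_percents(table))
print(round(pearson, 3), round(lr, 3))
```

With these invented counts the two statistics come out close to each other but not identical, which is the point of the reminder above.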
Go ahead and print your "crosstab" NOW so that you have these pages to
study when you complete this assignment. They also form part of the Assignment
3 that you will turn in to me.
(1) Is the variable "year" nominal, ordinal, interval or ratio?
Is the variable "number of adults" nominal, ordinal, interval or ratio?
(2) Looking at your crosstabulation table, which correlation coefficient is the MOST APPROPRIATE to use to describe the relationship between study year and the number of adults? Why? (You may have to calculate Phi if you select Phi as the most appropriate coefficient. That takes about one-half minute with a modern calculator that does square roots.)
(3) Referring to this correlation coefficient that you selected, did you have an association between study year and the number of adults? (That is, was this association REAL or was it a sampling ACCIDENT?)
(4) What was the level of statistical
significance or "p-level" (type one or "alpha" error) associated with this
correlation coefficient? Give the probability numeric value.
(5) What was the numeric value of the correlation coefficient you selected for the association between year and the number of household adults (whether it was statistically significant--or not)?
(6) Characterize the strength (and direction, if appropriate) of this correlation in words. (Guide 5 will have a chart for approximate strength in words.)
CLICK HERE TO SEE CHART.
Remember that SDA does not use a t-test to assess the difference between two groups. Instead it uses an F distribution and something called a "one-way ANOVA" or "one-way analysis of variance." ("One-way" means just ONE independent variable.)
DO NOT WORRY!
Remember: the ANOVA and the t-test are statistical cousins.
Simply take the value of the "F" that appears in the "Analysis of Variance" right below the table of means and then:
TAKE THE SQUARE ROOT OF THE F WITH A
POCKET CALCULATOR. (This takes one-half minute.)
The result is the absolute value of the
t-test (that means the value on your calculator is always positive).
Your associated probability with the F is the same probability that you will use for the t-test, so you don't have to do anything else here.
The square root of the Eta_sq (third column from the LEFT) is Eta, a correlation coefficient for the association between one nominal variable (that's why Eta is always positive--nominal variables do NOT have a direction) and one interval variable. (This will be the same value of eta you will see in the correlation coefficient section of your crosstabulation output.) You can use eta with an ordinal (or even an interval or ratio) independent variable too, because if you can use a statistic with nominal data, you can use that statistic with any kind of data.
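If you want to convince yourself that the ANOVA and the t-test really are cousins, here is a minimal Python sketch. It computes a one-way ANOVA F and eta-squared by hand for two groups, then an ordinary equal-variance (pooled) t-test, and shows that the square root of F equals the absolute value of t. The scores are invented for illustration and the function names are my own. The sketch assumes the simple pooled-variance case. It is not SDA's code and does not reproduce SDA's complex standard errors.

```python
import math


def one_way_anova(groups):
    """Return (F, eta_squared) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    n = len(all_scores)
    grand_mean = sum(all_scores) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_value, eta_squared


def pooled_t(group1, group2):
    """Equal-variance t-test statistic for the difference between two means."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss1 = sum((x - m1) ** 2 for x in group1)
    ss2 = sum((x - m2) ** 2 for x in group2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


# Hypothetical household sizes, NOT actual GSS data
adults_1980 = [2, 2, 3, 1, 2, 4]
adults_2002 = [1, 2, 2, 1, 3, 1]

f_value, eta_sq = one_way_anova([adults_1980, adults_2002])
t_value = pooled_t(adults_1980, adults_2002)
eta = math.sqrt(eta_sq)

print("sqrt(F) =", round(math.sqrt(f_value), 4))
print("|t|     =", round(abs(t_value), 4))
print("eta     =", round(eta, 4))
```

The first two printed values agree, so the probability attached to the F is also the probability for the t. Eta comes out between 0 and 1 and carries no sign.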
Here is the main SDA screen again. This
time, click the radio button on Comparison of means.
The passing of time might influence how many adults there are in the household (see earlier for reasons). Thus year is our "cause" or our independent variable.
We can be totally sure that the number of adults in the household will NOT influence what year it is. Therefore, adults is our "effect" or our dependent variable.
Type next to Dependent: adults
Type next to Row: year
Add the words: year (1980,2002) in the Selection Filter(s): box
Keep the "Main statistic to display:" as Means
Under "Additional statistics in each cell" click the boxes for:
Complex std errs
N of cases (should already be checked, but check it if the box is not checked)
Std deviations
This will give you several basic statistics for each year group, such as means, standard deviations, and standard errors for 1980 and 2002 separately.
Check the box next to Confidence intervals.
Leave the Level of confidence: at 95 percent
Under "Other options" click the boxes to check them for:
ANOVA stats
Question text
Color coding
(Do NOT check the "Show T-statistic" box. This is NOT the Student's t distribution for this comparison of means, so leave this box blank. It will simply clutter up your output--which has enough in it already.)
Click:
and examine your output.
If your output is OK, print your output from your comparison of means. Remember to include this output when you turn in your assignment.
Was your estimate of the year difference on the number of adults in the household in the population ZERO or some difference in ABSOLUTE VALUE GREATER THAN ZERO?
What was the level of statistical significance or "p-level" (type one or "alpha" error) associated with this difference of means test? What was the probability numeric value?
What was the numeric value of the t-test difference in means of the number of adults in the household over time (whether it was statistically significant--or not)?
PLEASE NOTE: this is the t-test value, NOT the subtraction of one mean from another mean.
Did you want to know whether the difference between the two sample means was positive or negative? That's not a problem. First, obtain the absolute t-value. If the mean for group 1 is LARGER than the mean for group 2, the t-value is positive. If the mean for group 1 is SMALLER than the mean for group 2, the t-value is negative.
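The sign rule can be sketched in a few lines of Python. The function name, the F value, and the two means below are invented for illustration. Substitute the F and the group means from your own output.

```python
import math


def signed_t(f_value, mean_group1, mean_group2):
    """Attach a sign to sqrt(F) based on which group mean is larger."""
    abs_t = math.sqrt(f_value)   # absolute t-value from the ANOVA F
    if mean_group1 > mean_group2:
        return abs_t             # group 1 larger: positive t
    if mean_group1 < mean_group2:
        return -abs_t            # group 1 smaller: negative t
    return 0.0                   # identical means: no difference


# Hypothetical values, NOT actual GSS output
print(signed_t(9.0, mean_group1=2.1, mean_group2=1.9))
print(signed_t(9.0, mean_group1=1.9, mean_group2=2.1))
```

Which year counts as "group 1" is your choice. Just state it when you report the sign.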
What was the value of ETA (NOT eta-squared)?
Characterize the strength of eta (NOT eta-squared) in words. (Use the Guide 5 chart).
CLICK HERE TO REVIEW CHART.
Here's what you turn in to me by class Wednesday October 20 (you may add a short explanation to your answer to any of these questions):
Your printed output (2 points) for:
1. Is the variable "year" nominal, ordinal, interval or ratio? Is the variable "number of adults" nominal, ordinal, interval or ratio? (1 point total)
2. Looking first at the crosstabulation table, which correlation coefficient is the MOST APPROPRIATE to use to describe the relationship between year and the number of household adults? Why? (2 points)
3. Referring to this correlation coefficient, did you have an association between year and the number of household adults? (That is, was this association REAL or was it an ACCIDENT?) (2 points)
4. What was the level of statistical significance or p-level (type one or "alpha" error) associated with this correlation coefficient? (Give the probability numeric value.) (2 points)
5. What was the numeric value of the correlation coefficient between year and the number of adults (whether it was statistically significant--or not)? (1 point)
6. Characterize the strength (and direction, if appropriate) of this correlation in words. (1 point)
7. Turning now to the difference of means test for the number of adults over time, was your estimate of the 1980 versus 2002 difference ZERO or some difference in ABSOLUTE VALUE GREATER THAN ZERO? (2 points)
8. What was the level of statistical significance or p-level (type one or "alpha" error) associated with this difference of means test? (Give the probability numeric value.) (2 points)
9. What was the numeric value of the t-test difference in means of the number of adults between 1980 and 2002 (whether it was statistically significant--or not)? (1 point) NOTE: This is not the difference between the original means for 1980 and 2002, it is the value of the t-statistic.
10. What was the value of ETA (NOT eta-squared) in the comparison of means analysis? (1 point)
11. Characterize the strength of eta (NOT eta-squared) in words. (1 point)
12. In this case, which do you think is more appropriate to use to assess the relationship between year and the number of household adults: the crosstabulation table or the difference of means test (or do you want to use both)? GIVE THE RATIONALE BEHIND YOUR DECISION. (2 points)
Susan Carol Losh, October 10, 2004