EDF 5400 INTRODUCTORY STATISTICS
FALL 2004

DR SUSAN CAROL LOSH
EDUCATIONAL PSYCHOLOGY AND LEARNING SYSTEMS


 
 
 

ASSIGNMENT 3: ASSOCIATIONS BETWEEN TWO VARIABLES
CROSSTABULATIONS, CORRELATIONS, AND T-TESTS


DUE WEDNESDAY OCTOBER 20 BY CLASS
DUE TO THE INTENSE NATURE OF THIS COURSE
LATE PAPERS ARE NOT ACCEPTED
EARLY ASSIGNMENTS ARE ACCEPTED

 
LAST-MINUTE QUESTIONS ABOUT THIS ASSIGNMENT?

YOU MAY EMAIL US. HOWEVER, PLEASE DO NOT E-MAIL AFTER 8 PM TUESDAY NIGHT OCTOBER 19.

Different e-mail providers may take a long time to deliver their mail and we may not receive it in time. We are not responsible for late delivery of e-mail by either your provider or ours, or for server viruses that slow transmission, so please leave enough time!

IF YOU E-MAIL ME WEDNESDAY MORNING I WILL NOT HAVE TIME TO RESPOND TO YOU. 

REMEMBER: I DO NOT TAKE E-MAIL ATTACHMENTS! THANK YOU.
FAX IS OK 850-644-8776  REMEMBER YOUR NAME AND EDF5400!

ASSIGNMENT STATS
YOUR TASKS 
TODAY
CROSSTABULATIONS COMPUTER RUN
T-TEST COMPUTER RUN
WHAT YOU TURN IN OCTOBER 20

This assignment gives you practical experience on the first two basic questions we ask about a relationship between two variables:

#1. Is the relationship zero or nonzero in the population?

This question typically examines what is called the "statistical significance" of the association. It requires you to make an inference about whether the association that you observed in your sample is basically zero, or something larger in absolute value (either positive OR negative) than zero, in the population.

You answer this question by looking at the probability level (p) from the appropriate probability distribution, such as the Chi-Square, t, or F distribution. In classical statistics we begin with the null hypothesis that there is no relationship between the two variables. Under that hypothesis, the test statistic follows a known probability density function (pdf), which gives the probability associated with each possible outcome.

If the probability of getting results like ours by chance when there is no relationship is very small (typically 0.05 or smaller), we conclude the relationship in the population is probably REAL (nonzero) and we reject the null hypothesis of no association.

If the probability level is relatively large (larger than 0.05), we do not reject the null hypothesis: your observed sample results could easily be an ACCIDENT of the random laws of chance even if there is no relationship at all in the population, so we treat the association as zero.

#2. How strong is the relationship?
This is the issue of effect size, strength, or "substantive significance."

Question 2 requires you to:

Select the BEST and MOST APPROPRIATE correlation coefficient for your data from among the many that are available. Then:

assess the strength of that correlation coefficient.

In this assignment, you will also decide whether it is better to use a crosstabulation table to examine the relationship between two particular variables, or whether in this case you should use a difference of means test.


 
KEEP THE DIFFERENCE STRAIGHT BETWEEN:

(1) THE PROBABILITY LEVEL USED TO TEST WHETHER THE RELATIONSHIP IS ZERO 

AND

(2) THE CORRELATION COEFFICIENT USED TO ESTIMATE THE STRENGTH OF THE ASSOCIATION.

It can be easy to mix these two up because both range from 0 to 1. (However, a correlation can ALSO range between 0 and -1).

It will help you to remember that the probability level is usually in a column or row labeled "P".
Or sometimes you will see it as p =

IMPORTANT! CRITICALLY IMPORTANT. PLEASE MEMORIZE THE PARAGRAPH BELOW.

 
If you get a row of '0000's next to "p =" or underneath the "P" header, this means that your results could have occurred by accident only once in 1,000 samples (or even once in 10,000 samples) if the association were really zero in the population. This really means p<.001 so in such a case, when asked about the statistical significance of your findings, you will write "p < .001". Such a result is VERY statistically significant and means your relationship is probably REAL. 

You will see the row of zeros because the computer cuts the results off at a predetermined number of decimal places. DO NOT write p = .0000 in your assignment. This is a display quirk that appears in virtually all programs, including SPSS. It DOES NOT mean the odds of your results occurring by chance are 0.
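To make the rule concrete, here is a small Python sketch. It is only an illustration and is not part of SDA or SPSS: the function name report_p is made up for this example. It takes the p-value exactly as the printout shows it and returns what you would write. With four printed zeros it gives the tighter bound p < .0001, which also satisfies p < .001.

```python
def report_p(printed: str) -> str:
    """Turn a p-value as printed by the program into the form to report.

    `printed` is the text shown in the output, for example "0.0000" or "0.03".
    (Hypothetical helper for illustration; not an SDA or SPSS function.)
    """
    text = printed.strip()
    value = float(text)
    if value > 0:
        # A nonzero printed value can be reported as it stands.
        return f"p = {text}"
    # All zeros: we only know p is smaller than the last displayed place.
    decimals = len(text.split(".")[1]) if "." in text else 0
    if decimals == 0:
        raise ValueError("no decimal places shown; cannot bound p")
    bound = "." + "0" * (decimals - 1) + "1"
    return f"p < {bound}"


for shown in ["0.0000", ".000", "0.00", "0.03"]:
    print(shown, "->", report_p(shown))
```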
 

The value of the correlation coefficient can be found under names such as Phi, Tau-b, r, or eta.
 


 
 
IMPORTANT NOTE: The Berkeley SDA system does NOT run t-tests. Instead (in my opinion) it does something much nicer:

SDA tests the differences in mean scores across two or more groups using the F distribution. The pdf formula for the F distribution is SO ghastly, I will not reproduce it here. I'm not going to draw it either, because this is one of these distributions, like Chi-square, that look different for different degrees of freedom (this one is also at least three-dimensional when graphed). 

The F distribution is VERY widely used, and fortunately the computer tears through the pdf formula in seconds. There is a very intimate relationship between the F distribution and the t-test. In fact, IF YOU HAVE TWO GROUPS ONLY, if you take the SQUARE ROOT of the F-value that SDA (or SPSS)  gives you for the "comparison of means" program, then you will have the absolute value (that means positive only) of the corresponding t-test.

Thus, you WILL be able to access the t-test and the probability level given will be the right one for the t-test for your data. 

A very minor adjustment is needed for the degrees of freedom that could make a difference in small samples (say, under n = 50). Subtract TWO (not 1) from the total casebase in your printout  (one for each group) and you will have the correct degrees of freedom for your t-test.
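If you would like to see the F and t relationship for yourself, the following Python sketch works it out by hand for two groups. The data are made-up numbers of household adults (NOT General Social Survey results), the function names are my own, and the t used here is the conventional equal-variance (pooled) t-test, which is the one that matches the one-way F-ratio.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def pooled_t(group1, group2):
    """Equal-variance (pooled) two-sample t and its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    ss1 = sum((x - m1) ** 2 for x in group1)
    ss2 = sum((x - m2) ** 2 for x in group2)
    df = n1 + n2 - 2  # total casebase minus TWO, one for each group
    pooled_var = (ss1 + ss2) / df
    t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


def one_way_f(groups):
    """One-way analysis of variance F-ratio for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up numbers of household adults, for illustration only.
adults_group1 = [2, 2, 3, 1, 2, 4, 2, 3]
adults_group2 = [1, 2, 2, 1, 3, 2, 1, 2]

t, df = pooled_t(adults_group1, adults_group2)
f = one_way_f([adults_group1, adults_group2])
print(f"t = {t:.4f} on {df} df; F = {f:.4f}; sqrt(F) = {math.sqrt(f):.4f}")
```

The square root of F and the absolute value of t agree, and the degrees of freedom are the 16 cases minus 2.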

So, why is obtaining the F-ratio instead of the t-test so much nicer? Because you ALSO receive the value of the correlation coefficient eta-squared ("Eta_sq" in your printout.) 

Remember: eta-squared is a widely used and very nice squared correlation coefficient when you assess the relationship between one nominal independent variable and one interval dependent variable. If you take the square root of this coefficient (just "eta"), you will have a correlation coefficient analogous to the widely used Pearson's r ("rho").

There is no other EASY way to answer the question "how strong is this relationship?" with a two-value nominal or ordinal independent variable and a numeric dependent variable and a difference of means test, so that's why I like what SDA does here. The conventional t-test does NOT have a conventional associated correlation coefficient, but as you see, the F-ratio test does.
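For the curious, eta-squared is the between-groups sum of squares divided by the total sum of squares. The Python sketch below computes it by hand and then takes the square root to get eta. The scores are made up for illustration and the function name is my own; SDA does all of this for you and simply prints "Eta_sq".

```python
import math


def eta_squared(groups):
    """Eta-squared = between-groups sum of squares / total sum of squares."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total


# Made-up scores for two groups, for illustration only.
example_groups = [[2, 2, 3, 1, 2, 4, 2, 3], [1, 2, 2, 1, 3, 2, 1, 2]]

eta_sq = eta_squared(example_groups)
eta = math.sqrt(eta_sq)  # eta is never negative
print(f"Eta_sq = {eta_sq:.4f}, eta = {eta:.4f}")
```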

Did you want to know if the difference between the two sample means is positive or negative? That's not a problem. Obtain the absolute t-value by taking the square root of the F-ratio. If the mean for group 1 is LARGER than the mean for group 2, then the t-value is positive. If the mean for group 1 is SMALLER than the mean for group 2, then the t-value is negative. That's all there is to it.
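The sign rule can be written as a tiny Python sketch. The function name signed_t and the numbers in the example call are hypothetical, chosen only to illustrate the rule.

```python
import math


def signed_t(f_ratio: float, mean_group1: float, mean_group2: float) -> float:
    """Recover the signed two-group t from the F-ratio and the two group means."""
    magnitude = math.sqrt(f_ratio)  # absolute value of t
    if mean_group1 < mean_group2:
        return -magnitude           # group 1 mean is SMALLER: t is negative
    return magnitude                # group 1 mean is LARGER: t is positive


# Hypothetical values: F = 4.0, group 1 mean = 2.1, group 2 mean = 2.5.
print(signed_t(4.0, 2.1, 2.5))
```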
 

In this assignment, you'll use the SDA and the General Social Survey to:

(1) analyze and interpret a bivariate crosstabulation table and
(2) using the SAME TWO VARIABLES, examine the difference between two groups with a difference of means test.

Your independent variable is study year, 1980 or 2002. Your dependent variable is "adults," or the number of adults in the individual's household.

Why might we expect a difference by year? Since 1980 in the United States, marriage rates have dropped and divorce rates have risen (if you find this hard to believe, after you have finished the assignment run below, do a second computer cross tabulation run and substitute "marital" as the row variable for "adults".) As a result, the number of household adults may have dropped.
 
 
THOUGHT QUESTION: Would you choose a "two tailed" (bidirectional) alternative hypothesis or a "one tailed" (unidirectional) hypothesis in our example exercise? Why?
 


ASSIGNMENT STATS

This total assignment is worth 20 points.

Correctly following all programming information for running frequency distributions on the two variables, the crosstabulation, and the difference of means test, and turning in all output = 2 points.

Although your actual output does not count very heavily, I MUST receive your output in order for you to receive credit on this assignment.

Is the variable "year" nominal, ordinal, interval or ratio?
Is the variable "number of adults" (adults) nominal, ordinal, interval or ratio? (1 point for both total)

Looking first at the crosstabulation table, which correlation coefficient is the MOST APPROPRIATE to use to describe the relationship between study year and the number of household adults? Why? (2 points)

Referring to this correlation coefficient that you chose, did you have an association between study year and the number of  household adults? (That is, was this association REAL or was it an ACCIDENT?) (2 points)

What was the level of statistical significance or p-level (type one or "alpha" error) associated with this correlation coefficient? (Give the probability numeric value.) (2 points)

What was the numeric value of the correlation coefficient you chose between study year and the number of household adults  (whether it was statistically significant--or not)? (1 point)

Characterize the strength (and direction, if appropriate) of this correlation in words. (1 point)

Turning now to the difference of means test for the number of household adults by study year, was your estimate of the difference by year ZERO or some difference in ABSOLUTE VALUE THAT WAS GREATER THAN ZERO? (2 points)

What was the level of statistical significance or p-level (type one or "alpha" error) associated with this difference of means test? (Give the probability numeric value.) (2 points)

What was the numeric value of the actual t-test difference in means of the number of household adults between 1980 and 2002 (whether it was statistically significant--or not)? (1 point)
This is the t difference (not the difference between the original mean scores).

What was the value of ETA (NOT eta-squared)? (1 point)

Characterize the strength of eta (NOT eta-squared) in words. (1 point)

In this case, which do you think is more appropriate to use to assess the relationship between study year and the number of household adults: the crosstabulation table or the difference of means test (or do you want to use both)? GIVE THE RATIONALE BEHIND YOUR DECISION. (2 points)
 


YOUR TASKS FOR ASSIGNMENT THREE

1. You will run FREQUENCIES on the two variables for this exercise because you ALWAYS check the frequencies on all the variables that you plan to use at the beginning of any analysis session. You will watch for any out of order codes, missing data codes, substantial amounts of missing data, "wild punches," and other anomalies so that you can recode these if needed or restrict the range of valid codes.

You will run frequencies on:

year (1980 and 2002 ONLY) and
adults (the number of adults in the household)

2.   You will then conduct a crosstabulation of adults by year. In your program, you will request statistics and column percentages.

3. Based on your crosstabulation, you will select the BEST or MOST APPROPRIATE correlation coefficient for your data. You will decide whether any apparent association between year and the number of household adults is ACCIDENTAL or REAL. You will report the level of "statistical significance" of this association.

4.  You will report the numeric magnitude of the correlation coefficient, its direction (if appropriate), and how strong (in words) the correlation coefficient is.

5.  Next, you will conduct a difference of means test, using "adults" as the dependent variable and "year" (1980 and 2002 ONLY) as the independent variable. (Changes over time might influence the number of adults in the household but I hope no one believes that the number of adults in the household influences what year it is...)

6.  You will decide if there is a nonzero difference (estimated for the population, of course) across time for the number of household adults. You will report the probability level or statistical significance of this difference.

7.  You will report the time difference in mean number of adults per household and the value of eta for this analysis. You will assess the numeric magnitude and strength in words of the eta correlation coefficient.

8.  You will decide whether it is more appropriate to use the crosstabulation table to present this analysis, the difference between means test, or whether either presentation would be appropriate to assess the relationship between the two variables, study year and the number of household adults. You will give a brief rationale for your choice of analysis method.

WHEW! It sure looks like you are doing a LOT. However, the assignment looks like more than it really is because I broke down exactly what you are expected to do step by step. These procedures, decisions, and reports are very typical of the steps that researchers follow when they want to see whether group differences (e.g., across time) exist.
 



OLD FEATURES in this exercise include: NEW FEATURES include:

(1) Conducting a crosstabulation to generate:

(2) Conducting a difference of means test

(3) Assessing the statistical significance of an association between two variables.

(4) Assessing the numeric magnitude of the relationship between two variables.

(5) Assessing the strength in words of the relationship between two variables.

(6) Looking at the difference of means across categories of your independent variable.

(7) Deciding whether a crosstabulation table or a difference of means test is better to describe the association between your two variables.
 



MAKE LIFE EASIER!

 
 
ALWAYS A GOOD IDEA TO PRINT OUT A COPY OF THIS ASSIGNMENT.

CHECK OFF EACH INSTRUCTION AS YOU COMPLETE IT AND YOU WILL HAVE A SPEEDY AND NONTRAUMATIC EXPERIENCE RUNNING THE SDA SYSTEM. 


 
SDA REVIEW

Use the RIGHT toe of your mouse to click on this link:
 
http://www.icpsr.umich.edu/GSS

When the menu opens on the link, click on:         Open in New Window

Click on the  button to pull up the statistical program selection screen.
 
 
Remember: WHAT YOU SEE BELOW IS A NONWORKING COPY. 
YOU MUST GO TO THE NEW PAGE THAT OPENS ON YOUR MONITOR TO ACCESS THE SDA PROGRAM.

You can always switch back to this screen by clicking on the box at the very bottom of the monitor screen that reads "Assignment 3". Or you can print out the pages of Assignment 3.

Once again, you will bring up the "radio buttons" screen to select an analytic option, in this case, Frequencies or crosstabulation. Only this time, the directions that you place in the SDA boxes will direct the program to perform a crosstabulation instead of frequencies.

Survey Documentation and Analysis

Study: GSS 1972-2002 Cumulative Datafile

Select an action:
Browse codebook in this window
Frequencies or crosstabulation
Comparison of means
Correlation matrix
Comparison of correlations
Multiple regression
Logit/Probit
List values of individual cases
Recode variables (into public work area)
Compute a new variable
List/delete derived variables

Download a customized subset

Suggestion for running analysis programs:
Click the "Open Extra Codebook Window" button above. This allows you to "copy-and-paste" the names of variables you wish to analyze from the codebook window to the analysis windows.
Return to SDA Home Page

In the Study: GSS 1972-2002 Cumulative Datafile screen that opens: first click on:
 
Extra Codebook Window

to open up the codebook window. Remember to click on the "Standard Codebook" link.

Use the radio button to select Frequencies or crosstabulation. Then, click on :
 
Start

so that you have the SDA program window active as well.



 
PRELIMINARY FREQUENCIES

REMEMBER! The first step in working with data is to ALWAYS display the total frequencies on all the variables that you plan to analyze for a particular research project.

After you have accessed the General Social Survey data and the SDA system, be sure to run the original frequencies for:

1. year  (see below for the Selection Filter box instructions)
2. adults

Obtain the frequencies ONLY. You DO NOT need measures of central tendency, dispersion, or other univariate measures for this assignment.
 
 
REMEMBER! Add the words: year (1980,2002)     in the Selection Filter(s):  box

We will ONLY examine respondents who were studied in the years 1980 and 2002.

BE SURE TO IMMEDIATELY PRINT THESE PAGES AND INCLUDE THEM WITH YOUR OUTPUT.

In the Extra Codebook Window, you can look at the specifics for "year" and "adults." If you click on the blue underlined shorthand abbreviation or mnemonic, you can also examine the basic univariate frequency distribution for that particular variable, including missing value codes and frequencies. The SDA program sometimes, BUT DEFINITELY NOT ALWAYS, omits the cases with missing values when it executes an analysis. That is one reason why you have to run the frequencies for year and adults to make sure what is and is not included when the program actually runs. For example, you are led to think there will be people with "8 or more" adults in the household. However, these values only range from 1 to 7 for the years 1980 and 2002. (HINT: you couldn't have a household of "0" in the sample, although some could occur in the population, because then there would be no adults to interview. What does this imply for the level of measurement for the variable "adults"?)
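What the frequencies check accomplishes can be sketched in a few lines of Python. Everything here is hypothetical: the records are invented, the code 9 stands in for a made-up missing-data code, and the valid range of 1 to 8 is an assumption for the example. SDA does this work for you; the sketch only shows the logic of filtering to the two study years and then looking for codes that should not be there.

```python
from collections import Counter

# Hypothetical (year, adults) records; 9 is a made-up missing-data code.
records = [
    (1980, 2), (1980, 1), (2002, 2), (1991, 3),
    (2002, 9), (2002, 1), (1980, 2),
]

# Keep only the two study years, in the spirit of the filter year(1980,2002).
selected = [(year, adults) for year, adults in records if year in (1980, 2002)]

year_freq = Counter(year for year, _ in selected)
adults_freq = Counter(adults for _, adults in selected)

# Flag any codes outside the range we assume to be valid (1 to 8 here).
valid_adults = range(1, 9)
suspect = {code: n for code, n in adults_freq.items() if code not in valid_adults}

for code in sorted(adults_freq):
    print("adults =", code, "frequency =", adults_freq[code])
print("codes to inspect:", suspect)
```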
 
RUNNING YOUR SDA SHORT PROGRAM FOR CROSSTABULATIONS

Pull up the window for the SDA Program Screen. Again for your review, below is the example screen for running frequencies or crosstabulations that appeared when you clicked the "START" button on the SDA beginning window.
 

 
PLEASE NOTE: THE BOX BELOW WILL NOT WORK TO RUN YOUR PROGRAM.
YOU MUST GO TO THE NEW PAGE THAT OPENS ON YOUR MONITOR TO RUN YOUR PROGRAM.

 
 
 
SDA Tables Program
(Selected Study: GSS 1972-2002 Cumulative Datafile)
Help: General / Recoding Variables
REQUIRED Variable names to specify
Row:

OPTIONAL Variable names to specify
Column:
Control:
Selection Filter(s): Example: age(18-50) gender(1)
Weight:

Percentaging: Column  Row  Total

Other options
Statistics  Suppress table  Question text
Color coding  Show Z-statistic
Chart options
Type of chart (if any) to display:
Bar chart options:
   Orientation: Vertical  Horizontal
   Visual Effects: 2-D  3-D
Size of chart - width:  height: 



Change number of decimal places to display
For percents:
For statistics:

In the Row box, type: adults

(You can just copy and paste this command into the screen.)
"adults" goes in the "row" line because it is your dependent variable.

Next to the Column: line, type: year
This will make year, your independent variable, the column variable, per MOST conventions.

Add the words: year (1980,2002)     in the Selection Filter(s):  box

Next to Percentaging: leave the little box checked to the LEFT of "Column".  (If the checkmark is missing, just click it to make the check mark appear.)

This really WILL produce column percents this time, because you now have both rows and columns.
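In case column percents are still fuzzy, here is a short Python sketch of the arithmetic: each cell count is divided by the total for its column (its year). The counts are invented for illustration and are NOT what you will see in your output.

```python
# Hypothetical counts: rows are the number of adults, columns are study year.
counts = {
    1: {1980: 30, 2002: 50},
    2: {1980: 60, 2002: 40},
    3: {1980: 10, 2002: 10},
}
years = [1980, 2002]

# Column totals: how many respondents were studied in each year.
column_totals = {y: sum(row[y] for row in counts.values()) for y in years}

# Column percents: each cell as a percentage of its own column total.
column_percents = {
    adults: {y: 100 * row[y] / column_totals[y] for y in years}
    for adults, row in counts.items()
}

for adults, row in column_percents.items():
    print(adults, {y: round(pct, 1) for y, pct in row.items()})
```

Because the percentages run down each year column, you compare them ACROSS the years within a row.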

Now, click on the boxes to the LEFT of:

"Statistics"                    AND
"Question text"

Now that you have two variables, the "Statistics" command will produce Chi-Square and several correlation coefficients.

ONLY use the P or Pearson Chi-Square for this assignment, please.
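For reference, the Pearson Chi-Square compares each observed cell count with the count expected if there were no association (row total times column total, divided by n). The Python sketch below does this by hand for an invented table; the function name and the counts are assumptions for illustration, and SDA reports this value for you.

```python
def pearson_chi_square(table):
    """Pearson chi-square, degrees of freedom, and n for a table of counts.

    `table` is a list of rows, each row a list of cell counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, n


# Invented counts: three rows (adults) by two columns (years).
example_table = [[30, 50], [60, 40], [10, 10]]
chi2, df, n = pearson_chi_square(example_table)
print(f"Chi-square = {chi2:.2f}, df = {df}, n = {n}")
```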

The program does not calculate Phi. However, Phi is very easy to calculate. In case you feel that Phi is the best measure to use for this exercise, to produce Phi, just:

(1) Take the Pearson Chi-Square and
(2) Divide it by the casebase (n or N), then
(3) Take the SQUARE ROOT of the result at Step 2.

That's it, that's Phi.
If you want Phi-squared instead, just stop at Step 2.
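The three steps above, written as a short Python sketch. The function name and the example numbers (a Chi-Square of 9.0 on 100 cases) are made up for illustration.

```python
import math


def phi_from_chi_square(chi_square: float, n: int) -> float:
    """Phi from the Pearson Chi-Square (Step 1) and the casebase n."""
    phi_squared = chi_square / n   # Step 2: divide by the casebase
    return math.sqrt(phi_squared)  # Step 3: take the square root


# Made-up example values.
print(round(phi_from_chi_square(9.0, 100), 3))
```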

If you like, select "bar chart" under Chart Options (otherwise, just select "no chart"). If the chart picture gives any trouble loading, just go ahead and select "no chart" to solve the problem.

Blink once, then click on the gray box at the bottom left that says:
 
Run the Table

Blink again. Here is your output for your first crosstabulation run using the SDA system.

The various correlations and Chi-squares appear right below the bivariate table.
REMEMBER: ONLY USE THE "P" OR PEARSON CHI-SQUARE. (The "LR" Chi Square is something called the "Likelihood Ratio" Chi-Square statistic. While it will produce similar results to the Pearson Chi-Square IN LARGE SAMPLES, this is a  different entity.)

Go ahead and print your "crosstab" NOW so that you have these pages to study when you complete this assignment. They also form part of the Assignment 3 that you will turn in to me.
 

Now, you have more statistics than ever before (and you're not even done yet.) Once again, the problem is, which are the RIGHT ONES to assess your association between study year and the number of adults in the household?

(1) Is the variable "year" nominal, ordinal, interval or ratio?
Is the variable "number of adults" nominal, ordinal, interval or ratio?

(2) Looking at your crosstabulation table, which correlation coefficient is the MOST APPROPRIATE to use to describe the relationship between study year and the number of adults? Why? (You may have to calculate Phi if you select Phi as the most appropriate coefficient. That takes about one-half minute with a modern calculator that does square roots.)

(3) Referring to this correlation coefficient that you selected, did you have an association between study year and the number of adults? (That is, was this association REAL or was it a sampling ACCIDENT?)

(4) What was the level of statistical significance or "p-level" (type one or "alpha" error) associated with this correlation coefficient? Give the probability numeric value.
 

 
REMEMBER! If you get a row of '0000's next to "p ="  or underneath the "P" header, this means that your results could have occurred by accident only once in 1,000 (or even 10,000) samples if the association were really zero in the population. This really means p<.001 (or even p<.0001 if you have four zeros) so be sure to put p < .001 (or p < .0001). Such a result is VERY statistically significant and means your relationship is probably REAL. 

Be sure to put p < .01 if you get a row of '00's next to "p ="
DO NOT put .00!
 

(5) What was the numeric value of the correlation coefficient you selected for the association between year and the number of household adults (whether it was statistically significant--or not)?

(6) Characterize the strength (and direction, if appropriate) of this correlation in words. (Guide 5 will have a chart for approximate strength in words.)



USING THE SDA TO ASSESS DIFFERENCES IN MEANS ACROSS GROUPS

Remember that SDA does not use a t-test to assess the difference between two groups. Instead it uses an F distribution and something called a "one-way ANOVA" or "one-way analysis of variance." ("One-way" means just ONE independent variable.)

DO NOT WORRY!

Remember: the ANOVA and the t-test are statistical cousins.

Simply take the value of the "F" that appears in the "Analysis of Variance" right below the table of means and then:

TAKE THE SQUARE ROOT OF THE F WITH A POCKET CALCULATOR. (This takes one-half minute.)
The result is the absolute value of the t-test (that means the value on your calculator is always positive).

Your associated probability with the F is the same probability that you will use for the t-test, so you don't have to do anything else here.

The square root of the Eta_sq (third column from the LEFT ) is Eta, a correlation coefficient for the association between one nominal variable (that's why Eta is always positive--nominal variables do NOT have a direction) and one interval variable. (This will be the same value of eta you will see in the correlation coefficient section of your crosstabulation output.) You can use eta with an ordinal (or even an interval or ratio) independent variable too, because if you can use a statistic with nominal data, you can use that statistic with any kind of data.

Go back to the main SDA screen (the same "Select an action" menu shown under SDA REVIEW). This time, click the radio button on Comparison of means.
 



 Once you have clicked "Start", the following screen will open:
REMEMBER THE SCREEN BELOW IS JUST FOR ILLUSTRATION. YOU MUST GO TO THE SDA PROGRAM TO HAVE A WORKING SCREEN!
 
 
 
SDA Means Program
Selected Study: GSS 1972-2002 Cumulative Datafile
Help: General / Recoding Variables


REQUIRED Variable names to specify
Dependent:
Row:

OPTIONAL Variable names to specify
Column:
Control:
Selection Filter(s): Example: age(18-50) gender(1)
Weight:

Main statistic to display:

Additional statistics in each cell
Complex std errs  SRS std errs  Design effect (deft)  Rho
Std deviations  N of cases  Weighted N

Optional tables of statistics
Confidence intervals  Level of confidence:
Multiple classification analysis
Diagnostic information

Other options
ANOVA stats  Suppress table  Question text
Color coding  Show T-statistic



Change number of decimal places to display
For means:
For totals:
For differences of means and MCA (relative to means):
For std deviations (relative to means):
For std errors (relative to means or totals):
For DEFT and RHO:
For weighted N's:


 

The passing of time might influence how many adults there are in the household (see earlier for reasons). Thus year is our "cause" or our independent variable.

We can be totally sure that the number of adults in the household will NOT influence what year it is. Therefore, adults is our "effect" or our dependent variable.

Type next to Dependent:      adults

Type next to Row:   year

Add the words: year (1980,2002)     in the Selection Filter(s):  box

Keep the "Main statistic to display:" as Means

Under "Additional statistics in each cell" click the boxes for:

Complex std errs
Std deviations
N of cases should already be checked (but check it if the box is not checked)

This will give you several basic statistics for each year group, such as means, standard deviations, and standard errors for 1980 and 2002 separately.
 
 
You want "complex" standard errors because the sampling method used for the General Social Survey is a more complicated one than a "simple random sample" and the standard errors will tend to be larger than those for a simple random sample with the identical case base.

Check the box next to Confidence intervals. Leave the Level of confidence: at 95 percent
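As a reminder of what a 95 percent confidence interval is, here is a minimal Python sketch: the mean plus or minus about 1.96 standard errors. This uses the large-sample normal critical value, which is an assumption on my part; I am not claiming this is the exact formula SDA applies, and the mean and standard error in the example are invented.

```python
def confidence_interval_95(mean: float, std_error: float):
    """Approximate 95 percent confidence interval for a mean."""
    z = 1.96  # large-sample normal critical value (assumed for this sketch)
    return mean - z * std_error, mean + z * std_error


# Invented example: a mean of 2.0 adults with a standard error of 0.05.
lower, upper = confidence_interval_95(2.0, 0.05)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

A larger ("complex") standard error gives a wider interval for the same mean.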

Under "Other options" Click the boxes to check them for:

    ANOVA stats   Question text
    Color coding

(Do NOT check the "Show T-statistic"  box. This is NOT the Student's t distribution for this comparison of means, so leave this box blank. It will simply clutter up your output--which has enough in it already.)

Click:
 
Run the Table

and examine your output.

If your output is OK, print your output from your comparison of means. Remember to include this output when you turn in your assignment.

Was your estimate of the year difference on the number of adults in the household in the population ZERO or some difference in ABSOLUTE VALUE GREATER THAN ZERO?

What was the level of statistical significance or "p-level"  (type one or "alpha" error) associated with this difference of means test? What was the probability numeric value?
 
 

 
REMEMBER! If you get a row of '0000's next to "p ="  or underneath the "P" header, this means that your results could have occurred by accident only once in 1,000 (or even 10,000) samples if the difference between the means were really zero in the population. This really means p<.001 (or even p<.0001 if you have four zeros) so be sure to put p < .001 (or p < .0001). Such a result is VERY statistically significant and means your difference is probably REAL. 

If you think I am nagging here, you are right! (If you think no one will put a row of zeros, you are wrong.)
 

What was the numeric value of the t-test difference in means of the number of adults in the household over time (whether it was statistically significant--or not)?

PLEASE NOTE: this is the t-test value, NOT the subtraction of one mean from another mean.

Did you want to know whether the difference between the two sample means was positive or negative? That's not a problem. First, obtain the absolute t-value. If the mean for group 1 is LARGER than the mean for group 2, the t-value is positive. If the mean for group 1 is SMALLER than the mean for group 2, the t-value is negative.

What was the value of ETA (NOT eta-squared)?

Characterize the strength of eta (NOT eta-squared) in words. (Use the Guide 5 chart).

HERE'S WHAT YOU TURN IN BY CLASS ON WEDNESDAY OCTOBER 20 2004


  Here's what you turn in to me by class Wednesday October 20 (you may add a short explanation to your answer to any of these questions):

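Your printed output (2 points) for: the frequencies on year and adults, the crosstabulation of adults by year, and the comparison of means.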

THEN, YOUR ANSWER TO QUESTIONS 1-12 BELOW:

1. Is the variable "year" nominal, ordinal, interval or ratio?
Is the variable "number of adults" nominal, ordinal, interval or ratio? (1 point total)

2. Looking first at the crosstabulation table, which correlation coefficient is the MOST APPROPRIATE to use to describe the relationship between year and the number of household adults? Why? (2 points)

3. Referring to this correlation coefficient, did you have an association between year and the number of household adults? (That is, was this association REAL or was it an ACCIDENT?) (2 points)
 

 
REMEMBER! CHI-SQUARE IS NOT A CORRELATION COEFFICIENT. 
Instead, it tests the statistical significance level of a correlation coefficient.

Correlation coefficients are bounded by the absolute values of 0 to 1. If some number in your output is larger than 1, you can safely assume it is NOT a correlation coefficient. It is probably either an F-ratio or Chi-Square.
 

4. What was the level of statistical significance or p-level (type one or "alpha" error) associated with this correlation coefficient? (Give the probability numeric value.)  (2 points)

5. What was the numeric value of the correlation coefficient between year and the number of adults (whether it was statistically significant--or not)? (1 point)

6. Characterize the strength (and direction, if appropriate) of this correlation in words. (1 point)

7. Turning now to the difference of means test for the number of adults over time, was your estimate of the 1980 versus 2002 difference ZERO or some difference in ABSOLUTE VALUE GREATER THAN ZERO? (2 points)

8. What was the level of statistical significance or p-level (type one or "alpha" error) associated with this difference of means test? (Give the probability numeric value.) (2 points)

9. What was the numeric value of the t-test difference in means of the number of adults between 1980 and 2002 (whether it was statistically significant--or not)? (1 point)  NOTE: This is not the difference between the original means for 1980 and 2002, it is the value of the t-statistic.

10. What was the value of ETA (NOT eta-squared) in the comparison of means analysis? (1 point)

11. Characterize the strength of eta (NOT eta-squared) in words. (1 point)

12. In this case, which do you think is more appropriate to use to assess the relationship between year and the number of household adults: the crosstabulation table or the difference of means test (or do you want to use both)? GIVE THE RATIONALE BEHIND YOUR DECISION. (2 points)



 


Susan Carol Losh October 10, 2004