

 
EDF 5400 INTRODUCTORY STATISTICS
FALL 2004
GENERAL FEEDBACK FOR EXAM 2

DR SUSAN CAROL LOSH
EDUCATIONAL PSYCHOLOGY AND LEARNING SYSTEMS


 
GENERAL POINTS
SCORE DISTRIBUTIONS
EXAM ANSWERS

IMPORTANT GENERAL POINTS

Here are the scores from Exam Two. The maximum possible score was 100 points.
 
 
First, please read over this site in its entirety. Many questions you may have about individual exam items will be answered here.

Please look over your exam carefully. Remember that Maria and I do NOT discuss individual papers during class time (or on break), although I am happy to address GENERIC issues when we review in class. We are also happy to examine your individual paper after class, during office hours, or in an appointment.

The exception is if you discover an arithmetic mistake in your score. Please let me know this during break or after class (NOT during class) and I will see that your grade file is corrected.
 


 

 
ANNOUNCEMENT: THERE WAS A PEARSON'S r CORRELATION COEFFICIENT OF

 0.67  (p < .001; n = 35)

BETWEEN SCORES ON EXAM 1 AND SCORES ON EXAM 2.

REVIEW: 

Who tended to score higher on Exam 2? Those with higher scores on Exam 1 or those with lower scores on Exam 1?

Was this a statistically significant relationship?
How strong was this relationship?
Does this finding imply that grades were "locked in stone" after Exam 1 or that many people's scores changed from the first to the second exam? Why did you make the choice you did?

EVERYONE should be able to answer these questions!
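
The review questions above can be worked through with a few lines of arithmetic. The sketch below is only an illustration, not part of the exam: it takes the reported r = 0.67 and n = 35, squares r to gauge shared variance, and converts r to a t statistic with n - 2 degrees of freedom using the standard formula t = r * sqrt(n - 2) / sqrt(1 - r^2). The variable names are made up for this example.

```python
import math

r = 0.67   # Pearson's r between Exam 1 and Exam 2 scores (reported above)
n = 35     # number of students (reported above)

# Direction: a positive r means higher Exam 1 scores go with higher Exam 2 scores.
direction = "positive" if r > 0 else "negative" if r < 0 else "none"

# Strength: r squared is the share of variance the two exams have in common.
r_squared = r ** 2

# Significance: convert r to a t statistic with n - 2 degrees of freedom.
df = n - 2
t = r * math.sqrt(df) / math.sqrt(1 - r_squared)

print(f"direction = {direction}")
print(f"r squared = {r_squared:.2f}")
print(f"t = {t:.2f} with df = {df}")
```

Because r squared is well under 1, a sizable share of the variation in Exam 2 scores is not accounted for by Exam 1, which is worth keeping in mind for the "locked in stone" question.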

An "A" means excellent work. A "B+" means superior work. A "B" means good work.

This was a very good group of exams. Maria and I were both pleased. Although the Exam 2 statistics were about three-quarters of one standard deviation below Exam 1 statistics (do you think this difference was statistically significant? Why or why not?), this material is more complex. Look for "repeats" on reading tables, ascertaining statistical significance, correlation strength and causal issues on Exam 3.

LETTER GRADES ARE APPROXIMATE. I will add the scores from all three exams to create a new TOTAL SCORE in early August. This total score will receive a grade and the total exam score grade counts 75 percent of your final grade.

You can review grading HERE.

Please read all material on this site.

I WILL continue to include material on identifying the level of measurement of your variables on Exam 3 as part of the problem solver questions, because it becomes more important the more techniques you know and can choose among. This exam was a good example.
 

EXAM PURPOSES

Any exam serves several purposes.

FIRST, it should spot overall class problems in comprehension and correct them. There will be material on interpreting crosstabulation tables on Exam 3. Some students also need more practice with statistical and substantive significance, as well as choosing the appropriate correlation coefficient for your data. Since this material also forms part of Assignment 4, you WILL have some questions on these topics on Exam 3.

SECOND, an exam should assess class mastery of the material so that I can judge the pace and sophistication of presented material. So far, we are doing well.

Expect to feel less confidence as new material is introduced! If I had a dollar for each student who told me when we started bivariate distributions "well, I understood all the univariate stuff, but I am now hopelessly confused..." then the students who said "well I understood all that bivariate stuff, but now that you have introduced a third variable, I don't understand a thing...", I would be an ENORMOUSLY WEALTHY woman!

THIRD, an exam should assess your individual mastery of the material, which, of course, I am mandated to do.

If you scored below 80 on this exam BEWARE!
You may be in TROUBLE.

If your score is low, you should "trouble shoot" why so that you can create the best learning strategy for you:

You can reach me at slosh@garnet.acns.fsu.edu

Remember: my office hours are Monday 3:30-5:00 and Wednesday, 2:00-5:00.
Maria has office hours Tuesday and Thursday afternoons in the LRC.

If you are interested in tutorial help, you need someone who has had experience successfully analyzing data, so look for a good, advanced student in your discipline with research analytic experience. When in doubt, please check with me. A math or statistics major PROBABLY WILL NOT HELP YOU because our course addresses applying statistical and data analytic principles. Deriving formulas WILL NOT HELP YOU in this course.


EXAM 2 SCORES

 
Score     Grade

67        C
69        C
70        C
74        C+
81(2)     B
82        B
83(2)     B
84        B
85(2)     B+
87        B+
89(2)     B+
90        A-
91(2)     A-
92(2)     A-
94(3)     A-
95(2)     A
96(3)     A
97        A
98(4)     A
99(2)     A

Exam 2: Median = 91.0  Mean = 88.9   25th percentile = 83   75th percentile = 96 standard deviation = 8.9
skew = -1.02  s.e. skew = 0.40
   
Exam 1: Median = 97 Mean = 94.2   25th percentile = 93   75th percentile = 99  standard deviation = 8.32 
skew =  -2.56  s.e. skew = .40
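
As a check on the Exam 2 summary statistics, here is a minimal sketch that rebuilds them from the score list above. It assumes the number in parentheses after a score is the number of students who earned it, and that the reported standard deviation is the sample version (dividing by n - 1). The variable names are illustrative only.

```python
import statistics

# Exam 2 scores with their frequencies, copied from the score list above.
score_counts = {
    67: 1, 69: 1, 70: 1, 74: 1, 81: 2, 82: 1, 83: 2, 84: 1, 85: 2, 87: 1,
    89: 2, 90: 1, 91: 2, 92: 2, 94: 3, 95: 2, 96: 3, 97: 1, 98: 4, 99: 2,
}

# Expand the frequency table into one entry per student.
scores = [score for score, count in score_counts.items() for _ in range(count)]

n = len(scores)
median = statistics.median(scores)
mean = statistics.mean(scores)
sd = statistics.stdev(scores)   # sample standard deviation (divides by n - 1)

print(f"n = {n}, median = {median}, mean = {mean:.1f}, sd = {sd:.1f}")
```

The mean falling below the median is consistent with the negative skew reported above: a few low scores pull the mean down.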
   

 


ANNOTATED EXAM 2

ABBREVIATED PORTIONS ARE AREAS IN WHICH EVERYONE DID WELL. CORRECT ANSWERS ARE ANNOTATED IN RED.

Please read each question carefully. Complete each specified question. If you can choose among alternatives, please only answer the number specified (we give no extra credit for extra answers and only grade the first specified number of answers.) You may add a brief explanation at any time. Point values are in parentheses next to each question.

PART A: (9; 3 points each). For ANY THREE of the following variable pairs, (a) describe which variable is the independent variable and (b) give a brief rationale (i.e., a guide or rule to establish cause) behind your answer (if all four are answered, we score only 1-3). [If you feel that the relationship is symmetric instead of asymmetric, then briefly state this and why you think so.]

A. Country of national origin and major field in college. The independent variable is

country of national origin       because:

it predates college in time. There is no way your college major could influence your country of birth (giggle factor). However, by stressing different themes (e.g., science or engineering), national origin could affect one's choice of college major.

Be alert to phrases such as "origin" or "birth" because they almost certainly establish time order.

B. Gender and occupational field choice.  The independent variable is:

gender  because:

it precedes occupational choice in time and is much more difficult to change (see A-A). Stereotypes about "good fields" for women or men might influence someone's occupational choice, but it is clear that your occupational choice will rarely, if ever, affect your gender ("I became a nurse, so I will have a sex change operation and become female"? No way! Use the "giggle factor.")

C. Incidence of lung cancer and number of daily cigarettes smoked. The independent variable is:

number of daily cigarettes smoked        because:

of time order as well as good conceptual medical reasons about how cigarette smoking can influence cell composition, etc. Although I won't entirely rule out how your anxiety about developing lung cancer caused you to finally take up smoking cigarettes, for most people, smoking predates cancer development by many, many years.

D. Political ideology (conservative, liberal, moderate) and presidential vote choice. The independent variable is:

political ideology     because:

it is the more general concept and a particular presidential vote choice is much more specific (if you have a reasoned answer about how your zeal for a particular candidate led to a "political conversion" experience, I will be glad to strongly consider it).

Political ideology also often comes first in time and is more difficult to change. In 2004, weeks before the presidential election, the overwhelming majority of self-identified conservatives said they would vote for Bush and the overwhelming majority of self-identified liberals said they would vote for Kerry.
 
 
 
To receive full credit, you HAD TO invoke a causal guideline or rule (e.g., time order or the "giggle factor").
YOU CAN REVIEW THESE IN THE "ON PROOF" SECTION HERE
ISSUES OF CAUSE BECOME MORE COMPLEX FOR EXAM 3.

For our next sections on regression, you MUST select a dependent variable.

If you go on to take structural equation modelling, you MUST determine the causal order of your variables before you can analyze the data.

Be careful about statements that amount to a tautology, e.g., "Gender is the cause because it must cause science interest (or whatever it was.)" You lost at least one point for doing so.
 


 

PART B: (4 each, 12 total) Please answer all questions 1-3. FIRST: check the  BEST correlation coefficient to use in each question; SECOND: in each case give the rationale behind your decision.

1. The association between gender (male-female) and the number of science courses taken in college. The best correlation coefficient to use is (PLEASE CHECK ONLY ONE):

[ ] A. Chi-square    [X] B. Eta    [ ] C. The F-test    or    [ ] D. Tau-beta?

BECAUSE:

gender is nominal and the number of science courses is ratio, making eta the best choice. Chi-square and F are PDFs, not correlation coefficients, and the taus require both variables to be ordinal or numeric.

2. The association between number of books in one’s personal collection and years of formal education. The best correlation coefficient is (PLEASE CHECK ONLY ONE):

[ ] A. Chi-square    [ ] B. Gamma    [ ] C. Phi (Cramer's V)    or    [X] D. Pearson's r?

BECAUSE:

Both variables are numeric (ratio) and we have no reason to expect a nonlinear relationship. Individuals with more education probably have more books. Phi is a nominal level coefficient (and each variable has so many values a contingency table will do a poor job of presenting the data anyway) and gamma is not only an ordinal coefficient but inflates the relationship because of the way it is calculated.

3. The association between political party (Democrat, Republican, etc.) and 2004 presidential candidate (e.g., George Bush, John Kerry, etc.). The best correlation coefficient is (PLEASE CHECK ONLY ONE):

[ ] A. Eta    [X] B. Phi (Cramer's V)    [ ] C. The t-test    or    [ ] D. Tau-b?

BECAUSE:

Both variables are NOMINAL, ruling out eta or tau-b and leaving only V. The t-test is a PDF, not a correlation coefficient. Phi (V) is the only nominal-level correlation coefficient presented.
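
The reasoning in the three answers above can be summarized as a rough lookup. The function below is a hypothetical helper written for this illustration only, and it is a deliberate simplification: it covers just the coefficients named in Part B and assumes you have already decided which variable is independent and whether the relationship looks linear.

```python
def suggest_coefficient(iv_level, dv_level, linear=True):
    """Rough lookup of a correlation coefficient from levels of measurement.

    Hypothetical helper for illustration. Levels are "nominal", "ordinal",
    or "numeric" (interval or ratio).
    """
    ordered = {"ordinal", "numeric"}
    if iv_level == "numeric" and dv_level == "numeric":
        # Pearson's r assumes a linear form; eta does not.
        return "Pearson's r" if linear else "eta"
    if iv_level == "nominal" and dv_level == "numeric":
        return "eta"
    if iv_level in ordered and dv_level in ordered:
        return "tau-b"
    # A nominal-level coefficient can be used with any kind of data.
    return "Cramer's V (phi)"


print(suggest_coefficient("nominal", "numeric"))   # question 1
print(suggest_coefficient("numeric", "numeric"))   # question 2
print(suggest_coefficient("nominal", "nominal"))   # question 3
```

A lookup like this is only a starting point; the form of the relationship in your actual data still has to be checked.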
 


 PART C: Correlations, statistical significance, and strength (18 total; 2 points each). For each statement, check whether the statement is generally TRUE or generally FALSE.
 

1. A correlation coefficient of 0.00 means the association between two variables is very statistically significant.
[ ] TRUE
[X] FALSE
It means there is no association at all. It's zero.

2. A correlation of +0.76 is larger than a correlation of -0.61 (assume the same type of correlation coefficient, e.g., r).
[X] TRUE
[ ] FALSE
Use the absolute value to determine strength. .76 is larger than .61.

3. A very large Chi-square typically means that we reject the null hypothesis of no relationship.
[X] TRUE
[ ] FALSE

4. Causation implies correlation.
[X] TRUE
[ ] FALSE
If one variable causes a second, they should be correlated!

5. Before examining effect size, you must first check whether the inference PDF value is statistically different from zero.
[X] TRUE
[ ] FALSE
Absolutely. If the PDF value is not statistically significant, the effect size is zero.

6. It is possible for a correlation coefficient to be very statistically significant, but very weak in strength.
[X] TRUE
[ ] FALSE
Absolutely, see results for this exam and Assignment 3. This is particularly likely to happen in large samples, where the correlation is different from zero--but only a LITTLE different from zero.

7. Using gamma on nominal data produces a strong correlation coefficient.
[ ] TRUE
[X] FALSE
You can't use gamma on nominal data, period. Don't even think about it.

8. We generally try to accept the null hypothesis.
[ ] TRUE
[X] FALSE
We generally hope to REJECT the null hypothesis.

9. We use the value of the correlation coefficient to test for the statistical significance of an association.
[ ] TRUE
[X] FALSE
No, we use a PDF such as F or t to test for statistical significance. The value of the correlation could LOOK real, but simply be a sampling accident and actually be zero in the population.

 
 
Statistical significance addresses whether or not your results probably are a sampling accident. It tells us if we have a relationship AT ALL. You must examine this issue first because if you have an association between two variables that is zero in the population, you essentially do not have any findings to discuss. Statistical significance assesses the null hypothesis. 
Substantive or practical significance addresses the effect size or strength of your findings (if they are not zero). Especially with large samples, you may have statistically significant results that are substantively trivial, very weak or weak. 
One example is Table 1 below, where the association is highly statistically significant but substantively weak. A second example was shown in Assignment 3.
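
To see why sample size matters so much here, consider a made-up illustration (the r value and both sample sizes below are hypothetical, not course data). The same very weak correlation is converted to a t statistic with the standard formula t = r * sqrt(n - 2) / sqrt(1 - r^2), once for a small sample and once for a large one.

```python
import math


def t_for_r(r, n):
    """t statistic for testing H0: population correlation = 0 (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r = 0.05                      # hypothetical, very weak correlation
for n in (30, 10_000):        # hypothetical sample sizes
    print(f"n = {n:>6}: r = {r}, t = {t_for_r(r, n):.2f}")
```

As a rough benchmark, a t near 2 corresponds to p of about .05 (two-tailed) in large samples. The strength of the correlation is identical in both runs; only the evidence that it differs from zero changes.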

IF NEEDED, PLEASE REVIEW THESE SITES: HERE AND OVER HERE TOO.
 
 


 PART D: (4) BRIEFLY describe TWO (and ONLY two) characteristics of a good measure of association:

Unfortunately, a good correlation coefficient is not always the largest one or the most statistically significant.
Choose a good measure of association from DATA PROPERTIES, such as the level of measurement (e.g., nominal), the relationship form (e.g., nonlinear), directionality (if appropriate), etc.

Don't choose the correlation coefficient because it reflected the outcome you wanted (e.g., strong). That is a slippery slope down to lying with statistics.

Choose from among the following characteristics:

Appropriate to the level of data
Appropriate to the form of the relationship
Shows direction (if appropriate to level and form)
Asymmetric (if you can designate an independent variable)
PRE (proportional reduction in error)
Able to reach 1 in non-square tables (if using cross-tabulations)

PART E: (4) Give a very brief example (1 sentence or so) of a "negative relationship" using two variables of your choice. NEXT, draw a simple graph or diagram that shows approximately what a negative relationship looks like.
 
 

RELATIONSHIP BETWEEN YEARS OF EDUCATION AND
AVERAGE NUMBER OF CIGARETTES SMOKED PER DAY
 
          | 
    #     |     X 
    daily |        X 
cigarettes|           X
          |              X
          |                 X
          |                    X 
 
     education in years

Other great examples:


PART F: (4) When working with observed or naturalistic data (rather than experimental data), how can you begin to determine which variable is independent and which variable is dependent? Briefly state TWO DIFFERENT rules or guidelines that help you to determine this:

Choose from the following:

Time order
Ease of change for each variable
Necessary condition
Sufficient condition
General versus specific conceptually
Particular theoretical predictions
Logic
The "Giggle Factor"

(P.S. For those who wish to use a citation or read in more detail, see Morris Rosenberg, The Logic of Survey Analysis, or Hans Zeisel, Say It With Figures, both published 1968--so this is old news.)

You do NOT determine which variable is causal from the size of the correlation coefficient or the level of statistical significance. Causality is determined prior to analyzing your data using one of the rules or guidelines above.
 


 PART G: Table 1 contains data from a sample of 9234 American adults and 5 separate general public surveys about possible changes over time in visits per year to science museums. Please use Table 1 and the associated statistics below it to answer questions G1-G18.

Part G caused the most trouble for the largest number of students. It involves a series of decisions, where the earlier decisions you make strongly influence the later decisions. When possible, Maria and I tried to follow the logic of your earlier decisions to see how these affected the later ones, and you gained more credit (even if the literal answer was wrong) when you were able to show that you understood what a test of statistical significance meant, which pdf went with which type of correlation coefficient, how testing the null hypothesis for a correlation coefficient differed from ascertaining the strength of a correlation coefficient, and whether your interpretation of the results was consistent with your earlier interpretation of the data. All of these are tasks you must know how to do, whether you are analyzing data for a project or reading the tables constructed by other researchers.

Examining changes over time is done very often by educators, and behavioral or social scientists. For example "No Child Left Behind" requires students to make adequate yearly progress. The SAT and GRE scores are recalibrated every several years to reflect changes in the total answers correct. Changes in the economy or the well-being of a country's citizens all involve examining scores on a dependent variable over time.
 
 

Number of Visits / Study Year    1983      1988      1992      1997      2001      Total

0                                70.4%     69.4%     75.7%     69.2%     66.7%     6507
1                                15.0      18.5      16.6      20.9      20.7      1697
2                                 8.9       7.0       3.9       5.9       6.9       591
3 or more (TREAT AS 3)            5.7       5.1       3.8       4.0       5.7       439

Total                           100.0%    100.0%    100.0%    100.0%    100.0%
                                (1631)    (2041)    (1993)    (1997)    (1572)    n = 9234

Mean Visits by Year              0.50      0.48      0.36      0.45      0.52      0.45
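
The "Mean Visits by Year" figures can be recovered from the column percentages in Table 1. The sketch below is an illustration under the table's own instruction to treat "3 or more" as 3; the dictionary and variable names are made up for this example.

```python
# Column percentages from Table 1 for 0, 1, 2 and "3 or more" visits.
percents_by_year = {
    1983: [70.4, 15.0, 8.9, 5.7],
    1988: [69.4, 18.5, 7.0, 5.1],
    1992: [75.7, 16.6, 3.9, 3.8],
    1997: [69.2, 20.9, 5.9, 4.0],
    2001: [66.7, 20.7, 6.9, 5.7],
}
visit_values = [0, 1, 2, 3]   # "3 or more" is treated as 3, as the table instructs

# Weighted mean: each visit count times its share of the year's sample.
mean_visits = {
    year: sum(v * p for v, p in zip(visit_values, percents)) / 100
    for year, percents in percents_by_year.items()
}

for year, mean in mean_visits.items():
    print(f"{year}: {mean:.2f}")
```

Because "3 or more" is capped at 3, these means slightly understate the true averages, but the pattern across years is what matters for the questions that follow.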

 
 
Statistic           Value   Inference PDF used   PDF Value   df        Probability

Cramer’s V (phi)    .06     Chi-square           88.48       12        .00000
Tau-beta            .01     t                    1.24        9233      .21541
Pearson’s r         .00     Z                    -.09        9232      .93097
Gamma               .02     t                    1.24        9233      .21541
Eta                 .07     F                    11.17       4,9229    .00000

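
Two of the coefficients in the statistics table can be recovered from their inference PDF values, which shows how the descriptive and inferential numbers are linked. This is a sketch using the standard textbook conversions (Cramer's V from chi-square, and eta from a one-way F test); it assumes the crosstabulation has 4 rows and 5 columns, as in Table 1.

```python
import math

n = 9234            # total sample size from Table 1
rows, cols = 4, 5   # visit categories by study years

# Cramer's V from chi-square: V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
chi_square = 88.48
cramers_v = math.sqrt(chi_square / (n * (min(rows, cols) - 1)))
chi_square_df = (rows - 1) * (cols - 1)

# Eta from the one-way F test: eta^2 = (df1 * F) / (df1 * F + df2)
f_value, df_between, df_within = 11.17, 4, 9229
eta = math.sqrt(df_between * f_value / (df_between * f_value + df_within))

print(f"Cramer's V = {cramers_v:.2f} (chi-square df = {chi_square_df})")
print(f"eta = {eta:.2f}")
```

Note how a chi-square of 88.48 and an F of 11.17, both highly significant, translate into coefficients well under 0.10 once the sample size of 9234 is taken into account.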
1. (1) Is the variable “Year” nominal, ordinal, interval or ratio?  Year is interval: numeric but with no fixed zero. Year does not lose this interval quality even if all years were not present in the study design for a particular sample just as age would not stop being a ratio variable just because no one in your sample was 62 years old. There is still an equal unit ("one year").

2. (1) Is the variable “Number of Visits” (per year to science museums) nominal, ordinal, interval or ratio?

Number of visits is a count variable; it is ratio with a fixed zero.

How we scored subsequent responses depended a lot on what you said in questions 1 and 2. For example, if you said both variables were not numeric, eta or r were precluded as choices. Answers that were inconsistent with your answers on #1 and #2 lost credit.

3. (4) A. Which is the independent variable, “Year” or “Number of visits”?

Year, because we can see how changes over time might influence visits to a science museum, but it is impossible for the number of visits to determine time. The "giggle factor" works well here, and so does time order.

You had to cite a causal guideline for nonexperimental data to receive full credit.
It is not sufficient to say "year is the independent variable because it causes museum visits."

    B. Briefly give the rationale for your choice in 3A:

See the above: the giggle factor, time order, and ease of change too.

4. (2) In 1983, what percent of the sample visited a science museum at most once a year (show your work)?

70.4% + 15.0% = 85.4%  (that's 0 visits plus once a year)

5. (2) In 2001, what percent of the sample visited a science museum at least once a year (show your work)?

20.7% + 6.9% + 5.7% = 33.3%  OR, more simply:

100.0% - 66.7% = 33.3% (1 + 2 + 3 visits or 100% - the percent of 0 visits)
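
The arithmetic for questions 4 and 5 can be written out as a short sketch. The two dictionaries simply copy the 1983 and 2001 columns of Table 1 (visits mapped to percent, with "3 or more" stored as 3); the names are illustrative.

```python
# Percent of each year's sample by number of visits (from Table 1).
col_1983 = {0: 70.4, 1: 15.0, 2: 8.9, 3: 5.7}
col_2001 = {0: 66.7, 1: 20.7, 2: 6.9, 3: 5.7}

# "At most once" means 0 or 1 visits.
at_most_once_1983 = sum(p for visits, p in col_1983.items() if visits <= 1)

# "At least once" means 1, 2, or 3+ visits ...
at_least_once_2001 = sum(p for visits, p in col_2001.items() if visits >= 1)
# ... which is the same as everyone except the 0-visit group.
shortcut_2001 = 100.0 - col_2001[0]

print(f"1983, at most once:  {at_most_once_1983:.1f}%")
print(f"2001, at least once: {at_least_once_2001:.1f}% (shortcut: {shortcut_2001:.1f}%)")
```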

6. (4) A. Generally, would you characterize the relationship between Year and Number of Visits as (choose only ONE) linear, monotonic, or non-linear (or cannot be determined in these data)?: NONLINEAR

     B. Briefly give the rationale behind your answer to 6A. If you graph the means over time, the graph resembles a "U".  See the EXCEL graph below. Also the percent reporting "0" visits per year peaks in 1992 and the percent reporting at least two annual visits drops in 1992.

Again, we looked at later answers in terms of your answer here. Monotonic or linear coefficients (the taus, r) would be inappropriate to use with a nonlinear relationship.

Remember! Linear relationships look like a straight line. Monotonic relationships "sort of" look like a line. Nonlinear relationships (like this example) don't even look close to a straight line.
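
One crude way to screen for a relationship that is not monotonic is to look at whether the year-to-year changes in the means switch direction. The sketch below applies that idea to the Table 1 means. It is only a rough check made up for this illustration (it cannot tell linear from merely monotonic, and it ignores sampling error), so graphing the means remains the better habit.

```python
years = [1983, 1988, 1992, 1997, 2001]
means = [0.50, 0.48, 0.36, 0.45, 0.52]   # Mean Visits by Year from Table 1

# Change in the mean from each survey year to the next.
changes = [later - earlier for earlier, later in zip(means, means[1:])]

rises = any(change > 0 for change in changes)
falls = any(change < 0 for change in changes)

if rises and falls:
    shape = "non-monotonic (changes direction)"
elif rises or falls:
    shape = "monotonic"
else:
    shape = "flat"

print([f"{change:+.2f}" for change in changes])
print(shape)
```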

7. (4) A. Using the cross-tabulation Table 1, which correlation coefficient is the MOST APPROPRIATE or the BEST to use for the relationship between “Year” and “Number of Visits”?

ETA
CRAMER'S V could also be used because the table is not huge (although eta is preferable because of the numeric dependent variable).

     B. Briefly give the rationale behind your answer to 7A.

This is a nonlinear relationship with a numeric dependent variable.
Phi, of course, can be used with nonlinear relationships--it's not the best choice in general (or here, either) but could work because the table is not very large.

We looked to see if later values, strength, statistical significance, etc. were consistent with your answer to Q7. If they were, you received credit (even if your answer to Q7 was incorrect).

8. (4) List below which, if any, of the coefficients listed above for Table 1 is/are INAPPROPRIATE or INVALID to use for the relationship between these two variables?

Tau-beta, Gamma, Pearson's r

     B. Briefly give the rationale behind your answer to 8A.

The sample relationship is nonlinear. Therefore the taus, gamma and r will all dramatically underestimate the strength of the relationship in general under these circumstances, and in this case in particular. Further, gamma typically gives inflated results (although it does not do so in this case BECAUSE the relationship is nonlinear).

Note that V and eta are NOT the best coefficients to use simply because they were statistically significant while r, gamma and tau were not. These two coefficients were statistically significant BECAUSE V and eta reflected the nonlinearity in the data, or the relationship form, and the monotonic and linear correlation coefficients do not reflect the form of this relationship. It was matching the correlation coefficient to the relationship form that made eta or V the best correlation coefficients to use, and that also produced slightly higher correlation values.
 
 
     
    CRITICALLY IMPORTANT POINTS OF CONFUSION:
     
  • WRITING A NULL HYPOTHESIS
  • LEVELS OF STATISTICAL SIGNIFICANCE AND THE STRENGTH OF A CORRELATION COEFFICIENT
  • REMEMBER, IF YOU CAN USE A CORRELATION COEFFICIENT WITH NOMINAL DATA (e.g., PHI OR CRAMER'S V) YOU CAN USE IT WITH ANY KIND OF DATA (although it may not be the best or most appropriate choice)
  • USING THE PROBABILITY LEVEL TO DECIDE IF YOUR RESULTS ARE REAL OR SIMPLY A SAMPLING ACCIDENT
  • USING THE F TEST & ITS PROBABILITY LEVEL TO EXAMINE MEAN DIFFERENCES ACROSS MORE THAN 2 GROUPS
REVIEW! REVIEW! REVIEW!

9. (2) What is the null hypothesis or H0 for the correlation coefficient you chose in question 7:
H0: Eta = 0   OR   V = 0   (not as good but OK: F = 0, Chi-square = 0, etc.)

We start with a "null" hypothesis of no relationship or no difference.

10. (4) A. Using the cross tabulation Table 1 and the correlation coefficient you chose in question 7, do you think there is a real change across time 1983 - 2001 in the population on the number of visits to science museums? (PLEASE CHECK ONLY ONE ANSWER:)

[  ]Yes, a real difference  OR
[  ]No, a sampling accident OR
[  ]Cannot be determined from these data

     B. BRIEFLY, explain the reason for your choice in question 10A:

Clearly, yes if you chose V or Eta because p < .00001 for either.
This is less than the commonly used alpha level of p < .05.
If there were no association IN THE (population) CROSSTABULATION TABLE, you would expect this association, minuscule as it is, in fewer than 5 samples in 100 purely by chance. In fact, this result is so extreme that you would expect an association this size or larger in fewer than 1 in 100,000 samples by chance if the population correlation coefficient were zero.

No, if you chose any of the other correlation coefficients (which are poorer choices in this case) because all the probabilities in the other three cases are p > .05

DO NOT decide whether your results are a sampling accident or not based on the value of the correlation coefficient. That is a sample estimate and subject to fluctuations. Use the probability level instead.
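
The decision rule described above can be sketched as a comparison of each probability with alpha. The probabilities are copied from the statistics table under Table 1; a printed value of .00000 is entered as .00001 here, an assumption standing in for "p < .00001". The names are made up for this example.

```python
ALPHA = 0.05   # the common alpha level mentioned above

# Probability for each coefficient's inference test, from the statistics table.
probabilities = {
    "Cramer's V": 0.00001,    # printed as .00000, i.e. p < .00001
    "Tau-beta": 0.21541,
    "Pearson's r": 0.93097,
    "Gamma": 0.21541,
    "Eta": 0.00001,           # printed as .00000, i.e. p < .00001
}

# Reject the null hypothesis only when p falls below alpha.
decisions = {
    name: "reject H0" if p < ALPHA else "do not reject H0"
    for name, p in probabilities.items()
}

for name, decision in decisions.items():
    print(f"{name:<12} p = {probabilities[name]:.5f}  ->  {decision}")
```

Notice that the decision uses only the probability column, never the size of the coefficient itself.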

11. (2) What was the statistical ”level of significance” or ”p-level” for the correlation coefficient that you chose between year and number of visits in question 7?

V or Eta, p < .00001
Tau, gamma or r, p >.05

12. (2) What was the value of the correlation coefficient you chose in question 7?

See Table 1 for the value of each of the correlation coefficients.
We checked to see that the value you placed here corresponded to the correlation coefficient you chose.

13. (2) Characterize the strength of this correlation coefficient (and direction–if appropriate) in words:

Every last correlation coefficient, no matter which of the five, was VERY WEAK.

Review the STRENGTH CHART.
We will also use this for the Exam 3 material, so review now if you forgot these.

14. (4) A. Turning now to the difference of means test, do you think there was a real change in the population across time in the number of visits to a science museum per year?

[X ] Yes, a real difference  OR
[  ]No, a sampling accident (all the group means are the same in the population) OR
[  ]Cannot be determined from these data

     B. BRIEFLY, explain the reason for your choice in question 14A. p < .00001

If there were no change over time in visits to science museums, results such as these would occur by chance less than once in 100,000 (NOT 10,000) samples. This is such an extremely rare event that we reject the null hypothesis, H0: F = 0, and go with the alternative hypothesis HA: F > 0. (You may have used eta or μ1 = μ2 = μ3 = μ4 = μ5. All three are essentially equivalent statements.)

Do NOT use the t-test when (as here) the independent variable has at least three values. The t is only for cases where the independent variable has two values only. (And this t went with the tau-beta, the t is the official pdf for tau-beta.)

r was not designed to test the differences in means across groups and cannot be used here.

15. (2) Which inference pdf test did you use to examine the difference between means? (HINT: one letter will do here.)
                     the F

16. (2) What was the level of statistical significance for the difference between means test?  p < .00001

17. (4) A. In this particular example (i.e., Table 1), is it better to use a cross-tabulation table or a difference of means to describe the relationship between “Year” and the “Number of Visits to a science museum”? (CHECK ONLY ONE:)

[   ] CROSS-TABULATION TABLE         [   ]DIFFERENCE OF MEANS        [ X  ]NO DIFFERENCE (basically)

     B. BRIEFLY explain one reason behind your choice in question 17A:

The difference between means test makes the changes across time somewhat easier to ascertain and interpret. In general, we would go with the difference of means test which is more concise and uses all the scores. This would especially be the case if your dependent variable has many values or categories.

When might you go with the crosstabulation table? If the means were about the same, but the standard deviations were quite different, the crosstabulation would be more informative. This crosstabulation table isn't that big so you and your reader might prefer the detail.

18. (3) Based on these results in Table 1, suppose a newspaper reporter called to ask you about any changes across time 1983 - 2001 in the number of visits to a science museum per year. BRIEFLY how would you describe the results about year and number of visits to a science museum per year to this newspaper reporter?

There were statistically significant changes in the number of science museum visits over time. However, because the relationship was nonlinear, they were not easy or straightforward to interpret. The number of visits was similar in 1983 and 2001 (about 0.5 visits per year) but dipped in the late 1980s and 1990s. However, even the nonlinear relationship was so weak that the newspaper had better call on someone else for "news."
 


 
 
Inference statistics test whether two variables have a nonzero association at all. 
Correlation coefficients are descriptive statistics that measure how strong the association is.

 
 
This one is worth repeating because many students confused testing for statistical significance with effect size or strength. You CANNOT simply "eyeball" the size of a sample correlation coefficient to decide whether it is or is not zero in the population.

Set up a null hypothesis and use the associated pdf such as t, F or chi-square to test your null hypothesis that the correlation coefficient is zero in the population.
 


 



PLEASE EXAMINE YOUR EXAM CAREFULLY. GENERALLY THERE ARE COMMENTS ON EACH ONE. IF YOU DON'T UNDERSTAND YOUR SCORES PLEASE SEE ME OR MARIA AFTER CLASS, DURING OFFICE HOURS OR IN AN APPOINTMENT. PLEASE LET ME KNOW IF THERE IS AN ARITHMETIC MISTAKE IN YOUR SCORE DURING BREAK OR AFTER CLASS AND I WILL CORRECT IT.
 
 


Susan Carol Losh November 7, 2004