OVERVIEW

 
 

WATCH FOR:

Assignment 3
due October 20


 


 
 
EDF 5400 INTRODUCTORY STATISTICS
FALL 2004
EXAM 1 GENERAL FEEDBACK

DR SUSAN CAROL LOSH
EDUCATIONAL PSYCHOLOGY AND LEARNING SYSTEMS


 
GENERAL POINTS
SCORE DISTRIBUTIONS
EXAM ANSWERS

IMPORTANT GENERAL POINTS

Here are the scores from Exam One. The maximum possible score was 100 points.

An "A" means excellent work. A "B+" means superior work. A "B" means good work.

This was a terrific group of exams. Study the distribution below and you will see what I mean. The mean was 94 and the median was 97.
 
 
The one danger I do see is complacency because most students did so well. This is cumulative material, and as we add a second, then a third (or more) variable, the material becomes more complex. As we add inferential statistics to descriptive statistics, the material becomes more complex. Be sure to keep up with your reading, do the assignments on time, and ascertain where you made errors.

LETTER GRADES ARE APPROXIMATE. I will add the scores from all three exams to create a new TOTAL SCORE at the end of the semester. This total score will receive a grade, and this total exam score grade counts for 75 percent of your final grade. One point one way or the other almost certainly will not influence your final grade, and I work with the numeric score, not the letter grade, at the semester's end.

You can review grading HERE.

Please read all material on this site.
 
 
 
Please look over your exam carefully. Remember that Maria and I do NOT discuss individual exams during class time (including break), although I am happy to address GENERIC issues when we review during class. We will be happy to examine your individual paper after class (but not too late, please), during office hours, or in an appointment.

The exception is if you discover an arithmetic mistake or other numeric error in adding up your score. Please let me know this during break or after class (NOT during class) and I will see that your grade file is corrected.
 


 
EXAM PURPOSES

Any exam serves several purposes. FIRST, it should spot overall class problems in comprehension and correct them.

SECOND, an exam should assess class mastery of the material so that I can judge the pace and sophistication of presented material. So far, despite our astounding diversity, we are doing well. We have students from all over the university.  Many students have NEVER had a statistics course before, while others have had at least three.

THIRD, of course, an exam should assess your individual mastery of the material, which I am mandated to do.


If your score is low, you should "trouble shoot" why so that you can create the best learning strategy for you:

Remember: my office hours are Monday 3:15 - 5:00 PM and Wednesday, 2:00-5:00 PM.
Maria's office hours are Tuesday and Thursday 3:15 - 5:00 PM in the LRC.

You can reach me by email at: slosh@garnet.acns.fsu.edu
 

EXAM 1 SCORES

 
Score    Frequency    Grade
  65         1         C-
  68         1         C
  76         1         C+
  89         2         B+
  92         3         A-
  93         1         A-
  94         2         A-
  95         2         A
  96         4         A
  97         2         A
  98         6         A
  99         5         A
 100         5         A
 
 

 

Median = 97 Mean = 94.1   25th percentile = 93.0   75th percentile = 99
standard deviation = 8.3   skew =  -2.56  s.e. skew = .40
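
As a quick check on the summary statistics, the score listing above can be expanded into one entry per student and summarized with Python's standard library. This is only a sketch: the variable names are illustrative, scores listed without a count are assumed to have been earned by one student, and the standard deviation is assumed to be the sample version (dividing by n - 1).

```python
from statistics import mean, median, stdev

# Exam 1 scores and how many students earned each one (from the listing above).
# Scores shown without a count are assumed to have a frequency of 1.
score_counts = {
    65: 1, 68: 1, 76: 1, 89: 2, 92: 3, 93: 1, 94: 2,
    95: 2, 96: 4, 97: 2, 98: 6, 99: 5, 100: 5,
}

# Expand the frequency table into one entry per student.
scores = [score for score, count in score_counts.items() for _ in range(count)]

n = len(scores)
exam_mean = mean(scores)
exam_median = median(scores)
exam_sd = stdev(scores)  # sample standard deviation (divides by n - 1)

print(f"N = {n}")
print(f"mean = {exam_mean:.1f}, median = {exam_median}, sd = {exam_sd:.1f}")
```

The mean falls below the median because three low scores pull it down, which is what a negative skew looks like in practice.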
   

 
 


ANNOTATED EXAM 1 

ABBREVIATED PORTIONS ARE AREAS IN WHICH EVERYONE DID WELL. CORRECT ANSWERS ARE ANNOTATED IN RED BOLD.

Please read each question carefully. Be sure to complete each question. If you can choose among alternatives, please only answer the number specified (no extra credit is given for extra answers and we only grade the first specified number of answers.) You may add a brief explanation to any answer. Point values are usually in parentheses () next to each question.

PART A: (15) For each statement 1-4, circle whether it is TRUE or FALSE. (2 points each)
 

1. A category system for a variable must be exhaustive. 
TRUE
FALSE
2. The categories of a variable must be mutually exclusive. 
TRUE
FALSE
3. The categories of a variable must be multi-dimensional. 
TRUE
FALSE
4. The categories of a variable must be numeric.
TRUE
FALSE

Every case must fit in one of the categories and each case can only fit in ONE category of a variable. You don't want more than one dimension in a variable and numeric is nice but not necessary in a category system. Because IT IS NOT NECESSARY for a category system to have equal intervals, a lack of equal intervals is NOT necessarily a valid problem with the coding scheme for the variable "education" below.
 

 
An FSU student did a survey and originally asked adults their highest year of completed education. After collecting the data, he then created the following recoded category system shown here: 

   1. Less than 8 years of education
   2. 9-10 years of education
   3. 11-12 years of education
   4. 13-15 years of education
   5. 16 years of education
   6. 17 years of education or more
 

5. (2) What was the measurement level of the original data collected on education: nominal, ordinal, interval or ratio?

Because the student originally asked for the highest completed year, the original data were RATIO (you can't have fewer than 0 years of school and there is a common unit: one year).

6. (2) What was the measurement level of the recoded education variable as shown above: nominal, ordinal, interval or ratio?

Because the categories were collapsed into unequal intervals, the recoded data are now ORDINAL.

7. (3) BRIEFLY describe ONE problem with the category system shown above:

In general, if your ONLY critique was the lack of equal intervals, you lost two points because (A) this is not a necessity for a category system; (B) to have equal intervals in this case may not make conceptual sense (see categories #2 and #3); and (C) the answer overlooked far more serious problems (e.g., the category system was not exhaustive: no category covers exactly 8 years of education).
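
One way to see the exhaustiveness problem is to write the recode as a function and test every plausible value. The sketch below is illustrative only: the function name `recode_education` and the 0 to 25 year test range are assumptions, not part of the exam.

```python
from typing import Optional


def recode_education(years: int) -> Optional[int]:
    """Return the recoded category (1-6) for completed years, or None if no category fits."""
    if years < 8:
        return 1  # Less than 8 years
    if 9 <= years <= 10:
        return 2
    if 11 <= years <= 12:
        return 3
    if 13 <= years <= 15:
        return 4
    if years == 16:
        return 5
    if years >= 17:
        return 6
    return None  # no category covers this value


# Exhaustiveness check: every value should land in some category.
# The 0-25 range is an assumed span of plausible answers.
uncovered = [y for y in range(0, 26) if recode_education(y) is None]
print("Years with no category:", uncovered)
```

Because each value is tested against the categories in turn and returned at the first match, the same function could also be adapted to check mutual exclusivity by counting how many categories each value satisfies.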
 
 
REVIEW PROPERTIES OF CATEGORY SYSTEMS HERE

AND OVER HERE TOO.

PART B: (8) For each statement below, circle whether it is TRUE or FALSE. (2 points each)
 
 
1. Statistics programs allow you to filter and recode variables. 
TRUE
FALSE
2. Statistics programs decide whether your variables are numeric or not.
TRUE
FALSE
3. Statistics programs produce far more results than you can typically use.
TRUE
FALSE
4. Virtually all statistics programs have parts of the program that need watching or correction.
TRUE
FALSE

Computers operate at ever-increasing speeds. They allow you to manipulate and recode categories (Q1). They can do calculations as fast as lightning, and do lots of them, typically TOO MANY of them (Q3). However, they don't really "think." They don't know if you have nominal data and they can't decide if the median is better for your set of income scores than the mean (Q2). ALL of the statistical programs have idiosyncrasies and glitches you need to check or correct: for example, the printed percentages may claim to total 100% even when the displayed values don't, or rounding errors may cause problems with the cumulative percentages (Q4). You can do a bar chart with SDA but not a histogram. This will hold true for the more complex statistics too.


PART C: 2 points each (total 4 points). For ANY TWO of the following, indicate whether the variable is nominal, ordinal, interval OR ratio (if all three are answered, we score only 1 & 2):

1. An individual's gender (male or female)

NOMINAL. The categories cannot be rank ordered as "more or less" or "better or worse."

2. An individual’s Graduate Record Exam score (GRE)

INTERVAL. There is no fixed zero (indeed no zero at all) but the categories are created to be equal intervals depending on the number of questions correct and standard deviation units.

3. Extent of agreement with the following statement (Strongly Agree, Agree, Neutral, Disagree or Strongly Disagree): "The cost of parking permits for FSU students should be raised or increased."

ORDINAL. A student who agrees is more in favor than one who disagrees but we don't know by  HOW MUCH more.
 

BECAUSE THIS IS SUCH A CRUCIAL AREA, IF YOU HAD TROUBLE, REVIEW HERE

PART D: 3 points each (total 9 points).  For ANY THREE of the following variables, indicate the best or most appropriate measure of central tendency to use (if all four are answered, we score only 1-3). Assume no skew or other unusual features:

1. A person’s favorite spectator sport (e.g., baseball, basketball, football, golf, soccer)

Nominal data, the MODE is the only measure of central tendency you can use here.

2. Number of years of education completed

Fixed 0 and equal intervals. Use the MEAN for this ratio variable.
(2 points if you said the MODE or the MEDIAN. You CAN use them, but they're not the BEST measures for ratio data.)

3. Rank order of grade point average in a high school graduating class (1st, 2nd, 3rd,  etc.)

The categories are rank ordered, i.e., ordinal so use the MEDIAN.
(2 points if you said the MODE. You can use it but it is often very uninformative. In this case, unless you have tied student GPAs, you won't even have a mode.)

4. Whether an individual has ever bought at least one lottery ticket (yes or no)

Anyone who answers yes has bought more tickets than anyone who answers no.
Ordinal data, median preferred (but you COULD use the mode in this particular case because there're only two categories so the category described will be the same).

You lost 2 points if you gave an impossible answer (e.g., the mean on nominal data), but at least it was a measure of central tendency. You lost 1 or 2 points if you correctly identified the level of the variable but gave an inappropriate measure of central tendency, or no measure of central tendency.
 
 

 
MOST STUDENTS KNEW WHICH MEASURES WERE APPROPRIATE FOR WHICH LEVEL OF DATA. It was IDENTIFYING the level of data that was tough! That's why each problem was worth three points.


PART E: 3 points each (total 9 points).  For ANY THREE of the following variables, indicate the best or most appropriate measure of dispersion or variability (if all four are answered, we score only 1-3). Assume no skew or other unusual features:

1. Country of national origin (e.g., China, Kenya, Korea, Taiwan, USA, Venezuela)

Nominal data. The Index of Dispersion D is the only measure of dispersion you can use.

2. Number of books in an individual’s private home library

Count integer variable, fixed zero, RATIO data, use the standard deviation.
(2 points if you said "D", range, or IQR. You CAN use them, but generally they're not the BEST measures.)

3. Number of completed high school math courses among high school graduates

Another count variable with a fixed zero, RATIO data, use the standard deviation.
(2 points if you said "D", range, or IQR. You CAN use them, but generally they're not the BEST measures.)

4. Rating of the "60 Minutes" television program: excellent, very good, good, fair, or poor

Ordinal data, the range or Inter-Quartile Range are your best measures of dispersion here.
(2 points if you said "D". You CAN use it, but generally it's not the BEST measure.)

Don't drop to a lower level of measurement (e.g., the range for #2 or #3 or the mode for #4) without a GOOD REASON (which you briefly describe, of course). Lower level measures are usually less informative and use less of the data than higher level measures.
 
 
You lost 2 points if you gave an impossible answer (e.g., the standard deviation on nominal data), but at least it was a measure of dispersion. You lost 1 or 2 points if you correctly identified the level of the variable but gave an inappropriate measure of dispersion, or no measure of dispersion.
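
The pairing of measurement level with preferred measures in Parts D and E can be summarized as a small lookup table. This is a sketch of the rule of thumb stated in the feedback (assuming no skew or other unusual features); the dictionary and function names are illustrative.

```python
# Preferred measures by measurement level, as summarized in the feedback
# for Parts D and E (assuming no skew or other unusual features).
PREFERRED_MEASURES = {
    "nominal": {"central_tendency": "mode",
                "dispersion": "Index of Dispersion (D)"},
    "ordinal": {"central_tendency": "median",
                "dispersion": "range or interquartile range"},
    "interval": {"central_tendency": "mean",
                 "dispersion": "standard deviation"},
    "ratio": {"central_tendency": "mean",
              "dispersion": "standard deviation"},
}


def preferred_measures(level: str) -> dict:
    """Look up the preferred measures for a measurement level."""
    key = level.strip().lower()
    if key not in PREFERRED_MEASURES:
        raise ValueError(f"Unknown measurement level: {level!r}")
    return PREFERRED_MEASURES[key]


# Variables taken from the exam questions above.
examples = [
    ("favorite spectator sport", "nominal"),
    ("rating of the 60 Minutes program", "ordinal"),
    ("number of books in a home library", "ratio"),
]
for variable, level in examples:
    measures = preferred_measures(level)
    print(f"{variable} ({level}): {measures['central_tendency']}, "
          f"{measures['dispersion']}")
```

The table only encodes the second step. As the feedback notes, the harder part is identifying the level of the variable in the first place.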

 


PART F: SYMBOLS (6, 2 points each): COMPLETELY describe in words what is meant by ANY THREE of the following symbols (if all four are answered, we score only 1-3):

1. Σ (“big sigma”): to add or to sum

2. μ (“mu”): the POPULATION mean

3. N: the POPULATION case base or frequency size

  4. Z-score: a standardized or "normal" score; obtained by subtracting the mean from the individual score and dividing this quantity by the standard deviation. The CORRECT formula provided here received full credit too.
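
The verbal definition of the Z-score translates directly into code. The sketch below uses the Exam 1 mean (94.1) and standard deviation (8.3) reported earlier on this page; the function name and the choice of a score of 76 as the example are illustrative assumptions.

```python
def z_score(score: float, mean: float, sd: float) -> float:
    """Subtract the mean from the score, then divide by the standard deviation."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (score - mean) / sd


# Example: how far below the Exam 1 mean is a score of 76?
z = z_score(76, mean=94.1, sd=8.3)
print(f"z = {z:.2f}")
```

A negative Z-score means the score is below the mean; here the score sits a little more than two standard deviations below it.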

(HINT: It counted whether a symbol represents the population or the sample.)


PART G: For each of the following statements (2 points each for 16 points total), check whether the statement is true or false (normal curve, sampling distribution, Z-scores).
 

1. Even though the sample of cases may not be normally distributed, the sampling distribution may be normal if the case base is large. 
TRUE
FALSE
2.  In a normal distribution, about 68 percent of the cases are found within one standard deviation of the mean. 
TRUE
FALSE
3. The frequency distribution for a normal curve takes on the shape of a backwards “J” (like the “Babies” variable in Assignment 2). 
TRUE
FALSE
4. You can use the normal curve with any kind of data, including nominal or ordinal data. 
TRUE
FALSE
5. Z-scores allow us to compare values from the same case on two different variables. 
TRUE
FALSE

1. True. This restates the central limit theorem, a close relative of the "law of large numbers." Sample statistics, such as a mean, vary less from sample to sample than individual cases do within a single sample.

2. Is true by definition. This is one of the identifying properties of a normal curve.

3. False (and "Babies" did NOT follow a normal distribution)--the curve looks like a bell and is symmetric.

4. False, only with numeric data. You must be able to take a mean and standard deviation. This is critical and many students who got this answer correct totally forgot about the construct in parts (H) and (I).

5. Absolutely true. Remember the example of Marilyn Vos Savant in Guide 3? We used Z-scores to compare Marilyn's (fictitious) income with her IQ score. We can compare two different cases on the same variable too.

(2) In a SAMPLING DISTRIBUTION, the basic unit is (PLEASE CHECK ONLY ONE:)

[   ]A. A single case                      [ X ]B. A single sample                 [   ]C. The population

(2) The measure of variability for the SAMPLING DISTRIBUTION (of the mean) is the (PLEASE CHECK ONLY ONE:)

[   ]A. Inter-quartile range             [   ]B. Standard deviation            [ X ]C. Standard error

The standard error functions as the standard deviation of the sampling distribution of the mean.

(2) We use the sampling distribution of the mean to (PLEASE CHECK ONLY ONE:)

[   ]A. Generate descriptive statistics about the one particular sample we took
[ X ]B. Make inferences about the population mean
[   ]C. Make sure that we have valid results

We use the sampling distribution to make inferences about the population.
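
A small simulation can make the sampling distribution concrete: each unit is the mean of one whole sample, and the spread of those means is the standard error. The sketch below is illustrative only. The exponential population (mean 1, standard deviation 1), the sample size, the number of samples, and the seed are all assumptions chosen for the demonstration.

```python
import random
from statistics import mean, stdev

random.seed(5400)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 100   # cases per sample
NUM_SAMPLES = 2000  # number of samples drawn


def sample_mean(n: int) -> float:
    """Draw one sample of n cases from a skewed population and return its mean."""
    # Exponential population with mean 1 and standard deviation 1 (far from normal).
    cases = [random.expovariate(1.0) for _ in range(n)]
    return sum(cases) / n


# Each entry is the mean of one whole sample: the unit of the sampling distribution.
sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

empirical_se = stdev(sample_means)           # spread of the sample means
theoretical_se = 1.0 / SAMPLE_SIZE ** 0.5    # population sd / sqrt(n)

print(f"mean of sample means = {mean(sample_means):.3f}")
print(f"empirical standard error   = {empirical_se:.3f}")
print(f"theoretical standard error = {theoretical_se:.3f}")
```

Even though individual cases come from a strongly skewed population, the sample means cluster tightly around the population mean, and their spread should come out close to the population standard deviation divided by the square root of the sample size.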


PART H: “Exhibit A” shows responses to the statement “Science makes our way of life change too fast” (“Too Fast”) in a 2001 survey of American adults. It also has statistics associated with the distribution. Please use these data to answer questions 1 - 10 (24 points).

“TOO FAST”  Science makes our way of life change too fast

                                                Valid    Cum
Value Label                 Value  Frequency   Percent  Percent

STRONGLY AGREE               1.00        69       4.4      4.4
AGREE                        2.00       537      34.1     38.5
NEUTRAL                      3.00        42       2.7     41.2
DISAGREE                     4.00       842      53.5     94.6
STRONGLY DISAGREE            5.00        84       5.4    100.0
                                     -------   -------  -------
                            Total      1574     100.0    100.0

Mean          3.213      Std err        .028      Median        4.000
Mode          4.000      Std dev       1.105      Skewness      -.384
                                                  S E Skew       .062

Percentile    Value      Percentile    Value      Percentile    Value
  25.00       2.000        50.00       4.000        75.00       4.000

1. (1) Are these data a (CHECK ONE):

[  ]A. bivariate distribution?
[  ]B. multivariate distribution? OR a
[X ]C. univariate distribution? Only one variable is presented here.

2. (4) As currently coded in Exhibit A, are the responses to this statement an (PLEASE CHECK ONE:)
   [  ]A. Interval
   [  ]B. Nominal
   [X ]C. Ordinal or
   [  ]D. Ratio level variable?

Briefly explain the rationale behind your answer:

Categories are rank ordered but there is no common unit, thus ORDINAL.
The data are NOT numbers.
 

 
GENERIC NOTE: We graded the rest of Part H depending on your answer to this question. We were looking to see if you knew the best measures of central tendency and dispersion to use with each level of data. Thus, if you said these were interval data (incorrect), we expected you to use the mean and standard deviation, not the median or the range. If you said the data were nominal, you could not use a median or inter-quartile range.

In general, if you dropped back to a lower level of statistics than what was appropriate (e.g., the mode or "D"), you lost 1 point for each such error. You COULD use these measures but they weren't the most appropriate.
 

3. (2) The measure of central tendency most appropriate to the variable “Too Fast” as currently coded is the (CHECK ONE:)
                [  ]A. Mean
                [X ]B. Median
                [  ]C. Mode
                [  ]D. None of the above

4. (2) BRIEFLY: Your choice in question 3 was the most appropriate "average" measure because:

These are ordinal data and the median is generally the most informative measure of central tendency for ordinal data.

5. (2) In these data, the value of this measure of central tendency would be (CHECK ONE:)

                      [   ]A. 4.000
                      [   ]B. 3.213
                      [   ]C. Neutral
                      [ X ]D. Disagree

This category contains the 50th percentile. The data aren't numbers so you must use the verbal label.
 


6. (3) What percent of adults in “EXHIBIT A” AT LEAST agreed with the “science makes our way of life change too fast” statement?
                38.5 Percent

4.4% Strongly Agreed plus 34.1% Agreed. This is a cumulative percent.

7. (2) What percent of adults in 'EXHIBIT A' were Neutral about the “science makes our way of life change too fast” statement?
                2.7 Percent

This is NOT a cumulative percent. Be sure to read carefully.

8. (2) Which ONE of the following would be the best or most appropriate measure of dispersion or variation to describe the data about the statement as coded in “EXHIBIT A”?

                [  ]A. The Index of Dispersion (D)
                [X ]B. The Inter-Quartile Range  These are ordinal data, so the IQR is the most descriptive.
                [  ]C. The standard deviation
                [  ]D. The standard error

9. (2) What are the two end-point categories that define the interquartile range for these data?

You can read off "2" and "4" as the 25th and 75th percentiles from the output printout. To get full credit for this non-numeric variable, be sure to supply the verbal labels that correspond to the numeric tags "2" and "4": Agree to Disagree.
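
The answers to questions 5 through 9 can all be read from the frequency column of Exhibit A. The sketch below recomputes them from the displayed counts; the helper name `percentile_category` is illustrative, and the rule used (take the first category whose cumulative percent reaches the target) is an assumption about how the percentile categories are located. A percent recomputed from the displayed counts may differ from the printout in the last decimal place.

```python
labels = ["STRONGLY AGREE", "AGREE", "NEUTRAL", "DISAGREE", "STRONGLY DISAGREE"]
freqs = [69, 537, 42, 842, 84]  # frequency column of Exhibit A
total = sum(freqs)

# Valid percent for each category, and cumulative percent built from running counts.
percents = [100 * f / total for f in freqs]
cum_percents = []
running_count = 0
for f in freqs:
    running_count += f
    cum_percents.append(100 * running_count / total)


def percentile_category(p: float) -> str:
    """Return the first category whose cumulative percent reaches p."""
    for label, cum in zip(labels, cum_percents):
        if cum >= p:
            return label
    return labels[-1]


median_label = percentile_category(50)   # question 5
at_least_agree = cum_percents[1]         # question 6: a cumulative percent
neutral_only = percents[2]               # question 7: NOT cumulative
iqr_low = percentile_category(25)        # question 9
iqr_high = percentile_category(75)

print(f"Median category: {median_label}")
print(f"At least agreed: {at_least_agree:.1f} percent")
print(f"Neutral: {neutral_only:.1f} percent")
print(f"Interquartile range: {iqr_low} to {iqr_high}")
```

Note that the function returns verbal labels rather than the numeric tags, which matches the point made above: for a non-numeric variable the answer is the category, not the code.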

10. (4) Do the data as displayed in Exhibit A follow a normal distribution?  [    ]YES or  [X ]NO

BRIEFLY, give ONE reason for your choice.

This variable is not numeric and you MUST have numeric data to follow a normal distribution.
NO OTHER REASONS NEED APPLY. Under these circumstances, you can't do a mean, you can't do a standard deviation, and it doesn't matter WHAT the shape of the histogram might be.
 




PART I: (9 total) “Exhibit B” below is a graphic display of answers to the question “What happens when animals hibernate for the winter?” (For example, when bears hibernate in caves for the winter.)


 

1. (2) We call this kind of graphic display (PLEASE CHECK ONLY ONE:)

[   ]A. A frequency distribution
[   ]B. A frequency polygon
[   ]C. A histogram
[   ]D. A pie chart

2. (4) Suppose you were a statistics instructor and a student turned in “Exhibit B” to you for an assignment in constructing a histogram. Briefly describe ANY two mistakes, problems, extra information, or omissions that must be corrected in order for “EXHIBIT B” to qualify as a good example of a histogram:

NO data source.
NO case base.
NO missing data mentioned.
The legend is in the body of the chart; the category labels can instead be placed as values underneath the histogram.

3. (3) Are the responses to the hibernation question “normally distributed”? PLEASE CHECK:

[    ] YES       or      [  X  ] NO

Give the most important reason why they are or are not normally distributed:

This variable is not numeric and you MUST have numeric data to follow a normal distribution.
NO OTHER REASONS NEED APPLY. Under these circumstances, you can't do a mean, you can't do a standard deviation, and it doesn't matter WHAT the shape of the histogram might be. Don't even think about applying the mean or standard deviation to nominal data!!

I don't think this point can be made too many times! This is especially true because many of the statistics we will work with shortly assume a normal distribution.
 




 


Susan Carol Losh October 2, 2004