FALL 2004 EXAM 1 GENERAL FEEDBACK DR SUSAN CAROL LOSH
Here are the scores from Exam One. The maximum possible score was 100 points.
An "A" means excellent work. A "B+" means superior work. A "B" means good work.
This was a terrific group of exams. Study the distribution below and you will see what I mean. The mean was 94 and the median was 97.
LETTER GRADES ARE APPROXIMATE. I will add the scores from all three exams to create a new TOTAL SCORE at the end of July. This total score will receive a grade, and that total exam score grade counts for 75 percent of your final grade. One point one way or the other almost certainly will not influence your final grade; at the semester's end I work with the numeric score, not the letter grade.
You can review grading HERE.
Please read all material on this site.
Any exam serves several purposes. FIRST, it should spot overall class problems in comprehension and correct them.
SECOND, an exam should assess class mastery of the material so that I can judge the pace and sophistication of presented material. So far, despite our astounding diversity, we are doing well. We have students from all over the university. Many students have NEVER had a statistics course before, while others have had at least three.
THIRD, of course, an exam should assess your individual mastery of the material, which I am mandated to do.
If your score is low, you should "trouble shoot" why so that you can create the best learning strategy for you:
You can reach me by email at:
slosh@garnet.acns.fsu.edu
Score (number of students)   Approximate grade
65                           C-
68                           C
76                           C+
89 (2)                       B+
92 (3)                       A-
93                           A-
94 (2)                       A-
95 (2)                       A
96 (4)                       A
97 (2)                       A
98 (6)                       A
99 (5)                       A
100 (5)                      A

standard deviation = 8.3; skew = -2.56; s.e. skew = .40
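The summary figures for this distribution can be checked directly from the scores. The sketch below is an illustration only, not part of the grading. It assumes that a number in parentheses is a count of students, uses the sample (n - 1) standard deviation, and uses the usual formula for the standard error of skewness; the variable names are invented for the example.

```python
import math
import statistics

# Score -> number of students, read off the distribution above
# (assumption: a number in parentheses is a count; no parentheses means 1).
score_counts = {65: 1, 68: 1, 76: 1, 89: 2, 92: 3, 93: 1, 94: 2,
                95: 2, 96: 4, 97: 2, 98: 6, 99: 5, 100: 5}

# Expand the table into one score per student.
scores = [s for s, c in sorted(score_counts.items()) for _ in range(c)]

n = len(scores)
mean = statistics.mean(scores)
median = statistics.median(scores)
std_dev = statistics.stdev(scores)  # sample standard deviation (n - 1)

# Standard error of skewness for a sample of size n.
se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

print(n, round(mean), median, round(std_dev, 1), round(se_skew, 2))
```

Under these assumptions the 35 scores reproduce the reported mean (94), median (97), standard deviation (8.3) and s.e. of skew (.40).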
ABBREVIATED PORTIONS ARE AREAS IN WHICH EVERYONE DID WELL. CORRECT ANSWERS ARE ANNOTATED IN RED BOLD.
Please read each question carefully. Be sure to complete each question. If you can choose among alternatives, please only answer the number specified (no extra credit is given for extra answers and we only grade the first specified number of answers.) You may add a brief explanation to any answer. Point values are usually in parentheses () next to each question.
PART A: (15) For each statement 1-4, circle whether it is TRUE or FALSE. (2 points each)
1. A category system for a variable must be exhaustive.
2. The categories of a variable must be mutually exclusive.
3. The categories of a variable must be multi-dimensional.
4. The categories of a variable must be numeric.
Every case must fit in one of the categories and each case can only fit in ONE category of a variable. You don't want more than one dimension in a variable and numeric is nice but not necessary in a category system. Because IT IS NOT NECESSARY for a category system to have equal intervals, a lack of equal intervals is NOT necessarily a valid problem with the coding scheme for the variable "education" below.
5. (2) What was the measurement level of the original data collected on education: nominal, ordinal, interval or ratio?
Because the student originally asked for the highest completed year, the original data were RATIO (you can't have fewer than 0 years of school and there is a common unit: one year).
6. (2) What was the measurement level of the recoded education variable as shown above: nominal, ordinal, interval or ratio?
Because the categories were collapsed unequally, the recoded data are now ORDINAL.
7. (3) BRIEFLY describe ONE problem with the category system shown above:
REVIEW PROPERTIES OF CATEGORY SYSTEMS HERE
1. Statistics programs allow you to filter and recode variables.
2. Statistics programs decide whether your variables are numeric or not.
3. Statistics programs produce far more results than you can typically use.
4. Virtually all statistics programs have parts of the program that need watching or correction.
Computers operate at ever-increasing speeds. They allow you to manipulate and recode categories (Q1). They can do calculations as fast as lightning, and do lots of them, typically TOO MANY of them (Q3). However, they don't really "think." They don't know if you have nominal data and they can't decide if the median is better for your set of income scores than the mean (Q2). ALL of the statistical programs have idiosyncrasies and glitches you need to check or correct: for example, percentages that are reported as adding to 100% even if they really don't, or rounding errors that cause problems with the cumulative percentages (Q4). You can do a bar chart with SDA but not a histogram. This will hold true for the more complex statistics too.
PART C: 2 points each (total 4 points). For ANY TWO of the following, indicate whether the variable is nominal, ordinal, interval OR ratio (if all three are answered, we score only 1 & 2):
1. An individual's gender (male or female)
NOMINAL. The categories cannot be rank ordered as "more or less" or "better or worse."
2. An individual’s Graduate Record Exam score (GRE)
INTERVAL. There is no fixed zero (indeed no zero at all) but the categories are created to be equal intervals depending on the number of questions correct and standard deviation units.
3. Extent of agreement with the following statement (Strongly Agree, Agree, Neutral, Disagree or Strongly Disagree): "The cost of parking permits for FSU students should be raised or increased."
ORDINAL. A student who agrees is more in favor than one who disagrees but we don't know by HOW MUCH more.
PART D: 3 points each (total 9 points). For ANY THREE of the following variables, indicate the best or most appropriate measure of central tendency to use (if all four are answered, we score only 1-3). Assume no skew or other unusual features:
1. A person’s favorite spectator sport (e.g., baseball, basketball, football, golf, soccer)
Nominal data, the MODE is the only measure of central tendency you can use here.
2. Number of years of education completed
Fixed 0 and equal intervals. Use the MEAN for this ratio variable.
(2 points if you said the MODE or the MEDIAN. You CAN use them, but they're not the BEST measures for ratio data.)
3. Rank order of grade point average in a high school graduating class (1st, 2nd, 3rd, etc.)
The categories are rank ordered, i.e., ordinal, so use the MEDIAN.
(2 points if you said the MODE. You can use it but it is often very uninformative. In this case, unless you have tied student GPAs, you won't even have a mode.)
4. Whether an individual has ever bought at least one lottery ticket (yes or no)
Anyone who answers yes has bought more tickets than anyone who answers no. Ordinal data, median preferred (but you COULD use the mode in this particular case because there are only two categories, so the category described will be the same).
You lost 2 points if you gave an impossible answer (e.g., the mean on nominal data), but at least it was a measure of central tendency. You lost 1 or 2 points if you correctly identified the level of the variable but gave an inappropriate measure of central tendency, or no measure of central tendency.
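The rule used in Part D (mode for nominal, median for ordinal, mean for interval or ratio) can be written out as a small lookup. This is only a sketch: the function name `best_center` and the three data lists are hypothetical, invented for illustration, and the rule ignores skew and other unusual features, just as the question instructs.

```python
import statistics

def best_center(values, level):
    """Return (measure name, value) for the given measurement level."""
    if level == "nominal":
        return "mode", statistics.mode(values)
    if level == "ordinal":
        # median_low always returns a value that occurs in the data,
        # which suits ranks and category codes.
        return "median", statistics.median_low(values)
    if level in ("interval", "ratio"):
        return "mean", statistics.mean(values)
    raise ValueError(f"unknown measurement level: {level}")

# Hypothetical data, invented for illustration only.
sports = ["football", "soccer", "football", "golf", "football"]
class_ranks = [1, 2, 3, 4, 5]
years_of_school = [12, 12, 14, 16, 18]

print(best_center(sports, "nominal"))
print(best_center(class_ranks, "ordinal"))
print(best_center(years_of_school, "ratio"))
```

Asking for a mean of the `sports` list would fail, which is the same "impossible answer" the grading notes describe for nominal data.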
PART E: 3 points each (total 9 points). For ANY THREE of the following variables, indicate the best or most appropriate measure of dispersion or variability (if all four are answered, we score only 1-3). Assume no skew or other unusual features:
1. Country of national origin (e.g., China, Kenya, Korea, Taiwan, USA, Venezuela)
Nominal data. The Index of Dispersion D is the only measure of dispersion you can use.
2. Number of books in an individual’s private home library
Count integer variable, fixed zero, RATIO data, use the standard deviation.
(2 points if you said "D", range, or IQR. You CAN use them, but generally they're not the BEST measures.)
3. Number of completed high school math courses among high school graduates
Another count variable with a fixed zero, RATIO data, use the standard deviation.
(2 points if you said "D", range, or IQR. You CAN use them, but generally they're not the BEST measures.)
4. Rating of the "60 Minutes" television program: excellent, very good, good, fair, or poor
Ordinal data, the range or Inter-Quartile Range are your best measures of dispersion here.
(2 points if you said "D". You CAN use it, but generally it's not the BEST measure.)
Don't drop to a lower level of measurement (e.g., the range for #2 or #3 or the mode for #4) without a GOOD REASON (which you briefly describe, of course). Lower level measures are usually less informative and use less of the data than higher level measures.
PART F: SYMBOLS (6, 2 points each): COMPLETELY describe in words what is meant by ANY THREE of the following symbols (if all four are answered, we score only 1-3):
1. Σ (“big sigma”): to add or to sum
2. μ (“mu”): the POPULATION mean
3. N : the POPULATION case base or frequency size
4. Z-score: a standardized or "normal" score; obtained by subtracting the mean from the individual score and dividing this quantity by the standard deviation. The CORRECT formula provided here received full credit too.
(HINT: It counted whether a symbol represents the population or the sample.)
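The verbal definition of a Z-score in item 4 translates directly into code. The sketch below is illustrative: the function name `z_score` is invented, and the example plugs in the Exam One mean (94) and standard deviation (8.3) reported on this page with an arbitrarily chosen score of 100.

```python
def z_score(x, mean, std_dev):
    """Standardize x: subtract the mean, then divide by the standard deviation."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev

# Exam One summary from this page: mean 94, standard deviation 8.3.
# The score of 100 is just an example value.
z = z_score(100, mean=94, std_dev=8.3)
print(round(z, 2))  # 0.72
```

A score equal to the mean always has a Z-score of 0, and the sign tells you whether the score falls above or below the mean.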
PART G: For each of the following statements (2 points each for 16 points total), check whether the statement is true or false (normal curve, sampling distribution, Z-scores).
1. Even though the sample of cases may not be normally distributed, the sampling distribution may be normal if the case base is large.
2. In a normal distribution, about 68 percent of the cases are found within one standard deviation of the mean.
3. The frequency distribution for a normal curve takes on the shape of a backwards “J” (like the “Babies” variable in Assignment 2).
4. You can use the normal curve with any kind of data, including nominal or ordinal data.
5. Z-scores allow us to compare values from the same case on two different variables.
1. True. This is the central limit theorem (a close relative of the "law of large numbers"). Sample statistics, such as a mean, vary less from sample to sample than individual cases do within a single sample.
2. Is true by definition. This is one of the identifying properties of a normal curve.
3. False (and "Babies" did NOT follow a normal distribution)--the curve looks like a bell and is symmetric.
4. False, only with numeric data. You must be able to take a mean and standard deviation. This is critical and many students who got this answer correct totally forgot about the construct in parts (H) and (I).
5. Absolutely true. Remember the example of Marilyn Vos Savant in Guide 3? We used Z-scores to compare Marilyn's (fictitious) income with her IQ score. We can compare two different cases on the same variable too.
(2) In a SAMPLING DISTRIBUTION, the basic unit is (PLEASE CHECK ONLY ONE:)
[ ]A. A single case [ ]B. A single sample [ ]C. The population
(2) The measure of variability for the SAMPLING DISTRIBUTION (of the mean) is the (PLEASE CHECK ONLY ONE:)
[ ]A. Inter-quartile range [ ]B. Standard deviation [ ]C. Standard error
The standard error functions as the standard deviation of the sampling distribution of the mean.
(2) We use the sampling distribution of the mean to (PLEASE CHECK ONLY ONE:)
[ ]A. Generate descriptive statistics about the one particular sample we took
[ ]B. Make inferences about the population mean
[ ]C. Make sure that we have valid results
We use the sampling distribution to make inferences about the population.
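A small simulation can make the sampling distribution ideas in Part G concrete. This is a sketch under stated assumptions, not course material: the skewed "population" (an exponential distribution with mean 1 and standard deviation 1), the sample size, the number of samples, and the seed are all arbitrary choices made for illustration.

```python
import math
import random
import statistics

random.seed(2004)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50     # cases per sample (hypothetical)
NUM_SAMPLES = 2000   # number of samples drawn (hypothetical)

def draw_sample(n):
    """Draw n cases from a strongly skewed population (mean 1, sd 1)."""
    return [random.expovariate(1.0) for _ in range(n)]

# The basic unit of the sampling distribution is one sample,
# summarized here by its mean.
sample_means = [statistics.mean(draw_sample(SAMPLE_SIZE))
                for _ in range(NUM_SAMPLES)]

# The spread of the sample means is the standard error.
observed_se = statistics.stdev(sample_means)
theoretical_se = 1.0 / math.sqrt(SAMPLE_SIZE)  # population sd / sqrt(n)

print(round(statistics.mean(sample_means), 2),
      round(observed_se, 3), round(theoretical_se, 3))
```

The individual cases are far from normal, yet the sample means cluster tightly around the population mean, and their spread should land close to the population standard deviation divided by the square root of the sample size.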
PART H: “Exhibit A” shows responses to the statement “Science makes our way of life change too fast” (“Too Fast”) in a 2001 survey of American adults. It also has statistics associated with the distribution. Please use these data to answer questions 1 - 10 (24 points).
“TOO FAST” Science makes our way of life change too fast

Value Label          Value   Frequency   Valid Percent   Cum Percent
STRONGLY AGREE        1.00          69             4.4           4.4
AGREE                 2.00         537            34.1          38.5
NEUTRAL               3.00          42             2.7          41.2
DISAGREE              4.00         842            53.5          94.6
STRONGLY DISAGREE     5.00          84             5.4         100.0
                               -------         -------       -------
Total                             1574           100.0         100.0

Mean       3.213     Std err    .028     Median     4.000
Mode       4.000     Std dev   1.105     Skewness   -.384
S E Skew    .062

Percentile   Value
25.00        2.000
50.00        4.000
75.00        4.000
1. (1) Are these data a (CHECK ONE):
[ ]A. bivariate distribution?
[ ]B. multivariate distribution?
OR a
[ ]C. univariate distribution? Only one variable is presented here.
2. (4) As currently coded in Exhibit A, are the responses to this statement an (PLEASE CHECK ONE:)
[ ]A Interval
[ ]B. Nominal
[ ]C. Ordinal
or
[ ]D. Ratio level variable?
Briefly explain the rationale behind your answer:
Categories are rank ordered but there is no common unit, thus ORDINAL. The data are NOT numbers.
3. (2) The measure of central tendency most appropriate to the variable “Too Fast” as currently coded is the (CHECK ONE:)
[ ]A. Mean
[ ]B. Median
[ ]C. Mode
[ ]D. None of the above
4. (2) BRIEFLY: Your choice in question 3 was the most appropriate "average" measure because:
These are ordinal data and the median is generally the most informative measure of central tendency for ordinal data.
5. (2) In these data, the value of this measure of central tendency would be (CHECK ONE:)
[ ]A. 4.000
[ ]B. 3.213
[ ]C. Neutral
[ ]D. Disagree
This category contains the 50th percentile. The data aren't numbers so you must use the verbal label.
6. (3) What percent of adults in “EXHIBIT A” AT LEAST agreed with the “science makes our way of life change too fast” statement?
38.5 Percent
4.4% Strongly Agreed plus 34.1% Agreed. This is a cumulative percent.
7. (2) What percent of adults in “EXHIBIT A” were Neutral about the “science makes our way of life change too fast” statement?
2.7 Percent
This is NOT a cumulative percent. Be sure to read carefully.
8. (2) Which ONE of the following would be the best or most appropriate measure of dispersion or variation to describe the data about the statement as coded in “EXHIBIT A”?
[ ]A. The Index of Dispersion (D)
[ ]B. The Inter-Quartile Range. This is ordinal data so the IQR is the most descriptive.
[ ]C. The standard deviation
[ ]D. The standard error
9. (2) What are the two end-point categories that define the interquartile range for these data?
You can just read off "2" and "4" as the 25th and 75th percentiles from the output. To get full credit for this non-numeric variable, be sure to supply the verbal labels that correspond to the numeric tags "2" and "4": Agree to Disagree.
10. (4) Do the data as displayed in Exhibit A follow a normal distribution? [ ]YES or [X ]NO
BRIEFLY, give ONE reason for your choice.
This variable is not numeric and you MUST have numeric data to follow a normal distribution.
NO OTHER REASONS NEED APPLY. Under these circumstances, you can't do a mean, you can't do a standard deviation, and it doesn't matter WHAT the shape of the histogram might be.
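The percentages and percentile categories used in questions 5, 6, 7 and 9 can be recomputed from the frequency column of Exhibit A alone. The sketch below is illustrative: the helper name `percentile_label` is invented, and it assumes a percentile falls in the first category whose cumulative percent reaches it. Recomputed percentages can differ from a program's printout in the last digit because of rounding.

```python
from itertools import accumulate

# Frequencies copied from Exhibit A, in coded order 1..5.
labels = ["Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"]
freqs = [69, 537, 42, 842, 84]

total = sum(freqs)
cum_freqs = list(accumulate(freqs))

def percentile_label(p):
    """Label of the first category whose cumulative percent reaches p."""
    for label, cum in zip(labels, cum_freqs):
        if 100 * cum / total >= p:
            return label
    return labels[-1]

at_least_agree = round(100 * cum_freqs[1] / total, 1)      # question 6
neutral_only = round(100 * freqs[2] / total, 1)             # question 7
median_label = percentile_label(50)                         # question 5
iqr_labels = (percentile_label(25), percentile_label(75))   # question 9

print(total, at_least_agree, neutral_only, median_label, iqr_labels)
```

Working with labels rather than the numeric codes keeps the answers in the verbal form that an ordinal variable calls for.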
1. (2) We call this kind of graphic display (PLEASE CHECK ONLY ONE:)
[ ]A. A frequency distribution
[ ]B. A frequency polygon
[ ]C. A histogram
[ ]D. A pie chart
2. (4) Suppose you were a statistics instructor and a student turned in “Exhibit B” to you for an assignment in constructing a histogram. Briefly describe ANY two mistakes, problems, extra information, or omissions that must be corrected in order for “EXHIBIT B” to qualify as a good example of a histogram:
NO data source.
NO case base.
NO missing data mentioned.
Legend is in body and can be placed as values underneath the histogram.
3. (3) Are the responses to the hibernation question “normally distributed”? PLEASE CHECK:
[ ] YES or [ ] NO
Give the most important reason why they are or are not normally distributed:
This variable is not numeric and you MUST have numeric data to follow a normal distribution.
NO OTHER REASONS NEED APPLY. Under these circumstances, you can't do a mean, you can't do a standard deviation, and it doesn't matter WHAT the shape of the histogram might be. Don't even think about applying the mean or standard deviation to nominal data!!
I don't think this point can be made too many times! This is especially true because many of the statistics we will work with shortly assume a normal distribution.
Susan Carol Losh October 2, 2004