FALL 2004 DR SUSAN CAROL LOSH
I WILL CORRECT ANY ADDITION ERRORS OR TRANSLATE MY HANDWRITING DURING BREAK. FOR ANY OTHER ISSUES, PLEASE WAIT UNTIL AFTER CLASS. THANK YOU. LAST-DAY QUESTIONS ABOUT EXAM 1? YOU MAY E-MAIL ME OR MARIA. PLEASE DO NOT E-MAIL AFTER 8 PM TUESDAY NIGHT. Different e-mail providers may take a long time to deliver their mail, and we may not receive it in time. We are not responsible for late delivery of e-mail by either your provider or ours, or for server viruses that slow transmission, so please leave enough time! IF YOU E-MAIL US WEDNESDAY MORNING, WE WILL NOT HAVE TIME TO RESPOND TO YOU.
MARIA WILL BE AVAILABLE IN THE LRC TUESDAY AFTERNOON 3:15-5. RYAN WILKE, A FORMER TA FOR THIS COURSE, CAN ALSO HELP OUT ON TUESDAY IF YOU CAN COME OUT TO THE HIGH MAGNETIC LAB (INNOVATION PARK NEAR ALUMNI VILLAGE), WHERE HE WORKS. E-MAIL RYAN AT raw7447@garnet.acns.fsu.edu FOR QUESTIONS AND DIRECTIONS.
This assignment is worth 5 PERCENT toward your final grade.
Remember! I use plus and minus grading on assignments and for the final grade.
Please do not ask me or Maria to address your individual paper during class time or break, although after class is fine.
I am at least as interested in how you arrive at your answer as what your answer is.
This assignment is a good example. Let's suppose you misidentified mother's years of education ("maeduc") as an ordinal variable (it is really a ratio variable with a fixed 0 and equal intervals of 1 year). Do you know the best measures to use with an ordinal variable? In this example, if you said the median and the Inter-Quartile Range, you would receive credit on those questions, because these are generally the best measures to use with an ordinal variable.
On the other hand, continuing with this example, if you misidentified maeduc as ordinal and then said the mode and the Index of Dispersion were the best measures of central tendency and dispersion to use, you LOST credit. Why? Because I am looking for consistency between what you thought the level of the variable was and the best methods to use with that kind of variable.
Although it is the best we can do for nominal data, the mode is generally a poor measure of central tendency for ordinal or numeric data.
Similarly, the good histogram EXCLUDED percentages (or frequencies) from the body of the histogram, because anything that clutters the interior of the histogram makes it more difficult to read. Certainly, the good histogram did NOT include percentages (or frequencies) BOTH in the body of the histogram AND on the "Y axis".
The "Y axis" could show either the percentage of households or the frequencies. It was NOT "relative frequencies" (a term that could also refer to proportions or rates). Further, to receive full credit, you needed to label your bars correctly (i.e., you lost credit if you called percentages frequencies or vice versa).
Although the bars should touch, I recognize that it might be difficult to find the option in your computer program that makes them do so; therefore, it did NOT count against you if the bars did not touch.
However, your histogram should be complete: it should include a title, the data source, the total number of valid cases the histogram is based on, and the number of missing cases.
It should be ACCURATE. The data source for this exercise was the 2002 General Social Survey. There were only 28 missing cases on the Babies variable.
The missing cases did not include the filtered cases for the years 1972-2000, which were never accessed.
Ordinal data are non-numeric, so they cannot follow a normal distribution.
Nominal data are non-numeric too, so they can't be normally distributed.
On the other hand, "maeduc" (mother's years of education) is RATIO. One year is the unit, and you can't have less than zero years of education. (If you said interval, that was OK for purposes of this assignment.) The key thing is that you recognized that "maeduc" is numeric. Even if the last category was truncated, this is basically a ratio variable. There are several reasons why the "high category" may be truncated or collapsed: there may be very few people in this category; the extreme high scores may be so extreme that they distort univariate (and more complex) statistics; and a very extreme score might enable a nosy individual to identify that person (the US federal government is very sensitive to this issue).
marital (individual's marital status) is NOMINAL. We can say whether two people have the same or different marital status, but there is no inherent rank order to the categories.
babies (number of household members under 6) is RATIO. You can count the number of household members under 6 and you can't have fewer than zero.
This section was worth 4 points.
I examine your designated measures of central tendency and dispersion in the context of your answer about level of measurement. For example, if you incorrectly designated "marital" as ordinal, I expected you to choose the median as the best measure of central tendency and the range or inter-quartile range as the best measure of dispersion. While you lost one point for the earlier question, you would get 1 FULL POINT for correctly choosing the best measures for an incorrectly identified ordinal variable. However, if you designated "marital" as ordinal and then selected the mode as the most appropriate measure of central tendency, you lost credit (unless you had a really good explanation), because the median is usually the best measure of central tendency for ordinal data.
Because "attend" as presented is ordinal, its best measure of central tendency is the median. Because the variable values are not really numbers, you must report the VERBAL CATEGORY that corresponds to category number 3, which is "several times a year."
Because "maeduc" is a ratio variable, its best measure of central tendency is the mean, which is 11.45 years.
Because marital is a nominal variable, its best measure of central tendency is the mode. You must use the VERBAL CATEGORY that corresponds to category number 1, which is "married."
Combining both the best measure of central tendency for each variable AND the correct value for the mean, median or mode was worth a total of 4 points.
If you decided the mode was the best measure of central tendency for any of the other three variables, then, again, the consistent measure of dispersion is the Index of Dispersion "D." The SDA system does not calculate "D." If you calculated it by hand, you probably had a hard time with either "attend" or mother's years of education.
You did not lose credit if you miscalculated "D" unless you presented a number larger than one.
D varies between zero and one. D CANNOT be larger than 1.
For fractional indices such as "D," use TWO decimal places. (The convention will be two decimal places for correlation coefficients also.)
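Since SDA does not calculate "D," here is a minimal Python sketch of one standard formula for the Index of Dispersion, D = k(N² − Σf²) / (N²(k − 1)), where k is the number of categories, N is the number of valid cases, and f is the frequency in each category. Treat the formula as an assumption here and check it against the version in your lecture notes. The function name and the counts are hypothetical; note that k includes any empty categories, and the result is rounded to two decimal places as required above.

```python
def index_of_dispersion(counts):
    """D = k(N^2 - sum of f^2) / (N^2 (k - 1)).

    0 means every case is in one category; 1 means cases are spread
    evenly across all k categories (empty categories still count in k).
    """
    k = len(counts)
    n = sum(counts)
    if k < 2 or n == 0:
        raise ValueError("need at least two categories and at least one case")
    sum_sq = sum(f * f for f in counts)
    return k * (n * n - sum_sq) / (n * n * (k - 1))

# Hypothetical frequency tables, reported to two decimal places
print(round(index_of_dispersion([30, 10]), 2))          # 0.75
print(round(index_of_dispersion([25, 25, 25, 25]), 2))  # 1.0
print(round(index_of_dispersion([40, 0, 0, 0]), 2))     # 0.0
```

Because D can never exceed 1, any hand calculation that produces a larger number signals an arithmetic slip.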
Because "maeduc" is a ratio variable, its best measure of dispersion is the standard deviation (around the mean), which is 3.49 years.
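To see where a mean and standard deviation like these come from, here is a minimal sketch that computes both from a frequency table (each value weighted by its count). The function name and the counts are hypothetical, not the GSS 2002 "maeduc" data. It uses the sample standard deviation (dividing by N − 1); some programs divide by N instead, which makes almost no difference with a sample as large as the GSS.

```python
from math import sqrt

def grouped_mean_sd(values, counts):
    """Mean and sample standard deviation from a frequency table."""
    n = sum(counts)
    mean = sum(v * f for v, f in zip(values, counts)) / n
    # Sum of squared deviations around the mean, weighted by frequency
    ss = sum(f * (v - mean) ** 2 for v, f in zip(values, counts))
    return mean, sqrt(ss / (n - 1))

# Hypothetical years of education and how many respondents reported each
years = [8, 12, 16]
people = [1, 2, 1]
mean, sd = grouped_mean_sd(years, people)
print(f"mean = {mean:.2f} years, sd = {sd:.2f} years")  # mean = 12.00 years, sd = 3.27 years
```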
Because "attend" is ordinal, its best measure of dispersion is also an ordinal one. The Inter-Quartile Range is much more informative than the Range (the range runs from "never" to "more than once a week"). The IQR, that is, the endpoints of the "middle 50 percent," runs from less than once a year to nearly once a week.
Do NOT subtract to get either the range or the IQR when you have ordinal data. The categories are not numbers, and they cannot be added or subtracted.
Do your cumulative percents carefully! Some people were one category off on the high or the low end when they calculated their IQR.
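The cumulative-percent walk for an ordinal variable can be sketched like this: move up the categories until the cumulative percent first reaches 25, 50, and 75 percent, and report the VERBAL category at each point (the 50 percent point is the median). The category labels are illustrative, modeled on the "attend" response categories, and the counts are made up; they do not reproduce the GSS 2002 data. The rule "first category whose cumulative percent reaches P" is one common convention; when a cumulative percent lands exactly on 25, 50, or 75, textbooks differ slightly.

```python
from itertools import accumulate

# Illustrative labels modeled on "attend"; the counts are hypothetical.
ATTEND = [
    "never", "less than once a year", "once a year", "several times a year",
    "once a month", "2-3 times a month", "nearly every week", "every week",
    "more than once a week",
]
COUNTS = [15, 12, 13, 12, 7, 9, 8, 16, 8]

def ordinal_percentile(labels, counts, pct):
    """Return the label of the first category whose cumulative percent reaches pct."""
    total = sum(counts)
    for label, running in zip(labels, accumulate(counts)):
        if 100 * running / total >= pct:
            return label
    raise ValueError("pct must be between 0 and 100")

median = ordinal_percentile(ATTEND, COUNTS, 50)
q1 = ordinal_percentile(ATTEND, COUNTS, 25)
q3 = ordinal_percentile(ATTEND, COUNTS, 75)
# Report the endpoints as words; do NOT subtract category numbers.
print(f"median: {median}")                       # median: several times a year
print(f"middle 50 percent: {q1} to {q3}")        # less than once a year to nearly every week
```

Printing the running cumulative percents (15, 27, 40, 52, ...) next to each label is a quick way to catch being one category off.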
Combining both the best measure of dispersion for each variable AND identifying the correct value for that measure (or indicating that "D" was unavailable) was worth a total of 4 points.
Are any of your variables normally distributed? Let's make it easy.
You need a numeric variable to examine a normal distribution. (How else can you discuss the mean or standard deviation which require arithmetic operations such as subtraction or division?)
"Attend" and "marital" are NOT numeric variables. Therefore they can't be normally distributed.
"Babies" IS a numeric variable, so let's examine a second or even a third criterion for a normal distribution.
"Babies" (1) has a large positive skew (those few cases with 3 or 4 household members under age 6) and (2) it doesn't look anything like a "bell" shape. Instead, the frequency distribution for "babies" looks like a backward "J". So rule out "babies" as following a normal distribution.
"Maeduc" is numeric and approximately bell-shaped. Its skew is negative and relatively small (-.82).
The mean, median, and mode are the same number when data follow a normal distribution. For "maeduc" in this sample, the mean is 11.45 years, the median is 12 years, and the mode is 12 years. Are these "almost" the same? The mean is more than 1.96 standard errors (1.96 × .07 = .14 years) below the median, but it is still pretty close.
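The comparison above is simple arithmetic. The standard error of the mean is the standard deviation divided by the square root of N, and it is .07 for "maeduc." A minimal sketch (the function name is hypothetical) expresses the mean-median gap in standard-error units:

```python
def gap_in_standard_errors(mean, median, se):
    """How many standard errors separate the mean from the median."""
    return abs(median - mean) / se

mean, median, se = 11.45, 12.0, 0.07  # "maeduc" figures reported above
print(f"1.96 standard errors = {1.96 * se:.2f} years")   # 0.14 years
print(f"mean-median gap = {abs(median - mean):.2f} years")  # 0.55 years
print(f"gap in standard errors = {gap_in_standard_errors(mean, median, se):.1f}")  # 7.9
```

So the gap is large in standard-error units (about 8), yet in substantive terms it is only about half a year of schooling, which is why "pretty close" is a fair description.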
"Maeduc" is close enough to apply the
standard deviation property of the normal curve.
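The standard deviation property says that, in a normal distribution, about 68 percent of cases fall within one standard deviation of the mean and about 95 percent within 1.96 standard deviations. A minimal sketch using Python's standard-library `statistics.NormalDist` (the helper `sd_band` is hypothetical) applies it to the "maeduc" mean and standard deviation; because "maeduc" is only approximately normal, treat the shares as rough expectations, not exact counts.

```python
from statistics import NormalDist

def sd_band(mean, sd, z):
    """Endpoints of the interval mean ± z standard deviations."""
    return mean - z * sd, mean + z * sd

standard_normal = NormalDist()  # mean 0, standard deviation 1
for z in (1.0, 1.96):
    # Share of a normal distribution lying within ±z standard deviations
    share = standard_normal.cdf(z) - standard_normal.cdf(-z)
    low, high = sd_band(11.45, 3.49, z)
    print(f"within {z} SD: {low:.2f} to {high:.2f} years (about {share:.0%} of cases if normal)")
```

For one standard deviation this gives 7.96 to 14.94 years (about 68 percent); for 1.96 standard deviations, 4.61 to 18.29 years (about 95 percent).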
A "statistical purist" would say, no, not normal. A "somewhat impurist" (me) would say "approximately normal."
Either way, I looked for your reasoning, not a simple "yes" or "no." I looked to see what evidence you used and whether you used it correctly.
And I expected you to cite at least two properties of the normal curve in your answer. That is what would enable you, for example, to distinguish between "maeduc" and "babies."
Assessing whether any of your variables were normally distributed, and why, was worth a total of 3 points.
You used numeric categories to report a nominal or ordinal measure of central tendency or dispersion and didn't state what the verbal values were. Similarly, you shouldn't subtract to get either the range or the inter-quartile range UNLESS your data are numeric (and even then, the endpoints are more informative).
You used numeric categories that appeared NOWHERE on your output.
You incorrectly identified the level of measurement for the type of variable that you had.
BEWARE! You will see similar problems on Exam 1. Further, you must be able to identify the types of variables that you have in order to identify the best measure of correlation for those variables.
You chose an inappropriate measure of central tendency or dispersion for the kind of data that you identified.
Overusing the mode. Often, the mode is uninformative. If the data are spread relatively evenly across categories, the mode does not give useful information (and what do you do if you have TWO OR MORE modes?). The mode does not incorporate every single score. We always use a mode with nominal data because we can't use anything else. But the median or the mean, especially coupled with its associated measure of dispersion, gives us more information about what the "typical score" really looks like.
Your measures of central tendency and dispersion were inconsistent; for example, you selected a median, then a standard deviation for "maeduc" or for "babies" (this happened a lot). Stay with the same level (e.g., ordinal or interval/ratio) for your measures of central tendency and dispersion for a particular variable. (This was also mentioned in class Monday 9-20 and Wednesday 9-22.)
Some students first misidentified the level of measurement. Next, they specified a measure of central tendency that corresponded to the true level of measurement, but not to the level the student identified. Then, they specified a measure of dispersion inconsistent with either one.
BAD EXAMPLE:
First (incorrectly) designating "maeduc" as ordinal.
Then specifying the mean as the best measure of central tendency. You cannot compute a mean on ordinal data. I want to know if YOU know which type of measure goes with which level of measurement.
Then designating the Index of Dispersion ("D") as the best measure of dispersion. D is a nominal measure.
Your histogram was incomplete. For example, it lacked a title or a data source, or it even omitted a category.
Your histogram stated an incorrect number of valid or missing cases.
Your histogram was needlessly cluttered. In addition to the percentage or frequency scale on the "y axis," you also labeled your percents (and some people labeled percents and then added the frequencies too. KEEP IT SIMPLE to make it readable.) The most common error was to place the percentages both along the side AND in the body of the histogram. Eliminate percents (or frequencies) from the body of the histogram.
You didn't mention at all what your "y axis" was, or you labeled it incorrectly. Frequencies? Percentages? (Since you had a choice, it was important to label this for your reader.)
You didn't understand why "maeduc" had an approximately normal distribution in this assignment.
You didn't realize that non-numeric variables (nominal or ordinal) cannot be normally distributed.
You didn't explain AT ALL why your variables were or were not normally distributed.
You not only used a range on nominal data, you SUBTRACTED the lowest from the highest category and produced a number for nominal or ordinal data. Neither nominal nor truly ordinal data are numeric, so you cannot do numeric operations on them. You can't subtract ordinal categories either, because the values of an ordinal variable are not numbers.
You miscalculated cumulative percents, either for the Inter-Quartile Range or while assessing the normality of "maeduc."
Susan Carol Losh, September 26, 2004