EDF 5400 INTRODUCTORY STATISTICS
FALL 2004
DR. SUSAN CAROL LOSH
This assignment is worth 5 PERCENT toward
your final grade.
Remember! I use plus and minus grading
on assignments and for the final grade.
This Feedback page is generic. Please read it all the way through. If you feel it does not address the score on your paper, please make an appointment and we will go over your paper.
Please DO NOT ask me or Maria to review your personal paper either during class or during class break. I will be glad to review it during office hours or through an appointment. Please review this generic feedback FIRST.*
CLASS POLICY: I will not go over your
personal paper during class or break. Thank you.
*However, if you cannot read my handwriting, I will be glad to translate
it for you.
Although this assignment does not count
very much toward your final grade, and although it is the first one of
the semester, it is probably the hardest thing that you will do in Statistics
this Fall. When you recategorize a variable, you need to do a lot of thinking
about what that variable and its values mean. You need to decide what makes
the people or other types of cases "in the same category" the same,
and what makes people in other categories different from those in the first
category.
In addition, you needed to know the measurement
level of a variable in order to select the correct modal and median values
for it. As coded in its original form within the SDA system, "emailhr"
is ratio. After your category collapses, of course, this became an ordinal
variable.
If you took a mean on the recoded variable, you were in error. This recoded variable was no longer interval, ratio or numeric data. It became ordinal data after your recode.
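To make the point about measurement level concrete, here is a minimal sketch in Python. The frequencies are invented for illustration and are NOT the General Social Survey counts; the function names are also just assumptions for this example. It shows the mode, median and mean taken from numeric values, and then a recode into labels where only the mode and median still make sense.

```python
# Hypothetical frequencies of weekly email hours (illustrative only, NOT the GSS counts).
raw_freq = {0: 30, 1: 25, 2: 15, 3: 10, 5: 10, 10: 6, 40: 4}

def mode_value(freq):
    """Value (or label) with the largest frequency."""
    return max(freq, key=freq.get)

def median_value(freq):
    """First value, in sorted order, whose cumulative count passes half the cases.

    Simplification: a cumulative count landing exactly on half is not
    treated specially.
    """
    n = sum(freq.values())
    running = 0
    for value in sorted(freq):
        running += freq[value]
        if running > n / 2:
            return value

def mean_value(freq):
    """Arithmetic mean; only meaningful while the values are still numeric."""
    n = sum(freq.values())
    return sum(value * count for value, count in freq.items()) / n

# Hypothetical recode into ordered labels: the order survives, the arithmetic does not.
recoded_freq = {"nonuser (0)": 30, "light (1-2)": 40, "moderate (3-9)": 20, "heavy (10+)": 10}

print(mode_value(raw_freq), median_value(raw_freq), mean_value(raw_freq))
print(mode_value(recoded_freq))  # a category label, not a number
```

Calling mean_value on recoded_freq would fail, because labels such as "light (1-2)" cannot be multiplied or averaged. That is the same reason a mean on the recoded variable was an error.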
Most assignments were quite good. Some rationales were absolutely terrific and very interesting to read. Very heavy email users were held in low regard and received labels such as "addicted" or "no life beyond the screen." Nonusers were also perceived negatively and received labels such as "the lazy" or "hermit".
Virtually all the tables were attractively drawn, typically using a word processor or spreadsheet program.
PLEASE NOTE: If your score was below 15 on this assignment, please review readings, Guides and Assignment 1 very thoroughly. Decide whether you made careless errors (that you won't make again) or whether you have some serious misunderstandings of the material. A tutor may be in order. Generally, it is NOT a good idea to use a math or mathematical statistics major as a tutor because this course is NOT about formulae, but about data decisions. I can make some suggestions if you give me one session's lead time or email me. And, of course, Maria and I will be glad to help you. (So try us first.)
This paper:
- Completed all parts of the assignment. CRITICAL!
- Followed all directions.
- Turned in all computer output.
- Ran the programs correctly (that was the easy part and everyone did).
- Retained all the valid cases while recoding the data.
Creativity is terrific, but be sure to complete all the assignment requirements FIRST!
It is vital to be able to construct and read a well-presented table. Although the table is a basic building block of data analysis, it is used constantly in both popular and professional prose.
You constructed a UNIVARIATE PERCENTAGE table, a very widely used data presentation tool.
THE TABLE ITSELF in this assignment:
- was uncluttered because it presented only percents and not cell frequencies, but it
- did include the total valid casebase. (It was OK to put the valid casebase in parentheses underneath the 100.0%.) There were a few deviations here and at least one casebase where I could not figure out the source.
- The percents themselves totalled approximately 100 percent (don’t worry about rounding errors such as 99.9 or 100.1% but do make a note of them).
- Deviations from 100.0 percent were labeled with a footnote stating deviations were due to rounding error (or you rounded the modal category up or down very slightly so that percentages added to 100.0).
- WAS COMPLETE. It had a TITLE, a DATA SOURCE, a VALID CASEBASE, and the amount of missing data. The most likely omission in this group was the missing data.
- I hope you remembered to use "n" for a sample, instead of "N" for a population (a few people did not).
- Each category was labeled in a descriptive manner and included the email hour values for that particular category.
Remember: the computer will always tell you the percents added to 100.0 percent. It is programmed that way. (Computers don't really think.) Be sure to check whether that is the case for the data in your table.
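One way to check this yourself is to recompute the percents from the frequencies, round them the way they appear in your table, and add them up. The sketch below assumes one-decimal rounding and uses made-up category counts (three equal groups) simply because thirds make the rounding problem easy to see.

```python
def percent_table(counts):
    """Return (rounded percents, their total, valid casebase n) for a frequency table."""
    n = sum(counts.values())
    percents = {label: round(100 * count / n, 1) for label, count in counts.items()}
    total = round(sum(percents.values()), 1)
    return percents, total, n

# Hypothetical counts, chosen only to force a rounding deviation.
counts = {"nonuser (0 hours)": 100, "light (1-2 hours)": 100, "heavy (3+ hours)": 100}

percents, total, n = percent_table(counts)
needs_footnote = total != 100.0  # True here: the rounded percents do not add to 100.0

for label, pct in percents.items():
    print(f"{label:20s} {pct:5.1f}")
print(f"{'Total':20s} {total:5.1f}  (n = {n})")
if needs_footnote:
    print("Note: percents do not total 100.0 because of rounding error.")
```

When the total comes out to something like 99.9 or 100.1, that is the situation where the table needs the rounding footnote.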
YOUR CATEGORY SYSTEM:
- Clearly stated which categories it grouped together to form new categories.
- In its rationale, clearly stated WHY the particular categories were grouped together the way they were and what individuals in each category had in common.
- Assigned each new category a new descriptive label (and included the number of hours so that your reader could tell at a glance what categories such as "low" or "moderate" meant).
- Did not deliberately create categories with either zero cases or less than one percent of the cases.
- Recognized that nonusers--persons with computers who had zero email hours--were qualitatively different from individuals who used email even 1 hour a week.
- Did not try to force categories into equal intervals of 14 hours each because this created categories that were either enormous (over 90 percent of cases) or very tiny (less than 0.5% of cases).
- Did not try to force categories into a "normal curve". Data MUST BE NUMERIC TO FOLLOW A NORMAL DISTRIBUTION. As soon as you collapsed categories from 33 to 5, the data became ordinal, hence nonnumeric (we will resume with the normal curve on Monday).
- Did not create "intuitive" categories without sharing the reasons behind your intuition with the reader (me).
- Used ALL the valid cases.
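The checklist above can be turned into a quick self-check before you commit to a recode. The sketch below is only an illustration: the frequencies, the bin boundaries and the thresholds (under 1 percent is "tiny", over 90 percent is "enormous") are assumptions made for the example, not the assignment data. It compares equal 14-hour intervals with a hand-built scheme and confirms that every valid case is retained.

```python
# Hypothetical frequencies of weekly email hours (illustrative only, NOT the GSS counts).
raw_freq = {0: 300, 1: 250, 2: 150, 3: 100, 5: 100, 10: 60, 20: 35, 60: 5}

def recode(freq, bins):
    """Collapse raw values into labelled bins; each bin is (label, low, high), inclusive."""
    collapsed = {label: 0 for label, _, _ in bins}
    for value, count in freq.items():
        for label, low, high in bins:
            if low <= value <= high:
                collapsed[label] += count
                break
    return collapsed

def size_problems(collapsed, tiny=1.0, enormous=90.0):
    """Flag categories holding under `tiny` percent or over `enormous` percent of cases."""
    n = sum(collapsed.values())
    flagged = {}
    for label, count in collapsed.items():
        pct = 100 * count / n
        if pct < tiny:
            flagged[label] = "tiny"
        elif pct > enormous:
            flagged[label] = "enormous"
    return flagged

equal_bins = [("0-13", 0, 13), ("14-27", 14, 27), ("28-41", 28, 41),
              ("42-55", 42, 55), ("56-70", 56, 70)]
custom_bins = [("nonuser (0)", 0, 0), ("light (1-2)", 1, 2), ("moderate (3-9)", 3, 9),
               ("heavy (10-19)", 10, 19), ("very heavy (20+)", 20, 10**6)]

equal_scheme = recode(raw_freq, equal_bins)
custom_scheme = recode(raw_freq, custom_bins)

print(size_problems(equal_scheme))   # equal intervals produce flagged categories
print(size_problems(custom_scheme))  # empty dict: nothing flagged
print(sum(custom_scheme.values()) == sum(raw_freq.values()))  # all valid cases kept
```

Note that passing this check says nothing about the rationale. You still need to explain why the people in each category belong together.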
YOUR ANSWERS TO THE FOUR STATISTICS QUESTIONS:
- Used the mode (0) and median (2 hours) as directed from the original 33 values of weekly email hours.
- Applied cumulative percents correctly for the "less than 7 hours per week" question (that's 6 hours per week OR LESS--that was 81.1 percent, which you could actually get directly from the "bad recode" example output) and the "at least 30 hours per week" question (that's 30 hours per week OR MORE--that was either 2.3 or 2.6 percent depending on whether you cumulated frequencies or percentages; the differences were due to rounding).
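The two cumulative percent questions, and the small discrepancy that comes from cumulating percentages instead of frequencies, can be sketched as follows. The frequencies are invented for illustration (they do not reproduce the 81.1, 2.3 or 2.6 percent figures), and the helper names are assumptions for this example.

```python
# Hypothetical frequencies of weekly email hours (illustrative only, NOT the GSS counts).
raw_freq = {0: 90, 1: 60, 2: 50, 4: 40, 6: 30, 7: 15, 10: 12, 30: 1, 35: 1, 40: 1}

def pct_from_frequencies(freq, keep):
    """Add the frequencies of the selected values first, then convert to a percent."""
    n = sum(freq.values())
    selected = sum(count for hours, count in freq.items() if keep(hours))
    return round(100 * selected / n, 1)

def pct_from_rounded_percents(freq, keep):
    """Round each value's percent to one decimal first, then add the rounded percents."""
    n = sum(freq.values())
    selected = sum(round(100 * count / n, 1) for hours, count in freq.items() if keep(hours))
    return round(selected, 1)

less_than_7 = pct_from_frequencies(raw_freq, lambda hours: hours < 7)    # 6 hours OR LESS
seven_or_less = pct_from_frequencies(raw_freq, lambda hours: hours <= 7) # a different question
at_least_30_f = pct_from_frequencies(raw_freq, lambda hours: hours >= 30)
at_least_30_p = pct_from_rounded_percents(raw_freq, lambda hours: hours >= 30)

print(less_than_7, seven_or_less)
print(at_least_30_f, at_least_30_p)  # the two routes can differ slightly through rounding
```

"Less than 7" stops at 6 hours, while "7 or less" also picks up the people at exactly 7 hours, so the two answers differ whenever anyone reported 7 hours.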
You lost credit if:
- You did not construct a table, but substituted computer output or something else instead of a percentage table.
- You omitted parts of the assignment and did not answer them.
- You mixed up the mean value with the median value.
- You used the statistics from the recoded table instead of the original table.
- You used the cumulative percent for "seven hours or less" instead of the correct "less than seven hours" weekly.
- You had some cumulative percents and I could not figure out how you obtained them at all.
- You included BOTH frequencies AND percentages in your table.
- You cluttered up the table by putting a percent mark in every cell, instead of just the first category percentage and the total of 100%.
- You omitted percent marks entirely (having the % mark at the very top of the column was OK).
- You omitted the valid casebase.
- You omitted any mention of missing data entirely, or incorrectly designated the missing cases. These were largely cases which did not have either computer or Internet access somewhere. The unselected cases for the surveys ("filtered out") for the years 1972-2000 are not missing. You simply selected only the cases for the year 2002.
- You omitted a descriptive title.
- You omitted the source of the data or you used an incorrect source. We used the year 2002 General Social Survey for this assignment.
- You miscalculated cumulative percentages.
- You put a percent, and not a category, for the mode or the median.
- You put nonusers and email users in the same category.
- You created either enormous categories (over 90 percent of the cases in one category) or very tiny categories (e.g., less than one-half percent). Very tiny categories are basically like "throwing away" a category. Sometimes you can't help this because that's how the observed data are distributed. But if YOU create the categories, these very large or small frequencies can usually be avoided.
- You used an incomplete subset of the assignment from last year's or last semester's EDF 5400 course which used a totally different variable. If by chance you route to the wrong course year or semester for EDF 5400, ALWAYS check for the course year at the top of the page and make sure it reads "Fall 2004." Also be sure that the construction sign has been removed from the top of the assignment page (or any other page for this course) before you print the website.
- You didn't understand a cumulative percentage so the answers to question 6, parts 3 or 4 were incorrect.
- Become used to slightly different ways of describing a cumulative percentage. You need a cumulative percentage to easily calculate the interquartile range (IQR), a very useful measure of dispersion for ordinal data. There WILL be cumulative percents on Exam 1.
- Your "cumulative percentage" didn't resemble any of the data.
- You didn't explicitly say which categories you collapsed to make a new category, leaving your reader to guess at what each category contained.
- You didn't tell why you grouped together the categories that you did, that is, you didn't say what it was that people in the same category had in common, and how these people differed from people in the other categories. Thus your rationale was incomplete.
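The remark in the list above about needing cumulative percentages for the interquartile range can be illustrated with a short sketch. It assumes the common convention that a percentile falls in the first ordered category whose cumulative percent reaches it. The category labels and counts are hypothetical, not taken from anyone's paper.

```python
# Hypothetical recoded (ordinal) table; the labels must be listed in their natural order.
ordered_labels = ["nonuser (0)", "light (1-2)", "moderate (3-9)", "heavy (10+)"]
counts = {"nonuser (0)": 30, "light (1-2)": 40, "moderate (3-9)": 20, "heavy (10+)": 10}

def cumulative_percents(counts, ordered_labels):
    """Running percent of cases at or below each category, in order."""
    n = sum(counts.values())
    running = 0
    result = {}
    for label in ordered_labels:
        running += counts[label]
        result[label] = round(100 * running / n, 1)
    return result

def category_at_percentile(counts, ordered_labels, percentile):
    """First category, in order, whose cumulative percent reaches the given percentile."""
    for label, cum_pct in cumulative_percents(counts, ordered_labels).items():
        if cum_pct >= percentile:
            return label

q1 = category_at_percentile(counts, ordered_labels, 25)
median = category_at_percentile(counts, ordered_labels, 50)
q3 = category_at_percentile(counts, ordered_labels, 75)

print(cumulative_percents(counts, ordered_labels))
print(f"IQR runs from '{q1}' to '{q3}'; the median category is '{median}'")
```

For ordinal data the interquartile range is reported as the pair of categories holding the 25th and 75th percentiles. The categories cannot be subtracted from each other.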
NONUSERS...REASONABLE USERS...ADDICTS...
Remember that individuals who did not have
any computer access were coded missing. They were NOT coded zero. Individuals
with no computer access were not asked any questions about email usage.
If you included individuals without a computer in the "0" group, you lost
credit because the assignment notes these people were coded as missing.
I also mentioned this in two different classes.
Individuals with computers who were total
nonusers belong in a DIFFERENT category from users, even those who use
as little as 1 email hour per week. This is a qualitative difference. Further,
these individuals comprised nearly one-third of the sample so the category
was not negligible.
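The difference between "missing" and "zero" is easy to see in a tiny example. In the sketch below, None stands in for a respondent with no computer access who was never asked the question. That representation, the ten made-up responses and the function name are all assumptions for illustration; the SDA system has its own missing-data codes.

```python
# Hypothetical responses: None = not asked (no computer access), 0 = has access, zero email hours.
responses = [None, 0, 0, 3, None, 1, 0, 12, 2, None]

def summarize(responses):
    """Return the valid casebase n, the number missing, and the nonuser percent of valid cases."""
    valid = [hours for hours in responses if hours is not None]
    missing = len(responses) - len(valid)
    nonusers = sum(1 for hours in valid if hours == 0)
    nonuser_pct = round(100 * nonusers / len(valid), 1)
    return len(valid), missing, nonuser_pct

valid_n, missing_n, nonuser_pct = summarize(responses)
print(f"n = {valid_n}, missing = {missing_n}, nonusers = {nonuser_pct}% of valid cases")

# The mistake described above: treating missing as zero inflates the "0" group
# and changes the casebase that every percent is computed on.
wrong = [0 if hours is None else hours for hours in responses]
wrong_nonuser_pct = round(100 * wrong.count(0) / len(wrong), 1)
print(f"with missing miscoded as 0: nonusers = {wrong_nonuser_pct}% of {len(wrong)} cases")
```

The percents in your table are based on the valid casebase only, and the missing cases are reported separately.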
Almost all the rationales were clear and
a lot of fun to read. Even when the same approach was taken (e.g., probable
work use), different people used different rationales and different coding
schemes.
In either case, as you go up the hourly scale, there are generally fewer people who used email a great deal. So be careful not to create categories that are so tiny that you basically "throw a category away" and really only have FOUR categories instead of five.
You will have only a limited number of
categories to describe your data in a univariate distribution. Maybe you
will have more than five categories--but not too many more than that because
it will make the table too difficult to read. You typically don't have
categories to "throw away" on tiny numbers of cases.
PLEASE EXAMINE YOUR ASSIGNMENT. COMMENTS ARE ON IT AS APPROPRIATE.
Susan Carol Losh, September 19, 2004
This was Hurricane Frances.