| IMPORTANT ANNOUNCEMENT:
As of 6 PM Tuesday, September 14, 2004, FSU plans for all classes to be held as scheduled. They will update the alerts page on Wednesday. Please watch Blackboard for updates and announcements as well, and be assured I will monitor the storm situation carefully. Hurricane Ivan has also slowed its forward motion, so it is arriving in Florida later than originally anticipated. Right now the weather looks like typical Tallahassee rain for Wednesday evening.
I AM planning on holding class Wednesday, but probably only for about an hour so we can get an early start home. Because Assignment 2 is now up and is due next Wednesday, I can do some demonstrations with SDA tomorrow. I can also answer any last-minute questions about Assignment 1 and examine some of the graphics material at the end of Guide 3. We will examine sampling distributions, the normal curve, Z-scores and confidence intervals next week.
Under the circumstances, I will take Assignment 1 in my mailbox, which is in 307 Stone Building in the Educational Psychology suite, BY 5 PM WEDNESDAY. All Assignment 1s are due tomorrow, Wednesday, by class (the original due date). |
|
FALL 2004 DR SUSAN CAROL LOSH
|
MY OFFICE HOURS ARE MONDAY AND WEDNESDAY 3:30-5 AT 307K STONE BUILDING.
MARIA HAS OFFICE HOURS TUESDAY-THURSDAY 3:15-5 IN THE LRC ROOM 124 STONE.
|
Some e-mail providers take a long time to deliver mail, and we may not receive yours in time. Maria and I are not responsible for late delivery of e-mail by either your provider or ours, or for server viruses that slow transmission, so please leave enough time!
|
TODAY |
|
RUN #2 |
|
We often summarize a univariate distribution with a measure of central tendency and a measure of dispersion or variation. Measures of central tendency include modes, medians, and means, while measures of variation include the index of dispersion (D), the range, the inter-quartile range (IQR), and the standard deviation. You must decide whether each variable is nominal, ordinal or interval/ratio and examine its distribution to choose the appropriate measures of central tendency and dispersion for each variable.
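As a rough illustration of how the level of measurement narrows the choice of statistics, here is a minimal Python sketch. It is not part of the SDA system, and the mapping it uses (mode for nominal, median and IQR for ordinal, mean and standard deviation for interval/ratio) is only a common textbook default assumed for this example. A badly skewed interval variable, for instance, may still be better described by its median, which is why you must look at the distribution too. The function name `summarize` and the sample values are hypothetical.

```python
import statistics


def summarize(values, level):
    """Return ((central tendency name, value), (dispersion name, value)).

    level is 'nominal', 'ordinal', or 'interval' (meaning interval/ratio).
    The mapping is an assumed textbook default, not an answer key.
    """
    if level == "nominal":
        # Categories cannot be ranked, so only the most common one is meaningful.
        # D is named as the dispersion measure but not computed here.
        return ("mode", statistics.mode(values)), ("index of dispersion (D)", None)
    if level == "ordinal":
        # Categories can be ranked, so the middle case and the quartiles make sense.
        q1, _, q3 = statistics.quantiles(values, n=4)
        return ("median", statistics.median(values)), ("IQR", q3 - q1)
    # Interval/ratio: distances between values are meaningful.
    return ("mean", statistics.mean(values)), ("standard deviation", statistics.stdev(values))


# Hypothetical values for illustration only.
print(summarize(["married", "married", "single"], "nominal"))
print(summarize([1, 2, 2, 3, 5], "ordinal"))
print(summarize([0, 0, 1, 2, 2], "interval"))
```

Note that for an ordinal variable the "value" you report is usually the category label attached to the median code, not the code itself.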
A normal curve is defined by a collection of mathematical criteria. Are any of your variables normally distributed? Take a look at the criteria, and take a look at the distribution of each of your variables, and you decide.
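The criteria themselves are not listed on this page, so the sketch below assumes three properties that are commonly cited for a normal curve: a single mode, symmetry (mean and median roughly equal, skewness near zero), and about 68% of cases within one standard deviation of the mean. The function name `normality_checks` and the sample values are hypothetical; treat the output as evidence to weigh, not as a verdict.

```python
import statistics


def normality_checks(values):
    """Compare a set of values against commonly cited properties of a normal curve."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("all values are identical; the checks are undefined")
    modes = statistics.multimode(values)
    # Share of cases within one standard deviation of the mean
    # (roughly 0.68 for a normal distribution).
    within_1sd = sum(1 for v in values if abs(v - mean) <= sd) / len(values)
    # Skewness as the average cubed z-score; 0 for a perfectly symmetric distribution.
    skewness = sum(((v - mean) / sd) ** 3 for v in values) / len(values)
    return {
        "unimodal": len(modes) == 1,
        "mean_minus_median": mean - median,
        "skewness": skewness,
        "share_within_1sd": within_1sd,
    }


# Hypothetical values: one symmetric set, one piled up at zero.
print(normality_checks([1, 2, 2, 3, 3, 3, 4, 4, 5]))
print(normality_checks([0, 0, 0, 0, 0, 0, 1, 1, 2, 3]))
```

A variable with most of its cases in the lowest category and a long tail to the right will show a positive skewness and a mean above its median.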
|
This total assignment is worth 20 points.
Correctly following all programming information for running frequency distributions on the four variables and turning in all output = 2 points.
Although your actual output does not count very heavily, I MUST receive your output in order for you to receive credit on this assignment.
Constructing a histogram on the variable "babies" = 3 points.
Designating the measurement level of each of the four variables correctly = 4 points.
Designating the most appropriate measure of central tendency for each of the four variables = 2 points.
Designating the correct value of the chosen measure of central tendency for each variable = 2 points.
Designating the most appropriate measure of dispersion for each of the four variables = 2 points.
Designating the correct value of the chosen measure of dispersion for each variable (when available) = 2 points.
Assessing whether one's variables are normally distributed with rationale = 3 points.
|
|
1. You will run FREQUENCIES on four variables:
maeduc (highest year of school the individual's mother completed)
marital (the individual's marital status)
attend (how often the individual attends religious services)
babies (number of household members under 6 years old)
2. You will construct a histogram on the variable "babies".
3. You will identify the measurement level of each of these four variables as they were coded and appeared in your computer run.
4. For each of these four variables as appearing in your computer run, you will select the most appropriate measure of central tendency, and give the value of the measure of central tendency that you chose in each case.
5. For each of these four variables as appearing in your computer run, you will select the most appropriate measure of dispersion, and then give the value of the measure of dispersion that you chose in each case (when available; "D" is not available but may be named as the most appropriate measure of dispersion depending on the particular variable).
6. You will decide whether any of your variables are normally distributed (and briefly describe the rationale behind your decision).
As you will see in your computer exercise, the SDA system generates MANY statistics when it runs the frequencies program. This is typical of most statistical software, such as the SPSS program. The computer program will do the calculations, but it is your job to select among all the statistics that the SDA generates.
In fact, you are the one who decides whether the answer will be a number or, for nominal or ordinal variables, the verbal label attached to a numeric code, which is the true category value.
Accessing an online database (the General Social Survey) and the SDA system.
Running a univariate frequency distribution.
NEW FEATURES include:
Constructing a HISTOGRAM (on the variable "babies") and becoming familiar with the components of one graphic display.
Assessing the appropriateness of the univariate statistics provided for each of your variables.
Deciding if any of your four variables match the criteria for a normally distributed variable.
|
|
You will construct a histogram on the variable "babies".
I very quickly constructed my histogram in Guide 3 using the EXCEL program. However, you can just draw your histogram by hand, or use a word processor.
Please do NOT use the bar chart option in the SDA program to do your histogram. This program option is brand new and may not work correctly. Further, the picture may not even load on your computer system. It is apparent that programmers are still working on this new option.
You can use either the percentages on "BABIES" or the frequencies for your histogram (but not both).
If you use frequencies, please present the histogram in hundreds or thousands of cases, because otherwise there will not be enough room on the page and your histogram will be difficult to read and difficult to understand.
Recall that a histogram ALSO has a title, a data source, and a total casebase.
Be sure to label each category of "babies" as well and remember the missing data.
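If you would rather generate the picture than draw it, the checklist above (title, data source, total casebase, a label on every category, and a note on missing data) can be sketched in a few lines of Python. This is only one possible layout using percentages. The function name `text_histogram` and every count below are hypothetical, so substitute the values from your own run.

```python
def text_histogram(title, source, counts, missing, width=40):
    """Build a plain-text histogram of valid percentages, one bar per category."""
    total_valid = sum(counts.values())
    lines = [title]
    for label, freq in counts.items():
        pct = 100 * freq / total_valid
        # Scale each bar so that 100% would fill `width` characters.
        bar = "#" * round(pct * width / 100)
        lines.append(f"{label:>3} | {bar} {pct:.1f}%")
    lines.append(f"Source: {source}")
    lines.append(f"Valid cases: {total_valid}; missing cases: {missing}")
    return "\n".join(lines)


# Hypothetical counts for illustration only; they are NOT the GSS results.
example_counts = {"0": 800, "1": 120, "2": 60, "3+": 20}
print(text_histogram(
    "Household members under 6 years old (BABIES)",
    "General Social Survey, 2002",
    example_counts,
    missing=5,
))
```

The sketch plots percentages only, in keeping with the rule of using percentages or frequencies but not both.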
|
REMEMBER! The first step in working with data is to ALWAYS display the total frequencies on all the variables that you plan to analyze for a particular research project.
|
|
Use the RIGHT toe of your mouse to click on this link:
|
|
When the menu opens on the link, click on:
Open in New Window
After you have opened the new window and reached the General Social Survey page, click on Analyze.
This is the third button from the left, at the top of the screen.
|
You can always switch back to this screen for Assignment 2 by clicking on the box at the very bottom of the monitor screen that reads "Assignment 2". Or you can print out the pages of Assignment 2.
Once again, you will bring up the "radio buttons" screen to select an analytic option, in this case: Frequencies or crosstabulation.
In the Study: GSS 1972-2002 Cumulative Datafile screen that opens, first click on:
|
|
to open up the codebook window. Remember to click on the "Standard Codebook" link. Then, click on:
|
|
so that you have the SDA program window active as well.
Get familiar with the SDA variables list. If you click on the blue underlined shorthand abbreviation or mnemonic for each variable (e.g., maeduc), you will bring up the basic univariate frequency distribution for that particular variable, including missing value codes and frequencies. Note, however, that this is for all 30 years of the General Social Survey. The SDA program typically, BUT DEFINITELY NOT ALWAYS, will omit the cases that have missing values. Double check the variable values in each case. This is one reason why you must always run the univariate frequencies at the beginning of any analyses that you conduct.
Browse around to look at the specifications of the following variables:
| maeduc | highest year of education the person's mother completed |
| marital | the individual's current marital status |
| attend | how often the person attends religious services |
| babies | number of household members under 6 years old |
Don't confuse the number of missing cases with the "cases excluded by filter". The cases excluded by filter are for years other than 2002. The number of missing cases for 2002 is designated by "cases with invalid codes on row variable".
|
|
You will not have to recode any of your variables and you can obtain the frequencies and statistics for each variable in just one run.
Pull up the window for the SDA Program Screen. Again for your review, below is the example screen for running frequencies or crosstabulations that appeared when you clicked the "START" button on the SDA beginning window.
In the screen on your monitor, click on the radio button that says:
o Frequencies or crosstabulation
Then click on:
|
Remember to "turn off" the charts option by selecting "no chart".
REQUIRED Variable names to specify
In the line marked Row, you will type:
maeduc marital attend babies
These are the four variables you will use in this analysis. Separate each variable by a space.
In the box that says Selection Filter(s):
type: year (2002)
This will select ONLY respondents from the year 2002.
Next to Percentaging: leave the checkmark in the little box to the LEFT of "Column".
Now, click on the boxes to the LEFT of:
"Statistics"
AND
"Question text"
to be sure both of these are checked.
Leave the check in the Color coding box.
Under "Chart options" be sure to select "(no chart)".
Blink once, then click on the gray box at the bottom left that says:
|
|
Blink again. There is your output for the four variables for this exercise. Go ahead and print all the pages from the runs on these four variables NOW so that you have them to study when you complete this assignment. They also form part of the Assignment 2 that you will turn in to me.
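To see what the run above does conceptually, here is a small Python sketch of a filtered frequency distribution. It is not how SDA is implemented. It simply mimics the three counts you will see on your output: valid cases for 2002, cases with invalid (missing) codes, and cases excluded by the filter. The function name `frequencies`, the record layout, and the missing code 9 are all assumptions made up for this example.

```python
from collections import Counter


def frequencies(records, variable, year, missing_codes=()):
    """Frequency distribution of one variable for a single survey year."""
    # Keep only the selected year, like typing year (2002) in the filter box.
    selected = [r for r in records if r["year"] == year]
    # Set aside cases whose code is flagged as missing.
    valid = [r[variable] for r in selected if r[variable] not in missing_codes]
    counts = dict(sorted(Counter(valid).items()))
    n_valid = len(valid)
    # Percentages are based on valid cases only.
    percents = {code: 100 * freq / n_valid for code, freq in counts.items()}
    return {
        "counts": counts,
        "percents": percents,
        "missing": len(selected) - n_valid,
        "excluded_by_filter": len(records) - len(selected),
    }


# Hypothetical records for illustration only; 9 is an assumed missing code.
records = [
    {"year": 2002, "babies": 0},
    {"year": 2002, "babies": 0},
    {"year": 2002, "babies": 1},
    {"year": 2002, "babies": 9},
    {"year": 2000, "babies": 2},
]
print(frequencies(records, "babies", 2002, missing_codes={9}))
```

Keeping the missing count separate from the excluded-by-filter count matters because only the first one describes the 2002 respondents.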
Good golly! Do you ever have statistics! The problem is, which statistics are the right ones for each of your four variables?
(1) Is maeduc as presented in these data nominal, ordinal, interval, or ratio? How about marital? How about attend? How about babies?
(2) Which measure of central tendency is best for each of these four variables?
What is the category label or numerical result (when appropriate and available) of this measure of central tendency for each variable?
(3) Which measure of dispersion is best for each of these four variables?
Is this measure available for each variable?
When the measure is available, what is the category value or numerical result (when appropriate)?
(4) Do any of your variables have a normal distribution? Why or why not?
|
|
Here's what you turn in to me by class Wednesday September 22 (you may add a short explanation to your answer to any of these questions):
1. Your printed output for the frequency distributions on the four variables: maeduc; marital; attend; and babies.
2. Your constructed histogram for the variable "babies."
Use ONLY the valid values of "babies". (Do, of course, note the number of missing cases for 2002.)
3. The measurement level of each of these four variables: is each variable nominal, ordinal, interval, or ratio?
Make sure that you make this decision based on how the variable categories actually appear in your computer run.
4. The best measure of central tendency to use for each of these four variables considered alone. In each case, select only one of: the mode, the median, or the mean.
5. The numeric value or verbal category label that corresponds to the best measure of central tendency that you chose for each of these four variables. (COMBINING YOUR ANSWERS TO QUESTION 4 WITH YOUR ANSWERS TO QUESTION 5 IS FINE.)
6. The best measure of dispersion or variation to use for each of these four variables considered alone. In each case, select only one of: the Index of Dispersion (D), the range, inter-quartile range, or the standard deviation.
7. The numeric value or the verbal category label of the best measure of dispersion that you chose for each of these four variables. (COMBINING YOUR ANSWERS TO QUESTION 6 WITH YOUR ANSWERS TO QUESTION 7 IS FINE.)
PLEASE NOTE: The SDA system will not calculate the Index of Dispersion so you will not find a value of it on your output. It is OK to say so and you do not need to calculate an actual value for D if you choose this option. You probably would not want to use D for a nominal variable with a lot of categories either.
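You do not need to compute D for this assignment, but if you are curious how such a value could be obtained, the sketch below uses one common definition, the index of qualitative variation: D = k(N^2 - sum of f^2) / (N^2 (k - 1)), where k is the number of categories, N the number of valid cases, and f each category's frequency. This page does not give the formula, so treat this definition as an assumption and check it against your course materials. The function name `index_of_dispersion` and the sample values are hypothetical.

```python
from collections import Counter


def index_of_dispersion(values, k=None):
    """D under the assumed definition k(N^2 - sum f^2) / (N^2 (k - 1)).

    D is 0 when every case falls in one category and 1 when cases are
    spread evenly across all k categories.
    """
    counts = Counter(values)
    n = sum(counts.values())
    # Default to the categories actually observed; pass k explicitly
    # if some possible categories have no cases.
    k = len(counts) if k is None else k
    if k < 2:
        raise ValueError("D needs at least two categories")
    sum_of_squares = sum(f * f for f in counts.values())
    return k * (n * n - sum_of_squares) / (n * n * (k - 1))


# Hypothetical values for illustration only.
print(index_of_dispersion(["a", "a", "b", "b"]))       # evenly split
print(index_of_dispersion(["a", "a", "a", "b"]))       # mostly one category
print(index_of_dispersion(["a", "a", "a", "a"], k=2))  # no variation
```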
REMEMBER #8 IMMEDIATELY BELOW:
|
|
|
Susan Carol Losh September 14, 2004