IMPORTANT ANNOUNCEMENT:

As of 6 PM Tuesday September 14 2004, FSU plans for all classes to be held as scheduled. They will update the alerts page on Wednesday. Please watch Blackboard for updates and announcements as well, and be assured I will monitor the storm situation carefully.

Hurricane Ivan has also slowed its forward motion so that it is arriving in Florida later than originally anticipated. Right now the weather looks like typical Tallahassee rain for Wednesday evening. I AM planning on holding class Wednesday but probably only for about an hour so we can get an early start home. Because Assignment 2 is now up and it is due next Wednesday, I can do some demonstrations with SDA tomorrow. I can answer any last-minute questions about Assignment 1 and examine some of the graphics material at the end of Guide 3. We will examine sampling distributions, the normal curve, Z-scores and confidence intervals next week.

Under the circumstances I will take Assignment 1 in my mailbox, which is in 307 Stone Building in the Educational Psychology suite, BY 5 PM WEDNESDAY. All Assignment 1s are due Wednesday--tomorrow--by class (the original due date).


 
OVERVIEW

GUIDE 1: INTRODUCTION
GUIDE 2: CONSTRUCTING A TABLE
GUIDE 3: UNIVARIATE STATISTICS AND DISPLAYS
GUIDE 4: BIVARIATE BASICS
GUIDE 5: BIVARIATE CORRELATIONS
GUIDE 6: MULTIVARIATE CROSSTABULATIONS
GUIDE 7: BASIC REGRESSION
GUIDE 8: REGRESSION SPECIFICS
GUIDE 9: SAMPLING
TO EDF 5400 READINGS AND ASSIGNMENTS
 
 
RETURN TO  ASSIGNMENT PORTAL

 
 
EDF 5400 INTRODUCTORY STATISTICS
FALL 2004

DR SUSAN CAROL LOSH
EDUCATIONAL PSYCHOLOGY AND LEARNING SYSTEMS

MY OFFICE HOURS ARE MONDAY AND WEDNESDAY 3:30-5  AT 307K STONE BUILDING.
MARIA HAS OFFICE HOURS TUESDAY-THURSDAY 3:15-5 IN THE LRC ROOM 124 STONE.
 

 

ASSIGNMENT 2: EXECUTING UNIVARIATE FREQUENCY RUNS, SELECTING
       MEASURES OF CENTRAL TENDENCY AND DISPERSION, AND UNIVARIATE DISPLAYS


DUE WEDNESDAY SEPTEMBER 22 BY CLASS
DUE TO THE INTENSE NATURE OF THIS COURSE
LATE PAPERS ARE NOT ACCEPTED
EARLY ASSIGNMENTS ARE ACCEPTED

REMEMBER: I DO NOT TAKE E-MAIL ATTACHMENTS! THANK YOU.

PLEASE SEE ALTERNATIVE WAYS TO GET ASSIGNMENTS TO ME IN THE OVERVIEW

 
LAST DAY QUESTIONS ABOUT THIS ASSIGNMENT? PLEASE EMAIL. HOWEVER, PLEASE DO NOT E-MAIL AFTER 8 PM TUESDAY NIGHT.
Different e-mail providers may take a long time to deliver their mail & we may not receive it in time. Maria and I are not responsible for late delivery of e-mail by either your provider or ours, or for server viruses that slow transmission, so please leave enough time!
IF YOU E-MAIL ME WEDNESDAY MORNING I WILL NOT HAVE TIME TO RESPOND TO YOU. 

 

ASSIGNMENT STATS
YOUR TASKS 
TODAY
CONSTRUCTING A HISTOGRAM
SDA PROGRAM
RUN #2
WHAT YOU TURN IN ON SEPTEMBER 22

We often summarize a univariate distribution with a measure of central tendency and a measure of dispersion or variation. Measures of central tendency include modes, medians, and means, while measures of variation include the index of dispersion (D), the range, the inter-quartile range (IQR), and the standard deviation. You must decide whether each variable is nominal, ordinal or interval/ratio and examine its distribution to choose the appropriate measures of central tendency and dispersion for each variable.

A normal curve is defined by a collection of mathematical criteria. Are any of your variables normally distributed? Take a look at the criteria, and take a look at the distribution of each of your variables, and you decide.
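If you want a few numeric clues to go along with your eyeball inspection, the rough sketch below may help. It is written in Python, which is NOT part of the SDA system, and it assumes the commonly cited properties of a normal curve (symmetry, a single peak, and mean = median = mode); the criteria listed for this course remain the authority. The function name and both sets of values are made up for illustration and are not GSS results.

```python
from statistics import mean, median, mode, pstdev

def normality_clues(values):
    """Return rough descriptive clues about whether values look normal."""
    m = mean(values)
    sd = pstdev(values)  # population standard deviation
    n = len(values)
    # Skewness is 0 for a perfectly symmetric distribution,
    # positive for a long right tail, negative for a long left tail.
    skewness = sum((x - m) ** 3 for x in values) / (n * sd ** 3)
    return {
        "mean": m,
        "median": median(values),
        "mode": mode(values),
        "skewness": skewness,
    }

# Hypothetical data: a lopsided count variable and a symmetric one.
skewed = [0] * 70 + [1] * 20 + [2] * 8 + [3] * 2
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]

skewed_clues = normality_clues(skewed)
symmetric_clues = normality_clues(symmetric)
print(skewed_clues)
print(symmetric_clues)
```

When the mean, median, and mode sit far apart, or the skewness is far from zero, that is evidence against a normal distribution. Agreement among them is only a starting point, so still look at the shape of the distribution yourself.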
 
REVIEW THE NORMAL CURVE CRITERIA HERE: 

ASSIGNMENT STATS

This total assignment is worth 20 points.

Correctly following all programming information for running frequency distributions on the four variables and turning in all output = 2 points.

Although your actual output does not count very heavily, I MUST receive your output in order for you to receive credit on this assignment.

Constructing a histogram on the variable "babies" = 3 points.

Designating the measurement level of each of the four variables correctly = 4 points.

Designating the most appropriate measure of central tendency for each of the four variables = 2 points.

Designating the correct value of the chosen measure of central tendency for each variable = 2 points.

Designating the most appropriate measure of dispersion for each of the four variables = 2 points.

Designating the correct value of the chosen measure of dispersion for each variable (when available) = 2 points.

Assessing whether one's variables are normally distributed with rationale = 3 points.

YOUR TASKS FOR ASSIGNMENT TWO

1. You will run FREQUENCIES on four variables:

  maeduc (highest year of school the individual's mother completed)
  marital (the individual's marital status)
  attend (how often the individual attends religious services) and
  babies (number of household members under 6 years old)

2. You will construct a histogram on the variable "babies"

3. You will identify the measurement level of each of these four variables as they were coded and appeared in your computer run.

4. For each of these four variables as appearing in your computer run, you will select the most appropriate measure of central tendency, and give the value of the measure of central tendency that you chose in each case.

5. For each of these four variables as appearing in your computer run, you will select the most appropriate measure of dispersion, and then give the value of the measure of dispersion that you chose in each case (when available; "D" is not available but may be named as the most appropriate measure of dispersion depending on the particular variable).

6. You will decide if any of your variables are normally distributed (and briefly describe the rationale behind your decision).


As you will see in your computer exercise, the SDA system generates MANY statistics when it runs the frequencies program. This is typical of most statistical software, such as the SPSS program. The computer program will do the calculations, but it is your job to select among all the statistics that the SDA system generates.

In fact, you are the one who decides whether the answer will be a number or, for nominal or ordinal variables, the verbal label attached to a numeric code (the label is the true category value).
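To make the "number versus verbal label" point concrete, here is a minimal sketch in Python (again, not part of the SDA system). It assumes the simplest textbook convention: mode for nominal, median for ordinal, mean for interval/ratio. That is only a starting rule, because the shape of the distribution can change the best choice. The function name, the variables, and the codes below are all hypothetical.

```python
from statistics import mean, median_low, multimode

def central_tendency(codes, level, labels=None):
    """Pick a measure of central tendency from the measurement level.

    Assumed convention: nominal -> mode, ordinal -> median,
    interval/ratio -> mean. Nominal and ordinal answers are reported
    as the verbal label attached to the numeric code when one is given.
    """
    labels = labels or {}
    if level == "nominal":
        code = multimode(codes)[0]  # first mode if there is a tie
        return "mode", labels.get(code, code)
    if level == "ordinal":
        code = median_low(codes)    # always an actual category code
        return "median", labels.get(code, code)
    if level in ("interval", "ratio"):
        return "mean", mean(codes)
    raise ValueError(f"unknown measurement level: {level}")

# Hypothetical variables (not from the General Social Survey).
color_labels = {1: "Red", 2: "Green", 3: "Blue"}
color_codes = [1, 1, 2, 3, 3, 3]

exercise_labels = {1: "Never", 2: "Monthly", 3: "Weekly", 4: "Daily"}
exercise_codes = [1, 2, 2, 3, 3, 3, 4]

print(central_tendency(color_codes, "nominal", color_labels))
print(central_tendency(exercise_codes, "ordinal", exercise_labels))
print(central_tendency([2, 4, 6], "ratio"))
```

Notice that the software would happily compute a mean of the color codes too; the mean of codes for a nominal variable is a number with no meaning, which is why the selection is your job.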

OLD FEATURES in this exercise include:

  Accessing an online database (the General Social Survey) and the SDA system.
  Running a univariate frequency distribution.

NEW FEATURES include:

Constructing a HISTOGRAM (on the variable "babies") and becoming familiar with the components of one graphic display.
Assessing the appropriateness of the univariate statistics provided for each of your variables.
Deciding if any of your four variables match the criteria for a normally distributed variable.
 


CONSTRUCTING A HISTOGRAM

You will construct a histogram on the variable "babies".

I very quickly constructed my histogram in Guide 3 using the EXCEL program. However, you can just draw your histogram by hand, or use a word processor.

Please do NOT use the bar chart option in the SDA program to do your histogram. This program option is brand new and may not work correctly. Further, the picture may not even load on your computer system. It is apparent that programmers are still working on this new option.

You can use either the percentages on "BABIES" or the frequencies for your histogram (but not both).

If you use frequencies, please present the histogram in hundreds or thousands of cases, because otherwise there will not be enough room on the page and your histogram will be difficult to read and difficult to understand.

Recall that a histogram ALSO has a title, a data source, and a total casebase. Be sure to label each category of "babies" as well and remember the missing data.
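If you would rather let a short program do the drawing, the sketch below shows one possible way in Python to produce a plain-text histogram of percentages with a title, a data source, a total casebase, and a note on missing cases. The function name and the counts are hypothetical (they are NOT the values for "babies"), so substitute the valid frequencies from your own output. A hand-drawn or word-processed histogram is just as acceptable.

```python
def text_histogram(title, source, counts, missing, width=50):
    """Build a plain-text histogram of percentages from valid category counts."""
    total = sum(counts.values())  # valid cases only
    lines = [title]
    for category, count in counts.items():
        pct = 100 * count / total
        bar = "#" * round(pct * width / 100)  # bar length proportional to percent
        lines.append(f"{category:>10} | {bar} {pct:.1f}%")
    lines.append(f"Source: {source}")
    lines.append(f"Valid cases (N) = {total}; missing = {missing}")
    return "\n".join(lines)

# Hypothetical example data, not taken from the General Social Survey.
chart = text_histogram(
    "Pets per household (hypothetical data)",
    "made-up example, not the GSS",
    {"0": 50, "1": 30, "2": 15, "3+": 5},
    missing=4,
)
print(chart)
```

The percentages are based on valid cases only, with the missing cases reported separately at the bottom.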
 


MAKE LIFE EASIER!

 
 
IF YOU ARE A COMPUTER NOVICE, AGAIN I RECOMMEND THAT YOU PRINT THIS SITE.
(You can also open a second copy of your preferred browser and review directions by switching windows. Recall also that you can download any of our Internet pages into Composer or Front Page, edit them, and include your own notes.)

CHECK OFF EACH INSTRUCTION AS YOU COMPLETE IT AND YOU WILL HAVE A SPEEDY AND NONTRAUMATIC EXPERIENCE RUNNING THE SDA SYSTEM. 


 
PRELIMINARY FREQUENCIES

REMEMBER! The first step in working with data is to ALWAYS display the total frequencies on all the variables that you plan to analyze for a particular research project.
 
SDA REVIEW

Use the RIGHT toe of your mouse to click on this link:
 
http://webapp.icpsr.umich.edu/GSS/

When the menu opens on the link, click on:         Open in New Window

After you have opened the new window and reached the General Social Survey page, now click on Analyze

This is the third from the left button, at the top of the screen.
 
 
 
Remember: WHAT YOU SEE BELOW IS A NONWORKING COPY. 
YOU MUST GO TO THE NEW PAGE THAT OPENS ON YOUR MONITOR TO ACCESS THE PROGRAM.

You can always switch back to this screen for Assignment 2 by clicking on the box at the very bottom of the monitor screen that reads "Assignment 2". Or you can print out the pages of Assignment 2.

Once again, you will bring up the "radio buttons" screen to select an analytic option, in this case:
Frequencies or crosstabulation.
 

Survey Documentation and Analysis

Study: GSS 1972-2002 Cumulative Datafile

Select an action:
Browse codebook in this window
Frequencies or crosstabulation
Comparison of means
Correlation matrix
Comparison of correlations
Multiple regression
Logit/Probit
List values of individual cases
Recode variables (into public work area)
Compute a new variable
List/delete derived variables

Download a customized subset

Suggestion for running analysis programs:
Click the "Open Extra Codebook Window" button above. This allows you to "copy-and-paste" the names of variables you wish to analyze from the codebook window to the analysis windows.
Return to SDA Home Page

In the Study: GSS 1972-2002 Cumulative Datafile screen that opens: first click on:
 
Extra Codebook Window

to open up the codebook window. Remember to click on the "Standard Codebook" link. Then, click on:
 
Start

so that you have the SDA program window active as well.


Get familiar with the SDA variables list. If you click on the blue underlined shorthand abbreviation or mnemonic for each variable (e.g., maeduc), you will bring up the basic univariate frequency distribution for that particular variable, including missing value codes and frequencies. Note, however, that this is for all 30 years of the General Social Survey. The SDA program typically, BUT DEFINITELY NOT ALWAYS, will omit the cases that have missing values. Double check the variable values in each case. This is one reason why you must always run the univariate frequencies at the beginning of any analyses that you conduct.

Browse around to look at the specifications of the following variables:
 

maeduc highest year of education the person's mother completed
marital the individual's current marital status
attend how often the person attends religious services
babies number of household members under 6 years old 

Don't confuse the number of missing cases with the "cases excluded by filter". The cases excluded by filter are for years other than 2002. The number of missing cases for 2002 is designated by "cases with invalid codes on row variable".
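The difference between the two counts can be sketched in a few lines of Python. This is only an illustration of the logic, not how the SDA system is actually programmed. The records, the variable name "x", and the missing-value code 9 are all hypothetical; check the codebook for the real missing-value codes of each variable.

```python
from collections import Counter

def frequency_table(cases, variable, year, invalid_codes):
    """Frequency distribution for one variable in one survey year."""
    # Step 1: the selection filter keeps only the requested year.
    selected = [c for c in cases if c["year"] == year]
    excluded_by_filter = len(cases) - len(selected)
    # Step 2: among the selected cases, set aside invalid (missing) codes.
    valid = [c[variable] for c in selected if c[variable] not in invalid_codes]
    invalid = len(selected) - len(valid)
    counts = dict(sorted(Counter(valid).items()))
    # Percentages use the valid cases as the base.
    percents = {code: 100 * n / len(valid) for code, n in counts.items()} if valid else {}
    return {
        "excluded_by_filter": excluded_by_filter,
        "invalid": invalid,
        "valid_n": len(valid),
        "counts": counts,
        "percents": percents,
    }

# Hypothetical records; code 9 stands in for a missing-value code.
cases = [
    {"year": 2000, "x": 1}, {"year": 2000, "x": 2}, {"year": 2000, "x": 9},
    {"year": 2002, "x": 1}, {"year": 2002, "x": 1}, {"year": 2002, "x": 2},
    {"year": 2002, "x": 9}, {"year": 2002, "x": 0},
]
table = frequency_table(cases, "x", year=2002, invalid_codes={9})
print(table)
```

Here the three cases from 2000 are "excluded by filter", while the single 2002 case coded 9 is the one with an "invalid code". Only the second count is missing data for 2002.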
 
 

YOUR SDA COMPUTER PROGRAM RUN

You will not have to recode any of your variables and you can obtain the frequencies and statistics for each variable in just one run.

Pull up the window for the SDA Program Screen. Again for your review, below is the example screen for running frequencies or crosstabulations that appeared when you clicked the "START" button on the SDA beginning window.
 

In the Survey Documentation and Analysis screen on your monitor, click on the radio button that says:

o Frequencies or crosstabulation

Then click on:
 
Start

 
 
THE BOX BELOW WILL NOT WORK TO RUN YOUR PROGRAM. 
YOU MUST GO TO THE NEW PAGE THAT OPENS ON YOUR MONITOR TO RUN YOUR PROGRAM.

Remember to "turn off" the charts option by selecting "no chart".
 


 
SDA Tables Program
(Selected Study: GSS 1972-2002 Cumulative Datafile)
Help: General / Recoding Variables
REQUIRED Variable names to specify
Row:

OPTIONAL Variable names to specify
Column:
Control:
Selection Filter(s):     (Example: age(18-50) gender(1))
Weight:

Percentaging: Column / Row / Total

Other options
Statistics / Suppress table / Question text
Color coding / Show Z-statistic



Chart options BE SURE TO CHECK THE (no chart) OPTION:

Type of chart (if any) to display:
Bar chart options:
   Orientation: Vertical Horizontal
   Visual Effects: 2-D 3-D
Size of chart - width:  height: 



Change number of decimal places to display
For percents:
For statistics:

 

In the line marked Row, you will type:             maeduc marital attend babies

These are the four variables you will use in this analysis. Separate the variable names with spaces.

In the box that says Selection Filter(s):  type:         year (2002)

This will select ONLY respondents from the year 2002.

Next to Percentaging: leave the checkmark in the little box to the LEFT of "Column".

Now, click on the boxes to the LEFT of:

"Statistics"                    AND
"Question text"

to be sure both of these are checked.

Leave the check in the Color coding box.

Under "Chart options" be sure to select "(no chart)".

Blink once, then click on the gray box at the bottom left that says:
 
Run the Table

Blink again. There is your output for the four variables for this exercise.

Go ahead and print all the pages from the runs on these four variables NOW so that you have them to study when you complete this assignment. They also form part of the Assignment 2 that you will turn in to me.

Good golly! Do you ever have statistics! The problem is, which statistics are the right ones for each of your four variables?

(1) Is maeduc as presented in these data nominal, ordinal, interval, or ratio? How about marital? How about attend? How about babies?

(2) Which measure of central tendency is best for each of these four variables?
What is the category label or numerical result (when appropriate and available) of this measure of central tendency for each variable?

(3) Which measure of dispersion is best for each of these four variables?
Is this measure available for each variable?
When the measure is available, what is the category value or numerical result (when appropriate)?

(4) Do any of your variables have a normal distribution? Why or why not?
 


WHAT YOU TURN IN BY CLASS ON WEDNESDAY SEPTEMBER 22 2004


Here's what you turn in to me by class Wednesday September 22 (you may add a short explanation to your answer to any of these questions):

1. Your printed output for the frequency distributions on the four variables:

maeduc; marital; attend; and babies.

2. Your constructed histogram for the variable "babies."
Use ONLY the valid values of "babies". (Do, of course, note the number of missing cases for 2002.)

3. The measurement level of each of these four variables: is each variable nominal, ordinal, interval, or ratio?

Make sure that you make this decision based on how the variable categories actually appear in your computer run.

4. The best measure of central tendency to use for each of these four variables considered alone. In each case, select only one of: the mode, the median, or the mean.

5. The numeric value or verbal category label that corresponds to the best measure of central tendency that you chose for each of these four variables. (COMBINING YOUR ANSWERS TO QUESTION 4 WITH YOUR ANSWERS TO QUESTION 5 IS FINE.)

6. The best measure of dispersion or variation to use for each of these four variables considered alone. In each case, select only one of: the Index of Dispersion (D), the range, inter-quartile range, or the standard deviation.

7. The numeric value or the verbal category label of the best measure of dispersion that you chose for each of these four variables. (COMBINING YOUR ANSWERS TO QUESTION 6 WITH YOUR ANSWERS TO QUESTION 7 IS FINE.)

PLEASE NOTE: The SDA system will not calculate the Index of Dispersion so you will not find a value of it on your output. It is OK to say so and you do not need to calculate an actual value for D if you choose this option. You probably would not want to use D for a nominal variable with a lot of categories either.
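You are not required to compute D, but if you are curious what the calculation involves, here is a minimal Python sketch. It assumes one common textbook definition, D = k(N² - Σf²) / (N²(k - 1)), where k is the number of categories, N is the number of valid cases, and f is the frequency in each category. If the formula in the course guides differs, use that one instead. The function name and the frequencies are hypothetical.

```python
def index_of_dispersion(frequencies):
    """Index of Dispersion (D) for a categorical variable.

    Assumed formula: D = k(N^2 - sum of f^2) / (N^2 (k - 1)).
    D is 0 when every case falls in one category and 1 when the
    cases are spread evenly across all k categories.
    """
    k = len(frequencies)
    n = sum(frequencies)
    if k < 2 or n == 0:
        raise ValueError("need at least two categories and one case")
    sum_of_squares = sum(f * f for f in frequencies)
    return k * (n * n - sum_of_squares) / (n * n * (k - 1))

# Hypothetical frequencies, not taken from the General Social Survey.
even_spread = index_of_dispersion([25, 25, 25, 25])
no_spread = index_of_dispersion([100, 0, 0, 0])
in_between = index_of_dispersion([50, 30, 20])
print(even_spread, no_spread, in_between)
```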
 

REMEMBER #8 IMMEDIATELY BELOW:

 
8. Were any of your variables normally distributed? 
If so, which ones? 
What was the basis for your decision?



 
 


Susan Carol Losh September 14, 2004
This page was built with Netscape Composer
and is best viewed with Netscape Navigator
600 X 800 display resolution.