IMPORTANT ANNOUNCEMENT:

Tonight (Thursday) I discovered the source of the problems with the non-loading picture and the nonprinting output. In the past 1.5 weeks since I wrote the assignment, BERKELEY CHANGED THE PROGRAM. They added a new option which apparently does not work well on many computers and operating systems. 

This new option is a chart, which is a color picture file. Depending on the connection, picture files can take a long time to load. Fortunately this problem is easy to fix: 

Scroll down on the Frequencies or Cross Tabulations program page past "color coding" and "show Z statistic" and you will see a new section called "Chart Options". 

Where it says "Type of chart (if any) to display:" select "(No Chart)". Then click on "Run the Table." The program will now eliminate the chart picture and your output should print OK. 

I have changed the program box below so that it now corresponds to the new addition and so that you can see where to make the change. If you have tried this correction and are still having problems, please let me know ASAP: 

slosh@garnet.acns.fsu.edu 

 


 
 
OVERVIEW


Assignment 1 is due September 15.


 
 
EDF 5400 INTRODUCTORY STATISTICS
FALL 2004

DR SUSAN CAROL LOSH
EDUCATIONAL PSYCHOLOGY AND LEARNING SYSTEMS


 
 

ASSIGNMENT 1: EXECUTING A UNIVARIATE FREQUENCY AND PERCENTAGE RUN AND  CONSTRUCTING A UNIVARIATE TABLE

DUE WEDNESDAY SEPTEMBER 15 BY CLASS
DUE TO THE INTENSE NATURE OF THIS COURSE
LATE PAPERS ARE NOT ACCEPTED
EARLY ASSIGNMENTS ARE ACCEPTED

REMEMBER: I DO NOT TAKE E-MAIL ATTACHMENTS! THANK YOU.
PLEASE SEE ALTERNATIVE WAYS TO GET ASSIGNMENTS TO ME IN THE OVERVIEW

I WILL BEGIN DEMONSTRATIONS WITH THE SDA SYSTEM WEDNESDAY SEPTEMBER 1.

IF YOU HAVE QUESTIONS ABOUT THIS ASSIGNMENT, PLEASE SEE ME OR MARIA.

OR YOU MAY EMAIL ME. HOWEVER, PLEASE DO NOT E-MAIL AFTER 8 PM TUESDAY NIGHT.

Different e-mail providers may take a long time to deliver their mail & I may not receive it in time. I am not responsible for late delivery of e-mail by either your provider or ours, so please leave enough time!

Although we are very willing to look over your assignments and answer questions, we DO NOT guarantee a particular grade and we do not "proofread." Your work is YOUR responsibility. Thank you.

IF YOU E-MAIL ME WEDNESDAY MORNING (September 15)  I PROBABLY WILL NOT HAVE ANY TIME TO RESPOND TO YOU. 


YOUR TASKS TODAY
MAKE LIFE EASIER
GET FAMILIAR WITH SDA
RECODE THE VARIABLE & RERUN
REALLY CONSTRUCTING A TABLE
 
ASSIGNMENT STATS 

This total assignment is worth 20 points.

Correctly following all programming instructions for the 33-value, the 2-value, and your own five-value category systems for weekly email hours (emailhr) in 2002, and turning in all output = 2 points.

Although your actual output does not count very heavily, I MUST receive your output in order for you to receive credit on this assignment. Please be sure to include your computer output with your answers.
 

Your coding rationale = 4 points.

Table (completeness, appearance, followed directions) = 6 points.

Answers to questions about interpreting data output = 8 points.


YOUR TASKS FOR ASSIGNMENT ONE

No, you won't have formulae for this assignment. No, you won't do complex calculations.

You WILL be doing the hardest assignment you will have this semester.

You will construct a univariate percentage distribution TABLE using data from the year 2002 General Social Survey and the variable "emailhr", the respondent's number of email hours per week in 2002. You will recode the data to FIVE substantive categories.

DO include the total valid case base.
DO note the number of missing cases (typically right underneath your table).
DO use only the valid percents.
DO NOT use the category frequencies in your table.

BE SURE TO HAVE A TITLE!

You will do your basic computer software runs online using the Berkeley "Survey Documentation and Analysis" system (SDA). Instructions for this SDA run will be given later in this site.

Your computer output forms the basis for your TABLE but your OUTPUT is only the beginning and CANNOT substitute for the table. Keep things simple and remember to include a title and your data source (that's the 2002 General Social Survey).

ABOVE ALL: MAKE SENSE.

REPEAT: Your computer output cannot substitute for a table!

Here are the steps to follow:

1. You will execute a univariate frequency distribution on the variable "emailhr" using the entire range of 33 valid values. Print your SDA results.

I don't need to tell you how popular using email has become so this is a good variable to use for our first assignment.

2. RECODE the variable "emailhr" using my SAMPLE instructions.

3. Run a frequency distribution on the recoded EXAMPLE variable "emailhr" using only two categories. Print your SDA results.

NOW FOR THE CRITICAL AND ORIGINAL PART:

4. Repeat steps (2) and (3) for YOUR OWN ORIGINAL recoding of "emailhr". Print your results.

You will create a new coding category system with FIVE categories that uses the entire range of the 33 usable values for emailhr (excepting "missing" values). You may collapse into any five categories you like, but:
 
 
You must give the reasoning behind the five category-collapse system that you create and explain why this system is a good one. In a few sentences you will describe why you created each category that you did and what the people in each category have in common--and how they differ from people in the other four categories.

Using SDA online, you will recode the General Social Survey data to YOUR new category system. Add a descriptive value label to each category. (Be sure the labels match your categories!) Then obtain your new recoded set of frequencies.

5. Your statistical results from your own original recoding will be incorporated into a final, complete PERCENTAGE TABLE.  Using your output as a basis, you will construct a univariate percentage distribution table for your own recoded version of the variable "emailhr".
 
 

 
The output from SDA IS NOT AN ACCEPTABLE TABLE. It is the basis for a table. You must use a word processor or a typewriter to create your table. Please print or type your table. Handwritten, "cursive" tables are not acceptable due to legibility issues.

Only use the percents (except for the total valid case base). Missing data are excluded from the final percentages, but you should note the number of missing cases, which is given at the bottom of your SDA output. Because of the additional information you must give for a complete and readable table, your output must accompany the table in an attachment but cannot substitute for it.

DO NOT USE FREQUENCIES IN A PERCENTAGE DISTRIBUTION TABLE (except for the total case base frequency and, of course, the number of missing cases).
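If it helps to see the arithmetic behind these rules, here is a minimal sketch in Python (not part of SDA, and not required for the assignment). The function name, the title, and the counts are hypothetical and made up for illustration; they are NOT real GSS results. The sketch turns valid category counts into valid percents, reports the total valid case base, and notes the missing cases underneath.

```python
def percentage_table(title, source, counts, n_missing):
    """Build a plain-text univariate percentage table.

    counts: {label: number of VALID cases in that category}
    n_missing: number of missing cases, reported under the table only.
    """
    n_valid = sum(counts.values())
    lines = [title, ""]
    for label, count in counts.items():
        pct = 100 * count / n_valid          # valid percent: missing cases excluded
        lines.append(f"{label:<12}{pct:>6.1f}%")
    lines.append(f"{'Total':<12}{100.0:>6.1f}%")
    lines.append(f"{'(N valid)':<12}{n_valid:>6}")
    lines.append("")
    lines.append(f"Missing cases: {n_missing}")
    lines.append(f"Source: {source}")
    return "\n".join(lines)


# Hypothetical toy counts, NOT real GSS data.
toy_counts = {"a little": 30, "a lot": 10}
table = percentage_table(
    "Weekly Email Hours (toy example)",
    "Made-up numbers for illustration",
    toy_counts,
    n_missing=5,
)
print(table)
```

Notice that the category frequencies (30 and 10) never appear in the finished table; only the percents, the valid case base, and the missing count do.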
 

6. Based on the data in your ORIGINAL computer software output with 33 original valid categories of "emailhr", you will answer four questions related to percentages and basic statistics.


There will be more information on the table later in the site. But first, you will need to get familiar with the SDA system and learn how to execute a basic program in it.

What do all these terms mean?
What is the sequence of steps to produce these data?
Can life be made easier?
Read on.
 
THE GENERAL SOCIAL SURVEY, THE VARIABLE "emailhr" AND VARIABLE CATEGORY RECODES

To date, the General Social Survey has surveyed 43,698 individuals over a 30-year period in the lower 48 United States of America. It is part of the U.S. government's emphasis on "social indicators" that began in the 1970s. The GSS has many individual background variables (e.g., education, age, occupation) as well as attitudes and behaviors about almost everything. There are also short assessments (e.g., IQ tests) and information about developments such as home computer use. The 2004 survey, currently in the field, should be added next year.

"emailhr" is the short General Social Survey name or mnemonic for the variable "email hours per week". Originally, "emailhr" has so many values (33 valid values) that univariate distributions using them all are hard to read. Thus in your original recode, you will group together or "collapse" the old categories of "emailhr" to just five categories so that the results can be understood more easily.

You will do my example recode first for practice with SDA, frequencies, and recoding.

A BAD EXAMPLE USING YEAR OF BIRTH

It is very difficult to read or interpret a table when the variable(s) has many, many categories. You can see one horrendous example by clicking on the link below for a univariate distribution on the variable "year of birth" from a different study year. The oldest person was born in 1904. The youngest was born in 1975. Literally dozens of birth years are spread over two pages in a cluttered array.

CLICK HERE  for the example "year of birth" using a sample from the 1993 General Social Survey. After you have shuddered a little bit, click on the link to come back to this page.
 


MAKE LIFE EASIER!

 
 
IF YOU ARE A COMPUTER NOVICE, I STRONGLY RECOMMEND THAT YOU PRINT OUT THIS SITE. 

CHECK OFF EACH INSTRUCTION AS YOU COMPLETE IT AND YOU SHOULD HAVE A RELATIVELY NONTRAUMATIC FIRST EXPERIENCE RUNNING THE SDA SYSTEM. 

BEFORE YOU BEGIN TO WORK WITH THE BERKELEY SDA SYSTEM, I STRONGLY SUGGEST THAT YOU DO ONE OF THE FOLLOWING:

Print all the pages for assignment 1 (that's this site where you are right now). Perhaps you saved this file to your home computer.
 
OR

You can open a second copy of your preferred browser.

The idea is that you are able to work back and forth between (1) my directions for Assignment 1 and (2) the online directions and screens that will open to allow you to run the SDA system. 

This means that ultimately you should have (at least) THREE screens open:

1. This one, with my directions for you
2. The SDA directions screen
3. The "Extra Codebook" screen

You can open even more screens once you get used to the idea.

If you are not used to switching between windows, now is the time to start! It's easy. Typically, you will find a box for each of your active online pages at the very bottom of your monitor screen. Just click on that box to pull that window into the full-screen view.
 


THE BERKELEY SURVEY DOCUMENTATION AND ANALYSIS SYSTEM

 
PRELIMINARY FREQUENCIES

Whenever you work with data, ALWAYS first display the total frequencies on all the variables that you plan to analyze for a particular research project. This gives you valuable information about missing values, category sizes, and how many categories you have. These may raise questions that you need to answer before proceeding.

In this assignment, you will also answer some questions about medians, modes, and percentages.  So, before going further, let's examine the preliminary frequencies on the variable "emailhr" or "email hours per week".

The steps immediately below will help you run your very first statistical program using the SDA system. Please follow these steps exactly in sequence. The frames on the side of the SDA program and the General Social Survey will change depending on the alternatives you choose in earlier steps so please do not try any shortcuts.

Please do not use the exercises from previous classes because the url for the dataset has changed. The old links will not work.

REMEMBER: you can right click on a link and click "Open in New Window" and thus have several windows open at the same time.

Use the RIGHT button of your mouse to click on this link:
 
http://webapp.icpsr.umich.edu/GSS/

When the menu opens on the link, click on:         Open in New Window

Don't worry about the new window saying 1972-2000. The next screen will pull up the file of data through 2002.
After you have opened the new window and reached the General Social Survey page, now click on Analyze

This is the third button from the left, at the top of the screen.
 
 
PLEASE NOTE: WHAT YOU SEE BELOW IS A NONWORKING COPY. 
YOU MUST GO TO THE NEW PAGE THAT OPENS ON YOUR MONITOR TO ACCESS THE PROGRAM.

This link will bring up the site to run the SDA system using the General Social Survey archives. These data are in user friendly access thanks to scholars at the National Opinion Research Center, the University of California-Berkeley, and the University of Michigan.

You can always switch back to the "Assignment 1" screen by clicking on the box at the very bottom of the monitor screen that reads "Assignment 1: Constructing a Table".



THE NEW PAGE THAT WILL OPEN ON YOUR MONITOR LOOKS LIKE THIS:
 

Survey Documentation and Analysis

Study: GSS 1972-2002 Cumulative Datafile

Select an action:
Browse codebook in this window
Frequencies or crosstabulation
Comparison of means
Correlation matrix
Comparison of correlations
Multiple regression
Logit/Probit
List values of individual cases
Recode variables (into public work area)
Compute a new variable
List/delete derived variables

Download a customized subset

Suggestion for running analysis programs:
Click the "Open Extra Codebook Window" button above. This allows you to "copy-and-paste" the names of variables you wish to analyze from the codebook window to the analysis windows.
Return to SDA Home Page


Now, click on:
 
Open Extra Codebook Window

The "Extra Codebook Window" lets you check the details of each of your variables while you are putting together the short program that will execute your univariate frequency. It is always a good idea to open the Extra Codebook Window.

When the portal "Extra Codebook Window" opens, click on:

After you have opened the STANDARD Extra Codebook Window, click on Alphabetical Variable List (on the left of the screen, under INDEXES) to browse the General Social Survey. This is an enormous set of data, so you will only be able to browse a small part of it.

A page will open that looks like this:
 

Index to Pages with Items in Alphabetical Order

Click on the link to a page to see the list of those items in alphabetical order.
Links to Pages                           First Item    Last Item
Go to page containing range of items:    abany         emfamloc
Go to page containing range of items:    emfamoth      justpay
Go to page containing range of items:    kdalive1      poleff15
Go to page containing range of items:    poleff16      tax
Go to page containing range of items:    taxcheat      zombies

Click on the first line that reads "Go to page containing range of items: abany to emfamloc".

Scroll down the page until you reach the variable "emailhr". You can click on "emailhr" to see how it is coded and what the missing value codes are (998 for Don't Know and 999 for No Answer). Missing values will be excluded in this analysis because they don't give us any real information. However, you will need to indicate the number of missing cases in your table (for the year 2002 ONLY!). You will learn more about the 2002 missing values later in this assignment.
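For those who like to see the logic spelled out, here is a minimal sketch in Python (not SDA) of what a frequency run does with missing value codes. It assumes the codes 998 and 999 named above plus the -1 code that this page mentions further down; the function name and the toy responses are hypothetical and are NOT real GSS data.

```python
from collections import Counter

# Missing-value codes for emailhr as listed on this page (-1, 998, 999).
MISSING_CODES = {-1, 998, 999}


def frequency_distribution(values):
    """Return (valid_counts, n_valid, n_missing) for a list of raw codes."""
    valid = [v for v in values if v not in MISSING_CODES]
    n_missing = len(values) - len(valid)
    counts = dict(sorted(Counter(valid).items()))
    return counts, len(valid), n_missing


# Hypothetical toy responses, NOT real GSS data.
toy = [0, 1, 1, 2, 5, 10, 998, 999, -1, 1]
counts, n_valid, n_missing = frequency_distribution(toy)
print(counts)      # frequency of each valid value
print(n_valid)     # valid case base
print(n_missing)   # cases set aside as missing
```

The valid case base and the missing count are kept separate, which is the same split you will report in your table.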
 

RUNNING YOUR SDA SHORT PROGRAM

In the Survey Documentation and Analysis screen on your monitor, click the radio button that says:

o Frequencies or crosstabulation

Then click on:
 
Start

 
 
THE BOX BELOW WILL NOT WORK TO RUN YOUR PROGRAM. 
YOU MUST GO TO THE NEW PAGE THAT OPENS ON YOUR MONITOR TO RUN YOUR PROGRAM.

NOW you should see the following screen:
 
 

PLEASE NOTE NEW ADDED SECTION BELOW:
 
SDA Tables Program
(Selected Study: GSS 1972-2002 Cumulative Datafile)
Help: General / Recoding Variables
REQUIRED Variable names to specify
Row:

OPTIONAL Variable names to specify
Column:
Control:
Selection Filter(s):        Example: age(18-50) gender(1)
Weight:

Percentaging:   Column   Row   Total

Other options
Statistics   Suppress table   Question text
Color coding   Show Z-statistic




Chart options CHECK THE (no chart) OPTION:

Type of chart (if any) to display:
Bar chart options:
   Orientation: Vertical Horizontal
   Visual Effects: 2-D 3-D
Size of chart - width:  height: 



Change number of decimal places to display
For percents:
For statistics:

 

In the box marked Row, type:             emailhr

(SDA is not very case-sensitive, but it never hurts to be careful.)

In the box that says Selection Filter(s):  type:         year (2002)

This will select ONLY respondents from the year 2002.
The General Social Survey covers several years over the 1972-2002 period. For this assignment, we will only analyze the 2002 data.
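As a rough illustration of what the selection filter does, here is a minimal Python sketch (not SDA syntax). The record layout, the function name, and the toy rows are hypothetical and are NOT real GSS data; the point is only that rows from other years are dropped before anything is counted.

```python
# Hypothetical toy records, NOT real GSS data: (survey year, emailhr code)
records = [(2000, 3), (2002, 0), (2002, 12), (1998, 1), (2002, 998)]


def select_year(rows, year):
    """Keep only rows from one survey year, in the spirit of the filter year(2002)."""
    return [row for row in rows if row[0] == year]


kept = select_year(records, 2002)
excluded = len(records) - len(kept)
print(len(kept), "cases kept;", excluded, "cases excluded by filter")
```

A row with a missing code (998 here) still passes the year filter; it is set aside later as a missing case, not as a case excluded by filter.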
 

Next to Percentaging: make sure the checkmark is present in the little box to the LEFT of "Column".
Click the box to make the checkmark appear if it is absent.

Now, click on the boxes to the LEFT of:

"Statistics"                    AND
"Question text"

to be sure both these are checked.

Leave the check in the Color coding box.

Blink once, then click on the gray box at the bottom left that says:
 
Run the Table

and blink again. There is your first output right before your eyes.

Go ahead and print all the pages from your first output NOW (there may be two pages, depending on how your printer formats) so that you have them to study when you create your new category system and to answer the four statistics questions later on. These preliminary pages also form part of the Assignment 1 that you will turn in to me.

There are 1881 valid cases. 884 cases for 2002 are missing, that is, they have scores of -1, 998 or 999. Most of these 884 cases did not have access to a computer at home, school, work or elsewhere so they were never asked the question about email (OK--they forgot about small cell phones!) Don't worry about the 40,933 cases excluded by filter; they are for other years than 2002 and we are not including these individuals. You will find this information at the very bottom of your output.
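You can check that these three numbers hang together with a little arithmetic. The short Python sketch below uses only the counts quoted on this page (1881 valid, 884 missing, 40,933 excluded by filter); the variable names are mine.

```python
valid_2002 = 1881          # valid cases for emailhr in 2002
missing_2002 = 884         # 2002 cases coded -1, 998 or 999
excluded_by_filter = 40933 # respondents from years other than 2002

total_2002 = valid_2002 + missing_2002
total_all_years = total_2002 + excluded_by_filter

print(total_2002)       # 2765 respondents in the 2002 survey
print(total_all_years)  # 43698, the full cumulative file

# Share of the 2002 respondents who are missing on emailhr.
pct_missing = 100 * missing_2002 / total_2002
print(round(pct_missing, 1))
```

The grand total matches the 43,698 individuals in the cumulative file, which is a quick way to confirm that your filter worked.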
 


TRY OUT A RECODE & UNIVARIATE EXAMPLE FREQUENCY DISTRIBUTION IN SDA

Recodes assign new values to your original variable category system. Many different recodes are possible, but the particular recode for this assignment will group old categories together to form new categories.

Ultimately, you will create your own category system for "emailhr" that has FIVE substantive categories (excluding the missing data). You will have a rationale that accompanies your recodes to explain why this five category collapse is the very best one you want to use for "emailhr".

However, first you will do a short practice example, recoding the 33 valid categories in "emailhr" to just TWO categories so that you can become familiar with the computer syntax. You decide to code everyone with valid data who uses email at least 7 hours a week as "a lot" and everyone who uses email 6 hours a week or less as "a little".

Right at the top of the SDA Tables Program, just above where it says Row: there is a link that reads: Recoding Variables. Click on the "Recoding Variables" link so that you can study the SDA recoding conventions. You may even wish to right click on the link and open it in a new window (if you do, at this point you would be working with several separate open window screens.)

OK: here goes for this example:

In the box marked Row, type:             emailhr (r: 0-6 "a little"; 7-70 "a lot")

Be very careful about the syntax. The short variable name ("emailhr") goes first. Make sure a semi-colon (;) separates the coding values. Your alphabetic labels go after each recoded category, are encased with double quote marks ("), and are placed before the semi-colon for each recoded category. Parentheses are at each end of the total recode statement.
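If the recode statement looks mysterious, it may help to see the same rule written out as ordinary logic. This is a minimal sketch in Python, not SDA syntax; the function name is hypothetical, and treating anything outside 0-70 as missing is my assumption for the illustration.

```python
def recode_two(hours):
    """Example recode: 0-6 -> "a little", 7-70 -> "a lot"."""
    if 0 <= hours <= 6:
        return "a little"
    if 7 <= hours <= 70:
        return "a lot"
    return None  # assumption: anything outside 0-70 (e.g. 998, 999) is missing


# Hypothetical toy values, NOT real GSS data.
for hours in [0, 6, 7, 70, 998]:
    print(hours, "->", recode_two(hours))
```

Each old value lands in exactly one new category, which is what the ranges on either side of the semi-colon accomplish in the SDA statement.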

The rest of the steps are the same as in your first run when you did not recode anything.

In the box that says Selection Filter(s):  type:         year (2002)
This means you will only recode and study the year 2002 respondents.

Next to Percentaging: make sure the checkmark is present in the little box to the LEFT of "Column".
Click the box to make the checkmark appear if it is absent.

Now, click on the boxes to the LEFT of the following so that both are checked:

"Statistics"                    AND
"Question text"

Leave the check in the Color coding  box.

Then click:
 
Run the Table

You have just grouped the original 33 values of "emailhr" into only two values, "a little" and "a lot".

Go ahead and print all the pages from your second output NOW, again so that you have them to study when you create your new category system. They also form part of the Assignment 1 that you will turn in to me. (Your entire assignment will be returned.) Label them "bad example recode."

There was nothing special about my recoded example of "emailhr". In fact, it was a second bad example! Two categories was probably too few given the diversity of email use among adults in the United States. This distribution is probably needlessly lopsided with over 80% of Americans in the "a little" category.  You will try a different regrouping of the many possible categories of "emailhr" into five new descriptive categories with a good rationale.

Be sure to remove the recoded values of "emailhr" using only two categories in the example before you start recoding "emailhr" for your five categories. You can just block out the prior recode and delete it (or use your delete key).


If you like, you can continue to study my second terrible earlier example about two categories of education in the year 2000.

Just click here:
Right click on Hurricane Frances to open a new window if you like.

ON CODING

FIRST, remember to select ONLY the year 2002 cases in the Selection Filter(s):  section.
Otherwise you will have a LOT more cases than the 1881 valid cases in the year 2002.

Your recode will be done in the Row: section, only this time you will have five categories instead of two. Remember to follow the syntax.

Remember to get the:

statistics
the question text
color coding and
the column percents.

When you are creating your new category system on the variable "emailhr", remember these tips:

Of course categories will be exhaustive and mutually exclusive. Be sure to include all valid cases!

Keep the meaning of categories simple

Place people with something in common in the same category.
 

 
Be prepared to say what people in the same email hour category have in common and how they are different from people in the other four categories when you write your rationale.

When working with numbers, you can try for equal intervals if it makes sense (otherwise DO NOT use equal intervals)

If possible, and the categories can make sense, avoid categories that have a very small or very large percentage of cases in them  (e.g., less than 10% or more than 80%)

Make category labels descriptive

ABOVE ALL MAKE SENSE!
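Two of the tips above can be checked mechanically. Here is a minimal Python sketch, assuming whole-hour values from 0 to 70 and using the two-category practice recode as the test case (your own five categories are up to you). The function names and the toy percents are hypothetical; the percents are NOT real GSS results.

```python
def check_scheme(categories, low=0, high=70):
    """Check that inclusive ranges (start, end, label) are exhaustive
    and mutually exclusive over the whole-number values low..high."""
    problems = []
    for value in range(low, high + 1):
        hits = [label for start, end, label in categories if start <= value <= end]
        if len(hits) == 0:
            problems.append(f"{value} is not covered")
        elif len(hits) > 1:
            problems.append(f"{value} falls in more than one category")
    return problems


def flag_lopsided(percents, small=10.0, large=80.0):
    """Return labels whose valid percent is under `small` or over `large`."""
    return [label for label, pct in percents.items() if pct < small or pct > large]


practice = [(0, 6, "a little"), (7, 70, "a lot")]
overlapping = [(0, 6, "a little"), (6, 70, "a lot")]   # 6 is coded twice
gappy = [(0, 5, "a little"), (7, 70, "a lot")]         # 6 is left out

print(check_scheme(practice))
print(check_scheme(overlapping))
print(check_scheme(gappy))

# Hypothetical toy percents, NOT real GSS results.
print(flag_lopsided({"a little": 85.0, "a lot": 15.0}))
```

A scheme that passes both checks can still be a poor one: the checks say nothing about whether the categories make sense, and that is what your rationale is for.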
 

THOUGHT QUESTIONS: What was the measurement level of the original variable 'email hours per week'? Was it nominal, ordinal, interval, or ratio? What's the reason for your choice? What happened after you recoded the variable? What was the level of measurement then?
 


AT LAST: THE CREATIVE PART OF YOUR ASSIGNMENT!

PLEASE BE SURE TO READ AND COMPLETE THE BASIC STATISTICS QUESTIONS ASSIGNMENT FOLLOWING (4) BELOW

So...Your assignment is the following:

  (1) Devise a new coding category system with FIVE categories that uses the entire range of VALID values for "emailhr". Use your judgement to collapse into any five categories you like but:

  (2) Give the reasoning behind the five category-collapse system you devise and explain why this system is a good one. In a few sentences describe why you created each category that you did, and what the people in each email hours category have in common--and how they differ from people in the other four categories.

(3) Using the SDA, recode the General Social Survey data to YOUR new category system. Be sure the labels that you create match your categories!

(4) Use the Berkeley SDA system to obtain your new set of frequencies using your five category system.

Please be sure to print your new frequency table and turn it in with your other output, your table and the answers to the percentage questions.

You can experiment and try different recode schemes before choosing the final set of combined categories that is best for you. Simply remove your "old" recodes in the "row" line and replace them with the new recodes.


  (5) Using your output as a basis, construct a univariate percentage distribution table for your new "emailhr" recoded variable (only use the valid percents and exclude the missing data...but make sure you note the size of the missing data).

Please print or type your table.

PLEASE REMEMBER: Because of the additional information that you must provide for a complete and readable table, your output must accompany the table in an attachment but cannot substitute for it.

DO NOT USE FREQUENCIES IN A PERCENTAGE DISTRIBUTION TABLE (except for the total case base frequency and, of course, the number of missing cases).


(6) ANSWER THE QUESTIONS BELOW USING ALL 33 VALUES OF "emailhr". This is the very first practice example that you ran.

INTERPRETING BASIC UNIVARIATE DATA

1. What was the MEDIAN email hours per week used by an adult respondent in the 2002 General Social Survey?

2. What was the MODAL category of weekly email hours?

3. Approximately what percent of the sample used email at least 30 hours a week?

4. Approximately what percent of the sample used email less than 7 hours per week?
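If you are unsure how a median, a mode, or an "at least" percent is read off a frequency distribution, here is a minimal Python sketch. The counts are hypothetical toy numbers, NOT real GSS data, so they will not give you the answers to the four questions; the function names are mine. The median is taken as the value where the cumulative count first reaches half of the valid cases.

```python
def median_from_counts(counts):
    """Value at which the cumulative count first reaches half the valid cases."""
    n = sum(counts.values())
    running = 0
    for value in sorted(counts):
        running += counts[value]
        if running * 2 >= n:
            return value


def mode_from_counts(counts):
    """Value with the largest frequency."""
    return max(counts, key=counts.get)


def pct_at_least(counts, threshold):
    n = sum(counts.values())
    return 100 * sum(c for v, c in counts.items() if v >= threshold) / n


def pct_less_than(counts, threshold):
    n = sum(counts.values())
    return 100 * sum(c for v, c in counts.items() if v < threshold) / n


# Hypothetical toy frequency table {hours: count}, NOT real GSS data.
toy_counts = {0: 4, 1: 7, 2: 4, 5: 3, 10: 2}

print(median_from_counts(toy_counts))  # 1
print(mode_from_counts(toy_counts))    # 1
print(pct_at_least(toy_counts, 5))     # 25.0
print(pct_less_than(toy_counts, 2))    # 55.0
```

On your SDA output you can do the same thing by eye: add the column percents from the top until you pass 50% for the median, and add the relevant rows for the "at least" and "less than" questions.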


  Here's what you turn in to me by class Wednesday September 15:

1. Your printed output for the first frequency distribution with 33 valid values for weekly email hours.

2. Your printed output for the second frequency distribution with only 2 recoded valid values of email hours ("a little" and "a lot").

3. Your printed output for YOUR CREATED RECODED frequency distribution with 5 valid values of respondent weekly email use.

4. The rationale for the recoded category system that you created.

5. Your univariate percentage table using your 5 category recode scheme.

6. The four answers to part (6) on medians, modes, and percents using the entire initial 0-70 (33 valid values) set of categories.
 
 


Susan Carol Losh September 6, 2004