IMPORTANT ANNOUNCEMENT:
Tonight (Thursday) I discovered the source of the problems with the picture that would not load and the output that would not print. In the 1.5 weeks since I wrote the assignment, BERKELEY CHANGED THE PROGRAM. They added a new option that apparently does not work well on many computers and operating systems. This new option is a chart, which is a color picture file. Depending on your connection, picture files can take a long time to load. Fortunately, this problem is easy to fix: scroll down on the Frequencies or Cross Tabulations program page past "color coding" and "show Z statistic" and you will see a new section called "Chart Options". Where it says "Type of chart (if any) to display:", select "(No Chart)". Then click on "Run the Table." The program will now leave out the chart picture and your output should print OK. I have changed the program box below so that it now shows the new addition and you can see where to make the change. If you have tried this correction and are still having problems, please let me know ASAP.
FALL 2004, DR. SUSAN CAROL LOSH
REMEMBER: I DO NOT TAKE E-MAIL ATTACHMENTS! THANK YOU.
PLEASE SEE ALTERNATIVE WAYS TO GET ASSIGNMENTS TO ME IN THE OVERVIEW.
IF YOU HAVE QUESTIONS ABOUT THIS ASSIGNMENT, PLEASE SEE ME OR MARIA, OR E-MAIL ME. HOWEVER, PLEASE DO NOT E-MAIL AFTER 8 PM TUESDAY NIGHT. Although we are very willing to look over your assignments and answer questions, we DO NOT guarantee a particular grade and we do not "proofread." Your work is YOUR responsibility. Thank you. IF YOU E-MAIL ME WEDNESDAY MORNING (September 15), I PROBABLY WILL NOT HAVE ANY TIME TO RESPOND TO YOU.
This total assignment is worth 20 points.
Correctly following all programming instructions for the 33-value, the 2-value, and your own five-value category systems for weekly email hours ("emailhr") in 2002, and turning in all output = 2 points.
Although your actual output does not count very heavily, I MUST receive your output in order for you to receive credit on this assignment. Please be sure to include your computer output with your answers.
Your coding rationale = 4 points.
Table (completeness, appearance, followed directions) = 6 points.
Answers to questions about interpreting data output = 8 points.
No, you won't have formulae for this assignment. No, you won't do complex calculations.
You WILL be doing the hardest assignment you will have this semester.
You will construct a univariate percentage distribution TABLE using data from the 2002 General Social Survey and the variable "emailhr", the respondent's number of email hours per week in 2002. You will recode the data to FIVE substantive categories.
DO include the total valid case base.
DO note the number of missing cases (typically right underneath your table).
DO use only the valid percents.
DO NOT use the category frequencies in your table.
BE SURE TO HAVE A TITLE!
You will do your basic computer software runs online using the Berkeley "Survey Documentation and Analysis" system (SDA). Instructions for this SDA run are given later on this page.
Your computer output forms the basis for your TABLE but your OUTPUT is only the beginning and CANNOT substitute for the table. Keep things simple and remember to include a title and your data source (that's the 2002 General Social Survey).
ABOVE ALL: MAKE SENSE.
REPEAT: Your computer output cannot substitute for a table!
Here are the steps to follow:
1. You will execute a univariate frequency distribution on the variable "emailhr" using the entire range of 33 valid values. Print your SDA results.
I don't need to tell you how popular email has become, so this is a good variable to use for our first assignment.
2. RECODE the variable "emailhr" using my SAMPLE instructions.
3. Run a frequency distribution on the recoded EXAMPLE variable "emailhr" using only two categories. Print your SDA results.
NOW FOR THE CRITICAL AND ORIGINAL PART:
4. Repeat steps (2) and (3) for YOUR OWN ORIGINAL recoding of "emailhr". Print your results.
You will create a new coding category system with FIVE categories that uses the entire range of the 33 usable values for "emailhr" (excluding "missing" values). You may collapse into any five categories you like but:
Using SDA online, you will recode the General Social Survey data to YOUR new category system. Add a descriptive value label to each category. (Be sure the labels match your categories!) Then obtain your new recoded set of frequencies.
5. Your statistical results from your own original recoding will be incorporated into a final, complete PERCENTAGE TABLE. Using your output as a basis, you will construct a univariate percentage distribution table for your own recoded version of the variable "emailhr".
Only use the percents (except for the total valid case base). Missing data will be excluded from the final percentages, but you should note the number of missing cases, which is given at the bottom of your SDA output. Because of the additional information you must give for a complete and readable table, your output must accompany the table in an attachment but cannot substitute for it.
DO NOT USE FREQUENCIES IN A PERCENTAGE DISTRIBUTION TABLE (except for the total case base frequency and, of course, the number of missing cases).
6. Based on the data in your ORIGINAL computer software output with 33 original valid categories of "emailhr", you will answer four questions related to percentages and basic statistics.
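For anyone curious, the layout rules for the final table (valid percents only, one total valid case base, the number of missing cases noted underneath, a title and a data source) can be sketched in a few lines of Python. This is only an illustration; you do not need Python for this assignment. The function name percentage_table, the category labels, the frequencies, and the missing count are all hypothetical placeholders, NOT real General Social Survey results.

```python
def percentage_table(title, source, categories, n_missing):
    """Lay out a univariate percentage table as lines of text.

    categories: list of (label, frequency) pairs for the VALID categories only.
    Frequencies are used to compute valid percents; only the total valid
    case base (N) and the missing count are shown as raw numbers.
    """
    n_valid = sum(freq for _, freq in categories)
    lines = [title, ""]
    for label, freq in categories:
        lines.append(f"{label:<25}{100.0 * freq / n_valid:>6.1f}%")
    # Conventional total row; rounded category percents may not add to exactly 100.
    lines.append(f"{'Total':<25}{100.0:>6.1f}%")
    lines.append(f"{'(N)':<25}({n_valid})")
    lines.append("")
    lines.append(f"Missing cases: {n_missing}")
    lines.append(f"Source: {source}")
    return lines


# Hypothetical five-category recode (NOT real GSS numbers).
table = percentage_table(
    "Weekly Email Hours of U.S. Adults (hypothetical example)",
    "2002 General Social Survey",
    [("None", 30), ("1 hour", 25), ("2-4 hours", 15), ("5-9 hours", 12), ("10+ hours", 18)],
    n_missing=20,
)
print("\n".join(table))
```

Notice that the category frequencies never appear in the printed table; they only serve to compute the percents.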
There will be more information on the table later on this page. But first, you will need to get familiar with the SDA system and learn how to execute a basic program in it.
What do all these terms mean?
What is the sequence order to produce
these data?
Can life be made easier?
Read on.
To date, the General Social Survey has surveyed 43,698 individuals over a 30-year period in the lower 48 states of the United States of America. It is part of the U.S. government's emphasis on "social indicators" that began in the 1970s. The GSS has many individual background variables (e.g., education, age, occupation) as well as attitudes and behaviors about almost everything. There are also short assessments (e.g., IQ tests) and information about developments such as home computer use. The 2004 survey, currently in the field, should be added next year.
"emailhr" is the short General Social Survey name or mnemonic for the variable "email hours per week". Originally, "emailhr" has so many values (33 valid values) that univariate distributions using them all are hard to read. Thus in your original recode, you will group together or "collapse" the old categories of "emailhr" to just five categories so that the results can be understood more easily.
You will do my example recode first for practice with SDA, frequencies, and recoding.
A BAD EXAMPLE USING YEAR OF BIRTH
It is very difficult to read or interpret a table when the variable(s) have many, many categories. You can see one horrendous example by clicking on the link below for a univariate distribution on the variable "year of birth" from a different survey year. The oldest person was born in 1904. The youngest was born in 1975. Literally dozens of birth years are spread over two pages in a cluttered array.
CLICK HERE for the example "year of birth" using a sample from the 1993 General Social Survey. After you have shuddered a little bit, click on the link to come back to this page.
If you are not used to switching between windows, now is the time to start! It's easy. Typically, you will find a box for each of your open windows at the very bottom of your monitor screen. Just click on that box to pull that window into the full-screen view.
Whenever you work with data, ALWAYS first display the total frequencies on all the variables that you plan to analyze for a particular research project. This gives you valuable information about missing values, category sizes, and how many categories you have. These may raise questions that you need to answer before proceeding.
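To see what a frequency display contains, here is a minimal Python sketch of the same idea (SDA does all of this for you; Python is used only for illustration). It assumes the raw responses sit in a plain list and treats -1, 998, and 999 as missing, the codes reported for "emailhr" later on this page. The function name frequency_table and the sample values are hypothetical, NOT real GSS data.

```python
from collections import Counter

# Codes treated as missing for "emailhr": -1 = not asked,
# 998 = don't know, 999 = no answer.
MISSING_CODES = {-1, 998, 999}


def frequency_table(values, missing_codes=MISSING_CODES):
    """Return (rows, n_valid, n_missing).

    Each row is (value, frequency, valid_percent, cumulative_valid_percent),
    with percents based on valid (non-missing) cases only.
    """
    valid = [v for v in values if v not in missing_codes]
    n_valid = len(valid)
    n_missing = len(values) - n_valid
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        pct = 100.0 * counts[value] / n_valid
        cumulative += pct
        rows.append((value, counts[value], round(pct, 1), round(cumulative, 1)))
    return rows, n_valid, n_missing


# Hypothetical responses (NOT real GSS data), just to show the layout.
sample = [0, 1, 1, 2, 5, 7, 10, 998, -1, 999]
rows, n_valid, n_missing = frequency_table(sample)
for value, freq, pct, cum in rows:
    print(f"{value:>3} {freq:>3} {pct:>6.1f} {cum:>6.1f}")
print(f"Valid N = {n_valid}, missing = {n_missing}")
```

Even on this tiny made-up list, the display immediately tells you how many cases are missing and how the valid cases spread across categories.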
In this assignment, you will also answer some questions about medians, modes, and percentages. So, before going further, let's examine the preliminary frequencies on the variable "emailhr" or "email hours per week".
The steps immediately below will help you run your very first statistical program using the SDA system. Please follow these steps exactly in sequence. The frames on the side of the SDA program and the General Social Survey will change depending on the alternatives you choose in earlier steps so please do not try any shortcuts.
Please do not use the exercises from previous classes because the URL for the dataset has changed. The old links will not work.
REMEMBER: you can right click on a link and click "Open in New Window" and thus have several windows open at the same time.
Use the RIGHT button of your mouse to click on this link:
When the menu opens on the link, click on:
Open in New Window
Don't worry about the new window saying 1972-2000. The next screen will pull up the file of data through 2002.
After you have opened the new window and reached the General Social Survey page, click on Analyze. This is the third button from the left, at the top of the screen.
This link will bring up the site to run the SDA system using the General Social Survey archives. These data are available in user-friendly form thanks to scholars at the National Opinion Research Center, the University of California-Berkeley, and the University of Michigan.
You can always switch back to the "Assignment 1" screen by clicking on the box at the very bottom of the monitor screen that reads "Assignment 1: Constructing a Table".
The "Extra Codebook Window" lets you check the details of each of your variables while you are putting together the short program that will execute your univariate frequency. It is always a good idea to open the Extra Codebook Window.
When the portal "Extra Codebook Window" opens, click on:
After you have opened the STANDARD Extra Codebook Window, click on Alphabetical Variable List (on the left of the screen, under INDEXES) to browse the General Social Survey. This is an enormous set of data, so you will only be able to browse a small part of it. A page will open that looks like this:
Click on the first line that reads "Go to page containing range of items: abany to emfamloc".
Scroll down the page until you reach the variable "emailhr". You can click on "emailhr" to see how it is coded and what the missing value codes are (998 for Don't Know and 999 for No Answer). Missing values will be excluded in this analysis because they don't give us any real information. However, you will need to indicate the number of missing cases in your table (for the year 2002 ONLY!). You will learn more about the missing values for 2002 later in this assignment.
In the screen on your monitor, click the radio button that says:
Frequencies or crosstabulation
Then click on:
NOW you should see the following screen:
PLEASE NOTE NEW ADDED SECTION BELOW:
REQUIRED Variable names to specify
In the box marked Row, type:
emailhr
(SDA is not very case-sensitive, but it never hurts to be careful.)
In the box that says Selection Filter(s), type:
year (2002)
This will select ONLY respondents from the year 2002.
The General Social Survey covers several years over the 1972-2002 period. For this assignment, we will only analyze the 2002 data.
Next to Percentaging: make sure the checkmark is present in the little box to the LEFT of "Column". Click the box to make the checkmark appear if it is absent.
Now, click on the boxes to the LEFT of "Statistics" AND "Question text" to be sure both of these are checked.
Leave the check in the Color coding box.
Blink once, then click on the gray box at the bottom left that says "Run the Table" and blink again. There is your first output right before your eyes.
Go ahead and print all the pages from your first output NOW (there may be two pages, depending on how your printer formats) so that you have them to study when you create your new category system and to answer the four statistics questions later on. These preliminary pages also form part of the Assignment 1 that you will turn in to me.
There are 1881 valid cases. Another 884 cases for 2002 are missing; that is, they have scores of -1, 998, or 999. Most of these 884 cases did not have access to a computer at home, school, work, or elsewhere, so they were never asked the question about email (OK, they forgot about small cell phones!). Don't worry about the 40,933 cases excluded by the filter; they are from years other than 2002 and we are not including these individuals. You will find this information at the very bottom of your output.
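A quick arithmetic check shows how these counts fit together: valid plus missing cases give the full 2002 sample, and the rest of the 43,698 respondents in the cumulative GSS file are the cases the year filter excludes. The short Python sketch below only restates numbers given on this page; the variable names are hypothetical. Remember that your valid percents use the 1881 valid cases as the denominator, not the full 2002 sample.

```python
# Case counts reported on this page for "emailhr" in the 2002 GSS.
valid_2002 = 1881        # respondents with a usable number of email hours
missing_2002 = 884       # scores of -1, 998, or 999
total_all_years = 43698  # all GSS respondents to date (1972-2002)

# All 2002 respondents = valid cases + missing cases.
respondents_2002 = valid_2002 + missing_2002
# Everyone else in the file is from another year and is dropped by the filter.
excluded_by_filter = total_all_years - respondents_2002

print(respondents_2002)    # 2765
print(excluded_by_filter)  # 40933
```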
Recodes assign new values to your original variable category system. Many different recodes are possible, but the particular recode for this assignment will group old categories together to form new categories.
Ultimately, you will create your own category system for "emailhr" that has FIVE substantive categories (excluding the missing data). You will have a rationale that accompanies your recodes to explain why this five category collapse is the very best one you want to use for "emailhr".
However, now you will do a short example for practice, recoding the 33 valid categories in "emailhr" to just TWO categories so that you can become familiar with the computer syntax. You decide to code everyone with valid data who uses email at least 7 hours a week as "a lot" and everyone who uses email 6 hours a week or less as "a little".
Right at the top of the SDA Tables Program, just above where it says Row: there is a link that reads: Recoding Variables. Click on the "Recoding Variables" link so that you can study the SDA recoding conventions. You may even wish to right click on the link and open it in a new window (if you do, at this point you would be working with several separate open window screens.)
OK: here goes for this example:
In the box marked Row, type:
emailhr (r: 0-6 "a little"; 7-70 "a lot")
Be very careful about the syntax. The short variable name ("emailhr") goes first. Make sure a semicolon (;) separates the recoded categories. Each alphabetic label goes right after its range of values, is enclosed in double quotation marks ("), and comes before the semicolon that ends that category. Parentheses enclose the entire recode specification, which begins with "r:".
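To make the logic of this recode concrete, here is a rough Python equivalent of emailhr (r: 0-6 "a little"; 7-70 "a lot"). It only illustrates what the recode does to each respondent's score; it is not how SDA works internally, and the names EXAMPLE_RULES and recode are hypothetical. Each rule is an inclusive range plus its label, just like "0-6" and "7-70" in the SDA statement.

```python
# Each rule is (low, high, label); ranges are inclusive, as in SDA's "0-6".
EXAMPLE_RULES = [
    (0, 6, "a little"),
    (7, 70, "a lot"),
]


def recode(value, rules=EXAMPLE_RULES):
    """Return the label of the first rule whose range contains value.

    Returns None when no rule covers the value (for example, the
    missing-data codes -1, 998, and 999).
    """
    for low, high, label in rules:
        if low <= value <= high:
            return label
    return None


print(recode(0))    # a little
print(recode(6))    # a little
print(recode(7))    # a lot
print(recode(998))  # None
```

Because the two ranges are inclusive and meet at 6 and 7, every whole number from 0 to 70 lands in exactly one category; that is the property your own five-category recode needs as well.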
The rest of the steps are the same as in your first run when you did not recode anything.
In the box that says Selection Filter(s), type:
year (2002)
This means you will only recode and study the year 2002 respondents.
Next to Percentaging: make sure the checkmark is present in the little box to the LEFT of "Column". Click the box to make the checkmark appear if it is absent.
Now, click on the boxes to the LEFT of "Statistics" AND "Question text" so that both are checked.
Leave the check in the Color coding box.
Then click "Run the Table".
You have just grouped the original 33 values of "emailhr" into only two values, "a little" and "a lot".
Go ahead and print all the pages from your second output NOW, again so that you have them to study when you create your new category system. They also form part of the Assignment 1 that you will turn in to me. (Your entire assignment will be returned.) Label them "bad example recode."
There was nothing special about my recoded example of "emailhr". In fact, it was a second bad example! Two categories were probably too few given the diversity of email use among adults in the United States. This distribution is probably needlessly lopsided, with over 80% of Americans in the "a little" category. You will try a different regrouping of the original 33 values of "emailhr" into five new descriptive categories with a good rationale.
Be sure to remove the two-category example recode of "emailhr" before you start recoding "emailhr" into your five categories. You can just highlight the prior recode and delete it (with your Delete or Backspace key).
If you like, you can continue to study my other terrible example, two categories of education in the year 2000. Just click here:
Right click on Hurricane Frances to open a new window if you like.
ON CODING
FIRST, remember to select ONLY the year 2002 cases in the Selection Filter(s) section. Otherwise you will have a LOT more cases than the 1881 valid cases in the year 2002.
Your recode will be done in the Row: section, only this time you will have five categories instead of two. Remember to follow the syntax.
Remember to check statistics, question text, and color coding, and to get the column percents.
When you are creating your new category system on the variable "emailhr", remember these tips:
Of course, categories must be exhaustive and mutually exclusive.
Be sure to include all valid cases!
Keep the meaning of categories simple.
Place people with something in common in the same category.
When working with numbers, you can try for equal intervals if it makes sense (otherwise DO NOT use equal intervals).
If possible, and if the categories still make sense, avoid categories that have a very small or very large percentage of cases in them (e.g., less than 10% or more than 80%).
Make category labels descriptive.
ABOVE ALL, MAKE SENSE!
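Two of these tips can be checked mechanically before you commit to a scheme. The Python sketch below takes a candidate set of inclusive ranges and a frequency table of valid values, reports any value that falls in no category or in more than one, and flags categories holding less than 10% or more than 80% of valid cases. It checks only values that actually appear in the frequency table, and the size flags are warnings to think about, not hard rules. The function check_scheme, the category boundaries, and the counts are all hypothetical placeholders; you would substitute values from your own 33-value output.

```python
def check_scheme(rules, freqs, low_pct=10.0, high_pct=80.0):
    """Check a candidate recode against a frequency table.

    rules: list of (low, high, label) tuples; ranges are inclusive.
    freqs: dict mapping each valid (non-missing) value to its frequency.
    Returns a list of warnings; an empty list means no problems found.
    """
    warnings = []
    n_valid = sum(freqs.values())

    # Exhaustive and mutually exclusive: every valid value in exactly one category.
    for value in sorted(freqs):
        labels = [label for low, high, label in rules if low <= value <= high]
        if not labels:
            warnings.append(f"value {value} is not in any category")
        elif len(labels) > 1:
            warnings.append(f"value {value} is in more than one category: {labels}")

    # Size guideline: flag categories with very few or very many cases.
    for low, high, label in rules:
        count = sum(f for v, f in freqs.items() if low <= v <= high)
        pct = 100.0 * count / n_valid
        if pct < low_pct or pct > high_pct:
            warnings.append(f"{label!r} holds {pct:.1f}% of valid cases")

    return warnings


# Hypothetical frequencies (NOT the real 2002 GSS counts).
toy_freqs = {0: 30, 1: 25, 2: 15, 5: 12, 10: 10, 20: 5, 40: 3}

toy_rules = [
    (0, 0, "none"),
    (1, 1, "about an hour"),
    (2, 4, "a few hours"),
    (5, 9, "several hours"),
    (10, 70, "ten or more hours"),
]
print(check_scheme(toy_rules, toy_freqs))  # []

# A deliberately flawed scheme: 1 is counted twice, 10-19 is not covered,
# and the top category is very small.
bad_rules = [(0, 1, "low"), (1, 9, "middle"), (20, 70, "high")]
for warning in check_scheme(bad_rules, toy_freqs):
    print(warning)
```

Passing this check does not make a scheme good; the rationale (what people in each category have in common) is still up to you.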
THOUGHT QUESTIONS:
What was the measurement level of the original variable "email hours per week"? Was it nominal, ordinal, interval, or ratio? What's the reason for your choice? What happened after you recoded the variable? What was the level of measurement then?
PLEASE BE SURE TO READ AND COMPLETE THE BASIC STATISTICS QUESTIONS IN PART (6) BELOW.
So...Your assignment is the following:
(1) Devise a new coding category system with FIVE categories that uses the entire range of VALID values for "emailhr". Use your judgment to collapse into any five categories you like but:
(2) Give the reasoning behind the five-category collapse system you devise and explain why this system is a good one. In a few sentences, describe why you created each category that you did, what the people in each email hours category have in common, and how they differ from people in the other four categories.
(3) Using the SDA, recode the General Social Survey data to YOUR new category system. Be sure the labels that you create match your categories!
(4) Use the Berkeley SDA system to obtain your new set of frequencies using your five-category system.
Please be sure to print your new frequency table and turn it in with your other output, your table and the answers to the percentage questions.
You can experiment and try different recode schemes before choosing the final set of combined categories that is best for you. Simply remove your "old" recodes in the "row" line and replace them with the new recodes.
(5) Using your output as a basis, construct a univariate percentage distribution table for your new recoded "emailhr" variable (only use the valid percents and exclude the missing data, but make sure you note the number of missing cases).
Please print or type your table.
PLEASE REMEMBER: Because of the additional information that you must provide for a complete and readable table, your output must accompany the table in an attachment but cannot substitute for it.
DO NOT USE FREQUENCIES IN A PERCENTAGE DISTRIBUTION TABLE (except for the total case base frequency).
(6) ANSWER THE QUESTIONS BELOW USING ALL 33 VALUES OF "emailhr". This is the very first practice example that you ran.
INTERPRETING BASIC UNIVARIATE DATA
1. What was the MEDIAN email hours per week used by an adult respondent in the 2002 General Social Survey?
2. What was the MODAL category of weekly email hours?
3. Approximately what percent of the sample used email at least 30 hours a week?
4. Approximately what percent of the sample used email less than 7 hours per week?
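All four questions can be answered from the frequency and percent columns of your first output. As an illustration of the reasoning only, the Python sketch below works through a small hypothetical frequency table (NOT the real GSS results, so it will not give you the answers). It uses one common textbook rule for the median of a frequency table: the first value at which the cumulative valid percent reaches 50 (if it lands exactly on 50, some texts average that value with the next one). The mode is the value with the largest frequency, and each "approximately what percent" question is a sum of valid percents over the relevant values. The function names are hypothetical.

```python
def median_value(freqs):
    """Smallest value at which the cumulative valid percent reaches 50."""
    n_valid = sum(freqs.values())
    cumulative = 0
    for value in sorted(freqs):
        cumulative += freqs[value]
        if 2 * cumulative >= n_valid:  # cumulative share is at least one half
            return value


def modal_value(freqs):
    """Value with the largest frequency (ties go to the smaller value)."""
    return min(freqs, key=lambda v: (-freqs[v], v))


def percent_where(freqs, condition):
    """Valid percent of cases whose value satisfies condition."""
    n_valid = sum(freqs.values())
    hits = sum(f for v, f in freqs.items() if condition(v))
    return 100.0 * hits / n_valid


# Hypothetical frequencies (NOT the real 2002 GSS counts).
toy_freqs = {0: 30, 1: 25, 2: 15, 5: 12, 10: 10, 30: 5, 40: 3}
print(median_value(toy_freqs))                      # 1
print(modal_value(toy_freqs))                       # 0
print(percent_where(toy_freqs, lambda v: v >= 30))  # 8.0
print(percent_where(toy_freqs, lambda v: v < 7))    # 82.0
```

Watch the wording: "at least 30 hours" includes 30 itself, while "less than 7 hours" stops at 6.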
Here's what you turn in to me by class on Wednesday, September 15:
1. Your printed output for the first frequency distribution with 33 valid values for weekly email hours.
2. Your printed output for the second frequency distribution with only 2 recoded valid values of email hours ("a little" and "a lot").
3. Your printed output for YOUR CREATED RECODED frequency distribution with 5 valid values of respondent weekly email use.
4. The rationale for the recoded category system that you created.
5. Your univariate percentage table using your 5 category recode scheme.
6. The four answers to part (6) on medians, modes, and percents using the entire initial 0-70 (33 valid values) set of categories.
Susan Carol Losh, September 6, 2004
This page was built with Netscape Composer and is best viewed with Netscape Navigator at 600 X 800 display resolution.