Please note: At your instructor's discretion,
there may be minor alterations to the reading
assignments listed below. One of the major advantages to
providing you with an on-line readings archive is that
timely articles can be added or substituted when
appropriate. Opening documents downloaded from this
website will require that your computer have
Acrobat Reader. You will also need the
class-specific password to open individual files.
UNIT 2 ASSIGNMENT SCHEDULE
Links to helpful
resources:
Week 6
Monday, September 23: In-class Exam 1.
The final version of your Unit 1 study guide will be
posted a week ahead of the test. The exam will contain
10-15 multiple choice and several short-essay items.
Topic 6 (Wednesday
and Friday, Sept. 25, 27, after the test): Is
political polling getting better over time? Can we
still trust surveys to determine what Americans think
about contentious issues?
-
Stef W. Kight and Margaret Talev: "Study: What Americans really think"
(Axios 2022, 3pp.) This article uses the
phrase "self silencing" for what social
scientists call "social desirability bias."
What is a list experiment? How does it tackle the
problem of this type of bias? While we know that
people don't always admit to their prejudice against
other groups, what does this article (and data in class
PPTs) say about how widespread social desirability
bias is?
The next several articles discuss how political polling
has improved over the last several elections. It looks
like a lot of reading, but together it adds up to fewer
than 15 pages.
-
Geoffrey Skelley, Why Was The
National Polling Environment So Off In 2020?
(FiveThirtyEight, 2021, 3-4pp). How far off are polls
from estimating actual turnout and vote choice in the
typical election? What is non-response bias? This
article doesn't talk about it, but what are some of
the factors that might cause the supporters of Donald
Trump to be more or less likely to take a survey than
Democrats, independents, and even some Republicans who
don't support him?
-
Andrew Fischer, "Polls Were Great in
2022. Can They Repeat Their Success" (NY Times,
2pp). What are polls doing to get better? How has
adjusting polls to make sure that they have a
proportionally accurate share of white and non-white
individuals without a college education and
respondents who voted for Trump in the previous
election improved polls' predictions?
-
You might be interested in reading more (i.e., this
is optional, but only because it is so detailed) about
how accurate political polling typically is and
whether there is a political bias to polling (when
large numbers of surveys are combined, there typically
isn't, especially if you weight for "house effects,"
which fivethirtyeight.com does): https://fivethirtyeight.com/features/2022-election-polling-accuracy/
Week 7
Topic 7—An introduction to some important tools of the
trade: Statistical analysis programs, scholarly research
resources, and survey datasets
Monday (9/30): Downloading and configuring SPSS
-
Important: Over the weekend, ahead of class, try
to complete all of the steps listed below as it may
not be possible to do them during our class time. Sometimes,
it is hard to download SPSS in the classroom because of
limited bandwidth. If you follow all of the
instructions listed below and successfully install
SPSS, configure SPSS (Mac users only), and ensure that
you have your computer set up so that your PSC 2019
coursework is being backed up to a cloud server where
you can recover previous versions of files, you will
not need to be present for much of Monday's class.
Monday's class will be primarily used
-
BRING YOUR COMPUTER TO CLASS. Unless
instructed otherwise, please bring your laptop to
class going forward as you will need to use it
for the statistical work we will be doing for the
remainder of the term.
-
In class, we will try
to make sure you have key pieces of technology in
place so we can begin to learn how to use SPSS, a
widely used statistical analysis software package.
SPSS is one of the four major packages widely used in
the social sciences (the others being R, Stata, and
SAS).
While you have access to SPSS on some university
computers, you will find it much more convenient (and
essential if you have to quarantine at any point) to install
and run SPSS on your own laptop. HPU has a site
license that will allow you to use an
individual version of the program on your own computer
all semester at no cost. If you want to get started
ahead of our class (or need to reinstall the program
at some point after class), here are the instructions
to download a copy of SPSS and then a 6-month license
from HPU’s IT office:
-
When you load the SPSS program, you will have to
activate the license. When you get to this
point, check the option indicating that you have an
individual user license. After a couple of clicks,
you will paste your license code in and click the
"add" button. From there, you may have to click
through some more OKs to complete this process.
- Once you are done installing SPSS, make sure you
can run it. To do so, you will need to find and
open it. It likely won't be on your programs
dock/banner initially. For PC users, you can quickly
find and open a program by using the search bar on the
bottom banner. For Mac users, you can quickly find a
program by clicking on the Spotlight icon (magnifying
glass) in the top right-hand corner of your computer.
To make things faster in the future, you could create
an alias for your desktop.
-
Important!: For Mac users only: You should
quickly complete a set of one-time changes to SPSS
for Mac's default settings so that your version of
SPSS will look and act just like SPSS for PC.
Important: You aren't changing any other program or
your OS, just your SPSS settings. Open SPSS for Mac
in its default mode, selecting the option to open "a
new dataset," which opens a blank data editor
window. From there, follow this link for instructions
on making the necessary changes. Why are you
being asked to change the program's defaults?
Inexplicably, the default layout of SPSS for Mac
looks quite a bit different from the PC version, and
the default Mac settings lack SPSS options that are
described in most textbooks, my screencast
tutorials, and most instructional materials you will
find online.
-
Also in class, we will download and configure
Google Drive for Desktop (if you don't already
have it) and configure it to back up your Documents
folder, which is where you will store your PSC 2019
SPSS datasets and work. Backing up your SPSS data work
every time you put time into a project is critical,
and a failure to do so can result in a major setback.
-
For both PC and Mac users, I *strongly* recommend
that you install Google Drive for Desktop and
configure it so that you automatically sync and
back up to the cloud all of the files in your
computer's "Documents" folder every time you make
changes to and save a file on your computer.
The main advantages of Google Drive over some other
backup programs are cost related as well as the fact
that you can automatically access multiple previous
versions of a file instead of overwriting the only
version of a backed up file every time you save it.
This feature is important if you ever make a mistake
with a file and save changes to it before realizing
what you have done.
-
Here are instructions on how to download and
install Google Drive for Desktop:
https://support.google.com/a/users/answer/9967896
-
Also take a look
at the instructions on this webpage regarding the
steps you need to take to configure Google
Drive so that it will sync the folders and files
in your "Documents" folder with the online version
of Google Drive. When you configure Google
Drive without purchasing additional cloud space
(most of you won't need it), you will have a large,
but limited, amount of free space to work with. If
you are going to be using the free version of Drive,
you have the option of syncing your documents folder
without also backing up space-hogging photos and
videos. You can have Google Drive and other backup
programs running simultaneously, so use something
else to back up your photos and videos if you are not
going to pay for extra storage.
- To verify the settings for Google Drive
for Desktop, you will need to open the
program rather than the storage area. For a
PC, you open Google Drive by clicking on the up
arrow in the bottom-right corner of your computer,
which will show you several different programs running
in your computer's background. Select the triangle
icon that is blue, green, and yellow. For a Mac,
click on the Spotlight icon (it looks like a
magnifying glass) in the top right-hand corner of your
computer. Search for Google Drive. It should identify
the program icon for you.
-
When you configure Google Drive for Desktop,
you want to "mirror" rather
than "stream" files. Mirroring a file means
that there will be a copy of your file stored on
your personal laptop and a second, identical one
synced to the cloud. To make sure you have this
configured correctly, open the Google Drive for
Desktop app. Click on the (1) gear icon and then
(2) preferences and then (3) "Google Drive Folders
from Drive" and check the "mirror files" option if
it is not already checked.
-
Once you are using Google Drive and have it configured
properly, create a folder for all of your
Research Methods work in your computer's Documents
folder.
-
Finally, verify that Google Drive is configured
so that it continuously backs up (i.e., syncs to
the cloud) a copy of your entire documents folder
or at least your senior-seminar-work folder.
To tell Google Drive for Desktop to back up a
specific folder, open the program. Click on the (1)
gear icon and then (2) preferences and then (3) "My
laptop" and then (4) "add folder" and then select
the folder you want to back up. If you don't have a
huge amount of material saved in your Documents folder,
just select the whole folder. At a minimum, back up
the folders with your Research Methods work.
Wednesday (10/2)
This week will have very little out-of-class homework so
that you can use this time to review screencasts on what
we are covering in class if you think a review is
necessary. In class, we will continue getting introduced
to SPSS and learn the basics of "data set preparation,"
which is the process of downloading survey data and then
taking, if necessary, a few additional steps to make sure
that we can analyze just the data we want.
-
In Wednesday's class, most of our time will be
spent downloading and taking a preliminary look at a
2022 dataset from PRRI (which we will
download in class). We will also make sure that we
download the survey's methodology and original
questionnaire, taking a close look at both.
PRRI is a nonpartisan think tank that annually
administers its Values Survey, which includes a
large number of interesting topics (for example, whether
or not an American thinks their top leaders need to be
moral in their private lives to be effective in
their public roles). We will be looking at this
particular dataset because it covers so many interesting
topics. For political science and international
relations majors who want to begin thinking about the
research design they will submit at the end of the term,
this may be a good source of data for your project.
We will discuss how we can use
SPSS's Frequency command (Analyze
-> Descriptive Statistics -> Frequencies)
to see how Americans are divided on various issues.
We will also learn how to use
SPSS's "Split File" command [Data -> Split File -> Compare groups
(and then identify the variable whose response categories
you want to compare)]. For example, we can split our
dataset on the partisanship variable and then run a frequency
to see how Democrats, Republicans, etc. differ in how they
answered the questions in the survey. This approach
is a very quick and simple way to get a first look at how
an independent variable (for example people who identify
with different races) corresponds to differences in a
dependent variable (for example, the party they identify
with). Important: After
you have compared your groups of interest, make sure to
go back to the split file and select the option to
analyze the full dataset; otherwise, SPSS will
continue to report all statistics for the subgroups as
long as the program is running.
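The menu steps above correspond to a few lines of syntax. What follows is only a sketch of roughly what the Paste button produces; the variable names partyid and q_leaders_moral are hypothetical stand-ins, so substitute the names that appear in your own dataset.

```spss
* Sketch only: partyid and q_leaders_moral are hypothetical variable names.
* Frequency table for the whole sample.
FREQUENCIES VARIABLES=q_leaders_moral
  /ORDER=ANALYSIS.

* Split the file by party and rerun the same table for each group.
SORT CASES BY partyid.
SPLIT FILE LAYERED BY partyid.
FREQUENCIES VARIABLES=q_leaders_moral
  /ORDER=ANALYSIS.

* Turn the split off so later commands use the full dataset again.
SPLIT FILE OFF.
```

The final SPLIT FILE OFF line is the syntax equivalent of going back to the Split File window and selecting the option to analyze all cases.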
Friday (10/4) will be
spent reviewing some of the places you can find survey
datasets as well as how to use Google Scholar more
effectively and seeing what AI resources like
Consensus can do to help identify interesting research
questions and survey the existing research on a topic.
-
Now, go to the tab on the top banner labeled "Tools
& Resources." Look at the resources for "Survey
Question Search." Try a practice search in
"International Questions." For example, check the
box that will show results only from surveys that
have been administered in Brazil and then search
this term: polit (the partial word search will
return survey questions that include either political
or politics).
-
Now, go back to the tab on the top banner labeled
"Tools & Resources." Look at the section labeled
"Dataset downloads." This is where you can find and
download the public-use datasets noted in many of
the Pew articles you see on the website. This is a
great place to start if you are looking for data on
specific groups or have no idea yet what you'd like
to write a research project on. If you are
interested in doing work on foreign countries, for
example, you'll want to begin looking in the "Global
Attitudes" unit.
-
IMPORTANT: Pew typically embargoes the release of
new survey data for at least a year so its
researchers can publish findings first. This section
of the website will be the best place to start
looking for a thesis topic using Pew data because
the press releases in this section all refer to
datasets that already are available. If you are most
likely to do a topic on American politics, the ATP
(American Trends Panel) data is going to be a good
place to start looking for a topic.
-
Before class, quickly familiarize
yourself with the work of Pew's research units that
focus on specialized topical areas (these are
the topic areas Pew uses to organize the datasets in
the "Dataset Downloads" section that you looked at
earlier). Take a quick look at these Pew Center websites
to get a sense of what kinds of topics the various
research units at Pew are looking at:
Pew Research Center for the People and the Press
(domestic studies on issues other than those tackled
by Pew's special units):
https://people-press.org/
- Ahead of class, you do NOT need to
review materials at the other research centers listed
below, but be aware that there are many other
places where you can locate high-quality, publicly
available datasets at no cost. See some of these
resources at: https://marksetzler.org/generalissues/SurveyDataSources.html.
Students in advanced political science classes have used
datasets from a wide range of sources.
Week 8: No
classes, midterm break!
Week 9
Topic 8—Some SPSS basics
and hands-on practice: Preparing survey datasets for
analysis
On Monday (10/14), we will go over the
steps necessary to drop observations and variables from a
dataset.
-
In class, we
will continue working with the PRRI Values Survey from
2022, which you downloaded previously.
-
Ahead of class,
if you haven't already done so, please make
sure to have an SPSS version of this dataset saved
in your computer's Documents folder.
Specifically, you want to have it saved in a subfolder
there, so that the pathway to your dataset looks
similar to:
Documents/PSC2019_SPSSwork/PRRI_Values_Survey_2022/PRRI20220.sav
If you still need a copy of the dataset, I have put it
and its questionnaire in a zipped file (so you can
download both) in the course's PPTs and Assignments
folder: https://marksetzler.org/ResearchMethods/PPTs/.
The dataset is in a folder labeled "SPSS 1
(downloading and coding data)"
-
Ahead of class, please read
through the first part of this document that
I asked you previously to download and print out (a
handout I have prepared so you have a quick
reference) up to, but not including, SPSS Basics 5:
Recoding Data. This handout covers EVERYTHING we
will be doing in class on Wednesday and Friday.
-
During class, we will be
looking more closely at how the "syntax" window works.
This is where we either manually enter or use SPSS's
point-and-click features to write code telling SPSS to
do something. Specifically, we will be learning how to
use syntax to "prepare a dataset for analysis," which
usually means removing variables and perhaps
observations that are irrelevant to a study.
-
After class,
watch this screencast if you need more guidance on how to create a smaller
version of an SPSS dataset that keeps only some of
the variables (and also how to recreate that dataset
if you later discover you omitted variables you
shouldn't have). The screencast runs a
little over 14 minutes.
-
Important: I left
two important things out of the
screencast. First, you need to make sure to save
your syntax so that you will be able to run it again in
the future if you want to add additional variables
from the full dataset. Name the syntax file
something like: "Syntax to keep only some
variables."
-
Second, you should add a note (starting with an
asterisk) at the top of the syntax reminding your
future self that this syntax needs to be run only
when the full copy_version of the dataset is open in
SPSS.
-
So why do you need to know how to drop most of the
variables from a dataset? If you don't take a few
minutes now to learn how to create a truncated
version of a dataset, every time you generate
statistics for a project, you will have to scroll
down through a large amount of irrelevant information.
And, if you are working with a dataset that has
dozens of similarly named variables, working with
the full dataset increases the likelihood of making
mistakes in your coding and analysis.
Key
ideas from the screencast, so you shouldn't need
to watch it more than once (if at all):
You can create the necessary code to
make a small version of your dataset in two steps:
Step one:
- Open the copy_ version of the full
original dataset.
- If you are using a Mac you need to
have previously configured SPSS for Mac so that SPSS
works the same as it does on a PC.
- Use the File -> "Save As"
command, and click your way to the destination
folder where you want to save a smaller version
of the dataset.
- Do not hit the OK
button. After you give this file a new name, you
are going to use the Paste button to paste the code
into a syntax file. We will add an additional line
to this code in a minute. If you are using a Mac
and do not see a Paste button, you need to verify
that your version of SPSS is configured as
explained in the schedule section related to
installing SPSS.
- Rename the file being saved so that
you know it is a smaller version of the dataset. Dr.
Setzler typically keeps the name of the original
dataset, but adds small_ to the front of the
file name. His system is to have original_, copy_, and
now small_ versions of the dataset.
- Again, after you have renamed the
file, use the Paste
button to paste the code into a syntax file.
- The last step will automatically
open up a new syntax file and paste a command
telling SPSS to save the small_version of the file.
If you select and run that code, it will save the
full dataset, and that's not what you want. You are
going to want to tell SPSS to keep only some of the
variables when you run that SAVE command. Before you
modify your syntax to do that, take a minute and add
a notation (starting with an asterisk so SPSS will
grey out your note and not try to run it) explaining
to your future self what this code does. This is a
good time to remind yourself in the note that, "In
order for this code to run in the future, you need
to have the full copy_version of the dataset
open."
Step two:
- Now, you want to manually change the
syntax you just pasted, telling SPSS to save the
small_ file with just a subset of variables. To do
this, add "/KEEP=" as a subcommand to your SAVE
command, followed by the list of variables to keep. The
line of code that you add should look like the
/KEEP line here:
SAVE OUTFILE='C:\Users\msetzler\Google Drive\SeniorSemPrepDataset\AmerBarUS2018_working.sav'
  /KEEP = Var1 Var2 Var3
  /COMPRESSED.
*Note that
syntax lines that begin with a forward slash are
subcommands. The ONE period for the full command goes
at the very end of the command, which is after the
last subcommand in this case.
- Select just the SAVE code block
(i.e., from SAVE OUTFILE through COMPRESSED.) and
tell SPSS to run the selection. You can do this by
clicking the green arrow in the toolbar or by
right-clicking and selecting Run Selection (on
a Mac, holding the control key down while doing a
touchpad tap is the equivalent of a right-hand mouse
click).
- Verify that you now have a
small_ .sav file in the folder where it is supposed
to be. Open it up and make sure that only the subset
of variables you need is there.
- If all looks good,
add two notes to the top of your syntax. Start
notes in your syntax with an asterisk so SPSS will
grey out the note and not treat it as code. SPSS
sees a period as your indication that the note has
ended. The first note should be a reminder to your
future self that whenever you run this code, you
will need to have the copy_ version of your full
dataset open, and you will open this syntax file
from within that dataset (File -> Open->
Syntax). The second note should be a quick
reminder to yourself of what this syntax does.
- Once everything is
working and you have annotated your syntax, save
the syntax file to the same folder as your
datasets. Name it clearly so you can find it again
in case you need to re-run it in the future.
- These steps are
important because you may need to add or drop
additional variables in your small_ dataset at
some point. To do so, all you will have to do is
delete the current small_ dataset file, change the
syntax as necessary to add new variables you want
to keep, and rerun it. As you saw in the
screencast, this quick step will recreate a
modified small file that has only the variables
you need.
- If all looks good, close the syntax
and all of the open datasets. Do NOT save any
changes to the copy_ dataset if asked to do so.
Going forward, you will be working only with that
small_ dataset.
Pro-tip from the screencast:
- When editing the Keep= line, rather
than copying variable names individually or typing
them in manually, it is a lot faster and way less
error-prone to create a list of variables to keep
using a point and click command:
Analyze->Descriptive statistics
->Descriptives->and then select all of the
variable names you want to keep. Then, paste your
descriptive command into your syntax. Select
just that code and run it (to run it, hit the green
arrow button).
- If the output for the variables
looks good (i.e., you got the right variables
listed), then copy and paste the descriptives
command's list of variables into your /KEEP= code
line.
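The pro-tip above can be sketched in syntax as follows. Var1, Var2, and Var3 and the file path are placeholders rather than names from any real dataset. As far as I know, Descriptives accepts only numeric variables, so any text (string) variables you want to keep would need to be typed into the /KEEP line separately.

```spss
* Sketch only: Var1, Var2, and Var3 stand in for your own variable names.
* Step 1: paste and run a Descriptives command to check the list.
DESCRIPTIVES VARIABLES=Var1 Var2 Var3
  /STATISTICS=MEAN STDDEV MIN MAX.

* Step 2: copy the checked list into the /KEEP line of the SAVE command.
* The file path below is a placeholder, not a real location.
SAVE OUTFILE='C:\Users\yourname\Documents\PSC2019_SPSSwork\small_dataset.sav'
  /KEEP=Var1 Var2 Var3
  /COMPRESSED.
```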
Another pro-tip from the
screencast:
- When you are in the
Descriptives selection window (or any command
listing all of the variables), it is a lot faster to
find the variables you are looking for if you
right-hand click on the variables (Mac users need to
press the control key and click at the same time) to
first show variable names and then to sort those
names alphabetically. This trick works in all of the
menu's dialog windows where there is a long list of
variable labels listed in whatever order they appear
in the dataset.
- After class, watch this screencast (doing so is
optional) if you
need more guidance on preparing a dataset so that you
will analyze only a subset of a survey's respondents.
It is about 17 minutes long, so you may want to try
following the written instructions below first. Why do
you need to know how to do this? For example, when
international relations majors are using a Global
Attitudes or a LatinoBarometer survey, they often
want to look only at respondents from certain countries, and
running statistics on the full dataset would return
results for all of the countries in the survey.
Key steps noted in the screencast:
(1) Open the copy_ version of your dataset
(2) File -> open -> syntax... and open the syntax
file you used to create the small_ version of your
dataset (the one where you have removed many variables).
(3) If your small_ dataset does not have all of the
variables you will need to tell SPSS to keep only
certain types of respondents, add those variables to
your syntax and re-run it to recreate your small_
dataset.
(4) At the bottom of your syntax, make a notation,
starting with an asterisk, indicating which types of
observations you will be dropping
(5) With the copy_ version of the dataset open and the
small_version closed, Point-click-paste: File -> Open
-> Data -> and select but do not actually open the
small_ version. Use the paste button to tell SPSS to
paste the "GET" command into the bottom of your syntax.
That syntax will tell SPSS to go get and open the small_
version of your dataset when run. The reason why this
step is important is that you are going to be automating
an SPSS command to get, open, and then change the
small_version of the dataset rather than your full
copy_version.
(6) Select and run the GET command you just pasted.
Doing so will open the small_ version of the dataset.
Then close the copy_ version of your dataset. Again, this
is to make sure you don't change and save changes to the
full version by mistake when you need to be modifying
the small_ version only.
(7) Now, you can create the code that will keep only
some observations. Use point-click-paste to work your
way through the Data->Select Cases command. You will
need to enter logical criteria specific to your needs in
the "Select If" module and then hit continue. You then
will need to tell SPSS to "Delete" all of the "unused
cases." Remember to
paste this code into your syntax.
(8) Now you need to automate the process of saving your
small_ dataset every time this full syntax file is run.
To do so: Point-click-paste: File -> Save As ->
Data -> and select but do not open the small_
version. Paste this code into your syntax.
(9) Select and run the "Filter off" code all the way
down to the end where you save the dataset. Take a look
at the small_ dataset and make sure that it has only the
variables and observations it is supposed to have.
(10) Save your revised syntax to write over the first
version. Now, if you need to make changes in the future,
you will just open this syntax from within the copy_
dataset, make your edits, and re-run the syntax.
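Put together, the finished syntax file might look something like the sketch below. The paths, file names, and the variables Var1 and Var2 are placeholders, and the country codes are simply the ones used in the France/UK/US example in this section; your pasted commands will contain your own paths and names.

```spss
* Run this syntax only when the full copy_ version of the dataset is open.
* Sketch only: paths, file names, and variable names are placeholders.

* Part 1: save a small_ version that keeps only some variables.
SAVE OUTFILE='C:\Users\yourname\Documents\PSC2019_SPSSwork\small_survey.sav'
  /KEEP=country Var1 Var2
  /COMPRESSED.

* Part 2: open the small_ version so that changes are made there.
GET FILE='C:\Users\yourname\Documents\PSC2019_SPSSwork\small_survey.sav'.
DATASET NAME small_data WINDOW=FRONT.

* Part 3: keep only respondents from the countries coded 17, 2, and 4.
FILTER OFF.
USE ALL.
SELECT IF (country = 17 OR country = 2 OR country = 4).
EXECUTE.

* Part 4: save the trimmed small_ version over the earlier one.
SAVE OUTFILE='C:\Users\yourname\Documents\PSC2019_SPSSwork\small_survey.sav'
  /COMPRESSED.
```

Because every step lives in one file, adding a variable later means editing only the /KEEP line and re-running the whole file from within the copy_ dataset.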
If all of this seems like
a lot of work, you don't have to automate things as I
suggest above and in the screencast. Here's a faster
way:
(1) Make a copy of the small_version of your dataset. If
you accidentally delete the wrong observations, you want
to have a backup. You will not need this backup unless you
delete the wrong observations and then save over the
small_version of the dataset, too.
(2) Now, open the small_ version of your dataset that has
most of the variables removed.
(3) Create and run the few lines of code needed to remove
observations from the small_ dataset. To do so, go Data
-> Select cases -> Select if. Then identify the
conditions that need to be met for an observation to be kept.
In the screencast, observations are kept if (partyID7 =1 OR
partyID7 = 2 OR partyID7 =3 OR partyID7
>4). If all looks good, select the button telling SPSS
that you will delete unused cases AND paste this code into
syntax.
(4) Run your code and then use a frequency command to look
at who is in your dataset. If you have removed all and
only the observations you wanted to remove, save the
edited small_ dataset, replacing your previous version.
This version has fewer variables and fewer observations
than the copy_ dataset.
(5) If you remove observations this way, make sure to
include annotations at the start of your new syntax
to remind your future self what this block of code
does.
(6) If you need to add additional variables to your small_
dataset in the future, remember that you will need to run
this second syntax file to remove the relevant
observations and save the file again.
Here's an example of what your syntax would look like if
you wanted to include only respondents living in France,
the UK, and the US (assuming the variable "country" is
coded 17, 2, and 4 for French, British, and American
respondents, respectively):
*This code
will drop any respondent who is not from one of those
three countries.
FILTER OFF.
USE ALL.
SELECT IF (country = 17 OR country = 2 OR country = 4).
EXECUTE.
*Here is
a second example, but this time for the select line,
let's use "AND" to tell SPSS to keep observations only
if they meet two conditions (here, they are male
Republicans).
SELECT IF (Party=2) AND (Male=1).
If we wanted to keep only males who
were Republicans or Democrats (but not independents or
other responses), the select line would need an extra
set of parentheses around the two party conditions
because SPSS evaluates AND before OR:
SELECT IF (Male=1) AND ((Party=1) OR (Party=2)).
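When a select line mixes AND and OR, the grouping of the conditions matters because SPSS evaluates AND before OR. The sketch below assumes hypothetical variables Male (1 = male) and Party (1 = Democrat, 2 = Republican); the ungrouped line is commented out so that it does not run.

```spss
* Sketch only: Male and Party are hypothetical variables.
* Assumed coding: Male 1 = male, Party 1 = Democrat, Party 2 = Republican.

* Without inner parentheses, SPSS evaluates AND before OR.
* This line would keep male Democrats plus ALL Republicans.
* SELECT IF (Male=1) AND (Party=1) OR (Party=2).

* Grouping the OR conditions keeps only males in either party.
SELECT IF (Male=1) AND ((Party=1) OR (Party=2)).
EXECUTE.

* Check the result: both tables should show only the expected codes.
FREQUENCIES VARIABLES=Male Party.
```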
Important: If we have a key variable in
our study and many respondents were not asked the
relevant question, we want to delete all observations
in the dataset that have "missing" data for that
variable. If we don't do this, descriptive
statistics for all of the other variables will analyze
lots of respondents who will not be included in
bivariate and multivariate analyses. In this example, I
want to remove all respondents who did not answer a
question about whether they support making public
colleges free to attend because the relevant question
was presented to only half of the people taking the
survey:
*This code drops any
respondent who has missing data on the variable
"free_college_good".
FILTER OFF.
USE ALL.
SELECT IF (NOT MISSING(free_college_good)).
EXECUTE.
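Because SELECT IF permanently deletes cases, it can help to preview the effect first. One way to do that, offered here only as a sketch, is the TEMPORARY command, which applies the selection to the next procedure and then restores the full dataset. The variable party_id is a hypothetical example.

```spss
* Sketch only: party_id is a hypothetical second variable.
* TEMPORARY limits the selection to the next procedure only.
TEMPORARY.
SELECT IF (NOT MISSING(free_college_good)).
FREQUENCIES VARIABLES=party_id.
```

The total number of cases reported in this table should roughly match the number of respondents who were asked the free-college question; no cases are deleted by this preview.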
On
Wednesday (10/16), you will practice what you have
learned about SPSS so far by starting to complete a
Blackboard assignment in class.
-
SPSS assignment #1 will require you to locate
and download an original PRRI dataset, make a copy_
version of that dataset, and then make a small_
version that drops certain variables and observations.
You also will need to split the dataset into groups
and practice running descriptives and frequency
commands. The assignment will be written so that most
students should be able to complete their work in
class; however, I will post it after class on
Wednesday in case you want to get a head
start.
-
Barring unusual circumstances, SPSS assignment #1 (BlackBoard) must
be completed by 5pm Friday (10/18). If you
have a unique situation related to how SPSS is working
on your computer, please contact Dr. Setzler after you
have carefully tried to address the issue yourself and
carefully reviewed the instructions/screencast links
above. It is better for all involved for you to submit
a late assignment (that will receive a modest penalty)
rather than one that is partially completed.
Topic 9 (10/18 and 10/21) Recoding and
creating variables; labeling variables and their
response categories
On Friday and Monday, we
will learn how to create variables in SPSS. You will be
completing an assignment on this topic on BlackBoard. It
will be due at the end of Week 10.
-
Ahead of
Friday's class, read Chapter 7 in
your textbook up to the section on descriptive
statistics. This reading is just five
pages; it will help you understand what we will be
doing this week and why.
-
Also before class, read the rest of this document, starting at the
section "SPSS Basics 5: Recoding Data." This is the
first handout of three documents that your instructor
has written to cover all of the SPSS coding and
statistical methods we use in PSC 2019 and PSC 4099.
The handout:
-
Explains why researchers typically recode or create
every variable in their analyses rather than working
with a dataset's variables in their original
form.
-
Summarizes everything we have done with SPSS so
far, which mostly has involved the process of
finding and downloading data and--if
necessary--importing the data into SPSS, removing
variables, and removing observations not related to
our study in any way.
- Explains in detail how to recode an original
variable. You will find this information helpful when
you are completing the Blackboard assignment at the
end of the week.
One of the main topics in this block of material is how
to use SPSS's RECODE INTO command. Note: everything
that is discussed below in this section will be taught in
class. You have no assigned readings on this block of
material other than reading through the points that are
listed below. Any linked screencasts are optional; I
record screencasts that duplicate class material because
some students prefer visual summaries to the written ones I post
in the course schedule.
-
The RECODE INTO command is your go-to
method when you are working with only one original
variable and want to collapse its categories
(perhaps into a dummy variable, like making the 0-1
variable Republican out of the original variable where
Republican is one of several partisan groups coded
onto the same variable). RECODE INTO is also a command
we can use to reverse code a variable's values (e.g.,
using Conservative7 to make the new variable
Liberal7). Here are the key steps to recoding a
variable in SPSS:
(1) First, use SPSS
to run a CODEBOOK command on the original variable,
Conservative7. You need to do this in syntax. It will
show you how the variable is numbered and labeled:
CODEBOOK
Conservative7.
(2) Next, if it is
available, it is helpful to paste the original
variable's exact question wording into your
syntax so that you can easily recall later exactly how
the question was phrased.
You can write or copy and
paste annotations into the syntax right before the
RECODE INTO command. If you put an asterisk in front of
an annotation with question wording pasted from the
questionnaire, it will be greyed out so you can see it,
but SPSS won't stall when it gets to that part of the
syntax. SPSS will keep greying out text until it reaches
a blank line or a line that ends with a period. So, when you are
ready to end the annotation, type a period and press
return.
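Here is a minimal sketch of what an annotation like this could look like in a syntax file. The question wording is hypothetical and is included only for illustration; it is not taken from an actual questionnaire. Notice that the comment runs over several lines and ends only at the period on its last line.

```spss
* Question wording (hypothetical, for illustration only):
  "Where would you place yourself on a scale from 1 (very liberal)
  to 7 (very conservative)?".
CODEBOOK Conservative7.
```

If one of the middle lines of the annotation happened to end with a period, SPSS would treat the rest of the annotation as a command and produce an error, so it is worth checking where the greyed-out text stops.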
(3) Start to recode the
variable by using SPSS's point-and-click interface,
going to Transform -> Recode into Different Variables.
You want to use the RECODE INTO command,
which preserves all aspects of the original variable in
case you make any recoding mistakes.
For this example, you will
need to tell SPSS that you are going to transform the
original variable Conservative7 into a new variable
named Liberal7.
In the same screen, you will
provide a label for the new variable. Variable names
cannot have any spaces. Make sure to name and label new variables
in a way that makes sense and is specific enough that
you will remember what a higher value is (e.g., don't
name and label a variable "Gender" if you are creating a
dummy variable for people who identify as male because
you may forget who is coded 1 or 0 later on).
After you label your
variable, click the button "Change." Nothing will seem to
happen, but the new variable will have that label when
it is created later.
(4) Now, click the button
for "Old and New values." This is the area where you do
all of the actual recoding. Make sure that you think
through how you need to recode. You need to recode every
value in the original variable into some value (or
system missing) in the new variable.
(5) It is best practice in
recoding to start off by indicating that any value that
was "system or user missing" in the old variable should
be coded as "system missing" in the new variable. Choose
those options and click on the add button.
(6) Now you need to recode
the values. Enter 7 in the old variable = 1 in the new
one, and click on the add button. Then enter 6=2 and
click "add," and so on until all seven of the values are
reversed for the new variable (including 4=4 for the midpoint).
(7) Once you have made sure
to include instructions that will recode ALL of the
original values into the new variable, click on continue
and then paste. If you accidentally click on the OK
button, just point and click your way back through the
transform command where all of the data you just entered
will still be there, and this time click on paste.
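For reference, the pasted syntax for a 7-point reverse code should look roughly like the sketch below. This assumes the original variable is Conservative7 with values 1 through 7 and that the label typed in step 3 was "How liberal is the respondent?"; your pasted code may be laid out a little differently.

```spss
* Reverse code the 7-point ideology item into a new variable.
RECODE Conservative7 (MISSING=SYSMIS) (7=1) (6=2) (5=3) (4=4) (3=5) (2=6) (1=7) INTO Liberal7.
VARIABLE LABELS Liberal7 "How liberal is the respondent?".
EXECUTE.
```

The (4=4) instruction is there on purpose: when you recode into a new variable, any original value that you do not mention ends up system missing on the new variable.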
(8) To create the new
variable, you have to select and run the code you just
pasted into syntax (run with the green arrow on the
menu). Whenever you create a new variable, do what you
can to make sure you did the recoding correctly. To do
this run a frequency for the original and new variable
and make sure that everything looks right. If you didn't
code the variable correctly, look at your syntax and see
what went wrong. If you need to edit the syntax, you can
run the whole RECODE INTO command over again, and it
will recreate the variable (this is a cool improvement
over older versions of SPSS, which made you delete a
variable before you could recreate it).
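One way to do this check in syntax is sketched below, using the Conservative7 and Liberal7 variables from this example. The frequency tables let you compare the counts, and the crosstab shows directly which old value ended up in which new value.

```spss
* Compare the distributions of the original and recoded variables.
FREQUENCIES VARIABLES=Conservative7 Liberal7.
* Each old value should line up with exactly one new value.
CROSSTABS /TABLES=Conservative7 BY Liberal7.
```

If the recode worked, the count for the highest value on the original variable should match the count for the lowest value on the new one, and so on down the scale.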
(9) Whenever you
make a new variable, you need to label both it and the
variable's response categories. The fastest
way to label variables is to do so in syntax. If you are
using RECODE INTO, you already will have
created the code to label your variable back in step 3.
However, you also will need to add value labels to any
new variable you create. Here's what that code looks
like if we want to label just the anchors on our
7-point measure of liberalism (if we were working with a
dummy, multi-category, or ordinal variable we would
label each value):
VARIABLE LABELS Liberal7 "How liberal is the respondent?".
VALUE LABELS Liberal7
1 "Very conservative"
7 "Very liberal" .
Notice that there are two
periods for the two label-related commands. There is
only a single period at the very end of each command,
and it goes outside of any quotation marks or parentheses.
Note that if you want to
change the variable label or any of its value labels,
you can make the edits and just rerun the commands. This
is one of the big advantages to working in syntax.
Finally, notice that once
you have created the recode+relabel syntax for a
variable or two, you can copy and paste that syntax and
quickly swap out values and phrasing to very quickly
create a whole bunch of similar variables.
We also will
review the other technique most commonly used to change
and create new variables in SPSS: Using a
combination of COMPUTE and IF commands in syntax. Note: everything that
is discussed below in this section will be taught in
class. You have no assigned readings on this block of
material other than reading through the points that are
listed below. Any linked screencasts are optional; I
record screencasts that duplicate class material because
some students prefer visual summaries to the written ones I post
in the course schedule.
-
If you prefer, you can use the COMPUTE command using
SPSS's point-click-paste commands (see the Transform
menu). However, it is a lot clearer and easier to
write these commands directly into syntax.
-
You can use COMPUTE
+ IF commands to recode variables instead of using RECODE INTO. If you are
making many similarly coded new variables, either type
of SPSS command works well. To get you acquainted with
using COMPUTE and IF statements, we will practice
using them to recode a few variables in class. This
technique will also be taught/reviewed as part of your
BlackBoard assignment. After this week's classes,
if you need a refresher, here is an optional screencast that goes over
the technique (under 9 min if you listen at full
speed). One important thing: After I
recorded the screencast, SPSS changed how you deal
with missing data on the original item. See the
IF MISSING(...) line in the code below.
Here is a summary of what is
covered in the screencast, only it uses an example (a
dog's breed and color) that may be easier to remember:
Let's say we have a sample
of dog owners who all own one dog. They have taken a
survey asking questions about their dog. We want to
create a hypothetical new dummy variable identifying the
owners of white poodles. Our new dummy variable,
WhitePoodle, will be created with information from two
original variables: dog_breed and dog_color.
A CODEBOOK command indicates the following:
*dog_breed = 12 if the dog is a poodle, while the other values from 1
through 55 refer to other dog breeds.
*dog_color = 1 if a dog is white, while values 2 through
10 refer to black, brown, grey, spotted brown, etc.
*If a respondent didn't know or refused to give
information about their dog, the relevant variables were
coded 99 in the dataset.
Here is the block of code
we could use to create the dummy variable:
COMPUTE WhitePoodle = $SYSMIS.
IF (dog_breed < 99) AND (dog_color < 99) WhitePoodle = 0.
IF (dog_breed = 12) AND (dog_color = 1) WhitePoodle = 1.
* We also need to deal with any missing data on the original variables.
IF MISSING(dog_breed) OR MISSING(dog_color) WhitePoodle = $SYSMIS.
VARIABLE LABELS WhitePoodle "Dog is a white poodle".
VALUE LABELS WhitePoodle
1 "White poodle"
0 "Not a white poodle" .
Note where the periods are. The annotation line that starts with an
asterisk has to end with a period, too; otherwise SPSS would treat
the IF MISSING command that follows it as part of the annotation.
Note that COMPUTE, IF,
VARIABLE LABELS, and VALUE LABELS are all commands, so
one--and only one--period goes at the end of each full
command even if that command stretches over more than one
line of code.
Here is the logic behind how the code was written, step by
step:
Step 1
To see how the responses for dog_breed and dog_color are
coded and labeled in the dataset, we run a codebook command:
CODEBOOK dog_breed dog_color.
Step 2
Now, tell SPSS to create
a new variable where all of the values are blank (i.e.,
are "system missing"):
COMPUTE WhitePoodle = $SYSMIS.
If we were to look at the data set in the SPSS data
viewer (Data View tab), we would see a new column for a
variable named "WhitePoodle." All of its values would be
blank.
Step 3
Tell SPSS to turn almost all
of those blank values into a zero if the observation
should not be system missing:
IF (dog_breed < 99) AND (dog_color < 99) WhitePoodle = 0.
Notice this step changes the blank to a zero (meaning
not in our group) for every respondent who gave a valid
answer to both questions (i.e., who didn't say "don't know"
or refuse to answer). Now, if we were to look at the data
set in the SPSS data viewer (Data View tab), we would
see most of the values for "WhitePoodle" are zero.
Step 4
Now, tell SPSS to
turn some of the respondents' values into a 1 if certain
conditions are met:
IF (dog_breed = 12) AND (dog_color = 1) WhitePoodle = 1.
This step turns the zeroes into ones for people
who have white poodles.
Notice that the order of commands matters! If we had
done Step 4 before we did Step 3, everyone who didn't
refuse to answer the questions would be coded as not
having a white poodle.
Step 5
Now, make sure to tell SPSS to turn some of the respondents' values into
missing data if they had missing data on either of the
original items:
IF MISSING(dog_breed) OR MISSING(dog_color) WhitePoodle = $SYSMIS.
Step 6
Now, tell SPSS to label the new
variable and its response categories. For
example:
VARIABLE LABELS WhitePoodle "Dog is a
white poodle".
VALUE LABELS WhitePoodle
1 "White poodle"
0 "Not a white poodle" .
Step 7
Run a frequency on the old
and new variables to verify that the general pattern and
number of observations look right. If you made a
mistake, read carefully through the code. If you make
edits, you can run the block of code over again; SPSS will
delete the variable and replace it with the edited version.
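A sketch of one way to run this check for the dog example is below. It uses only the variables named above. The TEMPORARY and SELECT IF commands are an extra step that is not covered in the summary: they limit the very next procedure to the cases coded 1 so that you can see which breed and color codes those cases carry.

```spss
* Compare the original variables with the new dummy variable.
FREQUENCIES VARIABLES=dog_breed dog_color WhitePoodle.
* For the next procedure only, look at just the cases coded 1.
TEMPORARY.
SELECT IF (WhitePoodle = 1).
FREQUENCIES VARIABLES=dog_breed dog_color.
```

In the second set of tables, every case should show the breed code for poodles and the color code for white. If any other codes show up, one of the IF commands needs to be fixed.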
-
COMPUTE + IF commands also are the best way to
create a couple of other types of variables that
researchers frequently use, including reverse coding
variables with lots of response categories.
While you can use RECODE INTO to reverse code a
10-point measure, it is a lot faster to use code like
this:
COMPUTE New_Variable10 = 11 - Old_Variable10.
IF Old_Variable10 = $SYSMIS New_Variable10 = $SYSMIS.
IF Old_Variable10 = 99 New_Variable10 = $SYSMIS.
IF MISSING(Old_Variable10) New_Variable10 = $SYSMIS.
*Notice the COMPUTE line's logic. We're
telling SPSS to create a new variable by subtracting
the original variable's value from
11, which is one higher than the maximum value.
This means a 10 on the old variable will be a one on the
new variable while a 1 will become a 10, and so on.
*Notice also that we have to tell SPSS which values
should be coded as missing on the new variable. Here, we're assuming 99 on the
original variable means that the respondent refused to
answer or said they "didn't know."
After the week's
class meetings, if you want to review more details on
reverse coding a variable using COMPUTE + IF commands,
here is an optional
screencast on the topic. It runs a bit over
10 minutes, but only the first 6:30 covers the reverse
coding. In the sample code directly above, you'll
see that I do something that I didn't do in the
screencast, but that is important; I added a line of
code to make sure that any missing responses on the
original item carry over to the new item:
IF MISSING(Old_Variable10) New_Variable10
= $SYSMIS.
I recorded the rest of the screencast because it shows
your instructor making a very common mistake when adding
variables (initially, both the VARIABLE LABELS and VALUE
LABELS commands neglected to tell SPSS what variable
needed to be labeled).
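The same subtraction logic works for a scale of any length: subtract the original value from the sum of the lowest and highest values. As a sketch, here is how it might look for the 7-point ideology item used earlier (1 + 7 = 8). This assumes Conservative7 runs from 1 to 7 and has no leftover codes such as 99; if it did, those would need their own IF line as in the 10-point example.

```spss
* Reverse code a 1-to-7 item: new value = 8 minus old value.
* So 7 becomes 1, 1 becomes 7, and the midpoint 4 stays 4.
COMPUTE Liberal7 = 8 - Conservative7.
IF MISSING(Conservative7) Liberal7 = $SYSMIS.
VARIABLE LABELS Liberal7 "How liberal is the respondent?".
VALUE LABELS Liberal7
1 "Very conservative"
7 "Very liberal" .
EXECUTE.
```

Running a frequency on Conservative7 and Liberal7 afterwards is a quick way to confirm that the counts flipped as expected.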
In the likely event that you need help
trouble-shooting issues that come up with labeling new
variables and their response categories, here is a list
of common issues:
- Building on the example from above, we might label
our new variable:
VARIABLE LABELS WhitePoodle "Dog is a white poodle".
EXECUTE.
What to check if you can't get SPSS to create a
new variable's label:
-
Note that the VARIABLE LABELS command is plural
even when you are just labeling one variable. If
they made your instructor king for a day, the
command would run just fine with or without that
last S; however, his promotion to regality does not
appear to be forthcoming.
-
Note where the quotation marks go. SPSS accepts
single quotation marks, but your instructor always uses
double quotation marks, which allows him to “Put single
‘marks’ inside of a label.”
-
Notice also that each line begins with a command
and ends with the required period. There is only one
period per full command.
-
Notice that you have to tell SPSS which variable
you want to label. In this case: WhitePoodle
-
The EXECUTE command tells SPSS to go ahead and do
this work now rather than waiting until another
command is run. This is a leftover “feature” from
when computers used to be SLOW and no one wanted to
sit around waiting for the program to label
variables until those variables were actually going
to be used in some type of analysis.
-
Finally, if you don't select and run the full block
of code, the variable won't be labeled.
What to check if you can't get SPSS to create the
response labels:
-
You can't have extra lines in the command.
-
Folks accidentally substitute VARIABLE LABELS from
above instead of using the VALUE LABELS command.
-
Folks labeling a variable try to use VALUE LABEL
without the final “S,” in which case SPSS doesn’t
recognize the command.
-
Folks forget to put at least one space between each
number and its label.
-
Folks add an equal sign (e.g. 1 = “yes”) or forget
the quotation marks for each label.
-
Folks add an extra period after WhitePoodle,
despite being mid-command.
-
Folks forget to include the period after the last
label. Your instructor typically puts the period on a
separate line so it is easier to copy and repurpose
old code with more or fewer categories without
forgetting to add the required period after the last
label.
-
Note that the EXECUTE command and its period need
to be there.
-
And, finally, sometimes people create all of the
code correctly but forget to select and run it.
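Pulling the checklists above together, here is a sketch of a complete labeling block for the WhitePoodle example. It follows the habit described above of putting the final period of VALUE LABELS on its own line.

```spss
* One command per block, and one period at the end of each command.
VARIABLE LABELS WhitePoodle "Dog is a white poodle".
VALUE LABELS WhitePoodle
  0 "Not a white poodle"
  1 "White poodle"
.
EXECUTE.
```

To reuse the block for a variable with more categories, add one line per value above the final period and swap in the new variable name in both commands.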
Week 10
In Monday's class, we will finish up the
workshop on coding variables and you may have some lab
time to work on an SPSS assignment on BlackBoard. SPSS
Assignment #2 will be due at the end of the week (i.e.,
by Friday evening).
Note: As of Monday's class, we are two weeks out from
your Unit 2 test. Make sure to take a look at the
study guide if you have not been following along all unit!
Topic 10 (Wednesday, October 23)—Descriptive statistics
for a variable: Frequencies, medians, means, and
standard deviations (aka, just enough univariate
stats to make sure that your variable coding has been done
correctly and that you can describe a variable's central
tendency and its distribution)
- Read closely this
appendix from a best-selling book targeting a
non-specialist, general audience (Richard J.
Herrnstein and Charles Murray, "Statistics for People
Who Are Sure They Can’t Learn Statistics," 12pp).
-
Read closely Chapter
7, "Getting Started with Quantitative Data:
Descriptive Statistics." in Carolyn
Forestiere's textbook. After the classes that meet on
the substance of this chapter, you should be able to
explain, apply, and give an example of every concept
in the glossary.
-
Make sure to review
the summary of research example by Ceka and
Magalhaes (you do not need to scrutinize the
summary of Park and Shin's article). In addition to
paying attention to how different variables were coded
in the study, review the questions in the study guide,
which ask you to think about this study's research
design. Specifically, see if you can identify the
theory, dependent variable, independent variable,
intervening variable (which is assessed with three
country-level measures), and controls.
-
For each of the statistical techniques we cover in
this class, your instructor will summarize the key
ideas that you need to remember in a block of
material, like the one you see immediately below.
Review these blocks of material carefully until you
understand every point well. The reason that I
am summarizing this material in the schedule is so
that you can go to one place to review every major
statistical concept. These same ideas are covered in
class, in PPTs, and (for the most part) in assigned
readings.
After class and completing your
assigned readings, make sure that you feel
comfortable with your understanding of a number of
the basic concepts and methods that researchers use
to explore the "central tendency" and distribution
of different types of variables. These
concepts will come up continuously for the rest of
the term. Here are some big takeaways that you
will want to remember:
-
The type of variable you are analyzing
matters. Means, medians, and standard
deviations communicate valuable information
about interval (aka "continuous")
variables. With an interval variable, every
one-unit increase is assumed to be equal in
influence. When that is not the case, interval
variables are often modified so that a one-unit
increase is roughly equal in importance. For
example, researchers typically replace years of
formal schooling with an interval variable that
looks something like this: 1=no high school degree;
2=high school degree; 3=some college or 2-year
degree; 4=4-year college degree; 5=Master's degree;
6=Doctoral degree.
-
Standard deviations are a widely used measure
(in all fields) to look at the distribution of
an interval variable
across its range of values.
A straightforward way to
explain a standard deviation (SD) is to say that it is a
measure of how far away from a variable's mean most
respondents' answers are scattered. If a variable is normally
distributed, about two-thirds (68%) of
all cases will be within one standard deviation of the
mean: roughly a third of cases fall between the mean and one
SD below it, and roughly a third fall between the mean and one
SD above it. And almost all
respondents fall within two standard deviations below or
above the mean. Specifically, about 95% of observations in a
normal distribution fall within two standard deviations.
An observation that is three standard deviations below or
above the mean would be an extreme outlier (i.e., in roughly the
bottom or top 0.15% of cases, because about 99.7% of
observations fall within three standard deviations of the mean).
Standard deviations are useful for comparing the
distributions of two populations on the same variable.
So, if we have two cities with the same mean and median
income but standard deviations that differ
by tens of thousands of dollars, the city with the
higher standard deviation also has a higher degree of
inequality. In short, a small standard deviation means
that most observations are clustered around the mean,
while a larger standard deviation means that more
observations are further away from the mean.
Standard deviations also are useful for comparing
observations on two different variables. For example, we
might say that someone whose LSAT (law school admissions
test) score was 1.7 standard deviations above the mean did
better relative to other test takers than someone whose score on the GRE
(the exam many graduate schools require) was 1.5
standard deviations above the mean.
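A score expressed as "standard deviations above or below the mean" is often called a z-score, and SPSS can compute it for you. The sketch below assumes a dataset with a test-score variable named LSAT, which is a hypothetical name used only for illustration. The /SAVE subcommand adds a new variable, ZLSAT, that holds each case's z-score.

```spss
* LSAT is a hypothetical variable name used only for illustration.
DESCRIPTIVES VARIABLES=LSAT
  /SAVE
  /STATISTICS=MEAN STDDEV MIN MAX.
* Flag the cases within one and within two SDs of the mean (1 = yes, 0 = no).
COMPUTE Within1SD = (ABS(ZLSAT) <= 1).
COMPUTE Within2SD = (ABS(ZLSAT) <= 2).
FREQUENCIES VARIABLES=Within1SD Within2SD.
```

If the variable is close to normally distributed, the share of cases coded 1 should be roughly two-thirds for Within1SD and roughly 95% for Within2SD.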
Why use standard deviation statistics when you could
use percentiles? Social
scientists, economists, financial planners, and lots of
other professionals use standard deviations to examine
distributions of data because just looking at
percentiles can be misleading. For example, someone with
an LSAT score of 150 (out of 180 possible points) is in
the 39th percentile of takers. Just a 3-point higher
score puts a test-taker in the 50th percentile, and just
three points higher than that (a 156) puts the taker in
the 61st percentile. So, at the center of the
distribution, a 6-point gain in the LSAT score
corresponds to a 22-point gain in percentiles. In
contrast, going from a score of 170 (96th percentile) to
a 176 increases a test-taker's percentile by just three
points (to the 99th). Put another way, a
three-point higher or lower score matters much more in
percentile terms near the middle of the distribution than
it does near the top.
- When an interval variable's distribution is
highly skewed, researchers typically cap the highest
or lowest values so that the mean and the median are
closer and so that the mean doesn't distort what the
typical respondent looks like. For example,
income in the US has a large, positive skew because a
relatively small number of individuals earn a tremendous
amount of money compared to everyone else. This means
that the mean income in the US is a lot higher than
the median unless we cap the top level of income by,
let's say, creating a top-income category of $200K or
more. When we truncate the top income value by
grouping folks like Bill Gates, Donald Trump and Jeff
Bezos together with much less affluent, but still
quite wealthy people, the average income shifts back
closer to the median income, and the standard
deviation decreases. Most research on income uses
medians rather than means; however, most statistical
analyses incorporate means, so researchers frequently
need to modify their original variables to address
these kinds of skews.
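Here is a sketch of how capping could be done in SPSS. It assumes an income variable named hh_income that is measured in dollars; both the variable name and the new variable's name are hypothetical. The $200,000 cap comes from the example above.

```spss
* hh_income and income_capped are hypothetical variable names.
RECODE hh_income (MISSING=SYSMIS) (200000 THRU HIGHEST=200000) (ELSE=COPY) INTO income_capped.
VARIABLE LABELS income_capped "Household income, top-coded at $200K".
* Compare the mean, median, and SD before and after capping.
FREQUENCIES VARIABLES=hh_income income_capped
  /FORMAT=NOTABLE
  /STATISTICS=MEAN MEDIAN STDDEV.
```

With a positively skewed variable, the capped version should show a mean that sits closer to the median and a smaller standard deviation. As long as fewer than half of the cases are at or above the cap, the median itself does not change.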
-
The means and standard
deviations of a categorical (aka nominal)
variable are useless because the values
assigned to a given category are arbitrary for
this type of variable. We
can use a frequency table or chart to explore the
distribution of a categorical variable, but for
statistical analyses, this type of variable
typically is recoded and analyzed as
a series of separate dummy variables. For
example, if you have a party variable with three
categories, a mean of 1.3 doesn't mean anything, so
you'd want to create three dummy variables, one for
each party. Remember, a dummy variable is coded
1 if a respondent is in the group and 0 if they
are not. If your analysis would benefit
from also summarizing the distribution of a
categorical variable, you
will want to make a separate table or figure
reporting its "frequencies" (in percentages).
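As a sketch, here is how the COMPUTE + IF approach could be used to turn a three-category party variable into three dummy variables. The variable name party3 and its coding (1 = Democrat, 2 = Republican, 3 = Independent, 99 = don't know or refused) are assumptions made up for this illustration; always check the real coding with CODEBOOK first.

```spss
* party3 and its codes are hypothetical.
* Assumed codes are 1 Democrat, 2 Republican, 3 Independent, 99 unknown.
COMPUTE Democrat = $SYSMIS.
IF (party3 < 99) Democrat = 0.
IF (party3 = 1) Democrat = 1.
IF MISSING(party3) Democrat = $SYSMIS.

COMPUTE Republican = $SYSMIS.
IF (party3 < 99) Republican = 0.
IF (party3 = 2) Republican = 1.
IF MISSING(party3) Republican = $SYSMIS.

COMPUTE Independent = $SYSMIS.
IF (party3 < 99) Independent = 0.
IF (party3 = 3) Independent = 1.
IF MISSING(party3) Independent = $SYSMIS.

VARIABLE LABELS Democrat "Respondent identifies as a Democrat".
VARIABLE LABELS Republican "Respondent identifies as a Republican".
VARIABLE LABELS Independent "Respondent identifies as an Independent".
VALUE LABELS Democrat Republican Independent
  0 "Not in this group"
  1 "In this group"
.
* Check the new dummies against the original variable.
FREQUENCIES VARIABLES=party3 Democrat Republican Independent.
```

The number of cases coded 1 on each dummy should match the count for the corresponding category in the party3 frequency table.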
-
Remember how to interpret the means of dummy
(aka "binary," "dichotomous," and "0/1") variables.
By convention, we report the standard deviation for
each dummy variable, too, even though the SD
information for a dummy variable is not useful. By
itself, the mean value for a dummy
variable reports its distribution in your sample;
e.g., a value of .37 for the dummy variable Democrat
indicates that 37% of the sample identifies as a
Democrat.
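You can confirm this for yourself in SPSS with the two commands sketched below. They assume you already have created a 0/1 dummy variable named Democrat, as in the example above.

```spss
* The mean of a 0/1 variable is the proportion of valid cases coded 1.
DESCRIPTIVES VARIABLES=Democrat /STATISTICS=MEAN STDDEV.
* The valid percent for the value 1 should match that mean.
FREQUENCIES VARIABLES=Democrat.
```

For instance, a mean of .37 in the DESCRIPTIVES output should line up with a valid percent of 37 for the value 1 in the frequency table.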
-
What do
researchers do to summarize the main
characteristics of an ordinal variable?
While there are statistical techniques designed for
independent and dependent ordinal variables, most
analyses recode ordinal variables into a dummy
variable or treat them as an interval variable. The
most common type of ordinal variable treated as interval is a Likert
scale.
Topic 11
(Friday, Oct. 25)—Computing, interpreting, and
comparing descriptive statistics for more than one
variable in SPSS.
-
Remember that SPSS
assignment #2 on BlackBoard is due by 5pm. If
you have a unique situation related to how SPSS is
working on your computer, please contact Dr. Setzler
after you have carefully tried to address the issue
yourself and carefully reviewed the
instructions/screencast links above. It is better for
all involved for you to submit a late assignment (that
will receive a modest penalty) rather than one that is
partially completed.
-
Ahead of class,
print out this handout--the third
of three that your instructor has written to cover all
of the SPSS and statistical methods we use in PSC 2019
and PSC 4099.
-
Ahead of class,
reread five pages of Mark Setzler, "Did Brazilians
Vote for Jair Bolsonaro Because They Share his Most
Controversial Views?" Specifically, print
out and review:
You are being asked to review these materials (again)
because they provide an example of how survey dataset
variables are recoded so that researchers' hypotheses
can be tested. The two figures provide examples of how
data-splits and frequency results can be used to
visually test arguments. We will be doing this kind of
work in class.
-
In class, we
will learn about and use SPSS to practice one of the
techniques researchers use to examine and visually demonstrate
whether two variables are associated with one
another. Specifically, we will be splitting a dataset
by independent variables and then computing
means or frequencies for dependent variables.
You soon will be reading Chapter 8 in
Forestiere's textbook, which covers the topic of
"Bivariate Analysis." While the statistical tests and
the methods discussed in that chapter are very useful,
usually the best way to begin analyzing whether there
may be a relationship between two or more variables is
to see how variations in one or more independent
variables correspond to different means or proportions
for a dependent variable. This procedure looks only at
the data that are in our survey and does not involve any
statistical tests to determine whether any differences
between groups that we see in our sample also exist in the larger
population.
For example, if we think that there probably is a
relationship between gender and a person's income, we
can use SPSS's command Data -> split file ->
compare groups (and select the variable gender, which in
this example we have coded into a binary measure of
males and non-males). Once we have "split the file," we
will see statistical results for males and then
non-males every time we run a statistical procedure.
Each time, we will get two sets of output because there
are the two values for this variable. So, if we now run
a descriptives command for our variable measuring
household income, the results will tell us what the mean
income is for males and what it is for non-males.
If we have several independent variables (e.g., perhaps
we want to see how household incomes vary for women and
men, Republicans and Democrats, and people who are 40 or
younger, 41-65, and over 65 years of age), we can repeatedly
split our data by the relevant variable and then run
descriptives commands. We can examine the different means
in a table, but for a presentation or paper that
considers the differences for several different groups,
it will look best if we compile our results in a
chart/figure.
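In syntax, the split-and-describe routine described above might look like the sketch below. The variable names Male (a 0/1 dummy) and hh_income are hypothetical. SPLIT FILE LAYERED BY is the syntax version of the "compare groups" option in the Data -> split file menu, and the data have to be sorted by the grouping variable first.

```spss
* Male and hh_income are hypothetical variable names.
SORT CASES BY Male.
SPLIT FILE LAYERED BY Male.
DESCRIPTIVES VARIABLES=hh_income /STATISTICS=MEAN STDDEV MIN MAX.
* Turn the split off so later commands use the whole sample again.
SPLIT FILE OFF.
```

To compare other groups, repeat the block with a different grouping variable in the SORT CASES and SPLIT FILE lines. Forgetting SPLIT FILE OFF is a common reason for later output to come out unexpectedly divided into groups.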
Week 11
Topic 12
(Monday, 10/28)—Comparing means and
frequencies in bar charts
-
In class, we
will be visually summarizing frequencies and
descriptive statistics results in spreadsheet
bar charts. You have no assigned readings for
this class. We will be using statistical results to make
the kind of bar charts you see in the Brazilian
election article you were asked to read parts of
ahead of our last class.
After class, if
you need additional guidance on creating Excel
charts to visually summarize SPSS results, you have
the option of
watching this screencast: https://youtu.be/T6kHpZ2oReQ.
It shows you how splitting data and calculating the
means for several different variables will allow you
to make a nice-looking chart in Excel to show your
results. Making this kind of a chart is a task you
will need to do for your next BlackBoard SPSS
Assignment.
SPSS
test 1 (Wednesday 10/30). The test will
be taken in class, on BlackBoard.
The test will require
you to use SPSS to drop some observations based on how
people answered a specific question. You also will be
asked to create a smaller version of a dataset by dropping
a couple of variables. And you will be asked to recode and
create a few variables, including new variables that draw
information from two or more other variables (e.g., to
create a white, female, Democrat dummy variable). You will
show that you have created and labeled new variables
correctly with frequency tables. You will also be asked to
generate and interpret descriptive statistics for a
variable and its subgroups (i.e., to split the variable
into groups and run frequencies or descriptives). The test
will make up 5% of your final course grade.
Unit 2
test: (Friday,
11/1).
The topics covered on the exam are noted
in the study guide (i.e., the Focus Questions handout that
has been in the PPTs folder since the start of the unit).
You will not be using SPSS or interpreting any SPSS output
as part of this exam. You will be asked conceptual
questions related to the use of SPSS and other statistical
software.