Please note: At your instructor's discretion,
there may be minor alterations to the reading
assignments listed below. One of the major advantages to
providing you with an on-line readings archive is that
timely articles can be added or substituted when
appropriate. Opening documents downloaded from this
website will require that your computer have Adobe
Acrobat Reader. You will also need the
class-specific password to open individual files.
UNIT 2 ASSIGNMENT SCHEDULE
Links to helpful
resources:
Survey-related
materials were introduced before Exam 1, but not
covered on it. They will be covered on the Unit 2
test:
Topic 5: Surveys and
samples (studying a relatively small number of
people to make accurate inferences
about much larger "universes").
For Wednesday (9/17):
For Friday (9/19): It may seem like a lot
of readings are listed here, but these collectively add up
to less reading than a textbook chapter. Each piece is the
equivalent of a short textbook chapter section. Use the
focus questions to guide your reading.
-
Which subgroups were over-sampled? Specifically,
Pew invited extra individuals from several subgroups
whose members are less likely than other individuals
to participate in surveys, and because it wanted
researchers to have access to subgroup samples
that more accurately reflected those subgroups
overall.
- How do Pew analyses of their surveys deal with the
fact that even the most carefully created random
samples typically do not mirror their target
population exactly, because different types of people
are more or less likely to accept invitations to
participate? To deal with this issue, researchers
typically apply post-hoc (i.e., after-the-fact)
weights that use parameters from the Census or other
massive, high-quality surveys. Specifically,
researchers' data analyses typically under-weight or
over-weight the responses of different types of
individuals based on whether there were too few or too
many respondents of each type in the sample relative
to their share of the larger population. In Pew's
case, they used population parameters to determine who
would participate in their survey and--because they
purposefully oversampled some groups--they also use
post-hoc weights.
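In SPSS (the program we will use later in the course), applying such post-hoc weights takes one line of syntax. Here is a minimal sketch; the weight variable's name, weight_final, is hypothetical, and a real dataset's codebook will tell you what its weight variable is called:
*Turn on the survey's post-hoc weight.
WEIGHT BY weight_final.
FREQUENCIES VARIABLES=partyID.
*Turn weighting off if you want unweighted results again.
WEIGHT OFF.
Every statistic you run while the weight is on is adjusted so that over-represented types of respondents count less and under-represented types count more.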
Week 6
Monday, September 22: In-class Exam 1.
Topic 6 (Wednesday
and Friday, Sept. 24 and 26, after the test): Is
political polling getting better over time? Can we
still trust surveys to determine what Americans think
about contentious issues?
-
Stef W. Kight and Margaret Talev: "Study: What Americans really think"
(Axios 2022, 3pp.) This article uses the
phrase "self silencing" to refer to what social
scientists refer to as "social desirability bias."
What is a list experiment? How does it tackle the
problem of this type of bias? While we know that
people don't always admit to their prejudice against
other groups, what does this article (and data in class
PPTs) say about how widespread social desirability
bias is?
The next several articles discuss how political polling
has improved over the last several elections. It looks
like a lot of reading but together, it adds up to less
than 15 pages.
-
Geoffrey Skelley, Why Was The
National Polling Environment So Off In 2020?
(FiveThirtyEight, 2021, 3-4pp). How far off are polls
from estimating actual turnout and vote choice in the
typical election? What is non-response bias? This
article doesn't talk about it, but what are some of
the factors that might cause the supporters of Donald
Trump to be more or less likely to take a survey than
Democrats, independents, and even some Republicans who
don't support him?
-
Andrew Fischer, "Polls Were Great in
2022. Can They Repeat Their Success?" (NY Times,
2pp). What are polls doing to get better? How has
adjusting polls to make sure that they have a
proportionally accurate share of white and non-white
individuals without a college education, as well as
respondents who voted for Trump in the previous
election, improved polls' predictions?
-
You might be interested in reading more (i.e., this
is optional, but only because it is so detailed) about
how accurate political polling typically is and
whether there is a political bias to polling (when
large numbers of surveys are combined, there typically
isn't, especially if you weight for "house effects,"
which fivethirtyeight.com does): https://fivethirtyeight.com/features/2022-election-polling-accuracy/
Week 7
Topic 7—An introduction to some important tools of the
trade: Statistical analysis programs, scholarly research
resources, and survey datasets
Monday (9/29): Downloading and configuring
SPSS. In class, we'll likely finish up
the last block of material on survey research. You won't
have any new readings assigned for the day because I want
you to focus on getting a statistical program loaded and
configured so that we can start using it this week.
-
Important: Over the weekend, ahead of Monday's
class, please try to install SPSS (the statistical
analysis program we will be using a lot for the rest
of the semester). We will likely use SPSS for
the first time on Wednesday.
Ideally, we would all collectively download, install,
and configure the software you will need for the rest of
the term together in class. However, due to slow/clogged
bandwidth issues, I have found that trying to do this
work together has not worked well at all.
-
Download and install SPSS
on your personal computer (you need to be working on
a Mac or PC). Why do you want to put this
program onto your own computer? While you have access
to SPSS on some university computers, you will find it
much more convenient to run SPSS on your own
laptop. HPU has a site license that will allow you to
use an individual-version of the program on your own
computer all semester at no cost.
In the unlikely event that you already have SPSS and
a license on your computer from a previous course,
uninstall that program first. You want to start with a
fresh install and new license so that you have the
latest version of SPSS and so your program license
doesn't expire during the semester.
To install the program (or reinstall it at some
point after class), here are the instructions to
download a copy of SPSS and then a six-month license from
HPU’s IT office:
How to Install SPSS 30 for Windows
https://highpointuniversity.service-now.com/help?id=kb_article&sys_id=3d24327adbab44d02bd0f1396896195b
How to Install SPSS 30 for Mac
https://highpointuniversity.service-now.com/help?id=kb_article&sys_id=c3e084631bae481015324000cd4bcbeb
For both PCs and Macs, you download the program
first and then enter the license code. The
fastest way to do this is to open the text file with
the license code and copy that user-specific code
before you download and install SPSS.
When you install the SPSS program, you will have to
activate the license. When you get to this point
in the installation process, check the option indicating
that you have an "individual user" license.
After a couple of clicks, you will paste your license
code in (i.e., the one you copied earlier) and click the
"add" button. From there, you may have to click through
some more OKs to complete this process.
Once you are done installing SPSS, make sure you can
open and run it. To do so, you will need to find
and open it. It likely won't be on your programs
dock/banner initially. For PC users, you can quickly
find and open a program by using the search bar on the
bottom banner. For Mac users, you can quickly find a
program by clicking on the Spotlight icon (magnifying
glass) in the top right-hand corner of your screen. To
make things faster in the future, you could create an
alias on your desktop.
Once you have followed all of the directions
above--including uninstalling and reinstalling SPSS if
you previously had it on your computer--let Dr.
Setzler know by email message if the program will not
open and run. For me to assist, I need to know
what is happening--are you getting a specific error
message? Also, I will need to know if you are using a PC
or a Mac and what operating system you are using (i.e.,
Windows 10 or 11 for PC users or one of these for Mac users).
- Mac users only: You
need to quickly complete a set of one-time changes to
SPSS for Mac's default settings so that your version
of SPSS will look and act just like SPSS for PC.
Important: You aren't changing any other program or your
OS, just your SPSS settings. Open SPSS for Mac in its
default mode, selecting the option to open "a new
dataset," which opens a blank data editor window. From
there, follow this link for instructions on
making the necessary changes. Follow the written
instructions, especially those that are highlighted:
If you use only the options shown in the picture on the options
page, you will NOT change what needs to be changed.
Why are Mac users
(only) being asked to change the program's defaults? Inexplicably,
the default layout of SPSS for Mac looks quite a bit
different from the PC version, and the default Mac
settings lack SPSS options that are described in most
textbooks, my screencast tutorials, and most
instructional materials you will find online.
Wednesday and Friday
(10/1 and 10/3) will be spent reviewing some of the
places you can find survey datasets that can be
downloaded at no charge.
-
Now, go to the tab on the top banner labeled "Tools
& Resources." Look at the resources for "Survey
Question Search." Try a practice search in
"International Questions." For example, check the
box that will show results only from surveys that
have been administered in Brazil and then search
this term: polit (the partial-word search will
return survey questions that included either political
or politics).
-
Now, go back to the tab on the top banner labeled
"Tools & Resources." Look at the section labeled
"Dataset downloads." This is where you can find and
download the public-use datasets noted in many of
the Pew articles you see on the website. This is a
great place to start if you are looking for data on
specific groups or have no idea yet what you'd like
to write a research project on. If you are
interested in doing work on foreign countries, for
example, you'll want to begin looking in the "Global
Attitudes" unit.
-
IMPORTANT: Pew typically embargoes the release of
new survey data for at least a year so its
researchers can publish findings first. This section
of the website will be the best place to start
looking for a thesis topic using Pew data because
the press releases in this section all refer to
datasets that already are available. If you are most
likely to do a topic on American politics, the ATP
(American Trends Panel) data is going to be a good
place to start looking for a topic.
-
Before class, quickly familiarize
yourself with the work of Pew's research units that
focus on specialized topical areas (these are
the topic areas Pew uses to organize the datasets in
the "Dataset Downloads" section that you looked at
earlier). Take a quick look at these Pew Center websites
to get a sense of what kinds of topics the various
research units at Pew are looking at:
Pew Research Center for the People and the Press
(domestic studies on issues other than those tackled
by Pew's special units):
https://people-press.org/
-
Ahead of class, you do NOT need to
review materials at the other research centers
listed below, but be aware that there are many
other places where you can locate high-quality,
publicly available datasets at no cost. See some of
these resources at: https://marksetzler.org/generalissues/SurveyDataSources.html.
Students in advanced political science classes have
used datasets from a wide range of sources.
Week 8: No
classes, midterm break!
Week 9
Topic 8—Some SPSS basics
and hands-on practice: Preparing survey datasets for
analysis
Monday 10/13
This week will have very little out-of-class reading so
that you can use this time to review screencasts on what
we are covering in class if you think a review is
necessary. In class, we will continue getting introduced
to SPSS and learn the basics of "data set preparation,"
which is the process of downloading survey data and then
taking, if necessary, a few additional steps to make sure
that we can analyze just the data we want.
-
In Monday's class, most of our time will be
spent downloading and taking a preliminary look at a
2023 dataset from PRRI (which we will
download in class). We will also make sure that we
download the survey's methodology and original
questionnaire, taking a close look at both.
PRRI is a nonpartisan think tank that annually
administers its Values Survey, which includes a
large number of interesting topics (for example, whether
or not an American thinks their top leaders need to be
moral in their private lives to be effective in
their public roles). We will be looking at this
particular dataset because it covers so many interesting
topics. For political science and international
relations majors who want to begin thinking about the
research design they will submit at the end of the term,
this may be a good source of data for your project.
We will discuss how we can use
SPSS's Frequency command (Analyze
-> Descriptive Statistics -> Frequencies)
to see how Americans are divided on various issues.
We will also learn how to use
SPSS's "Split File" command [Data -> Split File -> Compare groups
(and then identify the variable whose response categories
you want to compare)]. For example, we can split our
dataset on the partisanship variable and then run a frequency
command to see how Democrats, Republicans, etc. differ in how they
answered the questions in the survey. This approach
is a very quick and simple way to get a first look at how
an independent variable (for example people who identify
with different races) corresponds to differences in a
dependent variable (for example, the party they identify
with). Important: After
you have compared your groups of interest, make sure to
go back to the split file and select the option to
analyze the full dataset; otherwise, SPSS will
continue to report all statistics for the subgroups as
long as the program is running.
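The point-and-click steps above paste syntax into the syntax window; that pasted code looks something like the following sketch (the variable names here are hypothetical):
SORT CASES BY partyID.
SPLIT FILE LAYERED BY partyID.
FREQUENCIES VARIABLES=free_college_good.
*Turn the split off so later statistics use the full dataset.
SPLIT FILE OFF.
Note that SPLIT FILE requires the data to be sorted by the grouping variable, which is why the SORT CASES line comes first.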
On Wednesday (10/15), we will go over the
steps necessary to drop observations and variables from a
dataset.
-
In class, we
will continue working with the PRRI Values Survey from
2023, which we will download in class.
-
Ahead of class, please read
through the first part of this document (a
handout I have prepared so you have a quick
reference) up to, but not including, SPSS Basics 5:
Recoding Data. This handout covers EVERYTHING we
will be doing in class on Wednesday and Friday.
-
During class, we will be
looking more closely at how the "syntax" window works.
This is where we either manually enter or use SPSS's
point-and-click features to write code telling SPSS to
do something. Specifically, we will be learning how to
use syntax to "prepare a dataset for analysis," which
usually means removing variables and perhaps
observations that are irrelevant to a study.
-
After class,
watch this screencast only if you need more guidance on how to create a smaller
version of an SPSS dataset that keeps only some of
the variables (and also how to recreate that dataset
if you later discover you omitted variables you
shouldn't have). The screencast runs a
little over 14 minutes.
-
Important: I left
two important things out of the
screencast. First, you need to make sure to save
your syntax so that you will be able to run it again in
the future if you want to add additional variables
from the full dataset. Name the syntax file
something like: "Syntax to keep only some
variables."
-
Second, you should add a note (starting with an
asterisk) at the top of the syntax reminding your
future self that this syntax needs to be run only
when the full copy_ version of the dataset is open in
SPSS.
-
So why do you need to know how to drop most of the
variables from a dataset? If you don't take a few
minutes now to learn how to create a truncated
version of a dataset, every time you generate
statistics for a project, you will have to scroll
down through a large amount of irrelevant information.
And, if you are working with a dataset that has
dozens of similarly named variables, working with
the full dataset increases the likelihood of making
mistakes in your coding and analysis.
Key
ideas from the screencast, so you shouldn't need
to watch it more than once (if at all):
You can create the necessary code to
make a small version of your dataset in two steps:
Step one:
- Open the copy_ version of the full
original dataset.
- If you are using a Mac you need to
have previously configured SPSS for Mac so that SPSS
works the same as it does on a PC.
- Use the File -> "Save As"
command, and click your way to the destination
folder where you want to save a smaller version of the
dataset.
- Do not hit the ok
button--after you give this file a new name, you
are going to use the Paste button to paste the code
into a syntax file. We will add an additional line
to this code in a minute. If you are using a Mac
and do not see a paste button, you need to verify
that your version of SPSS is configured as
explained in the schedule section related to
installing SPSS.
- Rename the file being saved so that
you know it is a smaller version of the dataset. Dr.
Setzler typically keeps the name of the original
dataset, but adds small_ to the front end of the
file name. His system is to have original_, copy_, and
now small_ versions of the dataset.
- Again, after you have renamed the
file, use the Paste
button to paste the code into a syntax file.
- The last step will automatically
open up a new syntax file and paste a command
telling SPSS to save the small_ version of the file.
If you select and run that code, it will save the
full dataset, and that's not what you want. You are
going to want to tell SPSS to keep only some of the
variables when you run that SAVE command. Before you
modify your syntax to do that, take a minute and add
a notation (starting with an asterisk so SPSS will
grey out your note and not try to run it) explaining
to your future self what this code does. This is a
good time to remind yourself in the note that, "In
order for this code to run in the future, you need
to have the full copy_ version of the dataset
open."
Step two:
- Now, you want to manually change the
syntax you just pasted, telling SPSS to save the
small_ file with just a subset of variables. To do
this, add "/KEEP=" as a subcommand followed by a
list of variables in your SAVE command. The
line of code that you add should look like what I
have bolded here:
SAVE OUTFILE='C:\Users\msetzler\Google Drive\SeniorSemPrepDataset\AmerBarUS2018_working.sav'
  /KEEP=Var1 Var2 Var3
  /COMPRESSED.
*Note that
syntax lines that begin with a forward slash are
subcommands. The ONE period for the full command goes
at the very end of the command, which is after the
last subcommand in this case.
- Select just the SAVE code block
(i.e., from SAVE OUTFILE through COMPRESSED.) and
tell SPSS to run the selection. You can do this by
clicking the green arrow in the command icon options or by
right-clicking and selecting run selection (on
a Mac, holding the control button down while doing a
touchpad tap is the equivalent of a right-hand mouse
click).
- Verify that you now have the
small_ version of the dataset in the folder where it is
supposed to be. Open it up and make sure just the subset of
variables you need are there.
- If all looks good,
add two notes to the top of your syntax. Start
notes in your syntax with an asterisk so SPSS will
grey out the note and not treat it as code. SPSS
sees a period as your indication that the note has
ended. The first note should be a reminder to your
future self that whenever you run this code, you
will need to have the copy_ version of your full
dataset open, and you will open this syntax file
from within that dataset (File -> Open->
Syntax). The second note should be a quick
reminder to yourself of what this syntax does.
- Once everything is
working and you have annotated your syntax, save
the syntax file to the same folder as your
datasets. Name it clearly so you can find it again
in case you need to re-run it in the future.
- These steps are
important because you may need to add or drop
additional variables in your small_ dataset at
some point. To do so, all you will have to do is
delete the current small_ dataset file, change the
syntax as necessary to add new variables you want
to keep, and rerun it. As you saw in the
screencast, this quick step will recreate a
modified small file that has only the variables
you need.
- If all looks good, close the syntax
and all of the open datasets. Do NOT save any
changes to the copy_ dataset if asked to do so.
Going forward, you will be working only with the
small_ dataset.
Pro-tip from the screencast:
- When editing the Keep= line, rather
than copying variable names individually or typing
them in manually, it is a lot faster and way less
error-prone to create a list of variables to keep
using a point and click command:
Analyze->Descriptive statistics
->Descriptives->and then select all of the
variable names you want to keep. Then, paste your
descriptive command into your syntax. Select
just that code and run it (to run it, hit the green
arrow button).
- If the output for the variables
looks good (i.e., you got the right variables
listed), then copy and paste the descriptives
command's list of variables into your /KEEP= code
line.
Another pro-tip from the
screencast:
- When you are in the
Descriptives selection window (or any command
listing all of the variables), it is a lot faster to
find the variables you are looking for if you
right-click on the variables (Mac users need to
press the command key and click at the same time) to
first show variable names and then to sort those
names alphabetically. This trick works in all of the
menu's dialog windows where there is a long list of
variable labels listed in whatever order they appear
in the dataset.
- After class, watch this screencast only if you need more guidance to
prepare a dataset so that you will analyze only a
subset of a survey's respondents: It is about
17 minutes long, so you may want to try following the
written instructions below first. Why do you need to
know how to do this? For example, when international
relations majors are using a Global Attitudes or a
LatinoBarometer survey, they often only want to look at
respondents from certain countries, and running
statistics on the full dataset would return results for
all of the countries in the survey.
Key steps noted in the screencast:
(1) Open the copy_ version of your dataset
(2) File -> open -> syntax... and open the syntax
file you used to create the small_ version of your
dataset (the one where you have removed many variables).
(3) If your small_ dataset does not have all of the
variables you will need to tell SPSS to keep only
certain types of respondents, add those variables to
your syntax and re-run it to recreate your small_
dataset.
(4) At the bottom of your syntax, make a notation,
starting with an asterisk, indicating which types of
observations you will be dropping.
(5) With the copy_ version of the dataset open and the
small_ version closed, point-click-paste: File -> Open
-> Data -> and select but do not actually open the
small_ version. Use the paste button to tell SPSS to
paste the "GET" command into the bottom of your syntax.
That syntax will tell SPSS to go get and open the small_
version of your dataset when run. The reason why this
step is important is that you are going to be automating
an SPSS command to get, open, and then change the
small_ version of the dataset rather than your full
copy_ version.
(6) Select and run the GET command you just pasted.
Doing so will open the small_ version of the dataset. Then
close the copy_ version of your dataset. Again, this
is to make sure you don't change and save changes to the
full version by mistake when you need to be modifying
the small_ version only.
(7) Now, you can create the code that will keep only
some observations. Use point-click-paste to work your
way through the Data->Select Cases command. You will
need to enter logical criteria specific to your needs in
the "Select If" module and then hit continue. You then
will need to tell SPSS to "Delete" all of the "unused
cases." Remember to
paste this code into your syntax.
(8) Now you need to automate the process of saving your
small_ dataset every time this full syntax file is run.
To do so, point-click-paste: File -> Save As ->
Data -> and select but do not open the small_
version. Paste this code into your syntax.
(9) Select and run the "Filter off" code all the way
down to the end where you save the dataset. Take a look
at the small_ dataset and make sure that it only has the
variables and observations it is supposed to have.
(10) Save your revised syntax to write over the first
version. Now, if you need to make changes in the future,
you will just open this syntax from within the copy_
dataset, make your edits, and re-run the syntax.
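Put together, the syntax pasted in the steps above ends up looking something like this sketch (the file path, country codes, and variable name are hypothetical):
*Run this with the copy_ version of the full dataset open.
GET FILE='C:\MyProject\small_survey.sav'.
FILTER OFF.
USE ALL.
SELECT IF (country = 17 OR country = 2 OR country = 4).
EXECUTE.
SAVE OUTFILE='C:\MyProject\small_survey.sav'
  /COMPRESSED.
Running the whole file recreates the small_ dataset from scratch, which is why you can safely delete and regenerate it whenever your variable or case selections change.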
If all of this seems like
a lot of work, you don't have to automate things as I
suggest above and in the screencast. Here's a faster
way:
(1) Make a copy of the small_version of your dataset. If
you accidentally delete the wrong observations, you want
to have a backup. You will not need this backup unless you
delete the wrong observations and then save over the
small_version of the dataset, too.
(2) Now, open the small_ version of your dataset that has
most of the variables removed.
(3) Create and run the few lines of code needed to remove
observations from the small_ dataset. To do so, go Data
-> Select cases -> Select if. Then identify the
conditions that need to be met for an observation to be
kept. In the screencast, cases are kept if (partyID7 = 1
OR partyID7 = 2 OR partyID7 = 3 OR partyID7 > 4).
If all looks good, select the button telling SPSS
that you will delete unused cases AND paste this code into
syntax.
(4) Run your code and then use a frequency command to look
at who is in your dataset. If you have removed all and
only the observations you wanted to remove, save the
edited small_ dataset, replacing your previous version.
This version has fewer variables and fewer observations
than the copy of the original dataset.
(5) If you remove observations this way, make sure to
include annotations at the start of your new syntax
to remind your future self what this block of code
does.
(6) If you need to add additional variables to your small_
dataset in the future, remember that you will need to run
this second syntax file to remove the relevant
observations and save the file again.
Here's an example of what your syntax would look like if
you wanted to include only respondents living in France,
the UK, and the US (assuming the variable "country" is
coded 17, 2, and 4 for French, British, and American
respondents, respectively):
*This code will drop any respondent who is not from one of those three countries.
FILTER OFF.
USE ALL.
SELECT IF (country = 17 or country = 2 or country = 4).
EXECUTE.
*Here is a second example; this time, the select line
uses "AND" to tell SPSS to keep observations only
if they meet two conditions (here, they are male
Republicans).
SELECT IF (Party=2) AND (Male=1).
If we wanted to keep only males who
were Republicans or Democrats (but not independents or
other responses), the select line would look something
like:
SELECT IF (Male=1) AND (Party=1 OR Party=2).
Note that the parentheses around the OR condition matter:
without them, SPSS applies AND before OR and would keep
all Republicans, male or not.
Important: If we have a key variable in
our study and many respondents were not asked the
relevant question, we want to delete all observations
in the dataset that have "missing" data for that
variable. If we don't do this, descriptive
statistics for all of the other variables will analyze
lots of respondents who will not be included in
bivariate and multivariate analyses. In this example, I
want to remove all respondents who did not answer a
question about whether they support making public
colleges free to attend because the relevant question
was presented to only half of the people taking the
survey:
*This code drops any respondent who has missing data on the variable "free_college_good".
FILTER OFF.
USE ALL.
SELECT IF (NOT MISSING(free_college_good)).
EXECUTE.
- If we have time on
Wednesday, you will start practicing what you have
learned about SPSS so far by beginning to complete a
Blackboard assignment in class.
-
SPSS assignment #1 will require you to
locate and download an original PRRI dataset, make a
copy_ version of that dataset, and then make a
small_ version that drops certain variables and
observations. You also will need to split the
dataset into groups and practice running
descriptives and frequency commands. The assignment
will be written so that most students should be able
to complete their work in class; however, I will
post it after class on Wednesday in case you want to
get a head start.
-
Barring unusual
circumstances,
SPSS assignment #1 (BlackBoard) must be completed by 5pm
Friday (10/17). If you have a unique
situation related to how SPSS is working on your
computer, please contact Dr. Setzler after you have
carefully tried to address the issue yourself and
carefully reviewed the instructions/screencast links
above. It is better for all involved for you to
submit a late assignment (that will receive a modest
penalty) rather than one that is partially
completed.
Topic 9: Recoding and creating variables;
labeling variables and their response categories.
I will post instructions and assignments after
Wednesday's class, but am holding off for now so that you
don't have to wade through lots of detailed instructions
that will serve as your class notes for Friday and
Monday.
Topic 9 (10/18 and 10/21) Recoding and
creating variables; labeling variables and their
response categories
On Friday and Monday, we
will learn how to create variables in SPSS. You will be
completing an assignment on this topic on BlackBoard. It
will be due late next week.
-
Ahead of
Friday's class, quickly read Chapter
7 in your textbook up to the section on
descriptive statistics. This reading
is just five pages; it will help you understand what
we will be doing this week and why.
-
Also before class, skim the rest of this document, starting at the
section "SPSS Basics 5: Recoding Data." This is the
first of three handouts that your instructor
has written to cover all of the SPSS coding and
statistical methods we use in PSC 2019 and PSC 4099.
The handout is a handy resource that:
-
Explains why researchers typically recode or create
every variable in their analyses rather than working
with a dataset's variables in their original
form.
-
Summarizes everything we have done with SPSS so
far, which mostly has involved the process of
finding and downloading data and--if
necessary--importing the data into SPSS, removing
variables, and removing observations not related to
our study in any way.
- Explains in detail how to recode an original
variable. You will find this information helpful when
you are completing the Blackboard assignment at the
end of the week.
One of the main topics in this block of material is how
to use SPSS's RECODE INTO command. Note: everything
that is discussed below in this section will be taught in
class. You have no assigned readings on this block of
material other than reading through the points that are
listed below. Any linked screencasts are optional; I have
recorded screencasts that duplicate class material because
some students prefer video summaries to the written notes
I post in the course schedule.
-
The RECODE INTO command is your go-to
method when you are working with only one original
variable and want to collapse its categories
(perhaps into a dummy variable, like making the 0-1
variable Republican out of the original variable where
Republican is one of several partisan groups coded
onto the same variable). When we are doing data
analysis, it typically is necessary to recode every
variable we will be using to address respondents
who are missing data, said "don't know" to items, or
refused to answer some questions.
RECODE INTO is also a command
we can use to reverse code a variable's values (e.g.,
using Conservative7 to make the new variable Liberal7).
Here are the key steps to recoding a
variable in SPSS:
(1) First, make sure that you
are working with a copy of your dataset so that
if you accidentally recode an original variable onto
itself you can just make another copy of the dataset and
fix the glitch. Your instructor typically keeps four
versions of a dataset in the same subfolder when he's
working on a project: the original dataset, a "copy" of
the full dataset, a "small" version of the copy with
most variables removed, and a "working" copy of the
dataset, which is where he recodes variables and
computes new ones.
(2) Now, with the working version of
your dataset
open, open a new syntax file that will be used for all
of your variable recoding with this dataset (or reopen
it if you are continuing previous work).
(3) Next, use SPSS to run
a CODEBOOK command on the original variable you
are going to recode; in this example, that is
Conservative7. To run a CODEBOOK command, you can either
go to Analyze -> Reports -> Codebook or--in
syntax--type:
CODEBOOK Conservative7.
(4) Next, if it is
available, it is helpful to have the original
variable's exact question phrasing pasted into your
syntax so that you can easily recall later on exactly how
the variable was phrased. It is also helpful to have
your CODEBOOK
results pasted into your syntax, so that you can more
easily identify an issue if something is recoded
backwards or one of the original response categories was
not carried over into the new variable.
How do you make notes in your syntax without
causing SPSS to read those notes as code and stall
out? You can write or copy and paste
annotations into the syntax right before the RECODE INTO
command. If you put an
asterisk in front of an annotation with question
phrasing pasted from the questionnaire, it will be
greyed out so we can see it, but SPSS won't stall when
it gets to that part of the syntax. SPSS will
keep greying out language as long as there are no line
breaks AND you don't press return and end a line
with a period. So, when you are ready to end the
annotation, type a period and press return.
(5) Start to recode the
variable by using SPSS's point-and-click interface,
going to Transform -> Recode into New Variables.
You want to use the RECODE INTO command
because it preserves all aspects of the original variable in
case you make any recoding mistakes.
For this example, you will
need to tell SPSS that you are going to transform the
original variable Conservative7 into a new variable
named Liberal7.
In the same screen, you will
provide a label for the new variable. Variable names
cannot have any spaces. Make sure to label new variables
in a way that makes sense and is specific enough that
you will remember what a higher value is (e.g., don't
name and label a variable "Gender" if you are creating a
dummy variable for people who identify as male because
you may forget who is coded 1 or 0 later on).
After you label your
variable, click the button "Change." Nothing will seem to
happen, but the new variable will have that label when
it is created later.
(6) Now, click the button
for "Old and New Values." This is the area where you do
all of the actual recoding. Make sure that you think
through how you need to recode. You need to recode every
value in the original variable into some value (or
system missing) in the new variable.
(7) It is best practice in
recoding to start off by indicating that any value that
was "system or user missing" in the old variable should
be coded as "system missing" in the new variable. Choose
those options and click on the add button.
(8) Now you need to recode
the values. So 7 in the old variable = 1 in the new
one; click on the add button. Then 6=2 and
"add," and so on until all seven of the values are
reversed for the new variable.
(9) Once you have made sure
to include instructions that will recode ALL of the
original values into the new variable, click on continue
and then paste. If you accidentally click on the OK
button, just point and click your way back through the
transform command, where all of the information you just
entered will still be there, and this time click on paste.
(10) To create the new
variable, you have to select and run the code you just
pasted into syntax (run with the green arrow on the
menu). Whenever you create a new variable, do what you
can to make sure you did the recoding correctly. To do
this, run a frequency for the original and new variables
and make sure that everything looks right. If you didn't
code the variable correctly, look at your syntax and see
what went wrong. If you need to edit the syntax, you can
run the whole RECODE INTO command over again, and it
will recreate the variable (this is a nice improvement
over older versions of SPSS, which made you delete a
variable before you could recreate it).
(11) Important:
Whenever you make a new variable, you need to label
both it and the variable's response categories.
The fastest way to label variables is to do so in
syntax. If you are using RECODE INTO, you already
will have created the code to label your variable
when you named and labeled it in the Recode into New
Variables screen.
However, you also will need
to add value labels to any new variable you create. Unfortunately, SPSS does
not have a way to point-click-paste the coding for
response category labels, so you will need to create
the coding manually (usually by copying and pasting
sample coding and substituting in new,
variable-specific information).
Here's what that code looks like if we
want to label just the anchors on our 7-point measure of
liberalism (if we were working with a dummy,
multi-category, or ordinal variable, we would label each
value):
VARIABLE LABELS Liberal7
"How liberal is the respondent?".
VALUE LABELS Liberal7
1 "Very conservative"
7 "Very liberal" .
Notice that there are two
periods for the two label-related commands. There is
only a single period at the very end of each command,
and it goes outside of any quotation marks.
Note that if you want to
change the variable label or any of its value labels,
you can make the edits and just rerun the commands. This
is one of the big advantages to working in syntax.
Finally, notice that once
you have created the recode+relabel syntax for a
variable or two, you can copy and paste that syntax and
swap out values and phrasing to very quickly
create a whole batch of similar variables.
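If it helps to see the logic outside of SPSS, here is a minimal sketch in Python (not SPSS syntax) of what RECODE INTO does with the Conservative7 example: every old value is mapped to a new value, and anything not covered by the map becomes missing. The specific values are hypothetical.

```python
def recode(value, mapping):
    """Return the recoded value, or None (missing) if the value is unmapped."""
    return mapping.get(value)  # dict.get() returns None for unmapped values

# Reverse a 7-point conservatism scale into a 7-point liberalism scale.
conservative_to_liberal = {1: 7, 2: 6, 3: 5, 4: 4, 5: 3, 6: 2, 7: 1}

print(recode(7, conservative_to_liberal))   # most conservative -> 1 (least liberal)
print(recode(1, conservative_to_liberal))   # least conservative -> 7 (most liberal)
print(recode(99, conservative_to_liberal))  # unmapped "refused" code -> None (missing)
```

The dictionary plays the same role as the "Old and New Values" screen: one explicit rule per original value, with everything else falling through to missing.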
We also will
review the other technique most commonly used to change
and create new variables in SPSS: Using a
combination of COMPUTE and IF commands in syntax. Note: everything that
is discussed below in this section will be taught in
class. You have no assigned readings on this block of
material other than reading through the points that are
listed below. Any linked screencasts are optional; I
record screencasts that duplicate class material because
some students prefer video summaries to the written notes
I post in the course schedule.
-
If you prefer, you can try to use the COMPUTE command
using SPSS's point-click-paste commands (see the
Transform menu). However, it is a lot clearer and
easier to write these commands directly into syntax.
-
You can use COMPUTE
+ IF commands to recode a single variable instead of using RECODE INTO. If you are
making many similarly coded new variables, either type
of SPSS command works well. To get you acquainted with
using Compute and If statements, we will practice
using them to recode a few variables in class. This
technique will also be taught/reviewed as part of your
BlackBoard assignment. After this week's classes,
if you need a refresher, here is an optional screencast that goes over
the technique (under 9 min if you listen at full
speed). One important thing: After I
recorded the screencast, SPSS changed how you deal
with missing data on the original item. See the
highlighted code below:
Here is a summary of what is
covered in the screencast, only the summary uses an
example (a dog's breed and color) that may be easier to
follow and remember:
Let's say we have a sample
of dog owners who all own one dog. They have taken a
survey asking questions about their dog. We want to
create a hypothetical new dummy variable identifying the
owners of white poodles. Our new dummy variable,
WhitePoodle, will be created with information from two
original variables: dog_breed and dog_color.
A CODEBOOK command indicates the following:
*dog_breed = 12 if a dog is a poodle, while values of 1
through 55 refer to other dog breeds.
*dog_color = 1 if a dog is white, while values 2 through
10 refer to black, brown, grey, spotted brown, etc.
*And, if a respondent didn't know or refused to give
breed or color information about their dog, the relevant
variables were coded 99 in the dataset.
*Here is the block of code
we could use to create the dummy variable:
COMPUTE WhitePoodle = $SYSMIS.
IF (dog_breed < 99) AND (dog_color < 99) WhitePoodle = 0.
IF (dog_breed = 12) AND (dog_color = 1) WhitePoodle = 1.
*and then we would need to deal with any missing data on
the original variables if there were any:
IF MISSING(dog_breed) OR
MISSING(dog_color) WhitePoodle =$SYSMIS.
*and finally we would need to add labels for the new
variable as well as its response categories.
VARIABLE LABELS WhitePoodle "Dog is a white
poodle".
VALUE LABELS WhitePoodle
1 "White poodle"
0 "Not a white
poodle" .
*Note where the periods are.
So the full block of code together would be:
COMPUTE WhitePoodle = $SYSMIS.
IF (dog_breed < 99) AND (dog_color < 99) WhitePoodle = 0.
IF (dog_breed = 12) AND (dog_color = 1) WhitePoodle = 1.
IF MISSING(dog_breed) OR MISSING(dog_color) WhitePoodle
=$SYSMIS.
VARIABLE LABELS WhitePoodle "Dog is a white poodle".
VALUE LABELS WhitePoodle
1 "White poodle"
0 "Not a white
poodle" .
Note that COMPUTE, IF,
VARIABLE LABELS, and VALUE LABELS are all commands, so
one--and only one--period goes at the end of each full
command even if that command stretches over more than one
line of code.
Here is the logic behind how the code was written, step by
step:
Step 1
To see how the responses for dog_breed and dog_color are
coded and labeled in the dataset, we run a codebook command:
CODEBOOK dog_breed dog_color.
Step 2
Now, tell SPSS to create
a new variable where all of the values are blank (i.e.,
are "system missing"):
COMPUTE WhitePoodle = $SYSMIS.
If we were to look at the data set in the SPSS data
viewer (Data View tab), we would see a new column for a
variable named "WhitePoodle." All of its values would be
blank.
Step 3
Tell SPSS to turn almost all
of those blank values into a zero if the observation
should not be system missing:
IF (dog_breed < 99) AND (dog_color < 99) WhitePoodle = 0.
Notice this step changes the blank to a zero (meaning
not in our group) for every respondent who didn't say
"don't know" or refuse to answer. Now, if we were to look at the data
set in the SPSS data viewer (Data View tab), we would
see that most of the values for "WhitePoodle" are zero.
Step 4
Now, tell SPSS to
turn some of the respondents' values into a 1 if certain
conditions are met:
IF (dog_breed = 12) AND (dog_color = 1) WhitePoodle = 1.
This step turns the zero into a one for each person
who has a white poodle.
Notice that the order of commands matters! If we had
done Step 4 before we did Step 3, everyone who didn't
refuse to answer the questions would be coded as not
having a white poodle.
Step 5
Now, make sure to tell SPSS to turn respondents' values into
missing data if they had missing data on either of the
original items:
IF MISSING(dog_breed) OR MISSING(dog_color) WhitePoodle
=$SYSMIS.
Step 6
Now, tell SPSS to label the new
variable and its response categories. For
example:
VARIABLE LABELS WhitePoodle "Dog is a
white poodle".
VALUE LABELS WhitePoodle
1 "White poodle"
0 "Not a white poodle" .
Step 7
Run a frequency on the old
and new variables to verify that the general pattern and
number of observations look right. If you made a
mistake, read carefully through the code. If you make
edits, you can run the block of code over again; SPSS will
delete the variable and replace it with the edited version.
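To see why the order of the commands matters, here is a minimal Python sketch (hypothetical data, not SPSS syntax) that mirrors Steps 2 through 5: initialize to missing, zero out the valid cases, flag the white poodles, and then re-blank anyone with missing data. The 99 "don't know/refused" code follows the codebook above.

```python
# Hypothetical survey of dog owners; 99 = don't know/refused on either item.
dogs = [
    {"dog_breed": 12, "dog_color": 1},   # white poodle
    {"dog_breed": 12, "dog_color": 4},   # poodle, but not white
    {"dog_breed": 30, "dog_color": 1},   # white, but not a poodle
    {"dog_breed": 99, "dog_color": 1},   # breed unknown -> missing
]

for dog in dogs:
    dog["WhitePoodle"] = None                              # COMPUTE ... = $SYSMIS.
    if dog["dog_breed"] < 99 and dog["dog_color"] < 99:
        dog["WhitePoodle"] = 0                             # IF ... WhitePoodle = 0.
    if dog["dog_breed"] == 12 and dog["dog_color"] == 1:
        dog["WhitePoodle"] = 1                             # IF ... WhitePoodle = 1.
    if dog["dog_breed"] == 99 or dog["dog_color"] == 99:
        dog["WhitePoodle"] = None                          # IF MISSING(...) = $SYSMIS.

print([dog["WhitePoodle"] for dog in dogs])  # [1, 0, 0, None]
```

If the "set to 1" step ran before the "set to 0" step, the zeroing step would overwrite every 1, which is exactly the ordering mistake described in Step 4.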
-
COMPUTE + IF commands also are the best way to
create a couple of other types of variables that
researchers frequently use, including reverse coding
variables with lots of response categories.
While you can use RECODE INTO to reverse code a
10-point measure, it is a lot faster to use code like
this:
COMPUTE New_Variable10 = 11 - Old_Variable10.
IF Old_Variable10 = 99 New_Variable10 = $SYSMIS.
IF MISSING(Old_Variable10) New_Variable10 = $SYSMIS.
*Notice the COMPUTE line's logic. We're telling SPSS
to create a new variable by subtracting the
original variable's value from 11,
which is one higher than the maximum value. This
means a 10 on the old variable will be a one on the new
variable while a 1 will become a 10, and so on.
*Notice also that we have to tell SPSS which values
should be coded as missing on the new variable. Here, we're assuming 99 on the
original variable means that the respondent refused to
answer or said they "didn't know."
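The same "subtract from one more than the maximum" trick can be sketched in Python (again, just an illustration of the logic, with 99 assumed to be the don't know/refused code as in the example above):

```python
def reverse_10pt(old):
    """Reverse-code a 10-point item; 99 and missing stay missing."""
    if old is None or old == 99:   # missing or "don't know/refused"
        return None                # equivalent of $SYSMIS
    return 11 - old                # 10 -> 1, 9 -> 2, ..., 1 -> 10

print(reverse_10pt(10), reverse_10pt(1), reverse_10pt(99))  # 1 10 None
```

Because 11 is one higher than the scale's maximum, the endpoints swap and the midpoints mirror each other, which is exactly what the COMPUTE line does.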
After the week's
class meetings, if you want to review more details on
reverse coding a variable using COMPUTE + IF commands,
here is an optional
screencast on the topic. It runs a bit over
10 minutes, but only the first 6:30 covers the reverse
coding. In the sample code directly above, you'll
see that I do something that I didn't do in the
screencast, but that is important; I added a line of
code to make sure that any missing responses on the
original item carry over to the new item:
IF MISSING(Old_Variable10) New_Variable10
= $SYSMIS.
I recorded the rest of the screencast because it shows
your instructor making a very common mistake when adding
variables (initially, both the VARIABLE LABELS and VALUE
LABELS commands neglected to tell SPSS what variable
needed to be labeled).
In the likely event that you need help
trouble-shooting issues that come up with labeling new
variables and their response categories, here is a list
of common issues:
- Building on the example from above, we might label
our new variable:
VARIABLE LABELS WhitePoodle "Dog is a white poodle".
EXECUTE.
What to check if you can't get SPSS to create a
new variable's label:
-
Note that the VARIABLE LABELS command is plural
even when you are just labeling one variable. If
they made your instructor king for a day, the
command would run just fine with or without that
last S; however, his promotion to regality does not
appear to be forthcoming.
-
Note where the quotation marks go. SPSS will accept a
single quotation mark, but your instructor always uses
double quotation marks, which allows him to “Put single
‘marks’ inside of a label.”
-
Notice also that each line begins with a command
and ends with the required period. There is only one
period per full command.
-
Notice that you have to tell SPSS which variable
you want to label. In this case: WhitePoodle.
-
The EXECUTE command tells SPSS to go ahead and do
this work now rather than waiting until another
command is run. This is a leftover “feature” from
when computers used to be SLOW and no one wanted to
sit around waiting for the program to label
variables until those variables were actually going
to be used in some type of analysis.
-
Finally, if you don't select and run the full block
of code, the variable won't be labeled.
-
And, if you have looked at each of the problem
areas I identify above, you can use ChatGPT or
Claude to help you identify what the issue is with
your code.
What to check if you can't get SPSS to create the
response labels:
-
You can't have extra, blank lines in a command.
This can happen accidentally when you cut and paste
sample code from outside of SPSS to use as a
template.
-
Folks accidentally substitute VARIABLE LABELS
from above instead of using the correct VALUE LABELS
command.
-
Folks labeling a variable try to use VALUE LABEL
without the final “S,” in which case SPSS doesn’t
recognize the command.
-
Folks forget to tell SPSS which variable they are
trying to label response categories for (above, that
would be WhitePoodle), or they list the wrong
variable because they are reusing code from a
previously created variable. If you do this, the
response categories on the previous variable will
be relabeled... No worries--if you make a mistake in
labeling any variable or its categories, you can
just fix the mistake and rerun the code, and it will
fix the labels.
-
Folks forget to put at least one space between each
number and its label. This won't work: 1"Cat". This
will: 1 "Cat". Incidentally, you can add extra
spaces, and doing so is a good habit to get into
because it makes it very easy to see if you forgot
to include at least one space.
-
Folks add an equal sign (e.g. 1 = “yes”) or forget
the quotation marks for each label.
-
Folks add an extra period after the variable name
(above, that would be WhitePoodle) despite being
mid-command.
-
Folks forget to include the period after the last
label, which tells SPSS that this is the end of the full
command. Your instructor typically puts the period on
a separate line so it is easier to copy and
repurpose old code with more or fewer categories
without forgetting to add the required period after
the last label.
-
Note that the EXECUTE command and its period need
to be there.
-
And, finally sometimes people create all of the
code correctly but forget to select and run it.
-
And, if you have looked at each of the problem
areas I identify above, you can use ChatGPT or
Claude to help you identify what the issue is with
your code.
Topic 10 (Wednesday, October 22)—Descriptive
statistics for a variable: Frequencies, medians, means,
and standard deviations (aka, just enough univariate
stats to make sure that your variable coding has been done
correctly and that you can describe a variable's central
tendency and its distribution)
- Read closely, this
appendix from a best-selling book targeting a
non-specialist, general audience (Richard J.
Herrnstein and Charles Murray, "Statistics for People
Who Are Sure They Can’t Learn Statistics," 12pp).
-
Read quickly to get
an introduction to key ideas: Chapter 7, "Getting
Started with Quantitative Data: Descriptive
Statistics." in Carolyn Forestiere's
textbook. After the classes that cover the substance
of this chapter, you will want to reread it
and make sure that you are able to explain,
apply, and give an example of every concept in the
glossary. Your textbook talks about doing data
analysis with the "dataprac" dataset, but we
will not be doing this. Instead, your practice work
on the concepts in textbook chapters 7, 8, and 9
will be done with in-class SPSS work and BlackBoard
exercises that use newer, better-quality datasets.
-
Make sure to review
the summary of research example by Ceka and
Magalhaes (you do not need to scrutinize the
summary of Park and Shin's article). In addition to
paying attention to how different variables were coded
in the study, review the questions in the study guide,
which ask you to think about this study's research
design. Specifically, see if you can identify the
theory, dependent variable, independent variable,
intervening variable (which is assessed with three
country-level measures), and controls.
-
For each of the statistical techniques we cover in
this class, your instructor will summarize the key
ideas that you need to remember in a block of
material, like the one you see immediately below.
Review these blocks of material carefully until you
understand every point well. The reason that I
am summarizing this material in the schedule is so
that you can go to one place to review every major
statistical concept. These same ideas are covered
in class, in PPTs, and (for the most part) in
assigned readings. I am doing it this way so that you
can take limited notes in class and still be
confident that you have clear notes on the concepts
you need to know.
-
Consider using AI as a tutor to make sure that you
fully understand the notes that are provided for
each block of statistical methods material. For
example, you might test your understanding of the
block of material that comes right after this note, by
copying and pasting those notes into ChatGPT or
Claude.ai and asking this prompt:
I am taking a research methods class for political
science and IR majors. Here are some summary notes the
professor gave us for what we are supposed to remember
from class. Can you give me a clear, easy-to-understand
explanation of each key concept and a
couple of additional examples for each key concept.
Also, can you then create a 20-item multiple choice
quiz for me that covers definitions, concepts, and
applications of the key ideas. Do not mark the correct
answers in the quiz. Instead, give me an answer key
afterwards together with an explanation for each
correct answer.
-
After class and completing your
assigned readings, make sure that you feel
comfortable with your understanding of a number of
the basic concepts and methods that researchers use to
explore the "central tendency" and distribution of
different types of variables. These concepts
will come up continuously for the rest of the term. Here
are some big takeaways that you will want to remember:
-
The type of variable you are analyzing
matters. Means, medians, and standard
deviations communicate valuable information
about interval (aka "continuous")
variables. With an interval variable, every
one-unit increase is assumed to be equal in
influence. When that is not the case, interval
variables are often modified so that a one-unit
increase is roughly equal in importance. For
example, researchers typically substitute years of
formal schooling with an interval variable that
looks something like this: 1=no high school degree;
2=high school degree; 3=some college or 2-year
degree; 4=4-year college degree; 5=master's degree;
6=doctoral degree.
-
Standard deviations are a widely used measure
(in all fields) to look at the distribution of
an interval variable
across its range of values.
A straightforward way to
explain a standard deviation (SD) is to say that it is a
measure of how far away from a variable's mean most
respondents' answers are scattered. If a variable is
normally distributed, about two-thirds of
all cases will be within one standard deviation of the
mean. And almost all
respondents fall within two standard deviations below or
above the mean; specifically, 95% of observations in a
normal distribution fall within two standard deviations.
An observation that is three standard deviations below or
above the mean would be an extreme outlier (fewer than
0.3% of observations fall that far from the mean).
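You can verify these rules of thumb yourself. Here is a quick Python check (not something you need for SPSS) that computes the exact share of a normal distribution falling within one, two, and three SDs of the mean:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution: mean 0, SD 1

# Share of observations within k SDs of the mean = area between -k and +k.
within_1sd = z.cdf(1) - z.cdf(-1)
within_2sd = z.cdf(2) - z.cdf(-2)
within_3sd = z.cdf(3) - z.cdf(-3)

print(f"within 1 SD: {within_1sd:.1%}")   # ~68.3% -- the "two-thirds" rule
print(f"within 2 SD: {within_2sd:.1%}")   # ~95.4% -- the "almost all" rule
print(f"within 3 SD: {within_3sd:.1%}")   # ~99.7% -- beyond this is an extreme outlier
```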
Standard deviations are useful for comparing the
distributions of two populations on the same variable.
So, if we have two cities with the same mean and median
income but standard deviations that differ
by tens of thousands of dollars, the city with the
larger standard deviation also has a higher degree of
inequality. In short, a small standard deviation means
that most observations are clustered around the mean,
while a larger standard deviation means that more
observations are further away from the mean.
Standard deviations also are useful for comparing
observations on two different variables. For example, we
might say that someone whose LSAT (law school admissions
test) score was 1.7 standard deviations above the mean is a
better test taker than someone whose score on the GRE
(the exam many graduate schools require) was 1.5
standard deviations above the mean.
Why use standard deviation statistics when you could
use percentiles? Social scientists, economists,
financial planners, and lots of other professionals use
standard deviations to examine distributions of data
because just looking at percentiles can be misleading.
Percentiles focus on rank order, while standard
deviations give you a sense of how far a given
observation is from the average.
Imagine a long race with 100 runners who all finished
within two seconds of each other. The standard deviation
for race times would be tiny, so we would know that a
runner who finished two standard deviations faster than
the mean didn't run much faster than the others at all,
even though that runner beat nearly everyone.
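A short Python sketch of the race example (with made-up finish times) makes the point concrete: being well above average in SD terms can still mean a tiny absolute difference when the SD itself is tiny.

```python
from statistics import mean, pstdev

# Hypothetical race: 100 runners, all finishing within two seconds of each other.
times = [3600 + i * (2 / 99) for i in range(100)]  # seconds, evenly spread over 2 s

sd = pstdev(times)                        # population SD of finish times
winner_lead = mean(times) - min(times)    # winner's lead over the average runner

print(f"SD of finish times: {sd:.3f} seconds")
print(f"The winner is {winner_lead / sd:.2f} SDs faster than the mean,")
print(f"but only {winner_lead:.2f} seconds faster in absolute terms.")
```

The winner is in the top percentile of the race, yet the SD tells us the real performance gap is well under a second per SD.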
One of the big issues with percentiles is that unequal percentile
spacing can suggest big differences where there are only
small ones, and vice versa. For example, someone with an LSAT
score of 150 (out of 180 possible points) is in the 39th
percentile of test-takers. A 3-point higher score (a 153) puts a
test-taker in the 50th percentile, and three points
higher than that (a 156) puts the taker in the 61st
percentile. So, at the center of the distribution, a
6-point gain in LSAT score corresponds to a gain of more
than 20 percentiles. In contrast, going from a score of
170 (96th percentile) to a 176 increases a test-taker's
percentile by just three points (to the 99th).
-
When an interval variable's distribution is
highly skewed, researchers typically cap the
highest or lowest values so that the mean and the
median are closer and so that the mean doesn't
distort what the typical respondent looks like. For
example, income in the US has a large, positive skew
because a relatively small number of individuals earn
a tremendous amount of money compared to everyone
else. This means that the mean income in the US is a
lot higher than the median.
In advanced analyses examining relationships between
variables (e.g., income and happiness), researchers
usually use calculations based on variable means
rather than medians. When a variable like income is
highly skewed (so its mean differs greatly from its
median) researchers usually recode the data to
collapse extreme values at the end of the
distribution. For instance, income data are commonly
capped by grouping everyone earning $200K or more into
one high-end category. Recoding like this combines
billionaires like Bill Gates or Jeff Bezos with the
merely highly affluent, pulling the mean income closer
to the median and reducing the standard deviation.
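Here is a small Python illustration (with hypothetical incomes) of how capping a skewed variable pulls the mean toward the median and shrinks the standard deviation:

```python
from statistics import mean, median, pstdev

# Hypothetical incomes in $1,000s, with one extreme high earner.
incomes = [30, 40, 50, 60, 70, 80, 90, 100, 120, 5000]

# Cap the distribution: everyone earning $200K or more goes into one category.
capped = [min(x, 200) for x in incomes]

print(f"raw:    mean={mean(incomes):.0f}, median={median(incomes)}, SD={pstdev(incomes):.0f}")
print(f"capped: mean={mean(capped):.0f}, median={median(capped)}, SD={pstdev(capped):.0f}")
```

The median is unchanged by the cap (the middle of the distribution didn't move), but the mean drops sharply toward it, and the standard deviation collapses, because the one extreme value no longer dominates both statistics.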
-
The median, mean, and
standard deviation of a categorical (aka
nominal) variable are useless because
the values assigned to a given category are
arbitrary for this type of variable. We
can use a frequency table or chart to explore the
distribution of a categorical variable, but for
statistical analyses, this type of variable
typically is recoded and analyzed as
a series of separate dummy variables. For
example, if you have a party variable with three
categories, a mean of 1.3 doesn't mean anything, so
you'd want to create three dummy variables, one for
each party. Remember, a dummy variable is coded
1 if a respondent is in the group and 0 if they
are not. If your analysis would benefit
from also summarizing the distribution of a
categorical (also called "nominal") variable, you
will want to make a separate table or figure
reporting its "frequencies" (in percentages), since
the mean and standard deviation do not communicate
useful information for these types of variables.
-
Remember how to interpret the means of dummy
(aka "binary," "dichotomous," or "0/1") variables.
By convention, we report the standard deviation for
each dummy variable, too, even though the SD
information for a dummy variable is not useful. By
itself, the mean value for a dummy
variable reports its distribution in your sample;
e.g., a value of .37 for the dummy variable Democrat
indicates that 37% of the sample identifies as a
Democrat.
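The "mean of a dummy = share of the sample" point is easy to check in Python with a hypothetical sample:

```python
from statistics import mean

# Hypothetical sample: 1 = identifies as a Democrat, 0 = does not.
democrat = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]

# The mean of a 0/1 variable is just (number of 1s) / (sample size).
print(mean(democrat))  # 0.4 -> 40% of this sample identifies as a Democrat
```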
-
What do
researchers do to summarize the main
characteristics of an ordinal variable?
While there are statistical techniques specifically
designed for independent and dependent ordinal
variables, most analyses either recode ordinal
variables into dummy variables or treat them as
interval variables.
However, if a researcher is not using the kinds of
statistical tests we’ll focus on for the rest of the
semester and instead simply wants to show how a
variable is distributed, a frequency chart is often
the best choice. For example, imagine three students
who each have the same GPA of 2.0. One student earns
an even mix of Bs, Cs, and Ds. Another earns mostly Bs
and Ds, with just one C. The third earns all Cs.
Although the mean and median GPA for all three
students are identical, the distribution of their
grades reveals that they are quite different types of
performers. In this case, displaying each student’s
grade distribution with a frequency chart is the
clearest way to communicate their
differences—especially when addressing a general
audience that may not easily interpret measures like
standard deviation.
The most common type of ordinal variable is a
Likert scale. Important: you always want to
think carefully about treating a Likert scale as an
interval variable versus creating a dummy variable from
it. While researchers often do both, it is worthwhile
to examine a frequency distribution of the variable first.
If we ask how much an American approves of the
Republican party with a Likert item (1=not at all,
2=very little, 3=neither approve nor disapprove, 4=quite
a lot, or 5=very much), there's not going to be much
difference at all between responses 1 and 2 or between
responses 4 and 5. Typically, scholars will consider recoding an item
like this into either a 3-point interval variable
(disapprove, neither, approve) or creating a dummy
variable (1=approve; 0=disapprove or neither). It is
common practice to convert ordinal items with 6 or
more responses into interval variables.
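The two recoding options for the approval item can be sketched in Python (hypothetical response codes; in SPSS you would do this with RECODE INTO or COMPUTE + IF):

```python
def to_3pt(x):
    """Collapse a 5-point approval item into a 3-point measure."""
    if x in (1, 2):
        return 1          # disapprove
    if x == 3:
        return 2          # neither approve nor disapprove
    if x in (4, 5):
        return 3          # approve
    return None           # any other code treated as missing

def to_dummy(x):
    """Collapse the same item into an approve/not-approve dummy."""
    if x in (4, 5):
        return 1          # approve
    if x in (1, 2, 3):
        return 0          # disapprove or neither
    return None

responses = [1, 2, 3, 4, 5, 99]
print([to_3pt(r) for r in responses])    # [1, 1, 2, 3, 3, None]
print([to_dummy(r) for r in responses])  # [0, 0, 0, 1, 1, None]
```

Notice that both recodes deliberately merge the response pairs (1, 2) and (4, 5), which the frequency distribution suggested were not meaningfully different.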
Topic 11
(Friday, Oct. 24)—Computing, interpreting, and
comparing descriptive statistics for more than one
variable in SPSS.
-
Remember that SPSS
#2 on BlackBoard is due by 5pm. If you have
a unique situation related to how SPSS is working on
your computer, please contact Dr. Setzler after you
have carefully tried to address the issue yourself,
carefully reviewed the instructions/screencast links
above, and tried to use an AI resource to troubleshoot
your coding. It is better for all involved for you
to submit a late assignment (that will receive a
modest penalty) rather than one that is partially
completed.
-
Ahead of class,
print out this handout--the third
of three that your instructor has written to cover all
of the SPSS and statistical methods we use in PSC 2019
and PSC 4099.
-
Ahead of class,
reread five pages of Mark Setzler, "Did Brazilians
Vote for Jair Bolsonaro Because They Share his Most
Controversial Views?" Specifically, print
out and review:
You are being asked to review these materials (again)
because they provide an example of how survey dataset
variables are recoded so that researchers' hypotheses
can be tested. The two figures provide examples of how
data-splits and frequency results can be used to
visually test arguments. We will be doing this kind of
work in class.
-
In class, we
will learn about and use SPSS to practice one of the
techniques researchers use to examine and visually demonstrate
whether two variables are associated with one
another. Specifically, we will be splitting a dataset
by independent variables and then computing
means or frequencies for dependent variables.
You soon will be reading Chapter 8 in
Forestiere's textbook, which covers the topic of
"Bivariate Analysis." While the statistical tests and
methods discussed in that chapter are very useful,
usually the best way to begin analyzing whether there
may be a relationship between two or more variables is
to see how variations in one or more independent
variables correspond to different means or proportions
for a dependent variable. This procedure looks only at
the data that are in our survey and does not involve any
statistical tests to determine whether the differences
between groups that we see in our sample also exist in
the larger population.
For example, if we think that there probably is a
relationship between gender and a person's income, we
can use SPSS's command Data -> Split File ->
Compare Groups (and select the variable gender, which in
this example we have coded into a binary measure of
males and non-males). Once we have "split the file," we
will see statistical results for males and then for
non-males every time we run a statistical procedure.
Each time, we will get two sets of output because the
variable has two values. So, if we now run a
descriptives command for our variable measuring
household income, the results will tell us what the mean
income is for males and what it is for non-males.
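The split-and-describe logic above can also be sketched in Python with the pandas library. This is not part of the course software, and the numbers below are made up for illustration, but it mirrors what SPSS's Split File plus Descriptives does:

```python
import pandas as pd

# Hypothetical survey responses; in SPSS this corresponds to
# Data -> Split File -> Compare Groups on gender, followed by
# a Descriptives command on income.
df = pd.DataFrame({
    "gender": ["male", "non-male", "male", "non-male", "male"],
    "income": [52000, 48000, 61000, 45000, 58000],
})

# Mean household income within each gender group
means = df.groupby("gender")["income"].mean()
print(means)
```

Just as in SPSS after splitting the file, the output reports one mean per value of the grouping variable.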
If we have several independent variables (e.g., perhaps
we want to see how household incomes vary for women and
men, Republicans and Democrats, and people who are under
40, 41-65, and over 65 years of age), we can repeatedly
split our data by the relevant variable and then run
descriptives. We can examine the different means
in a table, but for a presentation or paper that
considers the differences for several different groups,
it will look best if we compile our results in a
chart/figure.
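Repeating the split for each independent variable in turn can be sketched the same way (again in pandas rather than SPSS, with invented data):

```python
import pandas as pd

# Hypothetical respondents with several independent variables
df = pd.DataFrame({
    "gender":    ["male", "non-male", "male", "non-male"],
    "party":     ["Republican", "Democrat", "Democrat", "Republican"],
    "age_group": ["under 40", "41-65", "over 65", "under 40"],
    "income":    [52000, 48000, 61000, 45000],
})

# Split by each independent variable in turn and compute group means,
# collecting the results so they can later be compiled into one chart.
results = {
    iv: df.groupby(iv)["income"].mean()
    for iv in ["gender", "party", "age_group"]
}
for iv, means in results.items():
    print(iv)
    print(means)
```

Each entry in `results` corresponds to one split-then-descriptives pass in SPSS; the collected means are what you would carry over into a table or chart.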
Week 11
Topic 12
(Monday, 10/27)—Comparing means and
frequencies in bar charts
-
In class, we
will be visually summarizing frequencies and
descriptive statistics results in spreadsheet
bar charts. You have no assigned readings for
Friday. We will be using statistical results to make
the kind of bar charts you see in the Brazilian
election article you were asked to read parts of
ahead of Wednesday's class.
After class, if
you need additional guidance on creating Excel
charts to visually summarize SPSS results, you have
the option of
watching this screencast: https://youtu.be/T6kHpZ2oReQ.
It shows you how splitting data and calculating the
means for several different variables will allow you
to make a nice-looking chart in Excel to show your
results. Making this kind of chart is a task you
will need to do for your next BlackBoard SPSS
Assignment.
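If you are curious, the same kind of bar chart can also be built in code rather than Excel; here is a sketch with Python's matplotlib, using invented group means (the assignment itself asks for an Excel chart):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display required
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical group means, as would be produced by splitting
# the data and running descriptives in SPSS
means = pd.Series({
    "Male": 57000, "Non-male": 46500,
    "Democrat": 54500, "Republican": 48500,
})

# One bar per subgroup, comparing mean household income
ax = means.plot(kind="bar", rot=0)
ax.set_ylabel("Mean household income ($)")
ax.set_title("Mean household income by subgroup")
plt.tight_layout()
plt.savefig("income_by_subgroup.png")
```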
- After class, take this practice SPSS exam so you
know what Wednesday's test will look like. Due
to the very short turn-around time, this practice test
won't be graded. Its purpose is to make sure you know
what kind of questions to expect and in what format.
SPSS
test 1 (Wednesday, 10/29). The test will be
taken in class, on BlackBoard.
The test will require you to use SPSS to
drop some observations based on how people answered a
specific question. You also will be asked to create a
smaller version of a dataset by dropping a couple of
variables. And you will be asked to recode and create a
few variables, including new variables that draw on
information from two or more other variables (e.g., to create
a white, female, Democrat dummy variable). You will show
that you have created and labeled new variables correctly
with frequency tables. You will also be asked to generate
and interpret descriptive statistics for a variable and
its subgroups (i.e., to split the data into groups and
run frequencies or descriptives). The test will make up 5%
of your final course grade. IMPORTANT: You may
bring a 3x5 notecard with you for the test that has
important information--including sample coding. You may
write or type only on one side of the card.
Unit 2
test: (Friday,
10/13).
The topics covered on the exam are noted
in the study guide (i.e., the Focus Questions handout that
has been in the PPTs folder since the start of the unit).
You will not be using SPSS or interpreting any SPSS output
as part of this exam. You will be asked conceptual
questions related to the use of SPSS and other statistical
software. IMPORTANT: You may bring a 3x5 notecard with
you for the test that has important
information--including sample coding. You may write or
type only on one side of the card.