RESEARCH METHODS IN POLITICAL SCIENCE





Please note: At your instructor's discretion, there may be minor alterations to the reading assignments listed below. One of the major advantages of providing you with an online readings archive is that timely articles can be added or substituted when appropriate. Opening documents downloaded from this website will require that your computer have Acrobat Reader. You will also need the class-specific password to open individual files.

UNIT 2 ASSIGNMENT SCHEDULE

Links to helpful resources:

 

Week 6

Monday, September 23: In-class Exam 1.  The final version of your Unit 1 study guide will be posted a week ahead of the test. The exam will contain 10-15 multiple-choice and several short-essay items.


Topic 6 (Wednesday and Friday, Sept. 25, 27, after the test): Is political polling getting better over time? Can we still trust surveys to determine what Americans think about contentious issues?

  • Stef W. Kight and Margaret Talev: "Study: What Americans really think" (Axios 2022, 3pp.)  This article uses the phrase "self silencing" to describe what social scientists call "social desirability bias." What is a list experiment? How does it tackle the problem of this type of bias? While we know that people don't always admit to their prejudice against other groups, what does this article (and data in class PPTs) say about how widespread social desirability bias is?

The next several articles discuss how political polling has improved over the last several elections. It looks like a lot of reading but together, it adds up to less than 15 pages.

  • Geoffrey Skelley, Why Was The National Polling Environment So Off In 2020? (FiveThirtyEight, 2021, 3-4pp). How far off are polls from estimating actual turnout and vote choice in the typical election? What is non-response bias? This article doesn't talk about it, but what are some of the factors that might cause the supporters of Donald Trump to be more or less likely to take a survey than Democrats, independents, and even some Republicans who don't support him?

  • Andrew Fischer, "Polls Were Great in 2022. Can They Repeat Their Success" (NY Times, 2pp). What are polls doing to get better? How has adjusting polls so that they include a proportionally accurate share of white and non-white individuals without a college education, as well as respondents who voted for Trump in the previous election, improved polls' predictions?

  • You might be interested in reading more (i.e., this is optional, but only because it is so detailed) about how accurate political polling typically is and whether there is a political bias to polling (when large numbers of surveys are combined, there typically isn't, especially if you weight for "house effects," which fivethirtyeight.com does): https://fivethirtyeight.com/features/2022-election-polling-accuracy/


Week 7
Topic 7—An introduction to some important tools of the trade: Statistical analysis programs, scholarly research resources, and survey datasets

Monday (9/30): Downloading and configuring SPSS

  • Important: Over the weekend, ahead of class, try to complete all of the steps listed below as it may not be possible to do them during our class time. Sometimes, it is hard to download SPSS in the classroom due to slow bandwidth issues. If you follow all of the instructions listed below and successfully install SPSS, configure SPSS (Mac users only), and ensure that you have your computer set up so that your PSC 2019 coursework is being backed up to a cloud server where you can recover previous versions of files, you will not need to be present for much of Monday's class. Monday's class will be primarily used

  • BRING YOUR COMPUTER TO CLASS. Unless instructed otherwise, please bring your laptop to class going forward as you will need to use it for the statistical work we will be doing for the remainder of the term.

  • In class, we will try to make sure you have key pieces of technology in place so we can begin to learn how to use SPSS, a widely used statistical analysis software package. SPSS is one of the four major packages widely used in the social sciences (the others being R, Stata, and SAS).


    While you have access to SPSS on some university computers, you will find it much more convenient (and essential if you have to quarantine at any point) to install and run SPSS on your own laptop. HPU has a site license that will allow you to use an individual version of the program on your own computer all semester at no cost. If you want to get started ahead of our class (or need to reinstall the program at some point after class), here are the instructions to download a copy of SPSS and then a 6-month license from HPU’s IT office:

    • For both PCs and Macs, you download the program first and then enter the license code. The fastest way to do this is to open the text file with the license code and copy that user-specific code before you download and install SPSS.

    • When you load the SPSS program, you will have to activate the license. When you get to this point, check the option indicating that you have an individual user license. After a couple of clicks, you will paste your license code in and click the "add" button. From there, you may have to click through some more OKs to complete this process.

    • Once you are done installing SPSS, make sure you can run it. To do so, you will need to find and open it. It likely won't be on your programs dock/banner initially. For PC users, you can quickly find and open a program by using the search bar on the bottom banner. For Mac users, you can quickly find a program by clicking on the Spotlight icon (magnifying glass) in the top right-hand corner of your computer. To make things faster in the future, you could create an alias for your desktop.

  • Important!: For Mac users only: You should quickly complete a set of one-time changes to SPSS for Mac's default settings so that your version of SPSS will look and act just like SPSS for PC. Important: You aren't changing any other program or your OS, just your SPSS settings. Open SPSS for Mac in its default mode, selecting the option to open "a new dataset," which opens a blank data editor window. From there, follow this link for instructions on making the necessary changes. Why are you being asked to change the program's defaults? Inexplicably, the default layout of SPSS for Mac looks quite a bit different from the PC version, and the default Mac settings lack SPSS options that are described in most textbooks, my screencast tutorials, and most instructional materials you will find online.

  • Also in class, we will download Google Drive for Desktop (if you don't already have it) and configure it to back up your Documents folder, which is where you will store your PSC 2019 SPSS datasets and work. Backing up your SPSS data work every time you invest time in a project is critical, and a failure to do so can result in a major setback.

    • For both PC and Mac users, I *strongly* recommend that you install Google Drive for Desktop and configure it so that you automatically sync and back up to the cloud all of the files in your computer's "Documents" folder every time you make changes to and save a file on your computer. The main advantages of Google Drive over some other backup programs are cost related as well as the fact that you can automatically access multiple previous versions of a file instead of overwriting the only version of a backed-up file every time you save it. This feature is important if you ever make a mistake with a file and save changes to it before realizing what you have done.

    • Here are instructions on how to download and install Google Drive for Desktop:
      https://support.google.com/a/users/answer/9967896

    • Also take a look at the instructions on this webpage regarding the steps you need to take to configure Google Drive so that it will sync the folders and files in your "Documents" folder with the online version of Google Drive. When you configure Google Drive without purchasing additional cloud space (most of you won't need it), you will have a large, but limited, amount of free space to work with. If you are going to be using the free version of Drive, you have the option of syncing your Documents folder without also backing up space-hogging photos and videos. You can have Google Drive and other backup programs running simultaneously, so use something else to back up your photos and video if you are not going to pay for extra storage.

    • To verify the settings for Google Drive for Desktop, you will need to open the program rather than the storage area. For a PC, you open Google Drive by clicking on the up arrow in the bottom-right corner of your computer, which will show you several different programs running in your computer's background. Select the triangle icon that is blue, green, and yellow. For a Mac, click on the Spotlight icon (it looks like a magnifying glass) in the top right-hand corner of your computer. Search for Google Drive. It should identify the program icon for you.

    • When you set up the configuration for Google Drive for Desktop, you want to "mirror" rather than "stream" files. Mirroring a file means that there will be a copy of your file stored on your personal laptop and a second, identical one synced to the cloud. To make sure you have this configured correctly, open the Google Drive for Desktop app. Click on the (1) gear icon and then (2) preferences and then (3) "Google Drive Folders from Drive" and check the "mirror files" option if it is not already checked.

    • Once you are using Google Drive and have it configured properly, create a folder for all of your Research Methods work in your computer's Documents folder.

    • Finally, verify that Google Drive is configured so that it continuously backs up (i.e., syncs to the cloud) a copy of your entire Documents folder or at least your Research Methods work folder. To tell Google Drive for Desktop to back up a specific folder, open the program. Click on the (1) gear icon and then (2) preferences and then (3) "My laptop" and then (4) "add folder" and then select the folder you want to back up. If you don't have a huge amount of material saved in your Documents folder, just select the whole folder. At a minimum, back up the folders with your Research Methods work.

Wednesday (10/2)

This week will have very little out-of-class homework so that you can use this time to review screencasts on what we are covering in class if you think a review is necessary. In class, we will continue getting introduced to SPSS and learn the basics of "data set preparation," which is the process of downloading survey data and then taking, if necessary, a few additional steps to make sure that we can analyze just the data we want.

  • In Wednesday's class, most of our time will be spent downloading and taking a preliminary look at a 2022 dataset from PRRI (which we will download in class). We will also make sure that we download the survey's methodology and original questionnaire, taking a close look at both.

PRRI is a nonpartisan think tank that annually administers its Values Survey, which includes a large number of interesting topics (for example, whether or not an American thinks their top leaders need to be moral in their private lives to be effective in their public roles). We will be looking at this particular dataset because it covers so many interesting topics. For political science and international relations majors who want to begin thinking about the research design they will submit at the end of the term, this may be a good source of data for your project.

We will discuss how we can use SPSS's Frequency command (Analyze -> Descriptive Statistics -> Frequencies) to see how Americans are divided on various issues.
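As a rough sketch, the point-and-click path above pastes syntax along the following lines. The variable name moral_leaders is a placeholder chosen for illustration, not a variable taken from the PRRI file; substitute a name from your own dataset.

```spss
* Frequency table for one survey question (moral_leaders is a hypothetical name).
FREQUENCIES VARIABLES=moral_leaders
  /ORDER=ANALYSIS.
```

Pasting the command into a syntax file rather than clicking OK leaves you with a record of exactly what you ran.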

We will also learn how to use SPSS's "Split File" command [Data -> Split File -> Compare groups (and then identify the variable whose response categories you want to compare)]. For example, we can split our dataset on the partisanship variable and then run a frequency to see how Democrats, Republicans, etc. differ in how they answered the questions in the survey.  This approach is a very quick and simple way to get a first look at how an independent variable (for example, the race people identify with) corresponds to differences in a dependent variable (for example, the party they identify with). Important: After you have compared your groups of interest, make sure to go back to the Split File window and select the option to analyze the full dataset; otherwise, SPSS will continue to report all statistics for the subgroups as long as the program is running.
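A minimal sketch of what the Split File steps look like as syntax, assuming a grouping variable named party and a survey question named moral_leaders (both names are hypothetical placeholders):

```spss
* Compare groups by sorting on the grouping variable and then splitting the file on it.
SORT CASES BY party.
SPLIT FILE LAYERED BY party.
* Any statistics requested now are reported separately for each category of party.
FREQUENCIES VARIABLES=moral_leaders
  /ORDER=ANALYSIS.
* Return to analyzing the full dataset.
SPLIT FILE OFF.
```

The final SPLIT FILE OFF line is the syntax equivalent of going back to the Split File window and choosing to analyze all cases.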

  • After class, watch this screencast if you need more guidance on splitting a dataset by a variable so that you can see how respondents in each category of that variable answered a survey question (the screencast above also covers this in its second half): https://youtu.be/YWgz0bKcq-M (a little over five minutes). Note that this screencast uses a different dataset than the one downloaded in the last screencast.


Friday (10/4) will be spent reviewing some of the places you can find survey datasets, learning how to use Google Scholar more effectively, and seeing what AI resources like Consensus can do to help identify interesting research questions and survey the existing research on a topic.

    • Take a look at a couple of the most recent stories posted by Pew to get a sense of how the organization tells folks what they are doing with the public opinion surveys they are administering. Not every story focuses on public opinion data, so find a couple that do.

    • Next, go to the tab on the top banner labeled "Research Topics," and then select the link for "Full topic list." Take a look at a couple of topics that seem interesting.

    • Now, go to the tab on the top banner labeled "Tools & Resources." Look at the resources for "Survey Question Search." Try a practice search in "International Questions." For example, check the box that will show results only from surveys that have been administered in Brazil and then search this term: polit (the partial-word search will return survey questions that include either political or politics).

    • Now, go back to the tab on the top banner labeled "Tools & Resources." Look at the section labeled "Dataset downloads." This is where you can find and download the public-use datasets noted in many of the Pew articles you see on the website. This is a great place to start if you are looking for data on specific groups or have no idea yet what you'd like to write a research project on. If you are interested in doing work on foreign countries, for example, you'll want to begin looking in the "Global Attitudes" unit.

    • IMPORTANT: Pew typically embargoes the release of new survey data for at least a year so its researchers can publish findings first. This section of the website will be the best place to start looking for a thesis topic using Pew data because the press releases in this section all refer to datasets that already are available. If you are most likely to do a topic on American politics, the ATP (American Trends Panel) data is going to be a good place to start looking for a topic.

  • Before class, quickly familiarize yourself with the work of Pew's research units that focus on specialized topical areas (these are the topic areas Pew uses to organize the datasets in the "Dataset Downloads" section that you looked at earlier). Take a quick look at these Pew Center websites to get a sense of what kinds of topics the various research units at Pew are looking at:

    Pew Research Center for the People and the Press (domestic studies on issues other than those tackled by Pew's special units): https://people-press.org/

    • Data for other Pew research units can be obtained by following this path: Tools and Resources -> Dataset Downloads -> Select Research Area

  • Ahead of class, you do NOT need to review materials at the other research centers listed below, but be aware that there are many other places where you can locate high-quality, publicly available datasets at no cost. See some of these resources at: https://marksetzler.org/generalissues/SurveyDataSources.html. Students in advanced political science classes have used datasets from a wide range of sources.



Week 8: No classes, midterm break!


Week 9

Topic 8—Some SPSS basics and hands-on practice: Preparing survey datasets for analysis

On Monday (10/14), we will go over the steps necessary to drop observations and variables from a dataset.

  • In class, we will continue working with the PRRI Values Survey from 2022, which you downloaded previously.

  • Ahead of class, if you haven't already done so, please make sure to have an SPSS version of this dataset saved in your computer's Documents folder. Specifically, you want to have it saved in a subfolder there, so that the pathway to your dataset looks similar to:
    Documents/PSC2019_SPSSwork/PRRI_Values_Survey_2022/PRRI20220.sav

    If you still need a copy of the dataset, I have put it and its questionnaire in a zipped file (so you can download both) in the course's PPTs and Assignments folder: https://marksetzler.org/ResearchMethods/PPTs/.  The dataset is in a folder labeled "SPSS 1 (downloading and coding data)"

  • Ahead of class, please read through the first part of this document that I asked you previously to download and print out (a handout I have prepared so you have a quick reference) up to, but not including, SPSS Basics 5: Recoding Data. This handout covers EVERYTHING we will be doing in class on Wednesday and Friday.

  • During class, we will be looking more closely at how the "syntax" window works. This is where we either manually enter or use SPSS's point-and-click features to write code telling SPSS to do something. Specifically, we will be learning how to use syntax to "prepare a dataset for analysis," which usually means removing variables and perhaps observations that are irrelevant to a study.

  • After class, watch this screencast if you need more guidance on how to create a smaller version of an SPSS dataset that keeps only some of the variables (and also how to recreate that dataset if you later discover you omitted variables you shouldn't have). The screencast runs a little over 14 minutes.

    • Important: I left two important things out of the screencast. First, you need to make sure to save your syntax so that you will be able to run it again in the future if you want to add additional variables from the full dataset. Name the syntax file something like: "Syntax to keep only some variables."

    • Second, you should add a note (starting with an asterisk) at the top of the syntax reminding your future self that this syntax needs to be run only when the full copy_ version of the dataset is open in SPSS.

    • So why do you need to know how to drop most of the variables from a dataset? If you don't take a few minutes now to learn how to create a truncated version of a dataset, every time you generate statistics for a project, you will have to scroll down through a large amount of irrelevant information. And, if you are working with a dataset that has dozens of similarly named variables, working with the full dataset increases the likelihood of making mistakes in your coding and analysis.

Key ideas from the screencast, so you shouldn't need to watch it more than once (if at all):

You can create the necessary code to make a small version of your dataset in two steps:

Step one:

  • Open the copy_ version of the full original dataset. 
  • If you are using a Mac you need to have previously configured SPSS for Mac so that SPSS works the same as it does on a PC.
  • Use the File -> "Save As" command, and click your way to the destination folder where you want to save a smaller version of the dataset.
  • Do not hit the OK button--after you give this file a new name, you are going to use the Paste button to paste the code into a syntax file. We will add an additional line to this code in a minute. If you are using a Mac and do not see a Paste button, you need to verify that your version of SPSS is configured as explained in the schedule section related to installing SPSS.
  • Rename the file being saved so that you know it is a smaller version of the dataset. Dr. Setzler typically keeps the name of the original dataset, but adds small_ to the front of the file name. His system is to have original_, copy_, and now small_ versions of the dataset.
  • Again, after you have renamed the file, use the Paste button to paste the code into a syntax file.
  • The last step will automatically open up a new syntax file and paste a command telling SPSS to save the small_ version of the file. If you select and run that code, it will save the full dataset, and that's not what you want. You are going to want to tell SPSS to keep only some of the variables when you run that SAVE command. Before you modify your syntax to do that, take a minute and add a notation (starting with an asterisk so SPSS will grey out your note and not try to run it) explaining to your future self what this code does. This is a good time to remind yourself in the note that, "In order for this code to run in the future, you need to have the full copy_ version of the dataset open."

Step two:

  • Now, you want to manually change the syntax you just pasted, telling SPSS to save the small_ file with just a subset of variables. To do this, add "/KEEP=" as a subcommand to your SAVE command, followed by the list of variables you want to keep. The line of code that you add should look like the /KEEP line here:

*Saves a new file that contains only Var1, Var2, and Var3.
SAVE OUTFILE='C:\Users\msetzler\Google Drive\SeniorSemPrepDataset\AmerBarUS2018_working.sav'
  /KEEP=Var1 Var2 Var3
  /COMPRESSED.

*Note that syntax lines that begin with a forward slash are subcommands. The ONE period for the full command goes at the very end of the command, which is after the last subcommand in this case.

  • Select just the SAVE code block (i.e., from SAVE OUTFILE through COMPRESSED.) and tell SPSS to run the selection. You can do this by clicking the green arrow in the toolbar or by right-clicking and selecting run selection (on a Mac, holding the control button down while doing a touchpad tap is the equivalent of a right-hand mouse click).
  • Verify that you now have a small_ version of the .sav file in the folder where it is supposed to be. Open it up and make sure just the subset of variables you need is there.
  • If all looks good, add two notes to the top of your syntax. Start notes in your syntax with an asterisk so SPSS will grey out the note and not treat it as code. SPSS sees a period as your indication that the note has ended. The first note should be a reminder to your future self that whenever you run this code, you will need to have the copy_ version of your full dataset open, and you will open this syntax file from within that dataset (File -> Open -> Syntax). The second note should be a quick reminder to yourself of what this syntax does.
  • Once everything is working and you have annotated your syntax, save the syntax file to the same folder as your datasets. Name it clearly so you can find it again in case you need to re-run it in the future.
  • These steps are important because you may need to add or drop additional variables in your small_ dataset at some point. To do so, all you will have to do is delete the current small_ dataset file, change the syntax as necessary to add new variables you want to keep, and rerun it. As you saw in the screencast, this quick step will recreate a modified small_ file that has only the variables you need.
  • If all looks good, close the syntax and all of the open datasets. Do NOT save any changes to the copy_ dataset if asked to do so. Going forward, you will be working only with that small_ dataset.
Pro-tip from the screencast:
  • When editing the /KEEP= line, rather than copying variable names individually or typing them in manually, it is a lot faster and way less error-prone to create a list of variables to keep using a point-and-click command: Analyze -> Descriptive Statistics -> Descriptives -> and then select all of the variable names you want to keep. Then, paste your Descriptives command into your syntax. Select just that code and run it (to run it, hit the green arrow button).
  • If the output for the variables looks good (i.e., you got the right variables listed), then copy and paste the descriptives command's list of variables into your /KEEP= code line. 
Another pro-tip from the screencast:
  • When you are in the Descriptives selection window (or any command listing all of the variables), it is a lot faster to find the variables you are looking for if you right-click on the variables (Mac users need to hold down the control key and click at the same time) to first show variable names and then to sort those names alphabetically. This trick works in all of the menus' dialog windows where there is a long list of variable labels listed in whatever order they appear in the dataset.
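To illustrate the first pro-tip, here is a sketch of roughly what the pasted Descriptives command looks like. Var1, Var2, and Var3 are placeholders standing in for whatever variables you selected.

```spss
* Pasted from Analyze, Descriptive Statistics, Descriptives (Var1 to Var3 are placeholders).
DESCRIPTIVES VARIABLES=Var1 Var2 Var3
  /STATISTICS=MEAN STDDEV MIN MAX.
```

Once the output confirms that these are the right variables, the list that follows VARIABLES= (here, Var1 Var2 Var3) is what you would copy into your /KEEP= line.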

  • After class, watch this screencast (doing so is optional) if you need more guidance on preparing a dataset so that you analyze only a subset of a survey's respondents. It is about 17 minutes long, so you may want to try following the written instructions below first. Why do you need to know how to do this? For example, when international relations majors are using a Global Attitudes or a LatinoBarometer survey, they often only want to look at respondents from certain countries, and running statistics on the full dataset would return results for all of the countries in the survey.
Key steps noted in the screencast:

(1) Open the copy_ version of your dataset

(2) File -> open -> syntax... and open the syntax file you used to create the small_ version of your dataset (the one where you have removed many variables).

(3) If your small_ dataset does not have all of the variables you will need to tell SPSS to keep only certain types of respondents, add those variables to your syntax and re-run it to recreate your small_ dataset.

(4) At the bottom of your syntax, make a notation, starting with an asterisk, indicating which types of observations you will be dropping

(5) With the copy_ version of the dataset open and the small_ version closed, point-click-paste: File -> Open -> Data -> and select but do not actually open the small_ version. Use the Paste button to tell SPSS to paste the "GET" command into the bottom of your syntax. That syntax will tell SPSS to go get and open the small_ version of your dataset when run. This step is important because you are going to be automating an SPSS command to get, open, and then change the small_ version of the dataset rather than your full copy_ version.

(6) Select and run the GET command you just pasted. Doing so will open the small_ version of the dataset. Then close the copy_ version of your dataset. Again, this is to make sure you don't change and save changes to the full version by mistake when you need to be modifying the small_ version only.

(7) Now, you can create the code that will keep only some observations. Use point-click-paste to work your way through the Data -> Select Cases command. You will need to enter logical criteria specific to your needs in the "Select If" module and then hit continue. You then will need to tell SPSS to "Delete" all of the "unused cases." Remember to paste this code into your syntax.

(8) Now you need to automate the process of saving your small_ dataset every time this full syntax file is run. To do so, point-click-paste: File -> Save As -> Data -> and select but do not open the small_ version. Paste this code into your syntax.

(9) Select and run everything from the "FILTER OFF" line all the way down to the end, where you save the dataset. Take a look at the small_ dataset and make sure that it only has the variables and observations it is supposed to have.

(10) Save your revised syntax to write over the first version. Now, if you need to make changes in the future, you will just open this syntax from within the copy_ dataset, make your edits, and re-run the syntax.
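Put together, the finished syntax file from the steps above might look something like the sketch below. The file path, the file name small_survey.sav, the variable names, and the country code are all hypothetical placeholders, and the exact text SPSS pastes for you may differ slightly (for example, it may add a DATASET NAME line after GET).

```spss
* Run this syntax only when the full copy_ version of the dataset is open.
* Part 1 saves a small_ version that keeps only the listed variables.
SAVE OUTFILE='C:\Users\yourname\Documents\PSC2019_SPSSwork\small_survey.sav'
  /KEEP=country Var1 Var2
  /COMPRESSED.
* Part 2 opens the small_ version so the changes below apply to it and not to the copy_ version.
GET FILE='C:\Users\yourname\Documents\PSC2019_SPSSwork\small_survey.sav'.
* Part 3 keeps only the respondents of interest (here, country code 17) and deletes the rest.
FILTER OFF.
USE ALL.
SELECT IF (country = 17).
EXECUTE.
* Part 4 saves the small_ version again, now with fewer observations.
SAVE OUTFILE='C:\Users\yourname\Documents\PSC2019_SPSSwork\small_survey.sav'
  /COMPRESSED.
```

Because every step is in one file, adding a variable later only requires editing the /KEEP= line and re-running the whole file from within the copy_ dataset.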



If all of this seems like a lot of work, you don't have to automate things as I suggest above and in the screencast. Here's a faster way:

(1) Make a copy of the small_ version of your dataset. If you accidentally delete the wrong observations, you want to have a backup. You will not need this backup unless you delete the wrong observations and then save over the small_ version of the dataset, too.

(2) Now, open the small_ version of your dataset that has most of the variables removed.

(3) Create and run the few lines of code needed to remove observations from the small_ dataset. To do so, go to Data -> Select Cases -> Select if. Then identify the conditions that need to be met for an observation to be kept. In the screencast, observations are kept if (partyID7 = 1 OR partyID7 = 2 OR partyID7 = 3 OR partyID7 > 4). If all looks good, select the button telling SPSS that you will delete unused cases AND paste this code into syntax.

(4) Run your code and then use a frequency command to look at who is in your dataset. If you have removed all and only the observations you wanted to remove, save the edited small_ dataset, replacing your previous version. This version has fewer variables and fewer observations than the copy_ version of the original dataset.

(5) If you remove observations this way, make sure to include annotations at the start of your new syntax to remind your future self what this block of code does.

(6) If you need to add additional variables to your small_ dataset in the future, remember that you will need to run this second syntax file to remove the relevant observations and save the file again.

Here's an example of what your syntax would look like if you wanted to include only respondents living in France, the UK, and the US (assuming the variable "country" is coded 17, 2, and 4 for French, British, and American respondents, respectively):

*This code will drop any respondent who is not from one of those three countries.
FILTER OFF.
USE ALL.
SELECT IF (country = 17 or country = 2 or country = 4).
EXECUTE.

*Here is a second example, but this time for the select line, let's use "AND" to tell SPSS to keep observations only if they meet two conditions (here, they are male Republicans).
SELECT IF (Party=2) AND (Male=1).

If we wanted to keep only males who were Republicans or Democrats (but not independents or other responses), the select line would look something like:
SELECT IF (Male=1) AND ((Party=1) OR (Party=2)).
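SPSS evaluates AND before OR, so how the conditions are grouped matters whenever the two are mixed in one SELECT IF line. One way to sidestep the grouping question is SPSS's ANY function, which is true when the first argument matches any of the values listed after it. A minimal sketch, assuming Party is coded 1 for Democrats and 2 for Republicans:

```spss
* Keep males who are Democrats (Party=1) or Republicans (Party=2).
SELECT IF (Male = 1) AND ANY(Party, 1, 2).
EXECUTE.
```

The same function shortens long OR chains; for instance, the three-country example could be written as SELECT IF ANY(country, 17, 2, 4).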

Important: If we have a key variable in our study and many respondents were not asked the relevant question, we want to delete all observations in the dataset that have "missing" data for that variable. If we don't do this, descriptive statistics for all of the other variables will analyze lots of respondents who will not be included in bivariate and multivariate analyses. In this example, I want to remove all respondents who did not answer a question about whether they support making public colleges free to attend because the relevant question was presented to only half of the people taking the survey:

*This code drops any respondent who has missing data on the variable "free_college_good".
FILTER OFF.
USE ALL.
SELECT IF (NOT MISSING(free_college_good)).
EXECUTE.


On Wednesday (10/16), you will practice what you have learned about SPSS so far by starting to complete a Blackboard assignment in class. 

  • SPSS assignment #1 will require you to locate and download an original PRRI dataset, make a copy_ version of that dataset, and then make a small_ version that drops certain variables and observations. You also will need to split the dataset into groups and practice running descriptives and frequency commands. The assignment will be written so that most students should be able to complete their work in class; however, I will post it after class on Wednesday in case you want to get a head start.

  • Barring unusual circumstances, SPSS assignment #1 (Blackboard) must be completed by 5pm Friday (10/18). If you have a unique situation related to how SPSS is working on your computer, please contact Dr. Setzler after you have carefully tried to address the issue yourself and carefully reviewed the instructions/screencast links above. It is better for all involved for you to submit a late assignment (that will receive a modest penalty) rather than one that is partially completed.


Topic 9 (10/18 and 10/21) Recoding and creating variables; labeling variables and their response categories

On Friday and Monday, we will learn how to create variables in SPSS. You will be completing an assignment on this topic on BlackBoard. It will be due at the end of Week 10.

  • Ahead of Friday's class, read Chapter 7 in your textbook up to the section on descriptive statistics. This reading is just five pages; it will help you understand what we will be doing this week and why.

  • Also before class, read the rest of this document, starting at the section "SPSS Basics 5: Recoding Data." This is the first handout of three documents that your instructor has written to cover all of the SPSS coding and statistical methods we use in PSC 2019 and PSC 4099. The handout:

    • Explains why researchers typically recode or create every variable in their analyses rather than working with a dataset's variables in their original form. 

    • Summarizes everything we have done with SPSS so far, which mostly has involved the process of finding and downloading data and, if necessary, importing the data into SPSS, removing variables, and removing observations not related to our study in any way.

    • Explains in detail how to recode an original variable. You will find this information helpful when you are completing the Blackboard assignment at the end of the week.


One of the main topics in this block of material is how to use SPSS's RECODE INTO command. Note: everything that is discussed below in this section will be taught in class. You have no assigned readings on this block of material other than reading through the points that are listed below. Any linked screencasts are optional; I record screencasts that duplicate class material because some students prefer visual summaries versus those I post in the course schedule.

  • The RECODE INTO command is your go-to method when you are working with only one original variable and want to collapse its categories (perhaps into a dummy variable, like making the 0-1 variable Republican out of the original variable where Republican is one of several partisan groups coded onto the same variable). RECODE INTO is also a command we can use to reverse code a variable's values (e.g., using Conservative7 to make the new variable Liberal7). Here are the key steps to recoding a variable in SPSS:

(1) First,  use SPSS to run a CODEBOOK command on the original variable, Conservative7. You need to do this in syntax. It will show you how the variable is numbered and labeled:

   CODEBOOK Conservative7.

(2) Next, if it is available, it is helpful to paste the original variable's exact question phrasing into your syntax so that you can easily recall later on exactly how the question was worded.

You can write or copy and paste annotations into the syntax right before the RECODE INTO command. If you put an asterisk in front of an annotation with question phrasing pasted from the questionnaire, it will be greyed out so you can see it, but SPSS won't stall when it gets to that part of the syntax. SPSS will keep greying out language as long as there are no blank lines AND you don't end a line with a period and press Return. So, when you are ready to end the annotation, type a period and press Return.

(3) Start to recode the variable by using SPSS's point-and-click interface, going to Transform -> Recode into Different Variables. You want to use the RECODE INTO command, which preserves all aspects of the original variable in case you make any recoding mistakes.

For this example, you will need to tell SPSS that you are going to transform the original variable Conservative7 into a new variable named Liberal7.

In the same screen, you will provide a label for the new variable. Variable names cannot have any spaces. Make sure to label new variables in a way that makes sense and is specific enough that you will remember what a higher value is (e.g., don't name and label a variable "Gender" if you are creating a dummy variable for people who identify as male because you may forget who is coded 1 or 0 later on).

After you label your variable, click the button "Change." Nothing will seem to happen, but the new variable will have that label when it is created later.

(4) Now, click the button for "Old and New values." This is the area where you do all of the actual recoding. Make sure that you think through how you need to recode. You need to recode every value in the original variable into some value (or system missing) in the new variable.

(5) It is best practice in recoding to start off by indicating that any value that was "system or user missing" in the old variable should be coded as "system missing" in the new variable. Choose those options and click on the add button.

(6) Now you need to recode the values. Because this is a 7-point variable, enter 7 in the old variable = 1 in the new one, and click on the add button. Then enter 6=2 and click "add," and so on until all seven of the values are reversed for the new variable.

(7) Once you have made sure to include instructions that will recode ALL of the original values into the new variable, click on continue and then paste. If you accidentally click on the OK button, just point and click your way back through the transform command where all of the data you just entered will still be there, and this time click on paste.

(8) To create the new variable, you have to select and run the code you just pasted into syntax (run with the green arrow on the menu). Whenever you create a new variable, do what you can to make sure you did the recoding correctly. To do this, run a frequency for the original and new variable and make sure that everything looks right. If you didn't code the variable correctly, look at your syntax and see what went wrong. If you need to edit the syntax, you can run the whole RECODE INTO command over again, and it will recreate the variable (this is a cool improvement over older versions of SPSS, which made you delete a variable before you could recreate it).

(9) Whenever you make a new variable, you need to label both it and the variable's response categories. The fastest way to label variables is to do so in syntax. If you are using RECODE INTO, you already will have created the code to label your variable back in step 3. However, you also will need to add value labels to any new variable you create. Here's what that code looks like if we want to label just the anchors on our 7-point measure of liberalism (if we were working with a dummy, multi-category, or ordinal variable, we would label each value):

    VARIABLE LABELS Liberal7 "How liberal is the respondent?".
    VALUE LABELS Liberal7
    1 "Very conservative"
    7 "Very liberal" .

Notice that there are two periods for the two label-related commands. There is only a single period at the very end of each command, and it goes outside of any quotation marks.

Note that if you want to change the variable label or any of its value labels, you can make the edits and just rerun the commands. This is one of the big advantages to working in syntax.

Finally, notice that once you have created the recode+relabel syntax for a variable or two, you can copy and paste that syntax and quickly swap out values and phrasing to very quickly create a whole bunch of similar variables.
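
To tie the steps above together, here is a sketch of roughly what the finished syntax could look like for the Conservative7 to Liberal7 example. It assumes the original item runs from 1 to 7 with higher values meaning more conservative; check your own CODEBOOK output before borrowing the numbers. The second RECODE shows the category-collapsing case. It assumes an original variable named Party on which Republicans are coded 2, and the new variable name Republican and its labels are illustrative.

```spss
* Reverse code the 7-point ideology item into a new variable.
RECODE Conservative7 (MISSING=SYSMIS) (7=1) (6=2) (5=3) (4=4) (3=5) (2=6) (1=7) INTO Liberal7.
VARIABLE LABELS Liberal7 "How liberal is the respondent?".
VALUE LABELS Liberal7
    1 "Very conservative"
    7 "Very liberal"
    .

* Collapse a multi-category party item into a 0-1 dummy.
* MISSING is listed before ELSE so missing answers do not become zeroes.
RECODE Party (MISSING=SYSMIS) (2=1) (ELSE=0) INTO Republican.
VARIABLE LABELS Republican "Respondent identifies as a Republican".
VALUE LABELS Republican
    1 "Republican"
    0 "Not a Republican"
    .
EXECUTE.

* Check the new variables against the originals.
FREQUENCIES VARIABLES=Conservative7 Liberal7 Party Republican.
```

SPSS reads the recode instructions from left to right and applies the first one that matches, which is why the missing-data instruction comes first in both commands.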


We also will review the other technique most commonly used to change and create new variables in SPSS: Using a combination of COMPUTE and IF commands in syntax. Note: everything that is discussed below in this section will be taught in class. You have no assigned readings on this block of material other than reading through the points that are listed below. Any linked screencasts are optional; I record screencasts that duplicate class material because some students prefer visual summaries versus those I post in the course schedule.

  • If you prefer, you can use the COMPUTE command using SPSS's point-click-paste commands (see the Transform menu). However, it is a lot clearer and easier to write these commands directly into syntax.

  • You can use COMPUTE + IF commands to recode variables instead of using RECODE INTO. If you are making many similarly coded new variables, either type of SPSS command works well. To get you acquainted with using Compute and If statements, we will practice using them to recode a few variables in class. This technique will also be taught/reviewed as part of your BlackBoard assignment. After this week's classes, if you need a refresher, here is an optional screencast that goes over the technique (under 9 min if you listen at full speed). One important thing: After I recorded the screencast, SPSS changed how you deal with missing data on the original item. See the highlighted code below:

  • COMPUTE + IF commands are what you use when you want to create a new variable that combines information from two or more original variables (For example, we might want to create a dummy variable for White Republicans).

After class, if you need a refresher on what we practiced, here is an optional screencast that goes over the technique (9 min or so if you listen at full speed).

Here is a summary of what is covered in the screencast, only it uses an example (a dog's breed and color) that may be easier to remember:

Let's say we have a sample of dog owners who all own one dog. They have taken a survey asking questions about their dog. We want to create a hypothetical new dummy variable identifying the owners of white poodles. Our new dummy variable, WhitePoodle, will be created with information from two original variables: dog_breed and dog_color.

A CODEBOOK command indicates the following:

*dog_breed = 12 if dog is a poodle, while values of 1 through 55 refer to other dog breeds.

*dog_color = 1 if a dog is white, while values 2 through 10, refer to black, brown, grey, spotted brown, etc.

*If a respondent didn't know or refused to give information about their dog, the relevant variables were coded 99 in the dataset.

Here is the block of code we could use to create the dummy variable:

COMPUTE WhitePoodle = $SYSMIS. 
IF (dog_breed < 99) AND (dog_color < 99)  WhitePoodle =0.
IF (dog_breed = 12) AND (dog_color = 1)  WhitePoodle =1. 

*and we would need to deal with any missing data on the original variables if there were any.
IF MISSING(dog_breed) OR MISSING(dog_color) WhitePoodle =$SYSMIS. 
 
  VARIABLE LABELS WhitePoodle "Dog is a white poodle".
  VALUE LABELS WhitePoodle
        1 "White poodle"
        0 "Not a white poodle" .
*Note where the periods are.

Note that COMPUTE, IF, VARIABLE LABELS, and VALUE LABELS are all commands, so one--and only one--period goes at the end of each full command even if that command stretches over more than one line of code.

Here is the logic behind how the code was written, step by step:

Step 1
To see how the responses for dog_breed and dog_color are coded and labeled in the dataset, we run a codebook command:
    CODEBOOK dog_breed dog_color.

Step 2
Now, tell SPSS to create a new variable where all of the values are blank (i.e., are "system missing"):
    COMPUTE WhitePoodle = $SYSMIS.
If we were to look at the data set in the SPSS data viewer (Data View tab), we would see a new column for a variable named "WhitePoodle." All of its values would be blank.

Step 3
Tell SPSS to turn almost all of those blank values into a zero if the observation should not be system missing:
    IF (dog_breed < 99) AND (dog_color <99)  WhitePoodle =0.
Notice this step changes the blank to a zero (meaning not in our group) for every respondent who didn't say don't know. Now, if we were to look at the data set in the SPSS data viewer (Data View tab), we would see most of the values for "WhitePoodle" are zero.


Step 4
Now, tell SPSS to turn some of the respondent's values into a 1 if certain conditions are met:
    IF (dog_breed = 12) AND (dog_color = 1)  WhitePoodle =1.
This step turns all of the zeroes into ones for people who have white poodles.

Notice that the order of commands matters! If we had done Step 4 before we did Step 3, everyone who didn't refuse to answer the questions would be coded as not having a white poodle.


Step 5
Now, make sure to tell SPSS to turn respondents' values into missing data if they had missing data on either of the original items:
    IF MISSING(dog_breed) OR MISSING(dog_color) WhitePoodle =$SYSMIS.   

Step 6
Now, tell SPSS to label the new variable and its response categories. For example:
    VARIABLE LABELS WhitePoodle "Dog is a white poodle".
    VALUE LABELS WhitePoodle
    1 "White poodle"
    0 "Not a white poodle" .

Step 7
Run a frequency on the old and new variables to verify that the general pattern and number of observations look right. If you made a mistake, read carefully through the code. If you make edits, you can run the block of code over again; SPSS will delete the variable and replace it with the edited version.
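
One possible way to carry out the Step 7 check in syntax is sketched below. It uses only the variables from the example above; the crosstab is an optional extra that shows, for each coat color, how many dogs ended up coded 0 or 1.

```spss
* Compare the original items with the new dummy variable.
FREQUENCIES VARIABLES=dog_breed dog_color WhitePoodle.

* Optional: see how each dog color was coded on the new variable.
CROSSTABS /TABLES=dog_color BY WhitePoodle.
```

In the crosstab, every dog coded 1 on WhitePoodle should fall in the row for white dogs, and the number of cases missing on WhitePoodle should match the number of respondents who were missing or coded 99 on either original item.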

  • COMPUTE + IF commands also are the best way to create a couple of other types of variables that researchers frequently use, including reverse coding variables with lots of response categories. While you can use RECODE INTO to reverse code a 10-point measure, it is a lot faster to use code like this:

COMPUTE New_Variable10 = 11 - Old_Variable10.
IF SYSMIS(Old_Variable10)   New_Variable10 = $SYSMIS.
IF (Old_Variable10 = 99)   New_Variable10 = $SYSMIS.
IF MISSING(Old_Variable10)   New_Variable10 = $SYSMIS.
*Notice the COMPUTE line's logic. We're telling SPSS to create a new variable by subtracting the original variable's value from 11, which is one higher than the maximum value. This means a 10 on the old variable will be a one on the new variable while a 1 will become a 10, and so on.

*Notice also that we have to tell SPSS which values should be coded as missing on the new variable. We use the SYSMIS() and MISSING() functions to do this because SPSS never treats a comparison like "= $SYSMIS" as true. Here, we're assuming 99 on the original variable means that the respondent refused to answer or said they "didn't know".

After the week's class meetings, if you want to review more details on reverse coding a variable using COMPUTE + IF commands, here is an optional screencast on the topic. It runs a bit over 10 minutes, but only the first 6:30 covers the reverse coding. In the sample code directly above, you'll see that I do something that I didn't do in the screencast, but that is important; I added a line of code to make sure that any missing responses on the original item carry over to the new item:
IF MISSING(Old_Variable10)   New_Variable10 = $SYSMIS.


I recorded the rest of the screencast because it shows your instructor making a very common mistake when adding variables (initially, both the VARIABLE LABELS and VALUE LABELS commands neglected to tell SPSS what variable needed to be labeled).


In the likely event that you need help trouble-shooting issues that come up with labeling new variables and their response categories, here is a list of common issues:

  • Building on the example from above, we might label our new variable:
    VARIABLE LABELS WhitePoodle "Dog is a white poodle".
    EXECUTE.

What to check if you can't get SPSS to create a new variable's label:

  1. Note that the VARIABLE LABELS command is plural even when you are just labeling one variable. If they made your instructor king for a day, the command would run just fine with or without that last S; however, his promotion to regality does not appear to be forthcoming.

  2. Note where the quotation marks go. SPSS accepts single quotation marks, but your instructor always uses double quotation marks, which allows him to “Put single ‘marks’ inside of a label.”

  3. Notice also that each line begins with a command and ends with the required period. There is only one period per full command.

  4. Notice that you have to tell SPSS which variable you want to label. In this case: WhitePoodle

  5. The EXECUTE command tells SPSS to go ahead and do this work now rather than waiting until another command is run. This is a leftover “feature” from when computers used to be SLOW and no one wanted to sit around waiting for the program to label variables until those variables were actually going to be used in some type of analysis.

  6. Finally, if you don't select and run the full block of code, the variable won't be labeled.

  • Building on the example from above, we would need to label the new variable's response categories:

     VALUE LABELS WhitePoodle
      1 "White poodle"
      0 "Not a white poodle"
     .
     EXECUTE.

What to check if you can't get SPSS to create the response labels:

  1. You can't have blank lines in the middle of the command.

  2. Folks accidentally substitute VARIABLE LABELS from above instead of using the VALUE LABELS command.

  3. Folks labeling a variable try to use VALUE LABEL without the final “S,” in which case SPSS doesn’t recognize the command.

  4. Folks forget to put at least one space between each number and its label.

  5. Folks add an equal sign (e.g. 1 = “yes”) or forget the quotation marks for each label.

  6. Folks add an extra period after WhitePoodle, despite being mid-command.

  7. Folks forget to include the period after the last label. Your instructor typically puts the period on a separate line so it is easier to copy and repurpose old code with more or fewer categories without forgetting to add the required period after the last label.

  8. Note that the EXECUTE command and its period need to be there.

  9. And, finally, sometimes people create all of the code correctly but forget to select and run it.


Week 10

In Monday's class, we will finish up the workshop on coding variables and you may have some lab time to work on an SPSS assignment on BlackBoard. SPSS Assignment #2 will be due at the end of the week (i.e., by Friday evening).

Note: As of Monday's class, we are two weeks out from your Unit 2 test. Make sure to take a look at the study guide if you have not been following along all unit!


Topic 10 (Wednesday, October 23)—Descriptive statistics for a variable: Frequencies, medians, means, and standard deviations
(aka, just enough univariate stats to make sure that your variable coding has been done correctly and that you can describe a variable's central tendency and its distribution)

  • Read closely this appendix from a best-selling book targeting a non-specialist, general audience (Richard J. Herrnstein and Charles Murray, "Statistics for People Who Are Sure They Can’t Learn Statistics," 12pp).

  • Read closely Chapter 7, "Getting Started with Quantitative Data: Descriptive Statistics." in Carolyn Forestiere's textbook. After the classes that meet on the substance of this chapter, you should be able to explain, apply, and give an example of every concept in the glossary.

  • Make sure to review the summary of the research example by Ceka and Magalhaes (you do not need to scrutinize the summary of Park and Shin's article). In addition to paying attention to how different variables were coded in the study, review the questions in the study guide, which ask you to think about this study's research design. Specifically, see if you can identify the theory, dependent variable, independent variable, intervening variable (which is assessed with three country-level measures), and controls.

  • For each of the statistical techniques we cover in this class, your instructor will summarize the key ideas that you need to remember in a block of material, like the one you see immediately below. Review these blocks of material carefully until you understand every point well. The reason that I am summarizing this material in the schedule is so that you can go to one place to review every major statistical concept. These same ideas are covered in class, in PPTs, and (for the most part) in assigned readings.

    After class and completing your assigned readings, make sure that you feel comfortable with your understanding of a number of the basic concepts and methods that researchers use to explore the "central tendency" and distribution of different types of variables. These concepts will come up continuously for the rest of the term. Here are some big takeaways that you will want to remember:

    • The type of variable you are analyzing matters. Means, medians, and standard deviations communicate valuable information about interval (aka "continuous") variables. With an interval variable, every one-unit increase is assumed to be equal in influence. When that is not the case, interval variables are often modified so that a one-unit increase is roughly equal in importance. For example, researchers typically substitute years of formal schooling with an interval variable that looks something like this: 1=no high school degree; 2=high school degree; 3=some college or 2-year degree; 4=4-year college degree; 5=Master's degree; 6=Doctoral degree.

    • Standard deviations are a widely used measure (in all fields) to look at the distribution of an interval variable across its range of values.

A straightforward way to explain a standard deviation (SD) is to say that it is the measure of how far away from a variable's mean most respondents' answers are scattered. If a variable is normally distributed, about two-thirds of all cases will be within one standard deviation of the mean; that is, roughly a third of cases fall between the mean and one standard deviation below it, and another third fall between the mean and one standard deviation above it. Almost all respondents fall within two standard deviations below or above the mean. Specifically, about 95% of observations in a normal distribution fall within two standard deviations. An observation that is three standard deviations below or above the mean would be an extreme outlier because about 99.7% of observations fall within three standard deviations (i.e., it would be in roughly the bottom or top 0.15% of cases).

Standard deviations are useful for comparing the distributions of two populations on the same variable. So, if we have two cities with the same mean and median income but standard deviations that differ by tens of thousands of dollars, the city with the higher standard deviation also has a higher degree of inequality. In short, a small standard deviation means that most observations are clustered around the mean, while a larger standard deviation means that more observations are further away from the mean.

Standard deviations also are useful for comparing observations on two different variables. For example, we might say that someone whose LSAT (law school admissions test) score was 1.7 standard deviations above the mean is a better test taker than someone whose score on the GRE (the exam many graduate schools require) was 1.5 standard deviations above the mean.
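
In SPSS, one way to make this kind of comparison is to convert each variable into standardized scores (z-scores), which report how many standard deviations each respondent is above or below the mean. The sketch below assumes a hypothetical dataset with the variables lsat_score and gre_score; neither comes from a class dataset.

```spss
* Save standardized (z-score) versions of two test-score variables.
* By default, SPSS names the new variables with a Z prefix.
DESCRIPTIVES VARIABLES=lsat_score gre_score /SAVE.

* Look at the first ten respondents' raw and standardized scores.
LIST VARIABLES=lsat_score Zlsat_score gre_score Zgre_score /CASES=FROM 1 TO 10.
```

A respondent with a value of 1.7 on Zlsat_score scored 1.7 standard deviations above the sample's mean LSAT score, which is what allows a comparison with a score from a test that uses a completely different point scale.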

Why use standard deviation statistics when you could use percentiles? The reason why social scientists, economists, financial planners, and lots of other professionals use standard deviations to examine distributions of data is because just looking at percentiles can be misleading. For example, someone with an LSAT score of 150 (out of 180 possible points) is in the 39th percentile of takers. Just a 3-point higher score puts a test-taker in the 50th percentile, and a score a few points higher than that (a 157) puts the taker in the 61st percentile. So, at the center of the distribution, a gain of about 6 points in the LSAT score corresponds to a gain of roughly 20 percentile points. In contrast, going from a score of 170 (96th percentile) to a 176 increases a test-taker's percentile by just three points (to the 99th). Put another way, a three-point higher or lower score changes a test-taker's percentile rank far more near the middle of the distribution than it does near the top.

    • When an interval variable's distribution is highly skewed, researchers typically cap the highest or lowest values so that the mean and the median are closer and so that the mean doesn't distort what the typical respondent looks like. For example, income in the US has a large, positive skew because a relatively small number of individuals earn a tremendous amount of money compared to everyone else. This means that the mean income in the US is a lot higher than the median unless we cap the top level of income by, let's say, creating a top-income category of $200K or more. When we truncate the top income value by grouping folks like Bill Gates, Donald Trump and Jeff Bezos together with much less affluent, but still quite wealthy people, the average income shifts back closer to the median income, and the standard deviation decreases. Most research on income uses medians rather than means; however, most statistical analyses incorporate means, so researchers frequently need to modify their original variables to address these kinds of skews.

    • The means and standard deviations of a categorical (aka nominal) variable are useless because the values assigned to a given category are arbitrary for this type of variable. We can use a frequency table or chart to explore the distribution of a categorical variable, but for statistical analyses, this type of variable typically is recoded and analyzed as a series of separate dummy variables. For example, if you have a party variable with three categories, a mean of 1.3 doesn't mean anything, so you'd want to create three dummy variables, one for each party. Remember, a dummy variable is coded 1 if a respondent is in the group and 0 if they are not. If your analysis would benefit from also summarizing the distribution of a categorical variable, you will want to make a separate table or figure reporting its "frequencies" (in percentages).

    • Remember how to interpret the means of dummy (aka "binary," "dichotomous," and "0/1") variables. By convention, we report the standard deviation for each dummy variable, too, even though the SD information for a dummy variable is not useful. By itself, the mean value for a dummy variable reports its distribution in your sample; e.g., a value of .37 for the dummy variable Democrat indicates that 37% of the sample identifies as a Democrat.

    • What do researchers do to summarize the main characteristics of an ordinal variable? While there are statistical techniques designed for independent and dependent ordinal variables, most analyses recode ordinal variables into a dummy variable or treat them as an interval variable. The most common type of ordinal variable is a Likert scale.
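
The takeaways above can be sketched in SPSS syntax along the following lines. This is an illustration under stated assumptions, not code from a class assignment: Party, Democrat, and hh_income are hypothetical variable names, hh_income is assumed to be recorded in dollars, and the $200K cap simply mirrors the example in the text.

```spss
* Categorical variable: report percentages, not a mean.
FREQUENCIES VARIABLES=Party.

* Dummy variable: the mean is the share of the sample coded 1.
DESCRIPTIVES VARIABLES=Democrat /STATISTICS=MEAN STDDEV.

* Interval variable: compare the mean and median to spot skew.
FREQUENCIES VARIABLES=hh_income /FORMAT=NOTABLE
    /STATISTICS=MEAN MEDIAN STDDEV SKEWNESS.

* Cap a skewed variable at 200000 in a new variable.
RECODE hh_income (MISSING=SYSMIS) (200000 THRU HI=200000) (ELSE=COPY) INTO hh_income_capped.
VARIABLE LABELS hh_income_capped "Household income, capped at $200K".
EXECUTE.

* The capped version should have a lower mean and standard deviation.
DESCRIPTIVES VARIABLES=hh_income hh_income_capped /STATISTICS=MEAN STDDEV.
```

The FORMAT=NOTABLE option suppresses the long frequency table that an income variable would otherwise produce, leaving only the summary statistics.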


Topic 11 (Friday, Oct. 25) Computing, interpreting, and comparing descriptive statistics for more than one variable in SPSS.

  • Remember that SPSS Assignment #2 on BlackBoard is due by 5pm. If you have a unique situation related to how SPSS is working on your computer, please contact Dr. Setzler after you have carefully tried to address the issue yourself and carefully reviewed the instructions/screencast links above. It is better for all involved for you to submit a late assignment (that will receive a modest penalty) rather than one that is partially completed.

  • Ahead of class, print out this handout--the third of three that your instructor has written to cover all of the SPSS and statistical methods we use in PSC 2019 and PSC 4099.

  • Ahead of class, reread five pages of Mark Setzler, "Did Brazilians Vote for Jair Bolsonaro Because They Share his Most Controversial Views?" Specifically, print out and review:

    • Page 1, the study's abstract

    • Pages 4-8, which explains how the study's variables were coded and includes two figures

You are being asked to review these materials (again) because they provide an example of how survey dataset variables are recoded so that researchers' hypotheses can be tested. The two figures provide examples of how data-splits and frequency results can be used to visually test arguments. We will be doing this kind of work in class.

  • In class, we will learn about and use SPSS to practice one of the techniques researchers use to examine and visually demonstrate whether two variables are associated with one another. Specifically, we will be splitting a dataset by independent variables and then computing means or frequencies for dependent variables.

You soon will be reading Chapter 8 in Forestiere's textbook, which covers the topic of "Bivariate Analysis." While the statistical tests and the methods discussed in that chapter are very useful, usually the best way to begin analyzing whether there may be a relationship between two or more variables is to see how variations in one or more independent variables correspond to different means or proportions for a dependent variable. This procedure looks only at the data that are in our survey and does not involve any statistical tests to determine whether any differences between groups we see in our sample also exist in the larger population.

For example, if we think that there probably is a relationship between gender and a person's income, we can use SPSS's command Data -> split file -> compare groups (and select the variable gender, which in this example we have coded into a binary measure of males and non-males). Once we have "split the file," we will see statistical results for males and then non-males every time we run a statistical procedure. Each time, we will get two sets of output because there are the two values for this variable. So, if we now run a descriptives command for our variable measuring household income, the results will tell us what the mean income is for males and what it is for non-males.

If we have several independent variables (e.g., perhaps we want to see how household incomes vary for women and men, Republicans and Democrats, and people who are 40 or younger, 41-65, and over 65 years of age), we can repeatedly split our data by the relevant variable and then run descriptives commands. We can examine the different means in a table, but for a presentation or paper that considers the differences for several different groups, it will look best if we compile our results in a chart/figure.
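
If you paste the split-file steps from the menus into syntax, the result should look roughly like the sketch below. The variable names Male and hh_income are assumptions made for illustration; substitute the names in your own dataset. SPSS needs the cases sorted by the grouping variable before it splits the file, and "compare groups" corresponds to the LAYERED keyword.

```spss
* Sort and split the file by the independent variable.
SORT CASES BY Male.
SPLIT FILE LAYERED BY Male.

* Every procedure now reports results separately for each group.
DESCRIPTIVES VARIABLES=hh_income /STATISTICS=MEAN STDDEV MIN MAX.

* Turn the split off before splitting by a different variable.
SPLIT FILE OFF.
```

Forgetting the SPLIT FILE OFF line is a common source of confusion because every later table will keep reporting separate results for each group until the split is turned off.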

Week 11

Topic 12 (Monday, 10/28) Comparing means and frequencies in bar charts

  • In class, we will be visually summarizing frequencies and descriptive statistics results in spreadsheet bar charts. You have no assigned readings for this class. We will be using statistical results to make the kind of bar charts you see in the Brazilian election article you were asked to reread parts of for Topic 11.

    After class, if you need additional guidance on creating Excel charts to visually summarize SPSS results, you have the option of watching this screencast (https://youtu.be/T6kHpZ2oReQ). It shows you how splitting data and calculating the means for several different variables will allow you to make a nice-looking chart in Excel to show your results. Making this kind of a chart is a task you will need to do for your next BlackBoard SPSS Assignment.


SPSS test 1 (Wednesday, 10/30). The test will be taken in class on BlackBoard. The test will require you to use SPSS to drop some observations based on how people answered a specific question. You also will be asked to create a smaller version of a dataset by dropping a couple of variables. And you will be asked to recode and create a few variables, including new variables that draw information on two or more other variables (e.g., to create a white, female, Democrat dummy variable). You will show that you have created and labeled new variables correctly with frequency tables. You will also be asked to generate and interpret descriptive statistics for a variable and its subgroups (i.e., to split the variable into groups and run frequencies or descriptives). The test will make up 5% of your final course grade.


Unit 2 test: (Friday, 11/1). The topics covered on the exam are noted in the study guide (i.e., the Focus Questions handout that has been in the PPTs folder since the start of the unit). You will not be using SPSS or interpreting any SPSS output as part of this exam. You will be asked conceptual questions related to the use of SPSS and other statistical software.