RESEARCH METHODS IN POLITICAL SCIENCE





Please note: At your instructor's discretion, there may be minor alterations to the reading assignments listed below. One of the major advantages of providing you with an online readings archive is that timely articles can be added or substituted when appropriate. Opening documents downloaded from this website will require that your computer have Acrobat Reader. You will also need the class-specific password to open individual files.

UNIT 2 ASSIGNMENT SCHEDULE


Survey-related materials were introduced before Exam 1, but not covered on it. They will be covered on the Unit 2 test:


Topic 5—Surveys and samples: studying a relatively small number of people to make accurate inferences about much larger "universes"

For Wednesday (9/17):

  • Read the rest of Chapter 5 in Carolyn Forestiere's Beginning Research in Political Science (i.e., read only the several-page section on Sampling and Generalization, including its sub-sections).

  • Read the very short Chapter 6 in Carolyn Forestiere's book. Focus on what makes a study measure (i.e., a variable, which may also be called an "indicator") reliable and valid.


For Friday (9/19): It may seem like a lot of readings are listed here, but these collectively add up to less reading than a textbook chapter. Each piece is the equivalent of a short textbook chapter section. Use the focus questions to guide your reading.

    • I assign this short document so you can see best practices in survey composition. The ATP is a meticulously created stratified random sample panel of respondents who are compensated to stay in the survey over time. 

    • The document explains how this sample was designed to ensure that every American had the same probability of being asked to participate. What type of random sample is the ATP (hint: the full panel is a stratified random sample with some oversampling), and what does that mean? How were individuals contacted to participate? How big is the sample? What was the margin of error for the sample (at a 95% confidence interval)?

    • What parameters were used to ensure that this sample closely matches the characteristics of the nation as a whole?

    • What did Pew do to incentivize selected individuals to participate? What percentage of contacted people take the survey?

    • Which subgroups were over-sampled? Specifically, Pew invited extra individuals from several subgroups, either because their members are less likely than other individuals to participate in surveys or because Pew wanted researchers to have access to subgroup samples that more accurately reflect those subgroups overall.

    • How do Pew analyses of their surveys deal with the fact that even the most carefully created random samples typically do not mirror their target population exactly, because different types of people are more or less likely to accept invitations to participate? To deal with this issue, researchers typically apply post-hoc (i.e., after-the-fact) weights that use parameters from the Census or other massive, high-quality surveys. Specifically, researchers' data analyses typically under-weight or over-weight the responses of different types of individuals based on whether there were too few or too many respondents of each type in the sample relative to their share of the larger population. In Pew's case, they used population parameters to determine who would be invited to participate in their survey and--because they purposefully oversampled some groups--they also use post-hoc weights.


Week 6

Monday, September 22: In-class Exam 1. 

Topic 6 (Wednesday and Friday, Sept. 24 and 26, after the test): Is political polling getting better over time? Can we still trust surveys to determine what Americans think about contentious issues?

  • Stef W. Kight and Margaret Talev: "Study: What Americans really think" (Axios 2022, 3pp.) This article uses the phrase "self silencing" to refer to what social scientists call "social desirability bias." What is a list experiment? How does it tackle the problem of this type of bias? While we know that people don't always admit their prejudice against other groups, what does this article (and data in class PPTs) say about how widespread social desirability bias is?

The next several articles discuss how political polling has improved over the last several elections. It looks like a lot of reading, but together it adds up to less than 15 pages.

  • Geoffrey Skelley, "Why Was The National Polling Environment So Off In 2020?" (FiveThirtyEight, 2021, 3-4pp). How far off are polls from estimating actual turnout and vote choice in the typical election? What is non-response bias? This article doesn't talk about it, but what are some of the factors that might cause the supporters of Donald Trump to be more or less likely to take a survey than Democrats, independents, and even some Republicans who don't support him?

  • Andrew Fischer, "Polls Were Great in 2022. Can They Repeat Their Success?" (NY Times, 2pp). What are polls doing to get better? How has adjusting polls to make sure that they include a proportionally accurate share of whites and non-white individuals without a college education, as well as respondents who voted for Trump in the previous election, improved polls' predictions?

  • You might be interested in reading more (i.e., this is optional, but only because it is so detailed) about how accurate political polling typically is and whether there is a political bias to polling (when large numbers of surveys are combined, there typically isn't, especially if you weight for "house effects," which fivethirtyeight.com does): https://fivethirtyeight.com/features/2022-election-polling-accuracy/



Week 7
Topic 7—An introduction to some important tools of the trade: Statistical analysis programs, scholarly research resources, and survey datasets

Monday (9/29): Downloading and configuring SPSS.  In class, we'll likely finish up the last block of material on survey research. You won't have any new readings assigned for the day because I want you to focus on getting a statistical program loaded and configured so that we can start using it this week.

  • Important: Over the weekend, ahead of Monday's class, please try to get SPSS (the statistical analysis program we will be using a lot for the rest of the semester) installed. We will likely use SPSS for the first time on Wednesday.

Ideally, we would all download, install, and configure the software you will need for the rest of the term together in class. However, due to slow/clogged bandwidth, I have found that trying to do this work together does not work well.

  • Download and install SPSS on your personal computer (you need to be working on a Mac or PC). Why do you want to put this program onto your own computer? While you have access to SPSS on some university computers, you will find it much more convenient to run SPSS on your own laptop. HPU has a site license that will allow you to use an individual version of the program on your own computer all semester at no cost.

In the unlikely event that you already have SPSS and a license on your computer from a previous course, uninstall that program first. You want to start with a fresh install and new license so that you have the latest version of SPSS and so your program license doesn't expire during the semester.

To install the program (or reinstall it at some point after class), here are the instructions from HPU's IT office to download a copy of SPSS and then a 6-month license:

How to Install SPSS 30 for Windows
https://highpointuniversity.service-now.com/help?id=kb_article&sys_id=3d24327adbab44d02bd0f1396896195b

How to Install SPSS 30 for Mac
https://highpointuniversity.service-now.com/help?id=kb_article&sys_id=c3e084631bae481015324000cd4bcbeb

For both PCs and Macs, you download the program first and then enter the license code. The fastest way to do this is to open the text file with the license code and copy that user-specific code before you download and install SPSS.

When you install the SPSS program, you will have to activate the license. When you get to this point in the installation process, check the option indicating that you have an "individual user" license. After a couple of clicks, you will paste your license code in (i.e., the one you copied earlier) and click the "add" button. From there, you may have to click through some more OKs to complete this process.

Once you are done installing SPSS, make sure you can open and run it. To do so, you will need to find and open it; it likely won't be on your programs dock/banner initially. For PC users, you can quickly find and open a program by using the search bar on the bottom banner. For Mac users, you can quickly find a program by clicking on the Spotlight icon (magnifying glass) in the top right-hand corner of your screen. To make things faster in the future, you could create an alias for SPSS on your desktop.

Once you have followed all of the directions above--including uninstalling and reinstalling SPSS if you previously had it on your computer--let Dr. Setzler know by email if the program will not open and run. For me to assist, I need to know what is happening: are you getting a specific error message? Also, I will need to know whether you are using a PC or a Mac and what operating system you are using (i.e., Windows 10 or 11 for PC users or one of these for Mac users).

  • Mac users only: You need to quickly complete a set of one-time changes to SPSS for Mac's default settings so that your version of SPSS will look and act just like SPSS for PC. Important: You aren't changing any other program or your OS, just your SPSS settings. Open SPSS for Mac in its default mode, selecting the option to open "a new dataset," which opens a blank data editor window. From there, follow this link for instructions on making the necessary changes. Follow the written instructions, especially those that are highlighted: if you use the options shown in the picture of the options page, you will NOT change what needs to be changed.

Why are Mac users (only) being asked to change the program's defaults? Inexplicably, the default layout of SPSS for Mac looks quite a bit different from the PC version, and the default Mac settings lack SPSS options that are described in most textbooks, my screencast tutorials, and most instructional materials you will find online.


Wednesday and Friday (10/1 and 10/3) will be spent reviewing some of the places you can find survey datasets that can be downloaded at no charge.

    • Take a look at a couple of the most recent stories posted by Pew to get a sense of how the organization tells folks what it is doing with the public opinion surveys it administers. Not every story focuses on public opinion data, so find a couple that do.

    • Next, go to the tab on the top banner labeled "Research Topics," and then select the link for "Full topic list." Take a look at a couple of topics that seem interesting.

    • Now, go to the tab on the top banner labeled "Tools & Resources." Look at the resources for "Survey Question Search." Try a practice search in "International Questions." For example, check the box that will show results only from surveys that have been administered in Brazil and then search this term: polit (the partial-word search will return survey questions that included either political or politics).

    • Now, go back to the tab on the top banner labeled "Tools & Resources." Look at the section labeled "Dataset downloads." This is where you can find and download the public-use datasets noted in many of the Pew articles you see on the website. This is a great place to start if you are looking for data on specific groups or have no idea yet what you'd like to write a research project on. If you are interested in doing work on foreign countries, for example, you'll want to begin looking in the "Global Attitudes" unit.

    • IMPORTANT: Pew typically embargoes the release of new survey data for at least a year so its researchers can publish findings first. This section of the website will be the best place to start looking for a thesis topic using Pew data because the press releases in this section all refer to datasets that already are available. If you are most likely to do a topic on American politics, the ATP (American Trends Panel) data is going to be a good place to start looking for a topic.

  • Before class, quickly familiarize yourself with the work of Pew's research units that focus on specialized topical areas (these are the topic areas Pew uses to organize the datasets in the "Dataset Downloads" section that you looked at earlier). Take a quick look at these Pew Center websites to get a sense of what kinds of topics the various research units at Pew are looking at:

    Pew Research Center for the People and the Press (domestic studies on issues other than those tackled by Pew's special units): https://people-press.org/

    • Data for other Pew research units can be obtained by following this path: Tools and Resources -> Dataset Downloads -> Select Research Area

  • Ahead of class, you do NOT need to review materials at the other research centers listed below, but be aware that there are many other places where you can locate high-quality, publicly available datasets at no cost. See some of these resources at: https://marksetzler.org/generalissues/SurveyDataSources.html. Students in advanced political science classes have used datasets from a wide range of sources.


Week 8: No classes, midterm break!

Week 9

Topic 8—Some SPSS basics and hands-on practice: Preparing survey datasets for analysis

Monday 10/13

This week will have very little out-of-class reading so that you can use this time to review screencasts on what we are covering in class if you think a review is necessary. In class, we will continue getting introduced to SPSS and learn the basics of "data set preparation," which is the process of downloading survey data and then taking, if necessary, a few additional steps to make sure that we can analyze just the data we want.

  • In Monday's class, most of our time will be spent downloading and taking a preliminary look at a 2023 dataset from PRRI (which we will download in class). We will also make sure that we download the survey's methodology and original questionnaire, taking a close look at both.

PRRI is a nonpartisan think tank that annually administers its Values Survey, which includes a large number of interesting topics (for example, whether or not an American thinks their top leaders need to be moral in their private lives to be effective in their public roles). We will be looking at this particular dataset because it covers so many interesting topics. For political science and international relations majors who want to begin thinking about the research design they will submit at the end of the term, this may be a good source of data for your project.

We will discuss how we can use SPSS's Frequency command (Analyze -> Descriptive Statistics -> Frequencies) to see how Americans are divided on various issues.
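Pasted into a syntax window, that menu sequence becomes a one-line command. Here is a sketch, using hypothetical variable names (substitute variables from the PRRI dataset):

FREQUENCIES VARIABLES=partyID free_college_good.

Running the line produces a frequency table for each variable listed after VARIABLES=.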

We will also learn how to use SPSS's "Split File" command [Data -> Split File -> Compare groups (and then identify the variable whose response categories you want to compare)]. For example, we can split our dataset on the partisanship variable and then run a frequency command to see how Democrats, Republicans, etc. differ in how they answered the questions in the survey. This approach is a very quick and simple way to get a first look at how an independent variable (for example, respondents' racial identification) corresponds to differences in a dependent variable (for example, the party they identify with). Important: After you have compared your groups of interest, make sure to go back to the split file and select the option to analyze the full dataset; otherwise, SPSS will continue to report all statistics for the subgroups as long as the program is running.
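For reference, the same split-file workflow can be written as syntax. This is only a sketch, assuming a hypothetical partisanship variable named partyID; SPSS requires the dataset to be sorted on the grouping variable before splitting:

*Compare groups by partisanship.
SORT CASES BY partyID.
SPLIT FILE LAYERED BY partyID.
FREQUENCIES VARIABLES=free_college_good.

*Turn the split off when you are done comparing groups.
SPLIT FILE OFF.

SPLIT FILE LAYERED BY is what the menu's "Compare groups" option pastes, and SPLIT FILE OFF is the syntax equivalent of telling SPSS to analyze the full dataset again.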


On Wednesday (10/15), we will go over the steps necessary to drop observations and variables from a dataset.

  • In class, we will continue working with the PRRI Values Survey from 2023, which we will download in class. 

  • Ahead of class, please read through the first part of this document (a handout I have prepared so you have a quick reference) up to, but not including, SPSS Basics 5: Recoding Data. This handout covers EVERYTHING we will be doing in class on Wednesday and Friday.

  • During class, we will be looking more closely at how the "syntax" window works. This is where we either manually enter or use SPSS's point-and-click features to write code telling SPSS to do something. Specifically, we will be learning how to use syntax to "prepare a dataset for analysis," which usually means removing variables and perhaps observations that are irrelevant to a study.

  • After class, watch this screencast only if you need more guidance on how to create a smaller version of an SPSS dataset that keeps only some of the variables (and also how to recreate that dataset if you later discover you omitted variables you shouldn't have). The screencast runs a little over 14 minutes.

    • Important: I left two important things out of the screencast. First, you need to make sure to save your syntax so that you will be able to run it again in the future if you want to add additional variables from the full dataset. Name the syntax file something like: "Syntax to keep only some variables."

    • Second, you should add a note (starting with an asterisk) at the top of the syntax reminding your future self that this syntax needs to be run only when the full copy_ version of the dataset is open in SPSS.

    • So why do you need to know how to drop most of the variables from a dataset? If you don't take a few minutes now to learn how to create a truncated version of a dataset, every time you generate statistics for a project, you will have to scroll down through a large amount of irrelevant information. And, if you are working with a dataset that has dozens of similarly named variables, working with the full dataset increases the likelihood of making mistakes in your coding and analysis.

Key ideas from the screencast, so you shouldn't need to watch it more than once (if at all):

You can create the necessary code to make a small version of your dataset in two steps:

Step one:

  • Open the copy_ version of the full original dataset. 
  • If you are using a Mac, you need to have previously configured SPSS for Mac so that SPSS works the same as it does on a PC.
  • Use the File -> "Save As" command, and click your way to the destination folder where you want to save the smaller version of the dataset.
  • Do not hit the OK button--after you give this file a new name, you are going to use the Paste button to paste the code into a syntax file. We will add an additional line to this code in a minute. If you are using a Mac and do not see a Paste button, you need to verify that your version of SPSS is configured as explained in the schedule section related to installing SPSS.
  • Rename the file being saved so that you know it is a smaller version of the dataset. Dr. Setzler typically keeps the name of the original dataset but adds small_ to the front end of the file. His system is to have original_, copy_, and now small_ versions of the dataset.
  • Again, after you have renamed the file, use the Paste button to paste the code into a syntax file.
  • The last step will automatically open a new syntax file and paste a command telling SPSS to save the small_ version of the file. If you select and run that code, it will save the full dataset, and that's not what you want. You are going to want to tell SPSS to keep only some of the variables when you run that SAVE command. Before you modify your syntax to do that, take a minute and add a notation (starting with an asterisk so SPSS will grey out your note and not try to run it) explaining to your future self what this code does. This is a good time to remind yourself in the note that, "In order for this code to run in the future, you need to have the full copy_ version of the dataset open."

Step two:

  • Now, you want to manually change the syntax you just pasted, telling SPSS to save the small_ file with just a subset of variables. To do this, add "/KEEP=" as a subcommand, followed by the list of variables you want to keep, to your SAVE command. The line of code that you add should look like the /KEEP line here:

SAVE OUTFILE='C:\Users\msetzler\Google Drive\SeniorSemPrepDataset\AmerBarUS2018_working.sav'
  /KEEP=Var1 Var2 Var3
  /COMPRESSED.

*Note that syntax lines that begin with a forward slash are subcommands. The ONE period for the full command goes at the very end, after the last subcommand in this case.

  • Select just the SAVE code block (i.e., from SAVE OUTFILE through COMPRESSED.) and tell SPSS to run the selection. You can do this by clicking the green arrow in the command icon options or by right-clicking and selecting Run Selection (on a Mac, holding the control button down while doing a touchpad tap is the equivalent of a right-click).
  • Verify that you now have a small_ .sav file in the folder where it is supposed to be. Open it up and make sure just the subset of variables you need are there.
  • If all looks good, add two notes to the top of your syntax. Start notes in your syntax with an asterisk so SPSS will grey out the note and not treat it as code. SPSS sees a period as your indication that the note has ended. The first note should be a reminder to your future self that whenever you run this code, you will need to have the copy_ version of your full dataset open, and you will open this syntax file from within that dataset (File -> Open -> Syntax). The second note should be a quick reminder to yourself of what this syntax does.
  • Once everything is working and you have annotated your syntax, save the syntax file to the same folder as your datasets. Name it clearly so you can find it again in case you need to re-run it in the future.
  • These steps are important because you may need to add or drop additional variables in your small_ dataset at some point. To do so, all you will have to do is delete the current small_ dataset file, change the syntax as necessary to add new variables you want to keep, and rerun it. As you saw in the screencast, this quick step will recreate a modified small file that has only the variables you need.
  • If all looks good, close the syntax and all of the open datasets. Do NOT save any changes to the copy_ dataset if asked to do so. Going forward, you will be working only with that small dataset.
Pro-tip from the screencast:
  • When editing the /KEEP= line, rather than copying variable names individually or typing them in manually, it is a lot faster and far less error-prone to create a list of variables to keep using a point-and-click command: Analyze -> Descriptive Statistics -> Descriptives -> and then select all of the variable names you want to keep. Then, paste your Descriptives command into your syntax. Select just that code and run it (to run it, hit the green arrow button).
  • If the output for the variables looks good (i.e., you got the right variables listed), then copy and paste the Descriptives command's list of variables into your /KEEP= code line.
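Put together, the pasted Descriptives command and the edited SAVE command might look like this sketch (the file path and variable names are placeholders, not ones from our dataset):

DESCRIPTIVES VARIABLES=age educ partyID7 free_college_good.

*If the output lists the right variables, reuse the same list in /KEEP=.
SAVE OUTFILE='C:\MyProject\small_mydata.sav'
  /KEEP=age educ partyID7 free_college_good
  /COMPRESSED.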
Another pro-tip from the screencast:
  • When you are in the Descriptives selection window (or any command listing all of the variables), it is a lot faster to find the variables you are looking for if you right-click on the variables (Mac users need to press the command key and click at the same time) to first show variable names and then to sort those names alphabetically. This trick works in all of the menus' dialog windows where there is a long list of variable labels listed in whatever order they appear in the dataset.

  • After class, watch this screencast only if you need more guidance on preparing a dataset so that you will analyze only a subset of a survey's respondents. It is about 17 minutes long, so you may want to try following the written instructions below first. Why do you need to know how to do this? For example, when international relations majors are using a Global Attitudes or a LatinoBarometer survey, they often want to look only at respondents from certain countries, and running statistics on the full dataset would return results for all of the countries in the survey.
Key steps noted in the screencast:

(1) Open the copy_ version of your dataset

(2) File -> open -> syntax... and open the syntax file you used to create the small_ version of your dataset (the one where you have removed many variables).

(3) If your small_ dataset does not have all of the variables you will need to tell SPSS to keep only certain types of respondents, add those variables to your syntax and re-run it to recreate your small_ dataset.

(4) At the bottom of your syntax, make a notation, starting with an asterisk, indicating which types of observations you will be dropping.

(5) With the copy_ version of the dataset open and the small_ version closed, point-click-paste: File -> Open -> Data -> and select but do not actually open the small_ version. Use the Paste button to tell SPSS to paste the "GET" command into the bottom of your syntax. When run, that syntax will tell SPSS to go get and open the small_ version of your dataset. This step is important because you are going to be automating an SPSS command to get, open, and then change the small_ version of the dataset rather than your full copy_ version.

(6) Select and run the GET command you just pasted. Doing so will open the small_ version of the dataset. Then close the copy_ version of your dataset. Again, this is to make sure you don't change and save changes to the full version by mistake when you need to be modifying the small_ version only.

(7) Now, you can create the code that will keep only some observations. Use point-click-paste to work your way through the Data -> Select Cases command. You will need to enter logical criteria specific to your needs in the "Select If" module and then hit Continue. You then will need to tell SPSS to "Delete" all of the "unused cases." Remember to paste this code into your syntax.

(8) Now you need to automate the process of saving your small_ dataset every time this full syntax file is run. To do so, point-click-paste: File -> Save As -> Data -> and select but do not open the small_ version. Paste this code into your syntax.

(9) Select and run the "Filter off" code all the way down to the end where you save the dataset. Take a look at the small dataset and make sure that it only has the variables and observations it is supposed to have.

(10) Save your revised syntax to write over the first version. Now, if you need to make changes in the future, you will just open this syntax from within the copy_ dataset, make your edits, and re-run the syntax.
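Assembled, the finished syntax file from the steps above might look something like this sketch (the file path, variable name, and country codes are placeholders; your pasted commands will contain your own):

*Run this syntax only when the full copy_ version of the dataset is open.
*It opens the small_ version, drops unwanted observations, and saves over it.
GET FILE='C:\MyProject\small_mydata.sav'.

FILTER OFF.
USE ALL.
SELECT IF (country = 17 OR country = 2 OR country = 4).
EXECUTE.

SAVE OUTFILE='C:\MyProject\small_mydata.sav'
  /COMPRESSED.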



If all of this seems like a lot of work, you don't have to automate things as I suggest above and in the screencast. Here's a faster way:

(1) Make a copy of the small_ version of your dataset. If you accidentally delete the wrong observations, you want to have a backup. You will not need this backup unless you delete the wrong observations and then save over the small_ version of the dataset, too.

(2) Now, open the small_ version of your dataset that has most of the variables removed.

(3) Create and run the few lines of code needed to remove observations from the small_ dataset. To do so, go to Data -> Select Cases -> Select If. Then identify the conditions that need to be met for an observation to be kept. In the screencast, cases are kept if (partyID7 = 1 OR partyID7 = 2 OR partyID7 = 3 OR partyID7 > 4). If all looks good, select the button telling SPSS that you will delete unused cases AND paste this code into syntax.

(4) Run your code and then use a frequency command to look at who is in your dataset. If you have removed all and only the observations you wanted to remove, save the edited small_ dataset, replacing your previous version. This version has fewer variables and fewer observations than the original copy_ dataset.

(5) If you remove observations this way, make sure to include annotations at the start of your new syntax to remind your future self what this block of code does.

(6) If you need to add additional variables to your small_ dataset in the future, remember that you will need to run this second syntax file to remove the relevant observations and save the file again.

Here's an example of what your syntax would look like if you wanted to include only respondents living in France, the UK, and the US (assuming the variable "country" is coded 17, 2, and 4 for French, British, and American respondents, respectively):

*This code will drop any respondent who is not from one of those three countries.
FILTER OFF.
USE ALL.
SELECT IF (country = 17 or country = 2 or country = 4).
EXECUTE.

*Here is a second example. This time, the select line uses "AND" to tell SPSS to keep observations only if they meet two conditions (here, they are male Republicans).
SELECT IF (Party=2) AND (Male=1).

If we wanted to keep only males who were Republicans or Democrats (but not independents or other responses), we need parentheses to group the OR conditions because SPSS evaluates AND before OR. The select line would look something like:
SELECT IF (Male=1) AND ((Party=1) OR (Party=2)).

Important: If we have a key variable in our study and many respondents were not asked the relevant question, we want to delete all observations in the dataset that have "missing" data for that variable. If we don't do this, descriptive statistics for all of the other variables will analyze lots of respondents who will not be included in bivariate and multivariate analyses. In this example, I want to remove all respondents who did not answer a question about whether they support making public colleges free to attend because the relevant question was presented to only half of the people taking the survey:

*This code drops any respondent who has missing data on the variable "free_college_good".
FILTER OFF.
USE ALL.
SELECT IF (NOT MISSING(free_college_good)).
EXECUTE.


  • If we have time on Wednesday, you will start practicing what you have learned about SPSS so far by beginning to complete a Blackboard assignment in class.
    • SPSS assignment #1 will require you to locate and download an original PRRI dataset, make a copy_ version of that dataset, and then make a small_ version that drops certain variables and observations. You also will need to split the dataset into groups and practice running Descriptives and Frequencies commands. The assignment will be written so that most students should be able to complete their work in class; however, I will post it after class on Wednesday in case you want to get a head start.

    • Barring unusual circumstances, SPSS assignment #1 (BlackBoard) must be completed by 5pm Friday (10/17). If you have a unique situation related to how SPSS is working on your computer, please contact Dr. Setzler after you have carefully tried to address the issue yourself and carefully reviewed the instructions/screencast links above. It is better for all involved for you to submit a late assignment (that will receive a modest penalty) rather than one that is partially completed.


Topic 9: Recoding and creating variables; labeling variables and their response categories.

I will post instructions and assignments after Wednesday's class, but am holding off for now so that you don't have to wade through lots of detailed instructions that will serve as your class notes for Friday and Monday. 


Topic 9 (10/18 and 10/21): Recoding and creating variables; labeling variables and their response categories

On Friday and Monday, we will learn how to create variables in SPSS. You will be completing an assignment on this topic on BlackBoard. It will be due late next week. 

  • Ahead of Friday's class, quickly read Chapter 7 in your textbook up to the section on descriptive statistics. This reading is just five pages; it will help you understand what we will be doing this week and why.

  • Also before class, skim the rest of this document, starting at the section "SPSS Basics 5: Recoding Data." This is the first of three handouts that your instructor has written to cover all of the SPSS coding and statistical methods we use in PSC 2019 and PSC 4099. The handout is a handy resource that:

    • Explains why researchers typically recode or create every variable in their analyses rather than working with a dataset's variables in their original form. 

    • Summarizes everything we have done with SPSS so far, which mostly has involved the process of finding and downloading data and--if necessary--importing the data into SPSS, removing variables, and removing observations not related to our study in any way.

    • Explains in detail how to recode an original variable. You will find this information helpful when you are completing the Blackboard assignment at the end of the week.


One of the main topics in this block of material is how to use SPSS's RECODE INTO command. Note: everything that is discussed below in this section will be taught in class. You have no assigned readings on this block of material other than reading through the points that are listed below. Any linked screencasts are optional; I have recorded screencasts that duplicate class material because some students prefer video summaries to the written notes I post in the course schedule.

  • The RECODE INTO command is your go-to method when you are working with only one original variable and want to collapse its categories (perhaps into a dummy variable, like making the 0-1 variable Republican out of an original variable where Republican is one of several partisan groups coded onto the same variable). When we are doing data analysis, it typically is necessary to recode every variable we will be using in order to address respondents who have missing data, said "don't know" to items, or refused to answer some questions.

RECODE INTO is also a command we can use to reverse code a variable's values (e.g., using Conservative7 to make the new variable Liberal7).
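For example, assuming a hypothetical seven-category party identification variable named party7 on which Republicans are coded 2 (both the variable name and its coding are invented for illustration), the pasted syntax for the Republican dummy would look something like this:

    RECODE party7 (MISSING=SYSMIS) (2=1) (ELSE=0) INTO Republican.
    EXECUTE.

Because the missing values are handled before the ELSE keyword, respondents with missing party data stay missing instead of being lumped in with the zeroes.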

Here are the key steps to recoding a variable in SPSS:

(1) First, make sure that you are working with a copy of your dataset so that if you accidentally recode an original variable onto itself you can just make another copy of the dataset and fix the glitch. Your instructor typically keeps four versions of a dataset in the same subfolder when he's working on a project: the original dataset, a "copy" of the full dataset, a "small" version of the copy with most variables removed, and a "working" copy of the dataset, which is where he recodes variables and computes new ones.

Then, with the working version of your dataset open, open a new syntax file that will be used for all of your variable recoding with this dataset (or reopen it if you are continuing previous work).

Next, use SPSS to run a CODEBOOK command on the original variable you are going to recode; in this example, that is Conservative7. To run a CODEBOOK command, you can either go to Analyze -> Reports -> Codebook or--in syntax--type:
   CODEBOOK Conservative7.

(2) Next, if it is available, it is helpful to have the original variable's exact question phrasing pasted into your syntax so that you can easily recall later on exactly how the variable was phrased. It is also helpful to have your CODEBOOK results pasted into your syntax, so that you can more easily identify an issue if something is recoded backwards or one of the original response categories was not carried over into the new variable.

How do you make notes in your syntax without causing SPSS to read those notes as code and stall out? You can write or copy and paste annotations into the syntax right before the RECODE INTO command. If you put an asterisk in front of an annotation with question phrasing pasted from the questionnaire, it will be greyed out so we can see it, but SPSS won't stall when it gets to that part of the syntax. SPSS will keep greying out text as long as there are no line breaks AND you have not ended a line with a period. So, when you are ready to end the annotation, type a period and hit return.
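For example, assuming the question wording below is what appeared on the questionnaire (it is invented here for illustration), the annotated syntax might look like this:

    *Q10: "In general, how would you describe your political views?" 1 = Extremely liberal through 7 = Extremely conservative.
    CODEBOOK Conservative7.

Note that the annotation starts with an asterisk and ends with a period, so SPSS skips over it and then runs the CODEBOOK command as usual.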

(3) Start to recode the variable by using SPSS's point-and-click interface, going to Transform -> Recode into New Variables. You want to use the RECODE INTO command because it preserves all aspects of the original variable in case you make any recoding mistakes.

For this example, you will need to tell SPSS that you are going to transform the original variable Conservative7 into a new variable named Liberal7.

In the same screen, you will provide a label for the new variable. Variable names cannot have any spaces. Make sure to label new variables in a way that makes sense and is specific enough that you will remember what a higher value means (e.g., don't name and label a variable "Gender" if you are creating a dummy variable for people who identify as male, because you may forget who is coded 1 or 0 later on).

After you label your variable, click the button "Change." Nothing will seem to happen, but the new variable will have that label when it is created later.

(4) Now, click the button for "Old and New values." This is the area where you do all of the actual recoding. Make sure that you think through how you need to recode. You need to recode every value in the original variable into some value (or system missing) in the new variable.

(5) It is best practice in recoding to start off by indicating that any value that was "system or user missing" in the old variable should be coded as "system missing" in the new variable. Choose those options and click on the add button.

(6) Now you need to recode the values. So 7 in the old variable = 1 in the new one, and click on the add button. And then 6=2 and "add," and so on until all seven of the values are reversed for the new variable.

(7) Once you have made sure to include instructions that will recode ALL of the original values into the new variable, click on continue and then paste. If you accidentally click on the OK button, just point and click your way back through the Transform command, where all of the data you just entered will still be there, and this time click on paste.

(8) To create the new variable, you have to select and run just the code you pasted into syntax (run with the green arrow on the menu). Whenever you create a new variable, do what you can to make sure you did the recoding correctly. To do this, run a frequency for the original and new variable and make sure that everything looks right. If you didn't code the variable correctly, look at your syntax and see what went wrong. If you need to edit the syntax, you can run the whole RECODE INTO command over again, and it will recreate the variable (this is a nice improvement over older versions of SPSS, which made you delete a variable before you could recreate it). 
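For the running example, if everything went as planned, the pasted RECODE syntax should look something like this (a sketch assuming the 7-point Conservative7 variable with values 1 through 7):

    RECODE Conservative7 (MISSING=SYSMIS) (1=7) (2=6) (3=5) (4=4) (5=3) (6=2) (7=1) INTO Liberal7.
    EXECUTE.

Running a frequency on Conservative7 and Liberal7 should then show mirror-image distributions with the same number of valid observations.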

(9) Important: Whenever you make a new variable, you need to label both it and the variable's response categories. The fastest way to label variables is to do so in syntax. If you are using RECODE INTO, you already will have created the code to label your variable back in step 3.

However, you also will need to add value labels to any new variable you create. Unfortunately, SPSS does not have a way to point-click-paste the coding for response category labels, so you will need to create the coding manually (usually by copying and pasting sample coding and substituting in new, variable-specific information).

Here's what that code looks like if we want to label just the anchors on our 7-point measure of liberalism (if we were working with a dummy, multi-category, or ordinal variable, we would label each value):

    VARIABLE LABELS Liberal7 "How liberal is the respondent?".
    VALUE LABELS Liberal7
    1 "Very Conservative"
    7 "Very liberal" .

Notice that there are two periods for the two label-related commands. There is only a single period at the very end of each command, and it goes outside of the quotation marks.

Note that if you want to change the variable label or any of its value labels, you can make the edits and just rerun the commands. This is one of the big advantages to working in syntax.

Finally, notice that once you have created the recode+relabel syntax for a variable or two, you can copy and paste that syntax and swap out values and phrasing to very quickly create a whole batch of similar variables.


We also will review the other technique most commonly used to change and create new variables in SPSS: using a combination of COMPUTE and IF commands in syntax. Note: everything that is discussed below in this section will be taught in class. You have no assigned readings on this block of material other than reading through the points that are listed below. Any linked screencasts are optional; I record screencasts that duplicate class material because some students prefer video summaries to the written notes I post in the course schedule.

  • If you prefer, you can try to use the COMPUTE command using SPSS's point-click-paste commands (see the Transform menu). However, it is a lot clearer and easier to write these commands directly into syntax.

  • You can use COMPUTE + IF commands to recode a single variable instead of using RECODE INTO. If you are making many similarly coded new variables, either type of SPSS command works well. To get you acquainted with using Compute and If statements, we will practice using them to recode a few variables in class. This technique will also be taught/reviewed as part of your BlackBoard assignment. After this week's classes, if you need a refresher, here is an optional screencast that goes over the technique (under 9 min if you listen at full speed). One important thing: After I recorded the screencast, SPSS changed how you deal with missing data on the original item. See the highlighted code below:

  • COMPUTE + IF commands are what you must use when you want to create a new variable that combines information from two or more original variables. (For example, we might want to create a dummy variable to compare white male Republicans with non-white male Republicans and white female Republicans.)

After class, if you need a refresher on what we will practice, here is an optional screencast that goes over the technique (9 min or so if you listen at full speed).

Here is a summary of what is covered in the screencast, only the summary uses an example (a dog's breed and color) that may be easier to follow and remember:

Let's say we have a sample of dog owners who all own one dog. They have taken a survey asking questions about their dog. We want to create a hypothetical new dummy variable identifying the owners of white poodles. Our new dummy variable, WhitePoodle, will be created with information from two original variables: dog_breed and dog_color.

A CODEBOOK command indicates the following:

*dog_breed = 12 if a dog is a poodle, while values of 1 through 55 refer to other dog breeds.

*dog_color = 1 if a dog is white, while values 2 through 10 refer to black, brown, grey, spotted brown, etc.

*And, if a respondent didn't know or refused to give breed or color information about their dog, the relevant variables were coded 99 in the dataset.

*Here is the block of code we could use to create the dummy variable:

COMPUTE WhitePoodle = $SYSMIS. 
IF (dog_breed < 99) AND (dog_color < 99)  WhitePoodle =0.
IF (dog_breed = 12) AND (dog_color = 1)  WhitePoodle =1. 

*and then we would need to deal with any missing data on the original variables if there were any:
IF MISSING(dog_breed) OR MISSING(dog_color) WhitePoodle =$SYSMIS. 

*and finally we would need to add labels for the new variable as well as its response categories.
 
  VARIABLE LABELS WhitePoodle "Dog is a white poodle".
  VALUE LABELS WhitePoodle
        1 "White poodle"
        0 "Not a white poodle" .
*Note where the periods are.

So the full block of code together would be:

COMPUTE WhitePoodle = $SYSMIS. 
IF (dog_breed < 99) AND (dog_color < 99)  WhitePoodle =0.
IF (dog_breed = 12) AND (dog_color = 1)  WhitePoodle =1. 
IF MISSING(dog_breed) OR MISSING(dog_color) WhitePoodle =$SYSMIS. 
VARIABLE LABELS WhitePoodle "Dog is a white poodle".
VALUE LABELS WhitePoodle
        1 "White poodle"
        0 "Not a white poodle" .


Note that COMPUTE, IF, VARIABLE LABELS, and VALUE LABELS are all commands, so one--and only one--period goes at the end of each full command even if that command stretches over more than one line of code.

Here is the logic behind how the code was written, step by step:

Step 1
To see how the responses for dog_breed and dog_color are coded and labeled in the dataset, we run a codebook command:
    CODEBOOK dog_breed dog_color.

Step 2
Now, tell SPSS to create a new variable where all of the values are blank (i.e., are "system missing"):
    COMPUTE WhitePoodle = $SYSMIS.
If we were to look at the data set in the SPSS data viewer (Data View tab), we would see a new column for a variable named "WhitePoodle." All of its values would be blank.

Step 3
Tell SPSS to turn almost all of those blank values into a zero if the observation should not be system missing:
    IF (dog_breed < 99) AND (dog_color < 99)  WhitePoodle =0.
Notice this step changes the blank to a zero (meaning not in our group) for every respondent who answered both questions. Now, if we were to look at the data set in the SPSS data viewer (Data View tab), we would see that most of the values for "WhitePoodle" are zero.


Step 4
Now, tell SPSS to turn some of the respondent's values into a 1 if certain conditions are met:
    IF (dog_breed = 12) AND (dog_color = 1)  WhitePoodle =1.
This step turns all of the zeroes into ones for people who have white poodles.

Notice that the order of commands matters! If we had done Step 4 before we did Step 3, everyone who didn't refuse to answer the questions would be coded as not having a white poodle, because Step 3's zeroes would overwrite Step 4's ones.


Step 5
Now, make sure to tell SPSS to turn some of the respondents' values into missing data if they had missing data on either of the original items:
    IF MISSING(dog_breed) OR MISSING(dog_color) WhitePoodle =$SYSMIS.   

Step 6
Now, tell SPSS to label the new variable and its response categories. For example:
    VARIABLE LABELS WhitePoodle "Dog is a white poodle".
    VALUE LABELS WhitePoodle
    1 "White poodle"
    0 "Not a white poodle" .

Step 7
Run a frequency on the old and new variables to verify that the general pattern and number of observations look right. If you made a mistake, read carefully through the code. If you make edits, you can run the block of code over again; SPSS will delete the variable and replace it with the edited version.

  • COMPUTE + IF commands also are the best way to create a couple of other types of variables that researchers frequently use, including reverse coding variables with lots of response categories. While you can use RECODE INTO to reverse code a 10-point measure, it is a lot faster to use code like this:

COMPUTE New_Variable10 = 11 - Old_Variable10.
IF Old_Variable10 = 99   New_Variable10 = $SYSMIS.
IF MISSING(Old_Variable10)   New_Variable10 = $SYSMIS.

*Notice the COMPUTE line's logic. We're telling SPSS to create a new variable by subtracting the original variable's value from 11, which is one higher than the maximum value. This means a 10 on the old variable will become a 1 on the new variable, while a 1 will become a 10, and so on.

*Notice also that we have to tell SPSS which values should be coded as missing responses on the new variable.
Here, we're assuming 99 on the original variable means that the respondent refused to answer or said they "didn't know."

After the week's class meetings, if you want to review more details on reverse coding a variable using COMPUTE + IF commands, here is an optional screencast on the topic. It runs a bit over 10 minutes, but only the first 6:30 covers the reverse coding. In the sample code directly above, you'll see that I do something that I didn't do in the screencast, but that is important: I added a line of code to make sure that any missing responses on the original item carry over to the new item:
IF MISSING(Old_Variable10)   New_Variable10 = $SYSMIS.


I recorded the rest of the screencast because it shows your instructor making a very common mistake when adding variables (initially, both the VARIABLE LABELS and VALUE LABELS commands neglected to tell SPSS what variable needed to be labeled).


In the likely event that you need help troubleshooting issues that come up with labeling new variables and their response categories, here is a list of common issues:

  • Building on the example from above, we might label our new variable:
    VARIABLE LABELS WhitePoodle "Dog is a white poodle".
    EXECUTE.

What to check if you can't get SPSS to create a new variable's label:

  1. Note that the VARIABLE LABELS command is plural even when you are just labeling one variable. If they made your instructor king for a day, the command would run just fine with or without that last S; however, his promotion to regality does not appear to be forthcoming.

  2. Note where the quotation marks go. SPSS will accept single or double quotation marks, but your instructor always uses double quotation marks, which allows him to “Put single ‘marks’ inside of a label”.

  3. Notice also that each line begins with a command and ends with the required period. There is only one period per full command.

  4. Notice that you have to tell SPSS which variable you want to label. In this case: WhitePoodle.

  5. The EXECUTE command tells SPSS to go ahead and do this work now rather than waiting until another command is run. This is a leftover “feature” from when computers used to be SLOW and no one wanted to sit around waiting for the program to label variables until those variables were actually going to be used in some type of analysis.

  6. Finally, if you don't select and run the full block of code, the variable won't be labeled.

  7. And, if you have looked at each of the problem areas I identify above, you can use ChatGPT or Claude to help you identify what the issue is with your code. 

  • Using the example from above, we would need to label the new variable's response categories:

     VALUE LABELS WhitePoodle
      1 "White poodle"
      0 "Not a white poodle"
     .
     EXECUTE.

What to check if you can't get SPSS to create the response labels:

  1. You can't have extra, blank lines in a command. This can happen accidentally when you cut and paste sample code from outside of SPSS to use as a template.

  2. Folks accidentally substitute VARIABLE LABELS from above instead of using the correct VALUE LABELS command.

  3. Folks labeling a variable try to use VALUE LABEL without the final “S,” in which case SPSS doesn’t recognize the command.

  4. Folks forget to tell SPSS which variable they are trying to label response categories for (above, that would be WhitePoodle), or they list the wrong variable because they are reusing code from a previously created variable. If you do this, the response categories on the previous variable will be relabeled... No worries--if you make a mistake in labeling any variable or its categories, you can just fix the mistake and rerun the code, and it will fix the labels.

  5. Folks forget to put at least one space between each number and its label. This won't work: 1"Cat". This will: 1 "Cat". Incidentally, you can add extra spaces, so that's a good habit to get into so that it becomes very easy to see whether you forgot to have at least one space.

  6. Folks add an equal sign (e.g. 1 = “yes”) or forget the quotation marks for each label.

  7. Folks add an extra period after the variable name (above, that would be WhitePoodle) despite being mid-command.

  8. Folks forget to include the period after the last label, which tells SPSS that this is the end of the full command. Your instructor typically puts the period on a separate line so it is easier to copy and repurpose old code with more or fewer categories without forgetting to add the required period after the last label. 

  9. Note that the EXECUTE command and its period need to be there.

  10. And, finally sometimes people create all of the code correctly but forget to select and run it.

  11. And, if you have looked at each of the problem areas I identify above, you can use ChatGPT or Claude to help you identify what the issue is with your code. 

Topic 10 (Wednesday, October 22)—Descriptive statistics for a variable: Frequencies, medians, means, and standard deviations (aka, just enough univariate stats to make sure that your variable coding has been done correctly and that you can describe a variable's central tendency and its distribution)

  • Read closely, this appendix from a best-selling book targeting a non-specialist, general audience (Richard J. Herrnstein and Charles Murray, "Statistics for People Who Are Sure They Can’t Learn Statistics," 12pp).

  • Read quickly to get an introduction to key ideas: Chapter 7, "Getting Started with Quantitative Data: Descriptive Statistics," in Carolyn Forestiere's textbook. After the classes that cover the substance of this chapter, you will want to reread it and make sure that you are able to explain, apply, and give an example of every concept in the glossary. Your textbook talks about doing data analysis with the "dataprac" dataset, but we will not be doing this. Instead, your practice work on the concepts in textbook chapters 7, 8, and 9 will be done with in-class SPSS work and BlackBoard exercises that use newer, better-quality datasets.

  • Make sure to review the summary of the research example by Ceka and Magalhaes (you do not need to scrutinize the summary of Park and Shin's article). In addition to paying attention to how different variables were coded in the study, review the questions in the study guide, which ask you to think about this study's research design. Specifically, see if you can identify the theory, dependent variable, independent variable, intervening variable (which is assessed with three country-level measures), and controls.  

  • For each of the statistical techniques we cover in this class, your instructor will summarize the key ideas that you need to remember in a block of material, like the one you see immediately below. Review these blocks of material carefully until you understand every point well. The reason that I am summarizing this material in the schedule is so that you can go to one place to review every major statistical concept. These same ideas are covered in class, in PPTs, and (for the most part) in assigned readings. I am doing it this way so that you can take limited notes in class and still be confident that you have clear notes on the concepts you need to know.

  • Consider using AI as a tutor to make sure that you fully understand the notes that are provided for each block of statistical methods material. For example, you might test your understanding of the block of material that comes right after this note, by copying and pasting those notes into ChatGPT or Claude.ai and asking this prompt:

I am taking a research methods class for political science and IR majors. Here are some summary notes the professor gave us about what we are supposed to remember from class. Can you give me a clear, easy-to-understand explanation of each key concept and a couple of additional examples for each key concept? Also, can you then create a 20-item multiple-choice quiz for me that covers definitions, concepts, and applications of the key ideas? Do not mark the correct answers in the quiz. Instead, give me an answer key afterwards together with an explanation for each correct answer.

  • After class and completing your assigned readings, make sure that you feel comfortable with your understanding of a number of basic concepts and methods that researchers use to explore the "central tendency" and distribution of different types of variables. These concepts will come up continuously for the rest of the term. Here are some big takeaways that you will want to remember:
    • The type of variable you are analyzing matters. Means, medians, and standard deviations communicate valuable information about interval (aka "continuous") variables. With an interval variable, every one-unit increase is assumed to be equal in influence. When that is not the case, interval variables are often modified so that a one-unit increase is roughly equal in importance. For example, researchers typically substitute years of formal schooling with an interval variable that looks something like this: 1 = no high school degree; 2 = high school degree; 3 = some college or 2-year degree; 4 = 4-year college degree; 5 = master's degree; 6 = doctoral degree.

    • Standard deviations are a widely used measure (in all fields) to look at the distribution of an interval variable across its range of values.

A straightforward way to explain a standard deviation (SD) is to say that it is a measure of how far away from a variable's mean most respondents' answers are scattered. If a variable is normally distributed, about two-thirds of all cases will be within one standard deviation of the mean (roughly a third of cases just below the average respondent and a third just above). And almost all respondents fall within two standard deviations below or above the mean: specifically, 95% of observations in a normal distribution fall within two standard deviations. An observation that is three standard deviations below or above the mean would be an extreme outlier (i.e., in roughly the bottom or top 0.15% of observations).

Standard deviations are useful for comparing the distributions of two populations on the same variable. So, if we have two cities with the same mean and median income but standard deviations that differ by tens of thousands of dollars, the city with the larger standard deviation also has a higher degree of inequality. In short, a small standard deviation means that most observations are clustered around the mean, while a larger standard deviation means that more observations are further away from the mean.
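If you want to see these statistics side by side in SPSS, one option is a FREQUENCIES command with the frequency table itself suppressed (the variable name income here is hypothetical):

    FREQUENCIES VARIABLES=income
      /STATISTICS=MEAN MEDIAN STDDEV
      /FORMAT=NOTABLE.

Running this for the same variable in two datasets (or two split-file groups) lets you compare the two distributions directly.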

Standard deviations also are useful for comparing observations on two different variables. For example, we might say that someone whose LSAT (law school admissions test) score was 1.7 standard deviations above the mean is a better test taker than someone whose score on the GRE (the exam many graduate schools require) was 1.5 standard deviations above the mean.

Why use standard deviation statistics when you could use percentiles? The reason that social scientists, economists, financial planners, and lots of other professionals use standard deviations to examine distributions of data is that just looking at percentiles can be misleading. Percentiles focus on rank order, while standard deviations give you a sense of how far a given observation is from the average.

Imagine a long race with 100 runners who all finished within two seconds of each other--one runner finished one second behind all of the other runners except the winner, who finished one second ahead of all but the last finisher. The standard deviation for race times would be tiny, so we would know that a runner who finished 2 standard deviations faster than the other runners didn't run much faster at all.

One of the big issues with percentiles is that unequal percentile spacing can suggest big differences where there are almost none, or small differences where there are big ones. For example, someone with an LSAT score of 150 (out of 180 possible points) is in the 39th percentile of test-takers. Just a 3-point higher score puts a test-taker in the 50th percentile, and three points higher than that (a 156) puts the taker in the 61st percentile. So, at the center of the distribution, a 6-point gain in LSAT score corresponds to a roughly 20-point gain in percentiles. In contrast, going from a score of 170 (96th percentile) to a 176 increases a test-taker's percentile by just three percentiles (to the 99th).

    • When an interval variable's distribution is highly skewed, researchers typically cap the highest or lowest values so that the mean and the median are closer and so that the mean doesn't distort what the typical respondent looks like. For example, income in the US has a large, positive skew because relatively few individuals earn a tremendous amount of money compared to everyone else. This means that the mean income in the US is a lot higher than the median.

In advanced analyses examining relationships between variables (e.g., income and happiness), researchers usually use calculations based on variable means rather than medians. When a variable like income is highly skewed (so its mean differs greatly from its median) researchers usually recode the data to collapse extreme values at the end of the distribution. For instance, income data are commonly capped by grouping everyone earning $200K or more into one high-end category. Recoding like this combines billionaires like Bill Gates or Jeff Bezos with the merely highly affluent, pulling the mean income closer to the median and reducing the standard deviation.
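Here is a sketch of what that kind of capping looks like in syntax, assuming a hypothetical income variable measured in dollars:

    RECODE income (MISSING=SYSMIS) (200000 THRU HIGHEST = 200000) (ELSE=COPY) INTO income_capped.
    EXECUTE.

The ELSE=COPY keyword carries every uncapped value over to the new variable unchanged.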

    • The median, mean, and standard deviation of a categorical (aka "nominal") variable are useless because the values assigned to a given category are arbitrary for this type of variable. We can use a frequency table or chart to explore the distribution of a categorical variable, but for statistical analyses, this type of variable typically is recoded and analyzed as a series of separate dummy variables. For example, if you have a party variable with three categories, a mean of 1.3 doesn't mean anything, so you'd want to create three dummy variables, one for each party. Remember, a dummy variable is coded 1 if a respondent is in the group and 0 if they are not. If your analysis would benefit from also summarizing the distribution of a categorical variable, you will want to make a separate table or figure reporting its "frequencies" (in percentages).
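As a sketch, assuming a hypothetical three-category variable named party (1 = Democrat, 2 = Republican, 3 = Independent), the three dummies could be created like this:

    COMPUTE Democrat = (party = 1).
    COMPUTE Republican = (party = 2).
    COMPUTE Independent = (party = 3).
    EXECUTE.

Each logical expression evaluates to 1 if true and 0 if false, and respondents who are missing on party come out as system missing on all three dummies.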

    • Remember how to interpret the means of dummy (aka "binary," "dichotomous," or "0/1") variables. By convention, we report the standard deviation for each dummy variable, too, even though the SD information for a dummy variable is not useful. By itself, the mean value for a dummy variable reports its distribution in your sample; e.g., a value of .37 for the dummy variable Democrat indicates that 37% of the sample identifies as a Democrat.

    • What do researchers do to summarize the main characteristics of an ordinal variable? While there are statistical techniques specifically designed for independent and dependent ordinal variables, most analyses either recode ordinal variables into dummy variables or treat them as interval variables.

However, if a researcher is not using the kinds of statistical tests we’ll focus on for the rest of the semester and instead simply wants to show how a variable is distributed, a frequency chart is often the best choice. For example, imagine three students who each have the same GPA of 2.0. One student earns an even mix of Bs, Cs, and Ds. Another earns mostly Bs and Ds, with just one C. The third earns all Cs. Although the mean and median GPA for all three students are identical, the distribution of their grades reveals that they are quite different types of performers. In this case, displaying each student’s grade distribution with a frequency chart is the clearest way to communicate their differences—especially when addressing a general audience that may not easily interpret measures like standard deviation.

The most common type of interval variable is a Likert scale. Important: always think carefully about whether to treat a Likert scale as an ordinal variable or to create a dummy variable from it. While researchers often do this, it is worthwhile to examine a frequency distribution of the variable first. If we ask how much an American approves of the Republican party with a Likert item (1=not at all, 2=very little, 3=neither approve nor disapprove, 4=quite a lot, 5=very much), there's not going to be much difference at all between responses 1 and 2 or between 4 and 5. Typically, scholars will consider recoding an item like this into either a 3-point interval variable (disapprove, neither, approve) or a dummy variable (1=approve; 0=disapprove or neither). It is common practice to treat ordinal items with 6 or more responses as interval variables.


Topic 11 (Friday, Oct. 24): Computing, interpreting, and comparing descriptive statistics for more than one variable in SPSS.

  • Remember that SPSS #2 on BlackBoard is due by 5pm. If you have a unique situation related to how SPSS is working on your computer, please contact Dr. Setzler after you have carefully tried to address the issue yourself, carefully reviewed the instructions/screencast links above, and tried to use an AI resource to troubleshoot your coding. It is better for all involved for you to submit a late assignment (that will receive a modest penalty) rather than one that is partially completed.

  • Ahead of class, print out this handout--the third of three that your instructor has written to cover all of the SPSS and statistical methods we use in PSC 2019 and PSC 4099.

  • Ahead of class, reread five pages of Mark Setzler, "Did Brazilians Vote for Jair Bolsonaro Because They Share his Most Controversial Views?" Specifically, print out and review:

    • Page 1, the study's abstract

    • Pages 4-8, which explain how the study's variables were coded and include two figures

You are being asked to review these materials (again) because they provide an example of how survey dataset variables are recoded so that a researcher's hypotheses can be tested. The two figures provide examples of how data-splits and frequency results can be used to visually test arguments. We will be doing this kind of work in class.

  • In class, we learn about and use SPSS to practice one of the techniques researchers use to examine and visually demonstrate whether two variables are associated with one another. Specifically, we will be splitting a dataset by independent variables and then computing means or frequencies for dependent variables.

You soon will be reading Chapter 8 in Forestiere's textbook, which covers the topic of "Bivariate Analysis." While the statistical tests and methods discussed in that chapter are very useful, usually the best way to begin analyzing whether there may be a relationship between two or more variables is to see how variations in one or more independent variables correspond to different means or proportions for a dependent variable. This procedure looks only at the data in our survey; it does not involve any statistical test of whether the group differences we see in our sample also hold in the larger population.

For example, if we think that there probably is a relationship between gender and a person's income, we can use SPSS's command Data -> split file -> compare groups (and select the variable gender, which in this example we have coded into a binary measure of males and non-males). Once we have "split the file," we will see statistical results for males and then non-males every time we run a statistical procedure. Each time, we will get two sets of output because this variable has two values. So, if we now run a descriptives command for our variable measuring household income, the results will tell us what the mean income is for males and what it is for non-males.
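What the split-file procedure does under the hood can be sketched in Python (the cases and incomes below are made up for illustration):

```python
from statistics import mean

# Hypothetical cases: (gender, household income in $1,000s).
data = [("male", 62), ("non-male", 48), ("male", 75),
        ("non-male", 55), ("male", 58), ("non-male", 51)]

# Splitting the file by gender and then running descriptives on income
# amounts to grouping the cases and computing each group's mean.
groups = {}
for gender, income in data:
    groups.setdefault(gender, []).append(income)

for gender, incomes in sorted(groups.items()):
    print(gender, round(mean(incomes), 1))
```

Just as in SPSS, one set of output is produced per value of the splitting variable: here, one mean income for males and one for non-males.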

If we have several independent variables (e.g., perhaps we want to see how household incomes vary for women and men, Republicans and Democrats, and people who are under 40, 41-65, and over 65 years of age), we can repeatedly split our data by the relevant variable and then run the descriptives command. We can examine the different means in a table, but for a presentation or paper that considers the differences for several different groups, it will look best if we compile our results in a chart/figure.

Week 11

Topic 12 (Monday, 10/27): Comparing means and frequencies in bar charts

  • In class, we will be visually summarizing frequencies and descriptive statistics in spreadsheet bar charts. You have no assigned readings for this class. We will be using statistical results to make the kind of bar charts you see in the Brazilian election article you were asked to read parts of ahead of the previous class.

    After class, if you need additional guidance on creating Excel charts to visually summarize SPSS results, you have the option of watching this screencast: https://youtu.be/T6kHpZ2oReQ. It shows you how splitting data and calculating the means for several different variables will allow you to make a nice-looking chart in Excel to show your results. Making this kind of chart is a task you will need to do for your next BlackBoard SPSS assignment.

  • After class, take this practice SPSS exam so you know what your test will look like for Wednesday. Due to the very short turn-around time, this test won't be graded. The purpose of the practice instrument is to make sure you know what kind of questions to expect and in what format.

SPSS test 1 (Wednesday, 10/29; taken in class, on BlackBoard). The test will require you to use SPSS to drop some observations based on how people answered a specific question. You also will be asked to create a smaller version of a dataset by dropping a couple of variables. And you will be asked to recode and create a few variables, including new variables that draw on information from two or more other variables (e.g., to create a white, female, Democrat dummy variable). You will show that you have created and labeled new variables correctly with frequency tables. You will also be asked to generate and interpret descriptive statistics for a variable and its subgroups (i.e., to split the file into groups and run frequencies or descriptives). The test will make up 5% of your final course grade. IMPORTANT: You may bring a 3x5 notecard with you for the test that has important information--including sample coding. You may write or type only on one side of the card.


Unit 2 test (Friday, 10/13). The topics covered on the exam are noted in the study guide (i.e., the Focus Questions handout that has been in the PPTs folder since the start of the unit). You will not be using SPSS or interpreting any SPSS output as part of this exam. You will be asked conceptual questions related to the use of SPSS and other statistical software. IMPORTANT: You may bring a 3x5 notecard with you for the test that has important information--including sample coding. You may write or type only on one side of the card.