Description of Variables

Classified Officer Names

Named master_classified in our code. Officer names classified by gender and ethnicity. Each row represents one officer.

  1. year: year of Key Officers directory from which name was scraped. Integer. Values range from 1965 to 2022.
  2. mo: three-letter abbreviation of the month of the Key Officers directory from which the officer name was scraped.
  3. rank: raw scraped rank/position of officer.
  4. name: raw scraped full name of officer.
  5. first_clean: parsed first name from name using nameparser package.
  6. middle, last, suffix: parsed middle name, last name, and suffix from name using nameparser package.
  7. firstlast: concatenation of first_clean and last columns for classification.
  8. gender_guess: classified gender of officer name using gender-guesser package. Values include unknown, male, mostly_male, female, mostly_female, or andy (androgynous). For our analysis male and mostly_male are combined, and female and mostly_female are combined.
  9. eth_guess 8_nationality_groups: classified ethnicity of officer name using “8_nationality_groups” model in the name-to-ethnicity package. Values include southAsian, eastAsian, celtic, african, muslim, nordic, hispanic, or european. For our analysis we combine southAsian and eastAsian as Asian; and celtic, nordic, and european as White.
  10. eth_prob 8_nationality_groups: Probability of eth_guess 8_nationality_groups classification.
  11. ethnicseer: classified ethnicity of officer name using the ethnicseer package. Values include eng for English, ita for Italian, ger for German, jap for Japanese, frn for French, ind for Indian, rus for Russian, mea for Middle-Eastern, spa for Spanish, viet for Vietnamese, chi for Chinese, or kor for Korean. For our analysis we combine jap, viet, chi, kor as Asian; and ger, ita, frn, rus as White.
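The category merges described in items 8, 9, and 11 can be sketched as simple lookup tables. This is a minimal illustration, not the project's actual code: the table names and the collapse helper are hypothetical, and passing unlisted labels (such as unknown, andy, african, or eng) through unchanged is an assumption, since the descriptions above only say which labels are combined.

```python
# Hypothetical lookup tables; the merges follow the variable descriptions above.
GENDER_GROUPS = {
    "male": "male",
    "mostly_male": "male",
    "female": "female",
    "mostly_female": "female",
}

NATIONALITY_GROUPS = {
    "southAsian": "Asian",
    "eastAsian": "Asian",
    "celtic": "White",
    "nordic": "White",
    "european": "White",
}

ETHNICSEER_GROUPS = {
    "jap": "Asian", "viet": "Asian", "chi": "Asian", "kor": "Asian",
    "ger": "White", "ita": "White", "frn": "White", "rus": "White",
}


def collapse(label, groups):
    """Return the combined group for a raw label.

    Assumption: labels that are not part of any merge are returned unchanged.
    """
    return groups.get(label, label)


# Example: collapse one officer's raw labels into analysis categories.
print(collapse("mostly_female", GENDER_GROUPS))   # female
print(collapse("celtic", NATIONALITY_GROUPS))     # White
print(collapse("viet", ETHNICSEER_GROUPS))        # Asian
```

Keeping the merges in plain dictionaries makes each grouping decision visible in one place and easy to change.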

Aggregated Gender/Race Statistics

Named master_stats in our code. Each row represents one year-category combination.

  1. year: year of Key Officers directory from which name was scraped. Integer. Values range from 1965 to 2022.
  2. category: a specific gender or ethnicity group, e.g. 'female' or 'male'.
  3. count: absolute number of names in the category in that year.
  4. percent: proportion of names in the category, relative to the overall number of names in that year.
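As a rough sketch of how the count and percent columns relate, the following computes one row per year-category combination from a list of (year, category) pairs. The aggregate function and its input format are hypothetical, and storing percent as a fraction between 0 and 1 is an assumption; multiply by 100 if the column holds percentage points.

```python
from collections import Counter


def aggregate(rows):
    """Build year-category statistics from (year, category) pairs, one pair per officer.

    Assumption: percent is stored as a fraction of all names in that year.
    """
    rows = list(rows)
    pair_counts = Counter(rows)                        # names per (year, category)
    year_totals = Counter(year for year, _ in rows)    # names per year
    return [
        {
            "year": year,
            "category": category,
            "count": n,
            "percent": n / year_totals[year],
        }
        for (year, category), n in sorted(pair_counts.items())
    ]


# Example with made-up rows, for illustration only.
sample = [(1965, "male"), (1965, "male"), (1965, "female"), (1966, "female")]
for row in aggregate(sample):
    print(row)
```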