An R package to analyze the parliamentary records of the 19th legislative period of the Bundestag, the German parliament.

---
title: "Differences in gender"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Differences in gender}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(hateimparlament)
library(dplyr)
library(ggplot2)
library(stringr)
library(tidyr)
library(xml2)
```
## Preparation of data

First, download all records of the current legislative period.

```r
fetch_all("../records/") # path to the directory where the records should be stored
```

Second, parse those `.xml` files into `R` tibbles:

```r
read_all("../records/") %>% repair() -> res
```
The call to `repair` fixes a number of formatting issues in the records.
For development purposes, we load the tables from CSV files instead:

```{r}
res <- read_from_csv('../inst/csv/')
```

Then we unpack our tibbles:

```{r}
comments <- res$comments
speeches <- res$speeches
speaker <- res$speaker
talks <- res$talks
```
Before we can run our analysis, we have to assign a gender to each politician. We read
the gender from the master data of all members of parliament, which is
fetched from bundestag.de.

```{r}
# Return the text of the child element(s) `name`, or NA if there is none.
xml_get <- function(node, name) {
  res <- xml_text(xml_find_all(node, name))
  if (length(res) == 0) NA_character_
  else res
}

x <- read_xml("../inst/masterdata.xml")
mdbs <- xml_find_all(x, "MDB")  # one node per member of parliament

ids <- c()
genders <- c()
for (mdb in mdbs) {
  xml_get(mdb, "ID") -> mdb_id
  xml_find_first(mdb, "BIOGRAFISCHE_ANGABEN") %>%
    xml_get("GESCHLECHT") ->
    mdb_gender
  ids <- c(ids, mdb_id)
  # every value other than "männlich" (male) is recorded as "female"
  genders <- c(genders, if (mdb_gender == "männlich") "male" else "female")
}
gender <- tibble(id = ids, gender = genders)
speaker_with_gender <- left_join(res$speaker, gender)
```
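The loop above grows `ids` and `genders` one element at a time. As a sketch of an alternative (not the package's implementation), `xml2` can extract both fields in a vectorized way, because `xml_find_first()` applied to a node set returns exactly one, possibly missing, node per input node. The toy document below only mimics the element names used above, and the value `weiblich` for women is an assumption about the master data.

```r
library(xml2)

# Toy document that mimics the element names of the master data (invented content)
doc <- read_xml('<DOCUMENT>
  <MDB><ID>1</ID><BIOGRAFISCHE_ANGABEN><GESCHLECHT>männlich</GESCHLECHT></BIOGRAFISCHE_ANGABEN></MDB>
  <MDB><ID>2</ID><BIOGRAFISCHE_ANGABEN><GESCHLECHT>weiblich</GESCHLECHT></BIOGRAFISCHE_ANGABEN></MDB>
  <MDB><ID>3</ID><BIOGRAFISCHE_ANGABEN/></MDB>
</DOCUMENT>')

mdbs <- xml_find_all(doc, "MDB")

# One result per <MDB> node; xml_text() gives NA where the element is missing
ids <- xml_text(xml_find_first(mdbs, "ID"))
raw <- xml_text(xml_find_first(mdbs, "BIOGRAFISCHE_ANGABEN/GESCHLECHT"))

# Map the German labels explicitly; anything else (including NA) stays NA
# ("weiblich" is an assumed label, see the text above)
mapped <- ifelse(raw == "männlich", "male",
                 ifelse(raw == "weiblich", "female", NA))

gender_toy <- data.frame(id = ids, gender = mapped, stringsAsFactors = FALSE)
gender_toy
```

Unlike the loop, this variant keeps a missing or unexpected entry as `NA` instead of counting it as `"female"`.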
## Analysis

First, let's look at the relative distribution of the sexes across the whole Bundestag.

```{r}
speaker_with_gender %>%
  select(gender) %>%
  group_by(gender) %>%
  summarise("count" = n()) %>%
  filter(gender %in% c("male", "female")) %>%
  mutate(portion = 100*count/sum(count)) ->
  plot1

bp <- ggplot(plot1, aes(x = "", y = portion, fill = gender)) +
  geom_bar(width = 1, stat = "identity")
pie <- bp + coord_polar("y", start = 0)
pie +
  scale_fill_manual(values = c("pink", "blue")) +
  ggtitle("Relative distribution of sexes") +
  xlab("") +
  ylab("")
```
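The share computation that feeds the pie chart can be tried on a small hand-made table. This is a minimal sketch with invented toy data; `count()` is used here as a shorthand for the `group_by()`/`summarise()` pair above.

```r
library(dplyr)

# Invented toy data: one row per member, NA where no gender could be matched
toy_speaker <- tibble(gender = c("female", "male", "male", NA, "male"))

toy_shares <- toy_speaker %>%
  filter(gender %in% c("male", "female")) %>%  # drop unmatched members
  count(gender, name = "count") %>%            # members per gender
  mutate(portion = 100 * count / sum(count))   # percentage of matched members

toy_shares
```

Because the filter runs before the percentages are computed, members without a matched gender do not dilute either share.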
Next, we look at the share of women within each parliamentary group (*Fraktion*, stored in the `fraction` column).

```{r, fig.width=7}
# number of members per parliamentary group
speaker_with_gender %>%
  group_by(fraction) %>%
  summarize(n = n()) ->
  fraction_size

# number and share of women per parliamentary group
speaker_with_gender %>%
  filter(gender=="female") %>%
  group_by(fraction) %>%
  summarize(n_female = n()) %>%
  left_join(fraction_size) %>%
  mutate(q = n_female/n) -> women_per_fraction

# plot title: "Share of women by party"
bar_plot_fractions(women_per_fraction, x_variable=fraction, y_variable=q, title="Frauenanteil nach Partei")
```
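One caveat of the `filter()` approach above is that a group without any women produces no row at all instead of a share of zero. A possible alternative, sketched here on invented toy data, computes the counts and the share in a single `summarize()` call so that every group keeps its row.

```r
library(dplyr)

# Invented toy data: group "C" has no women
toy_speaker <- tibble(
  fraction = c("A", "A", "A", "B", "B", "C"),
  gender   = c("female", "male", "male", "female", "female", "male")
)

toy_women_per_fraction <- toy_speaker %>%
  group_by(fraction) %>%
  summarize(
    n        = n(),                                    # all members of the group
    n_female = sum(gender == "female", na.rm = TRUE),  # women in the group
    q        = n_female / n                            # same ratio as in the code above
  )

toy_women_per_fraction
```

Here group `C` appears with `q = 0`, whereas the filter-then-count version would omit it.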
With this in mind, we can now compare the share of speeches given by women with the share of women in each parliamentary group.

```{r, fig.width=7}
speaker_with_gender %>% transmute(speaker_id = id, gender, fraction) -> simple_speaker_with_gender

# number of speeches per parliamentary group
speeches %>%
  transmute(id, speaker_id = speaker) %>%
  inner_join(simple_speaker_with_gender) %>%
  group_by(fraction) %>%
  summarize(speeches=n()) ->
  fraction_speeches_size

# number and share of speeches given by women per parliamentary group
speeches %>%
  transmute(id, speaker_id = speaker) %>%
  inner_join(simple_speaker_with_gender) %>%
  filter(gender=='female') %>%
  group_by(fraction) %>%
  summarize(female_speeches=n()) %>%
  left_join(fraction_speeches_size) %>%
  left_join(women_per_fraction) %>%
  mutate(q_speeches = female_speeches/speeches) -> speech_distribution

#bar_plot_fractions(speech_distribution, x_variable=fraction, y_variable=q_speeches, title="Redeanteil Frauen nach Partei")
party_order <- factor(c("Fraktionslos", "AfD&Fraktionslos",
                        "DIE LINKE", "BÜNDNIS 90/DIE GRÜNEN", "SPD", "CDU/CSU",
                        "FDP", "AfD", NA_character_))

# labels: "Frauenanteil" = share of women, "Redenanteil Frauen" = women's share
# of speeches, "Kategorie" = category
speech_distribution %>%
  mutate("Frauenanteil" = q, "Redenanteil Frauen" = q_speeches) %>%
  pivot_longer(c(Frauenanteil, "Redenanteil Frauen"), names_to = "type") %>%
  ggplot(aes(x = factor(fraction, levels = party_order),
             y = value,
             fill = factor(type, levels = factor(c("Frauenanteil", "Redenanteil Frauen"))))) +
  scale_fill_manual(values = c("Frauenanteil" = "gray", "Redenanteil Frauen" = "red")) +
  coord_flip() +
  geom_bar(stat = "identity", position = "dodge") +
  labs(fill = "Kategorie")
```
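The reshaping step before the plot turns the two share columns into one `type`/`value` pair per row, which is the long format `ggplot2` needs for dodged bars. A small sketch with invented numbers (the group names and values are placeholders, not results):

```r
library(dplyr)
library(tidyr)

# Invented toy data in the wide layout of `speech_distribution`
toy_distribution <- tibble(
  fraction   = c("A", "B"),
  q          = c(0.30, 0.50),  # share of women among members
  q_speeches = c(0.25, 0.55)   # share of speeches given by women
)

# One row per (fraction, type) combination
toy_long <- toy_distribution %>%
  pivot_longer(c(q, q_speeches), names_to = "type", values_to = "value")

toy_long
```

Passing `names_to` by name rather than by position keeps the call independent of the argument order of the installed `tidyr` version.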
For comparison, let's analyze the overall difference in the number of speeches given by women and men.

```{r}
speeches %>%
  group_by(speaker) %>%
  summarize(n = n()) %>%
  ungroup() %>%
  arrange(-n) %>%
  join_speaker(res) %>%
  left_join(gender, by=c("speaker"="id")) %>%
  group_by(gender) %>%
  summarise(absolute=sum(n)) %>%
  filter(gender %in% c("female", "male")) %>%
  # share of all speeches given by each gender
  mutate(absolute2=absolute/sum(absolute)) %>%
  # hard-coded share of members per gender; rows are ordered female, male
  mutate(portion=c(0.32, 0.68)) %>%
  # weight each count by the other gender's share of members
  mutate(relative=absolute*(1-portion)) %>%
  mutate(relative2=relative/sum(relative)) ->
  plot3
```
First, let's look at the unadjusted share of speeches given by women and by men.

```{r,fig.width=7}
barplot(plot3$absolute2,
        ylab = "share of speeches",
        main = "Absolute comparison of speech shares",
        las = 1,
        names.arg = c("women", "men"),
        col = c("pink", "darkblue"),
        font.main = 4,
        cex.axis = 0.7)
```
Since men outnumber women in the German Bundestag, we now consider the relative speech shares, adjusted for the ratio of men to women among the members.

```{r, fig.width=7}
barplot(plot3$relative2,
        ylab = "share of speeches",
        main = "Relative comparison of speech shares",
        las = 1,
        names.arg = c("women", "men"),
        col = c("pink", "darkblue"),
        font.main = 4,
        cex.axis = 0.7)
```
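The weighting `absolute*(1-portion)` used for `relative2` may look unusual. With exactly two groups whose member shares sum to one, it yields the same normalized shares as dividing each count by its own group's member share, that is, a per-member rate. The following sketch checks this numerically; the speech counts are invented, and only the member shares 0.32 and 0.68 are taken from the code above.

```r
absolute <- c(female = 300, male = 700)    # invented speech counts
portion  <- c(female = 0.32, male = 0.68)  # member shares used above

# Weighting used in the vignette: multiply by the other group's member share
relative  <- absolute * (1 - portion)
relative2 <- relative / sum(relative)

# Per-member rate, normalized so that the two shares sum to one
per_member  <- absolute / portion
per_member2 <- per_member / sum(per_member)

all.equal(relative2, per_member2)
```

With more than two groups the two weightings no longer agree, so dividing by `portion` is the more general form.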