An R package to analyze the parliamentary records of the 19th legislative period of the Bundestag, the German parliament.

---
title: "genderequality"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{genderequality}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(hateimparlament)
library(dplyr)
library(ggplot2)
library(stringr)
library(tidyr)
library(rvest)
```

## Preparation of data

First, download all records of the current legislative period.

```r
fetch_all("../records/") # path to the directory where the records should be stored
```

Second, parse those `.xml` files into `R` tibbles:

```r
read_all("../records/") %>% repair() -> res
```

We also use `repair` to fix a number of formatting issues in the records.

For development purposes, we load the tables from csv files

```{r}
res <- read_from_csv("../inst/csv/")
```

and unpack the result into more descriptively named tibbles:

```{r}
comments <- res$comments
speeches <- res$speeches
speaker <- res$speaker
talks <- res$talks
```

Before we can do our analysis, we have to assign a gender to our politicians. We scrape the German Wikipedia list of members and classify each member by the article used in the first paragraph of their page ("ist eine" or "ist ein").

```{r}
# Return the href attribute of the first node matching a CSS selector
extract_href <- function(sel, html) {
  html %>%
    html_node(sel) %>%
    html_attr("href")
}

# Return the text of the first paragraph found in the article body.
# The page is downloaded once, and the search stops after `max_children`
# positions so that a page without a matching paragraph cannot loop forever.
first_content_p_text <- function(url, max_children = 50) {
  page <- read_html(url)
  for (i in seq_len(max_children)) {
    page %>%
      html_node(str_glue("#mw-content-text > div.mw-parser-output > p:nth-child({i})")) %>%
      html_text() -> res
    if (!is.na(res)) return(res)
  }
  NA_character_
}

abgeordneten_list_html <- read_html(
  "https://de.wikipedia.org/wiki/Liste_der_Mitglieder_des_Deutschen_Bundestages_(19._Wahlperiode)")

# The table position (20) and the row range (2:709) are tied to the page layout
selectors <- str_glue("#mw-content-text > div.mw-parser-output > table:nth-child(20) > tbody > tr:nth-child({2:709}) > td:nth-child(2) > a")
link_part2 <- sapply(selectors, extract_href, abgeordneten_list_html)
link <- str_c("https://de.wikipedia.org", link_part2)
text <- sapply(link, first_content_p_text)

# " ist ein." matches " ist eine" (feminine) or " ist ein " (masculine)
text %>%
  str_extract(" ist ein.") %>%
  str_replace(" ist eine", "female") %>%
  str_replace(" ist ein ", "male") ->
  gender

# The name is the leading run of words (letters, spaces and hyphens)
text %>%
  str_extract("^([:upper:]?[:lower:]+[\\s\\-]?)*") %>%
  str_trim() ->
  names

gender <- tibble(speaker = names,
                 gender = gender)

# Match the scraped names against the speaker table by full name
speaker %>%
  unite("speaker", vorname, nachname, sep = " ") %>%
  right_join(gender, by = "speaker") ->
  speaker_with_gender
```
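
To make the classification step easier to follow, here is a minimal offline sketch of the same idea. The three sentences in `intro` are invented for illustration and are not taken from Wikipedia, and the lookup vector `gender_lookup` is a hypothetical stand-in for the two `str_replace` calls used in the chunk on the scraped text.

```r
library(stringr)

# Hypothetical opening sentences, written for illustration only
intro <- c(
  "Erika Mustermann (* 1970 in Berlin) ist eine deutsche Politikerin.",
  "Max Mustermann (* 1965 in Bonn) ist ein deutscher Politiker.",
  "Ein Satz ohne das erwartete Muster."
)

# " ist ein." captures " ist ein" plus one more character:
# "e" for the feminine article, a space for the masculine one
marker <- str_extract(intro, " ist ein.")

# Map each captured marker to a label; sentences without a match stay NA
gender_lookup <- c(" ist eine" = "female", " ist ein " = "male")
gender_demo <- unname(gender_lookup[marker])
```

Sentences that do not contain the pattern end up as `NA`, which is why the later chunks filter on `gender %in% c("male", "female")`.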

## Analysis

First, let's look at the gender distribution across the whole Bundestag.

```{r}
# Share of each gender in percent, ignoring members without a detected gender
speaker_with_gender %>%
  select(gender) %>%
  group_by(gender) %>%
  summarise(count = n()) %>%
  filter(gender %in% c("male", "female")) %>%
  mutate(portion = 100 * count / sum(count)) ->
  plot1

# A pie chart is a stacked bar drawn in polar coordinates
bp <- ggplot(plot1, aes(x = "", y = portion, fill = gender)) +
  geom_bar(width = 1, stat = "identity")
pie <- bp + coord_polar("y", start = 0)
pie +
  scale_fill_manual(values = c(female = "pink", male = "blue")) +
  ggtitle("Relative distribution of sexes") +
  xlab("") +
  ylab("")
```

Next, we look at the share of women within each parliamentary group (column `fraction`).

```{r, fig.width=7}
# Number of members per parliamentary group
speaker_with_gender %>%
  group_by(fraction) %>%
  summarize(n = n()) ->
  fraction_size

# Number and share (q) of women per parliamentary group
speaker_with_gender %>%
  filter(gender == "female") %>%
  group_by(fraction) %>%
  summarize(n_female = n()) %>%
  left_join(fraction_size, by = "fraction") %>%
  mutate(q = n_female / n) -> women_per_fraction

# Plot title: "Share of women by party"
bar_plot_fractions(women_per_fraction, x_variable = fraction, y_variable = q, title = "Frauenanteil nach Partei")
```
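
As a hedged aside, the same share can be computed in a single grouped summary. The sketch below runs on a made-up table `toy_members`; its group names and rows are hypothetical and only illustrate the shape of the calculation.

```r
library(dplyr)

# Hypothetical members table, for illustration only
toy_members <- tibble(
  fraction = c("A", "A", "A", "B", "B", "C"),
  gender   = c("female", "male", "male", "female", "female", "male")
)

# One pass per group: total members, number of women, and their share
toy_share <- toy_members %>%
  group_by(fraction) %>%
  summarize(
    n        = n(),
    n_female = sum(gender == "female", na.rm = TRUE),
    q        = n_female / n
  )
```

One difference from the filter-then-join approach is that a group without any women (group `C` here) stays in the result with a share of 0 instead of being dropped.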

With this in mind, we can now compare, for each parliamentary group, the share of speeches given by women with the share of women among its members.

```{r, fig.width=7}
speaker_with_gender %>% transmute(speaker_id = id, gender, fraction) -> simple_speaker_with_gender

# Total number of speeches per parliamentary group
speeches %>%
  transmute(id, speaker_id = speaker) %>%
  inner_join(simple_speaker_with_gender, by = "speaker_id") %>%
  group_by(fraction) %>%
  summarize(speeches = n()) ->
  fraction_speeches_size

# Number and share (q_speeches) of speeches given by women per group
speeches %>%
  transmute(id, speaker_id = speaker) %>%
  inner_join(simple_speaker_with_gender, by = "speaker_id") %>%
  filter(gender == "female") %>%
  group_by(fraction) %>%
  summarize(female_speeches = n()) %>%
  left_join(fraction_speeches_size, by = "fraction") %>%
  left_join(women_per_fraction, by = "fraction") %>%
  mutate(q_speeches = female_speeches / speeches) -> speech_distribution

#bar_plot_fractions(speech_distribution, x_variable=fraction, y_variable=q_speeches, title="Redeanteil Frauen nach Partei")

# Display order of the parliamentary groups on the axis
party_order <- c("Fraktionslos", "AfD&Fraktionslos",
                 "DIE LINKE", "BÜNDNIS 90 / DIE GRÜNEN", "SPD", "CDU/CSU",
                 "FDP", "AfD")

# "Frauenanteil" = share of women, "Redenanteil Frauen" = women's share of speeches
speech_distribution %>%
  mutate("Frauenanteil" = q, "Redenanteil Frauen" = q_speeches) %>%
  pivot_longer(c(Frauenanteil, "Redenanteil Frauen"), names_to = "type") %>%
  ggplot(aes(x = factor(fraction, levels = party_order),
             y = value,
             fill = factor(type, levels = c("Frauenanteil", "Redenanteil Frauen")))) +
  scale_fill_manual(values = c("Frauenanteil" = "gray", "Redenanteil Frauen" = "red")) +
  coord_flip() +
  geom_bar(stat = "identity", position = "dodge") +
  labs(fill = "Kategorie")
```
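
The reshaping step before the plot may be easier to see on a tiny example. This is only a sketch: the data frame `toy_wide` and its column names `share_members` and `share_speeches` are hypothetical, and the numbers are invented.

```r
library(tidyr)

# Hypothetical wide table: one row per group, one column per measure
toy_wide <- data.frame(
  fraction       = c("A", "B"),
  share_members  = c(0.3, 0.5),
  share_speeches = c(0.25, 0.55)
)

# Long format: one row per group and measure, so that ggplot2 can map
# the measure name to `fill` and draw the bars side by side
toy_long <- pivot_longer(
  toy_wide,
  cols      = c(share_members, share_speeches),
  names_to  = "type",
  values_to = "value"
)
```

Passing `names_to` by name keeps the call independent of the positional argument order of `pivot_longer`.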

For comparison, let's analyze the overall difference in the number of speeches given by women and men.

```{r}
speeches %>%
  group_by(speaker) %>%
  summarize(n = n()) %>%
  ungroup() %>%
  arrange(desc(n)) %>%
  left_join(speaker, by = c("speaker" = "id")) %>%
  unite(name, vorname, nachname, sep = " ") %>%
  inner_join(gender, by = c("name" = "speaker")) %>%
  group_by(gender) %>%
  summarise(absolute = sum(n)) %>%
  filter(gender %in% c("female", "male")) %>%
  # absolute2: each gender's share of all speeches
  mutate(absolute2 = absolute / sum(absolute)) %>%
  # Hard-coded membership shares in row order (female, male)
  mutate(portion = c(0.32, 0.68)) %>%
  # Weight each count by the other gender's membership share, then normalize
  mutate(relative = absolute * (1 - portion)) %>%
  mutate(relative2 = relative / sum(relative)) ->
  plot3
```
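
The weighting by `1 - portion` can look arbitrary at first. The sketch below, using invented speech counts (`absolute_demo` is hypothetical, not a result from the records), suggests that for two groups it gives the same normalized shares as dividing each count by the group's own membership share, that is, comparing speeches per member.

```r
# Hypothetical speech counts for (female, male), for illustration only
absolute_demo <- c(300, 700)
# Membership shares as hard-coded in the chunk
portion_demo <- c(0.32, 0.68)

# Weighting used in the chunk: multiply by the other group's share
weighted <- absolute_demo * (1 - portion_demo)
weighted_share <- weighted / sum(weighted)

# Alternative view: speeches per unit of membership share
per_member <- absolute_demo / portion_demo
per_member_share <- per_member / sum(per_member)

# Both normalizations agree when there are exactly two groups
same_result <- isTRUE(all.equal(weighted_share, per_member_share))
```

The equivalence holds because, with two groups, multiplying by `1 - portion` differs from dividing by `portion` only by the constant factor `portion[1] * portion[2]`, which cancels in the normalization.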

First, let's look at the unadjusted share of speeches given by each gender.

```{r, fig.width=7}
barplot(plot3$absolute2,
        ylab = "share of speeches",
        main = "Absolute comparison of speech shares",
        las = 1,
        names.arg = c("women", "men"),
        col = c("pink", "darkblue"),
        font.main = 4,
        cex.axis = 0.7)
```

Since men outnumber women in the German Bundestag, we now adjust the speech shares for the ratio of men to women among the members.

```{r, fig.width=7}
barplot(plot3$relative2,
        ylab = "share of speeches",
        main = "Relative comparison of speech shares",
        las = 1,
        names.arg = c("women", "men"),
        col = c("pink", "darkblue"),
        font.main = 4,
        cex.axis = 0.7)
```