An R package to analyze the parliamentary records of the 19th legislative period of the Bundestag, the German parliament.

---
title: "Differences in gender"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Differences in gender}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(hateimparlament)
library(dplyr)
library(ggplot2)
library(stringr)
library(tidyr)
library(xml2)
```

## Preparation of data

First, you need to download all records of the current legislative period.

```r
fetch_all("../records/") # path to directory where records should be stored
```

Second, those `.xml` files need to be parsed into `R` tibbles. This is accomplished by:

```r
read_all("../records/") %>% repair() -> res
```

The call to `repair` fixes a number of formatting issues in the records.

For development purposes, we only fetch records if they are not already
stored as CSV files:

```{r}
res <- read_from_csv_or_fetch('../inst/')
```

Then we unpack the tibbles:

```{r}
comments <- res$comments
speeches <- res$speeches
speaker <- res$speaker
talks <- res$talks
```

Before we can do our analysis, we have to assign a gender to each politician. We do this
by reading the gender from the master data of all members of parliament, which is
fetched from bundestag.de.

```{r}
# Return the text of all children of `node` called `name`, or NA if there are none.
xml_get <- function(node, name) {
  res <- xml_text(xml_find_all(node, name))
  if (length(res) == 0) NA_character_
  else res
}

x <- read_xml("../inst/masterdata.xml")
mdbs <- xml_find_all(x, "MDB") # one node per member of parliament

ids <- c()
genders <- c()
for (mdb in mdbs) {
  xml_get(mdb, "ID") -> mdb_id
  xml_find_first(mdb, "BIOGRAFISCHE_ANGABEN") %>%
    xml_get("GESCHLECHT") ->
    mdb_gender
  ids <- c(ids, mdb_id)
  # Every value other than "männlich" (male) is labelled "female".
  genders <- c(genders, if (mdb_gender == "männlich") "male" else "female")
}

gender <- tibble(id = ids, gender = genders)
# Speakers without an entry in the master data keep gender = NA.
speaker_with_gender <- left_join(res$speaker, gender, by = "id")
```
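
The same extraction can also be written without an explicit loop. The following is only a sketch under stated assumptions: the XML snippet is a made-up miniature of the master data (the element names are taken from the chunk above, the `DOCUMENT` root and the three members are hypothetical), and it relies on `xml_find_first()` returning one result per input node.

```r
library(xml2)
library(tibble)

# Hypothetical miniature of the master data; member 3 has no gender entry.
toy <- read_xml(paste0(
  "<DOCUMENT>",
  "<MDB><ID>1</ID><BIOGRAFISCHE_ANGABEN><GESCHLECHT>m\u00e4nnlich</GESCHLECHT></BIOGRAFISCHE_ANGABEN></MDB>",
  "<MDB><ID>2</ID><BIOGRAFISCHE_ANGABEN><GESCHLECHT>weiblich</GESCHLECHT></BIOGRAFISCHE_ANGABEN></MDB>",
  "<MDB><ID>3</ID><BIOGRAFISCHE_ANGABEN/></MDB>",
  "</DOCUMENT>"
))

toy_mdbs <- xml_find_all(toy, "MDB")

# Applied to a node set, xml_find_first() yields one entry per node and a
# missing node where the path does not match; xml_text() turns that into NA.
toy_ids <- xml_text(xml_find_first(toy_mdbs, "ID"))
toy_raw <- xml_text(xml_find_first(toy_mdbs, "BIOGRAFISCHE_ANGABEN/GESCHLECHT"))

# As in the loop, every non-missing value other than "männlich" becomes "female".
toy_gender <- tibble(
  id = toy_ids,
  gender = ifelse(toy_raw == "m\u00e4nnlich", "male", "female")
)
toy_gender
```

Unlike the loop, this variant keeps a missing gender as `NA` instead of stopping with an error in the `if` condition. Whether that is the desired behaviour depends on how complete the master data is.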

## Analysis

First, let's look at the relative distribution of the sexes across the whole Bundestag.

```{r}
speaker_with_gender %>%
  select(gender) %>%
  group_by(gender) %>%
  summarise(count = n()) %>%
  filter(gender %in% c("male", "female")) %>% # drops speakers without a known gender
  mutate(portion = 100 * count / sum(count)) -> # share in percent
  plot1

# A pie chart is a stacked bar chart drawn in polar coordinates.
bp <- ggplot(plot1, aes(x = "", y = portion, fill = gender)) +
  geom_bar(width = 1, stat = "identity")
pie <- bp + coord_polar("y", start = 0)
pie +
  scale_fill_manual(values = c(female = "pink", male = "blue")) +
  ggtitle("Relative distribution of sexes") +
  xlab("") +
  ylab("")
```

Next, we look at the share of women in each parliamentary group (called `fraction` in the data).

```{r, fig.width=7}
# Number of speakers per parliamentary group.
speaker_with_gender %>%
  group_by(fraction) %>%
  summarize(n = n()) ->
  fraction_size

# Number of women per group, divided by the group size.
speaker_with_gender %>%
  filter(gender == "female") %>%
  group_by(fraction) %>%
  summarize(n_female = n()) %>%
  left_join(fraction_size, by = "fraction") %>%
  mutate(q = n_female / n) ->
  women_per_fraction

bar_plot_fractions(women_per_fraction, x_variable = fraction, y_variable = q, title = "Share of women by party")
```
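
The share per group can also be computed in a single grouped summary. This is a minimal sketch on a made-up tibble (`toy_speaker` and its two groups are hypothetical, not package data):

```r
library(dplyr)

# Made-up speakers in two hypothetical groups "A" and "B".
toy_speaker <- tibble(
  fraction = c("A", "A", "A", "B", "B"),
  gender   = c("female", "male", "male", "female", "female")
)

toy_share <- toy_speaker %>%
  group_by(fraction) %>%
  summarize(
    n = n(),                           # group size
    n_female = sum(gender == "female"), # number of women in the group
    q = n_female / n                   # share of women
  )
toy_share
```

One difference from the filter-and-join approach: a group without any women would stay in the result with `q = 0` instead of being dropped by the filter. If `gender` contains `NA`, `sum()` would need `na.rm = TRUE`.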

With these numbers in hand, we can now compare the share of speeches given by women in each parliamentary group with the share of women in that group.

```{r, fig.width=7}
speaker_with_gender %>%
  transmute(speaker_id = id, gender, fraction) ->
  simple_speaker_with_gender

# Number of speeches per parliamentary group.
speeches %>%
  transmute(id, speaker_id = speaker) %>%
  inner_join(simple_speaker_with_gender, by = "speaker_id") %>%
  group_by(fraction) %>%
  summarize(speeches = n()) ->
  fraction_speeches_size

# Share of those speeches that were given by women.
speeches %>%
  transmute(id, speaker_id = speaker) %>%
  inner_join(simple_speaker_with_gender, by = "speaker_id") %>%
  filter(gender == "female") %>%
  group_by(fraction) %>%
  summarize(female_speeches = n()) %>%
  left_join(fraction_speeches_size, by = "fraction") %>%
  left_join(women_per_fraction, by = "fraction") %>%
  mutate(q_speeches = female_speeches / speeches) ->
  speech_distribution

#bar_plot_fractions(speech_distribution, x_variable=fraction, y_variable=q_speeches, title="Redeanteil Frauen nach Partei")

# Order of the groups on the axis and of the two bars per group.
party_order <- c("Fraktionslos", "AfD&Fraktionslos",
                 "DIE LINKE", "BÜNDNIS 90/DIE GRÜNEN", "SPD", "CDU/CSU",
                 "FDP", "AfD", NA_character_)
share_types <- c("Share of women", "Share of speeches by women")

speech_distribution %>%
  mutate("Share of women" = q, "Share of speeches by women" = q_speeches) %>%
  pivot_longer(all_of(share_types), names_to = "type") %>%
  ggplot(aes(x = factor(fraction, levels = party_order),
             y = value,
             fill = factor(type, levels = share_types))) +
  scale_fill_manual(values = c("Share of women" = "gray",
                               "Share of speeches by women" = "red")) +
  coord_flip() +
  geom_bar(stat = "identity", position = "dodge") +
  labs(fill = "Category")
```
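
The reshaping step is what lets `ggplot2` draw two bars per group: `pivot_longer()` stacks the two share columns into one `value` column and records the source column in `type`. A minimal sketch on made-up numbers (the tibble `wide` and its values are hypothetical):

```r
library(tidyr)
library(tibble)

# One row per group, one column per kind of share (made-up values).
wide <- tibble(
  fraction   = c("A", "B"),
  q          = c(0.25, 0.5),
  q_speeches = c(0.2, 0.6)
)

# One row per group and kind of share; `type` holds the former column name.
long <- pivot_longer(wide, c(q, q_speeches), names_to = "type", values_to = "value")
long
```

Passing `names_to` by name matters here: in current versions of `tidyr` the arguments after `cols` have to be named.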

For comparison, let's look at the overall difference in the number of speeches given by women and men.

```{r}
speeches %>%
  group_by(speaker) %>%
  summarize(n = n()) %>% # speeches per speaker
  ungroup() %>%
  arrange(desc(n)) %>%
  join_speaker(res) %>%
  left_join(gender, by = c("speaker" = "id")) %>%
  group_by(gender) %>%
  summarise(absolute = sum(n)) %>% # speeches per gender
  filter(gender %in% c("female", "male")) %>%
  mutate(absolute2 = absolute / sum(absolute)) %>% # unadjusted share of speeches
  # Hard-coded shares of women (0.32) and men (0.68) in the Bundestag;
  # this relies on the rows being ordered female, male.
  mutate(portion = c(0.32, 0.68)) %>%
  # Weight each count by the other group's share of seats.
  mutate(relative = absolute * (1 - portion)) %>%
  mutate(relative2 = relative / sum(relative)) -> # adjusted share of speeches
  plot3
```
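
The weighting by `1 - portion` can look surprising at first. With exactly two groups it gives the same shares as dividing each count by the group's own share of seats, because `1 - portion` is then the other group's share. A short worked sketch with made-up speech counts (only the two `portion` values come from the chunk above):

```r
# Made-up speech counts; the seat shares are the hard-coded values used above.
absolute <- c(female = 300, male = 700)
portion  <- c(female = 0.32, male = 0.68)

# Variant 1: speeches per unit of seat share, then normalised to sum to 1.
per_seat <- absolute / portion
share_per_seat <- per_seat / sum(per_seat)

# Variant 2: the weighting used in the vignette.
weighted <- absolute * (1 - portion)
share_weighted <- weighted / sum(weighted)

# Both variants agree up to floating-point tolerance.
same_shares <- isTRUE(all.equal(share_per_seat, share_weighted))
same_shares
```

This equivalence holds only for two groups; with more categories one would have to divide by `portion` directly.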

First, let's take a look at the unadjusted shares of speeches given by the two sexes.

```{r,fig.width=7}
barplot(plot3$absolute2,
        ylab = "share of speeches",
        main = "Absolute comparison of speech shares",
        las = 1,
        names.arg = c("women", "men"),
        col = c("pink", "darkblue"),
        font.main = 4,
        cex.axis = 0.7)
```

Since men outnumber women in the German Bundestag, we now consider the shares of speeches adjusted for the ratio of men to women.

```{r, fig.width=7}
barplot(plot3$relative2,
        ylab = "share of speeches",
        main = "Relative comparison of speech shares",
        las = 1,
        names.arg = c("women", "men"),
        col = c("pink", "darkblue"),
        font.main = 4,
        cex.axis = 0.7)
```