An R package to analyze the parliamentary records of the 19th legislative period of the Bundestag, the German parliament.

---
title: "Differences in gender"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Differences in gender}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(hateimparlament)
library(dplyr)
library(ggplot2)
library(stringr)
library(tidyr)
library(xml2)
```

## Preparation of data

First, download all records of the current legislative period:

```r
fetch_all("../records/") # path to directory where records should be stored
```

Second, parse those `.xml` files into `R` tibbles:

```r
read_all("../records/") %>% repair() -> res
```

Here `repair` fixes a number of formatting issues in the records.

For development purposes, we only fetch the records if they are not already
stored as csv files:

```{r}
res <- read_from_csv_or_fetch('../inst/')
```

and then unpack our tibbles:

```{r}
comments <- res$comments
speeches <- res$speeches
speaker <- res$speaker
talks <- res$talks
```
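
The "fetch only if not already stored" idea can be illustrated with a small, self-contained sketch. It is not how `read_from_csv_or_fetch` is implemented; `read_cached_or_compute`, `expensive_fetch`, and the file name are hypothetical names chosen for illustration, and the "download" is replaced by a toy data frame.

```r
# Hypothetical helper, not part of hateimparlament: reuse a cached CSV
# if it exists, otherwise compute the table and store it for next time.
read_cached_or_compute <- function(path, compute) {
  if (file.exists(path)) {
    return(read.csv(path, stringsAsFactors = FALSE))
  }
  result <- compute()
  write.csv(result, path, row.names = FALSE)
  result
}

cache_file <- file.path(tempdir(), "toy_speeches.csv")
unlink(cache_file)  # start from a clean state

calls <- 0
expensive_fetch <- function() {
  calls <<- calls + 1  # count how often the "download" really runs
  data.frame(id = c("a", "b"), n = c(1L, 2L))
}

first  <- read_cached_or_compute(cache_file, expensive_fetch)  # computes and writes
second <- read_cached_or_compute(cache_file, expensive_fetch)  # reads the CSV
```

The second call never invokes `expensive_fetch`, which is the point of the pattern: rebuilding the vignette does not download the records again.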

Before we can do our analysis, we have to assign a gender to each politician. We do this
by reading the gender from the master data of all members of parliament, which is
fetched from bundestag.de.

```{r}
# Return the text of the child node(s) `name`, or NA if there is none.
xml_get <- function(node, name) {
  res <- xml_text(xml_find_all(node, name))
  if (length(res) == 0) NA_character_
  else res
}

x <- read_xml("../inst/masterdata.xml")
mdbs <- xml_find_all(x, "MDB")

ids <- c()
genders <- c()
for (mdb in mdbs) {
  xml_get(mdb, "ID") -> mdb_id
  xml_find_first(mdb, "BIOGRAFISCHE_ANGABEN") %>%
    xml_get("GESCHLECHT") ->
    mdb_gender
  ids <- c(ids, mdb_id)
  # Keep a missing gender as NA instead of failing on the comparison.
  genders <- c(genders,
               if (is.na(mdb_gender)) NA_character_
               else if (mdb_gender == "männlich") "male"
               else "female")
}
gender <- tibble(id = ids, gender = genders)

speaker_with_gender <- left_join(res$speaker, gender, by = "id")
```
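
The loop above grows `ids` and `genders` one element at a time. As a possible alternative, the same lookup can be written without a loop, because `xml_find_first` applied to a node set returns one result per node and `xml_text` yields `NA` for missing nodes. The sketch below runs on a small inline XML document that only assumes the `MDB` / `ID` / `BIOGRAFISCHE_ANGABEN` / `GESCHLECHT` nesting used above; the `DOCUMENT` root element and the toy entries are made up for illustration.

```r
library(xml2)
library(tibble)

# Toy master data (hypothetical); only the nesting mirrors the code above.
toy <- read_xml('<DOCUMENT>
  <MDB>
    <ID>1</ID>
    <BIOGRAFISCHE_ANGABEN><GESCHLECHT>männlich</GESCHLECHT></BIOGRAFISCHE_ANGABEN>
  </MDB>
  <MDB>
    <ID>2</ID>
    <BIOGRAFISCHE_ANGABEN><GESCHLECHT>weiblich</GESCHLECHT></BIOGRAFISCHE_ANGABEN>
  </MDB>
  <MDB>
    <ID>3</ID>
    <BIOGRAFISCHE_ANGABEN></BIOGRAFISCHE_ANGABEN>
  </MDB>
</DOCUMENT>')

toy_mdbs <- xml_find_all(toy, "MDB")

# One entry per MDB node; a missing GESCHLECHT node becomes NA.
toy_ids    <- xml_text(xml_find_first(toy_mdbs, "ID"))
raw_gender <- xml_text(xml_find_first(toy_mdbs, "BIOGRAFISCHE_ANGABEN/GESCHLECHT"))

toy_gender <- tibble(
  id = toy_ids,
  # ifelse() propagates NA, so unknown genders stay unknown
  gender = ifelse(raw_gender == "männlich", "male", "female")
)
```

This version keeps members without a recorded gender as `NA` rather than assigning them to either group.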

## Analysis

First, let's look at the relative distribution of the sexes across the whole Bundestag.

```{r}
speaker_with_gender %>%
  select(gender) %>%
  group_by(gender) %>%
  summarise(count = n()) %>%
  filter(gender %in% c("male", "female")) %>%  # drop speakers without a known gender
  mutate(portion = 100 * count / sum(count)) ->
  plot1

bp <- ggplot(plot1, aes(x = "", y = portion, fill = gender)) +
  geom_bar(width = 1, stat = "identity")
pie <- bp + coord_polar("y", start = 0)
pie +
  scale_fill_manual(values = c("pink", "blue")) +
  ggtitle("Relative distribution of sexes") +
  xlab("") +
  ylab("")
```

Next, we look at the share of women in each parliamentary group (the `fraction` column).

```{r, fig.width=7}
# Number of members per parliamentary group
speaker_with_gender %>%
  group_by(fraction) %>%
  summarize(n = n()) ->
  fraction_size

# Number and share of women per parliamentary group
speaker_with_gender %>%
  filter(gender == "female") %>%
  group_by(fraction) %>%
  summarize(n_female = n()) %>%
  left_join(fraction_size, by = "fraction") %>%
  mutate(q = n_female / n) -> women_per_fraction

# Plot title: "Share of women by party"
bar_plot_fractions(women_per_fraction, x_variable=fraction, y_variable=q, title="Frauenanteil nach Partei")
```
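
The share of women per group can also be computed in a single grouped summary instead of two summaries and a join. The following is a minimal sketch on made-up data; `toy_speakers` and its group labels are hypothetical stand-ins for `speaker_with_gender`.

```r
library(dplyr)

# Hypothetical toy data standing in for speaker_with_gender.
toy_speakers <- tibble(
  fraction = c("A", "A", "A", "B", "B", "C"),
  gender   = c("female", "male", "male", "female", "female", "male")
)

toy_share <- toy_speakers %>%
  group_by(fraction) %>%
  summarize(
    n = n(),                                           # members per group
    n_female = sum(gender == "female", na.rm = TRUE),  # women per group
    q = n_female / n                                   # share of women
  )
```

One difference in behavior is worth noting: filtering on `gender == "female"` before grouping drops groups without any women, whereas this variant keeps them with `q = 0` (group `C` in the toy data).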

With this in mind, we can now compare the share of speeches given by women in each parliamentary group with the share of women in that group.

```{r, fig.width=7}
speaker_with_gender %>% transmute(speaker_id = id, gender, fraction) -> simple_speaker_with_gender

# Total number of speeches per parliamentary group
speeches %>%
  transmute(id, speaker_id = speaker) %>%
  inner_join(simple_speaker_with_gender, by = "speaker_id") %>%
  group_by(fraction) %>%
  summarize(speeches = n()) ->
  fraction_speeches_size

# Number and share of speeches given by women per parliamentary group
speeches %>%
  transmute(id, speaker_id = speaker) %>%
  inner_join(simple_speaker_with_gender, by = "speaker_id") %>%
  filter(gender == 'female') %>%
  group_by(fraction) %>%
  summarize(female_speeches = n()) %>%
  left_join(fraction_speeches_size, by = "fraction") %>%
  left_join(women_per_fraction, by = "fraction") %>%
  mutate(q_speeches = female_speeches / speeches) -> speech_distribution

#bar_plot_fractions(speech_distribution, x_variable=fraction, y_variable=q_speeches, title="Redeanteil Frauen nach Partei")

party_order <- factor(c("Fraktionslos", "AfD&Fraktionslos",
                        "DIE LINKE", "BÜNDNIS 90/DIE GRÜNEN", "SPD", "CDU/CSU",
                        "FDP", "AfD", NA_character_))

# "Frauenanteil" = share of women, "Redenanteil Frauen" = share of speeches by women
speech_distribution %>%
  mutate("Frauenanteil" = q, "Redenanteil Frauen" = q_speeches) %>%
  pivot_longer(c(Frauenanteil, "Redenanteil Frauen"), names_to = "type") %>%
  ggplot(aes(x = factor(fraction, levels = party_order), y = value,
             fill = factor(type, levels = c("Frauenanteil", "Redenanteil Frauen")))) +
  scale_fill_manual(values = c("Frauenanteil" = "gray", "Redenanteil Frauen" = "red")) +
  coord_flip() +
  geom_bar(stat = "identity", position = "dodge") +
  labs(fill = "Kategorie")
```
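
The dodged bar chart relies on reshaping the two share columns into long format, so that one column names the measure and another holds its value. A minimal sketch of that reshaping step on made-up numbers (`toy_wide` and its values are hypothetical):

```r
library(dplyr)
library(tidyr)

# Hypothetical shares for two groups, one column per measure.
toy_wide <- tibble(
  fraction   = c("A", "B"),
  q          = c(0.25, 0.50),  # share of women among members
  q_speeches = c(0.20, 0.55)   # share of speeches given by women
)

# One row per (fraction, measure) pair; `type` names the measure.
toy_long <- toy_wide %>%
  pivot_longer(c(q, q_speeches), names_to = "type", values_to = "value")
```

Passing `names_to` by name avoids depending on the position of that argument in the `pivot_longer` signature.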

For comparison, let's analyze the overall difference in the number of speeches given by women and men.

```{r}
speeches %>%
  group_by(speaker) %>%
  summarize(n = n()) %>%
  ungroup() %>%
  arrange(-n) %>%
  join_speaker(res) %>%
  left_join(gender, by=c("speaker"="id")) %>%
  group_by(gender) %>%
  summarise(absolute=sum(n)) %>%
  filter(gender %in% c("female", "male")) %>%
  # share of all speeches given by each gender
  mutate(absolute2=absolute/sum(absolute)) %>%
  # hard-coded shares of women and men in the Bundestag (row order: female, male)
  mutate(portion=c(0.32, 0.68)) %>%
  # weight each count by the other gender's share
  mutate(relative=absolute*(1-portion)) %>%
  mutate(relative2=relative/sum(relative)) ->
  plot3
```
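
The `relative` column multiplies each gender's speech count by the other gender's share. With exactly two groups, this gives the same normalized result as dividing each count by the group's own share, that is, comparing speeches per unit of representation. The sketch below checks this with made-up speech counts; only the shares 0.32 and 0.68 are taken from the chunk above.

```r
# Hypothetical speech counts; the shares are the ones hard-coded above.
absolute <- c(female = 300, male = 700)
portion  <- c(female = 0.32, male = 0.68)

# Weighting used above: multiply by the other group's share.
weighted  <- absolute * (1 - portion)
relative2 <- weighted / sum(weighted)

# Alternative view: speeches per unit of own share, then normalized.
per_share       <- absolute / portion
per_share_share <- per_share / sum(per_share)

# Both approaches give the same shares for two groups.
same <- isTRUE(all.equal(relative2, per_share_share))
```

The equivalence depends on the two shares summing to one, so it would not carry over unchanged to an analysis with more than two groups.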

First, let's look at the share of all speeches given by each sex, without any weighting.

```{r,fig.width=7}
barplot(plot3$absolute2,
        ylab = "share of speeches",
        main = "Absolute comparison of speech shares",
        las = 1,
        names.arg = c("women", "men"),
        col = c("pink", "darkblue"),
        font.main = 4,
        cex.axis = 0.7)
```

Since there are more men than women in the German Bundestag, we now consider the speech shares weighted by the ratio of men to women.

```{r, fig.width=7}
barplot(plot3$relative2,
        ylab = "share of speeches",
        main = "Relative comparison of speech shares",
        las = 1,
        names.arg = c("women", "men"),
        col = c("pink", "darkblue"),
        font.main = 4,
        cex.axis = 0.7)
```