An R package to analyze the parliamentary records of the 19th legislative period of the Bundestag, the German parliament.

---
title: "genderequality"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{genderequality}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(hateimparlament)
library(dplyr)
library(ggplot2)
library(stringr)
library(tidyr)
library(rvest)
```

## Preparation of data

First, download all records of the current legislative period.

```r
fetch_all("../records/") # path to the directory where the records should be stored
```

Second, parse the downloaded `.xml` files into `R` tibbles:

```r
read_all("../records/") %>% repair() -> res
```

We also use `repair` to fix a number of formatting issues in the records. The result is unpacked into more descriptive variables further down.
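
Before parsing, it can help to confirm that the download step actually left `.xml` files in the target directory. The following is a minimal base-R sketch, assuming the records sit directly in one directory and carry a `.xml` extension. `list_record_files` is a hypothetical helper, not part of `hateimparlament`, and the file names are placeholders.

```r
# Hypothetical helper (not part of hateimparlament): list the XML records
# in a directory, assuming they are stored directly in it as *.xml files.
list_record_files <- function(dir) {
  list.files(dir, pattern = "\\.xml$", full.names = TRUE, ignore.case = TRUE)
}

# Demonstration on a fresh throwaway directory instead of "../records/".
demo_dir <- tempfile("records_demo")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c("a.xml", "b.xml", "notes.txt"))))

record_files <- list_record_files(demo_dir)
length(record_files) # number of XML files found, non-XML files are ignored
```

If the count is zero for the real directory, `read_all` has nothing to parse, so checking first can save some debugging time.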

For development purposes, we instead load the tables from CSV files:

```{r}
res <- read_from_csv('../data/csv/')
```

Then we unpack the individual tibbles:

```{r}
comments <- res$comments
speeches <- res$speeches
speaker <- res$speaker
talks <- res$talks
```
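
Since the rest of the analysis relies on these four tables, a small guard can catch an incomplete `res` early. This is only a sketch: `check_tables` is a hypothetical helper, and the toy list below stands in for the real result with placeholder columns.

```r
# Hypothetical helper: stop early if an expected table is missing from `res`.
check_tables <- function(res,
                         expected = c("comments", "speeches", "speaker", "talks")) {
  absent <- setdiff(expected, names(res))
  if (length(absent) > 0) {
    stop("missing tables: ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy stand-in for the real result; the `id` columns are placeholders.
toy_res <- list(
  comments = data.frame(id = integer()),
  speeches = data.frame(id = integer()),
  speaker  = data.frame(id = integer()),
  talks    = data.frame(id = integer())
)
check_tables(toy_res)
```

Calling `check_tables(res)` right after loading would fail with a readable message instead of a later `NULL` surprise.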

Before we can run our analysis, we have to assign a gender to each politician.

```{r}
# Return the href attribute of the first node matching a CSS selector.
extract_href <- function(sel, html) {
  html %>%
    html_node(sel) %>%
    html_attr("href")
}

# Return the text of the first paragraph in the article body.
# The page is downloaded once instead of once per loop iteration.
# Note: the loop only ends once such a paragraph has been found.
first_content_p_text <- function(url) {
  page <- read_html(url)
  res <- NA
  i <- 1
  while (is.na(res)) {
    page %>%
      html_node(str_glue("#mw-content-text > div.mw-parser-output > p:nth-child({i})")) %>%
      html_text() -> res
    i <- i + 1
  }
  res
}

# Wikipedia list of the members of the 19th Bundestag.
abgeordneten_list_html <- read_html(
  "https://de.wikipedia.org/wiki/Liste_der_Mitglieder_des_Deutschen_Bundestages_(19._Wahlperiode)")

# One selector per table row (rows 2 to 709), pointing at the link in the second column.
selectors <- str_glue("#mw-content-text > div.mw-parser-output > table:nth-child(20) > tbody > tr:nth-child({2:709}) > td:nth-child(2) > a")
link_part2 <- sapply(selectors, extract_href, abgeordneten_list_html)
link <- str_c("https://de.wikipedia.org", link_part2)

# First paragraph of each linked article.
text <- sapply(link, first_content_p_text)

# The opening sentence typically reads "... ist eine ..." (feminine)
# or "... ist ein ..." (masculine).
text %>%
  str_extract(" ist ein.") %>%
  str_replace(" ist eine", "female") %>%
  str_replace(" ist ein ", "male") ->
  gender

# The name is taken as the leading run of words (spaces and hyphens allowed).
text %>%
  str_extract("^([:upper:]?[:lower:]+[\\s\\-]?)*") %>%
  str_trim() ->
  names

gender <- tibble(speaker = names,
                 gender = gender)
```
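
The classification rule can be tried out without touching Wikipedia. The sketch below is a base-R variant of the rule, not the `stringr` pipeline itself. It looks at the first " ist ein" in a sentence and checks whether the article is "eine" or "ein". It is slightly stricter than the chunk above, because it requires a space after the article and returns `NA` for everything else. `classify_gender` and the sample sentences are made up for illustration.

```r
# Base-R sketch of the rule: classify by the first " ist ein(e) " in the text.
classify_gender <- function(text) {
  pos <- regexpr(" ist eine? ", text)      # -1 where nothing matches
  len <- attr(pos, "match.length")
  out <- rep(NA_character_, length(text))
  out[which(pos > 0 & len == nchar(" ist eine "))] <- "female"
  out[which(pos > 0 & len == nchar(" ist ein "))]  <- "male"
  out
}

# Placeholder sentences in the style of an article's opening line.
samples <- c(
  "Erika Mustermann ist eine deutsche Politikerin.",
  "Max Mustermann ist ein deutscher Politiker.",
  "Dieser Satz passt nicht."
)
classify_gender(samples)
```

On the real data, `table(gender$gender, useNA = "ifany")` gives a quick overview of how many entries could not be classified.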