An R package to analyze the parliamentary records of the 19th legislative period of the Bundestag, the German parliament.

---
title: "genderequality"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{genderequality}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(hateimparlament)
library(dplyr)
library(ggplot2)
library(stringr)
library(tidyr)
library(rvest)
```

## Preparation of data

First, download all records of the 19th legislative period.

```r
fetch_all("../records/") # path to directory where records should be stored
```
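
Before parsing, it can help to confirm that the download actually produced files. The following is a minimal sketch in base R, assuming the records are saved as `.xml` files directly inside the target directory. The helper name `count_xml_records` is hypothetical and not part of `hateimparlament`.

```r
# Hypothetical helper (not part of hateimparlament):
# count the .xml files in a records directory.
count_xml_records <- function(dir) {
  if (!dir.exists(dir)) {
    return(0L)
  }
  length(list.files(dir, pattern = "\\.xml$", ignore.case = TRUE))
}

# Self-contained demonstration on a temporary directory.
demo_dir <- file.path(tempdir(), "records-demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("a.xml", "b.xml", "notes.txt"))))
count_xml_records(demo_dir)
```

On a real download, the same call would be `count_xml_records("../records/")`.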

Second, the downloaded `.xml` files need to be parsed into R tibbles:

```r
res <- read_all("../records/") %>% repair()
```

We also use `repair` to fix a number of formatting issues in the records and to unpack
the result into more descriptive variables.

For development purposes, we load the tables from CSV files

```{r}
res <- read_from_csv('../inst/csv/')
```

and unpack the tibbles:

```{r}
comments <- res$comments
speeches <- res$speeches
speaker <- res$speaker
talks <- res$talks
```

Before we can do our analysis, we have to assign a gender to each politician.
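
The assignment in this vignette relies on the opening sentence of each member's German Wikipedia article: " ist eine" (feminine article) is read as female and " ist ein " (masculine article) as male. The following offline sketch illustrates that heuristic on invented sentences. The example names and sentences are made up, and the sketch writes the character classes in bracketed form (`[[:upper:]]`), which is an assumption about what the pattern is meant to match.

```r
library(stringr)

# Invented sentences in the style of German Wikipedia article openings.
intro <- c(
  "Erika Musterfrau (* 1970 in Berlin) ist eine deutsche Politikerin.",
  "Max Mustermann (* 1965 in Bonn) ist ein deutscher Politiker.",
  "Ein Satz ohne das erwartete Muster."
)

# Check the feminine form first, then the masculine form;
# sentences matching neither stay NA.
gender_guess <- ifelse(
  str_detect(intro, " ist eine "), "female",
  ifelse(str_detect(intro, " ist ein "), "male", NA_character_)
)

# Take the leading run of words (optionally capitalised, separated by
# whitespace or hyphens) as the name; it ends at the first other character.
name_guess <- intro %>%
  str_extract("^([[:upper:]]?[[:lower:]]+[\\s\\-]?)*") %>%
  str_trim()

data.frame(speaker = name_guess, gender = gender_guess)
```

This is a heuristic based on grammatical gender in the article text, so unmatched or unusual openings yield `NA` and should be checked by hand.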

```{r}
extract_href <- function(sel, html) {
  html %>%
    html_node(sel) %>%
    html_attr("href")
}
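# Return the text of the first <p> element in the article body.
# The page is downloaded once, and the search gives up after `max_children`
# attempts instead of looping forever when no paragraph exists.
first_content_p_text <- function(url, max_children = 50) {
  page <- read_html(url)
  for (i in seq_len(max_children)) {
    page %>%
      html_node(str_glue("#mw-content-text > div.mw-parser-output > p:nth-child({i})")) %>%
      html_text() -> res
    if (!is.na(res)) return(res)
  }
  NA_character_
}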
abgeordneten_list_html <- read_html(
  "https://de.wikipedia.org/wiki/Liste_der_Mitglieder_des_Deutschen_Bundestages_(19._Wahlperiode)")
# str_glue() is vectorised, so this builds one selector per table row (rows 2 to 709).
selectors <- str_glue("#mw-content-text > div.mw-parser-output > table:nth-child(20) > tbody > tr:nth-child({2:709}) > td:nth-child(2) > a")
link_part2 <- sapply(selectors, extract_href, abgeordneten_list_html)
link <- str_c("https://de.wikipedia.org", link_part2)
# First paragraph of each member's article (one request per member).
text <- sapply(link, first_content_p_text)
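# " ist eine" (feminine article) -> female, " ist ein " (masculine article) -> male.
text %>%
  str_extract(" ist ein.") %>%
  str_replace(" ist eine", "female") %>%
  str_replace(" ist ein ", "male") ->
  gender
# The name is the leading run of words at the start of the paragraph.
text %>%
  str_extract("^([:upper:]?[:lower:]+[\\s\\-]?)*") %>%
  str_trim() ->
  names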
gender <- tibble(speaker = names,
                 gender = gender)
```
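
The paragraph lookup can be tried without network access. The sketch below runs a bounded version of the `p:nth-child({i})` search on a hand-written HTML snippet that loosely mimics a Wikipedia article body. The helper name `first_paragraph`, the `max_children` limit, and the snippet itself are assumptions made for illustration.

```r
library(rvest)

# Hypothetical helper: text of the first <p> among the first `max_children`
# children of the article body, or NA if none is found.
first_paragraph <- function(page, max_children = 20) {
  for (i in seq_len(max_children)) {
    sel <- paste0("div.mw-parser-output > p:nth-child(", i, ")")
    txt <- page %>% html_node(sel) %>% html_text()
    if (!is.na(txt)) {
      return(txt)
    }
  }
  NA_character_
}

# The first child is a table, so the paragraph is only found at position 2.
page <- read_html(
  '<div class="mw-parser-output"><table></table><p>Max Mustermann ist ein Politiker.</p></div>'
)
first_paragraph(page)

# A body without any paragraph yields NA instead of looping forever.
empty <- read_html('<div class="mw-parser-output"><table></table></div>')
first_paragraph(empty)
```

Capping the number of attempts trades completeness for safety: a paragraph located beyond `max_children` is missed, but a malformed or redirected page can no longer stall the whole scrape.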