An R package to analyze the parliamentary records of the 19th legislative period of the Bundestag, the German parliament.

---
title: "Analysis of covered topics"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Analysis of covered topics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(hateimparlament)
library(dplyr)
library(ggplot2)
library(stringr)
library(tidyr)
```
## Preparation of data

First, you need to download all records of the current legislative period.

```r
fetch_all("../inst/records/") # path to directory where records should be stored
```

Second, those `.xml` files need to be parsed into R tibbles. We also use `repair` to fix a number of formatting issues in the records:

```r
res <- read_all("../inst/records/") %>% repair()
```
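To illustrate what parsing XML into a tibble can look like, here is a minimal sketch using the `xml2` and `tibble` packages. It is not the implementation of `read_all`: the toy document, its element and attribute names (`rede`, `redner`, `p`) and the resulting columns are assumptions made for this example only.

```r
library(xml2)
library(tibble)

# Toy document; element and attribute names are illustrative assumptions,
# not the actual schema of the parliamentary records.
toy_xml <- '<protokoll>
  <rede id="r1" redner="s1"><p>Erster Absatz.</p><p>Zweiter Absatz.</p></rede>
  <rede id="r2" redner="s2"><p>Dritter Absatz.</p></rede>
</protokoll>'

doc <- read_xml(toy_xml)
speeches <- xml_find_all(doc, ".//rede")

# One row per speech; the paragraphs of a speech are collapsed into one string.
parsed <- tibble(
  id      = xml_attr(speeches, "id"),
  speaker = xml_attr(speeches, "redner"),
  content = vapply(
    speeches,
    function(s) paste(xml_text(xml_find_all(s, "./p")), collapse = " "),
    character(1)
  )
)
```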
For development purposes, we only fetch the records if they are not already
stored as csv files:

```{r}
res <- read_from_csv_or_fetch("../inst/")
```
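The "read from csv, otherwise fetch" idea can be sketched in base R as follows. The helper `read_cached` and its arguments are hypothetical and only illustrate the caching pattern; how `read_from_csv_or_fetch` works internally is not shown in this vignette.

```r
# Hypothetical helper: read `path` if it exists, otherwise build the data
# with `build()` and store it as csv for the next call.
read_cached <- function(path, build) {
  if (file.exists(path)) {
    return(read.csv(path, stringsAsFactors = FALSE))
  }
  data <- build()
  write.csv(data, path, row.names = FALSE)
  data
}

cache_file <- tempfile(fileext = ".csv")
calls <- 0
expensive <- function() {
  calls <<- calls + 1  # count how often the expensive step actually runs
  data.frame(id = c("r1", "r2"), content = c("a", "b"),
             stringsAsFactors = FALSE)
}

first  <- read_cached(cache_file, expensive)  # builds the data and writes the cache
second <- read_cached(cache_file, expensive)  # served from the csv file
```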
## Analysis

Now we can start analysing our parsed dataset.

### Counting the occurrences of a given word

```{r, fig.width=7, fig.height=7}
find_word(res, "Kohleausstieg") %>%
  filter(occurences > 0) %>%
  join_speaker(res) %>%
  select(content, fraction) %>%
  filter(!is.na(fraction)) %>%
  group_by(fraction) %>%
  summarize(n = n()) %>%  # rows with at least one occurrence, per fraction
  arrange(desc(n)) %>%
  bar_plot_fractions(title = "Parties using the word 'Kohleausstieg' the most (absolute counts)",
                     ylab = "Number of uses of 'Kohleausstieg'",
                     flipped = FALSE,
                     rotatelab = TRUE)
```
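The pipeline above counts rows per fraction. The following sketch reproduces that idea on toy data with `stringr::str_count` and also shows how the number of matching rows (`n()`) differs from the total number of uses (`sum(hits)`). The toy tibble and the column names `hits`, `n_speeches` and `n_uses` are assumptions for illustration; the internals of `find_word` are not shown here.

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy data; `fraction` and `content` mirror the column names used above.
speeches <- tibble(
  fraction = c("A", "A", "B", NA, "B"),
  content  = c("Der Kohleausstieg kommt.", "Kein Thema.",
               "Kohleausstieg jetzt! Kohleausstieg sofort!",
               "Kohleausstieg?", "Anderes Thema.")
)

counts <- speeches %>%
  mutate(hits = str_count(content, fixed("Kohleausstieg"))) %>%
  filter(hits > 0, !is.na(fraction)) %>%   # drop non-matches and unknown fractions
  group_by(fraction) %>%
  summarize(n_speeches = n(),              # rows containing the word
            n_uses     = sum(hits)) %>%    # total uses of the word
  arrange(desc(n_uses))
```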
### When are which topics discussed the most?

First, we define search patterns for some common political topics. The `(?i)` prefix makes each pattern case-insensitive.

```{r}
pandemic_pattern <- "(?i)virus|corona|covid|lockdown"
climate_pattern <- "(?i)klimawandel|erderwärmung|co2|treibhaus|methan|kyoto-protokoll|klimaabkommen"
pension_pattern <- "(?i)rente|pension|altersarmut"
```
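As a small check of how such a pattern behaves, the sketch below applies the pension pattern to a few words with `stringr::str_detect`. Because the alternatives are not anchored, they also match inside longer words such as "Konkurrenten". The stricter variant `strict_pattern` is a hypothetical alternative, not part of the package; whether substring matches are wanted depends on the analysis.

```r
library(stringr)

pension_pattern <- "(?i)rente|pension|altersarmut"
words <- c("Rente", "ALTERSARMUT", "Konkurrenten", "Klima")

# Unanchored alternatives match anywhere inside a word, ignoring case.
loose <- str_detect(words, pension_pattern)

# Hypothetical stricter variant: \\b requires each alternative to start a word.
strict_pattern <- "(?i)\\b(rente|pension|altersarmut)"
strict <- str_detect(words, strict_pattern)
```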
Then we use the analysis helper `word_usage_by_date` to generate a tibble counting the
occurrences of our search patterns per date. We can then plot the results:

```{r, fig.width=7, fig.height=6}
word_usage_by_date(res, c(pandemic = pandemic_pattern,
                          climate = climate_pattern,
                          pension = pension_pattern)) %>%
  ggplot(aes(x = date, y = count, color = pattern)) +
  xlab("date of session") +
  ylab("occurrences of pattern per session") +
  labs(color = "Topic") +
  geom_point()
```
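For readers who want to see the shape of the data behind this plot, here is a minimal sketch that builds a comparable long-format tibble from toy data. The output columns `date`, `pattern` and `count` are inferred from the `aes()` call above; the toy input and the way the counts are computed are assumptions, not the actual implementation of `word_usage_by_date`.

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy data; the real `res` object has a different, richer structure.
talks <- tibble(
  date    = as.Date(c("2020-03-04", "2020-03-04", "2020-03-25")),
  content = c("Das Virus und die Rente.", "Corona, Corona!", "Lockdown und Pension.")
)
patterns <- c(pandemic = "(?i)virus|corona|covid|lockdown",
              pension  = "(?i)rente|pension|altersarmut")

# For each named pattern, sum the matches per date, then stack the results.
usage <- bind_rows(lapply(names(patterns), function(nm) {
  talks %>%
    group_by(date) %>%
    summarize(count = sum(str_count(content, patterns[[nm]]))) %>%
    mutate(pattern = nm)
}))
```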