An R package to analyze the parliamentary records of the 19th legislative period of the Bundestag, the German parliament.

---
title: "explicittopic"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{explicittopic}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(hateimparlament)
library(dplyr)
library(ggplot2)
library(stringr)
library(tidyr)
```

## Preparation of data

First, you need to download all records of the current legislative period.

```r
fetch_all("../inst/records/") # path to directory where records should be stored
```

Second, those `.xml` files need to be parsed into R tibbles. This is accomplished by:

```r
read_all("../inst/records/") %>% repair() -> res
```

The call to `repair()` also fixes a number of formatting issues in the records.
For development purposes, this vignette instead loads the tables from CSV files:

```{r}
res <- read_from_csv('../inst/csv/')
```

## Analysis

Now we can start analysing our parsed dataset.

### Counting the occurrences of a given word

```{r, fig.width=7, fig.height=7}
find_word(res, "Kohleausstieg") %>%
  filter(occurences > 0) %>%
  join_speaker(res) %>%
  select(content, fraction) %>%
  filter(!is.na(fraction)) %>%
  group_by(fraction) %>%
  summarize(n = n()) %>%
  arrange(desc(n)) %>%
  bar_plot_fractions(title = "Parties using the word 'Kohleausstieg' the most (absolutely)",
                     ylab = "Number of uses of 'Kohleausstieg'",
                     flipped = FALSE,
                     rotatelab = TRUE)
```

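The pipeline above counts rows with `n()`. Assuming `find_word()` returns one row per speech together with the number of matches in its `occurences` column (an assumption inferred from the columns used above, not documented here), `n()` counts the speeches that mention the word, whereas `sum(occurences)` would count every single use. A small sketch on hypothetical toy data shows the difference:

```r
library(dplyr)

# Hypothetical toy input: one row per speech, with the number of matches
# of the search word in that speech (mirrors the `occurences` column above)
speeches <- tibble(
  fraction   = c("A", "A", "B"),
  occurences = c(3, 1, 1)
)

per_fraction <- speeches %>%
  group_by(fraction) %>%
  summarize(
    speeches = n(),             # number of speeches mentioning the word
    uses     = sum(occurences)  # total number of times the word was used
  )
# fraction A: 2 speeches, 4 uses; fraction B: 1 speech, 1 use
```

To plot total uses under that assumption, one would keep `occurences` in the `select()` call and replace `n()` with `sum(occurences)`.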
### When is each topic discussed the most?

First, we define search patterns for a few common political topics.

```{r}
pandemic_pattern <- "(?i)virus|corona|covid|lockdown"
climate_pattern <- "(?i)klimawandel|erderwärmung|co2|treibhaus|methan|kyoto-protokoll|klimaabkommen"
pension_pattern <- "(?i)rente|pension|altersarmut"
```

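All three patterns start with `(?i)`, which switches on case-insensitive matching for the whole alternation, and they match substrings rather than whole words. Assuming `word_usage_by_date()` applies them as ordinary regular expressions (which the pattern syntax suggests), a quick check with `stringr::str_count()` on a few made-up sentences illustrates both points:

```r
library(stringr)

pandemic_pattern <- "(?i)virus|corona|covid|lockdown"

# Hypothetical example sentences, not taken from the records
sentences <- c(
  "Die Corona-Pandemie und der Lockdown belasten viele Familien.",
  "Wir brauchen einen schnellen Kohleausstieg.",
  "COVID-19 ist ein Virus."
)

# Count non-overlapping, case-insensitive matches in each sentence
str_count(sentences, pandemic_pattern)
#> [1] 2 0 2

# Matching works on substrings, so compound words are counted as well
str_detect("Coronahilfen", pandemic_pattern)
#> [1] TRUE
```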
Next, we use the analysis helper `word_usage_by_date()` to build a tibble that counts the
occurrences of each search pattern per date, and plot the result:

```{r, fig.width=7, fig.height=6}
word_usage_by_date(res, c(pandemic = pandemic_pattern,
                          climate = climate_pattern,
                          pension = pension_pattern)) %>%
  ggplot(aes(x = date, y = count, color = pattern)) +
  xlab("date of session") +
  ylab("occurrences per session") +
  labs(color = "Topic") +
  geom_point()
```
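The internals of `word_usage_by_date()` are not shown in this vignette. Conceptually, the plot only needs a long-format table with one row per date and pattern and the columns `date`, `pattern` and `count`. A minimal sketch of how such a table could be computed with dplyr and stringr, on hypothetical toy data (the `speeches` tibble and its `date`/`content` columns are assumptions, not the package's data model):

```r
library(dplyr)
library(stringr)

# Hypothetical search patterns and toy speeches;
# the real parsed records are structured differently
patterns <- c(pandemic = "(?i)virus|corona|covid|lockdown",
              pension  = "(?i)rente|pension|altersarmut")

speeches <- tibble(
  date    = as.Date(c("2020-03-25", "2020-03-25", "2020-05-13")),
  content = c("Corona und Lockdown", "Die Rente muss sicher sein", "Covid-Hilfen")
)

# For each pattern, count its matches in every speech, then stack the results
usage <- bind_rows(lapply(names(patterns), function(p) {
  speeches %>%
    mutate(pattern = p, count = str_count(content, patterns[[p]]))
})) %>%
  # total matches per session date and pattern
  group_by(date, pattern) %>%
  summarize(count = sum(count), .groups = "drop")
```

Because `usage` has exactly the columns referenced in `aes(x = date, y = count, color = pattern)`, it could be passed to the same `ggplot()` code as above.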