An R package to analyze the parliamentary records of the 19th legislative period of the Bundestag, the German parliament.

---
title: "General questions"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{General questions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(hateimparlament)
library(dplyr)
library(ggplot2)
library(stringr)
library(tidyr)
```

## Preparation of data

First, download all records of the current legislative period:

```r
fetch_all("../inst/records/") # path to the directory where the records should be stored
```

Second, parse the downloaded `.xml` files into `R` tibbles:

```r
res <- read_all("../inst/records/") %>% repair()
```

The call to `repair` fixes a number of formatting issues in the records.

For development purposes, we only fetch the records if they are not already
stored as CSV files:

```{r}
res <- read_from_csv_or_fetch("../inst/")
```
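
The "read from CSV or fetch" idea can be illustrated with a minimal sketch in base R. This is not the implementation of `read_from_csv_or_fetch`, whose internals are not shown here; `load_or_build`, `build_toy` and the cache file name are hypothetical names chosen for illustration, and the toy data frame stands in for the expensive download-and-parse step.

```r
# Hypothetical helper (not part of hateimparlament): return the cached CSV
# if it exists, otherwise build the data, write the cache and return it.
load_or_build <- function(path, build) {
  if (file.exists(path)) {
    return(read.csv(path, stringsAsFactors = FALSE))
  }
  data <- build()
  write.csv(data, path, row.names = FALSE)
  data
}

# Toy stand-in for the download-and-parse step; counts how often it runs.
build_calls <- 0
build_toy <- function() {
  build_calls <<- build_calls + 1
  data.frame(id = c("a", "b"), n = c(1L, 2L), stringsAsFactors = FALSE)
}

cache_file <- file.path(tempdir(), "toy_cache.csv")
unlink(cache_file) # start from a clean state

first  <- load_or_build(cache_file, build_toy) # builds and writes the cache
second <- load_or_build(cache_file, build_toy) # reads the cache instead
```

After both calls, `build_calls` is still 1, because the second call is served from the CSV file. A cache like this has to be deleted by hand when the underlying records change.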

## Analysis

Now we can start analysing the parsed dataset.

### Which party gives the most speeches?

```{r, fig.width=7}
join_speaker(res$speeches, res) %>%
  group_by(fraction) %>%
  summarize(n = n()) %>%
  arrange(n) %>%
  bar_plot_fractions(title = "Number of speeches given by fraction",
                     ylab = "Number of speeches")
```

Note that `NA` signifies speeches given by speakers who are not members of parliament.
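
To see where such an `NA` group can come from, here is a small sketch on made-up data. It assumes that `join_speaker` behaves like a left join on the speaker id, which is an assumption and not something this vignette states; the ids and fraction names are invented.

```r
library(dplyr)

# Toy data with invented ids: "s3" has no entry in the speaker table.
speeches <- tibble(speaker = c("s1", "s1", "s2", "s3"))
speakers <- tibble(id = c("s1", "s2"), fraction = c("A", "B"))

# A left join keeps every speech; unmatched speakers get fraction = NA,
# so they end up in their own NA group when counting.
per_fraction <- speeches %>%
  left_join(speakers, by = c("speaker" = "id")) %>%
  count(fraction)

per_fraction
```

If the `NA` group should not appear in a plot, one option is to drop it beforehand with `filter(!is.na(fraction))`.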

### Who gives the most speeches?

```{r}
res$speeches %>%
  group_by(speaker) %>%
  summarize(n = n()) %>%
  arrange(-n) %>%
  left_join(res$speaker, by = c("speaker" = "id")) %>%
  head(10)
```
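
As a side note, the group, summarize and sort steps can also be written with `dplyr::count()`. The sketch below compares both forms on invented speaker ids; it only illustrates the equivalence and is not taken from the package.

```r
library(dplyr)

# Toy data with invented speaker ids.
speeches <- tibble(speaker = c("s1", "s2", "s1", "s3", "s1", "s2"))

# Long form, as used in this vignette.
long_form <- speeches %>%
  group_by(speaker) %>%
  summarize(n = n()) %>%
  arrange(-n)

# Short form: count() groups, tallies and (with sort = TRUE) orders
# the result by descending n.
short_form <- speeches %>%
  count(speaker, sort = TRUE)

all.equal(long_form, short_form)
```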

### Who talks the longest?

Calculate the average length, in characters, of the talks given by each speaker:

```{r}
res$talks %>%
  mutate(content_len = str_length(content)) %>%
  group_by(speaker) %>%
  summarize(avg_content_len = mean(content_len)) %>%
  arrange(-avg_content_len) %>%
  left_join(res$speaker, by = c("speaker" = "id")) %>%
  head(10)
```
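
A mean can be dominated by speakers who gave only one very long talk. The following sketch, on invented toy data, shows the same computation with two extra columns that may help when reading the ranking: the number of talks and the median length. The column names `n_talks` and `median_content_len` are illustrative and do not come from the package.

```r
library(dplyr)
library(stringr)

# Toy data with invented speaker ids and contents.
talks <- tibble(
  speaker = c("s1", "s1", "s2"),
  content = c("Danke.", "Sehr geehrte Damen und Herren", "Nein")
)

avg_len <- talks %>%
  mutate(content_len = str_length(content)) %>% # length in characters
  group_by(speaker) %>%
  summarize(
    n_talks = n(),                            # how many talks back the mean
    avg_content_len = mean(content_len),
    median_content_len = median(content_len)  # less sensitive to outliers
  ) %>%
  arrange(desc(avg_content_len))

avg_len
```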