An R package to analyze the parliamentary records of the 19th legislative period of the Bundestag, the German parliament.

---
title: "generalquestions"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{generalquestions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(hateimparlament)
library(dplyr)
library(ggplot2)
library(stringr)
library(tidyr)
```

## Preparation of data

First, you need to download all records of the current legislative period.

```r
fetch_all("../inst/records/") # path to directory where records should be stored
```

Second, those `.xml` files need to be parsed into `R` tibbles. This is accomplished by:

```r
res <- read_all("../inst/records/") %>% repair()
```

Here, `repair` fixes a number of formatting issues in the records, and the result is stored in `res`.
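
To give an idea of what parsing `.xml` into a tibble involves, here is a minimal sketch using the `xml2` and `tibble` packages. It is not how `read_all` is implemented: the XML snippet, the element and attribute names, and the name `speeches_toy` are all made up for illustration, and the real Bundestag records are considerably more complex.

```r
library(xml2)
library(tibble)

# Hypothetical, heavily simplified record (NOT the real Bundestag schema)
toy_xml <- '<record>
  <speech id="s1" speaker="p1"><p>Guten Morgen.</p></speech>
  <speech id="s2" speaker="p2"><p>Vielen Dank.</p></speech>
</record>'

doc <- read_xml(toy_xml)

# Select every <speech> node, then pull attributes and text into columns
nodes <- xml_find_all(doc, ".//speech")

speeches_toy <- tibble(
  id      = xml_attr(nodes, "id"),
  speaker = xml_attr(nodes, "speaker"),
  content = xml_text(nodes)
)

speeches_toy
```

Each XML node becomes one row, and each attribute or text field becomes one column, which is the shape the analysis below relies on.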

For development purposes, we load the tables from CSV files instead:

```{r}
res <- read_from_csv('../inst/csv/')
```

Then we unpack the tibbles into more descriptive variables:

```{r}
comments <- res$comments
speeches <- res$speeches
speaker <- res$speaker
talks <- res$talks
```

## Analysis

Now we can start analysing our parsed dataset:

### Which party gives the most speeches?

```{r, fig.width=7}
join_speaker(res$speeches, res) %>%
  group_by(fraction) %>%
  summarize(n = n()) %>%
  arrange(n) %>%
  bar_plot_fractions(title = "Number of speeches given by fraction",
                     ylab = "Number of speeches")
```
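
The counting step can be tried without the package on toy data. The sketch below assumes that `join_speaker` behaves roughly like a left join of the speeches onto the speaker table; that is an assumption, as are the toy tibbles and their column names (`id`, `speaker`, `fraction`).

```r
library(dplyr)
library(tibble)

# Toy stand-ins for res$speeches and res$speaker (hypothetical columns)
speeches_toy <- tibble(
  id      = c("s1", "s2", "s3", "s4"),
  speaker = c("p1", "p2", "p1", "p3")
)
speaker_toy <- tibble(
  id       = c("p1", "p2", "p3"),
  fraction = c("SPD", "CDU/CSU", "SPD")
)

per_fraction <- speeches_toy %>%
  # attach the fraction of each speaker to every speech (assumed join logic)
  left_join(speaker_toy, by = c("speaker" = "id")) %>%
  group_by(fraction) %>%
  summarize(n = n()) %>%
  arrange(n)

per_fraction
```

The resulting tibble has one row per fraction with its speech count in `n`, sorted in ascending order, which is the kind of input a bar plot per fraction needs.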

### Who gives the most speeches?

```{r}
res$speeches %>%
  group_by(speaker) %>%
  summarize(n = n()) %>%
  arrange(desc(n)) %>%
  left_join(res$speaker, by = c("speaker" = "id")) %>%
  head(10)
```
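
As a side note, `dplyr::count()` with `sort = TRUE` is a shorter way to express the group, summarize and arrange steps used above. The following sketch compares both forms on toy data; the tibble `speeches_toy` is made up for illustration.

```r
library(dplyr)
library(tibble)

# Toy stand-in for res$speeches (hypothetical data)
speeches_toy <- tibble(speaker = c("p1", "p2", "p1", "p3", "p1", "p2"))

# Long form, as in the chunk above
long_form <- speeches_toy %>%
  group_by(speaker) %>%
  summarize(n = n()) %>%
  arrange(desc(n))

# Short form: count() groups, tallies and sorts in one call
short_form <- speeches_toy %>%
  count(speaker, sort = TRUE)

# Both forms agree on the ranking and the counts
identical(long_form$speaker, short_form$speaker)
identical(long_form$n, short_form$n)
```

Which form to use is a matter of taste; the long form makes each step explicit, while the short form is easier to scan.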

### Who talks the longest?

```{r}
res$talks %>%
  mutate(content_len = str_length(content)) %>%
  group_by(speaker) %>%
  summarize(avg_content_len = mean(content_len)) %>%
  arrange(desc(avg_content_len)) %>%
  left_join(res$speaker, by = c("speaker" = "id")) %>%
  head(10)
```
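
The chunk above measures length as the average number of characters per talk. Other measures may rank speakers differently, so the sketch below computes a few alternatives side by side on toy data. The tibble `talks_toy` and the whitespace-based word count are illustrative assumptions, not part of the package.

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy stand-in for res$talks (hypothetical data)
talks_toy <- tibble(
  speaker = c("p1", "p1", "p2"),
  content = c("Guten Morgen.", "Danke.", "Sehr geehrte Damen und Herren!")
)

talk_lengths <- talks_toy %>%
  mutate(
    chars = str_length(content),        # characters per talk
    words = str_count(content, "\\S+")  # runs of non-whitespace as a rough word count
  ) %>%
  group_by(speaker) %>%
  summarize(
    avg_chars   = mean(chars),
    total_chars = sum(chars),
    avg_words   = mean(words)
  ) %>%
  arrange(desc(avg_chars))

talk_lengths
```

Averages favour speakers with few but long contributions, while totals favour those who speak often, so it can be worth looking at both before drawing conclusions.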