An R package to analyze the parliamentary records of the 19th legislative period of the Bundestag, the German parliament.

---
title: "funwithdata"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{funwithdata}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(hateimparlament)
library(dplyr)
library(ggplot2)
```

## Preparation of data

First, you need to download all records of the current legislative period.

```r
fetch_all("../records/") # path to directory where records should be stored
```
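
After the download, it can be worth confirming that the target directory actually contains `.xml` files before parsing them. The following is a minimal sketch using base R only; `list_xml_records` is a hypothetical helper and not part of `hateimparlament`, and the demo runs on a temporary directory with made-up file names.

```r
# Hypothetical helper, not part of hateimparlament: list the XML files in a directory.
list_xml_records <- function(dir) {
  list.files(dir, pattern = "\\.xml$", full.names = TRUE, ignore.case = TRUE)
}

# Demo on a temporary directory with made-up file names, so no real records are touched.
demo_dir <- file.path(tempdir(), "records_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("a.xml", "b.xml", "notes.txt")))

xml_files <- list_xml_records(demo_dir)
length(xml_files) # number of XML files found in the demo directory
```

For the real data, the same helper could be pointed at `"../records/"`.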

Second, those `.xml` files need to be parsed into R tibbles. This is accomplished by:

```r
res <- read_all("../records/") %>% repair()
reden <- res$reden
redner <- res$redner
talks <- res$talks
```

Here `repair` fixes a number of formatting issues in the records, and the result is unpacked
into more descriptive variables.

For development purposes, we load the tables from csv files.

```{r}
tables <- read_from_csv('../csv/')
comments <- tables$comments
reden <- tables$reden
redner <- tables$redner
talks <- tables$talks
```
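
Before joining the tables, it can help to check that every speaker id referenced in `reden` also exists in `redner`, because a `left_join` would otherwise produce rows with a missing `fraktion`. The sketch below uses toy tibbles with made-up values; only the column names `redner`, `id` and `fraktion` are taken from the join used in this vignette, everything else is an assumption for illustration.

```r
library(dplyr)

# Toy stand-ins for the parsed tables; all values are made up.
toy_reden <- tibble::tibble(
  id     = c("r1", "r2", "r3"),
  redner = c("a", "b", "z") # "z" has no matching speaker
)
toy_redner <- tibble::tibble(
  id       = c("a", "b"),
  fraktion = c("X", "Y")
)

# Speeches whose speaker id does not appear in the speaker table.
orphans <- anti_join(toy_reden, toy_redner, by = c("redner" = "id"))
orphans
```

An empty result means every speech can be attributed to a known speaker.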

## Analysis

Now we can start analysing the parsed dataset, for example to find out which party gives the most talks:

```{r}
# attach speaker details to each speech, then count speeches per parliamentary group
left_join(reden, redner, by = c("redner" = "id")) %>%
  group_by(fraktion) %>%
  summarize(n = n()) %>%
  ggplot(aes(x = fraktion, y = n)) +
  geom_bar(stat = "identity")
```
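
To make the join and the counting step easier to follow, here is a small sketch on toy data. The tibbles and their values are made up; only the column names `redner`, `id` and `fraktion` mirror the chunk above. It uses `count()` as a shorthand for `group_by()` followed by `summarize(n = n())`.

```r
library(dplyr)

# Toy stand-ins for `reden` and `redner`; all values are made up.
toy_reden <- tibble::tibble(
  id     = c("r1", "r2", "r3", "r4"),
  redner = c("a", "b", "a", "c")
)
toy_redner <- tibble::tibble(
  id       = c("a", "b", "c"),
  fraktion = c("X", "Y", "X")
)

# Attach the group of each speaker to their speeches, then count speeches per group.
speech_counts <- toy_reden %>%
  left_join(toy_redner, by = c("redner" = "id")) %>%
  count(fraktion, sort = TRUE)

speech_counts
```

The resulting tibble has one row per `fraktion` and a column `n`, which is the shape the plot above expects.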
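
### Count occurrences of a word

```{r}
# uses `tables` from the chunk above, since `res` is only defined in the non-evaluated example
# counts the talks that mention the word, per parliamentary group
find_word(tables, "hitler") %>%
  filter(occurences > 0) %>%
  join_redner(tables) %>%
  select(content, fraktion) %>%
  group_by(fraktion) %>%
  summarize(n = n()) %>%
  arrange(desc(n))
```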
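
The idea behind this pipeline can be sketched without the package. The helper `count_word` below is hypothetical and is not the implementation of `find_word`; it counts case-insensitive substring matches with base R, so a search word also matches inside longer words. The toy table and its values are made up.

```r
library(dplyr)

# Hypothetical helper, not the package's find_word(): counts case-insensitive
# substring matches of `word` in each element of `text`.
count_word <- function(text, word) {
  matches <- gregexpr(tolower(word), tolower(text), fixed = TRUE)
  # gregexpr() returns -1 when nothing matches, so only positive positions are counted
  vapply(matches, function(m) sum(m > 0), integer(1))
}

# Toy stand-in for talks that are already joined with their speakers.
toy_talks <- tibble::tibble(
  fraktion = c("X", "Y", "X"),
  content  = c("Daten und noch mehr Daten", "keine Treffer hier", "DATEN zuerst")
)

# Keep the talks that mention the word at least once, then count them per group.
word_counts <- toy_talks %>%
  mutate(occurences = count_word(content, "daten")) %>%
  filter(occurences > 0) %>%
  count(fraktion, sort = TRUE)

word_counts
```

Note that, as in the chunk above, `n` counts talks that contain the word, not the total number of matches.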