|
- ---
- title: "funwithdata"
- output: rmarkdown::html_vignette
- vignette: >
-   %\VignetteIndexEntry{funwithdata}
-   %\VignetteEngine{knitr::rmarkdown}
-   %\VignetteEncoding{UTF-8}
- ---
-
- ```{r, include = FALSE}
- knitr::opts_chunk$set(
- collapse = TRUE,
- comment = "#>"
- )
- ```
-
- ```{r setup}
- library(hateimparlament)
- library(dplyr)
- library(ggplot2)
- library(stringr) # provides str_length() and str_detect(), used below
- ```
-
- ## Preparation of data
-
- First, you need to download all records of the current legislative period.
- ```r
- fetch_all("../records/") # path to directory where records should be stored
- ```
- Second, those `.xml` files need to be parsed into `R` tibbles:
- ```r
- res <- read_all("../records/") %>% repair()
- ```
- The `repair` step fixes a number of formatting issues in the parsed records. The result is
- unpacked into more descriptively named variables below.
-
- For development purposes, we load the tables from CSV files:
- ```{r}
- res <- read_from_csv('../csv/')
- ```
- Then we unpack the tibbles into separate variables:
- ```{r}
- comments <- res$comments
- reden <- res$reden
- redner <- res$redner
- talks <- res$talks
- ```
-
- ## Analysis
-
- Now we can start analysing the parsed dataset, for example to find out which party gives the most speeches:
- ```{r, fig.width=8}
- join_redner(res$reden, res) %>%
- group_by(fraktion) %>%
- summarize(n = n()) %>%
- arrange(n) %>%
- bar_plot_fraktionen()
- ```
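-
- What `join_redner` does internally is not shown in this vignette. Assuming it attaches
- speaker metadata via the speaker id (as the explicit `left_join` calls further down do),
- the per-party count can be sketched with plain `dplyr`. The tables `toy_reden` and
- `toy_redner` and all their values are invented for illustration.
- ```r
- library(dplyr)
-
- # Hypothetical stand-ins for res$reden and res$redner (values are made up)
- toy_reden <- tibble(id = c("r1", "r2", "r3", "r4"),
-                     redner = c("a", "b", "a", "c"))
- toy_redner <- tibble(id = c("a", "b", "c"),
-                      fraktion = c("SPD", "FDP", "SPD"))
-
- # Attach the speaker to each speech, then count speeches per party
- speeches_per_party <- toy_reden %>%
-   left_join(toy_redner, by = c("redner" = "id")) %>%
-   count(fraktion, name = "n") %>%
-   arrange(n)
- ```
- Here `count(fraktion)` is shorthand for `group_by(fraktion) %>% summarize(n = n())`.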
-
- ### Count occurrences of a word
-
- ```{r, fig.width=8}
- find_word(res, "hitler") %>%
- filter(occurences > 0) %>%
- join_redner(res) %>%
- select(content, fraktion) %>%
- group_by(fraktion) %>%
- summarize(n = n()) %>%
- arrange(desc(n)) %>%
- bar_plot_fraktionen()
- ```
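-
- How `find_word` fills its `occurences` column is not shown here. One plausible reading,
- sketched below as an assumption, is a case-insensitive match count per text via
- `stringr::str_count`. The table `toy_talks` and its contents are invented.
- ```r
- library(dplyr)
- library(stringr)
-
- # Hypothetical texts with the party of the speaker (values are made up)
- toy_talks <- tibble(
-   fraktion = c("SPD", "FDP", "SPD"),
-   content = c("Das Klima und das Klimaziel",
-               "Die Rente ist sicher",
-               "Kein Treffer hier")
- )
-
- # Count matches per text, keep texts with at least one match,
- # then count the remaining texts per party
- word_counts <- toy_talks %>%
-   mutate(occurrences = str_count(content, regex("klima", ignore_case = TRUE))) %>%
-   filter(occurrences > 0) %>%
-   count(fraktion, name = "n")
- ```
- As in the chunk above, `n` is the number of texts containing the word, not the total
- number of matches. Summing `occurrences` per party would give the latter.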
-
- ### Who gives the most speeches?
-
- ```{r}
- res$reden %>%
- group_by(redner) %>%
- summarize(n = n()) %>%
- arrange(-n) %>%
- left_join(res$redner, by=c("redner" = "id")) %>%
- head(10)
- ```
-
- ### Who talks the longest?
-
- ```{r}
- res$talks %>%
- mutate(content_len = str_length(content)) %>%
- group_by(redner) %>%
- summarize(avg_content_len = mean(content_len)) %>%
- arrange(-avg_content_len) %>%
- left_join(res$redner, by=c("redner" = "id")) %>%
- head(10)
- ```
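-
- Since `str_length` counts characters, the ranking above is by average characters per
- talk. The sketch below reproduces that measure on invented data and adds a word-based
- average for comparison. The word count (`\\S+` matches) is an addition for illustration,
- not something the package is known to provide.
- ```r
- library(dplyr)
- library(stringr)
-
- # Hypothetical talks (values are made up)
- toy_talks <- tibble(
-   redner = c("a", "a", "b"),
-   content = c("Kurz.", "Eine viel umfangreichere Rede.", "Danke.")
- )
-
- talk_lengths <- toy_talks %>%
-   mutate(chars = str_length(content),          # characters per talk
-          words = str_count(content, "\\S+")) %>% # whitespace-separated tokens
-   group_by(redner) %>%
-   summarize(avg_chars = mean(chars), avg_words = mean(words)) %>%
-   arrange(desc(avg_chars))
- ```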
-
- ### Which party gives the most applause to which parties?
-
- ```{r}
- res$applause %>%
- left_join(res$redner, by=c("on_redner" = "id")) %>%
- select(on_fraktion = fraktion, where(is.logical)) %>%
- group_by(on_fraktion) %>%
- arrange(on_fraktion) %>%
- summarize("AfD" = sum(`AfD`),
- "BÜNDNIS 90 / DIE GRÜNEN" = sum(`BÜNDNIS_90_DIE_GRÜNEN`),
- "CDU/CSU" = sum(`CDU_CSU`),
- "DIE LINKE" = sum(`DIE_LINKE`),
- "FDP" = sum(`FDP`),
- "SPD" = sum(`SPD`))
- ```
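-
- The `summarize` call above names every party column by hand. Assuming the applause table
- holds one logical column per applauding party (which the `where(is.logical)` selection
- suggests), the same sums can be sketched more compactly with `across()`. The table
- `toy_applause` and its values are invented.
- ```r
- library(dplyr)
-
- # Hypothetical applause records: one row per applause event,
- # one logical column per party that applauded (values are made up)
- toy_applause <- tibble(
-   on_fraktion = c("SPD", "SPD", "FDP"),
-   AfD = c(FALSE, FALSE, TRUE),
-   SPD = c(TRUE, TRUE, FALSE),
-   FDP = c(TRUE, FALSE, TRUE)
- )
-
- # Sum every logical column within each applauded party
- applause_matrix <- toy_applause %>%
-   group_by(on_fraktion) %>%
-   summarize(across(where(is.logical), sum))
- ```
- The trade-off is that the output keeps the original column names instead of the
- display labels assigned in the chunk above.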
-
- ### Which party comments the most on which parties?
-
- ```{r}
- res$comments %>%
- left_join(res$redner, by=c("on_redner" = "id")) %>%
- select(by_fraktion = fraktion.x, on_fraktion = fraktion.y) %>%
- group_by(on_fraktion) %>%
- summarize(`AfD` = sum(str_detect(by_fraktion, "AfD"), na.rm = TRUE),
- `BÜNDNIS 90 / DIE GRÜNEN` = sum(str_detect(by_fraktion, "BÜNDNIS 90/DIE GRÜNEN"), na.rm = TRUE),
- `CDU/CSU` = sum(str_detect(by_fraktion, "CDU/CSU"), na.rm = TRUE),
- `DIE LINKE` = sum(str_detect(by_fraktion, "DIE LINKE"), na.rm = TRUE),
- `FDP` = sum(str_detect(by_fraktion, "FDP"), na.rm = TRUE),
- `SPD` = sum(str_detect(by_fraktion, "SPD"), na.rm = TRUE))
- ```
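-
- An alternative sketch of the same cross-tabulation counts (commented-on, commenting)
- pairs and spreads them into a wide table with `tidyr::pivot_wider`. It assumes each
- comment names exactly one party in `by_fraktion`; the `str_detect` approach above also
- handles entries that mention several parties, which this one does not. `tidyr` is an
- extra dependency not loaded elsewhere in this vignette, and `toy_comments` is invented.
- ```r
- library(dplyr)
- library(tidyr)
-
- # Hypothetical comments: who commented (by) on whose speech (on)
- toy_comments <- tibble(
-   by_fraktion = c("SPD", "FDP", "SPD", NA),
-   on_fraktion = c("FDP", "SPD", "FDP", "SPD")
- )
-
- comment_matrix <- toy_comments %>%
-   filter(!is.na(by_fraktion)) %>%          # drop comments without a party
-   count(on_fraktion, by_fraktion) %>%      # one row per (on, by) pair
-   pivot_wider(names_from = by_fraktion,    # one column per commenting party
-               values_from = n,
-               values_fill = 0L)
- ```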
-
- ### Which topics are discussed most, and when?
-
- ```{r, fig.width=8}
- pandemic_pattern <- "(?i)virus|corona|covid|lockdown"
- climate_pattern <- "(?i)klimawandel|erderwärmung|co2|treibhaus|methan|kyoto-protokoll|klimaabkommen"
- pension_pattern <- "(?i)rente|pension|altersarmut"
-
- word_usage_by_date(res, c(pandemic = pandemic_pattern,
- climate = climate_pattern,
- pension = pension_pattern)) %>%
- ggplot(aes(x = date, y = count, color = pattern)) +
- xlab("date of session") +
- ylab("word occurrences per session") +
- labs(color = "Topic") +
- geom_point()
- ```
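-
- The plot above relies on `word_usage_by_date` returning a long table with `date`,
- `pattern` and `count` columns. How the function counts is not shown, so the following
- is only a sketch of one way such a table could be built, assuming matches are summed
- over all texts of a session date. The table `toy_talks` and its values are invented.
- ```r
- library(dplyr)
- library(stringr)
-
- # Hypothetical talks with their session date (values are made up)
- toy_talks <- tibble(
-   date = as.Date(c("2020-03-04", "2020-03-04", "2020-05-13")),
-   content = c("Corona und Lockdown", "Die Rente", "Das Virus und die Rente")
- )
- patterns <- c(pandemic = "(?i)virus|corona|lockdown",
-               pension = "(?i)rente")
-
- # For each named pattern, sum the regex matches per session date,
- # then stack the per-pattern tables into one long table
- usage <- bind_rows(lapply(names(patterns), function(name) {
-   toy_talks %>%
-     group_by(date) %>%
-     summarize(count = sum(str_count(content, patterns[[name]]))) %>%
-     mutate(pattern = name)
- }))
- ```
- The resulting `usage` table has the column names that the `ggplot` call above expects.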
|