|
- ---
- title: "Analysis of covered topics"
- output: rmarkdown::html_vignette
- vignette: >
- %\VignetteIndexEntry{Analysis of covered topics}
- %\VignetteEngine{knitr::rmarkdown}
- %\VignetteEncoding{UTF-8}
- ---
-
- ```{r, include = FALSE}
- knitr::opts_chunk$set(
-   collapse = TRUE,
-   comment = "#>"
- )
- ```
-
- ```{r setup}
- library(hateimparlament)
- library(dplyr)
- library(ggplot2)
- library(stringr)
- library(tidyr)
- ```
-
- ## Preparation of data
-
- First, you need to download all records of the current legislative period.
- ```r
- fetch_all("../inst/records/") # path to directory where records should be stored
- ```
- Second, those `.xml` files need to be parsed into R tibbles. This is accomplished by:
- ```r
- res <- read_all("../inst/records/") %>% repair()
- ```
- The call to `repair()` fixes a number of formatting issues in the records.
-
- For development purposes, we load the tables from CSV files instead.
- ```{r}
- res <- read_from_csv('../inst/csv/')
- ```
-
- ## Analysis
-
- Now we can start analysing our parsed dataset:
-
- ### Counting the occurrences of a given word
-
- ```{r, fig.width=7, fig.height=7}
- find_word(res, "Kohleausstieg") %>%
-   filter(occurences > 0) %>%
-   join_speaker(res) %>%
-   select(content, fraction) %>%
-   filter(!is.na(fraction)) %>%
-   group_by(fraction) %>%
-   summarize(n = n()) %>%
-   arrange(desc(n)) %>%
-   bar_plot_fractions(title = "Parties using the word 'Kohleausstieg' the most (absolute counts)",
-                      ylab = "Number of uses of 'Kohleausstieg'",
-                      flipped = FALSE,
-                      rotatelab = TRUE)
- ```
-
- ### When are which topics discussed the most?
-
- First, we define search patterns for a few common political topics.
- ```{r}
- pandemic_pattern <- "(?i)virus|corona|covid|lockdown"
- climate_pattern <- "(?i)klimawandel|erderwärmung|co2|treibhaus|methan|kyoto-protokoll|klimaabkommen"
- pension_pattern <- "(?i)rente|pension|altersarmut"
- ```
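-
- These patterns are plain regular expressions, so they match substrings anywhere in a word. The sketch below assumes that matching behaves like `stringr::str_count()`; the internals of the package helpers are not shown in this vignette. It illustrates the inline `(?i)` flag and one pitfall: `rente` also matches inside "Konkurrenten". The `strict_pattern` variant with a leading word boundary is a hypothetical refinement, not part of the package.
-
- ```r
- library(stringr)
-
- pension_pattern <- "(?i)rente|pension|altersarmut"
-
- # (?i) makes the whole alternation case-insensitive.
- str_count("Die RENTE und die Pension", pension_pattern)
- #> [1] 2
-
- # Pitfall: "rente" is a substring of "Konkurrenten".
- str_detect("Konkurrenten", pension_pattern)
- #> [1] TRUE
-
- # Hypothetical stricter variant: require a word boundary before the match.
- strict_pattern <- "(?i)\\b(rente|pension|altersarmut)"
- str_detect(c("Konkurrenten", "Rentenversicherung"), strict_pattern)
- #> [1] FALSE  TRUE
- ```
-
- Whether the stricter form is preferable depends on how many false positives the looser pattern produces in the actual records.
-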
- Then we use the analysis helper `word_usage_by_date` to generate a tibble counting the
- occurrences of our search patterns per date. We can then plot the results:
- ```{r, fig.width=7, fig.height=6}
- word_usage_by_date(res, c(pandemic = pandemic_pattern,
-                           climate = climate_pattern,
-                           pension = pension_pattern)) %>%
-   ggplot(aes(x = date, y = count, color = pattern)) +
-   xlab("date of session") +
-   ylab("occurrences per session") +
-   labs(color = "Topic") +
-   geom_point()
- ```
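-
- To make the shape of the plotted tibble concrete, here is a rough sketch of how per-date counts could be computed on toy data. It is not the implementation of `word_usage_by_date`; the `toy` tibble, its `content` column, and the counting approach are assumptions for illustration. Only the column names `date`, `count`, and `pattern` are taken from the plot above.
-
- ```r
- library(dplyr)
- library(stringr)
- library(tidyr)
-
- # Hypothetical toy data: one row per paragraph with its session date.
- toy <- tibble(
-   date = as.Date(c("2020-03-04", "2020-03-04", "2020-03-25")),
-   content = c("Das Virus breitet sich aus.",
-               "Die Rente ist sicher.",
-               "Corona und der Lockdown treffen auch die Rente.")
- )
- patterns <- c(pandemic = "(?i)virus|corona|covid|lockdown",
-               pension  = "(?i)rente|pension|altersarmut")
-
- # Pair every paragraph with every topic, count matches, then sum per date.
- counts <- toy %>%
-   crossing(pattern = names(patterns)) %>%
-   mutate(count = str_count(content, unname(patterns[pattern]))) %>%
-   group_by(date, pattern) %>%
-   summarize(count = sum(count), .groups = "drop")
- ```
-
- In this toy example, `counts` has one row per date and topic (four rows in total), which is the long format that the `ggplot` call above expects.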
|