---
title: "Analysis of covered topics"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Analysis of covered topics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(hateimparlament)
library(dplyr)
library(ggplot2)
library(stringr)
library(tidyr)
```

## Preparation of data

First, you need to download all records of the current legislative period.

```r
fetch_all("../inst/records/") # path to directory where records should be stored
```

Second, those `.xml` files need to be parsed into `R` tibbles. This is accomplished by:

```r
read_all("../inst/records/") %>% repair() -> res
```

We also used `repair` to fix several formatting issues in the records.

For development purposes, we only fetch records if they are not already stored as CSV files:

```{r}
res <- read_from_csv_or_fetch('../inst/')
```

## Analysis

Now we can start analysing our parsed dataset:

### Counting the occurrences of a given word:

```{r, fig.width=7, fig.height=7}
# `occurences` (sic) is the column name as it is used with `find_word` here
find_word(res, "Kohleausstieg") %>%
  filter(occurences > 0) %>%
  join_speaker(res) %>%
  select(content, fraction) %>%
  filter(!is.na(fraction)) %>%
  group_by(fraction) %>%
  summarize(n = n()) %>%
  arrange(desc(n)) %>%
  bar_plot_fractions(title = "Parties using the word 'Kohleausstieg' the most (absolute counts)",
                     ylab = "Number of uses of 'Kohleausstieg'",
                     flipped = FALSE,
                     rotatelab = TRUE)
```

### When are which topics discussed the most?

First, we define search patterns for some common political topics.

```{r}
pandemic_pattern <- "(?i)virus|corona|covid|lockdown"
climate_pattern <- "(?i)klimawandel|erderwärmung|co2|treibhaus|methan|kyoto-protokoll|klimaabkommen"
pension_pattern <- "(?i)rente|pension|altersarmut"
```

Then we use the analysis helper `word_usage_by_date` to generate a tibble counting the occurrences of our search patterns per date.
We can then plot the results:

```{r, fig.width=7, fig.height=6}
word_usage_by_date(res, c(pandemic = pandemic_pattern,
                          climate = climate_pattern,
                          pension = pension_pattern)) %>%
  ggplot(aes(x = date, y = count, color = pattern)) +
  xlab("date of session") +
  ylab("occurrences of word per session") +
  labs(color = "Topic") +
  geom_point()
```
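
To make the counting step more concrete, here is a minimal sketch of how per-date pattern counts could be computed with `stringr` and `dplyr` on a small toy tibble. It is not the implementation of `word_usage_by_date`. The toy data, the column names `date` and `content`, and the shortened patterns are assumptions made purely for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data: one row per speech (column names are assumptions)
toy <- tibble(
  date = as.Date(c("2020-03-04", "2020-03-04", "2020-03-25")),
  content = c(
    "Das Virus breitet sich aus. Corona betrifft uns alle.",
    "Die Rente muss sicher bleiben.",
    "Lockdown und Klimawandel waren heute Thema."
  )
)

# Named, case-insensitive search patterns (shortened for the example)
patterns <- c(
  pandemic = "(?i)virus|corona|covid|lockdown",
  climate  = "(?i)klimawandel|co2",
  pension  = "(?i)rente|pension"
)

# For each named pattern, count the matches in every speech
# and sum them up per date
counts <- lapply(names(patterns), function(name) {
  toy %>%
    mutate(pattern = name,
           count = str_count(content, patterns[[name]])) %>%
    group_by(date, pattern) %>%
    summarize(count = sum(count), .groups = "drop")
}) %>%
  bind_rows()

counts
```

The result is in long format, with one row per combination of date and pattern, which is the shape that `ggplot(aes(x = date, y = count, color = pattern))` expects.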