---
title: "General questions"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{General questions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(hateimparlament)
library(dplyr)
library(ggplot2)
library(stringr)
library(tidyr)
```

## Preparation of data

First, you need to download all records of the current legislative period.

```r
fetch_all("../inst/records/") # path to directory where records should be stored
```

Second, those `.xml` files need to be parsed into `R` `tibbles`. This is accomplished by:

```r
read_all("../inst/records/") %>% repair() -> res
```

We also use `repair()` to fix a number of formatting issues in the records.

For development purposes, we load the tables from CSV files instead:

```{r}
res <- read_from_csv('../inst/csv/')
```

## Analysis

Now we can start analysing our parsed dataset.

### Which party gives the most speeches?

```{r, fig.width=7}
join_speaker(res$speeches, res) %>%
  group_by(fraction) %>%
  summarize(n = n()) %>%
  arrange(n) %>%
  bar_plot_fractions(title="Number of speeches given by fraction",
                     ylab="Number of speeches")
```

Note that `NA` signifies speeches given by speakers who are not members of parliament.

### Who gives the most speeches?

```{r}
res$speeches %>%
  group_by(speaker) %>%
  summarize(n = n()) %>%
  arrange(-n) %>%
  left_join(res$speaker, by=c("speaker" = "id")) %>%
  head(10)
```

### Who talks the longest?

We calculate the average length, in characters, of the talks given by each speaker:

```{r}
res$talks %>%
  mutate(content_len = str_length(content)) %>%
  group_by(speaker) %>%
  summarize(avg_content_len = mean(content_len)) %>%
  arrange(-avg_content_len) %>%
  left_join(res$speaker, by=c("speaker" = "id")) %>%
  head(10)
```
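An average over very few talks can push a speaker to the top of this ranking. The sketch below runs the same pipeline on a small, made-up tibble. It only assumes that `res$talks` has `speaker` and `content` columns, as the code above uses them. It also reports `n_talks`, which shows how many talks each average is based on. The data and the `talk_lengths` name are hypothetical and for illustration only.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data shaped like res$talks (assumed columns: speaker, content)
talks <- tibble(
  speaker = c("A", "A", "B", "C", "C", "C"),
  content = c("abcd", "ab", "abcdefghij", "abc", "abc", "abc")
)

talk_lengths <- talks %>%
  mutate(content_len = str_length(content)) %>%    # characters per talk
  group_by(speaker) %>%
  summarize(
    n_talks = n(),                                 # number of talks behind each average
    avg_content_len = mean(content_len)            # mean characters per talk
  ) %>%
  arrange(desc(avg_content_len))

talk_lengths
```

Here speaker `B` ranks first on the strength of a single talk. Filtering on `n_talks` (for example `filter(n_talks >= 10)`, with an arbitrary threshold) before ranking would be one way to focus on speakers with a more representative number of talks.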