|
- ---
- title: "General questions"
- output: rmarkdown::html_vignette
- vignette: >
- %\VignetteIndexEntry{General questions}
- %\VignetteEngine{knitr::rmarkdown}
- %\VignetteEncoding{UTF-8}
- ---
-
- ```{r, include = FALSE}
- knitr::opts_chunk$set(
- collapse = TRUE,
- comment = "#>"
- )
- ```
-
- ```{r setup}
- library(hateimparlament)
- library(dplyr)
- library(ggplot2)
- library(stringr)
- library(tidyr)
- ```
-
- ## Preparation of data
-
- First, you need to download all records of the current legislative period.
- ```r
- fetch_all("../inst/records/") # path to directory where records should be stored
- ```
- Second, those `.xml` files need to be parsed into `R` tibbles. This is accomplished by:
- ```r
- read_all("../inst/records/") %>% repair() -> res
- ```
- The call to `repair` fixes a number of formatting issues in the records.
-
- For development purposes, we only fetch records if they are not already
- stored as CSV files:
- ```{r}
- res <- read_from_csv_or_fetch("../inst/")
- ```
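-
- The idea behind such a cache can be sketched in a few lines of base R. This is
- only an illustration of the pattern, not the implementation of
- `read_from_csv_or_fetch`; the helper `read_cached`, the file name and the
- stand-in loader are hypothetical.
- ```r
- # Hypothetical helper, not part of hateimparlament: read a table from a CSV
- # cache if it exists, otherwise build it with `loader` and write the cache.
- read_cached <- function(csv_path, loader) {
-   if (file.exists(csv_path)) {
-     return(read.csv(csv_path, stringsAsFactors = FALSE))
-   }
-   data <- loader()
-   write.csv(data, csv_path, row.names = FALSE)
-   data
- }
-
- # Stand-in loader with made-up rows; a real pipeline would fetch and parse
- # the records here instead.
- cache_file <- file.path(tempdir(), "speeches_demo.csv")
- speeches_demo <- read_cached(cache_file, function() {
-   data.frame(id = c("s1", "s2"), speaker = c("a", "b"))
- })
- ```
- On a second call with the same path the loader is skipped and the table is read
- from disk, which is what makes repeated vignette builds fast.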
-
- ## Analysis
-
- Now we can start analysing our parsed dataset:
-
- ### Which fraction gives the most speeches?
-
- ```{r, fig.width=7}
- join_speaker(res$speeches, res) %>%
- group_by(fraction) %>%
- summarize(n = n()) %>%
- arrange(n) %>%
- bar_plot_fractions(title="Number of speeches given by fraction",
- ylab="Number of speeches")
- ```
-
- Note that `NA` signifies speeches given by speakers who are not members of parliament.
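-
- If only members of parliament are of interest, the `NA` group can be dropped
- before counting. The following is a minimal sketch on made-up data; the tibble
- `toy` is a hypothetical stand-in for the joined speeches table.
- ```r
- library(dplyr)
-
- # Toy stand-in for the joined speeches table; the values are made up.
- toy <- tibble(
-   id = c("s1", "s2", "s3", "s4"),
-   fraction = c("A", "B", NA, "A")
- )
-
- fraction_counts <- toy %>%
-   filter(!is.na(fraction)) %>%  # drop speakers without a fraction
-   count(fraction, sort = TRUE)  # one row per fraction, largest first
- ```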
-
- ### Who gives the most speeches?
-
- ```{r}
- res$speeches %>%
- group_by(speaker) %>%
- summarize(n = n()) %>%
- arrange(desc(n)) %>%
- left_join(res$speaker, by=c("speaker" = "id")) %>%
- head(10)
- ```
-
- ### Who talks the longest?
-
- We calculate the average length, in characters, of the talks given by each speaker:
-
- ```{r}
- res$talks %>%
- mutate(content_len = str_length(content)) %>%
- group_by(speaker) %>%
- summarize(avg_content_len = mean(content_len)) %>%
- arrange(desc(avg_content_len)) %>%
- left_join(res$speaker, by=c("speaker" = "id")) %>%
- head(10)
- ```
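-
- An average over very few talks can be misleading, so it may help to report the
- number of talks next to it. The sketch below shows this on made-up data; the
- tibble `toy_talks` and the column `n_talks` are hypothetical and not part of the
- package.
- ```r
- library(dplyr)
- library(stringr)
-
- # Toy stand-in for res$talks; speakers and contents are made up.
- toy_talks <- tibble(
-   speaker = c("a", "a", "b"),
-   content = c("abcd", "ab", "abcdefgh")
- )
-
- talk_lengths <- toy_talks %>%
-   mutate(content_len = str_length(content)) %>%
-   group_by(speaker) %>%
-   summarize(
-     n_talks = n(),                        # how many talks the mean is based on
-     avg_content_len = mean(content_len)   # mean length in characters
-   ) %>%
-   arrange(desc(avg_content_len))
- ```
- Here speaker `b` ranks first with a single long talk, which is the kind of case
- the extra column makes visible.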
|