---
title: "funwithdata"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{funwithdata}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(hateimparlament)
library(dplyr)
library(ggplot2)
```

## Preparation of data

First, download all records of the current legislative period:

```r
fetch_all("../records/") # path to the directory where records should be stored
```

Second, these `.xml` files need to be parsed into `R` `tibble`s. This is accomplished by:

```r
read_all("../records/") %>%
  repair() -> res

reden <- res$reden
redner <- res$redner
talks <- res$talks
```

Here `repair()` fixes a number of formatting issues in the records, and we unpack the result into more descriptively named variables.

For development purposes, we load the tables from CSV files instead. We keep the result in `res` so that the later chunks, which expect `res`, run unchanged:

```{r}
res <- read_from_csv('../csv/')

comments <- res$comments
reden <- res$reden
redner <- res$redner
talks <- res$talks
```

## Analysis

Now we can start analysing the parsed dataset, e.g. to find out which party gives the most talks:

```{r}
left_join(reden, redner, by = c("redner" = "id")) %>%
  group_by(fraktion) %>%
  summarize(n = n()) %>%
  ggplot(aes(x = fraktion, y = n)) +
  geom_bar(stat = "identity")
```

### Count a word occurrence

```{r}
find_word(res, "hitler") %>%
  filter(occurences > 0) %>%
  join_redner(res) %>%
  select(content, fraktion) %>%
  group_by(fraktion) %>%
  summarize(n = n()) %>%
  arrange(desc(n))
```
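The per-party word counts can also be visualised the same way as the talk counts above. A minimal sketch, assuming (as in the chunk above) that `find_word()` returns a tibble with an `occurences` column and that `join_redner()` attaches the speaker's `fraktion`:

```r
# Count per-party mentions of a word and plot them as a bar chart
find_word(res, "hitler") %>%
  filter(occurences > 0) %>%        # keep only talks that mention the word
  join_redner(res) %>%              # attach speaker metadata (incl. fraktion)
  group_by(fraktion) %>%
  summarize(n = n()) %>%
  ggplot(aes(x = fraktion, y = n)) +
  geom_bar(stat = "identity")
```

This mirrors the earlier talks-per-party plot, so the two charts can be compared directly.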