---
title: "funwithdata"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{funwithdata}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(hateimparlament)
library(dplyr)
library(ggplot2)
```

## Preparation of data

First, you need to download all records of the current legislative period.

```r
fetch_all("../records/") # path to directory where records should be stored
```

Second, those `.xml` files need to be parsed into `R` tibbles. This is accomplished by:

```r
read_all("../records/") %>% repair() -> res
```

We also use `repair` to fix a number of formatting issues in the records. The result is then unpacked into more descriptive variables.

For development purposes, we load the tables from CSV files

```{r}
res <- read_from_csv('../csv/')
```

and unpack our tibbles:

```{r}
comments <- res$comments
reden <- res$reden
redner <- res$redner
talks <- res$talks
```

## Analysis

Now we can start analysing our parsed dataset, e.g. find out which party gives the most talks:

```{r, fig.width=10}
# count the talks per party (fraktion) and plot the counts
join_redner(reden, res) %>%
  group_by(fraktion) %>%
  summarize(n = n()) %>%
  arrange(n) %>%
  bar_plot_fraktionen()
```

### Count word occurrences

```{r, fig.width=10}
# keep only entries in which the word occurs, then count them per party
find_word(res, "hitler") %>%
  filter(occurences > 0) %>%
  join_redner(res) %>%
  select(content, fraktion) %>%
  group_by(fraktion) %>%
  summarize(n = n()) %>%
  arrange(desc(n)) %>%
  bar_plot_fraktionen()
```
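
Both analysis chunks rely on the same dplyr counting pattern (`group_by`, `summarize`, `arrange`). As a minimal sketch of that pattern, the following runs on a made-up table, so it does not depend on the downloaded records. The `toy_talks` tibble, its `id` column and the party labels are assumptions for illustration only; the actual columns returned by `join_redner` may differ.

```r
library(dplyr)

# Hypothetical stand-in for the joined talks table (one row per talk).
# The real output of join_redner() is not reproduced here.
toy_talks <- tibble(
  id       = 1:6,
  fraktion = c("A", "B", "A", "C", "A", "B")
)

talks_per_fraktion <- toy_talks %>%
  group_by(fraktion) %>%
  summarize(n = n()) %>%  # one row per party with its number of talks
  arrange(desc(n))        # largest count first

talks_per_fraktion
```

Sorting with `arrange(desc(n))` puts the largest group first, as in the word-count chunk; the first analysis chunk uses ascending `arrange(n)` instead.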