diff --git a/README.md b/README.md
index 0030d85..0a197e0 100644
--- a/README.md
+++ b/README.md
@@ -1,106 +1,97 @@
-# How to develop
+# Description
 
-```r
-# everything works with devtools (loads some other packages too)
-library(devtools)
-
-# reload all package functions
-load_all()
+R package to analyze parliamentary records of the 19th legislative period of the Bundestag,
+the German parliament.
 
-#write to CSV files to speed up loading
-tables <- read_all()
-tables <- repair(tables)
-write_to_csv(tables)
-```
-We NEVER use source(...), etc.! Also NEVER use library(...).
-But to add new packages (as dependency), use:
-```r
-use_package("my-good-old-package")
-```
-To make package imports available, you have to add them to `R/hateimparlament-package.R`
-as `@import `.
-
-To reload / create documentation (calls roxygen)
-```r
-document()
-```
+# Features
 
-Build vignettes
-```r
-rmarkdown::render("vignettes/bla.Rmd")
-```
+The package mainly supplies 4 functionalities:
 
-# Download
+## Download records
 
-Before parsing, fetch.R must be run to download all protocols.
+To analyze records, they need to be downloaded. This is done with `fetch_all`:
 ```r
-fetch_all("../inst/records/") # path to directory where records should be stored
+fetch_all("records/", create = TRUE) # path to directory where records should be stored
 ```
+This downloads all parliamentary records and stores them as `.xml` files in the given directory.
 
-# Parsing
-
-## tables
+## Parse records
 
-parse.R parses all downloaded logs and creates 5 tibbles.
-repair.R then cleans up the errors in these tibbles.
+To use the records in R, they are converted to `tibble`s with
 ```r
-read_all("../inst/records/") %>% repair()
+res_raw <- read_all("records/") # path to directory where records are stored
 ```
-
+`res_raw` is a named list with 5 `tibble`s:
 
 ### Speaker
 
-structure: `id` , `first_name` , `last_name` , `fraction` , `title` , `role_short`, `role_long`.
-
+Table of all speakers of this legislative period.
 
-
-Obtained from the `` entry at the end of the transcripts.
+Fields:
+- `id`: Unique speaker id
+- `prename`: First name
+- `lastname`: Surname
+- `fraction`: Name of fraction if the speaker is a member of parliament
+- `title`: Title, e.g. "Prof"
+- `role_short`: Short name of role, e.g. "Bundeskanzlerin"
+- `role_long`: Long name of role
 
 ### Speeches
 
-Structure: `id` , `speaker`
-
-The speeches `id` is specified in the protocol and is unique.A speech is a `` entry in the session history. A speech always has a main speaker (the one standing at the front of the lectern).
-
-Within a speech, there can be different speech entries:
-
-- Comments: Applause, interjections, etc.
-- Speeches: Typically mainly the main speaker, but also interjections.
-These are stored in the talks, comments and applause tables when parsing.
+Table of all speeches given during this legislative period.
+Fields:
+- `id`: Unique speech id
+- `speaker`: Principal speaker (the person standing behind the lectern during the speech)
+- `date`: Date of session
 
 ### Talks
 
-Structure: `speech_id` , `speaker` , `content`.
+Within a speech, there can be multiple talks by different people. Mostly this is the main speech
+by the principal speaker, but there are usually also questions by other members of parliament or
+calls to order by the president of the Bundestag.
 
-These are the actual talk entries that appear within _speeches_.
+Fields:
+- `speech_id`: Speech in which this talk has been given
+- `speaker`: Person that actually talks
+- `content`: Spoken content
 
-- `speech_id`: the speech in which the contribution appears.
-- `speaker`: The speaker of the speech entry.
-- `content`: The content of the speech.
-
-###comments
+### Comments
 
 These are the interjections that appear during the speeches.
-They have the following structure:
-- `speech_id`: The speech that was interrupted.
-- `on_speaker`: The speaker who was interrupted.
-- `fraction`
-- `commenter`: The person who interrupted the speech.
-- `comment`: The content of the comment.
+Fields:
+- `speech_id`: The speech that was interrupted
+- `on_speaker`: The speaker who was interrupted
+- `fraction`: The fraction of the commenter
+- `commenter`: The person who interrupted the speech
+- `comment`: The content of the comment
+
+### Applause
+
+Table containing all the rounds of applause that happened during this legislative period.
 
-###applause
+Fields:
+- `speech_id`: Speech during which the applause occurred
+- `on_speaker`: Speaker who was applauded
 
-The logical table shows which party applauded for which speaker with explicit speech and which did not.
+In addition, there are logical fields `CDU_CSU`, `SPD`, `FDP`, `DIE_LINKE`, `BUENDNIS_90_DIE_GRUENEN`, `AfD`,
+one for every fraction in the Bundestag, indicating whether that fraction applauded.
 
-structure: `speech_id`, `on_speaker`, `CDU_CSU`, `SPD`, `FDP`, `DIE_LINKE`, `BUENDNIS_90_DIE_GRUENEN`, `AfD`
+## Repair records
+The parliamentary records usually contain some major and minor formatting issues. These are
+mostly resolved by using
+```r
+res <- repair(res_raw)
+```
+By passing `lookup_speaker = TRUE`, even commenters in
+`res_raw$comments` are matched with their respective speaker id.
 
-# Analysis
+## Analysis
 
-analysis.R provides some functions to analyze the "Plenarprotokolle" and to create plots.
+`analyze.R` provides some functions to analyze the parliamentary records and draw some plots.
 
 In the vignettes you can find different analyses of the protocols, for example:
 
@@ -110,4 +101,44 @@ In the vignettes you can find different analyses of the protocols, for example:
 - "When are which topics discussed the most?"
 - ...
 
+# Contributing
+
+Development is easiest with `devtools`:
+```r
+library(devtools)
+```
+After changing or adding functionality, you can reload all package functions with
+```r
+load_all()
+```
+If you want to avoid reading all records every time you start a new R session, you can
+write your parsed tibbles to CSV files:
+
+```r
+tables <- read_all()
+tables <- repair(tables)
+write_to_csv(tables, "path/to/csv/")
+```
+Then later you can use
+```r
+res <- read_from_csv("path/to/csv/")
+```
+to load your stored tibbles quickly.
+
+NEVER use source(...), etc.! Also NEVER use library(...).
+To add new packages (as dependency), use:
+```r
+use_package("my-good-old-package")
+```
+To make package imports available, you have to add them to `R/hateimparlament-package.R`
+as `@import `.
+To reload or create documentation (calls roxygen):
+```r
+document()
+```
+
+Build vignettes:
+```r
+rmarkdown::render("vignettes/test.Rmd")
+```