|
|
@@ -1,106 +1,111 @@ |
|
|
# How to develop |
|
|
|
|
|
|
|
|
# Description |
|
|
|
|
|
|
|
|
```r |
|
|
|
|
|
# everything works with devtools (loads some other packages too) |
|
|
|
|
|
library(devtools) |
|
|
|
|
|
|
|
|
R package to analyze parliamentary records of the 19th legislative period of the Bundestag, |
|
|
|
|
|
the German parliament. |
|
|
|
|
|
|
|
|
# reload all package functions |
|
|
|
|
|
load_all() |
|
|
|
|
|
|
|
|
# Installation |
|
|
|
|
|
|
|
|
#write to CSV files to speed up loading |
|
|
|
|
|
tables <- read_all() |
|
|
|
|
|
tables <- repair(tables) |
|
|
|
|
|
write_to_csv(tables) |
|
|
|
|
|
``` |
|
|
|
|
|
We NEVER use source(...), etc.! Also NEVER use library(...). |
|
|
|
|
|
But to add new packages (as dependency), use: |
|
|
|
|
|
|
|
|
The package can be installed with the `remotes` package:
|
|
```r |
|
|
```r |
|
|
use_package("my-good-old-package") |
|
|
|
|
|
|
|
|
remotes::install_url("https://git.flavigny.de/christian/hateimparlament/archive/master.zip") |
|
|
``` |
|
|
``` |
|
|
To make package imports available, you have to add them to `R/hateimparlament-package.R` |
|
|
|
|
|
as `@import <package>`. |
|
|
|
|
|
|
|
|
|
|
|
To reload / create documentation (calls roxygen) |
|
|
|
|
|
```r |
|
|
|
|
|
document() |
|
|
|
|
|
``` |
|
|
|
|
|
|
|
|
# Features |
|
|
|
|
|
|
|
|
Build vignettes |
|
|
|
|
|
```r |
|
|
|
|
|
rmarkdown::render("vignettes/bla.Rmd") |
|
|
|
|
|
``` |
|
|
|
|
|
|
|
|
The package mainly provides four features:
|
|
|
|
|
|
|
|
# Download |
|
|
|
|
|
|
|
|
## Download records |
|
|
|
|
|
|
|
|
Before parsing, fetch.R must be run to download all protocols. |
|
|
|
|
|
|
|
|
Records must be downloaded before they can be analyzed. This is done with `fetch_all`:
|
|
```r |
|
|
```r |
|
|
fetch_all("../inst/records/") # path to directory where records should be stored |
|
|
|
|
|
|
|
|
fetch_all("records/", create = TRUE) # path to directory where records should be stored |
|
|
``` |
|
|
``` |
|
|
|
|
|
This downloads all parliamentary records and stores them as `.xml` files in the given directory. |
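
After the download, it can be useful to check which files actually ended up in the directory. The following is a minimal sketch in base R, not part of the package: the helper `list_records` and the demo directory are hypothetical names, and the placeholder files only stand in for real records.

```r
# Hypothetical helper (not part of the package): list the .xml files in a directory.
list_records <- function(dir) {
  list.files(dir, pattern = "\\.xml$", full.names = TRUE, ignore.case = TRUE)
}

# Demo on a temporary directory with empty placeholder files
demo_dir <- file.path(tempdir(), "records_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("a.xml", "b.xml", "notes.txt"))))

# Only the two .xml files are reported, the .txt file is ignored
basename(list_records(demo_dir))
```

With real data, pass the same path that was given to `fetch_all`, e.g. `list_records("records/")`.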
|
|
|
|
|
|
|
|
# Parsing |
|
|
|
|
|
|
|
|
## Parse records |
|
|
|
|
|
|
|
|
## tables |
|
|
|
|
|
|
|
|
|
|
|
parse.R parses all downloaded logs and creates 5 tibbles. |
|
|
|
|
|
repair.R then cleans up the errors in these tibbles. |
|
|
|
|
|
|
|
|
To use the records in R, convert them to `tibble`s with
|
|
```r |
|
|
```r |
|
|
read_all("../inst/records/") %>% repair() |
|
|
|
|
|
|
|
|
res_raw <- read_all("records/") # path to directory where records are stored |
|
|
``` |
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
`res_raw` is a named list with 5 `tibble`s: |
|
|
|
|
|
|
|
|
### Speaker |
|
|
|
|
|
|
|
|
|
|
structure: `id` , `first_name` , `last_name` , `fraction` , `title` , `role_short`, `role_long`. |
|
|
|
|
|
|
|
|
Table of all speakers of this legislative period. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Obtained from the `<speaker list>` entry at the end of the transcripts. |
|
|
|
|
|
|
|
|
Fields: |
|
|
|
|
|
- `id`: Unique speaker id |
|
|
|
|
|
- `prename`: First name
|
|
|
|
|
- `lastname`: Surname |
|
|
|
|
|
- `fraction`: Name of the fraction (parliamentary group), if the speaker is a member of parliament.
|
|
|
|
|
- `title`: Title, e.g. "Prof"
|
|
|
|
|
- `role_short`: Short name of role, e.g. "Bundeskanzlerin"
|
|
|
|
|
- `role_long`: Long name of role |
|
|
|
|
|
|
|
|
### Speeches |
|
|
|
|
|
|
|
|
|
|
Structure: `id` , `speaker` |
|
|
|
|
|
|
|
|
Table of all speeches given during this legislative period. |
|
|
|
|
|
|
|
|
Each speech's `id` is specified in the protocol and is unique. A speech is a `<speech>` entry in the session history. A speech always has a main speaker (the person standing at the lectern).
|
|
|
|
|
|
|
|
Fields: |
|
|
|
|
|
- `id`: Unique speech id |
|
|
|
|
|
- `speaker`: Principal speaker (the person standing behind the lectern during the speech). |
|
|
|
|
|
- `date`: Date of session |
|
|
|
|
|
|
|
|
Within a speech, there can be different speech entries: |
|
|
|
|
|
|
|
|
### Talks |
|
|
|
|
|
|
|
|
- Comments: Applause, interjections, etc. |
|
|
|
|
|
- Speeches: Typically mainly the main speaker, but also interjections. |
|
|
|
|
|
These are stored in the talks, comments and applause tables when parsing. |
|
|
|
|
|
|
|
|
Within a speech, there can be multiple talks by different people. Most of it is the main speech by the principal speaker, but there are usually also questions from other members of parliament or calls to order by the president of the Bundestag.
|
|
|
|
|
|
|
|
|
|
|
Fields: |
|
|
|
|
|
- `speech_id`: Speech in which this talk has been given |
|
|
|
|
|
- `speaker`: Person that actually talks |
|
|
|
|
|
- `content`: Spoken content |
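
To see who actually talks, the talks can be combined with the speaker table. The sketch below uses base R `merge` on made-up toy data; it assumes that `speaker` in the talks table holds the speaker `id`, and it is only an illustration, not the package's `join_speaker` function.

```r
# Toy data mimicking the documented columns (all values are made up)
speaker <- data.frame(
  id       = c("s1", "s2"),
  lastname = c("Muster", "Beispiel"),
  fraction = c("SPD", "FDP"),
  stringsAsFactors = FALSE
)
talks <- data.frame(
  speech_id = c("ID1", "ID1", "ID2"),
  speaker   = c("s1", "s2", "s2"),
  content   = c("Main speech", "Question", "Another speech"),
  stringsAsFactors = FALSE
)

# Attach name and fraction to every talk (assumes talks$speaker matches speaker$id)
talks_named <- merge(talks, speaker, by.x = "speaker", by.y = "id")

# Number of talks per fraction
talks_per_fraction <- table(talks_named$fraction)
talks_per_fraction
```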
|
|
|
|
|
|
|
|
### Talks |
|
|
|
|
|
|
|
|
### Comments |
|
|
|
|
|
|
|
|
Structure: `speech_id` , `speaker` , `content`. |
|
|
|
|
|
|
|
|
These are the interjections that appear during the speeches. |
|
|
|
|
|
|
|
|
These are the actual talk entries that appear within _speeches_. |
|
|
|
|
|
|
|
|
Fields: |
|
|
|
|
|
- `speech_id`: The speech that was interrupted |
|
|
|
|
|
- `on_speaker`: The speaker who was interrupted |
|
|
|
|
|
- `fraction`: The fraction of the commenter |
|
|
|
|
|
- `commenter`: The person who interrupted the speech |
|
|
|
|
|
- `comment`: The content of the comment |
|
|
|
|
|
|
|
|
- `speech_id`: the speech in which the contribution appears. |
|
|
|
|
|
- `speaker`: The speaker of the speech entry. |
|
|
|
|
|
- `content`: The content of the speech. |
|
|
|
|
|
|
|
|
### Applause |
|
|
|
|
|
|
|
|
### Comments
|
|
|
|
|
|
|
|
Table containing all the rounds of applause that happened during this legislative period. |
|
|
|
|
|
|
|
|
These are the interjections that appear during the speeches. |
|
|
|
|
|
|
|
|
Fields: |
|
|
|
|
|
- `speech_id`: Speech during which the applause occurred
|
|
|
|
|
- `on_speaker`: Speaker who was applauded |
|
|
|
|
|
|
|
|
They have the following structure: |
|
|
|
|
|
- `speech_id`: The speech that was interrupted. |
|
|
|
|
|
- `on_speaker`: The speaker who was interrupted. |
|
|
|
|
|
- `fraction` |
|
|
|
|
|
- `commenter`: The person who interrupted the speech. |
|
|
|
|
|
- `comment`: The content of the comment. |
|
|
|
|
|
|
|
|
And then logical fields `CDU_CSU`, `SPD`, `FDP`, `DIE_LINKE`, `BUENDNIS_90_DIE_GRUENEN`, `AfD` |
|
|
|
|
|
for every fraction in the Bundestag, signifying whether this fraction applauded. |
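
Because there is one logical column per fraction, counting applause is a matter of summing columns. A minimal sketch on made-up toy data, using base R only; the values are invented and only the column names follow the description above:

```r
# Toy applause table with the documented columns (values are made up)
applause <- data.frame(
  speech_id               = c("ID1", "ID1", "ID2"),
  on_speaker              = c("s1", "s1", "s2"),
  CDU_CSU                 = c(TRUE, FALSE, TRUE),
  SPD                     = c(TRUE, TRUE, FALSE),
  FDP                     = c(FALSE, FALSE, TRUE),
  DIE_LINKE               = c(FALSE, TRUE, FALSE),
  BUENDNIS_90_DIE_GRUENEN = c(FALSE, TRUE, FALSE),
  AfD                     = c(FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

fractions <- c("CDU_CSU", "SPD", "FDP", "DIE_LINKE",
               "BUENDNIS_90_DIE_GRUENEN", "AfD")

# TRUE counts as 1, so the column sums give the rounds of applause per fraction
applause_counts <- colSums(applause[, fractions])
applause_counts
```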
|
|
|
|
|
|
|
|
### Applause
|
|
|
|
|
|
|
|
## Repair records |
|
|
|
|
|
|
|
|
This table of logical values records, for each speech and speaker, which parties applauded and which did not.
|
|
|
|
|
|
|
|
The parliamentary records usually contain some major and minor formatting issues. These are |
|
|
|
|
|
mostly resolved by using |
|
|
|
|
|
```r
|
|
|
|
|
res <- repair(res_raw) |
|
|
|
|
|
``` |
|
|
|
|
|
By passing `lookup_speaker = TRUE`, even commenters in |
|
|
|
|
|
`res_raw$comments` are matched with their respective speaker id. |
|
|
|
|
|
|
|
|
structure: `speech_id`, `on_speaker`, `CDU_CSU`, `SPD`, `FDP`, `DIE_LINKE`, `BUENDNIS_90_DIE_GRUENEN`, `AfD` |
|
|
|
|
|
|
|
|
## Analysis |
|
|
|
|
|
|
|
|
|
|
|
Some functions are also provided to analyze the parliamentary records and draw plots:
|
|
|
|
|
|
|
|
# Analysis |
|
|
|
|
|
|
|
|
- `bar_plot_fractions` |
|
|
|
|
|
- `find_word` |
|
|
|
|
|
- `join_speaker` |
|
|
|
|
|
- `word_usage_by_date` |
|
|
|
|
|
|
|
|
analysis.R provides some functions to analyze the "Plenarprotokolle" and to create plots. |
|
|
|
|
|
|
|
|
Look up their usage with the `?` operator, e.g. `?find_word`.
|
|
|
|
|
|
|
|
In the vignettes you can find different analyses of the protocols, for example: |
|
|
|
|
|
|
|
|
|
|
@@ -110,4 +115,44 @@ In the vignettes you can find different analyses of the protocols, for example: |
|
|
- "When are which topics discussed the most?" |
|
|
|
|
- ... |
|
|
|
|
|
|
|
|
|
|
|
|
|
# Contributing |
|
|
|
|
|
|
|
|
|
|
|
Developing works the easiest with `devtools`: |
|
|
|
|
|
```r |
|
|
|
|
|
library(devtools) |
|
|
|
|
|
``` |
|
|
|
|
|
After changing something or adding functionality, you can reload all package functions with
|
|
|
|
|
```r |
|
|
|
|
|
load_all() |
|
|
|
|
|
``` |
|
|
|
|
|
If you want to avoid reading all records every time you start a new R session, you can |
|
|
|
|
|
write your parsed tibbles to CSV files: |
|
|
|
|
|
|
|
|
|
|
|
```r
|
|
|
|
|
tables <- read_all() |
|
|
|
|
|
tables <- repair(tables) |
|
|
|
|
|
write_to_csv(tables, "path/to/csv/") |
|
|
|
|
|
``` |
|
|
|
|
|
Then later you can use |
|
|
|
|
|
```r |
|
|
|
|
|
res <- read_from_csv("path/to/csv/") |
|
|
|
|
|
``` |
|
|
|
|
|
to load your stored tibbles quickly.
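
The idea behind this caching step can be sketched in base R as follows. This is only an illustration on made-up toy data, not the implementation of `write_to_csv` or `read_from_csv`; the one-file-per-table layout and the directory name are assumptions.

```r
# Toy stand-in for the named list of tables (values are made up)
tables <- list(
  speaker = data.frame(id = c("s1", "s2"),
                       lastname = c("Muster", "Beispiel"),
                       stringsAsFactors = FALSE),
  talks   = data.frame(speech_id = c("ID1", "ID2"),
                       speaker = c("s1", "s2"),
                       content = c("Hello", "World"),
                       stringsAsFactors = FALSE)
)

# Write one CSV file per table (assumed layout: <table name>.csv)
cache_dir <- file.path(tempdir(), "csv_demo")
dir.create(cache_dir, showWarnings = FALSE)
for (name in names(tables)) {
  write.csv(tables[[name]],
            file.path(cache_dir, paste0(name, ".csv")),
            row.names = FALSE)
}

# Read all CSV files back into a named list
csv_files <- list.files(cache_dir, pattern = "\\.csv$", full.names = TRUE)
restored <- lapply(csv_files, read.csv, stringsAsFactors = FALSE)
names(restored) <- sub("\\.csv$", "", basename(csv_files))
```

Reading a handful of CSV files skips the XML parsing and repair steps, which is where the time savings come from.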
|
|
|
|
|
|
|
|
|
|
|
NEVER use `source(...)` or similar in package code, and NEVER use `library(...)` there either.
|
|
|
|
|
To add a new package as a dependency, use:
|
|
|
|
|
```r |
|
|
|
|
|
use_package("my-good-old-package") |
|
|
|
|
|
``` |
|
|
|
|
|
To make package imports available, you have to add them to `R/hateimparlament-package.R` |
|
|
|
|
|
as `@import <package>`. |
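
As a rough sketch, such an entry could look like the following. Here `dplyr` is only a placeholder for the package being imported, and the surrounding lines are the usual roxygen2 package-documentation pattern, not necessarily what the file in this repository contains.

```r
# R/hateimparlament-package.R (sketch; the actual file contents may differ)

#' @import dplyr
#' @keywords internal
"_PACKAGE"
```

After editing this file, the `NAMESPACE` file has to be regenerated by roxygen for the import to take effect.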
|
|
|
|
|
|
|
|
|
|
|
To create or reload the documentation (this calls roxygen):
|
|
|
|
|
```r |
|
|
|
|
|
document() |
|
|
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
To build a vignette:
|
|
|
|
|
```r |
|
|
|
|
|
rmarkdown::render("vignettes/test.Rmd") |
|
|
|
|
|
``` |