|
|
|
@@ -29,32 +29,42 @@ fetch_all("../records/") # path to directory where records should be stored |
|
|
|
Second, those `.xml` files need to be parsed into `R` `tibbles`. This is accomplished by: |
|
|
|
```r |
|
|
|
read_all("../records/") %>% repair() -> res |
|
|
|
|
|
|
|
reden <- res$reden |
|
|
|
redner <- res$redner |
|
|
|
talks <- res$talks |
|
|
|
``` |
|
|
|
Here, `repair` fixes a number of formatting issues in the records, and the result is unpacked |
|
|
|
into more descriptively named variables. |
|
|
|
|
|
|
|
For development purposes, we load the tables from CSV files |
|
|
|
```{r} |
|
|
|
tables <- read_from_csv('../csv/') |
|
|
|
|
|
|
|
comments <- tables$comments |
|
|
|
reden <- tables$reden |
|
|
|
redner <- tables$redner |
|
|
|
talks <- tables$talks |
|
|
|
res <- read_from_csv('../csv/') |
|
|
|
``` |
|
|
|
and unpack our tibbles: |
|
|
|
```{r} |
|
|
|
comments <- res$comments |
|
|
|
reden <- res$reden |
|
|
|
redner <- res$redner |
|
|
|
talks <- res$talks |
|
|
|
``` |
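
The load-and-unpack step can be approximated without the package as follows. This is a minimal sketch, not the package's implementation: `read_tables_from_dir()` is a hypothetical stand-in for `read_from_csv()`, and it assumes one CSV file per table, named after the table. It uses base R only, so it returns plain data frames rather than tibbles, and the toy tables are invented for illustration.

```r
# Hypothetical stand-in for read_from_csv(): reads every .csv file in `dir`
# into a named list, one entry per file (assumed layout: one file per table).
read_tables_from_dir <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  tables <- lapply(files, read.csv, stringsAsFactors = FALSE)
  # Name each entry after its file, e.g. "reden.csv" becomes "reden".
  names(tables) <- tools::file_path_sans_ext(basename(files))
  tables
}

# Toy data so the sketch runs on its own; the real tables have more columns.
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, fraktion = c("A", "B")),
          file.path(demo_dir, "redner.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:3, redner = c(1, 1, 2)),
          file.path(demo_dir, "reden.csv"), row.names = FALSE)

# Load everything once, then unpack into descriptive variables.
res_demo <- read_tables_from_dir(demo_dir)
reden_demo <- res_demo$reden
redner_demo <- res_demo$redner
```

Keeping the whole list around (here `res_demo`) alongside the unpacked variables lets helper functions take a single argument while interactive code still uses short names.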
|
|
|
|
|
|
|
## Analysis |
|
|
|
|
|
|
|
Now we can start analysing the parsed dataset, e.g. to find out which party gives the most talks: |
|
|
|
```{r} |
|
|
|
left_join(reden, redner, by=c("redner" = "id")) %>% |
|
|
|
```{r, fig.width=10} |
|
|
|
join_redner(reden, res) %>% |
|
|
|
group_by(fraktion) %>% |
|
|
|
summarize(n = n()) %>% |
|
|
|
ggplot(aes(x = fraktion, y = n)) + |
|
|
|
geom_bar(stat = "identity") |
|
|
|
arrange(n) %>% |
|
|
|
bar_plot_fraktionen() |
|
|
|
``` |
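
The counting step can also be illustrated with plain `dplyr` on made-up data. The sketch assumes, going by the explicit `left_join` call shown in this section, that `join_redner` attaches speaker information by matching `reden$redner` to `redner$id`; that equivalence is an assumption, and the `_demo` tables are invented for illustration.

```r
library(dplyr)

# Made-up miniature versions of the `redner` (speakers) and `reden` (speeches) tables.
redner_demo <- tibble(id = c(1, 2, 3), fraktion = c("A", "B", "A"))
reden_demo  <- tibble(id = 1:5, redner = c(1, 1, 2, 3, 3))

talks_per_fraktion <- reden_demo %>%
  # Attach each speaker's party to their speeches (assumed behaviour of join_redner).
  left_join(redner_demo, by = c("redner" = "id")) %>%
  group_by(fraktion) %>%
  summarize(n = n()) %>%
  # Sort ascending so the party with the most talks comes last.
  arrange(n)
```

The resulting table has one row per party with its number of talks in `n`, which is the shape a bar plot of counts per party needs.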
|
|
|
|
|
|
|
### Count occurrences of a word |
|
|
|
|
|
|
|
```{r, fig.width=10} |
|
|
|
find_word(res, "hitler") %>% |
|
|
|
filter(occurences > 0) %>% |
|
|
|
join_redner(res) %>% |
|
|
|
select(content, fraktion) %>% |
|
|
|
group_by(fraktion) %>% |
|
|
|
summarize(n = n()) %>% |
|
|
|
arrange(desc(n)) %>% |
|
|
|
bar_plot_fraktionen() |
|
|
|
``` |
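
How `find_word` counts matches is not shown here. The following is a minimal sketch of one possible approach in base R, assuming a case-insensitive match against a `content` column. `count_word()`, the column names, and the toy data are hypothetical and not part of the package.

```r
# Hypothetical helper: counts case-insensitive matches of `word` in each element
# of `text`. `word` is treated as a regular expression, so a plain word matches itself.
count_word <- function(text, word) {
  matches <- gregexpr(word, text, ignore.case = TRUE)
  # regmatches() yields one character vector of hits per element (empty if none).
  lengths(regmatches(text, matches))
}

# Invented example speeches; the real tables have more columns.
speeches_demo <- data.frame(
  fraktion = c("A", "B", "A"),
  content = c("Klima und Klimaschutz", "Haushalt", "Das Klima"),
  stringsAsFactors = FALSE
)
speeches_demo$occurrences <- count_word(speeches_demo$content, "klima")

# Number of speeches per party that mention the word at least once.
mentions <- table(speeches_demo$fraktion[speeches_demo$occurrences > 0])
```

This sketch matches substrings, so "Klimaschutz" counts as a hit for "klima". Wrapping the pattern in `\\b` word boundaries would restrict it to whole words.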