JosuaKugler 4 years ago
Parent
Commit
7067877584
4 changed files with 62 additions and 14 deletions
  1. NAMESPACE (+2, -0)
  2. R/analyze.R (+31, -0)
  3. README.md (+5, -0)
  4. vignettes/funwithdata.Rmd (+24, -14)

NAMESPACE (+2, -0)

@@ -1,6 +1,8 @@
 # Generated by roxygen2: do not edit by hand

 export(fetch_all)
+export(find_word)
+export(join_redner)
 export(read_all)
 export(read_from_csv)
 export(repair)


R/analyze.R (+31, -0)

@@ -0,0 +1,31 @@
+#' @export
+find_word <- function(res, word) {
+  talks <- res$talks
+  mutate(talks, occurences = sapply(str_match_all(talks$content, regex(word, ignore_case = TRUE)),
+                                    nrow))
+}
+
+#' @export
+join_redner <- function(tb, res, fraktion_only = F) {
+  joined <- left_join(tb, res$redner, by=c("redner" = "id"))
+  if (fraktion_only) select(joined, "fraktion")
+  else joined
+}
+
+party_colors <- c(
+  SPD="#DF0B25",
+  "CDU/CSU"="#000000",
+  AfD="#1A9FDD",
+  "AfD&Fraktionslos"="#1A9FDD",
+  "DIE LINKE"="#BC3475",
+  "BÜNDNIS 90 / DIE GRÜNEN"="#4A932B",
+  FDP="#FEEB34",
+  Fraktionslos="#FEEB34"
+)
+
+#' @export
+bar_plot_fraktionen <- function(tb) {
+  ggplot(tb, aes(x = reorder(fraktion, -n), y = n, fill = fraktion)) +
+    scale_fill_manual(values = party_colors) +
+    geom_bar(stat = "identity")
+}

README.md (+5, -0)

@@ -22,6 +22,11 @@ To reload / regenerate the documentation (calls roxygen)
 document()
 ```

+Build vignettes
+```r
+rmarkdown::render("vignettes/bla.Rmd")
+```
+
 # Download

 Before any analysis can run, fetch.R must be executed to download all records.


vignettes/funwithdata.Rmd (+24, -14)

@@ -29,32 +29,42 @@ fetch_all("../records/") # path to directory where records should be stored
 Second, those `.xml` files, need to be parsed into `R` `tibbles`. This is accomplished by:
 ```r
 read_all("../records/") %>% repair() -> res
-
-reden <- res$reden
-redner <- res$redner
-talks <- res$talks
 ```
 We also used `repair` to fix a bunch of formatting issues in the records and unpacked
 the result into more descriptive variables.

 For development purposes, we load the tables from csv files.
 ```{r}
-tables <- read_from_csv('../csv/')
-
-comments <- tables$comments
-reden <- tables$reden
-redner <- tables$redner
-talks <- tables$talks
+res <- read_from_csv('../csv/')
+```
+and unpack our tibbles
+```{r}
+comments <- res$comments
+reden <- res$reden
+redner <- res$redner
+talks <- res$talks
 ```

 ## Analysis

 Now we can start analysing our parsed dataset, e.g. find out which party gives the most talks:
-```{r}
-left_join(reden, redner, by=c("redner" = "id")) %>%
+```{r, fig.width=10}
+join_redner(reden, res) %>%
   group_by(fraktion) %>%
   summarize(n = n()) %>%
-  ggplot(aes(x = fraktion, y = n)) +
-  geom_bar(stat = "identity")
+  arrange(n) %>%
+  bar_plot_fraktionen()
 ```

+### Count a word occurence
+
+```{r, fig.width=10}
+find_word(res, "hitler") %>%
+  filter(occurences > 0) %>%
+  join_redner(res) %>%
+  select(content, fraktion) %>%
+  group_by(fraktion) %>%
+  summarize(n = n()) %>%
+  arrange(desc(n)) %>%
+  bar_plot_fraktionen()
+```
