As mentioned in my first post, I also analysed the user comments on a post at www.starwars-union.de word by word. The figure below shows the word cloud of all comments (1728 so far).

[Figure: word cloud of all comments]

To create such a word cloud, I used the following code. The first part was already explained in my first post.

library(tidyverse)
library(rvest)

# the comments are paginated in steps of 30
site <- seq(0, 1710, 30)

# build the URL of every comment page
url <- paste0("https://www.starwars-union.de/nachrichten/18973/SWU-Kritiken-Unsere-Gedanken-zu-Star-Wars-Die-letzten-Jedi/k/",
              site, "/#kommentare")

First I load all necessary packages and create the URLs of all available comment pages.
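A quick check (my own addition, not strictly necessary) shows what was generated:

head(url, 2)   # the first two page URLs
length(url)    # 58 pages in total (58 * 30 covers all 1728 comments)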

comments <- lapply(seq_along(url), function(x) {

  data <- read_html(url[x]) %>%
    html_nodes(xpath = '//*[@id="kommentargesamt"]') %>%  # comment section of the page
    html_nodes("#kommentar") %>%                          # individual comments
    html_nodes("p") %>%
    html_text()

  # keep every second paragraph, which holds the actual comment text
  data[seq(2, length(data), 2)]
})

This is the main part for scraping all the comments: I search the HTML for the element with id="kommentargesamt" and extract the comment paragraphs. They are saved in the variable comments, a list with one character vector per page.
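To check that the scraper actually collected everything, a short inspection helps (again my own addition):

# total number of scraped comments across all pages
length(unlist(comments))

# look at the first few comments
head(unlist(comments), 3)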

Now everything is prepared for creating the word cloud. For that purpose I used the following snippet, which I once found on the internet. There are many examples of creating a word cloud with R, and I decided to use this one:

library(stringr)
library(tm)
library(SnowballC)
library(wordcloud)
library(RColorBrewer)

# comments is a list of character vectors, so flatten it before splitting
words <- unlist(str_split(unlist(comments), pattern = " "))

corpus <- Corpus(VectorSource(words)) %>%
  tm_map(content_transformer(tolower)) %>%
  tm_map(removePunctuation) %>%
  # remove German stopwords plus site boilerplate
  # ("zuletzt geändert am ... Uhr" marks edited comments)
  tm_map(removeWords, c("dass", "zuletzt", "geändert", "am", "uhr",
                        stopwords("german")))
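Before plotting, it can be useful to look at the most frequent terms. A small sketch using tm's term-document matrix (my addition, not part of the original snippet):

tdm  <- TermDocumentMatrix(corpus)
# slam is installed together with tm; row_sums() handles the sparse matrix
freq <- sort(slam::row_sums(tdm), decreasing = TRUE)
head(freq, 10)  # the ten most frequent words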

To get a nice graphical output, I recommend saving the word cloud directly to a file rather than exporting it from the RStudio viewer or similar.

png(filename = "SWU_comments_wordcloud.png",
    width = 500,
    height = 500)

wordcloud(corpus,
          scale = c(8, .2),      # size range of the words
          min.freq = 2,          # only words that occur at least twice
          max.words = 50,        # show at most 50 words
          random.order = FALSE,  # plot the most frequent words first
          rot.per = .15,         # share of vertically rotated words
          colors = brewer.pal(8, "Dark2"))

dev.off()
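One small side note: wordcloud() places the words partly at random, so the layout differs between runs. If you want a reproducible picture, set a seed right before the call; this is an optional addition of mine:

set.seed(1977)  # any fixed number works; 1977 is arbitrary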

And that's it! I think most of the words are comprehensible even for non-German readers ;-)