“Star Wars: The Last Jedi” has been in cinemas since December 14th (in Germany). I went to the midnight premiere, a double feature of Episodes 7 and 8. Today I want to combine two things I love: Star Wars and R.

“Star Wars: The Last Jedi” has been controversial in the community, and I followed the discussion on the German Star Wars Union homepage.

I want to show briefly how the users' comments developed over the last week. For the analysis I used the wonderful R package rvest and the packages included in the tidyverse, especially dplyr and ggplot2.

rvest is used for web scraping, dplyr for data manipulation and ggplot2 for visualisation.
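To give an idea of how the scraping step fits together, here is a minimal sketch. It is not the code from my analysis: the HTML snippet, the CSS classes (.comment, .user, .time) and the user names are made up for illustration, and the real page is structured differently. It parses an inline HTML string, so it runs without a network connection.

```r
library(rvest)
library(dplyr)

# Hypothetical markup: the structure and class names are assumptions,
# not the real layout of the comment page.
page <- minimal_html('
  <div class="comment"><span class="user">user_a</span><span class="time">2017-12-14 19:05</span></div>
  <div class="comment"><span class="user">user_b</span><span class="time">2017-12-14 19:40</span></div>
  <div class="comment"><span class="user">user_a</span><span class="time">2017-12-15 08:15</span></div>
')

# One node per comment
comment_nodes <- html_elements(page, ".comment")

# Pull user name and timestamp out of each comment node
comments <- tibble(
  user = comment_nodes %>% html_element(".user") %>% html_text2(),
  time = comment_nodes %>% html_element(".time") %>% html_text2()
) %>%
  mutate(time = as.POSIXct(time, format = "%Y-%m-%d %H:%M", tz = "UTC"))

print(comments)
```

For a live page you would pass a URL to read_html() instead of using minimal_html(), and adapt the selectors to the actual markup.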

See here for the source code of the analysis. The colour used for the first two plots is in the style of the Star Wars Episode 8 red; it is one of R's built-in colour names, red3, which ggplot2 accepts directly.

Some results:

Comments per hour

[Plot: comments per hour]

Users with the most comments

[Plot: users with the most comments]

I decided to list all users with more than 20 comments in total. My original plan was to plot the total number of comments per user for each day and show how the top users changed, but I settled on this simple plot. To show dynamically how a variable develops, I recommend the great R package gganimate. See the project page for further examples.
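A plot like this could be built roughly as follows. This is a sketch with toy data, not the code from my analysis: the user names and counts are invented, and I assume a data frame with one row per comment and a user column.

```r
library(dplyr)
library(ggplot2)

# Toy data standing in for the scraped comments: one row per comment.
# The user names and counts are made up.
comments <- tibble(
  user = rep(c("user_a", "user_b", "user_c"), times = c(35, 22, 8))
)

# Count comments per user and keep users with more than 20 comments
top_users <- comments %>%
  count(user, name = "n_comments", sort = TRUE) %>%
  filter(n_comments > 20)

# Horizontal bar chart, most active user at the top, in red3
top_users_plot <- ggplot(top_users, aes(x = reorder(user, n_comments), y = n_comments)) +
  geom_col(fill = "red3") +
  coord_flip() +
  labs(x = NULL, y = "Number of comments")

print(top_users_plot)
```

The reorder() call sorts the bars by comment count instead of alphabetically, which makes the ranking easier to read.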

Comments per day and hour

[Plot: comments per day and hour]

Most comments were written between 7 pm and 8 pm, or more generally in the evening. It is also obvious that the most comments in total were written on the 14th and 15th of December.
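Counting comments per day and hour can be sketched like this. Again, the timestamps are toy data and the column names are my assumptions for the example; the tile plot is just one way to show the result and not necessarily the plot type used above.

```r
library(dplyr)
library(ggplot2)

# Toy timestamps standing in for the scraped comment times
comments <- tibble(
  time = as.POSIXct(
    c("2017-12-14 19:05", "2017-12-14 19:40", "2017-12-14 21:10",
      "2017-12-15 08:15", "2017-12-15 19:30"),
    format = "%Y-%m-%d %H:%M", tz = "UTC"
  )
)

# Split each timestamp into day and hour, then count comments per combination
per_day_hour <- comments %>%
  mutate(
    day  = as.Date(time, tz = "UTC"),
    hour = as.integer(format(time, "%H"))
  ) %>%
  count(day, hour, name = "n_comments")

# One tile per day and hour, shaded by the number of comments
day_hour_plot <- ggplot(per_day_hour, aes(x = hour, y = factor(day), fill = n_comments)) +
  geom_tile() +
  labs(x = "Hour of day", y = "Day", fill = "Comments")

print(day_hour_plot)
```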

I hope you had fun looking at these statistics and seeing how easy it is to create such plots with ggplot2. Next time I want to show a word cloud of the most common words used in the comments and further fun things …

May the Force be with you!