Combining Multiple Dataframes and Creating a Bar Graph with ggplot2 in R

Combining Multiple Dataframes and Creating a Bar Graph

In this article, we will explore how to combine multiple dataframes and create a bar graph using the ggplot2 library in R. We will start by creating sample dataframes for gameplay trends, update trends, review trends, best trends, and theme trends.

Background

The gtrendsR package is used to retrieve data from Google Trends. The gtrends function takes a keyword as input and returns a dataframe containing the search volume over time. By replacing the <1 in the hits column with 0, we can convert the hits values into numerical format for easier analysis.

We will create five dataframes, one for each game title: gameplay, update, review, best, and theme. We will use these dataframes to sum up the number of hits for each term.

Sample Data

Let’s start by creating a sample dataframe for gameplay trends using the gtrends function:

# This is for "gameplay"
trends_gameplay <- gtrends(keyword = c("Cult of the Lamb gameplay", 
                                      "Elden Ring gameplay", 
                                      "Divinity Original Sin 2 gameplay", 
                                      "Minecraft gameplay", 
                                      "League of Legends gameplay"))

trends_gameplay_over_time <- trends_gameplay$interest_over_time %>%
  as_tibble() %>%
  mutate(hits = stringr::str_replace(hits, "<1", "0"),
         hits = readr::parse_number(hits))

Similarly, we can create dataframes for update trends, review trends, best trends, and theme trends.

Combining Dataframes

To combine multiple dataframes, we will use the rbind function in R. However, since we want to group by game title, we cannot simply concatenate all the dataframes.

# Function to get data
f_search_trend <- function(v_term) {
  k <- gtrends(keyword = c(paste("Cult of the Lamb", v_term), 
                         paste("Elden Ring", v_term), 
                         paste("Divinity Original Sin 2", v_term), 
                         paste("Minecraft", v_term), 
                         paste("League of Legends", v_term)))

  k <- k$interest_over_time %>%
    as_tibble() %>%
    mutate(hits = stringr::str_replace(hits, "<1", "0"),
           hits = readr::parse_number(hits)) %>%
    mutate(Term = v_term)

  return(k)
}

d1 <- f_search_trend("gameplay")
d2 <- f_search_trend("update")
d3 <- f_search_trend("review")
d4 <- f_search_trend("best")
d5 <- f_search_trend("theme")

# add them together
d <- rbind(d1, d2, d3, d4, d5) %>%
  group_by(keyword, Term) %>%
  summarise(total_hits = sum(hits)) %>%
  mutate(keyword = word(keyword , 1  , -2))

However, this approach will not work because the dataframes are still grouped by the term name, not the game title.

Solution

To solve this problem, we can create a pivot table using the pivot_wider function from the tidyr package.

library(tidyverse)
library(gtrendsR)

# Function to get data
f_search_trend <- function(v_term) {
  k <- gtrends(keyword = c(paste("Cult of the Lamb", v_term), 
                         paste("Elden Ring", v_term), 
                         paste("Divinity Original Sin 2", v_term), 
                         paste("Minecraft", v_term), 
                         paste("League of Legends", v_term)))

  k <- k$interest_over_time %>%
    as_tibble() %>%
    mutate(hits = stringr::str_replace(hits, "<1", "0"),
           hits = readr::parse_number(hits)) %>%
    mutate(Term = v_term)

  return(k)
}

d1 <- f_search_trend("gameplay")
d2 <- f_search_trend("update")
d3 <- f_search_trend("review")
d4 <- f_search_trend("best")
d5 <- f_search_trend("theme")

# Create pivot table
d <- d1 %>%
  mutate(Term = factor(Term)) %>%
  group_by(keyword, Term) %>%
  summarise(total_hits = sum(hits))

d2 <- d2 %>%
  mutate(Term = factor(Term)) %>%
  group_by(keyword, Term) %>%
  summarise(total_hits = sum(hits))

d3 <- d3 %>%
  mutate(Term = factor(Term)) %>%
  group_by(keyword, Term) %>%
  summarise(total_hits = sum(hits))

d4 <- d4 %>%
  mutate(Term = factor(Term)) %>%
  group_by(keyword, Term) %>%
  summarise(total_hits = sum(hits))

d5 <- d5 %>%
  mutate(Term = factor(Term)) %>%
  group_by(keyword, Term) %>%
  summarise(total_hits = sum(hits))

# Merge dataframes
d <- rbind(d1, d2, d3, d4, d5)

# Plot
ggplot(data = d) +
  geom_col(aes(x = Term,
               y = total_hits,
               fill = keyword),
           color = "black",
           binwidth = 0.1,
           position = "dodge", 
           stat = "identity") +
  scale_y_continuous(expand = c(0, 0),
                     limits = c(0, max(d$total_hits) * 1.1)) +
  labs(x = element_blank(),
       y = "Total Hits") +
  theme_bw()

In this solution, we first create a pivot table for each dataframe using the group_by and summarise functions from dplyr. We then merge all the dataframes together using the rbind function. Finally, we use ggplot2 to create a bar graph with the game title on the x-axis and the total hits on the y-axis.

Conclusion

In this article, we explored how to combine multiple dataframes and create a bar graph using the ggplot2 library in R. We used the gtrendsR package to retrieve search volume data from Google Trends and created sample dataframes for gameplay trends, update trends, review trends, best trends, and theme trends. By creating pivot tables and merging dataframes together, we were able to combine multiple dataframes into one and create a bar graph with the desired output.

Additional Resources

If you want to learn more about ggplot2 or gtrendsR, here are some additional resources:


Last modified on 2024-03-09