Most current college students are aware of RateMyProfessors.com, the go-to review site for higher ed. Students, of their own volition, may review both professors and colleges to help future students make decisions. As with any review site, reviews range from scathing to celebratory, but how accurate are they?
As a small introduction to the idea, I took review and teacher data from RateMyProfessors for the entirety of Brigham Young University - Idaho (my alma mater). What follows are the graphical and statistical insights I found most fascinating from this data set.
To collect the data, I found an API on GitHub for scraping RateMyProfessors, then modified the Python code, fixing it and hitting it with a hammer until I got what I wanted. It took considerable time, since this was my first experience web scraping with Python.
Unfortunately, I forgot to tie the professor's name to each review, but the second table shows the overall rating and other data collected about each professor (in case a professor is interested in looking up what data is collected on them).
# Interactive preview of the first 500 individual reviews
datatable(ratings %>% head(500))

# Interactive preview of the first 500 professor records
datatable(prof %>% head(500))
The data available from RateMyProfessors contains 76,958 student reviews. The largest single group of reviewers does not list a grade at all, but among those who do, there is a disproportionately high number of A's, and higher grades are more common than lower ones.
ggplot(ratings, aes(teacherGrade)) +
  geom_bar(fill = "steelblue") +
  theme_bw() +
  labs(
    title = "Student Grades Listed In Review",
    subtitle = "RateMyProfessor reviews for BYU-Idaho as of December 2024",
    x = "Letter Grade",
    y = "Number of Individuals"
  ) +
  theme(
    plot.title = element_text(size = 20)
  )
table(ratings$teacherGrade) %>% pander()
| A+ | A | A- | B+ | B | B- | C+ | C | C- | D+ | D | D- |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 7949 | 16926 | 5334 | 4013 | 3240 | 1273 | 801 | 742 | 408 | 134 | 114 | 64 |

| F | P | INC | Audit/No Grade | WD | Not sure yet |
|---|---|---|---|---|---|
| 128 | 9 | 172 | 26 | 293 | 1454 |
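The reviews with no grade listed don't show up in the table above. As a quick sketch for checking the size of that group (assuming unreported grades are stored as `NA` or an empty string, which I haven't verified):

# Count reviews that do not list a grade (assumes blanks are stored as NA or "")
no_grade <- sum(is.na(ratings$teacherGrade) | ratings$teacherGrade == "")
no_grade                   # number of reviews with no grade reported
no_grade / nrow(ratings)   # share of the 76,958 reviews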
The graph below compares the grade received to the teacher rating given. Individual reviews are grouped by the grade the reviewer received, and the rating is averaged within each group. The graph covers every BYU-Idaho review on RateMyProfessors that reported a grade received. The linear relationship is visibly, strikingly strong.
ratings %>%
  group_by(teacherGrade) %>%
  summarize(
    ave_rating = mean(rOverall)
  ) %>%
  ungroup() %>%
  # keep only standard letter grades (letter_grades defined elsewhere)
  filter(teacherGrade %in% letter_grades) %>%
  ggplot(aes(teacherGrade, ave_rating)) +
  geom_point(color = "red", size = 3) +
  theme_bw() +
  labs(
    title = "Students Hate Getting Bad Grades (On Average)",
    subtitle = "RateMyProfessor reviews for BYU-Idaho as of December 2024\nPoints are average teacher rating given for each grade received",
    x = "Letter Grade Received",
    y = "Average Teacher Rating (1-5)"
  ) +
  theme(
    plot.title = element_text(size = 20)
  )
An interesting exception to the linear relationship is the D+ group. This could be because these students effectively failed: they may be frustrated that their teacher didn't bump their grade up slightly so they wouldn't have to retake the class.
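For anyone who wants to inspect this exception numerically, here is a small sketch using the same data frame and columns as above:

# Average rating and review count for the grades around the D+ exception
ratings %>%
  filter(teacherGrade %in% c("C-", "D+", "D", "D-", "F")) %>%
  group_by(teacherGrade) %>%
  summarize(
    ave_rating = mean(rOverall),
    reviews = n()
  )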
While the averaged graph shows this trend most clearly, I find it important to get as close to individual-level data as is reasonable rather than relying solely on high-level summaries. The 2D density plot below shows how many reviews fall into each grade/rating combination, normalized by the size of each grade category. There is plenty of visible variation in the grades received and ratings given, but the downward trend is still apparent.
# Count reviews for each grade/rating combination
just_grade_rating <- ratings %>%
  select(teacherGrade, rOverall) %>%
  filter(teacherGrade %in% letter_grades) %>%
  group_by(teacherGrade, rOverall) %>%
  summarize(frequency = n())

# Total number of reviews in each grade group
grade_count <- ratings %>%
  group_by(teacherGrade) %>%
  summarize(grade_count = n())

# glimpse(grade_count)
# glimpse(just_grade_rating)

# Normalize each combination by the size of its grade group
just_grade_rating <- just_grade_rating %>%
  left_join(grade_count, by = "teacherGrade") %>%
  mutate(
    normalized_frequency = frequency / grade_count
  )
ggplot(just_grade_rating,
       aes(teacherGrade, rOverall,
           size = normalized_frequency,
           color = normalized_frequency^0.25)) +  # fourth root compresses the color scale
  geom_point() +
  scale_size(range = c(1, 10)) +
  scale_color_gradient(low = "#F0FFF0", high = "steelblue") +
  theme_bw() +
  labs(
    title = "Students Hate Getting Bad Grades (Frequently)",
    subtitle = "RateMyProfessor reviews for BYU-Idaho as of December 2024\nDensity normalized by number of reviews with the same grade received.",
    x = "Letter Grade Received",
    y = "Teacher Rating Given (1-5)"
  ) +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 20)
  )
To explain this chart a little more: the upper-left point represents the number of people who received an A+ and gave their teacher a rating of 5. Because there are disproportionately more A+ reviews than D reviews, each point is normalized by the number of reviews in its grade group; the rating-5 point in the A+ group, for example, is divided by the total number of people who received an A+. Without normalizing, the chart mostly depicts how many people received each grade rather than how ratings are distributed within each grade.
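For anyone following along in code, that single point can be pulled out of the prepared data directly:

# The normalized frequency plotted for the (A+, rating = 5) point
just_grade_rating %>%
  filter(teacherGrade == "A+", rOverall == 5)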
Notably, this demonstration is only correlational, though it is still useful as a point of interest and further investigation. This analysis doesn't establish causality; there is simply a very interesting correlation, and it could be interpreted in several ways. I would suppose the RateMyProfessor rating is not purely a measure of how good a teacher is, how friendly they are, or how highly esteemed they are. It seems to be a blend of all of those things, or a separate measure in and of itself.
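One rough way to put a number on that correlation is to map letter grades onto a grade-point scale and correlate the result with the rating given. This is only a sketch I haven't run for the post, and the point mapping below is my own assumption rather than anything used elsewhere in the analysis:

# Assumed grade-point mapping (not an official BYU-Idaho scale)
grade_points <- c(
  "A+" = 4.0, "A" = 4.0, "A-" = 3.7,
  "B+" = 3.4, "B" = 3.0, "B-" = 2.7,
  "C+" = 2.4, "C" = 2.0, "C-" = 1.7,
  "D+" = 1.4, "D" = 1.0, "D-" = 0.7,
  "F"  = 0.0
)

# Correlation between grade received (on the assumed scale) and rating given
ratings %>%
  filter(teacherGrade %in% names(grade_points)) %>%
  mutate(points = grade_points[teacherGrade]) %>%
  summarize(correlation = cor(points, rOverall))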