The graph above shows that rap is an extreme outlier in the number of expletives used per song. With nearly eight uses of profanity per song (counting the words “b***h”, “s**t”, “n***a”, and “f**k”), rap towers above the next closest genre, R&B, which averages fewer than one expletive per song.
Visualizing this information in a different way can help to explain this phenomenon more clearly. Below is a box plot representation of the same data:
In this plot, one can see that the distribution for the rap genre is right-skewed, which pulls its mean number of expletives higher. Additionally, a number of extreme outliers, like Tyga's "Rack City", which contains 71 expletives (the most in the dataset), pull the mean up as well. So, while the average is high compared to other genres, there are still plenty of rap songs that are clean, just as there are plenty of songs from other genres that contain explicit language, even if the majority of other genres' songs are clean (making their box plots barely visible).
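To make the effect of skew and outliers on the mean concrete, here is a minimal sketch in Python. The counts are hypothetical and are not taken from the dataset; only the single value of 71 mirrors the largest outlier mentioned above. The variable names are illustrative as well.

```python
from statistics import mean, median

# Hypothetical expletive counts for ten songs in one genre (NOT the real data).
# The last value mimics a single extreme outlier.
counts = [0, 0, 1, 2, 3, 4, 5, 8, 12, 71]

avg = mean(counts)                           # pulled upward by the outlier
mid = median(counts)                         # robust to the outlier
avg_without_max = mean(sorted(counts)[:-1])  # mean after dropping the largest value

print(f"mean: {avg:.2f}, median: {mid:.2f}, mean without max: {avg_without_max:.2f}")
```

In this toy example the mean sits well above the median, and dropping one song brings it most of the way back down, which is the same pattern a right-skewed box plot with outliers suggests.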
To combat this visibility problem, one can remove all the clean songs from the dataset. This is not best practice, since discarding data changes what the chart represents, but the filtered view can still convey valuable information: of the songs that do use profanity, how much profanity are they using?
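The filtering step described above can be sketched as follows. This is an illustration under assumptions, not the project's actual code: the rows, the `(genre, count)` layout, and the helper `mean_by_genre` are all hypothetical, since the dataset's real schema is not shown here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (genre, expletive_count) rows; NOT the real dataset.
songs = [
    ("rap", 0), ("rap", 6), ("rap", 14), ("rap", 0),
    ("pop", 0), ("pop", 0), ("pop", 0), ("pop", 4),
    ("rock", 0), ("rock", 0), ("rock", 5), ("rock", 0),
]

def mean_by_genre(rows):
    """Group counts by genre and return the mean count for each genre."""
    grouped = defaultdict(list)
    for genre, count in rows:
        grouped[genre].append(count)
    return {genre: mean(counts) for genre, counts in grouped.items()}

# Means over every song, clean or not.
all_songs = mean_by_genre(songs)

# Means over explicit songs only: drop rows with zero expletives first.
explicit_only = mean_by_genre([(g, c) for g, c in songs if c > 0])

print("all songs:    ", all_songs)
print("explicit only:", explicit_only)
```

Note that the two results answer different questions. The first describes a typical song in each genre, while the second describes a typical explicit song, so the filtered numbers should not be read as a genre-wide average.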
Now the plot is much easier to read, and the insight is largely the same. Rap still uses the most profanity, but among the songs that are explicit, the difference in explicitness across genres is relatively small compared to the previous chart.
For more about this graph in the context of the larger project, read the essay.
By: Nikhil Chinchalkar