What Word Appears the Most in Shoreline Mafia Lyrics?

A while back, a friend and I got into a passionate debate about the most commonly used word in Shoreline Mafia lyrics. Shoreline Mafia, a West Coast rap collective, produces hedonistic lyrics that focus heavily on drugs, partying, and sex, so there were a few obvious contenders for what the word could be. However, I wasn’t satisfied with a rough guess, so I set out to determine once and for all the most frequently recurring word in their lyrics.

For this data analysis, I chose to use R and R Markdown, mainly because I had just completed a class on R this past quarter and the information was still fresh in my mind.

For this project, I decided that the easiest way to source my data would be to go on Genius and utilize all the available lyrics for released Shoreline Mafia songs. For each available song, I copied and pasted the lines into one giant text file that would be broken down later.

In total, I took the lines from 39 songs. The resulting text file had around 2,500 lines of lyrics.

The first thing I did was clean up the text file I had created earlier. When my file was read into R, line break symbols were added to my text so I had to remove all of those. Furthermore, I deleted all punctuation and changed all the letters to lowercase to ignore any differences due to capitalization.
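A minimal sketch of this cleaning step in base R, assuming the lyrics arrive as a character vector of lines. The sample lines and the `clean_lyrics` helper name are placeholders for illustration, not the original file or code.

```r
# Hypothetical helper: collapse lines, strip punctuation, and lowercase.
clean_lyrics <- function(raw_lines) {
  text <- paste(raw_lines, collapse = " ")   # join all lines into one string
  text <- gsub("[\r\n]+", " ", text)         # drop any leftover line-break symbols
  text <- gsub("[[:punct:]]", "", text)      # delete punctuation
  tolower(text)                              # ignore capitalization differences
}

# Placeholder input; with a real file this could be readLines("lyrics.txt"),
# where the file name is an assumption.
sample_lines <- c("First line, of LYRICS!", "Second line... of lyrics?")
cleaned <- clean_lyrics(sample_lines)
print(cleaned)
```

Note that deleting all punctuation also removes apostrophes, so a contraction such as "don't" is counted as "dont".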

Once the data was cleaned, I broke apart my text string into individual words. Afterward, I applied the table() function to create a frequency table of all the words that were present in the lyrics of the 39 chosen songs. I then transformed the information from the table into a data frame. Although the data frame originally contained a row counting the occurrences of blank spaces, I decided to remove it as spaces aren’t exactly considered words.
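The steps above can be sketched roughly as follows in base R. The input string and the column names `word` and `count` are assumptions made for illustration; the double space in the sample mimics the gap left behind by a removed line break.

```r
# Placeholder cleaned text, not the real lyrics.
cleaned <- "we ride and we  shine and we go"

# Split on single spaces; consecutive spaces yield an empty-string token.
words <- strsplit(cleaned, " ", fixed = TRUE)[[1]]

# Frequency table of every token, converted to a data frame.
word_counts <- table(words)
word_df <- as.data.frame(word_counts, stringsAsFactors = FALSE)
names(word_df) <- c("word", "count")

# Drop the row that counts the empty strings, since they are not words.
word_df <- word_df[word_df$word != "", ]
print(word_df)
```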

With my data frame completed, I moved on to examining my data.

The first order of business was to calculate the occurrence frequency of every word used in the 39 chosen songs. The frequency was calculated by dividing the number of times a word was used by the total number of words in the lyrics of all 39 songs.
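As a rough illustration of that calculation, here is a sketch on a tiny data frame. The counts are made up and the column names are assumptions, not the article's data.

```r
# Placeholder counts standing in for the real word data frame.
word_df <- data.frame(
  word  = c("the", "i", "money", "ride"),
  count = c(5, 3, 1, 1),
  stringsAsFactors = FALSE
)

# Relative frequency: occurrences of a word divided by the total word count.
total_words <- sum(word_df$count)
word_df$frequency <- word_df$count / total_words

# The same value as a percentage, rounded to two decimals for display.
word_df$percent <- round(100 * word_df$frequency, 2)
print(word_df)
```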

Afterward, I thought it would be interesting to extract all the words that contained four or more letters.
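One way to do this kind of filter in base R is with `nchar()`. The sketch below uses placeholder words and counts and assumes the same hypothetical `word` and `count` columns.

```r
# Placeholder data frame, not the article's data.
word_df <- data.frame(
  word  = c("the", "i", "money", "ride", "and"),
  count = c(5, 3, 2, 1, 4),
  stringsAsFactors = FALSE
)

# Keep only the rows whose word has four or more characters.
long_words <- word_df[nchar(word_df$word) >= 4, ]
print(long_words)
```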

Once I completed the calculations and extractions, it was time to show the results.

For the sake of saving time and space, I decided to only display the top 20 most frequently used words.
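Picking the top entries could look something like this: sort by count in decreasing order, then keep the first rows. The `top_n_words` helper and the sample data are hypothetical.

```r
# Hypothetical helper: return the n most frequent words in a data frame
# that has a numeric `count` column.
top_n_words <- function(df, n = 20) {
  ranked <- df[order(df$count, decreasing = TRUE), ]  # most frequent first
  head(ranked, n)                                     # keep at most n rows
}

# Placeholder data; the real table would have far more rows.
word_df <- data.frame(
  word  = c("money", "the", "ride", "i", "and"),
  count = c(2, 5, 1, 3, 4),
  stringsAsFactors = FALSE
)

top3 <- top_n_words(word_df, n = 3)
print(top3)
```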

Table generated using R Markdown

Looking at this table, it appears that many of the most commonly used words in Shoreline Mafia lyrics are function words such as “and”, “a”, and “the”. This is not surprising, as function words are necessary to form a cohesive sentence. The most frequent word is “the”, which accounts for 4.1% of all words in the lyrics. “I” comes in a close second at 3.66%. Next, I displayed the top 20 words containing four or more letters.

Table generated using R Markdown

However, among the words with four or more letters, “bitch” occurs the most, at 1.7%. Overall, profanities seem to top the charts.

To better visualize the distributions and compare the words with each other, I then plotted only the top 10 most used words on a bar plot.
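A bar plot like this can be drawn with base R's `barplot()`. This is only a sketch: the plotting library actually used is not stated, and the words and frequencies below are made-up placeholders rather than the article's results.

```r
# Placeholder data: 12 dummy words with invented frequencies.
word_df <- data.frame(
  word      = paste0("word", 1:12),
  frequency = (1:12) / 100,
  stringsAsFactors = FALSE
)

# Keep the 10 most frequent words, most frequent first.
top10 <- head(word_df[order(word_df$frequency, decreasing = TRUE), ], 10)

# One bar per word, with heights shown as percentages.
barplot(
  height    = top10$frequency * 100,
  names.arg = top10$word,
  las       = 2,  # rotate the word labels so they stay readable
  ylab      = "Share of all words (%)",
  main      = "Top 10 most used words"
)
```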

Bar plot generated using R Markdown

This plot shows a sizable gap in frequency between the most used word, “the”, and the 10th most used word, “it”.

I did the same thing for the top 10 most used words that are 4 or more letters.

Bar plot generated using R Markdown

This time, there was not a big difference in frequency amongst the top 10 most used words that are four or more letters.

To answer my original question, “the” is the most commonly used word in Shoreline Mafia lyrics, while “bitch” is the most commonly used 4+ letter word. Looking towards the future, it would be interesting to compare the word usage frequencies between the artists in Shoreline Mafia to see if different artists had different word choice tendencies in their lyrics. Overall, I found this a fun and quick project to do while under quarantine.

The following is the code I used to generate the tables and plots in R Markdown:
