Vocabulary of rappers and Russian classics. About the vocabulary of famous writers, poets and rock musicians

Inspired by the idea of the largest vocabulary in hip-hop, research engineer Varun Jewalikar wanted to run a similar analysis over a wider range of artists from different genres. He went through the list of best-selling musicians and decided to dig deeper. It turned out that Eminem has the greatest variety of words in his lyrics.

The list is quite large (99 musicians and 25 genres), so to keep the analysis short and interesting I decided not to describe in detail how it was carried out. Having collected data from the Musixmatch website, I came up with the following analysis.

The 93 musicians from that list are sorted by genre. (93 rather than 99 because Musixmatch did not have permission to use the lyrics of Bruce Springsteen, Chicago, Def Leppard, Journey, The Beach Boys and The Doors, so they could not be included in the analysis.)

The goal is to compare the size of musicians' vocabularies. Some of them have released many more songs than others, either because of a longer career or because of their genre.

To keep differing song counts from skewing the analysis, I included only the 100 songs with the highest word count for each artist. Only 6 of the musicians have fewer than 100 songs, so this is a reasonable limit. In addition, 100 songs cover 8-10 albums, or 5 to 10 years of work, which gives a fair picture of a musician's overall vocabulary.
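The selection step can be sketched roughly as follows. This is only an illustration, not the author's code: the function names and the tokenizing regular expression are assumptions, and the source does not say how words were actually split.

```python
import re

# Assumed tokenizer: runs of Unicode letters, optionally joined by apostrophes
# (so "don't" counts as one word). The original tokenization is not documented.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def count_words(lyrics: str) -> int:
    """Return the number of word tokens in one song's lyrics."""
    return len(WORD_RE.findall(lyrics))


def top_songs_by_word_count(songs: list[str], limit: int = 100) -> list[str]:
    """Return up to `limit` songs, ordered from most words to fewest."""
    return sorted(songs, key=count_words, reverse=True)[:limit]
```

An artist with fewer than `limit` songs simply contributes all of them, which matches the "100 (or fewer)" wording of the definitions.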

Here are the measures we'll look at:

Vocabulary (lexicon): The number of unique words (in any language) used by a musician in the 100 (or fewer) highest word count songs of their career.

Text content: The total number of words (in any language) used by a musician in the 100 (or fewer) highest word count songs of their career.

New Word Interval (NWI): The average number of words after which a musician uses a new word. It is the ratio Text content / Vocabulary. An NWI of n means that, on average, every n-th word in the artist's lyrics is one he or she has not used in these songs before.
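As a minimal sketch of these three measures, assuming lyrics are available as plain strings: the tokenizer, the lowercasing step and all names here are illustrative assumptions, not details taken from the original analysis.

```python
import re
from dataclasses import dataclass

# Assumed tokenizer: runs of Unicode letters, optionally joined by apostrophes.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


@dataclass
class VocabularyStats:
    vocabulary: int           # number of unique words
    text_content: int         # total number of words
    new_word_interval: float  # text_content / vocabulary


def vocabulary_stats(songs: list[str]) -> VocabularyStats:
    """Compute the three measures over an artist's selected songs."""
    # Lowercasing is an assumption, so that "Love" and "love" count as one word.
    tokens = [w.lower() for song in songs for w in WORD_RE.findall(song)]
    vocabulary = len(set(tokens))
    text_content = len(tokens)
    nwi = text_content / vocabulary if vocabulary else 0.0
    return VocabularyStats(vocabulary, text_content, nwi)
```

For example, two songs "Hello hello world" and "world of words" contain 6 words in total and 4 unique ones, giving an NWI of 1.5.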

There are only 4 rappers on the list and they are all at the top in terms of vocabulary size. Eminem comes first, followed by a wide margin by Jay-Z, 2Pac, Kanye West and The Black Eyed Peas. Eminem also has the highest average number of words per song, 1018.5.

Given how vivid and descriptive his songs are, it's no wonder Bob Dylan ranks so high. He also ranks fairly high on New Word Interval (#11), averaging a new word after every 9 words.

These superstars have released songs in several widely spoken languages. Their vocabularies across those languages were added together, which put them fairly high in the overall ranking. I did not expect this result when I started the analysis.

I didn't expect a pop sensation like her to rank so high, since pop songs tend to rely on simplicity. She is also the only artist in the top 15 both by vocabulary size and by total number of certified albums sold.

And who said that songs cannot be sold without lyrics?

The average vocabulary size among all musicians is 2,677 words. About 40 musicians have a vocabulary within 400 words of that average. Get your lyrical vocabulary into this range and you will become one of the best-selling artists.

The three best-selling artists of all time rank fairly low in vocabulary size. Not surprisingly, the simplicity of their songs breaks barriers of geography, age and language, and they are revered all over the world. By contrast, Mariah Carey ranks quite high on both lists (9th in sales and 20th in vocabulary size).

The following table shows the average vocabulary of artists in different genres. The number of artists representing each genre is given in brackets. Since our list contains only 93 musicians, these averages should not be generalized too far.

Some patterns stand out. Hip-hop is head and shoulders above all other genres. Folk takes second place, but since it has only one representative on the list (Bob Dylan), this is not a meaningful indicator. Pop is the genre with the largest number of musicians, and its average vocabulary (2,464 words) is close to the average vocabulary of all artists (2,677 words). The same applies to rock.

There is a wide variation in vocabulary sizes within the top 93 best-selling artists, and there is essentially no relationship between a musician's commercial success and the size of their vocabulary.

Do not take this analysis to mean that one artist is better than another; it is simply another look at the work of these wonderful artists. We just get a glimpse into the minds of various songwriters: some can rip your heart out with a couple of lines, while others paint complex, intricate images with a thousand words. A line from a John Lennon song sums up the whole dilemma quite well: "Half of what I say is meaningless, but I say it just to reach you."

All song lyrics and other data (pictures, albums, tracklists) are taken from the Musixmatch API. Python was used for data processing and lyrics analysis. The analysis could be improved by removing vocalizations (ou, aaa, etc.) and other words that are not in the dictionary. The data and code can be published if anyone is interested.

The largest vocabulary in hip-hop compares the vocabularies of various musicians based on the first 35,000 words of their lyrics. Instead of fixing the word count, we took the 100 songs with the most words. Just out of curiosity (and for the sake of completeness), we also applied the fixed-count method to the first 10,000 words written by each artist. The results of the two approaches are not very different: the top five musicians are unchanged, and the top ten contains the same artists with slight changes in order. Andrea Bocelli moved from No. 8 to No. 6, while the Black Eyed Peas moved from No. 6 to No. 7 and Julio Iglesias from No. 7 to No. 8. There are no other noticeable changes. So we kept 100 songs as the limit because it is the more musical unit.
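The fixed-count variant, which counts unique words among an artist's first N words, might look like the sketch below. The function name, the tokenizer and the assumption that `songs` is already in the desired order are all hypothetical; the source does not say how "first" was determined.

```python
import re
from itertools import islice

# Assumed tokenizer: runs of Unicode letters, optionally joined by apostrophes.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def unique_words_in_first_n(songs: list[str], n: int) -> int:
    """Count distinct words among the first `n` word tokens across `songs`.

    `songs` is assumed to be ordered already (for example chronologically).
    If fewer than `n` words are available, all of them are used.
    """
    tokens = (w.lower() for song in songs for w in WORD_RE.findall(song))
    return len(set(islice(tokens, n)))
```

Calling it with `n=35_000` and `n=10_000` on the same artists would give the two cut-offs mentioned above, so the resulting rankings can be compared side by side.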

The Yandex.Music service has drawn a map of the most popular words in Russian rap. Take a look at it and read the service's research.

How was this done? "For each word, we calculated how often it occurs in the lyrics of rappers and of all other performers (only lyrics available on Yandex.Music were used - The Flow's note). To avoid overestimating the frequency of words that are repeated many times within one song (for example, in the chorus), each word was counted only once per track. The first frequency was divided by the second: the higher the resulting figure, the more characteristic the word was considered. Only verbs, nouns and adjectives found in both corpora were taken into account."
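The calculation described in the quote can be approximated as follows. This is a sketch under stated assumptions, not Yandex.Music's implementation: tracks are assumed to arrive as lists of already normalized tokens, "frequency" is taken to mean the share of tracks containing the word, and the part-of-speech filter (verbs, nouns, adjectives) is assumed to have been applied beforehand. All names are hypothetical.

```python
from collections import Counter


def track_frequency(tracks: list[list[str]]) -> dict[str, float]:
    """Return, for each word, the share of tracks in which it appears."""
    counts: Counter[str] = Counter()
    for tokens in tracks:
        counts.update(set(tokens))  # count each word at most once per track
    total = len(tracks)
    return {word: count / total for word, count in counts.items()}


def characteristic_words(
    rap_tracks: list[list[str]], other_tracks: list[list[str]]
) -> list[tuple[str, float]]:
    """Rank words by rap frequency divided by frequency elsewhere.

    Only words present in both corpora are scored, as in the description.
    """
    rap_freq = track_frequency(rap_tracks)
    other_freq = track_frequency(other_tracks)
    shared = rap_freq.keys() & other_freq.keys()
    scores = {word: rap_freq[word] / other_freq[word] for word in shared}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Words at the top of the returned list are the most characteristic of rap, and words at the bottom are the least typical.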

Which words are most characteristic of rappers? "The most characteristic words for rap and hip-hop turned out to be, in fact, 'rap' and 'hip-hop'. Rappers generally talk a lot about their music and the process of making it. The words 'track', 'microphone', 'beat', 'rhyme' or, for example, 'album' are as characteristic of the genre as obscenities or slang such as 'car', 'hut' and so on. The words least typical of rap are 'moon', 'spring', 'bird', 'rain', 'river', 'wing', 'silence', 'heart' and so on."

How have popular words changed from the early 90s to the present day?


In addition, the service can show the most popular words in the work of an individual artist: to do this, enter the artist's name in the corresponding field.

That's what we did.

Given the growing interest in battle rap and in the hip-hop industry as a whole, here is a detailed glossary (rap dictionary) so that you can understand what MCs are talking about in their battles.

Autotune - A voice processing and correction program, used to correct the performer's singing to match the notes. In rap the name has become generic and stands for any voice correction program.

Underground - A number of artistic movements in contemporary art (in music, literature, cinema, fine arts, etc.) that set themselves against mass culture and the mainstream.

Acapella - Vocals recorded on a microphone, separately from the minus.

Battle - A competition between rap artists, usually accompanied by humiliation of the opponent. A battle track is often nothing more than a diss aimed at the opponent. Battles are divided into online battles (held on the Internet) and live battles (held in person).

Beat - The drum and bass line of a minus. Originally, rappers performed over a percussion part created by beatboxing or by tapping on objects, and that part was what "beat" meant in rap. Today any music that is rapped over is called a beat.

Beatbox - A beat created solely with the mouth, without the use of musical instruments.

Beatmaker - A person who creates beats in specialized programs such as Cubase, FL Studio and others. A mark of skilled beatmaking is using instruments recorded live rather than samples.

Beef - Enmity between rap artists, crews or labels, accompanied by disses and frequent face-to-face confrontations.

Bootleg - A pirated collection of tracks that the artist may never even find out about.

Backs (backing track) - An additionally recorded audio track in which the performer usually pronounces only the second part of the line or emphasizes rhymes and phrases.

Backing vocalist - A person who helps the performer on stage. As a rule, he delivers the second part of the line so that the performer has a chance to take a breath.

Versus - One of the two most popular live battles in Russia. Based in St. Petersburg.

Ghostwriter - A specialist who writes lyrics for money.

Double time - Rapping at twice the speed of the music's rhythm. Prominent representatives of this style are Ceza, Tech N9ne, FIKE, Dom1no and other performers.

Double rhymes (Double-rhyme) - The end of a line contains two words at once, each of which is rhymed in the next line, again with two words. That is, if the first line ends with "brain and heart", you need one rhyme for "brain" and a separate rhyme for "heart". For example, "a poster for the door" (poster with brain, door with heart; the words rhyme in the original Russian).

Diss (from "disrespect") - A track aimed at another artist, or at someone or something else, with the goal of "bringing them down". Such tracks feature obscene language, insults aimed at the opponent and his relatives, threats, below-the-belt jokes, etc. Disses are often used in beefs.

EP - A short album, usually up to 7 songs.

Sound engineer - A specialist who mixes and masters tracks.

Indabattle (Platypus) - A battle held on the portal indarnb.ru. The second largest battle in Russia. It bears the slang name "Platypus" because the father of the battle's main organizer (Snake) owns the Utkonos chain of stores ("utkonos" is Russian for platypus).

Instrumental - A synonym for "beat" in its general sense.

Cover - A new version of a track, re-recorded by another artist.

Kapa - Slang name for "acapella" (in Russian the word literally means "mouth guard").

Square rhymes - Rhymes are placed at the end of the line, and the rhyming words share the same endings. Examples are "ruka - muka" ("hand - flour") and "gora - pora" ("mountain - time"). This is considered the easiest way to rhyme.

Concert director - A specialist responsible for organizing the performer's concerts.

Crunk - A style of Southern rap with repetitive phrases and fast dance rhythms.

Live - An audio or video recording from a performer's concert. As a rule, the "live" mark is placed in the title of the track to make clear that this is not a studio version but a concert recording.

Label - 1) Abroad, a label is a record company that holds the rights to release and distribute performers' albums. 2) In Russia, a rap collective is called a label. Often such a collective is united primarily by a shared studio.

Mic - Microphone.

Mastering - The final stage of work on a song, which is meant to make a well-mixed track louder, brighter, cleaner and more transparent, and to bring it to the same volume level as popular commercial tracks. Minor errors made during mixing can also be corrected at this stage.

Mix - Several pieces of music (tracks) arranged in a continuous sequence. As a rule, mixes are compiled by DJs for various purposes (for example, for radio broadcast in themed programs). Typically, mixes consist of tracks that are similar in genre, mood and other characteristics. On average, a mix lasts from 25 to 74 minutes.

Mixtape - 1) In foreign rap, this word means a release made from remixes or mixed tracks. 2) In Russian rap, a mixtape is a collection of tracks recorded over backing tracks illegally taken from other people's tracks. As a rule, mixtapes in Russia violate performers' copyrights. A mixtape can also be a collection of tracks recorded over minuses that beatmakers have released for public use.

Minus - A synonym for "beat" in its general sense.

Independent battle - A battle held on the website hip-hop.ru, organized not by the forum administration but by the forum members themselves.

Noname - An insufficiently popular or unknown performer who has no "name". Relatively objective indicators here are the number of audio tracks on VK, the number of concerts, and the number of people who attended them.

NR (New Rap) - The largest rap news community on VK.

Newschool - A new style of hip-hop whose distinctive features are fast flow, dashes, and various plugins and effects such as Melodyne and Auto-Tune.

Old school (Oldschool) - An early style of hip-hop. Prominent representatives of this style are 2Pac, Wu-Tang Clan and Onyx. It is usually a measured delivery without many effects or fast flow.

Official battle - A battle held on the website hip-hop.ru, organized by the forum administration. The largest battle in Russia.

Punch, punchline - A laconic phrase or line designed to hit an opponent. It can be either a vivid metaphor or a below-the-belt joke. "The presence of an opponent is not necessary. It's like the closing line of a joke. Just a catchy phrase or line."

Part - The section written by one artist on a joint track.

Dashes, acceleration - The foundation of fast flow: an increased speed of delivery.

PR - Promoting the artist's work, or spreading any information and offers of services.

Delivery - The emotion put into the performance, the placement of intonation, the way words are pronounced, and the use of vocals, dashes, acceleration and other rap-specific techniques.

Producer - A specialist who is fully involved in promoting the performer and handles all legal and financial matters. Producers often register the performer's name (nickname) in their own name, and when the performer changes producer he is forced to change the nickname, since all rights to the old one belong to the former producer. For this reason, Loc-Dog was forced to change his nickname to Loc Dog.

Promo - A release intended to introduce listeners to a particular artist's work.

Release - The premiere of an album, track, video or compilation.

Remix - A new arrangement of an already released track.

Rapcore - A subgenre of rock music characterized by the use of rap as vocals. Rapcore combines the instrumental and vocal properties of such genres as punk, alternative rock, and hip-hop.

Mixing - The stage of working on a song during which the recorded audio tracks (instruments, main vocals, takes, etc.) are combined into one audio file using various devices and techniques, such as equalization, compression, volume manipulation, placement in space, and adding sound effects. Note: vocal correction and synchronization of takes and backs are not part of mixing; they belong to the preceding stage, editing.

Swag - An expression of coolness and individuality.

Skills - Delivery and various types of rhyme construction.

SlovoSpb - One of the two most popular live battles in Russia. Based in Krasnodar.

Compound rhymes - The end of one line rhymes with several words at once in the next line. Example: "Apocalypse - While you are healing" (the words rhyme in the original Russian).

Storytelling - A track that tells a story, describing in sequence the events, actions and deeds of real or fictional characters.

Sample - A relatively small fragment of melody (music), taken as the basis for creating a minus. Beats are layered over the samples.

Take - A recorded fragment, an attempt. Example of use: "I recorded it all in one take", i.e. in one try.

Track - A synonym for the word "song" in rap.

Triplet - A musical meter. In rap, the term is now commonly used for rapping with broken patterns and tricks such as tongue twisters, acceleration, etc.

True - A performer who raps the truth, that is, what he really thinks and does and what has happened in his life.

Platypus - Slang name for Indabattle.

Fastflow - A delivery style built on dashes and accelerations.

Feat (ft. or feat.) - Indicates a joint track by two or more artists.

Flayva (flave) - A party, company, group or label.

Flow - Speed of delivery.

Freestyle - Improvisation in rap: rapping lyrics composed by the performer on the fly.

Fake - Performers whose lyrics are based on lies. Their characteristic feature is considered to be an inability to "answer for their words".

Hype - Enthusiastic rumors, often deliberately inflated for marketing purposes.

Hustle - Any type of income related to rap or to breaking the law (selling drugs, etc.).

Hater - A listener who condemns any creative work and has an acute dislike for it.

Homie - A friend or someone close.

Hh.ru (slang: "khurma", Russian for persimmon) - One of the most popular forums dedicated to hip-hop culture, hip-hop.ru.

If anyone needs the full text, you are welcome to visit the site. I'm not posting a link because it would get removed, but it is easy to find from that site's main page if you go through the fonts.

So, specifically for the site, bodies of text by famous cultural figures were analyzed, each of which had to contain exactly 25,000 words. The number of unique words was counted by a dedicated program.

Interesting conclusion #1

Other interesting findings (subjective opinion)

The poorest vocabulary is in the songs of Dima Malikov (well, this is no complaint against Malikov; many people write for him, Shaganov and others). The most extensive belongs to the writer Vladimir Sorokin.

Rosenbaum and Lermontov have approximately the same figures; both are almost in the middle of the rating.

In prose, Sorokin is in the lead and Lermontov comes last. But the rating itself starts at 4,000+ (that's a lot). Akunin scored a little more than Gogol. Dovlatov and Chekhov are almost equal. Pelevin is second after Sorokin. Lev Nikolaevich, lecturing us from the picture, is near the bottom))

In poetry the leader is Pushkin, Our Everything (predictable, right?), and Mayakovsky comes last. The lower bound is about 2,000, but Vladimir Vladimirovich is slightly below it. In general there are not enough participants. Here Lermontov overtook many, unlike in prose. However, he is in third place, far behind second place, which goes to V.S. Vysotsky.

Rap is represented quite extensively. Dolphin comes last (though still at 2,475), and the leader is a certain Noggano (6,584). Timati is third from the bottom.

Pop music naturally starts with figures below 1,000 (Malikov and Na-Na), and Rosenbaum is in the lead (what is he doing here? I guess they didn't know where to put him). And in second place, don't faint, is Mikhail Krug (!!! 3,741; I think the vocabulary there is specific). Mumiy Troll, with a little over 3,000, is in third place. Zemfira looks average, at just under 2,000.

Finally, rock musicians. There aren't many of them either. Viktor Tsoi comes last (1,861), and the leader is Andrei Makarevich (5,874, which surprised me). Right behind him are Egor Letov and Grebenshchikov (these two did not surprise me). In the middle is Sasha Vasiliev (almost 4,000; I thought it would be more). For some reason Auktyon didn't even reach 2,500; I expected more.

That's about it. It is clear that the writer’s vocabulary and the artistic value of the work are not the same thing. But it was still interesting for me to look at the calculations.
