Saturday, November 8, 2008

Word Frequency Measurement

I came across a fascinating product today that Google operates. The product, Google Trends, visually displays information about the search queries people enter into Google. It's unbelievably comprehensive. Aside from providing a line graph of relative search volume since 2004, it also lets you narrow your search to particular regions of the world or even individual countries. For anyone who wants to use Google as an advertising medium on the Internet, this feature is without doubt a must. Understanding where and why people search for the terms they do is a critical capability that sets Google apart from the rest.

You can also restrict your search term trend to a particular span of time. So, if you are only interested in how people have searched for these words within the past 30 days, you can set that option very easily. If you are interested in how often a search term was queried in a particular month since January 2004, Google Trends lets you set that as well.

The graph has an interesting dependent variable called the Search Volume Index. On Google Trends' About page, it is defined as "how many searches have been done for the terms you enter, relative to the total number of searches done on Google over time."
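Google does not publish the exact formula, but the definition above implies a ratio: searches for a term divided by all searches in the same period. The Python sketch below is only an illustration under that assumption. The function name `search_volume_index`, the choice to rescale so the term's average share equals 1.0, and all of the counts are hypothetical, not Google's actual method or data.

```python
def search_volume_index(term_counts, total_counts):
    """Hypothetical sketch, not Google's method: each period's share of
    all searches, rescaled so the term's average share over the range is 1.0."""
    if len(term_counts) != len(total_counts):
        raise ValueError("term_counts and total_counts must be the same length")
    # Fraction of all searches in each period that were for this term.
    shares = [t / total for t, total in zip(term_counts, total_counts)]
    mean_share = sum(shares) / len(shares)
    # Express each period relative to the term's own average share.
    return [s / mean_share for s in shares]


# Made-up numbers: the raw count for the term doubles, but total traffic
# doubles too, so the term's share (and therefore its index) stays flat.
term = [100, 150, 200]
total = [10_000, 15_000, 20_000]
print(search_volume_index(term, total))
```

The point of the example is the "relative to the total number of searches" part of the definition: a rising raw count does not by itself mean a rising index.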

Just below the graph are three columns that are unbelievably helpful in understanding exactly how the search term you're looking at is used. The first column lists regions, ranking countries by usage. Next to it is an even more specific look at where the search term is queried, ranking cities by usage. As you will see later, the city in which a search term is queried the most is quite intuitive. Finally, the last column ranks the languages in which the term is most often queried. Most of the tests I've performed showed the most usage in English; however, it is fascinating to see how the remaining, non-English queries for a search term rank by language.

Probably one of the coolest features of Google Trends is its pairing with relevant news articles. Below the primary search volume graph is another graph called News Reference Volume: "This graph shows you the number of times your topic appeared in Google News stories." When search volume spikes, Google Trends automatically flags the occurrence and links the spike to an actual news article, which probably explains the spike.

It's an amazing feat of computational engineering. When it comes to understanding what people are searching for on the Internet, I cannot think of a better source than Google Trends. Google has approximately 60% market share of all search queries, and this data is contingent on that sample. Google Trends is comprehensive and provides users with the relevant information they are looking for.

To gain a further understanding of Google Trends, I ran a quick study to familiarize myself with the options and processes available.
  • Google Trends Study
This study was conducted on November 8, 2008 at approximately 7:30pm. The purpose of the study was to gain familiarity with Google Trends and make comparisons of search term usage for a randomly sampled set of search terms across regions.

The study sampled three words from each of three randomly sampled individuals living in my house. Participants were asked to "provide three words that could be possible search queries in a well known search engine, like Google." The participants sat together and recited their words aloud, which ensured that no one provided duplicate words.

After each participant gave me three randomly selected words, they were asked to "provide a country somewhere in the world aside from the United States." Each participant then selected a country in the same fashion as they had provided the words. Some participants took longer than others; one participant was still selecting words while another had already provided both their words and country. How long participants take to finish their selections is not terribly important; however, they should finish within a reasonable amount of time.

Participants were also asked to predict the relative search volume of their three words from highest to lowest across the world.

The following table summarizes each participant's selections and the order in which they expect their search volume to be from highest to lowest:
As you can see, there was a rich diversity in the words that were selected. Participant 2 chose more proper nouns than the other participants, who chose mostly common nouns, but the words are still a sufficiently random sample.
  • Participant 1
[To see worldwide results please click each of these respective words: Drugs, Cowboy, Cardboard]

Upon requesting search volumes for the three words provided by Participant 1 in Djibouti, Google Trends was unable to provide search volume information citing the following:
Your terms - ????? - do not have enough search volume to show graphs.
Suggestions:
  • Make sure all words are spelled correctly.
  • Try different keywords.
  • Try more general keywords.
  • Try fewer keywords.
  • Try viewing data for all years and all regions.
The six search terms provided by Participants 2 and 3 were then also searched for in the Djibouti database, returning the same message. Unfortunately, there do not appear to be enough queries for these words from Djibouti to produce graphs. This is an unfortunate finding. It appears that Google Trends, while comprehensive, is indeed fallible.
After trying search queries that I thought would have enough volume to graph (United States, Barack Obama), I eventually entered "Djibouti," which finally produced results. With results restricted to Djibouti, the search term "Djibouti" is most commonly searched for in the city of Djibouti, and the most common language for the query within Djibouti is French. There was a spike in search volume for "Djibouti" in Djibouti on June 12, 2008, when an article entitled "UN council condemns Eritrean attack" was published.

Compare this with searches for the term "Djibouti" from all regions around the world. "Djibouti" is most commonly searched for in regions such as Djibouti, the United Arab Emirates, Morocco, and France. Cities where "Djibouti" is most commonly searched for include Dubayy, UAE; Ottawa, Canada; and Rennes, France. French, English, Swedish, and Dutch are the most common languages when searching for "Djibouti."

Participant 1 correctly predicted the relative order of the search terms' worldwide volumes. Interestingly, the order in which the participant gave the words is the same as the predicted order from highest to lowest search volume. The graph visually depicts the relationship between Drugs, Cowboy, and Cardboard in terms of their worldwide search volume.
  • Participant 2
[To see worldwide results please click each of these respective words: Synthesizer, Rex Grossman, Mr. Feeney]

As with Participant 1, the words provided by Participant 2 did not have enough search volume in Malta for a graph to be displayed. This is another unfortunate result. I used the same technique for Participant 2 as for Participant 1 in attempting to find a graph, and the other six search queries did not produce a graph either.

To get some results, I entered the search query "Malta" in the Malta Google Trends database. Fortunately, this produced a graph, along with relevant news articles that coincided with spikes in volume.

In Malta, the search term "Malta" is most frequently searched for in Msida, San Gwann, and Valletta, all cities in Malta, in that order. The most common languages for searching "Malta" within Malta are, by frequency, German, English, and Maltese. One of the highest peaks in News Reference Volume for the query "Malta" within the Malta database was on November 20, 2007, when an article entitled "Queen to Celebrate in Malta" was published on News24.com.

Compare these results to worldwide trends for "Malta." Some of the most popular regions searching for "Malta" are Malta, Ireland, the United Kingdom, Italy, and Austria. Some of the most popular cities are the three Maltese cities already mentioned, followed by Poznan, Poland; Dublin, Ireland; and Thames Ditton, United Kingdom.

Participant 2 correctly predicted the order of relative search volume. Synthesizer has a higher relative search volume than Rex Grossman, and since there is not enough data on Mr. Feeney to draw a graph, one can deduce that Rex Grossman has a higher relative search volume than Mr. Feeney. This is visually represented below.
  • Participant 3
[To see worldwide results please click each of these respective words: Arsenic, Stencil, Magician]

As with the previous two participants, none of Participant 3's words (arsenic, stencil, and magician) produced graphical results in the Belgium database of Google Trends. On further review, some words provided by Participant 1 (Drugs and Cowboy) did produce results, but for fairness to each participant, I will conduct the same analysis as I did for the prior two.

Therefore, I will share results from querying the search term "Belgium" in the Belgium database. The most prominent subregions that query "Belgium" are Flemish Brabant, East Flanders, Brussels, and Luxembourg. Major cities querying "Belgium" include Leuven, Gent, and Brussels. The languages in which "Belgium" is most often queried are English, Dutch, and French (in that order).

Recently, there was a surge in News Reference Volume for the query "Belgium" within Belgium, linked to a story entitled "First Industrial to Invest in New State-of-the-Art Logistics Facilities in Belgium." The story was published on October 6, 2008, and produced the highest News Reference Volume ever recorded for the query "Belgium" in the Belgium database of Google Trends.

Compare this to worldwide searches for the query "Belgium." The major regions searching for "Belgium" are Belgium, Luxembourg, Ireland, the United Kingdom, and the Netherlands. Cities outside Belgium that most frequently query "Belgium" include London, United Kingdom; Amsterdam, Netherlands; Sydney, Australia; and New York, New York. Around the world, the most commonly used languages to query "Belgium" are Dutch, French, English, German, and Italian (in that order).

Participant 3 incorrectly predicted the relative search volumes of the three randomly selected words. Based on worldwide Google Trends data, the correct order of Arsenic, Stencil, and Magician from highest to lowest relative search volume is Stencil, Magician, Arsenic. Stencil is searched for about 2.65 times as often as arsenic, and Magician approximately twice as often as arsenic. The graph below reveals this relationship.
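Scoring a prediction in this study comes down to ranking the words by volume relative to a common baseline and comparing that ranking with the predicted order. The Python sketch below uses only the two ratios reported above, with arsenic as the baseline (1.0). "Approximately twice" is entered as 2.0, so Magician's value is an approximation. The helper `is_prediction_correct` is hypothetical, and the sample prediction passed to it is only an example, not the order Participant 3 actually gave.

```python
# Relative worldwide search volume, with arsenic as the baseline (1.0).
# Stencil's 2.65 comes from the comparison above; Magician's
# "approximately twice" is rounded to 2.0 here.
relative_volume = {
    "Arsenic": 1.0,
    "Stencil": 2.65,
    "Magician": 2.0,
}

# Sort the words from highest to lowest relative volume.
ranking = sorted(relative_volume, key=relative_volume.get, reverse=True)
print(ranking)  # ['Stencil', 'Magician', 'Arsenic']


def is_prediction_correct(predicted, volumes):
    """Hypothetical helper: True if the predicted highest-to-lowest
    order matches the order implied by the volumes."""
    actual = sorted(volumes, key=volumes.get, reverse=True)
    return list(predicted) == actual


# Example prediction only; not necessarily the one Participant 3 made.
print(is_prediction_correct(["Arsenic", "Stencil", "Magician"], relative_volume))
```

Only the order matters for this kind of scoring, so the rounding of Magician's ratio does not change the result as long as it stays between 1.0 and 2.65.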
  • Conclusion
The purpose of this study was to gain familiarity with the product Google Trends and make comparisons between the relative search volume of randomly selected words.

While Google Trends contains a wide breadth of data on search volumes, there is a desperate need for its regional databases to contain more comprehensive data. Queries in the Djibouti, Malta, and Belgium databases returned hardly any results for the randomly selected words. Either these databases should not be provided to begin with, or they need to be more comprehensive.

Two of the three participants in the study correctly predicted the relative search volume of their terms. This suggests that the relative volumes Google Trends reports are often intuitive. However, Participant 3's results show that the relative search volumes of particular words can be harder to predict.

Overall, Google Trends is a phenomenal resource that provides superb measurements of the relative search volumes of typical search queries performed on the Internet.
