
Thursday, December 4, 2008

Motive Imagery and The Spread of Ideas

A couple weeks ago I ran a quick survey on my website, Those Answers, in order to gather data about the effect (if any) that motive imagery may have on the spread of ideas. I was interested in finding out whether there is one particular class of motive imagery that people are more or less likely to disseminate.

Motive imagery is a psychological phenomenon rooted in the Thematic Apperception Test (TAT), in which a person tells an imaginative story about anything, sometimes guided by the use of images, sounds, or ideas. The language and tone that people use to express their thoughts, whether stern or sweet, hopeful or sad, angry or happy, can indeed tell you about the current psychological state of that individual.

Motive imagery is then broken down into three core categories (there is actually a fourth dimension to motive imagery, but that will not be explained in this post). The classifications of motive imagery are rather simple, but are phenomenally complex when one explores their intricacies and how they were originally determined.

There is achievement motive imagery, which is based on positive evaluations, a desire to do better, positive goal oriented performance, or unique acts performed. There is affiliation/intimacy motive imagery, which is based on the longing for togetherness, sad feelings about separation, or the desire to help in a genuine way. Finally, there is power motive imagery, which is based on force and regulation and a need to impress others.

Motive imagery was established in the early 1950s by David McClelland, a famed psychologist at Harvard. McClelland asserted that by collecting and categorizing data from these TAT tests, one could uncover deep, underlying motivations that people possess. His work has been built on further by David Winter at the University of Michigan and Richard Boyatzis at Case Western Reserve University.

I am currently conducting an in-depth research study on the role motive imagery plays in the corporate world. As I go on this adventure, my mind wanders, and I felt like gathering some data on another aspect of human psychology that motive imagery may influence: the dissemination of ideas.

When people hear information, they need to interpret it for themselves, process it, and store it; then, if they engage in social activities in which they can share that information, they disperse it if they decide it is worthy.

I think that the concept of the transfer of ideas is mind blowing. Essentially, an idea is birthed in someone's mind, is uttered in a usable form of language to another organism that can go through the process described above, and then that other organism, feeling so compelled by the information it received, makes a conscious decision to seek out other organisms with which it can spread the idea or information further.

There is a great deal of relevance to this dissection. In today's world, where information is readily available and accessible at all times almost anywhere on the planet (newspapers, phones, Internet, television, etc.), one ought to wonder how humans decide which ideas are the most pertinent, important, and, really, worthy of our own cognitive functioning.

As a result of this inundation of information and knowledge, advertisers and news media have started to rely more heavily on Word of Mouth Marketing (WOMM), which is considered the best form of sharing ideas because it has the greatest impact and lasting effect on the individual with whom the idea is being shared.

Thus, are there ways to ensure that this information is disseminated more frequently? How do you encourage your idea or information to be spread via Word of Mouth? Is there a way to improve the odds?

With these questions in mind, I conducted the following study which I explain here:
  • Idea Dissemination Study
Starting at 6:00 am on November 19, 2008 and running until 9:00 am on November 24, 2008, a group of 36 individuals were randomly sampled and asked to take a survey. The individuals were sampled from my current Friends list on my Facebook account.

I feel the sample is reasonably random based on the demographics of my friends, but in order to further substantiate its validity, I cross-checked the locations from which the survey was accessed via Google Analytics over the November 19 to November 24 time period, and there were 83 unique visits from 40 cities around the United States. A majority of the visits came from Ann Arbor, Michigan, but there were several from Bloomington, New York, Ypsilanti, Chicago, Northbrook, Palo Alto, Ft. Collins, Greencastle, Manhattan, etc.


The 36 individuals who filled out the survey were given the following information when filling out the survey, "Provided are some 'News Headlines' pulled from the same news source. Please determine which one of the three options provided in each case you would most likely share with another person."

The news source (newspaper, television, magazine, etc.) was left up to the individual. Future studies may want to classify which news source the participant is reading the "News Headline" from because there may be some variability between sources.

Participants were then asked to select a single headline from a list of three options. The three options provided headlines in one of each of the three forms of motive imagery (achievement, affiliation, and power). News headlines varied and were on several different topics from energy, to the Big Three, to the Economic Crisis, to Shrimp and Picnics.

Individuals were then asked, "Why?" after each selection, in which they could write down anything they wanted. Most of them made comments relating to their choice in picking the headline. It was an optional field for all five questions, and some fields were indeed left blank.

The participants then submitted their answers and were taken to a screen that explained the nature of the study and further reading that they could do on the topic.

The information that the participants submitted was collected in a database and then converted into an Excel document. The number of times each motive imagery category was selected was then calculated as a proportion of the total number of responses. One question was left blank by a participant, which means that there were 179 (instead of 180) responses.
  • Results
To analyze the data, I used a one-proportion z-test (equation seen below) in order to test each motive imagery category against the null hypothesis that each category is equally likely to be selected. Under the null hypothesis, each motive imagery category would therefore be selected one-third, or 33.3%, of the time.
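For reference, the one-proportion z-test statistic used here is z = (p̂ − p0) / sqrt(p0(1 − p0) / n), where p̂ is the observed proportion for a category, p0 = 1/3 is the proportion expected under the null hypothesis, and n = 179 is the total number of responses.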


A z-score is a standard score from which you can determine the p-value, or the probability of obtaining a result at least as extreme as the one actually observed, given that the null hypothesis is true. The p-value corresponds to an area under the normal distribution curve; the total area under the curve is 1, and the z-score is found on the x-axis. A z-score table gives the cumulative area to the left of a given z-score, and the upper-tail probability is found by subtracting that cumulative area from 1, essentially subtracting the shaded area from the whole graph, which gives the probability of observing a result as extreme as the outcome observed.
In order for a result to be considered statistically significant, its p-value ought to be below 5%, which is a fairly liberal threshold. In this experiment, I consider results with p-values below 5% to be statistically significant.

The results of the survey were:
  • Achievement Motive Imagery Sentences Selected: 60/179 or 0.335 (33.5%)
  • Affiliation Motive Imagery Sentences Selected: 73/179 or 0.408 (40.8%)
  • Power Motive Imagery Sentences Selected: 46/179 or 0.257 (25.7%)
By doing the one-proportion z-test for each of the three motive imagery categories, I arrive at the following z-scores:
  • Achievement Motive Imagery: 0.05
  • Affiliation Motive Imagery: 2.11
  • Power Motive Imagery: -2.17
By using a z-score table, provided here, I find the corresponding cumulative probabilities (a short computational sketch that reproduces these figures follows the list):
  • Achievement Motive Imagery: 0.5199 or 51.99%
  • Affiliation Motive Imagery: 0.9826 or 98.26%
  • Power Motive Imagery: 0.015 or 1.5%
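For anyone who wants to check the arithmetic, here is a minimal sketch, assuming Python with the scipy library available, that reproduces the z-scores and table values above from the raw counts (the variable names are my own):

# Reproduce the one-proportion z-tests from the survey counts.
from math import sqrt
from scipy.stats import norm

n = 179                  # total responses (one of 180 was left blank)
p0 = 1 / 3               # null hypothesis: each category is equally likely
counts = {"achievement": 60, "affiliation": 73, "power": 46}

se = sqrt(p0 * (1 - p0) / n)    # standard error under the null hypothesis

for category, count in counts.items():
    p_hat = count / n
    z = (p_hat - p0) / se               # one-proportion z-score
    table_area = norm.cdf(z)            # cumulative area to the left of z (what the table gives)
    print(f"{category}: p_hat = {p_hat:.3f}, z = {z:.2f}, table area = {table_area:.4f}")

For a positive z-score the upper-tail probability is 1 minus the table area (1.74% for affiliation), while for a negative z-score the lower-tail probability is the table area itself (1.5% for power).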
These are very fascinating results that need a little further discussion.
  • Discussion
Based on the findings of this study, it appears as though there is a significant relationship between affiliation and power motive imagery and the dissemination of ideas. This can be deduced from how extreme the table values are. The 98.26% figure for affiliation means that in only 1.74% of cases would I expect to find an equally extreme result if the null hypothesis were true, which is well under 5%. Likewise, the 1.5% figure for power motive imagery means that in only 1.5% of cases would I expect results as extreme as those observed, which is also below 5%. For achievement motive imagery, however, I cannot reject the null hypothesis of an equal distribution: its z-score of 0.05 corresponds to a tail probability of roughly 48%, far above 5%.

These results indicate to me that ideas that are framed with affiliation motive imagery in mind are most likely to be disseminated, while those ideas that are framed with power motive imagery are least likely to be disseminated.

There are several shortcomings of this research that temper these statistical findings. For instance, the survey only asked individuals which statements they "would most likely share" with another person; it did not test actual sharing behavior. The act of telling has not occurred, so the response is just the participant's best guess about what they would do.

Furthermore, and building on the previous idea, participants may have selected headlines that simply interested them and not ones they would necessarily take the time to spread to others. Several people cited in their "why?" statements that the article interested them; however, many also made reference to the fact that the headline would interest people they knew, which suggests the choice was made with other people in mind. These conflicting points are irreconcilable within the scope of this study, and I urge others to address them in the future.

It is counter-intuitive that warm and fuzzy ideas, the type common to affiliation motive imagery, would be the most likely to be disseminated. Studies have shown that the news is filled with violence and negative stories, and that people are more likely to spread that kind of news (Source). These findings go against that notion, especially given that power motive imagery headlines, which center on force, control, and regulation, were the least likely to be disseminated. Perhaps the issues and characters involved in a headline are more significant, but that idea is beyond the research question of this study.
  • Conclusion
The purpose of this study was to investigate the impact of the three categories of motive imagery on the frequency in disseminating ideas. In order to evaluate this, a short survey was given to 36 randomly sampled individuals in which they selected headlines that they would most likely share with others.

The findings indicate a statistically significant result for affiliation motive imagery and power motive imagery. Affiliation motive imagery headlines were shown to be the most likely to be disseminated, while power motive imagery headlines were shown to be the least likely. Both results were statistically significant at the 5% level.

Deciding on which issues are most important to us may indeed be a function of motive imagery, but these findings need to be further substantiated with more testing. There are several alternative hypotheses that were not addressed in this study that may impact the validity of the results.

Maybe it is all about the words we hear and how we hear them, shaping a subconscious decision about what is most important to us and what we choose to tell others. These results are but a first step in finding the answer.

Saturday, November 8, 2008

Word Frequency Measurement

I came across a fascinating product today that Google operates. The product is Google Trends, and it visually displays information relating to search queries entered into a Google search. It's unbelievably comprehensive. Aside from providing a line graph of relative search volume since 2004, it also provides the ability to narrow your search to particular regions of the world or even individual countries. For a person who wants to use Google as their advertising medium on the Internet, this feature is without doubt a must. Understanding where and why people are searching for the terms they are is a critical feature that sets Google apart from the rest.

You can also localize your search term trend for a particular span in time. So, if you are only interested in how people have been searching for these words within the past 30 days, you can set that option very easily. If you are particularly interested in how many times that search term was queried in a particular month since January 2004, Google Trends will allow you to set that.

The graph has an interesting dependent variable called Search Volume Index. On Google Trend's About page they define it as "how many searches have been done for the terms you enter, relative to the total number of searches done on Google over time."
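To illustrate what an index like that could look like, here is a toy sketch with invented weekly numbers; the scaling of the peak week to 100 is my own simplification, not necessarily how Google computes its figure:

# Toy illustration of a relative search volume index: the share of all
# searches in a period that were for the term, rescaled so the peak = 100.
# All numbers below are invented for illustration only.
term_searches  = [120, 150, 300, 180]               # hypothetical weekly searches for a term
total_searches = [100000, 110000, 115000, 120000]   # hypothetical total searches each week

relative_volume = [t / total for t, total in zip(term_searches, total_searches)]
peak = max(relative_volume)
index = [round(100 * v / peak) for v in relative_volume]   # scale so the busiest week = 100

print(index)   # -> [46, 52, 100, 57]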

Just below the graph are three columns that are unbelievably helpful in understanding how exactly the search term you're looking at is used. There is a column for regions, which ranks usage by country. Next to that column is an even more specific look at where the search term is being queried, ranking cities by usage. As you will see later, the city in which a search term is queried the most is often quite intuitive. Finally, the last column shows the languages in which that term is most queried. Most of the tests that I've performed have had the most usage in English; however, it is fascinating to see how the remaining non-English queries for the search term rank by language.

Probably one of the coolest features of Google Trends is its pairing with relevant news articles. Below the primary graph that shows search volume there is another graph called News Reference Volume. "This graph shows you the number of times your topic appeared in Google News stories." When there are spikes in search term volume, Google Trends automatically flags the occurrence and links the spike to an actual news article, which probably explains the spike in search volume.
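I can only guess at how the flagging works internally, but a naive sketch of the idea, flag any week that sits well above its recent history, might look something like this (the window and threshold are arbitrary choices of mine, not Google's method):

# Naive spike flagging: mark any point that sits well above the running average.
from statistics import mean, stdev

def flag_spikes(volumes, window=8, threshold=2.0):
    """Return the indices where search volume jumps well above recent history."""
    spikes = []
    for i in range(window, len(volumes)):
        recent = volumes[i - window:i]
        baseline, spread = mean(recent), stdev(recent)
        if spread > 0 and volumes[i] > baseline + threshold * spread:
            spikes.append(i)
    return spikes

# Hypothetical weekly volumes with one obvious jump.
weekly_volume = [10, 11, 9, 12, 10, 11, 10, 12, 55, 13, 11]
print(flag_spikes(weekly_volume))   # -> [8], the week that would get a news annotation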

It's an amazing feat of computational engineering. When it comes to understanding how people are using the Internet in terms of what they're searching for, I cannot think of a better source than Google Trends. Google has approximately 60% market share of all search queries, and this data is contingent on that sample. Google Trends is comprehensive and provides the user with the relevant information that he or she is looking for.

In order to gain further understanding of Google Trends, I ran a quick study in order to familiarize myself with the options and processes available.
  • Google Trends Study
This study was conducted on November 8, 2008 at approximately 7:30pm. The purpose of the study was to gain familiarity with Google Trends and make comparisons of search term usage for a randomly sampled set of search terms across regions.

The study sampled three words from each of three randomly sampled individuals living in my house. Participants were asked to "provide three words that could be possible search queries in a well-known search engine, like Google." The individuals choosing the words sat together and were asked to recite them aloud. This ensured that participants didn't provide duplicate words.

After the participants gave me their three randomly selected words, they were asked to "provide a country somewhere in the world aside from the United States." Each participant then selected a country in the same fashion in which they provided the words. Some participants took longer than others; one participant was still selecting words while another had already provided both their words and a country. The length of time participants take to finish their selections is not terribly important; however, it should be done in a reasonable amount of time.

Participants were also asked to predict the relative search volume of their three words from highest to lowest across the world.

The following table summarizes each participant's selections and the order in which they expect their search volume to be from highest to lowest:
As you can see, there was a rich diversity in the words selected. Participant 2 focused more on proper nouns than common nouns, unlike the other participants, yet the selections still made for a sufficiently varied sample.
  • Participant 1
[To see worldwide results please click each of these respective words: Drugs, Cowboy, Cardboard]

Upon requesting search volumes for the three words provided by Participant 1 in Djibouti, Google Trends was unable to provide search volume information citing the following:
Your terms - ????? - do not have enough search volume to show graphs.
Suggestions:
  • Make sure all words are spelled correctly.
  • Try different keywords.
  • Try more general keywords.
  • Try fewer keywords.
  • Try viewing data for all years and all regions.
The subsequent six search terms provided by Participants 2 and 3 were then also searched for in the Djibouti database, returning the same message. Unfortunately, there do not appear to be enough search queries for these words from Djibouti to produce graphs. This is an unfortunate finding for our study. It appears as though Google Trends, while comprehensive, is indeed fallible.
After trying search queries that I thought would have sufficient volume to be represented graphically (United States, Barack Obama), I eventually input "Djibouti," which finally elicited results. When the results are narrowed to Djibouti alone, the search term "Djibouti" is most commonly queried from the city of Djibouti, Djibouti. The most common language in which "Djibouti" is searched for within Djibouti is French. There was a spike in search volume for "Djibouti" in Djibouti on June 12, 2008, when an article was published entitled "UN council condemns Eritrean attack."

This can be compared to search queries for the term "Djibouti" from all regions around the world. "Djibouti" is most commonly searched for in regions like Djibouti, the United Arab Emirates, Morocco, and France. Some cities where "Djibouti" is most commonly searched for include Dubayy, UAE; Ottawa, Canada; and Rennes, France. French, English, Swedish, and Dutch are the most common languages when searching for "Djibouti."

Participant 1 correctly predicted the order in which the search terms provided would rank relative to one another across the world. It is interesting to note that the order in which the participant gave the words is the same order predicted for highest to lowest search volume. The graph visually depicts the relationship between Drugs, Cowboy, and Cardboard in terms of their worldwide search volume.
  • Participant 2
[To see worldwide results please click each of these respective words: Synthesizer, Rex Grossman, Mr. Feeney]

Similar to Participant 1, the words provided by Participant 2 did not have enough search volume in Malta for a graph to be displayed. This is again another unfortunate occurrence. The same technique used for Participant 1 was applied for Participant 2 in attempting to find some sort of graph. The other six search queries did not provide any graphical depiction either.

In order to gather some results, I input the search query "Malta" within the Malta Google Trends database. Fortunately, this produced a graph along with relevant news articles that coincided with spikes in the volume.

In Malta, the search term "Malta" is most frequently searched for in Msida, San Gwann, and Valletta, all cities in Malta, in that order. The most common languages in which "Malta" is searched for in Malta are, by frequency, German, English, and Maltese. One of the highest peaks in News Reference Volume within the Malta database for the query "Malta" was on November 20, 2007, when an article entitled "Queen to Celebrate in Malta" was published on News24.com.

Compare these results to worldwide trends for searching "Malta." Some of the most popular regions searching for "Malta" are Malta, Ireland, the United Kingdom, Italy, and Austria. Some of the most popular cities are the three Maltese cities already mentioned, followed by Poznan, Poland; Dublin, Ireland; and Thames Ditton, United Kingdom.

Participant 2 was correct in the predictions made about the order of relative search volume. Synthesizer has a higher relative search volume than Rex Grossman, and by virtue of there not being enough data on Mr. Feeney, one can deduce that Rex Grossman has a higher relative search volume than Mr. Feeney. This is visually represented below.
  • Participant 3
[To see worldwide results please click each of these respective words: Arsenic, Stencil, Magician]

Similar to the past two participants, all word choices by Participant 3 (Arsenic, Stencil, and Magician) elicited no graphical results in the Belgium database of Google Trends. Upon further review, some words provided by Participant 1 (Drugs and Cowboy) did elicit results, but for fairness to each participant, I will conduct the same analysis as I did for the prior two.

Therefore, I will share results from querying the search term "Belgium" in the Belgium database. The most prominent subregions that query the search term "Belgium" are Flemish Brabant, East Flanders, Brussels, and Luxembourg. Major cities querying "Belgium" include Leuven, Gent, and Brussels. The languages in which "Belgium" is usually queried are English, Dutch, and French (in that order).

Recently, there was a surge in the News Reference Volume relating to the search query "Belgium" within Belgium that linked to a story entitled "First Industrial to Invest in New State-of-the-Art Logistics Facilities in Belgium." The story was published on October 6, 2008 and led to the highest News Reference Volume ever recorded in the Belgium database of Google Trends for the query "Belgium."

Compare this to worldwide searches for the query "Belgium." The major regions in which "Belgium" is searched for are Belgium, Luxembourg, Ireland, the United Kingdom, and the Netherlands. Cities outside of Belgium that most frequently query the search term "Belgium" include London, United Kingdom; Amsterdam, Netherlands; Sydney, Australia; and New York, New York. Around the world, the most commonly used languages to query the search term "Belgium" are Dutch, French, English, German, and Italian (in that order).

Participant 3 incorrectly predicted the relative search volumes of the three randomly selected words provided. Based on worldwide Google Trends data, the words rank, from highest to lowest, as Stencil, Magician, and Arsenic. Stencil's relative search volume is about 2.65 times that of Arsenic, and Magician's is approximately twice that of Arsenic. The graph below reveals this relationship.
  • Conclusion
The purpose of this study was to gain familiarity with the product Google Trends and make comparisons between the relative search volume of randomly selected words.

While Google Trends contains a wide breadth of data on search volumes, there is a desperate need for its regional databases to contain more comprehensive data. Search queries in the Djibouti, Malta, and Belgium databases hardly provided any results when inputting some randomly selected words. Either these databases should not be offered to begin with, or they need to be more comprehensive in nature.

Two of the three participants in the study were correctly able to predict the relative search volume of their provided terms. This suggests that Google Trends provides intuitive knowledge. However, it is well evidenced by Participant 3 that the relative search volumes for particular words may be harder to predict.

Overall, Google Trends is a phenomenal resource that provides superb measurements of the relative search volumes of typical search queries performed on the Internet.

Tuesday, July 29, 2008

The Search is On

I want to make reference to the sudden resurgence in the online search engine market that I am finding quite titillating at the moment. Within the past two weeks I have become aware of two equally impressive and significantly different search engines.

The first one that caught my eye was referred to me by a friend of mine who attends the University of Illinois (and who is typically on the cutting edge of most technological innovations, which I am continually in awe of). The website is called Scour and, as the name suggests, it is a search engine that in effect “scours” the Internet, combining the powers of Google, Yahoo!, and MSN. Who cares! You may think that this is a show of hubris, attempting to topple Google and Yahoo! using their own technology.

Well, where Scour is slightly different is that it pays you to search. I’m not sure exactly how this works, but my best understanding is that you search enough times, accumulate enough points, and then they set you up with an online American Express card. Some people were wary that they didn’t ask for an address, but just because you don’t have an address doesn’t mean you can’t surf the Internet (at least these days).

There are several problems with this search engine, however, that I think will stop it from becoming the next big thing. First, the search itself is quite a lengthy process. When I say lengthy I mean it takes around 3 or 4 seconds to find the information you’re looking for. In the age of Google and Yahoo! blasting at relevant results in mere split-seconds, 3 or 4 seconds turns into an eternity very quickly. If they want to really compete with the big boys, they’re going to have to drastically lower this wait time.

Second, getting paid to search is great, but in order to reap any of the benefits you have to attain something ridiculous like 6400 points. There are several ways to earn points. You can search, which adds 1 point to your total. So, getting to 6400 searches won’t take that long, I guess; at around 20 searches a day, a year of searching gets you that American Express card. Then again, when I look at my own Google search history (a very nice feature if you ask me), I’ve searched 4923 times in the past year. So, I suppose it is possible.

Aside from merely searching, however, you could also do two of the other options. You can vote on the relevance of your search by clicking the thumbs up or thumbs down icon. This will generate 2 points. Wow, now we’re talking. That’s going to cut my time in half in order to get that American Express card!

Another feature you’re definitely going to want to make use of if you’re on the 6400-point track is the comment section for each search result. Inputting a comment for a particular search yields 3 points. My goodness, this American Express card doesn’t seem so far off anymore.
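Taking the roughly 6400-point target and the per-action points above at face value, a quick back-of-the-envelope calculation shows how long the card would take under a few hypothetical daily habits:

# How long until the 6400-point reward under a few hypothetical daily habits?
TARGET = 6400
POINTS = {"search": 1, "vote": 2, "comment": 3}

scenarios = {
    "search only (20/day)":           {"search": 20, "vote": 0,  "comment": 0},
    "search + vote (20/day each)":    {"search": 20, "vote": 20, "comment": 0},
    "search, vote, comment (20/day)": {"search": 20, "vote": 20, "comment": 20},
}

for name, daily in scenarios.items():
    points_per_day = sum(POINTS[action] * count for action, count in daily.items())
    days = TARGET / points_per_day
    print(f"{name}: {points_per_day} points/day, about {days:.0f} days")

Searching alone at 20 queries a day takes about 320 days, while voting and commenting along the way cuts that to roughly 53 days.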

I have some theories about these additional features that Scour has made use of. If you recall back to my post about Doogle, in which I explored ideas about creating the next generation search engine, I made it clear that the next generation search engine would have both the analytical capabilities of Google’s algorithm and also the compassionate understanding of humanity. The search engine must be both flawless in its approach to digging up data on the Internet, but it also has to have a human component.

I think that if Scour were smart, and the more I think about this the more I realize that they must be doing what I’m about to explain and are indeed smart, they would start compiling a database of the information that is being input into it right now. For instance, when someone searches “cat” using Scour, people can vote on and comment on the most relevant results.

However, how often does someone actually search for something as trivial as “cat”? I’m not saying there is anything wrong with this search term, rather that searches are becoming more complex, and typing in a string like “how can I get from Memphis to Cincinnati by taking a plane and then a bus” is probably becoming far more commonplace.

Google has excelled in simplifying search, and perhaps that is where it will find its limits. More complex query strings don’t just need algorithms; they need human input to arrive at an answer. At this point in our computing ability, no computer can truly answer some complex human queries in the most relevant way.

That is why I think Scour is smart to begin using everyone’s favorite search engines. Anyone who is anyone, literally almost anyone on the planet with access to the Internet, has used one of the three search engines that Scour employs. This makes people feel comfortable when they are searching. They see those happy symbols of accurate searches and feel warm on the inside.

Every time Scour gets a really complex search query and users put a thumbs up or thumbs down and comment on why the search they went to was more or less accurate, they can put another coin in the piggy-bank.

I think what they will eventually do after probably a year of compiling enough data is release another search engine that they claim to be the most accurate in the entire world, and you know what, they’re going to be right.

They’ll have both elements to the next generation search engine. Not only will their searches be faster (because it will be their own technology and draw from their own servers, etc.), but it will also be absolutely incredibly relevant, especially when it comes to asking it insanely hard questions.

I think that there is a tremendous amount of potential in Scour. They may not topple giants like Google and Yahoo!, but heck, I think they’re going to give them a run for their money.

The second search engine, which I have done considerably less thinking about and know far less about is Cuil.com. Apparently it was just launched within the past couple weeks and is doing fantastically well.

From the little I know about it, it was started by a former Google executive who founded their own company around what they perceived to be a better product.

After searching on it for a little while, I can see why some may perceive it to be a little better. It presents the results in a far more stylish way. The layout is a little different, with three columns of search results that have pictures associated with them and a little more description than the two-liner that Google typically provides.

I have to admit, the website is fresh looking and it is fast and semi-reliable. It will probably have to work out a couple kinks in the next couple weeks if it has any chance of competing.

I also have to commend the designer of the search engine for their idea about grouping information that is relevant to a particular search term. For instance you can type in “University of Michigan” and then the results will provide you with some categories that you can look into deeper if that was perhaps what you were really referring to when you typed in “University of Michigan.”

There are several problems with this. First, it is attempting to predict what you’re searching for and I think that’s a bad strategy for search engines. People typically know what they want, they don’t want to be led down random roads where they fall into an abyss of the Internet garbage that is out there.

Second, who makes the determination, and how, of which categories particular search queries get filtered into? I don’t like people making decisions about my search habits, and Cuil.com is attempting to do exactly that. Not clever. Not Skoda.

Here is why I think Cuil.com is just Google with a pretty dress on, except now Google is more annoying and doesn’t give you what you want.

Cuil.com has nice pictures next to their search results and sweet descriptions. Wow, these are all wonderful features, but Ask.com tried the same technique and hasn’t moved an inch in market share since its initial marketing push. People want simplicity when they’re searching (unless they’re searching for really complex things, in which case you need a more complex search engine like Scour).

The problem with entering the search engine market right now is that if you’re not significantly better at doing something than Google, you’re not going to be able to take any market share away from them. They have a stranglehold on search because they consistently provide relevant results in a quick and timely manner. That’s a tough habit for users to leave behind.

Cuil.com does the same thing. They provide search results quickly. It just looks a little different. It doesn’t really do anything much better.

Then again, I still need to do some searching with Cuil.com, and I could be wrong about all of these things. But I just don’t know why I would stop using Google to use Cuil.com; there isn’t really anything in it for me. I’m so comfortable with my sweet, sweet Google that for me to use anything else would take something drastic (or something that does a far better job, like solving my complex search needs).
  • Conclusion
Scour.com is a search engine that uses the powers of the three largest search engines on the Internet to find the most relevant search results. However, through their implementation of voting and comments they are adding a human element to search, which I believe they will direct into a future enterprise that will be unbelievably helpful when dealing with complex searches. They also pay you for your hard work, so that’s not bad.

Cuil.com is a new search engine that has a fresh look and apparently the largest database of archived Internet pages of any search engine, even Google. While this is a mighty feat, the Internet is big enough as it is, and having a couple hundred million more pages doesn’t really impress me all that much. Additionally, the interface is far “fluffier” than Google’s, which I don’t think provides a competitive advantage of any kind.

I think that Scour.com has a huge chance of stealing a ton of “complex” search market share in the coming years. They are building the foundation at the moment. But, like most things that attempt to take on Google, both of these search engines will probably be eaten up and fed to one of Google’s many spiders that scour the Internet.

Nobody outsearches the Googmonster!

Tuesday, April 29, 2008

Doogle (Google + Digg)

The Internet is an unbelievably powerful tool. Never before in the history of human life has information been so universally accessible. What’s more is that we’ve also figured out ingenious ways in which to categorize and sort this inundation of knowledge. The Internet is probably flooded with millions upon millions of web pages each day. To give a minute idea of how large the Internet is growing, YouTube.com, a social networking site in which videos are uploaded by users, is estimated to have 825,000 videos uploaded daily.

This is just a single site in the huge expanse that is the Internet. Yet, rather than sputtering to a slow, painful death, the Internet collects and organizes this information promptly and logically. The primary method that has been developed to categorize the Internet is the search engine. After doing some research, I found that search engines have been around for quite a long time.

I found out that one of the first search engines was called, “Archie,” and it was launched in 1990 by a student at McGill University (Source). The search engine worked by downloading the listings of public anonymous FTP sites, or simply the Internet circa the early ’90s, and indexing them. This created a searchable index of file names; however, it didn’t index what was in those files. In later years, web search engines like “Aliweb” were developed, in which people would submit sites to the database in order to make them searchable for Internet users (in ways, this is similar to Web 2.0 sites like Digg and Reddit, but those sites now have a rating system to eliminate the garbage from your search).

In later years, other search engines were founded that are what some may consider regular search engines: AltaVista, WebCrawler, AskJeeves, and Excite. These systems worked very well in creating a portal to the hundreds of thousands of websites you may have been looking for, based on the frequency of the keyword you were searching for. In the early days of Internet search engines, there was a very simple formula for cataloging results. Search engines would organize web pages based on the frequency of the keyword you were searching for on that page.

Therefore, if you entered the keyword “dog,” the search engine would return results so that the page with the highest frequency of the word “dog” on it would show up first, and so on. This worked pretty well at the beginning and was one of the things that Yahoo! did incredibly well, helping them gain a large share of the search market. However, as the Internet steadily grew, people needed better ways to organize search results so that information could be found more quickly and easily. That is when Google developed its unbelievably brilliant, yet simple, formula. Larry Page and Sergey Brin, the founders of Google, developed an algorithm known as PageRank, aptly named after Larry Page.
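As a toy illustration of that early keyword-frequency approach (not any particular engine’s actual implementation), ranking comes down to counting matches:

# Rank pages purely by how often the query word appears on them,
# roughly the way the earliest keyword-based engines worked.
def rank_by_frequency(query, pages):
    """pages: dict mapping a URL to its text. Returns URLs, most matches first."""
    query = query.lower()
    scores = {url: text.lower().split().count(query) for url, text in pages.items()}
    return sorted(scores, key=scores.get, reverse=True)

pages = {
    "a.com": "dog dog dog cat",
    "b.com": "dog cat cat",
    "c.com": "my dog likes other dogs and my dog likes cats",
}
print(rank_by_frequency("dog", pages))   # -> ['a.com', 'c.com', 'b.com']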

I am going to attempt to explain PageRank as if I were talking to a little child, because there almost isn’t enough Internet to fully comprehend it. Quite simply, it uses the entire Internet to rank the validity of websites while also using the old keyword-frequency cataloging technique. Google’s algorithm is essentially grounded in the theory of referencing, and it is quite an old concept if you think about it. Google thought that if other websites referenced your site (by means of a hyperlink), this essentially translates as a way of saying, “this site knows what they are talking about. I believe in them and you should too.”

So, the more websites that reference your website, the more people seem to agree that your information is reliable and warranted, and thus your rating should improve when your site is searched for in a search engine. The reason that this is an old concept is that it is very similar to citations used in books or scholarly documents. If a lot of other sources cite a particular source, it is generally accepted as something pertinent to that particular topic.

Think about physics. Tons of studies cite evidence relating to Albert Einstein or Stephen Hawking, because these guys have discovered information that is at the core of the subject. Google works in a very similar way. It uses the entire Internet to rank its catalog of billions of web pages. But just as Google improved upon a system that people thought was perfect enough as it was, I think there is potential for improvement in the future.

My idea stems from Google and Digg. Google is a brilliant search engine that works almost flawlessly, and Digg is a fantastic Web 2.0 site that uses the voice of Internet users to promote amazing things and sweep away stupid things. Ultimately, I propose a search engine that uses the algorithm that Google has developed, using a passive system of ranking websites as they do now through hyperlinks/referencing, and incorporating a voting system in which people can vote on how well the website they found when searching actually met their needs. This is a far more active and “Web 2.0” approach. I think that this could be unbelievably effective, because it can move that third option on the page to the top of the page, which ends up saving an enormous amount of time in the long run.
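To make the blending concrete, here is one simple way it could work, purely as a sketch of my proposal rather than anything any engine actually does; the 70/30 weighting and function names are arbitrary illustrations:

# Sketch of blending a passive link-based score with active user votes.
def blended_score(passive_score, votes, passive_weight=0.7):
    """passive_score: 0..1 from the link-based algorithm.
    votes: list of user ratings from 0 to 1000 for this page on this query."""
    if votes:
        active_score = (sum(votes) / len(votes)) / 1000.0   # normalize votes to 0..1
    else:
        active_score = passive_score    # no votes yet: fall back to the passive score
    return passive_weight * passive_score + (1 - passive_weight) * active_score

# A page the algorithm likes but users consistently rate poorly for this query:
print(blended_score(0.9, [100, 150, 200]))   # -> 0.675, pulled down by the votes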

Testing for Realism:

  • Is it well accepted by a particular target demographic?
It can be. When I asked Yahoo! Answers what are some of the “barriers to entry when creating your own search engine to compete with Yahoo and Google,” the only response that I was given was, “well…you can’t even enter the market…” I found that quite amusing. It’s true that the search engine market seems quite saturated at the moment, especially when you consider that goliaths like Microsoft can’t even enter into the race between Yahoo! and Google.

However, in the late 1990’s when AOL was running the Internet, people laughed when they thought that in 10 years they would be using something completely different, especially a thing called Google. But the reality is, Google came out with a better product that provided people with what they needed faster. Slowly but surely, people started to sway away from their older technology and jump into something that truly got the job done. I feel this is and can be true with this sort of technology.

It simply makes for a more accurate search engine when you combine the passive and active roles of Internet users. That is one demographic that I can see switching over to this sort of search engine: the masses who want to use something better. That is a tremendous number of people. Yet, consider further that there are on average 353,987 new Internet users per day. This may include people in developing countries who are using the Internet for the first time, young people who have never used the Internet, or older people who haven’t touched the darn thing their whole lives.

These numbers are courtesy of Google Answers (Source). These people, although they probably know about Google and Yahoo!, are not entrenched in their Internet habits, and if there is a better and more efficient technology, it’s likely that they will opt for it. Slowly but surely, a superior product will gain market share. I don’t expect this to happen overnight or quickly, but rather over a couple of years.
  • Does it fill a need?
Like all things on the Internet, it doesn’t really fill a need. The world existed well and just fine prior to the Internet, yet we have come enormously far since we started using it. Therefore, the direct answer to this question is, “no, it does not fill a need,” but the Internet is all about innovation and adaptation.

I would say the Internet is almost the personification of human ingenuity, creativeness, and improvement. Therefore, I feel as though we are doing the Internet and ourselves a disservice if we don’t continue to adapt and expand our possibilities.

We need to remember that we should always be striving to improve, and just because something works well now, doesn’t mean that it is going to work well indefinitely. We should always be conscious of moving forward and trying new things, because who knows what we can learn from it.
  • Can it be set up by an individual or, at most, a small group of individuals?
This is absolutely possible. It will take a small group of individuals with an unbelievable proficiency in computer science. I have to imagine that creating an algorithm that works like Google’s is quite a sophisticated undertaking, and that is merely one component of this 21st-century search engine. Additionally, whoever designs this super search engine will have to incorporate a voting structure to accommodate people using the site. I envision it in the following way: the product would work something like StumbleUpon or any sort of toolbar that you may have installed in your Internet browser.

In order to get active “votes” on websites from users, not just the passive voting that the computer algorithm achieves, users will have to install some sort of toolbar into their Internet browser (better yet, the search engine could be its own Internet browser). The browser can be set up in a number of different ways; I see three possible designs. The first would be a toggle that users drag to indicate how effective the site was in giving them what they needed. They can move the toggle back and forth between 0 and 1000. If you got exactly what you wanted, put 1000, but if you didn’t like the site at all, give it a 0.

These tabulations would then be averaged out and combined with the passive ranking algorithm to give a middle ground of sorts. While it does this, the search engine will also have to remember the keywords that you searched for and make the ranking you gave unique to that keyword or string of keywords. For instance, you may search for “dog” and vote 750 on the first site that comes up. But, if you search for “dog collars” that exact same site might be a 100. The search engine will have to remember the phrase you searched for and connect your respective ranking to it.
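The “remember which phrase the rating belongs to” requirement boils down to keying the stored votes by the (query, page) pair rather than by the page alone. A minimal sketch of that bookkeeping (the function names are my own invention):

# Votes are stored per (query, page) pair, so "dog" and "dog collars"
# accumulate independent ratings for the same site.
from collections import defaultdict

votes = defaultdict(list)   # (query, url) -> list of 0..1000 ratings

def record_vote(query, url, rating):
    votes[(query.lower(), url)].append(rating)

def average_vote(query, url):
    history = votes.get((query.lower(), url), [])
    return sum(history) / len(history) if history else None

record_vote("dog", "example.com/pets", 750)
record_vote("dog collars", "example.com/pets", 100)
print(average_vote("dog", "example.com/pets"))          # 750.0
print(average_vote("dog collars", "example.com/pets"))  # 100.0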

The second method would be a simple 1-through-10 scale in which you select a number and submit it. This doesn’t allow for as much variation, however.
The final method would be to ask the person who was searching, “Did this website find what you were looking for?” and people can reply “Yes” or “No.” This allows for the least amount of variation; however, it still provides people with the ability to actively vote on how accurately a site meets their needs.
  • Can it generate income?
The search engine and Internet industries in general are huge and show no signs of slowing down in my estimation. The primary stream of revenue that search engines utilize is advertising, and I believe the advertising industry to be healthy, as is the Internet industry. HitWise.com, a company that monitors the usage of websites, provides a list featuring “the top 4 leading search engines based on US Internet usage, ranked by volume of searches for the 4 weeks ending March 29, 2008” (Source).

Essentially, these numbers provide a snapshot of the current search engine market share in the United States. According to them, Google maintains 67.25%, Yahoo! 20.29%, MSN/Live Search 4.88%, and finally Ask with 4.09%. This makes up 96.5% of the Internet Search Engine market in the United States, and I assume that the other 3.5% is made up by “bottom dwellers” of the search engine market. If we look at the revenues that these companies produce we see the following:

Google – 16.5 Billion or 245M/1% market share
Yahoo! – 7 Billion or 345M/1% market share
MSN/Live Search – 1.848 Billion or 462M/1% market share
Ask.com – 227 Million or 56M/1% market share

It doesn’t take a statistician or mathematician to see that the numbers here are enormous. There is a great deal of revenue that can be made when entering the search engine industry, especially with such an innovative and fantastic product such as this one. The reason I provide the rate at which these companies produce revenue per 1% in market share is to provide a conservative estimate of potential revenue that can come from entering the search engine market, even in the case that you don’t topple the giants of Yahoo! and Google.

By looking at the top four companies in the Internet search engine industry, there is a mean of 277M in revenue for every 1% of market share they capture. However, I don’t accept this as a conservative enough estimate, because as a search engine gains popularity, advertising space becomes more attractive and thus more expensive. This is well illustrated by the gap between Ask.com and both Yahoo! and Google. It is somewhat contradicted by MSN/Live Search, but I assume that Yahoo! and Google are going through diseconomies of scale (Source), resulting in increased per-unit costs.

After constructing a graph that illustrates the Billions of Dollars of revenue produced by each company measured against the percent of market share that each assumes, a linear trend line can be added. The linear trend line has an intercept of (0,0) because we assume that if you don’t enter the market you don’t make any money, but if you enter the market you will have a linear increase in revenue. The trend line has an equation of y = 0.2536x, which essentially translates as $253 million for every 1% of market share that is assumed. Therefore, I would make a conservative estimate that this search engine, if able to capture 1% of the Internet search engine market share, can produce revenue of $253.6 million per year.
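The slope of that trend line is easy to verify: a least-squares line forced through the origin has slope sum(x*y) / sum(x*x). Using the market share and revenue figures above (revenue in billions of dollars), a quick sketch of the calculation:

# Least-squares line through the origin for revenue (billions) vs. market share (%).
shares   = [67.25, 20.29, 4.88, 4.09]        # Google, Yahoo!, MSN/Live, Ask
revenues = [16.5,  7.0,   1.848, 0.227]

slope = sum(x * y for x, y in zip(shares, revenues)) / sum(x * x for x in shares)
print(f"slope = {slope:.4f} billion dollars per 1% of market share")
# -> slope = 0.2536, i.e. roughly $253.6 million of revenue per point of share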
  • Is it marketable?

This search engine is most definitely marketable. However, I don’t think you go about marketing this website in the traditional sense. Sure, websites like GoDaddy.com have had tremendous success based on their racy Super Bowl commercials, but I think it would be a lot more beneficial for this sort of search engine, especially if it is trying to take on mammoths like Yahoo! and Google, to spread virally through word of mouth marketing (WOMM).

This kind of marketing is especially effective because it is usually friends or close loved ones who refer you to a specific thing. This is how Google initially got its start and how Web 2.0 websites like Facebook, Digg, and a slew of others got their start as well. It is a very effective way to get users to start using your product and tell others about it. Word of mouth creates a buzz, and people typically respond very well to it.

Especially when it comes to a new website, people in the demographic of current Yahoo! and Google users will only move if they have been explicitly told to by someone close to them. As for new users, they will be inspired by the buzz and follow suit.

If the Internet has taught us anything, it is that the world is a constantly changing place that is continually looking for new ways to perform efficiently and effectively. Surely there are Internet search engine giants now that control over 60% (in some cases) of the market, but we should remember the old English adage: “the bigger they are, the harder they fall.”

No one can really predict what will come of the Internet in the next 5, 10, or 20 years from now, but I guarantee you that if we find a better way to do something, there’s no reason why we would stop ourselves from doing it. I believe Doogle can be the effective conglomeration of passive and active cataloging which will make for the best Internet search experience possible.