
Search Engine Future


I was asked by SEMPO (the Search Engine Marketing Professional Organization) to speculate on the future of search marketing. Then came the hard part: Limit it to one column.

The thrilling part of speculating about any aspect of the Internet, particularly search, is its virtually limitless potential. Every day new possibilities open up. I still believe that the Internet, and whatever it eventually morphs into, will be the single biggest factor in history to change our society. It already reaches into every aspect of our lives, and you ain't seen nothin' yet! Borders will crumble, the world will be at our fingertips, lines of trade will change completely, and our interactions with others will change forever.

But let's start with search marketing, and to do that, we have to look at search. One has to follow the progression of the other. The easy thing to do would be to look a little into the future, where we can still discern a possible horizon, and build from there. As soon as you look over that horizon, it becomes much more difficult to guess what might come. But it's into that great blue yonder where I believe the really exciting future of search lies.

Why DO The Search Engines Change So Much?

Why do the search engines constantly have to evolve into a different type of engine? Why can't they stay the same?

To answer this, let's look at the ultimate goal of a search engine. What do the search engines want to do? They want to provide relevant results to you, the user. Why can't they do that under the current system?

There are several reasons why the current system isn't working. For one thing, the Internet is growing at an unheard-of rate. Plus, spam is growing just as fast. In many ways, the engines are fighting a losing battle to provide relevant results while combating spamming and duplicate pages.

In essence, the engines need a way to store more pages, combat spam, and still provide (or attempt to provide) pertinent results. So, in an effort to provide relevant results, the engines began sliding in other variables, which is where the 1st, 2nd, and 3rd generation search engines come in.

1st, 2nd, and 3rd Generation Engines

Understanding the path we've taken to get where we are in this crazy search engine business might give us some insight into where we're going.

You may have heard of 1st, 2nd, and 3rd generation engines, but what exactly does that mean?

Michael Campbell explains,

In the beginning, search results were very basic and largely depended on what was on the Web page. Important factors included keyword density, title, and where in the document keywords appeared.

First generation added relevancy for META tags, keywords in the domain name, and a few bonus points for having keywords in the URL. Basic spam filters emerged that got rid of keyword stuffing and same color text. The portals also made their appearance, and engines started looking like giant billboards and overstuffed yellow pages.
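To make those early on-page factors concrete, here is a minimal sketch of how a keyword-density score and a crude keyword-stuffing filter might look. The function names, the bonus weights, and the 0.3 stuffing threshold are all assumptions made up for illustration; no engine published its actual formula.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_density(text, keyword):
    """Fraction of the words in `text` that are exactly `keyword`."""
    words = tokenize(text)
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

def looks_stuffed(text, keyword, max_density=0.3):
    """Crude spam filter: flag pages whose density exceeds an arbitrary threshold."""
    return keyword_density(text, keyword) > max_density

def first_generation_score(title, body, url, keyword,
                           title_bonus=0.5, url_bonus=0.25):
    """Hypothetical score: body density plus bonuses for title and URL matches."""
    kw = keyword.lower()
    score = keyword_density(body, kw)
    if kw in tokenize(title):
        score += title_bonus
    if kw in url.lower():
        score += url_bonus
    return score

body = "Cheap flights to Houston. Book flights today and save."
stuffed = "flights flights flights cheap flights flights"

print(round(keyword_density(body, "flights"), 3))
print(looks_stuffed(body, "flights"), looks_stuffed(stuffed, "flights"))
print(round(first_generation_score("Houston Flights", body,
                                   "http://flights.example.com/", "flights"), 3))
```

A score built only from what is on the page is easy to game, which is why the filters had to be bolted on.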

All of this is quite familiar, isn't it? Almost too familiar.

But, do META tags hold as much importance as they once did? No. Does using keywords in various tags help as much? Generally not.

Instead, the engines took it a step further in their quest for relevant results by bringing in 2nd generation engines.

Campbell explains,

Second generation, which is in full swing with the themes thing, added much in the way of off page criteria and link analysis. A few of the major components they employ are tracking clicks, page reputation, link popularity, temporal tracking, and link quality. Then they started adding in term vectors, stats analysis, cache data, and context where two-word keyword pairs were extracted from a page to better categorize it.

We'll cover "term vectors" and other information mentioned in the above paragraph later in this article. For now, let's continue with 2nd generation engines.

We all know how important solid link popularity is these days. Does any old link count? Certainly not. The days of huge link exchange programs with no thought for "related" links are over.

Plus, with Google's PageRank system and DirectHit's method of tracking clicks and the length of visits, we're seeing more evidence of a 2nd generation engine.
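For a feel of how link analysis can turn into a number, here is a simplified, textbook-style PageRank computed by repeated iteration. It is a sketch only, not Google's production system; the 0.85 damping factor is the commonly cited default, and the four-page link graph is invented for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical four-page link graph.
links = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home"],
    "orphan": ["home"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the page everyone links to ends up on top, and the page nobody links to ends up at the bottom, no matter what either page says about itself.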

But what is a 3rd generation engine? It's almost mind-boggling to consider.

Campbell explains,

Third generation is already underway. It adds word stemming and a thesaurus on top of the term vector database to assist in keeping a search in context. Auto extraction of keyword pairs also helps automatically categorize a page, where searches like 'shop for' or 'find' trigger totally different search results based on the context or intent of the person doing the searching.

G3 adds Web maps which, although not searchable, are a useful filtering tool to get rid of duplicate sites and many stand alone pages that drive traffic to only a few destinations. This means pages like doorways, gateways, entry, splash, or whatever you want to call them, will soon get filtered out.

They will also be extracting as much data as possible about your individual searching habits. All the major engines plan on building personal profiles, little robots that 'come to know you' over a period of time, based on past searching habits.
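The word stemming and thesaurus step Campbell mentions can be sketched roughly as follows. The suffix list and the tiny thesaurus are invented for illustration; real stemmers, such as the Porter stemmer, apply far more careful rules than this naive suffix stripping.

```python
# Naive suffix list, illustrative only; longer suffixes are tried first.
SUFFIXES = ("ing", "ers", "er", "es", "s")

# Hypothetical thesaurus keyed by word stem.
THESAURUS = {
    "buy": {"purchase", "shop"},
    "car": {"auto", "automobile"},
}

def stem(word):
    """Strip one known suffix, keeping at least three letters of the word."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def expand_query(query):
    """Return the stems of the query words plus any thesaurus synonyms."""
    terms = set()
    for word in query.split():
        root = stem(word)
        terms.add(root)
        terms |= THESAURUS.get(root, set())
    return terms

print(sorted(expand_query("buying cars")))
# ['auto', 'automobile', 'buy', 'car', 'purchase', 'shop']
```

With this expansion, a search for "buying cars" can match a page that only ever says "purchase an automobile."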

Okay, so we have a good idea of where the search engines are headed, but how can we keep up? The 2nd and 3rd generation engines are theme-based, but what does that mean, and how does it translate to what we need to do with our own sites?

What are "Theme" Engines?

What exactly is a "theme" engine? First, let's hear the scientific definition. This isn't easy reading, so it might help if you have a brown paper bag handy in case you hyperventilate.

Computer scientists working with Campbell define "themes" or "topics" as,

Using a term vector database, they weigh page keyword density to calculate the page vector, which is compared and stored relative to the term vector. They then compute a Web page reputation by graphing interconnectivity and link relevancy, making sure the reputation of the page and the content on the page actually match. The closest matches get the highest search engine positioning.

Uh huh. Kinda hurts the brain cells, doesn't it?

Now, let's look at an easier-to-understand explanation. How does Michael Campbell define a theme engine?

One. The answer is one. What you say about your Web page, how the structure of other people's Web pages compares on the same topic, and what other people say your site is about, must match, be in harmony with each other, be as one.

Or, in the cold hard world of the search engines, where everything is weighted and calculated according to mathematical formulas, whoever is closest to the 1.000000 without going over is the winner, coming up tops in the search engine.

A theme engine looks at all the information on a 'seed set' or a group of sites and pages that it has already spidered and has in its index. It assigns each page in the index a number or page vector. This becomes the 'core' of the search engine.

Suppose you just submitted a Web page, so you are now in competition with everything in the core. The engine looks at everything on your page, from one- and two-keyword phrase densities to page length, compares it to the seed set, and assigns your page a number for each keyword phrase. These numbers assigned to the keyword phrases are known as 'term vectors.'

The closer your term vector is to the page vector, the better chance your page has of being a top ten contender for any particular keyword phrase. You might even be 'folded in' to the core, bumping off some other page, causing it to fall out of the search engine. (Some engines will adopt the 'pay to stay in the core' model in the near future, so paid sites won't get bumped out.)
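One common way to measure how "close" two pages are is cosine similarity between their phrase counts, which tops out at exactly 1.0 for a perfect match. The sketch below assumes that measure and assumes the core is simply the summed counts of the seed pages; the description above does not say which formula the engines really use, and the sample pages are invented.

```python
import math
from collections import Counter

def phrase_vector(text):
    """Count the one- and two-word phrases in `text`."""
    words = text.lower().split()
    pairs = [" ".join(p) for p in zip(words, words[1:])]
    return Counter(words + pairs)

def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors (0.0 to 1.0)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical seed set that forms the core for one theme.
seed_pages = [
    "houston web hosting plans",
    "web hosting and mail hosting",
]
core = Counter()
for page in seed_pages:
    core.update(phrase_vector(page))

on_topic = phrase_vector("affordable web hosting in houston")
off_topic = phrase_vector("chocolate cake recipes")

print(round(cosine_similarity(on_topic, core), 3))
print(cosine_similarity(off_topic, core))
```

The on-topic page scores well above zero, while the page that shares no phrases with the core scores exactly 0.0.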

Then, there is what the rest of the Internet and its users have to say about your page. Link analysis, traffic, stats, and cache data are all taken into consideration and analyzed.

The next step is to add in and calculate words in incoming links to your page, making sure they match up to your term vector. So, what the search engine has determined that your page is about must match what the rest of the Internet says your page is about in their links to you.
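As a rough illustration of that matching step, the sketch below measures what share of incoming links use anchor text that shares at least one word with the page's own terms. The function name, the one-shared-word rule, and the sample links are assumptions; a real engine would weigh this far more carefully.

```python
def anchor_text_agreement(page_terms, incoming_anchor_texts):
    """Fraction of incoming links whose anchor text shares a term with the page."""
    if not incoming_anchor_texts:
        return 0.0
    page_terms = {t.lower() for t in page_terms}
    matches = sum(
        1 for anchor in incoming_anchor_texts
        if page_terms & set(anchor.lower().split())
    )
    return matches / len(incoming_anchor_texts)

# Hypothetical page terms and the text of four links pointing at the page.
page_terms = {"web", "hosting", "houston"}
anchors = ["Houston web hosting", "click here", "cheap hosting", "my friend's site"]

print(anchor_text_agreement(page_terms, anchors))  # 0.5
```

Half of these links say nothing about the page's topic, so under this measure they do nothing to confirm what the page is about.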

So in review, in layman's terms, here is what I would define as a theme-based engine:

What you say your page is about, what the search engine calculates your page to be about, and what the rest of the Internet thinks your page is about, must match, according to their mathematical formulas.

Then, as the whipped cream topping on top of the theme behavior sundae, are the stats and cache data. If your site is one of a search engine's top exit pages, it must be good, because people don't come back and search some more once they've found your site. You just got a big boost in positioning. And, if your site gets searched and clicked on so often that you are in the engine's cache for speedy data retrieval, your site must be very good indeed.
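The "top exit page" idea can be sketched from a click log. The example assumes each search session is recorded as the ordered list of results the user clicked, and treats the last click as the exit page; that log format and the sample data are assumptions for illustration.

```python
from collections import Counter

def exit_rates(sessions):
    """For each page, the share of its clicks that ended a search session."""
    clicks = Counter()
    exits = Counter()
    for clicked_pages in sessions:
        if not clicked_pages:
            continue
        clicks.update(clicked_pages)
        # The last result clicked is where the user stopped searching.
        exits[clicked_pages[-1]] += 1
    return {page: exits[page] / clicks[page] for page in clicks}

# Hypothetical click log: one list of clicked results per search session.
sessions = [
    ["a.example", "b.example"],
    ["b.example"],
    ["a.example", "c.example", "b.example"],
]
rates = exit_rates(sessions)
for page in sorted(rates):
    print(page, rates[page])
```

Here every search that reached b.example ended there, which is exactly the signal described above: people stopped looking once they found it.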

All of these factors, both on and off page criteria, help define what a theme-based search engine is looking for. They are looking for unanimous approval that your site is all about a particular topic. And the more narrow the focus on that topic, the better your site will do.

What Does the Future Hold?

Campbell answers,

In the future, you might be able to load the engine full of lists of keywords. Your interests, likes and dislikes, geographical info, and favorite Web sites can be entered, from which the engine can create a context engine just for you. Just think, they'll know what your next search is likely to be, even before you do.
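A toy version of such a context engine might simply re-rank results by how many words they share with a stored list of interests. Everything here, from the profile format to the scoring rule and the sample results, is a hypothetical sketch of the idea rather than a description of any real engine.

```python
def personalize(results, profile_keywords):
    """Re-rank result titles so those sharing more words with the profile come first.

    Python's sort is stable, so ties keep the engine's original order.
    """
    profile = {k.lower() for k in profile_keywords}

    def overlap(title):
        return len(profile & set(title.lower().split()))

    return sorted(results, key=overlap, reverse=True)

# Hypothetical results for the ambiguous search "jaguar".
results = [
    "Jaguar car dealers",
    "Jaguar habitat facts",
    "Jaguar football scores",
]
profile = ["wildlife", "habitat", "zoo"]

print(personalize(results, profile))
# ['Jaguar habitat facts', 'Jaguar car dealers', 'Jaguar football scores']
```

The same search returns a different first result for a car shopper than for a wildlife fan, which is the whole point of a personal profile.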

It's almost frightening, isn't it?




2002-2020 Whipnet Technologies