Bots, Crawlers and Spiders, oh my!


The terms bots, crawlers and spiders are likely to give arachnophobes the heebie-jeebies, but they're really just an important part of the way search engines work. These automated critters go to the front page of a website and look at every link on that page... then go to each of those pages and do the same, and so on. Once they've visited every page on a site (crawled across it), they have a pretty good idea of what each article is about (by counting its most frequent words), and of how important it should rank (by looking at who links to that page or website).

Basics

Most people creating their own websites want their site to be at the top of a search engine's results when someone searches for a particular topic. While that isn't likely, there are things you can do to move up the priority list, and it helps to understand how these things work.

There are programs referred to as bots (robots), crawlers or spiders. Their purpose is to automatically surf the web.

They go to every site, and try to follow every link they can find; then they crawl all over anything they can find there as well; hence the bug references.
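
To make that concrete, here's a minimal sketch in Python (standard library only) of what a crawler does: fetch a page, pull out every link, and queue any it hasn't seen yet. The starting URL and page limit are placeholders, and a real crawler also respects robots.txt, rate-limits itself, and handles many more edge cases.

    import urllib.parse
    import urllib.request
    from collections import deque
    from html.parser import HTMLParser

    class LinkParser(HTMLParser):
        """Collects the href of every <a> tag on a page."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(start_url, max_pages=50):
        """Breadth-first crawl: fetch a page, then queue every new link on it."""
        seen = {start_url}
        queue = deque([start_url])
        pages = {}
        while queue and len(pages) < max_pages:
            url = queue.popleft()
            try:
                with urllib.request.urlopen(url, timeout=5) as resp:
                    html = resp.read().decode("utf-8", errors="replace")
            except Exception:
                continue  # dead link, timeout, non-HTML... just skip it
            pages[url] = html
            parser = LinkParser()
            parser.feed(html)
            for link in parser.links:
                absolute = urllib.parse.urljoin(url, link)  # resolve relative links
                if absolute not in seen:
                    seen.add(absolute)
                    queue.append(absolute)
        return pages  # url -> html for everything we managed to fetch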

When you register your site with a search engine, that is all you are doing - telling their spider to start crawling your site; they will find you eventually, whether you register or not.

Search engines employ these (none-too-smart) automatons to look for anything new, and to create an "index" of what they find on each site.

They keep a list of topics and key words, and they count them up. If each page on your site has the word "computer" in it, then they can "guesstimate" that your site has something to do with computers. Then when someone searches for the word "computer", they know that your site has something to do with that topic, and you should be in the list of 15,000+ sites that also refer to computers.
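
As a toy illustration of that counting (in Python; the stop-word list and the sample text are made up for the example, and real engines also weight words by rarity, position, and much more):

    import re
    from collections import Counter

    # Words too common to say anything about the topic (a tiny sample set).
    STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that"}

    def index_page(text):
        """Count every non-trivial word on a page: a crude keyword index."""
        words = re.findall(r"[a-z]+", text.lower())
        return Counter(w for w in words if w not in STOP_WORDS)

    # A page that mentions "computer" everywhere surfaces it at the top.
    counts = index_page("A computer stores data. The computer runs programs.")
    print(counts.most_common(3))  # [('computer', 2), ('stores', 1), ('data', 1)]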

SEO

The problem is that there are so many sites and pages on every topic that search engines also need to figure out who should be first on the list. They need to estimate popularity and give each site (or page) some relative weight, with the heaviest showing up higher on the list.

If you want a lot of weight quickly, the search engines will let you rent it (a form of advertising); but most of us don't have the budgets to create that artificial weight, and must do it other ways.

Search engines can do little to figure out true "popularity" and how often people visit; they can't really snoop other people's sites and see who is visiting or how often, so they resort to less direct methods.

On the extreme high end, there is some ability to poll users and figure out where they are going; but you're not likely to show up in those polls, and we're not talking about creating a site for a company that can measurably affect the nation's GDP; just a normal site.

One of the ways that search engines can guess at popularity is to just count links: they look at how many other sites are pointing to a page on your site, and use that to rate how "popular" your site is. The more people that point to you, and the bigger they are, the more valuable your information must be; and the more weight you get. So if you want to show up better on search engines, you need to make web-friends and link to each other. Advertising banners on other sites (that have weight) don't hurt, since they add some weight (links and readers); but the banner links stop when you stop advertising.
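
That "the bigger they are, the more it counts" idea is essentially what Google's original PageRank formalized. Here's a toy sketch in Python; the damping factor and the little link graph are made-up example values, and real ranking uses far more signals. Weight flows repeatedly along links, so a link from a heavy site passes on more weight than one from a light site.

    def rank_by_links(link_graph, iterations=20, damping=0.85):
        """Simplified PageRank. link_graph maps each site to the sites it links to."""
        sites = list(link_graph)
        n = len(sites)
        weight = {s: 1.0 / n for s in sites}  # start everyone equal
        for _ in range(iterations):
            new = {s: (1.0 - damping) / n for s in sites}
            for site, outlinks in link_graph.items():
                if not outlinks:
                    continue  # dangling pages just leak weight in this toy version
                share = damping * weight[site] / len(outlinks)
                for target in outlinks:
                    if target in new:
                        new[target] += share  # each link passes on a share of weight
            weight = new
        return sorted(weight.items(), key=lambda kv: -kv[1])

    # Hypothetical graph: two small blogs link to bigsite, and bigsite links to you.
    graph = {
        "blogA": ["bigsite"],
        "blogB": ["bigsite"],
        "bigsite": ["yoursite"],
        "yoursite": [],
    }
    # The single link from the heavy bigsite makes yoursite rank highest here.
    print(rank_by_links(graph))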

Another way is just based on how big your site is: if there are many articles on the site (a lot of "content") that are all about the same subject, then you're probably getting visitors on that topic, and others would probably have interest as well.

Since where you rank on search engines can dramatically change the amount of traffic you get, and traffic translates to revenue, everyone wants to know exactly how search engines rank sites, so they can game the system (and trick the engines into putting their stuff at the top). So search engines tweak their algorithms in ways they don't share, and others try to scan how results come up, to reverse engineer the variables that matter for moving up the ranks. This cold war is SEO: Search Engine Optimization, and the effort to game or defeat it.

Addendum

When I wrote this, it was pretty easy to make a successful site. The Internet was still hungry for content, so creating good content got you good links from others, and you moved up the ranks. Nowadays, the problem has inverted: there's a ton of content out there... and people tend not to link to outside content as much (trying to keep traffic internal). So it's not just about good content, but about promoting yourself to those who will link back to you. And SEO became a much bigger business.


Networking : Cookies • EMail • Network Casting and Subnets • Never trust the Internet • Web Basics • Web Search Basics • What is DNS? • What is a WebApp?

Written: 2002.03.11 Edited: 2018.04.13