How to search the internet 5: advanced operators

13/05/2010

This is part 5 of my guide to searching the internet. Here are links to:

Part 1: History of internet search;
Part 2: How a modern web search site works;
Part 3: How to actually use a modern search engine;
Part 4: How to understand the results you get from using a modern web search.

I covered the basic use of operators in part 3. But using operators properly is very important if you want to get the most from a search engine, especially if the search is at all complicated. So I’m going to go into more detail on the subject here. I got much of the info from other sites, especially www.GoogleGuide.com. But I (rather modestly) think that I present the info in a much more readable and usable form.

Okay, here we go. Operators are special words, or combinations of words and symbols, that mean more to the search engine than plain words used as simple search terms. Here’s a quick example, which you may find familiar if you’ve ever learned how to use Google to find mp3 files: Let’s imagine we want to find mp3 files of tracks by the excellent early British punk band The Clash. What we actually look for are listings of the contents of directories that contain the mp3 music files. So, we could use Google search terms like this:

intitle:index.of mp3 "the clash" -.html -.htm

Let’s examine that bit by bit. It starts with intitle:index.of. The intitle part tells Google to look in the title of a page for a particular word or phrase. In this instance, the phrase to look for is index.of (which, incidentally, matches titles that include either the string “index.of” or “index of”, i.e. with a space rather than a period). That’s how Google and most (all?) other modern search engines work. The reason for looking for a page whose title includes the phrase “index of” is that a web page listing the contents of a directory will very likely have a title containing those words.

The search also looks for the word mp3 and the phrase “the clash”. You’ll notice we used quotation marks around “the clash”. This is my personal preference: the band was called The Clash, so I want results that contain that band name. Some people disagree, thinking that this cuts out a lot of relevant results. And it’s true that some webmasters may have used just the word “clash” (without “the”) in the page title. But I think searching for the bare word “clash” would pull up lots of irrelevant results like “Clash of the Titans” and “clash of two cultures”. So I stick with the phrase “the clash”. Whether you go with my suggestion or not is up to you.

The last 2 operators in this search are -.html and -.htm. You see, we’re looking for a page that lists the contents of a directory. This is not a page that is meant to be viewed by site users; it has more of a “housekeeping” function. And as it isn’t meant to be viewed by general users, it is very unlikely to be an ordinary marked-up page. We’re not looking for marked-up pages, so we don’t want pages whose file names are suffixed .htm or .html. The minus sign (-) in front of a term excludes results that contain that term; it means the same as the NOT operator.
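To make the structure of that search easier to see, here is a minimal Python sketch that assembles the same query from its parts and turns it into a search URL. The function name build_query and its parameters are my own invention for illustration; they are not part of any Google tool. The sketch only builds text, it does not contact Google.

```python
from urllib.parse import quote_plus


def build_query(terms, phrases=(), excluded=(), intitle=None):
    """Assemble a search string from its parts (hypothetical helper).

    terms    -- plain words, e.g. ["mp3"]
    phrases  -- exact phrases, which get wrapped in quotation marks
    excluded -- terms to exclude, which get a leading minus sign
    intitle  -- optional word or phrase to look for in the page title
    """
    parts = []
    if intitle:
        parts.append(f"intitle:{intitle}")
    parts.extend(terms)
    parts.extend(f'"{phrase}"' for phrase in phrases)
    parts.extend(f"-{term}" for term in excluded)
    return " ".join(parts)


query = build_query(
    ["mp3"],
    phrases=["the clash"],
    excluded=[".html", ".htm"],
    intitle="index.of",
)
print(query)  # intitle:index.of mp3 "the clash" -.html -.htm

# The query has to be URL-encoded before it goes into the q= parameter.
print("https://www.google.com/search?q=" + quote_plus(query))
```

Swapping the phrase or the excluded terms is then a one-line change, which is handy if you want to try the search with and without the quotation marks around “the clash”.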

So, that was just a quick example of how operators are used to help construct a search term. Now let’s have a look at what operators are available to a search engine user:

city1 city2: this will look for info on flights from city 1 to city 2. We don’t use the actual names of the cities though; we use the 3-letter airport codes. For instance, the search sfo bos will pull up times and info on flights from San Francisco to Boston, whereas the search san francisco boston pulls up some flight info but also a lot of unrelated results. You can find the 3-letter codes for airports worldwide here.

Here’s some more stuff about advanced Google search operators (with thanks to GoogleGuide.com):

allinanchor:
If you begin your query with allinanchor:, Google restricts results to pages containing all the query terms you specify in the anchor text on links to the page. Example: the query allinanchor: best museums birmingham will return only pages in which the anchor text on links to the pages contains the words best, museums and birmingham.

Anchor text is the text on a page that is linked to another web page or a different place on the current page. When you click on anchor text, you will be taken to the page or place on the page to which it is linked. When using allinanchor: in your query, do not include any other search operators. The functionality of allinanchor: is also available through the Advanced Web Search page, under Occurrences.

allintext:
If you start your query with allintext:, Google restricts results to those containing all the query terms you specify in the text of the page. For example, allintext: travel packing list will return only pages in which the words “travel”, “packing” and “list” appear in the text of the page. This functionality can also be obtained through the Advanced Web Search page, under Occurrences.
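Here is a small Python sketch of how the two allin operators above are put together. The helper name allin_query is hypothetical, made up for this example. The guide only says not to mix other operators with allinanchor:, so applying the same check to allintext: is a cautious assumption on my part, as is the simple way the code detects an operator (a colon or a leading minus sign).

```python
ALLIN_OPERATORS = {"allinanchor", "allintext"}


def allin_query(operator, words):
    """Build an 'allin' query such as 'allintext: travel packing list'.

    Hypothetical helper: rejects words that look like other operators,
    since allinanchor: should not be combined with them.
    """
    if operator not in ALLIN_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    for word in words:
        # Assumption: a colon or a leading minus marks another operator.
        if ":" in word or word.startswith("-"):
            raise ValueError(f"do not mix other operators in: {word}")
    return f"{operator}: " + " ".join(words)


print(allin_query("allinanchor", ["best", "museums", "birmingham"]))
print(allin_query("allintext", ["travel", "packing", "list"]))
```

The two calls reproduce the example queries given above; passing something like -.html in the word list raises an error instead of quietly building a query that may not behave as expected.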


How to search the internet 1: the history of search

29/03/2010

This is the first part of my guide to web search; the second part is here; part 3 is here. Part 4 is here.

The first search engine is widely considered to be Archie: a tool for indexing FTP archives which enabled users to locate resources. Its first implementation was written in 1990 by Alan Emtage, Bill Heelan, and J. Peter Deutsch, then students at McGill University in Montreal. It started off as basic lists of files that were searched using the Unix command grep. Later, more efficient front- and back-ends were developed, and the system spread from a local tool, to a network-wide resource, to a popular service available from multiple sites around the Internet. The Archie servers could be accessed in various ways: by use of a local client; by telnet; by email; and later through the World Wide Web. As the web became more widespread, its simpler interface made Archie obsolete, and now there are very few Archie servers to be found on the internet. Wikipedia mentions an Archie gateway still up in Poland; maybe that’s the last one?

Then there was Gopher, a protocol for distributing, searching, and retrieving documents over the Internet, dating from about 1991 and used throughout the 1990s. It was a predecessor of, then for a while an alternative to, the World Wide Web. Wikipedia describes it as:

a TCP/IP Application layer protocol designed for distributing, searching, and retrieving documents over the Internet, and was a predecessor, and later, an alternative to the World Wide Web. The protocol offers some features not natively supported by the Web and imposes a much stronger hierarchy on information stored on it. Its text menu interface is well-suited to computing environments that rely heavily on remote computer terminals, common in universities at the time of its creation in 1991 until 1993.

Gopher was called Gopher for 3 reasons:

1. Users instruct it to “go for” information;
2. It does so through a web of menu items allegedly analogous to gopher holes;
3. It was developed at the University of Minnesota, whose sports teams are the “Golden Gophers”.

Its user interface (text, based on menus) suited the computer environment of the 1990s: mostly command-line interfaces on remote terminals. But by the late 90s, as graphical interfaces to the internet became more common (thanks to web browsers like Mosaic, whose integration of text and images was much more user-friendly than Gopher’s text-menu approach), Gopher was in decline. Although it still exists on the internet, it is used mostly for nostalgic reasons.

As the web became ubiquitous, and huge numbers of websites were created, organisations began to collate lists of these sites into directories. Yahoo, Lycos and the Open Directory are examples. These directories listed sites in categories by content: for instance, if you were looking for a particular site about photography, you would look through Yahoo’s list of photographic sites.

But as the web grew ever bigger, it seemed to many people that directories became too unwieldy: if you’re looking for a site about a particular photographer and you’re confronted with a list of 50,000 sites, you’ll probably give up in despair. This is where the modern search engine comes in – the likes of Google and Bing. We’ll get into all that in the next instalment of this little guide to internet search.
