How to search the internet 5: advanced operators

13/05/2010

This is part 5 of my guide to searching the internet. Here are links to:

Part 1: History of internet search;
Part 2: How a modern web search site works;
Part 3: How to actually use a modern search engine;
Part 4: How to understand the results you get from using a modern web search.

I covered basic use of operators in part 3. But the proper use of operators is very important if you want to get the most from a search engine, especially if the search is at all complicated. So I’m going to go into more detail on the subject here. I got much of the info from other sites, especially www.GoogleGuide.com. But I (rather modestly) think that I present the info in a much more readable and usable form.

Okay, here we go. Operators are special words, or combinations of words and symbols, that mean more to the search engine than ordinary search terms do. Here’s a quick example, which may be familiar if you’ve ever learned how to use Google to find mp3 files. Let’s imagine we want to find mp3 files of tracks by the excellent early British punk band The Clash. What we actually look for are listings of the contents of directories that contain the mp3 music files. So, we could use Google search terms like this:

intitle:index.of mp3 “the clash” -.html -.htm

Let’s examine that bit by bit. It starts with intitle:index.of. The intitle: part tells Google to look in the title of a page for a particular word or phrase. In this instance, the phrase to look for is index.of, which will, incidentally, match titles that include either the string “index.of” or “index of” (ie with a space rather than a period). That’s how Google, and most (maybe all) other modern search engines, handle a period between two words. The reason for looking for a page whose title includes the phrase “index of” is that a web page listing the contents of a directory will very likely have a title containing those words. The search also looks for the word mp3 and the phrase “the clash”. You’ll notice we put quotation marks around “the clash”. This is my personal preference: the band was called The Clash, so I want results that contain that band name. Some people disagree, thinking the quotation marks cut out a lot of relevant results. And it’s true that some webmasters may have used just the word “clash”, without the “the”. But I think searching for the bare word “clash” would pull up lots of irrelevant results like “Clash of the Titans” and “clash of two cultures”. So I stick with the phrase “the clash”. Whether you go with my suggestion or not is up to you.

The last 2 operators in this search are -.html and -.htm. You see, we’re looking for a page that lists the contents of a directory. This is not a page that is designed to be viewed by site users; it has more of a “housekeeping” function. And as it isn’t meant to be viewed by general users, it is very unlikely to be an ordinary marked-up web page. We’re not looking for marked-up pages, so we exclude pages containing .htm or .html. The minus sign in front of a term works as a NOT operator: it tells Google to leave out any page containing that term.
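If you build searches like this often, it can help to assemble them from their parts. Here’s a minimal sketch in Python, assuming you just want the query text plus a link you can paste into a browser. The helper names build_query and search_url are my own inventions, not part of any Google API; the only real pieces are the standard www.google.com/search address with its q parameter, and Python’s urlencode, which escapes the query for use in a URL. Note the code uses plain straight quotation marks around the phrase.

```python
from urllib.parse import urlencode


def build_query(title_phrase, terms, exact_phrase, excluded):
    """Assemble a Google-style query string from its parts (hypothetical helper)."""
    parts = [f"intitle:{title_phrase}"]             # word/phrase must be in the page title
    parts.extend(terms)                              # ordinary search terms
    parts.append(f'"{exact_phrase}"')                # quotes keep the phrase together
    parts.extend(f"-{term}" for term in excluded)   # leading minus sign = NOT
    return " ".join(parts)


def search_url(query):
    """Turn a query string into a Google search link, URL-encoding the query."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query("index.of", ["mp3"], "the clash", [".html", ".htm"])
print(query)              # intitle:index.of mp3 "the clash" -.html -.htm
print(search_url(query))  # https://www.google.com/search?q=intitle%3Aindex.of+mp3+%22the+clash%22+-.html+-.htm
```

Keeping the excluded terms in a separate list makes it easy to add more (say, -.php) without retyping the whole search.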

So, that was just a quick example of how operators are used to help construct a search term. Now let’s have a look at what operators are available to a search engine user:

city1 city2: this will look for info on flights from city 1 to city 2. We don’t use the actual names of the cities, though; we use their 3-letter airport codes. For instance, the search sfo bos will pull up times and info on flights from San Francisco to Boston, whereas the search san francisco boston pulls up some flight info but also a lot of unrelated results. You can find the 3-letter codes for airports worldwide here.
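The swap from city names to airport codes is easy to automate. Here’s a small Python sketch, assuming a hand-made lookup table: AIRPORT_CODES and flight_query are hypothetical names, and the table holds only the two airports from the example above (a real one would cover the full worldwide list). Google’s flight features may well have changed over time, so treat the result as a query to try rather than a guarantee.

```python
# Hypothetical lookup table; a real one would hold every airport you care about.
AIRPORT_CODES = {
    "san francisco": "SFO",
    "boston": "BOS",
}


def flight_query(origin_city, destination_city):
    """Build the 'city1 city2' flight search from two city names (hypothetical helper)."""
    try:
        origin = AIRPORT_CODES[origin_city.lower()]
        destination = AIRPORT_CODES[destination_city.lower()]
    except KeyError as missing:
        raise ValueError(f"no airport code known for {missing}") from None
    # Order matters: the first code is where you fly from, the second where you fly to.
    return f"{origin} {destination}".lower()


print(flight_query("San Francisco", "Boston"))  # sfo bos
```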

Here’s some more stuff about advanced Google search operators (with thanks to GoogleGuide.com):

allinanchor:
If you begin your query with allinanchor: Google restricts results to pages containing all query terms you specify in the anchor text on links to the page. Example: the query allinanchor: best museums birmingham will return only pages in which the anchor text on links to the pages contains the words best, museums and birmingham.

Anchor text is the text on a page that is linked to another web page or a different place on the current page. When you click on anchor text, you will be taken to the page or place on the page to which it is linked. When using allinanchor: in your query, do not include any other search operators. The functionality of allinanchor: is also available through the Advanced Web Search page, under Occurrences.
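To make the idea concrete, here’s a toy model in Python of what allinanchor: filters on. It is not how Google works internally: the pages and anchor texts are made up, and pooling all the anchor text pointing at a page into one bag of words is my assumption about how “all query terms” gets applied. What it does show is the key point: the words are checked in the links pointing to a page, not in the page itself.

```python
import re

# Made-up example data: for each page, the anchor texts of links that point TO it.
INBOUND_ANCHORS = {
    "museums-guide.example": ["Best museums in Birmingham", "Birmingham museums"],
    "art-blog.example": ["Best art blog", "Birmingham galleries"],
}


def allinanchor(query, anchors_by_page):
    """Return the pages whose inbound anchor text contains every query word."""
    terms = query.lower().split()
    matches = []
    for page, anchors in anchors_by_page.items():
        # Pool the words from every link pointing at this page (an assumption).
        anchor_words = set(re.findall(r"[a-z0-9]+", " ".join(anchors).lower()))
        if all(term in anchor_words for term in terms):
            matches.append(page)
    return matches


print(allinanchor("best museums birmingham", INBOUND_ANCHORS))
# ['museums-guide.example']
```

The art blog drops out because nobody links to it with the word “museums”, even though both of the other words appear in its inbound links.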

allintext:
If you start your query with allintext: Google restricts results to those containing all the query terms you specify in the text of the page. For example, allintext: travel packing list will return only pages in which the words “travel”, “packing” and “list” appear in the text of the page. This functionality can also be obtained through the Advanced Web Search page, under Occurrences.
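Here’s a similar toy sketch for allintext:, again with invented pages and again not Google’s real machinery (real search engines also do things like matching word variants, which this ignores). The page tips.example shows the difference this operator makes: “travel” appears only in its title, not in its text, so allintext: leaves it out, even though an ordinary search might well find it.

```python
import re

# Made-up pages: the title is kept separate from the text of the page.
PAGES = [
    {"url": "tips.example", "title": "Travel tips", "text": "My packing list for every trip"},
    {"url": "lists.example", "title": "Checklists", "text": "Travel: a complete packing list"},
]


def words(text):
    """Lower-case a string and split it into plain words, dropping punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def allintext(query, pages):
    """Return URLs of pages whose body text (ignoring the title) has every query word."""
    terms = query.lower().split()
    return [page["url"] for page in pages if all(t in words(page["text"]) for t in terms)]


print(allintext("travel packing list", PAGES))  # ['lists.example']
```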


How to search the internet 4: Understanding search engine results

12/05/2010

This is the fourth part of my guide on how to search the internet. Part 1 is here, part 2 is here, and part 3 is here. Part 5, about using “advanced operators”, is here.

So you’ve used Google or some other web search engine, following the tips I’ve given you in this little series, and you’ve been confronted with “results” that don’t actually seem to be any help whatsoever. And it’s true, Google can often come across as an incomprehensible joke designed to make you feel bad. But don’t fret: Google (and its kind) really don’t want you to run screaming; they want you to use the results to find what it is you’re looking for. Unfortunately, this may involve having to learn a thing or two about how Google works. A results page can look rather intimidating the first time you see it, but really Google want you to find their results pages easy to comprehend, because they want you to return to Google.com every time you need help finding something. It’s all pretty simple really: you just need to know how to understand the reams of info Google throws at you. Hopefully, this 4th part of my guide will make it all seem far easier.

First things first: very often Google will offer you a list of sponsored results that look like they may give you what you’re looking for; but if you click on a sponsored link you will be putting money in Mr Google’s pocket, and chances are that link will be useless. Forget the sponsored links: go for the meat and potatoes in the list of real links.

Look at the search results; very often you will find other kinds of info alongside those results. Stuff like:

Suggested spelling corrections: Google may think you typed in your query incorrectly. If you’re no good at spelling, this can be a life-saver. But if you know damn well you typed your query correctly, forget this option;

Dictionary definitions: Are you actually searching for the word/s you mean to search for? Maybe you are, maybe you’re not. Think about it. Spelling can be a right tricky operation;

Cached pages: Google keeps stored copies of a huge number of pages, and those copies may be older than the live versions. Maybe one of those cached pages contains the info you need. Worth checking if regular searches are turning up sweet F-all;

Similar pages: Often Google won’t find a page that contains the precise info you want, but it has algorithms to turn up similar results. Have a look at them, you’ve nothing to lose really…;

News headlines: A webpage dealing with your query might be hard to find, but it’s often easier for Google to find news stories on related material. And these news stories may well include links to more relevant info. This can save you a bunch of time searching for that little nugget of info that will give you what you want. Remember: news stories are updated frequently, whereas a static page may go a long time without being updated. Use those options;

Product search: You want to know something about a particular product. So search for the product name, add a bit of info on what the product can do or is meant to do, and see what turns up. This approach works a lot more often than you might think;

Translation: So what you want isn’t available in your mother tongue. But it may well be out there for speakers of other languages. Just think: if you are looking for info on a product released by a Portuguese company, what makes you think that info will be in English? Search Portuguese sites, using Google’s Translation feature or the other translators offered by search services. These translators are often pretty crap, but at least they’ll give you a good idea of what’s what;

Do book searches: Useful info may not yet be available in articles, but books often contain useful stuff. So it can often be a good idea to do a book search;

Cached pages: When a web page is undergoing a lot of changes, clicking on a Google link to a page might take you to the latest version of that page, which may be missing information that was presented some time before. Sometimes these changes happen frequently, so a Google link may no longer take you to the info that the search results first suggested.

Fortunately, Google will often cache an earlier version of the page. So, let’s say a particular page contained the info you want yesterday, but today’s version of the page no longer holds that info. A problem? Not necessarily. Next to the Google link to the updated page will be a link to a cached version of the page; basically, a copy of the page that Google downloaded and stored before the important info was removed. Click through to the cached page and you will find the info as it was before it got removed. Google’s system of caching certain pages helps ensure that the history of the web is respected, at least to a certain extent.

If you want to see a version of a page that existed longer ago (several weeks, or months, maybe even years) you can go to The Wayback Machine at archive.org. This is a project to archive internet sites the way they were in the past, so the current generation’s “now now now” attitude doesn’t drive the history of internet sites into oblivion. The Wayback Machine doesn’t promise to archive the internet of the past forever, but it is a very useful project with a multitude of potential uses. Archive.org, like most such projects, is a non-profit and is always in need of financial support, as well as more practical support such as providing servers. I’d advise anyone who finds such projects useful to contribute, even just a few dollars.
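If you use the Wayback Machine a lot, it’s handy to know that its addresses follow a predictable pattern: web.archive.org/web/, then a timestamp written as YYYYMMDDhhmmss, then the original address. Here’s a small Python sketch that builds such a link; wayback_url is just a name I made up, not an official tool. If there is no copy from exactly that moment, the Wayback Machine normally sends you to the closest snapshot it has, and if it never archived the page at all you’ll get a page saying so.

```python
from datetime import datetime


def wayback_url(page_url, when):
    """Build a Wayback Machine address for page_url as archived at (or near) `when`."""
    # Wayback timestamps are written as YYYYMMDDhhmmss.
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


print(wayback_url("http://www.googleguide.com/", datetime(2010, 5, 13)))
# https://web.archive.org/web/20100513000000/http://www.googleguide.com/
```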

There’s a lot of info on how to understand Google results, and how to configure the way Google works so it gives you the info you want and hopefully protects your privacy, here: http://www.googleguide.com/category/understanding-results/. I really advise anyone who’s seriously into using Google as best they can to check out this info. Google really is one of the best resources available online… and it’s free! Let’s make the most of it while we can! Before the goddamn Man tries to take it away from us!
