How to search the internet 3: how to actually *use* a search engine

08/04/2010

Okay, this is the 3rd part of my series on how to search the internet. Part 1 briefly covered the history of internet search; part 2 looked (again, very briefly) at how modern search engines like Google work; and now we come to the nitty-gritty: how we actually use a search engine to find the stuff we want. Part 4 will look at how to understand the results brought by the search engine.

The first method we’ll look at is the method most people think of using – the keyword search. This involves thinking about what it is you want to find, then choosing words that concisely describe the object of your search.

Imagine we want to find out about giraffes. We could type the word giraffe into Google (or whatever search site we’re using – I’m going to use Google as an example, but this should be relevant for other search engines too). So, we tell Google to search for giraffe: and Google turns up 8,550,000 possible results.

Now, that is an awful lot of web pages to trawl through to find whatever it is we want to know about giraffes. And a lot of them won’t be relevant at all. If you look at the result page for the search on giraffe you’ll see that the very first result is for Giraffe Restaurants – probably not what we want.

So we need to think about what we’re looking for. Let’s imagine we want to know about what giraffes eat. We could search for giraffe food – but then we run into that problem of sites for restaurants and other human food companies with the word “giraffe” in their names.

Giraffe feeding habits would be better – it garners us some 274,000 results – but also excludes pages where the word “habits” isn’t used, for instance where the phrase “feeding behaviour” is used instead. So what about using giraffe feeding? Google gives us 618,000 results for that, but it may include useful pages that we might otherwise miss.

And what about giraffe feed? That gives us a massive 1,300,000 results; but this more comprehensive search will give us “feeds” and “feeding”, and sentences like “how a mother giraffe feeds its young” and “what kind of feed for a giraffe in captivity”; and Google will also pick up on the word “fed”, which might be important. It all depends on what exactly we want to find.

But anyway, both 274,000 and 1,300,000 are a lot of results to trawl through. So some further refining of search terms may be in order. What precisely do we want to know? Are we after info relating to giraffe feeding habits in captivity rather than in the wild? If so, we can search for giraffe feed captivity which gets us just 21,500 results! Yes, 21,500 links is still a lot to check, but it’s an awful lot better than the 8,550,000 pages we started with!

Sometimes users might want to use search terms in the form of a question – for example what should you feed a giraffe in captivity. This is generally considered to be a bad idea, because a question might contain words that wouldn’t actually appear in a web page you want to find. Google has a feature called stop words: this tells Google to not search specifically for certain commonly used words (like you, what, in… I’m sure you get the idea). But sometimes we might want Google to include stop words in its search. For instance, we might be searching for references to the movie “How The West Was Won”. To do this, we use quotation marks in the search terms; i.e. we tell Google to look for “how the west was won”. Then Google will look for pages that include the complete phrase.
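To make the stop word idea a bit more concrete, here is a rough Python sketch of what dropping stop words from a query might look like. The word list and the function name strip_stop_words are made up for illustration; this is not Google's real list or code, just a toy that leaves anything inside quotation marks alone.

```python
import re

# Hypothetical stop-word list, for illustration only (not Google's actual list).
STOP_WORDS = {"a", "an", "the", "in", "of", "to", "you", "what", "should", "how", "was"}

def strip_stop_words(query):
    """Drop stop words from a query, but keep quoted phrases exactly as typed."""
    kept = []
    # Each token is either a "quoted phrase" or a single run of non-space characters.
    for token in re.findall(r'"[^"]*"|\S+', query):
        if token.startswith('"'):
            # Quoted: keep every word, stop words included.
            kept.append(token)
        elif token.lower() not in STOP_WORDS:
            kept.append(token)
    return " ".join(kept)

print(strip_stop_words("what should you feed a giraffe in captivity"))
print(strip_stop_words('"how the west was won"'))
```

With this toy list the question boils down to feed giraffe captivity, which is pretty much the keyword search we arrived at earlier, while the quoted film title comes through untouched.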

Another poor use of search terms would be to tell the search engine to look for articles on feeding giraffes or documents about the care of giraffes in captivity. While those would be reasonable instructions to give to a human, they are not appropriate terms for a search engine. Remember, a search engine is a computer program, and computer programs are stupid. They’re good at doing exactly what we tell them to do; but the pages we’re looking for probably wouldn’t contain the words “articles on…” or “documents about…” so using those phrases will exclude many pages that we would actually want.

Another way to craft useful search terms is through the use of operators. Operators are query words that have a special meaning to search engines. You can find some excellent examples of operators and how to use them at Googleguide.com; but I’ve provided a few examples here so you can quickly get an idea of what this operator thing is all about:

If we wanted to find info about recycling steel or recycling iron, we would use the operator OR, like this: recycle steel OR iron. If we used the search terms recycle steel iron without that OR, the search engine would look for pages that included the words “steel” and “iron”; it wouldn’t bother showing us pages that included just one of the words without the other and we might miss very useful pages. Using the capitalization with OR helps the search engine to understand that it’s meant as an operator.

If we wanted to find pages that contained information about Steve Davies but which explicitly did not mention snooker, we could use the operator NOT or the - symbol, like this: “steve davies” NOT snooker or “steve davies” -snooker. (Notice too that we’ve put the name Steve Davies in quotes, so the search engine knows we are specifically looking for the name Steve Davies and not looking for pages that mention, for example, Steve Jones and David Davies.)
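If it helps to see the logic behind OR and the minus sign, here is a small Python sketch. The four "pages" are invented, and the function names (match_all, match_with_or, exclude) are my own; a real search engine works on a vastly bigger index and does a lot more than whole-word matching.

```python
# Tiny made-up "web" for illustration; page names and text are hypothetical.
PAGES = {
    "page1": "how to recycle steel cans",
    "page2": "where to recycle iron railings",
    "page3": "recycle steel and iron at the scrapyard",
    "page4": "steel and iron prices",
}

def words(text):
    """Split a page's text into a set of lower-case words."""
    return set(text.lower().split())

def match_all(terms):
    """No operator: every term must appear on the page."""
    return sorted(name for name, text in PAGES.items() if set(terms) <= words(text))

def match_with_or(required, alternatives):
    """All required terms, plus at least one of the alternatives (the OR operator)."""
    return sorted(
        name for name, text in PAGES.items()
        if set(required) <= words(text) and set(alternatives) & words(text)
    )

def exclude(results, banned):
    """Drop any page containing the banned word (the minus sign)."""
    return [name for name in results if banned not in words(PAGES[name])]

print(match_all(["recycle", "steel", "iron"]))            # only the page with all three words
print(match_with_or(["recycle"], ["steel", "iron"]))      # pages with recycle and either metal
print(exclude(match_with_or(["recycle"], ["steel", "iron"]), "scrapyard"))
```

Without OR only page3 matches; with OR we also pick up page1 and page2, which each mention just one of the two metals.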

We can use the + symbol to help focus on a particular search term and possibly weed out others. For example, if we are looking for references to King Louis I of France in particular and not any other French kings, we can search for Louis +I France.

We can use the define: operator to learn the definition of a particular word. For example, to find out what the word “cantata” means, we can search for define:cantata. This will give us a selection of definitions of “cantata” from various online dictionaries.

If you aren’t too confident about the correct usage of operators, you can use the advanced search option with some/many/most search engines. For example, with Google you can click on Advanced search and you will be presented with a form that has a number of search term entry fields. This gives you a simple way of setting a number of parameters to your search. But if you’re confident with using operators, you can construct pretty complex search parameters using just the standard entry field.
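For the curious, here is a hedged Python sketch of how a query with operators could be assembled into a search link. The helper build_search_url and its parameters are hypothetical names of my own; the only things I'm assuming about Google are the /search address and the q parameter you can see in your browser's address bar after any search.

```python
from urllib.parse import urlencode

def build_search_url(terms, phrase=None, exclude=(), base="https://www.google.com/search"):
    """Combine a quoted phrase, plain terms and excluded words into one query and URL."""
    parts = []
    if phrase:
        parts.append('"' + phrase + '"')              # exact phrase goes in quotes
    parts.extend(terms)                               # ordinary keywords
    parts.extend("-" + word for word in exclude)      # minus sign excludes a word
    query = " ".join(parts)
    # urlencode escapes the quotes and turns spaces into + signs
    return query, base + "?" + urlencode({"q": query})

query, url = build_search_url([], phrase="steve davies", exclude=["snooker"])
print(query)
print(url)
```

The point is simply that everything the advanced search form does ends up as one line of text in the ordinary search box.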

When you are crafting terms for a particular search, it comes down to common sense at the end of the day. Just plugging in one or two search terms might be enough for a simple search; but if things aren’t really simple, you need to give some thought to what exactly it is that you’re looking for. If you want to know about how to use the knight in chess, a search for knight +chess will be much more useful than just typing in the word knight or the question how does the knight move in chess. A little forethought can save you a lot of time, by giving you a much shorter list of results to trawl through.

Well, I think that’ll do for this part of my guide to using search engines. I realize I haven’t provided a comprehensive list of all the operators available, or all the search strategies you can use – but it would have been pretty futile for me to even attempt that. There are a number of search engines, and they’re not all the same. I advise you have a look at www.googleguide.com to pick up some tips on using Google (the most popular search engine on the web), and The Spider’s Apprentice for some more general advice. But believe me: while both of those sites are very interesting, they’re certainly not essential. This blog post, with a dollop of common sense on the side, should get you plenty of useful results to any search queries you might make.

This isn’t the end though – oh no, not by a long chalk. I’ve told you how to get results – now you need to know how to use them. Which is what I’ll cover in Part 4 of my guide to internet search.



How to search the internet 2: how a modern web search works

29/03/2010

In the first instalment of this guide on how to search the internet, I gave a little history of the search engine: I covered Archie, Gopher, and site directories like the Open Directory Project. Those are the old technologies, all pretty much obsolete now. That brings us to the present day and the modern search engine.

When I write “modern search engine”, I mean web search sites like Google and Bing, because they all work in pretty much the same way – the only difference seems to be in the algorithms each service uses.

Now I could tell you all about spiders crawling the web and stuff, but I think most of you would just tune out after a couple of lines. So I will give you 2 lovely Youtube videos to watch instead:

The 3 Minute Guide to How Search Works:

A slightly longer video that looks at the subject from the perspective of a webmaster who wants to increase traffic to his site:

Watched them? Good. So now you have the basic idea: little programs called “bots”, “crawlers” or “spiders” are sent out to crawl over the world wide web, following links, and compiling lists of URLs that they consider to contain good information. And how do these mindless software automatons decide that the info is “good”? It all comes down to the algorithms.
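If you'd like to see the "following links" part in miniature, here is a toy Python sketch. The link graph is invented and lives entirely in memory, and the function name crawl is mine; a real crawler fetches pages over the network, and the clever part (deciding which pages are "good") is left out completely.

```python
from collections import deque

# Hypothetical link graph standing in for the web: page -> pages it links to.
LINKS = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["post1", "post2"],
    "post1": ["blog", "post2"],
    "post2": [],
    "orphan": ["home"],   # nothing links here, so a crawl from "home" never finds it
}

def crawl(start):
    """Breadth-first crawl: visit a page, queue any links not seen before, repeat."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in LINKS.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

print(crawl("home"))
```

Notice that the "orphan" page never turns up: a crawler that only follows links can't find a page that nothing links to.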

It’s Google’s algorithms – the “secret ingredient” – that have made Google the world’s favourite search engine and kept them at the top for so many years. Any coder of sufficient proficiency can create bots to crawl the web; but it’s the secret algorithms that turn a regular bot into a googlebot. And there just hasn’t been another bot that can compete.

At least that’s how it has seemed for some time. Yahoo has a hard core of admirers; Altavista.com has had success mostly due to its “Babel Fish” translation service blowing its rivals out of the water; but it’s only recently that a true contender for the title of Number One Search Engine has stepped up to challenge Google. That challenger’s name: Bing.

Microsoft has been trying for years to break into the search engine market, with a plethora of products: Live Search, Windows Live Search, MSN Search – they even tried to buy Yahoo, then made a deal with it, to get that Microsoft name up there with the giants – but nothing was able to make much impact on Google. Then in 2008 Microsoft (following the tried and tested strategy of “embrace, extend, extinguish”) bought a tech company called Powerset and, importantly, its “semantic technology”. Microsoft claim that their improved technology cuts down on the risk of “search overload”, when a user is inundated with millions of barely relevant results – something that can happen when using Google. And Microsoft has used the near-ubiquity of its web browser, by incorporating Bing into Internet Explorer 8. Google is still the number one search engine, but Microsoft has certainly made its mark on the territory.

So who’s going to win this battle of the search engines? I think it could still go either way. Google has years of good form and a hell of an online presence; but Microsoft still owns the desktop and the browser. And anyway, someone else might come from the left field and clinch it in the final seconds – Ixquick is a potential outside bet with their whole “ethical privacy” trip; Google’s got the “Don’t be evil” motto but it’s Ixquick who are out there actually being “not evil” (and if privacy is a major concern, don’t forget Scroogle). One thing we should have learnt from IT history is that nothing is set in stone.

I’ll bet you’re thinking “Oh well done Google and Microsoft, give yourselves a pat on the back… but what in hell has any of this got to do with how to use a goddamn search engine?!!” I figured it would be useful to cover all this history and present situation stuff. Well, maybe interesting rather than useful… I certainly find this kinda crap fascinating. But you’re right, it doesn’t tell us a great deal about how to use a search engine. So I promise: the next instalment of this howto will actually cover some proper howto material. So keep ’em peeled… you definitely don’t want to miss this!!



How to search the internet 1: the history of search

29/03/2010

This is the first part of my guide to web search; the second part is here; part 3 is here. Part 4 is here.

The first search engine is widely considered to be Archie: a tool for indexing FTP archives which enabled users to locate resources. Its first implementation was written in 1990 by Alan Emtage, Bill Heelan, and J. Peter Deutsch, then students at McGill University in Montreal. It started off as basic lists of files that were accessed using the Unix command grep. Later, more efficient front- and back-ends were developed, and the system spread from a local tool, to a network-wide resource, to a popular service available from multiple sites around the Internet. The archie servers could be accessed in various ways: by use of a local client; by telnet; by email; and later through the World Wide Web. As the web became more widespread, its simpler interface made archie obsolete, and now there are very few archie servers to be found on the internet. Wikipedia mentions an archie gateway still up in Poland; maybe that’s the last one?

Then there was Gopher – a protocol for distributing, searching, and retrieving documents over the Internet, dating from about 1991 and used throughout the 1990s. It was a predecessor of, then for a while an alternative to, the World Wide Web. Wikipedia describes it as:

a TCP/IP Application layer protocol designed for distributing, searching, and retrieving documents over the Internet, and was a predecessor, and later, an alternative to the World Wide Web. The protocol offers some features not natively supported by the Web and imposes a much stronger hierarchy on information stored on it. Its text menu interface is well-suited to computing environments that rely heavily on remote computer terminals, common in universities at the time of its creation in 1991 until 1993.

Gopher was called Gopher for 3 reasons:

1. Users instruct it to “go for” information;
2. It does so through a web of menu items allegedly analogous to gopher holes;
3. It was developed at the University of Minnesota, whose sports teams are the “Golden Gophers”.

Its user interface (text, based on menus) suited the computer environment of the 1990s – mostly command-line interface on remote terminals. But by the late 90s, as graphical interfaces to the internet became more common (thanks to web browsers like Mosaic, whose integration of text and images was much more user-friendly than Gopher’s text-menu approach) Gopher was in decline. Although it still exists on the internet, it is used mostly for nostalgic reasons.

As the web became ubiquitous, and huge numbers of websites were created, organisations began to collate lists of these sites into directories. Yahoo, Lycos and the Open Directory are examples. These directories listed sites in categories by content: for instance, if you were looking for a particular site about photography, you would look through Yahoo’s list of photographic sites.

But as the web grew ever bigger, it seemed to many people that directories became too unwieldy: if you’re looking for a site about a particular photographer and you’re confronted with a list of 50,000 sites, you’ll probably give up in despair. This is where the modern search engine comes in – the likes of Google and Bing. We’ll get into all that in the next instalment of this little guide to internet search.



Ixquick.com: the internet search engine that respects your privacy

30/12/2009

Most search engines, like Google and Yahoo, earn their keep by selling targeted advertising. The way in which they decide what adverts should be targeted at you varies from search engine to search engine. Some simply display adverts that seem appropriate to the search items you just used. But there are other, possibly more sinister methods. Google, for instance, uses cookies and browser interaction to build up a detailed record of the websites you’ve been to. Although Google say they store this info just for ad targeting and related uses, there’s always the possibility that these incredibly detailed logs could be used for evil. Let’s say that Barack Obama suddenly became a Fascist dictator and ordered Google to hand over all its records so he could use them to help choose victims to persecute. Google is a law-abiding corporation, so they’d give up the info in a heartbeat (and don’t try to kid yourself otherwise – they don’t mind helping out the Chinese government, so what makes you think they wouldn’t obey US authorities?).

Unsurprisingly, some people don’t like the idea of their info being collated. And there are various ways to avoid Google’s lists. For instance there’s Scroogle.org – an independent search site that actually uses Google to locate pages for you but removes all identifying data from the search so Google doesn’t know who initiated the search. If you use Firefox, there’s an add-on called CustomizeGoogle that can anonymize your Google userid as well as tweaking your search preferences in various ways.

But both of those solutions involve using Google. Many people are happy with that as they think Google’s the best search engine out there. But if you object to the way Google collects personal data on its users, perhaps you should boycott Google altogether. So is it possible to reject Google completely and still use a search engine that is good but doesn’t compile lists on you?

Well I think that’s perfectly possible – by using Ixquick.com. Ixquick says it’s the only search engine that does not collect users’ IP addresses. And it isn’t all just hype – on July 14th, 2008 Ixquick was awarded the first European Privacy Seal. This means that Ixquick is the only EU-approved search engine. Pretty damn impressive.

But all of that means nothing if Ixquick searches are crap. So how does Ixquick measure up to Google? Not too shabby, I think. I’ve been comparing the two search engines side by side for a couple of weeks, and on the whole Ixquick has performed very well. But don’t take my word for it – check it out for yourself. Believe me, if you are at all concerned about your privacy, it’s definitely worthwhile for you to at least take a look at Ixquick. Maybe you’ll decide that Google is just too good to abandon. But at least take a look at Ixquick. Your privacy is valuable, so don’t give it up too cheaply.



ha ha bonk!!

26/08/2008

Linux tutorial #56: sudo

To help you get your head round the command “sudo”, there’s an excellent example of its usage below:

The cartoon is from the webcomic xkcd.com.  For your homework assignment go check it out!  The cartoon on the front page today (26 Aug) is funny too.  If you don’t get it, google “2 girls 1 cup”… or is it “1 cup 2 girls”?  Whatever, once you’ve seen it you’ll wish you hadn’t!


Dead Canoeist & Wife, happy in Panama City!

06/12/2007

darwin372x192.jpg

Here’s the photo that blew “dead canoeist” John Darwin’s amnesia story out of the water!

This photograph – reportedly discovered thanks to Google! – shows John Darwin and wife Anne, alive and smiling in Panama City on 14 July 2006… after Mr Darwin had “vanished” while canoeing, and his wife had collected on the life insurance!  WHOOPS!!

Anne Darwin yesterday said that her life had become a “nightmare” and admitted that the photograph showed her and her husband in an apartment they had just rented.

When she was confronted with the photograph she told the Daily Mirror, who originally published the image: “Yes, that’s him. My sons will never forgive me.”

And neither will the UK courts!  Or the insurance company!


How to find *free* music on the web!

15/09/2007

There are all these torrents and file-sharing services that everyone uses nowadays. I’m not going to tell you about them, in fact I don’t know very much about them. Never used them in my life. No, I use a much more sinister, arcane method to find music to download, for free, on the internet.

Google Is Your Friend(tm)

I’m being absolutely serious about this. If you use the search method I’m about to explain, you will likely find music files that you want, all for free. Illegal of course, but I won’t tell if you don’t 😉

Okay, let’s assume that you’re an Arctic Monkeys fan and you want to get hold of their album Brianstorm – but you don’t want to buy the CD, and the idea of paying for downloads is just too unnatural to you. Well, fire up Google (it’s at http://www.google.com, just in case you’ve never used it!), and in the search terms field type the following:

intitle:index.of mp3 arctic monkeys brianstorm

You’ll get a whole heap of search results – as is the Google way, alas – but if you sift through them you will be sure to find a site where some kind soul has posted all the tracks off Brianstorm. I know this, you see, because that’s how I got the album.

Unfortunately it can be a little difficult finding recent music – the mp3s are often out there, but raking through the Google results can be dispiriting. But if you’re like me – a fan of “old” music, like from the ’60s and ’70s – you’re laughing. I found every album ever released by The Doors – loads of Janis Joplin/Big Brother and the Holding Company including tracks that I don’t think were ever officially released – reggae reggae reggae, by Bob Marley of course, and Lee “Scratch” Perry, and hundreds of bands and singers I’d never heard of before. If you’ve got an eclectic, adventurous taste in music, Google will introduce you to stuff that you’ll love til the day you die!

I used to have a half-decent collection of Red Hot Chili Peppers CDs and cassettes, but they mysteriously disappeared when I moved house. I was gutted… then Google came to the rescue, and now I can keep the neighbours up all night with that heavy funk.

So just remember the simple formula. If you wanted Beatles tracks, you’d search for

intitle:index.of mp3 beatles

If you wanted reggae, no particular artist in mind, you’d type

intitle:index.of mp3 reggae

It’s a real simple formula – takes moments to learn, and will serve you well for years. So go on – fire up Google – see how much royalty money you can scam those stinking-rich rockstars for!!

PS:  I’ve amassed a good collection of tunes thanks to Google, and the many kindly souls who have posted their mp3 collections on the net.  So, one of these days, I swear to the Goddess, I’m gonna stick all my music files on a website so fellow musical adventurers can benefit from the internet’s bounty.  It needn’t cost me anything – there are loads of companies that do free hosting – and if the Recording Artistes Cartel lean on my host to shut me down, I’ll just move my mp3s to another site.  Music, like knowledge, wants to be free!  Let’s make it happen!

PPS: Just in case some droid from the music industry is reading this and planning to do me in court for piracy or copyright violation or whatever they call it – I’ve made it all up!  It’s just a pipe dream, a bit of fiction!  I’d never steal from the poor, impoverished record companies!  😉

copyleft.jpg

