How to search the internet 2: how a modern web search works

March 29, 2010

In the first instalment of this guide on how to search the internet, I gave a little history of the search engine: I covered Archie, Gopher, and site directories like the Open Directory Project. Those are the old technologies, all pretty much obsolete now. That brings us to the present day and the modern search engine.

When I write “modern search engine”, I mean web search sites like Google and Bing, because they all work in pretty much the same way; the only real difference seems to be in the algorithms each service uses.

Now I could tell you all about spiders crawling the web and stuff, but I think most of you would just tune out after a couple of lines. So I will give you two lovely YouTube videos to watch instead:

The 3 Minute Guide to How Search Works:

A slightly longer video that looks at the subject from the perspective of a webmaster who wants to increase traffic to his site:

Watched them? Good. So now you have the basic idea: little programs called “bots”, “crawlers” or “spiders” are sent out to crawl over the world wide web, following links, and compiling lists of URLs that they consider to contain good information. And how do these mindless software automatons decide that the info is “good”? It all comes down to the algorithms.
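For those who would rather read code than watch videos, here is a minimal sketch of the crawl-and-follow-links idea described above. It is a toy, not how Google or Bing actually do it: the in-memory TOY_WEB, the example.com-style URLs, and the names LinkExtractor, crawl and rank are all made up for illustration, and "count the inbound links" is only a crude stand-in for the secret algorithms that decide what is "good".

```python
from collections import deque
from html.parser import HTMLParser

# A made-up, in-memory "web" so the sketch runs without a network connection.
TOY_WEB = {
    "http://a.example/": '<a href="http://b.example/">B</a> <a href="http://c.example/">C</a>',
    "http://b.example/": '<a href="http://c.example/">C</a>',
    "http://c.example/": '<a href="http://a.example/">A</a>',
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def crawl(start_url, web):
    """Breadth-first crawl: visit a page, follow its links, repeat.

    Returns a dict mapping each URL to the number of links pointing at it.
    """
    seen = {start_url}
    queue = deque([start_url])
    inbound = {}
    while queue:
        url = queue.popleft()
        html = web.get(url)
        if html is None:
            continue  # dead link: nothing to fetch
        inbound.setdefault(url, 0)
        for link in extract_links(html):
            inbound[link] = inbound.get(link, 0) + 1
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return inbound


def rank(inbound):
    """Order URLs by inbound-link count (assumed scoring rule), ties by URL."""
    return sorted(inbound, key=lambda url: (-inbound[url], url))


counts = crawl("http://a.example/", TOY_WEB)
print(rank(counts))
```

On this toy web the page with the most links pointing at it comes out on top. A real crawler would also need to fetch pages over HTTP, respect robots.txt, and cope with billions of URLs, none of which is attempted here.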

It’s Google’s algorithms – the “secret ingredient” – that have made Google the world’s favourite search engine and kept it at the top for so many years. Any coder of sufficient proficiency can create bots to crawl the web; but it’s the secret algorithms that turn a regular bot into a googlebot. And there just hasn’t been another bot that can compete.

At least that’s how it has seemed for some time. Yahoo has a hard core of admirers; has had success mostly due to its “Babel Fish” translation service blowing its rivals out of the water; but it’s only recently that a true contender for the title of Number One Search Engine has stepped up to challenge Google. That challenger’s name: Bing.

Microsoft has been trying for years to break into the search engine market, with a plethora of products: Live Search, Windows Live Search, MSN Search – it even tried to buy Yahoo, then settled for a deal with it, to get that Microsoft name up there with the giants – but nothing was able to make much impact on Google. Then in 2008 Microsoft (following the tried and tested strategy of “embrace, extend, extinguish”) bought a tech company called Powerset and, importantly, its “semantic technology”. Microsoft claims that its improved technology cuts down on the risk of “search overload”, when a user is inundated with millions of barely relevant results – something that can happen when using Google. And Microsoft has exploited the near-ubiquity of its web browser by incorporating Bing into Internet Explorer 8. Google is still the number one search engine, but Microsoft has certainly made its mark on the territory.

So who’s going to win this battle of the search engines? I think it could still go either way. Google has years of good form and a hell of an online presence; but Microsoft still owns the desktop and the browser. And anyway, someone else might come out of left field and clinch it in the final seconds – Ixquick is a potential outside bet with their whole “ethical privacy” trip; Google’s got the “Don’t be evil” motto but it’s Ixquick who are out there actually being “not evil” (and if privacy is a major concern, don’t forget Scroogle). One thing we should have learnt from IT history is that nothing is set in stone.

I’ll bet you’re thinking “Oh well done Google and Microsoft, give yourselves a pat on the back… but what in hell has any of this got to do with how to use a goddamn search engine?!!” I figured it would be useful to cover all this history and present situation stuff. Well, maybe interesting rather than useful… I certainly find this kinda crap fascinating. But you’re right, it doesn’t tell us a great deal about how to use a search engine. So I promise: the next instalment of this howto will actually cover some proper howto material. So keep ’em peeled… you definitely don’t want to miss this!!

“Microsoft is good on security” shock bonk

January 26, 2010

I have listened to the IT security podcast Security Now for some time now, and on the whole I’ve considered the host, Steve Gibson, to be a fairly sensible fellow. But my faith in the guy has been shaken, big time, after he said some real crazy-assed shit in the latest show (episode 232).

Gibson and fellow host Leo Laporte were talking about how Microsoft has been making incremental improvements to the security profile of its infamous web browser Internet Explorer. IE8 is a lot more secure than IE6, they said. Which is a reasonable thing to say. But then Laporte uttered these incredible words: “Microsoft doesn’t have the greatest track record but I don’t think they’re particularly worse than anyone else [on security].” And the alleged security expert Gibson agreed!

Now, Laporte has a bit of an excuse. He’s a tech head, not a security guy. Yes, his technical background should tell him that Microsoft is a train wreck security-wise. But he’s a Microsoft fan in general, so we shouldn’t expect too much from him. Gibson, though, is a security professional – his hard disk data recovery utility, SpinRite, gets a lot of plaudits (many of them on his own site), and through his company GRC he sells a bunch of other security products. And the podcast generally makes excellent listening. So how can he be so deluded about Microsoft?

Because Microsoft is a truly appalling company when it comes to the security of its products (Microsoft is appalling in a lot of other ways too, but let’s concentrate on security here). For years the Windows operating systems have been infested with spyware, viruses, trojans and other malware. It’s only since Vista that Windows has had any decent security model at all. The browser Internet Explorer has long been a joke to most security-conscious computer users, most of whom use Firefox or Google Chrome/Chromium instead. IE is probably the vector for most of the attacks that take place over the internet. So even if we disregard IE’s other shortcomings, like its disregard for open standards embraced by the rest of the industry, it fails miserably when it comes to its users’ security.

Even Patch Tuesday – Microsoft’s vaunted update cycle – is a dangerous joke. Microsoft releases its software updates on the second Tuesday of every month (“whether they need to or not”, LOL). There could be a major 0-day vulnerability in the world’s most widespread personal computer software, threatening millions of users right now – but the fix won’t be released until the second Tuesday in the month comes round. And the computer criminals know this. They can engineer their attacks to make the most of the period between one Patch Tuesday and the next. If Mozilla (for example) discover a vuln in Firefox (for example) they will release the fix as soon as they can – usually within a couple of days. Microsoft will very very rarely release a fix before Patch Tuesday. And Gibson agrees with Laporte that Microsoft are “no worse than anyone else”? Crazy…
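To put a number on that waiting period, here is a rough sketch that works out the second Tuesday of a given month and how long a user could be left waiting for the next scheduled update. The function names patch_tuesday and days_until_next_patch are my own inventions, and the sketch only models the regular monthly schedule described above; it knows nothing about any out-of-cycle fixes.

```python
import calendar
from datetime import date


def patch_tuesday(year, month):
    """Return the second Tuesday of the given month."""
    # monthcalendar() gives one list per week, Monday first by default,
    # with 0 for days that fall outside the month.
    weeks = calendar.monthcalendar(year, month)
    tuesdays = [week[calendar.TUESDAY] for week in weeks if week[calendar.TUESDAY] != 0]
    return date(year, month, tuesdays[1])


def days_until_next_patch(today):
    """Days from `today` until the next scheduled Patch Tuesday (0 if today is one)."""
    this_month = patch_tuesday(today.year, today.month)
    if today <= this_month:
        return (this_month - today).days
    # This month's Patch Tuesday has already passed, so look at next month.
    if today.month == 12:
        year, month = today.year + 1, 1
    else:
        year, month = today.year, today.month + 1
    return (patch_tuesday(year, month) - today).days


print(patch_tuesday(2010, 1))                   # 2010-01-12
print(days_until_next_patch(date(2010, 1, 26)))  # 14
```

So a hole that becomes public the day after Patch Tuesday could, on the regular schedule alone, stay open for the best part of a month.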

Tell you what though, Security Now 232 is still worth a listen. I won’t list everything covered, I’ll just urge you all to check it out (download link here). My confidence in Gibson may have been shaken by his comments about Microsoft, but the fact remains that he knows a lot about his business. One thing I learned is that I’ve been pronouncing the word “kludge” incorrectly for years. “Kludge” is hacker-speak, meaning an inelegant solution to a problem. I’ve always pronounced it to rhyme with “budge”. But in the podcast Gibson and Laporte said it “klooj”. That bugged me, so I googled it. And Wikipedia (as well as many more sources) agrees that “kludge” is indeed pronounced “klooj”. So Gibson and Laporte were right about that. But they are dead wrong about Microsoft.

