How to search the internet 2: how a modern web search works

March 29, 2010

In the first instalment of this guide on how to search the internet, I gave a little history of the search engine: I covered Archie, Gopher, and site directories like the Open Directory Project. Those are the old technologies, all pretty much obsolete now. That brings us to the present day and the modern search engine.

When I write “modern search engine”, I mean web search sites like Google and Bing. I lump them together because they all work in pretty much the same way; the only real difference seems to be in the algorithms each service uses.

Now I could tell you all about spiders crawling the web and stuff, but I think most of you would just tune out after a couple of lines. So I will give you two lovely YouTube videos to watch instead:

The 3 Minute Guide to How Search Works:

A slightly longer video that looks at the subject from the perspective of a webmaster who wants to increase traffic to his site:

Watched them? Good. So now you have the basic idea: little programs called “bots”, “crawlers” or “spiders” are sent out to crawl over the world wide web, following links, and compiling lists of URLs that they consider to contain good information. And how do these mindless software automatons decide that the info is “good”? It all comes down to the algorithms.
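For the curious, here’s roughly what that link-following looks like, as a toy Python sketch. It’s an illustration under big assumptions, not how Google or Bing actually do it. The “web” here is a hypothetical four-page dictionary (the `*.example` URLs and the names `TOY_WEB` and `crawl` are made up), and there’s no real downloading, no robots.txt, and none of the secret ranking algorithms. All it shows is the basic crawl: start from a seed URL, follow every link breadth-first, and compile a list of URLs, skipping pages already seen.

```python
from collections import deque

# Hypothetical, in-memory "web": each URL maps to the links found on that page.
# A real crawler would fetch each page over HTTP and parse the links out of the HTML.
TOY_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/", "http://a.example/"],
    "http://c.example/": ["http://d.example/"],
    "http://d.example/": [],
}


def crawl(seed, web, max_pages=100):
    """Breadth-first crawl from a seed URL; returns URLs in the order visited."""
    seen = {seed}             # every URL ever queued, so nothing is queued twice
    frontier = deque([seed])  # URLs waiting to be visited, oldest first
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in web.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


if __name__ == "__main__":
    for url in crawl("http://a.example/", TOY_WEB):
        print(url)
```

Run it and the bot lists all four pages, starting with the seed and ending with `http://d.example/`, which it can only reach via c. The `seen` set is what stops the bot going round in circles when pages link back to each other, which on the real web they constantly do.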

It’s Google’s algorithms – the “secret ingredient” – that have made Google the world’s favourite search engine and kept it at the top for so many years. Any coder of sufficient proficiency can create bots to crawl the web, but it’s the secret algorithms that turn a regular bot into a Googlebot. And there just hasn’t been another bot that can compete.

At least that’s how it has seemed for some time. Yahoo has a hard core of admirers; Altavista.com has had success mostly due to its “Babel Fish” translation service blowing its rivals out of the water; but it’s only recently that a true contender for the title of Number One Search Engine has stepped up to challenge Google. That challenger’s name: Bing.

Microsoft has been trying for years to break into the search engine market, with a plethora of products: Live Search, Windows Live Search, MSN Search – they even tried to buy Yahoo, then settled for a deal with it, to get that Microsoft name up there with the giants – but nothing was able to make much impact on Google. Then in 2008 Microsoft (following the tried and tested strategy of “embrace, extend, extinguish”) bought a tech company called Powerset and, importantly, its “semantic technology”. Microsoft claim that this technology cuts down on the risk of “search overload”, where a user is inundated with millions of barely relevant results, something that can happen when using Google. Microsoft has also exploited the near-ubiquity of its web browser by incorporating Bing into Internet Explorer 8. Google is still the number one search engine, but Microsoft has certainly made its mark on the territory.

So who’s going to win this battle of the search engines? I think it could still go either way. Google has years of good form and a hell of an online presence; but Microsoft still owns the desktop and the browser. And anyway, someone else might come out of left field and clinch it in the final seconds. Ixquick is a potential outside bet with their whole “ethical privacy” trip; Google’s got the “Don’t be evil” motto but it’s Ixquick who are out there actually being “not evil” (and if privacy is a major concern, don’t forget Scroogle). One thing we should have learnt from IT history is that nothing is set in stone.

I’ll bet you’re thinking “Oh well done Google and Microsoft, give yourselves a pat on the back… but what in hell has any of this got to do with how to use a goddamn search engine?!!” I figured it would be useful to cover all this history and present situation stuff. Well, maybe interesting rather than useful… I certainly find this kinda crap fascinating. But you’re right, it doesn’t tell us a great deal about how to use a search engine. So I promise: the next instalment of this howto will actually cover some proper howto material. So keep ’em peeled… you definitely don’t want to miss this!!



Microsoft request takedown of Cryptome.org… then change their minds out of the goodness of their black hearts…

February 26, 2010

Well, Cryptome.org, one of my favourite repositories of “documents for publication that are prohibited by governments worldwide, in particular material on freedom of expression, privacy, cryptology, dual-use technologies, national security, intelligence, and secret governance”, has had an eventful day. They put online a copy of Microsoft’s “Global Criminal Compliance Handbook – US Domestic Version”, an interesting little booklet that describes exactly what info they collect about their customers for law enforcement; this annoyed Microsoft enough for them to get onto Network Solutions, Cryptome.org’s ISP, and have the site knocked off the web!

Somewhat predictably, this quickly became big news on the interwebs. But it doesn’t look like Microsoft predicted that outcome! Someone there must have realised, somewhat belatedly, that it doesn’t look too good to be a nasty bully silencing the voice of freedom. So a Microsoft lawyer got back to Network Solutions and asked for Cryptome.org to be reinstated! Here’s the email Microsoft sent to Network Solutions:

X-MimeOLE: Produced By Microsoft Exchange V6.5
Received: from opsmail.prod.netsol.com ([10.221.32.60]) by nsiva-exchange4.CORPIT.NSI.NET with Microsoft SMTPSVC(6.0.3790.3959); Wed, 24 Feb 2010 22:47:25 -0500
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="----_=_NextPart_003_01CAB5CD.3E340480"
Received: from corpcm3 (corpcm3.mgt.netsol.com [10.221.32.102]) by opsmail.prod.netsol.com (8.12.10/8.12.10) with ESMTP id o1P3lOsM023759 for ; Wed, 24 Feb 2010 22:47:24 -0500 (EST)
Received: from [10.253.64.77] ([10.253.64.77:43581] helo=networksolutions.com) by corpcm3 (envelope-from ) (ecelerity 2.2.2.41 r(31179/31189)) with ESMTP id E2/39-15380-3C2F58B4; Wed, 24 Feb 2010 22:47:15 -0500
Received: (qmail 23471 invoked from network); 25 Feb 2010 03:45:41 -0000
Received: from dchost2.cov.com (HELO CBIEXI02DC.cov.com) (216.200.93.137) by tip2.lb.netsol.com with SMTP; 25 Feb 2010 03:45:41 -0000
Received: from cbiexm02sf.cov.com ([172.16.160.88]) by CBIEXI02DC.cov.com with Microsoft SMTPSVC(6.0.3790.3959); Wed, 24 Feb 2010 22:46:57 -0500
Content-class: urn:content-classes:message
Subject: Re: Ticket Number 1-452132847
Date: Wed, 24 Feb 2010 22:46:56 -0500
Message-ID:
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
Thread-Topic: Re: Ticket Number 1-452132847
Thread-Index: Acq06IFo420gHHQPThWg1v/l3yM7TAAAjYcAAABtzJAAABl6EAAAmcd6ADcZlUA=
From: “Cox, Evan”
To: “DMCA”
Cc: “internet4[at]microsoft-antipiracy.com”

Dear Ms. Larsen:

I am outside counsel to Microsoft Corporation. I am writing to confirm my telephone message left with your nighttime operator at 7:45 PST this evening to withdraw Microsoft’s takedown request with respect to the file available at http://cryptome.org/isp-spy/microsoft-spy.zip which is the subject of the correspondence below.

While Microsoft has a good faith belief that the distribution of the file that was made available at that address infringes Microsoft’s copyrights, it was not Microsoft’s intention that the takedown request result in the disablement of web acess to the entire cryptome.org website on which the file was made available.

Accordingly, on behalf of Microsoft, I am hereby withdrawing the takedown request and asking that Network Solutions restore internet access to http: cryptome.org as soon as possible.

I can be reached at 415-640-5145 if you wish to discuss this request.

Sincerely,

Evan Cox
Counsel to Microsoft Corporation

So Cryptome.org is back up, even though the Microsoft compliance handbook is still available from there! All that ill-judged takedown request succeeded in doing was getting a shit-load more publicity for the handbook, publicity that I’m doing my little bit for by posting this. I urge all interested parties to go to the site now and get yourselves a copy of it. There are also copies of compliance handbooks for many other organizations such as Yahoo, Facebook, Myspace, Comcast… and many more. Why not get yourself the full set?

I find it strange that a company as adept at public image as Microsoft can shoot itself in the foot in public so foolishly. Dicks.



“Microsoft is good on security” shock bonk

January 26, 2010


I have listened to the IT security podcast Security Now for some time now, and on the whole I’ve considered the host, Steve Gibson, to be a fairly sensible fellow. But my faith in the guy has been shaken, big time, after he said some real crazy-assed shit in the latest show (episode 232).

Gibson and fellow host Leo Laporte were talking about how Microsoft have been making incremental improvements to the security profile of its infamous web browser Internet Explorer. IE8 is a lot more secure than IE6, they said. Which is a reasonable thing to say. But then Laporte uttered these incredible words: “Microsoft doesn’t have the greatest track record but I don’t think they’re particularly worse than anyone else [on security].” And the alleged security expert Gibson agreed!

Now, Laporte has a bit of an excuse. He’s a tech head, not a security guy. Yes, his technical background should tell him that Microsoft is a train wreck security-wise. But he’s a Microsoft fan in general, so we shouldn’t expect too much from him. But Gibson is a security professional – his hard disk data recovery utility, SpinRite, gets a lot of plaudits (many of them on his own site), and through his company GRC he sells a bunch of other security products. And the podcast generally makes excellent listening. So how can he be so deluded about Microsoft?

Because Microsoft is a truly appalling company when it comes to the security of its products (Microsoft is appalling in a lot of other ways too, but let’s concentrate on security here). For years the Windows operating systems have been infested with spyware, viruses, trojans and other malware. It’s only since Vista that Windows has had any decent security model at all. The browser Internet Explorer has long been a joke to most security-conscious computer users, most of whom use Firefox or Google Chrome/Chromium instead. IE is probably the vector for most of the attacks that take place over the internet. So even if we set aside IE’s other shortcomings, like its disregard for the open standards embraced by the rest of the industry, it fails miserably when it comes to its users’ security.

Even Patch Tuesday – Microsoft’s vaunted update cycle – is a dangerous joke. Microsoft releases its software updates on the second Tuesday of every month (“whether they need to or not”, LOL). There could be a major 0-day vulnerability in the world’s most widespread personal computer software, threatening millions of users right now – but the fix won’t be released until the second Tuesday in the month comes round. And the computer criminals know this. They can engineer their attacks to make the most of the period between one Patch Tuesday and the next. If Mozilla (for example) discover a vuln in Firefox (for example) they will release the fix as soon as they can – usually within a couple of days. Microsoft will very very rarely release a fix before Patch Tuesday. And Gibson agrees with Laporte that Microsoft are “no worse than anyone else”? Crazy…
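To make that timing concrete, here’s a small Python sketch that works out the second Tuesday of a month, and how long a hole found on a given day would sit there waiting for its fix. It assumes the simple rule described above (fixes only ever ship on Patch Tuesday) and ignores the rare out-of-band releases; the function names and the 13 January 2010 discovery date are hypothetical, picked purely for illustration.

```python
import datetime


def patch_tuesday(year, month):
    """Return the date of the second Tuesday of the given month."""
    first = datetime.date(year, month, 1)
    # date.weekday() counts Monday as 0, so Tuesday is 1.
    days_to_first_tuesday = (1 - first.weekday()) % 7
    return first + datetime.timedelta(days=days_to_first_tuesday + 7)


def next_patch_tuesday(day):
    """Return the first Patch Tuesday falling on or after the given date."""
    candidate = patch_tuesday(day.year, day.month)
    if candidate >= day:
        return candidate
    # This month's Patch Tuesday has already passed: roll over to next month.
    if day.month == 12:
        return patch_tuesday(day.year + 1, 1)
    return patch_tuesday(day.year, day.month + 1)


if __name__ == "__main__":
    found = datetime.date(2010, 1, 13)  # hypothetical discovery date
    fix = next_patch_tuesday(found)
    print(fix, (fix - found).days, "days exposed")
```

For a hole found on 13 January 2010, the day after that month’s Patch Tuesday, the next scheduled fix lands on 9 February: 27 days for the bad guys to play with.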

Tell you what though, Security Now 232 is still worth a listen. I won’t list everything covered, I’ll just urge you all to check it out (download link here). My confidence in Gibson may have been shaken by his comments about Microsoft, but the fact remains that he knows a lot about his business. One thing I learned is that I’ve been pronouncing the word “kludge” incorrectly for years. “Kludge” is hacker-speak, meaning an inelegant solution to a problem. I’ve always pronounced it to rhyme with “budge”. But in the podcast Gibson and Laporte said it “klooj”. That bugged me, so I googled it. And Dictionary.com, Wikipedia, and Answers.com (as well as many more sources) all agree that “kludge” is indeed pronounced “klooj”. So Gibson and Laporte were right about that. But they are dead wrong about Microsoft.



Microsoft loses Euro court appeal

September 17, 2007

Ohh man, this is so sweeeet!! If you’re a lover of freedom, that is. If you think the rich and powerful should be allowed to do what they like, when they like, to whoever they like, then you may be dismayed. But the news, which I got from this BBC link, is good news to anyone whose opinion means anything to me.

If you don’t know the details of the case, I’ll give you the basics. In 2004, the European Commission decided that Microsoft was acting in an anti-competitive manner – for instance, if you bought their Windows operating system software, you had to buy their media player too. This made the market ridiculously stacked against the other manufacturers of media players. Player XYZ might be much better than Microsoft’s offering – and Microsoft’s products are notorious for their poor quality – but if you already had Microsoft’s media player (which you would, if you’d bought Windows) then you’d be less keen to shell out for another. If multimedia was of great importance to you, you’d make the sacrifice of effectively paying twice so you’d have a decent experience. But if you were one of the many customers who thought they’d just make do with the crap that had been forced on them rather than pay again… well, it’s obvious. Just as it was to the European Commission.

And if you were one of the discerning consumers who decided to do without any of Microsoft’s crapware… well, Microsoft had a trick up its evil sleeve for you too. Now that so much business is done online, your computer is likely to have to communicate with computers that are running Windows. But if your computer wasn’t running Windows as well, the two systems couldn’t interoperate properly. Microsoft built incompatibilities into its operating system. And Microsoft refused to tell its competitors how they might be able to adapt their own product to interoperate with the Windows rubbish. “Trade secrets,” said Microsoft. “Bullshit,” replied the EC.

So in 2004, after thoroughly investigating Microsoft’s business practices, the EC ordered the software company to stop this anti-competitive behaviour. And the Commission fined Microsoft for every day that the company refused to obey the law – fines that added up to over 280 million euros in six months!

Of course Microsoft appealed. They couldn’t just start obeying the same laws that their competitors had to. Microsoft’s entire business strategy is built on the concept of forcing everyone to buy their products – the infamous “Microsoft lock-in”. So they appealed to the European Court of First Instance. And today that court announced its decision.

They upheld every detail of the EC’s earlier decision, except one minor point. The EC wanted there to be an independent trustee to watch over Microsoft’s future behaviour, to make sure the crooks didn’t fall back into their old habits. The court thought this would be too invasive. Lucky old Microsoft, eh!

But Microsoft have to stop “bundling” software like its media player with the Windows operating system. Microsoft must reveal its technological “secrets” to the other players in the European computing business (this is something that’s standard practice with other companies – they call it “standards”!)

And, ohh, this must be so galling to the miserly “robber barons” in charge of the Microsoft machine – Microsoft must pay a fine of 497 million euros! £343 million! US$690 million! And they even have to pay 80% of the EC’s legal costs! (The EC, in turn, must pay some proportion of Microsoft’s lawyers’ bill – though I’m sure it’s nothing like what the guilty ones must cough up). This massive cost won’t just anger the management – think how the shareholders must feel!!

Do you think Microsoft will learn their lesson from this punishment? Do you think the criminals will be cowed, repentant, rehabilitated? Hah! Even as they reel from the blows of the European Court, Microsoft are dreaming up other ways to impose the lock-in. Their competitors have decided to agree on a common, open format for files like text documents and spreadsheets. Everyone seems happy with a format called ODF (OpenDocument Format). But Microsoft are trying to impose their own format – OOXML – and appear to be bribing or pressuring everyone they can get at to agree with them. And they’re threatening developers of Linux – the Free Software operating system – with legal action based on software patents that most knowledgeable people agree are phoney.

Now is the time for everyone who uses computers to stand up against the purveyors of lock-in and the threat of litigation. Microsoft’s latest OS – Vista – is widely judged to be amongst the most insecure, unstable and encumbered code they’ve ever released! And to run Vista, you need to buy powerful, expensive hardware! All of Microsoft’s long-suffering victims, whose dependence on legacy formats 20 years out of date has kept them locked-in to this arrogant corporation – now they must say “No more!” There are other software solutions – there’s UNIX – there’s Apple’s OS X, itself based on UNIX – and there’s Linux, the Free version of UNIX – all these operating systems are far more advanced and secure than anything Microsoft can hack together. And, because they all share that common UNIX ancestry, they can interoperate at all levels… something Windows could never do, even if its owners wanted it to.

Today’s decision of the European Court of First Instance has been a heavy blow to the bully of the software world. And it’s a wake-up call to everyone else.


Ubuntu founder says SCREW YOU!! to Microsoft “deal”

June 20, 2007

Ubuntu founder Mark Shuttleworth has said there’s no chance of Ubuntu giving in to Microsoft’s protection racket.

http://www.linuxinsider.com/story/K8HEdxWd1Ut6TR/Ubuntu-Founder-No-Microsoft-Deal.xhtml

Novell, Linspire and Xandros have all signed up to Microsoft’s “interoperability” deal – basically kissing Bill Gates’s ring in return for a promise that Microsoft won’t sue them for patent infringement – but Shuttleworth’s too canny to fall for such a blatant confidence trick.

You’d think it would be obvious to everyone. Microsoft claim that Linux infringes 235 patents, but won’t specify what these patents are. Ubuntu and Red Hat are saying “Put up or shut up!” And Billy-boy still won’t say, he’s just trying to make threatening noises. Dickhead.

If Microsoft don’t specify the patents, Ubuntu can’t stop infringing on them. So Microsoft is perpetuating the situation. Of course, Billy-boy says the only solution is to sign up for his deal. But that’s a bunch of crap: the real way out would be for Microsoft to spell out where Linux code infringes on these patents, so that Linux hackers could code work-arounds.

I can understand Xandros and Linspire falling for this dumb-ass trick.  Xandros has never been a true part of the Linux community, they’re just out to make a buck off other people’s work.  And Linspire… well, they used to be called “Lindows”, and their “unique selling point” was making Linux look and act like Windows… ’nuff said!

But Novell… hell, those guys have been in the business a long time, they have plenty of experience.  I guess maybe their surrender convinced the other weak hearts to crap out… and that makes me wonder all the more why Novell succumbed…

I’m not saying bribery and corruption.  No way am I saying that!

