Untitled

Submitted by reeses on Mon, 2003-02-10 04:42.

Another day of quixotic tasks. Last night, I was trying to google for a page I had seen a few years ago, and was having very poor luck. I couldn't get the query string just right, and didn't find the page for some time, despite having looked for it about this time last year. While I may have made a bookmark, I did not transfer it between machines, and it was lost to me.

I decided that it would be a good idea to have the ability to search my entire browser history. Refer to the numerous articles on hackery and bodgosity that are prevalent in my solutions to personal technical problems for a full background, and you will be ready for what comes next.

I use both IE and Phoenix on Windows, and Phoenix less so on Linux. Therefore, a cross-platform, network-wide solution was called for. I could not write an NPAPI plugin alongside a COM object, because any combination would neglect at least one browser. Ideally, I would have a plugin that detected when I had viewed a page for more than n minutes, where n is some threshold value separating casual or involuntary pageviews (such as the infernal popups) from pages that I had found interesting enough to read in their entirety. I wanted to avoid a huge proliferation and consumption of disk space on my machine.

That said, I ended up settling on the heavyweight solution, which occupied most of my Sunday afternoon, with brief breaks to run to the store, watch Alias, and phone my parents. And eat, etc. It didn't really take that long, but it was a road fraught with frustration.

See, I had the brilliant notion to slap a caching proxy in between all of the browsers and the outside world. I elected squid, because I've used it on many machines in the past several years, and it works well enough. Ask anyone about a caching proxy, and "squid" will be one of the first ten words out of their mouths. If not, they're salespeople, or the sort of person with whom you should change the subject and expect no insight.

Squid would enable me to avoid the browser issues, because all of the browsers support proxies. Yay. This would solve one part of my problem, and in my brilliance, I thought I had a solution to the other part.

ht://dig is a wonderful free tool that indexes a document corpus, and provides a search engine to search that index. I planned to feed it the stems of the pages requested through squid, since squid keeps a very easily consumed log of all pages requested. I'd do some simple preprocessing of this log, and feed it to htdig.
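For the curious, the preprocessing step might look something like the sketch below. It is not the script I actually ran. It assumes squid's native access.log layout (timestamp, elapsed, client, result/status, bytes, method, URL, ident, hierarchy, content type), and the function name extract_urls and the filtering rules (successful GETs of text/html over plain http only) are my own illustrative choices.

```python
from urllib.parse import urlsplit


def extract_urls(log_lines):
    """Return unique URLs of successful HTML GET requests, in first-seen order.

    Assumes squid's native access.log format, whitespace-separated:
    time elapsed client result/status bytes method URL ident hierarchy type
    """
    seen = set()
    urls = []
    for line in log_lines:
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank, malformed, or truncated lines
        result_status, method, url, mime = fields[3], fields[5], fields[6], fields[9]
        status = result_status.rpartition("/")[2]  # "TCP_MISS/200" -> "200"
        if method != "GET" or status != "200":
            continue  # ignore POSTs, CONNECTs, redirects, and errors
        if not mime.startswith("text/html"):
            continue  # ignore images, stylesheets, and other non-page content
        if urlsplit(url).scheme != "http":
            continue  # assumption: only plain http pages are worth indexing
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Calling extract_urls(open("access.log")) gives a de-duplicated list of page URLs, one per page actually fetched. How that list reaches htdig (for example through its start_url setting) is left out here, since it depends on the local configuration.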

This sounds really good, right? You're wondering, "Why did this suck up so much time?"

Well, in spite of my obvious brilliance, like all superheroes, I had an achillean weakness.

I wanted this to run on Windows.

Oh!

Squid was easy enough to install, requiring only minor modifications, and I expected a similar level of effort with htdig.

Sucker.

Let's just say that building it under cygwin sounds nice, but is a bad idea. Kind of like trying to build xscreensaver under the same environment. It just doesn't work so well. At one point, I had to edit a compiled binary, because I was afraid to attempt a recompile after changing a hardcoded path.

But I did it. And it is good.

I have a few suggestions for those considering the same course of action:

  • Download one of the binary installations available, even though they're of a mesozoic vintage.
  • Install it instead on a Unix box of any variety, as long as you don't think "cygwin" == "Unix".
  • Give up and rely on mad google skillz.

Do any of these, and you are smarter than I.
