Zoe
Submitted by reeses on Thu, 2002-10-10 05:04.
Goose writes blog entries of disturbing length. Disturbing, because I can never muster the effort to type that much; I usually give up and delete the last paragraph or two. If I'm lucky, I've used my rusty "inverted pyramid" skills from junior high, so I can stop at any point and the post still makes sense when I reread it three months later.

I made another effort at digging through the Zoe code, but it's slow going. I have a mountain of coding to do at work, yet somehow, teasing the yarn from this sweater is what I find myself doing. I'm really wondering what IDE was used to build this -- I have no idea how it compiled. There's a class that implements two of its own inner interfaces, and it appears to have sent IDEA into a frozen loop. Tough stuff!

So, let's sketch out the pieces that I would need to steal or build to implement equivalent technology.
I don't think I need a POP/IMAP client or server, because I could feed the engine with procmail, taking advantage of bogofilter and SpamAssassin. I am perfectly happy having two message stores -- the IMAP structure I use with mutt and Outlook Express, and the searchable store I browse with a web browser.

I'm debating having a folder hierarchy generated automatically, perhaps similar to the one Endeca employs. It's tempting, but I'm very particular about data grouping, and I'm sure it would take me a long time to get happy with it. Did I mention I'm fatally lazy?

Anyway, I could probably spin up the kernel of this in Ruby over a weekend. I'd probably be able to do the message parsing, indexing by email address, and storage into a database right away. A web interface would take a day or so, for simple search and rendering of lists.

The next bits would be the fun bits -- pulling out keywords while ignoring chaff, and building indexes for those keywords. A word that appears in only one message shouldn't be tagged, but if three or more emails contain that word, a link axis should be created on that keyword through those messages. I need to come up with a way to ignore common words, though -- I don't want every bloody word tagged. Articles are easy, names are harder. I'd have to add training to the requirements list -- type a word or phrase into a search bar, and that phrase would become a new axis between the existing documents. How long would it take to rebuild this index?

It's a good thing I'm both lazy and swamped by work that pays me. I won't have to worry about this stuff. I'll just wait and watch Freshmeat until someone does Zoe right. :-)
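For what it's worth, the keyword-axis idea is small enough to sketch. What follows is only a rough illustration in Python, not anything taken from Zoe and not the Ruby version mused about above. The names `STOP_WORDS`, `MIN_MESSAGES`, `tokenize`, and `build_axes` are all made up for the example, the stop list is a tiny placeholder, and "training" is assumed to mean a plain case-insensitive substring match that skips the three-message threshold.

```python
import re
from collections import defaultdict

# Hypothetical stop list; a usable one would be far longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in",
              "is", "it", "i", "for", "on", "with"}

# "Three or more emails" is the threshold named in the post.
MIN_MESSAGES = 3


def tokenize(text):
    """Lowercase the text and return its distinct words, minus chaff."""
    words = re.findall(r"[a-z']+", text.lower())
    return {w for w in words if w not in STOP_WORDS and len(w) > 2}


def build_axes(messages, extra_phrases=()):
    """Map each keyword to the sorted ids of the messages containing it.

    messages: dict of message id -> body text.
    extra_phrases: user-trained phrases (an assumption about how
    "training" might work); each becomes an axis if any message matches.
    """
    postings = defaultdict(set)
    for msg_id, body in messages.items():
        for word in tokenize(body):
            postings[word].add(msg_id)

    # Keep only the words shared by enough messages to be worth linking.
    axes = {word: sorted(ids)
            for word, ids in postings.items()
            if len(ids) >= MIN_MESSAGES}

    # Trained phrases bypass the threshold: simple substring matching.
    for phrase in extra_phrases:
        needle = phrase.lower()
        hits = sorted(msg_id for msg_id, body in messages.items()
                      if needle in body.lower())
        if hits:
            axes[needle] = hits
    return axes
```

Calling `build_axes` with a dict of message bodies returns only the words shared by at least three messages, plus any trained phrases that match. Rebuilding from scratch is one pass over every message, which hints at an answer to the "how long to rebuild" question: the cost grows with the total size of the mail store. Names remain the hard part, since a fixed stop list does nothing for them.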