The Semantic Advantage

November 15, 2009

An old favorite plus a new favorite = solution

Filed under: knowledge management,products for semantic approach — Phil Murray @ 3:03 pm

No rants about search engines this week. Instead, praise for a terrific desktop search engine — dtSearch — and ABC Amber SeaMonkey Converter, one of many converters offered by Yernar Shambayev’s Process Text Group.

Online searches lead mostly to … more online searches instead of to reusable value. But we can’t live without them, and I have to admit that Google and Yahoo! are steadily improving the effectiveness of their products. However, sometimes our needs are more narrowly defined than locating something in all the world’s information.

When I’m building a network of knowledge using the approach I have designed, I need to know whether the idea or concept I want to add to the network is the same as — or similar to — other ideas or concepts already in the network. Let me stop for a minute and define idea as an observation about reality — the equivalent in meaning to a simple sentence in natural language. Contrast that with concept — the essential name of a thing, whether material or imagined. Concept appears to be the preferred terminology for practitioners who construct taxonomies (or facets), thesauri, and ontologies that organize such entities into larger structures. I won’t go into the fine points here.

I have not built a rich ontology of the concepts in the ad hoc spaces I discuss, and I haven’t found any affordable tools that allow me to look for similarities among ideas. So I resort to a very simple practice: I maintain a directory in which each idea and each concept occupies a separate file. The file contains the name of the concept or idea, explanations of those items, and text examples that contain instances of those items. A full text search of that directory using the new concept or idea as the query retrieves the search engine’s best guess at files that contain similarities with concepts and ideas already in the network of knowledge.

Or not. Because most search engines are primarily string-matching tools, and the files retrieved may not be what I want.
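To make that limitation concrete, here is a minimal Python sketch of the one-file-per-idea directory with a purely string-matching search. It is an illustration only: the function names, file names, and sample text are all made up, and no real search product works exactly this way.

```python
from pathlib import Path
import tempfile


def search_directory(directory, query):
    """Return names of .txt files whose text contains every word of the query.

    Plain, case-insensitive substring matching: no stemming,
    no synonyms, no fuzzy matching.
    """
    words = query.lower().split()
    hits = []
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8").lower()
        if all(word in text for word in words):
            hits.append(path.name)
    return hits


def build_sample_directory():
    """Create a throwaway directory holding two hypothetical idea files."""
    directory = Path(tempfile.mkdtemp())
    (directory / "idea-001.txt").write_text(
        "Idea: Large mailboxes are hard to search.\n"
        "Example: One big file satisfies many queries.\n",
        encoding="utf-8",
    )
    (directory / "idea-002.txt").write_text(
        "Idea: Huge mail archives resist retrieval.\n",
        encoding="utf-8",
    )
    return directory


if __name__ == "__main__":
    sample = build_sample_directory()
    # Only the first file is found, although the second states a similar idea.
    print(search_directory(sample, "large mailboxes"))
```

The second file says much the same thing in different words, and string matching alone never finds it.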

dtSearch is better than that. In addition to the features you might expect in a good desktop or enterprise search engine (including stemming, wild cards, fuzzy search, proximity search, and Boolean operators), you have the option of looking for files that contain synonyms based on Princeton’s WordNet, a kind of semantic network that anyone can use. So even if you can’t keep track of synonyms, the dtSearch tools will. You can add your own synonyms, too, within dtSearch.
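The general idea behind synonym search can be sketched in a few lines of Python. This is not how dtSearch is implemented, and the tiny hand-made `SYNONYMS` table below is only a hypothetical stand-in for WordNet; the function names are invented for illustration.

```python
import re

# Hypothetical, hand-made synonym table standing in for WordNet.
SYNONYMS = {
    "large": {"big", "huge"},
    "mailboxes": {"mailbox"},
}


def expand_term(term):
    """Return the term itself plus any synonyms listed for it."""
    term = term.lower()
    return {term} | SYNONYMS.get(term, set())


def matches(text, query):
    """True if, for every query word, the text contains it or a listed synonym."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return all(expand_term(term) & words for term in query.split())
```

With this table, a query for "large" matches a file that only says "huge", which is the behavior plain string matching lacks.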

Great stuff. Some consider the dtSearch interface dated, but I think it’s highly functional. Real easy to set up separate named indexes for different sets of directories, too. (Excuse me. I’m dating myself. We call them “folders” now, don’t we?)

I also use dtSearch for a variety of other search tasks — including finding emails from the thousands I have captured in SeaMonkey. Making those emails accessible in a reasonable (and, ideally, consistent) way has been virtually impossible. The native SeaMonkey search features — like those in other email clients I have encountered — are simply inadequate.

And even if those email search features were superb, they wouldn’t solve the problem, because SeaMonkey stores each mailbox as one big file. I do mean big for some of my mailboxes. So a search result that points to one huge file is almost meaningless. Big files will satisfy many queries unless you use proximity searches and other tricks, and even if one mailbox does contain the information you want, it may take a long time to find the right spot within that file. And you have to go through the same process if you want to execute that query again.

ABC Amber SeaMonkey Converter solves that problem by allowing me to split SeaMonkey mailboxes into separate HTML files. (I could use ABC Amber options to convert them to text or a couple dozen other output formats, but I prefer HTML for a variety of reasons.) When I use a dtSearch query against the directories containing those exported HTML emails, I get a highly relevant selection of small files — exactly what I want.
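For readers curious about what such a split involves, here is a rough Python sketch using the standard library's `mailbox` module. It assumes the mailbox is a plain mbox-format file, and it is in no way ABC Amber's method: the function name `split_mbox` and the output file naming are hypothetical, and multipart messages and attachments are not handled.

```python
import html
import mailbox
from pathlib import Path


def split_mbox(mbox_path, out_dir):
    """Write each message in an mbox file to its own small HTML file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    box = mailbox.mbox(str(mbox_path), create=False)
    try:
        for index, message in enumerate(box, start=1):
            subject = str(message.get("Subject", "(no subject)"))
            # Simplification: multipart messages return no payload here,
            # so their bodies come out empty.
            payload = message.get_payload(decode=True) or b""
            body = payload.decode("utf-8", errors="replace")
            page = (
                "<html><head><title>{0}</title></head>"
                "<body><h1>{0}</h1><pre>{1}</pre></body></html>"
            ).format(html.escape(subject), html.escape(body))
            target = out_dir / "message-{:05d}.html".format(index)
            target.write_text(page, encoding="utf-8")
            written.append(target)
    finally:
        box.close()
    return written
```

Pointing a desktop search index at the output folder then returns individual messages rather than one enormous mailbox file.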

Very easy, too. When I ran ABC Amber the first time, it found the SeaMonkey mailboxes automatically. The emails in each folder were displayed in a list, and I could easily select as many or as few as I wished. Oh, and I should mention that ABC Amber promotional pages stress the ability of the converter to output a single, integrated HTML file from a mailbox. That’s a plus for many people, but not what I want.

I also tested the mailbox-to-TreePad converter. (You just click a different output option in ABC Amber.) The results were flawless, and the TreePad outliner let me explore the email content by date. Cool.

One caution: As of this writing, it appears that SeaMonkey has changed where it places email folders. So folders I created with SeaMonkey 2.0 — and any new email since the changeover — did not show up in the ABC Amber converter, but I was able to redirect the program to the new location using an ABC Amber option. I have advised the Process Text people about this.

UPDATE (16-nov-2009): The Process Text people have already updated the converter. It now finds the SeaMonkey 2.0 mailboxes automatically. That was quick!

I’ve been using dtSearch for nearly a decade now. It’s still worth the money — about $200 for an individual license. Adding ABC Amber SeaMonkey Converter (about $20) to my set of tools will really make a difference.

