Wednesday, February 11th, 2004 12:36 am
I'm thinking of some work on the Gargoyle. All input strings are run through a series of cleaning steps:
strip_phrase - remove punctuation marks and an empirically derived list of 'boring' words that only hinder quip matching ("the", "and", "a", "but", "for", "this", "with", "on", "in", "of", "by", "so", "an", "or", "as", "are", "arent", "is", "isnt", "am", "be", "do", "dont", "it", "to", "at", "have", "havnt", "has", "hasnt", "what", "when", "who", "where", "why", "how", "that", "thats", "its", "think", "now", "then", "about")
remove_pronouns - remove all the pronouns, noting which ones were present in the metadata for use later
remove_players - remove all the player names, noting which ones were present in the metadata for use later
root_phrase - convert remaining words to their root form ("hacks", "hacker", "hacked", "hacking", "hackable" -> "hack")
expand_phrase - look for words in a custom flat thesaurus, expanding as given
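
A rough Python sketch of the five steps above, to make the data flow concrete. This is not the Gargoyle's actual code: the word lists are abbreviated, the pronoun and player sets are hypothetical stand-ins, and the suffix-stripping rooter is a naive assumption standing in for whatever root_phrase really does.

```python
import string

# Abbreviated subset of the 'boring' word list from the post.
BORING_WORDS = {"the", "and", "a", "but", "for", "this", "with", "is", "it", "to", "think"}
# Hypothetical stand-ins: the real pronoun and player lists are not given in the post.
PRONOUNS = {"i", "you", "he", "she", "we", "they", "me", "him", "her", "us", "them"}
PLAYERS = {"alice", "bob"}
# Naive suffix list (an assumption); longer suffixes are tried first.
SUFFIXES = ("able", "ing", "ers", "er", "ed", "s")
# Flat thesaurus: each set is a group of interchangeable words.
THESAURUS = [{"dog", "puppy"}, {"cat", "kitten"}, {"cow", "calf"}]


def strip_phrase(text):
    """Lowercase, drop punctuation, and drop the boring words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [w for w in cleaned.split() if w not in BORING_WORDS]


def remove_listed(words, listed):
    """Split words into (kept, found), where found are the words in `listed`."""
    found = [w for w in words if w in listed]
    kept = [w for w in words if w not in listed]
    return kept, found


def root_word(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_phrase(words):
    """Replace each word with its whole thesaurus group, if it has one."""
    expanded = []
    for word in words:
        group = next((g for g in THESAURUS if word in g), {word})
        for item in sorted(group):
            if item not in expanded:
                expanded.append(item)
    return expanded


def clean(text):
    """Run all five steps; return the expanded words plus the noted metadata."""
    meta = {}
    words = strip_phrase(text)
    words, meta["pronouns"] = remove_listed(words, PRONOUNS)
    words, meta["players"] = remove_listed(words, PLAYERS)
    words = [root_word(w) for w in words]
    return expand_phrase(words), meta
```

Calling clean("Bob, I think the dog is hacking!") under these assumptions yields the words dog, puppy and hack, with "i" and "bob" recorded in the metadata.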

It's the last step I want to improve. Currently the expand list is very limited. It's great for limited domains: {"dog", "puppy"}, {"cat", "kitten"}, {"cow", "calf"}, etc. But there is no way to represent hierarchies of related concepts. The Gargoyle can never go from 'dog' to 'cat' or 'mammal' or 'animal'. This is particularly important for less trivial examples where most of the words are unlikely to appear in the quip database. 'Seville' should match other cities in Spain, but if they don't exist it's extra important to be able to look for related concepts both vertically and laterally: 'Spain', 'Europe', 'Paris', 'Basque', etc.

So I obviously need a tree structure for the thesaurus. And I also need an iterative expansion step that continues to look farther and farther from the original node until it finds words that exist in the quip database. (I'll probably make it randomly go even farther, just to make results a bit less deterministic.) None of this should be very hard, but I'm unsure what the thesaurus editing UI should be like. Can I (pragmatically) do it all from the MOO, or should I make a web interface? More importantly, can I find an existing ontology to start from and save myself all that work?
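
One way the tree and the iterative expansion might look, sketched in Python. Everything here is an assumption for illustration: the PARENT table is made-up sample data, the names neighbours and expand_until_match are hypothetical, and the "randomly go even farther" idea is left out. The search widens one ring at a time (parent and children of everything in the previous ring) and stops at the first ring containing a word from the quip database.

```python
# Hypothetical sample tree: each word maps to its parent concept.
PARENT = {
    "seville": "spain",
    "madrid": "spain",
    "spain": "europe",
    "paris": "france",
    "france": "europe",
}


def neighbours(word):
    """Return the parent and the children of `word` in the tree."""
    near = {child for child, parent in PARENT.items() if parent == word}
    if word in PARENT:
        near.add(PARENT[word])
    return near


def expand_until_match(word, quip_words, max_distance=4):
    """Widen the search ring by ring until some word is in quip_words.

    Returns (distance, sorted matches), or (None, []) if nothing is
    found within max_distance steps of the starting word.
    """
    seen = {word}
    ring = {word}
    for distance in range(max_distance + 1):
        hits = ring & quip_words
        if hits:
            return distance, sorted(hits)
        # Next ring: everything one step from the current ring, not yet visited.
        ring = {n for w in ring for n in neighbours(w)} - seen
        seen |= ring
    return None, []
```

With this sample data, starting from "seville" finds "madrid" two steps away (up to 'spain', down to a sibling city), and only reaches "paris" at distance four if no closer word is in the quip database.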
Wednesday, February 11th, 2004 08:57 pm (UTC)
Somehow, this puts me in mind of the hierarchical catalogue Daniel's put to work on in Quicksilver.

Should you need a hierarchy of interesting methods of torture and death...