I'm thinking of some work on the Gargoyle. All input strings are run through a series of cleaning steps:
strip_phrase - remove punctuation marks and an empirically derived list of 'boring' words that only hinder quip matching ("the", "and", "a", "but", "for", "this", "with", "on", "in", "of", "by", "so", "an", "or", "as", "are", "arent", "is", "isnt", "am", "be", "do", "dont", "it", "to", "at", "have", "havnt", "has", "hasnt", "what", "when", "who", "where", "why", "how", "that", "thats", "its", "think", "now", "then", "about")
remove_pronouns - remove all the pronouns, noting in the metadata which ones were present, for use later
remove_players - remove all the player names, noting in the metadata which ones were present, for use later
root_phrase - convert remaining words to their root form ("hacks", "hacker", "hacked", "hacking", "hackable" -> "hack")
expand_phrase - look for words in a custom flat thesaurus, expanding as given
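As a rough sketch of the first and last steps (Python here rather than MOO code; the function bodies, the shortened word list, and the sample thesaurus are assumptions for illustration, not the Gargoyle's actual implementation):

```python
import string

# Abbreviated stand-in for the full 'boring' word list given above.
BORING_WORDS = {"the", "and", "a", "but", "for", "this", "with", "is", "it", "to"}

# Hypothetical flat thesaurus: each set is a group of interchangeable words.
FLAT_THESAURUS = [{"dog", "puppy"}, {"cat", "kitten"}, {"cow", "calf"}]


def strip_phrase(phrase):
    """Drop punctuation and boring words; return the remaining words in order."""
    no_punct = phrase.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in no_punct.split() if word not in BORING_WORDS]


def expand_phrase(words):
    """Add the other members of any group a word belongs to; other words pass through."""
    original = set(words)
    expanded = set(original)
    for group in FLAT_THESAURUS:
        if original & group:
            expanded |= group
    return expanded
```

With these sample lists, "dog" expands to "dog" and "puppy", while a word such as "mammal" that appears in no group comes back unchanged.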
It's the last step I want to improve. Currently the expand list is very limited. It's great for limited domains: {"dog", "puppy"}, {"cat", "kitten"}, {"cow", "calf"}, etc. But there is no way to represent hierarchies of related concepts. The Gargoyle can never go from 'dog' to 'cat' or 'mammal' or 'animal'. This is particularly important for less trivial examples where most of the words are unlikely to appear in the quip database. 'Seville' should match other cities in Spain, but if they don't exist it's extra important to be able to look for related concepts both vertically and laterally: 'Spain', 'Europe', 'Paris', 'Basque', etc.
So I obviously need a tree structure for the thesaurus. And I also need an iterative expansion step that continues to look farther and farther from the original node until it finds words that exist in the quip database. (I'll probably make it randomly go even farther, just to make results a bit less determinate.) None of this should be very hard, but I'm unsure what the thesaurus editing UI should be like. Can I (pragmatically) do it all from the MOO, or should I make a web interface? More importantly, can I find an existing ontology to start from and save myself all that work?
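One way the tree and the iterative expansion could fit together is a breadth-first search that moves outward one step at a time (up to the parent, down to the children) and stops at the first distance where something in the quip database turns up. This is only a sketch: the `PARENT` links, the `expand_until_match` name, and the `max_distance` cutoff are all made up for illustration, and it is written in Python rather than MOO code.

```python
from collections import defaultdict

# Hypothetical child -> parent links; the real thesaurus contents are not specified.
PARENT = {
    "seville": "spain", "madrid": "spain", "basque": "spain",
    "spain": "europe", "france": "europe", "paris": "france",
}

# Reverse index so the search can walk down the tree as well as up.
CHILDREN = defaultdict(set)
for child, parent in PARENT.items():
    CHILDREN[parent].add(child)


def neighbours(word):
    """Words one step away: the parent (if any) plus all children."""
    near = set(CHILDREN.get(word, ()))
    if word in PARENT:
        near.add(PARENT[word])
    return near


def expand_until_match(word, quip_words, max_distance=6):
    """Search outward ring by ring; return (matches, distance) for the nearest hit."""
    seen = {word}
    ring = {word}
    for distance in range(max_distance + 1):
        matches = ring & quip_words
        if matches:
            return matches, distance
        # Next ring: everything one step from the current ring, not yet visited.
        ring = {n for w in ring for n in neighbours(w)} - seen
        seen |= ring
        if not ring:
            break
    return set(), None
```

Siblings fall out naturally at distance 2 (up to the parent, back down), so lateral matches such as another Spanish city are found before more distant ones such as 'paris'. The random "go even farther" behaviour is not shown; it could be added by occasionally continuing the loop past the first ring that matches.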
Should you need a hierarchy of interesting methods of torture and death...