gfish | (Reply)

gfish.livejournal.com

Saturday, October 4th, 2003 05:27 pm (UTC)

That is the idea, and it works about as well as any algorithmic approach to natural language can.

De-morphemization happens in two steps. First, words are de-pluralized (check for irregulars, then -ies, then -es, then -s). After that, it checks for various other suffixes (-ing, -ed, -er, -est, -ly, -ize, -less, -ness, -able). The code has no sense of meaning, so if a word matches the check for a plural, it gets de-pluralized, whether or not that is the 'correct' de-pluralization.
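
For anyone who wants to poke at it, the two steps look roughly like the sketch below. This is not the actual code, just a minimal Python illustration of the rule order described above. The names (`IRREGULAR_PLURALS`, `depluralize`, `strip_suffix`, `stem`) are made up, the irregulars table is a tiny stand-in for the real list, and turning -ies into -y and stripping only one suffix per word are assumptions.

```python
# Hypothetical irregular-plural table; a stand-in for the real list.
IRREGULAR_PLURALS = {"brethren": "brother", "children": "child", "mice": "mouse"}

# Suffixes for the second step, checked in the order listed above.
SUFFIXES = ("ing", "ed", "er", "est", "ly", "ize", "less", "ness", "able")


def depluralize(word):
    """Step 1: irregulars first, then -ies, then -es, then -s."""
    if word in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[word]
    if word.endswith("ies"):
        return word[:-3] + "y"  # assumption: -ies is rewritten to -y
    if word.endswith("es"):
        return word[:-2]
    if word.endswith("s"):
        return word[:-1]
    return word


def strip_suffix(word):
    """Step 2: drop the first matching suffix (assumption: only one)."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def stem(word):
    """Apply both steps with no notion of meaning at all."""
    return strip_suffix(depluralize(word.lower()))
```

Calling `stem("walking")` gives `"walk"` and `stem("ponies")` gives `"pony"`. Because the rules are purely mechanical, this sketch will also happily clip the final s off a word like 'kindness' before the -ness check ever sees it.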

However, the standard rules fail pretty often, English being a complete whore of a language. 'Vignettes', for instance, will de-pluralize to 'vignett'. 'Brother' ends up as 'broth' (as do 'brothers' and 'brethren'). So you might get some soup-related false positives in this case, but no false negatives. And you'll only get the false positives when the output happens to be a real word, which is moderately rare.
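
To make the false-positive point concrete, here is a small Python sketch of matching on stems. The `STEMS` table just hard-codes the outputs mentioned above (plus the assumption that 'broth' itself passes through unchanged), and `matches` is a hypothetical helper, not part of the real code.

```python
# Stems as described above; "broth" is assumed to pass through unchanged.
STEMS = {
    "vignettes": "vignett",
    "brother": "broth",
    "brothers": "broth",
    "brethren": "broth",
    "broth": "broth",
}


def matches(query, text):
    """Return True if any word in text shares a stem with the query."""
    target = STEMS.get(query, query)
    return any(STEMS.get(word, word) == target for word in text.lower().split())
```

Here `matches("brothers", "the brethren gathered")` is true, which is the desired behavior, and `matches("broth", "my brothers arrived")` is also true, which is the soup problem. A stem like 'vignett' never collides with anything, because it isn't a word anyone would search for.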