Saturday, October 4th, 2003 01:52 pm
As I mentioned in the previous post, I had to come up with a list of irregular plurals in English. This is what I have so far. I'm posting it here because 1) I think they're very pretty words, and 2) if you see any that I'm missing, please post.


calves - calf
elves - elf
halves - half
hooves - hoof
knives - knife
leaves - leaf
lives - life
loaves - loaf
scarves - scarf
selves - self
sheaves - sheaf
shelves - shelf
staves - staff
thieves - thief
wives - wife
wolves - wolf
feet - foot
geese - goose
lice - louse
dice - die
men - man
mice - mouse
teeth - tooth
women - woman
people - person
brethren - brother
children - child
oxen - ox
algae - alga
alumnae - alumna
amoebae - amoeba
antennae - antenna
coronae - corona
faunae - fauna
florae - flora
formulae - formula
larvae - larva
nebulae - nebula
novae - nova
placentae - placenta
pupae - pupa
retinae - retina
supernovae - supernova
vertebrae - vertebra
alumni - alumnus
bacilli - bacillus
cacti - cactus
foci - focus
fungi - fungus
hippopotami - hippopotamus
magi - magus
nuclei - nucleus
octopi - octopus
radii - radius
stimuli - stimulus
syllabi - syllabus
termini - terminus
thesauri - thesaurus
addenda - addendum
bacteria - bacterium
curricula - curriculum
data - datum
errata - erratum
genera - genus
media - medium
memoranda - memorandum
millennia - millennium
ova - ovum
strata - stratum
symposia - symposium
stadia - stadium
apices - apex
appendices - appendix
cervices - cervix
indices - index
matrices - matrix
vortices - vortex
analyses - analysis
axes - axis
bases - basis
crises - crisis
diagnoses - diagnosis
ellipses - ellipsis
emphases - emphasis
hypotheses - hypothesis
metamorphoses - metamorphosis
neuroses - neurosis
oases - oasis
paralyses - paralysis
parentheses - parenthesis
synopses - synopsis
syntheses - synthesis
theses - thesis
criteria - criterion
phenomena - phenomenon
automata - automaton
schemata - schema
stigmata - stigma
cherubim - cherub
seraphim - seraph
beaux - beau
tableaux - tableau
Saturday, October 4th, 2003 01:58 pm (UTC)
I cry for alphabetization.
Saturday, October 4th, 2003 02:02 pm (UTC)
They're alphabetized within their specific pluralization grouping. All the other online lists I gathered these from did it that way. Anyway, that's what the search feature on your browser is for.

Oh, and I just added dwarves - dwarf to go with elves - elf.
Saturday, October 4th, 2003 03:11 pm (UTC)
Would words like moose, sheep and deer count? Seeing as how they're the same when plural?
Saturday, October 4th, 2003 03:50 pm (UTC)
They would for a generic list. I'm collecting them for the new root_phrase function on the Gargoyle, however, which can safely ignore auto-plurals like those. It doesn't recognize them as plurals, so it doesn't do anything to them to make them singular, which happens to be the correct action to take to make them singular. :)
Saturday, October 4th, 2003 04:51 pm (UTC)
So if people use the wrong pluralization -- which they definitely do -- will it be recognized as a plural word if it follows the standard English pattern of -s or -es?

'brothers', for example.
Saturday, October 4th, 2003 05:27 pm (UTC)
That is the idea, and it works about as well as any algorithmic approach to natural language can.

De-morphemization is in two steps. First, words are de-pluralized (check for irregulars, then -ies, then -es, then -s). After that, it checks for various other rules (-ing, -ed, -er, -est, -ly, -ize, -less, -ness, -able). The code has no sense of meaning, so if a word matches the check for a plural word, it gets de-pluralized, whether or not that is the 'correct' pluralization.

However, the standard rules fail pretty often, English being a complete whore of a language. 'Vignettes', for instance, will de-pluralize to 'vignett'. 'Brother' ends up as 'broth' (as do 'brothers' and 'brethren'). So you might get some soup-related false positives in this case, but no false negatives. And you'll only get the false positives when the output happens to be a word, which is moderately rare.
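
For anyone who wants to see the two steps spelled out, here is a rough Python sketch. It is not the Gargoyle's actual code. The function names, the tiny irregulars table, the choice to turn -ies into -y, and the minimum stem length of three letters are all assumptions made for illustration. Only the order of the checks and the suffix list come from the description above.

```python
# Hypothetical sketch of the two-step approach; not the Gargoyle's real code.

# A small sample of the irregular plurals list (plural -> singular).
IRREGULAR_PLURALS = {
    "brethren": "brother",
    "children": "child",
    "elves": "elf",
    "knives": "knife",
    "mice": "mouse",
}

# Suffixes checked in step two, in the order given in the post.
SUFFIXES = ("ing", "ed", "er", "est", "ly", "ize", "less", "ness", "able")

# Assumption: never strip a suffix if fewer than this many letters would remain.
MIN_STEM_LENGTH = 3


def depluralize(word):
    """Step one: irregulars first, then -ies, then -es, then -s."""
    if word in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[word]
    if word.endswith("ies"):
        return word[:-3] + "y"  # assumption: -ies becomes -y
    if word.endswith("es"):
        return word[:-2]
    if word.endswith("s"):
        return word[:-1]
    return word


def strip_suffix(word):
    """Step two: remove the first matching suffix, if enough stem is left."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


def root_word_sketch(word):
    """Run both steps with no regard for meaning, as described above."""
    return strip_suffix(depluralize(word.lower()))
```

With these assumptions, root_word_sketch gives 'vignett' for 'vignettes' and 'broth' for 'brother', 'brothers' and 'brethren' alike, while 'sheep' passes through untouched because nothing matches it.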
Sunday, October 5th, 2003 08:44 am (UTC)
I have a small quibble with your definition of "irregular".

1) If a French, Latin, Greek, or Hebrew word follows a regular pluralization paradigm for that language, is it really irregular? In other words, should we acknowledge "foreign" words as foreign, instead of calling them "irregular English words"?
2) The f-to-v transformation is, in my opinion, a normal ablaut, not an irregularity. This is a euphony feature to make the resultant word easier to pronounce, not a change within the root word.

Just my two bits. (Ha! a pun!)
Sunday, October 5th, 2003 09:02 am (UTC)
Because the Latin/Greek/French/Hebrew word has been adopted into English and will be used in an English-speaking context, its plural is irregular according to English paradigms. I think, since so many words are of foreign adoption (many far less obviously than "bureaux") and they function in English just as well, it is more useful for the Gargoyle to handle the fact that the foreign noun is pluralized than the fact that it is foreign.
Sunday, October 5th, 2003 09:21 am (UTC)
Yes, that is a point. I forgot the point of the exercise, i.e. automated language processing. In this context the definition makes sense.
Sunday, October 5th, 2003 09:36 am (UTC)
IIRC, alumna is the feminine singular of alumni. The male/gender-neutral singular is alumnus.

One of my favorites is gris-gris and gris-gris - identical spelling, different inflection. Gotta love those wacky French.
Sunday, October 5th, 2003 04:36 pm (UTC)
That gets into the question of what defines a language. I'm pretty loose about it, personally. English is whatever English speakers speak. While I recognize that most of our irregular plurals are loan words, I can't consider them foreign. They're not pronounced with a non-English accent, you don't write them in italics, no one thinks you're being pretentious for working them into a conversation. Etymology is fun, but it doesn't really mean anything. It's descriptive, not predictive.