by Mountaineerbr

#12 - Spell checking programmes


Choosing among some of the spell checking programmes available on Linux, plus some tips for using them.


The time came to run the spell checker on all my website pages. There were many errors. Spell checking is very important.

Just this week, fraud was discovered involving the candidate for a position in the Supreme Court (STF) in Brazil. The fraud suspicion was confirmed because even spelling mistakes were copied from a lawyer's articles into the Supreme Court nominee's master's dissertation! His former supervisor said the university will need to review his thesis again due to the facts exposed in the media. He has only recently completed his PhD at Universidad de Salamanca (Spain), but has not published his thesis yet. It is also speculated that the nominee has not completed the formal postdoctoral work listed in his curriculum.

Here are some tips on the spell checkers I am using.

Aspell and Hunspell

GNU Aspell was designed as a replacement for the old Ispell, and it stands out because in some cases it suggests the correct word when Hunspell does not; see this blog post comparing both programmes and this extensive comparison list. It must be noted, though, that Aspell development is alive and active, contrary to what was claimed by the blog post from battlepenguim.com.

Hunspell is the most popular spellchecking library. It is included in OpenOffice, LibreOffice and it appears to be included in Firefox and Chrome, too.

The difference in popularity may also have to do with fear of copyright law enforcement, amongst other things. Even though Stallman said it was OK for GNU Aspell to use a copyrighted word list for the English language, there is a message in the mailing lists in which one developer voices some worries about the case. See the mailing list and the follow-up message.

Aspell seems to have had better utf-8 support than Hunspell over time.

Other factors of comparison:

1.1.2 Comparison to Hunspell
  • Hunspell is based on Myspell. Myspell was created as a simple spell checker for use in OpenOffice. Myspell’s affix support was merged into Aspell 0.60 so Hunspell and Aspell can use the same affix file to some extent.
  • Hunspell has better support for language peculiarities including compounding and complex morphology. However the Hunspell (v1.7.0) utility currently has problems with apostrophe (') in words.
  • Aspell generally has better suggestions for non-phonetic languages such as English and French. Aspell suggestion speed also tends to be significantly faster than Hunspell. For a comparison of suggestion quality and speed see http://aspell.net/test/.

Hunspell supports setting multiple dictionaries at once, for example combining the en_US, en_GB and pt_BR dictionaries, while Aspell requires the user to create a custom dictionary. Honestly, if Aspell has that option, I could not find it. I found this page with a short how-to on extracting word lists from the dictionary .rws files, but it is just an example link; I don't know if it is the best tutorial out there if you are really going to do it. Since different languages have different suggestion strategies and affix rules, merging dictionaries is a non-trivial task. Affix rules are used to generate inflection lists, for example.
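
A minimal sketch of the multi-dictionary option in Hunspell, assuming the en_US, en_GB and pt_BR Hunspell dictionaries are installed on the system. The function name hsp_multi is made up for illustration; -d takes a comma-separated list of dictionaries and -l lists the misspelled words read from standard input.

```bash
# hsp_multi FILE
# List, once each, the words of FILE that none of the dictionaries accept.
# Assumption: dictionaries en_US, en_GB and pt_BR are available to Hunspell.
hsp_multi()
{
	hunspell -d en_US,en_GB,pt_BR -l <"$1" | sort -u
}
```

It would be called as hsp_multi index.html. As with the Aspell list command, only the words are printed, not where they occur.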

One alternative for Aspell is to list all the words of a document that are wrong in one language with the list command, then pipe them to another instance of Aspell, which lists the words that cannot be found in the dictionary of another language either, such as


$ aspell --dict-dir=/usr/lib/aspell-0.60/ --lang=pt-BR list <index.html | sort -u | aspell list

The only problem here is that you cannot know which line the wrong word is on. See also this blog post, which details the same workaround.
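
One rough way around the missing line numbers is to feed the file to Aspell one line at a time and prefix each rejected word with the line counter. This is only a sketch: the function name spell_lines and the pt_BR / en_GB pair are assumptions, and it is slow on long files because it starts two Aspell processes per line.

```bash
# spell_lines FILE
# Print "LINE_NUMBER:WORD" for every word rejected by both dictionaries.
# Assumption: Aspell dictionaries pt_BR and en_GB are installed.
spell_lines()
{
	local file="$1" n=0 line word
	while IFS= read -r line || [ -n "$line" ]
	do
		n=$((n + 1))
		# words rejected by the first dictionary are fed to the second one
		printf '%s\n' "$line" |
			aspell --lang=pt_BR --encoding=utf-8 list |
			aspell --lang=en_GB --encoding=utf-8 list |
			sort -u |
			while IFS= read -r word
			do
				printf '%s:%s\n' "$n" "$word"
			done
	done <"$file"
}
```

It would be called as spell_lines index.html. Because each line is checked in isolation, markup that spans several lines is treated as plain text.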

To check the spelling of a single word, I wrote this shell function. While not ideal, it should work for multi-language checking if you don't mind some verbose output:


# ~/.bashrc


#spell check
sp()
{
	echo "PORTUGUESE"
	aspell --lang=pt-BR --encoding=utf-8 pipe <<<"$1"

	echo "ENGLISH"
	aspell --lang=en-GB --encoding=utf-8 pipe <<<"$1"
}
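
For quieter output, the pipe mode replies could be filtered. The sketch below assumes Aspell's Ispell-compatible reply format: the first line is a version banner, and the checked word then gets a line starting with * (correct), & (misspelled, with suggestions after the colon) or # (misspelled, no suggestions). The function name spq is hypothetical, and only the first word of the argument is reported.

```bash
# spq WORD
# Print one short line per language: "ok", the suggestions, or "no suggestions".
# Assumption: Aspell dictionaries pt_BR and en_GB are installed.
spq()
{
	local lang result
	for lang in pt_BR en_GB
	do
		# line 1 is the version banner, line 2 is the reply for the word
		result=$(printf '%s\n' "$1" |
			aspell --lang="$lang" --encoding=utf-8 pipe |
			sed -n '2p')
		case "$result" in
			'*'*|'+'*|'-'*) echo "$lang: ok" ;;
			'&'*) echo "$lang: ${result#*: }" ;;
			'#'*) echo "$lang: no suggestions" ;;
		esac
	done
}
```

It would be called the same way as sp, for example spq colour.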

Both the Aspell and Hunspell command line interfaces are good and offer a clean way to fix bad words in single or multiple files. If you use Vim, beware of losing its undo history for files edited with the spell checkers' default interfaces!

Vim spell checker

To avoid losing Vim's undo history, it is best to use Vim's built-in spell checker. I remember configuring my .vimrc with some extra dictionaries, knowing that would be useful at some point. But as said, I have the poor habit of not using the spell checker; hopefully that will change now!

I got a medical terms wordlist and placed it at file path ~/.vim/spell/medicalterms-en. Then I set the following in my .vimrc:


" ~/.vimrc


"spelling check

"dictionaries
set spelllang=en_gb,pt_br,medicalterms-en

"custom dictionary
set spellfile=~/.vim/spell/en.utf-8.add

The custom dictionary is named en.utf-8.add because a dictionary starting with en will be read for all variants of English (I haven't got the reference, but that is what I remember).

The Vim help page from :help spell cites the en-rare and medical_ca dictionaries, but so far I could not find word lists for them on the internet.

Setting the spellfile option (custom words) is less important than setting at least one main language for the spelllang option in .vimrc. Alternatively, you can enable spell checking for the current buffer only, without touching .vimrc, with:


:setlocal spell spelllang=en_us

If you don't need the setting to be restricted to the current buffer, use :set spell to enable spell checking and :set nospell to disable it.

Other than configuration, usage is very simple. Perhaps the most important shortcut for the spelling function is z= when the cursor is on a highlighted (bad) word. It will prompt the user to pick a suggestion from a list.

Other useful shortcuts are ]s to go to the next misspelled word and [s to go to the previous one. More on usage in the Vim help at :help spell-quickstart or in this tutorial.

The user will end up composing a custom dictionary with words she uses the most, such as frequent proper names and particular expressions of her labour niche.

zg                      Add word under the cursor as a good word to the first name in 'spellfile'.

:spellr[are]! {word}    Add {word} as a rare word to the internal word list, similar to |zW|.

From :help spell-quickstart

I reckon that spelling and grammar checking should happen on the fly for simple mistakes, or after the text is complete, in order to avoid breaking the train of thought while writing. Running the spell checker is a habit to be acquired! ;]