Frequently asked questions
UnicodeError: Can not determine encoding of input files
If the encoding of the input data cannot be detected automatically, you need to specify it on the command line, for example:
--encoding=ISO_8859-15
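Before you choose a value for `--encoding`, it can help to check which candidate encodings decode the file without errors. The Python sketch below shows one way to do this. The helper name `encodings_that_decode` and the candidate list are illustrative assumptions. They are not part of FlatCat and do not reproduce its own detection logic. A file that decodes cleanly in several 8-bit encodings may still need a human to pick the right one.

```python
def encodings_that_decode(data: bytes, candidates=("utf-8", "ISO_8859-15", "cp1252")) -> list:
    """Return the candidate encodings that decode `data` without errors.

    Hypothetical helper for choosing a value for --encoding; it is not
    FlatCat's own detection logic. An unknown encoding name raises LookupError.
    """
    working = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the byte sequence
        working.append(name)
    return working


if __name__ == "__main__":
    # "ä" stored as the single ISO-8859-15 byte 0xE4 is not valid UTF-8.
    sample = "talossa ä".encode("iso8859_15")
    print(encodings_that_decode(sample))
```

Pure ASCII data decodes under every candidate, so this check is mainly useful for files that contain non-ASCII characters.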
The word counts in the output are smaller than expected
If the counts in the input data have already been dampened, turn off FlatCat's own dampening so that they are not dampened a second time:
--dampening none
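The Python sketch below shows why dampening counts twice shrinks them further than intended. It assumes a logarithmic dampening of the form round(log2(count + 1)). That formula is an assumption for illustration, and FlatCat's own dampening functions may differ in detail. The helper names `log_dampen` and `dampen_corpus` are hypothetical.

```python
import math


def log_dampen(count: int) -> int:
    """Assumed log dampening: round(log2(count + 1)). Hypothetical, for illustration only."""
    return round(math.log2(count + 1))


def dampen_corpus(counts: dict, dampfunc) -> dict:
    """Apply a dampening function to every word count; None means no dampening."""
    if dampfunc is None:
        return dict(counts)
    return {word: dampfunc(c) for word, c in counts.items()}


raw = {"talo": 1000, "talossa": 15}
once = dampen_corpus(raw, log_dampen)    # counts already dampened before training
twice = dampen_corpus(once, log_dampen)  # dampened again: counts shrink further
kept = dampen_corpus(once, None)         # analogous to --dampening none
print(once, twice, kept)
```

Under these assumptions, `once` is `{'talo': 10, 'talossa': 4}`, `twice` drops to `{'talo': 3, 'talossa': 2}`, and `kept` preserves the already-dampened counts unchanged.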
The input does not seem to be segmented
If you receive the following warning when loading a segmentation:
#################### WARNING ####################
The input does not seem to be segmented.
Are you using the correct construction separator?
the reason might be that the data is in an unexpected format. By default, Morfessor FlatCat assumes that morphs are separated by a plus sign with a space on each side (' + '). If your data uses a different separator, for example a single space, you need to specify it:
--construction-separator ' '
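The Python sketch below shows how a mismatched separator produces this warning. When space-separated data is split on the default ' + ', each word stays a single construction, so nothing appears segmented. The function names and the "any line yields more than one morph" heuristic are assumptions for illustration. They are not FlatCat's actual check, and the sketch handles only bare segmented words, one per line.

```python
def split_constructions(line: str, separator: str = " + ") -> list:
    """Split one segmented word into morphs using the given separator."""
    return line.strip().split(separator)


def looks_segmented(lines, separator: str = " + ") -> bool:
    """Hypothetical heuristic: the data looks segmented if any line yields more than one morph."""
    return any(len(split_constructions(line, separator)) > 1 for line in lines)


data = ["talo ssa", "kissa t"]     # morphs separated by a single space
print(looks_segmented(data))       # default " + ": False, so a warning would be due
print(looks_segmented(data, " "))  # matches --construction-separator ' ': True
```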