Hi everyone,
Every time I enter a new text for translation, Wordbee's segmentation leaves me with a great many tiny segments because of abbreviations such as Mr., Mrs. or art. (in Portuguese, Sr., Sra. and so on). I was wondering how I could edit the SRX segmentation rules file (available under Settings > Segmentation Rules) and successfully upload it again.
I have managed to export the file (Portuguese (pt) in my case) and I think I understand how the rules work in the XML. What I am not sure about is the format to upload: a .txt document (I tried that, but nothing changed), an .xml file (I tried that too and got the message "The uploaded file does not comply with SRX standards"), or some other file type.
Could you be so kind as to help me with this? At this point I no longer know whether the problem is the file extension I use or my XML being incorrect... or whether I am even doing all the necessary steps.
Thank you for your help!
-
Claudia Del Castillo Hi, I've just received an answer on this.
-
Sales Here is an extract from the ticket:
If you want to apply these exceptions to all the languages you have to change the "Language independent" SRX rules.
To do this you have to:
- Go to Settings and click "Configure" next to "Segmentation Rules";
- Click "View" next to the default "Language independent" rules to download them;
- Open the file with a text editor (for instance Notepad++);
- Add the exception rules above the other rules. Note that by "above" I don't mean at the top of the file, but before the existing "rule" nodes (please see the attachment). The rules to add should look like this:
<rule break="no">
  <beforebreak>Mr\.</beforebreak>
  <afterbreak></afterbreak>
</rule>
The rule above prevents a split after "Mr.". To exclude other abbreviations, you just have to replace "Mr", keeping the backslash before the dot (an unescaped "." in a regular expression matches any character).
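Why the position of the exceptions matters: as I understand the SRX 2.0 specification, at each possible break position the rules are tried in the order they appear, and the first rule whose beforebreak and afterbreak both match decides whether a break happens. The Python sketch below imitates only that first-match-wins behaviour, using a hypothetical generic break rule and exception rules built from abbreviations. It is not Wordbee's segmenter, and Python's "re" only approximates the ICU-style regular expressions SRX rules are written in.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    brk: bool     # True for break="yes", False for break="no"
    before: str   # regex that must match the text ending at the candidate position
    after: str    # regex that must match the text starting at the candidate position


def exception_rule(abbreviation: str) -> Rule:
    # re.escape turns "Sr." into "Sr\." so the dot only matches a literal dot
    return Rule(brk=False, before=re.escape(abbreviation), after="")


def segment(text: str, rules: list[Rule]) -> list[str]:
    """Split text wherever the FIRST rule that matches a position says break."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        for rule in rules:
            # \Z pins the beforebreak match to end exactly at pos
            before_ok = re.search(rf"(?:{rule.before})\Z", text[:pos]) is not None
            after_ok = re.match(rule.after, text[pos:]) is not None
            if before_ok and after_ok:
                if rule.brk:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # first matching rule decides; later rules are not consulted
    segments.append(text[start:].strip())
    return [s for s in segments if s]


# Hypothetical generic rule: break after a period followed by whitespace.
general = Rule(brk=True, before=r"\.", after=r"\s")
text = "O Sr. Silva chegou. A Sra. Costa saiu."
exceptions = [exception_rule(a) for a in ("Sr.", "Sra.")]

print(segment(text, exceptions + [general]))
# ['O Sr. Silva chegou.', 'A Sra. Costa saiu.']
print(segment(text, [general] + exceptions))
# ['O Sr.', 'Silva chegou.', 'A Sra.', 'Costa saiu.']
```

With the exceptions listed after the generic rule, they are never consulted at "Sr." or "Sra.", which matches the advice to put them before the existing rules.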
Then upload your file to Wordbee. You can either edit the current default rules with the "Edit" button or add a new set of rules with the "Add new" button. If you choose the first option (Edit), I would recommend making a copy of your current rules first, just in case.
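Before uploading, it may help to rule out the simplest causes of the "does not comply with SRX standards" message. The sketch below checks only the basics with Python's standard library: that the file is UTF-8, is well-formed XML, has an "srx" root element in the SRX 2.0 namespace, uses "yes"/"no" for the break attribute, and has patterns that at least compile. The namespace is an assumption (check the xmlns attribute on the root of the file Wordbee exported), and Wordbee's own validation is not documented in this thread, so passing these checks does not guarantee the file will be accepted.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the exported rules use SRX 2.0 (check the xmlns on the root element).
SRX_NS = "http://www.lisa.org/srx20"


def check_srx(data: bytes) -> list[str]:
    """Return basic problems found in an SRX file; [] means these checks passed."""
    problems = []
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return problems + [f"not well-formed XML: {exc}"]
    if root.tag != f"{{{SRX_NS}}}srx":
        problems.append(f"unexpected root element {root.tag!r}")
    ns = {"srx": SRX_NS}
    for i, rule in enumerate(root.iterfind(".//srx:rule", ns), start=1):
        if rule.get("break", "yes") not in ("yes", "no"):
            problems.append(f"rule {i}: break must be 'yes' or 'no'")
        for part in ("beforebreak", "afterbreak"):
            el = rule.find(f"srx:{part}", ns)
            pattern = (el.text or "") if el is not None else ""
            try:
                re.compile(pattern)  # approximation: SRX uses ICU-style regex, not Python re
            except re.error as exc:
                problems.append(f"rule {i} {part}: {exc}")
    return problems
```

To run it on the edited file, something like check_srx(pathlib.Path("pt_rules.srx").read_bytes()) would do, where "pt_rules.srx" is a hypothetical file name; an empty list means none of these basic problems were found.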
Regards,
Brahim
-
Joao Gaspar Thank you very much!
I will try it and give you some feedback as soon as I can.
-
Joao Gaspar Thanks Claudia and Brahim, things are much smoother now. I still have trouble with the "Nº." abbreviation; nothing I try works for it. Maybe it's because of the "º" symbol?
Anyway, much much better!
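A few guesses about "Nº.", none of them confirmed in this thread. "º" (U+00BA, masculine ordinal indicator) looks almost identical to "°" (U+00B0, degree sign), so the source text and the rule may contain different characters. And if the edited SRX file was not saved as UTF-8, "º" may not be stored as the character the rule is supposed to contain. The Python sketch below tests a hypothetical beforebreak pattern that accepts both look-alikes, again using Python's "re" as a stand-in for the regex engine the SRX rules actually run on.

```python
import re
import unicodedata

# Hypothetical beforebreak pattern: "N" or "n", then the masculine ordinal
# indicator (U+00BA) or the look-alike degree sign (U+00B0), then a literal dot.
# "\Z" pins the match to the end of the text, i.e. to the candidate break position.
NUMERO = re.compile(r"[Nn][\u00ba\u00b0]\.\Z")


def describe(ch: str) -> str:
    """Name a character so look-alikes such as º and ° can be told apart."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for sample in ["N\u00ba.", "N\u00b0.", "n\u00ba.", "No."]:
    print(sample, describe(sample[1]), bool(NUMERO.search(sample)))
```

Pasting the abbreviation from the document into describe shows which character it really contains. In the SRX file the same idea might be written as <beforebreak>[Nn][\u00BA\u00B0]\.</beforebreak>, assuming Wordbee's regex engine accepts \uXXXX escapes as ICU does; typing the literal character in a file saved as UTF-8 is the alternative.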
-
Jessica Liu Hi Joao,
May I ask which SRX editor you are using?