Multilingual XML - Advanced configuration example
We were asked for help with configuring a multilingual XML file, i.e. an XML file containing texts in multiple languages.
The nodes in the file look like this:
<SAMPLE>
  <ML_TITLES>
    <ML_TI_DOC LG="DA">
      <TI_TEXT>The Danish content</TI_TEXT>
    </ML_TI_DOC>
    <ML_TI_DOC LG="DE">
      <TI_TEXT>The German content</TI_TEXT>
    </ML_TI_DOC>
    <ML_TI_DOC LG="EN">
      <TI_TEXT>The English content</TI_TEXT>
    </ML_TI_DOC>
  </ML_TITLES>
  <ML_TITLES>
    (more nodes for different languages)
  </ML_TITLES>
</SAMPLE>
As you can see, we need to extract the "TI_TEXT" nodes and assign each one to the language given in the "LG" attribute of its parent "ML_TI_DOC" node.
Now we configure the text extraction rules in Wordbee by going to "Settings", then "XML files" and clicking the "Add new" button. Tick the "Multilingual" option:

Now add this XPath configuration for each language in the file. In this example I have added just two of the languages:

"//ML_TITLES" selects all the parent nodes that contain the various language versions. The "//" prefix means that the node can appear at any depth in the XML.
Then we add one line per language. ML_TI_DOC/TI_TEXT is the path to a language version, and a predicate such as [@LG="EN"] restricts it to the node for one specific language, for example ML_TI_DOC[@LG="EN"]/TI_TEXT.
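To see what these expressions select outside Wordbee, here is a minimal Python sketch using the standard library's xml.etree.ElementTree, which supports a subset of XPath. It only illustrates the selection logic and is not how Wordbee processes files internally. The function name extract_segments and the shortened inline sample are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample above (one ML_TITLES parent, three languages).
SAMPLE_XML = """<SAMPLE>
  <ML_TITLES>
    <ML_TI_DOC LG="DA"><TI_TEXT>The Danish content</TI_TEXT></ML_TI_DOC>
    <ML_TI_DOC LG="DE"><TI_TEXT>The German content</TI_TEXT></ML_TI_DOC>
    <ML_TI_DOC LG="EN"><TI_TEXT>The English content</TI_TEXT></ML_TI_DOC>
  </ML_TITLES>
</SAMPLE>"""


def extract_segments(xml_text, languages):
    """Return one dict per ML_TITLES parent, mapping language code to text.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    segments = []
    # ".//ML_TITLES" is ElementTree's spelling of "//ML_TITLES":
    # match the parent node at any depth below the root.
    for parent in root.findall(".//ML_TITLES"):
        texts = {}
        for lang in languages:
            # One expression per language, relative to the parent node.
            node = parent.find(f"ML_TI_DOC[@LG='{lang}']/TI_TEXT")
            if node is not None:
                texts[lang] = node.text
        segments.append(texts)
    return segments


# Configure just two of the three languages, as in the example above.
segments = extract_segments(SAMPLE_XML, ["EN", "DE"])
print(segments)
```

Languages that are not listed (Danish here) are simply not extracted, which mirrors adding only two language lines in the configuration.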
If this sounds cryptic, don't worry. The XPath syntax is not easy to get started with. On the configuration page you will find a tool that helps you build basic expressions after you paste in your XML file.
Use the "Test configuration" link on the configuration page to quickly upload your file and see what it extracts.
Enjoy,
Stephan
-
Let us push the example above a bit further ;-)
The original file used above actually defined so-called "namespaces". Such XML files contain one or more URL-like declarations in the root node:
<SAMPLE xsi:schemaLocation="http://publications.europa.eu" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> ...
As soon as you have namespaces you cannot locate a node using:
- //ML_TITLES
Instead, you need to match on the local name of the node:
- //*[local-name()="ML_TITLES"]
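The effect can be reproduced with a small Python sketch, assuming the file declares a default namespace on the root node (the namespace URL below is hypothetical). Python's xml.etree.ElementTree does not implement the XPath local-name() function, so the helper local_name, an assumption of this example, emulates it by stripping the namespace part from each tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical default namespace; every element inherits it from the root.
NAMESPACED_XML = """<SAMPLE xmlns="http://example.com/hypothetical-ns">
  <ML_TITLES>
    <ML_TI_DOC LG="EN"><TI_TEXT>The English content</TI_TEXT></ML_TI_DOC>
  </ML_TITLES>
</SAMPLE>"""


def local_name(element):
    """Emulate XPath local-name(): drop the '{namespace}' prefix of the tag."""
    return element.tag.rsplit("}", 1)[-1]


root = ET.fromstring(NAMESPACED_XML)

# The plain name no longer matches, because the real tag is now
# "{http://example.com/hypothetical-ns}ML_TITLES".
plain_matches = root.findall(".//ML_TITLES")

# Matching on the local name ignores the namespace, in the same spirit as
# //*[local-name()="ML_TITLES"].
local_matches = [el for el in root.iter() if local_name(el) == "ML_TITLES"]

print(len(plain_matches), len(local_matches))
```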
The full setup from the post above would then look somewhat more complicated:

Hoping you still enjoy :-)