random thoughts to oil the mind

Tag: memoQ

Commenting memoQ Light Resources

Editing memoQ’s light resources can be a painful experience, with somewhat clunky interfaces, intimidating lists, small default window sizes, and the inability to add comments to rules written with regular expressions. Anyone who’s tried to pin down an error in a list of dozens of autotranslatable rules will know the maddening experience of wading through reams of similar-looking rules to find the culprit. It doesn’t help that any edited rules are automatically re-added to the bottom of the list, so any initial effort at structuring rules according to their purpose gradually falls apart over time.

As Kevin Lossner recently pointed out, one clever strategy is to export copies of the rulesets and, adding comments to these XML files, essentially manage these rules outside of memoQ. This makes the rules themselves easier to navigate, and indeed edit, with changes being implemented with a simple reimport of the file.

However, there can be a couple of disadvantages to this solution, depending on your workflow. Firstly, the comments only work in one direction, as they’re lost on import. If you make any alterations to resources from within memoQ, exporting the updated ruleset would mean having to combine the files or otherwise restore all the comments. Secondly, memoQ prevents you from overwriting resources on import, so while you can always add a new version and delete the previous one, it can prove to be a royal pain if you also need to update a large number of templates and/or ongoing projects which are using the resource.

Fortunately, there is one more cheeky option available to make our rules easier to read, and that is to abuse named capturing groups and use them as comments. For example, take a rule from a cascading filter for tagging the asterisks in a Markdown list:

^\s*\*\s+

We can simply give this group a name and make it easy to identify:

(?<unordered_list>^\s*\*\s+)
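As a quick sanity check outside memoQ, the sketch below compares the bare rule with the named version. It is written in Python, whose re module spells named groups (?P<name>...) rather than the (?<name>...) form shown above, so the pattern is adapted for illustration only, and the sample lines are made up:

```python
import re

# memoQ writes named groups as (?<name>...); Python's re module needs
# (?P<name>...) instead, so the rule is adapted here for illustration only.
plain = re.compile(r"^\s*\*\s+")
commented = re.compile(r"(?P<unordered_list>^\s*\*\s+)")

# Made-up sample lines: two list items and two lines that should not match.
samples = ["* first item", "  * nested item", "not a list", "*emphasis*"]

for line in samples:
    before = plain.match(line)
    after = commented.match(line)
    # The name is only a label: both versions match exactly the same text.
    assert (before is None) == (after is None)
    if after:
        assert before.group(0) == after.group(0)
        print(repr(after.group("unordered_list")), "<-", repr(line))
```

The group name changes nothing about what the rule matches; it simply travels with the rule wherever memoQ displays it.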

Once all rules have been commented, we immediately have a much cleaner overview of the ruleset, especially useful when you need to go back in and tweak or add something later.

[Image: Example cascading filter for Markdown]

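Note that one potential side effect of this strategy is that mixing named and numbered capturing groups may upset the numbering. In the .NET regex flavour, which memoQ appears to use, unnamed groups are numbered first and named groups only afterwards, so a reference such as $1 may no longer point at the group you expect. Particularly for autotranslatable rules, it may be easiest simply to use named capturing groups consistently throughout.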

[Photo by Daiga Ellaby on Unsplash]

memoQ and XSLT: Fun with Namespaces

Using memoQ to translate standard XLIFF (XML Localisation Interchange File Format) files can be made that bit more user friendly when you take advantage of the built-in feature to use XSLT transformations. Since I can’t get my head around namespaces, my simple transformation ended up strewn with unreadable references to local-name() nodes. As ever, there is an easier way.

Take a standard XLIFF file along the lines of:

<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
   <file source-language="en" datatype="plaintext" original="Project">
      <header/>
      <body>
         <trans-unit id="string">
            <source>This is the source content.</source>
         </trans-unit>
      </body>
   </file>
</xliff>

Once the XLIFF namespace has been bound to a prefix in the stylesheet header (xlf below), we can address the nodes in the source using standard XPath syntax:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:xlf="urn:oasis:names:tc:xliff:document:1.2"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   exclude-result-prefixes="xsl xlf xsi"
>
<xsl:output method="html" omit-xml-declaration="yes" />
<xsl:template match="/">
<xsl:text disable-output-escaping='yes'>&lt;!DOCTYPE html&gt;</xsl:text>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
   <link rel="stylesheet" type="text/css" href="yourStyles.css" />
   <title>XML Preview</title>
</head>
<body>
   <h1><xsl:value-of select="xlf:xliff/xlf:file/@original"/></h1>
   <xsl:for-each select="xlf:xliff/xlf:file/xlf:body/xlf:trans-unit">
      <div class="id"><xsl:value-of select="@id"/></div>
      <div class="content"><xsl:value-of select="xlf:source"/></div>
   </xsl:for-each>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

This produces a basic, ugly but ultimately workable HTML file which memoQ can display in its preview pane:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
   <head>
      <link rel="stylesheet" type="text/css" href="yourStyles.css"/>
      <title>XML Preview</title>
   </head>
   <body>
      <h1>Project</h1>
      <div class="id">string</div>
      <div class="content">This is the source content.</div>
   </body>
</html>

Obviously you can add references to any other information available in your XLIFF files, and then serve and style up the resulting HTML in any desired shape or form, but this basic scaffolding might help someone out there avoid the namespace minefield I ran into!
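For anyone who wants to see the namespace issue in isolation before touching memoQ, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It is not part of the memoQ workflow; the xlf prefix is an arbitrary label, and only the URI it is bound to has to match the one declared in the XLIFF file:

```python
import xml.etree.ElementTree as ET

# The sample XLIFF from above, minus the XML declaration.
XLIFF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
   <file source-language="en" datatype="plaintext" original="Project">
      <header/>
      <body>
         <trans-unit id="string">
            <source>This is the source content.</source>
         </trans-unit>
      </body>
   </file>
</xliff>"""

# Bind a prefix of our choosing to the XLIFF namespace URI.
NS = {"xlf": "urn:oasis:names:tc:xliff:document:1.2"}

root = ET.fromstring(XLIFF)

# Without a prefix the path looks for elements in no namespace and finds nothing.
unprefixed = root.findall("file/body/trans-unit")

# With the prefix bound, the same path finds the translation unit.
prefixed = root.findall("xlf:file/xlf:body/xlf:trans-unit", NS)

print("without prefix:", len(unprefixed))
for unit in prefixed:
    print(unit.get("id"), "->", unit.findtext("xlf:source", namespaces=NS))
```

The stylesheet relies on the same mechanism: the xmlns:xlf declaration in its header plays the role of the NS dictionary here.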

[Image courtesy of @emilep]

A Little Trick for Quality Assurance in memoQ

This post is also available in English.

The translation software memoQ has a useful feature as part of its QA check which looks for doubled words in the target language. In my case it has already pointed out small typos several times, where I had accidentally written ‘and and’ or ‘to to’. However, the benefit of this check is somewhat limited if your language regularly uses constructions that require such doubling. In German, think of sentences that call for the words ‘die die’. The French translator, meanwhile, tears their hair out when memoQ flags a ‘nous nous’ for the umpteenth time.

With the help of the relatively new regex feature, though, this problem can actually be fixed. When editing your QA ruleset, you can switch off the standard check under the Consistency tab and instead create a new rule under the Regex tab which replaces this check but makes allowance for exceptions. For French, for example, you can enter the following rule as a Forbidden regex match in target (unfortunately Kilgray’s German help pages are not up to date, hence the English names here):

(?i)(?![nv]ous\b)(\b\S+\b)\s+\b\1\b

When active, this rule continues to check for doubled words in the target language, including common punctuation marks such as apostrophes, but ignores every instance of ‘nous nous’ or ‘vous vous’. The exceptions at the front of the rule can then be extended as required. The rule is certainly not flawless, but it can greatly reduce the number of false warnings without having to give up the check altogether.

[Photo by Ilya Pavlov on Unsplash]

memoQ QA Check Tweak

This post is also available in German.

memoQ has a handy little feature as part of its QA check which warns you whenever you double up a word in the target language. I’ve had it catch numerous little and ands and to tos which slip into my work on occasion. However, certain combinations of doubled-up words are fairly commonplace, which can lead to this feature producing lots of false positives. A classic example in English might be two hads in a sentence like ‘I had had enough,’ but that pales in comparison to a language like French, which sees plenty of doubled-up words in pronominal verbs (nous nous lavons, vous vous souvenez etc.).

One way to fix this is to make use of the relatively new regex feature built into the QA check. Untick the option to check for duplicate words in the target under the Consistency tab. Then under the Regex tab we can replicate this functionality, while including our own exception to the rule. Add a new rule of the type Forbidden regex match in target, give it a relevant description, and then add this target regex:

(?i)(?![nv]ous\b)(\b\S+\b)\s+\b\1\b

When active, this rule will continue to highlight any duplicate words in the translation, including all the usual punctuation marks, but ignores any occurrences of nous nous or vous vous. Obviously these exceptions at the front can be replaced with whatever is required in the target language. The rule isn’t by any means flawless, and will for example also complain about repeated sequences of numbers, but it can help to reduce the number of false positives without having to abandon the check altogether.
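The rule can be tried out before it goes anywhere near a QA profile. The sketch below runs it through Python's re module, which happens to accept this particular pattern unchanged; memoQ's own regex engine may differ in edge cases. The build_rule and flagged helpers and the sample sentences are hypothetical and exist only for illustration:

```python
import re

# The rule exactly as given above.
RULE = r"(?i)(?![nv]ous\b)(\b\S+\b)\s+\b\1\b"


def build_rule(exceptions):
    """Hypothetical helper: the same rule with a custom list of tolerated words."""
    if not exceptions:
        # No exceptions: plain duplicate-word check without the lookahead.
        return r"(?i)(\b\S+\b)\s+\b\1\b"
    allowed = "|".join(re.escape(word) for word in exceptions)
    return rf"(?i)(?!(?:{allowed})\b)(\b\S+\b)\s+\b\1\b"


def flagged(rule, text):
    """Return the doubled words the rule would complain about, or None."""
    match = re.search(rule, text)
    return match.group(0) if match else None


# Genuine slips are still caught, including repeated numbers.
print(flagged(RULE, "I went to to the shop"))
print(flagged(RULE, "see page 12 12"))

# The French pronominal forms are let through.
print(flagged(RULE, "Nous nous lavons"))

# Swapping in German exceptions instead.
german = build_rule(["die", "das"])
print(flagged(german, "Sätze, die die Wörter verlangen"))
print(flagged(german, "und und"))
```

Keeping the exceptions in a list makes it easier to see at a glance which doubled words a given rule tolerates before pasting the generated pattern into memoQ.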

[Photo by Ilya Pavlov on Unsplash]
