ProZ.com global directory of translation services
 The translation workplace

Wordfast - Getting started, building TM from old translations




 


Thread poster: twintrad
twintrad

French to English
Jul 23

Hello
I work in-house at a company and now need a CAT tool. I have been looking into Wordfast and was wondering whether there is a way to build a TM from old translations (source/target versions)?
Thanks in advance
Melinda


 

Gerard de Noord
France
Member (2003)

German to Dutch
You'll need Wordfast's free little helper Jul 23

Have a look at +Tools:

http://www.wordfast.net/index.php?whichpage=plustools&lang=engb

http://www.wordfast.net/index.php?whichpage=knowledge&Task=view&questId=67&catId=15

Regards,
Gerard


 
R.M. Susil Premaratne
Sri Lanka
Member (2007)

Sinhala to English
Wordfast query Jul 23

I suggest that you visit the website www.wordfast.com and submit your question there.

They will definitely give you a satisfactory reply.


 
twintrad

French to English
Wordfast, getting started... Jul 24

Thanks both of you!
Melinda


 
FarkasAndras
Hungary

English to Hungarian
The best I know of... Jul 24

... is hunalign. I think it is the only aligner that reliably detects things like when a paragraph is missing in one of the texts. (Alignment is the "official" name of what you need to get done here.)

If you have a lot of material to align, use it. It'll spare you an eternity. There is a reason why teams that build parallel corpora/mega-TMs (Europarl and Acquis) invariably use tools like hunalign and NOT WinAlign or PlusTools. I haven't used PlusTools; it could be reasonably good for all I know. WinAlign isn't. But even if the PlusTools aligner works well, it will only provide an alignment based on the Wordfast segmentation of the documents. If the segmentation of the two documents doesn't match with great reliability (and it won't), then you'll have an awful lot of correcting to do, because if one segment is off somewhere, everything after it will be out of alignment until you correct it. Hunalign may misalign segments, but it automatically recovers from the error further down the line.
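
To make the drift problem concrete, here is a tiny Python sketch. The segment labels are invented for illustration, and this is not how any of the tools mentioned works internally; it only shows what happens when segments are paired purely by position and one segment is missing on the target side.

```python
# Toy data, invented for illustration: the target text lacks segment 2.
source = ["S1", "S2", "S3", "S4"]
target = ["T1", "T3", "T4"]


def pair_by_position(src, tgt):
    """Naive alignment: the n-th source segment is paired with the n-th target segment."""
    return list(zip(src, tgt))


pairs = pair_by_position(source, target)

# A pair is wrong when the segment numbers on the two sides differ.
wrong = [(s, t) for s, t in pairs if s[1:] != t[1:]]

for s, t in pairs:
    print(s, "<->", t)
print(len(wrong), "of", len(pairs), "pairs are misaligned")
```

Everything after the missing segment is paired with the wrong partner, which is why a single segmentation mismatch means so much manual correction.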

Google hunalign, read the description on the site, and, for preprocessing, use the sentence boundary detector from here: http://www.statmt.org/europarl/v3/tools.tgz

It's a command-line tool, so it won't do fancy graphics... but then I prefer fancy performance to fancy graphics.

Basic workflow description: you convert your files to txt, run the Europarl tool to chop them into sentences, feed them to hunalign, copy the output to Excel, make corrections, delete unnecessary bits, insert tags to make a standard TMX file (or Wordfast translation memory) out of it, copy to Notepad, save and use in WF.
All of this can be automated: PlusTools for the txt conversion, the command line for merging txt files, and lots of search and replace and copy/paste throughout.
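
The "insert tags" step can also be scripted instead of done with search and replace. Below is a minimal Python sketch, not the procedure described above. It assumes hunalign was run with the -text option, so that each output line is source, target and score separated by tabs. The function names, the sample sentences and the header values (creationtool and so on) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_pairs(lines):
    """Yield (source, target) from 'source<TAB>target<TAB>score' lines.

    Lines where one side is empty are skipped; the score column is ignored.
    """
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue
        src, tgt = cols[0].strip(), cols[1].strip()
        if src and tgt:
            yield src, tgt


def build_tmx(pairs, src_lang, tgt_lang):
    """Return a TMX 1.4 document as a string (header values are placeholders)."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "hunalign-to-tmx-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "hunalign-text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


# Invented sample standing in for a corrected hunalign output file.
sample = (
    "Bonjour.\tHello.\t0.9\n"
    "Merci & au revoir.\tThanks & goodbye.\t0.8\n"
    "\tOrphan line.\t0\n"
)
tmx_text = build_tmx(read_pairs(sample.splitlines()), "fr", "en")
print(tmx_text)
```

Characters such as & are escaped by the XML library, which is easy to get wrong when the tags are inserted by hand. You would still want to check the resulting file in your CAT tool before relying on it.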

Hunalign performs much better if you feed in a bilingual dictionary/glossary.
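
If your glossary lives in a spreadsheet, a few lines of Python can turn it into a dictionary file. This is a sketch under one assumption that you should check against the README of your hunalign version: as far as I recall, each dictionary line has the form "target phrase @ source phrase", with the target language first. The function name and the sample terms are invented.

```python
def glossary_to_hunalign_dic(rows):
    """Convert (source_term, target_term) pairs to hunalign dictionary text.

    Assumed line format (verify in the hunalign README): target @ source.
    Rows with an empty term are dropped; runs of whitespace are collapsed.
    """
    lines = []
    for source_term, target_term in rows:
        source_term = " ".join(source_term.split())
        target_term = " ".join(target_term.split())
        if source_term and target_term:
            lines.append(f"{target_term} @ {source_term}")
    return "".join(line + "\n" for line in lines)


# Invented French-to-English sample glossary.
glossary = [
    ("mémoire de traduction", "translation memory"),
    ("alignement", "alignment"),
    ("", "ignored because the source term is empty"),
]
dic_text = glossary_to_hunalign_dic(glossary)
print(dic_text)
```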


All of this requires what I consider fairly basic computer skills and some investment of time. Nag me with questions if you need to (but read the manuals and google first).


If people are interested in the whole procedure, I may write up an article about how I did it. Also, if someone has a large amount of material they'd like aligned and no computer skills or time, we may be able to work something out.

[Edited at 2008-07-24 13:11]


 

Milan Condak
Czech Republic

English to Czech
I am using hunalign and +Tools/+Align Jul 24


FarkasAndras wrote:

If you have a lot of material to align, use it. It'll spare you an eternity. There is a reason why teams that build parallel corpora/megaTMs (europarl & acquis)

Basic workflow description:
you convert your files to txt,
run the europarl tool to chop it into sentences,
feed them to hunalign,
copy the output to excel,
make corrections,
delete unnecessary bits,
and insert tags to make a standard tmx file
(or wordfast translation memory) out of it,
copy to notepad, save and use in WF.

All of this can be automated; plustools for the txt conversion and command line for merging txt's, lots of search and replace and copy/paste all through.

Hunalign performs much better if you feed in a bilingual dictionary/glossary.


My workflow description:

I convert/save my files to/as txt,
sometimes extract them into sentences with Wordfast/Tools/Extract,
and feed them to hunalign (I use a short, editable .bat file).
I open the output in MS Word (an Excel cell has a limited size), convert the text to a table and delete the third column with the index.
I split the table into files of 100 pages each.
I run PlusTools/+Align: I open one short file with a table created by PlusTools to activate the +Align menu, then open the file to be corrected and close the short file.
I make corrections (mostly splitting some segments and deleting tildes),
then create a Wordfast TM with the Create TM button. I merge all the TMs created.
-
I tested hunalign without a bilingual glossary, only with "null.dic", on all EU languages paired with Czech.
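
For readers who prefer a script to a .bat file, the hunalign call and the tilde clean-up could be sketched in Python roughly as follows. This is not the actual .bat file. It assumes a hunalign executable on the PATH, the -text option for tab-separated text output, and " ~~~ " as the separator hunalign puts between sentences it has merged into one segment. The file names and function names are hypothetical.

```python
import subprocess

MERGE_MARK = " ~~~ "  # assumed separator between merged sentences in hunalign output


def hunalign_command(dictionary, source_txt, target_txt, exe="hunalign"):
    """Build the command line: hunalign -text <dictionary> <source> <target>."""
    return [exe, "-text", dictionary, source_txt, target_txt]


def run_hunalign(dictionary, source_txt, target_txt, output_txt, exe="hunalign"):
    """Run hunalign and redirect its standard output to output_txt."""
    with open(output_txt, "w", encoding="utf-8") as out:
        subprocess.run(
            hunalign_command(dictionary, source_txt, target_txt, exe),
            stdout=out,
            check=True,  # raise an error if hunalign exits with a failure code
        )


def split_merged(cell):
    """Split one aligned cell back into the sentences hunalign merged."""
    return [part.strip() for part in cell.split(MERGE_MARK) if part.strip()]


# Building the command does not require hunalign to be installed.
cmd = hunalign_command("null.dic", "en.txt", "cs.txt")
print(" ".join(cmd))
print(split_merged("First sentence. ~~~ Second sentence."))
```

Calling run_hunalign("null.dic", "en.txt", "cs.txt", "aligned.txt") would then write the aligned text to aligned.txt, ready for correction.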

My thanks to the authors of hunalign for this free tool.

Milan

[Edited at 2008-07-24 19:09]


 

Milan Condak
Czech Republic

English to Czech
Example of using Hunalign and PlusTools/+Align Aug 2


Milan Condak wrote:

I tested Hunalign without the bilingual glossary only "null.dic" with Czech.

Milan


Here is an example of an alignment of an EN text + CS (machine translation):

http://www.condak.net/tools/hunalign2/en/00.html

Milan


 


Copyright © 1999-2008 ProZ.com. All rights reserved.