Source term collector
Thread poster: CafeTran Trainer
CafeTran Trainer
Netherlands
Member (2006)
Jul 6, 2024

Many CAT tools can list the most frequent source terms in a project, but the result usually contains a lot of garbage. Is there a program that looks only at the words to the left and right of frequent nouns and then lists the resulting groups of two or three words?
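A minimal sketch of the behaviour the question describes, assuming the frequent nouns are already known (for example, copied from a CAT tool's frequency list). It does no part-of-speech tagging, and `context_groups` is a hypothetical name, not an existing tool:

```python
import re
from collections import Counter

def context_groups(text, nouns, max_len=3):
    """Count every 2- to max_len-word window that contains one of `nouns`."""
    # Simple word tokenizer (an assumption): letters, apostrophes, hyphens.
    words = [w.lower() for w in re.findall(r"[A-Za-z][A-Za-z'-]*", text)]
    targets = {n.lower() for n in nouns}
    groups = Counter()
    for i, word in enumerate(words):
        if word not in targets:
            continue
        for size in range(2, max_len + 1):
            # Every window of `size` words that includes position i.
            for start in range(i - size + 1, i + 1):
                end = start + size
                if start >= 0 and end <= len(words):
                    groups[" ".join(words[start:end])] += 1
    return groups

print(context_groups("The translation memory stores each translation unit.", ["translation"]))
```

Each window is counted, so the most frequent groups can be read straight from the `Counter`.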

CafeTran Trainer
Netherlands
Member (2006)
TOPIC STARTER
Source fragment harvester Jul 7, 2024

I should have chosen "Source fragment harvester" as the subject.

Since there have been no replies to my post, I'd like to share an idea I've had since posting it:

Use a regular expression to extract the candidates.
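The post does not give the regular expression itself, so the pattern below is an illustrative assumption: an optional word, the noun, and an optional word, matched case-insensitively. `regex_candidates` is a hypothetical helper:

```python
import re

def regex_candidates(text, noun):
    """Return the 2- and 3-word groups around `noun` found by a regular expression."""
    # Illustrative pattern (an assumption): optional word, the noun, optional word.
    pattern = re.compile(
        rf"(?:\b[A-Za-z]+\s+)?\b{re.escape(noun)}\b(?:\s+[A-Za-z]+\b)?",
        re.IGNORECASE,
    )
    groups = []
    for match in pattern.finditer(text):
        words = match.group(0).split()  # also collapses line breaks and extra spaces
        if len(words) >= 2:  # skip matches that are only the bare noun
            groups.append(" ".join(words).lower())
    return groups

print(regex_candidates("The translation memory stores each translation unit.", "translation"))
```

Because `finditer` returns non-overlapping matches, this finds fewer groups than a sliding window, which keeps the list shorter.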

Sort in Excel and delete the noise.
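If you prefer a script to Excel, the sorting and noise removal can be done in a few lines. This is a sketch; the stop-word list and the `min_count` threshold are illustrative assumptions, and `rank_candidates` is a hypothetical name:

```python
from collections import Counter

# Illustrative stop-word list (an assumption); extend it for your own texts.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "for", "and", "or", "is", "are"}

def rank_candidates(candidates, min_count=2):
    """Count candidate groups, drop noise, and sort by frequency."""
    counts = Counter(candidates)
    kept = [
        (group, n)
        for group, n in counts.items()
        if n >= min_count                       # drop rare groups
        and group.split()[0] not in STOPWORDS   # drop groups starting with a function word
        and group.split()[-1] not in STOPWORDS  # drop groups ending with a function word
    ]
    # Most frequent first; ties sorted alphabetically.
    return sorted(kept, key=lambda item: (-item[1], item[0]))

print(rank_candidates(["translation memory", "the translation", "translation memory",
                       "the translation", "term base", "term base", "term base", "rare term"]))
```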


[Edited at 2024-07-07 12:20 GMT]


CafeTran Trainer
Netherlands
Member (2006)
TOPIC STARTER
Got this suggestion Jul 8, 2024

A kind person gave me this suggestion:

sed -E 's/ (a|all|allows|are|at|in|for|of|to|with|on|by|or|the|and|is)$//'
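The sed command removes one trailing function word per line, so "translation memory of the" becomes "translation memory of". The sketch below keeps stripping until the last word is no longer in the list. It uses the same words as the sed command (without the duplicate "of" and "at"); `strip_trailing` is a hypothetical name:

```python
# Same word list as the sed command, duplicates removed.
TRAILING = {"a", "all", "allows", "are", "at", "in", "for", "of", "to",
            "with", "on", "by", "or", "the", "and", "is"}

def strip_trailing(line):
    """Repeatedly remove trailing function words (case-sensitive, like sed)."""
    words = line.split()  # note: this also collapses runs of spaces
    # Keep at least one word, matching sed, which needs a space before the word.
    while len(words) > 1 and words[-1] in TRAILING:
        words.pop()
    return " ".join(words)

print(strip_trailing("translation memory of the"))
```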



