https://sve.proz.com/forum/cat_tools_technical_help/368344-source_term_collector.html

Source term collector
Thread poster: CafeTran Trainer
CafeTran Trainer
Netherlands
Member (2006)
Jul 6, 2024

Many CAT tools can list the most frequent source terms in a project, but the output usually contains a lot of noise. Is there a program that looks only at the words immediately to the left and right of frequent nouns and then lists the resulting two- or three-word groups?

 
CafeTran Trainer
Netherlands
Member (2006)
TOPIC STARTER
Source fragment harvester Jul 7, 2024

I should have chosen "Source fragment harvester" as the subject.

Since there have been no replies to my post, I'd like to share an idea I've had since posting it:

Use a regular expression to extract the candidates.

Sort in Excel and delete the noise.
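
The two steps above could be sketched on the command line roughly like this. It is only an illustration, not my actual workflow: it assumes GNU grep (for `\b`), plain ASCII words, and a plain-text export of the source segments, and it swaps Excel for `sort | uniq -c`. The function name `harvest_fragments` and the example noun are made up.

```shell
# Sketch: collect the words directly left and right of one chosen noun.
# harvest_fragments is a hypothetical helper; the noun is passed as $1
# and is assumed to consist of plain lower-case letters only.
harvest_fragments() {
  noun="$1"
  # 1. Lower-case the text so "Memory" and "memory" count as one term.
  # 2. grep -o prints only the matched fragment, one per line; the
  #    optional groups pick up one neighbouring word on each side.
  # 3. sort | uniq -c counts identical fragments, sort -rn puts the
  #    most frequent ones first.
  # 4. awk rewrites "   2 fragment" as "2<TAB>fragment" for easy import.
  tr '[:upper:]' '[:lower:]' |
    grep -oE "([a-z]+ )?\b${noun}\b( [a-z]+)?" |
    sort | uniq -c | sort -rn |
    awk '{ count = $1; sub(/^ *[0-9]+ /, ""); print count "\t" $0 }'
}
```

Something like `harvest_fragments memory < source.txt` (with `source.txt` standing in for the exported source text) would then give a tab-separated list that can still be opened in Excel. Fragments that begin or end with a function word remain in the list and have to be deleted by hand.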


[Edited at 2024-07-07 12:20 GMT]


 
CafeTran Trainer
Netherlands
Member (2006)
TOPIC STARTER
Got this suggestion Jul 8, 2024

A kind person gave me this suggestion:

sed -E "s/( a| all| allows| are| at| in| for| of| to| with| on| by| or| the| and| is)$//"
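
As far as I can tell, the command removes one stop word from the end of each line, so a candidate ending in two stop words keeps the first of them. A possible extension, offered only as a sketch, is to wrap the substitution in a sed loop so that it repeats until no trailing stop word is left. The function name `strip_trailing_stopwords` is made up, and the word list is simply the one from the suggestion, sorted.

```shell
# Sketch: strip trailing stop words repeatedly from each candidate line.
# strip_trailing_stopwords is a hypothetical helper that reads stdin.
strip_trailing_stopwords() {
  # Same words as in the suggested command, sorted alphabetically.
  stop='a|all|allows|and|are|at|by|for|in|is|of|on|or|the|to|with'
  # :again marks the loop start; "t again" jumps back whenever the
  # substitution removed something, so "x of the" becomes "x".
  sed -E -e ':again' -e "s/ (${stop})\$//" -e 't again'
}
```

It could be used as `strip_trailing_stopwords < candidates.txt | sort -u`, where `candidates.txt` is a placeholder for the extracted list. A line that consists of a single stop word is left alone, because the pattern requires a space in front of the word.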


 

