Domain parallel corpus generation method and translation model training method
A parallel corpus and translation model technology in the field of domain parallel corpus generation and translation model training. It addresses problems such as low efficiency, scarce corpora, and difficult collection and processing, and it improves content quality, ensures correctness, and increases efficiency.
Examples
Embodiment 1
[0066] A method for generating a domain parallel corpus, comprising the following step:
[0067] A machine translation model is used to align the text-level corpus and the sentence-level corpus in the parallel corpus materials. After alignment, the resulting text-level parallel corpus and sentence-level parallel corpus together form the domain parallel corpus.
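The sentence-level alignment in this step can be pictured with a minimal sketch. It assumes the machine translation model is any function from a source sentence to a target-language sentence, that documents arrive already split into sentences, and it swaps in token-set Jaccard similarity with greedy matching as a stand-in for whatever scoring the method actually uses. All names (`align_sentences`, `build_domain_corpus`, `DomainParallelCorpus`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interface: the MT model is any function from a source sentence
# to a target-language sentence.
Translator = Callable[[str], str]
Pair = Tuple[str, str]


def token_jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity; a simple stand-in for a real similarity metric."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def align_sentences(src_sents: Sequence[str], tgt_sents: Sequence[str],
                    translate: Translator, threshold: float = 0.5) -> List[Pair]:
    """Greedily pair each source sentence with its most similar unused target sentence."""
    pairs: List[Pair] = []
    used = set()
    for src in src_sents:
        hypothesis = translate(src)  # MT output in the target language
        best_j: Optional[int] = None
        best_score = 0.0
        for j, tgt in enumerate(tgt_sents):
            if j in used:
                continue
            score = token_jaccard(hypothesis, tgt)
            if score > best_score:
                best_j, best_score = j, score
        # Keep the pair only if the best match clears the similarity threshold.
        if best_j is not None and best_score >= threshold:
            used.add(best_j)
            pairs.append((src, tgt_sents[best_j]))
    return pairs


@dataclass
class DomainParallelCorpus:
    document_pairs: List[Pair] = field(default_factory=list)
    sentence_pairs: List[Pair] = field(default_factory=list)


def build_domain_corpus(doc_pairs: Sequence[Tuple[List[str], List[str]]],
                        translate: Translator,
                        threshold: float = 0.5) -> DomainParallelCorpus:
    """Collect document-level and sentence-level pairs into one corpus.

    Each input document is given as a list of sentences.
    """
    corpus = DomainParallelCorpus()
    for src_doc, tgt_doc in doc_pairs:
        corpus.document_pairs.append((" ".join(src_doc), " ".join(tgt_doc)))
        corpus.sentence_pairs.extend(
            align_sentences(src_doc, tgt_doc, translate, threshold))
    return corpus
```

For example, with a toy lexicon `{"hola mundo": "hello world"}` used as the translator, `align_sentences(["hola mundo"], ["good morning all", "hello world"], ...)` pairs the source sentence with `"hello world"`. Greedy one-to-one matching is only one possible design; it ignores sentence order and cannot produce one-to-many alignments.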
Embodiment 2
[0069] On the basis of Embodiment 1, the method includes the following sub-steps:
[0070] Use an open parallel corpus to initialize and train a supervised machine translation model;
[0071] Collect content from bilingual websites, parse the title, body content, and reporting time of each item to generate corpus materials, and store them in the parallel corpus material library;
[0072] Chapter-level parallel corpus alignment sub-step: calculate the reporting time difference between a source-text material and a translated-text material in the parallel corpus material library, and match the domain terms in the title of the source material. The reporting time difference is compared with a preset time-difference threshold. If it is less than the threshold, the initialized supervised machine translation model is used to compare the similarity of the two materials' title content; if that similarity is greater than a preset title-content similarity threshold, then judge them. It i...
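The collection and chapter-level alignment sub-steps above can be sketched as follows, under several assumptions. Scraping is out of scope, so `parse_record` takes an already-extracted record with hypothetical keys `title`, `content`, and `time` (ISO 8601). The threshold values are placeholders, title similarity is token-set Jaccard rather than anything stated in the text, and treating domain-term matching as a required filter is a guess, since the visible text does not say how the match is used. All function and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class CorpusMaterial:
    """One collected article; field names are illustrative, not from the patent."""
    title: str
    content: str
    report_time: datetime


def parse_record(record: Dict[str, str]) -> CorpusMaterial:
    """Build a material from an already-scraped record (hypothetical keys)."""
    return CorpusMaterial(
        title=record["title"].strip(),
        content=record["content"].strip(),
        report_time=datetime.fromisoformat(record["time"]),
    )


def title_similarity(a: str, b: str) -> float:
    """Token-set Jaccard similarity; a stand-in for the model-based comparison."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def is_document_pair(src: CorpusMaterial, tgt: CorpusMaterial,
                     translate: Callable[[str], str],
                     domain_terms: Iterable[str],
                     max_time_diff: timedelta = timedelta(days=2),
                     min_title_sim: float = 0.5) -> bool:
    # 1. Reporting-time difference: proceed only if it is below the threshold.
    if abs(src.report_time - tgt.report_time) >= max_time_diff:
        return False
    # 2. Domain-term matching on the source title (assumed here to act as a filter).
    lowered = src.title.lower()
    if not any(term.lower() in lowered for term in domain_terms):
        return False
    # 3. Translate the source title with the MT model and compare it with the
    #    target title; the pair must exceed the similarity threshold.
    return title_similarity(translate(src.title), tgt.title) > min_title_sim


def align_documents(sources: List[CorpusMaterial], targets: List[CorpusMaterial],
                    translate: Callable[[str], str], domain_terms: List[str],
                    **thresholds) -> List[Tuple[CorpusMaterial, CorpusMaterial]]:
    """Return every (source, target) pair that passes all three checks."""
    return [(s, t) for s in sources for t in targets
            if is_document_pair(s, t, translate, domain_terms, **thresholds)]
```

With a toy translator that maps the title `"太阳能逆变器市场增长"` to `"solar inverter market grows"` and the domain term `"逆变器"`, an English article titled `"Solar inverter market grows fast"` published twelve hours later is accepted, while the same article published nine days later is rejected by the time check. The all-pairs loop is quadratic; a real pipeline would likely bucket materials by date first.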
Embodiment 3
[0075] On the basis of Embodiment 2, a method for training a translation model comprises the following steps: update a machine translation model with the sentence-level parallel corpus generated by the method described in Embodiment 1, then use the updated machine translation model to generate a domain parallel corpus. The generation of the domain parallel corpus and the updating of the machine translation model are repeated alternately in a cycle.
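The alternating cycle described above can be sketched as a loop. This assumes a hypothetical `TranslationModel` interface with `translate` and `update` methods, a caller-supplied `generate_corpus` function that runs the alignment steps with the current model, and a fixed number of rounds; the text does not state a stopping condition. `LexiconModel` is a toy stand-in, not a real translation model.

```python
from typing import Callable, Dict, List, Optional, Protocol, Tuple

Pair = Tuple[str, str]


class TranslationModel(Protocol):
    """Hypothetical interface; a real system would wrap an NMT toolkit here."""
    def translate(self, text: str) -> str: ...
    def update(self, pairs: List[Pair]) -> "TranslationModel": ...


def iterative_training(model: TranslationModel,
                       generate_corpus: Callable[[TranslationModel], List[Pair]],
                       rounds: int) -> Tuple[TranslationModel, List[int]]:
    """Alternate corpus generation and model updating for a fixed number of rounds."""
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    corpus_sizes: List[int] = []
    for _ in range(rounds):
        pairs = generate_corpus(model)  # align materials with the current model
        model = model.update(pairs)     # update on the new sentence-level pairs
        corpus_sizes.append(len(pairs))
    return model, corpus_sizes


class LexiconModel:
    """Toy stand-in: translates by exact lookup and 'learns' by memorising pairs."""

    def __init__(self, lexicon: Optional[Dict[str, str]] = None):
        self.lexicon = dict(lexicon or {})

    def translate(self, text: str) -> str:
        return self.lexicon.get(text, text)

    def update(self, pairs: List[Pair]) -> "LexiconModel":
        merged = dict(self.lexicon)
        merged.update(pairs)
        return LexiconModel(merged)
```

In practice `generate_corpus` would wrap the chapter-level and sentence-level alignment steps, and `update` would fine-tune a neural model. Returning a new model from `update` keeps each round's model available for comparison; an in-place update would work equally well.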