Multi-level long text vector retrieval method, device, and electronic equipment
A multi-level long-text technology, applied in unstructured text data retrieval, text database indexing, neural learning methods, and related fields. It addresses problems such as hard-to-resolve word ambiguity and cumbersome, time-consuming processing, and achieves the effect of improving recall efficiency.
Examples
Embodiment 1
[0044] As shown in Figure 1, an embodiment of the present invention provides a multi-level long text vector retrieval method, including:
[0045] S101. Segment open-domain long text into text segments;
[0046] S102. Use a trained encoder to encode the text segments and the search request into dense vectors, respectively;
[0047] S103. Using the dense vectors of the text segments and of the search request, perform vector retrieval to obtain a target text segment similar to the search request;
[0048] wherein the encoder is trained on a training data set that includes multi-level text segments.
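Steps S101 to S103 can be illustrated with a minimal sketch. It is not the patented implementation: the function names (`segment_text`, `build_vocab`, `toy_encode`, `retrieve`), the fixed word-count segmentation, and the bag-of-words stand-in for the trained encoder are all assumptions made for illustration. A real system would replace `toy_encode` with the trained encoder and would likely use an approximate nearest-neighbour index instead of a linear scan.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    """Lower-case word tokenizer used by the toy encoder."""
    return re.findall(r"\w+", text.lower())


def segment_text(long_text: str, max_words: int = 5) -> list[str]:
    """S101 (assumed strategy): cut the text into chunks of at most max_words words."""
    words = long_text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_vocab(texts: list[str]) -> dict[str, int]:
    """Assign one vector dimension to each distinct token in the segments."""
    vocab: dict[str, int] = {}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    return vocab


def toy_encode(text: str, vocab: dict[str, int]) -> list[float]:
    """S102 stand-in: L2-normalised bag-of-words vector (NOT the trained encoder)."""
    vec = [0.0] * len(vocab)
    for token in tokenize(text):
        if token in vocab:  # tokens outside the vocabulary are ignored
            vec[vocab[token]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def retrieve(query: str, segments: list[str], top_k: int = 1) -> list[tuple[float, str]]:
    """S103: rank segments by inner product between query and segment vectors."""
    vocab = build_vocab(segments)
    query_vec = toy_encode(query, vocab)
    scored = []
    for segment in segments:
        segment_vec = toy_encode(segment, vocab)
        score = sum(q * s for q, s in zip(query_vec, segment_vec))
        scored.append((score, segment))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


document = (
    "cats purr softly at night dogs bark loudly at dawn "
    "birds sing sweetly at noon"
)
segments = segment_text(document, max_words=5)
best = retrieve("why do dogs bark", segments, top_k=1)
```

Here `best` holds the highest-scoring segment together with its similarity score; because the vectors are L2-normalised, the inner product equals cosine similarity.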
[0049] In practice, because long documents often have to be divided into multiple text fragments for model training, the relevance between a search request and a text fragment is multi-level rather than a binary choice between relevant and irrelevant. Consider, for example, the following four text fragments: a. document fragments that contain an...
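One way to train against multi-level relevance, rather than binary labels, is a pairwise ranking objective over graded labels. The sketch below is an assumption for illustration only: the excerpt does not state the loss the encoder is trained with, and the integer levels, the `margin` value, and the name `multilevel_margin_loss` are hypothetical. The idea is simply that a fragment with a higher relevance level should receive a higher similarity score than any fragment with a lower level.

```python
def multilevel_margin_loss(
    scores: list[float], levels: list[int], margin: float = 0.1
) -> float:
    """Average hinge loss over all pairs (i, j) where levels[i] > levels[j].

    scores: similarity of each fragment to the same search request.
    levels: graded relevance labels; a larger integer means more relevant
            (hypothetical encoding, not taken from the source).
    """
    total, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if levels[i] > levels[j]:
                # Penalise pair unless the more relevant fragment wins by `margin`.
                total += max(0.0, margin - (scores[i] - scores[j]))
                pairs += 1
    return total / pairs if pairs else 0.0


# Four fragments for one request, ordered from most to least relevant.
well_ordered = multilevel_margin_loss([0.9, 0.6, 0.3, 0.0], [3, 2, 1, 0])
mis_ordered = multilevel_margin_loss([0.0, 0.3, 0.6, 0.9], [3, 2, 1, 0])
```

With binary labels this reduces to an ordinary positive-versus-negative margin loss; the graded version additionally constrains the order among the intermediate levels.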
Embodiment 2
[0085] As shown in Figure 2, another aspect of the present invention includes a functional module architecture that corresponds fully to the method flow described above. That is, an embodiment of the present invention also provides a multi-level long text vector retrieval device, including:
[0086] A text segmentation module 201, configured to segment open-domain long text into text segments;
[0087] A vector encoding module 202, configured to encode the text segments and the search request into dense vectors, respectively, using a trained encoder, wherein the encoder is trained on a training data set that includes multi-level text segments;
[0088] A vector retrieval module 203, configured to use the dense vectors of the text segments and of the search request to obtain, based on vector retrieval, a target text segment similar to the search request.
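The three-module architecture can be sketched as three small classes wired into one device. All class and method names below are hypothetical, the sentence-based segmentation is an assumed strategy, and the encoder is injected as a plain callable so that a trained model can be plugged in; the keyword-counting encoder in the usage lines is only a placeholder.

```python
import re
from dataclasses import dataclass
from typing import Callable, Sequence

Vector = Sequence[float]


@dataclass
class TextSegmentationModule:
    """Module 201: splits long text into segments (here: one per sentence)."""

    def segment(self, long_text: str) -> list[str]:
        parts = re.split(r"(?<=[.!?])\s+", long_text.strip())
        return [p for p in parts if p]


@dataclass
class VectorEncodingModule:
    """Module 202: wraps the encoder; the callable is supplied by the caller."""

    encoder: Callable[[str], Vector]

    def encode(self, text: str) -> Vector:
        return self.encoder(text)


@dataclass
class VectorRetrievalModule:
    """Module 203: ranks segments by inner product with the request vector."""

    def search(self, query_vec: Vector, segment_vecs: list[Vector],
               segments: list[str], top_k: int = 1) -> list[str]:
        scores = [sum(q * s for q, s in zip(query_vec, v)) for v in segment_vecs]
        order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
        return [segments[i] for i in order[:top_k]]


@dataclass
class RetrievalDevice:
    """Wires modules 201, 202 and 203 together in method order."""

    segmenter: TextSegmentationModule
    encoder: VectorEncodingModule
    retriever: VectorRetrievalModule

    def query(self, long_text: str, request: str, top_k: int = 1) -> list[str]:
        segments = self.segmenter.segment(long_text)
        segment_vecs = [self.encoder.encode(s) for s in segments]
        return self.retriever.search(
            self.encoder.encode(request), segment_vecs, segments, top_k
        )


def keyword_encoder(text: str) -> list[float]:
    """Placeholder encoder: counts two keywords (NOT a trained model)."""
    lowered = text.lower()
    return [float(lowered.count("vector")), float(lowered.count("patent"))]


device = RetrievalDevice(
    TextSegmentationModule(),
    VectorEncodingModule(keyword_encoder),
    VectorRetrievalModule(),
)
result = device.query("Patent law is complex. Vector search is fast.", "vector index")
```

Keeping the encoder behind a callable means the segmentation and retrieval modules need no changes when the placeholder is swapped for the trained encoder.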