Problem deduplication method, apparatus, electronic device, and computer-readable storage medium
A question deduplication and standard question technology, applied in computing, electrical digital data processing, and instruments, which addresses problems such as poor classification performance and randomness in the number of clusters, and achieves high accuracy.
Examples
Embodiment 1
[0035] This embodiment of the application provides a problem deduplication method. As shown in Figure 1, the method includes:
[0036] S101: Perform word segmentation on each of multiple question corpora to obtain the question words of each question corpus, and calculate the term frequency-inverse document frequency (TF-IDF) of each question word based on a first quantity of basic question corpora;
[0037] There are many forums and platforms on the Internet, such as the China Agricultural Technology Promotion Information Service Platform and Zhihu. One user (user 1) posts a question on a forum or platform, and another user (user 2) can post an answer on the same forum or platform, so each forum or platform accumulates a large number of questions. The first quantity of basic question corpora may refer to all or some of the questions on one forum or platform, or to all or some of the questions on multiple forums or platforms. Multiple ...
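Step S101 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the application's implementation: `segment` is a hypothetical whitespace tokenizer standing in for a real word segmenter (Chinese questions would need a proper segmentation tool), and the smoothed IDF formula log((1 + N) / (1 + df)) + 1 is one common variant chosen for illustration, since the source does not say which TF-IDF weighting it uses. Document frequencies come from the basic question corpus; term frequencies come from each question being deduplicated.

```python
import math
from collections import Counter


def segment(question: str) -> list[str]:
    # Hypothetical stand-in for word segmentation: lowercases and splits on
    # whitespace. Chinese text would need a real segmenter instead.
    return question.lower().split()


def compute_idf(base_corpus: list[str]) -> dict[str, float]:
    """IDF of every word that appears in the basic question corpus."""
    n = len(base_corpus)
    doc_freq: Counter = Counter()
    for question in base_corpus:
        doc_freq.update(set(segment(question)))  # count each word once per question
    # Assumed smoothed IDF variant: log((1 + N) / (1 + df)) + 1.
    return {w: math.log((1 + n) / (1 + df)) + 1 for w, df in doc_freq.items()}


def tf_idf(question: str, idf: dict[str, float], n_base: int) -> dict[str, float]:
    """TF-IDF weight of each word in one question, as a sparse mapping."""
    words = segment(question)
    counts = Counter(words)
    # Words never seen in the basic corpus get the df = 0 value of the same formula.
    default_idf = math.log(1 + n_base) + 1
    return {w: (c / len(words)) * idf.get(w, default_idf) for w, c in counts.items()}
```

Keeping the IDF table separate from the per-question term frequencies lets the same basic-corpus statistics be reused for every question that is being deduplicated.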
Embodiment 2
[0046] This embodiment provides another possible implementation. Building on Embodiment 1, the method of this embodiment specifies that S102 includes:
[0047] Step A: For any two question corpora, construct two question vectors, one for each corpus, from the term frequency-inverse document frequency (TF-IDF) values of the question words in that corpus, and calculate the similarity between the two question vectors;
[0048] Step B: If the similarity is greater than a preset first threshold, classify the two question corpora into the same question category; otherwise, classify them into two different question categories;
[0049] Repeat Steps A and B until each of the multiple question corpora has been classified into its corresponding question category.
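Steps A and B, repeated over every pair of questions, might look like the sketch below. Two assumptions are made that the source does not state: the similarity measure is taken to be cosine similarity, and "same question category" is treated as transitive, so linked questions are merged with a union-find structure. Each question vector is a sparse mapping from word to TF-IDF weight; `cosine` and `classify` are hypothetical names.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse word -> weight vectors (assumed measure)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def classify(vectors: list[dict[str, float]], first_threshold: float) -> list[list[int]]:
    """Group question indices; any pair above the threshold shares a category."""
    parent = list(range(len(vectors)))

    def find(i: int) -> int:
        # Follow parent links to the category representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Step A (similarity) and Step B (threshold test) for every pair of questions.
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) > first_threshold:
                parent[find(i)] = find(j)  # merge into the same question category

    # Collect the final question categories as lists of question indices.
    groups: dict[int, list[int]] = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

For example, `classify([{"a": 1.0}, {"a": 2.0}, {"b": 1.0}], 0.5)` returns `[[0, 1], [2]]`: the first two vectors point in the same direction (cosine 1.0), while the third shares no words with them.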
[0050] For...
Embodiment 3
[0097] This embodiment of the application provides a problem deduplication device. As shown in Figure 2, the problem deduplication device 20 may include a word segmentation calculation module 201, a classification module 202, and a determination module 203, where:
[0098] The word segmentation calculation module 201 is configured to perform word segmentation on multiple question corpora to obtain the question words of each question corpus, and to calculate the term frequency-inverse document frequency (TF-IDF) of each question word based on the first quantity of basic question corpora;
[0099] The classification module 202 is configured to classify the multiple question corpora based on the term frequency-inverse document frequency (TF-IDF) values of the question words of each question corpus, to obtain multiple question categories;
[0100] The determination module 203 is configured to determine standard questions corresponding to each question categor...
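The three modules of device 20 form a simple pipeline. The sketch below only suggests how modules 201, 202 and 203 might be composed; each module is injected as a callable because the source text is cut off before it describes how module 203 picks a standard question. The class name, field names and type alias are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Weights = dict[str, float]  # TF-IDF weight per question word


@dataclass
class ProblemDeduplicationDevice:
    """Hypothetical composition of modules 201, 202 and 203 of device 20."""

    # Module 201: segments each question and returns its word weights.
    segment_and_weight: Callable[[list[str]], list[Weights]]
    # Module 202: groups question indices into question categories.
    classify: Callable[[list[Weights]], list[list[int]]]
    # Module 203: picks a standard question from one category's members
    # (the actual selection rule is not given in the truncated source).
    determine_standard: Callable[[list[str]], str]

    def run(self, questions: list[str]) -> list[str]:
        weights = self.segment_and_weight(questions)
        categories = self.classify(weights)
        # One standard question per question category.
        return [
            self.determine_standard([questions[i] for i in category])
            for category in categories
        ]
```

A caller could plug TF-IDF weighting and threshold-based classification into modules 201 and 202, and any rule for choosing a representative question into module 203.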