A Word Segmentation Method Based on Multidimensional Comprehensive Thesaurus

A multidimensional thesaurus technique, applied in electronic digital data processing, special data processing applications, and instruments. It addresses the problems that existing word segmentation methods have high computational complexity, depend on understanding the information, and struggle to meet the real-time requirements of e-commerce search. The stated effects are a simple, easy-to-understand method with strong scalability.

Active Publication Date: 2017-04-05
FOCUS TECH

AI Technical Summary

Problems solved by technology

[0012] In addition, handling ambiguous words, misspelled words, and English singular and plural forms depends on understanding the information. Although current word segmentation methods based on language understanding can eliminate ambiguity to a certain extent, they have high computational complexity, are difficult to adjust after a segmentation error occurs, and struggle to meet the real-time requirements of e-commerce search.



Examples


Embodiment Construction

[0036] The present invention will be described in further detail below in conjunction with the accompanying drawings and specific embodiments.

[0037] The method for constructing a multi-dimensional comprehensive thesaurus in the e-commerce field of this embodiment includes:

[0038] (1) Select a data source and perform usage statistics;

[0039] On the e-commerce platform, a large number of users search for products every day. From the search log, select the search keywords that users entered within a given period, and deduplicate each user's search keywords for each day. Then count the daily number of users for each search keyword and sum these daily counts over the period to obtain the keyword's user usage for the period. The user usage represents the hotspot distribution of the current search keywords;
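The usage statistics in step (1) can be illustrated with a minimal Python sketch. The record layout `(day, user_id, keyword)`, the function name `keyword_user_usage`, and the sample data are assumptions made for illustration; the source does not specify a log format or an implementation.

```python
from collections import Counter
from datetime import date


def keyword_user_usage(log_records):
    """Sum, over the whole period, the number of distinct users per day
    who searched each keyword.

    log_records: iterable of (day, user_id, keyword) tuples (hypothetical layout).
    """
    # Deduplicate: a keyword searched several times by the same user
    # on the same day counts only once.
    unique_triples = set(log_records)

    # Each remaining triple is one user on one day, so counting triples
    # per keyword equals summing the daily user counts over the period.
    usage = Counter()
    for _day, _user_id, keyword in unique_triples:
        usage[keyword] += 1
    return usage


# Hypothetical sample log covering two days.
sample_log = [
    (date(2016, 1, 1), "u1", "led light"),
    (date(2016, 1, 1), "u1", "led light"),  # repeat by the same user, same day
    (date(2016, 1, 1), "u2", "led light"),
    (date(2016, 1, 2), "u1", "led light"),
    (date(2016, 1, 2), "u3", "solar panel"),
]
usage = keyword_user_usage(sample_log)
```

In this sample, "led light" is counted for two users on the first day and one user on the second, so repeated searches by one user within a day do not inflate the hotspot measure.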

[0040] On the e-commerce platform, in order to carry out Internet marketing, each product contains product keyword informatio...


Abstract

The invention discloses a method for establishing a multidimensional comprehensive thesaurus. The method includes selecting data sources and collecting usage statistics, selecting keywords according to constraint conditions, setting up multidimensional maintenance fields for the keywords, obtaining synonyms of the original keywords and singular forms of plural English keywords to complete the thesaurus content, and formulating central keyword recognition rules to find the central keywords contained in the original keywords. The invention also discloses a search word segmentation method based on the multidimensional comprehensive thesaurus and a central keyword recognition method. Building the multidimensional comprehensive thesaurus, applying semantic recognition within it, and recognizing the central keywords of commodities lay a good foundation for matching. Word segmentation accuracy is improved by combining string-matching segmentation with segmentation based on statistics and the thesaurus, and by combining automatic and manual methods in maintaining and upgrading the thesaurus.
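As a rough illustration of string-matching segmentation against a thesaurus, the sketch below uses forward maximum matching over whitespace-separated tokens. This is a common technique chosen here as an assumption; the abstract does not state which matching algorithm the patent uses. The function name `segment`, the `max_words` limit, and the sample thesaurus entries are all hypothetical.

```python
def segment(query, thesaurus, max_words=4):
    """Split a query into segments, preferring the longest thesaurus phrase
    that starts at each position (forward maximum matching)."""
    tokens = query.lower().split()
    segments = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shrink it.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            # A single token is always accepted as a fallback,
            # so the scan is guaranteed to advance.
            if phrase in thesaurus or n == 1:
                segments.append(phrase)
                i += n
                break
    return segments


# Hypothetical thesaurus entries.
thesaurus = {"led light", "solar panel", "mobile phone case"}
segments = segment("cheap LED light for mobile phone case", thesaurus)
# segments == ["cheap", "led light", "for", "mobile phone case"]
```

Greedy longest-first matching is fast, which suits real-time search, but it cannot resolve ambiguity on its own; that limitation is one reason to combine it with statistics and a maintained thesaurus.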

Description

Technical field

[0001] The invention relates to word segmentation in search engine technology, in particular to a word segmentation method for e-commerce search and a technique for understanding commodity information.

Background

[0002] With the rapid development of e-commerce, more and more suppliers provide a large number of commodities for display on e-commerce platforms. Numerous buyers want to find products that meet their needs among so many commodities, and they cannot do so without the help of an e-commerce search engine. Only by searching for commodities through it can they find and select products and browse detailed product information.

[0003] In this case, buyers look for products through search and expect the results to be both comprehensive and accurate, which places higher demands on the accuracy and recall of the search. In search technol...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/2448, G06F16/3338, G06F16/90344
Inventor: 李仁勇
Owner: FOCUS TECH