Chinese word segmentation method and device, electronic equipment and storage medium

A Chinese word segmentation technology for text sequences, applied in fields such as electrical digital data processing, instruments, and calculation. It addresses the problem of high time complexity, reducing the amount of computation, shortening the time consumed, and improving work efficiency.

Pending Publication Date: 2021-02-19
SUZHOU UNIV

AI Technical Summary

Problems solved by technology

[0004] Therefore, the technical problem to be solved by the present invention is to overcome the excessive time complexity of word segmentation in the prior art. To this end, the invention provides a Chinese word segmentation method comprising the following steps: obtaining a text sequence to be processed, which includes a plurality of characters arranged in sequence; extracting the feature vector corresponding to each character in the text sequence to obtain a feature vector group; mapping each feature vector in the feature vector group to a two-dimensional vector, where the two-dimensional vector includes a first dimension value and a second dimension value, the first dimension value affecting the judgment parameter that the corresponding character interval is not a word boundary and the second dimension value affecting the judgment parameter that the corresponding character interval is a word boundary; and determining, from the first dimension value and the second dimension value, whether the corresponding character interval is a word boundary.
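The final determination step can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the text: that a "character interval" is the gap between two adjacent characters, that one two-dimensional vector is available per gap, and that a gap counts as a word boundary when the second dimension value exceeds the first. The function names and the score values are hypothetical.

```python
from typing import List, Tuple


def find_boundaries(scores: List[Tuple[float, float]]) -> List[bool]:
    """Decide each gap from its (not_boundary, boundary) pair.

    Assumption: a simple comparison of the two values is the decision rule.
    """
    return [boundary > not_boundary for not_boundary, boundary in scores]


def cut(text: str, is_boundary: List[bool]) -> List[str]:
    """Split text at every gap flagged as a word boundary."""
    if len(is_boundary) != max(len(text) - 1, 0):
        raise ValueError("expected one decision per gap between characters")
    words, start = [], 0
    for gap, flag in enumerate(is_boundary):
        if flag:
            # Gap `gap` lies between text[gap] and text[gap + 1].
            words.append(text[start:gap + 1])
            start = gap + 1
    if start < len(text):
        words.append(text[start:])
    return words


# Hand-written illustrative scores, not the output of any trained model.
text = "我爱自然语言"
scores = [(0.1, 0.9), (0.2, 0.8), (0.7, 0.3), (0.3, 0.6), (0.9, 0.1)]
print(cut(text, find_boundaries(scores)))
```

Because each gap is decided independently from two values, the cost of this step grows linearly with the number of characters.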

Examples

Embodiment 1

[0052] This embodiment provides a Chinese word segmentation method. Figure 1 is a flowchart illustrating the extraction, mapping and judgment performed on the text sequence to be processed according to some embodiments of the present invention. Although the processes described below include operations in a particular order, it should be clearly understood that these processes may also include more or fewer operations, which may be performed sequentially or in parallel (e.g., using parallel processors or a multi-threaded environment). As shown in Figure 1, the method includes:

[0053] S101. Acquire a text sequence to be processed, where the text sequence to be processed includes a plurality of sequentially arranged characters.

[0054] In the above step, word segmentation of a Chinese text sequence usually means identifying the word boundaries of the sentences in an article or several paragraphs, and extracting the continuous characters in the article or several ...

Embodiment 2

[0095] This embodiment provides a Chinese word segmentation device for performing word segmentation on the text sequence to be processed. As shown in Figure 2, the device includes:

[0096] The acquiring module 201 is configured to: acquire a text sequence to be processed, which includes a plurality of sequentially arranged characters. For details, please refer to the description of step S101 in Embodiment 1, which is not repeated here.

[0097] The extraction module 202 is configured to: extract the feature vector corresponding to each character in the text sequence to be processed to obtain a feature vector group. For details, please refer to the description of step S102 in Embodiment 1, which is not repeated here.

[0098] The mapping module 203 is configured to: map each feature vector in the feature vector group to a two-dimensional vector, wherein the two-dimensional vector includes a first dimension value and a...
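As a rough illustration of what a mapping from a feature vector to a two-dimensional vector could look like, the sketch below uses an affine projection with a 2-row weight matrix. This is an assumption made for illustration only: the excerpt does not say how the mapping module is implemented, and the names `map_to_2d` and `map_group`, the weights, and the feature values are all hypothetical.

```python
from typing import List, Sequence


def map_to_2d(feature: Sequence[float],
              weights: Sequence[Sequence[float]],
              bias: Sequence[float]) -> List[float]:
    """Project one feature vector to [first_value, second_value].

    Assumption: an affine map (weights @ feature + bias) with two output rows.
    """
    if len(weights) != 2 or len(bias) != 2:
        raise ValueError("the mapping must produce exactly two values")
    out = []
    for row, b in zip(weights, bias):
        if len(row) != len(feature):
            raise ValueError("weight row and feature vector differ in length")
        out.append(sum(w * x for w, x in zip(row, feature)) + b)
    return out


def map_group(features: Sequence[Sequence[float]],
              weights: Sequence[Sequence[float]],
              bias: Sequence[float]) -> List[List[float]]:
    """Apply the same projection to every vector in the feature vector group."""
    return [map_to_2d(f, weights, bias) for f in features]


# Placeholder parameters; a real system would learn these during training.
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
bias = [0.5, -0.5]
print(map_group([[2.0, 3.0, 4.0], [0.0, 1.0, 0.0]], weights, bias))
```

The same parameters are shared across all positions, so the mapping adds only a fixed amount of work per feature vector.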

Embodiment 3

[0102] This embodiment provides an electronic device. As shown in Figure 3, the device includes a processor 301 and a memory 302, which can be connected through a bus or in other ways; Figure 3 takes connection via a bus as an example.

[0103] The processor 301 may be a central processing unit (CPU). The processor 301 may also be another general-purpose processor, a digital signal processor (DSP), a graphics processing unit (GPU), an embedded neural-network processing unit (NPU) or other dedicated deep learning coprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component or other chip, or a combination of the above-mentioned ty...

Abstract

The invention provides a Chinese word segmentation method and device, electronic equipment and a storage medium. The method comprises the steps of: obtaining a to-be-processed text sequence comprising a plurality of characters arranged in sequence; extracting a feature vector corresponding to each character in the to-be-processed text sequence to obtain a feature vector group; mapping each feature vector in the feature vector group to a two-dimensional vector comprising a first dimension value and a second dimension value; and determining, from the first dimension value and the second dimension value, whether the corresponding character interval is a word boundary. The multi-class classification of Chinese characters and words is thereby simplified to a binary classification of word boundaries, that is, each character interval either is or is not a word boundary. As a result, when the system segments the to-be-processed text sequence, the amount of computation is greatly reduced, the time consumed is correspondingly shortened, and the efficiency of the overall Chinese word segmentation process is improved.
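To make the binary formulation concrete, the following Python sketch converts an already segmented sentence into one 0/1 label per gap between adjacent characters (1 for a word boundary, 0 otherwise). The function name `boundary_labels`, the 0/1 encoding, and the example sentence are illustrative assumptions, not details taken from the patent.

```python
from typing import List


def boundary_labels(words: List[str]) -> List[int]:
    """Return one label per gap between adjacent characters.

    1 marks a gap that separates two words; 0 marks a gap inside a word.
    """
    labels: List[int] = []
    for i, word in enumerate(words):
        if not word:
            raise ValueError("words must be non-empty strings")
        # A word of k characters contains k - 1 internal gaps.
        labels.extend([0] * (len(word) - 1))
        # Every word except the last is followed by a boundary gap.
        if i < len(words) - 1:
            labels.append(1)
    return labels


words = ["我", "爱", "自然", "语言"]
print(boundary_labels(words))
```

A sentence of n characters yields n - 1 labels, each drawn from only two classes, which is the simplification the abstract describes.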

Description

Technical field

[0001] The invention relates to the field of natural language processing, in particular to a Chinese word segmentation method, device, electronic equipment and storage medium.

Background

[0002] As one of the basic tasks in natural language processing, Chinese word segmentation is an essential preprocessing step for many natural language processing tasks. The quality of its results directly affects the final performance of subsequent tasks. Therefore, performing Chinese word segmentation accurately and efficiently can effectively help other Chinese natural language processing tasks.

[0003] In recent years, with the development of neural networks, sequence tagging models based on deep learning have made breakthrough progress in Chinese word segmentation performance, but many sequence tagging models have excessively high time complexity when performing Chinese word segmentation. For example, characters are usually classified into: s...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/284
CPC: G06F40/284
Inventors: 李寿山, 张栋, 周国栋
Owner SUZHOU UNIV