
A Case Elements Recognition Method Oriented to Electronic Dossier Transcript Text

A text and file technology applied in neural learning methods, electrical digital data processing, instruments, etc. It addresses several problems: manual data labeling is laborious, no case element identification method exists, and existing methods cannot be applied directly to case element identification.

Active Publication Date: 2022-03-01
DALIAN UNIV OF TECH
Cites: 6 · Cited by: 0

AI Technical Summary

Problems solved by technology

However, current named entity recognition mostly targets standardized texts such as scientific and technical literature and news reports. Electronic dossier transcripts, by contrast, feature colloquial expression, irregular grammar, and varied sentence forms. Existing named entity recognition methods therefore cannot be applied directly to case element identification.
Existing named entity recognition methods need large amounts of manually labeled data to train a model. Manual labeling is extremely laborious. The intelligent analysis and processing of electronic dossiers is also still in its infancy, so no manually labeled standard dataset exists.
At present, there is no named entity recognition method for electronic dossier transcript text. In other words, no case element recognition method exists.




Embodiment Construction

[0027] The present invention is further described below with reference to the accompanying drawings.

[0028] As shown in Figure 1, a case element identification method for electronic dossier transcript text comprises the following steps:

[0029] Step 1, electronic dossier data preprocessing: the electronic dossier data are in PDF format. Preprocessing converts them to plain text, screens out the transcript text, and then cleans that text. This step specifically includes the following sub-steps:

[0030] Sub-step (a), OCR of the electronic dossier data: optical character recognition (OCR) software recognizes the electronic dossier data and converts them from PDF format to plain text (TXT) format;

[0031] Sub-step (b), screening the transcript text: the electronic dossier contains transcript text as well as other texts. The transcript text is characterized in that its content is a number of question and answer pa...
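The excerpt breaks off before it says how transcript text is recognized. The sketch below is a minimal illustration built on an assumption that does not come from the patent. It assumes each question line starts with the marker 问 and each answer line with 答, followed by a colon, a layout common in Chinese interrogation transcripts. The names `QA_LINE`, `clean_line`, `split_segments`, `extract_qa_pairs`, and `is_transcript` are hypothetical, and so is the two-pair screening threshold.

```python
import re
from typing import List, Tuple

# Hypothetical markers: "问" (question) or "答" (answer), then an ASCII or
# full-width colon. The patent excerpt does not specify this format.
QA_LINE = re.compile(r"^\s*(问|答)\s*[:：]\s*(.*)$")


def clean_line(line: str) -> str:
    """Minimal cleaning: remove whitespace that OCR inserts into Chinese text."""
    return re.sub(r"\s+", "", line)


def split_segments(text: str) -> List[Tuple[str, str]]:
    """Split OCR output into (marker, body) segments. Unmarked lines are
    appended to the previous segment as OCR line-wrap continuations; lines
    before the first marker (e.g. a document title) are discarded."""
    segments: List[List[str]] = []
    for raw in text.splitlines():
        match = QA_LINE.match(raw)
        if match:
            segments.append([match.group(1), clean_line(match.group(2))])
        elif segments and raw.strip():
            segments[-1][1] += clean_line(raw)
    return [(marker, body) for marker, body in segments]


def extract_qa_pairs(text: str) -> List[Tuple[str, str]]:
    """Pair each question segment with the answer segment right after it."""
    segments = split_segments(text)
    return [
        (question, answer)
        for (m1, question), (m2, answer) in zip(segments, segments[1:])
        if m1 == "问" and m2 == "答"
    ]


def is_transcript(text: str, min_pairs: int = 2) -> bool:
    """Hypothetical screening rule: a document counts as a transcript if it
    contains at least `min_pairs` question-answer pairs."""
    return len(extract_qa_pairs(text)) >= min_pairs


if __name__ == "__main__":
    sample = "询问笔录\n问：你叫什么名字？\n答：我叫张三。\n问：你什么时候到的大连？\n答：2021年3月5日 晚上\n8点左右。"
    for question, answer in extract_qa_pairs(sample):
        print(question, "->", answer)
    print(is_transcript(sample))
```

The sketch appends each unmarked line to the preceding segment, on the assumption that OCR broke a long answer across lines. The cleaning only strips whitespace. That suits Chinese text but would merge words in English text.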


Abstract

The present invention relates to the technical field of natural language processing, and specifically to a case element recognition method for electronic dossier transcript text. The method comprises the following steps: (1) preprocessing the electronic dossier data; (2) performing word segmentation and part-of-speech tagging with a custom dictionary; (3) identifying four types of case elements: time, object, weight, and amount; and (4) identifying three types of case elements: person, place, and institution. The method fully considers the characteristics of the different case elements and applies a targeted processing method to each type. For special texts such as electronic dossier transcripts, it can accurately identify important case elements without manually labeled data. Its recognition results can also be used to iteratively generate high-quality labeled data, which in turn can train a more reliable case element recognition model.
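The abstract does not say how each step is implemented. The sketch below illustrates steps (2) to (4) with common stand-in techniques, all of which are assumptions:

- Segmentation uses forward maximum matching over a custom dictionary.
- Time, weight, and amount are found with regular expressions.
- Objects are identified by dictionary hits.
- Person, place, and institution come from ICTCLAS-style part-of-speech tags (`nr`, `ns`, `nt`, the tags jieba uses).

The dictionary contents, patterns, and function names are hypothetical. The IPC codes for this patent include neural-network classes, so the actual step (4) may rely on a trained model instead.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical custom dictionary of case-specific object names.
CUSTOM_DICT: Set[str] = {"冰毒", "海洛因", "电动车", "手机"}

# Hypothetical patterns for the rule-based element types (longest units first).
TIME_RE = re.compile(
    r"\d{4}年(?:\d{1,2}月)?(?:\d{1,2}日)?|\d{1,2}月\d{1,2}日|\d{1,2}[点时](?:\d{1,2}分)?"
)
WEIGHT_RE = re.compile(r"\d+(?:\.\d+)?(?:千克|公斤|克|斤)")
AMOUNT_RE = re.compile(r"\d+(?:\.\d+)?(?:万元|元|块钱|块)")

# ICTCLAS-style part-of-speech tags mapped to case element types.
TAG_TO_ELEMENT = {"nr": "person", "ns": "place", "nt": "institution"}


def fmm_segment(text: str, dictionary: Set[str]) -> List[str]:
    """Forward maximum matching: at each position take the longest
    dictionary word, falling back to a single character."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def rule_elements(text: str, dictionary: Set[str]) -> Dict[str, List[str]]:
    """Step (3) stand-in: time, object, weight, and amount."""
    return {
        "time": TIME_RE.findall(text),
        "object": [t for t in fmm_segment(text, dictionary) if t in dictionary],
        "weight": WEIGHT_RE.findall(text),
        "amount": AMOUNT_RE.findall(text),
    }


def tag_elements(tagged: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Step (4) stand-in: collect person, place, and institution names from
    (word, tag) pairs, dropping duplicates while keeping first-seen order."""
    result: Dict[str, List[str]] = {name: [] for name in TAG_TO_ELEMENT.values()}
    for word, tag in tagged:
        element = TAG_TO_ELEMENT.get(tag)
        if element is not None and word not in result[element]:
            result[element].append(word)
    return result


if __name__ == "__main__":
    sentence = "2021年3月5日晚上8点，我在大连花了300元买了5克冰毒。"
    print(rule_elements(sentence, CUSTOM_DICT))
    # Hand-written tagger output; in the real pipeline step (2) produces it.
    tagged = [("张三", "nr"), ("在", "p"), ("大连", "ns"), ("被", "p"),
              ("大连市公安局", "nt"), ("抓获", "v")]
    print(tag_elements(tagged))
```

The sketch splits the element types into a rule-based group and a tag-based group, following the abstract's own grouping into steps (3) and (4). Elements with a fixed surface form (dates, units, currency) suit patterns. Open-class names suit a tagger or learned model.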

Description

Technical Field

[0001] The invention relates to a case element recognition method for electronic dossier transcript text and belongs to the technical field of natural language processing.

Background

[0002] An electronic dossier records and stores, in electronic form, all files generated while a case is handled. Electronic dossiers offer strong confidentiality, convenient statistical analysis, and a high rate of information sharing, so they are widely used in China's judicial system. China's "smart judiciary" informatization continues to advance. Electronic dossier systems have matured as a result, and the volume of electronic dossier data has grown sharply. Procuratorate case handlers must review large numbers of electronic dossiers, yet the data are still analyzed and processed manually. These methods and means are extremely backward,...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06V30/148, G06F40/242, G06F40/284, G06N3/04, G06N3/08
Inventors: Sun Yuanyuan (孙媛媛), Liu Haishun (刘海顺), Li Chunnan (李春楠)
Owner: DALIAN UNIV OF TECH