Real-time text data flow specific information identification method and system

A technology for identifying specific information, with applications in text database clustering/classification, unstructured text data retrieval, neural learning methods, and related data analysis.

Active Publication Date: 2020-05-12
NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT +1

AI Technical Summary

Problems solved by technology

Existing methods focus on improving model performance metrics on closed data sets. This leaves two main problems: first, the expected recognition quality cannot be achieved under the data sparsity and diversity found in production environments; second, the models fail to meet the efficiency requirements of real-world applications.
In general, existing methods lack a systematic, multi-dimensional study of information recognition. Most research concentrates on algorithm improvement and academic output and cannot be applied to online, real-time analysis of massive text volumes, so the results have limited practical applicability.



Examples

Embodiment Construction

[0076] The information recognition framework and system for massive real-time text data streams rest on three key technical points: domain language model pre-training, a deep network recognition module, and a cascaded model processing framework. The key points and their technical effects are explained below.
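
One common reading of a cascaded model processing framework is that an inexpensive first stage filters the stream so that the costly deep model only scores the texts that survive. The sketch below illustrates that idea only; the text above does not specify the stages, so `cascade`, `keyword_score`, `stand_in_deep_score`, the keyword list, and both thresholds are hypothetical names and values chosen for illustration.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Scorer = Callable[[str], float]


def cascade(
    stream: Iterable[str],
    cheap: Scorer,
    deep: Scorer,
    cheap_threshold: float = 0.3,   # assumed value, not from the source
    deep_threshold: float = 0.5,    # assumed value, not from the source
) -> Iterator[Tuple[str, float]]:
    """Yield (text, deep_score) for texts accepted by both stages."""
    for text in stream:
        # Stage 1: cheap filter; most of the stream should stop here.
        if cheap(text) < cheap_threshold:
            continue
        # Stage 2: expensive scorer, run only on the survivors.
        score = deep(text)
        if score >= deep_threshold:
            yield text, score


# Hypothetical keyword list standing in for a lightweight first-stage model.
KEYWORDS = ("prize", "transfer", "account")


def keyword_score(text: str) -> float:
    """Fraction of the keywords that occur in the text."""
    lowered = text.lower()
    return sum(k in lowered for k in KEYWORDS) / len(KEYWORDS)


deep_calls: List[str] = []  # records which texts reached the second stage


def stand_in_deep_score(text: str) -> float:
    """Placeholder for a neural classifier; it is NOT a real model."""
    deep_calls.append(text)
    lowered = text.lower()
    return 1.0 if "transfer" in lowered and "account" in lowered else 0.0


messages = [
    "hello how are you",
    "You won a prize",
    "Please transfer money to this account",
]
flagged = list(cascade(messages, keyword_score, stand_in_deep_score))
```

With these toy inputs, the first message is dropped by the cheap stage, so the stand-in deep scorer runs on two of the three messages and only the last one is flagged. In a production setting the second stage would be the trained recognition model, and the thresholds would be tuned against labeled data.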

[0077] Key point 1: training the domain language model. Natural language processing tasks usually require the text to be represented first as a computable numerical vector, and a language model is one way of producing such a representation. First, a large amount of domain corpus data and a certain amount of category-labeled data are accumulated, and the text data is preprocessed, for example by removing special symbols. The domain corpus data is then used for unsupervised language model pre-training. On this basis, using the category-labeled data, a supervised language model fine-tuning process is perfo...
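
The preprocessing step mentioned above (removing special symbols) could look roughly like the following. This is a minimal sketch under assumptions: the source does not define which characters count as special symbols, so here every character that is neither a Unicode word character nor whitespace is treated as one, and `clean_text` is a hypothetical helper name.

```python
import re

# Assumption: a "special symbol" is anything that is neither a word
# character (letters, digits, underscore, CJK characters) nor whitespace.
_SPECIAL = re.compile(r"[^\w\s]")
_SPACES = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Replace special symbols with spaces, then collapse whitespace."""
    without_symbols = _SPECIAL.sub(" ", text)
    return _SPACES.sub(" ", without_symbols).strip()


sample = "Hello!!!  @world# 你好，世界"
cleaned = clean_text(sample)
```

Because `\w` is Unicode-aware for Python `str` patterns, Chinese characters are kept while ASCII and full-width punctuation are removed. A real pipeline would likely refine this rule, for instance to preserve URLs or emoji if they carry signal for the recognition task.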

Abstract

The invention provides a method and system for identifying specific information in real-time text data streams. It mainly addresses the problem of identifying the specific information of concern in a given field within massive real-time text data streams. The invention provides a specific information identification framework and system for such streams, focuses on identifying specific information in social text, which is massive, real-time, diverse and complex, and realizes an online real-time analysis system for social big data that is suitable for a production environment. The method and system aim to identify the specific information implied in text from massive text data streams in the Internet environment and the mobile phone short message (SMS) network environment.

Description

Technical field

[0001] The present invention relates to the fields of big data technology, natural language processing and deep learning, and in particular to a method and system for identifying specific information in massive real-time text data streams.

Background

[0002] With the rapid development and popularization of Web 2.0 technology and the mobile Internet, people's communication has moved beyond traditional offline speech and paper-based text, which has changed how information is disseminated in several respects. On the one hand, information can now be carried by software-based communication media such as SMS, Weibo, QQ, and WeChat. On the other hand, compared with traditional modes of dissemination, the new technology has greatly increased the speed, scope, and influence of information dissemination. For example, a single microblog can involve hundreds of thousands o...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/35, G06N3/08, G06N3/045, G06F18/2415
Inventor: 李扬曦, 任博雅, 井雅琪, 时磊, 段东圣, 余翠玲, 胡燕林, 佟玲玲, 宋永浩, 梁冬
Owner: NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT