71 results about "Text filtering" patented technology

Short text classification method based on topic word vectors and convolutional neural network

The invention discloses a short text classification method based on topic word vectors and a convolutional neural network, which comprises the following steps: 1) a data acquisition stage: acquiring short text data according to requirements and labeling it as a training set; 2) a data preprocessing stage: performing word segmentation, stop word removal, useless text filtering and the like on the text; 3) representing short text features at both the topic level and the word vector level; 4) jointly training topic word vectors; 5) iteratively optimizing the parameters of the convolutional neural network classification model; and 6) predicting the category of new samples. The invention takes the characteristics of short text data into account: in the feature representation stage, topic vectors and word vectors are combined to expand the semantic features of the short text, and in the classification model training stage the convolutional neural network's ability to extract locally sensitive information is used to further mine the semantic information of the text, which can improve indexes such as category prediction accuracy for short text classification tasks.
Owner:NANJING UNIV

Text filtering method based on emotional orientation analysis against malicious information

The invention relates to a text filtering method based on emotional orientation analysis against malicious information, which belongs to the technical field of computer applications and is applicable to content filtering firewalls, content filtering gateways and the like. The method builds on text content analysis, adds text emotional analysis, and judges whether a text is malicious information according to the theme and emotional orientation of the text, thereby improving the accuracy of filtering malicious information text.
Owner:BEIJING INSTITUTE OF TECHNOLOGY

Sensitive word filtering method and system

The invention relates to the field of multi-pattern string matching, and discloses a sensitive word filtering method. The method comprises: managing Chinese, English and website sensitive words as well as excluded words; normalizing characters; applying a group of filtering policies and implementations for sensitive words in different forms, at least comprising filtering steps for Chinese, English, websites, full spelling, pinyin compiling and anagrams; setting a group of criterion rules for sensitive words; and performing approximate matching for Chinese sensitive words. The invention also discloses a sensitive word filtering apparatus. The method and apparatus meet the needs of content administrators and searchers for filtering sensitive words in published or searched text; a large number of sensitive words can be filtered rapidly and accurately; and the sensitive words, their levels and their positions in the text can be returned to the caller.
Owner:北京中科汇联科技股份有限公司
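
The normalization and matching steps described in the abstract above can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the use of Unicode NFKC folding as the normalization, and the plain substring scan (rather than a multi-pattern automaton) are all assumptions chosen for brevity. Positions are reported in the normalized text.

```python
import unicodedata


def normalize(text: str) -> str:
    """Fold full-width forms to ASCII (NFKC), lowercase, and drop separators."""
    folded = unicodedata.normalize("NFKC", text).lower()
    return "".join(ch for ch in folded if ch.isalnum())


def _spans(haystack: str, needle: str):
    """Yield (start, end) for every occurrence of needle in haystack."""
    if not needle:
        return
    start = haystack.find(needle)
    while start != -1:
        yield start, start + len(needle)
        start = haystack.find(needle, start + 1)


def find_sensitive(text, words, excluded=()):
    """Return (word, level, position) hits, skipping hits inside excluded words.

    `words` maps each sensitive word to a hypothetical severity level.
    """
    norm = normalize(text)
    safe = [span for w in excluded for span in _spans(norm, normalize(w))]
    hits = []
    for word, level in words.items():
        for begin, end in _spans(norm, normalize(word)):
            if not any(s <= begin and end <= e for s, e in safe):
                hits.append((word, level, begin))
    return sorted(hits, key=lambda hit: hit[2])
```

Normalizing before matching is what lets one dictionary entry catch spaced-out or full-width variants; a production system would more likely use an Aho-Corasick style matcher for large dictionaries.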

Knowledge graph complementing method based on topic keyword filtering

The invention discloses a knowledge graph complementing method based on topic keyword filtering, and belongs to the field of knowledge graphs. Existing knowledge graph completion methods cannot complete a specific completion task in a targeted way because the text describing an entity is complex and redundant. To address this, the method integrates an attention mechanism, provides a topic keyword scoring function, and evaluates the description of the entity, so that the availability of the entity description text is improved and the large amount of noise in the description text is reduced. To further reflect the semantic relation between the entity description and the triple, the semantic pertinence of the entity description is improved through a theme semantic space model. Through this text filtering method, the specific completion task can be completed in a targeted manner.
Owner:HARBIN ENG UNIV

Automatic generating system for role Chinese mouth shape cartoon

The invention discloses an automatic generating system for a role Chinese mouth shape cartoon, which comprises a dialogue text filtering and coding module, a dialogue phonetic segmentation module, a dialogue segmentation code integrating module and a role Chinese mouth shape cartoon generating module, wherein the dialogue text filtering and coding module performs phrase segmentation, pinyin mouth shape coding, integral recognition mark setting and coding and filtering on a dialogue text to generate and output a dialogue mouth shape code, an integral dialogue recognition coding mark and a dialogue mouth shape filtering and coding sequence; the dialogue phonetic segmentation module performs phonetic sampling and phonetic energy statistics on dialogue audio to generate and output dialogue phonetic segmentation candidate result sequences; the dialogue segmentation code integrating module is connected with the dialogue text filtering and coding module and the dialogue phonetic segmentation module and used for integrating and correcting the dialogue phonetic segmentation candidate result sequences to generate and output a dialogue segmentation code sequence; and the role Chinese mouth shape cartoon generating module is connected with the dialogue segmentation code integrating module and used for generating and outputting the role Chinese mouth shape cartoon according to the dialogue segmentation code sequence. The system can automatically finish the manufacture of the whole role Chinese mouth shape cartoon without loading a corresponding phonetic library during processing.
Owner:INST OF AUTOMATION CHINESE ACAD OF SCI

System and method for filtering text information of webpage

The invention discloses a system and a method for filtering the text information of a webpage. The system comprises a webpage browsing terminal, a proxy server, a network host and a text filtering center module, wherein the webpage browsing terminal receives, analyzes and sends a target request through a browser; the proxy server receives the target request, sends the target request to the network host, acquires returned source code information and sends the source code information to the text filtering center module for filtering; meanwhile, the proxy server is used for receiving a filtering result which is returned by the text filtering center module; the network host is used for receiving the target request and returning the source code information; and the text filtering center module is used for analyzing, filtering and determining the source code information and returning the filtering result. The invention has the advantages that: by using an object-oriented programming idea, a text is filtered and developed; by combining various data structures, the system is fully optimized; modules are independent of one another; and the reusability and the expandability of the system are greatly improved.
Owner:SHANGHAI DIANJI UNIV

Text filtering system and method

The invention discloses a text filtering system and a text filtering method. The method comprises the following steps of: establishing a filtering model according to the filtering requirement of a user; training a group of filtering samples to form a body library which is close to the filtering requirement of the user; and extracting characteristic words of a text to be filtered, identifying entities in the characteristic words, extracting an entity relation to form an entity relation vector of the text to be filtered, calculating the similarity of the filtering model and the text to be filtered, and filtering the text which is higher than a similarity threshold value. The characteristics of filtered texts are expressed accurately through extraction of entity relations according to the established filtering model of the user, so that the filtering accuracy can be increased.
Owner:云中开源数据技术(上海)有限公司

Short text labeling method, system and device for large-scale classification system

The invention belongs to the field of text classification, particularly relates to a short text labeling method, system and device for a large-scale classification system, and aims to solve the problem that short text labeling for a large-scale classification system has low stability when data is limited. The method comprises: acquiring a first short text information set to be classified, and preprocessing it based on forward maximum matching word segmentation and word2vec word vector representation to obtain a second short text information set; performing binary classification on the second short text information set based on a rule-based classification method and a supervised neural network classification method, and then filtering the short texts; assigning first-level and second-level classification labels to each short text based on the same classification methods; and assigning third-level and fourth-level classification labels to each short text based on a semi-supervised label propagation method. The method ensures the stability of the short text labeling system for the large-scale classification system when data is limited.
Owner:INST OF AUTOMATION CHINESE ACAD OF SCI +1

Social-media short text filtering method based on structure and text information

Inactive | CN107562728A | To achieve the purpose of filtering junk data | Easy to handle | Special data processing applications | Feature extraction | Characteristic space
The invention discloses a social-media short text filtering method based on structure and text information. The method includes the following steps: 1, the structural characteristics of a short text are judged, and junk information is deleted; 2, the core of the text is extracted: a judging structure determines whether a retained text segment contains the core information of a described event; if no core information exists, the text is determined to be junk information, and if core information exists, the core components are extracted; 3, textual features are extracted, and the core components of the text obtained in step 2 are mapped to a characteristic space. By scanning the word-segment set of the text, structural characteristics indicating junk information can be judged, so mass data in the social network is processed easily and efficiently; by identifying characteristics of words, sentence patterns and the like, feature selection is achieved; and a sentence vector is constructed by adding word2vec word vectors and taking the average, which reduces the calculation amount of the classifier model during training while representing the semantic information of the text well.
Owner:UNIV OF ELECTRONIC SCI & TECH OF CHINA
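
The abstract above mentions building a sentence vector by adding word2vec word vectors and taking the average. A minimal sketch of that averaging step follows; the lookup table is a toy stand-in for trained word2vec vectors, and the function name and the choice to skip out-of-vocabulary tokens are assumptions.

```python
def sentence_vector(tokens, vectors):
    """Average the vectors of the tokens that appear in the lookup table.

    `vectors` maps a token to a list of floats (all of the same length).
    Returns None when no token is known, so callers can treat that case
    explicitly instead of receiving an all-zero vector.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
```

Averaging gives a fixed-length input regardless of sentence length, which is why it keeps the downstream classifier small.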

Detection method and device of promotion information

Active | CN106909669A | Effective and accurate filtering | Improve efficiency | Special data processing applications | Litter | Text filtering
The invention discloses a detection method and device of promotion information and relates to the technical field of text filtering processing. The method comprises the following steps: obtaining a pre-set sample set and extracting the information units of each sample in the sample set; counting the number of occurrences of each information unit in the sample set, and determining each information unit whose number of occurrences exceeds a pre-set first threshold value as a candidate feature unit; for each candidate feature unit, counting its distribution across document positions; determining whether the candidate feature unit is a promotion feature unit according to the statistical result; and detecting the promotion information in a document under detection according to the determined promotion feature units. As a result, the detection method and device can effectively and accurately filter advertisement information or garbage promotion information, so that a machine grasping method can extract pure news content and the efficiency of compiling news on owned media platforms is greatly improved.
Owner:北京时间有限公司
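
The counting and position-statistics steps in the abstract above might look like the following sketch. Several details are assumptions, not taken from the patent: an "information unit" is taken to be one line of a document, document positions are reduced to head (first unit) and tail (last unit), and a candidate is declared a promotion unit when a hypothetical share `min_edge_share` of its occurrences falls at one of those edges.

```python
from collections import Counter, defaultdict


def learn_promo_units(samples, min_count=2, min_edge_share=0.8):
    """Learn promotion feature units from a list of documents.

    Each document is a list of information units (here: lines).
    """
    counts = Counter()
    edge_counts = defaultdict(Counter)
    for units in samples:
        for index, unit in enumerate(units):
            counts[unit] += 1
            if index == 0:
                edge_counts[unit]["head"] += 1
            elif index == len(units) - 1:
                edge_counts[unit]["tail"] += 1

    promo = set()
    for unit, total in counts.items():
        if total <= min_count:  # not frequent enough to be a candidate
            continue
        edge = max(edge_counts[unit]["head"], edge_counts[unit]["tail"])
        if edge / total >= min_edge_share:  # concentrated at one edge
            promo.add(unit)
    return promo


def strip_promo(units, promo):
    """Remove learned promotion units from a document under detection."""
    return [unit for unit in units if unit not in promo]
```

The intuition behind the position test is that boilerplate promotion text tends to recur at the same place in many documents, while genuine content does not.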

Method of recommending personalized treatment scheme for stroke patient

Active | CN111524571A | Solve the problem of inconsistent input length | Reduce training time | Therapies | Medical automated diagnosis | Medical record | Nerve network
The invention discloses a method of recommending a personalized treatment scheme for a stroke patient. The method comprises the following steps: S1, preprocessing text information about physical examination and evaluation results in electronic medical records of patients; S2, expressing words, sentences and documents in the physical examination and evaluation results in the electronic medical records of the patients in a vector manner; S3, training a neural network model based on document vectors to obtain a personalized treatment scheme recommendation model; and S4, carrying out unified data expression, word segmentation and text filtering processing on the physical examination and evaluation results in an electronic medical record of a new patient, then carrying out document vector representation, and inputting the represented document vectors into the personalized treatment scheme recommendation model to obtain a recommended personalized treatment scheme. According to the method, evaluation and physical examination information in the electronic medical record of the patient is taken as a document, the process of personalized treatment scheme recommendation is converted into a multi-label classification problem, the personalized treatment scheme can be recommended according to the physical examination results and the evaluation results of the patient, an auxiliary decision is provided for a doctor, and the burden of the doctor is reduced.
Owner:UNIV OF ELECTRONICS SCI & TECH OF CHINA

Text filtering system and method

Inactive | CN102521402A | Filtration needs to overcome | Overcome the shortcomings of low filtration accuracy | Special data processing applications | Adaptive learning | Filtration
The invention discloses a text filtering system and a text filtering method. The system at least comprises an ontology base construction module, an adaptive learning module and a text filtering module, wherein the ontology base construction module is used for constructing an ontology base according to the filtering requirements of a user; the adaptive learning module dynamically regulates the ontology base constructed by the ontology base construction module by performing training and learning on a group of filtering samples to make the ontology base gradually meet the filtering requirements of the user; and the text filtering module performs preprocessing, characteristic word set extraction and similarity matching on a text to be filtered to obtain relevance between the text to be filtered and an ontology, and filters the text to be filtered according to the relevance. By the system and the method, a filtering model for the user can be accurately expressed; and in the filtration, the filtering model expressed by the ontology for the user can be regulated by automatic learning, and a filtering threshold value can be dynamically regulated to achieve a good filtering effect.
Owner:SHANGHAI DIANJI UNIV

Regular expression based URL filtering method

The invention discloses a regular expression based URL filtering method. The method comprises: step 1, obtaining a first URL required to be crawled, and crawling a page corresponding to the first URL; step 2, displaying text content of the page corresponding to the first URL and a plurality of second URLs, and prompting a user to input a URL filtering rule and a text filtering rule; step 3, in response to the URL filtering rule submitted by the user, filtering the plurality of second URLs by applying the URL filtering rule to obtain one or more third URLs; and step 4, adding the one or more third URLs into a crawling queue.
Owner:孙燕群
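
Steps 3 and 4 of the abstract above (apply the user's URL filtering rule to the second URLs, then add the surviving third URLs to the crawling queue) can be sketched as follows. The function names are hypothetical, and skipping URLs that were already queued is an added assumption that the abstract does not mention.

```python
import re
from collections import deque


def filter_urls(candidates, url_rule):
    """Keep the candidate URLs that match the user-supplied regular expression."""
    pattern = re.compile(url_rule)
    return [url for url in candidates if pattern.search(url)]


def enqueue(queue, seen, urls):
    """Append URLs to the crawl queue, skipping ones queued before (assumption)."""
    for url in urls:
        if url not in seen:
            seen.add(url)
            queue.append(url)
```

For example, a rule such as `r"/news/\d+$"` keeps only article pages among the links found on a listing page.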

Method and System for Text Filtering

The present disclosure discloses a method and system for text filtering. The method for text filtering comprises: pre-defining a semantic keyword in a text filtering system, the semantic keyword comprising at least one basic keyword and one logical operator; after obtaining an input text, finding, by the text filtering system, the basic keyword constituting the semantic keyword in the input text according to the pre-defined semantic keyword; in an event that a text content matching the at least one basic keyword in the input text is found, further conducting a semantic match in the found text content, the semantic match further comprising matching the found text content with the semantic keyword according to the logical operator constituting the semantic keyword; and in an event that the semantic match is successful, filtering the matched text context.
Owner:ALIBABA GRP HLDG LTD
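
The two-stage match described in the abstract above (first look for the basic keywords, then evaluate the logical operator only if something was found) can be sketched as below. The `SemanticKeyword` class, the restriction to `AND` and `OR`, and plain substring search are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticKeyword:
    """A semantic keyword: basic keywords joined by one logical operator."""
    keywords: tuple
    operator: str  # "AND" or "OR" in this sketch


def matches(text, semantic_keyword):
    """Return True when the text should be filtered for this semantic keyword."""
    # Stage 1: cheap scan for any basic keyword.
    found = [k for k in semantic_keyword.keywords if k in text]
    if not found:
        return False
    # Stage 2: semantic match according to the logical operator.
    if semantic_keyword.operator == "AND":
        return len(found) == len(semantic_keyword.keywords)
    if semantic_keyword.operator == "OR":
        return True
    raise ValueError(f"unsupported operator: {semantic_keyword.operator}")
```

Running the cheap scan first means most texts are rejected before any operator logic is evaluated.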

Text similarity calculation method and apparatus

The invention provides a text similarity calculation method. The method comprises the steps of performing text segmented word filtering processing on text segmented words obtained by performing word segmentation processing on a text sample in an original black sample library and a newly input text sample according to a text filtering ratio of multiple preserving gradients based on a same drop policy; performing reconstruction on the text sample in the original black sample library and the newly input text sample by using the residual text segmented words after the filtering; representing the similarity between the newly input text sample and the black sample by utilizing the filtering ratio of the text segmented words; and by matching the text segmented words in the reconstructed black sample library and newly input text sample, setting black sample similarity for the text segmented words obtained by performing word segmentation on the newly input text sample. According to the method, the calculation efficiency of calculating the similarity between the newly input text sample and the text sample in the black sample library can be remarkably improved.
Owner:ADVANCED NEW TECH CO LTD

Text filtering method and system based on keyword weight value

The invention provides a text filtering method based on a keyword weight value. The method comprises the following steps that the weight value of a keyword is calculated; a text is filtered based on the calculated weight value of the keyword; the process of calculating the weight value of the keyword comprises the steps of judging whether the keyword is a brand-new keyword or not, calculating the number of accurate judgment data and the number of wrong judgment data in historical judgment data and the number of accurate judgment data including the keyword and the number of wrong judgment data including the keyword if the keyword is the brand-new keyword, and calculating the weight value of the keyword. The invention further provides a text filtering system based on the keyword weight value.
Owner:CHINA MOBILE COMM GRP CO LTD

Polysemy keyword based text filtering method and device

The invention discloses a polysemy keyword based text filtering method and device. The method comprises: collecting a text set containing an appointed keyword; generating predetermined polysemy keyword vectors and text vectors based on the text set, wherein the predetermined polysemy keywords comprise the appointed keyword; calculating the similarity between the text vectors and the predetermined polysemy keyword vectors; and filtering out the texts whose text vectors have a similarity less than a predetermined threshold value. According to the method, a text list corresponding to the mainstream meaning is screened out based on the polysemy tag, and the texts required by the user are then screened; costs are low, efficiency is high, filtering quality is good, no manual intervention is needed, and the method is applicable to all polysemy keywords.
Owner:TENCENT TECH (SHENZHEN) CO LTD
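
The similarity-threshold step in the abstract above can be illustrated with a small sketch. The patent does not say how the vectors are built, so this example assumes simple bag-of-words term counts and cosine similarity; the function names and the threshold value are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keep_on_sense(texts, sense_terms, threshold):
    """Keep texts whose vector is close enough to the wanted sense vector."""
    sense = Counter(sense_terms)
    return [
        text for text in texts
        if cosine(Counter(text.lower().split()), sense) >= threshold
    ]
```

With the keyword "apple" and a sense vector built from device-related terms, a text about a phone release scores higher than a recipe that merely mentions the fruit.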

Method and device for preventing text filtering and monitoring

The invention discloses a method and a device for preventing text filtering and monitoring. The method comprises the steps as follows: setting an image conversion function in an input method tool; when receiving text information input by a user, storing the text information into a memory by the input method tool; and when receiving an output instruction from the user, converting the text information in the memory into an image by the image conversion function in the input method tool, and outputting the converted image. According to the technical scheme, the text information input by the user is converted into the image, so that the information is guaranteed to be not subjected to behavior monitoring such as text monitoring, text filtering and the like in the transmission process and the information safety is guaranteed; and meanwhile, the content of the text information is presented on the image, so that a receiver can directly read and view the text information without processes of key exchange, decoding and the like.
Owner:BEIJING FEINNO COMM TECH

Junk text library establishing method and system and junk text filtering method

Active | CN106708961A | Overcoming technical issues with collecting spam samples | Shorten the time | Special data processing applications | Text filtering | Filter methods
An embodiment of the invention discloses a junk text library establishing method and belongs to the technical field of establishment of computer text libraries, wherein the method comprises: S100, acquiring at least one pre-collected junk text sample from text; S200, detecting whether long characteristic words are present in each junk text sample, and if so, recording the long characteristic words into a long characteristic word set; S300, classifying the junk text samples corresponding to the long characteristic word set with a Bayes classifier to obtain junk text and non-junk text; S400, comparing the number of new junk texts with a preset convergence threshold, executing step S500 if the number of new junk texts is less than the convergence threshold, and executing step S600 otherwise; S500, finishing the establishment of the junk text library, and ending the process; S600, acquiring new junk sample files from the text, and returning to execute steps S200 to S500. According to the embodiment of the invention, the junk text library can be established with only a few collected text samples, which saves time and labor and improves precision.
Owner:北京粉笔蓝天科技有限公司

Method and apparatus for statistical text filtering

Disclosed herein is a method for automatically filtering a corpus of documents containing textual and non-textual information of a natural language. According to the method, through a first dividing step (101), the document corpus is divided into appropriate portions. At a following determining step (105), for each portion of the document corpus, there is determined a regularity value (VR) measuring the conformity of the portion with respect to character sequences probabilities predetermined for the language considered. At a comparing step (107), each regularity value (VR) is then compared with a threshold value (VT) to decide whether the conformity is sufficient. Finally, at a rejecting step (111), any portion of the document corpus whose conformity is not sufficient is rejected and removed from the corpus. An apparatus for carrying out such a method is also disclosed.
Owner:IBM CORP
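
The regularity value VR and threshold VT described in the abstract above can be sketched with a character bigram model. This is an illustration under stated assumptions, not the patented method: the model is trained on a small reference string, probabilities use add-one smoothing, and VR is taken to be the mean log-probability per character pair. All function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram_model(reference):
    """Count character pairs in a reference text of the target language."""
    pair_counts = Counter(zip(reference, reference[1:]))
    first_counts = Counter(reference[:-1])
    alphabet_size = len(set(reference)) + 1  # +1 bucket for unseen characters
    return pair_counts, first_counts, alphabet_size


def regularity(portion, model):
    """VR: mean log-probability of the portion's character pairs."""
    pair_counts, first_counts, alphabet_size = model
    pairs = list(zip(portion, portion[1:]))
    if not pairs:
        return float("-inf")
    log_prob = sum(
        math.log((pair_counts[p] + 1) / (first_counts[p[0]] + alphabet_size))
        for p in pairs
    )
    return log_prob / len(pairs)


def filter_portions(portions, model, vt):
    """Reject every portion whose regularity VR falls below the threshold VT."""
    return [p for p in portions if regularity(p, model) >= vt]
```

Dividing by the number of pairs keeps VR comparable across portions of different lengths, so a single threshold VT can be applied to all of them.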

Text Filtering Based on Phonetic Pronunciations

An approach is provided in which an information handling system converts a first set of text to synthesized speech using a text-to-speech converter. The information handling system then converts the synthesized speech to a second set of text using a speech-to-text converter. In response to converting the synthesized speech to the second set of text, the information handling system analyzes the second set of text against a filtering criterion and prevents usage of the synthesized speech based on the analysis.
Owner:DOORDASH INC

Dialogue intention recognition method based on entity replacement

The invention discloses a dialogue intention recognition method based on entity replacement. The dialogue intention recognition method comprises the following steps of 1, text word segmentation; 2, text filtering; 3, recognition of the text named entities; 4, replacing of the text named entities; 5, text feature extraction; and 6, text intention recognition. The dialogue intention recognition method utilizes the named entity recognition result to replace the entity name in the text information with the entity type, and can reduce the magnitude and imbalance degree of corpus data of the dialogue system, so as to comprehensively improve the accuracy of dialogue process intention recognition.
Owner:NANTONG UNIVERSITY +1
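
Step 4 of the abstract above (replacing each recognized entity name with its entity type) can be sketched as follows. A real system would use a named entity recognizer for step 3; this example substitutes a small hypothetical gazetteer lookup, which is an assumption made only to keep the sketch self-contained.

```python
import re

# Hypothetical gazetteer standing in for a named entity recognition model.
GAZETTEER = {
    "Beijing": "CITY",
    "Shanghai": "CITY",
    "Air China": "AIRLINE",
}


def replace_entities(text, gazetteer=GAZETTEER):
    """Replace each known entity name with a placeholder for its type."""
    if not gazetteer:
        return text
    # Longest names first so multi-word entities win over their substrings.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return pattern.sub(lambda m: f"<{gazetteer[m.group(0)]}>", text)
```

After replacement, utterances that differ only in entity names collapse to the same pattern, which is how the approach reduces the size and imbalance of the corpus.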

Text filtering method and device and computer storage medium

The invention discloses a text filtering method and device and a computer storage medium. The method comprises the following steps: obtaining text fluency based on a language model; obtaining an effective word rate based on an effective word dictionary constructed in a user-defined manner; when the text fluency meets a first preset threshold value and the effective word rate meets a second preset threshold value, executing a filtering operation on the text. According to the method, the problems of time and labor consumption, low efficiency, high cost and low quality of manual text screening and filtering are solved, and the semantic-level and character-level text screening quality in the corpus is improved, so the training model and service quality are improved, and the calculation overhead is reduced.
Owner:PENG CHENG LAB

Novel network media platform variant comment adversarial text generation method

The invention provides a novel network media platform variant comment adversarial text generation method, which comprises the following steps of: on the basis of summarizing variant text variant rules commonly used by a novel network media platform, firstly, extracting feature words from classified annotated texts; carrying out variant vocabulary generation based on various rules on the feature words, and carrying out variant text generation based on variant rules on the basis of the variant vocabulary generation; then training the annotated text through a word2vec word vector method to obtain word vectors of all vocabularies, obtaining a similar word list of all vocabularies according to the word vectors, and achieving variant text generation based on neural network word vectors; and finally, achieving a variant text generation method combining variant rules and word vector similar words through a probability randomization method. According to the method, massive variant texts in different forms can be generated, conventional text filtering can be resisted, and high fidelity is achieved.
Owner:NORTHWESTERN POLYTECHNICAL UNIV

Rich text filtering method, rich text filtering device and computer readable storage medium

Pending | CN112883688A | Realize the attack | Solving requires a lot of strategy | Text processing | Platform integrity maintenance | Theoretical computer science | Engineering
The invention provides a rich text filtering method, a rich text filtering device and a computer readable storage medium. The method comprises the following steps: acquiring a character string of a rich text; parsing the character string into an object comprising label (tag) data and content; traversing the object nodes, filtering the label data when the current node is label data and escaping the content when the current node is content; and, after all object nodes are traversed, recombining the filtered label data and the escaped content into a character string representing the rich text.
Owner:CHINA TELECOM CORP LTD
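
The parse, filter, escape and recombine flow in the abstract above can be sketched with Python's standard `html.parser` module. The whitelist `ALLOWED_TAGS`, the decision to drop all attributes, and the class and function names are assumptions for illustration; the patent does not specify its filtering policy.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"b", "i", "p"}  # hypothetical whitelist


class RichTextFilter(HTMLParser):
    """Walk the rich text, keeping whitelisted tags and escaping content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"<{tag}>")  # attributes are dropped on purpose

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(escape(data))  # content nodes are escaped


def filter_rich_text(source):
    """Parse, filter tag nodes, escape content nodes, and recombine."""
    parser = RichTextFilter()
    parser.feed(source)
    parser.close()
    return "".join(parser.parts)
```

Note that in this sketch a disallowed tag such as `script` is removed but its inner text survives as escaped, inert content; a stricter policy could discard that text as well.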

Classification and identification method and device for junk short messages, computer equipment and storage medium

The invention discloses a junk short message classification and identification method and device, computer equipment and a storage medium. The method comprises the steps of performing text filtering on a short message text set to obtain a junk short message text set; sequentially inputting the junk short message text set into a first-level classification model and a second-level classification model to obtain a plurality of categories of junk short message text sets; and inputting each category of junk short message text set into the entity information extraction model to obtain each category of junk short message text set after the entity information is identified or restored. By using the technical scheme of the invention, accurate classification and identification of massive short messages can be realized, and entity information in junk short messages can be accurately extracted.
Owner:EVERSEC BEIJING TECH

An automated tool based on a code level support sw-64 architecture

The invention relates to an automation tool based on code-level support for the sw-64 architecture. The tool executes the following steps: S1, batch-processing the source code packages once and filtering out the source code packages related to code-level support for the sw-64 architecture; S2, installing the srpm source code packages in the filtered source code package list; S3, running an aname command on the srpm source code packages in the filtered source code package list to carry out text filtering and replacement; and S4, generating a patch file from the files before and after replacement. During software package porting, only the architecture-related source code needs attention, and the automation tool needs only slight modification when source code packages are ported to a brand-new architecture platform. The automation tool is highly readable, easy to maintain and highly portable; it greatly simplifies the work, saves manpower and improves working efficiency.
Owner:CHINA STANDARD SOFTWARE

Text classification technology-based information processing method

The invention discloses a text classification technology-based information processing method. A text is preprocessed by adopting an HTML text mark weighting scheme; before an HTML document is subjected to scan processing, an HTML mark needs to be correctly identified and processed first and texts of different parts of a webpage are subjected to weighting processing according to the HTML mark; descriptive information such as titles, page descriptions, keywords, hyperlinks and the like can be reserved, so that the classification effect is improved; a symbol dictionary is established for filtering non Chinese characters, so that the dimension of an initial text vector is reduced and characteristic information content in the text vector is increased; and stop words are removed, so that the subsequent text filtering accuracy can be improved and the subsequent text filtering rate can be increased. The information processing method is simple in operation and high in practicality, and is capable of improving subsequent information filtering accuracy and efficiency.
Owner:HEFEI MINZHONGYIXING SOFTWARE DEV CO LTD

Junk text filtering method and device, electronic device and storage medium

The embodiment of the invention provides a junk text filtering method and device, an electronic device and a storage medium. Characteristic information of a word segmentation result is extracted by using a pre-trained deep learning network model, different weights are given to different components in the junk text through the attention mechanism model, feature information is combined through the attention weights, local key information of the text is captured, the to-be-filtered text is classified, the junk text is filtered, and the junk text filtering accuracy is improved.
Owner:BEIJING QIYI CENTURY SCI & TECH CO LTD

Text filtering and extracting method and system based on full-information natural language

The invention discloses a text filtering and extracting method and system based on a full-information natural language, and the method comprises the steps: preprocessing a to-be-filtered text, and obtaining to-be-filtered text information; filtering the to-be-filtered text information according to the frame features, and dividing the to-be-filtered text information into frame information and filtered text information; inputting the filtered text information into a processing model based on a full-information natural language knowledge base, outputting knowledge point information, and feeding back the knowledge point information to the full-information natural language knowledge base; and outputting the target format file according to the framework information and the knowledge point information. Through filtering processing and refining processing based on a full-information natural language technology, the accuracy of understanding text main body ideas is improved, and specific and definite technical expressions in the text are automatically extracted.
Owner:TIANWEN DIGITAL MEDIA TECH BEIJING