37 results about "Text normalization" patented technology

Text normalization is the process of transforming text into a single canonical form that it might not have had before. Normalizing text before storing or processing it allows for separation of concerns, since input is guaranteed to be consistent before operations are performed on it. Text normalization requires being aware of what type of text is to be normalized and how it is to be processed afterwards; there is no all-purpose normalization procedure.
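As a minimal sketch of one such procedure, the Python snippet below canonicalizes text for case-insensitive matching. The function name `normalize_text` and the particular choice of steps (Unicode NFKC, case folding, whitespace collapsing) are assumptions made for illustration; as noted above, a different downstream use may call for different steps.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Return a canonical form of `text` suited to case-insensitive matching.

    Illustrative only: the steps chosen here are an assumption, not a
    general-purpose recipe.
    """
    # NFKC folds compatibility characters (full-width letters, ligatures,
    # no-break spaces) into their ordinary equivalents.
    text = unicodedata.normalize("NFKC", text)
    # casefold() is a stronger form of lower() meant for caseless comparison.
    text = text.casefold()
    # Collapse every run of whitespace to one space and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


# "\uff34\uff45\uff58\uff54" is "Text" written in full-width letters.
print(normalize_text("  \uff34\uff45\uff58\uff54\u00a0 Normalization\n"))
```

Because every input passes through the same function before storage or comparison, later code can rely on a single representation instead of handling each variant itself.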

Systems and methods for text normalization for text to speech synthesis

Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
Owner:APPLE INC

System And Method For Automatically Generating Adaptive Interaction Logs From Customer Interaction Text

Active | US20140153709A1 | Shorten the time | Facilitates correct generation | Special service for subscribers | Manual exchanges | Adaptive interaction | Contact center
A system and method for providing an adaptive Interaction Logging functionality to help agents reduce the time spent documenting contact center interactions. In a preferred embodiment the system uses a pipeline comprising audio capture of a telephone conversation, automatic speech transcription, text normalization, transcript generation and candidate call log generation based on Real-time and Global Models. The contact center agent edits the candidate call log to create the final call log. The models are updated based on analysis of user feedback in the form of the editing of the candidate call log done by the contact center agents or supervisors. The pipeline yields a candidate call log which the agents can edit in less time than it would take them to generate a call log manually.
Owner:MICROSOFT TECH LICENSING LLC

Method and system for the automatic recognition of deceptive language

A system for identifying deception within a text includes a processor for receiving and processing a text file. The processor includes a deception indicator tag analyzer for inserting into the text file at least one deception indicator tag that identifies a potentially deceptive word or phrase within the text file, and an interpreter for interpreting the at least one deception indicator tag to determine a distribution of potentially deceptive words or phrases within the text file and generating deception likelihood data based upon the density or distribution of potentially deceptive words or phrases within the text file. A method for identifying deception within a text includes the steps of receiving a first text to be analyzed, normalizing the first text to produce a normalized text, inserting into the normalized text at least one part-of-speech tag that identifies a part of speech of a word associated with the part-of-speech tag, inserting into the normalized text at least one syntactic label that identifies a linguistic construction of one or more words associated with the syntactic label, inserting into the normalized text at least one deception indicator tag that identifies a potentially deceptive word or phrase within the normalized text, interpreting the at least one deception indicator tag to determine a distribution of potentially deceptive words or phrases within the normalized text, and generating deception likelihood data based upon the density or frequency of distribution of potentially deceptive words or phrases within the normalized text.
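The tagging and density steps in this abstract can be pictured with a short Python sketch. Everything specific in it is an assumption made for illustration: the indicator phrases, the `<DI>` tag syntax, and the tags-per-word density measure are hypothetical, and the part-of-speech and syntactic labeling steps described in the abstract are omitted.

```python
import re

# Hypothetical indicator phrases, for illustration only; the abstract does
# not list the actual indicators.
HYPOTHETICAL_INDICATORS = ("to be honest", "i swear", "as far as i know")


def tag_indicators(normalized_text: str) -> str:
    """Wrap each indicator phrase in a hypothetical <DI>...</DI> tag.

    Expects text that has already been normalized to lowercase.
    """
    alternation = "|".join(re.escape(p) for p in HYPOTHETICAL_INDICATORS)
    pattern = re.compile(rf"\b(?:{alternation})\b")
    return pattern.sub(lambda m: f"<DI>{m.group(0)}</DI>", normalized_text)


def indicator_density(tagged_text: str) -> float:
    """Return indicator tags per word, counting words with tags stripped."""
    tag_count = tagged_text.count("<DI>")
    words = re.sub(r"</?DI>", "", tagged_text).split()
    return tag_count / len(words) if words else 0.0


tagged = tag_indicators("to be honest i was home all night i swear")
print(tagged)
print(indicator_density(tagged))
```

Normalizing first matters here: the phrase matcher only has to handle one casing and spacing convention, so the density figure is not skewed by surface variation in the input.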
Owner:DECEPTION DISCOVERY TECH

System and Method for the Normalization of Text

A computer-implemented method of normalizing abbreviated text to substantially unabbreviated text, performed on at least one computer system comprising at least one processor, includes generating, based at least partially on data in at least one data resource comprising abbreviated text associated with unabbreviated text, a plurality of transformation functions in at least one order; transforming at least one string with at least one of the transformation functions, wherein the at least one string at least partially comprises abbreviated text; and determining if at least a portion of the at least one string has been at least partially transformed to substantially unabbreviated text. A system and a computer program product for implementing the aforementioned method includes appropriately communicatively connected hardware components.
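A rough Python sketch of this idea follows: transformation functions are generated from a resource that pairs abbreviated text with unabbreviated text, applied in a fixed order, and the result is checked for change. The dictionary contents, the function names, and the longest-first ordering are assumptions for illustration, not details taken from the patent.

```python
import re
from typing import Callable, Dict, List, Tuple

Transform = Callable[[str], str]

# Hypothetical data resource pairing abbreviated with unabbreviated text.
HYPOTHETICAL_ABBREVIATIONS: Dict[str, str] = {
    "pls": "please",
    "thx": "thanks",
    "asap": "as soon as possible",
}


def build_transforms(abbreviations: Dict[str, str]) -> List[Transform]:
    """Build one whole-word replacement function per abbreviation.

    Longer abbreviations are ordered first (an assumed ordering rule).
    """
    transforms: List[Transform] = []
    for abbr in sorted(abbreviations, key=len, reverse=True):
        pattern = re.compile(rf"\b{re.escape(abbr)}\b", re.IGNORECASE)
        expansion = abbreviations[abbr]

        def transform(text: str, p=pattern, e=expansion) -> str:
            # A callable replacement keeps `e` literal (no backslash escapes).
            return p.sub(lambda _match: e, text)

        transforms.append(transform)
    return transforms


def normalize(text: str, transforms: List[Transform]) -> Tuple[str, bool]:
    """Apply the transforms in order; also report whether anything changed."""
    result = text
    for transform in transforms:
        result = transform(result)
    return result, result != text


transforms = build_transforms(HYPOTHETICAL_ABBREVIATIONS)
print(normalize("pls reply asap, thx", transforms))
```

The boolean returned alongside the string stands in for the abstract's final step of determining whether the string was at least partially transformed.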
Owner:AVAYA INC

Apparatus and method for detecting characteristics of electronic mail message

The present invention enables accurate detection of risks from an electronic mail message. In a mail inspection unit, an information extraction section extracts text and a mail address from electronic mail accumulated in a journal DB, and a text normalization section normalizes the text. A sort-information saving section generates text sort information according to the score obtained from a sorting engine, and stores it in a mail-management-information storage section. A personal-information saving section extracts personal information from a personal-information storage section according to the mail address, and stores it in the mail-management-information storage section. Finally, a risk-level determination section compares the information stored in the mail-management-information storage section with the information stored in a category-information storage section to determine the risk level of the electronic mail.
Owner:IBM CORP

Microblog text normalizing, word segmenting and part-speech tagging method and system

The invention relates to a microblog text normalizing, word segmenting and part-speech tagging method. The microblog text normalizing, word segmenting and part-speech tagging method comprises the steps that firstly, a tagged corpus is established, and the tagged corpora in the tagged corpus are divided into a training set, a development set and a testing set; secondly, a microblog dictionary is established through SVM model training and learning; thirdly, through the training set, the development set and the microblog dictionary, a text normalizing, word segmenting and part-speech tagging combined model is formed through training and learning with a BeamSearch method; fourthly, through the combined model, text normalizing, word segmenting and part-speech tagging are conducted on a microblog text to be processed at the same time, and the performance of the combined model is tested. According to the method, a large number of microblogs with tagged sentences are used as the training corpus, a candidate result is expanded through the microblog dictionary, and the established combined model can act on three tasks at the same time; the three tasks influence each other, so that the performance of each task is improved, and therefore the overall performance is improved.
Owner:北京牡丹电子集团有限责任公司数字科技中心

Microblog text normalization method based on context graph random walk and phonetic configuration codes

Inactive | CN110032738A | Get phonetic similarity | Conform to the expression characteristics | Data processing applications | Natural language data processing | Chinese characters | Microblogging
The invention provides a microblog text normalization method based on context graph random walk and phonetic configuration codes, and belongs to the technical field of computer technology social media text content analysis and mining. The method comprises the following steps: identifying non-standard words, and extracting word contexts; constructing a context graph for random walk to obtain a standardized candidate set based on context; obtaining a standardized candidate set based on phonetic configuration by using the phonetic configuration codes of the Chinese characters; and processing the two standardized candidate sets to obtain a final standardized result. The method overcomes the defect that Chinese character pronunciation is not fully considered in a traditional method. In essence, social media differs from written language such as news and is full of non-standard abbreviations, homophones and homomorphic words, so that the effect of processing microblog text with a natural language processing tool is not ideal. Therefore, the invention provides a microblog text normalization method which combines phonetic configuration codes with an understanding of the preceding and following context, thereby making it possible to analyze and mine the text with a natural language processing tool after normalization.
Owner:中森云链(成都)科技有限责任公司

Single-character text normalization model training method and device, and single-character text recognition method and device

The invention relates to a single-character text normalization model training method and device, and a single-character text recognition method and device. The model training method comprises the following steps: acquiring a plurality of single-character sample pictures; normalizing the single-character sample pictures to obtain standard character pictures corresponding to the single-character sample pictures; generating a training data set according to the plurality of single-character sample pictures and standard character pictures in one-to-one correspondence with the plurality of single-character sample pictures; and training a deep learning neural network by using the training data set and a mean square loss function to obtain a single-character text normalization model. The training data set used in training is composed of original data and the standard character pictures which are obtained through normalization processing and have a unified style, so that in the process of training the model, the training and convergence of the model can be accelerated, the model can better learn the essential characteristics of various input texts, and the recognition precision of the model is further improved.
Owner:上海眼控科技股份有限公司