2732 results about "Lexicon" patented technology

A lexicon, word-hoard, wordbook, or word-stock is the vocabulary of a person, language, or branch of knowledge (such as nautical or medical). In linguistics, a lexicon is a language's inventory of lexemes. The word "lexicon" derives from the Greek λεξικόν (lexicon), neuter of λεξικός (lexikos) meaning "of or for words."

Speech-enabled language translation system and method enabling interactive user supervision of translation and speech recognition accuracy

A system and method for a highly interactive style of speech-to-speech translation is provided. The interactive procedures enable a user to recognize, and if necessary correct, errors in both speech recognition and translation, thus providing more robust translation output than would otherwise be possible. The interactive techniques for monitoring and correcting word ambiguity errors during automatic translation, search, or other natural language processing tasks depend upon the correlation of Meaning Cues and their alignment with, or mapping into, the word senses of third party lexical resources, such as those of a machine translation or search lexicon. This correlation and mapping can be carried out through the creation and use of a database of Meaning Cues, i.e., SELECT. Embodiments described above permit the intelligent building and application of this database, which can be viewed as an interlingua, or language-neutral set of meaning symbols, applicable for many purposes. Innovative techniques for interactive correction of server-based speech recognition are also described.
Owner:ZAMA INNOVATIONS LLC

System and methods for maintaining speech-to-speech translation in the field

A method and apparatus are provided for updating the vocabulary of a speech translation system for translating a first language into a second language including written and spoken words. The method includes adding a new word in the first language to a first recognition lexicon of the first language and associating a description with the new word, wherein the description contains pronunciation and word class information. The new word and description are then updated in a first machine translation module associated with the first language. The first machine translation module contains a first tagging module, a first translation model and a first language module, and is configured to translate the new word to a corresponding translated word in the second language. Optionally, the invention may be used for bidirectional or multi-directional translation.
Owner:META PLATFORMS INC

Context vector generation and retrieval

A system and method for generating context vectors for use in storage and retrieval of documents and other information items. Context vectors represent conceptual relationships among information items by quantitative means. A neural network operates on a training corpus of records to develop relationship-based context vectors based on word proximity and co-importance using a technique of “windowed co-occurrence”. Relationships among context vectors are deterministic, so that a context vector set has one logical solution, although it may have a plurality of physical solutions. No human knowledge, thesaurus, synonym list, knowledge base, or conceptual hierarchy, is required. Summary vectors of records may be clustered to reduce searching time, by forming a tree of clustered nodes. Once the context vectors are determined, records may be retrieved using a query interface that allows a user to specify content terms, Boolean terms, and / or document feedback. The present invention further facilitates visualization of textual information by translating context vectors into visual and graphical representations. Thus, a user can explore visual representations of meaning, and can apply human visual pattern recognition skills to document searches.
Owner:FAIR ISAAC & CO INC

Natural language processing interface

The present invention provides an interface and associated object model that exposes a comprehensive set of natural language processing features to an application developer. In one embodiment, the features include lexicon management services and proofing services.
Owner:MICROSOFT TECH LICENSING LLC

Systems and methods for building a native language phoneme lexicon having native pronunciations of non-native words derived from non-native pronunciations

Systems and methods are provided for automatically building a native phonetic lexicon for a speech-based application trained to process a native (base) language, wherein the native phonetic lexicon includes native phonetic transcriptions (base forms) for non-native (foreign) words which are automatically derived from non-native phonetic transcriptions of the non-native words.
Owner:NUANCE COMM INC

Pronunciation correction of text-to-speech systems between different spoken languages

Pronunciation correction for text-to-speech (TTS) systems and speech recognition (SR) systems between different languages is provided. If a word requiring pronunciation by a target language TTS or SR is from a same language as the target language, but is not found in a lexicon of words from the target language, a letter-to-speech (LTS) rules set of the target language is used to generate a letter-to-speech output for the word for use by the TTS or SR configured according to the target language. If the word is from a different language as the target language, phonemes comprising the word according to its native language are mapped to phonemes of the target language. The phoneme mapping is used by the TTS or SR configured according to the target language for generating or recognizing an audible form of the word according to the target language.
Owner:MICROSOFT TECH LICENSING LLC

Method and Apparatus for Automatic Detection of Spelling Errors in One or More Documents

Methods and apparatus are provided for automatically detecting spelling errors in one or more documents, such as documents being processed for the creation of a lexicon. According to one aspect of the invention, a spelling error is detected in one or more documents by determining if at least one given word in the one or more documents satisfies a predefined misspelling criteria, wherein the predefined misspelling criteria comprises the at least one given word having a frequency below a predefined low threshold and the at least one given word being within a predefined edit distance of one or more other words in the one or more documents having a frequency above a predefined high threshold; and identifying a given word as a potentially misspelled word if the given word satisfies the predefined misspelling criteria.
Owner:IBM CORP
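
The criterion in this abstract (a rare word that lies within a small edit distance of a frequent word) can be illustrated with a minimal sketch. The function names `edit_distance` and `find_suspects`, the threshold values, and the toy word list are assumptions made for illustration; the abstract does not specify them, and Levenshtein distance is only one possible choice of edit distance.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def find_suspects(words, low=2, high=3, max_dist=1):
    """Map each potentially misspelled word to its frequent near neighbours.

    A word is flagged when its count is below `low` (assumed threshold) and it
    is within `max_dist` edits of a word whose count is above `high`.
    """
    counts = Counter(words)
    frequent = [w for w, c in counts.items() if c > high]
    suspects = {}
    for word, count in counts.items():
        if count >= low:
            continue  # not rare enough to be suspicious
        near = [f for f in frequent if edit_distance(word, f) <= max_dist]
        if near:
            suspects[word] = near
    return suspects


# Toy corpus: "lexicom" is rare and one edit away from the frequent "lexicon".
corpus = ("lexicon " * 5 + "lexicom phoneme").split()
print(find_suspects(corpus))
```

Note that "phoneme" is equally rare in the toy corpus but is not flagged, because no frequent word lies within the edit-distance bound.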

Lexicon with sectionalized data and method of using the same

A data structure for a word lexicon includes a plurality of separate data sections for storing information related to word entries. An indices section includes pointers indicating the location of the information. The location of the associated pointers for each word entry is obtained as a function of a list of the word entries.
Owner:MICROSOFT TECH LICENSING LLC

Conceptual world representation natural language understanding system and method

A Natural Language Understanding system is provided for indexing of free text documents. The system according to the invention utilizes typographical and functional segmentation of text to identify those portions of free text that carry meaning. The system then uses words and multi-word terms and phrases identified in the free text to identify concepts in the free text. The system uses a lexicon of terms linked to a formal ontology that is independent of a specific language to extract concepts from the free text based on the words and multi-word terms in the free text. The formal ontology contains both language independent domain knowledge concepts and language dependent linguistic concepts that govern the relationships between concepts and contain the rules about how language works. The system according to the current invention may preferably be used to index medical documents and assign codes from independent coding systems, such as SNOMED, ICD-9 and ICD-10. The system according to the current invention may also preferably make use of syntactic parsing to improve the efficiency of the method.
Owner:NUANCE COMM INC

Method and system for approximate string matching

A method and system are provided for approximate string matching of a target string to a trie data structure. The trie data structure has a root node and generations of child nodes each node representing at least one character in an alphabet to provide a lexicon of words and word fragments. The method involves traversing the trie data structure starting from the root node by comparing each node of a branch of the trie data structure to characters in the target string and adding characters traversed in a branch of the trie data structure to a gathered string to provide suggestions of approximate matches. If the method reaches a node flagged as a node for a word or a word fragment and, if the target string is longer than the gathered string, the method loops back to the root node, and continues the traverse from the root node. This enables the trie data structure to use word fragments for compound words and to split non-delimited words where appropriate. The method also includes, at each node, determining if there is a correction rule for one or more characters in the remainder of the target string from the current node, and if so, applying the correction rule to the target string to obtain a modified target string.
Owner:IBM CORP
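
The loop-back step described above (return to the root when a flagged node is reached and the target string is not yet consumed) can be sketched as follows. This is a simplified illustration only: it performs exact matching, omits the correction rules and suggestion gathering mentioned in the abstract, and the names `build_trie`, `split_compound`, and the nested-dictionary layout are assumptions rather than the patented design.

```python
_END = object()  # sentinel key: a word or word fragment ends at this node


def build_trie(entries):
    """Build a nested-dict trie with one character per node."""
    root = {}
    for entry in entries:
        node = root
        for ch in entry:
            node = node.setdefault(ch, {})
        node[_END] = True
    return root


def split_compound(trie, target):
    """Return one segmentation of `target` into lexicon entries, or None."""

    def walk(start):
        if start == len(target):
            return []  # the whole target string has been consumed
        node = trie
        for end in range(start, len(target)):
            node = node.get(target[end])
            if node is None:
                return None  # this branch has no matching child
            if _END in node:
                # Flagged node reached: loop back to the root for the rest.
                rest = walk(end + 1)
                if rest is not None:
                    return [target[start:end + 1]] + rest
        return None

    return walk(0)


lexicon = build_trie(["boo", "book", "ks", "shelf"])
print(split_compound(lexicon, "bookshelf"))
```

In the toy lexicon the shorter fragment "boo" is tried first, fails to cover the remainder, and the traversal continues along the same branch to "book", which is why backtracking is included in the sketch.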

System and method for recognizing word patterns in a very large vocabulary based on a virtual keyboard layout

A word pattern recognition system based on a virtual keyboard layout combines handwriting recognition with a virtual, graphical, or on-screen keyboard to provide a text input method with relative ease of use. The system allows the user to input text quickly with little or no visual attention from the user. The system supports a very large vocabulary of gesture templates in a lexicon, including practically all words needed for a particular user. In addition, the system utilizes various techniques and methods to achieve reliable recognition of a very large gesture vocabulary. Further, the system provides feedback and display methods to help the user effectively use and learn shorthand gestures for words. Word patterns are recognized independent of gesture scale and location. The present system uses language rules to recognize and connect suffixes with a preceding word, allowing users to break complex words into easily remembered segments.
Owner:CERENCE OPERATING CO

Language input architecture for converting one text form to another text form with tolerance to spelling, typographical, and conversion errors

A language input architecture converts input strings of phonetic text to an output string of language text. The language input architecture has a search engine, one or more typing models, a language model, and one or more lexicons for different languages. The typing model is configured to generate a list of probable typing candidates that may be substituted for the input string based on probabilities of how likely each of the candidate strings was incorrectly entered as the input string. The language model provides probable conversion strings for each of the typing candidates based on probabilities of how likely a probable conversion output string represents the candidate string. The search engine combines the probabilities of the typing and language models to find the most probable conversion string that represents a converted form of the input string.
Owner:MICROSOFT TECH LICENSING LLC
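
How a search step might combine the two models can be shown with a small sketch. The function `best_conversion`, the dictionary-based model tables, and every probability value below are made up for illustration; the abstract does not describe concrete data structures or say that log-probabilities are used.

```python
import math


def best_conversion(typing_candidates, conversions):
    """Pick the (candidate, output) pair with the highest combined probability.

    typing_candidates: {candidate: probability that it was mistyped as the input}
    conversions:       {candidate: {output: probability that output represents it}}
    """
    best, best_score = None, -math.inf
    for cand, p_typing in typing_candidates.items():
        for output, p_lang in conversions.get(cand, {}).items():
            if p_typing <= 0 or p_lang <= 0:
                continue  # log is undefined for non-positive probabilities
            # Summing logs is equivalent to multiplying the probabilities.
            score = math.log(p_typing) + math.log(p_lang)
            if score > best_score:
                best, best_score = (cand, output), score
    return best


# Hypothetical tables for the mistyped phonetic input "ni hap".
typing_model = {"ni hao": 0.8, "ni hap": 0.2}
language_model = {"ni hao": {"你好": 0.9, "你号": 0.1}, "ni hap": {}}
print(best_conversion(typing_model, language_model))
```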

Oral modification of an ASR lexicon of an ASR engine

Methods, apparatus, and computer program products are described for providing oral modification of an ASR lexicon of an ASR engine that include receiving, in the ASR engine from a user through a multimodal application, speech for recognition, where the ASR engine includes an ASR lexicon of words capable of recognition by the ASR engine, and the ASR lexicon does not contain at least one word of the speech for recognition; indicating by the ASR engine through the multimodal application to the user that the ASR lexicon does not contain the word; receiving by the ASR engine from the user through the multimodal application an oral instruction to add the word to the ASR lexicon, where the oral instruction is accompanied by an oral spelling of the word from the user; and executing the instruction by the ASR engine.
Owner:NUANCE COMM INC

Linguistically-adapted structural query annotation

A system and method for natural language processing of queries are provided. A lexicon includes text elements that are recognized as being a proper noun when capitalized. A natural language query includes a sequence of text elements including words. The query is processed. The processing includes a preprocessing step, in which part of speech features are assigned to the text elements in the query. This includes identifying, from a lexicon, a text element in the query which starts with a lowercase letter and assigning recapitalization information to the text element in the query, based on the lexicon. This information includes a part of speech feature of the capitalized form of the text element. Then parts of speech for the text elements in the query are disambiguated, which includes applying rules for recapitalizing text elements based on the recapitalization information.
Owner:XEROX CORP

Systems and methods for building a native language phoneme lexicon having native pronunciations of non-native words derived from non-native pronunciations

Systems and methods are provided for automatically building a native phonetic lexicon for a speech-based application trained to process a native (base) language, wherein the native phonetic lexicon includes native phonetic transcriptions (base forms) for non-native (foreign) words which are automatically derived from non-native phonetic transcriptions of the non-native words.
Owner:MICROSOFT TECH LICENSING LLC

System and methods for providing runtime spelling analysis and correction

A system and methods are provided for runtime spelling analysis and correction in a computing system. Misspelled entries or input text are automatically corrected, turning them into valid entries or text. The words used for spell checking and correction may be collected through multiple channels or from multiple sources, including words commonly found on the Web, and in users' entries or input text, as well as words from a standard language lexicon, all of which may be in one or more languages. The word(s) are automatically corrected only when there is a very high confidence that the correction is desirable. In various embodiments, the system implements a plurality of mechanisms with which the user can override the correction if invoked.
Owner:MICROSOFT TECH LICENSING LLC

Method and apparatus for performing speech recognition utilizing a supplementary lexicon of frequently used orthographies

The invention relates to a method and an apparatus for recognising speech, more particularly to a speech recognition system and method utilising a speech recognition dictionary supplemented by a lexicon containing frequently occurring word sequences (orthographies). In typical speech recognition systems, the process of speech recognition consists of scanning the vocabulary database or dictionary by using a fast match algorithm to find the top N candidates that potentially match the input speech. In a second pass the N candidates are re-scored using more precise likelihood computations. The novel method comprises the introduction of a step in the search stage that consists of forcing the insertion in the list of N candidates entries selected from a lexicon containing frequently used orthographies to increase the probability of occurrence of certain text combinations.
Owner:RPX CLEARINGHOUSE
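
The two-pass search with forced insertion can be outlined roughly as below. The scoring dictionaries stand in for the fast-match and precise likelihood computations, which the abstract does not detail; the function names and all scores are hypothetical.

```python
def candidates_with_forced(fast_scores, frequent_orthographies, n):
    """First pass: keep the top-N fast-match entries, then force in the
    frequently used orthographies that did not make the cut."""
    top = sorted(fast_scores, key=fast_scores.get, reverse=True)[:n]
    forced = [o for o in frequent_orthographies if o not in top]
    return top + forced


def rescore(candidates, precise_scores):
    """Second pass: choose the candidate with the best precise score."""
    return max(candidates, key=lambda c: precise_scores.get(c, float("-inf")))


# Hypothetical scores for a single utterance.
fast = {"call home": 0.9, "call Rome": 0.8, "cold foam": 0.7, "call Jerome": 0.2}
precise = {"call home": 0.5, "call Rome": 0.4, "call Jerome": 0.95}
supplementary = ["call Jerome"]  # assumed lexicon of frequent orthographies

shortlist = candidates_with_forced(fast, supplementary, n=2)
print(shortlist)
print(rescore(shortlist, precise))
```

Without the forced insertion, "call Jerome" would be dropped after the first pass and could never win the second one, which is the effect the abstract attributes to the supplementary lexicon.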

Spelling and grammar checking system

A system for correcting misspelled words in input text detects a misspelled word in the input text, determines a list of alternative words for the misspelled word, and ranks the list of alternative words based on a context of the input text. In certain embodiments, finite state machines (FSMs) are utilized in the spelling and grammar correction process, storing one or more lexicon FSMs, each of which represents a set of correctly spelled reference words. Storing the lexicon as one or more FSMs facilitates those embodiments of the invention employing a client-server architecture. The input text to be corrected may also be encoded as an FSM, which includes alternative word(s) for word(s) in need of correction along with associated weights. The invention adjusts the weights by taking into account the grammatical context in which the word appears in the input text. In certain embodiments the modification is performed by applying a second FSM to the FSM that was generated for the input text, where the second FSM encodes a grammatically correct sequence of words, thereby generating an additional FSM.
Owner:GLOBAL INFORMATION RES TECH

System and method for suggestion mining

A system and method for extraction of suggestions for improvement from a corpus of documents, such as customer reviews, are disclosed. A structured terminology provided for a topic includes a set of semantic classes, each including a set of terms. A thesaurus of terms relating to suggestions of improvement is provided. Text elements of text strings in the documents which are instances of terms in the structured terminology are labeled with the corresponding semantic class and text elements which are instances of terms in the thesaurus are also labeled. A set of patterns is applied to the labeled text strings to identify suggestions of improvement expressions. The patterns define syntactic relations between text elements, some of which are required to be instances of one of the terms in a particular semantic class or thesaurus. A set of suggestions for improvements is output based on the identified suggestions of improvement expressions.
Owner:XEROX CORP

Determining language for character sequence

A method for selecting the language for a character sequence fed into a data processing device, wherein decision trees are trained for different characters on the basis of lexicons of predetermined languages. The decision trees describe language probabilities on the basis of characters in the environments of the characters. The decision trees for at least some of the characters of the character sequence fed into the data processing device are traversed, thus obtaining a probability of at least one language for each character. The language for the character sequence is selected on the basis of the probabilities obtained.
Owner:WSOU INVESTMENTS LLC

System and method for dynamically evaluating latent concepts in unstructured documents

A system and method for dynamically evaluating latent concepts in unstructured documents is disclosed. A multiplicity of concepts are extracted from a set of unstructured documents into a lexicon. The lexicon uniquely identifies each concept and a frequency of occurrence. A frequency of occurrence representation is created for the documents set. The frequency representation provides an ordered corpus of the frequencies of occurrence of each concept. A subset of concepts is selected from the frequency of occurrence representation filtered against a pre-defined threshold. A group of weighted clusters of concepts selected from the concepts subset is generated. A matrix of best fit approximations is determined for each document weighted against each group of weighted clusters of concepts.
Owner:NUIX NORTH AMERICA

System and method for improving text input in a shorthand-on-keyboard interface

A word pattern recognition system improves text input entered via a shorthand-on-keyboard interface. A core lexicon comprises commonly used words in a language; an extended lexicon comprises words not included in the core lexicon. The system only directly outputs words from the core lexicon. Candidate words from the extended lexicon can be outputted and simultaneously admitted to the core lexicon upon user selection. A concatenation module enables a user to input parts of a long word separately. A compound word module combines two common shorter words whose concatenation forms a long word.
Owner:NUANCE COMM INC
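
The interplay of the core and extended lexicons can be pictured with a short sketch. The class `TwoTierLexicon` and its method names are hypothetical, and gesture recognition itself is replaced here by a ready-made list of candidate words.

```python
class TwoTierLexicon:
    """Toy model of a core lexicon plus an extended lexicon."""

    def __init__(self, core, extended):
        self.core = set(core)
        self.extended = set(extended) - self.core

    def direct_output(self, candidates):
        """Return the first candidate found in the core lexicon, if any."""
        return next((w for w in candidates if w in self.core), None)

    def suggestions(self, candidates):
        """Candidates from the extended lexicon, offered for user selection."""
        return [w for w in candidates if w in self.extended]

    def select(self, word):
        """User picked an extended-lexicon word: output it and admit it to core."""
        if word not in self.extended:
            raise ValueError(f"{word!r} is not in the extended lexicon")
        self.extended.remove(word)
        self.core.add(word)
        return word


lex = TwoTierLexicon(core=["the", "word"], extended=["lexeme"])
ranked = ["lexeme", "word"]          # assumed recognizer output, best first
print(lex.direct_output(ranked))     # only core words are output directly
print(lex.suggestions(ranked))       # extended words wait for selection
lex.select("lexeme")
print(lex.direct_output(ranked))     # after selection the word is in core
```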

Process and system for high precision coding of free text documents against a standard lexicon

Coding free text documents, especially in medicine, has become an urgent priority as electronic medical records (EMR) mature, and the need to exchange data between EMRs becomes more acute. However, only a few automated coding systems exist, and they can only code a small portion of the free text against a limited number of codes. The precision of these systems is low and code quality is not measured. The present invention discloses a process and system which implements semantic coding against standard lexicon(s) with high precision. The standard lexicon can come from a number of different sources, but is usually developed by a standards body. The system is semi-automated to enable medical coders or others to process free text documents at a rapid rate and with high precision. The system performs the steps of segmenting a document, flagging the need for corrections, validating the document against a data type definition, and looking up both the semantics and standard codes which correspond to the document's sentences. The coder has the option to intervene at any step in the process to fix mistakes made by the system. A knowledge base, consisting of propositions, represents the semantic knowledge in the domain. When sentences with unknown semantics are discovered they can be easily added to the knowledge base. The propositions in the knowledge base are associated with codes in the standard lexicon. The quality of each match is rated by a professional who understands the knowledge domain. The system uses this information to perform high precision coding and measure the quality of the match.
Owner:JAMIESON PATRICK WILLIAM

Systems and methods for creating and publishing relational data bases

A searchable electronic database system that can return search results independent of reference source type. The electronic database system includes information that can be content or discipline specific. The database can be focused to allow research to be limited to the discipline specific universe of information. The database can include person, organization, publication, and other entity types. The publications can include journal articles, books, dissertations, grants, clinical trials, and web resources. The database can also include ontology and lexicon entities. The entities are interconnected through relationships. Searches performed on the database return results across all entity types. A single search can return results from each of the different publication types. Details of the results can be displayed. Dynamic links to one or more fields in a particular result detail can link to a result categorized according to the field.
Owner:SINGH SADANAND +5

Computer system with natural language to machine language translator

Presented is a system and method for converting or translating expressions in a natural language such as English into machine executable expressions in a formal language. This translation enables a transformation from the syntactic structures of a natural language into effective algebraic forms for further exact processing. The invention utilizes algorithms employing a reduction of sequences of terms defined over an extensible lexicon into formal syntactic and semantic structures. This term reduction incorporates both syntactic type and semantic context to achieve an effective formal representation and interpretation of the meaning conveyed by any natural language expression.
Owner:RAVENFLOW

Methods and apparatus for automatically extending the voice vocabulary of mobile communications devices

A method of improving the performance of a speech recognizer, the method involving: providing a lexicon for the speech recognizer; monitoring a user's interaction with a network; accessing a plurality of words associated with the monitored interaction; and including the plurality of words in the lexicon.
Owner:NUANCE COMM INC